Abstract
The aim of this study was to obtain French affective norms for the film music stimulus set (FMSS). This data set consists of a relatively homogeneous series of musical stimuli made up of film music excerpts, known to trigger strong emotion. The 97 musical excerpts were judged by 194 native French participants using a simplified normative procedure designed to collect valence and arousal judgments. This normalization will (1) provide researchers with standardized rated affective music to be used with a French population, (2) enable the investigation of individual listeners’ differing emotional judgments, and (3) make it possible to explore how cultural differences affect the ratings of musical stimuli. Our results, in line with those obtained in Finland and Spain, demonstrated the FMSS to be robust and interculturally valid within Western Europe. Age, sex, education, and musical training were not found to have any effects on emotional judgments. In conclusion, this study provides the scientific community with a standardized stimulus set of musical excerpts whose emotional valence and arousal have been validated by a sample of the French population.
Music is capable of inducing strong emotional reactions in listeners. By inducing intense and varied emotions (e.g., sadness, fear, joy), music is a valuable tool for the study of emotion and its neurobiological foundations (for a review, see Koelsch, 2010, 2014). However, very few standardized norms exist for emotional musical stimuli. The goal of the present study is to establish affective norms for a series of musical excerpts.
In the field of emotions, affective norms exist for different sets of stimuli, including emotional pictures (Kurdi et al., 2017; Lang et al., 1999), sounds (Bradley & Lang, 1999b; Yang et al., 2018), words (Bradley & Lang, 1999a; Monnier & Syssau, 2014), faces (Ekman & Friesen, 1976; Goeleven et al., 2008), voices (Belin et al., 2008), and faces and voices (Ferdenzi et al., 2015). These stimulus sets offer a large range of emotional stimuli (approximately 100 to 1,000 stimuli per data set), validated in a variety of contexts and across many countries on very large samples of participants.
Very few studies have, however, proposed a standardized set of affective stimuli in the musical domain. Given the lack of consensus on the type of musical stimuli best suited to induce emotions, different musical genres have been used, including pop music (Song et al., 2016), classical music (Lepping et al., 2016), film music (Eerola & Vuoskoski, 2011; Vieillard et al., 2008), or a mix of genres (Imbir & Gołąb, 2017) involving either instrumental music (e.g., Eerola & Vuoskoski, 2011) and/or vocal music with lyrics (e.g., Song et al., 2016). Depending on the studies, these musical stimuli have been selected from the existing musical repertoire (Eerola & Vuoskoski, 2011; Imbir & Gołąb, 2017; Lepping et al., 2016; Song et al., 2016) or composed specifically for the study (Vieillard et al., 2008). As a result, the level of familiarity with the stimuli, which was not systematically controlled for, might differ between listeners. As underlined by Janata et al. (2007), the emotion associated with listening to familiar music is more closely related to the emotional response to personal memory than to acoustic experience. The use of unfamiliar musical excerpts could therefore limit this bias. Another variable to control for is the duration of the musical excerpts, which also varies considerably between the cited studies (from 9 to 60 s) and sometimes within the same data set. It is therefore necessary to use musical excerpts that are relatively homogeneous and well controlled in terms of musical genre, instrumental composition, familiarity, and duration.
In studies of affective norms, the type of emotional judgment is an important parameter. Based on a categorical approach suggesting that emotions are represented in a series of discrete categories (see Ekman, 1992), some studies assessed categorical judgments using verbal labeling (e.g., happiness, sadness) (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021; Song et al., 2016; Vieillard et al., 2008). Recognition of the emotion that each musical excerpt was intended to portray made it possible to identify the best-fitting emotional label. The other type of emotional judgment used across all musical studies is based on the circumplex model of affect (Russell, 1980). According to this model, emotions are represented in a two-dimensional affective space. One dimension concerns emotional valence, plotted along a continuum from negative (unpleasant) to positive (pleasant). The other dimension is arousal, depicted along a continuum from a low level of excitement (relaxing) to a high level of excitement (stimulating). However, some authors have proposed to represent emotion in three dimensions, rather than two: valence, tension-arousal, and energy-arousal (Schimmack & Grob, 2000). Eerola and Vuoskoski (2011) collected judgments according to these three dimensions. Based on the correlation observed between the ratings of tension-arousal and energy-arousal, the authors concluded that they could be grouped into a single “arousal” dimension. This finding was recently confirmed by Fuentes-Sánchez et al. (2021), emphasizing the relevance of using only two dimensions. By assessing the validity of the dimensional model against the commonly employed categorical approach, Eerola and Vuoskoski (2011) found high levels of agreement between the two models. However, the dimensional model seems to provide better inter-judge agreement when musical emotions are ambiguous.
Unlike dimensional judgments, categorical judgments require the use of a verbal label calling upon other non-emotional cognitive resources. Dimensional judgments, however, account better for the complexity and richness of emotional feeling induced by music (Zentner et al., 2008) and minimize the use of verbal strategies. As previously described, the validation of the commonly used two-dimensional valence-arousal model emphasizes the relevance of this parsimonious model for characterizing musical emotion.
The previously mentioned normative studies of emotional excerpts were often limited by their small number of stimuli (from 36 to 80), which restricts their use in behavioral studies. To our knowledge, only three studies proposed emotional norms for more than 100 musical excerpts (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021; Imbir & Gołąb, 2017). Moreover, the sample size was also small (fewer than 55 participants) in most studies, making it difficult to generalize the findings. The film music stimulus set (FMSS) is the only stimulus set validated with more than 100 participants, in Finland (Eerola & Vuoskoski, 2011) and more recently in Spain (Fuentes-Sánchez et al., 2021).
Finally, the effect of individual differences between listeners was rarely controlled for. Fuentes-Sánchez et al. (2021) examined the influence of sex on emotional responses. They found that women rated the musical excerpts as scarier, angrier, and more exciting than men did, suggesting that sex has an effect on affective judgments, a finding not reported in other studies (Imbir & Gołąb, 2017; Song et al., 2016). Age and musical training did not seem to affect ratings (Song et al., 2016), but sample sizes and the ranges of participants’ ages and degrees of musical training may have been too small to bring out a significant correlation. Controlling for the influence of interindividual differences on emotional judgments in further studies therefore appears to be worthwhile.
This study aims to collect French affective norms for a series of musical excerpts taken from the FMSS (Eerola & Vuoskoski, 2011) using emotional judgments of valence and arousal. This data set consists of film music excerpts, known to trigger strong emotion as underlined by Cohen (2011). They were selected by expert musicologists and then judged emotionally by Finnish students. In the present study, participants were asked to judge the valence (negative/positive) and arousal (peaceful/dynamic) induced by the musical excerpts on 4-point rating scales, in order to identify musical excerpts that fit within each of the emotional combinations resulting from crossing the two emotional dimensions. The final goal of this study was to compare the emotional ratings of valence and arousal obtained in different European countries (Finland, Spain, France) to test the robustness of the original findings.
Methods
Participants
One hundred ninety-seven French-speaking adults voluntarily took part in this online study via web-based social media. The musical hedonism of each participant was controlled using the four most representative items of the “emotion evocation” facet from the Barcelona Music Reward Questionnaire (BMRQ, Mas-Herrero et al., 2013). Three participants presenting musical anhedonia were excluded. Thus, the study was carried out using 194 participants, including 144 women and 50 men aged from 18 to 64 years (mean age = 29.87 ± 11.26), with 9–17 years of education (mean number of years = 14.63 ± 1.67) and with different levels of music training (143 non-musicians, 51 musicians including 43 amateurs, 6 semi-professionals, and 2 professional musicians).
Material
The material consisted of 97 instrumental film music excerpts without lyrics or sound effects taken from Eerola and Vuoskoski (2011), listed in Table S1 in Supplementary Materials online. Out of the 110 excerpts composing the FMSS, 7 duplicate excerpts, 1 single-instrumental excerpt, and 5 excerpts that were very familiar to French listeners, as attested by a pilot study, were removed. The duration of the musical excerpts ranged from 11 to 27 s.
Procedure
Participants completed the online study on their personal computer. Before starting, each participant read an informational letter explaining the conditions of the study and gave their consent to participate. They were asked to move to a quiet place with an adequate audio system or headphones. Then, participants completed a form requesting information (sex, age, education, and music training) and answered four items taken from the BMRQ (e.g., “I like to listen to music that contains emotions”) by providing their degree of agreement (ranging from completely disagree to completely agree).
To limit the duration of the test and to prevent the participants from giving up in the middle of the experiment, each participant listened to a randomly selected subset of 48 excerpts. Participants were instructed to assess their subjective emotional experience by self-reporting the emotion evoked by listening to the music. It was specified that their judgments should be spontaneous and independent of their musical preference in terms of style or instrumental composition. After listening to each musical excerpt, two questions appeared on-screen simultaneously alongside two 4-point rating scales: “Is the emotion evoked by the music negative or positive?” with a scale ranging from very negative to very positive and “Is the emotion evoked by the music not energetic (peaceful) or very energetic (dynamic)?” accompanied by a scale ranging from not energetic to very energetic. The participants had 10 s to give their judgments and each musical excerpt was presented only once. The duration of the task was about 20 min.
Results
Since the participants judged a random selection of 48 musical stimuli, we obtained between 68 and 84 judgments (mean number of ratings 77.3 ± 2.6) for each excerpt (as reported in Table S1 in Supplementary Materials online). An inter-judge agreement criterion equal to or greater than 75% was retained to determine the valence and arousal of each musical excerpt (e.g., a musical excerpt judged to be very negative or negative by at least 75% of the participants was considered as having a negative valence). Using this method, 35 excerpts with positive valence [V+], 35 excerpts with negative valence [V–], and 27 excerpts with ambiguous valence were identified. With regard to arousal, 37 high-arousal excerpts [A+], 40 low-arousal excerpts [A–], and 20 excerpts with ambiguous arousal were identified. As illustrated in Figure 1, 54 excerpts evoked one of the four emotional combinations, including 16 [A+, V+], 15 [A+, V–], 13 [A–, V+], and 10 [A–, V–] excerpts, whereas the remaining 43 excerpts elicited no specific emotional combination. These results are summarized in Table S1 in Supplementary Materials online. Moreover, a Pearson correlation analysis showed that arousal and valence ratings were not correlated, r = −.006, p = .937.
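The 75% agreement criterion can be illustrated with a short sketch. It is not the authors' analysis script: the numeric coding of the 4-point scales (1 and 2 for the negative or low-energy responses, 3 and 4 for the positive or high-energy responses), the function names, and the toy ratings are all assumptions made for illustration.

```python
AGREEMENT_THRESHOLD = 0.75  # inter-judge agreement criterion reported in the text


def classify_dimension(ratings, threshold=AGREEMENT_THRESHOLD):
    """Classify one excerpt on one dimension as '+', '-', or 'ambiguous'.

    ratings: integers from 1 to 4 (assumed coding: 1-2 = negative / not
    energetic, 3-4 = positive / energetic).
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    share_low = sum(1 for r in ratings if r in (1, 2)) / len(ratings)
    share_high = sum(1 for r in ratings if r in (3, 4)) / len(ratings)
    if share_high >= threshold:
        return "+"
    if share_low >= threshold:
        return "-"
    return "ambiguous"


def classify_excerpt(valence_ratings, arousal_ratings):
    """Return a label such as '[A+, V-]', or None if either dimension is ambiguous."""
    valence = classify_dimension(valence_ratings)
    arousal = classify_dimension(arousal_ratings)
    if "ambiguous" in (valence, arousal):
        return None
    return f"[A{arousal}, V{valence}]"


# Toy ratings for a single hypothetical excerpt (not data from the study).
toy_valence = [4, 4, 3, 3, 3, 4, 2, 3]  # 7 of 8 raters on the positive side
toy_arousal = [1, 2, 2, 1, 2, 3, 1, 2]  # 7 of 8 raters on the low-energy side
label = classify_excerpt(toy_valence, toy_arousal)
```

Under this coding, an excerpt is assigned to an emotional combination only when both dimensions reach the criterion, which is why the number of excerpts in the four combinations is smaller than the number of excerpts that are unambiguous on a single dimension.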

Figure 1. Representation of the 97 Musical Excerpts According to the Judgments of Valence and Arousal by French Participants. The Different Symbols Correspond to Musical Excerpts Fitting to One of the Four Emotional Combinations Based on at Least 75% Inter-Rater Agreement or Not Fitting to Any Emotional Combination.
Effect of individual listeners’ differences on emotional judgments
Pearson correlation analyses showed that ratings for valence did not correlate with the participants’ age, r = .139, p = .053, or level of education, r = −.135, p = .061. Ratings for arousal were similarly uncorrelated with participants’ age, r = −.005, p = .946, and level of education, r = −.091, p = .208. Moreover, Student’s t-tests revealed that sex and music training had no effect on valence or arousal ratings (all ps > .05).
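For readers who wish to run comparable group comparisons on their own ratings, the following sketch computes a pooled-variance (Student's) t statistic for two independent groups. The unit of analysis assumed here (one mean rating per participant), the function name, and the toy values are illustrative assumptions; the text does not specify these details. Obtaining a p value additionally requires the t distribution, which is available in standard statistical packages.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Return (t, df) for a pooled-variance two-sample t-test.

    group_a, group_b: sequences of numbers, e.g., one mean rating per
    participant in each group (an assumption made for this sketch).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Toy per-participant mean ratings for two hypothetical groups (not study data).
t_value, degrees_of_freedom = student_t([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```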
Effect of cultural differences on emotional judgments
The ratings obtained for the 97 musical excerpts used in both French and Finnish studies, as well as the ratings of the 96 musical excerpts used in both French and Spanish studies, were analyzed. Pearson correlation analyses showed a positive correlation between the mean valence ratings obtained in France and those obtained in the other countries, Finland: r = .956, p < .001; Spain: r = .928, p < .001. We also found mean arousal judgments in France to be positively correlated with the energy-arousal judgments obtained in the other countries, Finland: r = .956, p < .001; Spain: r = .939, p < .001, and with their tension-arousal judgments, Finland: r = .566, p < .001; Spain: r = .752, p < .001.
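The cross-country comparison rests on Pearson correlations between per-excerpt mean ratings. A minimal sketch of that computation is given below; the function name and the toy values are assumptions for illustration and are not the published ratings.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy per-excerpt means from two hypothetical samples (not study data).
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [1.0, 3.0, 2.0, 4.0]
r_ab = pearson_r(sample_a, sample_b)

# Linearly rescaling one set of means leaves r unchanged.
sample_b_rescaled = [2.0 * value + 1.0 for value in sample_b]
r_rescaled = pearson_r(sample_a, sample_b_rescaled)
```

Because Pearson's r is invariant under positive linear rescaling, mean ratings collected on scales of different lengths (such as 4-point and 9-point scales) can be correlated directly without first converting them to a common range.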
Discussion
The aim of this study was to obtain French affective norms for the FMSS (Eerola & Vuoskoski, 2011). For this purpose, we used a normative rating procedure based on two dimensions, valence and arousal. The present study also aimed to assess the influence of individual differences (e.g., age, sex, education level, musical training) and cultural differences on the emotional ratings of musical stimuli. To avoid any bias due to familiarity with the musical excerpts and the presence of lyrics, we selected only unfamiliar instrumental music from the original database. The main finding of this study suggests that the FMSS provides a set of musical excerpts with clearly defined emotional valence and arousal. The absence of correlation between these two emotional dimensions confirms that ratings of valence can be dissociated from ratings of arousal. Furthermore, the results obtained show that listeners’ individual differences and cultural origins do not affect emotional judgments, confirming the relevance of such normative musical studies and the cross-cultural validity of this specific affective stimulus set, at least in Western European countries.
In the current study, we first established average emotional valence and arousal ratings for each musical excerpt. Based on inter-judge agreements of individual ratings, we succeeded in identifying a set of musical excerpts which consistently elicited either positive or negative valence. We also identified a set of stimuli triggering either high or low levels of arousal in most participants. All the remaining excerpts were considered emotionally ambiguous, as they were unclear in terms of valence and arousal. In line with previously reported ratings with this stimulus set (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021), we found that the FMSS provides a large range of emotional stimuli inducing different types of valence and arousal. Such normative data should enable researchers to select the most appropriate stimulus sets for investigating the influence of both emotional valence and arousal on various cognitive mechanisms such as attention or memory (see Nineuil et al., 2020). These data could also make it possible to explore the psychological mechanisms and cerebral bases of emotional ambiguity, which have seldom been examined in the musical domain (Dellacherie et al., 2008; for review, Schoth & Liossi, 2017) compared to other domains (for faces, see Cooney et al., 2006).
Another important finding of the present study concerns the absence of correlation between valence and arousal ratings. It confirms that valence and arousal are separable emotional features (Russell, 1980). Although some studies in the musical and non-musical domains also reported no correlation between these emotional dimensions (Kurdi et al., 2017; Vieillard et al., 2008), some divergent results have been reported (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021; Imbir & Gołąb, 2017; Yang et al., 2018). Using another stimulus set involving a mix of musical genres including instrumental and vocal music, Imbir and Gołąb (2017) found a positive correlation between arousal and valence. Thus, positive-valence musical excerpts were considered arousing, whereas negative-valence excerpts elicited low arousal. However, whether the divergent results can be explained by differences between the stimulus sets and/or the participants remains difficult to determine. The other studies using the FMSS also reported correlations between emotional ratings (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021). By using three emotional dimensions, these authors found a negative correlation between valence and tension-arousal. The excerpts rated as positive were evaluated as less tense, suggesting that emotional valence and tension-arousal in music are not independent emotional features. Conversely, the absence of correlation between valence and energy-arousal ratings suggests that these emotional dimensions depend on distinct features.
The results of our study did not reveal age, sex, level of education, or musical training to have any effect on emotional judgments, suggesting that individual differences among listeners do not influence ratings. These findings are in line with most previously published data using music (Imbir & Gołąb, 2017; Song et al., 2016), although all these factors have never been controlled for together in previous studies. Fuentes-Sánchez et al. (2021) reported an effect of sex on arousal ratings, a finding already obtained with emotional pictures (Bradley et al., 2001; Kurdi et al., 2017), but no effect of sex on valence ratings has been documented in music, although such an effect has been found with emotional sounds (Yang et al., 2018).
The strong agreement between the emotional ratings of the French participants and those obtained in Finland (Eerola & Vuoskoski, 2011) and Spain (Fuentes-Sánchez et al., 2021) is striking and underlines the robustness and validity of our data. First, we found a positive correlation between our respective valence ratings. Second, we observed correlations between our study’s arousal ratings and the energy-arousal ratings and tension-arousal ratings obtained in the previous studies. These findings support the hypothesis that the arousal dimension, developed by Russell (1980), allows scholars to replace energy-arousal and tension-arousal by a single dimension (Eerola & Vuoskoski, 2011; Fuentes-Sánchez et al., 2021). The current study offers evidence that consensus in emotional valence and arousal judgments is possible within the musical domain. The replication of the previous findings emphasizes the cross-cultural validity of the stimulus set within Western Europe and indicates that self-reported subjective emotional responses obtained online with a simpler rating scale (i.e., 4-point rather than 9-point rating scales) appear to be valid. These results suggest that the testing environment has little effect on emotional judgments. This result is all the more interesting in view of the situation brought on by COVID-19, which has increasingly prompted researchers to perform tasks remotely (Papatzikis et al., 2020). In addition, using two dimensions rather than three and simplifying the response scales would limit comprehension problems, especially in clinical populations with comprehension disorders, and would make the tasks easier to administer.
The present study nevertheless has a few methodological limitations. Even though the sample of participants was large compared to other musical validation studies, larger and more diverse samples of men and women are necessary to generalize the reported findings. Due to the dynamic nature of music, it would be relevant to add a dimension of dominance to assess the extent to which the emotion evoked by a stimulus is controllable (Fontaine et al., 2007), as has already been done in visual studies. Likewise, to avoid confounds related to verbal labeling in the rating procedure, it would be preferable in future studies to use a graphic rating scale such as the Self-Assessment Manikin (Bradley & Lang, 1994). Finally, the duration of the musical stimuli used in the present study varied from 11 to 27 s. Although this variation should not affect emotional judgments, according to Bigand et al. (2005), standardizing the stimuli to the same duration could be useful for many behavioral or psychophysiological studies. Moreover, no participant judged all the musical excerpts, and the number of judgments per excerpt was not exactly the same. Despite all the methodological discrepancies between these studies, the judgments obtained in the current study correlate with those previously reported by Eerola and Vuoskoski (2011) and Fuentes-Sánchez et al. (2021), emphasizing the robustness of the data.
In conclusion, these results suggest that reliable emotional judgments can be collected online. Moreover, this study provides the scientific community with a French standardized stimulus set of musical excerpts validated with respect to emotional valence and arousal judgments. This internationally accessible set of emotionally evocative stimuli complements those previously validated. It can be used for future research on music, emotion, and cognition (e.g., emotional judgment task, musical memory task, musical interventions, electrophysiological study), and to improve our understanding of musical emotions in healthy populations and in patients.
Supplemental Material
Supplemental material, sj-docx-1-pom-10.1177_03057356211050683 for French adaptation of a film music stimulus set: Normative emotional ratings of valence and arousal prompted by music excerpts by Clémence Nineuil, Delphine Dellacherie and Séverine Samson in Psychology of Music
Footnotes
Acknowledgements
The authors would like to thank all participants who agreed to take part in this study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the European Center for the Humanities and Social Sciences (MESHS-Lille, France), by the Hauts-de-France Regional Council, and by the University of Lille for Clémence Nineuil and the Institut Universitaire de France for Séverine Samson.
Supplemental material
Supplemental material for this article is available online.
