Abstract
A common approach to study emotional reactions to music is to attempt to obtain direct links between musical surface features such as tempo and a listener’s response. However, such an analysis ultimately fails to explain why emotions are aroused in the listener. In this article, we propose an alternative approach, which seeks to explain musical emotions in terms of a set of underlying mechanisms that are activated by different types of information in musical events. We illustrate this approach by reporting a listening experiment, which manipulated a piece of music to activate four mechanisms: brain stem reflex; emotional contagion; episodic memory; and musical expectancy. The musical excerpts were played to 20 listeners, who were asked to rate their felt emotions on 12 scales. Pulse rate, skin conductance, and facial expressions were also measured. Results indicated that target mechanisms were activated and aroused emotions largely as predicted by a multi-mechanism framework.
Music moves us. Indeed, it may evoke anything from mere arousal and basic emotions such as happiness and sadness to complex emotions such as nostalgia. Such emotional responses add personal significance to the processes of music perception and cognition, and constitute one of the main reasons for engaging with music (Juslin, 2011).
Evidence of musical emotions comes from many strands of research 1 (for a review, see Juslin & Sloboda, 2013). In accordance with the multi-component view of emotions (Oatley, Keltner, & Jenkins, 2006), music may influence feelings (Pike, 1972), expressions (Witvliet & Vrana, 2007), psychophysiological reactions (Krumhansl, 1997), brain activation (Brown, Martinez, & Parsons, 2004), action tendencies (Fried & Berkowitz, 1979), as well as various indirect measures (Västfjäll, 2010). Moreover, there is some evidence of a ‘synchronization’ of the various components (Lundqvist, Carlsson, Hilmersson, & Juslin, 2009). However, the question of why music arouses emotions has largely remained a mystery.
Explaining musical emotions
In the past, the most common approach to understanding what causes an emotion to music has been to map factors (in the music, the listener, and the situation) that somehow influence emotions. Such research may provide valuable information about the conditions under which musical emotions occur. Yet producing lists of factors that affect emotions does not actually constitute an explanation of why they occur; this lesson was learned a long time ago in general research on emotions, where investigators soon discovered that it is difficult to find objective situation predictors that will invariably influence different people in the same way: different people tend to react in different ways to the ‘same’ stimulus. This realization forms the basis of theories of emotion causation (for a review, see Moors, 2009). To explain why an emotion occurs, we have to understand how the emotion induction process works – that is, the kind of ‘information-processing’ that leads to the arousal of a specific emotion. A theory of emotions must be able to explain both why a given stimulus arouses an emotion (‘elicitation’) and why the aroused emotion is of a particular kind (‘differentiation’). The fundamental psychological process by which this is achieved is referred to as the underlying mechanism.
The most commonly discussed mechanism previously is cognitive appraisal (Scherer, 1999). This notion refers to a process whereby an emotion is aroused in a person because an event is interpreted as having implications for the person’s goals in life (e.g., in terms of goal congruence, coping potential, or compatibility with social norms). The problem is that music does not usually have implications for life goals. 2 In fact, results so far suggest that cognitive appraisal is rarely the cause of musical emotions (e.g., Juslin, Liljeström, Västfjäll, Barradas, & Silva, 2008). Thus it seems necessary to consider alternative mechanisms which are more relevant in the case of music. Much research has proceeded as if there were only one ‘causal route’ to emotions, but emotions can be aroused in a number of ways (Izard, 1993).
Thus a better approach to explain musical emotions than to produce lists of influencing factors might be to develop theories of the underlying mechanisms. Meyer (1956), one of the pioneers in the music and emotion field, recognized early on the importance of psychological theory: ‘Given no theory as to the relation of musical stimuli to affective responses, observed behavior can provide little information as to either the nature of the stimulus, the significance of the response, or the relation between them’ (p. 10). Meyer himself chose to concentrate on the role of musical expectancy, eloquently elaborating his theory in what is probably the most cited book in the field. Still, as pointed out by Huron (2006), ‘music can also evoke emotions through many other means – apart from whether the sounds are expected or not’ (p. 365). And ironically, though Meyer was clear about the importance of psychological theory in explaining emotions, his ideas would eventually lead to a neglect of such theorizing: given a ‘competent’ listener, the study of emotions could be comfortably reduced to the study of musical structure.
Unfortunately, emotional responses to music can never be explained merely in terms of the musical structure – what matters is how psychological mechanisms of specific listeners in specific contexts engage with selected aspects of the musical structure. Yet a recent search of the literature indicated that few articles have proposed or tested any theory about mechanisms (Juslin & Västfjäll, 2008). In general, emotions to music have been studied without respect to how they were aroused. Researchers have tried to obtain direct links between surface features and aroused emotions. This approach has prevented us from explaining individual differences (e.g., that the same piece of music can evoke different emotions in different listeners), and has led to overly simple conclusions (e.g., that ‘fast tempo evokes positive emotions’; Gomez & Danuser, 2007, p. 380). The solution to this dilemma, we argue, is a theory-based approach to musical emotions that goes beyond mere surface features.
A unified theoretical framework: BRECVEMA
Although mechanisms have been mostly neglected in previous studies, several scholars have discussed possible mechanisms, typically focusing on a few possibilities (e.g., Berlyne, 1971; Dowling & Harwood, 1986; Meyer, 1956; Sloboda, 1998). Scherer and Zentner (2001) offered a more extensive overview of mechanisms and moderators of emotion induction that might be involved, but the most comprehensive attempt to outline a set of mechanisms is the BRECVEMA framework (Juslin & Västfjäll, 2008; Juslin, Liljeström, Västfjäll, & Lundqvist, 2010), which currently (Juslin, in press) features eight mechanisms (besides appraisal): 3
Brain stem reflex: a hard-wired attention response to simple acoustic features such as extreme or increasing loudness or speed (Simons, 1996);
Rhythmic entrainment: a gradual adjustment of an internal body rhythm (e.g., heart rate) towards an external rhythm in the music (Harrer & Harrer, 1977);
Evaluative conditioning: a regular pairing of a piece of music and other positive or negative stimuli leading to a conditioned association (Blair & Shimp, 1992);
Contagion: an internal ‘mimicry’ of the perceived voice-like emotional expression of the music (Juslin, 2001);
Visual imagery: inner images of an emotional character conjured up by the listener through a metaphorical mapping of the musical structure (Osborne, 1980);
Episodic memory: a conscious recollection of a particular event from the listener’s past triggered by the music (Baumgartner, 1992);
Musical expectancy: a reaction to the gradual unfolding of the musical structure and its expected or unexpected continuation (Meyer, 1956); and
Aesthetic judgment: a subjective evaluation of the aesthetic value of the music based on an individual set of weighted criteria (Juslin, in press).
By synthesizing theory and data from many domains mostly outside music, Juslin and Västfjäll (2008) were able to develop the first set of hypotheses that may help researchers to distinguish among the mechanisms. The hypotheses concern such aspects as the information focus, key brain regions, representations, and extent of cultural impact. (For an update of the hypotheses, see Juslin, in press.) One important implication is that it may not be sufficient to study musical emotions in general. In order for data to contribute in a cumulative fashion to our knowledge, we need to specify, as far as possible, the mechanism involved in each case.
Empirical studies of mechanisms
As noted earlier, few studies so far have investigated specific mechanisms empirically. Steinbeis, Koelsch, and Sloboda (2006) tested Meyer’s (1956) theory of musical expectancy, and Baumgartner (1992) and Janata, Tomic, and Rakowski (2007) surveyed memories linked to music. A broader range of psychological mechanisms were surveyed by Juslin et al. (2008) and Dingle, Savill, Fraser, and Vieth (2011). However, self-reports of mechanisms from field studies must be interpreted with caution. The listeners may sometimes be unaware of the true cause of their emotions (Fox, 2008, p. 36). Thus mechanisms that are more implicit in nature (e.g., conditioning, expectancy) might be underreported relative to mechanisms that are more ‘salient’ in conscious experience (e.g., episodic memory). Most importantly, field data do not enable researchers to draw strong conclusions with regard to causal relationships because of insufficient experimental control. Thus it is necessary to conduct experiments in a laboratory setting where specific mechanisms can be manipulated so as to produce immediate effects on behavioral measures.
There are at least two complementary experimental strategies in exploring mechanisms. First, one can attempt to find existing pieces of music that feature characteristics relevant to a particular mechanism. The use of ‘real’ music makes it easier to arouse an intense emotion in listeners. However, the internal validity may be limited, due to a lack of experimental control. Thus a second approach is to directly manipulate features of the music (as well as the listener and the situation) to activate a specific mechanism, using highly controlled synthesized (or re-synthesized) pieces. Although such manipulations could suffer from low ‘ecological validity’, they allow stronger conclusions regarding causal relationships. In the present experiment, we explored a manipulative approach. 4
To separate the effects of different mechanisms, one must be able to activate, as well as suppress, specific mechanisms in each case. This may be done in three ways (Juslin, in press). First, one can select or manipulate pieces of music in such a manner as to provide or withhold information required for a certain mechanism to be activated, while leaving or removing other information (the principle of information selection). Second, one can design the test procedure in such a way that it will prevent the type of information-processing required for a mechanism to be activated (the principle of interference). Third, one can manipulate listeners, by creating specific memories during the experimental procedure prior to presenting the ‘target’ stimulus (the principle of procedural history), in order to investigate mechanisms such as conditioning. In the present experiment, we focused on the first of these principles.
Rationale for the present study
The main aim of this experiment was to make a first attempt to manipulate some of the mechanisms underlying musical emotions. Specifically, we intended to test whether it would be possible to selectively activate the four mechanisms brain stem reflex, contagion, episodic memory, and musical expectancy, respectively, such that we could predict to some extent the emotions (e.g., happiness) that would be aroused. The selection of mechanisms was based on practical considerations: given the complexity of the manipulations, it would not be possible to test all mechanisms at the same time, and some mechanisms (e.g., visual imagery) seemed more difficult to manipulate than others (e.g., expectancy). All music excerpts were based on an original piece which was manipulated in different ways to activate each target mechanism. Before describing the experiment, we summarize the four mechanisms:
Brain stem reflex refers to a process whereby an emotion is evoked by music because some fundamental acoustic characteristic of the music is taken by the brain stem to indicate a potentially important and urgent event that needs attention. In music, this may involve sounds that are sudden, loud, dissonant, or feature a fast or rapidly increasing temporal pattern. Brain stem reflexes are quick, automatic, and unlearned. A response to an auditory event suggesting ‘danger’ can be emitted as early as at the level of the inferior colliculus of the brain stem (e.g., Brandao, Melo, & Cardoso, 1993). As a consequence, the brain stem reflex can quickly evoke arousal so that attention can be selectively directed at sensory stimuli of potential importance. In the present study, a brain stem reflex was aroused by inserting an extreme sound event into the original piece (for details, see Method section). We expected this version to arouse mainly surprise in listeners (Simons, 1996).
Emotional contagion refers to a process whereby the emotion is evoked because the listener perceives the emotional expression of the music, and then ‘mimics’ or ‘mirrors’ this expression internally (Juslin, 2001). Contagion has primarily been studied in regard to facial expression (e.g., Hatfield, Cacioppo, & Rapson, 1994), although Neumann and Strack (2000) found evidence of contagion from emotion in speech. Because music often includes acoustic patterns that are similar to those that occur in emotional speech (e.g., Juslin & Laukka, 2003), it has been theorized that we get aroused by voice-like aspects of musical expression through a process in which a neural ‘module’ responds quickly and automatically to specific stimulus features that lead us to ‘mimic’ the perceived emotion internally (Juslin, 2001, p. 329). In the present study, a contagion reaction was produced by featuring an expressive, voice-like cello timbre within a sad emotional expression (Juslin & Laukka, 2003, pp. 792–795). We expected this version to arouse mainly sadness in listeners.
Episodic memory refers to a process whereby an emotion is aroused in a listener when the music evokes a memory of a specific event in life (Baumgartner, 1992; Janata et al., 2007). When the memory is evoked so is the emotion linked to this memory. Such emotions can be intense, maybe because the psychophysiological pattern to the original event is stored along with the memory trace (e.g., Lang, 1979). Listeners use music to remind them of valued past events, indicating that music often serves an important nostalgic function in everyday-life contexts (e.g., Sloboda, O’Neill, & Ivaldi, 2001). One might expect episodic memories evoked by music to be particularly emotionally ‘vivid’ for music from adolescence and young adulthood. (For empirical support, see Schulkind, Hennis, & Rubin, 1999.) In this study, episodic memories were evoked by inserting a short musical quote from the soundtrack of a well-known and appreciated movie series (which came out when the participants were in their adolescence/young adulthood) into the original piece. We expected this version to arouse mainly nostalgia and happiness in listeners.
Musical expectancy refers to a process whereby an emotion is evoked when a specific feature of the music violates, delays, or confirms a listener’s schematic expectations about the continuation of the music, as famously theorized by Leonard Meyer (1956). The expectations are based on the listener’s previous experiences of the same style. Although Meyer’s theory is highly regarded, it has not stimulated much research with regard to emotions. A seminal study by Steinbeis et al. (2006) demonstrated, however, that violations of musical expectancies may also evoke emotions in listeners. In the present study, the musical expectancy mechanism was activated by inserting violations of melodic and harmonic expectations into the original piece. Due to the repeated transgressions of musical expectations, we expected this version to arouse mainly anxiety (Meyer, 1956, p. 27) and irritation (Huron, 2006, p. 348) in listeners.
In summary, we manipulated a piece of music to selectively activate four mechanisms (brain stem reflex, contagion, episodic memory and musical expectancy). The resulting four musical excerpts were played to 20 listeners, who reported aroused emotions. In accordance with a multi-component view of emotions (e.g., Scherer & Zentner, 2001), we used multiple measures (verbal self-reports, facial expressions, and autonomic activity) to enhance the validity of our conclusions about aroused emotions.
Method
Participants
Twenty listeners (10 males and 10 females, aged 20–61 years, M = 28) took part in the experiment. They were either paid or given course credits for their anonymous and voluntary participation. Most of the participants were students who were recruited by means of posters throughout Uppsala University. Sixty percent of them played at least one musical instrument. Forty percent of these had received music education; the rest were self-taught.
Design
The experiment used a within-subjects design, with ‘target’ mechanism as the independent variable (four levels: contagion, brain stem reflex, episodic memory, and musical expectancy) and self-reported feeling (15 scales), facial expression (zygomaticus and corrugator muscles), and autonomic activity (skin conductance and heart rate) as dependent variables.
Musical material
The experiment featured four music excerpts which were synthesized using the Vienna Symphonic Library samples to obtain highly realistic performances of music. These excerpts were all based on a short piece by Ernest Bloch (1880–1959) titled Prayer, from Jewish Life No. 1, composed for cello and piano, with expression marked as andante moderato (circa 80 bpm). This is a slow, lyrical, and expressive piece which has been recorded a few times but is not generally well known. In the present study, we used an excerpt of Prayer consisting of the A1–A2–B1–D1–D2 sections of the piece (the original structure is: A1–A2–B1–C–D1–D2–A3–A4–Coda). The excerpt is roughly 120 seconds in duration. 5
Contagion
The starting point was the contagion version, which served as the ‘template’ for all the other versions. The contagion mechanism is believed to be activated by a particularly moving emotional expression in the music, and it is assumed that the effect is strengthened by a voice-like lead part, either a real voice or a musical instrument reminiscent of a human voice. It has often been argued that ‘the cello is the closest-sounding instrument to the human voice’, 6 and previous data suggest that ‘sad’ performances are perceived as particularly expressive (Juslin, 1997, p. 245). Thus a high-quality cello performance with a sad expression was considered ideal for this condition. To meet this demand, we used a digital rendition of Prayer created by Jay Bacal (available online at http://www.vsl.co.at/en/67/245/255.vsl). This particular version of the work has been made to resemble a ‘real’ human performance and includes a number of expressive performance variations in terms of dynamics, micro-timing and articulation styles. (The performance variations were crafted using ‘Vienna Symphonic Library, Solo Strings I’). Based on their review of 145 studies of music and speech, Juslin and Laukka (2003, pp. 792–795) described patterns of acoustic cues associated with basic emotions, and the present version of Prayer is consistent with an expression of sadness because it features slow tempo, low sound level, legato articulation, and slow tone attacks. Moreover, vibrato contributes to a contagion reaction through its relationship to vocal expression of emotion (e.g., Juslin & Scherer, 2005). Consistent with a ‘mimicry’ response to the voice-like emotional expression of the music, we expected the contagion version to arouse mainly sadness in listeners.
Brain stem reflex
The brain stem reflex mechanism is believed to be activated by ‘extreme’ acoustic features such as high sound level, quick attack, and sharp timbre (Juslin et al., 2010). This mechanism was therefore targeted by inserting a novel sound event into the existing piece. The contagion version was taken as the starting point. However, in this and all other non-contagion versions, the voice-like cello playing the melody was replaced by piano samples (PMI Bosendorfer 290 by Vienna Symphonic Library). More importantly, a sudden, loud chord with broad spectrum and quite sharp attack was inserted at the beginning of the 10th bar of the piece. The goal was to mimic naturally occurring brain stem reflex events such as the well-known drum strokes in Joseph Haydn’s Symphony No. 94 (‘Surprise’) or Gustav Mahler’s Symphony No. 10, ‘Finale’. Special care was taken to calibrate the sound level of the event, although pre-testing indicated that the peak sound level did not quite have to reach the level used in research on the acoustic ‘startle’ response (Levenson, 2007, p. 163) to produce a reliable effect on the listener. Thus a ‘peak’ sound level of 72 dB(A) was considered sufficient. Because brain stem reflexes involve ‘local’ events, we used a shorter excerpt for this mechanism (1 min 30 s) to reduce the time lag from the critical event to the self-report of feelings. We expected the brain stem reflex version to arouse mainly surprise in listeners, consistent with an early response that occurs before any elaborate classification of the sound event has taken place (Simons, 1996).
Episodic memory
The episodic memory mechanism is thought to be activated by salient melodic themes, which are associated with emotionally charged events that the listener remembers. To evoke music-associated episodic memories, without having to encode them during this experiment, we inserted a musical ‘quote’ from the soundtrack of an extremely well-known movie series, Star Wars. The melodic theme, featured in John Williams’ original movie soundtrack (1977), was expected to be familiar to many people who grew up with the Star Wars movies over the last three decades. The theme, referred to as Binary sunset, was inserted at the position of the second repetition of the initial theme in Prayer, beginning at bar 5. It should be noted that the tempo and harmony of the theme are very similar to those of Prayer and that the melody has a similarly wistful and sad character to it. However, due to memories associated with the Star Wars movies, we expected this version to arouse mainly nostalgia and happiness in listeners.
Musical expectancy
The musical expectancy mechanism is believed to be activated by unexpected melodic, harmonic, or rhythmic sequences (Meyer, 1956). Hence, in order to activate this mechanism, we altered the piece to violate melodic and harmonic expectations (while keeping the overall structure and performance nuances intact). Each musical phrase was subjected to 1–3 random transpositions of ± 6 semitones. The transpositions were carried out by hand, using a logic by which the shape of a melodic phrase was preserved, but its harmonies were rendered more or less ‘unconventional’. (The manipulations were evaluated by the authors, in terms of musical plausibility, and slightly resembled the harmonic choices characteristic of Stravinsky’s serial period.) The result of these manipulations was a decrease in tonal stability. Thus in the normal version, for instance, 65% of the 5-second segments were above the ‘typical’ key correlation in classical music (r = .66, based on a sample of classical music and using an audio-based key finding algorithm created by Gomez, 2006), whereas in the expectancy version, only 27% of the segments were above the typical correlation. Though Meyer (1956) is often interpreted as saying that expectancy violations mainly induce undifferentiated arousal, Meyer did consider more specific emotions as well. For example, he argued that uncertainty (in music and in real life) might arouse anxiety and apprehension (pp. 27–29). Similarly, Huron (2006) argued that the ‘contrarian aesthetic’ of modernist composers such as Stravinsky will evoke irritation or unease in most listeners (p. 350). Therefore we expected the musical expectancy version to arouse mainly anxiety and irritation in listeners.
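The tonal-stability comparison described above can be illustrated with a minimal sketch. It assumes that a key correlation has already been computed for each 5-second segment; the audio-based key-finding algorithm (Gomez, 2006) is not reproduced here, and the function name and example values are hypothetical rather than data from the study.

```python
def share_above_threshold(segment_correlations, threshold=0.66):
    """Proportion of segments whose key correlation exceeds the threshold.

    The default threshold (r = .66) is the 'typical' key correlation
    quoted in the text; everything else here is illustrative.
    """
    if not segment_correlations:
        raise ValueError("at least one segment is required")
    above = sum(1 for r in segment_correlations if r > threshold)
    return above / len(segment_correlations)


# Hypothetical per-segment key correlations (not data from the study).
example_segments = [0.81, 0.70, 0.52, 0.68, 0.40, 0.75, 0.66, 0.90]
print(share_above_threshold(example_segments))
```

Applied to the real segment values, a function of this kind would be expected to return roughly .65 for the normal version and .27 for the expectancy version.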
Acoustic measures
General acoustic characteristics of all four conditions, extracted using the music information retrieval (MIR) toolbox (Lartillot, Toiviainen, & Eerola, 2008), are presented in Figure 1, along with reference levels based on a large-scale analysis of 482 examples of classical music.

Figure 1. Acoustic characteristics of the experimental conditions.
Experiential measures
We measured the subjective feeling component of the aroused emotions in listeners by means of a 12-item adjective scale, which was developed at Uppsala University specifically for the measurement of emotions to music (see Appendix 1). The scale represents a kind of compromise among the response formats currently used in the music-emotion field (Zentner & Eerola, 2010) since the selected terms include ‘basic’ emotions characteristic of discrete emotion theories (Izard, 1977), cover all four quadrants of a circumplex model in terms of valence and arousal (Russell, 1980), and feature possibly more music-related terms such as nostalgia, expectancy, and awe (Juslin & Laukka, 2004). (The selected terms roughly cover the nine factors of the Geneva Emotional Music Scale [GEMS]-9, proposed by Zentner, Grandjean, & Scherer [2008], but since there exists no validated version of GEMS-9 in Swedish, and the scale lacks terms that were needed in this study [e.g., surprise], we decided to use a customized scale.) The list features the emotions reported most commonly in prevalence studies. In addition to the 12 emotions, listeners also rated their liking and familiarity for each version and whether they experienced any ‘chills’ (defined as piloerection; ‘gåshud’ in Swedish everyday terminology). All ratings were made on a scale from 0 (not at all) to 4 (a lot), except for ‘chills’, which were reported in a dichotomous fashion (see Appendix 1).
In addition to reporting their feelings, the participants also filled out a second response sheet (MecScale) for each musical excerpt (see Appendix 2). This sheet purported to capture the mechanisms that had occurred and consisted of eight simple questions, each targeting one of the mechanisms in the BRECVEM framework (Juslin et al., 2010) plus appraisal. The idea was that, although some of the mechanisms are ‘implicit’ in nature, they might co-occur with subjective impressions that can be reported by listeners. For example, a listener who becomes aroused through the expectancy mechanism might find the music difficult to predict, whereas a listener who becomes aroused through the episodic memory mechanism could report having a conscious recollection of a previous event. Though self-reports of this type cannot be taken as veridical, we submit that they can at least complement other indices. After pilot testing and refinement, eight items were included here for exploratory purposes, indexing: (1) brain stem reflex; (2) rhythmic entrainment; (3) episodic memory; (4) conditioning; (5) visual imagery; (6) contagion; (7) musical expectancy; and (8) cognitive appraisal.
Psychophysiology: Facial expression and autonomic activity
To enhance the validity of the measurement of emotion, we also measured physiological indices. The goal was to obtain evidence of an emotional response in order to distinguish felt emotions from mere perception of emotions. In the former case, we would expect to discover some changes in physiological indices (as part of an emotional reaction), whereas in the latter case there would be no reason to expect such changes. Furthermore, the goal was to be able to distinguish emotions by locating them in one of the four quadrants of the ‘circumplex’ model of affect (Russell, 1980). For instance, if listeners report happiness, their psychophysiological responses should suggest high arousal and positive valence. These emotion dimensions should be evident from measures of autonomic activity and facial expressions respectively, as argued in the previous literature (Andreassi, 2007, pp. 248–251).
Psychophysiological indices were obtained using the BIOPAC MP 150 System (Biopac Systems, Santa Barbara, CA) and the AcqKnowledge version 4.1 software. Skin conductance level (SCL) was measured using the GSR100C Electrodermal Activity Amplifier module and EL507 disposable snap electrodes that were placed on the palmar surface of the non-dominant hand, at the thenar and the hypothenar eminences (Fowles et al., 1981). Skin conductance was recorded in microsiemens (μS).
Pulse rate (PR) was measured based on the arterial pulse pressure, using the PPG 100C Pulse Plethysmogram Amplifier and the TSD200 (Biopac Systems, Santa Barbara, CA) photoplethysmogram transducer attached to the index finger on the non-dominant hand. The TSD200 uses a matched infrared emitter and photodiode detector, which transmits changes in infrared reflectance that result from varying blood flow. A band-pass filter was used to remove frequencies below 0.05 Hz and above 10 Hz. Pulse rate was recorded in beats per minute (bpm).
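As an illustration of how a pulse rate in beats per minute can be derived from a plethysmogram, the following minimal sketch converts the times of successive pulse peaks into a mean rate. It assumes that peak detection has already been done; the text does not describe how the recording software computed the rate, so the function name and the example times are hypothetical.

```python
def pulse_rate_bpm(peak_times_s):
    """Mean pulse rate (beats per minute) from pulse-peak times in seconds."""
    if len(peak_times_s) < 2:
        raise ValueError("at least two peaks are required")
    # Inter-beat intervals between consecutive peaks.
    intervals = [b - a for a, b in zip(peak_times_s, peak_times_s[1:])]
    if any(i <= 0 for i in intervals):
        raise ValueError("peak times must be strictly increasing")
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


# Hypothetical peaks spaced 0.75 s apart, i.e. 80 beats per minute.
print(pulse_rate_bpm([0.0, 0.75, 1.5, 2.25, 3.0]))
```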
Bipolar facial electromyography (EMG) recordings were made from the left corrugator and zygomatic muscle regions in accordance with Fridlund and Cacioppo’s (1986) guidelines. Before attaching the 4 mm miniature surface Ag/AgCl electrodes, filled with EMG gel (GEL 100, Biopac Systems), we cleansed the participant’s skin to reduce interelectrode impedance. All impedance was reduced to less than 10 kΩ (Fridlund & Cacioppo, 1986). The electrodes were connected to the EMG100C amplifier module with low- and high-pass filters set at 500 Hz and 10 Hz, respectively, and notch filters set at 50 Hz were used to diminish interference with the electric mains. The sampling rate was set at 1,000 Hz. Facial EMG was measured in microvolts (μV) and analyzed using the root mean square (RMS).
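The root mean square used to summarize the facial EMG signal can be sketched as follows. This is a generic illustration of the RMS formula only; the window length and any preprocessing applied in the study are not stated in the text, and the function name and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a sequence of samples (e.g., EMG in microvolts)."""
    if not samples:
        raise ValueError("at least one sample is required")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical EMG samples in microvolts (not data from the study).
print(rms([3.0, -3.0, 3.0, -3.0]))
```

Because the samples are squared before averaging, positive and negative deflections of the bipolar signal both contribute to the amplitude estimate instead of cancelling out.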
Mean values for PR, SCL, and facial EMG (zygomaticus, corrugator) were calculated for baseline and the experimental conditions. Baseline recordings were obtained prior to the listening test during relaxation under silent conditions. During the listening test there was a break between musical excerpts to allow levels to return to baseline before the next stimulus.
Procedure
When participants arrived at the laboratory, they were seated in a comfortable armchair and received the following instructions (translated from Swedish): ‘Welcome to the music laboratory. You will soon listen to a selection of short pieces of music. After each piece we want you to describe your experience of the music. This should be done in two ways: first, we want you to describe your feelings during the music on a response sheet. This sheet consists of 12 emotions. Your task is to rate how much of each emotion you felt on a scale from 0 (‘not at all’) to 4 (‘a lot’). You also report whether you experienced ‘chills’, as well as how much you liked the music and how familiar you were with it. Then we want you to fill out a second response sheet featuring eight questions concerning other aspects of your music experience. You will also be fitted with some electrodes so that we can conduct physiological measurements. These electrodes are completely harmless and do not emit strong radiation or electricity. However, in order to obtain as accurate measurements as possible, it is important that you don’t touch any of the electrodes during the experiment. Watches and rings have to be removed and your cell phone must be switched off. First, you will be asked to relax for a while during silence. Then the actual listening test begins. When the playback of a piece of music ends, there will be a brief intermission before the next piece begins, to give you time to fill out the two response sheets. Then you will relax again for a while before the next piece begins. Note that any emotion you may experience during listening need not correspond to the music’s emotional expression. That is, you should rate your own emotions, not what the music expresses. After the experiment you will be asked to respond to some background questions.’
Participants were tested individually in a soundproofed room, and listened to the music through a pair of high-quality loudspeakers (Dali Ikon 6 MK2). Sound level was pre-set to a comfortable level and was held constant across participants. Stimulus order was randomized for each participant, whereas the order of rating scales was kept constant across participants. After the listening test, the participants filled out a short questionnaire with regard to various background variables (e.g., age, gender, music education). They were also interviewed about the experiment. However, the participants were not fully de-briefed about the purpose of the experiment until all had been tested, to prevent confounding effects (Neale & Liebert, 1986). An experimental session lasted about 50 minutes.
Results
Self-reports: Ratings
To evaluate the effects of target mechanism on listeners’ self-reports, we conducted an analysis of variance (ANOVA) with mechanism as the within-subjects factor (four levels) on each rating scale. Table 1 shows the results. As can be seen, mechanism yielded significant effects on all rating scales, except interest-expectancy, disgust-contempt, and pride-confidence. The right-most column presents effect sizes in terms of partial eta-squared (ηp²). Beginning with the emotions, the largest effects occurred for the scales sadness-melancholy, anxiety-nervousness, surprise-astonishment, admiration-awe, happiness-elation, and nostalgia-longing (ηp² ≥ .306). Table 1 (lower part) also shows the results from the ANOVAs on the two additional scales liking and familiarity. As seen, mechanism produced significant effects on both scales, though the effect was larger on the liking scale.
Table 1. Analysis of variance for listeners’ ratings.
Note: df = mechanism (3), error (57). ᵃ Bonferroni corrected from α = .05 to α = .0036.
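The corrected threshold in the table note can be reproduced with a short calculation. The sketch below assumes that the correction was applied across 14 tests, that is, the 12 emotion scales plus liking and familiarity; this count is inferred from the reported value and is not stated explicitly in the text. The function name is ours.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
# Assumption (not stated in the article): 14 tests = 12 emotion scales
# + liking + familiarity.

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


N_EMOTION_SCALES = 12  # number of emotions on the response sheet
N_EXTRA_SCALES = 2     # liking and familiarity
n_tests = N_EMOTION_SCALES + N_EXTRA_SCALES

threshold = bonferroni_alpha(0.05, n_tests)
print(f"{threshold:.4f}")  # 0.0036
```

Under that assumption the result agrees with the α = .0036 given in the note.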
Figure 2 shows means and standard errors for listeners’ ratings on all scales that showed significant effects of the experimental manipulation in the ANOVAs (see preceding paragraph). We begin by looking at the six emotions related to our predictions, and then look at the additional emotions featured in the response sheet. For the six emotions involved in our predictions, we conducted planned comparisons (t-tests) between the ‘target’ mechanism and the other three mechanisms to examine whether the ‘target’ mechanism received the highest ratings. For the remaining six emotions and the ratings of familiarity and liking, we conducted post-hoc tests in the form of Tukey’s HSD to explore further contrasts.
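To make the logic of such a planned comparison concrete, the following minimal sketch computes a t statistic for two sets of ratings given by the same listeners. It assumes a paired-samples test, which is the natural choice for a within-subjects design but is our assumption rather than something the text specifies. The ratings are hypothetical and serve only as an illustration; they are not data from the study.

```python
from math import sqrt


def paired_t(target, other):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(target) != len(other) or len(target) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(target, other)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = sqrt(var_diff / n)
    return mean_diff / standard_error, n - 1


# Hypothetical 0-4 ratings from five listeners (illustration only).
target_version = [3, 4, 2, 4, 3]
other_version = [1, 2, 2, 1, 0]

t, df = paired_t(target_version, other_version)
print(f"t({df}) = {t:.2f}")      # t(4) = 3.65

# Reversing the order of subtraction flips the sign but not the magnitude.
t_rev, _ = paired_t(other_version, target_version)
print(f"t({df}) = {t_rev:.2f}")  # t(4) = -3.65
```

With 20 listeners, a paired test of this kind would have 19 degrees of freedom. The sign of t reflects only which version is subtracted from which.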

Figure 2. Means and standard errors for the listeners’ ratings on each scale as a function of target mechanism.
Predicted emotions
Inspection of Figure 2 suggests that the episodic memory version produced the highest mean rating on the happiness-elation scale. Planned comparisons confirmed that the episodic memory version received significantly higher ratings than the brain stem reflex (t = −3.37, p < .01), contagion (t = −5.15, p < .001), and musical expectancy (t = −3.94, p < .001) versions.
Similarly, on the sadness-melancholy scale (see Figure 2), planned comparisons showed that the contagion version received significantly higher ratings than the brain stem reflex (t = 5.20, p < .001), episodic memory (t = 16.66, p < .001), and musical expectancy (t = 13.85, p < .001) versions.
On the surprise-astonishment scale (see Figure 2), planned comparisons indicated that the brain stem reflex version received higher ratings than the contagion (t = 4.88, p < .001), episodic memory (t = 3.74, p < .01), and musical expectancy (t = 3.38, p < .01) versions.
Concerning the anger-irritation scale, inspection of Figure 2 suggests that the musical expectancy version received the highest mean rating, and planned comparisons revealed that the musical expectancy version received significantly higher ratings of anger-irritation than the episodic memory version (t = 3.86, p < .01). However, the remaining two contrasts were not significant. Note that the anger-irritation ratings were fairly low overall.
For nostalgia-longing, planned comparisons revealed that the episodic memory version received significantly higher ratings than the brain stem reflex (t = 3.52, p < .01) and musical expectancy (t = 4.93, p < .001) versions. In contrast, though the episodic memory version did obtain higher mean ratings than the contagion version (see Figure 2), this difference fell short of statistical significance (t = 1.98, p = .06).
Finally, as regards anxiety-nervousness, planned comparisons revealed that the musical expectancy version produced significantly higher ratings than the episodic memory version (t = 4.24, p < .001). However, the remaining two contrasts were not significant.
Additional emotions
Post-hoc tests (Tukey’s HSD) on the additional emotion scales featured (see Figure 2) revealed significant contrasts for the episodic memory and musical expectancy versions. The episodic memory version produced higher ratings of calm-contentment than the contagion (p < .01) and musical expectancy (p < .001) versions. The musical expectancy version, however, received lower ratings of love-tenderness than the brain stem reflex (p < .05), contagion (p < .01), and episodic memory (p < .001) versions; and lower ratings of admiration-awe than the brain stem reflex (p < .05), contagion (p < .01), and episodic memory (p < .001) versions (the remaining differences were not statistically significant).
Familiarity and liking
Figure 2 also presents listeners’ mean ratings of familiarity and liking as a function of target mechanism. Starting with familiarity, it can be seen that the episodic memory version was rated as familiar by many listeners, whereas the other versions were not. Post-hoc tests (Tukey’s HSD) confirmed that the episodic memory version was rated as more familiar than the contagion (p < .01) and musical expectancy (p < .001) versions, but the contrast with the brain stem reflex version fell just short of statistical significance (p < .06).
Regarding liking, inspection of Figure 2 indicates that the musical expectancy version was less liked than the other versions – particularly the contagion and the episodic memory versions. Post-hoc tests confirmed that the musical expectancy version yielded lower ratings of liking than the contagion (p < .001) and episodic memory (p < .001) versions; and, further, that the brain stem reflex version yielded lower ratings of liking than the contagion (p < .01) and episodic memory (p < .01) versions. (Remaining differences were not significant.) Note that a low mean value for familiarity (M = 0.80) and a fairly high mean value for liking (M = 2.44) suggest that the musical excerpts were mostly unfamiliar to the listeners, but that they were reasonably well liked on the whole.
Self-reports: Chills
The occurrence of ‘chills’ (i.e., a tingling sensation of piloerection) was reported in a dichotomous manner (did/did not occur) by listeners for each musical stimulus. The results indicated that the music evoked ‘chills’ in 12 of the 80 trials (15%). The contagion version yielded the largest number (7), followed by the brain stem reflex (3), episodic memory (2), and musical expectancy (0) versions. A non-parametric test, in terms of a Friedman ANOVA, confirmed a significant effect of mechanism on the number of self-reported chills (χ² = 9.75, df = 3, p < .05, Kendall’s W = .163). However, there were large individual differences, such that 50% of the listeners did not experience any ‘chills’ at all.
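The reported figures are mutually consistent, as the following minimal sketch shows. It relies on the standard relation between the Friedman statistic and Kendall’s coefficient of concordance, W = χ² / (N(k − 1)), with N = 20 listeners and k = 4 versions. The function and variable names are ours.

```python
def kendalls_w_from_friedman(chi2: float, n_subjects: int, k_conditions: int) -> float:
    """Kendall's W derived from a Friedman chi-square statistic."""
    return chi2 / (n_subjects * (k_conditions - 1))


# Chills counts per version, as reported in the text.
chills = {
    "contagion": 7,
    "brain stem reflex": 3,
    "episodic memory": 2,
    "musical expectancy": 0,
}

n_listeners = 20
n_versions = len(chills)
n_trials = n_listeners * n_versions

total_chills = sum(chills.values())
print(total_chills, n_trials, total_chills / n_trials)  # 12 80 0.15

w = kendalls_w_from_friedman(9.75, n_listeners, n_versions)
print(w)  # 0.1625
```

The value 0.1625 agrees, to rounding, with the reported W = .163.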
Self-reports: MecScale
The listeners also responded to eight items which targeted specific mechanisms (see Appendix 2). A primary question is whether the listeners’ responses may predict the target mechanisms. To address this question, we computed the non-parametric Spearman’s rho (ρ) correlations between the four target mechanism conditions and the eight mechanism items featured in the MecScale (all variables coded dichotomously). To the extent that MecScale has predictive value, we would expect only four of the 32 (possible) correlations to be statistically significant as well as positive in direction; more specifically, those correlations that involve items corresponding to the four ‘target’ mechanisms of the experiment. All other correlations should be negative and/or non-significant. The results indicated that only two out of the 32 correlations (circa 6%) deviated from this pattern. There were thus significant and positive correlations between target mechanism condition and corresponding MecScale item for the brain stem (ρ = .51), contagion (.33), episodic memory (.28) and musical expectancy (.25) versions. In addition, however, the imagery item correlated significantly with the contagion version (.28) and the expectancy item correlated positively with the brain stem version (.30; all ps < .05).
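The correlational analysis described above can be sketched as follows. Each trial contributes one dichotomous condition code (1 if the trial used a given version, 0 otherwise) and one dichotomous item response. The data below are hypothetical and are not taken from the study. The sketch also illustrates a general property: for two dichotomous variables, Spearman’s ρ coincides with the Pearson (phi) coefficient, because the tied ranks are a linear transformation of the 0/1 codes.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical coding for eight trials (illustration only).
is_target_version = [1, 1, 0, 0, 0, 0, 0, 0]  # 1 = trial used the version in question
item_endorsed = [1, 0, 0, 0, 1, 0, 0, 0]      # 1 = listener endorsed the matching item

rho = spearman(is_target_version, item_endorsed)
print(round(rho, 3))  # 0.333

# Four conditions crossed with eight items gives the 32 correlations in the text.
print(4 * 8)  # 32
```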
In order to investigate whether the MecScale self-reports would also be predictive of the feelings experienced by listeners, we carried out one simultaneous – as opposed to stepwise – multiple regression on the listeners’ ratings for each of the six emotion scales included in our predictions. The dependent variable was the rating on each scale (continuously coded) and the independent variables were the scale items corresponding to each mechanism (dichotomously coded). Table 2 presents the results. Note that ‘target’ mechanism items (set in bold) received significant beta weights (β) on their expected scales (e.g., contagion → sadness-melancholy), whereas ‘non-target’ items generally did not. (The singular exception was anger-irritation for which no item was significantly related to listeners’ ratings). In addition to the expected links, the expectancy item also predicted ratings of surprise-astonishment to some degree. However, the multiple correlations in the left-most column of Table 2 suggest that the prediction was far from perfect (mean R = .45).
Table 2. Summary of multiple regression analyses: Prediction of emotion ratings from responses to MecScale questionnaire (see Appendix 2).
Notes: R = multiple correlations; β = beta weights; df = 8, 71; * p < .05; expected correlations with ‘target’ mechanism items are set in bold.
Psychophysiology: Facial expression and autonomic activity
To evaluate the manipulation of target mechanism on psychophysiology, we conducted an ANOVA with mechanism as the within-subjects factor (four levels) on each physiological index. All data were z-transformed prior to analyses in order to reduce the impact of differences in baseline. Mechanism yielded a highly significant overall effect on both skin conductance level (F(4, 76) = 27.256, MS = 11.783, p < .001) and EMG zygomaticus (F(4, 76) = 5.187, MS = 4.311, p < .001), with the former showing a larger effect (η² = .589) than the latter (.214). No significant effect of mechanism was observed for either pulse rate or EMG corrugator.
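The reported effect sizes can be recovered from the F ratios and their degrees of freedom. The minimal sketch below uses the standard identity for partial eta-squared, F·df_effect / (F·df_effect + df_error); that the reported values were obtained in this way is our inference from the numbers, not something the text states. The sketch also shows the degrees of freedom a one-way within-subjects design yields for a given number of levels and participants. The reported (4, 76) would correspond to five levels with 20 participants, for example if baseline entered the analysis alongside the four versions; this reading is likewise an assumption.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def repeated_measures_df(n_levels: int, n_participants: int):
    """(df_effect, df_error) for a one-way within-subjects ANOVA."""
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_participants - 1)
    return df_effect, df_error


skin_conductance = partial_eta_squared(27.256, 4, 76)
zygomaticus = partial_eta_squared(5.187, 4, 76)
print(round(skin_conductance, 3), round(zygomaticus, 3))  # 0.589 0.214

print(repeated_measures_df(4, 20))  # (3, 57), as in the note to Table 1
print(repeated_measures_df(5, 20))  # (4, 76), as reported for the physiological indices
```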
Figure 3 presents means and standard errors for listeners’ skin conductance level and EMG zygomaticus muscle activity as a function of target mechanism. Beginning with skin conductance, it may be seen that all four experimental conditions yielded higher levels than baseline, which suggests that all versions were somewhat arousing. It is also clear, however, that the brain stem version yielded the highest skin conductance level. Planned comparisons between baseline and each of the experimental conditions indicated that the brain stem reflex (p < .0001), the episodic memory (p < .05), and the musical expectancy versions (p < .05) all produced significantly higher skin conductance level than baseline, whereas the contrast with the contagion version was not significant (p = .11).

Figure 3. Means and standard errors for the listeners’ skin conductance level and zygomaticus muscle activity (z-scores) as a function of target mechanism.
Moving on to the facial EMG zygomaticus data, inspection of Figure 3 suggests that the two versions that involved predictions for negative emotions (contagion → sadness, musical expectancy → anxiety, irritation) produced little zygomaticus muscle activity, consistent with a negatively valenced response, whereas the two versions that involved predictions for neutral (brain stem reflex → surprise) or primarily positive emotions (episodic memory → nostalgia, happiness) produced more zygomaticus muscle activity, consistent with a positively valenced response. However, planned comparisons among baseline and each of the conditions revealed that only the contrast involving the brain stem reflex version was significant (p < .05).
Discussion
Summary of findings
This study explored four of the mechanisms believed to underlie emotional reactions to music, using theoretically based manipulations of a short original piece of music. The results enable us to draw the following tentative conclusions. First, the four experimental conditions aroused emotions in listeners primarily in accordance with our predictions: the listeners’ self-reports indicated that the brain stem reflex version aroused the most surprise; the contagion version aroused the most sadness; the episodic memory version aroused the most happiness and nostalgia; and the musical expectancy version aroused the most irritation. The singular exception was that, contrary to our prediction, the musical expectancy version did not arouse the most anxiety, though it did arouse more anxiety than two of the other versions (Figure 2). Therefore, though the effects were not as clear-cut as one would hope, they were in line with our predictions.
Second, additional measures, in terms of facial expressions and autonomic responses, confirmed that listeners actually felt emotions – as opposed to simply perceiving the emotions expressed in the music. The results involved significant differences, at least for some of the indices, both between baseline and experimental conditions (which confirms that the stimuli aroused emotions) and between the conditions (indicating that the stimuli aroused different emotions). In addition, based on the assumption that skin conductance level indexes autonomic arousal and that facial expressions index emotional valence (e.g., Andreassi, 2007; Fox, 2008), it may be concluded that the results were consistent with listeners’ self-reports of feelings: for instance, self-reported surprise involved higher skin conductance level and more zygomatic muscle activity than did self-reported sadness, consistent with how these emotions are typically located on the dimensions of arousal and pleasure (Russell, 1980).
Third, the obtained results do not simply reflect commonly studied surface features of the music such as tempo, sound level, or timbre. Indeed, inspection of acoustic measures (see Figure 1) shows that these contradict the relationships expected from a ‘traditional’ approach which presumes that perceived and aroused emotions will involve the same acoustic features: for instance, contrary to such music-emotion links, tempo was faster for the sadness-arousing music than for the surprise- and happiness-arousing music; amount of high-frequency energy in the spectrum of the music was higher in the sadness-arousing music than in the anger- and happiness-arousing music; and mean attack time was longer in the anger-arousing music than in the sadness-arousing music. We submit that this is because listeners’ responses are ‘driven’ by underlying mechanisms (e.g., whether a particular memory was evoked, whether the music deviated from stylistic expectations), rather than by surface features as such. We do not claim that listeners’ reactions are typically contrary to surface-feature relationships. They may often be consistent with them – in particular in laboratory studies, where participants are exposed to unfamiliar, experimenter-selected music (for a discussion, see Liljeström, Juslin, & Västfjäll, 2012), which means that some mechanisms (e.g., contagion) are more likely than others (e.g., episodic memory). However, we do believe that mere demonstration that surface features can deviate from ‘traditional’ patterns is sufficient to show that a surface-feature approach (alone) will not provide a satisfactory account of musical emotions. It is not the case that correlations among surface features and emotions constitute a ‘rival’ explanation to mechanisms – they do not constitute an explanation at all. They only move the burden of explanation from one level (why does the slow movement of Beethoven’s Eroica symphony arouse sadness?) to another level (why does slow tempo arouse sadness?). It is a description of the process that ‘mediates’ between surface features and emotions that constitutes an explanation.
Problems and future directions
How can we be sure that the listeners’ emotional reactions were caused mainly by the target mechanism? Although it is rarely possible to completely rule out all rival hypotheses, we find it difficult to conceive of plausible rival hypotheses that can account for the present data. A strength of the experimental design in terms of statistical conclusion validity is that the musical excerpts were fairly simple, and that several aspects of the stimulus (e.g., mode, performance variation, approximate length, instrument arrangement) were kept more or less constant across conditions or controlled (e.g., acoustic characteristics). In many cases, rival hypotheses (i.e., alternative mechanisms) can be ruled out based on the stimuli themselves: because we used music unfamiliar to the listeners, evaluative conditioning appears unlikely. Were conditioned responses to occur in the absence of any common conditioning procedure featured in the experiment, these responses would be idiosyncratic and could hardly explain the consistent effects seen here. Similarly, because the music did not feature a marked pulse, as indicated for instance by pulse clarity estimates in Figure 1, arousal of emotions through rhythmic entrainment appears implausible. In addition, entrainment could hardly explain the specific types of emotions aroused (e.g., nostalgia). The post-hoc reports (MecScale) offer some further tentative support for our conclusions by being predictive of both experimental conditions (i.e., mechanisms) and self-reported subjective feelings. Though we concede that the scale items are only preliminary and require further testing, some of them would seem to have high face validity; for instance, the item on whether the music evoked memories or not is straightforward. Hence we can be reasonably confident that the episodic memory version tended to evoke memories, whereas the other versions did not. 
Similarly, a brain stem reflex is a distinct event that listeners will hardly fail to detect or report reliably in a questionnaire.
Although our manipulation of mechanisms aroused primarily the predicted emotions, it is apparent that some other emotions were also aroused to some extent, albeit in weaker form. In most cases these effects were consistent with the predicted emotions, for instance in terms of valence. However, the contagion version aroused some nostalgia, which is more typically associated with episodic memories linked to music (cf. Janata et al., 2007; Juslin et al., 2008). This finding appears quite puzzling considering that the music was unfamiliar to the listeners and that they thus presumably did not have specific memories associated with the music. One possible explanation could be that the nostalgia was a by-product of the music-evoked sadness, rather than a direct effect of the music. It has been reported previously that one common trigger of nostalgia is negative affect such as sadness (Wildschut, Sedikides, Arndt, & Routledge, 2006). Thus, in the present experiment we may speculate that, once the listener became sad through contagion, this emotion triggered episodic memories, which in turn aroused nostalgia-longing.
The least successful of our manipulations – at least in terms of confirming predictions – was the musical expectancy version. Although it did arouse some irritation, it did not arouse anxiety as much as we had expected. Modern listeners might be too familiar with ‘modernist’ music for them to find it anxiety-provoking. It is also possible that expectancy reactions may be stronger if the deviations occur in familiar music. Specifically, with unfamiliar music, the amount of learned music schemata relevant to the case may not be sufficient to evoke strong expectancies. It could seem strange that the expectancy version did not arouse more surprise reactions in listeners, since this is often viewed as a prototypical expectancy reaction (Huron, 2006), but this is consistent with Meyer’s (1956) views. He argued that ‘conditions of active expectation … are not the most favorable to surprise. For the listener is on guard, anticipating a new and possibly unexpected consequence’. Therefore, Meyer proposed, ‘Surprise is most intense where no special expectation is active’ (p. 29), which is precisely the case with brain stem reflexes – there is nothing in the foregoing structure that leads the listener to expect this sudden event. In contrast, when schematic expectations are activated by musical structure, as in the expectancy stimuli, the emotion evoked by deviations is more likely to be anxiety than surprise. In any case, developing (better) expectancy-based music excerpts that arouse strong emotions in listeners is an important goal for future research.
There are several other limitations of this experiment that should be acknowledged. To start with, we only used a single (original) piece of music. Though it might be argued that the original piece mainly serves as a ‘carrier’ of different types of information, the results clearly need to be replicated with other pieces. It should also be noted that lyrics were not included in the present stimuli, and that this may also be an influencing factor in some circumstances (see Dingle et al., 2011; Juslin et al., 2008). Another problem is that the psychophysiological data were not consistent: for instance, we obtained significant overall effects for skin conductance, but not for pulse rate. It may be noted, however, that skin conductance has been more reliably related to arousal-inducing music than pulse rate in previous research (cf. Hodges, 2010). It is unclear why the zygomaticus muscle activity was largest in the brain stem condition, but part of the observed activity might reflect ‘cross-talk’ from other muscles of the middle and lower facial regions during a startle response.
One final issue concerns the context (or lack thereof) in this experiment. The increased control of laboratory studies comes at a price in terms of lower ‘ecological validity’. For instance, it can perhaps be argued that it is problematic that the participants listened to music that they were not familiar with, and that belonged to a music genre they did not prefer. Still, their ratings suggested that they liked some of the pieces, and that the pieces aroused intense emotions, even ‘chills’. Research has revealed that listeners commonly encounter unfamiliar music they have not selected themselves, and that this does not preclude experiencing strong or positive emotions (Juslin et al., 2008). In this regard at least, the emotion episodes elicited here are not particularly ‘unrepresentative’ of everyday music listening. On the other hand, it must be acknowledged that the social context of musical events is missing in this experiment and that leaving out this important factor could give the impression that the task of predicting emotional reactions to music is more straightforward than it really is. What difference does the context make? Presumably, different aspects of the context determine which mechanisms are actually activated by influencing factors such as the music choice, the listener’s attention, and the functions of the music during specific activities (North & Hargreaves, 2008).
Implications for research and applications
This is the first study to selectively manipulate and contrast distinct target mechanisms in the induction of emotions through music listening. The results suggest that the relationship between music and emotion is too complex to be captured in terms of simple one-to-one links among musical features and emotions (e.g., fast tempo → happiness, slow tempo → sadness). Predicting which emotion a piece of music will evoke requires considering the underlying mechanism (what is the salient information? How is it processed by the listener?) rather than looking only at surface features. It must be noted that a mechanism approach is more flexible than a surface-feature approach because psychological analyses can be cross-culturally valid at the level of mechanisms even if there is cross-cultural diversity in musical surface features and aroused emotions (Juslin, 2012).
The present results also have some implications for applications of music and emotion research. For instance, most empirical studies of music and health thus far (for an up-to-date review, see MacDonald, Kreutz, & Mitchell, 2012) have tended to focus on quite superficial one-to-one relations between music and response, by-passing any intervening psychological processes in what is really a ‘behaviorist’ approach. We submit that only an understanding of underlying mechanisms will permit practitioners to apply the music in a way that actively manipulates specific mechanisms, so as to achieve predictable effects on emotion and health.
Whatever its limitations, this study has provided some support for the idea that musical emotions are aroused through the ways in which musical events are processed by underlying mechanisms. What matters, ultimately, is not acoustic parameters as such, but what meaning they are given by our psychological processes, a distinction between sound and significance. By highlighting this distinction, and allowing stronger causal conclusions than can be drawn from field data (see Dingle et al., 2011; Juslin et al., 2008; Juslin, Liljeström, Laukka, Västfjäll, & Lundqvist, 2011), the present experiment has illustrated one promising avenue towards explaining the emotional significance of music.
Funding
The present study was supported by the Swedish Research Council, through a grant to Patrik N. Juslin (Dnr 421-2010-2129). We are grateful to the reviewers for useful comments on a preliminary version of the manuscript.
