Abstract
The emotional valence of a stimulus is identified more quickly when that stimulus is preceded by another stimulus of congruent valence (e.g., a positive word preceded by another positive word), a phenomenon termed affective priming. The present study investigated whether affective priming occurs when chords (consonant/dissonant, high/low pitch) are used as primes and faces (happy or sad) are used as targets. Forty undergraduate students completed 64 trials divided into eight experimental conditions of eight trials each. Half of the conditions were congruent and half were incongruent. The congruent combinations consisted of (a) consonant/high pitch—happy, (b) consonant/low pitch—happy, (c) dissonant/high pitch—sad, and (d) dissonant/low pitch—sad, while the incongruent combinations consisted of (e) dissonant/high pitch—happy, (f) dissonant/low pitch—happy, (g) consonant/high pitch—sad, and (h) consonant/low pitch—sad. Responses were faster in congruent than in incongruent trials, except when high-pitch chords were paired with happy faces. These results partially replicate previous findings suggesting that the salience of the target stimuli can hinder the affective priming effect, which seems consistent with the happiness superiority effect literature.
Our daily experience indicates that the main reason people get involved in listening to music is the emotional content conveyed to the listeners (Corrigall & Schellenberg, 2015; Hopyan et al., 2011; Peretz et al., 1998; Sollberger et al., 2003). For this reason, an extensive body of knowledge regarding the music emotional meaning has been produced over the years (for reviews, see Eerola & Vuoskoski, 2013; Juslin & Sloboda, 2010).
The initial interest in evaluating the emotional content conveyed by musical pieces gradually shifted toward evaluating the acoustic elements (e.g., consonance, mode, pitch, timbre) that constitute these complex auditory stimuli (e.g., Dalla-Bella et al., 2001; Gagnon & Peretz, 2003; Hunter et al., 2010; Kastner & Crowder, 1990; Scherer, 1995; for reviews, see Gabrielsson & Lindström, 2001; Juslin & Laukka, 2003). According to Bakker and Martin (2015), a better understanding of music’s emotional effects could be achieved by isolating each acoustic element from the context of a musical piece.
Indeed, interesting results about the emotional meaning of musical stimuli have been obtained by presenting isolated acoustic elements in affective priming tasks (e.g., Bakker & Martin, 2015; Costa, 2012; Sollberger et al., 2003; Steinbeis & Koelsch, 2011; Zhou et al., 2019). Based on the affective priming paradigm (Donges et al., 2012; Fazio et al., 1986; Gohier et al., 2013; Klauer & Musch, 2003; Murphy & Zajonc, 1993), a typical affective priming task consists of evaluating target stimuli as positive (e.g., the printed word LOVE) or negative (e.g., the printed word HATE) by pressing designated keys on a computer keyboard. Each target is preceded by a prime stimulus, which can also be positive (e.g., PEACE) or negative (e.g., DEATH), so that the relationship between the prime and the target can be classified as affectively congruent (e.g., PEACE-LOVE) or incongruent (e.g., PEACE-HATE). According to Schneider and Shiffrin (1977), presentation of the prime automatically activates associated concepts in memory, so responses to targets related to that prime are facilitated. For this reason, response latencies are expected to be shorter in congruent than in incongruent trials (Fazio, 2001; Herring et al., 2013).
As in tasks using words, musical stimuli presented as primes can also bias the latency of the emotional evaluation of target stimuli (e.g., Costa, 2012; Daltrozzo & Schön, 2009; Goerlich et al., 2012; Sollberger et al., 2003; Steinbeis & Koelsch, 2011; Tay & Ng, 2019; Zhou et al., 2014, 2019). Regarding the acoustic elements, Sollberger et al. (2003) conducted an experiment with non-musicians in which consonant or dissonant chords were presented as primes for positive or negative target-words. The participants’ task was to classify the emotional valence of each target word. The authors found that positive words were classified faster when preceded by consonant chords than by dissonant chords, and negative words were classified faster when preceded by dissonant chords than by consonant chords. The shorter latencies were interpreted as reflecting congruent relations, whereas longer latencies were interpreted as reflecting incongruent relations between the consonant/dissonant chords and the positive/negative words.
Further research provided a more comprehensive understanding of the extent to which different acoustic elements and different stimulus-target modalities can bias the priming effect on affective priming tasks. For instance, Steinbeis and Koelsch (2011) carried out two experiments to study affective priming effects on musician and non-musician participants. Consonant or dissonant chords (Experiment 1) and major and minor chords (Experiment 2) were presented to prime the processing of positive or negative target-words respectively. In both experiments, musicians showed the expected affective priming effect for positive and negative target-words, while non-musicians showed affective priming effect only for positive target-words.
Stimulus-target modalities, in turn, were evaluated by Costa (2012; Experiment 2). Consonant and dissonant chords were used to prime pictures with emotional content (e.g., pictures of newborn babies, car accidents) for non-musician participants. In contrast to Sollberger et al. (2003), Costa reported no influence of the consonant or dissonant chords on participants’ latencies upon evaluating the emotional content of the pictures. Since pupillary response measures indicate a higher level of arousal elicited by emotional pictures compared with words (Bayer et al., 2011), Costa argued that the affective priming effect would have been overshadowed by the degree of arousal evoked by the target pictures.
The affective priming effect based on the chords’ pitch (i.e., high pitch vs low pitch) was also evaluated by Costa (2012; Experiment 2). Under this condition, it was reported that the pairs high pitch chord/positive picture and low pitch chord/negative picture were identified faster than the pairs low pitch chord/positive picture and high pitch chord/negative picture. Considering that previous studies have suggested that high frequencies are usually related to positive feelings while low frequencies are related to negative feelings (e.g., Gabrielsson & Lindström, 2001; Juslin & Laukka, 2003; Rocha & Boggio, 2013), Costa argued that the high pitch chords might have primed the positive pictures while the low pitch chords might have primed the negative pictures.
Like words and pictures, human facial expressions have also been used as stimuli to evaluate the emotional content of acoustic elements (e.g., Bakker & Martin, 2015; Ignacio et al., 2019; Kamiyama et al., 2013; Zhou et al., 2019). Bakker and Martin (2015), for instance, used happy faces preceded by major chords and sad faces preceded by minor chords as congruent pairs, while happy faces preceded by minor chords and sad faces preceded by major chords were presented as incongruent pairs. In general, behavioral data obtained from non-musician participants showed that sad faces were evaluated faster than happy faces regardless of the prime stimulus, so no priming effect was observed. Zhou et al. (2019), in turn, also carried out an experiment in which major or minor chords were presented as primes for happy or sad target-faces. Unlike Bakker and Martin, however, behavioral data for non-musician participants showed longer latencies for incongruent than for congruent pairs.
In summary, although Sollberger et al. (2003) showed that consonant and dissonant chords presented as primes can bias the latency of the emotional evaluation of target-word stimuli, such results were not entirely replicated by Steinbeis and Koelsch (2011). Furthermore, Zhou et al. (2019) showed that major and minor chords presented as primes can bias the latency of the evaluation of happy and sad faces, although Bakker and Martin (2015) did not obtain such results in a highly similar experiment. Taken together, these findings seem to indicate that the modulatory effect of different elements presented as primes and of different stimulus-target modalities needs to be further investigated. In other words, more fine-grained examinations of prime-target relations remain important for the advancement of knowledge on music perception.
Interestingly, human facial expressions could be a useful tool for advancing the understanding of prime-target relations in affective priming tasks. For instance, considering that facial expressions are processed more automatically than word targets (Beall & Herbert, 2008), the salience of human faces may be strong enough to hinder the priming effect, much as Costa (2012) described for the arousal level of picture targets. In addition, several studies have demonstrated that happy faces are recognized faster than sad or angry faces (e.g., Bortoloti et al., 2019; Calvo & Beltrán, 2013; Calvo & Lundqvist, 2008; Palermo & Coltheart, 2004), a phenomenon termed the happiness superiority effect (HSE; Becker et al., 2011; Craig et al., 2014; Lee & Kim, 2017). Calvo and colleagues (e.g., Calvo & Beltrán, 2013; Calvo et al., 2012; Calvo & Nummenmaa, 2008) have stated that the happy face advantage is probably due to its perceptual and categorical distinctiveness. More specifically, the smile gives happy expressions a salient facial feature that allows for faster categorization than other facial expressions (Adolphs, 2002; Leppänen & Hietanen, 2007).
Considering that Costa (2012) argued that the arousal level of target stimuli, which is higher for pictures than for words (Bayer et al., 2011), hinders the affective priming effect, the current study sought to evaluate whether the HSE (Becker et al., 2011; Craig et al., 2014) would similarly hinder the effect in affective priming tasks. To address this question, an affective priming experiment was designed comprising eight trial types; four of them were termed “congruent” and the other four were termed “incongruent.” The congruent trials consisted of the following prime/target combinations: (1) consonant/high pitch—happy face, (2) consonant/low pitch—happy face, (3) dissonant/high pitch—sad face, and (4) dissonant/low pitch—sad face. The incongruent trials consisted of: (5) dissonant/high pitch—happy face, (6) dissonant/low pitch—happy face, (7) consonant/high pitch—sad face, and (8) consonant/low pitch—sad face. We predicted that, although overall priming effects would still be produced by prime-target affective congruency, such effects might be smaller for happy faces because this type of stimulus is already quickly recognizable. In addition, considering that pitch (high vs low) seemed to be a powerful sound feature for enabling the affective priming effect (Costa, 2012), the present study also sought to evaluate whether this priming effect would be modulated by any interaction between the HSE and pitch.
Methods
Participants
Forty undergraduate students, aged between 20 and 24 years old (M = 21.3; SD = 1.3), participated in the experiment. All participants were right-handed (McManus, 2019; Papadatou-Pastou et al., 2020; Scharoun & Bryden, 2014) and read and signed an informed consent form approved by the Human Research Ethics Committee (process number CAAE: 44508615.2.0000.5149).
Setting and equipment
The data were collected in a well-lit 4 m × 6 m room with a low level of noise. Computers with 23-inch screens, keyboard, mouse, and headphones were used. The high-fidelity stereo headphones reproduced frequencies between 10 Hz and 23 kHz, with a 32 Ω impedance, a maximum output of 50 mW, and a sensitivity of 96 dB. PsychoPy software (Peirce, 2009) was used to present all the stimuli.
Visual and auditory stimuli
The consonant chords contained a D root note, A a perfect fifth above the root, and D an octave above the root, and the dissonant chords contained a D root note, A-flat a diminished fifth above the root, and D an octave above the root. Chord frequencies were controlled by using D2 (146.83 Hz) and D4 (587.33 Hz) root notes. Therefore, the low pitch chords were composed of D2–A2–D3 and D2–Ab2–D3, and the high pitch chords were composed of D4–A4–D5 and D4–Ab4–D5. Chords were composed and recorded using the Ableton® software (version 9.4.7, 64 bits), and the notes that comprised each chord were played via a Musical Instrument Digital Interface (MIDI). All chords were recorded with the same intensity, loudness, duration, and attack parameters.
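As an illustration of how the chord frequencies relate to each other, the sketch below applies the standard equal-temperament formula f = 440 · 2^((m − 69)/12), where m is a MIDI note number. It is not the authors' procedure. It assumes A = 440 Hz tuning and an octave-labelling convention in which MIDI note 60 is called C3, so that D2 corresponds to MIDI note 50 and matches the reported 146.83 Hz; under the convention in which note 60 is called C4, the same pitches would be labelled D3 and D5. The function and constant names are hypothetical.

```python
# Equal-temperament frequencies for the four chords (illustrative sketch).
# Assumptions: A = 440 Hz at MIDI note 69, and an octave-labelling convention
# in which MIDI note 60 is called C3, so that D2 is MIDI note 50.

def midi_to_hz(note):
    """Convert a MIDI note number to a frequency in Hz (12-tone equal temperament)."""
    return 440.0 * 2 ** ((note - 69) / 12)

# Semitone offsets from the root:
# perfect fifth = 7, diminished fifth = 6, octave = 12.
INTERVALS = {"consonant": (0, 7, 12), "dissonant": (0, 6, 12)}

# Root notes of the low and high chords under the assumed labelling (D2, D4).
ROOTS = {"low": 50, "high": 74}

def chord_frequencies(consonance, pitch):
    """Return the three note frequencies of a chord, rounded to 2 decimals."""
    root = ROOTS[pitch]
    return [round(midi_to_hz(root + step), 2) for step in INTERVALS[consonance]]

if __name__ == "__main__":
    for pitch in ROOTS:
        for consonance in INTERVALS:
            print(pitch, consonance, chord_frequencies(consonance, pitch))
```

The two root notes are 24 semitones apart, so the high root is four times the frequency of the low root, which is the two-octave separation described in the text.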
The visual target stimuli consisted of 32 front facing happy faces and 32 front facing sad faces taken from the Karolinska Directed Emotional Faces (KDEF—Lundqvist et al., 1998) database.
Procedure
Before starting the session, participants were asked about their musical background. Playing an instrument or having received formal musical training were exclusion criteria.
Participants were asked to judge each face as happy or sad by pressing the K and D keys on the computer keyboard with their left or right index finger. Key assignment was counterbalanced across participants. Participants were asked to keep their index fingers resting on the assigned keys to maximize response time precision and were instructed to respond as fast as they could without sacrificing accuracy.
Each trial began with the presentation of a prime stimulus (i.e., a chord). The target stimulus (i.e., a face) appeared 200 ms after chord onset; from then on, prime and target were presented simultaneously until the end of the trial. Participants could respond as soon as the target appeared on the screen, and the trial ended immediately after a response key was pressed. After each response, a cross was presented in the center of the computer screen during the inter-trial interval (ITI). The ITI duration varied randomly across trials between 1,000 and 5,000 ms. All task parameters were based on previous experiments carried out by Sollberger et al. (2003) and Costa (2012).
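The timing of a single trial can be sketched as follows. This is a minimal illustration in plain Python rather than the PsychoPy script actually used, and it makes two assumptions that the text does not state: response time is measured from face onset, and the ITI is drawn from a uniform distribution over the reported range. All names are hypothetical.

```python
import random

PRIME_TO_TARGET_MS = 200      # face onset relative to chord onset
ITI_RANGE_MS = (1000, 5000)   # reported bounds of the inter-trial interval

def trial_timeline(response_time_ms, rng):
    """Return event times (ms from chord onset) for one trial.

    response_time_ms is assumed to be measured from face onset.
    rng is a random.Random instance used to draw the ITI.
    """
    face_onset = PRIME_TO_TARGET_MS
    trial_end = face_onset + response_time_ms   # key press ends chord and face
    iti = rng.uniform(*ITI_RANGE_MS)            # assumption: uniform sampling
    return {
        "chord_onset": 0,
        "face_onset": face_onset,
        "trial_end": trial_end,
        "iti_ms": iti,
        "next_chord_onset": trial_end + iti,
    }

if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed so the demo is repeatable
    print(trial_timeline(650, rng))
```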
The experiment began with four practice trials to familiarize participants with the task. More specifically, they were exposed to a consonant chord—happy face trial, a consonant chord—sad face trial, a dissonant chord—happy face trial, and a dissonant chord—sad face trial. The presentation order of the trials and the chord pitches varied among participants. These practice trials were not included in the data analysis.
After the four practice trials, participants began the experimental task, which included 64 trials divided into eight experimental conditions of eight trials each. Half of the conditions were congruent and half were incongruent. The congruent combinations consisted of (a) consonant/high pitch—happy, (b) consonant/low pitch—happy, (c) dissonant/high pitch—sad, and (d) dissonant/low pitch—sad, while the incongruent combinations consisted of (e) dissonant/high pitch—happy, (f) dissonant/low pitch—happy, (g) consonant/high pitch—sad, and (h) consonant/low pitch—sad. The order of presentation of the chord/face combinations was randomized once before data collection began and was then identical for all participants.
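The design described above can be sketched as a trial list: crossing consonance, pitch, and face expression yields the eight conditions, and congruency follows from whether the chord and face share valence (consonant with happy, dissonant with sad). The code below is an illustrative reconstruction, not the authors' script. The seed value and all names are hypothetical, and the assignment of individual face images to trials is omitted because the text does not specify it.

```python
import itertools
import random

def is_congruent(consonance, face):
    """Consonant-happy and dissonant-sad pairs are congruent; the rest are not."""
    return (consonance == "consonant") == (face == "happy")

def build_trial_list(trials_per_condition=8, seed=2024):
    """Build the 64-trial list and shuffle it once with a fixed seed.

    A fixed seed mimics an order that is randomized once and then
    presented identically to every participant. The seed is arbitrary.
    """
    conditions = itertools.product(
        ("consonant", "dissonant"), ("high", "low"), ("happy", "sad")
    )
    trials = [
        {"consonance": c, "pitch": p, "face": f, "congruent": is_congruent(c, f)}
        for c, p, f in conditions
        for _ in range(trials_per_condition)
    ]
    random.Random(seed).shuffle(trials)
    return trials

if __name__ == "__main__":
    trials = build_trial_list()
    print(len(trials), "trials;", sum(t["congruent"] for t in trials), "congruent")
```

Because the shuffle uses its own seeded generator, calling `build_trial_list()` twice returns the same order, which is the property the fixed presentation order relies on.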
Results
The data analysis was conducted on the median response times for each of the eight experimental conditions mentioned above. To analyze the data, we conducted a 2 × 2 × 2 repeated measures analysis of variance (ANOVA) with the factors of congruency (congruent, incongruent), face expression (happy, sad), and pitch (high, low). Because consonance/dissonance served to produce the congruency factor in combination with face expression, it could not be entered as an independent factor in the analysis. Holm–Bonferroni post hoc tests were conducted when necessary to elucidate significant effects and interactions. Based on parameters used in previous experiments to limit the inclusion of extreme values (e.g., Costa, 2012; Hermans et al., 1994; Sollberger et al., 2003), trials with response times longer than 1,500 ms (5% of data) or shorter than 300 ms (< 1% of data) were excluded from all analyses. In addition, trials with incorrect responses (3% of data) were also excluded. The response time data for the variables of interest are shown in Figure 1.
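The exclusion rules and the per-condition medians can be sketched for a single participant as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the trial records, field names, and condition labels are hypothetical, and response times exactly equal to 300 or 1,500 ms are kept because the text excludes only values shorter or longer than those bounds.

```python
from collections import defaultdict
from statistics import median

RT_MIN_MS = 300    # trials faster than this are excluded
RT_MAX_MS = 1500   # trials slower than this are excluded

def condition_medians(trials):
    """Return the median RT per condition after applying the exclusion rules.

    trials: iterable of dicts with the (hypothetical) keys
            'condition', 'rt_ms', and 'correct'.
    """
    kept = defaultdict(list)
    for trial in trials:
        in_range = RT_MIN_MS <= trial["rt_ms"] <= RT_MAX_MS
        if trial["correct"] and in_range:
            kept[trial["condition"]].append(trial["rt_ms"])
    return {cond: median(rts) for cond, rts in kept.items()}

if __name__ == "__main__":
    # Made-up example data, not values from the study.
    demo = [
        {"condition": "congruent/happy/high", "rt_ms": 520, "correct": True},
        {"condition": "congruent/happy/high", "rt_ms": 610, "correct": True},
        {"condition": "congruent/happy/high", "rt_ms": 250, "correct": True},   # too fast
        {"condition": "congruent/happy/high", "rt_ms": 1700, "correct": True},  # too slow
        {"condition": "congruent/happy/high", "rt_ms": 580, "correct": False},  # error
        {"condition": "congruent/happy/high", "rt_ms": 700, "correct": True},
    ]
    print(condition_medians(demo))
```

The resulting eight medians per participant would then be the cell values entered into the repeated measures ANOVA.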

Mean Median Response Times for Each Face Expression and Pitch Combination for Congruent and Incongruent Trials. Standard Errors Are in Parentheses.
The ANOVA revealed a main effect of congruency, F(1, 39) = 20.33, p < .001, ηp² = 0.34, and a main effect of face expression, F(1, 39) = 9.57, p = .004, ηp² = 0.20, but no main effect of pitch, F(1, 39) = 2.94, p = .095, ηp² = 0.07. These main effects indicated that, in general, participants responded faster to congruent than to incongruent trials, and faster to happy than to sad faces (see Figure 1). The ANOVA also revealed two-way interactions between congruency and face expression, F(1, 39) = 6.15, p = .018, ηp² = 0.14, pitch and face expression, F(1, 39) = 58.37, p < .001, ηp² = 0.60, and congruency and pitch, F(1, 39) = 8.26, p = .007, ηp² = 0.17. More importantly, the three-way interaction among these factors was also significant, F(1, 39) = 4.90, p = .033, ηp² = 0.11.
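Partial eta squared can be recovered from an F statistic and its degrees of freedom through the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to the three main effects reported above as a consistency check; the function name is hypothetical and the code is an illustration, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

if __name__ == "__main__":
    # F values and degrees of freedom of the three main effects reported above.
    main_effects = {
        "congruency": (20.33, 1, 39),
        "face expression": (9.57, 1, 39),
        "pitch": (2.94, 1, 39),
    }
    for name, (f_value, df1, df2) in main_effects.items():
        print(name, round(partial_eta_squared(f_value, df1, df2), 2))
```

Rounded to two decimals, these reproduce the reported effect sizes of 0.34, 0.20, and 0.07.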
As can be seen in Figure 1, the main effects and interactions reported above indicate that, beyond the expected congruency effects, responses to happy faces were in general faster than responses to sad faces. Moreover, high pitch clearly facilitated the processing of happy faces relative to low pitch, exerting a particularly strong effect on happy faces in the incongruent trials, an effect strong enough to apparently offset the expected incongruency effect for this condition. These patterns were corroborated by the Holm–Bonferroni post hoc tests comparing response times for congruent and incongruent trials for each combination of face expression and pitch (see Figure 1). These tests showed that responses to congruent trials were significantly faster than responses to incongruent trials for sad faces (high pitch, t = 2.94, p = .052; low pitch, t = 4.15, p = .001) and for happy faces when the pitch was low, t = 3.53, p = .009. However, responses were equivalently fast for happy faces in congruent and incongruent trials when the pitch was high, t = 1.51, p = 1.
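For readers unfamiliar with the Holm–Bonferroni procedure, the sketch below shows how adjusted p-values are obtained: the uncorrected p-values are sorted in ascending order, the k-th smallest is multiplied by (m − k + 1) for m comparisons, the results are forced to be non-decreasing, and each is capped at 1. The cap is why an adjusted value of p = 1 can be reported. The input values are made up for illustration and are not taken from this study, and the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: m for the smallest p, then m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical uncorrected p-values
    print([round(p, 3) for p in holm_adjust(raw)])
```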
Discussion
The present experiment evaluated whether the priming effect would be hindered by the HSE (Becker et al., 2011; Craig et al., 2014). The main effect observed in our tasks with facial expressions (i.e., faster responses to happy faces) seems to indicate that the HSE was not enough to completely hinder the affective priming effect considering that responses to congruent trials were faster than to incongruent trials regardless of the emotional expression of the faces. However, when combined, happy faces and high pitch chords seemed to overcome the priming effect expected for incongruent trials. These results partially replicate previous findings suggesting that the salience of the target stimuli can hinder the affective priming effect, which seems consistent with the HSE literature.
Affective priming tasks have been used to evaluate the emotional meaning of musical stimuli over the years (e.g., Costa, 2012; Daltrozzo & Schön, 2009; Goerlich et al., 2012; Sollberger et al., 2003; Steinbeis & Koelsch, 2011; Tay & Ng, 2019; Zhou et al., 2014, 2019). In general terms, the cross-modal integration of musical and visual stimuli has been evaluated by means of neurophysiological and behavioral measures. Although neurophysiological measures indicate a consistent affective priming effect based on prime-target congruence (e.g., Bakker & Martin, 2015; Baumgartner, Esslen, & Jäncke, 2006; Baumgartner, Lutz, et al., 2006; Jolj & Meurs, 2011; Kamiyama et al., 2013; Steinbeis & Koelsch, 2011; Zhou et al., 2019), behavioral measures seem to be modulated by aspects of the stimuli other than their congruence (e.g., Bakker & Martin, 2015; Costa, 2012; Sollberger et al., 2003; Steinbeis & Koelsch, 2011; Zhou et al., 2019).
Since the arousal level of the target stimuli may also hinder the affective priming effect (Costa, 2012), it seems important to consider how this variable could have modulated the results of the present experiment. The KDEF (Lundqvist et al., 1998) is composed of numerous human faces, which are rated according to emotion and arousal. For instance, the database contains happy faces with high arousal levels as well as happy faces with low arousal levels, and the same can be said for sad faces. The target stimuli used in the experimental procedure, however, were chosen only on the basis of the emotion they conveyed (i.e., happy and sad faces were not matched for arousal level), and this could have added an intervening variable. Over the 64 experimental trials, 32 different happy faces and 32 different sad faces were randomly presented. The faces’ arousal levels thus varied across trials, which should have balanced out any effect of this intervening variable.
Furthermore, Zhou et al. (2019) conducted an experiment (Experiment 2) in which consonant and dissonant chords were used as primes for human facial expressions. To control the arousal level, happy faces were used as positive valence targets and angry or scared faces were used as negative valence targets. Even in this case with controlled arousal level, individuals with normal hearing skills categorized happy faces faster than sad faces. Although these analyses seem to indicate that the HSE disrupts the priming effect regardless of the face’s level of arousal, this issue needs to be further evaluated in future research. For instance, an experimental protocol using low-arousal happy faces and high-arousal sad faces as target stimuli could contrast these two aspects to evaluate any interaction or overlap between them.
The relationship between pitch and emotion is well established in the literature. For instance, in a literature review, Juslin and Laukka (2003) describe a series of acoustical characteristics related to the basic emotions. According to them, happiness is frequently associated with high fundamental pitch, while sadness is frequently associated with low fundamental pitch. Our results, as well as those obtained by Costa (2012), indicate that the effect of pitch seems to be stronger than that obtained exclusively from chord consonance. This strength was particularly evident for high pitch chords and happy faces, where the mean median response times were similar in the congruent and incongruent trials (see Figure 1). In previous experiments (Costa, 2012; Sollberger et al., 2003), as well as in the present study, the high-pitch chords (D4) were two octaves higher than the low-pitch chords (D2). Future research could increase this difference (i.e., to three or four octaves) to evaluate whether a greater distance would produce a stronger effect.
Still related to the musical elements presented in affective priming tasks, Costa (2012) argued that a better generalization of the results would be obtained using chords composed of other dissonant intervals, such as the major and minor second or the major and minor seventh, instead of the augmented fourth used in most experiments so far. In addition to using different compositions of dissonant and consonant chords, future research could evaluate musical elements different from those usually presented so far. For instance, a promising line of research manipulates the expectations that listeners generate about how musical sequences will sound (Ignacio, 2016; Ignacio et al., 2019; Muller et al., 2011). More specifically, a pleasurable experience is provided when pitch expectations are satisfied, and disappointment when they are violated (Ignacio et al., 2019; Pearce & Wiggins, 2012). Ignacio et al. (2019), for instance, presented twenty-five-chord progressions, which differed from each other in the degree to which they fulfilled participants’ automatically formulated expectations of how each musical sequence should sound. The results suggest that short musical sequences influence individuals’ emotional processing.
A final orthogonal aspect of the musical stimuli that could be addressed in future research is their pre-experimental history. Zhou et al. (2014), for instance, used musical excerpts taken from iconic Western pieces composed by Beethoven, Mussorgsky, and Brahms as prime stimuli and words as targets. According to the authors, the establishment of valence for musical stimuli could be much less standardized among participants than the establishment of valence for words. Thus, a musical stimulus that should be considered sad based on its acoustic characteristics may have been related to positive valence due to a random association. Although possible, such a situation would be less probable for the valence of isolated chords. Previous studies have demonstrated that consonant chords are generally related to positive valence stimuli and dissonant chords to negative valence stimuli (Costa, 2012; Sollberger et al., 2003); chords therefore seem to be less susceptible to the random associations described by Zhou et al. Even so, future experiments could include a chord evaluation task, in which participants would be asked to rate the chords on a positive-negative scale. This evaluation would be useful for understanding possible idiosyncratic results.
In summary, our findings indicate that the faster categorization of happy faces compared with other facial expressions, known as the HSE (e.g., Becker et al., 2011; Bortoloti et al., 2019; Craig et al., 2014; Lee & Kim, 2017; Leppänen & Hietanen, 2003), was not enough to hinder the affective priming effect in the congruent trials. However, the combination of happy faces and high pitch chords seemed to cancel the priming effect in the incongruent ones. In this sense, our findings provide additional evidence that, in priming tasks involving musical stimuli, behavioral measures seem to be modulated by aspects of the stimuli other than their congruence. This issue should be addressed and considered in future experiments.
Footnotes
Acknowledgements
The authors would like to thank Dr. Deisy de Souza, chairperson of INCT/ECCE, for her leadership and for encouraging the publication of this manuscript.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request.
Compliance with ethical standards
This manuscript has not been published or presented elsewhere, in part or entirely, and is not under consideration by another journal.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Institute of Science and Technology on Behavior, Cognition, and Teaching (INCT/ECCE). The INCT/ECCE is financially supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo [FAPESP, Grant number 2014/50909-8]; the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior [CAPES, Grant number 88887.136407/2017-00]; and the Conselho Nacional de Desenvolvimento Científico e Tecnológico [CNPq, Grant number 465686/2014-1]. Raone Rodrigues had a doctoral scholarship from the Fundação de Amparo à Pesquisa do Estado de Minas Gerais [FAPEMIG, Covenant number 5.308/15, ID 13514]. Átila Cedro had a doctoral scholarship from the CAPES [Process number 88882.381505/2019-01].
Informed consent
Proper informed consent was obtained, and the study design was approved by the appropriate ethics review boards. All of the authors have approved the manuscript and agree with its submission to your esteemed journal.
Open practices statements
None of the data or materials for the experiments reported here is available, and none of the experiments was preregistered.
