Abstract
The ability to infer the emotional states of others is central to our everyday interactions. These inferences can be drawn from several different sources of information occurring simultaneously in the communication situation. Based on previous studies revealing that children pay more heed to situational context than to emotional prosody when inferring the emotional states of others, we decided to focus on this issue, broadening the investigation to find out whether the natural combination of emotional prosody and faces (that is, paralinguistic cues) can overcome the dominance of situational context (that is, extralinguistic cues), and if so, at what age. In Experiment 1, children aged 3–9 years played a computer game in which they had to judge the emotional state of a character, based on two sources of information (that is, extralinguistic and paralinguistic) that were either congruent or conflicting. In Condition 1, situational context was compared with emotional prosody; in Condition 2, situational context was compared with emotional prosody combined with emotional faces. In a complementary study (Experiment 2), the same 3-year-olds performed recognition tasks with the three cues presented in isolation. Results highlighted the fundamental role of both cues, as a) situational context dominated prosody in all age groups, but b) the combination of emotional facial expression and prosody overcame this dominance, especially among the youngest and oldest children. We discuss our findings in the light of previous research and theories of both language and emotional development.
The ability to interpret how others feel is crucial to the way we interact with people in the course of our everyday lives. Spoken language confronts listeners with a complex and multidimensional set of cues that occur simultaneously in communication situations. Over and above what other people say per se (i.e., “what we say”), extralinguistic (i.e., “why we say”) and paralinguistic (i.e., “how we say”) cues can also be regarded as sources of information that make a vital contribution to the final decoding of the message (e.g., Feldman, Philippot, & Custrini, 1991; Izard, 2001; Keltner & Haidt, 1999). Based on previous studies (Aguert, Laval, Le Bigot, & Bernicot, 2010; Hortaçsu & Ekinci, 1992) revealing that children give priority to situational context over emotional prosody when inferring the emotional states of others, we focused on this issue and broadened the investigation to find out whether the natural combination of emotional prosody and facial expressions (i.e., paralinguistic cues) can overcome the influence of situational context (i.e., extralinguistic cues), and if so, at what age.
Scientists have examined whether humans rely more heavily on certain cues than on others to infer the emotional states of others, depending on how old they are, and in doing so have attempted to isolate and explain the development of a broader aspect of social cognition (Crick & Dodge, 1994). However, an examination of the literature on the ability to infer the emotional states of others suggests that two research areas have often been explored in isolation, even when dealing with the same question. Language-related studies have addressed this issue by focusing on lexical content and situational context, and, to a lesser extent, on paralinguistic cues such as prosody. Research on the psychology of emotions, on the other hand, has frequently examined the ability to process emotional cues primarily using the presentation of single cues and focusing predominantly on one type of cue (i.e., emotional facial expressions and, to a lesser extent, emotional prosody). Our intention was to combine these two perspectives within a real-life communication situation in which situational context, the other person’s emotional facial expressions and the associated prosody, all potentially contributed to the inference of the emotional state.
The importance of situational context in children’s understanding of speech is widely acknowledged today. The term situational context refers to three types of parameters that are exclusively linked to social interactions: the participants’ location in space and time, their characteristics, and their activities (Brown & Fraser, 1979). In particular, research has shown that 3-year-old children are able to identify whether situations are evoked by emotions (Mood, Johnson, & Shantz, 1978; Ribordy, Camras, Stefani, & Spaccarelli, 1988), and that 5-year-olds are able to report situations that typically refer to a single emotion (Strayer, 1986). In addition, we know that situational context plays a fundamental role in indirect request comprehension by children as young as 1.5 years of age (Shatz, 1978) and continues to do so until at least the age of 6 or 7 years (Bernicot & Legros, 1987; Spekman & Roth, 1985). Nor is the influence of situational context limited to requests, as it has been shown to play a role in other kinds of language processing, too, such as understanding promises (Astington, 1988; Laval & Bernicot, 1999) and idiomatic expressions (Laval, 2003; Levorato & Cacciari, 1992), even in adolescence, albeit to a lesser degree (Cain, Towse, & Knight, 2009; Nippold & Martin, 1989). Finally, situational context seems to play an important role in how children understand the communicative intention of others: as revealed by the widely used experimental conflict situation, it takes priority over paralinguistic cues such as emotional prosody.
Emotional prosody can be defined as specific variations in pitch that help to convey a specific emotion through vocal expression. These variations may concern fundamental frequency, mean pitch, range, or intensity (Bachorowski & Owren, 2008; Murray & Arnott, 1993; Scherer, 1986). Emotional prosody is a behavioral form of emotion expression, conveying a wealth of information that adults are able to decode accurately and quickly (e.g., Sauter, Eisner, Ekman, & Scott, 2010; Scherer, 2003; Scherer, Banse, & Wallbott, 2001). From a developmental point of view, evidence suggests that the ability to use this cue to identify basic emotions emerges quite early in life, that is, by the second half of the first year (e.g., Flom & Bahrick, 2007; Grossmann, Striano, & Friederici, 2005; Walker-Andrews & Grolnick, 1983), and recent neuroscience findings suggest a very early sensitivity (Blasi et al., 2011; Cheng, Lee, Chen, Wang, & Decety, 2012). Nonetheless, researchers frequently refer to a U-shaped developmental curve to describe the fact that while prosody seems to be the cue that mainly orients human beings in social interactions in both infancy and adulthood, this is not the case across a broad swathe of childhood. This developmental trajectory results from the fact that although children are able to use prosody to identify discrete emotions as early as 4–5 years of age (Friend, 2000; Morton & Trehub, 2001; Quam & Swingley, 2012; Sauter, Panattoni, & Happé, 2012), if it is in conflict with another information source, they only start to use it as a basis for judgment from the age of 9–10 years onwards. Instead, young children prefer to rely on either lexical content (Friend, 2000; Friend & Bryant, 2000; Lawrence & Fernald, 1993; Morton & Trehub, 2001) or situational context (Aguert et al., 2010; Hortaçsu & Ekinci, 1992). One explanation for this is that, in the wake of the language spurt, paralinguistic cues come to be regarded as secondary.
Moreover, the subordinate role of emotional prosody was highlighted in a recent study in which children aged 5–13 years were exposed to emotional prosody and a situational context devoid of emotional information (the former being a relevant source of emotional information, but not the latter). The children inferred the emotional state of others from the situational context, apparently ascribing meaning to it a posteriori (Aguert, Laval, Lacroix, Gil, & Le Bigot, 2013).
In everyday life communication situations, paralinguistic cues are not restricted to prosody, as there is growing evidence that emotional prosody is intrinsically linked to the corresponding emotional facial expression. Research has shown that face and voice are embedded, and that this association has a powerful impact on interpersonal interactions because it is “prone to activate conceptual or ‘prototypic’ knowledge pertaining to discrete emotions” (Pell, 2005, p. 194) (e.g., De Gelder & Bertelson, 2003; De Gelder & Vroomen, 2000; Rigoulot & Pell, 2012). It is therefore a logical next step to investigate their combined influence on the ability to infer the emotional states of others. Emotional facial expressions are considered to be a key cue for human adaptation, as they convey information about both the emotional state of the expresser and his or her environment (Fridlund, 1997; Keltner & Ekman, 1999). Fortunately, emotional faces are efficiently processed by adults (e.g., Adolphs, 2002; Ekman & Cordaro, 2011; Tracy & Robins, 2008). Moreover, infants and young children seem to be particularly sensitive to emotional faces (e.g., Bornstein & Arterberry, 2003; Hoehl & Striano, 2010; Schwartz, Izard, & Ansul, 1985). However, the ability to process emotional facial expressions, as opposed to prosody, is thought to follow a developmental curve up to about 10 years of age. Moreover, and despite inconsistent results regarding the age at which specific emotions are recognized, it is generally agreed that children as young as 3 years of age are very good at recognizing basic emotions from faces (e.g., Camras & Allison, 1985; Gao & Maurer, 2009; Gosselin & Pelissier, 1996; Gross & Ballif, 1991; Herba & Phillips, 2004; Hoffner & Badzinski, 1989; Philippot & Feldman, 1990).
Finally, the handful of studies that have compared emotional faces with situational context suggest that 3- and 5-year-olds focus on facial emotions rather than on situational context, but that this reliance decreases with age, with 8–9-year-olds preferring to rely on the contextual information available in the situation (Gnepp, 1983; Hoffner & Badzinski, 1989; Reichenbach & Masters, 1983).
Overall, and despite previous research in the areas of emotion and language, the way in which paralinguistic (i.e., emotional prosody and faces) and extralinguistic (i.e., situational context) cues contribute to the comprehension of the emotional states of others across development is far from clear. To our knowledge, no study has ever compared and contrasted these potential sources of information. Moreover, contrasting paralinguistic (the “how” we say something) with extralinguistic (the “why” we say something) cues echoes a recent model of the different sources of information used to infer the emotional states of others or, to use the term employed by the authors, to mentalize (Achim, Guitton, Jackson, Boutin, & Monetta, 2013). In this model, the “why” corresponds to immediate perceptual information about the context, and the “how” to immediate perceptual information about the agent (i.e., the person whose mental state has to be inferred). The aim of the first experiment in the present study was thus to generate new results in this research area by using a child-friendly computer game to manipulate the emotional information conveyed by each kind of cue in the same environment, in order to determine whether children use one type more than the other. Based on previous studies (Aguert et al., 2010; Laval, Aguert, & Gil, 2012), this game featured two protagonists named Pilou and Edouard. Pilou spoke to Edouard in meaningless syllables (Friend & Bryant, 2000), and the participants could make use of conflicting cues to identify Pilou’s emotional state (either sad or happy). In the first condition, prosody conflicted with situational context; in the second condition, prosody and the corresponding emotional facial expression contrasted with the situational context.
The aim of the first condition was to replicate previous findings indicating that, unlike older children and adults, young children primarily rely on situational context (Aguert et al., 2010) when it is only in conflict with emotional prosody. The second condition, in which facial expression was congruent with prosody, was designed to examine children’s ability to use countercontextual information, namely paralinguistic information characterized by the natural association of prosody and emotional facial expression. By comparing the same children across the two conditions, we were able to identify the relative advantage conferred by each paralinguistic cue. Four groups of children aged 3, 5, 7 and 9 years took part in our study. Our choice of age range reflected the need to examine an extended period of development that corresponded to the ages investigated in previous studies (i.e., 5–9 years). It also allowed us to examine an age that is not well documented (i.e., 3 years) even though it represents an intermediate period between the paralinguistic and linguistic spurts (Friend, 2001, 2003). As the above-reported literature suggests that paralinguistic cues can have a predominant influence on very young children (before they master language), just as they can in older ones, we hypothesized that 1) situational context would be less influential for the youngest and oldest children than for those in the intermediate age groups, and 2) this would be especially true when it was contrasted with a combination of paralinguistic cues (i.e., emotional prosody plus face).
Experiment 1
Method
Participants
A total of 96 children were included in the final sample: 22 children aged 3 years (mean age = 3.77, SD = 0.21); 25 children aged 5 years (mean age = 5.77, SD = 0.30); 24 children aged 7 years (mean age = 7.53, SD = 0.32); and 25 children aged 9 years (mean age = 9.78, SD = 0.27). As we were interested in which cues children rely on when there are different types of conflicting cues, failure on stories featuring congruent cues was used as grounds for excluding nine children (including seven 3-year-olds) who had been included in the initial sample. The children, who were all attending ordinary French mainstream schools in Poitiers, France, had no hearing or visual problems. They all came from upper middle- and upper-class homes, and none of them had skipped or repeated a school grade.
Material
The children were tested individually in a quiet room in their school. They sat in front of a PC that presented stories consisting of visual and auditory stimuli and recorded their responses. Each story featured two characters (Pilou the rabbit and Edouard the duck) in a general communication situation. Either two or three informative cues were presented, each of which conveyed either sadness or happiness. In these stories, Pilou and Edouard found themselves in a situation (situational context) in which Pilou talked to Edouard using emotional speech consisting of meaningless syllables (emotional prosody), sometimes simultaneously expressing a facial emotion (emotional facial expression). For example, in Condition 1, Pilou’s face was not visible, whereas in Condition 2, Pilou displayed an expressive face (to indicate sadness, the eyebrows were raised, the eyes were slightly droopy and Pilou cried; to indicate happiness, Pilou wore a large smile characteristic of an intense zygomatic muscle contraction). There were 12 situational contexts and 12 pseudo-utterances spoken with emotional prosody. Half the cues of each type were judged to convey happiness (e.g., Pilou and Edouard make a snowman), and half were judged to convey sadness (e.g., Pilou is sick, he has a stomach ache). In addition, two situational contexts (e.g., Pilou and Edouard are in the street) and prosodies that were judged to be neutral were used in a familiarization phase. A pink circle corresponding to happiness and a black circle corresponding to sadness were displayed on the touch screen to allow the children to make their responses. This methodology was chosen because using faces as symbols might have resulted in confusion with the emotional facial expressions used as a factor in Condition 2 (see Hoffner & Badzinski, 1989; Perron & Gosselin, 2009).
Finally, printed materials were used to introduce the characters (wearing neutral facial expressions) in a general manner, and to explain the various stages of the procedure to the children.
Procedure
Each child underwent the two conditions of the experimental design individually at school, with Condition 1 (prosody and context) first, followed by Condition 2 (prosody + facial expression and context). To ensure that they remained focused on the task, particularly in the case of the youngest participants, each child was given a break of at least 3 hours between the two task conditions. Both conditions consisted of the stages described below. During all the phases, the experimenter sat next to the child and tried to remain as neutral as possible, both in face and in voice.
Presentation phase
In the first phase, which took place away from the computer interface, the children were introduced to the two characters, the aim of the game, and the rating method, using the visual printed materials. The children were told that they would have to decide whether Pilou was sad or happy. They were then shown the colored circles, and the experimenter explained to them that they had to point to the pink circle when they thought that Pilou was happy, and the black one when they thought that Pilou was sad. This instruction phase was followed by training trials in which the experimenter made sure that the children had understood the link between the colored circles and the represented emotion. To do this, he presented Pilou, said alternately that he was sad or happy, and asked the children to point to the appropriate circle to indicate how Pilou felt. When a child had correctly completed six consecutive trials, the training phase was terminated and immediately followed by the computerized phase.
Computerized phase
The children sat in front of the computer and began with a familiarization phase in which two stories (featuring neutral cues) were presented in order to familiarize them with the procedure. Each story followed the same three stages as in the test phase (see Figure 1): 1) the first picture on the screen introduced the characters in a particular situational context described in a voiceover; 2) the second picture showed the two characters closer together, with Pilou speaking to Edouard using a particular prosody; 3) all the cues then disappeared and the colored response circles appeared. The child then had to point to the circle corresponding to his or her comprehension of Pilou’s emotional state (“Is Pilou sad?” or “Is Pilou happy?”). The circles were randomly displayed on the right or left of the screen and varied between participants and between trials.

Figure 1. Screen shots of a story just before the participant responds. Top panel: situational context vs. emotional prosody (Condition 1); bottom panel: situational context vs. emotional prosody + emotional facial expression (Condition 2). Yellow boxes have been added to indicate the audio component of the story.
The test phase that followed featured twelve stories. In half of these, the cues were congruent (i.e., Condition 1: the situational context and the emotional prosody used by Pilou were both sad or both happy; Condition 2: the situational context, the emotional prosody and Pilou’s emotional facial expression were all sad or all happy). In the other half, the situational context conveyed emotional information that contradicted either the emotional prosody (Condition 1) or both the emotional prosody and the facial expression (Condition 2). Consequently, in each condition, the six stories in which all the cues were congruent allowed us to check whether the children were able to perform the task as intended and attended to it. The other six stories, in which conflicting cues were used, were the trials that interested us in the present study. The prosodic utterances, situational contexts and emotional faces, as well as the different associations between the cues, and the story presentation order, were all randomized across participants.
Results
Analyses of variance (ANOVAs) were run with age (3, 5, 7, and 9 years) as a between-participants factor and condition (Condition 1: situational context vs. emotional prosody; Condition 2: situational context vs. emotional prosody + face) as a within-participants factor. Moreover, in order to undertake a fine-tuned analysis of the developmental trajectory of these data, we conducted trend analyses (see Howell, 2011, pp. 402–403). A linear trend generally characterizes a linear relationship (increase or decrease) between the modalities of an independent variable, ordered along a continuum, and a dependent variable. A quadratic trend reflects a U-shaped or inverted U-shaped relationship (e.g., $y = x^2$ or $y = -x^2$) between the modalities of an independent variable and a dependent one. These two trends may overlap, in which case the pattern generally results from an increase or decrease in the dependent variable, followed by a relative stabilization (or vice versa). The cubic trend was not calculated because it could not be interpreted in the context of these analyses.
Stories with congruent cues
We began by examining the children’s performance on stories in which the different cues were congruent (i.e., they all conveyed the emotion of sadness or happiness). Owing to this congruence, the dependent variable here corresponded to the number of correct responses, and allowed us to test whether the children were able to perform the experiment as intended. The ANOVA revealed a significant main effect of age, F(3, 92) = 9.323, p < .001, $\eta_p^2$ = .494, resulting from the age-related increase in the ability to perform the task whatever the cues. However, the analysis also revealed a main effect of condition, F(1, 92) = 8.998, p = .003, $\eta_p^2$ = .089, as well as an Age × Condition interaction, F(3, 92) = 5.291, p = .002, $\eta_p^2$ = .147. As illustrated in Figure 2 (left panel), the latter effects indicated that the increase with age was greater for Condition 1 than for Condition 2. The trend analysis confirmed this effect, with the increase in correct responses following both a linear and a quadratic trend in Condition 1, respectively F(1, 92) = 88.28, p < .001 (ES = .107, 95% CI [.789, 1.212]) and F(1, 92) = 16.07, p < .001 (ES = .106, 95% CI [−6.333, −.213]), but only a linear one in Condition 2, F(1, 92) = 14.04, p < .001 (ES = .119, 95% CI [.210, .683]). The number of correct responses was much lower in Condition 1 (situational context and prosody) (M = 4.55, SD = .74) than in Condition 2 (emotional face added) (M = 5.32, SD = .83) for the 3-year-old children, whereas this difference gradually disappeared for the older children: Condition 1 (5 years: M = 5.56, SD = .58; 7 years: M = 5.79, SD = .41; 9 years: M = 5.96, SD = .20); Condition 2 (5 years: M = 5.72, SD = .61; 7 years: M = 5.79, SD = .51; 9 years: M = 5.96, SD = .20). Despite differences in performance as a function of both condition and age, results showed that all these children were eventually able to perform the task.
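As a reading aid, the partial eta squared values reported with each F ratio can be recovered from the F value and its degrees of freedom, using the standard identity partial eta squared = (F × df effect) / (F × df effect + df error). The short sketch below applies it to the condition effect and the Age × Condition interaction reported above; the function name is ours and purely illustrative, and the snippet is not the analysis script used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    The function name is illustrative, not taken from the study.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for the congruent stories.
condition = partial_eta_squared(8.998, 1, 92)    # main effect of condition
interaction = partial_eta_squared(5.291, 3, 92)  # Age x Condition interaction

print(f"condition:   {condition:.3f}")
print(f"interaction: {interaction:.3f}")
```

Rounded to three decimals, these two values correspond to the .089 and .147 given in the text.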

Figure 2. Mean numbers of situational context responses (and standard deviation), for each age group and for each condition: Condition 1, situational context vs. emotional prosody; Condition 2, situational context vs. emotional prosody + face. Left panel: stories in which all the cues were congruent. Right panel: stories with conflicting cues.
Stories with conflicting cues
We then went on to examine the trials that were of interest to our research, namely, the stories in which the cues were discrepant. To infer Pilou’s emotional state, the children had to base their judgments on either situational context or emotional prosody in Condition 1, and on either situational context or both emotional prosody and emotional facial expression in Condition 2. The dependent variable was the number of responses that corresponded to the extralinguistic cue (i.e., situational context) (see Figure 2, right panel). The ANOVA revealed a main effect of condition, F(1, 92) = 88.70, p < .001, $\eta_p^2$ = .495. The age effect tended toward significance, F(3, 92) = 2.57, p = .059, $\eta_p^2$ = .077, and there was no interaction between these two factors, F(3, 92) = 1.52, p = .21. These results indicated that the number of responses based on situational context was higher in Condition 1 than in Condition 2 (3 years: M = 4.14, SD = 1.08 vs. M = 1.82, SD = 1.27; 5 years: M = 4.28, SD = 1.49 vs. M = 2.72, SD = 2.42; 7 years: M = 4.79, SD = 2.08 vs. M = 2.50, SD = 2.58; 9 years: M = 4.04, SD = 2.17 vs. M = 1.04, SD = 1.95), and that this number rose and then fell as a function of age. However, the trend analysis showed that only the data obtained in Condition 2 followed a quadratic trend, F(1, 92) = 7.289, p < .01 (ES = .437, 95% CI [−2.050, −.312]), with the youngest and oldest children taking less account of the situational context and paying more attention to paralinguistic cues than the 5- and 7-year-old children. Compared with Condition 1, the findings obtained in Condition 2 therefore showed that the children more often inferred the emotional state of the other person from the combination of emotional facial expression and prosody. This was true for all the age groups, but particularly so for the 3- and 9-year-old children.
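To make the quadratic trend concrete, the sketch below applies the usual orthogonal polynomial weights for four equally spaced groups (linear: −3, −1, 1, 3; quadratic: 1, −1, −1, 1) to the mean numbers of situational context responses reported above. Two points are assumptions on our part rather than details given in the text: the weights are rescaled to unit length, and the contrast is computed on unweighted group means. The function and variable names are illustrative only.

```python
import math

# Orthogonal polynomial weights for four equally spaced levels (3, 5, 7, 9 years).
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)


def unit_contrast(weights, means):
    """Contrast estimate with weights rescaled to unit length (an assumption)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return sum(w * m for w, m in zip(weights, means)) / norm


# Mean situational context responses on conflicting stories, ages 3, 5, 7, 9.
cond1_means = (4.14, 4.28, 4.79, 4.04)  # context vs. prosody
cond2_means = (1.82, 2.72, 2.50, 1.04)  # context vs. prosody + face

quad_c1 = unit_contrast(QUADRATIC, cond1_means)
quad_c2 = unit_contrast(QUADRATIC, cond2_means)

print(f"quadratic contrast, Condition 1: {quad_c1:.3f}")
print(f"quadratic contrast, Condition 2: {quad_c2:.3f}")
```

A negative quadratic estimate indicates an inverted U: the middle age groups give more situational context responses than the youngest and oldest groups. Under these assumptions, the Condition 2 estimate is about −1.18, which lies inside the confidence interval reported for that trend.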
Discussion
In two experimental conditions, children were presented with different discrepant social cues embodied in a communication situation in order to identify the kind of cue on which they based their judgments when they were required to infer the emotional states of others. In Condition 1 in particular, analyses showed that when the emotional prosody conflicted with the situational context, children of all age groups based their judgments on the situational context. In other words, irrespective of the emotional prosody expressed by the character Pilou, the children stated that Pilou was sad when he was in a sad situational context, and happy when he was in a happy situational context. In the case of the 3–7-year-old children, this result was fully consistent with previous research revealing that situational context takes precedence over emotional prosody when children are required to infer the emotional states of others. This effect has been referred to as contextual bias (Aguert et al., 2010; Hortaçsu & Ekinci, 1992). However, in our study, the 9-year-olds exhibited the same pattern of responses as the other children, despite the fact that previous studies have suggested that this age marks a developmental turning point.
Only the data obtained in Condition 2, in which the situational context contradicted the paralinguistic cues (i.e., prosody plus face), followed a quadratic trend, with the 3- and 9-year-old children exhibiting the same pattern of results and the intermediate age groups exhibiting a different pattern. Unlike Condition 1, in Condition 2, both the youngest and oldest children primarily based their judgments on the emotional prosody in combination with the emotional face, rather than on the situational context. When Conditions 1 and 2 were taken together, our findings suggested that situational context plays a greater role in all children’s inferential decision making than prosody, but that adding facial expression to prosody shifts this balance, especially for the youngest and oldest children. The pattern of results across the different ages was particularly interesting, as it showed that the 3-year-old children behaved similarly to the older ones. Moreover, it is interesting to note that in the trials featuring consistent cues, the youngest children performed significantly better in Condition 2, when facial expressions were included, than they did in Condition 1. This was not the case for the other children. This suggests that these facial expressions are particularly meaningful for very young children and enable them to understand the communication situation.
Because most published studies have focused on development from 4–5 years upwards, with research into 3-year-olds being less frequent, we thought it would be worthwhile, before discussing our study further, to examine the ability of our youngest participants to understand the emotions evoked by each of our cues presented in isolation. Consequently, the same 3-year-olds performed recognition tasks in which we tested their ability to understand the emotion evoked by either situational context, emotional prosody, or emotional facial expression. The findings from this complementary investigation would allow us to discuss their ability to process the cues used in Experiment 1, especially prosody. Although the youngest children did not rely on prosody when it conflicted with the situational context, we wanted to find out whether they were capable of processing it when it was presented in isolation.
Experiment 2
Method
Participants
The same 3-year-old children who took part in Experiment 1 participated in this second experiment within 3 days of Experiment 1.
Material
To ensure that this complementary study yielded direct keys to understanding the results of Experiment 1, we used the same stimuli as in Experiment 1, the only difference being that the different types of cue were presented separately. For the situational context, each picture showed the characters looking away from the participant (i.e., as in Condition 1 of Experiment 1, the characters’ faces could not be seen). For emotional prosody, each utterance was presented alone, and for emotional faces, the pictures focused on Pilou’s face.
Two sets of 12 triptychs were constructed (see Figure 3), one for the situational contexts and one for the emotional facial expressions. This material corresponded to that used in classic studies on emotion recognition in children (Camras & Allison, 1985; Gil & Droit-Volet, 2011; Tracy, Robins, & Lagattuta, 2005). For the situational contexts, each triptych featured a sad or happy situational context (pictures used in Experiment 1) or a neutral one. For the emotional facial expressions, each triptych featured a sad Pilou face, a happy one, or a picture in which there was no indication of Pilou’s facial expression (i.e., Pilou had his back turned). For the training phase, participants were shown triptychs with nonrelevant stimuli (a horse, a cow, or a dog).

Figure 3. Examples of triptychs consisting of Pilou faces or situational contexts (stimuli that evoked happiness, sadness, or no emotion).
To convey emotional prosody, all the prosodic pseudo-utterances (six happy and six sad) used in Experiment 1 were presented one by one, together with six neutral utterances, as a triptych-based procedure was not applicable for these kinds of audio stimuli. A PC controlled the stimulus presentation using E-Prime 1.2 software (Psychology Software Tools, Pittsburgh, PA). The same color circles as in Experiment 1 (i.e., a pink circle for happiness and a black circle for sadness) were used for the responses, together with a new white circle for a “Don’t know” response.
Procedure
During the training phase for both the situational context recognition task and the emotional facial expression recognition task, the animal triptychs were shown four times, with the position of the pictures changing between trials. The children had to point to the picture of an animal (horse, cow or dog) if they could see it. To make sure that the children had understood the instructions, they were asked in the fourth training trial to point to the picture of another animal, a duck. In the subsequent testing phase, the children were presented with the triptychs of situational contexts, or the triptychs of Pilou’s face. For each type of cue, they were told to point to one picture in which Pilou was sad, happy, or else neutral. There were 12 trials (i.e., four trials for each of the two emotions and the neutral condition). As in Tracy et al.’s study (2005), each evoked emotion was located in each position of the triptych four times, and each position corresponded to the correct answer four times.
In the emotional prosody recognition task, the children were first presented with three emotional prosodies (i.e., one for each evoked emotion, plus neutral), and were familiarized with the color circles with which they had to give their responses. The experimenter explained that he had listened to Pilou speaking again, but that he did not understand the language. The children then had to decide whether Pilou was sad or happy, or whether they did not know (i.e., neutral prosody), by pointing to the corresponding color circle.
Results
Figure 4 shows the proportion of correct responses for each emotional cue. Because performance on all neutral cues was poor, and because they did not constitute an emotional condition that was of interest to our research, we focused our analyses on those cues that evoked either sadness or happiness. The neutral label is in itself a source of debate with regard to how far it is possible to express neutrality in any behavior or event. Moreover, this debate has its origins in part in children’s poor performances in this modality (e.g., Gross & Ballif, 1991; Hortaçsu & Ekinci, 1992).

Figure 4. Mean proportion (and standard deviation) of emotions correctly recognized by 3-year-olds for situational context, emotional facial expression and prosody for the cues evoking happiness and sadness.
We analyzed the proportion of correct responses for each kind of stimulus. As Figure 4 suggests, because the 3-year-old children were very good at recognizing emotions based on emotional faces, which involved very little variability, we ran nonparametric analyses in two steps: 1) the effect of cue was examined for each emotion (i.e., happiness and sadness), using Friedman’s ANOVA and paired-sample Wilcoxon tests; and then 2) a series of paired-sample Wilcoxon tests was run to compare emotions for each type of cue.
Concerning happiness, the analysis revealed an effect of type of cue, χ²(2) = 29.10, p < .001. Paired-sample comparisons showed that the proportion of correct responses for emotional facial expressions was higher than for the other two types of cue (p < .001). Moreover, this proportion did not differ significantly between situational context and prosody (p = .115). Concerning sadness, the analysis revealed a similar pattern of results, with an effect of type of cue, χ²(2) = 18.03, p < .001, showing that emotional facial expressions elicited significantly more correct responses than the other two types of cue (p < .001). The proportions of correct responses for situational context and for prosody did not differ (p = .418). Finally, the analyses contrasting the emotions for each type of cue did not reveal any differences between the proportions of correct responses for happiness and sadness for situational context (p = .712), emotional facial expression (p = .157), or emotional prosody (p = .418).
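For readers who wish to see how the omnibus test works, the sketch below computes Friedman's chi-square (with the usual correction for ties) in plain Python. It is a minimal illustration, not the authors' analysis script: the function names are hypothetical and the four rows of scores are invented example data, not data from this study. With three cues the statistic has 2 degrees of freedom, in which case the chi-square tail probability reduces to exp(-x/2).

```python
import math


def average_ranks(values):
    """Rank values from 1 to k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic; rows holds one list of k scores per child."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for v in set(row):
            t = row.count(v)
            tie_term += t ** 3 - t
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all scores are tied within every child")
    return stat / correction


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df."""
    return math.exp(-x / 2.0)


# Invented scores (proportion correct): columns = face, context, prosody.
example = [
    [1.00, 0.50, 0.25],
    [1.00, 0.75, 0.50],
    [1.00, 0.50, 0.75],
    [0.75, 0.50, 0.25],
]
q = friedman_chi2(example)
print(q, chi2_sf_df2(q))
```

Applied to the statistics reported above, `chi2_sf_df2(29.10)` is roughly 5e-7 and `chi2_sf_df2(18.03)` is roughly 1.2e-4, both consistent with p < .001. The pairwise follow-up comparisons would use Wilcoxon signed-rank tests, which are not implemented here; SciPy offers `scipy.stats.friedmanchisquare` and `scipy.stats.wilcoxon` for both steps.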
General discussion
The aim of the present study was to determine, from a developmental perspective, when children preferentially rely on extralinguistic (i.e., situational context) or paralinguistic (i.e., emotional prosody and emotional faces) cues to infer the emotional states of others in a communication situation. To investigate this question, children performed an experiment in which they had to judge the emotional state of a particular character (Pilou) from incongruent information sources. As indicated above, some of our findings were consistent with previous research, but others were not. Our results raise new and fundamental questions about how children acquire the ability to recognize other people’s states and, more importantly, to use the different sources of information needed to understand everyday interactions. These different questions are discussed below.
It is thought that emotional prosody is accurately recognized relatively early in life, when presented in isolation (e.g., Grossmann et al., 2005; Morton & Trehub, 2001; Quam & Swingley, 2012; Sauter et al., 2012). Research on emotional prosody is nevertheless under-represented in the literature, compared with research on other types of emotional cues, and much less is therefore known about the processes involved and how they change in the course of development. One consistent result reported in the literature, however, is that when prosody conflicts with other cues, children only start to take it into account at a relatively late age. Children tend to pay little attention to prosody compared with lexical content (Friend, 2000; Moore, Harris, & Patriquin, 1993; Morton & Trehub, 2001; Waxer & Morton, 2011), situational context (Aguert et al., 2010, 2012; Hortaçsu & Ekinci, 1992), emotional faces (Gnepp, 1983; Nelson & Russell, 2011; Reichenbach & Masters, 1983), or body language (Quam & Swingley, 2012). Taken together, previous studies make it clear that even though infants make extensive use of emotional prosody, other sources of information are prioritized during childhood. The results of our study are consistent with the literature in this respect, and extend these findings to a less-frequently examined age group (i.e., 3-year-olds). When processed in isolation, emotional prosody was recognized at the same level as situational context from the age of 3 years. However, the children prioritized situational context over prosody. Nevertheless, studies also suggest that from around 9–10 years, children, like adults, lend considerable importance to prosodic elements in communication situations (Aguert et al., 2010; Hortaçsu & Ekinci, 1992). In Aguert et al.’s study, the 9-year-old children exhibited an intermediate pattern, in which they tended to prefer emotional prosody to context more than the younger children did, but still not as extensively as the adults.
Our results did not follow this developmental curve, however, as the 9-year-old children performed similarly to the other children. Yet this apparent contradiction between previous research and our own findings may not actually have any basis in reality. It is quite conceivable that 9 years marks a developmental turning point involving changes in the use of different social cues. It takes time for these changes to become established, and they may be more or less visible relative to other modes of behavior. In sum, previous findings, like our own results, suggest that whereas prosody recognition abilities are present early in life, their use significantly improves with age (e.g., Cohen, Prather, Town, & Hynd, 1990; Matsumoto & Kishimoto, 1983; Nelson & Russell, 2011).
To some extent, our results support the idea that situational context is a very important source of information that is used by individuals to understand other people’s thoughts, beliefs and emotional states. When children are able to base their judgments on either situational context or emotional prosody, they predominantly make use of the former between the ages of 3 and 9 years. Moreover, at age 5–7 years, children are still influenced by the information provided by the situational context, even when this conflicts with both prosody and emotional facial expression. In line with these findings, situational context has been widely assumed to be the main source of information in many different research areas. In the field of language theory, several authors have suggested that it influences language comprehension and use (e.g., Cain, Oakhill, & Lemmon, 2005; Laval, de Weck, Chaminaud, & Lacroix, 2009; Shatz, 1978), whereas in the field of emotion theory, some authors have claimed that situational context is closely bound up with the concept of emotion. For example, Frijda (1969, p. 169) states that “situational reference is the ‘meaning’ of most expressive behavior; or rather, expressive behavior is usually categorized under this system.” In the same vein, Paul Ekman (1972) elaborated the concept of display rules, which refers to the fact that, with development, people become aware that the emotional expressions exhibited by others can be controlled according to the situational context. In other words, emotional expressions do not systematically reflect people’s actual emotional state, and may also depend on other situational factors (e.g., Denham, McKinley, Couchoud, & Holt, 1990; Jones, Abbey, & Cumberland, 1998; Zimmermann & Stansbury, 2003).
However, according to a substantial number of emotion theories, like those developed by the well-known authors cited above, there is another source of information that is more meaningful to individuals than situational context in everyday life interactions, namely emotional facial expressions, which are thought to be critical for adaptation during interactions (Ekman, 1977; Izard, 1971). Our results support this point of view, for when emotional faces are present, they convey enough meaning to overcome the predominance of situational context. In contrast to the first condition of Experiment 1 (i.e., situational context vs. prosody), when emotional faces were added to prosody in Condition 2, the 5- and 7-year-olds shifted their judgments in favor of these cues. Moreover, not only did both the 3- and 9-year-olds stop basing their judgments of the main character’s emotional state on the situational context, but they also responded to a significant extent on the basis of the emotion expressed in his voice and face. Finally, Experiment 2 showed that out of the three types of emotion-evoking cues we used, emotional facial expressions were most readily recognized by the 3-year-old children. This overall result therefore provides evidence of the special status of emotional facial expressions when it comes to understanding the emotional states of others. However, the quadratic trend of the data observed when all four age groups were taken into consideration raises certain questions, such as whether the youngest and oldest children performed similarly (i.e., by prioritizing paralinguistic cues via emotional facial information) for the same reasons. Although somewhat speculative at this point, the pattern of results we observed suggests that emotional faces represent the most influential cue for very young children because of their phylogenetically-based accessibility.
There is ample evidence that infants and children use this emotional cue for deciphering communication with other human beings before they acquire mastery of other communication tools (i.e., language). Understanding situational information may then take priority, with children specifically acquiring this type of ability in parallel with that of both language and display rules. Finally, older children, who have learned to understand the significance of situational context and possess established language skills, may return to the most significant and obvious cue in interpersonal communications (i.e., emotional facial expressions). We can also draw a parallel between this question of salience and what Morton called the strength of representation to explain the lexical bias: the idea that some representations are stronger than others at a given age, leading children to choose the source of information that corresponds to the most firmly established knowledge. In some instances, they may even be unable to control this dominant representation (Morton & Munakata, 2002). In the same vein, and in accordance with Achim et al.’s model (2013), sources of information can be divided into immediate perceptual information and information stored in memory, the latter corresponding to knowledge which—as a representation can be more or less firmly established—may involve the high-order cognitive processes that are mastered in the course of development. Within this framework, and in the light of our findings, we can assume that very young children do not give priority to the meaning of a situational context in terms of the emotions it evokes because they lack a strong representation to set against the behavioural expression of emotion. 
This representation becomes more firmly established between the ages of 5 and 7 years, although the priority given to this aspect cannot initially be controlled; and older children have an established representation that they can control if necessary, in order to make way for other indices. Clearly, this possibility could be tested in future studies with a wide range of age groups including both children and adults. Moreover, one limitation of our study is that we asked the same children to perform all the different tasks. This made the experiment cognitively costly for the children and meant that we could not examine all the combinations of cues and disentangle the respective influence of prosody and face. Based on the natural combination of face and prosody (e.g., De Gelder & Bertelson, 2003; De Gelder & Vroomen, 2000; Rigoulot & Pell, 2012), our objective was, as a first step, to compare an extralinguistic cue that is known to predominate with paralinguistic cues. Further work is therefore necessary to extend our examination of the various combinations of cues.
In sum, the present study is the first to have explored how children infer the emotional states of others from the natural combination of face and prosody versus the situational context, across a broad developmental period. Overall, our results lend support to the idea that children rely heavily on two cues (i.e., situational context and emotional facial expression) in communication situations. However, further work will be required in this area in order to identify the exact processes involved in using these cues and the way these processes develop.
Footnotes
Acknowledgments
The authors would like to thank Coline Burgeot, Annabelle Perrault, Léa Porte and Marine Proust for their assistance during data collection.
*This article was accepted during Marcel van Aken’s term as Editor-in-Chief.
Funding
This work was supported by a grant from the French Agence Nationale de la Recherche (ANR) for research into emotion, cognition and behavior (EMCO, 2011).
