Abstract
An ongoing debate in affective science concerns whether certain discrete, “basic” emotions have evolutionarily based signals (facial expressions) that are easily, universally, and (perhaps) innately identified. Studies with preverbal infants (younger than 24 months) have the potential to shed light on this debate. This review summarizes what is known about preverbal infants’ understanding of discrete emotional facial expressions. Overall, while many studies suggest that preverbal infants differentiate positive and negative facial expressions, few studies have tested whether infants understand discrete emotions (e.g., anger vs. disgust). Moreover, results vary greatly based on methodological factors. This review also (a) discusses how language may influence the development of emotion understanding, and (b) proposes a new developmental hypothesis for infants’ discrete emotion understanding.
Over the past century, there has been much disagreement about the nature of human emotion. One ongoing debate concerns whether certain discrete, “basic” emotions have evolutionarily based signals (i.e., facial expressions) that are easily and universally identified (e.g., Ekman, 1994; Izard, 1994) or whether emotion experience, expression, and perception are highly variable processes, potentially influenced by language (e.g., Barrett, 2017; Russell, 1994). Empirical work exploring this debate has primarily focused on adults or verbal children. Largely missing are studies with preverbal infants (younger than 24 months). Research with preverbal infants, who have comparatively little experience with language and others’ emotions, could potentially help elucidate whether emotion understanding is (a) an early emerging or innate ability, based in our shared evolutionary history, or (b) an ability that develops slowly over time, shaped by language and social experience.
Currently, there is little communication between developmental and affective science. Although affective scientists cite select infancy studies as evidence for their theories (e.g., Barrett, 2017; Lindquist & Gendron, 2013), most infancy studies were not designed to test or inform these issues. In an effort to move the debate forward, this review (a) summarizes the literature on preverbal infants’ understanding of emotional facial expressions, (b) discusses how language may influence the development of emotion understanding, and (c) proposes a new developmental hypothesis for infants’ discrete emotion understanding.
Emotion Theories and Emotion Concept Development
Emotion theories are often categorized as either “classical” or “constructionist” (Barrett, 2016). In short, classical theories argue that certain “basic” emotions—typically happiness, sadness, anger, fear, surprise, and disgust—have corresponding facial expressions that are evolutionarily based, universal, and easily recognized in others (e.g., Ekman, 1994). In contrast, constructionist theories reject the idea of universal, basic emotion signals (e.g., Russell, 1994). Instead, these theories argue that emotions are experienced and expressed in highly variable ways. One particular constructionist theory, the theory of constructed emotion (Barrett, 2017), further argues that emotion words (e.g., “happy”) impose a categorical structure on these variable facial expressions. In this theory, language is fundamental to emotion perception and understanding. While an in-depth discussion of these theories is beyond the scope of this review, detailed descriptions can be found in many excellent books (e.g., Barrett, 2017; Fernández-Dols & Russell, 2017).
These theories have made predictions about the nature and development of infants’ emotion concepts, or “conceptual emotion categories.” In general, a category is a collection of objects, actions, or events that are considered to be equivalent. Some constructionist theories have argued that preverbal infants have perceptual emotion categories, based on salient facial features (Barrett, Lindquist, & Gendron, 2007). For example, prototypical happy expressions have upturned smiles, whereas prototypical fearful expressions have wide eyes (Ekman & Friesen, 1975). In this way, young infants need not attribute any affective meaning to these stimulus configurations. If preverbal infants do have conceptual emotion categories, constructionist theories argue that these are based on broad dimensions of valence (positive vs. negative) and arousal (high vs. low; Barrett, 2017; Russell, 1980). These broad, valence- and arousal-based concepts are thought to gradually narrow into discrete emotion concepts over the first decade of life (Widen, 2013). According to the theory of constructed emotion, this broad-to-narrow progression occurs alongside the acquisition of emotion labels (e.g., “happy”; Barrett, 2017; Shablack & Lindquist, 2019). Not all constructionist theories emphasize the importance of language. Thus far, however, the theory of constructed emotion has outlined one of the most complete hypotheses for infants’ emotion concept development (Barrett, 2017).
Classical views have not always made firm predictions about infants’ emotion understanding. However, given the evolutionary importance of facial expressions to these theories, it follows that discrete emotion concepts for “basic emotions” are innately specified, rather than predominantly learned or language-dependent (Ekman, 2017). Although not all classical theorists have explicitly endorsed the existence of an innate or easily acquired conceptual system for emotions, others have made this assertion (e.g., Izard, 1994; C. A. Nelson, 1987; Oster, 1981). In a similar vein, some have argued that infants initially have perceptual emotion categories that transform into conceptual categories sometime in the first year of life, before emotion labels are acquired (e.g., Walker-Andrews, 1997). In a classical view, discrete emotion concepts are based on the communicative intent of facial expressions (e.g., happy expressions indicate safety; Shariff & Tracy, 2011).
Layout of the Review
This review (a) summarizes what is currently known about infants’ understanding of emotional facial expressions, and (b) discusses findings in relation to classical versus constructionist views of emotion. Different components of infants’ emotion understanding are considered, ranging from “basic” or “foundational” skills (e.g., expression discrimination) to more “advanced” abilities (e.g., social referencing; for similar developmental sequences, see Walker-Andrews, 1997; Walle & Campos, 2012). This review also differentiates between studies comparing (a) between-valence expressions (e.g., happiness vs. anger), and (b) within-valence expressions (e.g., anger vs. disgust). The vast majority of studies have tested facial expressions across the dimension of valence. Consequently, it is impossible to determine whether infants are responding to these expressions based on valence alone (positive vs. negative) or discrete emotions (e.g., happiness vs. anger). To determine whether infants have discrete emotion concepts, researchers must compare facial expressions within one dimension of valence and arousal (e.g., high arousal, negative emotions, such as anger and fear). If compelling evidence were obtained that preverbal infants have discrete emotion concepts, this would call into question the claims put forward by some constructionist theories (e.g., Widen, 2013), particularly those that anchor these concepts to language (e.g., Barrett, 2017).
Discrimination of Emotional Expressions
Discrimination of emotional expressions—or perceiving the difference between two expressions (e.g., happiness vs. fear) portrayed by the same person (Walker-Andrews, 1997)—is the most fundamental ability for emotion understanding. Discrimination studies generally use static photographs of posed facial expressions from validated databases (e.g., Tottenham et al., 2009), although a handful of studies have tested dynamic, multimodal (facial and vocal) expressions. While most studies have used looking-time paradigms, in the last 15 years, researchers have also measured event-related potentials (ERP). These two methods are described separately.
Looking-Time Paradigms
Infant emotion discrimination has traditionally been measured with looking-time paradigms (for detailed discussion, see Oakes, 2010). In paired-preference paradigms, infants are shown two static facial expressions side by side. If infants look longer at one expression compared to the other, it is assumed that they (a) have a visual preference and (b) can discriminate the expressions. These looking-time differences are thought to reflect either a familiarity preference (e.g., in the case of happy expressions; Farroni, Menon, Rigato, & Johnson, 2007) or a novelty preference (e.g., in the case of fearful expressions; C. A. Nelson & Dolgin, 1985). However, if infants look equally long at both expressions, it is impossible to determine whether they (a) cannot discriminate the expressions or (b) do not have an expression preference.
Familiarization and habituation paradigms provide a more conclusive test of discrimination. In these paradigms, infants are repeatedly shown one expression (e.g., happiness). After a fixed number of trials (familiarization) or after infants’ looking time decreases to a certain criterion (habituation), infants are sequentially shown a novel expression (e.g., sadness) and a familiar expression (e.g., happiness). Infants provide evidence of discrimination if they look longer at the novel expression. Although these studies generally indicate that infants discriminate between various facial expressions, findings depend on the (a) stimuli (static vs. dynamic), (b) paradigm (paired-preference vs. habituation/familiarization), and (c) expression contrast (happiness–anger vs. happiness–fear).
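The habituation logic described above can be illustrated with a minimal sketch. The specific criterion used here (mean looking time over the last three trials falling to 50% or less of the mean over the first three trials) is an assumption chosen for illustration; individual studies set their own windows and thresholds. The function names `habituation_trial` and `novelty_preference` are hypothetical.

```python
from statistics import mean


def habituation_trial(looking_times, window=3, criterion=0.5):
    """Return the 1-based trial on which the habituation criterion is met.

    Assumed criterion (illustrative only): the mean looking time over the
    most recent `window` trials is at or below `criterion` times the mean
    over the first `window` trials. Returns None if it is never met.
    """
    if len(looking_times) < 2 * window:
        return None
    baseline = mean(looking_times[:window])
    # Start at 2 * window so the recent window never overlaps the baseline.
    for end in range(2 * window, len(looking_times) + 1):
        recent = mean(looking_times[end - window:end])
        if recent <= criterion * baseline:
            return end
    return None


def novelty_preference(novel_look, familiar_look):
    """Proportion of test looking time directed at the novel expression."""
    total = novel_look + familiar_look
    if total <= 0:
        raise ValueError("total looking time must be positive")
    return novel_look / total


# Hypothetical looking times (seconds) across habituation trials.
trials = [10, 9, 8, 6, 4, 3, 2]
print(habituation_trial(trials))
print(novelty_preference(6.0, 4.0))
```

Under these assumptions, a novelty preference score above .50 at test would be taken as evidence of discrimination, whereas a score near .50 remains ambiguous.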
Between-valence
There is disagreement regarding when emotion discrimination first emerges (for reviews, see Grossmann, 2010; Quinn et al., 2011). Although it has been reported that newborns discriminate happy from sad and fearful expressions (Farroni et al., 2007; Field et al., 1983; Field, Woodson, Greenberg, & Cohen, 1982), these findings are controversial. In particular, limitations in infants’ visual systems (e.g., contrast sensitivity, scanning, acuity) should make facial expression discrimination difficult in the early months of life (C. A. Nelson, 1987). In fact, when tested with static images, 3- and 4-month-olds do not reliably discriminate positive from negative facial expressions (Barrera & Maurer, 1981; Young-Browne, Rosenfeld, & Horowitz, 1977). On the other hand, infants at this age may discriminate between dynamic, multimodal (facial and vocal) expressions. For instance, 4-month-olds discriminate multimodal happy expressions from angry and fearful expressions, but findings are less consistent for happy–sad comparisons (e.g., Flom & Bahrick, 2007; Flom, Bahrick, & Pick, 2018; Montague & Walker-Andrews, 2001). In these studies, however, it is unclear whether infants are simply responding to movement differences between the stimuli (Grossmann & Jessen, 2017).
Due to ceiling effects in infants’ attention to dynamic facial expressions (Oster, 1981), emotion discrimination studies have typically used static stimuli. Further, given the limitations in younger infants’ visual capacities, most research has focused on infants older than 4 months of age. With few exceptions (Grossmann, Striano, & Friederici, 2007; LaBarbera, Izard, Vietze, & Parisi, 1976), 5- to 14-month-olds do not have a visual preference for angry over happy expressions (e.g., Krol, Monakhov, Lai, Ebstein, & Grossmann, 2015; LoBue & DeLoache, 2010). On the other hand, by 7 months, infants prefer fearful to happy expressions (e.g., Geangu et al., 2016; Krol et al., 2015; LoBue & DeLoache, 2010; Miguel, McCormick, Westerlund, & Nelson, 2019; Safar & Moulson, 2017).
As noted earlier, a lack of preference does not necessarily indicate an inability to discriminate. Indeed, familiarization/habituation studies (but not preference studies) have indicated that, by 5 months, infants discriminate happy from negative facial expressions (sadness, anger, fear), albeit only after habituation to happiness (Bornstein & Arterberry, 2003; Kestenbaum & Nelson, 1990; C. A. Nelson, Morse, & Leavitt, 1979; C. A. Nelson, Parker, Guthrie, & Bucharest Early Intervention Project Core Group, 2006; although see Flom & Bahrick, 2007; Leppänen, Richmond, Vogel-Farley, Moulson, & Nelson, 2009). These habituation asymmetries are common, especially when infants are familiarized/habituated to fearful expressions (e.g., Parker & Nelson, 2005). It is thought that the novelty/negativity of fearful expressions sustains infants’ attention during the habituation and test events.
Within-valence
Only a few looking-time studies have explored whether infants discriminate between within-valence expressions. In an early study, 5-month-olds discriminated sadness and fear from anger, but not when anger was the familiarized emotion (Schwartz, Izard, & Ansul, 1985). This is consistent with findings that 7-month-olds prefer angry over sad expressions (Soken & Pick, 1999), and 5- to 12-month-olds prefer fear to anger expressions (Miguel et al., 2019). Finally, although 5-month-olds discriminate between sadness and fear (Schwartz et al., 1985), older infants (13- to 24-month-olds) only provide evidence of discrimination when sadness is the familiarized expression (C. A. Nelson et al., 2006).
Event-Related Potential (ERP) Paradigms
Researchers initially turned to ERP paradigms to examine why infants have visual preferences for some facial expressions. In these paradigms, infants observe multiple brief (< 1,000 ms) presentations of static facial expressions. ERPs are averaged from a continuous recording of electrical signals at the scalp, time-locked to the presentation of each expression. Infant studies have primarily focused on three ERP components: the N290, P400, and Nc (negative central). The N290 and P400 are thought to be “precursors” to the face-sensitive adult N170 (Rigato, Farroni, & Johnson, 2010), whereas the Nc is thought to relate to increased allocation of attention (de Haan, Johnson, & Halit, 2003).
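The time-locked averaging step described above can be sketched as follows. This is a simplified, single-channel illustration rather than a full ERP pipeline: the function name `average_erp`, the sample-based epoch window, and the prestimulus baseline subtraction are all assumptions made for the example, and real analyses typically also involve filtering and artifact rejection.

```python
def average_erp(signal, onsets, pre, post):
    """Average single-channel epochs time-locked to stimulus onsets.

    signal: list of voltage samples from a continuous recording.
    onsets: sample indices at which each expression was presented.
    pre, post: number of samples kept before and after each onset.
    Each epoch is baseline-corrected by subtracting its prestimulus mean
    (an assumed, commonly used step), then epochs are averaged pointwise.
    """
    epochs = []
    for onset in onsets:
        start, stop = onset - pre, onset + post
        if start < 0 or stop > len(signal):
            continue  # skip epochs that run past the edges of the recording
        epoch = signal[start:stop]
        baseline = sum(epoch[:pre]) / pre if pre else 0.0
        epochs.append([value - baseline for value in epoch])
    if not epochs:
        raise ValueError("no complete epochs available")
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


# Hypothetical recording with two presentations (onsets at samples 2 and 8).
recording = [0, 0, 1, 2, 0, 0, 0, 0, 3, 4, 0, 0]
print(average_erp(recording, onsets=[2, 8], pre=2, post=2))
```

Averaging across many presentations is what allows small, stimulus-locked components (such as the Nc) to emerge from the background signal; component amplitudes are then compared across expression conditions.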
Between-valence
Although there are many inconsistencies in the ERP literature (for a summary table, see van den Boomen, Munsters, & Kemner, 2018), the most reliable differences between positive and negative facial expressions have been found in the Nc. Multiple studies report that 7-month-olds have larger Nc amplitudes to fearful than happy expressions (e.g., Jessen & Grossmann, 2015, 2017; Taylor-Colls & Pasco Fearon, 2015). This result suggests that infants allocate more attention to fearful expressions. In comparison, most studies fail to find differences in P400 and N290 responses to positive versus negative facial expressions at any age (e.g., Jessen & Grossmann, 2017; Vanderwert et al., 2014; Xie, McCormick, Westerlund, Bowman, & Nelson, 2019).
Within-valence
Most studies have also failed to find differences between anger, fear, and sad expressions for the Nc, N290, and P400 (e.g., Vanderwert et al., 2014; Yrttiaho, Forssman, Kaatiala, & Leppänen, 2014). However, in some studies, infants 7 months and older show greater Nc and N290 amplitudes to anger than fearful and sad expressions (e.g., Kobiella, Grossmann, Reid, & Striano, 2008; Parker & Nelson, 2005; but see Xie et al., 2019). Findings are more variable for the P400. In some studies, 5- to 12-month-olds show greater P400 responses to anger than fear (Hoehl & Striano, 2008; Xie et al., 2019), but greater P400 responses to fearful than anger and sad expressions have also been reported in 7- to 24-month-olds (Kobiella et al., 2008; Parker & Nelson, 2005).
The “Fear Bias”
One of the most consistent findings in both looking-time and ERP paradigms is heightened attention to fearful compared to happy expressions (for a review, see Leppänen & Nelson, 2012). Developmental researchers typically explain this “fear bias” in terms of the threat-signaling value of fearful expressions (e.g., Jessen & Grossmann, 2014), which might be adaptive for self-locomoting infants (Campos et al., 2000). Thus, the emergence of crawling is often used to explain why 7-month-olds, but not younger infants, attend more to fearful than happy expressions (e.g., Grossmann & Jessen, 2017; Jessen & Grossmann, 2016; Leppänen, Cataldo, Bosquet Enlow, & Nelson, 2018; but see Bayet et al., 2017; Heck, Hock, White, Jubran, & Bhatt, 2016, 2017). However, this is very much a classical explanation. In line with a constructionist view, there may be other explanations for these findings.
One possibility is that heightened attention to fearful expressions reflects a general negativity/threat bias (Vaish, Grossmann, & Woodward, 2008). This seems unlikely, however, given that 5- to 14-month-old infants do not have a visual preference for angry over happy expressions. Furthermore, even though 7-month-olds show larger Nc amplitudes in response to fearful than happy expressions, Nc differences are not found when anger is compared to happiness (Grossmann et al., 2007; Parker & Nelson, 2005). Thus, there may be something especially “attention-grabbing” about fearful expressions that is not shared with other negative, threat-related expressions like anger (although see Morales et al., 2017).
Another possibility is that low-level perceptual features of fearful expressions—such as wide eyes—elicit infant attention. Interestingly, however, 7-month-olds allocate more attention (i.e., longer looking times, larger Nc responses) to happy eyes than fearful eyes (Jessen & Grossmann, 2014, 2016; Krol et al., 2015). Also, in attentional disengagement tasks, 7-month-olds more slowly shift their attention to a peripheral target when presented with fearful expressions, compared to neutral expressions with wide, “fearful” eyes (Peltola, Leppänen, Mäki, & Hietanen, 2009). Thus, while wide eyes may elicit attention, this feature may not be the sole explanation for the fear bias.
A more likely explanation is that fear expressions are unfamiliar. Young infants are rarely exposed to these expressions (Malatesta & Haviland, 1982) and caregivers describe prototypical fear displays as “unnatural” or “uncharacteristic of their normal behavior” (Camras & Sachs, 1991; Rosen, Adamson, & Bakeman, 1992). Direct evidence also suggests that infants view fear expressions as “novel.” For example, in an attentional disengagement task, 7-month-olds were equally likely to fixate on nonemotional, “novel” expressions (i.e., lips closed, cheeks blown full of air, eyes open) compared to fearful expressions (Peltola, Leppänen, Palokangas, & Hietanen, 2008). In addition, the fear bias relates to positive maternal emotionality (de Haan, Belsky, Reid, Volein, & Johnson, 2004), suggesting that infants are attentive to expressions that are not typically encountered in their daily lives. Finally, the fear bias has been found to decline around 11 to 12 months of age (Peltola, Hietanen, Forssman, & Leppänen, 2013), presumably as infants gain more experience with fearful expressions (Xie et al., 2019).
Summary
These findings indicate that, by 5 months of age, infants can discriminate between happy and negative facial expressions (fear, anger, sadness). However, it is unclear whether infants at any age can discriminate different negative facial expressions at a behavioral or neural level. There is some, albeit limited, evidence for within-valence discrimination in the looking-time literature (e.g., Schwartz et al., 1985), but the ERP findings are inconsistent. Despite the lack of concrete evidence for within-valence discrimination, classical emotion theories have interpreted the discrimination literature as support for an early emerging preparedness for emotion understanding, particularly with regard to fearful expressions (e.g., Leppänen & Nelson, 2006). However, heightened attention to fear expressions does not necessarily mean that infants “understand” these expressions as “threatening.” Consistent with this interpretation, one study recently reported that the fear bias at 7 months does not correlate with emotion understanding at 48 months (Peltola, Yrttiaho, & Leppänen, 2018). In contrast, some constructionist theories have argued that infants discriminate facial expressions on the basis of isolated perceptual features, without understanding emotional meaning (Barrett, 2017; Lindquist & Gendron, 2013).
Given the reviewed literature, the interpretations made by both theories seem premature. There is currently no empirical metric to determine the nature of infants’ responses in looking-time and ERP tasks (Madole & Oakes, 1999). In other words, it is not possible to determine whether infants discriminate facial expressions based on (a) salient perceptual features alone (e.g., mouth shape), (b) affective meaning alone (i.e., the communicative signal of the expression), or (c) some combination of the two. Discrimination studies, as they are currently designed, are ultimately unable to provide meaningful insights into whether infants have discrete emotion concepts.
Categorization of Emotional Expressions
Categorization studies provide an additional test of infants’ ability to differentiate between discrete emotional facial expressions. Categorization is the ability to group different instances of a facial expression (i.e., multiple people expressing the same emotion) together as members of a category. This ability has been tested using habituation/familiarization paradigms and, in most instances, static facial expressions. Unlike discrimination studies, in which infants are repeatedly shown facial expressions posed by a single model/person, categorization studies use multiple models/people expressing one emotion (e.g., happiness). At test, infants are thought to form a category if they show heightened attention to familiar models expressing a different emotion (e.g., fear) compared to novel models expressing the familiarized/habituated emotion (e.g., happiness). To form a category, infants need to attend to the relevant, invariant affective information (i.e., the emotion), while ignoring irrelevant, variable perceptual differences (i.e., the person expressing the emotion). Given the memory demands (i.e., infants need to track which models and emotions were presented during habituation; Aslin, 2007), categorization studies typically test 7- to 12-month-olds. Studies with 3- to 6-month-olds have yielded mixed results (Bornstein & Arterberry, 2003; A. Caron, Caron, & MacLean, 1988; R. Caron, Caron, & Myers, 1982; Serrano, Iglesias, & Loeches, 1992, 1995; Walker-Andrews, Krogh-Jespersen, Mayhew, & Coffield, 2011).
Between-Valence
There is some evidence that 7- to 10-month-olds can form a category of happiness (i.e., after habituation to happy faces) and differentiate this category from novel anger and fear expressions at test (A. Caron et al., 1988; Kestenbaum & Nelson, 1990; Ludemann, 1991; C. A. Nelson & Dolgin, 1985; C. A. Nelson et al., 1979; Safar & Moulson, 2017). It remains unclear, however, whether 3- to 12-month-olds can differentiate a category of happiness from novel sad expressions at test (A. Caron et al., 1988; Lee, Cheal, & Rutherford, 2015; Walker-Andrews et al., 2011). Moreover, some studies fail to find any evidence of happy categorization, even in infants as old as 11 months of age (Amso, Fitzgerald, Davidow, Gilhooly, & Tottenham, 2010; Phillips, Wagner, Fells, & Lynch, 1990; Schwartz et al., 1985; Serrano et al., 1995).
With respect to negative emotions, 4- to 12-month-olds can sometimes form a category of anger expressions and differentiate this category from happiness at test (R. Caron, Caron, & Myers, 1985; Lee et al., 2015; Serrano et al., 1995; but see Phillips et al., 1990; Schwartz et al., 1985). However, 6- to 11-month-olds do not seem to form a category of fearful expressions (Amso et al., 2010; Ludemann & Nelson, 1988; C. A. Nelson & Dolgin, 1985; C. A. Nelson et al., 1979; Safar & Moulson, 2017; but see Cong et al., 2018) or sad expressions (Lee et al., 2015; Walker-Andrews et al., 2011) when presented with happy expressions at test.
In these studies, it is unclear whether infants’ categories are based on salient perceptual features (e.g., teeth) alone or affective meaning. To test this question, infants have been presented with happy expressions that vary either in intensity or amount of teeth. Infants at 5 to 12 months of age can form a happy category even when the expressions vary in intensity during habituation (i.e., small, closed-mouth smiles and big, toothy smiles; Bornstein & Arterberry, 2003; Kotsoni, de Haan, & Johnson, 2001; Lee et al., 2015; Ludemann & Nelson, 1988; but see Cong et al., 2018; Phillips et al., 1990). In contrast, when salient facial features vary systematically between the habituation and test trials, infants use those features as the basis for categorization. Specifically, after habituation to nontoothy happy expressions, 4- to 7-month-olds show heightened attention to a novel model expressing toothy happiness (R. Caron et al., 1985). However, when the amount of teeth is held constant from habituation to test, 7-month-olds provide evidence of categorization (Kestenbaum & Nelson, 1990). Although this sensitivity seems to decrease over the first year of life, it is still evident around 10 months of age (R. Caron et al., 1985; Ludemann, 1991). It is unknown whether infants older than 10 months continue to be influenced by these perceptual cues.
Within-Valence
Only a handful of studies have used negative facial expressions during both habituation and test. These studies indicate that 4- to 18-month-olds can form a category of anger (i.e., after habituation to anger expressions) and differentiate this category from novel sad, fear, and disgust expressions (Ruba, Johnson, Harris, & Wilbourn, 2017; Schwartz et al., 1985; Serrano et al., 1992). Moreover, 10- and 18-month-olds can form a category of disgust and differentiate this category from anger expressions at test (Ruba et al., 2017). Findings are mixed as to whether 4- to 6-month-olds can form a category of sadness or fear during habituation (Schwartz et al., 1985; Serrano et al., 1992). Given that infants’ categorization abilities are tenuous before 7 months of age (e.g., A. Caron et al., 1988; R. Caron et al., 1982), it is possible that only older infants can form these within-valence categories. On the other hand, in a recent paired-preference study, 5-month-olds were sensitive to the categorical boundary between sadness and disgust, as well as sadness and anger (White et al., 2019). In the same study, however, 5- and 9-month-olds were not sensitive to the category boundary between anger and disgust.
Summary
The categorization literature provides some evidence that, by 7 months of age, infants can form (a) a category for happiness and differentiate this category from (some) negative expressions, and (b) a category for anger and differentiate this category from happy expressions. There is also emerging evidence that infants can form categories of discrete negative expressions and differentiate these categories from other negative expressions. From a classical perspective, these findings could be used to argue that infants “understand” discrete facial expressions (e.g., Walker-Andrews, 1997). If infants can perceive that multiple people are displaying the same emotion, then these categories might be conceptual (i.e., based on affective meaning). On the other hand, constructionist theories would argue that these categories are still perceptual in nature. Infants may attend to a shared facial feature across models (e.g., scrunched noses on disgust expressions) as the basis for these categories. To date, however, there have been few systematic efforts to manipulate or control for salient facial features in categorization tasks.
Similar to the discrimination literature, it is difficult to discern the nature of infants’ responses in categorization tasks. If infants understand the affective meaning of discrete facial expressions, then they would likely draw on this information, even if they are still influenced by salient perceptual features. In fact, even though adults have discrete emotion concepts, their emotion categorization is still influenced by facial features (e.g., presence of teeth; Ruba, Wilbourn, Ulrich, & Harris, 2018). Thus, the existing categorization literature also cannot answer the question of whether infants have discrete emotion concepts.
Intermodal Matching of Emotional Expressions
Another test of infants’ emotion understanding is intermodal matching—the ability to match emotions across expressive modalities (e.g., face and voice). In these studies, infants are typically presented with two dynamic facial expressions side by side (e.g., happy and sad). A vocal expression is played that is congruent with one of the facial expressions. Vocal expressions are usually single words or sentences spoken in an emotional tone, although some studies use musical tones (Phillips et al., 1990) or vowel sounds (Palama, Malsert, & Gentaz, 2018). These vocalizations are presented asynchronously with the facial expressions to prevent infants from matching based on temporal information alone. If infants are sensitive to the common affective information shared by face and voice, they should look longer at the facial expression that “matches” the auditory cue.
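The dependent measure in this paradigm can be illustrated with a brief sketch. The scoring shown here (proportion of total looking time directed at the sound-matching face, averaged across trials and compared against a chance level of .50) is an assumed, simplified convention, and the function names are hypothetical; individual studies may score and analyze looking behavior differently.

```python
def matching_proportion(look_match, look_nonmatch):
    """Proportion of looking time directed at the face that matches the voice."""
    total = look_match + look_nonmatch
    if total <= 0:
        raise ValueError("total looking time must be positive")
    return look_match / total


def mean_matching_score(trials):
    """Average matching proportion across (look_match, look_nonmatch) trials."""
    if not trials:
        raise ValueError("at least one trial is required")
    scores = [matching_proportion(m, n) for m, n in trials]
    return sum(scores) / len(scores)


# Hypothetical looking times (seconds) for one infant across three trials.
infant_trials = [(6.0, 4.0), (7.0, 3.0), (5.0, 5.0)]
score = mean_matching_score(infant_trials)
print(score > 0.5)  # True would indicate longer looking at the matching face
```

A score of this kind does not by itself separate intermodal matching from a baseline preference for one of the two expressions, which is why controlling for expression preferences matters when interpreting such data.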
Between-Valence
Multiple studies have confirmed that 5- to 12-month-olds can match happy and sad vocalizations/tones to their respective facial expressions (Flom & Whiteley, 2014; Phillips et al., 1990; Walker, 1982; but see Soken & Pick, 1999). Younger (3.5-month-old) infants can also form these matches, albeit only when the expressions are posed by their mothers (Kahana-Kalman & Walker-Andrews, 2001). Infants between 5 and 7 months of age can also match happy and angry vocalizations to facial expressions (Grossmann, Striano, & Friederici, 2006; Soken & Pick, 1992, 1999; Vaillant-Molina, Bahrick, & Flom, 2013; Walker, 1982; Walker-Andrews, 1986; but see Palama et al., 2018). More recently, however, Ogren, Burling, and Johnson (2018) reported that 9-month-olds did not form intermodal matches for happy, sad, and angry expressions (when paired with a neutral expression). Unlike previous research, this study controlled for baseline expression preferences, thereby providing a more stringent test of intermodal matching.
Within-Valence
Only two studies have tested whether infants can form intermodal matches when two negative facial expressions are presented. Phillips et al. (1990) found that 7-month-olds did not match loud and quiet tones to anger and sad faces, respectively. However, 7-month-olds formed intermodal matches for angry and sad faces when the vocal expressions contained human speech (Soken & Pick, 1999). Thus, successful intermodal matching might depend on using ecologically valid auditory cues. Further, although infants might match negative emotions across the dimension of arousal (anger is high arousal, sadness is low arousal; Russell, 1980), it is unknown whether they can match negative emotions within the dimension of arousal (e.g., anger vs. fear).
Summary
In summary, 5- to 12-month-olds can match positive and negative faces to vocalizations. However, because these studies have yet to compare expressions within one dimension of valence and arousal, it is unknown whether these responses are based on discrete emotions (e.g., happy vs. sad). Regardless, from a classical standpoint, intermodal matching is assumed to go beyond simple expression discrimination, instead signifying “emotion recognition” (Walker-Andrews, 1997). More specifically, infants are thought to recognize the common affective information communicated across modalities.
However, a leaner interpretation cannot be discounted. Infants might have simply learned that certain facial expressions (e.g., a smile) and certain vocal expressions (e.g., laughter) co-occur in their social environment. Consequently, infants could make intermodal matches without understanding the affective meaning of the expressions. In support of this, Grossmann et al. (2006) found that 7-month-old infants showed larger Nc and Pc (positive component) amplitudes to congruent facial–vocal expression pairs, compared to incongruent pairs. As previously mentioned, the Nc is thought to reflect heightened visual attention (de Haan et al., 2003), while the Pc is thought to reflect memory for familiar items (C. A. Nelson, Thomas, de Haan, & Wewerka, 1998). This finding suggests that infants may have been relying on their memory for the learned associations between facial and vocal expressions.
Even if intermodal matching recruits emotion concept knowledge, the current findings do not refute constructionist emotion views. For instance, infants’ ability to match positive and negative facial expressions to vocalizations is consistent with the hypothesis that infants have valence- and arousal-based emotion concepts. However, given that discrete emotional expressions are thought to occur with statistical irregularity (Barrett, 2017), preverbal infants should have more difficulty forming intermodal matches for emotions within a dimension of valence and arousal (e.g., fear vs. anger). Because these emotion contrasts have yet to be studied, the current intermodal matching literature cannot address this claim.
Matching Events and Emotional Expressions
Recently, researchers have begun to explore another component of infants’ emotion understanding: event–emotion matching, the ability to match facial expressions with eliciting events. These studies use the violation-of-expectation (VOE) paradigm (Baillargeon, Spelke, & Wasserman, 1985). In this paradigm, infants are shown a video of an eliciting event (e.g., receiving a gift) followed by an emoter expressing a congruent (e.g., happiness) or incongruent emotion (e.g., sadness; for live procedures, see Chiarella & Poulin-Dubois, 2013). Typically, infants’ visual attention to unimodal facial expressions is measured (see Hepach & Westermann, 2013, for a pupil dilation measure). If infants have formed links between facial expressions and eliciting events, they should look longer at an expression that is incongruent with that event compared to a congruent expression. For example, if infants have formed links between receiving a gift and happiness, they should look longer at a sad than at a happy expression. This ability is more advanced than intermodal matching since it is thought to reflect an understanding of the causes of facial expressions (Hepach & Westermann, 2013; Reschke, Walle, Flom, & Guenther, 2017).
Between-Valence
Two studies suggest that by late in the first year of life, infants match positive facial expressions with positive events. Hepach and Westermann (2013) reported that 10- and 14-month-olds expected an emoter to express happiness, rather than anger, when patting a stuffed animal. Similarly, Skerry and Spelke (2014) found that 8- and 10-month-olds expected an agent to express happiness, rather than sadness, after completing a goal. In this study, the agent was a circle that expressed positive affect by smiling, giggling, and bouncing, or negative affect by frowning, crying, and slowly rocking side to side. Thus, it is possible that infants’ responses were driven by the vocal and/or movement cues, rather than by the agent’s “facial expressions.”
Infants do not seem to match negative emotions to negative events until the second year of life. For instance, neither 8- nor 10-month-olds in Skerry and Spelke’s research (2014) expected an agent to express sadness, rather than happiness, after failing to complete a goal. Moreover, in Hepach and Westermann’s study (2013), 14-month-olds, but not 10-month-olds, expected an emoter to express anger, rather than happiness, when hitting a stuffed animal. Consistent with these findings, Reschke et al. (2017) reported that 12-month-olds expected an emoter to express (a) sadness or anger, rather than happiness, after fighting over a toy, and (b) happiness, rather than anger, after receiving a toy. Infants did not expect an emoter to express sadness, rather than happiness, after someone broke the emoter’s toy. Finally, Chiarella and Poulin-Dubois (2013) reported that 18-month-olds, but not 15-month-olds, expected an emoter to express (a) sadness, rather than happiness, after an object was taken away, and (b) happiness, rather than sadness, after receiving a desired object.
Within-Valence
To date, only two studies have examined whether infants match different negative facial expressions to different negative events. Reschke et al. (2017) reported that 12-month-olds did not expect an emoter to express (a) sadness, rather than anger, after another person broke the emoter’s toy, or (b) anger, rather than sadness, after fighting over a toy with another person. In contrast, Ruba, Meltzoff, and Repacholi (2019) found that 14- and 18-month-olds expected an emoter to express (a) anger, but not disgust or fear, after failing to achieve a goal, and (b) disgust, but not anger or fear, after tasting a novel food. Infants at this age did not expect an emoter to express fear, rather than anger or disgust, after encountering a novel object.
Summary
This relatively new body of research suggests that infants can match (a) positive emotions to positive events late in the first year of life, and (b) negative emotions to negative events in the second year of life. In the second year of life, infants are also beginning to match different negative facial expressions to specific negative events. A classical interpretation of these findings is that infants “understand” something about the causes of discrete emotions. However, similar to the intermodal matching literature, it could be argued that infants are simply remembering specific event–emotion associations encountered in their daily lives, without understanding the emotions or the causal link between emotions and events. We hypothesize that infants’ event–emotion matching reflects some, albeit limited, degree of emotion understanding. For instance, infants might link emotional expressions with specific classes of events (e.g., goal achievement vs. goal failure; Meltzoff, 1995; Woodward, 1998) as opposed to specific events experienced or observed by the infant (e.g., obtaining a desired stuffed toy).
However, the extent to which infants can form event–emotion matches based on discrete emotions remains unclear. To date, only one study has tested emotions within a dimension of valence and arousal (Ruba et al., 2019). Importantly, this study found that preverbal infants were able to match some negative emotional expressions to specific events. This suggests that discrete event–emotion matching may be possible before infants learn emotion labels. In other words, infants may be able to detect the regularities between facial expressions and eliciting events, without needing to be explicitly taught these associations via language. Although the findings of Ruba et al. (2019) challenge the theory of constructed emotion (Barrett, 2017), it is worth noting that this study tested bimodal expressions (i.e., face and voice).
Social Referencing
Social referencing—the ability to use another person’s emotional expression to guide one’s own behavior (Campos & Stenberg, 1981)—is perhaps the most “advanced” test of infants’ emotion understanding. In social referencing paradigms, an experimenter/caregiver expresses an emotion in response to a novel object (e.g., a moving, noise-making toy robot). Other novel stimuli have included live animals (Hornik & Gunnar, 1988), human strangers (e.g., Feinman & Lewis, 1983), and the “visual cliff” (e.g., Sorce, Emde, Campos, & Klinnert, 1985). Several infant responses have been measured, including approach (e.g., latency to touch object), contact (e.g., duration of touch), and affect (e.g., facial/vocal expressions).
Between-Valence
Most studies have compared a happy or neutral expression to a negative expression (for a table of studies, see Vaish et al., 2008). Few differences emerge in 10- to 18-month-olds’ responses to objects that have been the target of a happy versus a neutral expression (e.g., Hornik, Risenhoover, & Gunnar, 1987; Mumme & Fernald, 2003; Mumme, Fernald, & Herrera, 1996; Repacholi, 2009). In contrast, numerous studies have indicated that 10- to 24-month-olds approach and/or touch an object if the emoter expresses happiness or neutral affect, but avoid the object (delayed and/or reduced object contact) if the emoter expresses fear (e.g., Kim & Kwak, 2011; Kim, Walden, & Knieps, 2010; although see Leventon & Bauer, 2013). Similar findings have been obtained for 11- to 18-month-olds with happy/neutral versus disgust expressions (e.g., Carver & Vaccaro, 2007; Chiarella & Poulin-Dubois, 2018; Flom & Johnson, 2011; although see Schieler, Koenig, & Buttelmann, 2018). In addition, 15- and 18-month-olds are less likely to imitate a model’s actions that have been the target of sad or angry expressions, compared to happy or neutral expressions (e.g., Patzwald, Curley, Hauf, & Elsner, 2018; Repacholi, Meltzoff, Spiewak Toub, & Ruba, 2016). Taken together, these findings suggest that, by 10 to 12 months of age, infants understand something about the functional behavioral responses specified by positive and negative facial expressions.
Most of this research has focused on infants 10 months of age and older, and it is unclear whether younger infants also engage in social referencing. A few studies have failed to find evidence that 6- to 9-month-olds regulate their behavior in response to an adult’s happy, fearful, and disgust expressions (Slaughter & McConnell, 2003; Walden & Baxter, 1989; Walden & Ogan, 1988). In contrast, Vaillant-Molina and Bahrick (2012) reported that 5.5-month-olds preferentially touched a toy that had been linked with a happy expression compared to a fearful expression. Infants in this study were habituated to the emotion–object pairings before the behavioral response period, and this increased exposure may have helped infants encode these pairings. In addition, several ERP studies have found that infants as young as 3 months show increased Nc activity to pictures of objects previously paired with fear or disgust expressions, compared to neutral or happy expressions (e.g., Carver & Vaccaro, 2007; Hoehl & Striano, 2010; Hoehl, Wiese, & Striano, 2008; but see Aktar et al., 2016; Leventon & Bauer, 2013). These ERP findings suggest that infants are more attentive to objects that are linked to negative facial expressions. In summary, younger infants may engage in social referencing when the tasks are more developmentally appropriate.
Within-Valence
To date, few social referencing studies have examined infants’ responses to within-valence emotions. In a classic study, 12-month-olds were more likely to cross a visual cliff when their mothers posed sadness compared to anger and fear expressions (Sorce et al., 1985). Similarly, Martin and colleagues (Martin, Maza, McGrath, & Phelps, 2014; Martin, Witherington, & Edwards, 2008) reported that 16- to 18-month-olds touched target objects for shorter durations in response to anger and fear expressions compared to sadness. One explanation for these findings is that—in the context of an ambiguous object/event—high arousal, negative emotions (anger, fear) communicate threat and danger (Shariff & Tracy, 2011), and avoidance is an appropriate response to both emotions (Walle & Campos, 2012). These studies do, however, provide evidence for arousal-based behavioral responses (sadness vs. anger/fear).
One limitation of social referencing studies is the use of a relatively limited behavioral coding system (i.e., behaviors coded as either approach or avoidance). Recently, Walle, Reschke, Camras, and Campos (2017) designed a coding system focused on the “goal” of infants’ behavioral response (e.g., prosocial responding, relaxed play, social avoidance). Infants (16-, 19-, and 24-month-olds) saw an emoter displaying multimodal (face, voice, posture, gesture) expressions of sadness, anger, fear, or disgust in response to two events. Compared to the other three negative emotions, 24-month-olds showed greater avoidance of the emoter when she displayed anger. In addition, 19-month-olds (but not 16- or 24-month-olds) demonstrated more “information seeking” (i.e., alternating their gaze between the object and emoter) in response to disgust than anger. However, infants’ “information seeking” did not differ in response to fear compared to disgust or anger. Thus, evidence for differential responding to negative emotions was less clear-cut at this age.
Summary
By 10 to 12 months of age, infants can use another person’s positive and negative emotional expressions to regulate their own behavior. According to a classical view of emotion, this suggests that infants understand the meaning of these emotional expressions. For instance, infants may understand that positive emotions communicate safety and approach, while negative emotions signify danger and avoidance (Shariff & Tracy, 2011).
However, an alternative interpretation is that adults’ emotional expressions directly modify infants’ own felt emotions and subsequent behavior. In this interpretation, infants need not understand the emotional expression as a meaningful signal in order to regulate their behavior. Consistent with this contagion hypothesis, some evidence suggests that infants display more negative affect in response to an emoter’s fearful expressions, and more positive affect when the emoter expresses happiness (e.g., Hirshberg & Svejda, 1990; Mumme et al., 1996). However, other studies have failed to find differences in infants’ affect, particularly when the emoter expresses anger or disgust (e.g., Hertenstein & Campos, 2004; Repacholi, 2009; Repacholi, Meltzoff, Hennings, & Ruba, 2016).
Further evidence against the contagion hypothesis comes from social referencing studies that have manipulated stimulus ambiguity, emoter competence, and emoter attention. For instance, when the experimental stimuli are low in ambiguity, adults’ emotional expressions have little or no impact on infants’ behavior (e.g., Kim & Kwak, 2011; Tamis-LeMonda et al., 2008). Behavioral regulation is also less likely if the emoter is “incompetent” (e.g., Stenberg, 2012, 2013). Finally, infants are less likely to regulate their behavior if the emoter is not visually attending when the infant has access to the object (Botto & Rochat, 2018; Repacholi, Meltzoff, & Olsen, 2008; Repacholi, Meltzoff, Rowe, & Spiewak Toub, 2014). These modulations suggest that infants’ behavioral regulation cannot be reduced to emotional contagion. From a contagion perspective, the adult’s expression directly modifies the infant’s own affective state (e.g., a fearful expression causes the infant to become scared, which in turn inhibits their object exploration). If infants are “catching” adults’ emotion via contagion, then they should regulate their behavior regardless of these manipulated task features. However, this is not the case.
Even if social referencing reflects true understanding of emotions, the findings are not inconsistent with the constructionist view that infants have valence- and arousal-based emotion concepts (e.g., Barrett, 2017; Widen, 2013). For instance, Walle et al. (2017) found that 24-month-olds differentially responded to different negative emotions, but at this age, infants are quite verbal. Specifically, emotion labels (e.g., “mad,” “angry”) are beginning to emerge in infants’ productive vocabularies (Ridgeway, Waters, & Kuczaj, 1985). Although no language data were reported in this study, it could be argued that infants’ newly acquired emotion language facilitated their understanding of these discrete emotions. Another potential issue is the use of multimodal expressions. Some studies have suggested that the vocal expression, rather than the facial expression alone, drives social referencing (Kim et al., 2010; Mumme et al., 1996; Vaillant-Molina & Bahrick, 2012; Vaish & Striano, 2004). Thus, even if preverbal infants were to show distinct behavioral responses to discrete negative emotions, it would be useful to determine which expressive modality primarily influences infants’ behavior.
Language and Emotion Concept Development
Some constructionist theorists have used the lack of conclusive evidence for discrete emotion understanding to argue that preverbal infants are unable to “interpret” or “perceive” discrete negative emotions (Lindquist & Gendron, 2013; Widen, 2013). Instead, the acquisition of discrete emotion concepts is thought to follow another fundamental developmental achievement: language acquisition. Specifically, the theory of constructed emotion argues that emotion words (e.g., “happy”) impose a categorical structure on otherwise variable facial expressions (Barrett, 2017; Barrett et al., 2007). In this way, the word “happy” can refer to toothy and nontoothy smiles, expressed across a variety of individuals, and in a myriad of contexts. Without emotion labels to serve as category anchors, naturalistic expressions of “happiness” may not share enough similarities to bind them together in a category (Fugate, 2013). For this reason, infants may not be able to form conceptual categories for discrete emotions until they have acquired emotion labels (Lindquist & Gendron, 2013; Widen, 2013). As previously noted, studies with preverbal infants have not provided definitive evidence for or against this hypothesis.
Language and Emotion Categorization in Children and Adults
Research with older, verbal children and adults, however, does suggest that language constructs emotion categories. First, emotion words influence how facial expressions are encoded and remembered (e.g., Brooks et al., 2017; Doyle & Lindquist, 2018; Fugate, Gendron, Nakashima, & Barrett, 2018). For example, adults remember facial expressions as “angrier” or “happier” depending on whether the expressions were paired with the word “angry” or “happy” (Halberstadt & Niedenthal, 2001). In addition, the inclusion of emotion labels in emotion categorization tasks improves children’s and adults’ performance (e.g., Camras & Allison, 1985; Carroll & Russell, 1996; N. L. Nelson et al., 2018; N. L. Nelson & Russell, 2016; Nook, Lindquist, & Zaki, 2015). For example, when asked to sort facial expressions into different categories, (a) 2- to 7-year-olds are more accurate when the categories are specified by an emotion label (Russell & Widen, 2002; Widen & Russell, 2004), and (b) adults are more accurate after reading instructions that include specific emotion labels (i.e., “you will sort anger and disgust expressions”; Ruba et al., 2018). In contrast, reduced accessibility to emotion labels leads to slower and less accurate facial expression categorization in adults (Gendron, Lindquist, Barsalou, & Barrett, 2012; Lindquist, Barrett, Bliss-Moreau, & Russell, 2006; Lindquist, Gendron, Barrett, & Dickerson, 2014). Taken together, these studies suggest that language fundamentally impacts emotion categorization in children and adults.
However, one clear limitation to this research is that children and adults have considerable experience with facial expressions and emotion labels. In particular, emotion labels and concepts are always implicitly available in participants’ minds (Lindquist & Gendron, 2013), and participants may draw on this knowledge during the testing session (N. L. Nelson et al., 2018; Ruba et al., 2018). To address this problem, some studies have examined adults with various neurological deficiencies (e.g., Lindquist et al., 2014; Nook et al., 2015) or presented healthy adults with unfamiliar, nonhuman faces (e.g., Doyle & Lindquist, 2018; Fugate, Gouzoules, & Barrett, 2010). Although insightful, these studies cannot address the question of how language constructs emotion categorization with human facial expressions in typically developing populations. A potential solution is to study preverbal infants. To date, no published work has examined how language influences infants’ emotion categories.
Language and Object Categorization in Infancy
Nevertheless, over two decades of research has documented how labels influence object categorization in infancy (for a review, see Ferguson & Waxman, 2016). In a seminal study, Waxman and Markow (1995) familiarized 13-month-olds with four objects from either a basic-level category (e.g., cars) or a superordinate category (e.g., vehicles, including cars and airplanes). An experimenter presented and labeled each object with either a noun (“look, a car”) or no-noun (“look what’s here”). In the subsequent test phase, infants were shown two new objects: a novel object from the familiarized category (e.g., car) and a novel object from an unfamiliar category (e.g., horse). Infants formed a basic-level category (cars) regardless of whether a noun or no-noun was presented during familiarization. However, infants only formed a superordinate category (vehicles) when a noun was presented. Similar facilitative labeling effects have subsequently been found with basic-level categories (e.g., LaTourrette & Waxman, 2019), novel objects (e.g., Fulkerson & Haaf, 2006), and other object properties (e.g., spatial relationships; Casasola, Bhagwat, & Burke, 2009).
Waxman and Markow (1995) argue that labels are “invitations” to form categories. In fact, research has found that labels are unique in their ability to facilitate categorization, compared to other sounds, such as instrumental music (Roberts & Jacob, 1991), nonlinguistic tones (e.g., Althaus & Westermann, 2016), and nonsensical/backwards human speech (e.g., Ferry, Hespos, & Waxman, 2013). Infants at 12 months of age are also unable to form categories when inconsistent labels are used (i.e., each object is given a different label; Waxman & Braun, 2005). These findings suggest that labels do not facilitate categorization simply by heightening infants’ attention to objects (Waxman, 1999). Instead, labels appear to facilitate category formation by highlighting commonalities between objects (for alternative explanations, see Ferguson & Waxman, 2016). Recent eye tracking and EEG research has confirmed that, for 12-month-olds, labels (a) direct visual attention to perceptual commonalities (Althaus & Plunkett, 2016), and (b) increase neural activity over the visual cortex (Gliga, Volein, & Csibra, 2010). This suggests that labels impact infants’ visual processing of objects. With evidence from a connectionist model, Westermann and Mareschal (2014) further hypothesize that labels modify visual perception, so that objects from the same-labeled category are perceived as more similar to one another. This is congruent with findings that labels influence facial expression perception in adults (e.g., Brooks et al., 2017; Fugate et al., 2018). Thus, while similar processes between language and categorization are evident at multiple stages of development, it remains to be seen whether language also influences emotion categorization in infancy.
Conclusion
For over 50 years, developmental psychologists have examined how preverbal infants understand others’ emotional facial expressions. The resulting empirical research suggests that infants can differentiate positive and negative facial expressions. By 5 months of age, infants can discriminate one positive expression from one negative expression, in looking-time and ERP paradigms. By 7 months of age, infants can also (a) form distinct categories of positive and negative facial expressions, and (b) match positive and negative facial expressions to positive and negative vocal expressions, respectively. Around 12 months of age, infants can (a) match positive and negative facial expressions to positive and negative eliciting events, respectively, and (b) use another person’s positive and negative expressions to determine whether to approach or avoid an ambiguous object. Thus, in the first 2 years of life, infants display a remarkable capacity to perceive, interpret, and differentially respond to other people’s positive and negative facial expressions.
However, these studies have largely failed to address whether infants understand emotions on the basis of valence/arousal or discrete emotions (e.g., happy vs. fear). To answer this question, studies need to compare facial expressions within one dimension of valence and arousal (e.g., anger vs. fear). Although few studies have made this comparison, there is some suggestion that infants can discriminate and categorize within-valence (negative) facial expressions (e.g., Ruba et al., 2017; Schwartz et al., 1985; Xie et al., 2019). However, it is not possible to determine whether these discrimination and categorization abilities are purely perceptual in nature. The few studies that examine more advanced forms of emotion understanding provide mixed evidence as to whether infants understand discrete emotions. While studies have found that infants form intermodal and event–emotion matches on the basis of valence and/or arousal (e.g., Reschke et al., 2017), others have reported that infants are sensitive to discrete emotions (e.g., Ruba et al., 2019). Similarly, most social referencing studies have demonstrated that infants respond to others’ emotional expressions on the basis of valence and/or arousal (Martin et al., 2014; Martin et al., 2008; Sorce et al., 1985). Although some evidence for discrete behavioral responses has been found with 24-month-olds (Walle et al., 2017), at this age, emotion labels are emerging (Ridgeway et al., 1985). Thus, from the existent research, it remains unclear whether and how infants understand discrete emotions before emotion labels are learned.
For this reason, both classical and constructionist theorists would benefit from a more precise description of preverbal infants’ emotion concepts. It is not accurate to conclude that preverbal infants are unable to “interpret” or “perceive” discrete negative expressions (e.g., Lindquist & Gendron, 2013; Widen, 2013), nor is it sufficient to say that preverbal infants “understand” or “recognize” facial expressions (e.g., Walker-Andrews, 1997). Interpreting the literature in this way ignores the nuances and complexities of infants’ emotion-understanding abilities. Moving forward, descriptions of infants’ emotion understanding should include information about (a) the component being measured (e.g., categorization, intermodal matching), and (b) the emotion contrasts tested (e.g., across- or within-valence). Distinctions about whether an ability is “perceptual” or “conceptual” must be made with caution, since most infant paradigms cannot make this distinction (Madole & Oakes, 1999). Furthermore, given that very little research has examined within-valence emotions, it is premature to make any definitive claims about infants’ ability to perceive and understand discrete facial expressions.
A New Hypothesis
Currently, a comprehensive developmental hypothesis describing infants’ understanding of emotional facial expressions does not exist. Based on the existent literature, we outline the following proposal. At birth, infants’ visual systems are likely not sufficiently mature to discriminate facial expressions (C. A. Nelson, 1987). However, by around 5 months of age, infants should be able to visually discriminate between all pairs of “basic” facial expressions, including within-valence pairs (e.g., Schwartz et al., 1985). Differences in neural activity to different facial expressions may also emerge at this time. With increased cognitive maturation at around 7 months of age, infants begin to form perceptual categories for these facial expressions. By the end of the first year of life, infants should be able to form perceptual categories for all pairs of “basic” facial expressions, including within-valence pairs (e.g., Ruba et al., 2017). Thus, we argue that infants develop the requisite perceptual and cognitive skills needed to discriminate and categorize “basic” facial expressions in the first year of life. However, the ability to discriminate and categorize these expressions (even at a neural level) does not require infants to attribute any affective meaning to these displays. We argue that discrimination and categorization tasks likely test perceptual abilities, rather than emotion concepts. Similarly, in the first year of life, infants should form intermodal matches between facial and vocal expressions, both across- and within-valence. These tasks likely test infants’ ability to detect regularities between facial and vocal expressions encountered in their environment, and also need not reflect any conceptual understanding of emotions.
In regard to infants’ emotion concepts, we argue that infants’ conceptual understanding of emotional facial expressions is initially broad, based on valence and arousal. This is congruent with constructionist theories (Barrett, 2017; Lindquist & Gendron, 2013) and research findings with preschoolers (Widen, 2013). It is unclear, however, when this broad understanding first emerges: it may be innately specified or gradually learned in the first year of life through observation of emotions in the infants’ environment. In contrast to constructionist theories—specifically, the theory of constructed emotion—we predict that in the second year of life, but before emotion labels are learned, this broad conceptual understanding of facial expressions gradually becomes more refined. Specifically, we argue that the acquisition of emotion labels is not necessary for infants to begin forming event–emotion matches for within-valence emotions (Ruba et al., 2019; Wu, Muentener, & Schulz, 2017). Further, it is possible that infants may begin to show differential functional responses to some within-valence emotions before emotion labels are learned (Walle et al., 2017).
Thus, in the second year of life, infants may understand something about the causes and functional behavioral responses for discrete “basic” emotions. Infants could learn about these components of emotion understanding through observation of emotions in their environment, without needing to be explicitly taught this information via language. However, it is very unlikely that preverbal infants have robust or fully formed discrete emotion concepts at this age. Further, this emerging and rudimentary understanding of discrete emotions is likely influenced by language. In other words, while emotion language may not be necessary for infants to discriminate, categorize, and match facial expressions to voices and events, or respond to others’ emotions, language may still play a constructive role in all of these abilities (e.g., language may change how emotions are categorized; Plunkett, Hu, & Cohen, 2008).
Overall, this hypothesis differs from current proposals in that it (a) clearly differentiates between individual components of infants’ emotion understanding, while (b) emphasizing developmental change over the first 2 years of life. However, far more work is needed to empirically confirm this developmental sequence. Critically, future research should focus on within-valence emotion contrasts and infants at multiple ages. Currently, discrimination, categorization, and intermodal matching are primarily studied in the first year of life, while event–emotion matching and social referencing are studied in the second year of life. For this reason, it is unclear how these abilities emerge and change over time, particularly as older infants begin to learn emotion labels. In fact, no studies have examined how language influences emotion concept development in infancy, even though emotion labels (e.g., “happy”) begin to appear in infants’ productive vocabularies late in the second year of life (Bretherton, Fritz, Zahn-Waxler, & Ridgeway, 1986; Ridgeway et al., 1985). Finally, given that emotion concepts undergo developmental change throughout infancy and early childhood, it is important to document the learning mechanisms that account for these changes. While constructionist theories have largely advocated for language-dependent learning mechanisms, it is likely that language-independent learning mechanisms (e.g., observational and statistical learning; Plate, Wood, Woodard, & Pollak, 2019) also play an important role in emotion concept development.
Future research with preverbal infants has the potential to dramatically influence our understanding of human emotions. Infants are an ideal test case to isolate the relative roles of evolution, language, and social experience in the development of emotion understanding. Historically, developmental psychologists have not designed studies to test emotion theories, and affective scientists have not fully integrated infant studies into these theories. However, by connecting these two disciplines, researchers can turn towards collaborative projects to answer fundamental questions about the nature of human emotions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
