Abstract
Children’s ability to recognize emotional expressions from faces and bodies develops during childhood. However, the low-level features that support accurate body emotion recognition during development have not been well characterized. This is in marked contrast to facial emotion recognition, which is known to depend upon specific spatial frequency and orientation sub-bands during adulthood, biases that develop during childhood. Here, we examined whether children’s reliance on vertical vs. horizontal orientation energy for recognizing emotional expressions in static images of bodies changed during middle childhood (5 to 10 years old). We found that while children of all ages had an adult-like bias favoring vertical orientation energy, this effect was larger at younger ages. We conclude that in terms of information use, a key feature of the development of emotion recognition is improved performance with sub-optimal features for recognition – that is, learning to use less diagnostic features of the image is a slower process than learning to use more useful features.
Introduction
Emotion recognition is most frequently examined in the context of facial expressions, though information from the body also makes an important contribution to how emotions are perceived (de Gelder, 2009). Adult observers are capable of quickly and accurately recognizing emotions from static images of bodies. Atkinson, Dittrich, Gemmell, and Young (2004) found that static images from dynamic sequences depicting the “peak” of emotional expression supported accurate emotion categorization, even when static form was substantially impaired by the imposition of point-light appearance on static images. Similar results from static images of computer-generated mannequins further suggest that adults can extract basic emotional categories from static body posture (Coulson, 2004). Indeed, not only does the appearance of the body support accurate emotion categorization, in some circumstances the emotion expressed by the body is preferentially used for emotion categorization over the emotion expressed by the face. For example, intense positive and negative emotion expressions (e.g. the facial expressions made by professional athletes following positive/negative outcomes) are differentiated not by facial information, but by the body posture (Aviezer, Trope, & Todorov, 2012a). Even if body information does not dominate face information for emotion categorization, these two sources of emotion information interact in person processing, such that the body and face are “holistically” perceived in terms of overall emotion so long as they are presented in the typical aligned configuration (Aviezer, Trope, & Todorov, 2012b; Meeren, van Heijnsbergen, & de Gelder, 2005; Mondloch, Nelson, & Horner, 2013). Understanding the nature of emotion recognition from naturalistic images of people thus depends critically on characterizing the properties of both facial emotion recognition and body emotion recognition.
How does emotion recognition from static images of the body develop? Multiple studies have examined the development of facial emotion perception (Durand, Gally, Seigneuric, Robichon, & Baudouin, 2007; Gao & Maurer, 2009; Widen & Russell, 2008), but there is comparatively little work examining how children perceive emotion from body stimuli, especially static images. When presented with dynamic bodies, young children (3 to 5 years old) can recognize some emotion categories accurately (Boone & Cunningham, 1998; Nelson & Russell, 2011a), and older children (approximately 8 years old) exhibit near adult-like sensitivity to a wider range of categories (Vieillard & Guidetti, 2009). When viewing static bodies, young children categorize emotions at above-chance levels (Nelson & Russell, 2011b), but do not show face/body emotion congruence effects until six years of age (Mondloch, Horner, & Mian, 2013). The existing literature thus suggests that young children can categorize some emotions using static body posture, though the developmental trajectory of this ability has not as yet been characterized for a wide range of ages and emotion categories. In particular, the information children use to categorize emotion from body images is a largely open question: What visual features do children recruit for this task, and how does their vocabulary for emotion categorization change with development?
In the current study, our goal was to examine how children’s use of specific low-level features for body emotion categorization develops between the ages of five and 10 years. Specifically, how do children develop information biases for low-level image features (e.g. specific spatial frequencies or orientation energies)? Understanding the nature of such biases is a key step towards linking visual recognition to computations that are carried out in early stages of visual processing, and characterizing the developmental timecourse of their emergence places important constraints on how representations of complex object appearance are refined by experience and maturation. Adults exhibit information biases favoring horizontal information in multiple face recognition tasks (Dakin & Watt, 2009; Goffaux & Dakin, 2010; Pachai, Sekuler, & Bennett, 2013) including facial emotion processing (Huynh & Balas, 2014; Yu, Chai, & Chung, 2011). Children’s dependence on horizontal orientation energy for face detection (Balas, Schmidt, & Saville, 2015; Goffaux, Poncin, & Schiltz, 2015) and emotion recognition (Balas, Huynh, Saville & Schmidt, 2015) also changes during early to middle childhood, such that younger children do not show as much of a difference between performance with horizontal vs. vertical structure as older children and adults do.
What low-level features are used to categorize emotions from bodies? Critical orientation sub-bands are a particularly interesting feature set to use to address this issue. Besides being easily manipulated in natural images (like spatial frequency sub-bands), the information carried by the external contour, or silhouette, of a body is distributed across orientations as a function of posture and limb position. Changes in how the arms, trunk and legs are positioned will lead to substantial changes in how energy is distributed across orientation sub-bands, and these same postural changes also account for a great deal of variability across emotional expressions and other aspects of body language. While spatial frequency sub-bands may also vary somewhat as a function of these postural changes (and may therefore also be another interesting target domain), the intuitive relationship between local orientation energy, limb/body positions, and expressive poses made orientation sub-bands an intriguing target for our study.
For adults recognizing emotion from bodies, vertical orientations are more important than horizontal orientations (Balas & Huynh, 2015), which is the opposite of the pattern observed for facial emotion recognition. This differential dependence on horizontal vs. vertical orientation for emotion categorization as a function of stimulus category suggests that the development of these two aspects of emotion categorization may proceed on different timescales, or have different properties. Thus, we asked whether children’s abilities to recognize emotion from static body images exhibited an adult-like bias for vertical orientation energy early in childhood, or if the nature of this bias changes over the course of middle childhood. In particular, we considered two hypotheses: (1) Information use may develop in a manner similar to “Perceptual Narrowing” for own vs. other-race faces – young children may use information broadly (showing no bias for any one orientation band in particular), and narrow their representation to a preferred band as they age. (2) Young children may instead already have a “narrow” preference favoring vertical orientation energy, but show continued gains in performance with either the preferred band, the non-preferred band, or both.
Methods
Participants
Our final sample consisted of 60 participants, including 20 five to seven-year-old children (13 female), 20 eight to 10-year-old children (9 female), and 20 adults (10 female) between the ages of 18 and 25 years old. Child participants were recruited from community centers in the Fargo-Moorhead area and received compensation for participating in the study. Adult participants were recruited from the North Dakota State University Undergraduate Psychology study pool and received course credit for their participation. We obtained written informed consent from all participants (or their legal guardians) before beginning each testing session. Children over the age of seven also gave written assent to participate. Finally, all participants reported either normal or corrected-to-normal vision.
Stimuli
We selected a total of 20 body images (10 happy, 10 sad) from the Bochum Emotional Stimulus Set database (Thoma, Soria Bauser, & Suchan, 2013). These images depict male and female models with their faces obscured exhibiting a variety of poses expressing happy or sad affect. These poses incorporated variability within each emotion category such that specific positions of the arms and legs were not fixed within a category. The original grayscale images were 500x500 pixels in size.
To create our orientation band-limited stimuli (Figure 1), we applied custom code for manipulating the power spectra of the original images following a Fast Fourier Transform. Horizontally-filtered and vertically-filtered images were created by applying a Gaussian window with a standard deviation of 20 degrees to the power spectrum of the original image, centered either on 0 degrees or 90 degrees (Dakin & Watt, 2009). This is essentially a windowing procedure that removes parts of the power spectrum outside a “bow-tie” shape centered on the target orientation. Following the application of this envelope, we reconstructed each image using the new power spectrum. To create images containing both orientation bands, we transformed the original images’ power spectra by applying both the horizontal and vertical windows and reconstructing the images using these new spectra. This procedure thus resulted in a grand total of 60 images, since each original image yielded three unique band-limited stimuli. We matched the mean grayscale intensity value and the root mean-squared (RMS) contrast across all of our stimulus images, but we note that this matching does not rule out residual differences in local contrast between the images in different conditions.

Figure 1. Examples of orientation-filtered emotional bodies used in the current study. The top row depicts a body in a happy pose and the bottom row depicts a body in a sad pose. In each row, we present the original image, a vertically-filtered image, a horizontally-filtered image, and an image containing both the horizontal and vertical orientation sub-bands. Participants were only tested with the filtered images.
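The orientation filtering described in the Stimuli section can be illustrated with a short sketch. This is not the custom code used in the study: it is a minimal NumPy version written under several assumptions that the text does not specify. In particular, the function names are hypothetical, the Gaussian envelope is applied to the complex Fourier coefficients (scaling amplitude while leaving phase untouched), the "both" condition is built from the pointwise maximum of the two windows, and angles are expressed in the Fourier plane, where horizontal image structure carries its energy along the vertical frequency axis (90 degrees here).

```python
import numpy as np

# Assumed convention: angles are measured in the Fourier plane.
# Horizontal image structure (e.g. horizontal stripes) varies along y,
# so its energy lies on the vertical frequency axis.
KEEP_HORIZONTAL_STRUCTURE = 90.0
KEEP_VERTICAL_STRUCTURE = 0.0


def orientation_window(size, fourier_angle_deg, sd_deg=20.0):
    """Gaussian weighting over orientation in the Fourier plane ("bow-tie")."""
    freqs = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(freqs, freqs)      # fx along columns, fy along rows
    theta = np.degrees(np.arctan2(fy, fx))  # angle of each frequency component
    # Angular distance to the target; theta and theta + 180 are equivalent.
    delta = (theta - fourier_angle_deg + 90.0) % 180.0 - 90.0
    window = np.exp(-0.5 * (delta / sd_deg) ** 2)
    window[0, 0] = 1.0                      # always keep the DC (mean) term
    return window


def filter_orientation(image, *fourier_angles_deg, sd_deg=20.0):
    """Keep energy near one or more Fourier-plane orientations."""
    if not fourier_angles_deg:
        raise ValueError("pass at least one orientation")
    size = image.shape[0]
    if image.shape != (size, size):
        raise ValueError("this sketch assumes a square grayscale image")
    window = np.zeros((size, size))
    for angle in fourier_angles_deg:
        window = np.maximum(window, orientation_window(size, angle, sd_deg))
    # Scale each coefficient by the window, then invert the transform.
    return np.fft.ifft2(np.fft.fft2(image) * window).real


def match_mean_rms(image, target_mean, target_rms):
    """Rescale so mean intensity and RMS contrast (pixel SD) hit the targets.

    Assumes the image is not perfectly uniform (nonzero standard deviation).
    """
    centered = image - image.mean()
    return centered / centered.std() * target_rms + target_mean
```

For a 500x500 grayscale array `img`, the three conditions would then be `filter_orientation(img, KEEP_VERTICAL_STRUCTURE)`, `filter_orientation(img, KEEP_HORIZONTAL_STRUCTURE)`, and `filter_orientation(img, KEEP_VERTICAL_STRUCTURE, KEEP_HORIZONTAL_STRUCTURE)`, each followed by `match_mean_rms` with a common target mean and contrast.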
Procedure
We asked participants to complete an emotion categorization task using the filtered images described above. Specifically, we told participants that they would be presented with a series of pictures depicting a person who was happy or sad, and they should label each picture according to the emotion they thought was being expressed. Because filtered images look unusual, we told children that our pictures were taken by a “funny camera” that made the pictures look strange and that some parts of the picture might be hard to see. Children were asked to nonetheless make their best guess about whether the person was happy or sad. All participants categorized stimuli using a touch screen interface: On each trial, a single body appeared at the center of the display, with cartoon happy and sad faces appearing simultaneously below the body stimulus and offset to the left/right of center. To categorize each image, participants touched the cartoon face expressing their choice of emotion. The left/right position of the cartoon faces was randomized to ensure that children had to actively attend to each trial to respond correctly.
Participants completed the task using an 800×600 Elo touch-sensitive display, seated at a comfortable viewing and reaching distance from the display. At this distance, the body stimuli subtended approximately 5 degrees of visual angle, though this varied across participants due to variation in viewing/reaching distance. Stimulus order was pseudo-randomized for each participant (using the Shuffle.m function included in the Psychtoolbox extensions for Matlab) and each target image was presented until participants made a response by touching one of the cartoon images. Participants categorized each image twice per condition for a total of 120 trials. All response collection routines and stimulus display parameters were implemented using Psychtoolbox v3 (Brainard, 1997; Pelli, 1997).
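The trial structure described above (20 images, three filtering conditions, two presentations each, with the response faces placed at random) can be sketched as follows. The study itself used Psychtoolbox in Matlab; this Python version is only an illustration, the function and field names are hypothetical, and it assumes a plain shuffle with no additional ordering constraints, which the text does not specify.

```python
import random

CONDITIONS = ("vertical", "horizontal", "both")


def build_trials(n_images=20, repeats=2, seed=None):
    """Return a shuffled list of trials, one dict per presentation."""
    rng = random.Random(seed)
    trials = [
        {"image": image, "condition": condition}
        for image in range(n_images)
        for condition in CONDITIONS
        for _ in range(repeats)
    ]
    rng.shuffle(trials)  # assumed: unconstrained random order
    for trial in trials:
        # Randomize which side the cartoon happy face appears on.
        trial["happy_face_side"] = rng.choice(("left", "right"))
    return trials
```

With the default arguments this yields 20 x 3 x 2 = 120 trials, matching the total reported above.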
Results
For each participant, we calculated the proportion correct for both emotion categories and all orientation conditions (Table 1) and used these values to compute the hit rate (correct “happy” responses) and false alarm rate (incorrect “happy” responses) for emotion categorization in each orientation condition. Given these two values, we calculated both d’ (a signal detection measure of sensitivity) and c (a measure of response bias). In each case, we submitted these values to a 3×3 mixed-design ANOVA with orientation energy (vertical, horizontal, or both) as a within-subjects factor and age group (5 to 7 years old, 8 to 10 years old, and adults) as a between-subjects factor.
Table 1. Average Proportion Correct Within Each Age Group for Both Emotion Categories and all Filtering Conditions.
Note. Standard deviation is reported in parentheses within each cell. Five to seven-year-olds, N = 20; eight to 10-year-olds, N = 20; Adults, N = 20.
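The sensitivity and bias measures described above follow the standard equal-variance signal detection definitions, d’ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H is the hit rate and F is the false alarm rate. The sketch below shows one way to compute them for a single participant and condition. The function name is hypothetical, and the log-linear correction for hit or false alarm rates of exactly 0 or 1 is an assumption on our part, since the handling of extreme rates is not described in the text.

```python
from statistics import NormalDist


def dprime_and_c(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) with "happy" treated as the signal category.

    Assumption: a log-linear correction (add 0.5 to each count, 1 to each
    total) keeps the rates away from 0 and 1 so that z-scores stay finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion
```

Under this convention, a negative value of c indicates a tendency to respond “happy” and a positive value a tendency to respond “sad”. In the present design each participant contributes 20 happy and 20 sad trials per orientation condition (10 images per category, each shown twice).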
Sensitivity
Our analysis of d’ values (Figure 2) across filter conditions and age groups revealed main effects of age (F(2,57) = 33.0, p < 0.001, partial η2 = 0.54) and orientation energy (F(2,114) = 27.85, p < 0.001, partial η2 = 0.33). The main effect of age was driven by significant differences between five to seven-year-olds (M = 1.81, 95% CI = [1.57 2.05]) and both eight to 10-year-olds (M = 2.84, 95% CI = [2.60 3.08]) and adults (M = 3.11, 95% CI = [2.87 3.35]), but eight to 10-year-olds’ performance did not differ overall from that of adults. The main effect of orientation energy was driven by significant differences between performance in the Horizontal condition (M = 2.15, 95% CI = [1.96 2.34]) and both the Vertical condition (M = 2.74, 95% CI = [2.59 2.90]) and the Horizontal+Vertical condition (M = 2.87, 95% CI = [2.68 3.06]). Performance in the Vertical and Horizontal+Vertical conditions did not differ significantly.

Figure 2. Average d’ values across orientation filtering conditions as a function of discrete age groups (young children, older children, and adults).
We also observed a significant interaction between orientation energy and age group (F(4,114) = 3.05, p = 0.02, partial η2 = 0.097). To examine the nature of this interaction in more depth, we carried out two post-hoc univariate ANOVAs using difference scores computed by subtracting each participant’s performance in the Horizontal and Vertical conditions from their performance in the Horizontal+Vertical condition. These difference scores represent the cost of each type of band-limited filtering (Horizontal only/Vertical only), which we then analyzed as a function of age as a between-subjects factor. These tests allow us to examine how the costs of vertical filtering and horizontal filtering each affect sensitivity across ages, which is the key aspect of visual development we hoped to investigate in this study. These difference scores were not appropriate for characterizing how baseline performance (assessed using the Horizontal+Vertical condition) might change developmentally, however, which is why we did not use difference scores to characterize performance initially.
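The difference scores described above amount to two subtractions per participant. A minimal sketch follows, with a hypothetical function name and made-up illustrative d’ values rather than data from the study:

```python
def filtering_costs(d_prime):
    """Cost of each band-limited condition relative to Horizontal+Vertical.

    `d_prime` maps condition name -> one participant's d' in that condition.
    Positive costs mean sensitivity dropped when only one band was available.
    """
    baseline = d_prime["both"]
    return {
        "horizontal_cost": baseline - d_prime["horizontal"],
        "vertical_cost": baseline - d_prime["vertical"],
    }


# Illustrative values only (not taken from the reported data).
example = filtering_costs({"both": 2.5, "horizontal": 1.5, "vertical": 2.25})
```

Each participant contributes one horizontal cost and one vertical cost, and each set of costs is then compared across the three age groups in its own one-way ANOVA.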
The ANOVA examining the cost of horizontal filtering did reveal a main effect of age group (F(2,57) = 4.24, p = 0.019, partial η2 = 0.13), such that horizontal filtering incurred a larger cost in our youngest participants (M = 1.19, 95% CI = [0.78 1.61]) than in eight to 10-year-olds (M = 0.37, 95% CI = [-.042 0.78]). No other pairwise comparisons reached significance, including the comparison between five to seven-year-olds and adults. This latter outcome is obviously somewhat surprising, but this analysis still suggests that there are measurable differences in the cost of horizontal filtering across different developmental stages in childhood. Moreover, the ANOVA examining the cost of vertical filtering did not reveal a significant difference between age groups (F(2,57) = 1.00, p = 0.37, partial η2 = 0.034). Overall, these results thus suggest that the interaction we observed between orientation energy and age group is driven by a disproportionately large impact of horizontal filtering on younger children (5 to 7 years old) relative to performance when information in both orientation channels is available to them.
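The effect sizes reported for sensitivity can be checked against their F statistics, since for a single effect tested against its own error term, partial η2 = (F × df_effect) / (F × df_effect + df_error). The helper below is a hypothetical convenience function, offered only as a sketch for readers who wish to verify the reported values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its df."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# (label, F, df_effect, df_error) for the d' analyses reported above.
reported = [
    ("age", 33.0, 2, 57),
    ("orientation energy", 27.85, 2, 114),
    ("orientation x age", 3.05, 4, 114),
    ("cost of horizontal filtering", 4.24, 2, 57),
    ("cost of vertical filtering", 1.00, 2, 57),
]
effect_sizes = {
    label: partial_eta_squared(f, df1, df2) for label, f, df1, df2 in reported
}
```

Rounded to the precision reported in the text, these values agree with the reported partial η2 of 0.54, 0.33, 0.097, 0.13, and 0.034, respectively.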
Response criterion
Our analysis of c values across filter conditions and age groups also revealed significant main effects of age (F(2,57) = 90.9, p < 0.001, partial η2 = 0.76) and orientation energy (F(2,114) = 10.67, p < 0.001, partial η2 = 0.16). The main effect of age was driven by significant differences between all three age groups such that values of c were more positive with increasing age, while the main effect of filter orientation was driven by significant differences between response bias in the Horizontal condition (M = −0.77, 95% CI = [−.92 −.62]) and both bias in the Vertical condition (M = −1.04, 95% CI = [−1.20 −.88]) and bias in the Horizontal+Vertical condition (M = −1.13, 95% CI = [−1.31 −.95]). Response bias did not differ between the Vertical condition and the Horizontal+Vertical condition. Finally, the interaction between orientation energy and age did not reach significance (F(4,114) = 2.31, p = 0.062, partial η2 = 0.075).
Discussion
Our results reveal that the low-level tuning of body emotion recognition changes during childhood. First, we replicated Balas and Huynh’s (2015) observation that adults exhibit a vertical orientation bias for body emotion recognition, in contrast to the horizontal orientation bias that is typically observed for facial emotion recognition. Children exhibited qualitatively the same bias across our entire age range, but with important quantitative changes in the magnitude of that bias. Children’s overall performance increases with age, which is consistent with many prior reports examining face and body emotion recognition (Parker, Mathis, & Kupersmidt, 2013; Pitterman & Nowicki, 2004; Widen & Russell, 2008), but young children were disproportionately worse at using horizontal orientation energy for body emotion categorization. In terms of our candidate hypotheses, this result demonstrates that information use does not “narrow” with development in this task, but instead development is characterized by increasing competence with non-preferred information, and little change in performance for preferred information channels relative to a stimulus containing multiple features (our Horizontal+Vertical stimulus).
Several prior studies have demonstrated that body recognition performance improves during childhood in tandem with face recognition. Seitz (2002), for example, reported better body recognition performance in 10-year-olds relative to eight-year-olds, which was closely matched by performance with faces. Bank, Rhodes, Read, and Jeffery (2015) however, using a similar memory paradigm, found no difference between face and body performance when comparing six-year-olds, 10-year-olds, and adults. Specifically, performance in face and body memory tasks improved between ages six and 10, and again between age 10 and adulthood, with no interaction between age group and object category. Similarly, Robbins and Coltheart (2015) reported that static face and body matching and recognition developed similarly when comparing performance at ages eight and 10 to adult levels of performance, further suggesting a common developmental trajectory for face and body processing. Our work extends this literature to include the possibility that the tuning of face and body representations to low-level features like spatial frequency and orientation (as opposed to higher level aspects of appearance like static vs. dynamic appearance (Robbins & Coltheart, 2015) or configural/holistic processing (Mondloch, Horner, & Mian, 2013)), may also follow similar developmental trajectories even though the specific features that support recognition differ. Specifically, mechanisms for fine-tuning information use for emotion categorization (and maybe other tasks) may be well-characterized in terms of a unitary learning process that prioritizes rapid tuning to preferred information channels and ultimately leads to the slow acquisition of adult-like competence by gradual improvements in the use of non-preferred information.
The current study does have several important limitations that constrain the scope of our conclusions somewhat. First, our use of happy/sad emotion categories in the current task (rather than a broader set of emotions) does potentially limit the generalizability of our findings. These two emotions were shown by Mondloch, Horner, and Mian (2013) to be insufficient to induce congruency effects in children, and as such may not be a useful choice of emotion categories to support broad claims about the development of emotion recognition. Further, the use of only two emotion categories (whatever they might be) potentially limits the specific postural cues that signal emotion category. For example, distinguishing happy poses from sad poses in our task may be achievable by reducing the emotion judgment to something like an arm position judgment. While our stimuli included enough variability across images to preclude very simple strategies, an expanded set of emotions would also likely lead to a wider range of specific postural cues, which would clarify the extent to which we are measuring properties of emotion recognition or properties of pose judgments that can be used to infer emotion categories. Presently, we are confident that our results reveal a developmental change in how low-level features can be recruited, but allow that this may reflect posture judgments that could affect tasks beyond emotion categorization. Expanding the current design to include a wider range of emotions may also reveal whether or not the observed vertical orientation bias is a general property of body emotion recognition, or if this bias differs across emotion categories. To support more general conclusions about the nature of face and body representations across childhood, considering emotion categories within the framework of dimensional models like the circumplex model (Russell, 1980) may be particularly useful.
Indeed, considering how emotion categories are situated within a dimensional model under different filtering conditions may be one way to understand why such models have the structure that they do, or at least to understand the structure of emotion space developmentally subject to low-level appearance. Children’s structural representation of emotion does appear to develop within the timespan we have considered here (Gao, Maurer, & Nishimura, 2010) making this approach likely to reveal important properties of face and body feature vocabularies during childhood. Characterizing the relationships between specific low-level computations and low-dimensional models of emotion processing is an important step towards developing a more general account of how emotion recognition develops using both face and body information.
Besides broadening the emotion categories used in the present study, we suggest that it would also be fruitful to consider how a wider range of low-level visual features contribute to static body emotion recognition. It is curious, for example, that there remains very little work describing the contribution of different spatial frequencies to body emotion recognition. The dependence of facial emotion recognition on different frequency bands has, by contrast, been documented both in adult populations and in children (Deruelle & Fagot, 2005; Stein, Seymour, Hebart, & Sterzer, 2013; Vuilleumier, Armony, Driver, & Dolan, 2003). Examining how spatial frequency tuning for body emotion recognition develops during childhood would further reveal the extent to which information biases are present at early ages, and potentially provide more support for the observation made both here and in Balas, Huynh, Saville & Schmidt (2015) that a key feature of the development of emotion recognition in childhood may be the ability to use sub-optimal features for categorization. The use of information biases as a means of characterizing the development of emotion recognition could be further extended by using “informative” grayscale fragments of body images (Harel, Ullman, Harari, & Bentin, 2011), to examine how mid-level information biases are expressed in childhood. In general, we suggest that using well-defined low- to mid-level features as a means of describing the development of emotion recognition in particular (and visual object recognition more generally) is an important complement to studies examining the development of higher-level mechanisms like holistic processing, the use of dynamic information for recognition, or the integration of face and body information.
Finally, our focus on static bodies in the current study obviously limits the ecological validity of our results. The contribution of dynamic information to body emotion recognition appears to be relatively small developmentally (Nelson & Russell, 2011b), but even so, information biases for dynamic faces and bodies may differ from the biases we observe from static stimuli. While our current results do reveal intriguing aspects of how body emotion categorization develops, investigating these issues using stimuli that more closely resemble the bodies children and adults actually do categorize in real-world settings would be of obvious importance.
Conclusions
We have found that body emotion recognition in childhood, like facial emotion recognition, relies heavily on specific orientation sub-bands. Children, like adults, perform better when vertical orientation energy is available, and this bias is evident at the youngest ages we tested. We also showed that young children may be disproportionately bad at using sub-optimal image features in this task, which is also consistent with previous reports describing the development of information biases supporting facial emotion recognition. Overall, we suggest that our results support a common developmental trajectory for face and body emotion recognition, with similar developmental trends describing information usage even though the specific information used for recognition differs between faces and bodies.
Footnotes
Funding
The author(s) declared receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by NSF grant BCS-1348627 awarded to BB.
