Abstract
A key feature of facial behavior is its dynamic quality. However, most previous research has been limited to the use of static images of prototypical expressive patterns. This article explores the role of facial dynamics in the perception of emotions, reviewing relevant empirical evidence demonstrating that dynamic information improves coherence in the identification of affect (particularly for degraded and subtle stimuli), leads to higher emotion judgments (i.e., intensity and arousal), and helps to differentiate between genuine and fake expressions. The findings underline that using static expressions not only poses problems of ecological validity, but also limits our understanding of what facial activity does. Implications for future research on facial activity, particularly for social neuroscience and affective computing, are discussed.
Facial behavior consists of dynamically changing configurations of morphological features as a function of unfolding patterns of underlying muscle activation. Since the publication of Darwin’s The Expression of the Emotions in Man and Animals (1872) there has been scientific interest in facial configurations that are usually referred to as “emotional expressions.” Much of this research has focused on particular patterns at (or very near) the peak intensity of facial movement. There are two types of questions about these patterns that are particularly relevant for emotion research: (a) Do such peak expressions correlate with specific feeling states or physiological responses? (b) Are such peak expressions seen by observers as typical for the presence of a particular emotion?
The former question is relatively easy to answer. There is only a loose coupling between different components of emotions—facial behavior is not a reliable indicator of feeling states or vice versa (e.g., Kappas, 2003; Mauss & Robinson, 2009). However, it is quite likely that the prediction of subjective experience can be improved when taking into account indicators of regulation, low-intensity behavior, and particularly dynamic measures. Answering the second question is more complicated. It relates to inferences that are drawn in interaction about the affective states of others. These inferences need not be correlated with the self-report of an interaction partner. Here the question is rather whether particular morphological features are associated with a particular perception or interpretation (see also Russell, Bachorowski, & Fernández-Dols, 2003). In this line of research, faces are typically presented in a visually isolated fashion (i.e., without a visual context) and without information about who is depicted and what the situation was at the time the picture was taken. To ensure that impressions or judgments are based on a particular facial pattern, researchers often employ actors to portray stereotypes of emotions, or specific predefined patterns of facial actions. In this case it is obvious that the encoders typically do not feel the emotions they are asked to portray. Judges are therefore asked to decode what the actor is supposed to be expressing. Overall, this research has been very successful in achieving its goals (see also Russell et al., 2003). Thus, it could be shown that particular static patterns are associated with the attribution of specific affective states—even across cultural boundaries. 
Despite certain criticisms of the methodology used in this research (e.g., number of response alternatives provided; see Russell, 1994), there is little doubt that there are stereotypical patterns of facial activation that are interpreted as representing particular states, for example, happiness, anger, or fear (see Ekman, Friesen, & Ellsworth, 1972).
In everyday language, as well as in the scientific discourse, the expression “to recognize an emotion” has been used confusingly but nevertheless consistently to mean attributing the label that the researcher intended, based on previous research and/or theory, rather than what the encoder actually felt at the moment. Of course, this also applies to research using synthetic or artificial stimuli, such as line drawings, which cannot logically refer to an underlying affective state.
The present article focuses on the question of whether and how the comparatively neglected dynamic aspect of facial behavior influences the perception of facial patterns and the attribution of emotion category labels, along with other aspects of emotion such as intensity or authenticity. Facial motion may convey information not only about the presence of an emotional state, but also its unfolding and ending, which can provide strong signals of actions and intent for researchers interested in a nonintrusive measure of emotion, or for interaction partners who are using verbal and nonverbal streams of information in a communication process (Kappas & Descôteaux, 2003). Given that the visual system evolved under dynamic conditions (see Gibson, 1966), it seems reasonable to assume that we are highly attuned to motion signals.
The primary objective of the present contribution is to review existing evidence on the role played by facial dynamics in the attributions of affective state. First, the effects of dynamic information are examined with respect to the recognition of emotions as belonging to the intended emotion category. This is followed by an overview of the effects of dynamics on emotion judgments more generally, and on behavioral responses and intentions. Finally, some conclusions concerning the role of dynamic aspects are drawn, finishing with a discussion of the long-term theoretical benefits of studying dynamic expressions.
Effects on Emotion Recognition Accuracy
Research using point-light or biological motion displays, line drawings, schematic and computer-animated faces suggests that movement enhances the accuracy of attribution of facial affect (Bassili, 1978, 1979; Bruce & Valentine, 1988; Wallraven, Breidt, Cunningham, & Bülthoff, 2008; Wehrle, Kaiser, Schmidt, & Scherer, 2000). These benefits of dynamic information are most evident when static information is limited (e.g., through degradation in geometry, shape, or texture). The beneficial effects of movement are somewhat weaker or redundant in natural or unmodified faces when complete spatial and textural information is available (e.g., Fiorentini & Viviani, 2011; Kamachi et al., 2001, Experiment 2). For example, comparing the identification of basic emotions from two stimulus types, Kätsyri and Sams (2008) and Ehrlich, Schiano, and Sheridan (2000) found a recognition advantage for motion in synthetic/schematic faces. However, no difference between static and dynamic displays was observed for natural faces. Similarly, in a study by Cunningham and Wallraven (2009a) using stimuli of varying resolution (i.e., animated full-surface faces, wireframe faces, and point-light faces), dynamic information generally led to higher recognition performance than static displays, but this difference tended to be larger for point-light faces with low spatial resolution. Motion therefore confers particular benefits when static information is inefficient or unavailable, thereby mitigating the negative consequences of degradation.
As well as the compensating role played by dynamic information under compromised conditions, a benefit is evident when it comes to people who have neurological or developmental disorders (e.g., brain damage or autism). In neuropsychological studies, dynamic presentation significantly facilitated emotion identification in adults and children who were unable to identify or impaired in identifying intended emotional expressions from static displays (Back, Ropar, & Mitchell, 2007; Harwood, Hall, & Shinkfield, 1999), supporting the assumption that different neural pathways underpin responses to moving and static stimuli (Adolphs, Tranel, & Damasio, 2003; Humphreys, Donnelly, & Riddoch, 1993).
One possible explanation for the benefit conferred by dynamic displays is that a moving sequence consists of multiple static images. Thus the effect of dynamic displays could be attributed to an increase in the amount of static information. However, this is not the case. Using normal human faces, Ambadar, Schooler, and Cohn (2005) showed that identification of subtle expressions was significantly better for moving sequences compared to “multistatic” images that contained the same number of frames, but with a mask interspersed between each frame in order to disrupt the apparent motion. Thus a dynamic sequence seems to provide a functionally distinct type of information that is not attributable to additional static cues.
However, the intensity of the facial expression moderates the extent to which moving displays result in emotion recognition benefits. Using both subtle and intense expressions, Bould and Morris (2008) demonstrated that the motion advantage for dynamic as opposed to multistatic displays was reduced for expressions of higher intensity (see also Kamachi et al., 2001, Experiment 2; Wehrle et al., 2000). When expressions are intense, it therefore seems that static faces are already strong carriers of emotional signals by corresponding to the shared stereotypes, leaving little scope for improvement through the provision of dynamic information.
In the case of lower intensity expressions it is worth considering how the provision of dynamic information helps perceivers to identify the emotion in question. Clearly, dynamics should enable perceivers to observe how expressions change over time. However, the role of motion extends beyond the mere detection of what has changed in the face. As demonstrated by Bould, Morris, and Wink (2008, Experiment 1), greater recognition benefits are afforded by dynamic moving sequences than by showing only the first (neutral) and final (peak) frame of an expression (but see Ambadar et al., 2005, Experiment 2, for contrasting results). The critical advantage seems to lie in the perception of the direction in which facial expressions change. This is supported by evidence showing that people are sensitive (even haptically; see Lederman et al., 2007) to temporal development and can accurately reproduce the temporal progression of a target person’s expression from a scrambled set of photographs (Edwards, 1998). Such adherence to temporal characteristics was found to be most apparent in the early stages of the expression (see also Leonard, Voeller, & Kuldau, 1991). By distorting the temporal direction, Cunningham and Wallraven (2009b, Experiments 3 & 4) demonstrated that the recognition of dynamic expressions significantly decreased when the order of frames was scrambled or reversed (played backwards). Thus, the dynamic advantage does not seem to be solely due to the presence of motion signals, but also arises from diagnostic information embedded in the temporal sequence of the expression.
Moreover, the quality of this embedded information plays an important role in the visual processing of facial expressions. Recent evidence suggests that linear motion animation, in which facial changes occur in a linear manner (as in morphing), results in slower and less accurate emotion recognition, as well as lower judgments of intensity, sincerity, naturalness, and typicality, by comparison with nonlinear (i.e., naturally deforming) facial motions of the same expressions (Cosker, Krumhuber, & Hilton, 2010; Wallraven et al., 2008, Experiment 1). Other studies have revealed that the speed with which the face moves significantly affects emotion identification. When speeding up or slowing down the velocity of dynamic change, observers’ performance and naturalness ratings varied in accordance with the type of emotion displayed (Bould et al., 2008, Experiment 2; Hill, Troje, & Johnston, 2005; Kamachi et al., 2001, Experiment 1; Sato & Yoshikawa, 2004). These findings therefore suggest that characteristics such as the direction, quality, and speed of motion are distinctive features of dynamic information that influence perceivers’ identification and discrimination of emotional expressions.
Effects on Emotion Judgments and Behavioral Responses
Apart from their beneficial role in emotion recognition, facial dynamics have been shown to contribute to various aspects of emotion judgments. For example, there is consistent evidence that dynamic expressions are perceived as more intense and realistic than static expressions (Biele & Grabowska, 2006; Cunningham & Wallraven, 2009a; Weyers, Mühlberger, Hefele, & Pauli, 2006). This perception of greater emotional intensity might reflect the fact that dynamic change implies a forward shift in the direction of the observed motion (commonly known as “representational momentum”). In a study by Yoshikawa and Sato (2008), participants perceived the final image of a dynamic sequence as being more intense than it objectively was. Moreover, as the velocity of facial movement increased, the perceptual image of the facial expression intensified. Facial dynamics may therefore lead to stronger emotional perceptions by inducing larger forward displacements in the apparent motion. Similar effects of dynamic expressions have been demonstrated through ratings of experienced and recognized emotional arousal (Sato, Fujimura, & Suzuki, 2008; Sato & Yoshikawa, 2007a).
In addition to enhanced judgments of intensity and arousal, observers are sensitive to dynamic information when judging the authenticity of an expression. For example, shorter durations (i.e., onset, offset) and more irregular onset actions have been found to be associated with judgments of politeness (rather than amusement), and lower genuineness and spontaneity in the case of smile expressions (Ambadar, Cohn, & Reed, 2009; Hess & Kleck, 1994; Krumhuber & Kappas, 2005). There is also supportive evidence for the impact of temporal dynamics on person ratings (Krumhuber, Manstead, & Kappas, 2007) and behavioral intentions and decisions of the observer. Specifically, children showed increased verbal responsiveness, and adults made more favorable employment decisions and more cooperative choices in response to smiles that had longer (compared to shorter) onset and offset durations (Bugental, 1986; Krumhuber, Manstead, Cosker, Marshall, & Rosin, 2009; Krumhuber, Manstead, Cosker et al., 2007).
Several studies have reported stronger and more frequent emotion-specific reactions to dynamic as opposed to static expressions (Sato et al., 2008; Sato & Yoshikawa, 2007b; Weyers et al., 2006). These imitative responses, interpretable as facial mimicry, occurred spontaneously and rapidly (Vinter, 1986) and were found to play a significant role in detecting the dynamic course of emotional facial expressions. For example, in a study by Niedenthal, Brauer, Halberstadt, and Innes-Ker (2001), participants who were prevented from mimicking took significantly longer to detect the point at which an emotional expression changed to a categorically different emotion (e.g., happiness changing into sadness, or vice versa), by comparison with when they were allowed to mimic. Other work has shown that spontaneous and deliberate smiles could be distinguished from each other on the basis of dynamic displays, but not static ones (Krumhuber & Manstead, 2009), and when participants could freely mimic the expressions (Maringer, Krumhuber, Fischer, & Niedenthal, 2011); when facial mimicry was blocked, perceivers’ ratings of the genuineness of smiles did not distinguish between those that were more or less authentic in their dynamic qualities. Facial mimicry may therefore help perceivers to detect the trajectory of dynamic displays and thereby facilitate the perception of the emotion in question. Together, these findings suggest that temporal dynamics convey unique information that is not only used for judging emotional expressions, but also drives behavior-specific responses and intentions in the perceiver.
Summary and Outlook
A key feature of facial behavior is its dynamic nature. Historically, facial expressions have been studied as slices of behavior, frozen in time, presented out of context, and portrayed by encoders who did not actually experience the emotions they were depicting. We have reviewed empirical evidence concerning the role played by dynamic features in the perception of facial emotional behavior. The beneficial effect of dynamic information on recognition accuracy, in the sense that decoders identified specific emotional states with greater coherence, was shown to be particularly apparent for degraded and subtle expressive patterns, and occurred over and above the additional static information contained in moving displays. The direction, quality, and speed of motion emerged as important components of this dynamic information, with significant advantages afforded by expressions which retained their original temporal sequences. Dynamic displays were also shown to enhance emotional judgments (i.e., intensity and arousal), as well as influencing inferences about emotion authenticity, such as whether an expression appears to be genuine or fake. Finally, these responses were found to be facilitated by spontaneous facial mimicry by those perceiving dynamic expressions.
Together, the findings provide strong support for the influential value of facial movements in emotional expressions. As emotions unfold over time, dynamic displays of the temporal sequence are not only of higher ecological validity, but also evoke differential neural activation. In several neuroimaging studies higher brain activity occurred in regions associated with the processing of social- (superior temporal sulci) and emotion-relevant information (amygdalae) when viewing dynamic rather than static expressive faces. Moreover, enhanced activation has been observed in areas linked to perceiving motion (middle temporal gyri) and form-related aspects of faces (fusiform gyri), as well as cognitive processes in general (inferior frontal gyri; Kessler et al., 2011; Kilts, Egan, Gideon, Ely, & Hoffman, 2003; LaBar, Crupain, Voyvodic, & McCarthy, 2003; Sato, Kochiyama, Yoshikawa, Naito, & Matsumura, 2004; Schultz & Pilz, 2009; for a review see Arsalidou, Morris, & Taylor, 2011). This neuroscientific evidence points to a neural network that facilitates social interaction by helping us to understand others, identify their needs, and predict their actions. It is worth noting that the function of such a system may go well beyond a simple mirroring system that employs empathy as its key concept, by blending “mind reading,” simulation, and empathy with projected behaviors that are also based on knowledge of others and of the situational and social context. Here, much research is needed that addresses the complex reality of nonverbal behavior in context, rather than the identification of stereotypical static patterns of facial actions.
To advance scientific knowledge of the communicative and relational processes engaged by facial expressions, the systematic study of the role of dynamic information should be a focus in future research. This requires sophisticated approaches to the measurement and analysis of temporal aspects of facial displays, together with continuous measurements of self-report (e.g., Affect Rating Dial by Ruef & Levenson, 2007; Dynamic Decoding Device by Tcherkassof, Bollon, Dubois, Pansu, & Adam, 2007) that allow subjective ratings to be made over time, as well as physiological responses (see also Mauss & Robinson, 2009). As facial expressions commonly appear alongside other verbal and nonverbal cues (e.g., gaze, head orientation, gestures, speech), emotion perception needs to be considered as a process in which several dynamic acts are temporally integrated to produce meaning. It is these dynamic patterns of facial expressions that demand future attention and multilevel analysis (see Krumhuber & Scherer, 2011; With & Kaiser, 2011). The broader issue is what nonverbal behavior does in interaction (Kappas & Descôteaux, 2003). Arguably, facial behavior can be understood as serving a variety of functions that include intra- and interpersonal emotional regulation (Butler, 2011; Kappas, 2011). From this perspective, the cohesion of subjective feeling state and expression may often be low because facial behavior is not simply a running commentary on what we feel, but also relates to conscious and unconscious attempts to influence what we feel, what others think we want, what we want others to feel, etcetera. In colloquial terms: facial behaviors do things to us and to others. We will fail to arrive at a proper understanding of what faces do if we continue to use static snapshots of faces as a paradigm for researching facial expressions.
With the emergence of technology in the field of affective computing, dynamic properties promise to be key factors in the automatic extraction and resynthesis of realistic human behavior. The first advances have already been made by incorporating dynamic data into the measurement and modeling of facial actions (e.g., Cosker, Krumhuber, & Hilton, 2011; Pantic & Patras, 2006; see also Calvo & D’Mello, 2010; Kappas, 2010). It falls to future research to make use of these techniques and to treat dynamic information as an integral part of facial behavior. Once the tradition of employing highly intense and prototypical static expressions has been overcome, we will be able to grasp the true nature and function of facial actions as they take place in everyday interactions.
