Abstract
Co-speech hand gestures offer a rich avenue for studying emotion communication because they serve as both prominent expressive bodily cues and an integral part of language. Despite this strategic relevance, the emotional function of gesture-speech integration and interaction has received less research attention than its cognitive function. This review aims to shed light on the current state of the field regarding the interplay between co-speech hand gestures and emotions, focusing specifically on the role of gestures in expressing and understanding both others’ and one's own emotions. The article concludes by addressing current limitations in the field and proposing future directions for researchers investigating gesture-emotion interaction. Our goal is to provide a roadmap for researchers exploring the role of gestures in emotions, ultimately contributing to a more comprehensive understanding of how gestures and emotions intersect.
Communicating emotions is complex and may sometimes require an orchestration of several multimodal cues. Speakers communicate emotions in speech through emotion-label words that refer to a particular affective state (e.g., sadness, anger), emotion-related words that refer to behaviors related to a particular emotion (e.g., crying, tantrum), and emotion-laden words that elicit emotions without referring to them directly (e.g., darling, idiot; Majid, 2012). These examples highlight that emotions can be communicated through categorical semantic information in speech. Additionally, speakers can communicate the strength, duration, and frequency of their emotional experiences in speech by qualifying their descriptions with adverbs (e.g., I am very happy, she never cries; Athanasiadou, 2007). Yet, speakers also communicate emotional information through bodily channels of expression such as facial expressions, body posture, pitch accent, and gestures (de Gelder et al., 2015; Witkower & Tracy, 2019). These signals complement the spoken cues and provide additional layers of meaning and context for the expression and decoding of emotions (Kessous et al., 2010). For example, when we talk about a recent achievement, our speech is accompanied by a high-pitched tone of voice and a wide grin, vividly conveying the extent of our excitement and happiness. Furthermore, bodily cues alone possess the ability to convey emotions, independent of any accompanying speech (Witkower & Tracy, 2019). For instance, it is easy to recognize disgust when someone makes a gagging sound, grimaces at the sight of a dirty sock, and waves their hand in front of their nose, representing the repelling smell. These examples demonstrate how both spoken and bodily cues, whether in concert or independently, influence the expression and perception of emotions.
Hand gestures play an important role in face-to-face communication, serving as a significant part of our repertoire of bodily cues. However, they are also an integral part of our language system. Many theories suggest that speech and gesture together constitute language by forming a composite and integrated signal during language production and comprehension (e.g., Clark, 1996; Kelly et al., 2010; McNeill, 1992). Gestures naturally intertwine with speech in everyday interactions to enhance the communicative experience for both speakers and listeners (Kendon, 2004; McNeill, 1992). In that sense, co-speech hand gestures offer a promising field of investigation for understanding the nuances of emotion communication, as they are part of both our bodily cues and the language system. Despite that, emotions have not been extensively studied within the framework of hand gestures. Researchers in the gesture field have primarily focused on the interaction of gestures with language and thought processes (Kelly & Ngo Tran, 2023). Emotion researchers, on the other hand, have focused less on hands than on other bodily cues, such as facial expressions (see de Gelder, 2009), and those who have explored hands did not treat gesture as part of the language system (e.g., Blythe et al., 2023; Dael et al., 2012; Ross & Flack, 2020).
In their recent review, Kelly and Ngo Tran (2023) highlighted the significance of uniting these two perspectives in investigating the interaction between hand gestures and emotions by reviewing relevant research from diverse fields, including psychology, affective sciences, linguistics, education, and human–computer interaction. Building on their work, in the current paper, we review recent psycholinguistic research that examines the interplay between gestures, as part of the linguistic system, and emotions to further emphasize the immense potential of studying gestures in deepening our understanding of both the communication and the experience of emotions. Moreover, and unlike Kelly and Ngo Tran (2023), we acknowledge certain challenges that might have contributed to the neglect of emotions in the study of gestures and outline these as caveats for researchers. We conclude by discussing areas open to further investigation in the field. By addressing both challenges and promising avenues, this article aims to provide a roadmap for researchers interested in exploring this long-overlooked relationship.
Hands as Bodily Cues for Emotions
Research that examines the roles of individual body parts in emotion communication has identified hands as prominent diagnostic cues for emotional experiences (Dael et al., 2012; Wallbott, 1998). For instance, Dael et al. (2012) showed that hand and arm movements together fell under the two largest factors in the principal component analyses of their body expressions data, explaining approximately 24% of the total variance. Corroborating this finding, Ross and Flack (2020) removed hands (and arms) from stimuli of whole-body expressions and tested the effect of this manipulation on participants’ emotion recognition accuracy. When both hands and arms were removed from the images, participants’ recognition accuracy for all emotions (i.e., happiness, sadness, fear, and anger) decreased significantly, and the accuracy for expressions of fear and anger dropped significantly even when only the hands were removed. Interestingly, recognition accuracy of emotions was not affected when only the arms were missing in the images (hence, the hands were floating in the air). Similarly, in an eye-tracking study, Calbi et al. (2021) showed that when looking at angry body postures compared to happy postures, participants directed both faster fixations and a higher proportion of fixations at the hands than at the head region. Following these, in a very recent work, Blythe et al. (2023) showed that emotions can be reliably identified from images of isolated hand expressions. More importantly, they asked participants to do the same emotion identification task while viewing other isolated body parts as well (arms, head, and torso) and found that accuracy was significantly higher for the hands compared to the other body parts.
These findings demonstrate that hands are indeed relevant and important in the study of emotion communication as they can serve as a useful cue to interpret others’ emotions. This line of research, however, treats hands in isolation, relying on posed hand movements without accompanying speech. In everyday communication, we typically gesture spontaneously with our hands while speaking. These co-speech gestures form an integrated system with speech and together they constitute language (Kendon, 2004; McNeill, 1992). Although it is very helpful in illustrating the relevance of hands in emotion communication, confining studies of gesture-emotion interaction solely to the examination of isolated posed hand shapes would fail to capture the full potential of hand gestures. Next, we will specifically emphasize how analyzing gestures together with accompanying speech can offer a deeper insight into how we communicate emotions through our hands. However, we will first briefly define what a co-speech gesture is (and what it is not).
What is a Co-Speech Gesture?
In our daily lives, we use our hands for various functions, many of which can convey emotional information. For example, we use our hands to perform goal-directed actions on objects such as touching, grasping, or rotating. Previous research demonstrated that the emotional state of individuals can be discerned from the way they interact with objects through such manipulative actions (Gao et al., 2012; Ghosh et al., 2019; Niewiadomski et al., 2022). For instance, machine learning models can be trained to infer an individual's affective states from their typing and swiping actions on a smartphone by correlating features such as typing speed, touch pressure, and error rate with self-reported affective states (Ghosh et al., 2019). Touching and grasping can also be employed in social contexts to communicate affect (Hertenstein et al., 2006, 2009; Kirsch et al., 2018). Touch behaviors such as patting and stroking are commonly associated with feelings of sympathy, while hitting and squeezing are more often linked to anger (Hertenstein et al., 2006, 2009). Similarly, self-touch behaviors (scratching, tucking hair behind the ear, fidgeting) are perceived as manifestations of a negative emotional state (Ekman & Friesen, 1969) and can be used to predict psychological distress (Lin et al., 2021). Although these manipulative actions and self-touch behaviors are integral bodily cues for examining emotional expressions (and are considered gestures in fields like human–computer interaction and affective science), they are not the focus of the current article, as they may not necessarily co-occur with speech and are typically not performed with a communicative intent (except for social touch).
Emblems, on the other hand, are hand movements that symbolize unambiguous messages shared among members of a particular culture (Efron, 1941; Ekman & Friesen, 1969), and speakers intentionally use these gestures to communicate their conventionalized meaning. While there are exceptions, it is common for emblems to carry emotional connotations in a specific cultural environment. For example, in certain cultures, the “victory” gesture, formed by raising the index and middle fingers in a V shape, is associated with feelings of triumph, success, and positive emotions. On the other hand, the “middle-finger” gesture (i.e., extending the middle finger while keeping the other fingers folded) is used in many cultures to express strong negative emotions, such as contempt or anger, toward the intended recipient. Due to the widespread understanding of the emotional meaning behind emblems within a community, their use can effectively disambiguate the emotional state of the individual making the gesture, thereby aiding emotion communication. Indeed, emblems have been shown to recruit attentional resources early on during stimulus processing, leading to rapid and efficient discrimination of emotional information (Flaisch et al., 2011; Redcay & Carlson, 2015).
While emblems are generally produced with a communicative intent, they can occur independently, without accompanying speech, due to their standalone meanings in a cultural context. Co-speech gestures, on the other hand, naturally accompany speech and typically rely on spoken words for interpretation (Cartmill & Goldin-Meadow, 2016; Goldin-Meadow, 2003). Speakers use gestures to communicate information along with speech during language production, and listeners analyze gestures for semantic information and benefit from gestures during language comprehension (Dargue et al., 2019; Hostetter, 2011).
Co-speech gestures can be classified as iconic gestures, which refer to concrete events, actions, or object attributes; metaphoric gestures, which refer to abstract concepts and ideas; deictic gestures, which indicate a direction, location, or object through pointing; and beat gestures, which are rhythmic hand movements that go along with the prosody of speech without any specific semantic meaning (McNeill, 1992). In this classification, iconic, metaphoric, and deictic gestures can be further grouped under referential gestures, as they imagistically or indexically represent certain referents, whereas beat gestures can be considered non-referential gestures, as they have no particular referent. Apart from this classification, gestures serve different functions such as emphasizing certain parts of speech or providing additional and/or disambiguating information to the accompanying speech (Cartmill & Goldin-Meadow, 2016).
When people communicate, co-speech gestures and speech form an integrated system, allowing for a synchronized and cohesive communication experience. Intriguingly, this integrated system is not exclusively used to convey meaning oriented to listeners. Apart from communicative functions, gestures serve many speaker-oriented cognitive functions during thinking and speaking, such as facilitating lexical retrieval (Krauss et al., 2000), packaging complex information-to-be-expressed into small chunks (i.e., linguistic units such as words or phrases) that are readily verbalizable, and activating, maintaining, and manipulating information for thinking and speaking processes (Kita et al., 2017; Wesp et al., 2001). Thus, this paper focuses on the interaction of this integrated system with both the communication and the experience of emotions. From this point on, for brevity, we will refer to co-speech gestures as simply “gestures.”
Gestures in Emotion Communication and Experience
Recent research focusing specifically on the gestures that accompany emotional speech has highlighted the supplementary role of gestures in the communication of emotions. For instance, Rowbotham et al. (2012) showed that the gestures produced by participants while describing their recent pain experiences contained additional diagnostic information regarding the size and location of the pain that was absent in the accompanying speech. Thus, the inclusion of these gestures provided supplemental details that enriched the communication of the pain experience beyond what was expressed through verbal means alone. This finding highlights the significant potential of gestures in facilitating a more precise and effective communication of complex sensations and emotions that may be challenging to express solely through words.
Gestures benefit the listener during communication by enhancing memory encoding and learning (Aussems & Kita, 2019; Dargue & Sweller, 2020; Roth, 2001). Studies suggest that using gestures, as in the case of enactment, enhances the construction of new representations in multiple modalities and thus leads to better encoding and retention of information (e.g., Cook & Fenn, 2017; Cook et al., 2010; Sweller et al., 2023). In a study by Guilbert et al. (2021), both children and adults were presented with videos of a narrator describing emotional narratives with or without accompanying gestures. For children, observing gestures improved recall performance regardless of the emotional valence of the narrative or the specific referent of the gestures inside the narratives. However, for adults, gestures did not have a similar boosting effect (observing gestures describing the main event in the narrative even hindered recall). In another study by Levy and Kelly (2020), adult participants were presented with emotionally valenced sentences accompanied either by gestures that conveyed additional information not present in the accompanying speech (e.g., a gesture illustrating bowling alongside the spoken sentence “My uncle played a perfect game”) or by no gestures. Overall, the presence of gestures enhanced memory performance in a surprise recall task. However, for neutral and positively valenced sentences, additional information conveyed by the gestures interfered with memory, leading to false recall (e.g., falsely recalling the target sentence as “My uncle played a perfect bowling game”). In contrast, memory for negative sentences remained relatively unaffected. These findings suggest that the impact of gestures on memory can vary depending on the age of the listener as well as the emotional valence of the accompanying speech.
In addition, gestures benefit the memory of the speaker, as producing them while encoding information in speech enhances its subsequent recall (Bharadwaj et al., 2022; Sweller et al., 2023). In the context of emotion communication, a similar enhancement could lead to heightened saliency of the encoded affective information for the speaker. In a recent study, L1-Turkish and L2-English speakers were asked to retell positive and negative emotional narratives immediately after they read them (Özder et al., 2023). Regardless of the narrative's valence and language, the frequency of referential gestures produced by participants during retellings correlated positively with participants’ subsequent intensity ratings for the emotional narratives. This indicates that participants who produced more gestures while recounting the emotional narratives tended to rate their emotional experience as more intense. Notably, when only emotional phrases in the retelling were analyzed, referential gesture production was associated with increased ratings of emotional intensity only for L2 speakers. This finding further hints at a facilitatory role of specifically referential gestures in the encoding of emotional information, particularly when speech is deprived of emotionality (as past research has demonstrated that L2 emotional words and phrases carry less emotional resonance than their L1 counterparts; Caldwell-Harris, 2015; Harris et al., 2003; Pavlenko, 2008).
In the same study, participants also used more referential gestures concurrently with negative emotional phrases compared to positive emotional phrases (Özder et al., 2023). This finding underscores the modulatory role of emotional valence in gesture use during emotion communication, specifically by highlighting the potential of negative emotions to stimulate more referential gesture production compared to positive emotions. However, this study cannot address the causal relationship between the intensity of negative emotions felt and the number of gestures produced, as participants were not randomly assigned to produce more or fewer gestures or to feel more or less negative emotions. Consequently, it is not clear whether more intense negative emotions cause more gestures or whether more gestures cause more intense negative emotions. Corroborating this finding, Rowbotham et al. (2014) demonstrated that the number of gestures produced during pain communication was associated with the severity of pain experienced by participants. Furthermore, this study also hints at a potential directionality for the relationship between gesture production and the intensity of negative emotions felt, as participants produced a higher number of gestures (both overall and referential gestures) while talking about their pain when they were subjected to greater experimental pain induction. Thus, while the preliminary findings point to a potential directionality, additional research is necessary to elucidate the precise direction of the causal relationship between gesture production and perception of emotional intensity.
Another function of gestures is to decrease the cognitive load of the speaker by offloading internal representations to an external space (Pouw et al., 2014). Gestures represent meaning in the visual-spatial medium of expression. Thus, gestures provide a useful tool as a physical entity in space to project internal representations to the exterior domain by providing visual feedback to the speaker. For example, when solving mental rotation problems, people might use their hands to represent the figures and make rotation gestures (Chu & Kita, 2008). Similarly, previous research has shown that individuals project their emotional states onto the perceptual environment via gestures based on their pre-existing affective-spatial conceptual mappings. For instance, Casasanto and Jasmin (2010) conducted a study on right- or left-handed politicians and found that they gestured more with their dominant hand during positive speeches and their non-dominant hand during negative speeches. In contrast, Kipp and Martin (2009) investigated the gestures of two right-handed stage actors and showed that the actors used their dominant hand (right-hand) to express negative emotions, such as hostility, and their non-dominant hand (left-hand) to express more positive emotions. While the exact function of affective-spatial mappings in gestural space for speakers remains to be explored, studies on comprehension suggest that such mappings are expected and tracked by listeners for information collection purposes. For example, in a study conducted by Çatak et al. (2018), participants focused more on the actor's left-hand gestures (i.e., the participant's right visual space) when the actor was describing a positive narrative, and vice versa for a negative narrative.
Although limited in quantity, these studies provide insights into how gestures are used to express emotions (e.g., Casasanto & Jasmin, 2010; Kipp & Martin, 2009; Rowbotham et al., 2012, 2014), contribute to the comprehension and recall of the content shared in emotion communication (e.g., Guilbert et al., 2021; Levy & Kelly, 2020), and modulate the emotional arousal experienced by the speaker (Özder et al., 2023). These studies highlight the intricate dynamics involved in conveying and interpreting emotions through gestures for both the listener and the speaker, allowing for a more comprehensive exploration of the multifaceted nature of emotion communication.
Studying Emotions Through Gestures is Challenging
Although the studies reviewed in the previous section are promising in showcasing the significance of studying gestures to gain better insight into emotion communication and experience, there might be a reason why their quantity is limited and why they have only recently garnered researchers’ attention. In this section, we will cover three such reasons that we think might have contributed to the oversight of emotions in the psycholinguistic study of gestures. Studying emotions through gestures is indeed challenging, but as the recent studies reviewed in the previous section demonstrate, it is certainly not impossible.
Gestures Excel at Conveying Spatial Information, but Emotions are not Typically Communicated Spatially
Gestures are visual cues realized in the visual-spatial modality. Given their medium of expression, they are particularly adept at expressing visual and spatial information (Alibali, 2005). Indeed, theories and accounts regarding gesture use and comprehension mainly focus on the spatial context, and hence on iconic gestures that refer to physical characteristics of concrete events and objects (de Ruiter, 2000; Hostetter & Alibali, 2008; Kita, 2000; Kita & Özyürek, 2003; Kita et al., 2017; Krauss et al., 2000; Pouw et al., 2014). A recent comprehensive framework (Gesture for Conceptualization Hypothesis, Kita et al., 2017) suggests that gestures are generated from the cognitive processes that generate practical goal-directed actions and that they help speakers and listeners activate, maintain, package, and explore visual-spatial and motoric information. In line with this account, many studies suggest that speakers use and benefit more from using gestures when communicating spatial than non-spatial information (e.g., Alibali et al., 2001; Arslan & Göksun, 2021; Feyereisen & Havard, 1999; Hostetter, 2011; So et al., 2015). This places emotions at a disadvantage when it comes to communicating through gestures, as affective information is not easily conceptualized and represented spatially. For example, it is challenging to represent “happiness” spatially through a referential gesture or by pointing to a referent in one's physical or conceptual environment.
People, however, still employ gestures when talking about non-spatial, abstract concepts (Kita et al., 2017). That is, the account applies to metaphoric gestures that refer to abstract concepts through links with the spatial-motoric mappings in the concrete domain (Cienki & Müller, 2008; Kita et al., 2017). According to Conceptual Metaphor Theory (Lakoff & Johnson, 1980), most of the metaphors we use in language map abstract concepts to concrete domains, such as space, by drawing upon our physical bodily interactions and experiences with the environment. Gestures, particularly metaphoric gestures that refer to abstract concepts, are conceptual mapping tools and express abstract concepts through visual-spatial depictions (Williams, 2008). For example, gestures express the flow of time by representing it as a movement in space (Casasanto & Jasmin, 2012; Gu et al., 2019), having an idea by cupping hands as if to hold an idea (Kita et al., 2007), inflation rates by placing the hand higher in space, and number magnitudes with the spatial location of the hands (Alibali & Nathan, 2012).
At times, we also communicate emotions abstractly by employing metaphors that map emotions onto spatial (e.g., “His mood sank,” “She stands tall with pride”) or other concrete domains (e.g., “I’m on an emotional rollercoaster,” “He is cold as ice,” “Love is a battlefield”). As reviewed previously, evidence demonstrates the spatial mapping of abstract emotions by associating right- versus left-hand gestures with positive or negative emotions (Casasanto & Jasmin, 2010; Kipp & Martin, 2009). However, the findings of these studies are mixed regarding the mapping of valence onto right- versus left-hand gestures. In addition, the results are often explained by handedness rather than metaphors such as “Right is good.” Thus, more research needs to be conducted to better understand how emotion metaphors, using different spatial mappings (e.g., “Happiness is up,” “Anger is a fluid in a container”; Lakoff & Johnson, 1980; Lakoff & Kövecses, 1987), may be communicated in the gestural space.
Other times, physically experienced affective states can be expressed through the kinematic characteristics of the gestures. For example, the rate at which someone produces their beat gestures while discussing a joyful experience could convey the intensity of their feelings. A faster rate of beat gestures may signal heightened excitement and even recency of the experience, while a slower rate might indicate a more subdued emotional state. Similarly, two individuals using the same iconic gesture to express anger may produce their gestures in different gestural spaces (e.g., at the center versus periphery), conveying the saliency of the emotional state for them at that moment. Alternatively, while sharing her experience of running to catch a bus, our friend may solely produce a running gesture by bending her fingers and moving them in an alternating forward motion, mimicking the action of running legs. The speed at which she moves her fingers could signal the level of anxiety she felt about potentially missing the bus.
These examples suggest that together with the type of gesture, the spatiotemporal characteristics of gestures can also provide valuable cues for a more precise communication of emotions. In a recent study, Asalıoğlu and Göksun (2023) investigated this possibility. Their results showed that when emotional narratives were accompanied by iconic gestures, participants rated the emotional intensity higher compared to narratives accompanied by beat gestures. Furthermore, narratives described with narrower gestures received higher emotional intensity ratings compared to those described with wider gestures, particularly when participants were exposed to different types and sizes of gestures in all trials. These results highlight the modulatory roles of both gesture type and size in shaping the perceived emotional intensity of communicated narratives.
The significance of spatiotemporal characteristics of gestures in emotion communication was also shown in studies that examined the kinematic cues of hand movements. Although they did not take spoken information into account, these studies found that movements that were high in velocity and acceleration were more likely to be classified as high arousal emotions (Dael et al., 2013; Glowinski et al., 2011; Pollick et al., 2001). Additionally, the same studies showed that jerkiness of the movement (i.e., movement discontinuity/disfluency) was associated with the arousal level of the expressed emotion, such that jerkier hand movements were more likely to be classified as displaying high arousal emotions (but see Dael et al., 2013). Similarly, Dael et al. (2013) investigated various other spatiotemporal characteristics of hand (and arm) movements, such as the amount of movement, physical effort (force), size, and height, and found that most of these movement dynamics were also related specifically to the arousal level of the emotion.
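To make the kinematic cues named above concrete, the following is a minimal sketch of how velocity, acceleration, and jerk could be summarized from a tracked hand trajectory. It assumes two-dimensional hand positions sampled at a fixed interval; the function names, the finite-difference approach, and the choice of mean magnitudes as summary measures are illustrative assumptions and are not taken from the cited studies.

```python
from math import hypot


def finite_difference(values, dt):
    """Forward differences of a sequence sampled every dt seconds."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def mean_magnitude(xs, ys):
    """Mean Euclidean magnitude of paired x/y components."""
    return sum(hypot(x, y) for x, y in zip(xs, ys)) / len(xs)


def kinematic_summary(xs, ys, dt):
    """Summarize a 2D hand trajectory (hypothetical tracking data).

    Returns mean speed, mean acceleration magnitude, and mean jerk
    magnitude. At least four position samples are needed for jerk.
    """
    if len(xs) != len(ys) or len(xs) < 4:
        raise ValueError("need at least four paired position samples")
    vx, vy = finite_difference(xs, dt), finite_difference(ys, dt)
    ax, ay = finite_difference(vx, dt), finite_difference(vy, dt)
    jx, jy = finite_difference(ax, dt), finite_difference(ay, dt)
    return {
        "mean_speed": mean_magnitude(vx, vy),
        "mean_acceleration": mean_magnitude(ax, ay),
        "mean_jerk": mean_magnitude(jx, jy),
    }


# A hand accelerating steadily along one axis (position = t squared).
summary = kinematic_summary([0, 1, 4, 9, 16], [0, 0, 0, 0, 0], dt=1.0)
```

In practice, raw tracking data would usually be smoothed before differentiation, because finite differences amplify measurement noise, particularly for jerk.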
Thus, emotion metaphors, linked to spatial domains, offer a fruitful avenue for investigation through metaphoric gestures. Additionally, individuals produce various types of gestures to communicate and conceal their affective states through spatiotemporal aspects of their gesture production (e.g., speed, size). This, however, introduces another methodological challenge to the field, one that requires a collective effort to tackle.
The Coding Schemes for Gestures Require a Revamp for Better Capturing Affective Information
The findings reviewed above suggest that a considerable amount of affective information might be embedded within the spatiotemporal features of hand movements, encompassing elements such as speed, size, shape, direction of motion, motion fluency, and place in the gestural space. This calls for a coding scheme that incorporates these features in its analysis to account for the diagnostic information that can only be identified by examining the kinematic features of gestures. However, in the most widely used gesture classification system in the psycholinguistic literature (McNeill, 1992), gestures are only categorized into different types based on their specific referents in speech (e.g., iconic, metaphoric, and deictic gestures) or the absence of such referents (e.g., beat gestures), largely ignoring the spatiotemporal features of gestures. While coding schemes incorporating such spatiotemporal characteristics do exist in fields like affective science and human–computer interaction, most of them are not specific to hand gestures (for review, see Witkower & Tracy, 2019), and those specifically designed for hand gestures do not incorporate speech (e.g., Dael et al., 2013). Kipp and Martin (2009) proposed a coding scheme that is specific to hand gestures and also incorporates speech. However, in this coding scheme, the semantic link between co-occurring gesture and speech (i.e., the referent of the gesture within the speech, if any) was not analyzed. Rather, the role of speech in the coding scheme was to categorize emotions, which were annotated at the utterance level. In addition, even though these coding schemes exist, none of them seem to capture all spatiotemporal characteristics of gestures comprehensively. Thus, moving forward, the field should endeavor to create a unified coding scheme that accounts for both the semantic and kinematic features of gestures to facilitate a more nuanced investigation of emotional gestures.
The establishment of a single comprehensive coding scheme is also vital for the advancement of the field in terms of allowing for the comparability and replicability of the research findings.
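As a purely illustrative sketch of what a single annotation record in such a unified scheme might look like, the example below combines a gesture-type label and lexical affiliate with a few kinematic and spatial fields. Every field name, category label, and unit here is a hypothetical assumption chosen for illustration; none is drawn from an existing coding scheme.

```python
from dataclasses import dataclass
from typing import Optional

# Gesture types following the classification described earlier (McNeill, 1992).
REFERENTIAL_TYPES = {"iconic", "metaphoric", "deictic"}
GESTURE_TYPES = REFERENTIAL_TYPES | {"beat"}


@dataclass
class GestureAnnotation:
    """One coded gesture; all fields are hypothetical illustrations."""

    gesture_type: str                 # semantic category of the gesture
    lexical_affiliate: Optional[str]  # referent in speech, None for beats
    stroke_onset_s: float             # stroke start, seconds from clip start
    stroke_offset_s: float            # stroke end, seconds from clip start
    peak_speed: float                 # units depend on the tracking system
    gesture_space: str                # e.g., "center" or "periphery"
    utterance_valence: str            # e.g., "positive", "negative", "neutral"

    def __post_init__(self):
        if self.gesture_type not in GESTURE_TYPES:
            raise ValueError(f"unknown gesture type: {self.gesture_type}")
        if self.stroke_offset_s < self.stroke_onset_s:
            raise ValueError("stroke offset precedes stroke onset")

    @property
    def is_referential(self) -> bool:
        return self.gesture_type in REFERENTIAL_TYPES

    @property
    def stroke_duration_s(self) -> float:
        return self.stroke_offset_s - self.stroke_onset_s


running = GestureAnnotation(
    gesture_type="iconic",
    lexical_affiliate="run",
    stroke_onset_s=1.0,
    stroke_offset_s=1.5,
    peak_speed=0.8,
    gesture_space="center",
    utterance_valence="negative",
)
```

Keeping the semantic and kinematic fields in one record would let the same annotated gesture feed both referent-based and movement-based analyses.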
Determining the Appropriate Level of Analysis for Examining the Semantic and Temporal Relationship Between Gesture and Speech in Emotional Contexts is Challenging
Another challenge in studying the interaction between emotional gestures and speech is determining the appropriate level of analysis for gesture-speech combinations. The level of analysis might include examining gestures in the context of entire narratives, isolated sentences, or specific types of emotional content, such as explicit emotion words. Investigating specific emotional words (e.g., I am angry) or phrases (e.g., I am in love) used by individuals during emotion communication is an obvious starting point for exploring the interaction between gestures and emotional speech (e.g., Guilbert et al., 2021; Özder et al., 2023). However, the challenge lies in recognizing that affective information may not always be conveyed solely within the gesture accompanying the respective emotion word. For instance, let us revisit our friend's experience of running to catch the bus: while recounting her experience, she might say, “I had to run to catch the bus. I got very anxious,” while simultaneously making the rapid two-finger running gesture with the word “run.” In such a case, the emotional content (“I got very anxious”) does not align with the gesture, which instead co-occurs with the action verb (“run”). Therefore, researchers focusing solely on gestures accompanying the emotional words in their analyses might overlook the broader context in which gestures occur and miss important information that details the speaker’s emotional experience. This highlights the importance of analyzing gestures in a broader speech context, rather than isolating them to specific emotional words or phrases. Moreover, individuals do not necessarily have to rely on overt emotional language when expressing their emotional experiences. In fact, it is possible for the entire content of speech to be emotionally charged, even without the explicit use of any individual emotional phrases (e.g., “I stared at the door for hours”).
In such cases, determining the appropriate level of analysis for investigating the semantic integration of gestures and speech becomes an intriguing methodological and theoretical question that requires researchers to operationalize their approach guided by their research questions.
Within a defined level of analysis, it would be further intriguing to explore the temporal relationship between gestures and emotional speech. Examining the timing of the onset of the gesture stroke in relation to its semantic affiliate during emotional speech (e.g., an emotional phrase) and comparing it to the temporal synchrony in non-emotional gesture-speech combinations would offer valuable insights into whether emotionality introduces any temporal advantages or hindrances to gesture-speech integration. Additionally, examining how these effects are modulated by the valence and intensity of the emotion would further enhance our understanding of the temporal dynamics between gestures and emotional speech. Previous studies have demonstrated the predictive potential of referential gestures by highlighting that gestures often precede their corresponding lexical affiliates in speech (Church et al., 2014; Graziano et al., 2020; ter Bekke et al., 2020). Expanding our investigation into the temporal dynamics of gesture-speech combinations within emotional contexts can help determine if this predictive potential extends to emotional expressions as well. Moreover, integrating these analyses with physiological measures can contribute to our understanding of how gesture onset and duration align not only with speech but also with the somatic markers of the affective sensory experience itself.
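The timing comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record structure (`GestureSpeechPair`), the field names, and the millisecond values are hypothetical toy inputs invented for illustration, not data or annotation conventions from any study cited here. Asynchrony is defined as stroke onset minus affiliate onset, so a negative value means the gesture stroke began before its semantic affiliate.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GestureSpeechPair:
    """One annotated gesture-speech combination (hypothetical structure)."""
    stroke_onset_ms: float     # onset of the gesture stroke
    affiliate_onset_ms: float  # onset of the semantic affiliate in speech
    emotional: bool            # True if the affiliate is emotional speech


def asynchrony_ms(pair: GestureSpeechPair) -> float:
    """Stroke onset minus affiliate onset; negative = gesture leads speech."""
    return pair.stroke_onset_ms - pair.affiliate_onset_ms


def mean_asynchrony(pairs: list[GestureSpeechPair], emotional: bool) -> float:
    """Mean asynchrony for either the emotional or non-emotional pairs."""
    values = [asynchrony_ms(p) for p in pairs if p.emotional == emotional]
    if not values:
        raise ValueError("no pairs in the requested condition")
    return mean(values)


# Toy values for illustration only; these are not empirical results.
pairs = [
    GestureSpeechPair(1200.0, 1350.0, emotional=True),
    GestureSpeechPair(5000.0, 5100.0, emotional=True),
    GestureSpeechPair(800.0, 850.0, emotional=False),
    GestureSpeechPair(3000.0, 3010.0, emotional=False),
]

emotional_lead = mean_asynchrony(pairs, emotional=True)
neutral_lead = mean_asynchrony(pairs, emotional=False)
print(f"emotional: {emotional_lead} ms, non-emotional: {neutral_lead} ms")
```

In an actual study, the per-pair asynchronies would enter a statistical model rather than a simple comparison of means, with emotion valence and intensity as additional predictors.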
Although the potential for emotions to interact with speech and gesture across various levels of analysis poses a methodological challenge, it also signifies new directions for the field to explore. Further research is necessary to comprehend the intricacies of how gesture and speech coordinate both semantically and temporally to convey emotional content.
Future Directions
The body of work reviewed in this article underscores the important role of gestures in facilitating effective and accurate emotion processing during communication, both for the listener and the speaker. Additionally, this research emphasizes that gestures transcend their role as mere bodily cues that provide categorical information about emotions. They possess the capacity to enhance emotional expressivity and communicate extra-linguistic information by modulating, for example, the perceived intensity of others’ emotional expressions as well as our own emotional experiences (e.g., Asalıoğlu & Göksun, 2023; Özder et al., 2023). While these preliminary findings are promising, much remains to be uncovered regarding the relationship between speech and gesture within the context of emotion communication and experience.
One important area of research that is open to further examination is the sources of variation in how and to what extent hand gestures play a role in the communication of emotions. The way individuals employ and benefit from gestures during communication and thinking varies and interacts with their cognitive dispositions (see Özer & Göksun, 2020 for review). Earlier research suggests that individuals might employ gestures as an alternative compensatory tool in cases of compromised verbal and visual-spatial abilities (Chu et al., 2014; Gillespie et al., 2014; Marstaller & Burianová, 2013; Özer et al., 2019). Likewise, in the case of emotion communication, gestures might be particularly important for people with suboptimal emotional abilities. For example, gestures might provide an important compensatory information source in certain neuropsychological conditions characterized by difficulties in facial emotion expression and recognition, such as schizophrenia (Chan et al., 2010), Alzheimer's disease (Weiss et al., 2008), and autism spectrum disorder (Briot et al., 2021). Relatedly, Kret et al. (2017) examined how high and low socially anxious individuals attend to different body parts as information sources for emotion identification. High socially anxious individuals avoid their interlocutors’ eye gaze and, more than low anxious participants, perceive emotions by gazing at the hands instead of the head. These findings suggest that hands could provide socially anxious individuals with an alternative source for the information they miss from facial cues. Future research should examine the extent of the compensatory role of emotional gestures in neuropsychological conditions in which the encoding and decoding of emotions are impaired.
Emotional gestures might also provide a compensatory tool for individuals with non-optimal linguistic skills to decode and express emotions, as suggested by the gesture-as-compensation-tool framework (Özer & Göksun, 2020). For example, emotional speech goes through a protracted developmental process. Children who do not have fully developed language skills for emotion communication might have limited understanding of emotion in speech compared to adults (Morton & Trehub, 2001) and have limited vocabulary to express and regulate their emotional state at a particular moment (Konishi et al., 2018). In such a case, hand gestures could provide a reliable alternative channel of information for children to express and identify emotions. For instance, previous research has shown that both preverbal infants and verbal toddlers (between 11 and 28 months of age) rely on more frequent and varied gestures than words as emotion regulation strategies, such as initiating a coping routine by asking a caregiver to read a book (Konishi et al., 2018). Importantly, the reliance on gestures has been shown to increase during heightened distress episodes when toddlers struggle to find their words, demonstrating the compensatory role of gestures when verbal skills are compromised. Future studies can examine a similar regulatory role of hand gestures in emotion communication in adults in situations where verbal means of accessing such strategies are limited. One example of such a context in which one's verbal skills are suboptimal is communication in a non-native language. Preliminary findings suggest that emotional speech in a second language is accompanied by more (referential) gestures compared to native speech, alluding to the existence of a similar compensatory mechanism in adults when verbal skills are limited (Özder et al., 2023).
Furthermore, cross-linguistic investigations of gesture use can provide an opportunity to explore potential variations in emotion expression and comprehension, especially across languages where emotion communication is constrained in diverse ways. For example, in Japanese, when talking about the mental states of other people, including their emotions, it is not appropriate to use psychological predicates (e.g., feel, want, think) without using a term that emphasizes the evidential nature of the information (e.g., looks like, seems as if; Hasegawa & Hirose, 2005; Majid, 2012). To compensate for the lack of direct linguistic expressions of emotions, Japanese speakers might rely more heavily on non-verbal cues, such as gestures, to convey emotional information. In addition, in a recent study, Jackson et al. (2019) analyzed the colexification structures of emotion vocabularies (i.e., the extent to which two or more emotion concepts are represented by the same word in a language) across nearly 2500 languages, revealing significant variations between languages in how words, which are often assumed to be translation equivalents, are used to label emotions. These findings raise an intriguing question for the field of gesture-emotion research: Do gesture patterns align with the cross-linguistic variance in emotion labels during communication, or do they exhibit more universal features? By examining these questions, we can not only learn more about the cross-linguistic similarities and differences in gesture use but also gain a deeper understanding of the nature of gesture-speech integration in emotion communication.
Overall, emotional gestural space can introduce an additional dimension of consideration to emotion theories concerning emotion-language interaction. For example, the constructivist theory of emotion proposes that emotions are constructed in light of the available contextual information (Barrett, 2006; Barrett et al., 2011). According to this theory, language provides a rich context that facilitates the employment of available and appropriate conceptual knowledge necessary for categorizing sensory affective experiences (Lindquist et al., 2015). To test this hypothesis, empirical studies have manipulated the accessibility of emotion labels and demonstrated that increasing language accessibility enhances both emotion perception and experience, while reducing it impairs them (Lindquist et al., 2015). In parallel, gestures, as part of our language, may also contribute to this meaning-making process by conveying an additional layer of information that describes the internal variations within a specific category of emotion, encompassing aspects such as its intensity, saliency, and duration. For example, while labeling our affective state as “fear” can help us solidify and put our emotional experience in context, the features of the gestures used to communicate this experience can further specify the affective state, signaling, for example, its level of intensity. Therefore, in alignment with research in the language-emotion field, exploring the impact of manipulating gesture accessibility on emotion perception and experience can be insightful. This can be studied in production by restricting or encouraging gesturing, and in comprehension by assessing how people infer emotions from video clips with different levels of gesture involvement. Considering the crucial role of gestures as a fundamental component of language, gaining a deeper understanding of their specific function in emotion communication can enhance our understanding of how language operates as a “context” during emotion construction.
Conclusion
Emotion communication and experience are multifaceted, often requiring the synchronous employment and interpretation of various spoken and/or bodily cues. Among these cues, co-speech hand gestures have long been overlooked despite their significant potential in expressing and comprehending emotions. In this review, we sought to highlight the impact that gestures have on emotion communication and experience. Recent studies have already recognized the unique role of gestures as an integral aspect of emotional language production and comprehension. We outlined specific challenges in studying the interaction between gesture and emotions, along with possible solutions. As future research continues to explore this interaction, we can advance our understanding of how gestures enrich the depth and meaning of emotions.
Acknowledgments
Tilbe Göksun is supported by the James S. McDonnell Foundation Human Cognition Scholar Award (Grant no: ). The authors thank the members of the Language and Cognition Lab at Koç University for their valuable discussions, which greatly contributed to the enhancement of the ideas discussed in this paper.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the James S. McDonnell Foundation Human Cognition Scholar Award.
