Abstract
Successful social interactions depend on the ability to quickly evaluate emotional facial expressions. Research has shown that head orientation and eye gaze are informative affective signals. Across four experiments, we explored a novel eye-gaze cue grounded in a consideration of English spatial metaphors, where up connotes positive feelings (“I’m flying high”) and down connotes negative feelings (“I’m feeling low”). Participants either rated the valence of or categorised a set of sad and happy faces gazing in different directions along the vertical axis. We expected to find a spatial–valence congruency effect, where valence ratings and reaction times would be moderated by whether or not the face was gazing in a metaphor-consistent direction. The results partially supported this hypothesis: sad faces gazing upwards (as opposed to downwards) were rated as happier or more positive (Experiments 1 and 2) and classified slower (Experiments 3 and 4). This was true whether the looking direction was cued by eye gaze in front-view faces (Experiment 1) or by the orientation of profile faces (Experiments 2–4). In addition, this spatial–valence congruency effect was only reliable in the environmental frame of reference (Experiment 4). We found little evidence for a comparable effect of gaze direction on judgements of happy faces, suggesting that eye gaze along the vertical axis may differentially affect judgements of approach and avoidance-related emotional expressions. This has implications for the inferences scholars draw about underlying cognitive representations from observations of conventional metaphorical language.
The ability to quickly infer the emotional states of others based on a quick glance at the face plays an important role in daily life. Deficits in emotional expression identification have been linked to a variety of pathologies associated with impaired social functioning, including autism spectrum disorders (Celani et al., 1999), schizophrenia (Kohler et al., 2003), and alcoholism (Kornreich et al., 2001). As a result, researchers have worked to uncover the cognitive, developmental, and neural processes that support emotional face processing (e.g., Adolphs, 2002; Batty & Taylor, 2006; Ekman et al., 1987; Nook et al., 2015).
One important line of research is concerned with identifying cues that facilitate, moderate, or impair how people evaluate emotion in faces. In addition to examining evolutionary, cultural, and contextual influences on emotional face processing (e.g., Ekman et al., 1987; Fridlund, 2014; Jack et al., 2012; Russell, 1994; Wieser & Brosch, 2012), many studies have focused on the effects of spatial signalling with the head and eyes (e.g., Adams & Kleck, 2003, 2005; Bayliss et al., 2007; Ganel et al., 2005; Hess et al., 2007; Mignault & Chaudhuri, 2003; Rigato & Farroni, 2013; Sander et al., 2007). For instance, a bowed head makes an otherwise neutral face appear sadder and more submissive, expressing “inferiority” emotions like shame and embarrassment, while an upturned head makes the same face appear happier and more dominant, expressing “superiority” emotions like contempt and pride (Mignault & Chaudhuri, 2003). These effects are thought to reflect conserved communicative behaviours in mammalian evolutionary history (see also Darwin, 1872).
Head tilts in the horizontal plane also modulate emotion perception: angry faces are perceived more accurately when they are directly facing the observer, while fearful faces elicit more negative affect when they are oriented or looking towards the left or right, away from the observer (Hess et al., 2007). Similar effects occur when eye gaze alone signals where the face is looking (Adams & Kleck, 2003, 2005; Sander et al., 2007). These studies reveal that, in general, direct gaze (towards the observer) enhances the perception of approach-oriented emotions like joy and anger, while indirect or averted gaze (away from the observer) enhances the perception of avoidance-related emotions like sadness and fear. This dovetails with research showing that people have difficulty ignoring gaze direction when processing facial expressions, which may be a result of the natural tendency to track where and what others are attending to (Ganel et al., 2005; Rigato & Farroni, 2013). Taken together, these findings highlight the fact that emotional facial expressions are communicative tools, and that head orientation and eye gaze can serve as informative signals.
Verbal communication about emotions also involves spatial signalling, as people often use spatial metaphors to talk about how they are feeling (Lakoff & Johnson, 1980). In English, for example, talk of emotional valence is organised along a vertical axis, where a higher spatial position connotes happiness and positivity, and a lower spatial position connotes sadness and negativity. This is evident when we say things like, “She sank into a deep depression and is feeling quite low; we need to give her a lift and raise her spirits until she’s flying high.” This mapping between space and valence may have its origins in everyday experience: sad feelings are frequently accompanied by a drooping posture, bowed head, and drowsiness, while happy feelings are associated with a more alert and erect bearing and an upturned mouth. We may observe these spatial–valence relationships in ourselves or in our companions (though it is rare to see people literally jumping for joy).
Research into “spatial–valence congruency effects” has shown that this association between space and valence goes beyond language: we seem sensitive to these correlations in experience and incorporate them into semantic representations (cf. Pitt & Casasanto, 2020). Evidence for this comes from experiments showing people automatically activate metaphorically congruent spatial representations while processing valenced stimuli (and vice versa; e.g., Barber & Reimer, 2021; Brookshire et al., 2010; Casasanto & Dijkstra, 2010; Crawford et al., 2006; Lynott & Coventry, 2014; Meier & Robinson, 2004). In one study, for example, participants were faster and more likely to retrieve positive memories while making concurrent motor movements upwards, and faster and more likely to retrieve negative memories while making concurrent motor movements downwards (Casasanto & Dijkstra, 2010). In another experiment, participants were faster to identify positive words like hero when they appeared at the top of the screen than when they appeared at the bottom of the screen, while the reverse was true for negative words like liar (Meier & Robinson, 2004). Lynott and Coventry (2014) extended these findings using non-linguistic stimuli: in their study, participants responded faster to happy faces that appeared at the top, rather than the bottom, of the screen. These experiments provide support for the view that people use vertical spatial representations to organise the concept of emotional valence (though there is some debate about the nature of these underlying representations and whether they are truly “metaphorical”; see, e.g., Dolscheid & Casasanto, 2015; Lakens, 2012; Lynott & Coventry, 2014; Pecher et al., 2010. We return to this issue in the “General discussion” section).
Notice, though, that the vertical position of a stimulus on a screen is only one method among many for representing or drawing attention to the structure of space (Carlson-Radvansky & Irwin, 1993). Even within the vertical dimension, up and down can be cued both by stimulus position—as in the studies described in the previous paragraph—and by stimulus orientation. For example, rather than presenting a face at the top of a computer screen, attention could be drawn upwards by presenting a centrally located face looking or gazing upwards. Even basic spatial relations like “upright” and “orientation” must be defined with respect to a particular frame of reference. Objects that are upright with respect to a computer screen (environmental frame of reference) would appear upside-down to a person standing on their head (egocentric frame of reference). In other words, spatial relationships are multifaceted. A fuller consideration of this complexity can help us to understand precisely how people use space to represent valence, as well as how spatial signalling might affect how people process and perceive emotional facial expressions.
These observations raise the possibility that eye gaze might bear a more complex relationship to emotional face processing than previous studies would suggest. Because people appear to represent emotional valence along a vertical dimension, we reasoned that an averted gaze might have differential effects, depending on whether that gaze is directed upwards or downwards. Specifically, we hypothesised that upward-gazing faces would be perceived as relatively happier while downward-gazing faces would be perceived as relatively sadder, even when holding facial expression constant. We investigated this vertical orientation-based spatial–valence congruency effect across four experiments.
In Experiment 1, participants rated the perceived happiness levels of a series of front-view faces. The faces depicted stereotypically happy or sad expressions and the eyes were directed either upwards, downwards, or directly towards the observer. In Experiment 2, participants completed the same task, only this time the stimuli consisted of profile view faces rotated so that they looked upwards towards the top of the screen, downwards towards the bottom of the screen, or sideward towards the left or right sides of the screen. This helped ensure that subtle differences in facial expressions resulting from shifting eyes up or down in our front-view face stimuli were not driving our results. We included neutral faces as part of the stimulus set in Experiment 2 to explore the generalisability of the orientation-based spatial–valence congruency effects.
In Experiment 3, we used a speeded reaction time task as a more implicit measure of emotional face processing. Participants viewed the same rotated images of happy and sad profile faces from Experiment 2 and had to identify the emotion depicted in each face as quickly and as accurately as possible. While this task requires an explicit judgement, small differences in reaction times provide a more implicit measure of underlying representations. We expected reaction times to be moderated by whether or not the face was gazing in a metaphor-consistent direction. Finally, in Experiment 4, participants completed the same classification task while lying on their sides, thereby dissociating environmental and egocentric reference frames. In everyday experience, environmental and egocentric reference frames are highly correlated: most of the time we see faces that are upright in the world (environmental reference frame) while we sit or stand in an upright position (egocentric reference frame). The design of Experiment 4 allowed us to examine whether the effects of eye gaze on emotional face processing are more strongly tied to the reference frame of the world (environmental) or the individual (egocentric; see Davidenko & Flusberg, 2012).
All experiments received ethical approval from the first author’s Institutional Review Board. Data and materials for all experiments can be found on the Open Science Framework: https://osf.io/6br5s/.
Experiment 1
Method
Participants
We recruited 80 participants (33% female) via Amazon’s Mechanical Turk crowdsourcing website (Berinsky et al., 2012; Buhrmester et al., 2016), using the TurkPrime platform (now called CloudResearch; Litman et al., 2017). Similar within-subjects studies of emotional face perception frequently include significantly smaller sample sizes (N = 20–40 in Ganel et al., 2005; N = 31 in Mignault & Chaudhuri, 2003). Since we did not have a strong prediction about effect sizes we might observe in our studies, we aimed to recruit 80–100 participants in each of our four experiments. Based on our final sample sizes, we had 80% power to detect effect sizes of .06, .03, .04, and .1 in Experiments 1–4, respectively. We only sampled people who were at least 18 years old, living in the United States, and who had an excellent performance record on previous tasks (⩾90% approval). The average age was 32 (SD = 8.9). No participants were excluded from analysis.
Stimuli
Face stimuli consisted of 36 digital photographs (600 × 800 pixels) depicting front-view faces of six White individuals (3 men, 3 women), taken using the camera on an iPhone 6 smartphone. The face models included five faculty colleagues of the first author and the first author himself; the former were naïve as to the nature and design of the experiment. All photographs were framed from the shoulders up against an off-white backdrop.
Each model was photographed making six different facial expressions: (1) sad face gazing directly at the camera, (2) sad face with the eyes oriented upwards, (3) sad face with the eyes oriented downwards, (4) happy face staring directly at the camera, (5) happy face with the eyes oriented upwards, and (6) happy face with the eyes oriented downwards. This yielded a total of 36 face images (6 models × 2 emotions × 3 eye orientations). To capture the face images, the models were initially instructed to make a specific emotional display (e.g., sadness) while gazing directly at the camera. They were then told to hold their face as still as possible and only move their eyes upwards or downwards, in an effort to keep everything else about their expression constant besides the orientation of the eyes; see Figure 1. 1

Figure 1. Sample stimuli from Experiment 1.
Procedure
The experiment was created using Qualtrics online survey software. Participants were told that we were interested in how people perceive images of others and that they would be tasked with rating the level of emotion displayed in a series of faces. They were instructed to work as quickly and as accurately as possible and to not overthink things. We requested that their ratings reflect their first impression of each face.
Participants were then presented with each of the 36 face images one at a time in a randomised order. On each trial, instructions appeared at the top of the screen indicating that participants should use the slider bar, which appeared below the face image, to rate how happy they perceived the individual to be, on a scale from 0 (extremely unhappy) to 100 (extremely happy). The bar was initialised on 50/100. After rating all the images, participants completed a basic demographics questionnaire.
Results
We calculated mean happiness ratings for each combination of emotion and eye orientation for each participant, averaging across the six individual face models (see Table 1 and Figure 2). A 2 (Emotion: happy, sad) × 3 (Orientation: upward, direct, or downward gaze) repeated-measures ANOVA revealed a significant main effect of Emotion, F(1, 79) = 1,319.45, p < .001, ηp2 = .94, as well as Orientation, F(2, 158) = 60.63, p < .001, ηp2 = .43. 2 Participants rated the happy faces as significantly happier than the sad faces, t(79) = 36.3, p < .001, d = 4.06. While directly gazing faces did not differ from upward-gazing faces, t(79) = 0.76, p = .45, d = 0.09, downward-gazing faces were rated as less happy than both upward, t(79) = 8.25, p < .001, d = 0.92, and directly, t(79) = 9.66, p < .001, d = 1.08, gazing faces, partially supporting our initial hypothesis.
Table 1. Mean happiness ratings (and SDs) of faces in Experiment 1 by emotional expression (happy, sad) and eye-gaze orientation (upward, direct, downward).

Figure 2. Mean happiness ratings for happy and sad faces gazing upwards, directly towards the observer, and downwards in Experiment 1.
The model also revealed a significant interaction between Emotion and Orientation, F(2, 158) = 30.28, p < .001, ηp2 = .28. Therefore, we ran two additional repeated-measures analyses of variance (ANOVAs) to test for an effect of the orientation manipulation separately by emotional expression. The results revealed a significant main effect of orientation for both happy faces, F(2, 158) = 71.3, p < .001, ηp2 = .47, and sad faces, F(2, 158) = 16.1, p < .001, ηp2 = .17 (see Table 1 and Figure 2). Pairwise comparison t-tests (Bonferroni-corrected alpha = .017) revealed that, for the sad faces, happiness ratings increased as the eyes moved upwards: downward-gazing faces were rated as significantly less happy than directly gazing faces, t(79) = 2.64, p = .010, d = 0.30 and upward-gazing faces, t(79) = 5.08, p < .001, d = 0.57, and directly gazing faces were rated as significantly less happy than upward-gazing faces, t(79) = 3.41, p = .001, d = 0.38. For happy faces, on the contrary, directly gazing faces were viewed as the happiest. They were rated as significantly happier than both upward-gazing faces, t(79) = 3.95, p < .001, d = 0.44, and downward-gazing faces, t(79) = 11.38, p < .001, d = 1.27. Upward-gazing faces were rated as significantly happier than downward-gazing faces, t(79) = 7.66, p < .001, d = 0.86.
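The effect sizes reported for the pairwise comparisons in this section appear consistent with the common within-subjects convention d = t / √n, where n is the number of participants. The short sketch below checks that relationship against several of the reported values; the formula is an inference from the reported statistics, not a method stated in the text, and the function name is illustrative.

```python
import math


def cohens_d_from_paired_t(t: float, n: int) -> float:
    """Within-subjects effect size d_z = t / sqrt(n), with n = number of participants.

    Assumption: this is the convention behind the reported d values; the
    source does not say how d was computed.
    """
    return t / math.sqrt(n)


# (t statistic, reported d) pairs from Experiment 1, where n = 80 (df = 79).
reported = [(36.3, 4.06), (8.25, 0.92), (9.66, 1.08), (5.08, 0.57), (11.38, 1.27)]

for t, d in reported:
    computed = cohens_d_from_paired_t(t, 80)
    print(f"t = {t:5.2f}  computed d = {computed:.2f}  reported d = {d:.2f}")
```

Under this convention, the Bonferroni-corrected alpha of .017 quoted above is simply .05 divided by the three pairwise comparisons per expression.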
Discussion
In Experiment 1, participants rated the happiness levels of a series of front-view faces whose eyes were oriented towards different directions along the vertical axis. Based on the association between valence and space apparent in experience and language, we predicted that faces gazing upwards would be perceived as relatively happier while faces gazing downwards would be perceived as relatively sadder.
Overall, both happy and sad faces were perceived as the most unhappy when the eyes were gazing downwards, which is consistent with our hypothesis and with previous research (Mignault & Chaudhuri, 2003). The effects of an upward gaze differed for happy and sad faces, however. For sad faces, an upward gaze was perceived as the happiest, with a direct gaze yielding an intermediate level of happiness between upward and downward gazes. This linear pattern is consistent with our initial predictions. For happy faces, on the contrary, it was the direct gaze that yielded the highest level of perceived happiness, though an upward gaze was perceived as happier than a downward gaze. This difference may reflect the finding that a direct gaze facilitates the perception of approach-oriented emotions like happiness while an averted gaze facilitates perception of avoidance-oriented emotions like sadness (Adams & Kleck, 2003, 2005; Sander et al., 2007). We return to this asymmetry between happy and sad expressions in the “General discussion” section.
The fact that direct and averted gazes differentially impact judgements of sad and happy expressions suggests that the inclusion of both gaze types in Experiment 1 introduced a potential confound. Experiment 2 was designed to address this concern. We repeated the same basic method using profile-view faces so that all stimuli displayed an averted gaze. This had the added benefit of ensuring that it was really gaze direction that caused changes in perceived happiness, and not, for example, other visual features of the face that changed when the face models moved their eyes up or down (e.g., slight muscular changes throughout the face, or the visibility of the sclera). Two additional methodological issues were addressed in Experiment 2. We expanded the stimulus set from six academic emotion models (viewed six times each) to images of 24 face models from a validated face database (viewed three times each). We also changed the anchors on the rating scale from “unhappy–happy”—which may have primed the concept of joy and consequently changed how participants reacted to the happy versus sad faces—to “negative–positive”—a bipolar, relatively emotion-neutral scale. Experiment 2 also included a set of neutral faces to test the generalisability of this spatial–valence congruency effect to putatively non-emotional faces.
Experiment 2
Method
Participants
We recruited 93 new participants (40% female) from Mechanical Turk via TurkPrime/CloudResearch using the same inclusion criteria as Experiment 1. The average age was 33 (SD = 8.9). No participants were excluded from our analyses.
Stimuli
Face stimuli were drawn from the Karolinska Directed Emotional Faces Database (KDEF; Lundqvist et al., 1998). We selected 12 male and 12 female profile faces from the database, with three instances of each face (neutral, happy, and sad expressions), for a total of 72 individual face images. Pilot testing was used to ensure that the happy, sad, and neutral expressions in these faces were reliably differentiated and fit the emotion category labels used in the database. Using Adobe Photoshop, we then created a mirror-image version of each image along the vertical axis, so we had images of each face looking towards the left and right. The KDEF includes profile images in both directions, but creating our own mirror-image versions ensured that stimulus features were kept constant for both facing directions. By including and counterbalancing profile images facing both left and right, we were able to control for any effects of eye gaze along the horizontal axis. Research has shown that people tend to associate their dominant side (the right side for the vast majority of people, as ~90% of the population is right-handed; Papadatou-Pastou et al., 2020) with a positive valence because that side enables more fluid interactions with the environment (Casasanto, 2009, 2011). This design allowed us to both measure and control for any effects of right versus left gaze direction. 3
Next, we created versions of each of the images rotated upwards (gazing towards the top of the screen) and downwards (gazing towards the bottom of the screen). This yielded a total of 432 unique face images: 24 individuals × 3 expressions (happy, neutral, sad) × 3 gaze orientations (upward, sideward, downward) × 2 facing directions (left, right); see Figure 3.
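The stimulus count above follows from fully crossing the four factors. As a minimal sketch (the model identifiers are hypothetical placeholders, not KDEF IDs), the crossing can be enumerated as follows:

```python
from itertools import product

# Hypothetical model identifiers; the actual KDEF IDs are not reproduced here.
models = [f"model_{i:02d}" for i in range(1, 25)]
expressions = ["happy", "neutral", "sad"]
gazes = ["upward", "sideward", "downward"]
facings = ["left", "right"]

# Every combination of model x expression x gaze orientation x facing direction.
stimuli = list(product(models, expressions, gazes, facings))
print(len(stimuli))  # 24 * 3 * 3 * 2 = 432
```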

Figure 3. Sample stimuli from Experiment 2. KDEF image IDs: AF01, left-facing, and AM06, left-facing (mirror-reversed); happy, neutral, and sad expressions.
We then created six counterbalanced sets of 72 face images. Each set included three images of each of the 24 models in the database. For a given individual, the set would include a happy, neutral, and sad expression, all with the same gaze orientation and facing direction. For instance, a set might include happy, neutral, and sad images of Female #1, all gazing upwards and facing towards the left. Within each set, gaze orientation and facing direction were counterbalanced across the 24 individual face models. For example, in the same set that included images of Female #1 gazing upwards and facing to the left, the three images of Male #4 might all be gazing downwards while facing to the right. Each counterbalanced set included an equal number of images for each gaze orientation and facing direction.
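One way to build sets with these properties is a Latin-square-style rotation: treat the 3 gaze orientations × 2 facing directions as six cells and shift each model's cell by one from set to set. The sketch below illustrates that idea only; the rotation rule and the model identifiers are assumptions, and the source does not describe how models were actually assigned to cells.

```python
from collections import Counter
from itertools import product

MODELS = [f"model_{i:02d}" for i in range(1, 25)]  # hypothetical identifiers
EXPRESSIONS = ["happy", "neutral", "sad"]
# Six cells: 3 gaze orientations x 2 facing directions.
CELLS = list(product(["upward", "sideward", "downward"], ["left", "right"]))


def build_set(set_index: int) -> list[tuple[str, str, str, str]]:
    """Return the 72 images of one set as (model, expression, gaze, facing).

    Assumed rule: model m is assigned cell (m + set_index) mod 6, so each
    model keeps one gaze/facing cell within a set and rotates through all
    six cells across the six sets.
    """
    images = []
    for m, model in enumerate(MODELS):
        gaze, facing = CELLS[(m + set_index) % len(CELLS)]
        for expression in EXPRESSIONS:
            images.append((model, expression, gaze, facing))
    return images


sets = [build_set(s) for s in range(len(CELLS))]
print(len(sets), len(sets[0]))
print(Counter(gaze for _, _, gaze, _ in sets[0]))
print(Counter(facing for _, _, _, facing in sets[0]))
```

With 24 models and six cells, each set contains four models per cell, which gives equal numbers of images per gaze orientation and per facing direction.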
Procedure
The experiment was created using Qualtrics. Participants were given a similar set of instructions as in Experiment 1, with a few notable differences. First, they were told they would be rating how positive/pleasant or negative/unpleasant they perceived a series of emotional expressions to be. We noted that positive emotions include things like being happy, excited, and delighted, while negative emotions include things like being sad, miserable, or distressed. We also mentioned that the face images would sometimes be rotated clockwise or counterclockwise, and we asked that participants keep their own head upright and stationary as they completed the task.
Participants were then randomly assigned to rate one of the six counterbalanced sets of 72 face images, presented one at a time in a randomised order. On each trial, participants were instructed to use the slider bar that appeared below each image to rate how positive/pleasant (as opposed to negative/unpleasant) they perceived the emotion expressed by the face to be (0 = extremely negative, 100 = extremely positive). The bar was initially anchored at 50 on each trial. After rating all the images, participants completed our basic demographics questionnaire.
Results
We took the same analytic approach in Experiment 2 as in Experiment 1. We first calculated mean negativity/positivity ratings for each combination of emotion and gaze orientation for each participant, averaging across the 24 individual face models (see Table 2 and Figure 4). A 3 (Emotion: happy, sad, neutral) × 3 (Orientation: upward, sideward, or downward gaze) repeated-measures ANOVA revealed a significant main effect of Emotion, F(2, 184) = 1,121.98, p < .001, ηp2 = .92, as well as Orientation, F(2, 184) = 31.12, p < .001, ηp2 = .25. Pairwise comparisons confirmed that participants rated the happy faces as significantly more positive than the neutral faces, t(92) = 30.44, p < .001, d = 3.16, and sad faces, t(92) = 37.88, p < .001, d = 3.93. Participants also rated the neutral faces as more positive than the sad faces, t(92) = 22.83, p < .001, d = 2.37. Consistent with our hypothesis, upward-gazing faces were rated as significantly more positive overall than both sideward-gazing faces, t(92) = 8.19, p < .001, d = 0.85, and downward-gazing faces, t(92) = 2.96, p = .004, d = 0.31. Downward-gazing faces were rated as significantly more positive overall than sideward-gazing faces, t(92) = 4.72, p < .001, d = 0.49.
Table 2. Mean positivity ratings (and SDs) in Experiment 2 by emotional facial expression (happy, neutral, sad) and gaze orientation (upward, sideward, downward).

Figure 4. Mean positivity ratings for happy, neutral, and sad faces gazing upwards, sidewards, and downwards in Experiment 2.
The effect of Orientation was qualified by a significant interaction between Emotion and Orientation, F(4, 368) = 29.59, p < .001, ηp2 = .24. Therefore, we ran three additional repeated-measures ANOVAs to test for an effect of the orientation manipulation separately for the three facial expressions. The results revealed a significant main effect of orientation for neutral, F(2, 184) = 10.3, p < .001, ηp2 = .10, and sad faces, F(2, 184) = 64.89, p < .001, ηp2 = .41, but not for happy faces, F(2, 184) = 1.29, p = .278, ηp2 = .01. Pairwise comparisons revealed that, for neutral facial expressions, upward-gazing faces were rated as significantly more positive than sideward-gazing faces, t(92) = 4.31, p < .001, d = 0.45, and marginally more positive than downward-gazing faces, t(92) = 2.44, p = .017. There was no difference in how downward- and sideward-gazing faces were judged (using a Bonferroni-corrected alpha level of .017), t(92) = 2.23, p = .028, d = 0.23. For sad facial expressions, upward-gazing faces were rated as more positive than sideward-gazing, t(92) = 11.01, p < .001, d = 1.14, and downward-gazing faces, t(92) = 2.60, p = .011. In addition, downward-gazing faces were perceived as more positive than sideward-gazing faces, t(92) = 8.47, p < .001, d = 0.88.
Discussion
In Experiment 2, participants rated the emotional valence of a series of sad, neutral, and happy profile faces that were rotated to gaze upward, sideward, or downward. The results largely replicated what we found in Experiment 1 with front-view faces, with a few notable differences that extend our understanding of the relationship between gaze orientation and emotional valence perception.
In support of our hypothesis, we observed a spatial–valence congruency effect for sad and neutral faces: upward-gazing faces were rated as more positive than downward- and sideward-gazing faces. This is especially notable because the face images in this study were identical across orientation conditions, differentiated only by how they were rotated on the screen (in contrast to Experiment 1). Notably, for both sad and neutral expressions, the sideward-gazing faces were perceived as more emotionally negative than the downward-gazing faces, though this difference was not statistically significant for the neutral expressions when correcting for multiple comparisons. This may be a result of a general processing advantage for upright faces (Davidenko & Flusberg, 2012; Valentine & Bruce, 1988). Consequently, perceptions of negativity might be enhanced for sideward-gazing faces that express some negativity. Note that the neutral faces were rated, on average, slightly towards the negative end of the spectrum. Taken together, this suggests that the spatial–valence congruency effects observed in this study may be driven by the upward-gazing faces, a possibility we further explore in Experiments 3 and 4 using a speeded reaction time classification task.
There were no significant effects of orientation on ratings of the happy faces. This echoes what we found in Experiment 1, where the spatial–valence congruency effects were more evident for sad faces, and supports the view that perceptions of happy and sad expressions are differentially affected by eye-gaze orientation due to specific processing differences for different types of emotions. We explore this possibility in further detail in the “General discussion” section.
Because Experiments 1 and 2 relied on explicit ratings, however, the lack of eye-gaze orientation effects could also be the result of a ceiling effect on positivity ratings for the happy faces. In Experiment 3, we used a more implicit reaction time measure of emotional face processing to rule out this possibility and to test whether the observed spatial–valence congruency effects reflect fast and automatic activations of spatial representations of emotional valence.
Experiment 3
Method
Participants
We recruited 81 participants (73% female) from the Introduction to Psychology Participant Pool at a small public liberal arts college in the northeastern United States. The average age was 19 (SD = 1.2), and participants received course credit for their participation.
Materials and procedure
The experiment was created using PsychoPy software (Peirce, 2007) and was administered on a 21.5 in. iMac desktop computer. For Experiment 3, we selected 10 of the male and 10 of the female KDEF profile faces used in Experiment 2, with happy and sad instances of each model, for a total of 40 faces (we eliminated the two male and two female faces that elicited the least reliably differentiated expressions based on our initial pilot testing). We did not include neutral faces because this was a speeded reaction time task; pilot testing with these stimuli suggested that using a three-alternative forced-choice method yielded higher error rates and slower reaction times across the board, potentially obscuring meaningful signals in the data. Therefore, we used a two-alternative forced-choice design with only happy and sad faces. The images were cropped in Adobe Photoshop using an oval template to highlight the face and reduce the appearance of distracting and potentially memorable features like hair and clothing (see Figure 5).

Figure 5. (a) Sample stimuli and (b) a schematic diagram of the trial structure for both stimulus duration conditions in Experiment 3. KDEF image IDs: AF01SAFL and AM10HAFL (mirror-reversed).
On every trial, a black fixation cross appeared at the centre of the screen, which had a light grey background. After 500 ms, one of the faces appeared at the centre of the display in one of five possible orientations (0° = looking left or right; −90°, −45° = gazing downwards; 45°, 90° = gazing upwards; see Figure 5). All 40 faces appeared once in each of the 5 orientations for a total of 200 trials. Half of the faces were presented facing to the left, and half were presented facing to the right (counterbalanced across participants). The order of trials was fully randomised. Participants were instructed to respond as quickly and as accurately as possible as soon as a face appeared during a trial, pressing one button on the keyboard if the face was “happy” and another button if the face was “sad.” Participants used the “f” and “j” keys to respond, counterbalanced across participants.
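The trial structure described above (40 faces × 5 orientations, fully randomised, with facing direction counterbalanced across participants) can be sketched as a simple trial-list generator. This is an illustrative sketch, not the authors' PsychoPy script: the model numbering, the parity rule for assigning facing direction, and the assumption that facing is assigned per model rather than per image are all hypothetical.

```python
import random
from itertools import product

ORIENTATIONS = [-90, -45, 0, 45, 90]  # degrees; negative = gazing downwards
# 20 hypothetical model numbers x 2 emotions = 40 faces.
FACES = [(model, emotion) for model in range(1, 21) for emotion in ("happy", "sad")]


def make_trials(participant_id: int, seed=None) -> list[dict]:
    """Build one participant's 200-trial list in random order.

    Assumed counterbalancing rule: half of the models face left and half face
    right, and the assignment flips between even- and odd-numbered participants.
    """
    rng = random.Random(seed)
    trials = []
    for (model, emotion), angle in product(FACES, ORIENTATIONS):
        facing = "left" if (model + participant_id) % 2 == 0 else "right"
        trials.append(
            {"model": model, "emotion": emotion, "orientation": angle, "facing": facing}
        )
    rng.shuffle(trials)  # fully randomised trial order
    return trials


trials = make_trials(participant_id=0, seed=1)
print(len(trials))  # 40 faces * 5 orientations = 200
```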
Participants were also randomly assigned to one of two stimulus duration conditions. In the Unmasked condition (n = 40), the face image remained on the screen until participants pressed a response key. In the Masked condition (n = 41), the face image remained on the screen for 100 ms and was then replaced by a scrambled version of one of the images, created in Photoshop. This manipulation was included to test whether any observed spatial congruency effects emerge early in visual processing (i.e., within the first 100 ms); see Figure 5.
Before the main experimental task, participants completed 8 practice trials consisting of upright, front-view cartoon faces (two sad faces and two happy faces, each presented twice) to acclimate them to the task.
Results
We first calculated mean response times (RTs) for each combination of emotion and gaze orientation for each participant. We identified outliers using participant- and condition-specific means and standard deviations. That is, we excluded data from trials in which participants responded more than three standard deviations faster or slower than their mean RT for the condition (e.g., happy faces at 0°). Of note, this method of identifying outliers yields the same pattern of results with respect to the subsequent analyses as a method that uses the global mean and standard deviation to identify outliers, but it avoids disproportionately trimming data from participants or conditions that were slower on average. Overall, data from 232 trials (<2% of total) were identified as outliers and excluded from the analysis. Accuracy in the remaining trials was high. On average, participants responded correctly on 93.9% of trials (SD = 4.3).
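A minimal sketch of this participant- and condition-specific trimming rule is given below. It is not the authors' analysis code: the record layout (keys `participant`, `emotion`, `orientation`, `rt`), the function name, and the use of the sample standard deviation are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_outliers(trials, n_sd=3.0):
    """Drop trials whose RT lies more than n_sd SDs from the mean of their
    own participant x emotion x orientation cell.

    `trials` is a list of dicts with hypothetical keys
    "participant", "emotion", "orientation", and "rt".
    """
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"], t["emotion"], t["orientation"])].append(t["rt"])

    # Per-cell acceptance window: mean +/- n_sd sample standard deviations.
    bounds = {}
    for key, rts in cells.items():
        if len(rts) < 2:
            continue  # SD is undefined for a single trial, so keep it
        m, sd = mean(rts), stdev(rts)
        bounds[key] = (m - n_sd * sd, m + n_sd * sd)

    kept = []
    for t in trials:
        key = (t["participant"], t["emotion"], t["orientation"])
        lo, hi = bounds.get(key, (float("-inf"), float("inf")))
        if lo <= t["rt"] <= hi:
            kept.append(t)
    return kept
```

Because the window is computed within each cell, a participant who is slow overall is judged against their own distribution rather than the group's, which is the property the text highlights when comparing this approach with trimming around the global mean.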
Using reaction time for correct trials as our dependent variable, we ran a 2 (Emotion: happy vs. sad) × 5 (Orientation: −90°, −45°, 0°, 45°, 90°) repeated measures ANOVA with stimulus duration condition as a between-subjects factor. Of particular interest was a predicted metaphor-congruent interaction between Emotion and Orientation, which was statistically significant, F(4, 316) = 5.04, p < .001, ηp² = .060 (see Figure 6). Planned contrasts revealed that participants were faster to recognise happy faces (compared with sad faces) oriented at 90°, t(80) = 2.63, p = .010, d = 0.29. There were no differences in recognition time by emotional expression at other orientations (−90°, −45°, 0°), ts < 1.3, ps > .2. That participants were equally fast at classifying upright (0°) happy and sad faces suggests that there were no general processing differences for these stimuli. This makes the observed difference at 90° all the more striking.

Figure 6. Mean reaction times for happy and sad faces for each stimulus orientation in Experiment 3, collapsed across stimulus duration condition.
In addition to the predicted interaction, the model revealed a main effect of Orientation, F(4, 316) = 28.62, p < .001, ηp² = .266. Consistent with prior work on the effects of orientation on face perception (e.g., Davidenko & Flusberg, 2012; Valentine & Bruce, 1988), participants were fastest to respond to upright faces (0°) and were progressively slower to respond as the faces were rotated away from upright, indicating that they may have mentally rotated the faces to make their judgement. No other main effects or interactions were statistically significant, ps > .1 (see Footnote 4).
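For readers who wish to check the reported effect sizes, the sketch below applies two standard conversions: partial eta squared from an F ratio and its degrees of freedom, and Cohen's d for a paired contrast as t divided by the square root of the sample size. The text does not state which formulas were used, so treating these as the conversions behind the reported values is an assumption; the function names are ours.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    scaled_f = f_value * df_effect
    return scaled_f / (scaled_f + df_error)


def cohens_d_from_paired_t(t_value, n):
    """Standardised mean difference for a paired contrast, d = t / sqrt(n)."""
    return t_value / math.sqrt(n)


# Values taken from the RT analysis of Experiment 3 (n = 81 participants).
print(round(partial_eta_squared(5.04, 4, 316), 3))   # Emotion x Orientation
print(round(partial_eta_squared(28.62, 4, 316), 3))  # main effect of Orientation
print(round(cohens_d_from_paired_t(2.63, 81), 2))    # happy vs. sad at 90 degrees
```

Under these assumptions, the three printed values round to the figures given in the text (.060, .266, and 0.29).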
Error analysis
An analysis of error trials revealed a pattern similar to that of the RT data. Using error frequency as the dependent variable, we conducted a 2 (Emotion) × 5 (Orientation) repeated-measures ANOVA with stimulus-duration condition as a between-subjects factor. We found that people made more errors on trials that presented downward-gazing happy faces than trials that presented downward-gazing sad faces, and vice versa for upward-gazing faces, F(4, 316) = 5.46, p < .001, ηp² = .065. Planned contrasts revealed that participants made more errors in recognising happy, compared with sad, faces oriented at −90°, t(80) = 5.25, p < .001, d = 0.58, −45°, t(80) = 3.38, p = .001, d = 0.37, 0°, t(80) = 4.33, p < .001, d = 0.48, and 45°, t(80) = 2.87, p = .005, d = 0.32; there were no differences in error rates by emotional expression at 90°, t(80) = 0.14, p = .882, d = 0.02.
The model also revealed differences between the Masked and Unmasked conditions. Not surprisingly, participants in the Unmasked condition (mean accuracy = 96.4%, SD = 2.39) made fewer errors than those in the Masked condition (mean accuracy = 91%, SD = 4.13), F(1, 79) = 45.30, p < .001, ηp² = .364. Since the performance of participants in the Unmasked condition was close to ceiling, the interaction between Emotion and Orientation was only present for participants in the Masked condition. That is, the 2-way interaction between Emotion and Orientation was qualified by a 3-way interaction between Emotion, Orientation, and Condition, F(4, 316) = 3.95, p < .005, ηp² = .048.
In addition, this model revealed that people made more errors for happy face trials (M = 92.4% accuracy, SD = 5.64) than for sad face trials (M = 95.2% accuracy, SD = 4.59), F(1, 79) = 18.54, p < .001, ηp² = .190. This supports previous findings suggesting that an averted gaze facilitates the perception of avoidance-oriented emotions like sadness and disrupts the perception of approach-oriented emotions like joy (Adams & Kleck, 2003, 2005; Hess et al., 2007). People also made more errors as the faces were rotated away from upright, F(4, 316) = 13.6, p < .001, ηp² = .136, in line with previous research on face perception. As shown in Figure 7, however, the effect of orientation was only apparent for those in the Masked condition, F(4, 316) = 14.76, p < .001, ηp² = .157, likely due to the aforementioned ceiling effect for the Unmasked condition.

Figure 7. Mean number of errors for happy and sad faces for each stimulus orientation in each condition in Experiment 3.
Discussion
In Experiment 3, we asked whether the spatial–valence congruency effects in emotional face processing observed in the explicit ratings task used in Experiments 1 and 2 would generalise to a more implicit measure like reaction time in an emotional expression categorisation task. The answer was a partial “yes,” but with some important nuance. Participants tended to be faster and more accurate in classifying emotional expressions when the faces were oriented towards metaphorically consistent regions of space. On the reaction time measure, this was true whether the stimuli were masked after 100 ms or remained visible until response, suggesting that spatial representations of valence are activated quickly and automatically when people assess emotional faces (Brookshire et al., 2010).
Interestingly, these effects seemed to be driven largely by a decrease in performance for sad faces gazing upwards: while RTs for happy faces increased symmetrically as the images were rotated upwards and downwards away from upright, RTs for sad faces dramatically increased (and accuracy decreased) on upward rotations (i.e., metaphor-incongruent orientations). This is consistent with the explicit rating task data from Experiment 2, which revealed that orientation did not affect positivity judgements for happy profile faces, while upward-gazing sad profile faces were viewed as more positive than sideward/upright and downward-gazing sad faces. We explore possible explanations for this asymmetry in the “General discussion” section.
Notably, however, the results of Experiments 1–3 cannot address one key question: which way is up? Spatial relations like up, down, and orientation must be defined with respect to a particular frame of reference (Carlson-Radvansky & Irwin, 1993; Davidenko & Flusberg, 2012). When participants are seated at a computer in a typical lab study like Experiment 3, several spatial reference frames are conflated: faces that are oriented upwards with respect to the computer screen, the room itself, and the directional pull of gravity (environmental frames) are also oriented upwards with respect to the participant (egocentric reference frames). This makes it impossible to determine which reference frame(s) participants are using to represent the orientation of the faces, and thus which reference frame is driving the observed spatial–valence congruency effects.
Fortunately, there is a simple method for dissociating environmental and egocentric reference frames: tilt your head 90° to one side. Now profile faces that appeared to be gazing upwards in the environment will appear to be gazing sideward in your egocentric frame of reference (in an upright or upside-down orientation, depending on which way you tilt your head). In Experiment 4, therefore, participants completed the same task as in Experiment 3 while lying down on one side.
Prior research has shown that people process faces independently in both the environmental and egocentric reference frames. For example, Davidenko and Flusberg (2012) found that people were better at classifying and remembering images of faces that were egocentrically upright (as compared with egocentrically inverted) as well as environmentally upright (as compared with environmentally inverted). However, effects in the environmental reference frame were notably smaller. Given the particular importance of egocentric reference frames in face perception (e.g., Rossion, 2008; Troje, 2003), one possibility is that the spatial–valence mapping will be defined with respect to the orientation of the participant, that is, relative to the orientation of their head/eyes. Alternatively, our experience with faces in the real world, which are normally upright with respect to gravity even if we are tilted or on our side, may tie the spatial–valence mapping to an environmental frame of reference. Some spatial metaphors for emotional valence in English seem to reference the environmental frame specifically, as when we say, “things are looking up.” Identifying the reference frame(s) in which the spatial–valence mapping is defined can help us understand how these representations are learned and when they influence our behaviour (Davidenko & Flusberg, 2012).
Experiment 4
Method
Participants
We recruited 85 participants (69% female) from the Introduction to Psychology Participant Pool at a small public liberal arts college in the Northeastern United States. The average age was 19.2 (SD = 2.63) and participants received course credit for their participation. Data from four participants were removed from analysis because the computer crashed mid-session (n = 1), the participant was under 18 and could not give legal consent to participate (n = 2), or the participant’s error rate was extremely high (32% errors; n = 1), leaving 81 participants in the final sample.
Materials & procedure
The experiment was inspired by a series of studies by Davidenko and Flusberg (2012), but was similar in design to Experiment 3, with a few key differences: Instead of sitting on a stool at a computer workstation, participants began the experiment by sitting upright on a futon positioned at the back of the lab room. The computer running the experimental software was positioned on a low table in front of the futon. Participants first completed the same 8 practice trials featuring front-view cartoon faces that were used in Experiment 3. The only difference was that they used only their left hand to make the speeded response, using the “1” and “2” keys on the keyboard (counterbalanced across participants). After the practice trials, participants were instructed to lie down on their right side with their head resting horizontally on a flat pillow facing the computer screen.
For Experiment 4, we used 8 out of the 10 male and 8 out of the 10 female profile faces that we had used in Experiment 3, with happy and sad expressions for each one. This was intended to keep the experiment short enough to complete in a reasonable time frame, since each face appeared 8 times in Experiment 4 compared with 5 times in Experiment 3. The four faces we eliminated for Experiment 4 were chosen based on our original pilot subject ratings of how happy and sad the expressions looked (i.e., the two male and two female faces that scored lowest on these ratings on average).
On any given trial in Experiment 4, 1 of the 32 individual profile images (8 males, 8 females, 2 expressions each) randomly appeared in 1 of 8 possible orientations (see Figures 8 and 9). Participants saw each of the 32 faces in each of the 8 possible orientations, for a total of 256 trials.

Figure 8. Stimulus orientations in Experiment 4.

Figure 9. Mean RTs for happy and sad faces for each stimulus orientation and each frame of reference (environmental on the left) in Experiment 4.
Note that when participants lay on their right side to view these images, faces that were gazing upward or downward in one frame of reference (i.e., rotated 90° or −90° in that frame) were always gazing sideward and either perfectly upright or perfectly upside-down in the other frame of reference (see Footnote 5). This decoupling of the environmental and egocentric reference frames allowed us to investigate independent spatial–valence congruency effects in both frames of reference.
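The geometry of this decoupling can be illustrated with a short sketch. The angle convention (degrees counterclockwise in the screen plane, with 0 = rightward and 90 = upward) and both function names are assumptions introduced here, not part of the original method. The sign of the 90° offset is inferred from the statement in the General discussion that faces oriented upwards in the environment were oriented to the left relative to the observer.

```python
def egocentric_angle(environmental_angle, observer_tilt=90):
    """Re-express a gaze direction in head-centred coordinates.

    Angles are in degrees, measured counterclockwise in the screen plane:
    0 = rightward, 90 = upward, 180 = leftward, 270 = downward.
    observer_tilt=90 encodes lying on the right side under this convention.
    """
    return (environmental_angle + observer_tilt) % 360


def vertical_label(angle):
    """Label a direction as 'up', 'down', 'sideward', or 'oblique'."""
    labels = {90: "up", 270: "down", 0: "sideward", 180: "sideward"}
    return labels.get(angle % 360, "oblique")


# Directions that are vertical in one frame are horizontal in the other.
for env in (90, 270, 0, 180):
    ego = egocentric_angle(env)
    print(env, vertical_label(env), "->", ego, vertical_label(ego))
```

Under these assumptions, a face gazing upward in the environment (90°) maps to a leftward, and therefore sideward, gaze for the observer, while a face gazing sideward in the environment maps to an upward or downward gaze in the egocentric frame.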
Results & discussion
As with Experiment 3, we first calculated mean RTs for each combination of emotion and gaze orientation for each participant and identified outliers using participant- and condition-specific means. RTs that were more than three standard deviations faster (n = 4) or slower (n = 258) than a participant’s mean for a given condition were excluded from the analysis (1.2% of all trials). As with Experiment 3, the results of Experiment 4 are virtually identical if we use the global mean and standard deviation to identify and exclude outliers. Overall classification accuracy was quite good on the remaining trials. Participants responded correctly on 96% of trials (SD = 3.14).
Past research suggests that there are independent effects of spatial orientation on face perception in the environmental and egocentric reference frames (Davidenko & Flusberg, 2012). Therefore, we analysed the trial data separately for each frame, including only those trials where participants correctly identified the emotional facial expression.
Environmental frame
We first conducted a 2 (emotion: happy vs. sad) × 2 (orientation: −90° in the environment vs. 90° in the environment) repeated measures ANOVA with mean RT as the dependent variable. There was no main effect of emotion, F(1, 80) = 0.17, p = .684, ηp² = .002, or orientation, F(1, 80) = 2.97, p = .089, ηp² = .036. Crucially, however, there was a significant metaphor-congruent interaction between emotion and orientation, F(1, 80) = 4.13, p = .046, ηp² = .049. Participants were significantly faster to respond to downward-gazing sad faces than upward-gazing sad faces, t(80) = 2.76, p = .007, d = 0.31; they responded similarly to happy faces oriented upward versus downward, t(80) = 0.44, p = .663, d = 0.05. In sum, we observed the same spatial–valence congruency effect for sad faces we observed in Experiment 3 in the environmental reference frame; see Figure 9.
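In a fully within-subject 2 × 2 design, the interaction test is equivalent to a one-sample t-test on each participant's difference of differences, with F(1, n − 1) equal to the square of t. The sketch below illustrates that equivalence only; it is not the authors' analysis pipeline, and the data layout, key names, and toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def interaction_t(cell_means):
    """One-sample t on the per-participant difference of differences.

    cell_means: list of dicts, one per participant, with hypothetical keys
    "happy_up", "happy_down", "sad_up", "sad_down" holding mean RTs.
    Returns (t, degrees of freedom); the 2 x 2 interaction F equals t ** 2.
    """
    contrasts = [
        (p["sad_up"] - p["sad_down"]) - (p["happy_up"] - p["happy_down"])
        for p in cell_means
    ]
    n = len(contrasts)
    t = mean(contrasts) / (stdev(contrasts) / math.sqrt(n))
    return t, n - 1


# Made-up mean RTs (ms) for three participants, purely for illustration.
toy = [
    {"happy_up": 600, "happy_down": 600, "sad_up": 610, "sad_down": 600},
    {"happy_up": 600, "happy_down": 600, "sad_up": 620, "sad_down": 600},
    {"happy_up": 600, "happy_down": 600, "sad_up": 630, "sad_down": 600},
]
t_value, df = interaction_t(toy)
print(round(t_value, 3), df, round(t_value ** 2, 3))
```

By the same relation, the reported interaction of F(1, 80) = 4.13 would correspond to an absolute t(80) of roughly 2.03 on the difference-of-differences contrast.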
Egocentric frame
We repeated this analysis for trials where the faces were oriented upwards or downwards in the egocentric frame of reference. There were no main effects of emotion or orientation, nor was there an interaction between the two (Fs < .2, ps > .4). In other words, there was no spatial–valence congruency effect in the egocentric frame; see Figure 9.
Error analysis
In both the environmental, F(1, 80) = 23.06, p < .001, ηp² = .224, and egocentric, F(1, 80) = 6.54, p = .012, ηp² = .076, frames of reference, analyses of error rates revealed a main effect of emotional expression: in both reference frames people were less accurate when classifying happy faces than sad faces, consistent with what we observed in Experiment 3. There was no interaction between orientation and emotional expression in the error analysis of either frame of reference, Fs < .1, ps > .7.
General discussion
Many social interactions hinge on our ability to quickly assess the emotional experiences of others based on their facial expression. Research suggests that spatial signalling with the head and eyes provides important cues to affective experience and moderates how people communicate and process emotional faces (e.g., Adams & Kleck, 2003, 2005; Hess et al., 2007; Mignault & Chaudhuri, 2003). We investigated whether spatial signalling along the vertical axis, as indicated by eye-gaze or head orientation, would influence how people processed emotional facial expressions. This was motivated by a consideration of spatial metaphors for affective valence in English, where up in space connotes happy or positive feelings (“things are looking up!”) and down in space connotes sad or negative feelings (“I’m down in the dumps”). Evidence for spatial–valence congruency effects in RT tasks suggests that this association is not merely a matter of language; rather, it offers a window into how people mentally represent dimensional concepts like emotional valence (Meier & Robinson, 2004). We hypothesised that faces gazing upwards would be perceived as relatively happier while faces gazing downwards would be perceived as relatively sadder.
Across four experiments, we found partial support for this hypothesis: sad faces gazing upwards were perceived as happier in explicit ratings tasks (Experiments 1 and 2) and classified slower and less accurately in implicit reaction time tasks (Experiments 3 and 4). This was true whether looking direction was cued by eye movements in front-view faces (Experiment 1) or by the orientation of profile faces (Experiments 2–4). However, this spatial–valence congruency effect was only reliable in the environmental frame of reference (Experiment 4). We found little evidence for a comparable effect of gaze direction on judgements of happy faces, though the explicit ratings findings generalised to neutral expressions. These results provide evidence for a novel cue to emotional face processing—at least for sad faces—even as they enrich our understanding of spatial–valence congruency effects in cognition. The data offer support for the view that people represent the concept of emotional valence along a vertical dimension in the environmental frame of reference, which influences how they process and respond to certain emotional facial expressions.
One key question is why we observed a spatial–valence congruency effect for sad, but not happy faces. This may be a result of general differences in how people process sad versus happy facial expressions. Recall that a direct gaze facilitates the processing of approach-oriented emotions like happiness, while an averted gaze facilitates the processing of avoidance-oriented emotions like sadness (Adams & Kleck, 2003, 2005; Hess et al., 2007). Since our profile view stimuli were always gazing away from the observer, this may have disrupted how participants processed the happy expressions, making it less likely that metaphorical spatial representations of valence would affect judgements or speeded responses to these images. Some support for this possibility comes from the accuracy data in Experiments 3 and 4, which showed that people made more errors on the happy face trials than on the sad face trials. Conversely, the averted gaze associated with our profile faces should, if anything, facilitate how participants processed the sad expressions (Adams & Kleck, 2003, 2005). On trials where the sad faces were gazing upwards, participants would experience a conflict between two representations of valence: from the facial expression itself and from its orientation in space. This could bias explicit emotion judgements towards the positive or happy end of the spectrum and result in a slowdown in reaction time due to the general cognitive costs associated with resolving stimulus–response conflicts (e.g., Fan et al., 2003; Simon & Berbaum, 1990).
That the observed spatial–valence congruency effect was only reliable in the environmental frame of reference is somewhat surprising, however, as other research has found that effects of spatial orientation on face perception and memory are typically larger in the egocentric frame (e.g., Davidenko & Flusberg, 2012; Troje, 2003). One possibility is that this finding reflects the fact that real-world expressions of emotional valence—like slouching with the head downcast when sad or depressed—are reliably associated with the environmental and not the egocentric frame of reference. That is, people who are sad droop down relative to the directional pull of gravity, not relative to the head of the observer (unless the observer is upright in the environment as well). This could also help explain the asymmetry we observed between happy and sad faces: it is possible that our experiences seeing sad faces are more reliably associated with a drooping posture than our experiences of happiness are associated with any elevated posture.
That said, multiple environmental reference frames were conflated in the present study design (e.g., faces that were environmentally upright were upright with respect to the computer display, the lab room, and the directional pull of gravity), so future work is required to tease apart which one(s) people are using to structure their concept of affective valence. In addition, all participants completed Experiment 4 while lying on their right side (i.e., we did not counterbalance horizontal participant orientation). While this simplified our study design, it introduced a potential confound: all faces oriented “upwards” in the environment were also oriented to the “left” relative to the observer. Importantly, people tend to associate their non-dominant side (the left side for most people) with a more negative valence (Casasanto, 2009, 2011). While it is unlikely that this can account for the critical interaction between orientation and emotion we observed, as it would have affected responses to all stimuli, future studies should directly assess and control for the impact of participant positioning.
An alternative account of spatial congruency effects in reaction time tasks is a “polarity-based” explanation (Lakens, 2012; Louwerse, 2011; Lynott & Coventry, 2014; Proctor & Cho, 2006; but see Pecher et al., 2010). According to this account, dimensional stimuli like space and valence are always anchored at a default endpoint (+pole) that is typically more frequent and unmarked linguistically. In the case of valence, for example, “happy” is the default +pole: you can negate the unmarked term happy (i.e., unhappy) but not the term sad (i.e., “unsad” is not an English word). The polarity account attributes spatial–valence congruency effects to a generic processing advantage for +polar items and predicts people should be fastest to respond to stimuli that represent the confluence of two +poles. In support of this hypothesis, an RT asymmetry in favour of positively valenced stimuli higher up in space has been observed in several experimental studies, as “up” represents the +pole in space (e.g., Lakens, 2012; Lynott & Coventry, 2014; but see Dolscheid & Casasanto, 2015, for evidence against a universal polarity correspondence account of metaphor-congruency effects). In the present set of experiments, however, we do not observe a processing advantage for happy faces gazing upwards. Rather, we find a processing cost for sad faces gazing upwards. This could be evidence against the polarity account as a universal explanation for spatial–valence congruency effects. Alternatively, it could reflect the fact that our faces were presented in the centre, and not in the upper or lower regions of the screen. A face gazing upwards (i.e., stimulus orientation) might not “activate” the +pole representation in the same way that stimulus position does, in which case the polarity account would not make any concrete predictions in our tasks. That said, more research is needed to fully account for the response asymmetries observed in the current studies.
In addition to alternative accounts of spatial–valence congruency effects, there are other features of our studies that may have impacted our findings. For example, we did not assess or control for the state or trait emotions of our participants. While such measurements are not typically included in studies of emotional face perception (e.g., Davidenko & Flusberg, 2012; Gendron et al., 2012; Gregory et al., 2021; Sacharin et al., 2012), there is evidence that emotions can shape perceptual processing (Zadra & Clore, 2011). Given the consistent and converging findings across our experiments, it is unlikely that our key findings merely reflect the emotional characteristics of our participants. Nevertheless, we think that future work should take this into account and assess or manipulate the emotional state of participants. Similarly, we did not assess the handedness of participants, which has been shown to impact valence judgements in other studies (Casasanto, 2009, 2011). Specifically, people tend to associate their dominant side with a more positive valence. While there is little evidence that this impacted our key findings (see Footnote 3), future work on the role of spatial cues in emotional face processing should take the handedness of participants into consideration.
In sum, across four experiments we have provided evidence for a novel eye-gaze cue along the vertical axis that affects how people perceive and classify sad emotional expressions. This spatial–valence incongruency effect does not appear to extend to judgements of happy faces, however, which may reflect intrinsic differences in how people process approach versus avoidance-related emotional expressions. These findings have implications for scholars who use observations of conventional metaphorical language to draw inferences about underlying cognitive representations. While people use vertical spatial metaphors to talk about both positive and negative emotions, our studies reveal that vertical gaze cues in the environmental frame of reference only moderate judgements of negative emotional expressions. This is consistent with research showing that conventional metaphors may or may not reflect how people represent the target domain in question, and that experiments are necessary to make that determination (e.g., Casasanto, 2008).
Footnotes
Acknowledgements
The results of Experiments 3 and 4 were presented in the Proceedings of the 38th Annual Meeting of the Cognitive Science Society. The authors thank Nicolas Davidenko for helpful discussions of this project.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by an Emily & Eugene Grant Incentive Research Award to S.J.F.
