Abstract
Foveated stimuli receive visual processing that is quantitatively and qualitatively different from nonfoveated stimuli. At normal interpersonal distances, people move their eyes around another’s face so that certain features receive foveal processing; on any given fixation, other features therefore project extrafoveally. Yet little is known about the processing of extrafoveally presented facial features, how informative those extrafoveally presented features are for face perception (e.g., for assessing another’s emotion), or what processes extract task-relevant (e.g., emotion-related) cues from facial features that first appear outside the fovea, and how these processes are implemented in the brain.
Keywords
This article is premised on three facts whose combined implications are easily overlooked in research on face perception. First, face stimuli elicit characteristic patterns of fixation on facial features. Second, there are well-established differences between foveal and extrafoveal vision that are not limited to differences in spatial frequency sensitivity. Third, at normal interpersonal distances, another’s face occupies an area of the visual field large enough that not all features can fall within the fovea at once. In the first section, we introduce the basic behavioural and neurophysiological differences between foveal and extrafoveal vision and their relevance for face perception. In the second section, we review findings showing the characteristic patterns of fixation on facial features and how those features thus foveated inform judgments of facially expressed emotions. In the third section, we consider recent work that is uncovering a central role for the amygdala in three processes that are inherently related to differences between foveal and extrafoveal vision, namely, seeking out, fixating, and attending to features within a face. In the final section we return to discussion of fundamental differences between foveal and extrafoveal vision that arise at the very earliest stages of visual processing and that are likely to be reflected in how humans view faces and how their brains process them, but which do not yet inform cognitive neuroscience studies of face perception.
Foveal versus Extrafoveal Vision and its Relevance for Face Perception
During normal viewing our eyes sample the visual scene, directing a sequence of images to the fovea, a small region of the retina that corresponds to the central 2° of the visual field. The average height of an adult human face is approximately 18 cm (Fang, Clapham, & Chung, 2011). At what Hall (1966) calls the “close phase of personal distance” (~76–45 cm), a face will thus subtend visual angles of 13.4–22°; at far personal distances (~122–76 cm), 8.4–13.4°; and at close social distances (~213–122 cm), 4.8–8.4°. Therefore, under such conditions, another’s face occupies an area of the visual field large enough that not all features can fall within the fovea at once, and normal viewing elicits characteristic patterns of fixations, mostly on the eyes, nose, and mouth (e.g., Henderson, Williams, & Falk, 2005; Yarbus, 1967). Features falling outside the fovea nevertheless receive some visual processing, perhaps determining the next fixation location or even contributing directly to the extraction of socially relevant information, such as emotion and identity.
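The visual angles quoted above follow from simple trigonometry. The sketch below is illustrative only: it assumes the face is viewed face-on and centred on the line of sight, and the function and constant names are ours rather than taken from any cited work. With an 18 cm face it yields values close to the ranges given in the text (small differences depend on the approximation used), and it estimates how far away a face would have to be before its full height fell within a 2° foveal field.

```python
import math

FACE_HEIGHT_CM = 18.0  # average adult face height quoted in the text
FOVEA_DEG = 2.0        # approximate extent of the foveal field quoted in the text


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an object of a given size,
    centred on the line of sight, at a given viewing distance."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def distance_for_angle_cm(size_cm: float, angle_deg: float) -> float:
    """Viewing distance (cm) at which an object subtends the given angle."""
    return size_cm / (2.0 * math.tan(math.radians(angle_deg) / 2.0))


if __name__ == "__main__":
    # Boundaries of Hall's (1966) distance zones as given in the text.
    for distance_cm in (45.0, 76.0, 122.0, 213.0):
        angle = visual_angle_deg(FACE_HEIGHT_CM, distance_cm)
        print(f"{distance_cm:6.0f} cm -> {angle:5.1f} deg "
              f"({angle / FOVEA_DEG:4.1f} x foveal extent)")

    # Distance beyond which the whole face would fit within the foveal field.
    print(f"fits fovea beyond ~{distance_for_angle_cm(FACE_HEIGHT_CM, FOVEA_DEG):.0f} cm")
```

At every interpersonal distance listed, the face spans several times the foveal extent, which is the point the paragraph above makes.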
Little is known about how features within a face are processed in the visual periphery and parafovea, despite the well-established differences between extrafoveal and foveal vision. With increasing eccentricity, there is a decline in both visual acuity (i.e., the spatial resolving capacity of the visual system, which determines the ability to see fine detail) and contrast sensitivity (i.e., the ability to detect differences in contrast) (Robson & Graham, 1981). Detailed vision at fixation is supported by the high density of photoreceptors in the fovea and the high proportion of visual cortex dedicated to processing their signals (Tootell, Switkes, Silverman, & Hamilton, 1988). For many visual tasks, performance outside the central visual field can be equated to performance in the central visual field by simply scaling the size of the stimuli (Virsu & Rovamo, 1979). However, different tasks require different scaling factors (Vakrou, Whitaker, & McGraw, 2007), and for some tasks, simple scaling fails to equate performance (Hess & Field, 1993). Thus, differences between foveal and extrafoveal vision are not limited to differences in visual acuity and contrast sensitivity; peripheral vision is also thought to differ qualitatively from central vision, receiving different processing and optimised for different tasks (Strasburger, Rentschler, & Jüttner, 2011). Of particular significance are the effects of “crowding” in the periphery, whereby the recognition of detail is catastrophically impaired by nearby patterns or contours.
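The idea of equating peripheral and central performance by size scaling can be made concrete with one commonly used linear form, S(E) = S0 (1 + E/E2), where S0 is the foveal stimulus size, E is eccentricity, and E2 is the eccentricity at which the stimulus must double in size. The sketch below assumes that linear form. The E2 values are purely illustrative placeholders, not estimates from the studies cited above; they serve only to show how different tasks can demand very different scaling.

```python
def scaled_size_deg(foveal_size_deg: float, eccentricity_deg: float, e2_deg: float) -> float:
    """Stimulus size needed at a given eccentricity to match foveal performance,
    under the assumed linear scaling S(E) = S0 * (1 + E / E2)."""
    if e2_deg <= 0:
        raise ValueError("E2 must be positive")
    return foveal_size_deg * (1.0 + eccentricity_deg / e2_deg)


if __name__ == "__main__":
    # Hypothetical E2 values for two unnamed tasks (illustration only).
    illustrative_e2 = {"task A (shallow scaling)": 2.0, "task B (steep scaling)": 0.5}
    for task, e2 in illustrative_e2.items():
        sizes = [scaled_size_deg(1.0, ecc, e2) for ecc in (0.0, 5.0, 10.0)]
        print(task, [f"{s:.1f}" for s in sizes])
```

A smaller E2 means that the required size grows more steeply with eccentricity. Tasks for which no single E2 equates performance are the cases in which simple scaling fails.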
Little is also known about the processes that allow us “to seek out, fixate, pay attention to and make use of” (Adolphs et al., 2005, p. 71) facial features that are initially visible only in the extrafoveal visual field. Yet vision is an active process (Findlay & Gilchrist, 2003), and understanding how peripheral vision contributes to the perception of facially expressed emotions may provide important insights into the larger issues of normal and abnormal human social interactions.
Fixating and Making Use of Information from Features within a Face
The great majority of first fixations on a face are located on and around the central upper nose (between and just below the eyes) (Bindemann, Scheepers, & Burton, 2009; Hsiao & Cottrell, 2008; Kennedy & Adolphs, 2010). The location of second fixation is a little more varied, but mostly still around the centre of the face and the eyes (Bindemann et al., 2009; Hsiao & Cottrell, 2008). From the third fixation on, the classic triangular pattern of fixations on the eyes, nose, and mouth becomes evident (Bindemann et al., 2009). Overall, the eyes tend to be fixated most frequently or for longer, or both, compared to any other region of the face (e.g., Henderson et al., 2005; Yarbus, 1967), and observers have a strong preference for using information from the eyes across a variety of tasks (Schyns, Petro, & Smith, 2007; Vinette, Gosselin, & Schyns, 2004). Recent research indicates cultural variation in such scanpaths; in one study, for example, Western Caucasian observers distributed their fixations evenly across the central facial features, whereas East Asian observers consistently fixated the left and right eyes more often than they did the mouth (Jack, Blais, Scheepers, Schyns, & Caldara, 2009).
Abnormal patterns of fixation during facial viewing are associated with conditions involving impaired social cognition. Individuals with autism, for example, tend to fixate the eye region less than controls (e.g., Dalton et al., 2005), whereas children with Williams syndrome fixate the eyes more than controls (e.g., Riby & Hancock, 2008). Yet both groups tend to have difficulties identifying facially expressed emotions (e.g., Philip et al., 2010; Plesa Skwerer, Verbalis, Schofield, Faja, & Tager-Flusberg, 2006), which in some cases are predicted by their abnormal face scanning (Corden, Chilvers, & Skuse, 2008; Kliemann, Dziobek, Hatri, Steimke, & Heekeren, 2010). The link between looking behaviour and impaired function in these conditions suggests an important relationship between foveal and extrafoveal processing of facial cues to emotion.
There is some debate over whether scanpaths for faces differ as a function of the instructions given to the participants (e.g., Corden et al., 2008; Jack et al., 2009; Kennedy & Adolphs, 2010). Nonetheless, it is clear that people do make more or less use of different facial features as a function of the task, such as identity, sex, and emotion judgments. The “Bubbles” technique (Gosselin & Schyns, 2001) has been used to infer the parts and spatial frequency bands of an image that drive performance on a particular task. The ability to discriminate between facial expressions of different emotions can be driven by specific facial features; the eye region, for example, is especially diagnostic of fear and the mouth for happiness (Smith, Cottrell, Gosselin, & Schyns, 2005). One limitation of the Bubbles technique, however, which is also a limitation of the majority of face perception studies, is that face and expression processing are examined under free-viewing conditions in which observers are able to make (one or more) fixations on facial features. Such viewing conditions do not readily allow the teasing apart of foveal and extrafoveal visual processing so that their contributions to task performance can be examined separately.
To examine the distinct contributions of foveal and extrafoveal processing of facial features, the loci of fixations on the face image must be carefully controlled. The simplest method available is to present stimuli only briefly: Since a finite time is required to program and initiate a saccade, presentation and removal of the face image can be completed before an eye movement can redirect the fovea to a new location on the face. This manipulation has been used in a number of studies (e.g., Gamer & Büchel, 2009). The required brevity of stimulus presentation is somewhat controversial. Regular saccade latencies are of the order of 135–220 ms, but this includes time for fixation neurons of the primate superior colliculus to disengage (Fuchs, Kaneko, & Scudder, 1985; Wurtz, 1996), and removal of the fixation stimulus prior to presentation of the target may shorten the critical window for stimulus presentation to 90–120 ms (Saslow, 1967), though not for all tasks (Liversedge et al., 2004).
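In practice, a brief presentation is realised as a whole number of display refresh frames, so the achievable duration depends on the monitor. The sketch below is a minimal illustration with hypothetical function names and example durations. It converts a requested duration into frames and checks, conservatively, whether the presentation ends before the shortest saccade latency one is prepared to assume (the 135 ms and 90 ms lower bounds are taken from the ranges quoted above).

```python
def frames_for_duration(duration_ms: float, refresh_hz: float) -> int:
    """Number of whole refresh frames closest to the requested duration (at least one)."""
    return max(1, round(duration_ms * refresh_hz / 1000.0))


def actual_duration_ms(n_frames: int, refresh_hz: float) -> float:
    """Duration actually achieved when a stimulus is shown for n_frames frames."""
    return n_frames * 1000.0 / refresh_hz


def ends_before_saccade(duration_ms: float, min_latency_ms: float) -> bool:
    """Conservative check: the stimulus is removed before the earliest assumed saccade."""
    return duration_ms < min_latency_ms


if __name__ == "__main__":
    refresh_hz = 60.0                    # assumed monitor refresh rate
    for requested_ms in (80.0, 150.0):   # hypothetical candidate durations
        n = frames_for_duration(requested_ms, refresh_hz)
        shown_ms = actual_duration_ms(n, refresh_hz)
        print(f"{requested_ms:.0f} ms requested -> {n} frames = {shown_ms:.1f} ms; "
              f"before 135 ms: {ends_before_saccade(shown_ms, 135.0)}, "
              f"before 90 ms: {ends_before_saccade(shown_ms, 90.0)}")
```

Which latency bound is appropriate depends on the task and on whether the fixation stimulus is removed before the target appears, which is why the required brevity remains a matter of debate.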
A more sophisticated experimental manipulation—gaze-contingent stimulus presentation—is to record eye movements during stimulus presentation and to update the stimulus depending on where the observer is looking, thereby controlling the information that is directed to particular retinal locations. This technique was first employed to elucidate the relative roles of foveal and parafoveal processing in reading (Rayner, Inhoff, Morrison, Slowiaczek, & Bertera, 1981). Recently, several labs have used a gaze-contingent “spotlight” to remove extrafoveal information during free viewing of whole faces (Caldara, Zhou, & Miellet, 2010; Kennedy & Adolphs, 2010; van Belle, De Graef, Verfaillie, Rossion, & Lefèvre, 2010).
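The logic of a gaze-contingent “spotlight” can be sketched in a few lines. This is a generic illustration under stated assumptions, not the implementation used in any of the cited studies: the image is a plain list of pixel rows, the aperture is a hard-edged circle, and the gaze position is supplied by the caller, whereas a real experiment would read it from an eye tracker on every frame. All names and parameter values are hypothetical.

```python
import math


def spotlight_mask(width_px, height_px, gaze_xy_px, radius_deg, px_per_deg):
    """Boolean mask that is True for pixels within radius_deg of the gaze position."""
    gaze_x, gaze_y = gaze_xy_px
    radius_px = radius_deg * px_per_deg
    return [
        [math.hypot(x - gaze_x, y - gaze_y) <= radius_px for x in range(width_px)]
        for y in range(height_px)
    ]


def apply_mask(image, mask, background=0):
    """Keep pixels inside the spotlight; replace everything else with the background value."""
    return [
        [pixel if visible else background for pixel, visible in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    # Toy 9 x 9 "face" of uniform intensity, with gaze at its centre.
    face = [[255] * 9 for _ in range(9)]
    mask = spotlight_mask(9, 9, gaze_xy_px=(4, 4), radius_deg=1.0, px_per_deg=2.0)
    for row in apply_mask(face, mask):
        print("".join("#" if pixel else "." for pixel in row))
```

Recomputing the mask whenever the gaze sample changes restricts the visible information to the foveated region. Inverting the mask would instead remove foveal information and leave only extrafoveal content.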
The Amygdala’s Role in the Perception of Emotion from Faces: Seeking Out, Attending, or Fixating Features within a Face?
In vision, the primary role of the human amygdala is directing processing resources to salient, biologically significant (and thus often emotionally and socially charged) stimuli, via its many connections with a variety of cortical and subcortical structures (Pessoa & Adolphs, 2010). Central to this conclusion is work showing that the amygdala has an important role in directing sensory organs toward environmental locations that had offered predictive information in the past (Holland & Gallagher, 1999; Whalen, 1998).
As we have seen, the eyes are particularly salient facial features, and information from the eyes is extracted and used for a variety of face perception tasks and in social interactions more generally (Emery, 2000). Neuroimaging studies have demonstrated the amygdala’s sensitivity to the eyes (e.g., Kawashima et al., 1999; Morris, deBonis, & Dolan, 2002; Whalen et al., 2004). Bilateral amygdala lesions impair the ability to recognize fear and, to a more variable extent, anger and other negatively valenced emotions from static facial expressions (e.g., Adolphs, Tranel, Damasio, & Damasio, 1994; Calder et al., 1996). For at least one individual with complete bilateral amygdala damage (S.M.), there is a greatly reduced tendency to spontaneously look at other people’s eyes, which partially explains why this individual has a selective deficit in the ability to identify fear in faces (Adolphs et al., 2005), for the eye region is especially diagnostic for the discrimination of fearful expressions (Smith et al., 2005).
The nature of the amygdala’s role is, however, far from straightforward. Remarkably, S.M.’s impairment in identifying fearful facial expressions tended to disappear when she was instructed on a trial-by-trial basis to look at the eyes (Adolphs et al., 2005). Additionally, S.M. did make normal use of high spatial frequency information from the eyes and mouth in Bubbles faces when asked to discriminate the gender of those faces (Adolphs et al., 2005). However, when asked to free-view whole face images, S.M. rarely fixated the eyes, unlike healthy observers. Instead, she fixated mostly the centre of the face (the nose) when viewing photographs, regardless of whether she was judging the emotion or sex of the faces or passively viewing them (Adolphs et al., 2005), and fixated mostly the mouth during conversations (Spezio, Huang, Castelli, & Adolphs, 2007). These subtleties highlight a complex interplay between task performance, looking behaviour, and presentation paradigm.
The findings of two more recent studies suggest that the amygdala’s function during facial emotion perception is more in seeking out (driving saccades towards) the eyes than in actually fixating the eyes. In Gamer and Büchel’s (2009) study, fearful, happy, angry, or neutral faces were presented to healthy participants such that the mouth or one or other of the eyes was aligned with fixation. The participants categorized the emotional expression and then rated its emotional intensity. The faces were presented briefly (150 ms) so that participants were unlikely to execute a saccade whilst the stimulus was present. The amygdala response was significantly greater for fearful faces when the mouth was aligned to the fixation target (and thus the eyes were in the extrafoveal visual field) than when the eyes were aligned to the fixation target. Moreover, there was a significant correlation across participants between amygdala activity and gaze preferences for the eye region of fearful faces. No such correlations were observed for the other emotional expressions. Kennedy and Adolphs (2010) showed that S.M. fixates the eyes normally when only the foveated region of the face is made visible via gaze-contingent presentation, suggesting that, in the absence of facial features in the extrafoveal visual field, her gaze can be driven entirely by top–down control. These findings are consistent with a bottom–up, stimulus-driven causal role for the amygdala in seeking out eyes in the extrafoveal visual field.
The research discussed earlier prompts further questions, which we are currently addressing. For example, although visual attention is intimately related to fixations, it is possible to attend to a visual target whilst fixating a different location (Posner, 1980). So, might it be attention to the eyes or other facial regions that is critical for accurate perception of facially expressed emotions? Furthermore, does the amygdala also have a role in disengaging fixations from core but nondiagnostic features (e.g., fearful mouths)? The answers to these questions are constrained by and inform considerations of the interplay between information extracted in extrafoveal versus foveal vision.
Eccentricity Dependence and Specialization of Early Visual Processing
Several viewing conditions support transmission of only relatively coarse-grained information—for example, viewing over large (vs. short) distances, viewing by newborn (vs. adult) visual systems (Atkinson & Braddick, 1989), and viewing in the extrafoveal (vs. foveal) visual field. To convey information in these situations, the visual signal should be confined to relatively few cycles of spatial variation in intensity across the face, and indeed several researchers present evidence that coding of facial information is tuned in this way (reviewed in Johnson, 2005).
One property of the human visual system that has generated much interest with respect to the extraction of information from facial expressions across different spatial scales is the distinction between magnocellular and parvocellular visual pathways. These pathways are named for the layers of the lateral geniculate nucleus (LGN) of the thalamus through which they project, but they originate in the parasol and midget ganglion cells of the retina respectively. At a given retinal eccentricity, parasol ganglion cells have larger receptive fields than midget cells, though both show an increase in receptive field size with eccentricity. Parasol cells also exhibit faster response latencies and transient responses, and early experiments suggested that they provide strong inputs to subcortical pathways (Schiller & Malpeli, 1977). For these reasons, their role in processing low spatial frequency information in faces, and in the rapid detection of threat-related facial stimuli in the periphery, has been highlighted (Vuilleumier, Armony, Driver, & Dolan, 2003).
The concept of parallel extraction of different visual features by specialized visual subsystems is a powerful one. But the simplistic mapping of the magnocellular versus parvocellular division to other dichotomies, such as (a) low versus high spatial frequency, (b) peripheral versus central vision, and (c) subcortical versus cortical routes, is misleading. We now know that the parasol and midget ganglion cell types coexist in primate retina amongst approximately 20 anatomically distinct retinal ganglion cell types (Dacey, Peterson, Robinson, & Gamlin, 2003). Advances in understanding the functional architecture of the retina call for a radically different conceptualisation of the information processing that drives visually guided behaviour (Gollisch & Meister, 2010). Processing in downstream areas may of course further transform the information available, but the sophistication and selectivity that arise in the retina impose important constraints on subsequent stages. Importantly, each of these ganglion cell populations forms an independent spatial mosaic: The dendritic trees of each cell type have a characteristic size, with large intertype differences, but the cell bodies are nonrandomly placed such that the dendritic trees of each population tile the retinal surface with minimal overlap. These ganglion cell populations thereby provide the basis of 20 or so parallel representations of information across the visual field. For each cell type the dendritic trees tend to increase in size with increasing retinal eccentricity, but the rate of increase varies between cell types, further illustrating the potential for specialization of foveal, parafoveal, and peripheral vision for extraction of different information.
Many ganglion cell types show branching projections to multiple targets, including LGN, superior colliculus, and several other smaller targets that have been implicated in the control of eye movements, pupillary responses, and circadian rhythms. Both parasol and the so-called smooth monostratified cells, for example, project both to the LGN and to the superior colliculus in the macaque (Crook et al., 2008). It is unlikely, therefore, that there will be a straightforward division between the type of information sent to cortical and to subcortical areas (Pessoa & Adolphs, 2011). For our present purpose it is interesting to note the recent developments in elucidating the multiple pathways that are involved in active vision, including control of saccadic eye movements and attentional selection (Wurtz, McAlonan, Cavanaugh, & Berman, 2011), and the growing evidence for direct projections from retina to amygdala in at least some nonprimate mammalian species (Elliott, Weiss, & Nunez, 1995; Hattar et al., 2006).
We are secure in identifying the foveal visual field as the region of highest visual acuity (most directly through measures of the spatial contrast sensitivity function at different eccentricities). It is also likely that input to subcortical areas derives not from midget ganglion cells, but from types of ganglion cells that have larger receptive fields and so are limited to providing relatively coarse spatial information. But the diversity of early visual pathways, their organization across the visual field, and the projections they exhibit leave considerable scope for the way in which extrafoveal information supports the extraction of information from facial expressions.
Concluding Remarks
There is a wealth of information contained in a visually presented facial expression. Characteristic scan paths observed in facial viewing, and performance impairments associated with abnormal scan paths, imply that foveal viewing of particular features, or the way in which these features drive saccades or attention when observed in the periphery, is critical to the extraction of relevant information. Known differences between foveal and extrafoveal visual processing provide some hints for identifying the visual signatures and neural substrates that underlie effective visual communication through facial expressions, as evidenced in the work examining the amygdala’s role in seeking out, fixating, and paying attention to (Adolphs et al., 2005) diagnostic facial features. There has been considerable interest in linking behaviourally relevant visual functions to specific pathways in the primate visual system. However, we have cautioned against oversimplistic mapping based on headline characteristics of visual cell types. A recent burgeoning of knowledge about the diversity of retinal ganglion cell types further calls into question the validity of any straightforward mapping between high-level tasks and specific visual channels, as do the findings that many primate retinal ganglion cell types project both to cortical and to subcortical targets. We advocate an approach in which low-level visual features of facial stimuli and the way in which they are sampled by eye movements are well specified. Above all, we argue that the signal transmitted to the brain is not a pictorial remapping of the retinal image, but rather that the retina, and associated target brain areas, generate a large number of representations of the visual field, each of which may show its own dependence on retinal eccentricity. These neural representations will therefore interact with eye movements to drive active visual processing of facial expressions in complex ways.
