Abstract
Image manipulation of real face photographs, including averaging, morphing, and caricaturing, is widely used in studies of face perception. These methods have led to theoretical insights across topics of interest to social psychologists, including social perception, social categorization, stereotyping, prejudice, impression formation, and individual differences. They may also have practical applications in diagnosing clinical impairments of social perception. Here, we outline key manipulation methods, comment on the strengths and weaknesses of the approach, illustrate the breadth of theoretical insights already achieved, and offer best practice guidelines. We hope that the review stimulates greater use of these powerful methods to understand social perception and encourages studies that bridge theory between visual perception and social psychology.
A wealth of social cues are available from a person’s face (Bruce & Young, 2012; Rhodes, 2006) and facial impressions can have important social outcomes (Todorov, Olivola, Dotsch, & Mende-Siedlecki, 2015; Zebrowitz, 2005). Faces have even been described as the most important human social stimuli (Webster & MacLeod, 2011). Research on face perception is therefore central to an understanding of social cognition (Quinn & Macrae, 2011) and is increasingly being used to understand diverse high-level social processes of person construal, stereotyping, and prejudice (see Adams & Kleck, 2003; Blair, Judd, & Fallman, 2004; Freeman & Ambady, 2011; Hess, Adams Jr, & Kleck, 2008; Hugenberg & Sacco, 2008; Johnson, Freeman, & Pauker, 2012; Quinn & Macrae, 2011; Todorov et al., 2015, for influential papers showcasing the breadth of this literature). This recent drive builds on a long history of research employing face images to understand the social perception of groups and individuals (e.g., Bruce & Young, 1986; Perrett, May, & Yoshikawa, 1994; Rhodes & Tremewan, 1996; Secord, 1958; Taylor, Fiske, Etcoff, & Ruderman, 1978; Zebrowitz & Montepare, 1992).
Studies examining the social perception of faces often use photographs of real faces or alternatively, synthetic, computer-generated “FaceGen” images. Naturally, real photographs offer clear ecological validity, but they can also lack precise experimental control. Conversely, FaceGen images allow for greater experimental control over facial features, helping studies infer causality, but they do not always fully capture real-life cues (Crookes et al., 2015). One can capitalize on the strengths of each method by manipulating real face photographs using image-based techniques, including morphing and averaging. This image-based approach is gaining increasing currency in social psychology (e.g., Corneille, Huart, Becquart, & Brédart, 2004; Freeman, Pauker, & Sanchez, 2016; Oldmeadow, Sutherland, & Young, 2013; Sofer, Dotsch, Wigboldus, & Todorov, 2015). Importantly, facial manipulation techniques can maintain both experimental control and ecological validity, allowing stronger claims about which real-life face cues subserve social judgments.
These morphing techniques originate from a substantial research tradition in face perception (Bruce & Young, 2012; Calder, Rhodes, Johnson, & Haxby, 2011; Rhodes, 2006, review the field). Crucially, this face perception literature also investigates social and personality processes, including emotion perception, social categorization and stereotyping, self-categorization, personality and individual differences, and impression formation (Bruce & Young, 2012; Calder et al., 2011; Rhodes, 2006). Increased theoretical and methodological cross talk between social psychology and face perception is thus timely and would benefit progress in both fields. To encourage future integration, we show how the computer manipulation of face photographs is a powerful technique that can be applied to domains of interest to social psychologists.
We focus here on the manipulation of real face images rather than the complementary approach of reverse correlation, which instead seeks to visualize participants’ inner representations of stereotypes from deliberately minimal visual input (Dotsch, Wigboldus, Langner, & van Knippenberg, 2008, building on Mangini & Biederman, 2004; see Dotsch & Todorov, 2011, for a review). We also focus on readily available averaging and morphing techniques rather than more technically demanding facial manipulation approaches, such as the morphable face model approach (see Vetter & Walker, 2011, for a review). Morphable models have generated important insights into social perception and also allow the manipulation of real face photographs with naturalistic results (Walker & Vetter, 2009). However, in this approach, face photographs are manipulated with reference to a custom-built statistical model of facial attributes (Blanz & Vetter, 1999), which is not readily available. Moreover, manipulation is constrained to relationships between social and facial attributes found in the original population (e.g., Caucasian young adults; see Walker, Jiang, Vetter, & Sczesny, 2011, for a discussion). In contrast, facial averaging and morphing techniques are easily carried out via widely accessible software that can flexibly manipulate any new face image set.
We have three main aims. First, we outline key facial image manipulation methods, including landmarking, averaging, morphing, and caricaturing. Second, we show how these methods can be used to understand social perception. In doing so, we illustrate the depth and breadth of theoretical insights already achieved, from classic discoveries through to the recent application of these methods to impression formation. Third, we compare different software to help the interested reader get started. Although we concentrate on face photographs, we end by outlining how comparable methods can give insight into social perception from dynamic or 3-D faces, bodies, and voices.
We hope that this review will stimulate new research and help in the assessment of papers using these methods. For researchers who are already familiar with these techniques, we hope that our review will be useful in evaluating their uses and limitations, outlining different software capabilities, and setting out best practice guidelines. Our overall goal is to stimulate increased dialogue between face perception and social psychology for mutual benefit to both fields.
Face Manipulation Methods
We can think of a face photograph as involving two distinct properties (Bruce & Young, 2012). First, it represents 2-D facial shape: the positions and shapes of features (eyes, chin, etc.) as they are located in the photograph. Second, it represents the facial surface: the brightness and color of features, skin, and hair, including shading cues to 3-D facial shape from ambient lighting.
A digital face image is represented as an array of pixels, each with associated color and brightness values. Image manipulation programs can modify the pixel array to change either shape or surface. To represent and manipulate 2-D facial shape, landmarks are placed around key facial features (placing fiducial points or face extraction; Figure 1A). Landmark locations are defined by x and y coordinates. To represent and manipulate the facial surface, a mesh of small regions is created by joining imaginary lines between the landmarks (tessellation; Figure 2). Pixel values defining the facial surface (color, etc.) are then measured separately for each tessellated region.
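The representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy image, the landmark coordinates, the two hand-specified triangles, and the helper names (`barycentric`, `mean_brightness`) are all hypothetical, and real morphing software would tessellate many more landmarks automatically (for example, with a Delaunay triangulation).

```python
# Toy grayscale image: a list of rows, each pixel a brightness value.
image = [
    [0, 0, 0, 50],
    [0, 0, 50, 100],
    [0, 50, 100, 100],
    [50, 100, 100, 100],
]

# Hypothetical landmarks as (x, y) coordinates, here the four image corners.
landmarks = [(0, 0), (3, 0), (0, 3), (3, 3)]

# Tessellation: each region is a triangle given by three landmark indices.
triangles = [(0, 1, 2), (1, 3, 2)]


def barycentric(p, a, b, c):
    """Barycentric coordinates of point p relative to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2


def mean_brightness(image, landmarks, triangle):
    """Average the pixel values that fall inside (or on the edge of) one region."""
    a, b, c = (landmarks[i] for i in triangle)
    values = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            # A pixel is inside the triangle when all three weights are non-negative
            # (a small tolerance absorbs floating-point rounding on the edges).
            if all(w >= -1e-9 for w in barycentric((x, y), a, b, c)):
                values.append(value)
    return sum(values) / len(values)


# Surface measured separately for each tessellated region.
region_means = [mean_brightness(image, landmarks, t) for t in triangles]
```

Pixels lying exactly on the shared edge are counted in both regions in this sketch; a production implementation would need a rule for assigning them to one side.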

(A) Face landmarking: Prespecified feature points are marked out on a face image (at a minimum, defining facial height, width, and eye position). (B) Averaging: A set of landmarked face images can be averaged together (e.g., a set of female faces to create an average female face). The images are first aligned to the average landmarked shape and then the aligned individual images are averaged together (thereby averaging the same features across images, such as the eyes or lips). (C) Morphing: Blending one face image into another, for example, blending a female average (leftmost image) into a male average (rightmost image). (D) Transforming: Transforming an original face image with reference to two other face images by applying the difference between these faces to the original face, for example, blending feminine (leftmost image) and masculine (rightmost image) versions of a single female face (here, taken from A) by transforming the image with reference to average female and male faces (here, taken from the end points of C). (E) Caricaturing: Creating a face image that looks more extreme than the average face by transforming a face image away from an average face, for example, creating a hyperfeminine version of an average female face by transforming it away from an average male face (left image) and vice versa for a hypermasculine face (right image). From “The Karolinska Directed Emotional Faces - KDEF” by D. Lundqvist, A. Flykt, and A. Öhman, 1998, CD Rom from Department of Clinical Neuroscience, Psychology section, Karolinska Institutet. Copyright 1998 by Karolinska Institutet, Department of Clinical Neuroscience, Section of Psychology, Stockholm, Sweden. Reprinted with permission.

A landmarked image (left) and a diagram of the tessellations between the landmarks (right). Morphing is carried out for each tessellated region separately. Image used with permission from David Perrett.
One way to understand this process is to imagine that the photograph is on a rubber sheet, mounted on a board with a pin through each landmark. If we drag the pins around, the rubber sheet will follow, so that the image shape (landmark positions) shifts while its surface properties (rubber sheet) remain the same, albeit now reshaped. If we have landmarked photographs of two people, and we bring all the landmarks into alignment across the photographs (usually by calculating the average position of each landmark across the images), then the 2-D shapes of each person’s image will now become identical, but their surface properties will still be quite different. Since the shapes are the same, we can now average the surface properties by averaging the pixel values in each of the tessellated regions (Figure 1B). This average image is often called a prototype or a composite.
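The two averaging steps can be sketched as follows. This is a minimal illustration, not the procedure of any particular package: the function names are hypothetical, shapes are lists of (x, y) landmarks in the same order for every face, and the warping step that brings each image into the average shape is assumed to have been done already.

```python
def average_shape(shapes):
    """Average each landmark's position across faces (same landmark order assumed)."""
    n = len(shapes)
    return [
        (sum(s[i][0] for s in shapes) / n, sum(s[i][1] for s in shapes) / n)
        for i in range(len(shapes[0]))
    ]


def average_surface(aligned_images):
    """Average pixel values across images that were ALREADY warped to a common shape."""
    n = len(aligned_images)
    height, width = len(aligned_images[0]), len(aligned_images[0][0])
    return [
        [sum(img[y][x] for img in aligned_images) / n for x in range(width)]
        for y in range(height)
    ]


# Two toy faces with two landmarks each, and two tiny one-row grayscale images.
shape_a = [(0, 0), (10, 0)]
shape_b = [(2, 4), (14, 2)]
prototype_shape = average_shape([shape_a, shape_b])

image_a = [[0, 100]]
image_b = [[50, 200]]
prototype_surface = average_surface([image_a, image_b])
```

Averaging pixel values without first aligning the shapes would blur features together, which is why the alignment step comes first.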
This averaging method involves equal blends of constituent images. However, it is also possible to weight the shape and/or surface properties of different images in the final mix (e.g., to create a blend of 80% of image A and 20% of image B). This is face morphing or warping. By gradually varying the proportions contributed by each image, a series of images can be created to blend one face into another (morphing continua). For example, a female face can be morphed into a male face (Figure 1C). Usually, both image shape and surface are manipulated (Figure 1C), but morphing can also be primarily carried out with either facial shape or surface (although complete separation is unlikely; Sormaz, Young, & Andrews, 2016).
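A weighted blend of this kind amounts to linear interpolation. The sketch below applies it to landmark positions only; the function names are hypothetical, and the same `blend` could be applied to pixel values once the images share a common shape.

```python
def blend(a, b, weight_b):
    """Linear blend: weight_b = 0.2 gives 80% of a and 20% of b."""
    return (1.0 - weight_b) * a + weight_b * b


def morph_shape(shape_a, shape_b, weight_b):
    """Blend two landmark sets (same landmark order assumed) point by point."""
    return [
        (blend(xa, xb, weight_b), blend(ya, yb, weight_b))
        for (xa, ya), (xb, yb) in zip(shape_a, shape_b)
    ]


def morph_continuum(shape_a, shape_b, steps):
    """Evenly spaced morphs from shape_a (first) to shape_b (last); steps >= 2."""
    return [morph_shape(shape_a, shape_b, i / (steps - 1)) for i in range(steps)]


shape_a = [(0.0, 0.0)]
shape_b = [(10.0, 20.0)]
mostly_a = morph_shape(shape_a, shape_b, 0.2)      # 80% A, 20% B
continuum = morph_continuum(shape_a, shape_b, 6)   # 0%, 20%, ..., 100% of B
```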
Face transforming is an extension of morphing that transforms a single face image along a continuum by applying the difference between two other face images to the original. For example, a single female target can be changed to make her look more feminine or masculine by transforming her image with reference to male and female averages (Figure 1D).
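In terms of landmark coordinates, a transform adds a scaled difference vector to the target face. The following is a minimal sketch under that assumption; the function name, the `amount` parameter, and the toy coordinates are hypothetical.

```python
def transform_shape(target, ref_from, ref_to, amount):
    """Shift each target landmark by amount * (ref_to - ref_from).

    A positive amount moves the target toward ref_to (e.g., a male average);
    a negative amount moves it toward ref_from (e.g., a female average).
    """
    return [
        (tx + amount * (bx - ax), ty + amount * (by - ay))
        for (tx, ty), (ax, ay), (bx, by) in zip(target, ref_from, ref_to)
    ]


target = [(5.0, 5.0)]          # one landmark of an individual face
female_average = [(4.0, 4.0)]  # toy reference faces
male_average = [(6.0, 7.0)]

masculinized = transform_shape(target, female_average, male_average, 0.5)
feminized = transform_shape(target, female_average, male_average, -0.5)
```

Unlike a morph, the result never converges on either reference face: only the difference between the references is applied, so the target's own idiosyncrasies are retained.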
Morphing and transforming both describe processes that blend faces together (e.g., morphing a female toward a male face). In contrast, face caricaturing involves creating a face image that looks more extreme than the original (a hyperface), by exaggerating differences, through transforming a face image away from a reference (usually an average face). For example, a hyper-male face can be created by transforming an average male face away from an average female face (Figure 1E). An individual face can also be caricatured away from the average to emphasize what is distinctive about that person (like a political caricature). Conversely, an individual face can be blended toward an average, reducing its distinctiveness (anticaricaturing).
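Caricaturing and anticaricaturing can be expressed with a single scaling of the difference from the reference. This sketch covers landmark positions only, and the function name and `level` parameter are hypothetical conventions chosen for illustration.

```python
def caricature(face, reference, level):
    """Scale a face's deviation from a reference face.

    level = 0.5  -> +50% caricature (differences exaggerated)
    level = 0.0  -> the original (veridical) face
    level = -0.5 -> anticaricature (halfway toward the reference)
    level = -1.0 -> the reference face itself
    """
    scale = 1.0 + level
    return [
        (rx + scale * (fx - rx), ry + scale * (fy - ry))
        for (fx, fy), (rx, ry) in zip(face, reference)
    ]


face = [(12.0, 8.0)]
average = [(10.0, 10.0)]

hyper = caricature(face, average, 0.5)
anti = caricature(face, average, -0.5)
```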
Inevitably, these techniques involve trade-offs. For example, the number and positions of landmarks limit accuracy in shape representation: With more landmarks, finer-grained manipulation becomes possible. Less obvious is the problem that while the tessellated regions are usually triangular, photographs are usually composed of square pixels. Therefore, reshaping a region may involve interpolating new pixels to fill stretched parts or deleting pixels where there is shrinkage. Sophisticated algorithms achieve this reshaping, but our rubber sheet analogy is clearly imperfect. Nevertheless, facial manipulation methods have led to very interesting insights into social perception. One reason why 2-D image manipulation can work so well might be because the images falling on our retinas are also 2-D, making us highly capable of extracting essential social information from this format.
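To illustrate what interpolating new pixels can mean, the sketch below implements bilinear interpolation, one common way to estimate a brightness value at a position that falls between pixel centers. It is offered as an assumption for illustration only; the interpolation scheme used by any given morphing package may differ, and the function name is hypothetical.

```python
def bilinear_sample(image, x, y):
    """Estimate brightness at a fractional (x, y) position inside the image.

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(image) - 1)     # clamp at the bottom edge
    fx, fy = x - x0, y - y0
    # Blend horizontally along the two neighboring rows, then vertically.
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


image = [
    [0, 10],
    [20, 30],
]
center_value = bilinear_sample(image, 0.5, 0.5)
```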
Using Facial Manipulation to Examine Social Perception
Facial Landmarking
Although landmarking is usually only carried out to manipulate facial images, landmarked features can themselves be used to investigate important social cues. Vernon, Sutherland, Young, and Hartley (2014) utilized landmark locations across 1,000 naturalistic face images to define facial features (e.g., eye size) and then used these features to predict facial impressions. This approach gave insights into the pattern of facial cues underlying impressions and allowed impressions to be automatically extracted from new photographs, with obvious applications for real-world impression prediction (Vernon, Sutherland, Young, & Hartley, 2014; see also Lienhard, Ladret, & Caplier, 2015). The approach was inspired by pioneering work in social psychology by Zebrowitz and colleagues who used facial landmarks to test the theory that facial impressions are based on subtle resemblance to emotional expression and to investigate stereotypes (e.g., Zebrowitz, Kikuchi, & Fellous, 2010; Zebrowitz & Montepare, 1992). Once face images have been landmarked, it is straightforward to recover feature points to answer new research questions (O’Toole, 2011, reviews many other studies using facial morphology).
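As a simple illustration of recovering a feature from landmarks, the sketch below computes eye width as a proportion of face width. The landmark names and the specific measure are hypothetical and are not the features defined in the studies cited above.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def eye_width_ratio(landmarks):
    """Eye width divided by face width, so the measure is independent of image scale."""
    eye_width = distance(landmarks["left_eye_outer"], landmarks["left_eye_inner"])
    face_width = distance(landmarks["face_left"], landmarks["face_right"])
    return eye_width / face_width


# Hypothetical landmark coordinates (in pixels) for one face.
landmarks = {
    "face_left": (0, 50),
    "face_right": (100, 50),
    "left_eye_outer": (20, 40),
    "left_eye_inner": (40, 40),
}
ratio = eye_width_ratio(landmarks)
```

Features of this kind could then be entered as predictors of rated impressions in a standard statistical model.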
Facial Averaging
Face averaging was first described by Galton (1878) who created facial composites of criminals by photographing a series of mug shots using the same photographic plate. Galton was interested in discovering the physiognomy of criminality, believing that character is displayed in facial features (e.g., Lavater, 1778; an idea that remains hotly debated: Rule, Krendl, Ivcevic, & Ambady, 2013; Stillman, Maner, & Baumeister, 2010). To Galton’s chagrin, instead of demonstrating that criminals shared the same grotesque features, the resulting composites looked more attractive than the individual faces. This insight was the first demonstration of the average-is-attractive effect, with later studies replicating this classic finding with modern computer methods (Langlois & Roggman, 1990; see Rhodes, 2006, for a review).
Although Galton (1878) failed to demonstrate a criminal facial “type,” the logic of averaging across facial images to create prototypical representations of facial cues is elegant and very useful in understanding social perception. For example, by averaging across many female faces, one can visualize the prototypical female face (Figure 1B), because inconsistent facial attributes are averaged out, leaving only features that consistently indicate femininity. Averaging thus offers sensitive control over facial stimuli for hypothesis-driven research. Studies have utilized this approach, often in combination with morphing, to examine emotional expression (Calder, Young, Perrett, Etcoff, & Rowland, 1996), race (Hill, Bruce, & Akamatsu, 1995), sex (Bruce et al., 1993; R. Russell, 2010), age (Tiddeman, Burt, & Perrett, 2001; Figure 3), attractiveness (Perrett et al., 1994; Rhodes & Tremewan, 1996), stereotypes (Sutherland, Young, Mootz, & Oldmeadow, 2015), and first impressions (Sofer et al., 2015) and to demonstrate accurate facial judgments of the Big Five (Penton-Voak, Pound, Little, & Perrett, 2006).

Face prototypes made by averaging the shape and surface properties for individual faces within 5-year age brackets: (A) 20–24, (B) 25–29, (C) 30–34, (D) 35–39, (E) 40–44, (F) 45–49, and (G) 50–54 years. (H) The dark lines depict the second youngest face prototype, and the shaded area illustrates the difference in average shape between this prototype and the oldest face prototype. (I) A caricature of the color difference between (J) the oldest prototype and an overall prototype face of all age groups matched for shape (not shown). (K) Contrast- and color-enhanced image made by amplifying RGB color differences between the overall prototype and a uniform gray image. From “Perception of age in adult Caucasian male faces: Computer graphic manipulation of shape and colour information” by D. M. Burt and D. I. Perrett, 1995, Proceedings of the Royal Society of London. Series B: Biological Sciences, 259, 137–143. Copyright 1995 by The Royal Society. Figure modified with permission.
To build average faces, most studies start with tightly controlled face images, photographed under standardized conditions (e.g., frontal facing, consistent expression, standard lighting; e.g., Perrett et al., 1994; see Figure 3). These controlled images are well suited to a hypothesis-driven approach aimed at isolating the effect of a manipulated facial cue on social perception. Although this approach has clearly been very productive, it relies on predetermined hypotheses about which cues are important, risks missing important facial cues that were eliminated by standardization, and ignores the natural covariation between cues as well as their relative real-world importance (Dotsch & Todorov, 2011; Vernon et al., 2014).
In contrast, data-driven methods work by sampling a wide range of potential cues as they occur in the world and then examining how naturally occurring combinations of cues influence social perception, rather than prespecifying which cues may be important (Adolphs, Nummenmaa, Todorov, & Haxby, 2016). The value of using a data-driven approach to examine facial impressions was originally noted in the pioneering work by Secord (1958) who anticipated that “impressions would be based on a pattern of cues; indeed the impressions themselves were conceived to be complex in organization” (p. 304). Recent advances in the creation of dimensional models of impressions have validated Secord’s early insights: Facial impressions and their underlying cues are often complex, multicollinear, and interactive (Oosterhof & Todorov, 2008; Sutherland, Young, et al., 2015; Vernon et al., 2014; Walker & Vetter, 2016). Importantly, data-driven methods are particularly effective in understanding complex judgments of high-dimensional stimuli such as faces, because investigation is not limited by experimental manipulation of small sets of cues at a time (see Adolphs et al., 2016).
Face averaging can be adapted into a data-driven approach, because the technique works even with unstandardized images. Recently, we employed data-driven face averaging by first sampling 1,000 naturally varying face images from the Internet and having these images rated on social impressions (Sutherland et al., 2013). By averaging across the individual faces rated highest and lowest on a given attribute, such as trustworthiness, we could create prototypically “trustworthy” and “untrustworthy” faces, thus visualizing the facial features that naturally cue trustworthiness (Figure 4). Similar data-driven face prototypes have been constructed to visualize complex social groups and associated stereotypes to test predictions arising from social psychological theories (Oldmeadow et al., 2013); for example, bankers look less warm but more competent than nurses, supporting the Stereotype Content Model (Fiske, Cuddy, & Glick, 2007). Likewise, data-driven prototypes have been successfully employed to visualize perceived Big Five personality traits (Sutherland, Rowley, et al., 2015) as well as social norms (Ginosar, Rakelly, Sachs, Yin, & Efros, 2015).
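The selection step in this data-driven approach can be sketched as a simple ranking. The face identifiers, ratings, and function name below are hypothetical; with a real image set, `n` would be larger, and the selected images would then be landmarked and averaged.

```python
def extreme_faces(ratings, n):
    """Return the ids of the n lowest-rated and n highest-rated faces.

    ratings maps a face id to its mean rating on one attribute
    (e.g., trustworthiness).
    """
    ranked = sorted(ratings, key=ratings.get)  # ascending by mean rating
    return ranked[:n], ranked[-n:]


ratings = {"a": 3.1, "b": 5.6, "c": 1.2, "d": 4.4, "e": 6.0}
lowest, highest = extreme_faces(ratings, 2)
```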

Prototypically high (rightmost) and low (leftmost) attractive (A), trustworthy (B), and dominant (C) face averages, created by averaging together the 20 faces rated highest and the 20 rated lowest on attractiveness, trustworthiness, or dominance from 1,000 original images. Face images lying between the end point faces represent morphed blends of the prototype faces, in 20% steps. The face images demonstrate the facial cues naturally used to cue social impressions. For example, femininity, expression (smiling), head tilt, and age appear to cue perceived trustworthiness. From “Social inferences from faces: Ambient images generate a three-dimensional model,” by C. A. M. Sutherland, J. A. Oldmeadow, I. M. Santos, J. Towler, D. M. Burt, and A. W. Young, 2013, Cognition, 127, 105–118. Copyright 2012 by Elsevier. Figure modified with permission.
In these studies, perceivers’ natural impressions are examined with minimal researcher bias through sampling a large number of potential facial cues. Face averaging thereby offers a complementary approach to another data-driven method, reverse correlation (Dotsch & Todorov, 2011). Reverse correlation also seeks to model participants’ perceptions while minimizing researcher bias. Rather than sampling naturalistic cues to social perception, reverse correlation methods instead aim to visualize mental stereotypes from deliberately minimal visual information.
Facial Morphing and Transforming
Face morphing and transforming go further than averaging by parametrically varying images along a continuum (Figure 1C and D), thereby allowing systematic investigation of how facial cues contribute to social perception (Calder et al., 1996; Young et al., 1997). For example, morphing of facial emotional expressions has been employed to understand whether people represent these important social cues as discrete categories (Ekman & Friesen, 1975) or along emotion dimensions (J. A. Russell, 1980). People are better at discriminating pairs of faces categorized as representing different rather than the same expression, even when face pairs are equidistant along the morphed continuum (Calder et al., 1996; Calder, Young, Rowland, & Perrett, 1997; Etcoff & Magee, 1992). Better discrimination between categories than within categories is a hallmark property of categorical perception, demonstrating that emotional expression is represented categorically. Recently, Harris, Young, and Andrews (2012) combined these methods with neuroimaging to uncover the neural basis of emotion expression perception. They found that some brain regions represent facial emotional expression categorically, whereas others represent expression dimensionally (Figure 5; Harris, Young, & Andrews, 2012). The debate between these competing models (Ekman & Friesen, 1975; J. A. Russell, 1980) was thereby resolved by demonstrating that they hold at different levels of representation. Comparable methods could be used to test the dimensionality of social or personality models (e.g., Fiske et al., 2007; Oosterhof & Todorov, 2008; Osgood, 1969; Walker & Vetter, 2009; Wiggins, 1979).

Facial morphing used to understand dimensional and categorical representations of emotional expression in the brain for (A) same identity faces and (B) different identity faces. A block design functional magnetic resonance imaging (fMRI)-adaptation paradigm was used in which “within” and “between” blocks involved images from morphed continua that were equally different in terms of the number of steps apart, but images in a between block crossed a perceived emotion category boundary (e.g., happy vs. sad) and images in a within block did not (e.g., slightly vs. very happy). The “same” condition involved repeating the same image in a trial block to create an estimate of the maximum possible neural adaptation as a baseline. The data (C) show that the posterior superior temporal sulcus was sensitive to any change in facial expression (i.e., it represents emotion dimensions), while the amygdala was sensitive only to the shift in perceived emotional category (i.e., it represents emotion categories). This pattern was observed regardless of changes in face identity. From “Morphing between expressions dissociates continuous from categorical representations of facial expressions in the human brain,” by R. J. Harris, A. W. Young, and T. J. Andrews, 2012, Proceedings of the National Academy of Sciences, 109, 21164–21169. Copyright 2012 by National Academy of Sciences. Reprinted with permission.
Morphing has also been used to examine the effect of social categorization on face processing, since perceived categories can be manipulated while keeping facial features identical (Michel, Corneille, & Rossion, 2007, 2010). Targets can also be manipulated on multiple social categories simultaneously (Hopper, Finklea, Winkielman, & Huber, 2014). Morphing has also been used to examine how people perceive and evaluate ambiguous social categories, such as mixed-race targets (Freeman et al., 2016; Michel et al., 2010; Webster, Kaping, Mizokami, & Duhamel, 2004). For example, mixed-race face morphs are reliably categorized as out-group members, although race category boundaries may be surprisingly flexible with experience (Webster et al., 2004; but see Freeman et al., 2016). Using morphed faces may be particularly useful in future, given the increasing interest in understanding ambiguous, intersectional, and dynamic social categories (e.g., Lick, Johnson, & Riskind, 2015; Pauker et al., 2009).
Whereas morphing creates a direct transition between two images, transforming applies transitions to new photographs (Figures 1D and 6), making this technique highly adaptable. For example, transforming has been used to study self-perception by manipulating images of the participants themselves. Instead of picking their real image, people often choose images as veridical that have actually been morphed to be more attractive (Epley & Whitchurch, 2008; likewise for long-term romantic partners: Penton-Voak, Rowe, & Williams, 2007). Face transforming’s ability to create photo-realistic images thereby offers a powerful new tool for investigating other important aspects of self-representation, including social and personal identity (e.g., Ellemers, Spears, & Doosje, 2002; Sedikides & Brewer, 2015).

Male (top row) and female (bottom row) faces created to vary from low (left) to high (right) facial adiposity, by transforming with reference to average male or female faces that were either high or low on body mass index (BMI). BMI values of 16, 18, 20, 22, 24, and 26 (all kg/m²) are shown. Only face shape, not face surface, was transformed. Lower levels of facial adiposity were perceived as more attractive, and to a lesser extent as conveying higher leadership ability. Optimal BMI values were found to be relatively lower for female than male faces, perhaps reflecting gendered media representations of ideal body size. From “The effects of facial adiposity on attractiveness and perceived leadership ability,” by D. E. Re and D. I. Perrett, 2014, The Quarterly Journal of Experimental Psychology, 67, 676–686. Copyright 2013 by The Experimental Psychology Society by permission of Taylor & Francis Ltd. Reprinted with permission.
Transforming has also been useful in examining individual differences in social perception, because pairs of similar faces can be generated and people’s sensitivity to subtle differences in image pairs can be measured (Little, Jones, & DeBruine, 2011). For example, less dominant men are more sensitive to others’ dominance (Watkins, Jones, & DeBruine, 2010). Transforming politicians’ images has also demonstrated contextual shifts in leadership preferences, such that people prefer feminine-looking leaders during times of peace and masculine-looking leaders during war (Little, Burriss, Jones, & Roberts, 2007).
Facial Caricaturing
Face caricaturing involves creating a face image that looks more extreme than the original face, by transforming it away from a reference face (Brennan, 1982; Rhodes, Brennan, & Carey, 1987). Caricaturing can be used to exaggerate the effects of facial manipulation (Figure 1E). For example, it has been used to amplify subtle emotional expressions to test theoretical models of emotion representation (Calder et al., 1997, 2000; Figure 7).

Examples of veridical (0%) and caricatured (+50%) representations of facial expressions. One Ekman face (Ekman & Friesen, 1975; Young et al., 2002) is shown posing expressions associated with the six basic emotions: happiness, surprise, fear, sadness, disgust, and anger. Caricatured (+50%) representations were prepared relative to a neutral expression average face, and only shape was manipulated in these faces. From “Computer-enhanced emotion in facial expressions,” by A. J. Calder, A. W. Young, D. Rowland, and D. I. Perrett, 1997, Proceedings of the Royal Society of London B: Biological Sciences, 264, 919–925. Copyright 1997 by The Royal Society. Reprinted with permission.
Caricaturing individual faces away from a prototype reference face has been used to examine how social attributes are coded in faces. For example, caricaturing individual faces away from an average face often makes them look less attractive, supporting the average-is-attractive effect (Rhodes & Tremewan, 1996). Facial manipulation has also demonstrated limits to the averageness benefit, as hyperattractive faces can be made by caricaturing average faces made solely from attractive original faces (Perrett et al., 1994, 1998; Rhodes, Hickford, & Jeffery, 2000). These studies have provided a window into the different (stabilizing vs. directional) selection pressures on attractiveness (Perrett et al., 1994; Rhodes & Tremewan, 1996).
Caricaturing has also been used to investigate the nature of face coding, in combination with perceptual adaptation to shift the reference norm. For instance, after exposing people to extremely feminine caricatured faces, androgynous faces look masculine (i.e., shifted away from the reference face: Pond et al., 2013; Zhao, Seriès, Hancock, & Bednar, 2011). These studies demonstrate the flexible nature of the visual system, which is continuously updating as new social input is encountered. That is, what looks “male,” “attractive,” or “Asian” is also highly dynamic and dependent on experience (Clifford & Rhodes, 2005, review this literature).
Finally, morphing and caricaturing were used in a well-established test of facial emotion perception (Young, Perrett, Calder, Sprengelmeyer, & Ekman, 2002). This test contains individual faces morphed between different emotional expressions, with caricaturing used to generate more extreme expressions to manipulate task difficulty. It has been used to understand emotion perception impairment in many clinical conditions, including autism (Bruce & Young, 2012). Recently, this approach has been extended to examine neuropsychological impairment in formation of key impressions such as trustworthiness (Sprengelmeyer et al., 2016).
Advantages and Limitations
The researcher can choose between using facial manipulation in a hypothesis-driven or data-driven approach, each with advantages and limitations, as outlined previously. In a hypothesis-driven approach, carefully standardized face images are used to gain precise control over prespecified facial cues (Figure 3; cf. Calder et al., 1996; Re & Perrett, 2014), allowing causal inferences to be drawn regarding which facial features are driving social judgments, or to generate controlled but realistic stimuli for use in subsequent studies. Facial manipulation has more recently also been employed as a data-driven approach by averaging across naturally varying face images (Sutherland et al., 2013). Averaging across natural variation loses experimental control but allows prototypical faces to be created that reflect the holistic covariation and relative importance of real-world cues as they naturally occur (Figure 4). Data-driven approaches may be particularly important for understanding impressions in applied settings, where it is critical to study natural variation. Importantly, hypothesis-driven and data-driven approaches are complementary.
Regardless of the overall approach taken, a limitation is that the initial landmarking stage can be time-consuming and somewhat subjective. Although automatic landmarking exists, manual adjustment is usually required (Sutherland, 2015, presents detailed guidelines). To reduce subjectivity, one person should landmark all faces, or at least the person doing the landmarking should not be confounded with the experimental conditions. Ideally, a second person should check the landmarks. More landmarks and more consistent landmarking aid photo-realism but take longer, with more chance of disagreement among the people landmarking. One major goal of computer scientists interested in face perception is to develop optimal automatic feature extraction (Kemelmacher-Shlizerman, Suwajanakorn, & Seitz, 2014; Zhu & Ramanan, 2012).
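One simple way to quantify how closely two people agree when landmarking the same face is the mean distance between their corresponding points. The following is a minimal sketch, assuming landmarks are stored as (x, y) pixel coordinates with one row per point; the function name and the toy coordinates are hypothetical and are not taken from any of the software packages discussed in this review.

```python
import numpy as np


def landmark_disagreement(marker_a, marker_b):
    """Mean Euclidean distance (in pixels) between corresponding landmarks.

    Both inputs are sequences of (x, y) points placed on the same face
    by two different people, in the same landmark order.
    """
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Both people must place the same number of landmarks")
    # Distance between each pair of corresponding points, then the average.
    return float(np.linalg.norm(a - b, axis=1).mean())


# Hypothetical landmarks for one face (illustrative values only).
first_person = [[100, 120], [140, 120], [120, 160]]
second_person = [[103, 124], [140, 120], [120, 150]]

disagreement = landmark_disagreement(first_person, second_person)
print(f"Mean landmark disagreement: {disagreement:.1f} pixels")
```

Faces with unusually high disagreement could then be flagged for a second check before averaging or morphing. What counts as "unusually high" depends on image resolution and is left to the researcher.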
Averaging also tends to produce images that look more attractive than the original photographs (Langlois & Roggman, 1990), making it harder to manipulate negative traits. Thus, where distinctiveness is an important cue, face averages may not be representative. Similarly, older or masculine faces are harder to generate, because averaging tends to smooth out age-related blemishes and wrinkles (Kemelmacher-Shlizerman et al., 2014; Tiddeman et al., 2001) as well as smoothing over masculine features like stubble or angular square jaws (Pond et al., 2013; Zhao et al., 2011). Advanced “wavelet texture” techniques can preserve fine surface detail and can easily be run in Psychomorph software (Tiddeman et al., 2001). To minimize loss of shape information when averaging, feature points should be marked in consistent positions on all constituent faces. Including only extreme images can help maintain distinctiveness (e.g., Sutherland et al., 2013, created facial averages from images of elderly adults that appear reasonably old).
A more subtle point is that it is not always obvious what the reference face should be. For example, if manipulating emotional expression, should the reference norm be an unexpressive face, the average of all expressions, or just any other expression? It turns out that any of these reference faces will “work” to create easily recognized caricatures, which is theoretically important but not intuitive (Burton, Jeffery, Skinner, Benton, & Rhodes, 2017; Calder et al., 2000; Young et al., 1997). Although the distinction between emotional expression reference faces may not be vital, other studies have shown that faces do appear to be coded (at least partly) in relation to gender- and race-specific norms (Armann, Jeffery, Calder, & Rhodes, 2011; Little, DeBruine, & Jones, 2005). Do these findings mean that reference and manipulated faces should represent the same social category? Morphing relative to a reference face from the same rather than a different social category is often easier to interpret and generates clearer images (Rennels, Bronstad, & Langlois, 2008; Rhodes, 2006; but see DeBruine, Jones, Smith, & Little, 2010). Yet, where social categories contribute to impressions (e.g., female faces look more trustworthy; Figure 4), controlling them simply removes important cues. Importantly, answering these questions has itself offered insight into social category norms.
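The role of the reference face can be made concrete with the landmark arithmetic that underlies shape caricaturing: each point is moved along the line joining the reference face to the target face. The sketch below is illustrative only. It assumes landmarks are (x, y) pixel coordinates with y increasing downward, handles shape but not surface information, and uses hypothetical names and toy values rather than the implementation of any particular program.

```python
import numpy as np


def caricature(face, reference, level):
    """Scale a face's landmark differences from a reference face.

    level = 1.0 returns the original face, level > 1.0 exaggerates the
    differences (caricature), and 0 <= level < 1.0 moves the face toward
    the reference (anti-caricature).
    """
    face = np.asarray(face, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return reference + level * (face - reference)


# Hypothetical left and right mouth-corner landmarks (illustrative values).
happy = np.array([[90.0, 200.0], [150.0, 200.0]])
neutral = np.array([[95.0, 210.0], [145.0, 210.0]])  # unexpressive reference
sad = np.array([[96.0, 216.0], [144.0, 216.0]])      # another expression as reference

happy_vs_neutral = caricature(happy, neutral, 1.5)
happy_vs_sad = caricature(happy, sad, 1.5)

print(happy_vs_neutral)  # mouth corners pushed wider and higher than in `happy`
print(happy_vs_sad)      # same direction of change, slightly different magnitude
```

In this toy case both reference faces exaggerate the smile in the same direction (wider, raised mouth corners), differing only in degree, which is one way to see why quite different reference faces can each yield a recognizable caricature.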
Regardless of the choice of reference face, all images used are best kept fairly similar or the process can look unnatural. For example, transforming between open and closed mouths will create artifacts around the lips. Since caricatures push beyond the boundaries of the original images, they will always look grotesque once the exaggeration is strong enough (Pond et al., 2013; Zhao et al., 2011). Manipulating individual photographs can look very realistic (Figure 6) and limiting transformation to face shape (keeping surface consistent) can also help (Watkins et al., 2010). However, surface cues can be important (e.g., for male attractiveness: Said & Todorov, 2011; Torrance, Wincenciak, Hahn, DeBruine, & Jones, 2014). Nevertheless, even if images are too unnatural to be used as stimuli, they can still be very informative to researchers in depicting which attributes are changing.
Finally, a key question affecting all facial manipulation techniques is whether the outcome of multiple linear transforms (in landmarks, pixel colors, etc.) is itself linear, as is widely assumed. As outlined at the start, blending may involve nonlinear transformations when tessellated surface areas are compressed or expanded. Certainly, morphing and transforming steps are often not perceived as linear (Pond et al., 2013), which has provided interesting insights into social categorization (Calder et al., 1996).
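A small numerical example shows how a linear change in landmark positions can produce a nonlinear change in the tessellated surface. The sketch below assumes a single triangle from the tessellation whose vertices are linearly interpolated between two faces; the function names and coordinates are hypothetical and purely illustrative.

```python
import numpy as np


def triangle_area(vertices):
    """Area of a triangle given three (x, y) vertices."""
    (x1, y1), (x2, y2), (x3, y3) = np.asarray(vertices, dtype=float)
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))


def morph_landmarks(start, end, t):
    """Linearly interpolate landmark positions (t = 0 is start, t = 1 is end)."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    return (1.0 - t) * start + t * end


# One hypothetical tessellation triangle in each of two faces.
small = [[0, 0], [1, 0], [0, 1]]  # area 0.5
large = [[0, 0], [2, 0], [0, 2]]  # area 2.0

halfway_area = triangle_area(morph_landmarks(small, large, 0.5))
mean_of_areas = 0.5 * (triangle_area(small) + triangle_area(large))

print(halfway_area)   # area of the triangle at the 50% landmark morph
print(mean_of_areas)  # area expected if surface area changed linearly
```

Here the landmarks move linearly, but the triangle's area at the 50% step (1.125) falls short of the midpoint of the two endpoint areas (1.25), because area depends on products of coordinates. The pixels mapped into that triangle are therefore stretched or compressed by an amount that does not scale linearly with the morph level.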
A strong approach to address all of these limitations is to first use real face photographs and then provide converging evidence with experimentally manipulated face images. For example, averaging can efficiently replicate findings based on unmanipulated face images with different stimuli (and participants), demonstrating that initial results were not due to noise in the original images (as averaging removes noise). Careful pretesting ensures that images are perceived as intended (Pond et al., 2013). Moreover, unlike FaceGen images, manipulated images as described here are built directly from photographs of real faces, thereby preserving important covariation between cues. They also preserve important surface cues and thus look more natural.
Getting Started
We outline three facial manipulation programs: Fantamorph, Morpheus, and Psychomorph (Table 1). These were chosen because they are readily available, inexpensive/free, and have now been used to study social categorization and stereotyping, impression formation, individual and cultural differences, and clinical impairment (e.g., Frenkel, Lamy, Algom, & Bar-Haim, 2009; Ishii, Miyamoto, Mayama, & Niedenthal, 2011; Sofer et al., 2015; Wu, Laeng, & Magnussen, 2012). GryphonMorph has been used in papers on facial attractiveness (e.g., Rennels et al., 2008) but is now discontinued.
Table 1. Leading Morphing Software.
Note. N = no; Y = yes.
a. Citation count via Google Scholar (May 17, 2016) using the titles as search terms plus the word “face.” These counts underestimate how widely these methods have been used, as unfortunately not all papers cite the software used. There were an additional 7 overlapping citations, removed from all citation counts, and an additional 196 unique citations for GryphonMorph (now discontinued).
Fantamorph (www.fantamorph.com/index.html) and Morpheus (www.morpheussoftware.net/) are commercial programs costing less than US$100, which offer basic face manipulation capability. They come with user guides, although these are not aimed at researchers. Psychomorph is free software developed by Perrett, Tiddeman, and their colleagues (http://users.aber.ac.uk/bpt/jpsychomorph; Burt & Perrett, 1995; Tiddeman et al., 2001). Psychomorph offers unlimited averaging, transforming, and caricaturing, and clearer descriptions of the underlying calculations. For advanced users, Psychomorph can automatically process multiple images, and the code (Java) can be customized. The Psychomorph site includes help pages, and a user guide exists (Sutherland, 2015). We recommend that new users of morphing software start with extreme faces (e.g., highly different colors/shapes) to understand available features.
Beyond the Face Photograph
Although we have focused on facial image manipulation, the principles outlined here apply to other aspects of social perception, including dynamic or 3-D faces, bodies, and voices (Rowland & Perrett, 1995). MorphAnalyser can manipulate 3-D faces (http://cherry.dcs.aber.ac.uk:8080/wiki/MorphAnalyser) and has been used to investigate ideal facial adiposity and how it varies with sociocultural norms (Coetzee, Re, Perrett, Tiddeman, & Xiao, 2011). As mentioned previously, Vetter and colleagues have also developed sophisticated facial manipulation software (“morphable models”), which can manipulate the social impressions conveyed by 3-D (and 2-D) face images (Blanz & Vetter, 1999; Vetter & Walker, 2011). Psychomorph can use sequences of frames to morph dynamic input (http://cherry.dcs.aber.ac.uk:8080/wiki/videomorph) and has been used to understand dynamic perception of pro-social traits (Morrison, Clark, Tiddeman, & Penton-Voak, 2010). Psychomorph can also be applied to bodies (Rowland & Perrett, 1995).
Finally, voice manipulation is a relatively new development. Interestingly, preliminary findings appear to parallel those from face perception research (Latinus & Belin, 2011; Schweinberger, Kawahara, Simpson, Skuk, & Zäske, 2014, review this literature). For example, average voices sound attractive (Bruckert et al., 2010). Examining the social perception of dynamic faces, bodies, and voices using averaging and morphing is ripe for further study.
Conclusions
Facial image manipulation has led to many insights into social perception, including the basis of facial attractiveness, the relationship between visual and conceptual stereotyping, and the role of individual differences in impression formation. The approach has practical applications, including the ability to test clinical social impairment. We hope that this review stimulates more researchers to use these techniques, with a better appreciation of the breadth of insights already achieved and the strengths and limitations of the approach. We see great scope for future use of these methods within social psychology.
Footnotes
Acknowledgments
We thank Julian Oldmeadow, Nichola Burton, David Lick, and Stephen Pond for their invaluable comments on an earlier version of the article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Australian Research Council (ARC) Centre of Excellence in Cognition and its Disorders (CE110001021), an ARC Discovery Outstanding Researcher Award to Rhodes (DP130102300), and an ARC Discovery grant to Rhodes, Sutherland, and Young (DP170104602).
