Abstract
Despite the pervasiveness of facial inferences, scholars have debated whether our face reflects valid information regarding how we actually behave. Whereas previous research has largely focused on the accuracy of facial inferences, the present research examined the validity of face-based judgments. Specifically, we tested how accurate face-based judgments are, whether the accuracy of and confidence in face-based judgments are associated, and what mechanisms potentially link facial appearance to behaviors (N = 1,386 American and Korean adults). We found that although face-based judgments could accurately predict someone’s behavior (Study 1), participants’ confidence about their face-based judgments was not associated with their accuracy (Studies 2a and 2b). Moreover, Study 3 demonstrated that the accuracy of facial inferences is possibly due to self-fulfilling effects of facial inferences. That is, accuracy is largely driven by perceivers’ beliefs rather than by the direct association between faces and behaviors.
People show strong interest in human faces (Hassin & Trope, 2000; Todorov, 2017; Todorov et al., 2015) and make numerous inferences from faces (Willis & Todorov, 2006). Moreover, facial inferences have been shown to be almost automatic (Todorov et al., 2005; Willis & Todorov, 2006) and consensual across diverse populations (Cogsdill et al., 2014; Oosterhof & Todorov, 2008; Rule et al., 2010). It has also been demonstrated that people act on facial inferences (Na et al., 2015; Todorov et al., 2005). In other words, people cannot help but make inferences from others’ faces and use such inferences as a basis for important social judgments. Despite such pervasiveness of facial inferences, there has been much debate on whether faces provide valid clues about psychological tendencies (Hassin & Trope, 2000; Olivola & Todorov, 2010; Rule et al., 2013; Todorov et al., 2015). In the present research, we addressed this issue by evaluating the validity of facial inferences in social judgments. Specifically, we examined (a) whether one’s facial appearance could predict one’s behaviors, (b) whether individuals would have metacognitive awareness about their face-based judgments, and (c) what would be responsible for the association if facial appearance predicted behaviors.
Facial Appearance and Psychological Tendencies
Emerging literature suggests that facial appearance may be significantly associated with psychological tendencies. For example, Rule and colleagues demonstrated that participants correctly inferred others’ sexual orientation from their faces (Rule et al., 2008, 2009; but see Cox et al., 2016). Likewise, one could tell someone’s political orientation from their face (Rule & Ambady, 2010; Samochowiec et al., 2010). Moreover, it has been shown that one can predict who is a criminal (Valla et al., 2011) and who is a more successful CEO (Rule & Ambady, 2008) at better than chance level just by looking at other people’s faces. In particular, a specific feature of people’s faces, namely, the facial width-to-height ratio, is linked to various types of antisocial behaviors through its association with the person’s level of testosterone (Carré & McCormick, 2008; Lefevre et al., 2013). All in all, this line of research indicates that facial appearance reflects psychological tendencies.
However, there is another line of research casting doubt on the direct association between facial appearance and psychological tendencies. First, consistent inferences about someone’s face may not be possible because of substantial within-person variability (Jenkins et al., 2011; Todorov & Porter, 2014). Also, given such variability, it is possible that people (implicitly or explicitly) present their faces in accordance with their psychological tendencies, such as sexual orientation (Todorov et al., 2015). Secondly, the accuracy of face-based judgments might be driven by contextual factors other than facial appearance (Cox et al., 2016; Todorov et al., 2015; Todorov & Porter, 2014) or by demographic information such as age and gender (Olivola et al., 2012; Todorov, 2017). Moreover, it is suggested that the facial width-to-height ratio may not serve as a valid cue for antisocial tendencies (Kosinski, 2017) because physical characteristics are no longer closely tied to antisocial tendencies in modern environments (Wang et al., 2019). Likewise, recent studies questioned the association between CEOs’ faces and their performances (Graham et al., 2017; Stoker et al., 2016).
Taken together, researchers have debated whether facial appearance significantly predicts psychological tendencies. Extending this discussion, the present research investigated the validity of using facial appearance in social judgments. Validity may matter more than accuracy per se because the informational value of facial inferences can still be called into question even when faces significantly predict psychological tendencies.
Statement of Relevance
People commonly infer personality traits (e.g., trustworthiness) from the “look” of a face. Furthermore, people are not shy about making important judgments on the basis of facial inferences. For example, they make inferences about sexual and political orientation, whether somebody is a criminal, and even whether they are successful in business. However, can one’s face really serve as a valid cue for their psychological tendencies? This research reveals that although facial inferences are sometimes accurate, their usefulness is quite limited because people cannot consciously tell when they should or should not trust their inferences. More importantly, our findings suggest that accuracy itself is driven by the self-fulfilling effect of perceivers’ beliefs rather than facial appearances per se. Given the critical role of faces in social interactions, future research should examine when and how facial inferences can be accurate.
The Validity of Face-Based Judgments
For facial appearance to be valid and useful in social judgments, we thought at least three conditions should be met. First, facial appearance should be associated with psychological tendencies. Next, because the association is unlikely to be perfect, individuals need to know when they can or cannot rely on facial appearance. Finally, facial appearance should play a unique role above and beyond other factors in such an association. We first expected that facial appearance could predict psychological tendencies at least to some extent, although further investigation is still needed about whether one’s face truly reflects one’s internal attributes. However, we also expected that people could not tell when their face-based judgments were accurate because Olivola and colleagues (2014) failed to find any evidence for this kind of metacognitive awareness in facial inferences. This lack of metacognitive awareness substantially curtails the usefulness of facial appearance in social judgments.
Furthermore, we hypothesized that even when facial inferences predict psychological tendencies, such an association might occur indirectly through social expectations. Numerous studies show that people’s social expectations lead them to behave in ways that confirm existing expectations (Rosenthal & Jacobson, 1968; Snyder & Haugen, 1994). We argue that social expectations regarding one’s face would also be vulnerable to this kind of self-fulfilling prophecy. In fact, we believe that the effect would be quite substantial given that facial inferences are highly consensual across diverse perceivers (Cogsdill et al., 2014; Oosterhof & Todorov, 2008; Rule et al., 2010). Imagine that you want to offer illegal bribes for your business. It is probably wise to choose someone as your target whom you think is corruptible. Therefore, individuals who look corruptible would be bribed more often than those who do not. Then, even when the likelihood of accepting an illegal bribe is not related to their appearance, individuals who look corruptible would end up engaging in unethical behaviors more frequently simply because they have more opportunities.
Present Research
In the present research, we aimed to go beyond examining whether one’s face and psychological tendencies are associated and to investigate whether facial appearance could indeed serve as a valid basis for social judgments. Toward this end, Study 1 examined the accuracy of facial inferences in terms of predicting behaviors, more specifically, how accurately facial inferences of trustworthiness could predict who engaged in unethical behaviors. Studies 2a and 2b investigated whether individuals would have metacognitive awareness of their face-based judgments. Finally, Study 3 examined whether the association between facial appearance and behaviors would be mainly driven by participants’ expectations rather than targets’ psychological tendencies. All three studies were approved by the institutional review board at The University of Texas at Dallas (IRB MR 14-400) and Sogang University (SGUIRB-A-1806-29).
Study 1
In Study 1, we tested whether one’s face reflects one’s tendency to engage in unethical behaviors by investigating the association between politicians’ faces and their records of corruption.
Method
Participants
We recruited 184 American undergraduates at The University of Texas at Dallas in exchange for partial course credit (137 female, two undisclosed; age: M = 21.37 years, SD = 5.12). The data structure of the present research was complex (e.g., cross-classified), which made it difficult to conduct a traditional power analysis (Arend & Schäfer, 2019). For a reference point, we conducted a simulation-based power analysis using the simr package (Version 1.0.5; Green & MacLeod, 2016) in the R programming environment (Version 3.5.3; R Core Team, 2019), which showed that at least 60 participants would be needed to achieve the minimum power of 80%. However, to perform this power analysis, we needed to specify population variance across participants and across stimuli, for which we did not have any a priori knowledge and had to use some plausible but arbitrarily chosen values (for details, see the Supplemental Material available online). Considering this limitation as well as the fact that a multilevel design is less efficient for estimating the effect of a Level 2 predictor (the participant level in our data; Snijders, 2005), we decided to recruit as many participants as possible within the limits of available resources (e.g., the subject-pool size).
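The general logic of a simulation-based power analysis can be sketched as follows. This is an illustrative Python sketch, not the simr analysis reported above: the variance values, the intercept, and the function names are assumptions chosen for illustration, and a one-sample t test on the 24 pair-level accuracies stands in for the cross-classified multilevel logistic model.

```python
import math
import random
from statistics import mean, stdev

N_PAIRS = 24          # number of stimulus pairs, as in Study 1
T_CRIT_DF23 = 2.069   # two-sided .05 critical t value for df = 23

def simulate_once(rng, n_participants, intercept, sd_participant, sd_pair):
    """Simulate one data set; return True if accuracy differs from chance."""
    # Crossed random intercepts (assumed values, on the log-odds scale).
    u = [rng.gauss(0.0, sd_participant) for _ in range(n_participants)]
    v = [rng.gauss(0.0, sd_pair) for _ in range(N_PAIRS)]
    pair_means = []
    for vj in v:
        correct = 0
        for ui in u:
            p = 1.0 / (1.0 + math.exp(-(intercept + ui + vj)))
            correct += rng.random() < p
        pair_means.append(correct / n_participants)
    sd = stdev(pair_means)
    if sd == 0.0:  # degenerate case: all pairs have identical accuracy
        return mean(pair_means) != 0.5
    # Simplified test: one-sample t test of pair-level accuracy against 0.5.
    t = (mean(pair_means) - 0.5) / (sd / math.sqrt(N_PAIRS))
    return abs(t) > T_CRIT_DF23

def estimate_power(n_participants, intercept, sd_participant=0.5,
                   sd_pair=0.5, n_sims=200, seed=1):
    """Proportion of simulated data sets in which the test is significant."""
    rng = random.Random(seed)
    hits = sum(
        simulate_once(rng, n_participants, intercept, sd_participant, sd_pair)
        for _ in range(n_sims)
    )
    return hits / n_sims

if __name__ == "__main__":
    # An intercept of 0.45 log-odds is roughly 61% accuracy for an
    # average participant and pair (hypothetical effect size).
    print(estimate_power(n_participants=60, intercept=0.45))
```

Because the estimated power depends on the assumed variances across participants and stimuli, different plausible values can give noticeably different answers, which is the limitation noted above.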
Face stimuli
Given the high cross-cultural consensus on trait inferences of out-group faces as well as in-group faces (Na et al., 2015; Rule et al., 2010), we used face images of Korean politicians. This also allowed American participants to make face-to-trait inferences without being affected by prior knowledge of the targets. Stimuli were 24 pairs of facial photos of Korean politicians. All were the winners of Korean Assembly elections. All target images were cropped to a uniform size and converted into black-and-white images on a plain background. In each pair, one politician who had a clean record (the trustworthy target) was matched to another politician who had a corrupt record (the untrustworthy or corrupted target).
To control for other extraneous cues, we matched the targets in a given pair on hairstyle, facial expression, glasses, and age (±10 years). Female politicians were excluded because of concerns regarding the effects of gender stereotypes on judgments (Eagly & Karau, 1991) as well as the lack of female members of the Korean Assembly involved in political corruption. Because inferences drawn from the headshots could vary as a function of the contexts in which the pictures were taken (Rule et al., 2013; Todorov & Porter, 2014), we used only photos taken during the politicians’ tenure at the Assembly. Finally, in the case of the corrupted targets, we used the pictures taken before their criminal arrest or indictment had occurred.
Procedure
Participants were presented with a series of face pairs in a random order and asked to indicate which face in each pair looked more trustworthy. The facial photos in each pair were presented side by side on a computer screen, and the position of the trustworthy target was counterbalanced. After making trustworthiness judgments, participants indicated whether they had recognized the target faces. None of the participants recognized any of the target faces.
Results
We analyzed the accuracy of participants’ face-based judgments (i.e., whether the face of the trustworthy target was indeed perceived as trustworthy or not). A cross-classified multilevel model was used in which responses were simultaneously nested within participants and target pairs, which crossed each other. The results are summarized in Table 1. Across all participants and pairs, overall accuracy (61%) was better than chance (50%; odds ratio [OR] = 1.56, p = .008; Model 1). When participants’ age and gender were entered into the model as covariates, overall accuracy was still significantly better than chance level (OR = 1.93, p < .001; Model 2). Unexpectedly, female participants were less accurate than male participants (OR = 0.76, p = .002). This may be because male participants evaluated same-gender targets, whereas female participants evaluated cross-gender targets. We address related issues in the Discussion section (for additional analyses, see the Supplemental Material). Taken together, these results suggest that people are able to accurately infer male politicians’ unethical behaviors to a certain degree just by looking at their faces.
Table 1. Study 1: Results of Cross-Classified Multilevel Logistic Models Predicting Politician’s Trustworthiness From Participants’ Facial Judgments
Note: The marginal R² and conditional R² for Model 2 are .003 and .148, respectively. The marginal and conditional R² values could not be obtained for Model 1 because it is the baseline model against which the R² values were calculated. Age was grand-mean centered. Gender was coded 0 for male and 1 for female. OR = odds ratio; CI = confidence interval.
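As a reading aid, the relation between an accuracy rate and an odds ratio against 50% chance can be sketched in a few lines of Python. The function names are illustrative, and the calculation uses the raw proportion only. The odds ratios reported in the text come from the multilevel model, whose intercept is conditional on the random effects, so the two need not agree exactly.

```python
def odds_vs_chance(accuracy):
    """Odds of a correct choice. Chance (50%) has odds of 1, so this
    value is also the odds ratio of the observed accuracy against chance."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return accuracy / (1.0 - accuracy)

def accuracy_from_odds(odds):
    """Inverse conversion: odds of a correct choice back to a proportion."""
    if odds <= 0.0:
        raise ValueError("odds must be positive")
    return odds / (1.0 + odds)

if __name__ == "__main__":
    # 61% accuracy corresponds to odds of 0.61 / 0.39.
    print(round(odds_vs_chance(0.61), 2))  # 1.56
```

For Model 1 the raw-proportion odds (0.61 / 0.39 ≈ 1.56) happen to match the reported odds ratio to two decimals.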
Study 2a
Study 2a examined individuals’ metacognitive awareness in facial inferences. The accuracy of facial inferences was not perfect in Study 1. Therefore, for facial inferences to be truly meaningful, individuals should be able to tell when they could or could not trust their inferences. Study 2a also examined whether a negatively framed question would yield the same pattern of results.
Method
Participants
Two hundred Americans were recruited through Amazon’s Mechanical Turk (MTurk). Although the power analysis in Study 1 indicated that 60 participants would be needed to achieve 80% power, we decided to collect a larger sample, considering the limitation of the power analysis and possible dropouts. Given that a potentially infinite number of participants in MTurk would be willing to participate, we made an a priori decision to recruit 100 participants per condition instead of recruiting as many participants as possible, as in Study 1. During the debriefing, two participants reported that they had recognized some of the target faces. Thus, we excluded these participants from the analyses, resulting in a final sample of 198 Americans (129 female, one undisclosed; age: M = 37.85 years, SD = 12.71). The inclusion of the two additional participants did not substantially change the results.
Procedure
The experimental procedure regarding the face stimuli was the same as in Study 1, with the following exceptions. First, each participant was randomly assigned to one of two judgment conditions. Participants indicated which face in a given pair looked more trustworthy in the trustworthiness condition (n = 99) and more corruptible in the corruptibility condition (n = 99). Second, after making each judgment, participants reported their level of confidence in that judgment on an 11-point scale ranging from 0 (not at all confident) to 10 (absolutely confident).
Results
As can be seen in Table 2, a cross-classified multilevel analysis showed that facial judgments made about male politicians predicted the politicians’ behaviors at better than chance level not only for trustworthiness (59.6%; OR = 1.47, p = .005) but also for corruptibility (56.5%; OR = 1.30, p = .003; Model 1). Again, the pattern did not change after we controlled for participants’ age and gender (Model 2). However, we failed to find evidence for the metacognitive awareness of facial inferences (see Table S1 in the Supplemental Material). As predicted, participants’ confidence was not significantly associated with their judgment accuracy (OR = 1.01, p = .711). This pattern did not vary as a function of the type of judgment (trustworthiness vs. corruptibility; OR = 1.05, p = .093).
Table 2. Studies 2a and 2b: Results of Cross-Classified Multilevel Logistic Models Predicting Politician’s Trustworthiness and Corruptibility From Participants’ Facial Judgments
Note: The marginal R² and conditional R² values for Model 2 are, respectively, .001 and .102 (Study 2a: trustworthiness judgment), .001 and .047 (Study 2a: corruptibility judgment), .002 and .217 (Study 2b: trustworthiness judgment), and .005 and .162 (Study 2b: corruptibility judgment). The marginal and conditional R² values could not be obtained for Model 1 because it is the baseline model against which the R² values were calculated. Age was grand-mean centered. Gender was coded 0 for male and 1 for female. OR = odds ratio; CI = confidence interval.
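The idea of metacognitive awareness tested in Study 2a can be illustrated with a simple descriptive check: if confidence tracked accuracy, high-confidence judgments should be correct more often than low-confidence ones. The Python sketch below uses made-up trial records and an arbitrary cutoff on the 0-to-10 confidence scale. It is a simplification for illustration, not the multilevel logistic model reported in the text.

```python
def accuracy_by_confidence(trials, cutoff=5):
    """Split trials at a confidence cutoff and return accuracy in each half.

    trials: iterable of (confidence, correct) with confidence on the
    0-10 scale and correct as a bool. Returns (low_accuracy, high_accuracy);
    an empty half yields None.
    """
    low = [correct for conf, correct in trials if conf <= cutoff]
    high = [correct for conf, correct in trials if conf > cutoff]

    def accuracy(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return accuracy(low), accuracy(high)

if __name__ == "__main__":
    # Hypothetical trials in which confidence carries no information.
    toy_trials = [
        (2, True), (3, False), (4, True), (5, False),
        (7, True), (8, False), (9, True), (10, False),
    ]
    print(accuracy_by_confidence(toy_trials))  # (0.5, 0.5)
```

Equal accuracy in the two halves, as in this toy data set, is the pattern expected when participants lack metacognitive awareness.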
Study 2b
The importance of traits in social judgments varies across cultures (e.g., Markus & Kitayama, 1991). Also, in spite of cross-cultural consensus in facial inferences (Rule et al., 2010), cross-cultural differences have sometimes been found in face recognition (Malpass & Kravitz, 1969) and the use of facial inferences in social judgments (Na et al., 2015). Thus, in Study 2b, we sought to replicate the results of Study 2a with Korean participants.
Method
We recruited 148 Korean undergraduates at Sogang University (101 female; age: M = 21.53 years, SD = 2.15). Considering the result and limitation of the power analysis performed in Study 1, we aimed to recruit as many participants as possible from the subject pool, including at least 60 participants in each condition. Each participant was randomly assigned to either the trustworthiness condition (n = 73) or the corruptibility condition (n = 75). The procedure was the same as in Study 2a except that we used 24 pairs of photos of male politicians in the United States (i.e., senators or congressmen). All of the photos used in Study 2b were prepared in the same way as in the previous studies.
Results
As in Study 2a, the average accuracy of face-based judgments was above chance both for trustworthiness (66.1%; OR = 1.95, p = .003) and for corruptibility (68.5%; OR = 2.18, p < .001), and this pattern did not change after we controlled for participants’ age and gender (see Table 2). Again, as shown in Table S1, participants’ level of confidence in their judgments was not significantly associated with their judgment accuracy (OR = 1.02, p = .397). Further, the confidence–accuracy association did not differ between the two conditions (trustworthiness vs. corruptibility; OR = 1.04, p = .289). Unexpectedly, a main effect of the condition was significant (OR = 0.83, p = .029; see Table S1). Trustworthiness judgments were lower in accuracy than corruptibility judgments. However, this difference was no longer significant when we controlled for age and gender (OR = 0.85, p = .064). Thus, it can be said that the results did not substantially vary between the two conditions.
To sum up, we found that participants’ accuracy in predicting unethical behaviors of male politicians from facial inferences was better than chance. However, our data suggest that individuals are not consciously aware of when they can be confident about their facial inferences.
Study 3
In the previous studies, we showed that faces were significantly associated with behaviors. However, the results do not provide any clues about the underlying mechanism of such an association. Thus, in Study 3, we investigated whether the face–behavior relation would be driven by self-fulfilling prophecy. Specifically, we predicted that individuals who look corruptible would end up accepting more bribes than their counterparts simply because they are bribed more frequently. To test this prediction, we recruited two groups of participants—a target group (Korean participants) whose face and behavior were evaluated and a perceiver group (American participants) who evaluated targets’ faces and decided whether to offer illegal bribes to them.
Method
Study 3 consisted of three stages: (a) the generation of face stimuli from Korean targets, (b) the facial and bribe judgments by American perceivers, and (c) the corruption judgments by Korean targets.
Stage 1: face-stimuli generation
Participants
Fifty-six Koreans (34 female; age: M = 22.29 years, SD = 2.90) agreed to submit their facial pictures in exchange for monetary compensation (worth approximately $5 U.S.). Participants were recruited through online and off-line fliers posted around Sogang University. We expected some difficulty recruiting participants because they might not be willing to provide a picture of their face to strangers. Therefore, we did not predetermine the number of participants, aiming to recruit as many as possible during a 3-month period. Even though we did not perform any power analysis to inform our sample-size determination for Study 3, we looked into a comprehensive simulation study to get some information about the statistical power that our study may have. The simulation study performed by Arend and Schäfer (2019) examined sample sizes required to achieve enough power to detect various effects in multilevel models and showed that a moderate to large effect was detectable with a power of 80% for the conditions similar to those in our data, in which four observations (corresponding to the four scenario conditions in our data) were nested within each of 50 or 60 persons.
Face stimuli
Participants were told that their pictures would be used in a study on first impressions. Face photos were required to be frontal images and to have (a) natural expressions, (b) no adornments (e.g., earrings, hairband, hat), (c) adequate illumination, and (d) no editing. All selected photos were converted to black and white, placed on a plain background, and cropped to a uniform size.
Stage 2: facial and bribery judgments by American perceivers
Participants
First, the 56 Korean targets were divided into four groups of similar size (13, 14, 15, and 14 targets, respectively) to reduce the response burden. Afterward, we recruited American perceivers in four different waves. Perceivers were recruited and compensated through MTurk. No previous research was available to guide the size of the perceiver sample. However, considering that perceivers’ ratings of a target should be aggregated to produce the target’s trait scores used for further analyses, we decided to recruit a large number of perceivers to obtain reliable and unbiased trait scores for each target. Specifically, we recruited 200 perceivers per wave so that each target was evaluated by 200 perceivers. Thus, across four waves, we recruited a total of 800 American perceivers (531 female, three undisclosed; age: M = 42.77 years, SD = 13.49).
Procedure
American perceivers responded to an online survey and completed two tasks: the trait inference and the bribery judgment. In the trait-inference task, perceivers made trait inferences about strangers on the basis of their faces. Specifically, perceivers made face-based judgments about traits relevant to unethical behaviors (trustworthy, corruptible, ethical, and honest), filler traits (dominant and competent), and control traits (attractive and happy vs. angry expression) across eight blocks. Each trait was rated in a separate block to minimize any carryover effects, and the order of the blocks was randomized. In each block, perceivers viewed the facial images of the Korean targets in a random order and made face-to-trait inferences (e.g., “How trustworthy is this person?”) on a 7-point scale ranging from 1 (not at all trustworthy) to 7 (very trustworthy).
After completing the trait-inference task, perceivers worked on the bribery-judgment task. For this task, we developed four different bribery situations in which the protagonist was contemplating whether to offer an unethical payment to a given target: (a) bribing a government official to win a contract, (b) bribing a colleague to conceal a mistake, (c) bribing someone at the U.S. airport to avoid paying taxes, and (d) bribing someone to cut in line at a concert. Each perceiver was randomly assigned to one of these four scenarios (i.e., 50 participants per scenario) and asked to imagine being the protagonist in the scenario. After reading the assigned scenario, perceivers were shown the Korean faces presented in the trait-inference task and made a binary judgment regarding whether they would bribe that person if he or she were the recipient/target. Perceivers were informed that they should make the bribery judgment solely on the basis of the target’s face. Finally, perceivers were asked whether they had recognized any of the targets. None of the perceivers recognized any of the target faces.
Stage 3: corruption judgments by Korean targets
In Stage 3, we emailed an online survey to the Korean targets in Stage 1. All of them responded to the survey. In the survey, they read four bribery scenarios used in Stage 2 and indicated whether they would accept the bribe if they were a recipient. Specifically, the judgments were measured in two steps. For each situation, they initially reported whether to accept the bribe (the binary judgment) and then indicated how many offers they would accept if there were a number of similar offers (the frequency judgment). The exact number of bribe offers in the frequency judgment was determined by the total number of the American perceivers who offered a bribe to a given Korean target in Stage 2. For example, if 14 American perceivers decided to bribe a Korean target in a given scenario, the Korean target was asked how many offers he or she would accept out of 14 similar offers. Because the number of offers varied across targets, we were able to estimate how the number of offers would be associated with the number of corrupt decisions (i.e., accepting the bribe). Moreover, because Korean targets made the binary judgment before the frequency judgment, we were able to measure targets’ initial tendency to accept an illegal bribe regardless of the number of offers. In summary, our research design allowed us to independently examine whether one’s facial appearance would be associated with (a) the number of illegal bribes he or she would receive and (b) his or her tendency to accept an illegal bribe.
Results
Face-based trait inferences and bribery judgments by American perceivers
Using a cross-classified multilevel logistic model, we tested whether perceivers’ rating of a target face on each trait predicted their bribe offer to the target while controlling for the perceived attractiveness and emotional expression of the target face. The results showed that perceivers were significantly more likely to offer a bribe when they rated the target face as less trustworthy (OR = 0.69, p < .001), more corruptible (OR = 1.39, p < .001), less ethical (OR = 0.67, p < .001), or less honest (OR = 0.67, p < .001; see Table S2 in the Supplemental Material). In other words, individuals who looked more corruptible or less trustworthy were given more opportunities for unethical behaviors than those who looked less corruptible or more trustworthy.
Bribery judgments by Korean targets
Next, we investigated whether corruptible-looking Koreans would actually engage in unethical behaviors more often than their counterparts. First, we created the independent variables, the trait ratings for each Korean face, by averaging the ratings from American perceivers who rated the Korean face. Then, we calculated three dependent variables: the binary decision (yes/no in the binary judgment), the overall frequency (the number of corrupt decisions in the frequency judgment), and the relative frequency of corrupt decisions (the overall frequency divided by the total number of bribe offers, indicating the likelihood of making a corrupt decision). Multilevel logistic models were used for predicting the binary decision and the relative frequency of corrupt decisions, whereas a multilevel zero-inflated negative binomial model was used for predicting the frequency of corrupt decisions. The results showed that the overall frequency of corrupt decisions was significantly higher for Koreans whose faces were rated as less trustworthy (incidence-rate ratio [IRR] = 0.60, p < .001), more corruptible (IRR = 1.91, p < .001), less ethical (IRR = 0.57, p < .001), and less honest (IRR = 0.58, p < .001). However, none of the four facial ratings significantly predicted the binary decision or the relative frequency of corrupt decisions (see Tables S3 to S5 in the Supplemental Material). Thus, although corruptible-looking Koreans accepted illegal bribes more frequently than their counterparts, it was not because they were more likely to accept any given illegal bribe.
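The construction of the variables described above can be sketched in Python. The record layout, field names, and the handling of a target who received no offers are assumptions made for illustration; they are not taken from the study materials.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple

@dataclass
class ScenarioResponse:
    """One target's responses in one bribery scenario (hypothetical layout)."""
    accepts_bribe: bool   # binary judgment
    offers_received: int  # perceivers who chose to bribe this target in Stage 2
    offers_accepted: int  # frequency judgment

def trait_score(perceiver_ratings: Sequence[int]) -> float:
    """Independent variable: mean of the perceivers' 1-7 ratings of one face."""
    return mean(perceiver_ratings)

def dependent_variables(r: ScenarioResponse) -> Tuple[int, int, Optional[float]]:
    """Return (binary decision, overall frequency, relative frequency)."""
    binary = int(r.accepts_bribe)
    overall = r.offers_accepted
    # Relative frequency is undefined when no offers were made (assumption:
    # such cases are returned as None rather than 0).
    relative = overall / r.offers_received if r.offers_received > 0 else None
    return binary, overall, relative

if __name__ == "__main__":
    # A target offered 14 bribes who says they would accept 7 of them.
    print(dependent_variables(ScenarioResponse(True, 14, 7)))  # (1, 7, 0.5)
```

The distinction matters because the overall frequency can grow simply with the number of offers, whereas the binary decision and the relative frequency reflect the target's own tendency to accept.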
Mediation analyses
Finally, we tested whether the relation between facial inferences (independent variables) and the overall frequency of corrupt decisions (dependent variable) was mediated by the total number of bribe offers (mediator). To summarize, the association between each independent variable and the mediator was significant, and the relation between the mediator and the dependent variable was also significant when each independent variable was controlled for. In addition, the direct effect of each independent variable was not only weaker than the corresponding total effect but also nonsignificant (see Table 3). These results supported the intermediate role of the total number of bribe offers connecting the facial ratings with the overall frequency of corrupt decisions. This implies that individuals who look more corruptible might manifest unethical behaviors more frequently because they just have more chances to do so. In other words, corruptible-looking targets may not engage in unethical behaviors more frequently if they receive as many bribe offers as trustworthy-looking targets.
Table 3. Study 3: Results of Mediation Analyses Examining the Effect of Each Facial-Trait Rating (X) on the Overall Frequency of Accepting Bribe Offers (Y), as Mediated by the Total Number of Bribe Offers (M)
Note: The total effect is the effect of X on Y without controlling for M. The direct effect is the effect of X on Y controlling for M. All the independent variables were grand-mean centered. IRR = incidence-rate ratio; CI = confidence interval.
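The self-fulfilling pattern implied by these mediation results can be illustrated with a toy simulation. In the Python sketch below, every target accepts any given offer with the same probability regardless of appearance, but more corruptible-looking targets receive more offers. All parameter values and names are hypothetical, and simple and partial Pearson correlations stand in for the negative binomial mediation models used in the study.

```python
import math
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_targets=2000, n_perceivers=50, accept_prob=0.3, seed=0):
    """Simulate targets whose acceptance tendency is unrelated to their looks."""
    rng = random.Random(seed)
    looks, offers, accepted, relative = [], [], [], []
    for _ in range(n_targets):
        look = rng.random()                # perceived corruptibility, 0 to 1
        offer_prob = 0.1 + 0.6 * look      # perceivers bribe by appearance
        n_offers = sum(rng.random() < offer_prob for _ in range(n_perceivers))
        if n_offers == 0:
            continue                       # relative frequency undefined
        # Acceptance depends only on accept_prob, never on appearance.
        n_accepted = sum(rng.random() < accept_prob for _ in range(n_offers))
        looks.append(look)
        offers.append(n_offers)
        accepted.append(n_accepted)
        relative.append(n_accepted / n_offers)
    r_xm = pearson(looks, offers)
    r_xy = pearson(looks, accepted)
    r_my = pearson(offers, accepted)
    # Partial correlation of looks and accepted offers, controlling for offers.
    partial = (r_xy - r_xm * r_my) / math.sqrt((1 - r_xm ** 2) * (1 - r_my ** 2))
    return {
        "looks_offers": r_xm,
        "looks_accepted": r_xy,
        "looks_relative": pearson(looks, relative),
        "looks_accepted_given_offers": partial,
    }

if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, appearance correlates strongly with the number of accepted bribes, yet the association should be close to zero both for the relative frequency and once the number of offers is controlled for, mirroring the pattern of total and direct effects described above.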
Discussion
The present research demonstrated that the informational value of facial appearance in social judgments could be substantially compromised even when facial appearance significantly predicted behaviors. Specifically, although trustworthiness and corruptibility judgments made about politicians’ faces significantly predicted whether the politicians engaged in unethical behaviors (Studies 1–2b), participants’ confidence was not associated with the accuracy of their face-based judgments (Studies 2a and 2b). Moreover, Study 3 showed that the likelihood of accepting an illegal bribe did not significantly vary as a function of facial appearance. In this study, individuals whose faces were perceived as less trustworthy or more corruptible indeed accepted illegal bribes more frequently than did their counterparts, and yet the effect was mediated by the number of illegal offers they received.
It is noteworthy that facial appearance was significantly associated with behaviors. We speculate that our experimental designs contributed to the significant results. In the stimulus pairs in Studies 1 to 2b, politicians with a clean record were directly contrasted with politicians convicted of corruption. In a similar vein, the type of behavior used in Study 3 can be said to be extreme because it involves an illegal bribe. This kind of extremity in the experimental designs might have increased the chance of detecting the effect of facial appearance. Given that a growing number of studies have found a nonsignificant association between faces and psychological tendencies (Olivola & Todorov, 2010; Rule et al., 2013; Todorov et al., 2015), the appropriate interpretation of our data is that faces can, but do not necessarily, predict behaviors. We also think that it would be a worthy endeavor to investigate when faces can and cannot predict behaviors.
Importantly, our results also suggest that the informational value of facial inferences in social perception is substantially curtailed. Although facial inferences were significantly associated with behaviors in the present research, accuracy was only moderately better than chance level. Thus, it is important to know when people can or cannot base their judgments on facial inferences. However, participants’ confidence about their face-based judgments was not associated with their accuracy in Studies 2a and 2b. This finding is consistent with results of previous research showing that people are poor at evaluating their own face-based judgments (Hassin & Trope, 2000; Olivola et al., 2014). Taken together, the current literature indicates that individuals lack metacognition about their facial inferences, which limits the usefulness of those inferences in social judgments.
Another contribution of our research is that it demonstrates that perceivers’ expectations drive the significant associations between faces and behaviors. The more corruptible a recipient’s face looked, the more frequently the American perceivers in Study 3 decided to offer an illegal bribe. This, in turn, led the Korean targets to accept more illegal bribes in the frequency judgment. However, neither the binary judgment nor the relative frequency of corrupt decisions was associated with recipients’ facial appearance. Indeed, the effect in the frequency judgment disappeared when we controlled for the differences in the number of bribe offers. In other words, the observed association between faces and behaviors could be explained by social expectations. The finding adds to an emerging literature suggesting that the association between facial appearance and psychological tendencies results from factors other than facial appearance, such as contextual factors (Todorov et al., 2015; Todorov & Porter, 2014) or demographic information (Olivola et al., 2012; Todorov, 2017). More generally, the present research attests to the time-honored idea that behaviors are a joint product of the person and the situation (Ross & Nisbett, 1991).
Before closing, we should note that the targets in Studies 1, 2a, and 2b were only male politicians. This warrants cautious interpretation of the findings, although Study 3 had both male and female targets and found essentially the same pattern. It is also interesting that female participants in Study 1 were less accurate in their facial judgments than male participants. Although this gender difference was not replicated in Studies 2a and 2b, it suggests that face-based judgments may be less accurate for cross-gender targets than for same-gender targets. To investigate this possibility, we conducted additional analyses on the data in Study 3, in which both male and female participants evaluated same- and cross-gender targets. The results showed that participants’ face-based judgments were more accurate in predicting behaviors for same-gender targets than for cross-gender targets (for detailed results, see the Supplemental Material). Thus, the cross-gender effect in the accuracy of facial inferences would be an interesting topic for future research. However, we emphasize that such a cross-gender effect did not qualify the main findings in the present research (i.e., both male and female participants showed a lack of metacognitive awareness and the self-fulfilling effect of facial inferences regardless of targets’ gender).
Also, it turned out that the corruptible targets were significantly older than the trustworthy targets among Korean politicians (Studies 1 and 2a), whereas there was no such difference among American politicians (Study 2b). Thus, we created a new set of Korean politicians who did not show any age differences and replicated the reported findings with 200 Americans (83 female; age: M = 36.09 years, SD = 10.00). That is, the accuracy of face-based judgments was above chance level across all participants and pairs for both trustworthiness and corruptibility judgments. Moreover, participants’ confidence was not significantly associated with their accuracy (for details, see the Supplemental Material).
Another potential issue is that the number of target faces in the present research may not have been large enough. To address this issue, we treated target faces as a random factor in our analyses so that the findings would be applicable to other faces. More importantly, the present research was not meant to be a comprehensive test of the accuracy of facial inferences. Rather, we investigated whether the validity of face-based judgments could still be called into question even when one’s face predicts one’s behaviors. We believe that such an investigation can be done with a relatively small number of facial images as long as people can make accurate judgments about them, as in the present research. Finally, although our targets and participants were drawn from two different cultures, the generalizability of the current findings should be further investigated in other cultural contexts with more demographic diversity.
To conclude, we show that the validity of using facial appearance for social judgments is questionable at best because of the lack of metacognitive awareness in facial inferences and the self-fulfilling effects of perceivers’ beliefs. Much of the discussion on facial inferences so far has focused on whether facial inferences are accurate. We believe that it would be much more fruitful to think about when and how facial inferences can be accurate. Hopefully, the current research will be a step toward such an endeavor.
Supplemental Material
Supplemental material, sj-docx-1-pss-10.1177_09567976211000308 for Face-Based Judgments: Accuracy, Validity, and a Potential Underlying Mechanism by Seungbeom Hong, Hye Won Suk, Yoonseok Choi and Jinkyung Na in Psychological Science
Footnotes
Transparency
Action Editor: Kate Ratliff
Editor: Patricia J. Bauer
Author Contributions
S. Hong and H. W. Suk contributed equally to this work. S. Hong and J. Na developed the research ideas. S. Hong collected the data. S. Hong, Y. Choi, and H. W. Suk analyzed the data. J. Na, H. W. Suk, and S. Hong drafted the manuscript, and Y. Choi provided critical comments. All the authors approved the final manuscript for submission.
