Abstract
Do people appear more attractive or less attractive depending on the company they keep? A divisive-normalization account, in which the representation of stimulus intensity is normalized (divided) by concurrent stimulus intensities, predicts that choice preferences among options increase with the range of option values. In the first experiment reported here, I manipulated the range of attractiveness of the faces presented on each trial by varying the attractiveness of an undesirable distractor face that was presented simultaneously with two attractive targets, and participants were asked to choose the most attractive face. I used normalization models to predict the context dependence of preferences regarding facial attractiveness. The more unattractive the distractor, the more one of the targets was preferred over the other target, which suggests that divisive normalization (a potential canonical computation in the brain) influences social evaluations. I obtained the same result when I manipulated faces’ averageness and participants chose the most average face. This finding suggests that divisive normalization is not restricted to value-based decisions (e.g., attractiveness). This new application of normalization, a classic theory, to social evaluation opens possibilities for predicting social decisions in naturalistic contexts such as advertising or dating.
It is sometimes assumed that one face appears more attractive than another because of its physical attributes, not because of when, where, or with whom it is seen. Indeed, classical decision-making theories assume rational agents who assign values to decision options (Luce, 1959) and assume that these values are fixed, regardless of any other alternatives that are encountered (i.e., regardless of context). These assumptions may contribute to the dearth of theory predicting context effects on facial evaluations (e.g., facial attractiveness). One candidate theory for predicting context effects—normalization—draws support from decision-making studies showing context effects (Rangel & Clithero, 2012). For decisions among multiple options, divisively normalized representations of option values can be obtained by dividing each option’s value (assessed without context) by the sum of all available option values (Louie & Glimcher, 2012; Louie, Khaw, & Glimcher, 2013). For example, when an option having a value of 3 is presented together with options having values of 2 and 1, its value after divisive normalization is 0.5 (i.e., 3/(3 + 2 + 1)).
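The worked example above can be reproduced with a minimal sketch. The function name `divisive_normalize` is hypothetical, and the sketch implements only the simple sum-based form described in this paragraph, not any fuller model with additional parameters.

```python
def divisive_normalize(values):
    """Divide each context-free option value by the sum of all option values."""
    total = sum(values)
    return [v / total for v in values]


# Options worth 3, 2, and 1 presented together, as in the example in the text.
normalized = divisive_normalize([3, 2, 1])
print(normalized[0])  # value of the best option after normalization: 3 / (3 + 2 + 1)
```

Because every value is divided by the same sum, the normalized values always add up to 1, so changing any one option changes the representation of all the others.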
Normalization may mediate context effects on attractiveness decisions. It typifies neural responses throughout the brain and can be considered a putative canonical neural computation (Carandini & Heeger, 2012). Normalization proliferates in the brain because it ensures that neural firing rates optimally span prevailing stimulus ranges. Divisive normalization is well studied in vision and explains nonlinear neural responses, including photoreceptor normalization to light intensity (Boynton & Whitten, 1970), contrast normalization and saturation effects (Bonin, Mante, & Carandini, 2005), cross-orientation suppression (Carandini, Heeger, & Movshon, 1997), and neural responses to motion (Simoncelli & Heeger, 1998) and multiple objects (Zoccolan, Cox, & DiCarlo, 2005). Normalization outside vision includes nonlinear neural responses to auditory contrast (Rabinowitz, Willmore, Schnupp, & King, 2011) and cross-modality normalization in multisensory integration (Ohshiro, Angelaki, & DeAngelis, 2011).
Representations of reward value are also normalized. Brain responses encoding the values of choice options adapt to the ranges of option values (Cox & Kable, 2014; Padoa-Schioppa, 2009; Rangel & Clithero, 2012). This normalization affects human choice behavior, too. An undesirable option modulates the choice between two other options (Soltani, De Martino, & Camerer, 2012), for example, when monkeys choose among juice rewards and humans choose among snack foods (Louie et al., 2013). In Louie et al. (2013), lower-value distractor items (i.e., a wider range of option values) increased preference for the better of two desirable options, a finding predicted by computer simulations of divisive normalization. Thus, divisive normalization effectively expands or contracts the dynamic range of value representations to match the range of option values.
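One way to see why a lower-value distractor could sharpen the choice between two targets is the following sketch. It is an illustration under stated assumptions, not the simulation reported by Louie et al. (2013): it assumes that each normalized value is corrupted by independent Gaussian noise of fixed size (`noise_sd`, a hypothetical parameter), that the distractor is never chosen, and that the option values shown are arbitrary.

```python
import math


def normalized(values):
    """Sum-based divisive normalization of a list of option values."""
    total = sum(values)
    return [v / total for v in values]


def p_choose_better(target_hi, target_lo, distractor, noise_sd=0.05):
    """Probability of choosing the better target when fixed Gaussian noise
    (an assumption) is added to each normalized target value."""
    n_hi, n_lo, _ = normalized([target_hi, target_lo, distractor])
    # The difference of two independent noisy values has SD noise_sd * sqrt(2).
    z = (n_hi - n_lo) / (noise_sd * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


for distractor in (0.2, 1.0, 2.0):
    print(distractor, round(p_choose_better(3.0, 2.5, distractor), 3))
```

Under these assumptions, lowering the distractor value shrinks the normalizing sum, which widens the gap between the two normalized target values relative to the fixed noise and so raises the probability of choosing the better target.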
Normalization could mediate similar context effects in the social domain too, but a theory to predict context effects for social evaluations is lacking. In an attractiveness experiment, I tested whether normalization predicts context effects on facial-attractiveness choices. Participants evaluated facial attractiveness without context in a first phase; in a second phase, they performed a trinary-choice task to assess attractiveness preferences. If attractiveness representations are normalized, then when two highly attractive targets and a low-attractiveness distractor are the choice options, participants’ choices should become more sensitive to differences in attractiveness between the two targets as the distractor becomes less attractive. In a control experiment (i.e., averageness experiment), I tested for the same effects using a task in which participants chose the most average-appearing face. These averageness choices may share some perceptual bases with attractiveness choices (Langlois & Roggman, 1990) but have no overt value-based connotation. This control task could therefore show whether normalization is restricted to value-based judgments such as attractiveness or affects high-level facial judgments in general.
Method
Participants
I enrolled two samples of 40 participants each, selecting a sample size equivalent to that used by Louie et al. (2013). For the attractiveness experiment, all 40 (age range = 18−23; 25 women) contributed data for analysis. For the averageness experiment (approximate age range = 18−55, 28 women), data for 1 participant were removed from analysis because her responses suggested that the instructions were not followed. Participants were students or staff at Royal Holloway, University of London, and had normal or corrected-to-normal vision.
For the attractiveness experiment, participants were entered into a drawing for one £25 Amazon voucher. For the averageness experiment, participants were compensated £8. The attractiveness experiment was approved by the psychology department ethics committee, and the averageness experiment was approved by the college ethics committee, both at Royal Holloway, University of London.
Procedure
Stimuli included color photographs of 15 male and 15 female neutral faces sampled from the Karolinska Directed Emotional Faces (KDEF) face set (Lundqvist, Flykt, & Öhman, 1998). In Phase 1 (Fig. 1a, left), participants rated either the attractiveness or the averageness of the 30 faces, two times each, sequentially and in a pseudorandom order.

Fig. 1. Sample trials from Phases 1 and 2 of the attractiveness experiment and Phase 1 attractiveness and averageness ratings. In Phase 1 of the attractiveness experiment (a, left), participants rated the attractiveness of faces that were presented without context. In Phase 2 (a, right), participants completed a trinary-choice task in which they chose the most attractive of three faces (two targets and a distractor). In Phase 2 of the attractiveness experiment, distractors were the 10 male faces and 10 female faces with the lowest attractiveness ratings, and targets were the 5 male faces and 5 female faces with the highest attractiveness ratings, identified separately for each participant. The graphs show female and male participants’ (b) mean attractiveness ratings and (c) mean averageness ratings in Phase 1 for the male and female faces selected as distractors and targets. Error bars show 95% confidence intervals. The faces shown in (a) are numbers AF01NES, AF05NES, AF06NES, AF13NES, AM04NES, and AM10NES from the Karolinska Directed Emotional Faces face set (Lundqvist, Flykt, & Öhman, 1998).
On each trial, participants saw a face and, below it, a horizontal line representing a continuum of values (between 0 and 1). They used the mouse to choose a value, and the next trial appeared after the mouse click. The two ratings recorded for each face during Phase 1 were averaged for each participant. These averaged Phase 1 ratings were taken as estimates of each participant’s judgment without context or normalization. Faces were selected for Phase 2 on the basis of each participant’s own averaged ratings (attractiveness ratings in the attractiveness experiment, averageness ratings in the averageness experiment). The lowest-rated (least attractive or least average) 10 male faces and 10 female faces were selected as distractors, and the highest-rated 5 male faces and 5 female faces were selected as targets for Phase 2 (Figs. 1b and 1c).
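The selection rule described above can be sketched as follows for the faces of one gender and one participant. The function name `select_faces`, the face labels, and the rating values are hypothetical; tie handling is not described in the text, so the sketch simply relies on the sort order.

```python
def select_faces(ratings_by_face, n_distractors=10, n_targets=5):
    """Average each face's Phase 1 ratings, then take the lowest-rated faces
    as distractors and the highest-rated faces as targets."""
    mean_rating = {face: sum(r) / len(r) for face, r in ratings_by_face.items()}
    ordered = sorted(mean_rating, key=mean_rating.get)  # lowest rating first
    return ordered[:n_distractors], ordered[-n_targets:]


# Hypothetical ratings: 15 faces of one gender, each rated twice on a 0-1 scale.
ratings = {f"face{i:02d}": [i / 20, (i + 1) / 20] for i in range(15)}
distractors, targets = select_faces(ratings)
print(len(distractors), len(targets))
```

With 15 faces per gender, the 10 distractors and 5 targets together use every face exactly once, so no face serves in both roles for a given participant.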
In each trial of Phase 2, three faces (two targets and a distractor) appeared on the screen at the left, center, and right (Fig. 1a, right). The faces were presented in pseudorandom order, and the faces’ screen locations were assigned randomly. The faces came from the 10 possible pairs of five targets for each gender, matched to the 10 same-gender distractors. The participant indicated which of the faces was most attractive or most average by pressing a button on a computer keyboard, which triggered the next trial. I measured which target was chosen on each trial and whether the chosen target was the more attractive target (or more average target) according to the ratings from Phase 1. Each participant completed 200 trials.
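The trial count follows from the design: five targets per gender give 10 possible target pairs, and crossing these with 10 same-gender distractors for two genders gives 200 trials. The sketch below enumerates that design; the function name and face labels are hypothetical, and the pseudorandom trial order and random screen positions are omitted.

```python
from itertools import combinations, product


def build_trials(targets_by_gender, distractors_by_gender):
    """Cross every pair of same-gender targets with every same-gender distractor."""
    trials = []
    for gender, targets in targets_by_gender.items():
        pairs = combinations(targets, 2)  # 10 pairs from 5 targets
        for (t1, t2), d in product(pairs, distractors_by_gender[gender]):
            trials.append({"gender": gender, "targets": (t1, t2), "distractor": d})
    return trials


genders = ("female", "male")
targets = {g: [f"{g}_target{i}" for i in range(1, 6)] for g in genders}
distractors = {g: [f"{g}_distractor{i}" for i in range(1, 11)] for g in genders}
trials = build_trials(targets, distractors)
print(len(trials))  # 10 pairs x 10 distractors x 2 genders
```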
Results
Attractiveness and averageness ratings in Phase 1 were submitted to an analysis of variance (ANOVA) with factors of distractor, face’s gender, and participant’s gender. Attractiveness ratings (Fig. 1b) were significantly greater for targets than for distractors, F(1, 38) = 320.06, p < .001, ηp² = .89, and for female faces than for male faces, F(1, 38) = 40.52, p < .001, ηp² = .52. There were no effects of participant’s gender or any interactions among the three factors. Averageness ratings (Fig. 1c) were also significantly greater for targets than for distractors, F(1, 37) = 349.78, p < .001, ηp² = .90. Neither the effect of face’s gender (p = .08) nor any other effect reached significance for averageness ratings. Because the distractors were always given low ratings compared with the targets, distractors were never chosen as the most attractive or most average faces in Phase 2.
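Partial eta squared can be recovered from a reported F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the three effects reported in this paragraph; the function name is hypothetical, and the calculation is a reader's consistency check rather than part of the reported analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for the Phase 1 ratings.
reported = [(320.06, 1, 38), (40.52, 1, 38), (349.78, 1, 37)]
for f_value, df_effect, df_error in reported:
    print(round(partial_eta_squared(f_value, df_effect, df_error), 2))
```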
A divisive-normalization account would predict that distractor faces that received lower Phase 1 ratings would be associated with greater differences in the proportion of choices between the two target faces in Phase 2. Because each participant provided his or her own ratings, I standardized ratings across participants by ranking the distractors (attractiveness experiment, Fig. 2a; averageness experiment, Fig. 2c); the lowest-rated distractor of each gender had the lowest rank (i.e., a 1). Visual inspection of Figure 2 reveals that distractors with a lower rank were associated with more choices of the more attractive target (Fig. 2a) and with more choices of the more average target (Fig. 2c). For the attractiveness experiment, an unexpected result was that male participants were more likely to choose the more attractive target face when it was male than when it was female, whereas female participants showed this pattern to a lesser degree (Fig. 2b).

Fig. 2. Proportion of choices of the higher-ranked target. The graph in (a) shows the proportion of choices of the more attractive target as a function of the distractor’s rank (with the best-fitting least squares regression line). In (b), the graph shows the proportion of choices of the more attractive target for female and male participants, separately for male and female faces. The graph in (c) shows the proportion of choices of the more average target as a function of distractor’s rank (with the best-fitting least squares regression line). The shaded areas in (a) and (c) and the error bars in (b) indicate 95% confidence intervals.
Several analyses focusing on the hypothesized effects of distractor rating confirmed these patterns. For the attractiveness experiment, an ANOVA on the proportion of choices of the more attractive target, with distractor’s rank, face’s gender, and participant’s gender as factors, showed a significant main effect of distractor’s rank, F(9, 342) = 3.79, p < .001, ηp² = .09, and face’s gender, F(1, 38) = 15.03, p < .001, ηp² = .28, and a significant interaction between face’s gender and participant’s gender, F(1, 38) = 4.22, p = .047, ηp² = .10. The averageness experiment replicated the significant main effect of distractor’s rank, F(9, 333) = 4.27, p < .001, with a similar effect size, ηp² = .10, although there were no other significant effects for this ANOVA.
The effects of distractor’s rank in both the attractiveness and the averageness experiments were confirmed by multiple linear regression. Separately for each participant, I quantitatively tested for the predicted negative direction of the effect of distractor’s rank on choice behavior. The predictors in these models were distractor’s rank, face’s gender, and their interaction, and the outcome was the proportion of choices of the higher-rated (more attractive or more average) target. The resulting participant-specific β values were tested for significance across the sample in the second level of a hierarchical analysis that treated participant as a random effect. For the attractiveness experiment, a negative relationship between distractor’s rank and the proportion of choices of the more attractive target was confirmed by a significant left-sided t test, mean β = −0.006, confidence bound = −0.003, t(39) = −4.02, p < .001. The same analysis replicated this effect for the averageness experiment, mean β = −0.01, confidence bound = −0.007, t(38) = −5.30, p < .001. This shift in the proportion of choices of the higher-rated target was associated with an effect size conventionally regarded as large (attractiveness: Cohen’s d = 1.29; averageness: Cohen’s d = 1.72), and the shift in relative choice as a function of distractor value was within the range of previous reports of other decision-making paradigms (for an example that investigates dominance effects for visual stimulus dimensions, see Trueblood, Brown, Heathcote, & Busemeyer, 2013).
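The two-level logic of this analysis can be sketched in simplified form: a slope is estimated for each participant, and the participant-specific slopes are then tested against zero across the sample. The sketch is an illustration only. It uses a single predictor (distractor rank) rather than the three predictors in the reported models, the function names are hypothetical, and the data are made up.

```python
import math
from statistics import mean, stdev


def ols_slope(x, y):
    """Least squares slope of y on a single predictor x (first level)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1


# Hypothetical data: proportion of higher-rated-target choices at ranks 1-10
# for three made-up participants.
ranks = list(range(1, 11))
participants = [
    [0.80 - 0.010 * r for r in ranks],
    [0.75 - 0.020 * r for r in ranks],
    [0.90 - 0.030 * r for r in ranks],
]
slopes = [ols_slope(ranks, y) for y in participants]
t_value, df = one_sample_t(slopes)  # second level: test slopes across participants
print(slopes, t_value, df)
```

A left-sided test would then compare `t_value` with the lower tail of the t distribution with `df` degrees of freedom, which reflects the directional prediction that slopes are negative.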
Further confirmation for influences of distractor rating on choice behavior was provided by an analysis that directly predicted the choice on each trial (as opposed to proportions of choices in different conditions) using a binomial generalized linear model with a logistic link function (for a similar analysis, see Louie et al., 2013). For each participant, a generalized linear model predicted a binary variable encoding whether the participant chose the higher-rated (more attractive or more average) target on each trial using, as regressors, the Phase 1 rating of the leftmost target, the Phase 1 rating of the rightmost target, the Phase 1 rating of the distractor, the gender of the face, and the interaction between distractor rating and face gender. Again, the participant-specific β values for distractor rating were tested for significance across the sample at a second level, treating participant as a random variable. For the attractiveness experiment, the hypothesis was confirmed by a negative relationship between distractor’s attractiveness and frequency of choices of the more attractive target, mean β = −1.65, 95% confidence interval (CI) = [−2.80, −0.50], t(39) = −2.9, p = .006, Cohen’s d = 0.93. Thus, on a given trial, the more attractive the distractor, the less likely it was that the more attractive target would be chosen over the other target. The averageness experiment replicated this negative relationship between the distractor’s rating and proportion of choice of the higher-rated target (more average, in this case), mean β = −1.37, 95% CI = [−1.916, −0.819], t(38) = −5.0, p < .001, Cohen’s d = 1.64.
Louie et al. (2013), using the same paradigm that I used in the current study, showed that the psychometric function that predicts the probability of choosing a particular target on the basis of differences between the values of the two targets exhibits a larger slope for distractors with lower reward values. This psychometric function can have a sigmoid shape. When Target 2 is considerably more valuable than Target 1 (i.e., target-value difference is negative), participants never choose Target 1. As the value of Target 1 increases relative to Target 2, the proportion of Target 1 choices will increase. When Target 1 becomes considerably more valuable than Target 2, participants will always choose Target 1. The slope of this sigmoid function measures how sensitive choice preference is to differences between the two targets. In the current study, target values were measured as Phase 1 ratings, and so I assessed choice as a function of target-rating differences. If attractiveness or averageness representations are divisively normalized, then low-rated distractors should increase participants’ sensitivity to target-rating differences in attractiveness or averageness and thereby increase the slope of the sigmoid function. Hence, a divisive-normalization account predicts that distractor rank will be negatively related to the slope of the psychometric function.
This prediction was tested by fitting to the data the logistic curve y = 1/(1 + exp(−β1 × (x − β2))), where y values are the predicted proportion of Target 1 choices (taken in the current study as the leftmost target on the screen), x values are the target-rating differences, β1 represents the slope parameter, and β2 represents the inflection-point parameter. Because target-rating differences and distractor ratings were different for each participant, there were no fixed values across participants. To facilitate summarizing data over participants (Fig. 3) and to ensure a sufficient number of averaged trials for fitting, I binned the distractor ranks (5 bins) and target-rating-difference ranks (12 bins) and then fitted logistic functions to data in these bins. This yielded separate logistic function parameters (nlinfit.m in MATLAB; The MathWorks, Natick, MA) for each participant and distractor bin. Although logistic slope values were computed from individual participant logistic fits, for visualization purposes, Figures 3a, 3b, 3d, and 3e show functions fitted to group data. A reduction in slope that changes continuously across distractor bins is apparent for both attractiveness and averageness choices.
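The logistic curve and a least squares fit of its two parameters can be sketched as follows. This is a stand-in for illustration, not the reported procedure: a coarse grid search replaces the nonlinear least squares routine named in the text, the function names and grids are hypothetical, and the 12 x values are arbitrary bin centers rather than real target-rating differences.

```python
import math


def logistic(x, slope, inflection):
    """y = 1 / (1 + exp(-slope * (x - inflection))), as defined in the text."""
    return 1.0 / (1.0 + math.exp(-slope * (x - inflection)))


def fit_logistic_grid(xs, ys, slopes, inflections):
    """Return the (slope, inflection) pair on the grid with the smallest
    sum of squared errors (a simple substitute for nonlinear least squares)."""
    best = None
    for b1 in slopes:
        for b2 in inflections:
            sse = sum((y - logistic(x, b1, b2)) ** 2 for x, y in zip(xs, ys))
            if best is None or sse < best[0]:
                best = (sse, b1, b2)
    return best[1], best[2]


# Hypothetical noise-free data: 12 bins centered on zero, generated with
# slope 8.0 and inflection point 0.0.
xs = [(i - 5.5) / 10 for i in range(12)]
ys = [logistic(x, 8.0, 0.0) for x in xs]
slope_grid = [0.5 * k for k in range(1, 41)]          # 0.5 to 20.0
inflection_grid = [0.05 * k for k in range(-10, 11)]  # -0.5 to 0.5
print(fit_logistic_grid(xs, ys, slope_grid, inflection_grid))
```

A larger fitted slope means that choice probability moves from 0 to 1 over a narrower range of target-rating differences, which is the sense in which slope indexes sensitivity.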

Fig. 3. Sensitivity of choice to target-rating differences. The graphs in the top row show observed choice probabilities for (a) the attractiveness experiment and (d) the averageness experiment as a function of target-rating-difference bin, separately for Distractor Bin 1 (lowest-rated distractors) and Distractor Bin 5 (highest-rated distractors). Also shown are best-fitting logistic functions whose parameters were fitted to the group data. The middle row presents the same logistic functions along with the corresponding functions for Distractor Bins 2, 3, and 4. The graphs in the bottom row show the average logistic slopes (fitted to individual participants and then averaged across participants) as a function of distractor bin; the shaded areas indicate 95% confidence intervals.
For analysis of logistic slopes, poor curve fitting occurred on occasion. For the attractiveness experiment, 2 outlying slopes out of 200 (40 participants × 5 distractor bins) were removed because they were more than 6 standard deviations from their distractor-bin means and were greater than 30 although all other values were less than 1. For the averageness experiment, 8 out of 195 slopes were removed because they were more than 2.5 standard deviations from their distractor-bin means and were greater than 14 although all other values were less than 1. Next, I used linear regression to ascertain whether distractor bin was negatively associated with logistic slope, by fitting a line to the five logistic slopes separately for each participant (Figs. 3c and 3f). These participant-specific linear regression β values were submitted to a second-level, one-sample t test. Because I had an a priori hypothesis predicting a decrease in slope as a function of distractor bin, and because a similar decrease had been reported by previous research (Louie et al., 2013), I tested for decreasing slopes by implementing left-sided t tests. The results confirmed the prediction of normalization for both the attractiveness and the averageness experiments. Across participants, logistic slopes were negatively related to distractor’s attractiveness, mean β = −0.02, confidence bound = −0.003, t(39) = −2.00, p = .027, Cohen’s d = 0.32, and to distractor’s averageness, mean β = −0.02, confidence bound = −0.001, t(38) = −1.8, p = .04, Cohen’s d = 0.38.
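The outlier screen described above, which flags slopes lying more than a set number of standard deviations from their distractor-bin mean, can be sketched as follows. The function name and the slope values are hypothetical, and the use of the sample standard deviation computed with the outlier included is an assumption, because the text does not specify how the standard deviation was calculated.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd):
    """Indices of values more than n_sd sample standard deviations from the
    mean of their own bin (the outlier itself is included in both statistics)."""
    m, s = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - m) > n_sd * s]


# Hypothetical slopes for one distractor bin: 39 ordinary fits below 1
# and one failed fit with an extreme slope.
bin_slopes = [0.5] * 39 + [30.0]
print(flag_outliers(bin_slopes, 6))
```

Because an extreme value inflates both the mean and the standard deviation of its own bin, a single bad fit among 40 slopes can sit only a little more than 6 standard deviations from the bin mean under this calculation.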
Discussion
The attractiveness experiment showed that participants were more likely to find one target more attractive than another, depending on the unattractiveness of the distractor (Fig. 2a), even though distractors were too unattractive to be chosen (Fig. 1b). Moreover, less attractive distractors increased participants’ choice sensitivity to differences in target attractiveness (Fig. 3). The same results occurred when the experiment was repeated for averageness choices (Fig. 2c, Fig. 3). These findings could arise from divisive normalization (Carandini & Heeger, 2012), in which neural firing rates adapt to span the range of options. For the attractiveness experiment, less attractive distractors widened the attractiveness range of options, and so normalization to this wider range pulled apart the representations of the two targets. The result was an increase in choice frequency of the more attractive target (Fig. 2) and more sensitivity of choices to differences in target attractiveness (Fig. 3). These findings support the notion that normalization proliferates throughout the nervous system, given that similar effects arise for monkeys’ preferences for juice reward and humans’ preferences for snack foods (Louie et al., 2013). Divisive normalization, moreover, explains neural responses throughout the visual and auditory systems and contributes to sensory integration and decision making (Carandini & Heeger, 2012). The attractiveness and averageness experiments demonstrated that normalization affects people’s choices and perceptions in higher-level social domains as well.
Although the normalization account provides a basis for predicting context effects on social evaluations, timing effects still need to be explained. Context can also be manipulated using repetitive presentation of stimuli. Adaptation (prolonged exposure) to facial distortions, for example, increases attractiveness of similarly distorted faces (Rhodes, Jeffery, Watson, Clifford, & Nakayama, 2003; Winkler & Rhodes, 2005). Assimilative effects for sequential faces can occur for attractiveness (Taubert, Van der Burg, & Alais, 2016) and identity (Liberman, Fischer, & Whitney, 2014) and may arise from processes that maintain visual continuity. A different finding, the contrast effect, occurs when successive presentations of attractive faces render faces less attractive (Wedell, Parducci, & Geiselman, 1987) and has been associated with range-frequency normalization (Parducci, 1965; Wedell et al., 1987). Using a simulation of the trinary-choice paradigm that followed the methods reported in Louie et al. (2013), I found that range-frequency theory was not a viable alternative to divisive normalization. Although this simulation replicated the results of Louie et al. for divisive normalization, range-frequency theory produced the opposite result: Relative choices between targets became more confusable, not more distinct, for lower distractor values. Whether instantaneous presentation and repetitive successive presentations draw on different mechanisms remains to be tested.
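The opposite predictions of the two accounts can be illustrated with a deterministic sketch. It is not the simulation described above: it omits noise and choice, assumes the simple sum-based form of divisive normalization, and uses Parducci's range-frequency rule with an equal weighting of the range and frequency components (`w = 0.5`, an assumption). Function names and option values are hypothetical.

```python
def divisive_gap(target_hi, target_lo, distractor):
    """Gap between the two targets after sum-based divisive normalization."""
    return (target_hi - target_lo) / (target_hi + target_lo + distractor)


def range_frequency_gap(target_hi, target_lo, distractor, w=0.5):
    """Gap between the two targets under range-frequency theory:
    judgment = w * range value + (1 - w) * frequency value.
    Assumes three distinct values with the distractor lowest."""
    values = [target_hi, target_lo, distractor]
    lo, hi = min(values), max(values)
    order = sorted(values)

    def judgment(v):
        range_value = (v - lo) / (hi - lo)
        frequency_value = order.index(v) / (len(values) - 1)
        return w * range_value + (1 - w) * frequency_value

    return judgment(target_hi) - judgment(target_lo)


for distractor in (0.1, 0.3, 0.5):
    print(distractor,
          round(divisive_gap(0.9, 0.8, distractor), 4),
          round(range_frequency_gap(0.9, 0.8, distractor), 4))
```

In this sketch, lowering the distractor value widens the gap between the targets under divisive normalization but narrows it under range-frequency theory, because a lower distractor stretches the range against which both targets are scaled while their ranks stay fixed.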
Although contrastive adaptation to sequences and normalization share the goal of adjusting neural firing to match prevailing stimulus conditions (Webster, Kaping, Mizokami, & Duhamel, 2004), in trinary choice, the context changes on each trial more quickly than in adaptation experiments. In addition, in the brain, some types of normalization are best observed after blocking many trials (Rangel & Clithero, 2012). This slow time course may be better suited for measuring adaptation and contrast effects than the faster context changes associated with trinary-choice tasks. Louie and Glimcher (2012) have already noted a distinction between normalization by more immediate spatial context versus a temporal context, which is dependent instead on the history of previous stimuli. Results from the attractiveness and averageness experiments therefore highlight the time scale of context effects as a variable requiring further study.
Compared with sequence-dependent ratings tasks, trinary-choice tasks have an advantageous feature. Multiple target options reveal additional effects of stretching or contraction to adjust to stimulus range by showing how distractors pull target choices apart or push them together, a finding difficult to infer from sequential Likert ratings. Trinary choice also forms the basis for many decision-making effects (Kowal & Faulkner, 2016) including the similarity effect (Tversky, 1972), the attraction effect (Huber, Payne, & Puto, 1982) and the compromise effect (Simonson, 1989). These effects involve selectively manipulating a distractor’s similarity to the targets along specific stimulus attributes. Such studies are used to examine multiattribute decision making and are often couched as commodity purchases (rather than as social evaluations) in which multiple-valued attributes can be experimentally manipulated (e.g., choosing between used cars that vary in both mileage and price). The current experiments used photographs of natural faces and manipulated only overall attractiveness (or averageness) rather than select attributes. However, facial attributes could, in principle, be manipulated in trinary-choice tasks to study multiattribute decision-making phenomena in social contexts.
The attractiveness and averageness experiments also engendered new research questions about which representations are normalized. Many potentially normalized representations could contribute to attractiveness decisions, including visual representations of facial attributes. If attractiveness decisions arise from evaluation of visual information, then they will be influenced when visual representations are normalized. For example, Wedell and Pettibone (1999) manipulated the range of eye and nose widths on schematic faces to induce contrast effects on eye- and nose-width perception, and they found corresponding shifts in judgments of face pleasantness. The averageness experiment explored a similar possibility by examining a high-level facial judgment that might resemble some of the perceptual assessment involved in attractiveness perception, although averageness choices lack the same overt reward-based assessment as attractiveness choices. The finding of normalization for averageness decisions suggests that normalization is not limited to reward-based judgments (Louie et al., 2013) and might operate on high-level visual assessments. However, many potential visual contributions to attractiveness decisions could be normalized. Evidence already exists for normalization of low-level visual responses to luminance, contrast, and motion in the visual cortex (Carandini & Heeger, 2012), and imaging evidence exists for normalization in visual areas for object representation (Zoccolan et al., 2005). Behavioral and brain-response measurements are still needed to fully assess the roles of visual normalization in social evaluation of faces.
Attractiveness decisions are also decisions about potential rewards, and the averageness experiment cannot exclude the possibility of additional normalization of reward-based representations (Louie et al., 2013). Moreover, the situation is further complicated by evidence of multiple brain representations of reward-related value, any of which might be susceptible to normalization. For faces, several brain areas (e.g., amygdala, prefrontal areas, ventral striatum) are associated with reward value (Bzdok et al., 2012; Kampe, Frith, Dolan, & Frith, 2001; Pegors, Kable, Chatterjee, & Epstein, 2015; Smith et al., 2010; Winston, O’Doherty, Kilner, Perrett, & Dolan, 2007). Moreover, ventromedial prefrontal cortex may map differently scaled values from different stimulus domains onto a single reward-value representation (i.e., a common currency; Pegors et al., 2015; Smith et al., 2010; Vessel, Stahl, Purton, & Starr, 2015). In the monkey, a similar area in the orbitofrontal cortex has already been shown to exhibit normalized responses to reward (Cox & Kable, 2014; Padoa-Schioppa, 2009). Behavioral and brain-response measurements are needed also to fully assess the roles of normalization of reward-based representations in social evaluation of faces.
Normalization theories (e.g., divisive normalization) motivate new quantitative predictions, such as those tested here. The fact that normalization explains preferences in the social domain suggests new applications to applied or naturalistic settings. The ability to predict social evaluations in context might generalize to advertising, Web site design, consumer behavior, and so forth. Context effects on social decisions also challenge common assumptions, including those of classical economics, that relative values, such as attractiveness, are constant over time. Although people alter their faces through makeup, hairstyles, and the like to appear attractive, in fact, faces appear more or less attractive depending on context. Indeed, people may even socially evaluate a person differently depending on who is standing nearby.
Acknowledgements
I thank Amna Ali, Cara Bache-Jeffreys, William Buckley, Pervhan Dohil, Hajra Hussain, and Caroline Wooley for assistance with data collection.
Action Editor
Philippe G. Schyns served as action editor for this article.
Declaration of Conflicting Interests
The author declared that he had no conflicts of interest with respect to his authorship or the publication of this article.
Funding
Participants’ honoraria were provided by the Royal Holloway University of London Psychology Department.
