Abstract
Research has demonstrated that attractiveness evaluations of adult faces were less accurate when faces were inverted than upright. It remains unknown, however, whether a similar effect applies to perceived cuteness of infants, which is assumed to be based on elemental facial features called the “baby schema.” In this research, we studied the face inversion effect on perceived cuteness of infant faces in a rating task and a two-alternative forced-choice (2AFC) task. We also examined beauty as a control dimension. Although the rating task revealed no inversion effect, the 2AFC task showed poorer discrimination performance with inverted faces than with upright faces in both evaluations. These results indicate that infant cuteness and beauty dimensions are correlated well with each other, and their perception not only relies on elemental features that are not strongly affected by inversion but is also affected by holistic facial configurations when a detailed comparison is required.
The face inversion effect refers to a phenomenon in which the presentation of a face image upside-down inhibits face recognition and identification (Yin, 1969). Face inversion hinders the holistic processing of multiple features and the distance between them rather than the elemental features of a face and its parts (Rossion, 2008, 2009). This difficulty of holistic processing also affects attractiveness evaluations. For instance, Leder et al. (2017) have reported that people assigned higher ratings to less attractive faces when those faces were inverted than when they were upright. Other researchers have also reported that face inversion hinders facial attractiveness evaluations (Bäuml, 1994; Slater et al., 2000; Stróżak & Zielińska, 2019).
Still, it remains unknown whether face inversion affects evaluations of cuteness, which is often termed “infant attractiveness” (Hildebrandt & Fitzgerald, 1978, 1979). Researchers have theorized that perceptions of cuteness depend on the “baby schema.” The baby schema refers to a set of physical features such as a round face contour, a large forehead, and large eyes, which induce an affective feeling that can be described as lovely and adorable (Almanza-Sepúlveda et al., 2018; Komori et al., 2022; Little, 2012; Lorenz, 1943). Lorenz (1943) originally proposed that these features were elemental and exerted additive effects on our behavior. If this is the case, the baby schema effect would be less influenced by face inversion because it relies more on the processing of elemental features than on the holistic processing of the face. Consistent with this idea, Kuraguchi and Kanari (2020) reported that face inversion hardly affected ratings of perceived cuteness of adult female faces. However, this lack of a face inversion effect on cuteness ratings may have been due to the use of adult faces instead of infant faces, which are more suitable for cuteness evaluations.
Considering a potential semantic gap between the two evaluative dimensions of cuteness and attractiveness, we incorporated beauty as a control dimension in this study. Cuteness and beauty exist as sub-concepts of attractiveness (Rhodes, 2006). Kuraguchi et al. (2015) have proposed that cuteness and beauty evaluations of adult female faces were based on slightly different facial features. Furthermore, a cross-cultural survey has revealed that the connotation of cuteness is more affective (i.e., less intellectual) than that of beauty in Japan, the US, and Israel (Nittono et al., 2021). These dimensions may also differ in how they are perceptually processed. In this regard, Kuraguchi and Ashida (2015) have reported that the facial beauty of adult female faces could be discriminated from peripheral vision, but their cuteness could not. Therefore, face inversion may impact cuteness and beauty evaluations to different degrees.
In this study, we examined the face inversion effect on cuteness and beauty evaluations of infant faces through two task types. A rating task was used because our primary interest was to replicate the face inversion effect on adult facial attractiveness and cuteness (Kuraguchi & Kanari, 2020; Leder et al., 2017). Additionally, we conducted a two-alternative forced-choice (2AFC) task. Nittono et al. (2022) have suggested that 2AFC tasks provide a more sensitive measure of the perceptual ability to detect subtle differences in cuteness compared to rating tasks. Notably, Sprengelmeyer et al. (2009) have reported that participants in a 2AFC task could not successfully choose the cuter face from a pair of manipulated infant faces at a rate above the chance level. However, because they did not describe their data in detail, this finding should be verified by using a different set of face images.
Our study considers the following hypotheses.
H1: Face inversion will hinder cuteness and beauty evaluations of infant faces.
H2: Face inversion will have a lesser effect on cuteness evaluations than on beauty evaluations.
We tested these hypotheses in terms of discrimination performance on the two tasks. For the rating task, we defined discrimination performance as the difference between the mean rating scores of high-cuteness and low-cuteness faces (Hahn et al., 2015). For the 2AFC task, we measured discrimination performance as the rate at which participants chose the operationally defined cuter image from a pair of manipulated infant faces (Sprengelmeyer et al., 2009).
Methods
Pre-Registration
This study was pre-registered on OSF (https://osf.io/fx5yc). Analyses other than the pre-registered analyses are described as exploratory analyses.
Participants
G*Power (Faul et al., 2007) was used to conduct a priori power analysis for a two-way analysis of variance (ANOVA) with the factors of evaluation dimension (cuteness or beauty) and face orientation (upright or inverted). Given a moderate effect size (f = 0.25), the results indicated that 210 participants would be required (α = 0.05, 1−β = 0.95). To account for the possibility of data exclusion, we recruited 300 participants, which is about 140% of the minimum sample size. Note that this power calculation is valid not only for preregistered, two-way ANOVAs but also for exploratory, three-way ANOVAs, because both calculations are based on the same effect size and the same degrees of freedom.
Participants were recruited via Lancers, Inc., a crowdsourcing company, to complete an online experiment via the online survey analysis platform Qualtrics (Qualtrics, Provo, UT). Participants had normal or corrected-to-normal visual acuity by self-report and were required to speak Japanese as their native language to ensure their ability to understand the instructions. They were also required to complete the experiment on a personal computer (PC), as the use of a smartphone or tablet might prevent the face pairs from being presented side by side in the 2AFC task.
From the 322 total accesses, 305 individuals provided informed consent to participate in the experiment and have their data used in the analysis. Ultimately, the analysis included data from 299 individuals (115 women, 184 men; mean age = 43.62 years; age range = 20–71 years). In accordance with the pre-registered data exclusion criteria, we omitted data from six individuals: one who responded with the same rating value in all rating trials and five who reported different ages and genders before and after the survey. The study was approved by the ethics committee of the Graduate School of Human Sciences, Osaka University (Approval Number: HB022-034). We paid a reward via Lancers, Inc. in line with the standards of Osaka University.
Stimuli
We selected color composite face images of six-month-old girls and boys from the Japanese Cute Infant Face (JCIF) dataset (Nittono et al., 2022). Figure 1 shows some examples. The rating task employed a total of 12 average face images: three images each of high-cuteness and low-cuteness girls’ and boys’ faces according to the normative cuteness ratings provided in Nittono et al. (2022). There was a significant difference between the normative cuteness ratings for the high-cuteness and low-cuteness images (t[10] = 13.58, p < .001, Mhigh = 4.42, Mlow = 3.33, d = 8.53).

Figure 1. Examples of infant face images (adapted from Nittono et al., 2022). Panel A shows high- and low-cuteness prototypical faces. Panel B displays examples of the averaged faces used in the 2AFC task. See Supplemental Materials for details.
For the 2AFC task, we used three types of average face images created by Nittono et al. (2022): the average faces of 30 girls (F30), 30 boys (M30), and all infants (A60). These face images were different from those used in the rating task. We presented the average face images with no manipulation (0%) and those images manipulated to be cuter (+50%) or less cute (−50%) along the continuum of high- and low-cuteness prototypical faces. All stimulus images were taken from the JCIF dataset. They were 256 by 256 pixels in size to ensure smooth loading and display of the images in the online surveys. All faces were presented against a black background (see Supplemental Materials).
Procedure
Each participant was randomly assigned an evaluation dimension (cuteness or beauty) and a face orientation (upright or inverted). Participants completed the rating task and the 2AFC task in this order. The order of tasks was fixed because our primary interest was the rating task and we wanted to avoid any potential carryover effect of conducting a 2AFC task on cuteness ratings. For the rating task, each face image was presented in an upright or inverted orientation at the center of the participant's PC monitor with a rating scale displayed at the bottom of the image. Participants were asked to rate the cuteness or beauty of the face image on a seven-point scale ranging from 1 = not cute at all to 7 = extremely cute in the cuteness condition or from 1 = not beautiful at all to 7 = extremely beautiful in the beauty condition. Each face image was randomly presented once for a total of 12 trials.
For the 2AFC task, two types of images of the same average face were presented side by side, and participants were asked to choose the face image that they found to be cuter in the cuteness condition or more beautiful in the beauty condition. For each identity, the face image combinations were presented in “round-robin” pairings (+50% vs. 0%, +50% vs. −50%, 0% vs. −50%). Each face pair was randomly presented once for a total of nine trials. The face image presentation position (left or right) within a pair was also randomized.
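The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Qualtrics implementation: the function name `build_2afc_trials`, the dictionary keys, and the fixed random seed are hypothetical, while the identity labels (F30, M30, A60) and the manipulation levels (+50%, 0%, −50%) are taken from the text.

```python
import itertools
import random

IDENTITIES = ["F30", "M30", "A60"]  # average faces named in the Stimuli section
LEVELS = [50, 0, -50]               # cuteness manipulation in percent


def build_2afc_trials(rng):
    """Return nine trials: every pairing of the three levels for each identity."""
    trials = []
    for identity in IDENTITIES:
        # Round-robin pairings: (+50, 0), (+50, -50), (0, -50)
        for a, b in itertools.combinations(LEVELS, 2):
            pair = [a, b]
            rng.shuffle(pair)  # randomize left/right position within the pair
            trials.append({"identity": identity, "left": pair[0], "right": pair[1]})
    rng.shuffle(trials)        # randomize presentation order across trials
    return trials


trials = build_2afc_trials(random.Random(0))  # seed chosen only for reproducibility
for trial in trials:
    print(trial)
```

Three identities times three pairings gives the nine trials reported in the text; each pair appears exactly once per identity.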
Data Analysis
We measured the rating value for each face image in the rating task and the choice rate of the cuter image out of the two faces in the 2AFC task. The operationally defined cuter face was the +50% face image in the +50% versus 0% pair and the +50% versus −50% pair, and it was the 0% face image in the 0% versus −50% pair. For the rating task, we calculated discrimination performance as the difference between the mean rating values of high-cuteness and low-cuteness faces. For the 2AFC task, we measured discrimination performance as the choice rate of the cuter face.
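As a rough illustration of these two measures, the sketch below computes them for a single participant in Python. The data values, the function names, and the trial dictionary layout are hypothetical and are not taken from the study; the only rule borrowed from the text is that the face with the higher manipulation level in a pair is the operationally defined cuter face.

```python
from statistics import mean


def rating_discrimination(high_ratings, low_ratings):
    """Mean rating of high-cuteness faces minus mean rating of low-cuteness faces."""
    return mean(high_ratings) - mean(low_ratings)


def chose_cuter(left, right, choice):
    """True if the chosen side ('left' or 'right') shows the higher manipulation level."""
    cuter_side = "left" if left > right else "right"
    return choice == cuter_side


def choice_rate(trials):
    """Proportion of 2AFC trials on which the operationally defined cuter face was chosen."""
    return mean(1.0 if chose_cuter(t["left"], t["right"], t["choice"]) else 0.0
                for t in trials)


# Hypothetical responses from one participant (not real data).
high = [5, 4, 5, 4, 5, 4]   # ratings of the six high-cuteness faces
low = [3, 4, 3, 3, 4, 3]    # ratings of the six low-cuteness faces
example_trials = [
    {"left": 50, "right": 0, "choice": "left"},     # chose +50% over 0%: cuter face
    {"left": -50, "right": 50, "choice": "right"},  # chose +50% over -50%: cuter face
    {"left": -50, "right": 0, "choice": "left"},    # chose -50% over 0%: not the cuter face
]

print(rating_discrimination(high, low))
print(choice_rate(example_trials))
```

A positive rating difference and a choice rate above 0.5 both indicate that the participant discriminated the faces in the expected direction.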
An Evaluation Dimension × Face Orientation ANOVA was conducted on discrimination performance by task. Two additional ANOVAs were performed as exploratory analyses. First, an Evaluation Dimension × Face Orientation × Cuteness Level ANOVA was conducted using the original scores in the rating task. The results are reported in the Results section. Second, an Evaluation Dimension × Face Orientation × Gender ANOVA was carried out to examine possible gender differences, which only revealed that women achieved higher discrimination performance than men on the 2AFC task without any interaction effects (see Supplemental Materials). These ANOVAs were conducted using the anovakun function in R version 4.0.5 (Iseki, 2022). The t.test function was used for one-sample t-tests on choice rates in the 2AFC task as an exploratory analysis of whether choice rates were above the chance level of 0.5. The significance level was set at 0.05.
All of the aforementioned analyses were complemented with Bayesian statistics performed in JASP 0.16.4 (JASP, 2022). For the ANOVAs, we report the BFinclusion value for each factor. This inclusion Bayes factor reflects the change from prior to posterior inclusion odds. For the t-tests, we report the BF10, which is the Bayes factor quantifying the strength of the evidence that the data provide for the alternative hypothesis versus the null hypothesis. To interpret the Bayes factor, we employed a classification scheme adopted in previous studies (Schönbrodt & Wagenmakers, 2018).
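For readers unfamiliar with such schemes, the following Python sketch shows how a Bayes factor can be mapped to a verbal label. The cutoffs (1, 3, 10, 30, 100) and the labels are an assumption: they are the conventional categories commonly used in the Bayesian literature, and the article itself does not list the exact thresholds it applied. The function name is hypothetical.

```python
# Assumed conventional cutoffs; the article does not state the exact scheme.
CUTOFFS = [
    (100, "extreme"),
    (30, "very strong"),
    (10, "strong"),
    (3, "moderate"),
    (1, "anecdotal"),
]


def describe_bayes_factor(bf):
    """Return (evidence label, favored hypothesis) for a BF10 or inclusion Bayes factor."""
    if bf <= 0:
        raise ValueError("a Bayes factor must be positive")
    favored = "alternative" if bf > 1 else "null"
    strength = max(bf, 1 / bf)  # values below 1 are evidence for the null, so invert them
    for cutoff, label in CUTOFFS:
        if strength > cutoff:
            return label, favored
    return "none", "neither"    # only reached when bf is exactly 1


for value in (0.087, 0.174, 14.766):
    print(value, describe_bayes_factor(value))
```

Under these assumed cutoffs, a Bayes factor below 1/3 counts as at least moderate evidence for the null hypothesis, which is how small BFinclusion values are read in the Results section.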
Results
The Rating Task
Figure 2 displays the mean rating scores for each condition (A) and the difference rating scores between the high-cuteness and low-cuteness faces (B). Violin plots are provided in the Supplemental Materials. All of the difference scores were positive, which indicates that participants rated high-cuteness faces as cuter than low-cuteness faces. In accordance with the pre-registration, an Evaluation Dimension × Face Orientation ANOVA was conducted on the mean difference rating scores. No significant main effect or interaction was found (main effect of evaluation dimension: F[1, 295] = 0.008, p = .931, ηp2 < 0.001; main effect of face orientation: F[1, 295] = 1.473, p = .226, ηp2 = 0.005; interaction: F[1, 295] = 0.070, p = .792, ηp2 < 0.001). The Bayesian ANOVA also supports the null hypothesis (main effect of evaluation dimension: BFinclusion = 0.087; main effect of face orientation: BFinclusion = 0.174; interaction: BFinclusion = 0.016). Therefore, regardless of the evaluation dimension, we found no effect of inversion on discrimination performance in the rating task.

Figure 2. (A) The original rating score for each face orientation, evaluation dimension, and cuteness level; (B) the mean difference rating value for each face orientation and evaluation dimension; (C) the mean choice rate for each face orientation and evaluation dimension, with the dashed line indicating the chance level. Error bars indicate standard errors.
As an exploratory analysis, an Evaluation Dimension × Face Orientation × Cuteness Level ANOVA was performed and showed significant main effects of evaluation dimension (F[1, 295] = 11.349, p < .001, ηp2 = 0.037) and cuteness level (F[1, 295] = 997.989, p < .001, ηp2 = 0.772). The rating scores for cuteness were higher than those for beauty for the same face images. Furthermore, high-cuteness faces received higher rating scores than low-cuteness faces, which indicates that the stimulus manipulation was appropriate (see Figure 2A). However, no significant main effect of face orientation (F[1, 295] = 0.912, p = .340, ηp2 = 0.003) or any interaction (Fs[1, 295] < 1.865, ps > .173) was observed. The Bayesian ANOVA also supported the main effects of evaluation dimension (BFinclusion = 14.766) and cuteness level (BFinclusion > 100), but the null hypotheses were supported for the other effects (interaction between evaluation dimension and face orientation: BFinclusion = 0.400; others: BFinclusion ≤ 0.270). Regardless of the evaluation dimension, we found no effect of inversion on the original rating score.
The 2AFC Task
Figure 2C visualizes the mean choice rate for each condition. An Evaluation Dimension × Face Orientation ANOVA was conducted on the choice rate as a pre-registered analysis. The main effect of face orientation was significant (F[1, 295] = 49.582, p < .001, ηp2 = 0.144), but the other effects were not (main effect of evaluation dimension: F[1, 295] = 0.588, p = .444, ηp2 = 0.002; interaction: F[1, 295] = 0.015, p = .902, ηp2 < 0.001). Similarly, the Bayesian ANOVA yielded strong evidence of the main effect of face orientation (BFinclusion > 100), but the null hypotheses were supported for the main effect of evaluation dimension (BFinclusion = 0.135) and the interaction (BFinclusion = 0.096). Considering a possible violation of the normality assumption in ratio data, we additionally conducted nonparametric Kruskal–Wallis tests using JASP 0.16.4 (JASP, 2022). Again, the effect of face orientation was significant (H = 45.703, df = 1, p < .001), but the effect of the evaluation dimension was not (H = 0.862, df = 1, p = .353). Therefore, face inversion had an effect in both evaluation dimensions, such that the 2AFC performance was lower for the inverted images than for the upright images.
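The partial eta squared values reported for these ANOVAs can be recovered from the F statistics and their degrees of freedom using the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The Python sketch below is a reader-side consistency check, not part of the authors' anovakun analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Error df: 299 participants minus 4 cells of the 2 x 2 between-participants design.
DF_ERROR = 299 - 4

# F values reported in the text for the 2AFC task.
reported = {
    "face orientation": 49.582,
    "evaluation dimension": 0.588,
}

for effect, f_value in reported.items():
    print(effect, round(partial_eta_squared(f_value, 1, DF_ERROR), 3))
```

The same function applies to the rating-task effects, for example the face orientation effect with F[1, 295] = 1.473.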
Nevertheless, the choice rates were significantly higher than the chance level regardless of the presentation condition or evaluation dimension (cuteness upright: t[73] = 22.61, p < .001, d = 2.63; cuteness inverted: t[74] = 9.40, p < .001, d = 1.09; beauty upright: t[73] = 23.62, p < .001, d = 2.75; beauty inverted: t[75] = 8.65, p < .001, d = 0.99). The results of the Bayesian one-sample t-test also support these effects (all conditions: BF10 > 100). These results suggest that the participants could correctly choose the cuter infant faces even in inverted presentations.
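The effect sizes above are consistent with Cohen's d for a one-sample test, d = t / √n with n = df + 1. The following Python sketch shows the test against the chance level of 0.5 and the conversion from t to d. It is a minimal illustration using only the standard library, not the authors' R code (they used the t.test function); the function names and the example choice rates are hypothetical.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, mu=0.5):
    """One-sample t-test against mu; returns (t, df, Cohen's d)."""
    n = len(values)
    d = (mean(values) - mu) / stdev(values)  # standardized distance from chance
    t = d * math.sqrt(n)
    return t, n - 1, d


def d_from_t(t, df):
    """Cohen's d for a one-sample t-test, recovered from t and its degrees of freedom."""
    return t / math.sqrt(df + 1)


# Hypothetical per-participant choice rates (each out of nine trials), not real data.
rates = [7 / 9, 8 / 9, 9 / 9, 6 / 9]
print(one_sample_t(rates))

# Consistency check against the values reported in the text.
for label, t, df in [
    ("cuteness upright", 22.61, 73),
    ("cuteness inverted", 9.40, 74),
    ("beauty upright", 23.62, 73),
    ("beauty inverted", 8.65, 75),
]:
    print(label, round(d_from_t(t, df), 2))
```

The degrees of freedom imply cell sizes of 74, 75, 74, and 76, which sum to the 299 participants included in the analysis.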
Discussion
In this study, we have examined the effect of face inversion on cuteness and beauty evaluations of infant faces using a rating task and a 2AFC task. The rating task demonstrated no such effect regardless of the evaluation dimension. In the 2AFC task, however, we found that face inversion had an effect in both evaluation dimensions. Therefore, H1 (Face inversion will hinder cuteness and beauty evaluations of infant faces) was supported only by the 2AFC task, while H2 (Face inversion will have a lesser effect on cuteness evaluations than on beauty evaluations) was not supported by either task.
No effect of face inversion was observed in the rating task, which is consistent with the findings of Kuraguchi and Kanari (2020) regarding cuteness ratings of adult female faces. Although previous studies (Leder et al., 2017; Stróżak & Zielińska, 2019) have reported that face inversion increased attractiveness ratings of unattractive adult faces, the present study did not observe such an effect on infant faces. In the present study, the difference in cuteness ratings between the high-cuteness and low-cuteness faces was small (i.e., 1.09 on a 7-point scale). Even such a slight difference could not be negated by face inversion. These results agree with the notion that cuteness perception relies on elemental features of the baby schema such as a round face contour and a large forehead (Almanza-Sepúlveda et al., 2018; Komori et al., 2022), which are less affected by face inversion. On the other hand, an effect of face inversion was found in the 2AFC task. This difference can be explained by the type of information that is needed to complete the task. Specifically, the 2AFC task provides a more sensitive measure of the perceptual ability to detect subtle differences in facial appearance compared to the rating task. Nittono et al. (2022) have reported a significant effect of the interaction between rater age and sex on the choice rate of cuter faces in the 2AFC task but not on seven-point cuteness ratings. When rating the perceived cuteness of infant faces that are presented alone, participants may rely on the presence or absence of certain elemental features (e.g., a round face, large eyes), which would lead to a lesser impact of inversion on the ratings. In contrast, when a more detailed comparison of similar faces is required in the 2AFC task, participants may also rely on holistic configurations of the face, the processing of which is impaired by face inversion.
Given that the correct choice rate exceeded the chance level even in the inverted presentation, face inversion did not preclude a cuteness evaluation altogether but only made it more difficult by hindering the recognition of minor differences in facial features.
The aforementioned trends were similar between the cuteness and beauty conditions. Because the mean cuteness ratings were higher than the mean beauty ratings, participants could differentiate these two dimensions. The lower scores of beauty ratings suggest that beauty may be a less appropriate dimension for describing infant faces compared to adult faces (Kuraguchi et al., 2015; Kuraguchi & Ashida, 2015). The exact relationship between the cuteness and beauty dimensions should be investigated more carefully in future research.
In the 2AFC task, discrimination performance was significantly lower in the inverted presentation than in the upright presentation, although the choice rates in all conditions were significantly above the chance level. The former result is consistent with the findings of Sprengelmeyer et al. (2009) in that the face inversion effect occurred with infant faces. In our study, face inversion made it difficult to detect subtle differences between a pair of faces in both the cuteness and beauty evaluations. In contrast to Sprengelmeyer et al. (2009), however, our participants were able to choose the cuter or more beautiful faces above the chance level, which indicates that face inversion did not completely prevent cuteness or beauty evaluations. The strength of the inversion effect may depend on the facial stimuli that were used or the participants’ characteristics. Note that no gender differences were found in the size of the face inversion effect in our study, although women had higher discrimination accuracy than men. A possible limitation of the present study is that the 2AFC task was always conducted after the cuteness rating task. Repeated exposure to infant faces may have improved sensitivity to differences in cuteness, and this may explain why the face inversion effect appeared only in the 2AFC task. We consider this possibility unlikely because participants did not receive any feedback in the rating task. Still, it should be examined in future research.
In conclusion, the present study has illustrated that face inversion had only a small effect on cuteness evaluation tasks, although it could impede a close comparison of faces that only slightly differ. The weak or absent face inversion effect on infant faces supports the idea that cuteness perception is strongly dependent on elemental features of the baby schema (Almanza-Sepúlveda et al., 2018; Komori et al., 2022). However, in comparisons of two similar faces, perceived cuteness of infant faces may be influenced by not only elemental features but also holistic configurations.
Supplemental Material
Supplemental material, sj-docx-1-pec-10.1177_03010066231198417 for Face inversion effect on perceived cuteness of infant faces by Kana Kuraguchi and Hiroshi Nittono in Perception
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by Grant-in-Aid for Scientific Research (JSPS KAKENHI Grant Number 21K13679; 21H04897).
Supplemental Material
Supplemental material for this article is available online.
