Abstract
This study experimentally tested whether individuals have a tendency to associate attractive voices with attractive faces and, conversely, unattractive voices with unattractive faces. Participants viewed pairings of facial photographs of attractive and unattractive individuals, listened to attractive and unattractive voice samples, and were asked to indicate which face they thought was more likely to belong to the speaker of the voice heard. Results showed that there was an overall tendency to associate attractive voices with attractive faces and unattractive voices with unattractive faces, suggesting that a “what-sounds-beautiful-looks-beautiful” stereotype exists. Interestingly, the propensity to pair unattractive voices with unattractive faces was even stronger than the propensity to pair attractive voices with attractive faces.
The physical attractiveness stereotype
There is a large body of evidence showing that people tend to ascribe more favorable characteristics to those who are physically attractive, known as the “what-is-beautiful-is-good” stereotype (Dion, Berscheid, & Walster, 1972). This phenomenon is also related to the earlier coined term, “halo effect,” which is the tendency to form an overall good impression of a person based on one good characteristic (Goldman, Cowles, & Florez, 1983; Nisbett & Wilson, 1977). Positive features that have been associated with physically attractive individuals include intellectual competence (Jackson, Hunter, & Hodge, 1995), greater popularity (Krantz, 1987), social competence (Goldman & Lewis, 1977), and a variety of other positive personality traits (for reviews see Langlois et al., 2000; McKelvie, 1993). As such, both attractive children and adults are treated more positively than unattractive children and adults, and physically attractive individuals are often afforded many advantages (Langlois et al., 2000). To name a few, attractive individuals are more likely to be the recipients of helping behaviors (Benson, Karabenick, & Lerner, 1976), to be voted into public office (Efran & Patterson, 1974), to receive preferential treatment in job interview decisions (Gilmore, Beehr, & Love, 1986), and to be seen as producing higher quality writing (Anderson & Nida, 1978), and attractive males are viewed as being more competent in managerial positions (Heilman & Stopeck, 1985).
There is also some evidence to indicate that some of these associations between physical attractiveness and positive attributes are indeed true. For instance, more physically attractive students were rated as being more socially skillful and likable when on the telephone than their less attractive counterparts, despite the fact that raters could not see them, suggesting that attractive individuals are, in fact, more socially competent (Goldman & Lewis, 1977). Across different studies and cultures, attractive individuals were shown to be more occupationally successful, to be in better physical health, to be more popular, to have more dating and sexual experience, and to have higher intelligence and self-esteem when compared to unattractive individuals (Langlois et al., 2000). Other studies, however, have suggested that there are only trivial relationships between physical attractiveness and measures of personality and mental ability but still confirm that attractive people tended to be less lonely, less socially anxious, more popular, more socially skilled, and more sexually experienced than unattractive people (for reviews see Eagly, Ashmore, Makhijani, & Longo, 1991; Feingold, 1992).
The vocal attractiveness stereotype
Similar to physical attractiveness, an analogous stereotype exists for attractive voices and, apparently, “what-sounds-beautiful-is-good” as well (Zuckerman & Driver, 1989; Zuckerman, Hodgins, & Miyake, 1990). People tend to associate positive personality traits with those who have attractive voices, and those with attractive voices were perceived as being competent, warm, and sympathetic (Zuckerman & Driver, 1989; Zuckerman, Miyake, & Elkin, 1995). Vocal attractiveness is also associated with maturity whereby participants whose voices were rated as sounding attractive were also thought of as being more mature (Berry, 1992). Further, vocal attractiveness was positively related to perceived leadership effectiveness when tested in a laboratory setting, and as such, political leaders with more attractive voices tend to be viewed as more effective leaders (DeGroot, Aime, Johnson, & Kluemper, 2011).
The acoustic properties that constitute an “attractive voice” have been examined in the literature; however, there is no clear consensus as to which features best define it. It has been shown that women generally perceive high-pitched male voices to be unattractive (Riding, Lonsdale, & Brown, 2006) and prefer deeper male voices (Collins, 2000; Feinberg, Jones, Little, Burt, & Perrett, 2005; Riding et al., 2006), while men perceive higher pitched female voices as sounding more attractive (Collins & Missing, 2003; Feinberg, DeBruine, Jones, & Perrett, 2008; Jones, Feinberg, DeBruine, Little, & Vukovic, 2008). On the other hand, Daniel and McCabe (1992) found that mid-pitched voices for both sexes sounded the “most sexy,” suggesting that voices with pitches that deviated too far from the average (within each sex’s average voice pitch range) could indicate hormonal abnormalities. Oguchi and Kikuchi (1997) found that both males and females evaluated a lower pitched voice with a small pitch range as attractive for both sexes, while Oksenberg, Coleman, and Cannell (1986) found the opposite: a high pitch with greater variation was associated with voice attractiveness for both sexes. Babel, McGuire, and King (2014) suggested that a constellation of acoustic features that indicate apparent talker size and conformity to community speech norms contributes to perceived vocal attractiveness. Even though the exact acoustic properties of an attractive voice are not agreed upon, most studies demonstrate strong inter-rater reliability for what is subjectively perceived as an attractive voice, for both sexes and for both same-sex and opposite-sex raters (Babel, McGuire, & King, 2014; Hughes, Harrison, & Gallup, 2002; Zuckerman & Driver, 1989).
Vocal and physical attractiveness associations
Several studies have documented a relationship between physical and vocal attractiveness. For instance, attractiveness ratings of people seen speaking in videos correlated with independent attractiveness ratings of faces, body, and speech for both males and females (Saxton, Burriss, Murray, Rowland, & Roberts, 2009), suggesting that individual modalities are indicative of overall attractiveness. It is believed that the attractiveness of the voice and face is especially related for women, as both may signal common underlying cues of desired femininity (Feinberg et al., 2005). Collins and Missing (2003) also showed that physical and vocal attractiveness in women convey attractiveness together and separately, and women who were rated by participants as having higher pitched, and thus more attractive, voices were also rated as being more attractive overall and more youthful. Similarly, Lander (2008) found a positive relationship between vocal and visual attractiveness for women not only when participants were exposed to static pictures but also when they saw moving faces in a video. For men, however, while the visual attractiveness of moving faces seen in videos was related to vocal attractiveness, this was not the case for static pictures. Hughes and colleagues have shown that voice attractiveness relates to body attractiveness for different features such as bilateral body symmetry (Hughes, Harrison, et al., 2002; Hughes, Pastizzo, & Gallup, 2008) and sex-specific body configurations (Hughes, Dispenza, & Gallup, 2004). Furthermore, ratings of vocal attractiveness seem to be influenced by a person’s physical attractiveness, and vice versa, even when participants were instructed to pay attention to only one particular modality (Zuckerman, Miyake, & Hodgins, 1991).
The question remains as to whether people have a tendency to expect that physical and vocal attractiveness should be associated with one another and if a stereotype exists where people relate these two favorable characteristics together. While there is a large body of research addressing the attractiveness stereotypes, to our knowledge, no studies have directly compared voice and face attractiveness to test shared biases or have specifically examined these stimuli together in order to experimentally test to see if such a typecasting ensues.
It is not uncommon, however, for one to attempt to infer a speaker’s physical attributes when hearing only their voice and, more importantly, to infer them correctly. For instance, voice samples can be accurately matched to facial photographs over 75% of the time, and people were able to assess a speaker’s age, height, and weight with the same degree of accuracy as when seeing their photograph (Krauss, Freyberg, & Morsella, 2002). People are also able to accurately match the face and voice of unfamiliar persons when presented with either one voice and two faces or one face and two voices (Kamachi, Hill, Lander, & Vatikiotis-Bateson, 2003). Likewise, assessments of sex-specific body configurations of speakers (i.e., waist-to-hip ratio for women and shoulder-to-hip ratio for men) can be made with greater than chance accuracy by only hearing their voice (Hughes, Harrison, et al., 2009).
Current study
This study sought to test experimentally whether individuals have a tendency to associate attractive voices with attractive faces and, alternately, unattractive voices with unattractive faces. We hypothesized that people would hold a stereotype where they expect that physical and vocal attractiveness should be related to each other. If stereotypes exist separately for what is beautiful looking is perceived as being good (Dion et al., 1972) and what sounds beautiful is good (Zuckerman & Driver, 1989), it would stand to reason that both would elicit an expectation that these positive traits should specifically relate to one another. Thus, we predicted that participants would pair attractive voices and faces and unattractive voices and faces and do so without being given any information about the ratings of attractiveness for either modality. To our knowledge, no studies specifically have examined whether voice and face attractiveness are perceived to be related using facial and vocal stimuli in an experimental design.
Method
Participants
Fifty-five participants (25 men and 30 women) with a mean age of 21.04 (SD = 3.05; range = 17–32) were recruited from summer programs, classes, and work staff at a small liberal arts college in the northeastern U.S. to take part in the study, and all received an entry into a small cash raffle as compensation for their participation. Informed consent was obtained including parental consent for the two participants aged 17. The majority of participants reported being White (80%), followed by 13% Black, 5% Asian, and 2% Hispanic.
Stimuli
Vocal stimuli were obtained from previous investigations (see Hughes et al., 2004; Hughes & Harrison, 2013; Hughes, Harrison, et al., 2002) and were of individuals counting from 1 to 10 at a pace of approximately one numeral per second. A total of 40 voice samples were selected, half of which were male voices, half female voices. The selected voices were previously judged for vocal attractiveness by independent raters to be either attractive (M = 5.06, SD = 0.49) or unattractive (M = 2.74, SD = 0.57) calculated on 7-point rating scales, and groups were significantly different from one another, t(38) = 13.83, p < .001. An equal proportion of male and female voices was selected to be included in each attractiveness level group, and all chosen were Caucasian and did not have a discernible accent distinct from our region. The age of the attractive speakers (M = 20.6, SD = 3.4) did not significantly differ from the unattractive speakers (M = 21.9, SD = 5.6), t(38) = −0.92, p = .362.
A set of 80 facial photographs of men and women was obtained from public, open-access websites on the Internet. Strict criteria were followed for a facial picture to be included in the study: all photographs had to be of clear quality when standardized for presentation size, the person had to be looking directly forward into the camera without any rotation of the head, the person had to have a neutral facial expression, none wore glasses or had distinctive facial features (e.g., facial moles or beards), and all those chosen were Caucasian and were judged by independent raters to be between the ages of 18 and 29. For both sexes, half the pictures selected were those to which independent raters gave high facial attractiveness ratings (M = 8.38, SD = 1.05) on a 10-point rating scale, while the other half were rated as unattractive (M = 2.38, SD = 1.06), t(78) = 25.44, p < .001. Pictures were cropped so that only the face could be seen, with minimal head hair exposure and background contextual cues.
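The two manipulation checks reported for the stimuli can be approximately reproduced from the published summary statistics alone. The following is a minimal sketch, assuming a pooled-variance (Student) independent-samples t-test and equal group sizes (20 attractive and 20 unattractive voices, which is an inference from the reported df of 38; 40 and 40 faces, as stated). The function name `pooled_t` is hypothetical, and small discrepancies from the reported values are expected because the means and SDs are rounded.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Voices: 7-point scale; group sizes of 20 are an assumption based on df = 38.
t_voice, df_voice = pooled_t(5.06, 0.49, 20, 2.74, 0.57, 20)

# Faces: 10-point scale; 40 attractive and 40 unattractive pictures.
t_face, df_face = pooled_t(8.38, 1.05, 40, 2.38, 1.06, 40)

print(f"voices: t({df_voice}) = {t_voice:.2f}")
print(f"faces:  t({df_face}) = {t_face:.2f}")
```

Both recomputed statistics land within rounding distance of the reported t(38) = 13.83 and t(78) = 25.44.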
Procedure
All procedures were approved by the local institutional review board. Participants took part in this study on an individual basis. After providing informed consent, participants completed a brief demographic questionnaire covering the information reported above. Participants were then shown a slideshow presentation using SuperLab 4.5.4 software. Each screen showed two faces, an attractive face paired with an unattractive face of the same sex, accompanied by one voice recording (either an attractive or an unattractive voice) of a person of the same sex as the pictures counting from 1 to 10. Participants were exposed to all 40 voices and 80 pictures in a within-subject design.
Participants were then asked to choose which of the paired pictures was more likely to be the speaker of the voice sample they heard by pressing one of two keys (A or B) on the response keyboard. The next stimulus was presented only after a response was made on the keyboard, and participants could hear a voice once more if they wanted. We elected to present one voice recording with two facial pictures, rather than one face with two vocal samples, because two faces shown on one screen can be compared at a glance, whereas two voice recordings (each lasting approximately 10 s) must be heard in sequence and cannot be compared as readily.
All participants were told that the voices and faces were not actually collected together and were not an actual match so as to avoid participants choosing pictures they thought were either better quality or looked like they were obtained in an experimental setting. They were simply asked to determine who they thought would be more likely to possess the voice heard using a forced-choice paradigm. Participants were not told that both the voices and pictures had previously been rated for attractiveness, nor were they told that attractiveness was the variable of interest for the facial or vocal stimuli, so as to reduce any demand characteristics. All paired photos were generally matched for certain physical characteristics such as appearance of skin tone, hair color, and age. We counterbalanced which voices were paired with which faces, which two faces were paired together, the side of the screen on which each picture was presented, and the overall order of the presentation slides so as to avoid any order effects.
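The trial structure described above (one voice, two same-sex faces of opposite attractiveness level, varied screen side and presentation order) can be illustrated with a short sketch. This is not the SuperLab configuration used in the study: it substitutes simple seeded randomization for the counterbalancing scheme, ignores the matching of paired photos on skin tone, hair color, and age, and uses hypothetical placeholder stimulus IDs. The split of 10 voices per sex and attractiveness level is an assumption consistent with the stimulus description.

```python
import random


def build_trials(voices, faces, seed=0):
    """Pair each voice with one attractive and one unattractive same-sex face."""
    rng = random.Random(seed)
    trials = []
    for sex in ("female", "male"):
        attractive = [f for f in faces
                      if f["sex"] == sex and f["level"] == "attractive"]
        unattractive = [f for f in faces
                        if f["sex"] == sex and f["level"] == "unattractive"]
        same_sex_voices = [v for v in voices if v["sex"] == sex]
        for pool in (attractive, unattractive, same_sex_voices):
            rng.shuffle(pool)  # vary which voice meets which face pair
        for voice, att, unatt in zip(same_sex_voices, attractive, unattractive):
            # Randomize which side of the screen the attractive face appears on.
            left, right = (att, unatt) if rng.random() < 0.5 else (unatt, att)
            congruent = att if voice["level"] == "attractive" else unatt
            trials.append({
                "voice": voice["id"],
                "left": left["id"],
                "right": right["id"],
                "congruent_face": congruent["id"],  # same attractiveness level
            })
    rng.shuffle(trials)  # vary overall presentation order
    return trials


# Hypothetical placeholder stimuli: 40 voices and 80 faces.
SEXES = ("female", "male")
LEVELS = ("attractive", "unattractive")
voices = [{"id": f"voice_{s}_{l}_{i}", "sex": s, "level": l}
          for s in SEXES for l in LEVELS for i in range(10)]
faces = [{"id": f"face_{s}_{l}_{i}", "sex": s, "level": l}
         for s in SEXES for l in LEVELS for i in range(20)]

trials = build_trials(voices, faces)
print(len(trials), "trials; first trial:", trials[0])
```

A response counts as a stereotype-consistent match when the chosen face equals `congruent_face`; the proportion of such responses per participant is the dependent measure analyzed in the Results.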
Results
We calculated the proportion of times each rater had matched a voice and face for attractiveness level (i.e., paired an attractive voice with the attractive face and an unattractive voice with the unattractive face) when presented with the two face options and compared this against a 50% chance level, using a single-sample t-test. Raters scored significantly better than chance at pairing voice and face that were matched for attractiveness level (M = 72.6%, SD = 11.1), t(54) = 15.14, p < .001, Cohen’s d = 2.04, considered to be a large effect (Gravetter & Wallnau, 2007, p. 259). The sex of the rater showed no difference for matching voice and face attractiveness levels (male raters: M = 73.5%, SD = 12.4; female raters: M = 71.9%, SD = 10.0), t(53) = 0.52, p = .603.
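As a check on the overall test, the single-sample t statistic and Cohen's d can be recomputed from the reported mean, SD, and sample size. The sketch below assumes the standard formulas t = (M − μ0) / (SD / √n) and d = (M − μ0) / SD; the function name `one_sample_t` is hypothetical, and the small difference from the reported t reflects rounding of the published summary statistics.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """Single-sample t, its df, and Cohen's d from summary statistics."""
    se = sd / math.sqrt(n)
    t = (mean - mu0) / se
    d = (mean - mu0) / sd
    return t, n - 1, d


# Overall matching rate (in percent) against the 50% chance level, n = 55 raters.
t_overall, df_overall, d_overall = one_sample_t(72.6, 11.1, 55, 50.0)

print(f"t({df_overall}) = {t_overall:.2f}, d = {d_overall:.2f}")
```

The recomputed values agree with the reported t(54) = 15.14 and d = 2.04 to within rounding.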
We also divided the data to examine the rates of matching attractive voices with attractive faces in comparison with matching unattractive voices to unattractive faces. Separately, the proportion of times that a participant matched an attractive voice to an attractive face also exceeded chance (M = 65.4%, SD = 13.9), t(54) = 8.17, p < .001, as did matching an unattractive voice to unattractive face (M = 80.0%, SD = 13.8), t(54) = 16.12, p < .001 (see Figure 1).

Figure 1. Mean percent of matching attractive voices to attractive faces and unattractive voices to unattractive faces by participants.
We considered the raters’ sex and whether the participants had a stronger propensity to match the attractive stimuli versus the unattractive stimuli. A 2 (attractive vs. unattractive stimuli pairing) × 2 (sex of rater) mixed-model analysis of variance (ANOVA) was conducted on the proportion of times a rater matched a face and voice on attractiveness level. There was a main effect for attractiveness level: participants matched unattractive voices and faces (M = 80.0%, SE = 1.8) more frequently than they matched attractive voices and faces (M = 65.4%, SE = 1.9), F(1, 53) = 41.65, p < .001, η² = .440. However, there was no main effect for rater sex, F(1, 53) = 0.27, p = .603, nor a significant interaction between rater sex and attractiveness level of stimuli, F(1, 53) = 0.12, p = .731.
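The effect size for the attractiveness-level main effect can be related to its F statistic. The source does not state which variant of eta squared was computed; the sketch below assumes partial eta squared, using the identity η²p = (F × df_effect) / (F × df_effect + df_error), and the reported value is consistent with that assumption. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of attractiveness level: F(1, 53) = 41.65
eta_p2 = partial_eta_squared(41.65, 1, 53)

print(f"partial eta squared = {eta_p2:.3f}")
```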
To account for both the rater sex and the sex of target stimuli, we conducted a 2 (sex of rater) × 2 (sex of target stimuli) mixed-model ANOVA on the proportion of times a rater matched the attractiveness level of a face and voice for male and female stimuli. There was no main effect for the sex of rater, F(1, 53) = 0.27, p = .603, nor for the sex of stimuli, F(1, 53) = 1.48, p = .230. Likewise, we found no interaction between sex of rater and sex of target stimuli, F(1, 53) = 0.46, p = .502.
Further, we wanted to examine whether pairings of more moderately attractive/unattractive pictures still produced this stereotype effect and it was not just driven by extreme cases. While we had selected pictures to be categorized in one of two categories (attractive vs. unattractive), there were some pictures included whose mean attractiveness rating fell closer to the median score on the rating scales and could be classified as representing more of a moderate degree of attractiveness/unattractiveness than extreme. Therefore, in some cases, moderately attractive pictures were paired with moderately unattractive pictures and ratings of attractiveness between the two photos did not differ as greatly. Thus, we first computed the absolute difference score for attractiveness ratings for each paired picture and then divided this into three categories using the mean difference in our distribution (M = 6, SD = 1.50). We divided each paired set of pictures presented into: (1) those with smaller differences in attractiveness ratings (
Likewise, we conducted an analysis on the vocal stimuli to consider whether more highly versus more moderately attractive/unattractive stimuli may have influenced this stereotyping effect. Given our sample mean rating for voice attractiveness (M = 3.9, SD = 1.3), we standardized voice attractiveness scores and divided voice samples into four categories using 1 standard score above and below the mean: (1) highly unattractive, (2) moderately unattractive, (3) moderately attractive, and (4) highly attractive. We then calculated the proportion of times each voice sample was matched with a face for attractiveness level (i.e., paired an attractive voice with the attractive face and an unattractive voice with the unattractive face) when presented with the two face options and compared this against a 50% chance level, using single-sample t-tests. For each voice category, the stereotyping effect remained and the pairing of attractiveness for voice and face exceeded chance level: highly unattractive voices (M = 88.2%, SD = 8.3, t(13) = 17.23, p < .001), moderately unattractive voices (M = 75.5%, SD = 12.8, t(25) = 10.13, p < .001), moderately attractive voices (M = 59.8%, SD = 17.1, t(21) = 2.69, p = .014), and highly attractive voices (M = 72.4%, SD = 14.6, t(17) = 6.49, p < .001). Nonetheless, an independent ANOVA showed that there were differences among categories in the proportion of times a voice was matched with face attractiveness level, F(1, 76) = 12.41, p < .001. Post hoc analysis revealed that voices rated as highly unattractive were matched to unattractive faces the most frequently among voice samples, and all other pairwise comparisons were significantly different from one another (p < .05), except for voices rated as moderately unattractive compared to highly attractive.
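The four-way split of the voice samples can be sketched as a function of the standardized rating. The cut points at z = −1, 0, and +1 follow the description above; how ratings falling exactly on a boundary were assigned is not stated in the source, so the inclusive/exclusive choices here are an assumption, and the example ratings are hypothetical.

```python
VOICE_MEAN, VOICE_SD = 3.9, 1.3  # sample mean and SD of voice ratings


def voice_category(rating, mean=VOICE_MEAN, sd=VOICE_SD):
    """Assign a voice to one of four categories from its standardized rating."""
    z = (rating - mean) / sd
    if z <= -1.0:  # boundary handling is an assumption
        return "highly unattractive"
    if z < 0.0:
        return "moderately unattractive"
    if z < 1.0:
        return "moderately attractive"
    return "highly attractive"


# Hypothetical mean ratings on the 7-point scale, for illustration only.
example_ratings = [2.0, 3.5, 4.5, 5.5]
categories = [voice_category(r) for r in example_ratings]

for rating, category in zip(example_ratings, categories):
    print(rating, "->", category)
```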
There was also a significant negative correlation between voice attractiveness ratings and the proportion of times a voice was matched with the face that shared its attractiveness level, r = −.406, p < .001. A partial correlation controlling for the discrepancy in physical attractiveness between the two pictures shown did not alter this relationship, r = −.407, p < .001.
Discussion
This study demonstrated that people have a tendency to associate attractive voices with attractive faces and unattractive voices with unattractive faces and supported the existence of the what-sounds-beautiful-looks-beautiful stereotype. Because both facial and vocal attractiveness seem to interact to influence overall perceptions of attractiveness (Feinberg et al., 2005; Saxton et al., 2009; Zuckerman et al., 1991), our findings demonstrate that people hold the expectation that these two dimensions of attractiveness should be related. Not only is there a propensity to associate face and voice attractiveness, but people seem to become vexed when voice attractiveness and physical attractiveness do not match (Zuckerman & Sinicropi, 2011). In fact, it has been repeatedly shown that targets who are discordant on facial and vocal attractiveness elicit more negative overall interpersonal impressions than targets who show such a match (Miyake & Zuckerman, 1993; Zuckerman & Hodgins, 1993; Zuckerman et al., 1995). Even when considering stimuli that were more moderately rated for attractiveness/unattractiveness levels, participants in our study still matched faces and voices for similar attractiveness levels, so it appears that extreme cases were not necessarily driving this stereotyping effect.
Because several studies have documented a correlation between face and voice attractiveness (Collins & Missing, 2003; Feinberg et al., 2005; Lander, 2008; Saxton et al., 2009), these findings may be a product of being able to accurately match voices and faces together, given past experiences of seeing these traits interrelated. Indeed, people are capable of matching an unknown person’s voice to their face with greater than chance accuracy (Kamachi et al., 2003; Krauss et al., 2002). However, it is important to note that other studies have reported no correlation between physical and vocal attractiveness (as reported in Zuckerman et al., 1991: mean r across the three studies = .12; Zuckerman & Driver, 1989; Zuckerman et al., 1990), especially for males when using static pictures or when examining same-sex judgments (Lander, 2008), so it is not entirely clear if one’s experience with seeing physical and vocal attractiveness relate in actuality is a causal interpretation for this stereotype. Nonetheless, the aim of the study was not an attempt to resolve these inconsistencies in the literature as to whether face and voice attractiveness correlate in reality but rather was to examine the perceptions of making this association and whether people think they are related in the case of stereotypical decision making.
Regardless of whether vocal and physical attractiveness actually covary in individuals, we may have a tendency to recall instances where there is a mismatch between the attractiveness levels of a person’s voice and face more readily than times when our expectations for a match hold true. Examples of such recollections are embedded in common expressions such as saying someone has “a face for radio,” which illustrates the idea that we expect radio disc jockeys with great voices to be more attractive than they actually are and are disappointed when they are not. Similarly, we can be taken aback when seeing a highly attractive individual speak for the first time and hearing a very unattractive voice. In cases when voice and face attractiveness are discrepant, perceivers tend to be overly disappointed with the less attractive channel (Zuckerman & Sinicropi, 2011). Thus, we may easily recollect mismatches between attractiveness levels of voices and faces and think they are a more common event. But this salient recall for such instances may rely upon our expectation that physical and vocal attractiveness should be related. We may be more inclined to overlook or forget the majority of times when vocal and physical attractiveness levels align and to recollect only the times when they do not. The phenomenon known as the distinctiveness effect, or the idea that information that is unusual or distinct is remembered better than common information (Waddill & McDaniel, 1998), may be an explanation for the recollection of times when visual and vocal attractiveness are incongruent. As an example, people recall sentences that depict unusual events better than sentences that portray common occurrences (McDaniel, Dornburg, & Guynn, 2005). Similarly, when it comes to faces, it was found that distinct faces were more accurately recognized than typical faces, and names of distinct faces were also recalled more easily (Watier & Collin, 2012).
While this study was a first attempt to document the stereotype of what sounds beautiful looks beautiful, it also documented the reverse: what does not sound beautiful does not look beautiful. In fact, there was an even stronger propensity to pair unattractive voices with unattractive faces than to pair attractive voices with attractive faces. For the vocal stimuli in particular, the more unattractive a voice was rated as sounding, the greater the chance it was matched to the unattractive face. This finding is consistent with studies that show a reverse halo effect, the concept of formulating an overall negative opinion given one negative trait such as unattractiveness. In fact, Griffin and Langlois (2006) found that unattractiveness can be more of a disadvantage than attractiveness is an advantage, and the aphorism “unattractiveness-is-bad” prevailed more than its converse when participants rated unattractive individuals as being less helpful and less intelligent, possibly because negative information was more influential. In another study, where participants were asked to make employee dismissal decisions, unattractive women were more frequently “terminated” from the job than were either moderately or extremely attractive women (Commisso & Finkelstein, 2012). Our legal system also appears to be affected by a negativity bias for unattractiveness. Several studies have shown that unattractive defendants were evaluated with more certainty of guilt and assigned harsher punishments than attractive individuals in both simulated and actual trials (Efran, 1974; Mazzella & Feingold, 1994).
Furthermore, attractiveness seems to influence the perception of a person’s mental health; when participants were presented with an audio recording of a staged psychological interview with a person and were either shown a photo of an unattractive individual, a photo of an attractive individual, or no accompanying photo, participants who were shown the unattractive photo attributed greater maladjustment and disturbances with a poorer prognosis to that person than did participants in the other conditions (Cash, Kehr, Polyson, & Freeman, 1977). Given these findings that highlight the effects of unattractiveness, it makes sense that there was an even greater inclination to associate negative traits together (i.e., unattractive voices with unattractive faces) than there was to associate positive traits together (i.e., attractive voices with attractive faces).
We should acknowledge that one limitation of this study was that standardized photos were not obtained and the Internet was used to gather our facial stimuli. However, using photos from the Internet made it less likely that our raters would have known or been acquainted with those in the photos than had we sampled them ourselves. Moreover, the Internet provided a far larger population from which to select extreme cases of attractive and unattractive facial pictures than local sampling would have. Along those lines, future investigations may wish to consider more average-looking faces and average-sounding voices and mix and match one attractive/unattractive modality with an average counterpart to further examine this phenomenon as the next step beyond this initial, exploratory investigation. It is also possible that participants may have guessed the task at hand and responded consistently with the stereotype, even though our methodology attempted to reduce any demand characteristics by not informing participants that face and voice stimuli varied in attractiveness and by not bringing to their attention that attractiveness was a variable of interest. While we did not formally record whether participants had deciphered the aims of the study, we can report anecdotally that during the debriefing session, most participants had not figured out the aim of the study when asked, and some conveyed surprise when they were informed. Future investigations may also use other implicit measures to assess this stereotyping, such as taking reaction time measures during this decision-making task or offering an array of pictures to which to match a voice rather than just a dichotomous picture choice. Future research continuing this line of work could also examine voices and faces that actually belong to the same person, which could extend the external validity of these findings.
It would also be interesting to examine whether this effect is sustainable using dynamic images of a person while speaking (i.e., a video) versus using only static facial photos.
The present study provides an extension of the physical and vocal attractiveness biases and demonstrates that individuals expect vocal and physical attractiveness to be related to one another. We predicted that people would tend to match facial and vocal stimuli on attractiveness level, and our experimental results supported this hypothesis. Knowledge that such a stereotype exists could have implications or practical applications for the fields of broadcasting, telemarketing, and entertainment in particular.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
