Abstract
Social perceptions of speakers are influenced by their voice information, including vocal characteristics and semantic content. Our study investigated how individuals’ warmth- and competence-related perceptions of speakers were affected by vocal pitch levels (i.e., high/low) and three kinds of semantic cues (i.e., prosocial, antisocial, and neutral) simultaneously. We have three key findings. First, antisocial cues negatively affected social perceptions, regardless of speakers’ gender. However, prosocial cues did not have positive impacts on evaluations of speakers because ratings were similar between prosocial cues and neutral cues. Second, female vocal pitch mattered for warmth-related perceptions but not for competence-related perceptions. The role of semantic cues should be additionally considered when investigating the impact of male vocal pitch on these perceptions. For example, higher-pitched men in prosocial contexts were perceived as warmer, while low-pitched men in antisocial contexts were judged as more competent. Third, the connection between vocal pitch and two kinds of perceptions showed an opposite trend, in which high pitch was related to more warmth but less competence, while the low pitch was associated with less warmth but more competence. These findings extend the understanding of the role of vocal pitch in the formation of stereotypes of strangers in different semantic contexts.
The human voice is a key bio-social marker for judging personal characteristics, such as gender stereotypes and personality. For example, people may say, “Her voice sounds masculine” or “I like his voice because it makes me feel warm.” A meta-analysis showed that 41.6% of the variance in gender-related perceptions could be attributed to voice pitch (Leung et al., 2018). Generally, high-pitched women sound more feminine and approachable but less competent (Krahé et al., 2021; Oleszkiewicz et al., 2017). In addition to pitch, voice-based judgments may also rely on semantic cues. It has been shown that low-pitched men are considered more reliable when they convey prosocial rather than antisocial messages (O'Connor & Barclay, 2018). To the best of our knowledge, previous research has mainly focused on the relation between voice properties and social perceptions of speakers (e.g., trustworthiness), with limited attention to whether this relationship varies across multiple semantic contexts. To fill this research gap, the present study aims to test whether the interaction between a key voice property (pitch) and semantics shapes the perception of speakers’ socially relevant traits (e.g., warmth, competence), especially in Chinese contexts. In practice, this research may draw public attention to stereotypical impressions based on vocal cues and further provide insights for future interventions to reduce stereotypes.
Vocal Pitch and Gender-Stereotypic Traits
Voice provides important gender-related information in everyday interactions. An individual can quickly identify a speaker's gender by his or her voice. A key feature of the voice that elicits gender-related judgments about the speaker is the fundamental frequency (F0) of the voice, also called vocal pitch (Fitch & Giedd, 1999; Leung et al., 2018; Titze, 1994). Vocal pitch development in males and females follows different trajectories before physical maturation (Tsantani et al., 2016). Children's vocal pitch is relatively high, regardless of their genders. Throughout puberty, vocal pitch decreases for both genders, but the decline is much greater for boys than for girls. After puberty, vocal pitch becomes stable, with males having a lower vocal pitch (F0 range: 80–185 Hz) than females (F0 range: 165–255 Hz) (Titze, 1989; Tsantani et al., 2016).
In general, higher-pitched speakers tend to be perceived as more feminine, while low-pitched speakers are perceived as more masculine. Research has indicated that voice pitch is associated with gender-stereotypic traits (i.e., masculinity/femininity). For example, Ko et al. (2006) found that people with a higher-pitched voice were more likely to be stereotyped as more feminine than those with a lower-pitched voice, regardless of their gender. Similarly, Krahé and Papakonstantinou (2020) also found that a female was perceived as more feminine if her vocal pitch was between 165 and 220 Hz. These findings suggest that vocal pitch is significantly linked to perceived femininity or masculinity.
Vocal Pitch and Social Perception
Voice not only shapes individuals’ gender-stereotypic judgments of speakers but also influences stereotypical perceptions of speakers’ social traits. The stereotype content model (SCM; Cuddy et al., 2008; Fiske et al., 2007) proposes that two fundamental dimensions, warmth and competence, underlie social cognition when people form interpersonal impressions of others. According to this model, people tend to first evaluate a stranger's intent, which either helps or harms them (i.e., warmth dimension), and then determine the stranger's ability to act in accordance with the perceived intent (i.e., competence dimension). Therefore, the warmth dimension generally reflects traits that are associated with perceived intent, such as friendliness, trustworthiness, and sincerity, whereas the competence dimension captures traits that are related to perceived ability, like efficacy, skill, and capability.
The SCM has been widely used to explain how people develop stereotypes about certain social groups or individuals (e.g., immigrants, Caprariello et al., 2009; anti-Asian Americans, Lin et al., 2005). However, previous studies have mainly investigated social perceptions of others using visual materials. For example, in Caprariello et al.'s (2009) study, participants were asked to judge the warmth and competence of people in an unfamiliar ethnic group after reading a description of that group. Little research has applied the SCM to explore how stereotypes of others are established when people have only auditory stimuli provided by the targets, although a limited number of studies have examined how the voice affects one or two perceptual characteristics, like trustworthiness or competence (Oleszkiewicz et al., 2017; Vukovic et al., 2011). Considering that the two core dimensions of warmth and competence proposed by the SCM contribute to over 80% of the variance in overall judgments of others (Fiske et al., 2007; Wojciszke et al., 1998), we selected warmth- and competence-relevant traits as benchmarks to provide a more complete picture of how individuals perceive others solely based on their voice.
Previous research has examined the relationship between speakers’ vocal pitch and their perceived social traits. Some researchers were interested in the link between speakers’ vowel (i.e., /a/, /e/, /i/, /o/, /u/) production and listeners’ perception of attractiveness, trustworthiness, and competence (Borkowska & Pawlowski, 2011; Feinberg et al., 2005; Munson, 2007; Oleszkiewicz et al., 2017; Vukovic et al., 2011). For instance, Oleszkiewicz et al. (2017) found that lower-pitched men and women were both perceived as more competent and reliable than their higher-pitched counterparts when they pronounced vowels. Concerning warmth-related judgment of speakers, higher-pitched women were considered warmer than lower-pitched women whereas men's voices were not related to the warmth-related perception (Oleszkiewicz et al., 2017).
Other research focused on how vocal pitch influenced first impressions through saying “hello.” Perceived trustworthiness and dominance are two major traits that have been investigated in these studies (McAleer et al., 2014; Tsantani et al., 2016). McAleer et al. (2014) found that people tended to show a greater preference for higher-pitched speakers, regardless of gender, and displayed more trust in them. In terms of the dominance trait, the perception of male and female speakers was reversed, in which low-pitched women but high-pitched men were perceived as more dominant. However, these findings were different from Tsantani et al.'s (2016) study, where they found that lower pitches were more likely to lead to a stronger sense of trustworthiness for both male and female speakers, but they were irrelevant to perceived dominance. Conversely, Wu et al. (2021) found that high-pitched females were rated higher in the dimension of dominance in a Chinese context. These mixed findings may be attributed to the use of meaningless semantic contexts (e.g., vowel pronunciation). Even though some meaningful words (e.g., “hello”) have been chosen as the speech materials in some studies, they may not allow listeners to make consistent judgments about speakers in such a short word pronunciation.
Vocal Pitch, Semantic Cues, and Individuals’ Perception
In daily life, voice conveys meaningful semantic content in addition to its acoustic properties (e.g., vocal pitch). An accurate impression of a speaker cannot be formed if people rely purely on either the vocal pitch or the semantic content. Research has investigated the role of semantic cues in the relation between vocal pitch and listeners’ social perceptions of speakers. For example, Jones et al.'s (2008) work demonstrated that prosocial semantic information increased men's positive evaluations of female speakers with raised pitch. Men showed stronger preferences for female speakers who appeared interested in them (i.e., “I really like you”) than for those who did not (i.e., “I don't really like you”), even though the female speakers had similar high-pitched voices. Another study examined whether males’ vocal pitch interacted with social semantics to influence perceived trustworthiness and attractiveness (O'Connor & Barclay, 2018). In that study, male speakers were asked to record words with opposite meanings (e.g., caring-cheater). Their voices were artificially manipulated to be higher and lower in pitch. The results indicated that lower-pitched men were perceived as more trustworthy and attractive in a prosocial semantic context (e.g., caring) than in an antisocial semantic context (e.g., cheater). These findings emphasize the importance of considering semantic cues when investigating social perceptions of voices.
Although the effect of the interaction between vocal pitch and semantic cues on social perceptions of speakers has been investigated in several studies (Jones et al., 2008; O'Connor & Barclay, 2018), we have some concerns about the auditory stimuli adopted in previous studies. First, most people speak at least one long sentence or paragraph to convey meaningful content in real-life communication. Prior uses of vowel pronunciation and single words may not elicit authentic perceptions of speakers’ social characteristics. Second, using single words with certain semantic meanings is also inappropriate in the Chinese context. Unlike English, Mandarin has four basic tones (Keating & Kuo, 2012). The same syllables in different tones may have different meanings (e.g., /shui jiao/: /shuì jiào/ 睡觉 sleep; /shuǐ jiǎo/ 水饺 dumplings). Sometimes the same pronunciation can also have different meanings; for example, /gōng fū/ can mean Kungfu (功夫) or effort (工夫). Given such situations, using only one-word utterances may result in an inconsistent understanding of vocal stimuli across different listeners.
Third, past research focused on the role of either prosocial or antisocial semantic cues in moderating the influence of vocal pitch on social perceptions (O'Connor & Barclay, 2018), with little attention on neutral utterances. However, an aversion to antisocial cues does not mean a preference for prosocial content. Incorporating voice with neutral meanings may help fully understand how various types of semantic cues affect social perceptions of speakers. Fourth, compared with English speakers, the average range of Mandarin speakers’ vocal pitch is wider (Keating & Kuo, 2012), which may diversify voice-based evaluations of them. Hence, it is worthwhile to explore how vocal pitch affects social perceptions in an understudied Chinese context. Last, researchers previously manipulated voice pitch to be more masculine or feminine, and asked listeners to judge speakers’ social traits based on the manipulated voices (Klofstad et al., 2012; Krahé & Papakonstantinou, 2020; O'Connor & Barclay, 2018; O'Connor et al., 2012). But such manipulation may bring about listeners’ biased preferences for masculine voices. Feminine voices may always be elevated to a harsh level beyond normal listeners’ acceptance (Knowles & Little, 2016). Therefore, using natural voices as stimuli may decrease participants’ biases against certain speakers due to abnormal voice pitch.
Overview of the Present Study
To address the above concerns, our study followed a 2 (pitch: high/low) × 3 (semantic cues: prosocial/antisocial/neutral) × 2 (rating contexts: warmth/competence) factorial design to examine how vocal pitch interacts with different semantic cues to influence warmth- and competence-related perceptions of speakers. In view of potential differences in voice-based perceptions about females’ and males’ social traits, for instance, perceiving a high-pitched female voice but a low-pitched male voice as dominant (McAleer et al., 2014), we separately investigated people's perceptions about speakers’ social traits based on female and male voices. Specifically, we first tested whether vocal pitch and semantic cues independently affected social perceptions of female and male voices, respectively. Then we examined whether the interaction between vocal pitch and semantic cues affected social perceptions. We also checked whether the indicators of participants’ evaluations (i.e., warmth and competence) would interact in an additive manner. This study may extend the understanding of how voice information, including vocal pitch and semantic content, influences stereotypes toward Chinese speakers.
Method
This study consisted of two parts: a pilot study and a main experiment. In the pilot study, we created different semantic sentences and asked several speakers to record these sentences. Two factors were manipulated, including semantic cues (prosocial vs. antisocial vs. neutral) and vocal pitch (high vs. low). The voice stimuli chosen in the pilot study were used in the main experiment. With the assistance of Qualtrics Software, we generated online questionnaires for both the pilot study and the main experiment.
Pilot Study
Procedure
The purpose of the pilot study is to select voices for both genders, which were used in the main experiment. The pilot study contained two steps: (1) determining the content of recordings (i.e., sentences with different semantic cues) and (2) selecting speakers to record the content (high-pitched vs. low-pitched). First, we created six sentences with prosocial semantic cues and six sentences with antisocial semantic cues based on the definition of “prosocial” and “antisocial”. We also created six sentences that described some basic facts to represent neutral semantics. Then we invited 24 participants to evaluate the social orientations of these 18 sentences based on their semantics. Participants received a link to an online questionnaire in which they were asked to rate these sentences on a 9-point scale from 1 (extremely antisocial) to 9 (extremely prosocial). In the online questionnaire, the order of the sentences was randomized. Out of these 18 sentences, we finally selected one sentence with prosocial semantics, one sentence with antisocial semantics, and one sentence with neutral semantics. Next, we recruited 16 speakers, including 8 females and 8 males, to record the three sentences, respectively. We used the Praat software to analyze the average vocal pitch of these voices, and finally selected two relatively high-pitched voices and two relatively low-pitched voices for each gender. These eight voices were used in the main experiment.
Determination of Recordings/Voice Content
We created 18 written sentences. Among them, six conveyed prosocial messages, six conveyed antisocial information, and the other six described neutral facts (see Table S1 in Supplemental Materials). According to the definition of “prosocial,” our prosocial sentences describe “sharing, cooperating, helping, feeling empathy and caring for others” (Radke-Yarrow et al., 1983, p. 528), and altruism, which manifests in sacrifice, or normative behaviors (Batson & Powell, 2003). Exemplary prosocial sentences include “I obey social rules” and “I enjoy sharing with others.” Sentences with antisocial cues describe antisocial or delinquent behaviors, such as general deviance (e.g., theft, cheating on exams, or coming to school late) and disobedience to parents (e.g., arguing with parents) (Hindelang et al., 1981). Sample sentences include “Sometimes I cheat on the exam” and “I often argue with my parents.” Neutral sentences describe some basic facts without social orientations, such as “The high-speed railway from Beijing to Shanghai arrived” and “The car in the front is white.” Each sentence contains 8–9 Chinese characters.
To verify that the valence of each sentence reflected its pre-determined social orientation, we asked 24 participants to rate the social tendency of the semantics of each sentence on a 9-point scale ranging from 1 (extremely antisocial) to 9 (extremely prosocial). An a priori power analysis was conducted for sample size estimation. With α = 0.05 and (1-β) = 0.80, the analysis suggested that a sample of at least N = 18 was needed to detect a medium effect size (f = 0.25) in a repeated measures analysis of variance (ANOVA). All participants (Mage = 23.54, SD = 2.57, Nfemale = 14) were recruited from the Internet through convenience sampling. They were all Chinese and consented to participate in our study.
We calculated the average score of each sentence and selected one for each dimension (i.e., prosocial, antisocial, and neutral). More specifically, the sentence with the highest mean score (M = 7.67, SD = 1.55) was chosen as the sentence with prosocial semantics (i.e., I enjoy sharing with others), while the sentence with the lowest mean score (M = 2.50, SD = 1.72) was considered to convey antisocial semantics (i.e., Sometimes I cheat on the exam). The sentence with an average score (M = 5.04, SD = 1.76) closest to 5 was regarded as the neutral sentence (i.e., The car in the front is white). A repeated measures ANOVA was conducted to test the mean difference between these three sentences. The results showed that the mean scores of these three sentences were significantly different, F (2, 46) = 55.41, p < .001. Pairwise comparisons in post hoc analyses revealed that the mean score of the prosocial sentence (M = 7.67) was significantly higher than that of the neutral sentence (M = 5.04) and antisocial sentence (M = 2.50), and the mean score of the neutral sentence (M = 5.04) was also significantly higher than that of the antisocial sentence (M = 2.50), ps < .001.
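The selection rule described above (highest mean for prosocial, lowest mean for antisocial, mean closest to the scale midpoint of 5 for neutral) can be illustrated with a minimal Python sketch. The function name, the sentence labels, and the ratings below are hypothetical and serve only as an illustration; they are not the study's data or analysis script.

```python
from statistics import mean


def select_sentences(ratings, midpoint=5.0):
    """Pick one sentence per semantic category from mean ratings.

    `ratings` maps a sentence label to the list of 1-9 ratings it received
    (1 = extremely antisocial, 9 = extremely prosocial).
    """
    means = {label: mean(scores) for label, scores in ratings.items()}
    prosocial = max(means, key=means.get)   # highest mean rating
    antisocial = min(means, key=means.get)  # lowest mean rating
    # Neutral: mean rating closest to the midpoint of the 9-point scale.
    neutral = min(means, key=lambda label: abs(means[label] - midpoint))
    return {"prosocial": prosocial, "antisocial": antisocial, "neutral": neutral}


# Hypothetical ratings from three raters for four made-up sentence labels.
toy_ratings = {
    "share": [8, 7, 9],
    "obey": [7, 6, 7],
    "cheat": [2, 3, 2],
    "car": [5, 5, 6],
}
selection = select_sentences(toy_ratings)
print(selection)
```

With real data, each list would hold the ratings of all 24 raters for one of the 18 sentences.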
Selection of Speakers
We recruited 16 speakers, including eight females and eight males, to record the selected three sentences. These speakers were all native Chinese speakers. Their average age is 28.42 years old (SD = 2.57). They were required to record three sentences one by one in a natural tone in a quiet room. Each speaker provided three recordings in total, and each recording contains one selected sentence. We used the Praat software to remove noise in recordings and controlled the loudness at around 62 dB (sounds at or below 70 dB are generally considered safe). Then we used Praat to analyze the mean pitch of each recording (see Table S2 in the Supplemental Materials). Among the 16 speakers, we finally selected two speakers with relatively high-pitched voices and two speakers with relatively low-pitched voices for each gender group (see Table 1).
Table 1. Descriptive statistics of selected speakers.
Main Experiment
Procedure
In the main study, eight voices (24 recordings) selected from the pilot study were used. Given that individuals may have different perceptions of female and male voices, e.g., perceiving a high-pitched female voice but a low-pitched male voice as dominant (McAleer et al., 2014), we created two online questionnaires to study female and male voices separately. Each online questionnaire included 12 recordings provided by two high-pitched speakers and two low-pitched speakers. Each speaker provided three recordings, including one recording of a prosocial sentence, one recording of an antisocial sentence, and one recording of a neutral sentence. In other words, the 12 recordings could be categorized into six conditions (i.e., high pitch × prosocial, high pitch × antisocial, high pitch × neutral, low pitch × prosocial, low pitch × antisocial, and low pitch × neutral), with two voices for each condition. The order of the 12 recordings was randomized in each questionnaire. Participants listened to the recordings one by one and rated nine items about voice-based perceptions. Each recording could be played repeatedly until all questions corresponding to the voice had been answered.
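As an illustration of how the 12 recordings in one questionnaire map onto the six pitch × semantic cue conditions, the following Python sketch enumerates the design and shuffles the presentation order. The speaker labels and the seeded shuffle are assumptions made for the example; the actual randomization was handled by the survey software.

```python
import random
from itertools import product

PITCH_LEVELS = ["high", "low"]
SEMANTIC_CUES = ["prosocial", "antisocial", "neutral"]
VOICES_PER_PITCH = 2  # two speakers per pitch level within one gender

# Six conditions: every pitch level crossed with every semantic cue.
conditions = list(product(PITCH_LEVELS, SEMANTIC_CUES))

# Each speaker records all three sentences, giving 12 recordings per questionnaire.
# Speaker labels such as "high_1" are hypothetical placeholders.
recordings = [
    {"speaker": f"{pitch}_{idx}", "pitch": pitch, "cue": cue}
    for pitch, cue in conditions
    for idx in range(1, VOICES_PER_PITCH + 1)
]


def presentation_order(items, seed=None):
    """Return a shuffled copy of the recordings (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


order = presentation_order(recordings, seed=0)
print(len(conditions), len(recordings))
```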
We recruited participants through convenience and snowball sampling. We posted a message to recruit potential participants on Chinese social media platforms, such as WeChat and Sina Microblog. In this message, the purpose of our study and two online survey links were provided. Participants were free to choose one of the links to complete the online questionnaire. Before taking the questionnaire, they were told to sit in a quiet room without interruption and prepare their earphones. It took each participant about 15 min to finish the questionnaire.
Participants
A total of 139 participants fully completed our questionnaires. They were all Chinese with normal hearing. Of them, 70 participants (Mage = 26.60, SD = 4.11, Nmale = 35) completed the questionnaire about female voice stimuli. The other 69 participants (Mage = 25.86, SD = 4.64, Nmale = 30) finished the questionnaire about male voice stimuli. To ensure an adequate sample size, we performed a statistical power analysis with the G*Power program. The calculation indicated that at least 15 participants were needed to detect a medium effect size of 0.25 with a power of 80% in a repeated measures ANOVA (α = 0.05).
Voice Stimuli
Twenty-four voice recordings were used in the main experiment. These recordings were obtained from four female speakers and four male speakers. Each speaker provided three recordings: one containing a prosocial sentence, one containing an antisocial sentence, and one containing a neutral sentence. The descriptive statistics of each speaker are presented in Table 1. It has been suggested that a difference of approximately 40 Hz between voices is large enough to cause significant differences in perceived traits of speakers (Feinberg et al., 2005; Tsantani et al., 2016). In our study, the difference in mean pitch between the high- and low-pitched female voices we selected is about 48 Hz and that between the high- and low-pitched male voices is about 61 Hz, which is large enough to elicit different perceptual judgments of the speakers’ personality traits. The approximate length of each recording is 3.11 s.
Measures
Voice Pitch
Voice pitch, also known as fundamental frequency (F0) (Hz), is an index for the vibration frequency of people's vocal folds per second (Tusing & Dillard, 2000). The maximum, minimum, and mean vocal pitch of each voice and the standard deviation of pitch were extracted via Praat software (see Table 1).
Perceived Gender-Related Traits
Perceived gender-related traits refer to speakers’ masculinity or femininity sensed from their voices. In general, a higher-pitched voice is perceived as more feminine whereas a lower-pitched voice is considered more masculine. In this study, we asked participants to report their perception of gender-related traits (i.e., femininity/masculinity) of a certain voice stimulus on a 9-point scale ranging from −4 (very masculine) to 4 (very feminine). Higher scores indicated that participants perceived the voice to be more feminine.
Social Perception
The social perceptions of a voice were measured by two fundamental dimensions: warmth and competence. According to the SCM (Cuddy et al., 2008), the warmth dimension reflects traits that are associated with perceived intent, such as trustworthiness and friendliness, whereas the dimension of competence captures traits that are related to ability, like efficiency and confidence (Fiske et al., 2007). Therefore, in this study, the dimension of “warmth” was assessed through five traits, including warmth, trustworthiness, friendliness, likeability, and sincerity, whereas the dimension of “competence” was assessed by four traits, which were competence, efficiency, confidence, and dominance. Following each voice stimulus, participants were asked, “According to the voice you hear, to what extent do you think the speaker is {TRAIT} (e.g., warm, trustworthy)?”, and were required to rate these nine items on a 7-point scale ranging from 1 (not at all) to 7 (extremely {TRAIT}).
Results
Preliminary Analyses
The first purpose of the preliminary analyses is to test whether the ratings (i.e., perceived gender trait, warmth-related traits, and competence-related traits) of the two voices in each condition (i.e., high pitch × prosocial, high pitch × antisocial, high pitch × neutral, low pitch × prosocial, low pitch × antisocial, and low pitch × neutral) are consistent. We conducted paired-sample t-tests comparing each rating of the two voices across the six conditions for each gender. The results showed that most ratings of the two voices did not differ from each other, ps > .05 (see Tables S3 and S4 in Supplemental Materials). These findings suggest that the perception of two voices with similar levels of vocal pitch is comparable, particularly when they convey the same semantic cues. Given the similar ratings of the two voices in each condition, we averaged each rating for both voices and used the mean scores in the following analyses.
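For readers unfamiliar with the comparison used here, a paired-sample t statistic is the mean of the within-participant differences divided by its standard error, with n − 1 degrees of freedom. The Python sketch below shows that computation on made-up ratings; the function name and the data are hypothetical, and the p-value (which requires the t distribution) is deliberately left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-sample t-test on two equal-length rating lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    diffs = [a - b for a, b in zip(x, y)]
    # Standard error of the mean difference (sample SD with n - 1 in the denominator).
    # Note: identical difference scores give SD = 0 and make t undefined.
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical ratings of two voices in the same condition by four listeners.
voice_a = [5, 6, 7, 8]
voice_b = [4, 4, 6, 6]
t_value, df = paired_t(voice_a, voice_b)
print(round(t_value, 3), df)
```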
The second purpose is to conduct a manipulation check. We tested whether the high-pitched voices and low-pitched voices we selected were significantly different in perceived gender traits (i.e., femininity/masculinity). As mentioned above, high-pitched voices are usually perceived as feminine, while low-pitched voices are regarded as masculine. We therefore measured the perceived gender traits of each type of voice stimuli even though there was a large disparity in vocal pitch between the high- and low-pitched voices we selected. The paired-sample t-tests showed that the mean scores of the gender-related perception differed significantly between high-pitched and low-pitched voices across the three semantic cues, ps < .001 (see Table 2).
Table 2. Descriptive statistics and paired-sample t-test for gender-related perception between high- and low-pitched speakers.
Note. Prosocial sentence: I enjoy sharing with others. 我乐于与他人分享。 Antisocial sentence: Sometimes I cheat on the exam. 我有时会考试作弊。 Neutral sentence: The car in the front is white. 前面的汽车是白色的。 Gender-related perception rating range: −4 (very masculine) to 4 (very feminine). *** p < .001.
The last purpose of the preliminary analyses is to examine the factor loadings of the “warmth” and “competence” scale items and the reliabilities of these two scales for each gender voice. Based on the SCM (Fiske et al., 2007), we conducted principal component analyses (PCA) to examine whether the items of warmth, trustworthiness, friendliness, likeability, and sincerity pertained to one dimension, namely warmth. The PCA revealed only one factor with an eigenvalue greater than 1.00 in each condition (i.e., high pitch × prosocial, high pitch × antisocial, high pitch × neutral, low pitch × prosocial, low pitch × antisocial, and low pitch × neutral). The eigenvalue and cumulative % variance (i.e., how much of the variance was accounted for by the factor) for each condition are presented in Table S5 in Supplemental Materials. All items met the 0.50 criterion for rotated factor loadings. A similar PCA was also conducted to examine whether the items of competence, efficiency, confidence, and dominance loaded on the same factor. The results indicated that only one factor with an eigenvalue greater than 1.00 was generated by these four items in each condition, and most items met the 0.50 criterion. Then we tested the reliabilities of the “warmth” and “competence” scales for each condition. The high alpha coefficients for scores on the “warmth” (αs > 0.85) and “competence” (αs > 0.75) scales indicated that the items did measure the relevant dimensions (see Table 3). Therefore, in the following main analyses, we averaged the scores of the items of warmth, trustworthiness, friendliness, likeability, and sincerity to represent the speaker's warmth-related perception. The mean scores for the items of competence, efficiency, confidence, and dominance were calculated to represent the competence-related perception of the speaker. The mean scores and standard deviations of each scale under the six conditions are presented in Table 3.
Table 3. Descriptive statistics for warmth- and competence-related perceptions and Cronbach's alpha of each measure.
Note. Prosocial sentence: I enjoy sharing with others. 我乐于与他人分享。 Antisocial sentence: Sometimes I cheat on the exam. 我有时会考试作弊。 Neutral sentence: The car in the front is white. 前面的汽车是白色的。 Cronbach's α > 0.70 is considered acceptable.
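The reliability check and the item averaging described above can be sketched in a few lines of Python. Cronbach's alpha is computed with its standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the summed score), and the composite score is the mean of a participant's item ratings. The function names and the ratings are hypothetical; this is an illustrative sketch, not the software used in the study.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants, each a list of k item ratings."""
    k = len(rows[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score across participants.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def composite_scores(rows):
    """Average the item ratings of each participant into one scale score."""
    return [mean(row) for row in rows]


# Hypothetical 7-point ratings of the five warmth items by four listeners
# (warmth, trustworthiness, friendliness, likeability, sincerity).
warmth_items = [
    [4, 5, 4, 5, 4],
    [2, 3, 2, 2, 3],
    [6, 6, 5, 6, 6],
    [3, 3, 4, 3, 3],
]
alpha = cronbach_alpha(warmth_items)
warmth_scores = composite_scores(warmth_items)
print(round(alpha, 2), warmth_scores)
```

The same two functions would apply to the four competence items, one condition at a time.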
Main Analyses
We conducted a 2 (pitch levels: high/low) × 3 (semantic cues: prosocial/antisocial/neutral) × 2 (rating contexts: warmth/competence) repeated measures ANOVA for female and male voices separately. We tested whether the main effect of each factor and the interaction effect were significant. Post hoc analyses following up on each significant interaction were conducted. All the p-values for post hoc tests reported in the following analyses have been Bonferroni-corrected (Benjamini & Hochberg, 1995).
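As a reminder of what a Bonferroni adjustment does, each raw p-value is multiplied by the number of comparisons in the family and capped at 1. The sketch below assumes a plain Bonferroni adjustment; the function name and the example p-values are hypothetical, and the family size used in the study is not stated in this section.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons.
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni(raw)
print(adjusted)
```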
Main Analyses of Female Voices
A 2 (pitch levels: high/low) × 3 (semantic cues: prosocial/antisocial/neutral) × 2 (rating contexts: warmth/competence) ANOVA with repeated measures was conducted for female voices. The main effects of pitch levels [F (1, 69) = 32.00, p < .001, ηp2 = .317] and semantic cues [F (2, 138) = 54.00, p < .001, ηp2 = 0.439] were significant but the three-way interaction was not [F (2, 138) = 1.24, p = 0.291, ηp2 = 0.018].
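The reported effect sizes can be checked against the F statistics with the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies it to the three values reported above; the function name is a hypothetical helper.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the female-voice ANOVA.
pitch = partial_eta_squared(32.00, 1, 69)      # main effect of pitch levels
cues = partial_eta_squared(54.00, 2, 138)      # main effect of semantic cues
three_way = partial_eta_squared(1.24, 2, 138)  # three-way interaction
print(round(pitch, 3), round(cues, 3), round(three_way, 3))  # 0.317 0.439 0.018
```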
We found a significant interaction effect between vocal pitch and semantic cues in female voices, F (2, 138) = 7.81, p < .001, ηp2 = 0.102 (see Figure 1a). Post hoc comparisons revealed that regardless of rating contexts, the ratings for high-pitched voices with antisocial semantic cues [M(SD)high-anti = 4.23(0.87)] were significantly lower than those for high-pitched voices with prosocial cues [M(SD)high-prosocial = 4.78(0.74), t (69) = 5.41, p < .001] and neutral cues [M(SD)high-neutral = 5.19(0.69), t (69) = 9.20, p < .001]. In terms of low-pitched voices, while the ratings on neutral cues [M(SD)low-neutral = 4.57(0.76)] were significantly higher than those on antisocial cues [M(SD)low-anti = 4.12(0.77), t (69) = 4.54, p < .001], they did not differ from those on prosocial cues [M(SD)low-pro = 4.34(0.83), t (69) = 2.41, p = .190]. In comparisons between high- and low-pitched voices, the ratings for high-pitched voices with prosocial cues and neutral cues were significantly higher than those for low-pitched voices, ps < .001. However, no significant difference in ratings on antisocial cues was found between high- and low-pitched voices.

Figure 1. Mean rating by vocal pitch and sentence cues across warmth and competence rating contexts for female voices.
There was also a significant interaction of semantic cues and rating contexts, F (2, 138) = 12.71, p < .001, ηp2 = .156 (see Figure 1b). Without considering the pitch levels, participants perceived voices with prosocial cues [M(SD)pro-warmth = 4.62(0.73), t (69) = 5.10, p < .001] and neutral cues [M(SD)neutral-warmth = 4.74(0.64), t (69) = 5.57, p < .001] as warmer than voices with antisocial cues [M(SD)anti-warmth = 4.23(0.79)]. Similar findings were also uncovered in the competence-related ratings, ps < .002. Of note, speakers who conveyed neutral information were considered more competent than those who delivered prosocial messages [M(SD)neutral-competence = 5.02(0.64), M(SD)pro-competence = 4.50(0.69), t (69) = 5.08, p < .001].
Likewise, a significant interaction effect between vocal pitch and rating contexts was found in female voices, F (1, 69) = 84.13, p < .001, ηp2 = 0.549 (see Figure 1c). Across semantic cues, the warmth-related ratings were significantly higher than the competence-related ratings for high-pitched voices [M(SD)high-warmth = 4.96(0.70), M(SD)high-competence = 4.50(0.64), t (69) = 7.39, p < .001], but this pattern was reversed for low-pitched voices [M(SD)low-warmth = 4.09(0.75), M(SD)low-competence = 4.59(0.69), t (69) = 6.65, p < .001]. Regardless of the semantic cue, high-pitched voices were perceived as warmer than low-pitched voices [t (69) = 9.71, p < .001], whereas the perceptions of competence were similar for both voices [t (69) = 1.09, p = .279]. All comparisons are presented in Table S6 in the Supplemental Materials.
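As an illustration that is not part of the original analysis, partial eta squared can be recovered from an F statistic and its degrees of freedom with ηp2 = (F × df1) / (F × df1 + df2). The Python sketch below applies this to the two interaction effects involving rating contexts reported above; the function name `partial_eta_squared` is our own label.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    # F * df_effect equals SS_effect / MS_error, so the ratio below
    # simplifies to SS_effect / (SS_effect + SS_error).
    scaled = f_value * df_effect
    return scaled / (scaled + df_error)


# Semantic cues x rating contexts: F(2, 138) = 12.71
cues_by_context = partial_eta_squared(12.71, 2, 138)
# Vocal pitch x rating contexts: F(1, 69) = 84.13
pitch_by_context = partial_eta_squared(84.13, 1, 69)

print(f"{cues_by_context:.3f}")   # 0.156
print(f"{pitch_by_context:.3f}")  # 0.549
```

Both values agree with the effect sizes reported in the text to three decimal places.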
Analyses of Male Voices
A similar repeated-measures ANOVA was conducted for male voices. We found a significant main effect of semantic cues, F (2, 136) = 6.73, p = .002, ηp2 = 0.090, and a significant three-way interaction effect of vocal pitch, semantic cues, and rating contexts, F (2, 136) = 21.23, p < .001, ηp2 = 0.238. Post hoc analyses revealed that a few comparisons remained significant after Bonferroni correction (all comparisons are presented in Table S7 in the Supplemental Materials).
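For readers unfamiliar with the procedure, Bonferroni correction multiplies each unadjusted p value by the number of comparisons in the family (capped at 1) and compares the result with the nominal alpha. The Python sketch below is a minimal illustration; the p values and the family size are hypothetical placeholders, not values from this study, and `bonferroni_adjust` is our own helper name.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return (adjusted_p, is_significant) for each unadjusted p value."""
    m = len(p_values)  # size of the comparison family
    results = []
    for p in p_values:
        p_adj = min(p * m, 1.0)  # cap adjusted p at 1
        results.append((p_adj, p_adj < alpha))
    return results


# Hypothetical unadjusted p values for a family of four comparisons.
hypothetical_p = [0.0004, 0.012, 0.030, 0.20]
results = bonferroni_adjust(hypothetical_p)

for raw, (adj, significant) in zip(hypothetical_p, results):
    print(f"raw p = {raw:.4f}, adjusted p = {adj:.4f}, significant = {significant}")
```

In this hypothetical family, the third comparison would be significant without correction (p = .030) but not after it (adjusted p = .120), which is how a set of comparisons can shrink to a few that remain significant.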
For both high- and low-pitched male voices, the warmth-related ratings on prosocial cues were significantly higher than those on antisocial cues (see Figure 2). A similar pattern emerged in competence-related ratings, although it was significant only in high-pitched voices. High-pitched speakers who delivered prosocial messages were rated higher than those who conveyed neutral information, especially in the warmth-related perceptions [M(SD)high-pro-warmth = 4.40(0.81), M(SD)high-neutral-warmth = 3.91(0.72), t (69) = 4.17, p < .001]. In contrast, low-pitched male speakers received higher ratings on both dimensions in neutral semantic contexts than in prosocial contexts, but the difference in competence was not statistically significant.

Figure 2. Mean rating by vocal pitch and sentence cues across warmth and competence rating contexts for male voices.
In comparisons between high- and low-pitched voices, the ratings for high-pitched voices with prosocial cues were higher on both dimensions than those for low-pitched voices with prosocial cues, but the difference was significant only in the warmth dimension [M(SD)high-pro-warmth = 4.40(0.81), M(SD)low-pro-warmth = 4.00(0.78), t (69) = 3.25, p = .002]. Although the warmth-related ratings for high-pitched voices with antisocial cues were still higher than those for low-pitched voices, the competence-related ratings for high-pitched voices were significantly lower than those for low-pitched voices in antisocial semantic contexts [M(SD)high-anti-competence = 3.83(0.71), M(SD)low-anti-competence = 4.19(0.69), t (69) = −3.36, p = .001].
Discussion
In the present study, we aimed to examine how social perceptions of speakers are influenced by vocal pitch and semantic cues. In particular, we investigated whether high and low pitch levels interacted with three types of semantic cues (i.e., prosocial, antisocial, and neutral) in speech to affect warmth- and competence-related perceptions. Several key findings emerged from this study. First, for both genders, antisocial cues had negative impacts on the social perceptions of speakers, but prosocial cues did not increase positive evaluations of speakers because the ratings for voices with prosocial cues and those with neutral cues were similar. Second, female vocal pitch mattered for the warmth-related perception but not for the competence-related perception. The role of semantic cues should be additionally considered when investigating the impact of male vocal pitch on these two kinds of perceptions. For example, higher-pitched male speakers in prosocial contexts were perceived as warmer, and low-pitched male speakers in antisocial contexts were judged as more competent. Third, the connection between vocal pitch and the two perceptions showed an opposite trend, in which high pitch was related to more warmth but less competence, while low pitch was associated with less warmth but more competence. To our knowledge, our research is the first to systematically apply the SCM (Fiske et al., 2007) to understand how voice-based stereotypical impressions are formed in different semantic contexts.
Previous research has compared the role of prosocial and antisocial semantic cues (e.g., caring vs. cheater) in affecting social perceptions of male speakers, in which low-pitched men who spoke prosocial words received higher evaluations on both trustworthiness and attractiveness than high-pitched men who conveyed antisocial words (O'Connor & Barclay, 2018). In this study, we included both female and male speakers and consistently found that antisocial cues negatively affect the perceptions of speakers. More specifically, female speakers, regardless of their vocal pitch, were rated lower in the warmth and competence dimensions when they delivered antisocial statements than when they delivered prosocial statements. Similar findings also emerged for high-pitched male speakers. It seems that prosocial semantic cues may help form a better impression of speakers.
However, our findings about neutral semantic cues may imply that prosocial cues do not always affect social perceptions of speakers in a positive way. For example, the warmth-related ratings on prosocial cues and neutral cues were similar in female voices, irrespective of vocal pitch, whereas the competence-related ratings on neutral cues were even higher than those on prosocial cues. For low-pitched male voices, the positive impact of prosocial cues was almost eliminated, because warmth- and competence-related perceptions did not differ between prosocial and antisocial contexts. These findings suggest that people do not have to deliberately express their prosociality in their first conversation with a stranger, which runs counter to the conventional view that prosociality induces others’ positive evaluations. Based on the SCM (Fiske et al., 2007), people form perceptions of strangers first by judging their intent (e.g., “whether they would help/harm me?”). Without information about a stranger's background, listeners tend to perceive the stranger as a threat when they hear antisocial signals in the stranger's speech. Conversely, prosocial semantic cues may not be effective in helping strangers express good intentions. Thus, antisocial semantic cues do have a negative impact on social perceptions of speakers, but this does not mean that prosocial discourse helps speakers gain positive impressions from others.
Regarding the warmth-related and competence-related ratings, only warmth was found to differ in relation to pitch, with high-pitched female speakers being perceived as warmer than low-pitched female speakers. Our finding concurs with that of Oleszkiewicz et al. (2017), who also found that women's vocal pitch was associated with perceived warmth. However, it is at odds with the results of Ko et al. (2009), who failed to detect an influence of women's vocal pitch on the perception of warmth. The inconsistency may be attributed to the different measurements of warmth in these studies. For example, Oleszkiewicz et al. (2017) conceptualized perceived warmth as a single trait, warmth. In our study, warmth broadly covers social traits relevant to perceived intent, such as warmth, friendliness, and sincerity. In Ko et al.'s (2009) study, however, the warmth perception was reflected by supportive and caring traits. Even though these warmth-relevant traits are closely related, the inconsistent findings suggest that they may be partly independent. Additional research is required to distinguish the effects of different voice-based assessments of warmth.
In this study, high-pitched women were judged as being relatively warmer than low-pitched women, but the link between men's voice pitch and perceived warmth was influenced by certain semantic cues. We found that men who spoke prosocial sentences at a lower pitch were rated as less warm than those who spoke them at a higher pitch. Such differences in perceived warmth between high- and low-pitched speakers did not appear in antisocial or neutral semantic contexts, which is in line with previous findings that voice pitch did not influence judgments of men's warmth (Oleszkiewicz et al., 2017). Oleszkiewicz et al. (2017) suggest that perceived warmth, which is associated with femininity, is activated by high pitch (Eagly & Mladinic, 1989). In general, the male voice is less feminine and may inhibit the effects of high pitch on the warmth-related perception. However, our findings imply that the interaction of prosocial cues and high pitch may reactivate the effect of pitch on warmth judgments of men.
No effect of female voice pitch on perceived competence was found in our study, which is consistent with evidence from Krahé and Papakonstantinou (2020). Similarly, Tsantani et al. (2016) found that high and low pitch did not differ in the dominance perception of female speakers. In contrast to these findings, some studies showed that low-pitched female speakers were rated as more competent than high-pitched female speakers (Ko et al., 2009; Oleszkiewicz et al., 2017). One possible explanation is that the influence of women's voice pitch on perceived competence is modulated by specific contexts. For example, Ko et al. (2009) designed a job recruitment scenario where women's vocal pitch was varied along with the candidate's previous jobs and hobbies. In such hiring contexts, participants may rely more on job candidates’ suitability to rate their competence (Eagly & Karau, 2002). Extra information, such as hobbies that reflect the candidate's physical condition, may also influence judgments of competence beyond vocal characteristics (Ko et al., 2009; Tsantani et al., 2016).
Previous research has also investigated the link between men's vocal pitch and competence-relevant traits (Feinberg et al., 2005; Jones et al., 2010; Tsantani et al., 2016). These studies consistently found that low-pitched male speakers were rated as more dominant than high-pitched male speakers. This finding was explained by the misattribution of low vocal pitch to large body size (Puts et al., 2007), which was thought to be connected with one's ability. In line with these previous findings, our study provides novel evidence that semantic cues interact with vocal pitch to influence the perception of male speakers’ competence. We found that men with a low-pitched voice were judged as being relatively more competent than those with a high-pitched voice only when they spoke antisocial sentences, not when they spoke prosocial or neutral sentences. These findings are of interest for two reasons. First, they suggest that prosocial cues or even neutral cues may modulate the link between low vocal pitch and perceived ability. Second, they suggest that antisocial semantic cues may contribute to the speakers’ perceived physical strength, which further increases their perceived ability. Thus, it is important to consider different semantic cues in the investigation of voice-based perceptions, especially in male speakers.
Overall, the effects of vocal pitch on the perception of warmth and competence ran in opposite directions in both genders: high-pitched speakers were perceived as warmer but less competent, while low-pitched speakers were judged as less warm but more competent. These associations between vocal pitch and the two dimensions of social perception are similar to what has been found among speakers of other languages in neutral semantic contexts (e.g., Dutch speakers; Ko et al., 2009) and in meaningless speech contexts (e.g., speakers pronouncing monophthong vowels; Oleszkiewicz et al., 2017). It is plausible that such associations apply across various linguistic contexts. However, this assumption needs to be tested in future studies by directly comparing voice-based perceptions of speakers of different languages.
Limitations and Future Research Directions
The current research is limited in several respects. First, although vocal pitch is the most salient acoustic characteristic of human voices (Banse & Scherer, 1996), other features such as resonance (vowel formant frequencies), vocal tract length, and prosody (intonation, tempo, and stress) were not considered in our study. A systematic review (Leung et al., 2018) has indicated that these features are also related to voice-based perceptions. Therefore, future studies could explore their contributions to voice-based perceptions within different semantic contexts. Second, the number of speakers that we recruited is limited. For each type of vocal pitch (high or low), we selected only two speakers’ voices. Future research should engage more speakers to test the generalizability of our findings. Also, since we used two voices to represent one type of vocal pitch (high/low), the confounding effects of other vocal characteristics (e.g., resonance) may be inevitable, even though most ratings of these two voices showed no differences in preliminary analyses. Third, we combined warmth, trustworthiness, friendliness, likeability, and sincerity into a warmth-related dimension, and integrated competence, efficacy, confidence, and dominance into a competence-related dimension based on the SCM (Cuddy et al., 2008; Fiske et al., 2007). Although the traits within each dimension were closely related, Oleszkiewicz et al. (2017) suggested that these single traits may be independent of each other. Thus, it would be meaningful for future research to examine the interaction effect of vocal properties and semantic cues on the social perception of each individual personality trait (e.g., friendliness).
Summary and Implications
Our research extended the understanding of the effect of vocal pitch on social perception for both female and male speakers into different semantic contexts. Previous research has examined the interaction effect of vocal pitch and semantic cues (i.e., prosocial words and antisocial words) on the perception of male speakers (O'Connor & Barclay, 2018). Our research included neutral cues and found that prosocial cues did not always exert significant positive effects on social perception. Our findings may have some practical implications for vocal message delivery during non-face-to-face online interactions. With the development of technology, sending voice messages has made communication more convenient and direct, especially in the Chinese context. Chinese people are becoming accustomed to using voice to communicate on WeChat, the most popular social media application in China, not only in daily life but also on business occasions. Here are some tips for such online conversations. First, for both men and women, it is unnecessary to deliberately say prosocial things to gain a stranger's positive evaluation. It would be better to avoid antisocial speech. Neutral speech is enough to leave others with a good impression. Second, adapting voice pitch to different social circumstances may contribute to effective communication. For example, a high pitch will help people show their warmth in daily life, but a low pitch makes them sound more competent in the workplace. To sum up, our research highlights the importance of vocal information, including vocal pitch and voice content, in forming stereotypical perceptions of speakers.
Supplemental Material
Supplemental material, sj-docx-1-pec-10.1177_03010066221135472 for You are how you speak: The roles of vocal pitch and semantic cues in shaping social perceptions by Hannah Xiaohan Wu, Yuanhua Li, Boby Ho-Hong Ching and Tiffany Ting Chen in Perception
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethics Approval
All procedures performed in this study were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed Consent
Informed consent was obtained from each participant included in the study.
Supplemental Material
Supplemental material for this article is available online.
