Abstract
We explored the impact of “stereotype threat” (that is, distress associated with the prospect of confirming a negative stereotype) on communication in evaluative contexts. Participants engaged in a conflict resolution simulation framed as diagnostic of their ability either to be a leader or to maintain close personal relationships. Women were less fluent and used more tentative language under leadership than relational maintenance framing, but men were less fluent and more tentative under relational maintenance than leadership framing. The influence of stereotype frame on the rates of disfluencies and tentative language was partially mediated by state anxiety. Our findings demonstrate that the effects of situationally induced stereotype threat on communication behavior are comparable to its effects on intellectual test performance. Consequences of stereotype threat for impression formation and strategies for reducing its impact on social interaction are discussed.
In the decades since communication theorist Walter Lippmann (1922) first described people’s generalizations about groups as “stereotypes,” scholars in a variety of disciplines have documented their pervasive influence in social thought and interaction. Traditionally this research has investigated how social stereotypes, that is, knowledge structures containing beliefs and expectations about the typical members of social groups (Stangor, 2009), distort one group’s perceptions of another. For example, communication scholars have studied the impact of stereotypes on Whites’ evaluations of Black crime suspects in television news (Domke, McCoy, & Torres, 1999; Entman & Rojecki, 2000; Gilliam & Iyengar, 2000; Mastro & Kopacz, 2006; Mastro, Lapinski, Kopacz, & Behm-Morawitz, 2009; Oliver, 1999; Oliver & Fonash, 2002; Peffley, Shields, & Williams, 1996), the effects of opposite-sex speech style stereotypes on interactions between men and women (Berryman & Wilcox, 1980; Giles, Scholes, & Young, 1983; Kramer, 1977), and how age stereotypes lead younger people to overaccommodate in communication with older people (Hummert, Shaner, Garstka, & Henry, 1998; Ryan, Giles, Bartolucci, & Henwood, 1986). In recent years, the focus of stereotyping research has shifted from the behavioral correlates of endorsing stereotypes to the cognitive consequences of being targeted by them (Aronson & McGlone, 2009, 2011). People may not give credence to the stereotypes associated with their various social identities (female, Asian, athlete, etc.), but they are nonetheless aware that the stereotypes exist and often are as familiar with their contents as others who espouse them. Such awareness can exert a profound influence on one’s sense of self and interactions with others (Devine, 1989). The reported research explores this influence on communication behavior in evaluative contexts.
Stereotype Threat
Steele and Aronson (1995) coined the term “stereotype threat” to refer to the psychological discomfort people experience when confronted by a negative, self-relevant stereotype in a situation where their behavior could be construed as confirming it. A variety of cultural stereotypes attribute dispositional deficits to certain groups (“women can’t do math,” “Black people aren’t intelligent,” “elderly people have poor memories,” etc.), and group members are as aware of the stereotypes as anyone in the culture, whether or not they personally agree with them. In situations where a devaluing stereotype is relevant (e.g., a mathematics classroom), people targeted by the stereotype (e.g., women) experience an extra mental burden stemming from concern that certain behavioral outcomes (e.g., poor test performance) will reinforce the stereotype in the eyes of others. Once initiated, stereotype threat produces a number of disruptive effects, including decreases in intellectual performance (Spencer, Steele, & Quinn, 1999; Steele & Aronson, 1995) and increases in physiological arousal (Ben-Zeev, Fein, & Inzlicht, 2005; O’Brien & Crandall, 2003). Over time, it may prompt defensive adaptations such as disengagement from domains of intellectual activity (e.g., mathematics education) where the stereotype is relevant and consequently retard development in these domains. Thus the phenomenon can result in a self-fulfilling prophecy whereby people come to resemble the very stereotype they feared confirming in the first place.
In the original demonstration of the phenomenon, Steele and Aronson (1995) showed how the presence of stereotype threat can undermine the intellectual performance of even highly talented and prepared minority students. Black and White undergraduates were asked to take a difficult test comprising items from the verbal section of the Graduate Record Examination. Half of the students were told that the purpose of the study was to measure their verbal reasoning ability. This “diagnostic” condition was intended to simulate the circumstances under which people commonly take standardized tests purporting to measure enduring dimensions of intellect. Black students were expected to experience stereotype threat in this condition because the alleged goal of the study was to evaluate their ability in a domain that pervasive cultural stereotypes cast in a negative light. White students are not targeted by these stereotypes and so were not expected to experience the threat. In the “nondiagnostic” condition, the test was portrayed as simply a problem-solving exercise that had nothing to do with intelligence or ability testing. Black students performed significantly worse on the test in the diagnostic than the nondiagnostic condition; however, they did just as well as White students in the nondiagnostic condition. In a follow-up study, Black students who were asked to simply indicate their race on a demographics questionnaire prior to taking a test (a common procedure in standardized tests such as the ACT and SAT) also exhibited significant performance impairments. Those asked to disclose their race answered approximately half as many items as others who did not. In both experiments, the threat manipulations had a negligible impact on White students’ test performance.
More than 300 published studies in the years since Steele and Aronson’s (1995) initial investigation have documented the pernicious effects of stereotype threat in a variety of populations and testing domains (for a topical review, see Aronson & McGlone, 2009). In addition to African Americans, these effects have been demonstrated in the performance of women on mathematical reasoning (Inzlicht & Ben-Zeev, 2000; McGlone & Aronson, 2006; Spencer et al., 1999) and political knowledge tests (McGlone, Aronson, & Kobrynowicz, 2006), Latinos (Gonzalez, Blanton, & Williams, 2002) and Native Americans (Osborne, 2001) on college preparedness tests, low SES students on intelligence tests (Croizet & Claire, 1998), and the elderly on memory tests (Levy, 1996; Hess & Hinson, 2006). Although stereotype threat is most keenly experienced among groups historically targeted by negative stereotypes, it is a predicament that can beset anyone, because any group can be compared to another reputed to be at an advantage on some dimension (Aronson, 2002). Thus it can impair the performance of even those groups who are neither minorities nor commonly stereotyped as having an ability deficit. White male university students, for example, bear no historical stigma associated with intelligence. However, in some circumstances comparisons with allegedly superior groups arise, thereby creating a situational threat. Aronson et al. (1999; Experiment 1) created such a threat when they gave White male undergraduates at Stanford University a challenging math test and told half of them their scores would be compared with those of Asian students. Mention of the intended comparison made students mindful of the stereotype of Asian mathematical superiority and consequently impaired their performance relative to others who were not told that this comparison would be made. 
The impairment occurred despite the fact that the White male students in the study had good reason to be confident in their math skills: most were math or engineering majors with SAT math scores above 700. Similarly, students enrolled in the honors program of a public university were observed to perform less well on a test of spatial reasoning when led to believe they would be compared with students at a prestigious private university nearby (McGlone, Kobrynowicz, & Aronson, 1999). Such studies refute any claim that stereotypes have an impact only on those who have faced broad discrimination or prejudice, or those who harbor persistent self-doubts about their group’s abilities. Under certain circumstances, anyone is prone to performing poorly when confronted by a stereotype that casts their group as being at a disadvantage.
Aronson and Steele (2005) have argued that the phenomenon generalizes across populations, but not across evaluative contexts. Several studies support this claim by demonstrating that the same evaluative activity framed in different ways can promote or hinder a threat response from a particular group. Sometimes, a frame that elicits threat in one group thwarts it in another and vice versa, depending on which group social comparison stereotypically casts in a more negative light. For example, Stone, Lynch, Sjomeling, and Darley (1999) invited Black and White college students to participate in a standardized physical exercise activity characterized as a measure of “natural athletic ability,” “sports intelligence” (i.e., the ability to think strategically in an athletic performance), or “psychological factors associated with general sports performance.” The two groups performed equally well when the activity was framed as a measure of general sports performance, the control condition. However, White participants performed worse than controls when it was framed as measuring athletic ability, whereas Black participants performed worse than controls when it was framed in terms of intelligence. In another striking demonstration, Cadinu, Maass, Lombardo, and Frigerio (2006) asked participants to take a reasoning test portrayed as an index of either “logical intelligence” or “social intelligence.” Men obtained higher scores under logic than social framing but women performed better under social than logic framing. These and other similar findings are analogous to the diagnosticity effects Steele and Aronson (1995) initially demonstrated: Frames that raise the possibility of confirming a negative group stereotype (White people are less athletic than Black people, women are less logical than men, etc.) impair the performance of group members.
The robustness and replicability of stereotype threat have led intelligence theorists to add it to the list of situational factors thought to contribute to longstanding ethnic and gender gaps in standardized test performance (Jencks & Phillips, 1998; Nisbett, 2005; Sternberg, 2002). To date, research on this phenomenon has focused almost exclusively on standardized testing, although Steele (1997) and other theorists claim that it can in principle occur in any context in which behavior is evaluated (Aronson & McGlone, 2009; Bergeron, Block, & Echtenkamp, 2006). Yet there are arguably many more evaluative contexts, ranging from class activities to job interviews to public speeches, in which communication behavior is the subject of scrutiny, not a test score. Moreover, it is well documented that communicators experience anxiety in contexts they perceive as highly evaluative, particularly when they are predisposed to communication apprehension (Booth-Butterfield & Butterfield, 1986; Daly & Buss, 1984; Greene & Sparks, 1983; McCroskey, 1978, 1982). For example, Greene and Sparks (1983) observed that communicators with high and low scores on McCroskey’s (1978) Personal Report of Communication Apprehension scale reported comparable state anxiety in a nonevaluative context, but those with high scores were significantly more anxious when told that the quality of their communication would be assessed. The anxiety communicators experience does not always manifest itself in behavioral impairments (e.g., Clark & Arkowitz, 1975), perhaps because some learn to compensate for anxious behavior to avoid negative evaluations (McCroskey, 1980). However, a meta-analysis by Patterson and Ritts (1997) did find a robust association between communication anxiety and behaviors such as speech disfluency and low verbal involvement. Thus the effects of evaluation apprehension (Cottrell, 1972) on communication parallel those of stereotype threat on test performance in important respects.
Are these effects observed when communicators confront the possibility of confirming a negative stereotype about their group? The reported research addresses this question.
Study Rationale and Hypotheses
Evaluation apprehension and stereotype threat are both symptomatic of situations in which the attentional resources involved in performing a task are redirected to some other concern. In the case of stereotype threat, this concern is the prospect of generating evidence in support of an unflattering self-relevant stereotype. What situations might prompt communicators to have this concern? One likely circumstance is when they are called on to discuss topics that cultural stereotypes portray as not their forte. For example, Palomares (2009) observed that men and women exhibited different patterns of tentative language (hedges, disclaimers, and tag questions) in dyadic electronic communication depending on the topic: Women used more tentative language when discussing stereotypically masculine topics and men were more tentative when discussing stereotypically feminine topics. Importantly, this contrast emerged in mixed-sex but not same-sex dyads, indicating that tentativeness was not driven purely by topic familiarity. Hence women were more tentative in messages about sports intended for men than for women, but men were more tentative in messages about fashion intended for women than for men. Posttest measures indicated that participants had a heightened awareness of their gender identity when communicating about a counter-stereotypical topic with an opposite-sex partner, which in turn partially mediated the impact of message topic on tentativeness. The observed increase in gender salience is consistent with stereotype threat theory, in that participants were particularly self-conscious about a group identity in a context where they could behaviorally confirm a stereotype about that identity (McGlone & Aronson, 2006; see also Palomares, 2008).
Another threatening situation arises when communication itself is implicated in a domain of ability about which people have competency concerns. This situation occurs in social interactions where our interpersonal skills are scrutinized in order to “size us up” as potential clients, colleagues, or companions. In job interviews and instructional activities, it sometimes takes the form of an evaluative exercise in which people are presented with a hypothetical communicative dilemma and then asked to simulate the strategies they would use to address the problem. When they perceive the ability being assessed as one that a familiar cultural stereotype casts in a negative light, concerns about confirming it may undermine performance. To illustrate, consider an evaluative exercise in which people are called on to demonstrate their interpersonal communication skills. Members of any social group may exhibit these skills, but familiar stereotypes allege that some groups are less competent than others in certain domains of social interaction where the skills are employed. The domain of leadership, for example, is stereotypically portrayed as one in which women are at a disadvantage to men (Deaux & LaFrance, 1997; Morrison & Von Glinow, 1990; Sczesny, 2003); in contrast, men are commonly perceived as less adept than women in building and maintaining close personal relationships (Christensen, 1988; Prentice & Carranza, 2002; Vogel, Wester, Heesacker, & Madon, 2003). These stereotypes persist in popular culture despite dramatic changes in social attitudes and gender relations over the past several decades, as well as scientific evidence indicating more commonalities than differences in the way the sexes approach leadership, personal relationships, and other forms of interpersonal interaction (Canary, Emmers-Sommer, & Faulkner, 1997; Embry, Padgett, & Caldwell, 2008; Wilkins & Anderson, 1991). 
Their persistence and familiarity render these gender stereotypes sources of normative expectations that arouse competency concerns (Eagly, Wood, & Diekman, 2000) and discourage nonnormative behavior (Vogel et al., 2003). As a result, framing the same communication exercise as diagnostic of one’s ability either to be a leader or to maintain personal relationships may differentially affect men and women’s performance.
The reported study examined the effects of evaluative framing on men and women’s experience of stereotype threat in an oral communication exercise. In this exercise, participants were presented with an interpersonal conflict scenario and asked to simulate how they would advise the involved parties to resolve the dispute. The simulations were recorded and later analyzed for the presence of speech disfluencies (e.g., vocalized pauses), tentative language (e.g., hedges), and specific recommendations for resolving the dispute. Prior to performing the simulation, participants were provided with one of three cover stories about its ostensible purpose. One group was told that the study investigated differences in leadership ability, an ability their performance in the exercise would demonstrate. A second group was informed that the study examined communication in personal relationships, and that their performance would reflect their ability to maintain close personal relationships. A third group (control) was simply told about the upcoming simulation exercise without it being framed as a measure of ability. All three groups were explicitly told that men and women’s simulations would be compared to investigate gender differences in performance. Stereotype threat theory predicts a pattern of behavioral impairments in situations where people perceive the potential to confirm a negative self-relevant stereotype. The pattern predicted in the present experiment is articulated in the following hypotheses:
The hypothesized outcomes are not presumed to be direct effects of the evaluative framing manipulation, but rather indirect consequences of the anxiety it induces when people are called on to demonstrate abilities impugned by gender stereotypes. State anxiety has been shown to mediate stereotype threat effects in standardized testing, as measured by self-report (Osborne, 2001; Spencer et al., 1999) as well as physiological indicators such as increased blood pressure (Blascovich, Spencer, Quinn, & Steele, 2001). Schmader and Johns (2003) review converging evidence that the locus of stereotype threat’s impact on intellectual performance is working memory capacity. When a situation induces the threat, these theorists argue, the capacity is diminished by heightened anxiety (Eysenck & Calvo, 1992) and a diversion of attention to disruptive thoughts, ostensibly the negative expectations associated with the stereotype that precipitated the anxious reaction in the first place (Steele & Aronson, 1995). Working memory capacity is a reliable predictor of “fluid” intelligence in general (Engle, Tuholski, Laughlin, & Conway, 1999) and verbal fluency in particular (Daneman, 1991). Moreover, state anxiety has been implicated as a partial mediator of speech disfluencies (Bortfeld, Leon, Bloom, Schober, & Brennan, 2001; Patterson & Ritts, 1997) and tentativeness (Lalljee & Cook, 1975; Pennebaker, Mehl, & Niederhoffer, 2003; Powers, 1977), as well as deficits in message production competence (Greene, Rucker, Zauss, & Harris, 1998) and interpersonal problem solving (Gotlib & Asarnow, 1979; Sarason, Sarason, & Pierce, 1990). Thus, there is both empirical precedent and theoretical plausibility for hypothesizing that state anxiety will mediate the effects of evaluative framing on all three dependent variables:
Method
Participants
Two hundred and nine undergraduates (106 women; M = 20.33 years, SD = 1.97) enrolled in communication courses at a large public university participated in the experiment for course extra credit. All were native English speakers.
Design and Procedure
The experiment employed a 2 × 3 factorial design with participant gender (male or female) and evaluative frame (leadership, relationship maintenance, or control) as between-participant factors, and the rates of disfluencies, tentative language, and resolution recommendations as dependent variables. Two self-report questionnaires were treated as covariate and mediator measures, respectively: McCroskey, Beatty, Kearney, and Plax’s (1985) Personal Report of Communication Apprehension (PRCA-24) and Marteau and Bekker’s (1992) State-Trait Anxiety Inventory Short Form (STAI-6). Students were recruited from undergraduate communication courses for a study advertised as an investigation of “oral communication.” Recruiting occurred in the last few minutes of class periods, during which volunteers filled out an informed consent form and then completed the PRCA-24. Participants were then assigned a code number that served as identification when they reported to the laboratory individually for their study session 4 to 7 days later.
Two experimenters, one male and one female, greeted each participant upon arrival at the laboratory. The study session proceeded in three phases. In the first phase, participants were randomly (and covertly) assigned to one of the three evaluative frame conditions and provided with a corresponding handout describing activities that would occur during the session. A paragraph on the first page of the handout served as the primary manipulation of evaluative frame. In the two experimental conditions, the paragraph claimed that the study was designed to investigate differences in a particular ability (leadership or relationship maintenance) and articulated a rationale for why the study procedures were diagnostic of this ability. The upcoming simulation exercise was also briefly described, and the final sentences informed participants that their simulations would be recorded and compared to those of other men and women to investigate gender differences. In the leadership condition, the paragraph read as follows:
The purpose of this research is to investigate differences in people’s leadership ability. Numerous scientific studies have demonstrated that leadership ability critically depends on communication behavior. This research has also found that leadership ability can be predicted from observing people as they simulate the communication behaviors they use when interacting with others. In today’s study, you will simulate the communication strategies you would use to resolve a conflict between two people. Leaders often must resolve conflicts, so your performance in this simulation provides one way to measure your leadership ability. We will make an audio recording of your performance that will be analyzed later on by a team of leadership researchers here at the university. They will compare your recording to those of other men and women participating in this study to examine gender differences in performance.
In the relationship maintenance condition, the paragraph referred to a different ability but was identical to the leadership version in every other respect:
The purpose of this research is to investigate differences in people’s ability to maintain close personal relationships. Numerous scientific studies have demonstrated that the ability to maintain personal relationships critically depends on communication behavior. This research has also found that “relationship maintenance” ability can be predicted from observing people as they simulate the communication behaviors they use when interacting with others. In today’s study, you will simulate the communication strategies you would use to resolve a conflict between two people. Partners in personal relationships often must resolve conflicts, so your performance in this simulation provides one way to measure your relationship maintenance ability. We will make an audio recording of your performance that will be analyzed later on by a team of relationship researchers here at the university. They will compare your recording to those of other men and women participating in this study to examine gender differences in performance.
In the control condition, the paragraph did not portray the study procedures as diagnostic of an ability, nor was a rationale for the procedures offered. It simply introduced the upcoming simulation exercise and informed participants that their performances would be recorded and compared to others to investigate gender differences:
In today’s study, you will simulate the communication strategies you would use to resolve a conflict between two people. We will make an audio recording of your performance that will be analyzed later on by a team of researchers here at the university. They will compare your recording to those of other men and women participating in this study to examine gender differences in performance.
After participants completed the first page of the handout, the experimenters reiterated the content of the framing paragraph and then instructed participants to carefully read the conflict resolution simulation scenario described on the second page. This scenario (see Table 1) described a conflict between Diane and Mark, friends and coworkers in a fictional small business. According to the description, the two are temporarily sharing an office while their individual offices are being renovated. Although they have worked together and been friends for several years, differences in their work habits (e.g., Diane listens to the radio when she works, which distracts Mark; Mark paces back and forth in the office while thinking over an idea, which distracts Diane) and workspace organization preferences (e.g., she does her work electronically to minimize paper clutter; he prefers reading paper documents and stores them all around the office) produce a disharmonious dynamic between the office mates that one day culminates in an argument. Following this description, participants were asked to assume the role of a coworker who knows the disputants equally well and elects to offer them advice about resolving their conflict. The instructions encouraged participants to “offer as many specific recommendations for resolving their differences as you can think of.” After they read through the scenario once and were given the opportunity to ask clarification questions about its contents, the experimenters reminded participants that they would be making an audio recording of their conflict resolution simulations and that the recordings would be compared with those of other participants to examine potential gender differences.
Table 1. Conflict Resolution Simulation Scenario.
In the second phase, participants were escorted to a small adjoining room and seated at a table. On the table were a digital stopwatch and a digital recorder equipped with a plug-in desktop microphone. The experimenters demonstrated how to use the digital recorder and informed participants that they would have 5 minutes to study the conflict scenario and prepare a 90-second simulation of how they would advise the disputants. Participants were not allowed to take notes or write talking points for their simulation during this period. At the end of this preparation period, participants were given 1 minute to complete the STAI-6 (Marteau & Bekker, 1992). Next, the stopwatch was set for 90 seconds and participants were instructed to begin recording their simulation immediately after the experimenters had left the room. During the simulation recording, participants were allowed to consult the scenario description and to observe the digital readout of the stopwatch. When the recording period was over, the experimenters reentered the room and escorted participants back to the main laboratory room.
In the third and final phase, participants were probed for any suspicions they may have had about the study’s purpose and then interviewed about their experiences during the experimental procedure. At the end of this interview, they were asked four binary choice questions about their knowledge of gender stereotypes associated with leadership and personal relationships. Initially participants were probed for awareness of each stereotype (e.g., “Are you aware of a cultural stereotype that portrays one sex as being better at leadership/maintaining personal relationships than the other: Yes or No?”); if they responded affirmatively to either probe, they were then asked to specify the direction of the stereotypical comparison (e.g., “According to this stereotype, which sex is supposed to be better at leadership/maintaining personal relationships: men or women?”). After answering these questions, participants were debriefed and thanked for their participation. The entire experimental procedure lasted approximately 25 minutes.
Self-Report Measures
Personal Report of Communication Apprehension (McCroskey et al., 1985)
The PRCA-24 is the most widely used measure of trait communication apprehension (CA), and was employed here as a covariate to account for variance in the dependent measures attributable to CA predisposition rather than the experimental manipulation. This instrument consists of 24 Likert-type items on a 5-point scale anchored by strongly disagree (1) and strongly agree (5) that assess people’s affective orientation toward communication in the context of dyadic interaction, small groups, large groups, and public speaking. The total score provides a broader and more reliable (α = .91) index of trait CA than any of the contextual subscores, so only the total score was used. Comparisons with published norms indicated that the mean score for the current sample (M = 61.54, SD = 13.84) was in the average trait CA range (51-80; McCroskey et al., 1985).
State-Trait Anxiety Inventory Short Form (Marteau & Bekker, 1992)
State anxiety was assessed using this shortened version of Spielberger, Gorsuch, and Lushene’s (1970) original State-Trait Anxiety Inventory. The short form consists of six statements assessing anxiety-related affect (e.g., “I am worried”) rated on a 4-point Likert-type scale anchored by not at all (1) and very much (4). The STAI-6 scale was reliable (α = .87) and the mean score in the current sample was 15.69 (SD = 4.01).
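The reliability coefficients (α) reported for the two self-report scales are presumably Cronbach's alpha values. As a minimal sketch of how such a coefficient can be computed from item-level responses, the Python function below applies the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the response matrix are illustrative assumptions; they are not the authors' analysis code or study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of scores.

    `responses` is a list of rows, one row per respondent,
    each row holding one numeric score per scale item.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*responses)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four people to a six-item, 4-point scale.
hypothetical_responses = [
    [1, 2, 1, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [2, 2, 2, 3, 2, 2],
    [4, 4, 3, 4, 4, 4],
]
alpha = cronbach_alpha(hypothetical_responses)
```

Any reverse-keyed items would need to be recoded before calling the function; that step is omitted here because the text does not describe the scoring key.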
Simulation Coding
The personnel involved in transcribing the simulations and coding their content were blind to the research design and hypotheses. Each participant’s simulation recording was transcribed by one of two transcribers to a level of detail that captured all audible words and word fragments as well as nonlexical vocalized pauses (e.g., uh). After one member of the pair transcribed a simulation, the other checked it for errors and omissions while listening to the recording. Each transcript was reaudited as necessary until the pair agreed that it was an accurate rendering of the simulation. Transcripts ranged in length from 168 to 316 words.
Disfluencies
Two pairs of research assistants coded disfluencies in the transcripts using a scheme developed by Bortfeld et al. (2001). One pair coded vocalized pauses (ah, er, um, uh, etc.) and filler expressions (like, you know, well, etc.), and a second coded adjacent repeated words (e.g., and if, if they see . . .) and utterance interruptions/restarts (e.g., why doesn’t he go—but only when Diane is there trying to work . . .). Members within each pair worked independently and agreed at a rate of 89.6% and 93.5%, respectively (Cohen’s κ > .88 in both cases). Disagreements were resolved through discussion. The numbers of vocalized pauses, fillers, adjacent repetitions, and restarts were summed and transformed (divided by the total number of transcript words and multiplied by 100) to indicate the disfluency rate per 100 words (as per Bortfeld et al., 2001; see also Mulac, Lundell, & Bradac, 1986).
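The rate transformation described above (summed feature counts divided by transcript length and multiplied by 100) can be sketched in a few lines of Python. The function name, the dictionary keys, and the example counts are hypothetical and chosen only for illustration; they do not come from the study data.

```python
def rate_per_100_words(feature_counts, total_words):
    """Sum the coded feature counts and express them per 100 transcript words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100 * sum(feature_counts.values()) / total_words


# Hypothetical counts for a single 250-word transcript.
example_counts = {
    "vocalized_pauses": 6,
    "fillers": 4,
    "adjacent_repetitions": 2,
    "restarts": 1,
}
example_rate = rate_per_100_words(example_counts, total_words=250)
```

Normalizing by transcript length keeps the measure comparable across speakers who produced different numbers of words in the same 90-second window.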
Tentative Language
Three pairs of research assistants coded tentative language (i.e., words or phrases indicating uncertainty or lack of confidence) in the transcripts. Using a coding scheme modeled after Palomares (2008), the coders identified three language features associated with tentativeness: hedges (maybe, probably, sort of, etc.), disclaimers (I could be wrong, I guess, etc.), and tag questions (don’t you think? wouldn’t you? etc.). Each pair coded one of these features, and members within a pair worked independently with separate copies of identical transcripts. The pairs agreed at a rate of 84.6% or higher (Cohen’s κ = .82 or more) and disagreements were resolved through discussion. The numbers of hedges, disclaimers, and tag questions in each transcript were summed and transformed (divided by the total number of transcript words and multiplied by 100) to indicate the tentative language rate per 100 words (Bradac, Mulac, & Thompson, 1995; Mulac et al., 1986).
Resolution Recommendations
Two research assistants coded the “constructive” recommendations appearing in each transcript for resolving the conflict between the scenario characters. A recommendation was considered constructive if it met three criteria. First, it had to articulate a specific action, cessation of action, or strategy, be it physical (e.g., Mark should uh do his pacing out in the hallway), verbal (e.g., y’all maybe can plan out your work schedules a day or two in advance together, so you, you can both agree on when there will be quiet times and um loud times or whatever) or psychological (e.g., Don’t insult him when he gets on your nerves next time, just take a deep breath and like picture that new office that’s coming). Vague suggestions (e.g., at the end of the day, you folks really have to spend serious time, you know, working on uh how to get along here) were not counted. Second, it had to be a prima facie attempt to promote cooperation rather than aggravate relations between the disputants. Consequently, overtly punitive recommendations (e.g., if he’s always making a mess, well then you should tell the boss on him) were not counted. Third, it could not violate assumptions explicitly mentioned in the scenario description. For example, recommendations that Mark or Diane try to switch offices with another employee were not counted because the scenario explicitly rules out this possibility. The coders agreed at a rate of 95.1% (Cohen’s κ = .92) and resolved disagreements through discussion. The unique (i.e., nonrepeated) constructive recommendations within each transcript were summed for subsequent analysis.
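The agreement statistics reported for the coding pairs (percentage agreement and Cohen’s κ) can be illustrated with a short sketch. It assumes that each coder’s decisions are stored as one categorical judgment per coded unit; the study does not describe its data layout, so that structure, the function names, and the example judgments are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of units on which two coders gave the same judgment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must judge the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's marginal category proportions.
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments (1 = constructive, 0 = not); not data from the study.
coder_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
coder_b = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
print(round(percent_agreement(coder_a, coder_b), 2))  # 0.9
print(round(cohens_kappa(coder_a, coder_b), 2))       # 0.8
```

Because κ discounts the agreement expected from the coders’ marginal rates alone, it is lower than raw percentage agreement, which matches the pattern in the values reported here.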
Results
Eleven participants were excluded from data analysis based on their responses during the postexperiment interview. Three reported having knowledge of the study’s purpose prior to participating (via interaction with a debriefed participant), four expressed suspicion regarding the evaluative framing cover story, and four claimed not to be aware of one or both of the gender stereotypes associated with leadership or relationship maintenance ability. An additional six participants were eliminated because of anomalous behavior (silent pauses longer than 10 seconds, extended speech not directed toward the conflict scenario, prematurely stopping the recorder, etc.) during the simulation exercise. The remaining 192 (32 per cell) appeared to be naïve regarding the study hypotheses and followed the simulation instructions.
Evaluative Framing Effects
The performance effects predicted by Hypothesis 1 and Hypothesis 2 were tested in a two-way multivariate analysis of covariance with participant gender and evaluative frame as independent variables, trait CA (PRCA-24 score) as a covariate, and rates of disfluency, tentative language, and recommendations as dependent variables. The omnibus test revealed a significant main effect of participant gender, F(3, 183) = 10.68, p < .001, partial η² = .15, but not evaluative frame, F(6, 366) = 1.76, p = .11. Examination of the univariate analyses indicated that the main effect of participant gender was driven primarily by a significantly higher disfluency rate among men than women (6.72 vs. 5.52 per 100 words), F(1, 185) = 15.78, p < .001, partial η² = .13, a finding consistent with previous demonstrations of gender differences in disfluency (Bortfeld et al., 2001). The multivariate analysis of covariance main effect was moderated by a significant evaluative frame by participant gender interaction, F(6, 366) = 16.44, p < .001, partial η² = .17. In addition, trait CA was a significant covariate, F(3, 183) = 6.11, p < .001, partial η² = .09. One-tailed planned comparisons based on univariate analyses for each dependent measure were conducted to test the predicted differences between cell means (Keppel, Saufley, & Tokunaga, 1992). The relevant means are presented in Table 2.
Table 2. Mean (SD) Rates of Disfluencies, Tentative Language, and Resolution Recommendations by Participant Gender and Evaluative Frame.
Note. Disfluency and tentative language rates are per 100 words.
Hypothesis 1 predicted that framing the simulation exercise as diagnostic of a stereotypically masculine ability (leadership) would induce stereotype threat in women, as indicated by higher rates of disfluencies and tentative language and a lower rate of recommendations, relative to feminine ability (relationship maintenance) or nonability (control) framing. Consistent with Hypothesis 1a, a planned contrast indicated that the mean disfluency rate (per 100 words) of women in the leadership frame condition (M = 7.29, SD = 2.46) was significantly higher than the rate for women in either the relationship maintenance (M = 4.45, SD = 1.79) or control frame conditions (M = 4.82, SD = 1.80), F(1, 185) = 9.79, p = .002, partial η² = .11. Hypothesis 1b was also supported: The mean rate of tentative language (per 100 words) was significantly higher among women in the leadership frame condition (M = 1.85, SD = 0.51) than those in the relationship maintenance (M = 1.24, SD = 0.43) or control (M = 1.36, SD = 0.47) conditions, F(1, 185) = 8.57, p = .004, partial η² = .10. Finally, the pattern of recommendation rates was not consistent with Hypothesis 1c; women did not produce significantly more recommendations in the relationship maintenance (M = 5.36, SD = 1.80) and control frame conditions (M = 5.21, SD = 1.61) than in the leadership frame condition (M = 4.79, SD = 1.24), F(1, 185) = 3.02, p = .083. Thus, women were on average less fluent and more tentative when the simulation exercise was framed in terms of a stereotypically masculine ability than a feminine ability or no ability; however, the number of resolution recommendations they offered was not reliably affected by evaluative framing.
Hypothesis 2 predicted that men would experience stereotype threat when the exercise was framed in terms of a feminine ability and would consequently exhibit the opposite pattern of disfluency, tentative language, and recommendation rates from that produced by women in the evaluative frame conditions. Consistent with Hypothesis 2a, the mean disfluency rate of men in the relationship maintenance condition (M = 8.11, SD = 2.67) was significantly higher than that of their counterparts in the leadership (M = 5.82, SD = 1.90) or control frame (M = 6.23, SD = 2.09) conditions, F(1, 185) = 8.45, p = .004, partial η² = .10. Hypothesis 2b was also supported, in that men’s tentative language rate was higher in the relationship maintenance condition (M = 1.55, SD = 0.50) than in either the leadership (M = 1.15, SD = 0.40) or control (M = 1.08, SD = 0.43) conditions, F(1, 185) = 6.25, p = .013, partial η² = .07. Finally, Hypothesis 2c was not supported. On average men produced slightly more recommendations in the leadership (M = 5.15, SD = 1.52) and control frame (M = 4.93, SD = 1.45) conditions than in the relationship maintenance condition (M = 4.46, SD = 1.18), but this difference did not attain conventional statistical significance, F(1, 185) = 2.91, p = .089. These findings indicate that overall, men were less fluent and more tentative when the exercise was framed in terms of a stereotypically feminine ability than a masculine ability or no ability, but their rate of resolution recommendations was not reliably affected by evaluative frame.
Mediation Analyses
Hypothesis 3 predicted that state anxiety would mediate the effects of evaluative framing on the dependent measures. Because this hypothesis presumes that state anxiety stems not from participant gender per se but rather from the stereotype threat that each gender experiences under particular evaluative frames, the two independent variables were recoded into a single variable reflecting the stereotypic consistency of each gender-by-frame combination. The recoded “stereotype frame” variable had three levels: (a) stereotype-neutral frame (nonability), (b) stereotype-consistent frame (men evaluated for leadership ability, women for relationship maintenance ability), and (c) stereotype-inconsistent frame (men evaluated for relationship maintenance ability, women for leadership ability). Next, we employed the logic for assessing mediation recommended by MacKinnon, Lockwood, Hoffman, West, and Sheets (2002; see also Baron & Kenny, 1986). Mediation analyses were conducted for only the two criterion variables (disfluency and tentative language rate) that had been demonstrated to have significant associations with the predictor (stereotype frame).
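The recoding of participant gender and evaluative frame into the three-level stereotype frame variable can be sketched as below. The string labels and the function name are hypothetical conveniences; only the mapping itself comes from the description above.

```python
def stereotype_frame(gender, frame):
    """Map a (gender, evaluative frame) cell to the three-level stereotype frame."""
    if gender not in {"man", "woman"}:
        raise ValueError(f"unexpected gender label: {gender!r}")
    if frame not in {"leadership", "relationship", "control"}:
        raise ValueError(f"unexpected frame label: {frame!r}")
    if frame == "control":
        return "neutral"  # nonability framing
    # Cells in which the evaluated ability favors the participant's gender.
    consistent_cells = {("man", "leadership"), ("woman", "relationship")}
    return "consistent" if (gender, frame) in consistent_cells else "inconsistent"


for g in ("woman", "man"):
    for f in ("leadership", "relationship", "control"):
        print(g, f, "->", stereotype_frame(g, f))
```

Collapsing the six design cells this way puts threatened women and threatened men at the same level of the predictor, which is what allows a single mediation model to be fit across genders.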
The results of the mediation analyses are presented in Figure 1. Regression analyses established the key conditions for state anxiety’s mediation of stereotype frame effects on the dependent variables of interest: Stereotype frame was associated with both the disfluency rate, β = .46, t(185) = 2.98, p = .003, and the tentative language rate, β = .38, t(185) = 2.71, p = .007; stereotype frame was associated with state anxiety, β = .35, t(185) = 2.57, p = .011; and state anxiety was associated with both the disfluency rate, β = .29, t(185) = 2.39, p = .018, and tentative language rates, β = .23, t(185) = 2.36, p = .019. When state anxiety was controlled, however, stereotype frame still had a significant effect on disfluency, β = .31, t(185) = 2.46, p = .015, and on tentative language, β = .30, t(185) = 2.50, p = .013. There were significant indirect effects of stereotype frame on disfluency (z-score product = 5.59, p < .05) and tentative language (z-score product = 5.11, p < .05). Thus, the evidence supports the claim for state anxiety as a partial mediator of stereotype frame’s effect on these variables, but a direct effect persists when this mediator is included.
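The z-score product used to test the indirect effects can be illustrated with a brief sketch. It assumes the statistic is formed, following MacKinnon et al. (2002), by multiplying the z statistic for the predictor-to-mediator path by the z statistic for the mediator-to-outcome path. The function name and the coefficient values are hypothetical, and the example does not attempt to reproduce the products reported above.

```python
def z_score_product(a, se_a, b, se_b):
    """Product of z statistics for the two paths of an indirect effect.

    a, se_a: coefficient and standard error, predictor -> mediator.
    b, se_b: coefficient and standard error, mediator -> outcome
             (estimated with the predictor in the model).
    """
    if se_a <= 0 or se_b <= 0:
        raise ValueError("standard errors must be positive")
    return (a / se_a) * (b / se_b)


# Hypothetical estimates chosen for easy arithmetic; not values from the study.
print(z_score_product(a=0.5, se_a=0.25, b=0.75, se_b=0.25))  # 2.0 * 3.0 = 6.0
```

The product of two z statistics is not itself normally distributed, so its significance is judged against critical values for the distribution of the product of two standard normal variables rather than against the usual 1.96 cutoff.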

Figure 1. Mediational model examining the effects of stereotype frame on disfluency and tentative language rates as mediated by self-reported state anxiety.
Discussion
Numerous studies have demonstrated that people’s performance on an intellectual test is impaired when a negative, self-relevant stereotype is made salient in the testing context (Aronson & McGlone, 2009). Steele and Aronson (1995) argued that this performance decrement results from heightened concern that a poor performance may be seen as confirming a negative stereotype about one’s group. Their “stereotype threat” construct stands in stark contrast to sociobiological and knowledge resource theories by explaining gender and ethnic gaps in intellectual performance not in terms of dispositional differences in intelligence between groups, but rather situational factors that can affect the members of any group. The reported study demonstrates that situationally induced stereotype threat can affect people’s communication behavior as dramatically as it does their standardized test performance.
The different circumstances that activated stereotype threat for the men and women in our sample were predicated on cultural stereotypes that portray men as superior leaders (e.g., Powell & Graves, 2005) and women as superior cultivators of personal relationships (e.g., Vogel et al., 2003). However, the consequences of this activation were the same: When participants were led to believe they were being evaluated in terms of a stereotypically sex-linked ability for which their sex was allegedly at a disadvantage, their state anxiety increased and communication performance suffered. Thus when women participated in a communication exercise portrayed as diagnostic of leadership abilities, they were less fluent than women who were told it was a measure of their relationship maintenance skills or women in a nonability frame control condition. The stereotype-threatened women produced on average almost 50% more disfluencies during their simulation performances than those not under threat. They also used significantly more tentative language during their sessions. Men exhibited a similar pattern when they faced the prospect of having their relationship maintenance abilities assessed. As expected, the male participants who were told their performance would be judged on their display of relationship maintenance skills performed worse than males in both the leadership and control scenarios. They verbalized significantly more disfluencies and tentative language.
Many of the concerns stereotype threat research has raised in educational psychology translate into broader problems of impression formation in the domain of interpersonal communication. For example, stereotype threat research challenges the validity of standardized tests as measures of ability (intelligence, scholastic aptitude, college preparedness, etc.) for reasons that transcend perennial criticisms of culturally biased test content (Freedle, 2003; Reynolds & Brown, 1984; Sternberg, 2002). This research shows that no matter what genetic or experiential endowments one brings to the testing context, the context is not group-neutral, because one’s awareness of prevailing social stereotypes (e.g., Blacks are less intelligent than Whites) can influence performance expectations regardless of test content. Analogously, our participants were sufficiently familiar with gender stereotypes about leadership and relationship maintenance ability that merely mentioning the prospect of comparing men and women on one ability or the other induced contrasting performance expectations for the same communication activity. And just as threat-induced underperformance on a test may be misattributed to low ability (Nisbett, 2005), so may disfluent and tentative speech be interpreted as indicating ineptitude (Crawford & Chaffin, 1987; Gibbons, Bush, & Bradac, 1991; Patterson & Ritts, 1997). Consequently, women with effective management communication skills under normal circumstances may appear to be less than leadership material when aware they are being evaluated as such relative to men; similarly, men who are otherwise highly personable may stumble in their social skills when cognizant that their capacity for cultivating personal relationships is being compared with women.
The predicament of the stereotype-threatened communicator is compounded by the fact that people’s communication skills are evaluated far more frequently than they take standardized tests. Sometimes these evaluations are fleeting or insignificant, but at other times they can be consequential. Recruiters assess applicants’ intelligence and sociability in employment interviews, clients appraise the competence of professionals (doctors, lawyers, therapists, etc.) during intake meetings, and prospective partners judge one another’s relational fitness on first dates. Stereotypes about race (Jews are greedy), gender (men are preoccupied with sex), sexual orientation (gay men are effeminate), region (New Yorkers are pushy), class (poor people don’t care about education), and other personal characteristics can influence these interactions by dint of their familiarity, even when the interactants do not endorse the gross and erroneous generalizations they entail. Merely contemplating the prospect of confirming a negative stereotype in the eyes of others—and perhaps oneself—may be sufficient to initiate an unfortunate self-fulfilling prophecy. Our study was designed to explore this communicative predicament.
Although we observed an adverse impact of stereotype activation on the communication of men and women, this impact need not be inevitable. Under certain circumstances, members of a stigmatized group may behaviorally contradict a negative stereotype. Such a pattern of “stereotype reactance” (Kray, Thompson, & Galinsky, 2001) in communication was reported by von Hippel, Wiryakusuma, Bowden, and Shochet (2011; Experiments 1 and 2). These researchers found that when women were confronted with an explicit description of the stereotype of male leadership superiority, they exhibited a more masculine speaking style (higher verbosity, fewer hedges, etc.) than others told only trait information about good leaders without any linkage to gender. The contrast between their findings and ours is likely due to two important methodological differences between the studies. First, we did not explicitly articulate the stereotype to our participants prior to their communication performances. In this respect, our method follows the model of Steele and Aronson (1995), who induced stereotype threat among African Americans merely by evaluating an impugned ability rather than declaring negative performance expectations regarding the ability. Consequently, if reactance occurs in response to externally imposed pressure (Brehm & Brehm, 1981), there was no such pressure for our participants to be reactant toward. Second, Kray et al. (2001) argue that stereotype reactance is most likely to occur when stigmatized individuals have sufficient cognitive resources to alter their behavior and react against the negative stereotype. The stereotype threat effects we observed were partially mediated by anxiety, which depletes cognitive resources such as working memory capacity and higher executive functions (Eysenck & Calvo, 1992; Schmader & Johns, 2003). von Hippel et al. (2011) did not measure anxiety (self-reported or otherwise) among their participants, so we cannot confirm that their participants had more cognitive capacity to draw on than ours by virtue of being less stressed. However, the contrasting methods and findings of the studies suggest that negative stereotype activation may induce reactance when a stereotype is explicitly asserted and targets have the composure to actively subvert it; otherwise, stereotype threat and the adverse consequences we reported may be more likely.
There are a number of limitations to our study. First, there are several questions pertaining to its external validity that merit further examination. We investigated the effects of stereotype threat in a convenience sample of college students using materials that invoked two common gender stereotypes. The generalizability of the observed effects to other populations (children, older adults, ethnic minorities) and other stereotypes (e.g., women are more emotional) remains to be addressed in subsequent research. Second, participants were asked only to generate conflict-resolution suggestions for a specific hypothetical context involving conflict between coworkers. Thus the communication behaviors exhibited by the participants may not generalize to other contexts, real or imagined. Third, the manner in which participants generated their conflict resolution strategies was rather artificial. They were asked to verbalize resolution recommendations for 90 seconds into a microphone connected to a computer while alone in a room. This design was employed to minimize contextual differences across conditions but may have felt contrived to the participants, thus restricting the naturalness of their responses. Fourth, although we predicted that stereotype-threatened participants would offer fewer resolution recommendations, this hypothesis was not supported. One plausible explanation for this null result pertains to the structure of the simulation setting. Across conditions, participants produced on average five recommendations. This similarity may be more a symptom of how long it takes to articulate a recommendation than of how many recommendations participants were capable of generating. Had we allowed them to end their recordings of their own volition (rather than at a predetermined 90-second limit), it is possible that we would have observed greater variability in the number of recommendations offered. Alternatively, it is possible that the interpersonal problem solving skills participants employed to generate their recommendations were not vulnerable to stereotype threat. Previous research in this area has focused chiefly on the threat’s effects on intellectual performance (e.g., verbal or math test performance) and rudimentary impression management behaviors associated with such performance (e.g., people’s willingness to disclose their personal, gender, or ethnic identities after taking a test; Inzlicht & Ben-Zeev, 2000; Steele, Spencer, & Aronson, 2002). Consequently, the reliable effects of stereotype threat on rates of disfluencies and tentativeness we observed may reflect speech behavior’s role in impression management (Bortfeld et al., 2001; Crawford & Chaffin, 1987), whereas the null effect on resolution recommendation rates may indicate that cognitive effort directed toward helping others (or the simulation thereof) is less susceptible to this threat.
Our mediation analysis also warrants qualification. Based on previous demonstrations in the stereotype threat literature (Osborne, 2001; Spencer et al., 1999), we investigated state anxiety’s potential role as a mediator of the hypothesized effects. That we chose to explore only one mediator and that it appeared to only partially mediate the hypothesized effects leave open the possibility that these effects were multiply mediated (Preacher & Hayes, 2008). Other candidate mediators suggested in the intergroup communication research literature merit future study (Palomares, 2008, 2009; Reid, Keerie, & Palomares, 2003). In particular, gender salience is a plausible candidate. Palomares (2009) reported evidence for gender salience’s role as a mediator of tentativeness rates among men and women in mixed sex dyads: When men and women had a heightened awareness of their gender identity while communicating about a counter-stereotypical topic with an opposite-sex partner, their tentativeness rates were higher than those of others for whom gender was not salient in context. Gender salience may well operate as a mediator in stereotype threat phenomena but has not been explored heretofore as a substrate of its indirect effects (Aronson & McGlone, 2009). However, our decision in the present study to explicitly inform participants that gender comparisons would be made likely increased gender salience across conditions, thereby making it impossible to measure the baseline salience elicited by the leadership or relational maintenance evaluative framing. Subsequent research on this issue should incorporate the mention of gender comparisons in the instructions as an independent variable (i.e., with mention being made to some and not to others) to gauge the separate contributions evaluative framing and explicit mention of gender make to subjective gender salience.
There are three clear paths for subsequent studies on stereotype threat’s impact on interpersonal interactions. First, more work is warranted exploring the varieties of communication behaviors affected by stereotype threat, as well as the varieties of contexts in which it may operate. The present study measured just three relatively simple verbal outcomes in a simulated communication setting. Two of these three appeared to be sensitive to stereotype threat effects but there are likely many others. More verbal communication behaviors should be studied (e.g., nervous laughter, delivery speed, repetition, volume), as well as nonverbal behaviors (e.g., facial expressions, hand gestures, eye contact, body movement) and in a greater variety of contexts. In particular, it would be interesting to know whether persons plagued with stereotype threat appear physically uncomfortable while also making verbal blunders. Study of stereotype threat’s effects on both types of communication variables in the same evaluative context is also warranted. In addition, scholars should also consider exploring how our communication behaviors are interpreted via social stereotypes in other mediums besides face-to-face encounters (email, instant messaging, teleconferencing, etc.). In short, this study represents a first attempt to expand the stereotype threat construct into the realm of interpersonal communication, but there is other important work to be done.
Second, scholars should explore the cognitive processes involved when stereotype threat is activated and communication performance suffers. Inzlicht, McKay, and Aronson (2006) used the phrase “ego depletion” to describe their observation that stereotype-threatened individuals exhibit an impaired ability to control their behaviors. The authors argue that because self-regulation is a limited-capacity resource, when members of targeted groups attempt to exhibit self-control in one task they simultaneously deplete their ability to demonstrate self-control on additional tasks. In the present study, participants were asked to formulate conflict resolutions and verbalize these strategies; perhaps this dual demand negatively affected their communication behaviors. It would also be useful to know how cognizant participants are of their altered communication behaviors. For example, some research has shown a positive relationship between women’s tentativeness and their persuasiveness in interactions with men (Carli, 1990; Reid et al., 2003). Hence our finding that women used more tentative language under stereotype threat might be a sign of strategy as well as anxiety. The women may have intentionally softened their message to be more appealing to their male coworker. They may also have considered that their leadership would be more effective (and their advice more likely adopted) if presented in a less dominant form. Another process that merits research attention is how time used for communication planning affects the actual verbal delivery. In this study participants were given five minutes to contemplate the dilemma and construct resolutions prior to verbalizing their thoughts. It would be interesting to know what cognitive activities took place in those five minutes. Honeycutt (2008) posits that individuals often engage in imagined interactions (IIs)—simulations of real-life conversations in one’s mind prior to an actual encounter—in order to improve their verbal and nonverbal performance. For example, Choi (2002) found that contemplating an II prior to a face-to-face encounter reduced verbal disfluencies. Similarly, Allen and Honeycutt (1997) discovered that participants demonstrated less nonverbal nervousness (i.e., playing with a pencil) when they engaged in a focused II prior to their communication interaction. Exploring the role of IIs in communication planning might provide insight into the cognitive processes that take place after stereotype threat is triggered but before communication takes place. IIs might also serve as a mechanism for coping with stereotype-threat-induced anxiety.
Third, intervention research is necessary to explore strategies for mitigating the negative effects of stereotype threat in evaluative encounters. Because stereotype threat appears to be universal, are there also universal weapons for combating it? A few studies in the standardized testing domain suggest this to be the case. For example, Walton and Cohen (2002) use the phrase “stereotype lift” to describe how individual performance can actually improve when one is reminded of a negative stereotype about one’s out-group. Applied to the present study, it would be useful to know whether male participants in the leadership category displayed better verbal control because they were reminded of the “men are good leaders” stereotype or because they were reminded of the “women are not good leaders” stereotype. In a similar vein, Shih, Pittinsky, and Ambady (1999) observed that Asian American females performed better on math exams after being reminded they were Asian—a group stereotypically considered superior in math skills—as opposed to being reminded about their female identity—a group stereotypically considered weak in math skills. As applied to this study, how might the women’s performance have improved because they felt they were being stereotyped as a group gifted with relationship maintenance ability? In other words, can stereotype threat actually improve our performance if a positive stereotype is triggered?
Another set of intervention studies attempts to actively combat the negative effects of stereotype threat by teaching individuals various coping mechanisms. For example, McGlone and Aronson (2007) successfully decreased the negative impact of stereotype threat on female undergraduates taking a difficult math test by encouraging them to contemplate a positive achieved identity (i.e., their status as students attending an elite private college) as opposed to a contextually stigmatized ascribed identity (i.e., female). In a similar vein, Martens, Johns, Greenberg, and Schimel (2004) had stereotype-threatened female participants spend time writing down self-affirmations about a particular characteristic they deemed important before taking a difficult math test. This act of self-affirmation reduced the threat-induced performance impairment on the test, and may also be effective in mitigating threat in the communicative context we investigated.
In sum, stereotype threat is a ubiquitous and potentially serious predicament in social interaction that merits an extensive program of communication research. A first step would be to explore its manifestations in different contexts and populations to determine the generalizability of the phenomenon and variability in its impact. Additionally, understanding the cognitive processes that take place during and after it is triggered might shed light on why our communication behaviors change under threat. Finally, discovering strategies for mitigating stereotype threat’s pernicious effects can enhance intergroup relations by reducing the likelihood that group members will inadvertently reinforce negative stereotypes in the eyes of others.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
