Abstract
To explore the role of teachers’ biases in the underrepresentation of minorities and women in STEM, 128 secondary science teachers were asked to evaluate responses spoken with either falling or rising intonation by African American, Latino, and White ninth-grade boys and girls. Responses spoken by minority students were evaluated less favorably than identically worded responses spoken by White students, and rising intonation responses were evaluated less favorably than falling intonation responses. Female speakers have been shown to use rising intonation nearly twice as often as male speakers, so this bias against rising intonation responses disproportionately affects female students (an indirect effect of gender).
Introduction
As the need for workers skilled in science, technology, engineering, and mathematics (the STEM fields) continues to increase (Parker & Gerber, 2000), so do concerns about the relative scarcity of African Americans, Latinos, and women pursuing careers in these fields (Ong, Wright, Espinosa, & Orfield, 2011; Simard, 2009). Though the numbers have been improving, these groups continue to be underrepresented, particularly in higher positions (De Welde, Laursen, & Thiry, 2007; Hill, Corbett, & St. Rose, 2010). Crucially, such inequities begin long before individuals enter the workforce. Clear differences in STEM knowledge and aptitude based on race/ethnicity (Peng & Hill, 1995; Stormont, Stebbins, & Holliday, 2001) and gender (M. G. Jones, Howe, & Rua, 2000; Mattern & Schau, 2002) are evident by middle school, and there is some evidence of gender differences as early as the first grade (Levine, Vasilyeva, Lourenco, Newcombe, & Huttenlocher, 2005; Penner & Paret, 2008; Rathbun, West, & Germino-Hausken, 2004). By high school, many African Americans, Latinos, and young women have chosen (or been encouraged) not to take higher-level STEM courses (Scantlebury, 1994; Weinburgh, 1995).
Like many issues in education, racial/ethnic and gender inequities in STEM are evident in schools throughout the United States but are particularly acute in urban school districts, conceptualized as those in dense, populous cities characterized by sociocultural diversity and high rates of poverty and racial segregation (Milner & Lomotey, 2014; Noguera, 2014). Given this, improving STEM education for urban students affords an opportunity not only to promote equity and social justice, but to tap into a large pool of potential talent (Clewell, Cosentino de Cohen, Tsui, & Deterding, 2006; Daily & Eugene, 2013). This article, thus, joins a large, ongoing conversation in the field of urban education focused on identifying and removing obstacles to STEM success for underrepresented groups (Grossman & Porche, 2014; Patton, 2016; Tate, Jones, Thorne-Wallington, & Hogrebe, 2012). Studies contributing to this conversation have identified a number of factors that play a role in the underrepresentation of racial/ethnic minorities and women in the STEM fields, including stereotype threat and lack of role models (Correll, 2001; Keller, 2007; Oscos-Sanchez, Oscos-Flores, & Burge, 2008; Simard, 2009; Steele & Ambady, 2006). Other studies have pointed to the importance of home-related factors, afterschool programs, and even summer programs (M. Koch, Lundh, & Harris, 2019; Mac Iver & Mac Iver, 2019). Still others have focused on school-internal factors, particularly the absence of math and science curricula that are appropriately tailored to urban students’ needs and responsive to their cultural backgrounds (Eglash, Gilbert, Taylor, & Geier, 2013; C. C. Johnson & Fargo, 2010; Laughter & Adams, 2012; Martin & Larnell, 2014).
The present study examines the hypothesis that the underrepresentation of African Americans, Latinos, and women in the STEM fields is due in part to another school-internal factor that remains largely unexplored in the literature, namely, negative biases in STEM teachers’ perceptions and evaluations of such students, an example of what Patton (2016) called “the racism embedded within STEM learning environments” (p. 328). The specific question addressed in this article is how secondary science teachers’ evaluation of White students’ and male students’ responses to open-ended science questions compares with their evaluation of equivalent responses given by racial/ethnic minority students and female students.
Literature Review
Teachers’ Biases and Their Effects
Teachers’ biases and the effects of their biases have been the subject of decades of research (for reviews, see Archambault, Janosz, & Chouinard, 2012; Babad, 2009; Jussim & Harber, 2005; Jussim, Robustelli, & Cain, 2009; McKown, Gregory, & Weinstein, 2010; McKown & Weinstein, 2008; Tenenbaum & Ruck, 2007). Teachers’ perceptions of different students are known to influence which concepts teachers present and the pace of instruction, as well as teacher-student interactions in general (Rubie-Davies, 2007; Weinstein, 2002). Negative perceptions have also been found to reduce students’ self-efficacy and motivation (Kuklinski & Weinstein, 2001; Madon et al., 2001; Weinstein, 2002), which significantly affect student achievement (Goddard, Tschannen-Moran, & Hoy, 2001; Linnenbrink-Garcia & Fredricks, 2008). Teachers’ perceptions have even been shown to interact with stereotype threat, such that negative perceptions have a particularly adverse impact on African American and Latino students (McKown & Weinstein, 2002). Nevertheless, the question of how STEM teachers perceive students of different backgrounds, and the potential role of these perceptions in the underrepresentation of African Americans, Latinos, and women in the STEM fields, remains largely unaddressed in the literature.
Measuring Perceptions: The Verbal Guise Paradigm
There are a number of established methods for studying unconscious perceptions and other biases, including the implicit association test (Greenwald, McGhee, & Schwartz, 1998) and the go/no-go association task (Nosek & Banaji, 2001). Both are relatively artificial, though, in the sense that they require participants to perform tasks that they are unlikely to encounter in everyday life. Moreover, neither is particularly well suited to examining interactions among multiple factors. Thus, in the interest of task authenticity and analytical flexibility, the present study uses an adaptation of the verbal guise paradigm (Cooper & Fishman, 1974; Lambert, Hodgson, Gardner, & Fillenbaum, 1960). The verbal guise paradigm measures perceptions of different sociocultural groups by presenting participants with recordings of speakers whose languages, dialects, or other voice features are—for social and historical reasons—distinctively associated with the sociocultural groups of interest, and then eliciting those participants’ assessments (e.g., of the intelligence or employability) of the speakers. The perceptions measured are, of course, not caused by the voices. Rather, listeners “hear” the speakers’ sociocultural backgrounds in their voices, and this serves to evoke existing attitudes toward the groups associated with them.
Studies using the verbal guise paradigm have shown that both African American and White listeners tend to perceive African American speakers as less intelligent, less confident, and less ambitious than White speakers (Hensley, 1972; F. L. Johnson & Buttny, 1982; L. M. Koch, Gross, & Kolts, 2001). Research has likewise found that African American, Latino, and White listeners tend to perceive Latino speakers less favorably than White speakers on similar traits (Arthur, Farrar, & Bradford, 1974; Ryan & Carranza, 1975). Such studies have also shown that these negative perceptions can have serious real-life consequences. Research has found that well-qualified African American job candidates are often judged (based solely on their voices) as suitable for only relatively low-status positions (Henderson, 2001; Shuy, 1973), and that prospective tenants whose voices mark them as African American or Latino are often falsely told that apartments they phone to inquire about are no longer available (Massey & Lundy, 2001; Purnell, Idsardi, & Baugh, 1999).
Verbal guise studies of teachers’ perceptions
Of particular interest here is research that has used listeners’ assessments of speakers to study teachers’ perceptions of African American, Latino, and White students. Such studies have shown that teachers perceive African American (Cecil, 1988; Cross, DeVaney, & Jones, 2001; Hewett, 1971; Politzer & Hoover, 1976) and Latino students (Ramírez, Arce-Torres, & Politzer, 1976) as less intelligent and less likely to do well in school than White students. As Cross et al. (2001) noted, however, “having a belief or attitude is not the same as acting upon it” (p. 224). Consequently, such studies (including that of Cross et al.) are limited insofar as they do not “treat the actions that might arise from the attitudes revealed by the respondents” (Cross et al., 2001, p. 224). Nevertheless, a few studies, using an adaptation of the verbal guise paradigm, have shown not only that teachers often have negative perceptions of certain students but also that these negative perceptions can influence teacher behavior, leading teachers to evaluate the same or comparable work less favorably when presented orally by African American or Latino students than when presented orally by White students (Crowl & MacGinitie, 1974; Granger, Mathews, Quay, & Verner, 1977; Shepherd, 2011; Woodworth & Salzer, 1971).
The design of this last set of studies is notable insofar as teachers are asked to perform the familiar task of evaluating students’ work, rather than the potentially suspicion-raising task of judging students’ intelligence or academic potential based solely on brief, diverse-sounding voice recordings. Woodworth and Salzer (1971) asked teachers to evaluate social studies reports read by African American and White sixth-grade boys. Crowl and MacGinitie (1974) asked teachers to evaluate responses to social studies questions spoken by African American and White ninth-grade boys. Granger et al. (1977) asked teachers to evaluate oral picture descriptions by African American and White third graders. Shepherd (2011) asked teachers to evaluate responses to social studies questions spoken by African American, Latino, and White second- and third-grade boys and girls. Thus, this methodology not only tests teachers’ covert, unconscious perceptions of students of different backgrounds but also does so in terms of actions arising from these perceptions, namely, how teachers evaluate comparable work by such students.
Perceptions of male students versus female students
Verbal guise research has had considerably less to say about teachers’ perceptions of male students versus female students, despite the fact that sociocultural constructs such as race/ethnicity and gender often interact in complex and subtle ways, as highlighted by work on intersectionality (Collins, 2009; Dill & Zambrana, 2009). Woodworth and Salzer (1971) and Crowl and MacGinitie (1974) used only recordings of male speakers, and Granger et al. (1977) did not control for speaker gender “because preliminary work had indicated that listeners could not identify the sex of the third-grade children beyond the chance level” (p. 794). Shepherd (2011), who did include student gender as a factor, found that teachers evaluate identically worded responses significantly less favorably when spoken by African American or Latino students or by White boys than when spoken by White girls. This bias against boys and in favor of girls appears to be limited to early elementary school, however, during which time girls tend to be better behaved than boys (Else-Quest, Hyde, Goldsmith, & Van Hulle, 2006). Indeed, other gender-bias research has found that teachers tend to favor male students over female students (S. M. Jones & Dindia, 2004; Sadker, Sadker, & Zittleman, 2009).
Overview of the Present Study
Instrument
In the interest of task authenticity and analytical flexibility, the present study uses the same adaptation of the verbal guise paradigm used in Woodworth and Salzer (1971), Crowl and MacGinitie (1974), Granger et al. (1977), and Shepherd (2011). In research using this methodology, teacher-participants are asked simply to evaluate students’ spoken work (responses to general science questions, in the case of the present study). Unbeknownst to the teachers, however, what they hear are not the original, spontaneous productions of the students heard speaking. Rather, each one is part of a matched set of recordings of the same pre-determined response, all spoken verbatim by students of different racial/ethnic (and, in some studies, gender) backgrounds. Thus, although each teacher evaluates just one version of each response (to preserve the illusion that the stimuli are spontaneous, original productions), different teachers hear the responses spoken by students of different backgrounds. In the present study, this makes it possible to test the effects of race/ethnicity, gender, and their interactions by comparing the average evaluation of a particular response when it is spoken, for example, by a White student versus an African American student or a male student versus a female student.
The fact that all the response recordings match their pre-determined scripts verbatim, regardless of the speakers’ racial/ethnic backgrounds, also means that all follow Standard English grammar. Responses identifiable (based on characteristic pronunciation patterns) as having been spoken by African Americans are, thus, representative of a variety known as Standard African American English (SAAE), which combines African American English phonology with Standard English grammar (Spears, 2001). The available research suggests that SAAE is perceived significantly more favorably than its grammatically non-standard counterpart—African American Vernacular English (AAVE)—but significantly less favorably than Standard White English (Massey & Lundy, 2001). As for the specific phonetic characteristics of “White-sounding,” “Black-sounding,” and “Latino-sounding” voices, linguists have identified a number of phonetic variables associated with these (see, for example, Purnell et al., 1999, and references therein). For the purposes of the present study, however, a “White-sounding” or “minority-sounding” voice is the voice of a White speaker or of an African American or Latino speaker, respectively, that test listeners correctly and consistently perceive as such, regardless of its particular phonetic properties. Please refer to the “Method” section for a more detailed description of how the stimuli were developed and presented.
Testing for an indirect effect of gender
Given the paucity of verbal guise research examining gender bias, this study also explores the possibility of an indirect effect of gender on evaluation by including response intonation (falling vs. rising) as a third factor. Although the standard in English is for declarative utterances (such as responses to questions) to be spoken with falling intonation (Wells, 2006), studies in Australia (Guy, Horvath, Vonwiller, Daisley, & Rogers, 1986), New Zealand (Warren, 2005), and the United States (Ritchart & Arvaniti, 2014) have found that younger speakers, in particular, regardless of race/ethnicity, produce a significant proportion of their declarative utterances with the rising intonation typical of yes/no questions. Crucially, these studies have also shown that female speakers produce nearly twice as many of these rising intonation declaratives as male speakers. Moreover, a study comparing male and female contestants on a television quiz show found not only that female contestants respond using rising intonation nearly twice as often as male contestants, but that, whereas successful male contestants use rising intonation less than unsuccessful ones, successful female contestants use rising intonation more (Linneman, 2013). Linneman (2013) attributed this finding to “successful women . . . engaging in a compensatory strategy in order to perform their gender ‘correctly’” (p. 96), an explanation that is also consistent with research on female students’ response to STEM-related stereotype threat (Pronin, Steele, & Ross, 2003; Saucerman & Vasquez, 2014). 
Given such gender differences in intonation, if secondary science teachers evaluate responses spoken with rising intonation less favorably than responses spoken with falling intonation, this will have a much larger negative impact on female students (and on high-achieving female students, in particular) than on male students and, ultimately, may be a factor contributing to the underrepresentation of women in the STEM fields.
Predictions
Given the relative scarcity of African Americans and Latinos in the STEM fields, this study expects to replicate previous findings that teachers evaluate otherwise identical responses (in this case, responses to science questions) significantly less favorably when spoken by African American and Latino students than when spoken by White students. Moreover, given the similar scarcity of women in the STEM fields, direct and/or indirect effects of gender are expected to emerge, as well. Thus, it is predicted that otherwise identical responses will be evaluated significantly less favorably when spoken by female students (a direct effect of gender) and/or when spoken with rising intonation (an indirect effect of gender).
As for possible interaction effects, research suggests that it is not uncommon for members of both minority and non-minority groups to express similar negative beliefs about members of minority groups (Cross et al., 2001; Lambert et al., 1960; Ryan & Carranza, 1975; Shepherd, 2011). Such findings are typically attributed to minority-group members’ having internalized the ideology of the dominant society (e.g., internalized racism; Watts-Jones, 2002; Williams & Williams-Morris, 2000). In two of the above-cited studies, minority listeners’ perceptions of minority-group speakers were actually more negative than those of non-minority listeners (Lambert et al., 1960; Shepherd, 2011), but in the other two studies, minority listeners’ perceptions were less negative than those of non-minority listeners (Cross et al., 2001; Ryan & Carranza, 1975). In light of these mixed results, it is not clear whether to expect a significant interaction between the race/ethnicity of the student-respondent and that of the teacher-evaluator in the present study.
Finally, with respect to the possible interaction between the gender of the student-respondent and that of the teacher-evaluator in the present study, the available research strongly suggests a non-significant result. The relevant verbal guise studies (Crowl & MacGinitie, 1974; Shepherd, 2011) found no such interaction, nor have other studies of teacher bias (Krkovic, Greiff, Kupiainen, Vainikainen, & Hautamäki, 2014; Neugebauer, Helbig, & Landmann, 2011). Indeed, only one study seems to support a different conclusion, namely, Dee (2006), who found that male teachers perceive male students more favorably than female students and that female teachers perceive female students more favorably than male students. Thus, it is predicted that whatever effects of gender (direct and/or indirect) the present study reveals will be equally evident in evaluations by male and female teachers.
Method
Participants
A total of 128 secondary science teachers were recruited via email from 98 randomly selected public, private, and charter schools throughout the U.S. state of California. Of the 128 teacher-participants, 85.16% (n = 109) were White, 7.03% (n = 9) were Asian American/Pacific Islander (AAPI), 5.47% (n = 7) were Latino, and 2.34% (n = 3) were multiracial; 65.63% (n = 84) were female and 34.38% (n = 44) were male. The median age range was 41 to 45. All participants had taught science for at least 1 year, with an average of 12.53 years (SD = 9.64 years). The majority (60.16%; n = 77) taught in high schools (Grades 9-12), 26.56% (n = 34) taught in middle schools (Grades 6-8), and 13.28% (n = 17) taught in combined middle and high schools. Most (67.97%; n = 87) taught in public schools, 17.19% (n = 22) in private schools, and 14.84% (n = 19) in charter schools. African American and Latino students made up from 4.00% to 98.20% of each school’s student population, with a mean of 44.20% (SD = 26.94%).
Materials
The stimuli were recordings of African American, Latino, and White ninth-grade boys and girls (ages = 14-15) reciting 10 pre-determined responses (eight target items and two fillers) to each of three general science questions:
What is the goal of a scientific method?
Why are scientific models useful?
Why is it important that scientific investigations be repeated?
The first and second questions are from a high school physical science textbook (Wysession, Frank, & Yancopoulos, 2009, p. 7), and the third is from a high school biology textbook (Biggs et al., 2005, p. 23). Crucially, though, all three are representative of the type of general science questions found in the introductory chapter of every secondary-school science textbook considered for this study. Consequently, evaluating responses to these questions is believed to be well within the expertise of any secondary science teacher, regardless of area of specialization (e.g., biology, physics) and grade level usually taught.
To develop a range of authentic student responses, answers to these questions were solicited from nine secondary-school students (five girls, four boys; Grades 7-12), none of whom were recorded speaking any of the actual stimuli. To help ensure that the responses would be long enough for listeners to recognize the speakers’ race/ethnicity and gender, and yet short enough that students could recite them verbatim in a natural and spontaneous-sounding manner, the original answers were edited (and in some cases, split up) such that every response used in the study has between five and 11 words. For the complete set of responses, see the appendix.
A total of 60 African American, Latino, and White ninth-grade boys and girls (ages = 14-15) were recorded reciting these responses at three public high schools in a major California city. Each recording session lasted approximately 10 to 15 min and consisted of briefly explaining the task (to recite pre-determined responses to science questions), telling the student the corresponding question, then saying each response and cuing the student to repeat it back. All 30 responses were first cued and recorded with falling/declarative intonation, after which the student was cued to repeat any that had been misspoken. Then, all 30 responses were cued and recorded with rising/questioning intonation, again followed by any necessary repetitions. All the recordings were made using a Zoom H4n digital recorder and a Shure SM10a headset-mounted unidirectional microphone.
Of the nearly 4,000 response recordings that were made, only those that matched the pre-determined responses verbatim were considered for the final stimulus set. Acceptable recordings were edited as needed using the Audacity audio editor (Audacity Team, 2013) to remove hesitations and to ensure that all the recordings were well matched in speech rate and amplitude. This was done so that all the speakers would sound as similar as possible in terms of confidence and fluency.
To verify that the race/ethnicity, gender, and intonation of the recordings were reliably recognizable, all candidate recordings were presented one by one in random order to a total of 12 African American, Latino, and White men and women, none of whom participated in the study itself. These listeners were asked to identify (in three separate tasks) the race/ethnicity and gender of the speaker in each recording, as well as the intonation with which the recorded response was spoken. Interestingly, the results of this verification process revealed, with an alpha level of .05, that African American, Latino, and White listeners misperceive African American speakers as Latino and Latino speakers as African American significantly more frequently than they misperceive African American or Latino speakers as White, χ²(1, N = 152) = 10.64, p = .001. This suggests that listeners, regardless of their own racial/ethnic background, can identify the race/ethnicity of African American, Latino, and White speakers most reliably in terms of a White–minority binary. In light of this perceptual reality, and given that African Americans and Latinos are similarly underrepresented in the STEM fields, the present study follows Shepherd (2011) in treating the variable of race/ethnicity as just such a binary.
Treating race/ethnicity in this way, the accuracy with which listeners identified the race/ethnicity of the speaker in each of the recordings selected for use in the study ranged from 75% to 100%, with a mean of 91.58% (SD = 9.27%; consistent with Crowl & MacGinitie, 1974, whose stimuli included only African American and White speakers). The accuracy with which listeners identified the gender (male or female) of the speaker in each of the selected recordings ranged from 91.67% to 100%, with a mean of 99.83% (SD = 1.71%). Finally, the accuracy with which listeners identified the intonation (falling or rising) used in each of the selected recordings ranged from 83.33% to 100%, with a mean of 96.63% (SD = 5.73%).
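As a rough check on the chi-square result reported for the listener verification task, the p value for a chi-square statistic with one degree of freedom can be recovered from the complementary error function. The sketch below assumes only the reported values (a statistic of 10.64 with df = 1); the function name is illustrative, and because the underlying confusion counts are not given in the text, the statistic itself is not recomputed.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    With df = 1, a chi-square variable is the square of a standard normal
    variable, so P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Reported statistic from the verification task (df = 1, N = 152).
p_value = chi2_sf_df1(10.64)
print(f"p = {p_value:.3f}")  # rounds to the reported p = .001
```

The result falls well below the stated alpha level of .05, in line with the reported significance.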
In the end, 23 of the 60 ninth-grade students contributed at least one recording to the final stimulus set. The 192 target items (24 Target Responses × White/Minority × Male/Female × Falling/Rising Intonation) came from 20 speakers (five African American, two Latino, and 13 White; 10 male and 10 female), each of whom contributed from one to 21 recordings (M = 9.60, SD = 6.82). Three of the White speakers (one male, two female) each contributed one of the six filler items, as well, and the last three speakers (two White males and one White female) contributed only filler items (one each).
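The composition of the target stimulus set can be enumerated directly from the design. The sketch below only illustrates the counts reported above; the label strings and variable names are placeholders, not the study's own coding.

```python
from itertools import product

# Factor levels as described in the text (labels are illustrative).
race_ethnicity = ["White", "minority"]
gender = ["male", "female"]
intonation = ["falling", "rising"]

# A "guise" is one combination of the three binary factors.
guises = list(product(race_ethnicity, gender, intonation))

questions = 3
targets_per_question = 8
target_responses = questions * targets_per_question  # 24 target responses

# One recording of every target response in every guise.
target_items = [(resp, guise) for resp in range(target_responses) for guise in guises]

speakers = 20
print(len(guises))                   # 8 guises
print(len(target_items))             # 192 target recordings
print(len(target_items) / speakers)  # 9.6 recordings per speaker on average
```

The last figure agrees with the reported mean of 9.60 recordings per speaker.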
Procedure
To preserve the illusion that the responses were the spontaneous, original work of the students heard saying them, it was essential that no teacher-participant hear more than one recording of the same response nor hear the same student give more than one response to the same question. To this end, the stimulus recordings were arranged into lists containing exactly one recording of each of the 30 responses, avoiding recordings of multiple responses to a given question by the same speaker. In every list, the 10 responses to each question were kept together. Moreover, in an effort to normalize the teacher-participants’ use of the rating scale, each set of responses to a given question began with that question’s two filler-item recordings, always in the same order. These were followed by recordings of the question’s eight target items.
The sets of eight target-item recordings for each question were generated using a Latin-square design with restricted randomization (Bailey, 1983; Youden, 1972), ensuring that each set contained exactly one recording of each response and exactly one recording representing each “guise” (combination of race/ethnicity, gender, and intonation) and avoiding the inclusion of multiple recordings by the same speaker. Two such Latin squares were created for each question’s set of responses, yielding a total of 16 unique response guise collocations for every question, each containing eight of that question’s 64 target-item recordings. Two different restricted randomizations of the order of each of these 16 collocations were then created, avoiding more than two consecutive recordings featuring speakers of the same race/ethnicity (White or minority), of the same gender, or using the same intonation (falling or rising), for a total of 32 random presentation orders. Next, a backward version of each random order was generated, resulting in a total of 64 different sub-lists for each question. Finally, these 192 single-question sub-lists were randomly grouped into 64 three-question lists, each containing one sub-list from each question and with the order of the three questions randomized between lists.
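One way to picture the construction described above is with cyclic Latin squares. The sketch below is a simplified, hypothetical reconstruction for a single question, not the procedure actually used: it builds two cyclic squares (the steps 1 and 3 are arbitrary choices), enforces the rule against more than two consecutive recordings sharing a factor level by rejection sampling, and ignores the speaker-level restriction, because speaker assignments are not given here. All function names are illustrative.

```python
import random
from itertools import product

GUISES = list(product(["White", "minority"], ["male", "female"], ["falling", "rising"]))
N = len(GUISES)  # 8 guises, matched by 8 target responses per question


def cyclic_square(step):
    """Return N sets of (response, guise) pairs; step must be coprime with N."""
    return [[(resp, GUISES[(row + step * resp) % N]) for resp in range(N)]
            for row in range(N)]


def has_long_run(order, max_run=2):
    """True if any factor level occurs more than max_run times in a row."""
    for factor in range(3):
        run = 1
        for prev, cur in zip(order, order[1:]):
            run = run + 1 if prev[1][factor] == cur[1][factor] else 1
            if run > max_run:
                return True
    return False


def restricted_shuffle(items, rng):
    """Rejection sampling: reshuffle until no run is longer than two."""
    while True:
        order = items[:]
        rng.shuffle(order)
        if not has_long_run(order):
            return order


def build_sublists(seed=0):
    rng = random.Random(seed)
    sets = cyclic_square(1) + cyclic_square(3)  # 2 squares x 8 rows = 16 sets
    sublists = []
    for recording_set in sets:
        first = restricted_shuffle(recording_set, rng)
        second = restricted_shuffle(recording_set, rng)
        while second == first or second == first[::-1]:
            second = restricted_shuffle(recording_set, rng)
        for order in (first, second):
            sublists.append(order)
            sublists.append(order[::-1])        # backward version
    return sublists


sublists = build_sublists()
print(len(sublists))      # 64 sub-lists for one question
print(len(sublists) * 3)  # 192 sub-lists across the three questions
```

Each set contains every response and every guise exactly once, and each square uses all 64 response-by-guise recordings for the question, which reproduces the counts given in the text (16 sets, 32 orders, 64 sub-lists per question).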
All teacher-participants completed the experiment online via Qualtrics’s online survey software (Qualtrics, 2014), which was set up to assign each participant, at random, to one of the 64 stimulus lists, such that each list was assigned to two of the 128 participants. Participants were first shown instructions indicating that they would be hearing ninth-grade students (14- and 15-year-olds) saying responses to general science questions and that they would be asked to evaluate how well each response answers the corresponding question. Participants were then shown whichever of the three general science questions was first in their particular list and presented with the corresponding set of recordings of that question’s 10 responses (one at a time, on separate screens, beginning with the question’s two filler items). Following Shepherd (2011), each screen displayed (from top to bottom) the science question, the embedded sound player, the text “How well does the response answer the question?” and an eight-point, partially anchored Likert-type scale (see Davies, 2008), consisting of radio buttons numbered “1” through “8” from left to right, with the leftmost labeled (above the “1”) “Not so well” and the rightmost labeled (above the “8”) “Very well.” Such a scale is within Cox’s (1980) optimal range of five to nine points and lacks a middle point, which respondents tend to overuse (Cox, 1980). Participants were required to select one of the eight radio buttons to continue to the next screen and could not go back after submitting a rating. Once participants had rated all 10 responses to a question, they were shown whichever question was next in their particular list and presented with the corresponding set of recordings of that question’s 10 responses.
After evaluating all 30 responses in this way, participants were asked to complete a demographic questionnaire requesting their gender, age (in 5-year increments), race/ethnicity, native language(s), and age of arrival in the United States (0 if born in the United States). They were also asked how many years they had been teaching, what grade(s) they usually taught, and what area(s) of science they usually taught. Data on the percentage of African American and Latino students and the average socioeconomic status of the students were obtained later from the schools themselves. Participants completed the study in a single session lasting approximately 10 to 15 min.
Results
A three-way ANOVA (Race/Ethnicity × Gender × Intonation) conducted in SPSS revealed that with an alpha level of .05, there were significant main effects of race/ethnicity (White [M = 5.02, SD = 1.76] vs. minority [M = 4.88, SD = 1.75]), F(1, 3,064) = 4.53, p = .03, and intonation (falling [M = 5.04, SD = 1.77] vs. rising [M = 4.86, SD = 1.74]), F(1, 3,064) = 8.12, p = .004. The main effect of gender (male [M = 4.98, SD = 1.76] vs. female [M = 4.92, SD = 1.75]), though in the expected direction, failed to reach significance, and there were no significant interactions among the factors of race/ethnicity, gender, and intonation. Additional linear regression analyses found no significant interactions between the factors of race/ethnicity, gender, and intonation and any of the teacher variables (race/ethnicity, gender, age, number of years teaching, and grade(s) usually taught) nor any of the school variables (public/private/charter, percentage of African American and Latino students, and average socioeconomic status of the students). Unfortunately, given the limited number of Latino teachers (5.47%; n = 7) and the complete lack of African American teachers in the study sample (consistent with these groups’ underrepresentation in STEM more generally), these non-significant interactions cannot be interpreted further.
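The reported error degrees of freedom and the size of the two significant effects can be approximated from the summary statistics alone. The sketch below is a back-of-the-envelope calculation, not an analysis reported by the study: it assumes that every one of the 128 teachers rated all 24 target items, that the ANOVA was a full 2 × 2 × 2 factorial, and that the two groups in each comparison are of equal size. The effect-size values it prints are derived here for illustration; the function names are hypothetical.

```python
teachers = 128
target_items_per_teacher = 24
observations = teachers * target_items_per_teacher  # 3072 ratings

cells = 2 * 2 * 2                                    # full factorial design
df_error = observations - cells                      # 3064, as reported


def partial_eta_squared(f_value, df_effect, df_err):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_err)


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, pooling the two SDs (equal group sizes assumed)."""
    pooled_sd = ((sd_a ** 2 + sd_b ** 2) / 2) ** 0.5
    return (mean_a - mean_b) / pooled_sd


eta_race = partial_eta_squared(4.53, 1, df_error)
eta_intonation = partial_eta_squared(8.12, 1, df_error)
d_race = cohens_d(5.02, 1.76, 4.88, 1.75)
d_intonation = cohens_d(5.04, 1.77, 4.86, 1.74)

print(df_error)
print(f"race/ethnicity: partial eta^2 = {eta_race:.4f}, d = {d_race:.2f}")
print(f"intonation:     partial eta^2 = {eta_intonation:.4f}, d = {d_intonation:.2f}")
```

Under these assumptions, both standardized differences come out at roughly a tenth of a standard deviation or less, which is small by conventional benchmarks.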
Discussion
In light of the relative scarcity of African Americans, Latinos, and women pursuing careers in the STEM fields (Ong et al., 2011; Simard, 2009), the present study explored the hypothesis that the underrepresentation of these groups is due, in part, to negative biases in secondary science teachers’ perceptions and evaluations of such students. This was done by examining how such teachers evaluate identically worded responses spoken by African American, Latino, and White ninth-grade boys and girls (ages = 14-15). Previous research using this methodology has shown that teachers evaluate work presented orally by African American and Latino students less favorably than equivalent work presented orally by White students (Crowl & MacGinitie, 1974; Granger et al., 1977; Shepherd, 2011; Woodworth & Salzer, 1971). Such research has had considerably less to say about teachers’ perceptions of male students versus female students, however, so the present study also explored the possibility of an indirect effect of gender by including response intonation (falling vs. rising) as a factor in addition to speaker race/ethnicity and gender.
The results of the present study reveal significant inequalities in secondary science teachers’ evaluation of different students’ work. Responses spoken by African American and Latino students, for instance, were evaluated significantly less favorably than identically worded responses spoken by White students, a result that parallels previous findings on race/ethnicity-based inequity in other school subjects (Crowl & MacGinitie, 1974; Granger et al., 1977; Shepherd, 2011; Woodworth & Salzer, 1971). Although the effect of race/ethnicity on evaluation is of relatively small magnitude, its negative impact on African American and Latino students will be exacerbated by its interaction with stereotype threat (McKown & Weinstein, 2002). Over time, the experience of being evaluated less favorably for the same quality of work, combined with the other adverse effects of the negative perceptions underlying such evaluations (e.g., on teacher–student interactions; Rubie-Davies, 2007), will adversely affect African American and Latino students’ self-efficacy and motivation (Kuklinski & Weinstein, 2001; Madon et al., 2001; Weinstein, 2002), which, in turn, are known to significantly affect student achievement (Goddard et al., 2001; Linnenbrink-Garcia & Fredricks, 2008). Ultimately, the culmination of these negative effects is likely a significant factor in the scarcity of racial/ethnic minorities pursuing careers in the STEM fields.
As for gender inequality, although the main effect of gender failed to reach statistical significance, teachers did give significantly lower scores to otherwise identical responses spoken using rising intonation, which female speakers have been found to use nearly twice as often as male speakers (Guy et al., 1986; Ritchart & Arvaniti, 2014; Warren, 2005). Moreover, whereas male speakers who have reason to feel particularly confident about their knowledge tend to use rising intonation less than the average male speaker, the opposite is true of female speakers, a result that has been attributed to these women “compensating” for their success when performing their gender (Linneman, 2013) and which is also consistent with research on female students’ response to STEM-related stereotype threat (Pronin et al., 2003; Saucerman & Vasquez, 2014).
Such gender differences in intonation mean that secondary science teachers’ less favorable evaluation of responses spoken with rising intonation will tend to have a much larger negative impact on female students (and on high-achieving female students, in particular) than on male students. As with the race/ethnicity-based disparities discussed above, being evaluated less favorably for the same quality of work undermines these students’ self-efficacy and motivation, which, in turn, are known to significantly affect student achievement (Goddard et al., 2001; Linnenbrink-Garcia & Fredricks, 2008). Moreover, the negative impact on female students will be exacerbated by its interaction with gender-related STEM stereotype threat (Pronin et al., 2003; Saucerman & Vasquez, 2014). Ultimately, the culmination of these negative effects, like the analogous effects of race/ethnicity, is likely a significant factor in the scarcity of women pursuing careers in the STEM fields.
Conclusion
The results of the present study suggest that many secondary science teachers perceive African American and Latino students less favorably than White students—perceptions that likely contribute to the underrepresentation of African Americans and Latinos in the STEM fields. Given this, helping teachers develop equitable, positive perceptions remains as important as ever (Archambault et al., 2012; Demanet & Van Houtte, 2012; Weinstein, 2002). Making teachers aware of how perceptions can affect their interactions with different students has long been known to help neutralize the effects of teacher bias (e.g., Smith & Luginbuhl, 1976), and calling teachers’ attention to students who tend to be overlooked in classroom interactions has been shown to lead to marked improvements in both the quality and quantity of teachers’ interactions with such students, in turn leading to positive changes in those students’ behavior, as well (e.g., Good & Brophy, 1974). In fact, equitable perceptions have even been shown to mitigate the effects of real ability differences among students (Roeser, Eccles, & Sameroff, 2000; Stipek, 2001). In light of such findings, teachers need to be aware of their perceptions of students of different sociocultural backgrounds and must remain conscious of the fact that their biases can significantly affect student learning.
This study also raises the possibility that direct biases against underrepresented and underperforming groups are only the tip of the iceberg, as a variety of indirect effects of traits that are differentially associated with particular sociocultural groups are also possible. The present study highlights how teachers’ less favorable perception and evaluation of one such trait—rising intonation, which female speakers have been found to use nearly twice as often as male speakers (Guy et al., 1986; Linneman, 2013; Ritchart & Arvaniti, 2014; Warren, 2005)—can have a much larger negative impact on female students (and on high-achieving female students, in particular), creating an indirect effect of gender on evaluation.
Any inequitable evaluation of comparable work can negatively affect students’ self-efficacy and motivation, which are known to significantly affect achievement (Goddard et al., 2001; Linnenbrink-Garcia & Fredricks, 2008), and these effects are exacerbated when they interact with stereotype threat (cf. McKown & Weinstein, 2002, on the interaction of teacher biases with race/ethnicity-based stereotype threat). Consequently, teachers’ awareness of their potential biases must go beyond simply realizing that they may unconsciously perceive some students less favorably than others. Teachers must also be aware of their perceptions of traits indirectly associated with different groups, as these perceptions can also have a significant impact on students’ learning. These traits could be a matter of word choice, which varies by dialect, or of which word a student chooses to stress within a particular utterance (in recording the stimuli and carefully enforcing uniformity, it became evident that most of the responses could be spoken in multiple ways by stressing different words).
Finally, complicating matters a bit further, some students simply lack confidence, quite possibly due to having had their confidence undermined by the very sorts of biased evaluation found in this study. The significant negative impact of rising intonation, which listeners tend to perceive as reflecting uncertainty, suggests that other speech features associated with uncertainty probably also negatively affect teachers’ evaluation of otherwise comparable responses. Candidates for future research along these lines include hesitations, slow speech rate, and low amplitude (recall that in the present study, all hesitations were edited out of the recordings, and both speech rate and amplitude were normalized across the stimuli). Ultimately, if a student’s lack of confidence is due, even just in part, to having experienced negative teacher bias in the past, a vicious circle can emerge. The solution, of course, is for teachers to strive, consciously and constantly, for both equity and objectivity in their evaluations of student work, as well as in teacher–student interactions in general—much easier said than done but, nevertheless, well worth the required effort.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
