Abstract
Service providers who are Black tend to be evaluated less favorably than those who are White, hindering opportunities for advancement. We propose that the Black-White racial disparity in service performance evaluations is due to occupational-racial stereotype incongruence for interpersonal warmth and that more emotional labor is necessary from Blacks to reduce this incongruence. A pilot study manipulating employee race and occupation confirmed warmth and person-occupation fit judgments are lower for an otherwise equal Black than White service provider. We then demonstrate the racial disparity in service performance is due to interpersonal warmth differences in an experimental study with participants evaluating videos of retail clerks (Study 1) and a multisource field study of grocery clerks with supervisor-rated judgments (Study 2). Furthermore, White service providers are rated highly regardless of emotional labor, but performing more emotional labor (i.e., amplifying positive expressions) is necessary for Black providers to increase warmth judgments and reduce the racial disparity. In other words, Black providers are held to a higher standard where they must “fake it to make it” in service roles. We discuss implications for stereotype fit and expectation states theory, emotional labor, and service management.
In the United States, Black employees continue to face a “glass ceiling” in terms of opportunities for promotions and career advancement (Avery, 2011; Roberson, Galvin, & Charles, 2007). One reason for this is that Black employees’ performance is rated lower than White employees’ performance even when objectively the same (Arvey & Murphy, 1998; Greenhaus, Parasuraman, & Wormley, 1990; Luksyte, Waite, Avery, & Roy, 2013; McKay & McDaniel, 2006; Oppler, Campbell, Pulakos, & Borman, 1992; Roberson & Block, 2001; Sackett & DuBois, 1991). In occupations where cognitive performance is central, such as managerial/professional occupations, stereotypes about intellect are usually the explanation for the performance bias (Harlow, 2003; McKay & McDaniel, 2006), yet racial disparity is also found in service performance ratings (Hekman, Aquino, Owens, Mitchell, Schilpzand, & Leavitt, 2010), where interpersonal performance is critical. Given that employed Blacks are almost twice as likely to be in these customer-facing occupations (i.e., service or sales; 58.6%) as in management or professional occupations (30%; U.S. Department of Labor, 2016), a better understanding of racial disparity in service performance has implications for the job success and advancement of a large proportion of Black employees.
We further this understanding in two ways. First, we draw on stereotype fit theory to argue that the racial disparity in performance is due to a misfit between (a) the interpersonal warmth expectations for the service occupation and (b) the interpersonal warmth stereotypes for Blacks in the United States (Eagly & Karau, 2002; Roberson & Block, 2001). Service occupations require friendly and prosocial behaviors (Diefendorff, Richard, & Croyle, 2006), and perceptions of warmth are even more critical for performance evaluations than perceptions of competence (Smith, Martinez, & Sabat, 2016). Yet racial stereotypes are held about Blacks as a group in ways that are incongruent with these high-warmth expectations (Block, Aumann, & Chelin, 2012; Madon et al., 2001; Neel, Neufeld, & Neuberg, 2013; Niemann, Jennings, Rozelle, Baxter, & Sullivan, 1994). We propose that stereotype incongruence about interpersonal warmth can bias job performance evaluations against Black service providers compared to White providers.
Second, the occupation-race stereotype incongruence around warmth means that Black service providers must work harder to be rated equivalent to White providers, based on the double standards principle of expectation states theory (Berger, Cohen, & Zelditch, 1972; Foschi, 2000). Specifically, we propose that higher levels of emotional labor may be needed from Black employees to reduce the racial disparity in judgments of warmth and overall performance. Emotional labor refers to an employee’s emotional display (e.g., a smile with customers) that conforms to the work role as well as the effortful strategies (e.g., frequently faking a good mood) to achieve that display (Beal, Trougakos, Weiss, & Green, 2006; Grandey & Gabriel, 2015; Hochschild, 1983). In our inquiry, we ask whether Black employees must perform more emotional labor—more intense positive emotional displays or more frequent emotional strategies—in order to reduce the racial disparity.
Our study has important theoretical, empirical, and societal contributions. First, we uniquely integrate diversity and emotional labor literatures (Ashkanasy, Hartel, & Daus, 2002) in order to better understand and reduce the racial disparity in service performance ratings. Stereotype fit theory is a well-established explanation for gender bias (Eagly & Karau, 2002; Kulik & Bainbridge, 2006) and racial bias (Hareli, David, & Hess, 2013; Sy et al., 2010) in leadership/professional roles; we extend stereotype fit theory to see whether it can explain bias found in service jobs. Furthermore, we extend the double standards principle of expectation states theory, which has previously been tested in terms of competence (Foschi, 2000), to suggest that more effort is needed by Blacks than Whites to improve warmth in a service job. Emotional labor has mixed effects on performance judgments depending on the strategy and situational factors (Chi, Grandey, Diamond, & Krimmel, 2011; Gabriel, Diamond, & Grandey, 2015; Groth, Hennig-Thurau, & Walsh, 2009; Hülsheger & Schewe, 2011), and we suggest the employee’s race may also determine its effectiveness. Thus, we integrate these two literatures in novel ways that extend existing theory and research.
We make a notable empirical contribution by testing our propositions with complementary methodological approaches. We extend prior research on the service performance disparity (Hekman et al., 2010) by testing whether racial stereotypes about warmth explain the effect and whether emotional labor reduces this bias. We conduct an experimental pilot study with photographs, an experimental study with videos of a work interaction, and then a multisource field survey study, which use hotels, retail sales, and grocery stores, respectively, as the service contexts. As a set, our studies also extend prior work by testing whether the disparity occurs with both first-time ratings and ongoing supervisor ratings. Such complementary methods set in different service contexts reduce concerns about methodological artifacts and generalizability present in any one study (Chatman & Flynn, 2005; Cialdini, 1980).
Practically, the growth of jobs in the service sector and diversity in the workforce render these results relevant to many U.S. employees. Our results explicate how racial stereotypes may be present even in entry-level occupations, which can be a barrier to career advancement (Roberson et al., 2007). Understanding whether emotional labor strategies overcome racial stereotypes in judgments is helpful in the short term (McKay, Avery, & Morris, 2008; Singletary & Hebl, 2009) but can also improve conditions in the long term by raising awareness about whether and how stereotyped groups expend additional effort at work with potential costs to their well-being.
Theoretical Development
We draw on stereotype fit (Eagly & Karau, 2002) to make propositions about why there is racial disparity in Black and White service performance and expectation states theory (Berger, Wagner, & Zelditch, 1985; Foschi, 2000) to argue when disparity is minimized.
Stereotype Fit: Occupational and Racial Stereotypes About Interpersonal Attributes
Stereotypes—overgeneralized beliefs about the characteristics of a group of people—can serve as mental shortcuts to believing certain things about a person on the basis of his or her social group membership (McCauley, Stitt, & Segal, 1980). Stereotypes provide people a way to “simplify and understand the huge amounts of social information that they confront and make inferences that go beyond available information” (Kunda & Spencer, 2003: 524). According to stereotype fit theory (Eagly & Karau, 2002; Sy et al., 2010), people tend to develop general beliefs or stereotypes about how holders of roles are expected to look and behave (i.e., a prototype; Rosette, Leonardelli, & Phillips, 2008). When evaluating someone in that role, they compare the target to these prototypes.
For example, leaders are expected to be agentic and powerful, such that stereotypes about White men’s attributes are more consistent with the role than women or most racial minority group members (Eagly & Karau, 2002; Lord, 1985; Sy et al., 2010). When a target’s group stereotypes are congruent with an occupational prototype, or high occupation-group stereotype fit, evaluations tend to be more favorable than when they are incongruent, as shown in research on leader evaluations by gender (Eagly & Karau, 2002; Lord, 1985) and race (Roberson et al., 2007; Sy et al., 2010). In general, occupation-group stereotype fit has been primarily tested with leader or high-status professional roles (Hareli et al., 2013).
We extend these ideas to propose that incongruence between the service occupation and racial group stereotypes for interpersonal warmth helps to explain racial disparities in service provider ratings. Interpersonal warmth (i.e., moral-social attributes such as friendliness and prosocial tendencies) is an automatic and immediate judgment made about people that influences future attributions and behavior (Fiske, Cuddy, & Glick, 2007). High interpersonal warmth is a core expectation of service occupations. A service provider is expected to be enthusiastic and friendly (Diefendorff et al., 2006) and to engage in prosocial and agreeable behavior toward the customer (Parasuraman, Zeithaml, & Berry, 1985), which in the aggregate can be conceptualized as high interpersonal warmth. Interpersonal warmth increases liking and rapport, which tends to positively influence performance appraisals (A. W. Sutton, Baldwin, Wood, & Hoffman, 2013) and is specifically valued in service exchanges (Smith et al., 2016). Interpersonal behaviors that indicate warmth, such as a wide smile, eye contact, and nodding, are linked to higher performance ratings in a variety of service contexts (Pugh, 2001; Tsai & Huang, 2002). Overall, a prototypical service provider is someone high in interpersonal warmth, and this becomes the standard by which other service employees are compared and evaluated.
At the same time, an employee’s racial group has stereotypes that provide a heuristic, a shortcut that automatically provides information, and this is matched to the occupation prototype to inform the judgment (Hareli et al., 2013; Sy et al., 2010). In particular, people tend to hold stereotypes about Blacks that suggest lower service occupation–race stereotype fit than Whites. When asked to select group-congruent attributes, low-warmth attributes like loud, argumentative, threatening, aggressive, and quick tempered are more likely to be selected for Blacks than Whites, while friendly, caring, and courteous—prototypical attributes of service providers—are more typically selected for Whites than for Blacks (Block et al., 2012; Madon et al., 2001; Neel et al., 2013; Niemann et al., 1994). These racial stereotypes suggest there is occupation-racial stereotype incongruence for Black providers but congruence for White providers. Consistent with this idea, evaluators are more likely to favor White over Black applicants for customer contact positions (Moss & Tilly, 2001; Watson, Appiah, & Thornton, 2011) and rate White service providers higher than Black providers (Hekman et al., 2010), even when objective information is held constant.
We extend these findings to propose that this bias occurs as a result of occupation–racial group stereotype fit: interpersonal warmth is central to the service provider prototype, and Black service providers are stereotyped as lower in interpersonal warmth than White service providers. This occupation-race stereotype incongruence results in less favorable evaluations of Black service providers compared to White service providers. Explicitly stated:
Hypothesis 1: All else being equal, Black service providers are perceived as less interpersonally warm than White service providers (1a), and interpersonal warmth judgments explain racial differences in overall performance ratings (1b).
Double Standards: Emotionally Laboring to Override Stereotype Incongruence
Stereotypes are a heuristic to fill a knowledge gap (Fiske & Neuberg, 1990; Hilton & von Hippel, 1996). As such, stereotypes are most likely to inform judgments in the absence of other salient and relevant information. According to expectation states theory, when stereotypes indicate low expectations for an attribute, the stereotyped group member needs to provide higher levels of information for that attribute—that is, frequent or highly salient behavior—than a group member for whom there are higher expectations (Berger et al., 1985). As applied to gender and leadership evaluations, a man can do less than a woman because “he is allowed more latitude (more demonstrations of low ability) than a woman before lack of ability is inferred” (Biernat & Kobrynowicz, 1997: 545-546). Similarly, White employees are permitted to be late more often than are Black employees before it harms performance evaluations (e.g., Luksyte et al., 2013). This double standard—higher standards for the stereotype incongruent group than the stereotype congruent group—occurs because “unexpected performance elicits a stricter standard, because the judge requires stronger evidence” when there is incongruence (Foddy & Smithson, 1989: 76). Note that this is in contrast to the shifting standards model (Biernat & Kobrynowicz, 1997), which suggests that a negatively stereotyped group member easily exceeds expectations when performing at minimal levels and so is evaluated more favorably than a positively stereotyped group member.
We extend the idea of the double standard from expectation states theory to suggest that more effort is needed by Black employees than White employees to demonstrate their warmth and, thus, reduce the racial disparity in service performance. With low effort on warmth behaviors, White service providers are allowed “more latitude,” whereas Black service providers are likely viewed less favorably as a result of the stereotype incongruence (Biernat & Kobrynowicz, 1997; Kunda & Spencer, 2003; Roberson & Block, 2001). In order to be viewed favorably, Black service providers need to meet a higher standard, that is, perform more frequent or intense warmth behaviors than White providers (Fiske, Lin, & Neuberg, 1999; Johnston & Hewstone, 1992; Kunda & Spencer, 2003).
Emotional labor, that is, exhibiting positive displays with customers, is a way to demonstrate warmth and good performance in service jobs (Barger & Grandey, 2006; Pugh, 2001; Tsai & Huang, 2002), but it is unknown whether the effect is stronger for minority racial group members. Positive emotional displays improved liking for and evaluations of gay/lesbian job applicants for a service job, whereas heterosexual job applicants were viewed favorably regardless (Singletary & Hebl, 2009). However, gay and lesbian groups are not stereotyped in ways incongruent with warmth expectations; thus, it is unclear whether positive displays override occupation-stereotype incongruence.
We propose that as a result of occupation–racial warmth stereotype incongruence, emotional labor enhances evaluations of Black providers more than White providers. Specifically, when emotional labor is minimal (i.e., polite smile and basic courtesy), the occupation–racial warmth stereotype incongruence will result in more favorable evaluations of White than Black service providers. With more emotional labor (e.g., wide smile, upbeat demeanor), Black providers meet the higher standard of evidence to override stereotype incongruence, but it is less influential for the already favorable ratings of White providers (Fiske et al., 1999; Johnston & Hewstone, 1992; Kunda & Spencer, 2003). Stated formally:
Hypothesis 2: Emotional labor moderates the effect of race on interpersonal warmth judgments (2a) and the indirect effect of race on performance via warmth (2b) such that when emotional labor is lower, Black service providers are rated lower than White service providers, but when emotional labor is higher, the racial difference is reduced.
Our hypothesized model is represented in Figure 1.

Figure 1. Model of Employee Race and Emotional Labor on Service Provider Judgments
Method Overview
We begin with a pilot study to confirm assumptions about occupational-racial stereotype incongruence using photographs as stimuli. In Study 1, we test hypotheses with video stimuli that manipulate employee race and emotional labor, measuring judgments of warmth and performance. Then in Study 2, we test the same hypotheses in a field survey with Black and White employees’ frequency of emotional labor strategies and supervisor warmth and performance judgments.
Pilot Study
The pilot study tests three assumptions of occupation-race stereotype fit: (a) the occupational stereotype for service provider is high interpersonal warmth, (b) Black employees are stereotyped lower in interpersonal warmth than White employees, and (c) service providers who are Black are perceived as a worse fit for the occupation than White service providers, all else equal.
Sample and Procedure
Using Amazon’s Mechanical Turk (MTurk), we recruited 138 respondents for a study described as being about decision-making based on limited information in work settings. MTurk provides data as reliable as that collected from more traditional sampling methods (Buhrmester, Kwang, & Gosling, 2011) but is critiqued for mindless responding (Chandler, Mueller, & Paolacci, 2014). Thus, we only retained 124 respondents (90%) who passed two attention checks (e.g., selected “agree” when instructed; Meade & Craig, 2012) and correctly reported the race of the target condition. The sample included 69 female (55.6%) and 55 male (44.4%) respondents ranging in age from 18 to 70 (M = 35.73 years, SD = 11.15) and had an ethnic composition of mostly Caucasian (74.2%) with 13% Black, 10.5% Asian, and 2.4% Latinx/Hispanic. Participants’ education ranged from high school graduate (9.7%) to graduate degree (10.5%) with 4-year college degree the modal response (41%), and respondents reported an average American household income (M = 6.13, SD = 3.20, where 6 = $60,000).
We followed prior research methodology on race-occupation stereotype fit (Sy et al., 2010) with a 2 (Black or White employee race) × 2 (customer service or other occupation) design; respondents were randomly assigned to one of the four between-subject conditions. Participants were provided a photo with a brief description: name and age (held constant as Trisha Jones and 23) and manipulated statements about employee race and occupation. For employee race, the employee was described as African American or Caucasian, and there was a photograph of either a Black or a White female with neutral expressions from the Montreal Set of Facial Displays of Emotion (MSFDE; Photos 26 and 36), which matched the targets in terms of emotional expression, pose, background, and facial structure (Beaupré, Cheung, & Hess, 2000). To manipulate service occupation, we used hotel service clerk, with responsibilities described as “Greeting each guest; confirming room reservations; responding to customer questions and complaints” (customer service occupation), or hotel housekeeper, described as “Providing new linens and towels in vacated hotel rooms; removing soiled linens and towels; restocking room supplies” (control).
Results and Discussion
Conditional means for the three dependent variables can be found in Table 1 in the online supplemental material. We confirm occupational warmth differences by asking the extent (1 = not at all to 5 = extremely) that “interpersonal warmth and concern for others” is required in the job, with two other items for comparison: (a) competence (“cognitive intelligence and confidence”) and (b) effort (“physical strength and strong work ethic”). Occupation warmth requirements are predicted by occupation as expected, F(1, 124) = 38.57, p < .001, η2 = .24; Mservice = 4.37, SD = 0.88; Mcontrol = 3.23, SD = 1.11; t = 6.27, p < .001, 95% confidence interval (CI) of difference = [.77, 1.49], Cohen’s d = 1.14, and did not significantly vary by race, F(1, 124) = 0.43, p = .52, η2 = .004. Within the service condition, paired t tests showed that expectations for warmth were higher than for cognitive intelligence, M = 3.62, SD = 0.80, t(59) = 5.01, p < .001, 95% CI of difference = [.45, 1.05], or for strength/effort, M = 2.83, SD = 0.87, t(59) = 8.99, p < .001, 95% CI of difference = [1.19, 1.88], consistent with our assumption.
For the second assumption, we asked for judgments of the employee’s interpersonal warmth: “Based on your initial impression, to what extent do you think this person has the following attributes . . . ?” Four items of interpersonal warmth (friendly, warm, well intentioned, good natured; Sy et al., 2010; α = .89) are presented in randomized order with distractor items. On the basis of a 2 (occupation) × 2 (race) analysis of variance (ANOVA), employee race has a significant effect on perceived warmth such that Whites are perceived as warmer, F(1, 124) = 12.17, p = .001, η2 = .09; MWhite = 3.27, SD = 0.86; MBlack = 2.65, SD = 1.09; t = 3.54, p = .001, 95% CI of difference = [.27, .97], Cohen’s d = 0.63, and did not interact by occupation, F(1, 124) = 1.49, p = .225, η2 = .012.
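As a reading aid, the reported effect sizes can be approximately recovered from the group means and standard deviations. The sketch below assumes Cohen's d was computed with a pooled standard deviation that weights the two groups equally. The text does not state the formula, so this is an assumption, and the function name `cohens_d` is illustrative only.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference between two groups.

    Assumes equal group sizes, so the pooled SD is the root of the
    average of the two variances (an assumption, not stated in the text).
    """
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Warmth judgments reported for the White and Black targets in the pilot study.
d_warmth = cohens_d(3.27, 0.86, 2.65, 1.09)
print(round(d_warmth, 2))  # 0.63, matching the reported Cohen's d
```

With unequal cell sizes the pooled standard deviation would weight each variance by its degrees of freedom, so small discrepancies from the published values would not be surprising.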
For our third assumption, person-job fit (Sy et al., 2010) is assessed with two items: “This job is a good fit for Trisha Jones” and “Trisha Jones is a good match for this job” (α = .95) on a Likert-type scale from 1 (strongly disagree) to 7 (strongly agree). Employee race has a significant effect on perceived fit: F(1, 124) = 14.04, p < .001, η2 = .11; MWhite = 4.93, SD = 1.00; MBlack = 4.01, SD = 1.69; p < .001, 95% CI of difference = [.44, 1.42], Cohen’s d = 0.66. Race by occupation was not significant, F(1, 124) = 2.23, p = .138, η2 = .02, but planned comparisons reveal that the White service provider was perceived as a better fit (M = 5.03, SD = 1.00) than the Black service provider, M = 3.73, SD = 1.74, t(58) = 3.55, p < .001, 95% CI of difference = [.57, 2.03], Cohen’s d = 0.92, and this difference was not significant for the housekeeper occupation, MWhite = 4.83, SD = 1.01; MBlack = 4.27, SD = 1.62; t(62) = 1.66, p = .102, 95% CI of difference = [–.11, 1.23], Cohen’s d = 0.42. Moreover, a test of the conditional indirect effect of race on fit judgments via warmth (Hayes’ PROCESS Model 14) supported that the racial difference in warmth more strongly affects judgments of job fit for the service occupation (coefficient = −.74, SE = .24, 95% CI = [–1.26, –.29]) than nonservice occupation (coefficient = −.43, SE = .16, 95% CI = [–.82, –.16]).
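For readers unfamiliar with PROCESS Model 14, the conditional indirect effect can be written in the standard second-stage moderated mediation form. The notation below is an assumption introduced for illustration: X is employee race, M is perceived warmth, W is occupation, and Y is perceived fit. The symbols and coefficient labels do not appear in the original analysis.

```latex
\begin{align}
M &= a_0 + a\,X + e_M \\
Y &= c_0 + c'\,X + b_1 M + b_2 W + b_3 (M \times W) + e_Y \\
\omega(W) &= a\,(b_1 + b_3 W)
\end{align}
```

Under this reading, the two coefficients reported above would correspond to \(\omega(W)\) evaluated at the service and nonservice levels of occupation.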
In conclusion, our pilot study supported our three assumptions: (a) a customer service occupation has significantly higher warmth requirements compared to another occupation in the same context and compared to cognitive and physical requirements; (b) across occupations, the Black employee was perceived to have lower interpersonal warmth compared to the White employee; and (c) a service provider who is Black was perceived to fit the job less well than a White counterpart. Furthermore, racial differences in warmth influenced evaluations of person-job fit to a greater extent in the service than nonservice occupation.
We acknowledge that the specific photographs might have idiosyncratic effects on judgments and are only of females; however, a follow-up study supported similar results with photographs of males. 1 Regarding external validity, the photographs had no contextual cues, such as uniforms or customers, and the occupational context was limited to hotels, making it unclear if the results are limited to these specific conditions.
We also questioned whether respondent characteristics might strengthen the racial disparity. Exploratory tests with respondent age, education, and salary revealed no significant interaction with employee race on warmth. For performance, gender interacted with employee race, F(1, 116) = 7.88, p = .006, η2 = .06, such that men showed more racial bias than women. Respondent race (White: n = 92; minority: n = 32) did not moderate the racial bias for warmth judgments, F(1, 124) = 0.33, p = .569, η2 = .003, or perceived job fit, F(1, 124) = 0.43, p = .515, η2 = .004. Aggregating races and low statistical power for subgroups may obscure effects, but the effects may occur as a result of widely held stereotypes regardless of group membership. We proceed to test hypotheses in Study 1 and again test whether evaluations vary by respondent race.
Study 1
In Study 1, we manipulate employee race and emotional labor using videos of a realistic retail sales interaction, testing for racial differences in performance via warmth (Hypotheses 1a and 1b) and whether emotional labor moderates the effect by enhancing warmth (Hypotheses 2a and 2b). We recruited Black and White respondents to explore whether results differ by same/cross-race raters.
Method
Participants and Procedure
We recruited 148 Black and White participants from MTurk for a study on service evaluations. Of these respondents, 77% correctly reported the race of the employee condition and selected the indicated response in two attention check items (Meade & Craig, 2012), resulting in 114 final participants. There were equal numbers of women and men (50% each), with an average age of 34.08 (SD = 11.24), who were randomly assigned to one of four conditions of a 2 (employee race: Black or White) × 2 (emotional labor: high or low) between-person design.
Stimuli Development: Manipulating Employee Race and Emotional Labor
We created video stimuli with actors in an actual retail store that sells home goods to create a realistic scene. In all four videos, a service interaction takes place at the cashier’s counter as a customer brings two decorative table lamps to the counter and requests assistance choosing between them. Our script for all four scenes had the employee be responsive to the request, communicate detailed product knowledge, ring up the order accurately, and then bag the choice and hand it to the customer efficiently. The customer could be seen only partially from the back and had few verbal comments. The interaction was about 60 seconds long.
A Black woman and a White woman with prior acting experience were trained to play the role of the retail clerk. The script was memorized and scenes were shot repeatedly until they appeared natural. Emotional labor performance was manipulated to be high or low intensity based on facial and vocal expressions (Ekman & Friesen, 1982; Rafaeli, 1989). High emotional labor was enacted by amplifying positive expressions (i.e., maintained eye contact with nodding and repeated wide smiles and rhythmic and upbeat vocal tone). Low emotional labor was enacted with minimal positive expressions (i.e., some eye contact with a closemouthed small smile and polite but more monotone voice). See the appendix for still shots of the four conditions.
Dependent Measures
Descriptive results and reliability coefficients are provided in Table 1. Different response anchors are used to discourage response biases across items (Podsakoff, MacKenzie, Lee, & Podsakoff, 2003).
Table 1. Descriptive Statistics and Correlations for Study 1 and Study 2
Note: Study 1 means and standard deviations are shown on the left of the slash and correlations are above the diagonal; Study 2 means and standard deviations are to the right of the slash and correlations are below the diagonal. Gender is coded 1 (male) and 2 (female). Job tenure is coded 0 (less than 1 year) and 1 (more than 1 year). Rater familiarity is average hours observing employee a week. Rater race is White (1) and other (0). Employee race is White (0) and Black (1). ER = expressive regulation.
Experimental conditions in Study 1.
p < .05.
Overall job performance
Two items measure the extent that the service interaction matched expectations, consistent with our theorizing and an established approach to assessing global performance from customers (Oliver, 2006). Respondents were asked to evaluate both “the retail associate” and “the service experience” on a scale of met expectations (1 = much poorer than expected, 4 = met expectations, 7 = much better than expected; α = .95).
Interpersonal warmth
Our focal warmth measure is a four-item scale of perceived employee-customer interpersonal rapport (Gremler & Gwinner, 2000); participants responded using a 7-point scale from 1 (strongly disagree) to 7 (strongly agree). Example items are “the retail associate created a feeling of ‘warmth’ during the interaction” and “I would have enjoyed interacting with this retail associate” (α = .92). 2
Service competence
As a comparison, we used two items to assess task-oriented behaviors: “able to answer the customer’s question(s) correctly” and “tried to help the customer achieve her goals” (α = .83; M = 6.07, SD = 0.98) on the same 7-point agreement scale.
Covariate
Racial similarity between employee and participant (same = 0 or different = 1) was measured to account for in-group/out-group biases on ratings of the Black and White employees.
Manipulation Checks
To confirm emotional labor was manipulated effectively, we asked respondents to rate the extent they agreed (1 = strongly disagree to 7 = strongly agree) the employee showed positive displays using three items (Pugh, 2001; Rafaeli, 1989): smiled, eye contact, pleasant tone of voice (α = .80). We conducted a 2 (high or low expressive regulation) × 2 (Black or White employee) ANOVA. Only the main effect of emotional labor condition was significant, F(1, 112) = 43.68, p < .001, η2 = .29, and race did not have a significant interaction effect, F(1, 112) = 0.67, p = .413, η2 = .006. Positive displays were higher in the high emotional labor compared to the low emotional labor condition (MhighEL = 6.28, SD = 0.75; MlowEL = 5.22, SD = 0.96; p < .001, 95% CI of difference = [.74, 1.38], Cohen’s d = 1.23).
To ensure that emotional labor was acted similarly, we obtained ratings of inauthenticity (Grandey, 2003), as agreement that the actor “was ‘faking it’ with customer” and “put on a faked smile” (1 = strongly disagree to 7 = strongly agree; α = .88). Across conditions, the actors were fairly authentic (Overall M = 3.05, SD = 1.54), and variation by employee race or emotional labor condition was not supported, Overall F(3, 112) = 1.30, p = .277, η2 = .04. We also confirmed scenario realism (1 = unrealistic, 3 = unsure, 5 = realistic) was overall high (M = 4.21, SD = 0.99) and that the actors consistently enacted realistic service interactions across conditions, Overall F(3, 112) = 0.81, p = .491, η2 = .022. Thus, we proceeded to test our hypotheses.
Analyses and Results
We began by conducting a 2 (employee race) × 2 (emotional labor) multivariate analysis of variance (MANOVA) on the three dependent variables (interpersonal warmth, service competence, overall performance). The MANOVA supported significant effects for employee race, F(3, 108) = 4.72, p = .004, η2 = .12, and emotional labor, F(3, 108) = 8.20, p < .001, η2 = .19, and for the interaction, F(3, 108) = 3.23, p = .025, η2 = .085. 3 Thus, we proceeded with univariate ANOVAs to test direct and interaction effects and then path modeling for testing the indirect and conditional effects.
Analyses of Variance
Employee race had direct effects on the warmth judgments consistent with Hypothesis 1a, F(1, 109) = 4.98, p = .028, η2 = .04, with the White employee rated higher than the Black employee (MWhite = 5.81, SD = 1.06; MBlack = 5.40, SD = 1.42, 95% CI of difference = [.05, .89], Cohen’s d = 0.33). Furthermore, consistent with Hypothesis 2a, there was a significant interaction of race and emotional labor on interpersonal warmth, F(1, 113) = 7.86, p = .006, η2 = .067. Emotional labor improves evaluations of the Black employee’s perceived warmth, F(1, 52) = 24.04, p < .001; MlowEL = 4.52, SD = 1.47; MhighEL = 6.12, SD = 0.87; 95% CI of difference = [–2.26, –.95], Cohen’s d = 1.32, with no support for emotional labor improving the White employee’s warmth, F(1, 59) = 2.34, p = .132; MlowEL = 5.58, SD = 1.21; MhighEL = 6.00, SD = 0.90; 95% CI of difference = [–.96, .13]; Cohen’s d = 0.39. See Figure 2a for means by condition.

Figure 2. Conditional Means for Interpersonal Warmth Judgments as a Function of Employee Race and Emotional Labor
In addition, employee race has a significant direct effect on performance, F(1, 113) = 6.34, p = .013, η2 = .056; MWhite = 5.33, SD = 1.34; MBlack = 4.75, SD = 1.42; 95% CI of difference = [.13, 1.12], Cohen’s d = 0.42, and interacts with emotional labor, F(1, 110) = 5.48, p = .021, η2 = .048, such that performance evaluations improve with greater emotional labor for Black employees, F(1, 52) = 14.93, p < .001; MlowEL = 4.02, SD = 1.23; MhighEL = 5.36, SD = 1.28; 95% CI of difference = [–2.04, –.65], Cohen’s d = 1.07, but evaluations are consistently high across levels of emotional labor for White employees, F(1, 57) = 0.24, p = .623; MlowEL = 5.23, SD = 1.39; MhighEL = 5.41, SD = 1.31; 95% CI of difference = [–.89, .54], Cohen’s d = 0.13.
Path Modeling
In line with recommendations (Edwards & Lambert, 2007; Preacher, Rucker, & Hayes, 2007), we used Mplus path modeling to test indirect effects of race on performance via warmth (Hypothesis 1b) and conditional indirect effects of emotional labor (Hypothesis 2b). Table 2 shows the final model for all variables. Table 3 shows the direct, indirect, total, and conditional effects.
Table 2. Results From Moderation Analyses of Employee Race by Emotional Labor on Interpersonal Warmth Judgments
Note: Expressive regulation (ER) and mood regulation are measured in Study 2. Race is coded 0 (White) and 1 (Black). Gender is coded 1 (male) and 2 (female). Job tenure is coded 0 (less than 1 year) and 1 (more than 1 year). CI = confidence interval.
Experimental conditions in Study 1.
Table 3. Results of Moderated Mediation Regression Analysis for Studies 1 and 2
Note: Emotional labor (EL) is manipulated low and high intensity of positive display in Study 1; EL is 1 SD above and below the mean frequency of expressive regulation in Study 2.
Employee race had a significant direct effect on interpersonal warmth (coefficient = −.44, SE = .22, t = −2.05, p = .040, 95% CI = [–.88, –.01]) consistent with Hypothesis 1a. The confidence interval for the indirect effect of race on performance via warmth does not include 0 (coefficient = −.35, SE = .17, 95% CI = [–.71, –.031]), supporting Hypothesis 1b (see Table 3). As shown in Table 2, the coefficient for the interaction of race and emotional labor is significant (coefficient = 1.20, SE = 0.43, p = .006) and explained unique and significant variance in interpersonal warmth, ΔR2 = .056, ΔF(1, 113) = 7.90, p = .006. Consistent with Hypothesis 2a and the ANOVA results, the effect of employee race on interpersonal warmth is significant with low emotional labor (coefficient = −1.06, p = .001) but does not predict warmth with high emotional labor (coefficient = .13, p = .636; see Table 3). In support of Hypothesis 2b, the indirect effect of employee race on performance via interpersonal warmth is significant when emotional labor is low (coefficient = −.83, 95% CI = [–1.33, –.33]) but not when it is high (coefficient = .10, 95% CI = [–.32, .53]), and these coefficients are significantly different (difference = .93, 95% CI = [.27, 1.60]).
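The conditional effects reported above follow the usual moderated-mediation arithmetic: the race effect on warmth at a given level of emotional labor is the simple slope, and the conditional indirect effect is that slope multiplied by the warmth-to-performance path. The sketch below is illustrative only. It assumes emotional labor is coded 0 (low) and 1 (high), and it uses a warmth-to-performance path of .78 that is not reported in this section but is back-computed from the reported low-condition values (−.83 / −1.06); both are assumptions, and the function names are hypothetical.

```python
def conditional_effect(effect_at_low, interaction, moderator):
    """Simple slope of race on warmth at a moderator value (0 = low, 1 = high)."""
    return effect_at_low + interaction * moderator


def conditional_indirect(effect_at_low, interaction, moderator, b_path):
    """Conditional indirect effect: simple slope times the mediator-to-outcome path."""
    return conditional_effect(effect_at_low, interaction, moderator) * b_path


A_LOW = -1.06       # reported race effect on warmth at low emotional labor
INTERACTION = 1.20  # reported race x emotional labor coefficient
B_PATH = 0.78       # ASSUMPTION: warmth -> performance path, back-computed, not reported here

a_high = conditional_effect(A_LOW, INTERACTION, 1)
indirect_low = conditional_indirect(A_LOW, INTERACTION, 0, B_PATH)
indirect_high = conditional_indirect(A_LOW, INTERACTION, 1, B_PATH)

print(round(a_high, 2), round(indirect_low, 2), round(indirect_high, 2))
```

Because the inputs are rounded coefficients, the hand-computed values can differ from the reported ones in the second decimal (for example, a high-condition slope of .14 here versus the reported .13).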
Additional Analyses
Competence versus warmth as mechanism
Given that there are race-based stereotypes about competence as well (McKay & McDaniel, 2006), it is possible that low emotional labor indicates being less competent at this job, and the racial effect is via competence perceptions. Results based on the 2 × 2 ANOVA showed that service competence variation by employee race was not supported, F(1, 113) = 0.04, p = .847, η2 = .00; MBlack = 6.10, SD = 0.87; MWhite = 6.04, SD = 1.08; 95% CI of difference = [–.33, .40], Cohen’s d = 0.06, nor was there an interaction with emotional labor, F(1, 113) = 2.42, p = .123, η2 = .02. Moreover, the indirect effect of race on performance via competence was not significant (indirect estimate = .035, SE = .16, 95% CI = [–.24, .38]) whether emotional labor was high (indirect coefficient = .35, SE = .21, 95% CI = [–.02, .79]) or low (indirect coefficient = −.34, SE = .21, 95% CI = [–.80, .16]). Finally, after adding competence judgments to the model (coefficient = .39, SE = .13, p = .004, 95% CI = [.13, .67]), we found that the effect of warmth on performance remains significant (coefficient = .58, SE = .10, p < .001, 95% CI = [.37, .78]).
Respondent race as moderator
We recruited White and Black respondents to determine whether results were consistent with same- or cross-race raters. We conducted univariate ANOVAs adding respondent race as an independent variable so the design was 2 (employee: White or Black) × 2 (emotional labor: low or high) × 2 (respondent: White or Black). There were no signs of cross-race bias: respondent race did not interact with employee race to predict warmth evaluations, F(1, 112) = 0.11, p = .739, η2 = .001, or performance, F(1, 110) = 0.001, p = .978, η2 = .000. The three-way interaction was also not significant. Employee race and the interaction with emotional labor were still significant as reported above. Thus, the results do not vary by respondent race.
Discussion
We employed an experimental design to provide the necessary control to infer that variations in perceptions of warmth and performance are attributable to race and emotional labor. When holding constant task performance and ensuring that expressions were courteous but not highly positive (low emotional labor condition), a Black service provider was perceived to be less interpersonally warm than a White service provider, consistent with occupation-race stereotype congruence. Racial differences in warmth explained disparity in service performance evaluations. Furthermore, consistent with the double standards idea of expectation states theory, more emotional effort was needed for the Black provider to be viewed favorably, whereas emotional labor is less consequential for judgments of the White provider. In fact, as shown in Figure 2a, the Black employee performing high emotional labor (M = 5.36, SD = 1.28) is evaluated similarly to the White employee performing low emotional labor (M = 5.23, SD = 1.39; t = 0.36, p = .720, 95% CI = [–.57, .83], Cohen’s d = 0.10), supportive of “double standards” by race. We do not find support for a “shifting standards” perspective such that minimal emotional effort by Black employees exceeds low expectations (Biernat & Kobrynowicz, 1997), given that employees engaging in minimal warmth (i.e., low emotional labor) were evaluated less favorably than their White counterparts performing at similar levels.
An alternative explanation for our results is that emotional labor reduces racial disparity by improving negative reactions to stigmatized groups, which enhances judgments (Singletary & Hebl, 2009). If that is the case, then the high emotional labor condition should also improve competence judgments of Black service providers; this was not the case. Emotional labor did not significantly interact with employee race to predict competence, nor did competence carry the effect of race to performance judgments. Thus, emotional labor specifically helps with warmth stereotypes. Furthermore, the rating biases seem to be present in both cross- and same-group raters.
Limitations
There are several methodological limitations of this study. First, given that we used only a single Black and a single White actor, one might be concerned that there are idiosyncratic differences that explain our effects. We took steps to rule this out. We used the same script and setting and ensured that the scenes were similar. We also tested and found that the two actors were similarly effective in appearing authentic and competent in their portrayals. Another concern of an experimental study is the lack of external validity (Berkowitz & Donnerstein, 1982), including concerns of ecological validity of the task and generalizability of responses to the real world. We filmed in a real service context and confirmed there were high perceptions of scene realism, and we asked adult participants to do a task they should be familiar with as customers. However, we used only female stimuli and one service context—retail sales—that may activate specific warmth expectations. To increase external validity, in Study 2, we test hypotheses with both male and female employees of a grocery store.
A conceptual limitation of this study is that we are not directly assessing the effort of the employee, only the observed display. There are different ways that employees can effortfully engage in emotional labor, such as by regulating expressions and regulating moods (Grandey, 2003; Hochschild, 1983). Though both actors appeared similarly authentic, emotional labor strategies might be differentially effective for Black and White service employees. Furthermore, both our pilot study and Study 1 asked people to provide judgments of strangers (i.e., first impressions). Whereas this is certainly relevant when asking customers about their performance judgments, supervisor performance judgments may more directly affect career trajectories. To address these limitations, in Study 2, we survey Black and White employees about the extent of emotional labor: frequency of regulating expressions and moods while interacting with customers.
Study 2
We focus on two strategies to “upregulate” positive emotions: expressive regulation and mood regulation as used in service interactions. Expressive regulation, where employees amplify or fake positive expressions, objectively improves observable expressions (Webb, Miles, & Sheeran, 2012) though has mixed effects on service performance judgments and can seem phony to others (Chi et al., 2011; Groth et al., 2009; Hülsheger & Schewe, 2011). Mood regulation (i.e., reappraisal or focusing on the positive) does not directly enhance observable expressions (Webb et al., 2012) though seems to enhance service performance judgments via positive mood (Chi et al., 2011; Hülsheger & Schewe, 2011) and perceived sincerity and prosocial motives (Groth et al., 2009).
Expressive regulation, by definition, directly amplifies positive displays, which we found to mitigate racial disparity in Study 1. As such, we expect that more frequent use of expressive regulation will improve ratings for Black providers with a weak effect for White providers, based on the double standard idea of expectation states theory. Mood regulation helps one feel better and appear genuinely interested in helping others and, thus, could improve perceptions of sincerity and prosocial intentions, indicators of warmth (Fiske et al., 2007). If so, mood regulation might be more effective for Black providers than it is for White providers if it overrides low-warmth stereotypes about Black providers. At the same time, mood regulation does not directly increase expressions and, thus, may result in genuine (and, thus, more subtle) positive expressions. It is possible that the exaggerated positive expressions are necessary to override low-warmth stereotypes. We compare both strategies to see whether high emotional labor effort in general improves ratings of Black providers or whether the effect is specific to exaggerating positive expressions, as in Study 1.
We also extend Study 1 by considering a different source of performance evaluations, the supervisor. Racial stereotypes are most likely to influence judgments in first impressions, when there is little individualized information about the target (Fiske & Neuberg, 1990; Hilton & von Hippel, 1996). This means that stereotypes are likely to be more influential with strangers as targets, as seen in experimental stimuli (as in our pilot study and Study 1) or by customers who do not know the employee (Hekman et al., 2010). A more conservative test of our predictions is whether we see the same effects with supervisors who are familiar with the employees. Supervisors also (a) compare employees to a prototypical (i.e., ideal) service provider and (b) sometimes exhibit racial biases in judgments (DuBois, Sackett, Zedeck, & Fogli, 1993; Roberson et al., 2007), but it is unknown whether this is due to racial bias in warmth judgments and can be mitigated by emotional labor.
Thus, in Study 2, we obtain employees’ frequency and type of emotional labor used with customers and see whether that interacts with employee race to predict supervisor ratings of warmth and performance. This provides a constructive replication and extension of Study 1 by testing Hypotheses 1 and 2 with different methods, measures, and contexts (Lykken, 1968).
Method
Sample and Procedure
Data were collected by an external consulting firm from 357 grocery store employees and their 79 corresponding supervisors (average of 4.5 employees per supervisor) in 54 grocery stores. We focused our main analyses on 311 respondents who were Black (n = 90, 25.2%, coded 1) or White (n = 221, 61.9%, coded 0) and their 76 supervisors. The majority of the respondents were department clerks (i.e., deli, bakery; n = 176, 49%) or cashiers (n = 125, 35%), with the rest in roles such as customer service associate. The majority (74.5%) of these employees had been with the company more than a year. Of the 76 supervisors (56% male, 41% female), the majority (81%) were White and 12% were Black.
Employees received an e-mail link to the online survey with the opportunity to participate in the project during work hours. Social desirability response biases (Podsakoff et al., 2003) were minimized by the following procedures: (a) numeric codes rather than names were used to match employees to their supervisor evaluation, (b) instructions stated that ratings were for research and not administrative purposes, and (c) all responses were sent directly to the consulting company. The consulting company collecting data was developing a performance rating tool, and part of the consulting involved an informational training to supervisors about avoiding rating biases such as leniency and halo effects; racial biases were not discussed. Overall, these methods mean there is likely to be less inflation and more variability in the performance ratings, such that we can effectively test whether that variability is due to employee race and emotional labor.
Measures
Table 1 shows bivariate correlations of the study variables below the diagonal.
Emotional labor
Participants were asked how frequently they engaged in certain behaviors and responded to the items using a 7-point frequency scale from 1 (never) to 7 (always). For expressive regulation, we used four items to assess strategies for increasing positive expressions (Côté & Morgan, 2002; Grandey, 2003): “Enhance my expressions of feeling happy when interacting with customers,” “Exaggerate how positively I feel while interacting with the customer,” “Fake a good mood with customers,” and “Just pretend to have the feelings I display with customers” (α = .71). For mood regulation, we used four items to indicate cognitive strategies for increasing positive mood (Grandey, Dickter, & Sin, 2004): “Look at the positive side of things before interacting with customers,” “Focus on happier things in life before interacting with the next customer,” “Try to see things from the customer’s perspective,” and “Reinterpret what customers said or did to avoid taking things personally” (α = .68).
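The internal consistency values reported for these scales (α) are Cronbach's alpha coefficients. As a minimal sketch of how such a coefficient is computed from item-level responses, the example below uses made-up scores (the study's raw data are not available here) and a hypothetical helper name.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# HYPOTHETICAL responses from six employees to four 7-point items.
toy_items = [
    [5, 3, 6, 2, 4, 7],
    [4, 3, 6, 1, 5, 6],
    [5, 2, 7, 2, 4, 6],
    [6, 4, 5, 3, 3, 7],
]
alpha = cronbach_alpha(toy_items)
print(round(alpha, 2))
```

Alpha rises as items covary more strongly relative to their individual variances; identical items yield α = 1.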
Overall job performance
We aggregated three global judgments that the supervisors made about each employee: employee’s typical productivity (1 = less than 10%, 2 = 11%–25%, 3 = 25%–50%, 4 = 51%–75%, 5 = 76%–90%, 6 = 91%–100%), overall performance (1 = very poor, 2 = struggling, 3 = okay, 4 = strong, 5 = exceptional), and desire to rehire the employee if they had the chance (1 = no, 2 = maybe, 3 = yes). Each item was standardized before forming a composite so that the items are weighted similarly (M = 0.0, SD = 0.89). Though these are judgments about different aspects of the employee, we aggregate them for three reasons. First, we did not want to use single-item measures for which we cannot determine internal consistency reliability estimates and that have more questionable validity (i.e., there may be external constraints on productivity and performance, or economic factors that affect rehiring). Second, the alpha coefficient (α = .85) of these three items supports the judgments’ internal consistency. Finally, a global performance measure like this is used in real business contexts for differentiating employees and decision-making (i.e., promotions).
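Because the three judgments use different response scales (6, 5, and 3 points), standardizing each item before averaging keeps any one item from dominating the composite. The sketch below illustrates that step with hypothetical ratings; the helper names and data are assumptions, not the study's materials.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of ratings to mean 0 and (sample) SD 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*items):
    """Average the standardized items for each employee."""
    standardized = [zscores(item) for item in items]
    return [mean(row) for row in zip(*standardized)]


# HYPOTHETICAL supervisor ratings for five employees.
productivity = [3, 5, 6, 4, 2]  # 1-6 scale
overall = [3, 4, 5, 3, 2]       # 1-5 scale
rehire = [2, 3, 3, 2, 1]        # 1-3 scale

scores = composite(productivity, overall, rehire)
print([round(s, 2) for s in scores])
```

The composite has a mean of zero by construction, which matches the reported M = 0.0; its SD falls below 1 whenever the items are not perfectly correlated.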
Interpersonal warmth
Supervisors rated five items that indicate rapport-building and prosocial behaviors toward customers (showing empathy with, respect for, and interest in the customer; being responsive; and putting customers’ needs first; α = .94). These interpersonal behaviors with customers were evaluated relative to occupational requirements on a 5-point scale (1 = Fails to meet minimum job requirements, 2 = Meets minimum requirements, sometimes unsatisfactory, 3 = Performs adequate or satisfactory, rarely exceeds expectations, 4 = Performs adequate or satisfactory, sometimes goes extra mile, 5 = Consistently performs above and beyond desired standards). This rating scale permits us to test how employees differ in congruence with perceived occupational requirements around warmth.
Control variables
We controlled for factors that upwardly bias supervisor ratings of warmth and performance: employee gender (1 = male, 2 = female), supervisor familiarity (weekly average hours observing the employee), and supervisor-employee racial similarity (0 = different, 1 = same). In addition, we control for job tenure (0 = less than a year, 1 = more than a year) since new employees are likely to perform less well than more experienced employees. Finally, we control for intrinsic job motivation with a four-item measure that represents positive affect (i.e., enjoyment) from working and predicts service performance (Grant, 2008).
Analysis and Results
We first assessed the fit of our proposed measurement model to the data (Hinkin, 1998). We used Mplus 8 (Muthén & Muthén, 1998-2012) to conduct a multilevel confirmatory factor analysis that accounts for the fact that employees are nested in supervisors and for the effects of control variables on the supervisor measures. A model with four latent variables (employee’s ratings of expressive regulation items and mood regulation items, supervisor’s ratings of interpersonal warmth items and overall performance items) had a marginally acceptable fit to the data (comparative fit index = .91, root mean square error of approximation = .06); all factor loadings were over .35 and averaged .75, and the fit was significantly better than three-, two-, and single-factor alternative models (see Table 2 in the online supplemental material). Thus, we proceeded with hypothesis testing using the proposed composite measures.
We conducted a multilevel path analysis with fixed slopes using Mplus to estimate employee (Level 1) effects while taking the supervisor (Level 2) effects into account, given the nonzero intraclass correlation coefficient (ICC) values for Level 2 effects (warmth: ICC1 = .335, ICC2 = .672; performance ICC1 = .025, ICC2 = .216). Level 2 variance of the slope for race and interpersonal warmth and the slope for interpersonal warmth and performance were not significant.
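For readers less familiar with these indices, ICC(1) reflects the share of rating variance attributable to the supervisor, and ICC(2) reflects the reliability of supervisor-level means. The following is a rough sketch using the classic one-way ANOVA formulas with equal group sizes. It is not the model-based Mplus estimation used in the study (where groups are unbalanced), and the data and function name are hypothetical.

```python
from statistics import mean


def icc_one_way(groups):
    """ICC(1) and ICC(2) from a balanced one-way ANOVA; one list of ratings per supervisor."""
    k = len(groups[0])  # employees per supervisor (ASSUMED equal across supervisors)
    g = len(groups)     # number of supervisors
    grand = mean([v for grp in groups for v in grp])
    ms_between = k * sum((mean(grp) - grand) ** 2 for grp in groups) / (g - 1)
    ms_within = sum((v - mean(grp)) ** 2 for grp in groups for v in grp) / (g * (k - 1))
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


# HYPOTHETICAL warmth ratings from three supervisors, four employees each.
toy_groups = [[3, 4, 3, 4], [4, 5, 5, 4], [2, 3, 3, 2]]
icc1, icc2 = icc_one_way(toy_groups)
print(round(icc1, 3), round(icc2, 3))
```

A nonzero ICC(1) indicates that ratings cluster by supervisor, which is the rationale for modeling the supervisor level rather than treating employees as independent observations.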
Hypothesis Testing
Hypothesis 1a stated that Black service providers are judged to be less interpersonally warm than White providers. Employee race had a significant effect on warmth evaluations in the expected direction, F(1, 310) = 5.28, p = .022; MBlack = 3.52, SD = 0.94; MWhite = 3.79, SD = 0.89; 95% CI = [.040, .50], Cohen’s d = 0.29. To account for data being nested in supervisors and other factors likely to contribute to evaluations, we estimated a multilevel model of direct effects and regressed interpersonal warmth on the control variables, employee race (White = 0, Black = 1) and both emotional labor strategies. The fixed slope for employee race on interpersonal warmth with customers was significant and negative (γ = −.31, p = .034; see Tables 2 and 3). Overall, we find support for Hypothesis 1a.
Hypothesis 1b proposed that interpersonal warmth judgments explain racial disparity in overall job performance. Employee race had a significant main effect on the standardized job performance rating, F(1, 310) = 5.20, p = .023; MBlack = −0.18, SD = 0.88; MWhite = 0.07, SD = 0.85; 95% CI = [.035, .47], Cohen’s d = 0.29. To test the significance of the hypothesized indirect effects via warmth, we used a Monte Carlo simulation procedure (with 20,000 replications) to reflect accurately the asymmetric nature of the sampling distribution (Preacher, Zyphur, & Zhang, 2010). Analyses support a significant indirect effect of employee race on performance through warmth, such that Blacks were rated lower than Whites as a result of being rated lower in warmth (indirect effect estimate = −.27, SE = .13, 95% CI = [−.51, −.02]; see Table 3), consistent with Hypothesis 1b.
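The Monte Carlo procedure for an indirect effect repeatedly draws the two path coefficients from their sampling distributions and takes percentiles of their product, which allows the resulting interval to be asymmetric. The sketch below is a simplified illustration under stated assumptions: the two paths are drawn independently (zero covariance), the race-to-warmth estimate of −.31 comes from the text, and the standard errors and the warmth-to-performance path are hypothetical placeholders rather than reported values.

```python
import random


def monte_carlo_indirect_ci(a, se_a, b, se_b, reps=20000, level=0.95, seed=1):
    """Percentile CI for a*b from independent normal draws of each path."""
    rng = random.Random(seed)
    draws = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(reps))
    lower_idx = int((1 - level) / 2 * reps)
    upper_idx = int((1 + level) / 2 * reps) - 1
    return draws[lower_idx], draws[upper_idx]


A, SE_A = -0.31, 0.14  # a path: estimate from the text; SE is HYPOTHETICAL
B, SE_B = 0.87, 0.08   # b path: HYPOTHETICAL estimate and SE

lower, upper = monte_carlo_indirect_ci(A, SE_A, B, SE_B)
print(round(lower, 2), round(upper, 2))
```

An interval that excludes zero would be read as support for the indirect effect; a full multilevel application would also use the estimated covariance between the two paths.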
Hypothesis 2a predicted that emotional labor interacts with employee race to predict interpersonal warmth judgments. The interaction coefficient of expressive regulation and race was significant on warmth judgments (γ = .22, p = .007; see Table 2). Simple slopes comparisons (Aiken & West, 1991) indicate that Black providers are rated less warm than Whites when employees infrequently used expressive regulation (simple slope coefficient = −.51, SE = .16, p = .001), but the racial disparity was neutralized when employees frequently regulated expressions (simple slope coefficient = −.03, SE = .15, p = .87), and these coefficients were significantly different from each other (see left column of Table 3). Moreover, within-group comparisons reveal that more frequently performing expressive regulation improved evaluations for Black employees (subgroup b = 0.16, p = .032; MlowEL = 3.24, SE = 0.17 vs. MhighEL = 3.57, SE = 0.15), but the effect was not found for White employees (subgroup b = −0.078, p = .155; MlowEL = 3.91, SE = 0.11 vs. MhighEL = 3.71, SE = 0.12; see Fig. 2b).
As per Hypothesis 2b, we compared the indirect effect of race on performance via warmth across high and low levels of expressive regulation (Edwards & Lambert, 2007). The results are shown in Table 3; racial disparity in performance occurs via differences in warmth judgments when expressive regulation is infrequent (indirect effect estimate = −.51, 95% CI = [−.81, −.20]), but this indirect effect is nonsignificant when expressive regulation is high (indirect effect estimate = −.03, 95% CI = [−.32, .27]). The difference of these effects was statistically significant (95% CI = [.13, .84]) and is consistent with Hypothesis 2b, such that expressive regulation mitigates the Black-White performance rating difference by improving warmth judgments specifically for Black employees.
In contrast, mood regulation did not interact with employee race to predict warmth (γ = −.16, p = .062); as such, we did not further test conditional indirect effects. Overall, Hypothesis 2 was partially supported, with expressive regulation strategies but not mood regulation strategies moderating the effect of race on warmth judgments.
Additional Analyses
Employee gender
In Study 1, we held constant the employee gender, but in Study 2, we have both male and female employees and so can determine whether gender changes the effects we find. We tested the two-way and three-way interactions with employee gender, employee race, and both emotional labor strategies. None of these interactive effects with gender were significant, and the effects described above remained the same. Thus, the effect of employee race and expressive regulation holds across male and female employees.
Rater racial similarity
It is possible that when supervisors are different races than their employees (14 of White employees, 57 of Black employees) they may be more likely to use stereotypes than same-race pairs (193 of White employees, 31 of Black employees), although evidence is mixed (Hekman et al., 2010; Oppler et al., 1992; Sackett & DuBois, 1991). We reran the multilevel path model with the main effects and the two- and three-way interactions of employee race, expressive regulation, and supervisor racial similarity on interpersonal warmth. Racial similarity did not significantly modify the effect of employee race, or the race by emotional labor interaction, on warmth or performance judgments; thus, our hypotheses are invariant for same- and cross-race supervisors. This needs to be interpreted with caution, given the small sample sizes by condition.
Discussion
The Study 2 results are consistent with stereotype fit: Black service providers are rated lower than White service providers as a result of racial stereotypes for warmth that are incongruent with occupational stereotypes. Supervisors rated Black employees lower in interpersonal warmth and performance than White employees, beyond the employees’ gender, job tenure, intrinsic motivation, and supervisor rater similarity and familiarity. Consistent with expectation states theory, this racial disparity was present only with infrequent expressive regulation, but with more frequent expressive regulation, Black employees’ ratings were on par with their White counterparts. In meeting a higher standard for warmth by frequently faking and exaggerating positive expressions, Black providers reduce the racial gap in performance ratings.
In contrast, using mood regulation to improve customer interactions did little to attenuate differences in the evaluations of Black and White employees; in fact, it had little effect on warmth or performance judgments at all. We acknowledge that the internal reliability of this measure was lower than optimal, which may constrain its relationships; however, mood upregulation was positively related to intrinsic job motivation as would be expected. A theoretical explanation for this difference is that the goal of mood regulation strategies is to improve how one feels but is not as clearly linked to displays (Webb et al., 2012), and these subtler (though sincere) expressions fall short of reaching the higher standard for Blacks. We argue that the differential effect of expressive regulation and mood regulation is informative for why emotional labor mitigates racial disparity in warmth and performance. If mood regulation functioned similarly to expressive regulation, then any motivated effort by service providers to improve emotions is sufficient to increase racial equity of performance judgments. Instead, this pattern of findings suggests that to override warmth stereotypes, Black employees must frequently try to produce observable positive expressions regardless of their feelings.
Limitations
In contrast to the experimental approach of Study 1, in Study 2 we obtained multisource field data, which have strengths around external validity but weaknesses related to internal validity. Specifically, the data are cross-sectional such that our mediational model results must be interpreted with caution; it is possible that supervisors’ judgments of performance motivate employees to amplify expressions with customers. Furthermore, we cannot rule out the possibility that there are other factors explaining the racial differences or that there actually are objective performance differences between Black and White service providers. Finally, though using field data increases the likelihood that findings generalize to work contexts compared to lab studies, we note that there are certain characteristics of this sample that may hinder such generalizability. As part of this data collection by the consulting company, supervisors received training to reduce performance appraisal biases; thus, our results may be conservative since in other contexts such training is less likely and might permit stereotypes to play a greater role. This also is a context of grocery store employees, where efficiency or product knowledge are also highly important (R. I. Sutton & Rafaeli, 1988) and interpersonal warmth may be less focal than in other contexts, such as caring work or client-based services. However, this grocery store was striving to measure interpersonal connections with customers, and warmth behavior was strongly linked to performance as has been found in other service contexts (Smith et al., 2016).
General Discussion
In service jobs, low skill positions where many employees start their career trajectory, there are racial differences in performance evaluations (Hekman et al., 2010). We attempted to understand both why this occurs and what mitigates this racial disparity. We find support for stereotype fit theory (Eagly & Karau, 2002; Roberson & Block, 2001) in that (a) there is occupation-race stereotype incongruence in service jobs favoring White over Black employees and (b) Black service personnel are seen as less warm and, ultimately, as lower performing than White service personnel. Furthermore, on the basis of expectation states theory (Berger, 1974; Correll & Ridgeway, 2003), Black service providers must meet higher standards to be perceived as good performers compared to White service providers; Blacks must perform more emotional effort to be seen as equivalent to Whites in these high-warmth service roles.
Our results replicate Black-White racial differences in performance evaluations (Hekman et al., 2010; Roberson & Block, 2001) and also shed new light on this difference in three main ways: first, the racial difference in service performance evaluations is due to perceived warmth and not competence; second, more emotional labor is needed by Black employees to be evaluated similarly to White service employees; and third, these effects occur whether the rater is a first-time customer (as has been found previously) or an ongoing supervisor. We discuss these in more detail below.
First, we find that Blacks are rated lower in overall performance in service roles as a result of beliefs about interpersonal warmth. Stereotype fit theories suggest that when evaluating a Black employee, racial stereotypes are activated that are incongruent with service occupation prototypes (i.e., friendly, agreeable, courteous, responsive to others), whereas Whites as a group are perceived to match the service prototype and, therefore, tend to be rated higher (Niemann et al., 1994). We rule out the alternative explanation that there are actual race-based differences in warmth and performance by (a) experimentally manipulating service provider performance to ensure comparability in the pilot study and Study 1 and (b) controlling for job motivation and job experience in Study 2. Across studies, employee race informed judgments about both interpersonal warmth, whether measured as warmth attributes or the extent of rapport with customers, and performance, whether from a customer’s perspective of exceeded expectations or a supervisor’s perspective of productivity and promotability. Overall, we find support for a racial bias in performance evaluations that is explained by differences in perceived interpersonal warmth.
Second, consistent with a double standards perspective (Berger et al., 1985), we find that to improve evaluations, Black service providers need to exaggerate positive expressions (an easily recognizable signal of warmth), whereas White providers do not. A novel contribution of our work is that we demonstrate this effect both with intensity of emotional displays, the outcome of emotional labor (Study 1), and frequency of expressive regulation strategies, the effort behind emotional labor (Study 2). These results are inconsistent with the shifting standards view (Biernat & Kobrynowicz, 1997), which suggests that Black employees engaging in minimally good performance could more easily exceed expectations and be viewed positively compared to White employees. We also find that more subtle forms of emotional labor are not effective for Black employees. Neither being polite and pleasant (Study 1) nor effortfully upregulating positive mood (Study 2) was enough to overcome the racial disparity in warmth and performance disfavoring Blacks. Amplifying emotional expressions is more directly observable and so provides more visible evidence countering perceived discrepancies between views of the racial group and the service provider prototype (i.e., stereotype incongruence), consistent with our theoretical predictions.
Finally, we provide unique evidence that employee race and emotional labor contribute to both first-time and ongoing evaluations. Stereotypes are more likely to influence judgments when one lacks additional information; they provide a heuristic for evaluating someone in that ambiguous situation (Fiske & Neuberg, 1990; Hilton & von Hippel, 1996). In fact, prior work has argued that minority group stereotypes influence customers’ evaluations specifically because the interactions are one-time encounters and customers have low motivation to evaluate the target accurately (Hekman et al., 2010). Yet we also found that racial differences in warmth and performance exist with supervisor ratings unless Black service providers put in extra effort to frequently amplify positive expressions. This suggests occupation-race stereotype fit informs supervisor evaluations too.
Limitations and Future Research
Though we believe our use of complementary methods and triangulation to test our hypothesized model helps rule out a number of alternative explanations for our findings, there are limitations that should be acknowledged. For instance, in Study 2, we control for job enjoyment and demonstrate effects of race and expressive regulation beyond this affective state about work. However, we do not control for all other possible characteristics that predict emotional labor and performance, such as extraversion, neuroticism, and customer orientation. Importantly, Blacks and Whites do not consistently differ on these personality and behavioral tendencies (Foldes, Duehr, & Ones, 2008), making it unlikely that they would spuriously inflate the relationship between race and performance. However, future research is needed to demonstrate that race predicts performance beyond these other established predictors.
Stereotype fit has rarely been applied to understand racial differences in performance evaluations (Roberson et al., 2007); thus, our theoretical frame is novel to the literature. Yet we do not directly test occupation-racial stereotype incongruence as a mechanism; instead, we support the assumptions of occupation-racial stereotype fit in our pilot study and then infer that stereotype incongruence is occurring in our other two studies. A more direct test of occupation-racial stereotype fit would measure incongruence directly or compare other occupations/racial group stereotypes. For example, future research could compare evaluations of Black employees in contexts where racial stereotypes for low warmth or hostile attributes are congruent with occupational requirements (e.g., police interrogator, bill collector) versus incongruent. Testing our model with other racial groups with distinct stereotypes would be an important next step. For example, Asian American employees are seen as low warmth (Kunda & Spencer, 2003; Sy et al., 2010) and could also benefit from amplified expressions in high-warmth occupations.
Across the pilot study and Studies 1 and 2, we find evidence for racial bias against Black employees. We argued this is due to generally held racial stereotypes, but it is possible that certain people are more likely to endorse or agree with those stereotypes and enact those biases in evaluations and decisions. Hekman et al. (2010) found that scores on an implicit test of racial bias determined whether having a minority customer service employee was linked to service dissatisfaction. Though our pilot study did not find evidence that rater demographics determined biased judgments, we did not test the rater’s racial bias. Future research could test whether warmth and performance evaluations are more strongly linked to employee race for those who score higher on racial bias tests, though it is unclear whether emotional labor would be more necessary or less able to override implicit beliefs.
Finally, our theoretical model specifies a moderated mediation effect, such that warmth judgments explain the effect of race on performance depending on emotional labor. Though our experimental design in Study 1 permits us to say that employee race and emotional labor cause different warmth and performance judgments, in both Study 1 and Study 2, the warmth mediator is obtained at the same time and by the same source as the overall performance measure. Thus, it is not possible to rule out that overall performance judgments influence warmth ratings (i.e., halo effects). However, warmth judgments tend to be formed automatically (Fiske et al., 2007), and our model and results are consistent with other studies showing that warmth judgments predict overall performance (Smith et al., 2016). We conducted follow-up analyses, and employee race and emotional labor did not interact to predict warmth perceptions via overall performance.
Implications
The purpose of our research is to shed light on racial differences in service provider performance evaluations. We learned that emotional labor seems to be effective at diminishing these differences, which provides a practical implication for Black service providers wanting to obtain better performance evaluations. Our finding is consistent with other research on ways to overcome racial stereotypes at work. In one study, newly hired demographic minorities needed to be more extraverted (i.e., sociable and outgoing) than others to produce favorable impressions among their coworkers (Flynn, Chatman, & Spataro, 2001). In a laboratory study, Black male participants were likely to endorse positive displays as impression management strategies when they needed to counter racial stereotypes for low warmth (Neel et al., 2013). Our study extends these findings by showing the effectiveness of these strategies for overcoming stereotypes and the implications for service performance judgments.
Notably, expressive regulation may be unwise for Blacks to use consistently over time because it is fairly consistently linked with burnout, which arises from the self-regulatory effort and felt dissonance involved (Hülsheger & Schewe, 2011). This expressive regulation may make Black employees feel like they must keep up a constant façade, effort that is linked to negative personal and work outcomes (Hewlin, 2009). It is also notable that mood regulation is generally more effective at improving well-being (Hülsheger & Schewe, 2011), but this emotional labor strategy does not seem to help performance for Black employees. Thus, this finding seems to create a paradox where the emotional labor strategy that is best for Black employees’ performance may be harmful to their sense of self and well-being. A critical next step is determining how Black personnel can enact the emotional labor needed to be evaluated on par with their White counterparts without harming their health and psychological well-being.
Though emotional labor offers a strategy for low-warmth stereotyped groups to proactively improve their situation (Houston & Grandey, 2013), management and organizations can also take steps to improve the conditions in which these group members work. Customer racial biases may seem beyond an organization’s control, but managing the workplace climate could help to attenuate the impact of such biases on the bottom line. For instance, customers evaluate stores with minority employees less positively unless the diversity climate is more favorable (McKay, Avery, Liao, & Morris, 2011). Diversity climates communicate inclusion and may help to inhibit the use of stereotypes to evaluate racial minority employees.
In general, workplace emotions and diversity are burgeoning topics of interest in the organizational sciences (Ashkanasy et al., 2002). We provide a unique theoretical model to understand racial differences in performance by integrating stereotype theories with emotional labor strategies. Our findings demonstrate that Black employees have to “fake it to make it” more than their White counterparts. Though this suggests a proactive way that Black employees can overcome racial disparity in evaluations, we encourage more attention to what factors reduce the need for such extra effort and the costs such effort may have to Black employees’ well-being.
Supplemental Material
Supplemental material, JOM757019_Supplemental_material_CLN for Fake It to Make It? Emotional Labor Reduces the Racial Disparity in Service Performance Judgments by Alicia A. Grandey, Lawrence Houston III and Derek R. Avery in Journal of Management
Acknowledgements
We are grateful to Autumn Krauss for her assistance with obtaining data for Study 2 and to Larry Martinez and Kisha Jones for helpful suggestions on prior drafts of the paper.
Supplemental material for this article is available with the manuscript on the JOM website.
