Abstract
Prior research has found inconsistent effects of diversity on group performance. The present research identifies hormonal factors as a critical moderator of the diversity-performance connection. Integrating the diversity, status, and hormone literatures, we predicted that groups collectively low in testosterone, which orients individuals less toward status competitions and more toward cooperation, would excel with greater group diversity. In contrast, groups collectively high in testosterone, which is associated with a heightened status drive, would be derailed by diversity. Analysis of 74 randomly assigned groups engaged in a group decision-making exercise provided support for these hypotheses. The findings suggest that diversity is beneficial for performance, but only if group-level testosterone is low; diversity has a negative effect on performance if group-level testosterone is high. Too much collective testosterone maximizes the pains and minimizes the gains from diversity.
For decades, researchers have investigated the effects of diversity on group dynamics, but the nature of diversity’s influence on group performance remains unclear (Jehn, Northcraft, & Neale, 1999; Mannix & Neale, 2005). On the one hand, diversity often enhances group performance because the diverging perspectives of group members can lead to better decisions and more creative ideas and solutions. On the other hand, it can also hinder performance by increasing conflict between group members (see Galinsky et al., 2015, for a review).
Diversity is particularly relevant in the context of group competition. Groups can win competitions through two routes: (a) by perfecting intragroup processes, such as coordination and integration, or (b) by maximizing intergroup competitive motivation (Galinsky & Schweitzer, 2015). The present research examined the interplay between diversity and hormonal factors in determining group performance.
There is evidence that diverse groups, relative to homogeneous groups, tend to focus their attention on intragroup dynamics, often leading to greater conflict, less cohesion, and less trust across group members, all of which can undermine group performance (Kirkman, Tesluk, & Rosen, 2004; Mannix & Neale, 2005; van Knippenberg & Schippers, 2007). These findings are consistent with social identification and self-categorization theories, which suggest that diversity within a group leads group members to categorize themselves along prominent social dimensions, such as race and gender, and exaggerates the differences between group members (Tajfel & Turner, 1986). These processes can increase stereotyping (Chatman, Polzer, Barsade, & Neale, 1998), heightening group members’ sensitivity to how their behavior is perceived by other group members who differ demographically (Blascovich, Mendes, & Seery, 2002).
However, this focus on intragroup differences can also be beneficial for diverse groups, serving as a catalyst for group members to consider and incorporate the potentially diverging perspectives of demographically different group members into the group process (Galinsky et al., 2015; Phillips, Mannix, Neale, & Gruenfeld, 2004). Thus, among diverse groups, a focus on intragroup dynamics can have both positive and negative effects on group performance.
In contrast to diverse groups, homogeneous groups tend to focus their attention away from intragroup dynamics and toward intergroup goals. Consistent with social identity theory, during intergroup competition, groups are generally motivated to achieve higher social standing relative to other groups, which drives group members to sacrifice individual gains in an effort to accomplish the group goal of outcompeting other groups (Hogg & Terry, 2000; Tajfel & Turner, 1986). This focus on outcompeting other groups can enhance group performance, especially when the competition is intense (Cox, Lobel, & McLeod, 1991; Murray, 1989). However, this intergroup focus could impair performance by increasing conformity pressures and keeping different perspectives from emerging within the group. Homogeneous groups are particularly susceptible to conformity pressures because homogeneity can motivate a need for cohesion. For example, homogeneity can increase group members’ propensity to conform to clearly inferior decisions (Gaither, Apfelbaum, Birnbaum, Babbitt, & Sommers, 2017). Furthermore, relative to diverse groups, homogeneous groups can be less accurate in information processing and can lack objectivity in decision making, in part because they avoid disagreement (Phillips & Apfelbaum, 2012; Sommers, 2006).
Taken together, diversity and homogeneity can each be helpful and harmful to group performance. Diverse groups have the potential to capitalize on novel perspectives but are prone to conflict; thus, they may lack the intragroup cohesion necessary to take advantage of the diverse perspectives offered. Homogeneity solves the conflict problem but makes groups susceptible to conformity pressures that can negatively influence group performance. We help reconcile these contradictory findings by examining a critical and overlooked factor in determining whether diversity and homogeneity hurt or help group performance: the hormonal makeup of group members.
Testosterone, a steroid hormone released as the end product of the hypothalamic-pituitary-gonadal axis, is associated with greater motivation to attain status and thus is particularly relevant in competitive contexts (Mazur & Booth, 1998). High-testosterone individuals tend to outperform other individuals in competition, exhibiting dominance-related behaviors (Coates & Herbert, 2008; Mazur & Booth, 1998). Yet in the context of groups, too much testosterone can hinder performance by creating intragroup status conflict (Mehta, Lawless DesJardins, van Vugt, & Josephs, 2017; Ronay, Greenaway, Anicich, & Galinsky, 2012). In contrast, low testosterone increases the motivation to cooperate and decreases status striving (Josephs, Sellers, Newman, & Mehta, 2006; Mehta, Wuehrmann, & Josephs, 2009; Wright et al., 2012). As a result, people with low testosterone perform especially well in settings that incentivize cooperation, but they perform poorly in settings in which the focus is on outcompeting other people.
Building on these separate lines of research on diversity, status, and hormones, we proposed that the effect of diversity on performance would depend on a group’s collective testosterone levels. According to our theoretical model of hormone-diversity fit (Fig. 1), groups collectively high in testosterone will perform optimally when group diversity is low because the lack of diversity will allow these groups to focus their competitive attention on intergroup status dynamics (i.e., the motivation to outcompete other groups), while their status drive will also prevent conformity pressures. In contrast, we propose that groups collectively high in testosterone will perform poorly when group diversity is high because diversity will lead these groups to focus their attention on intragroup status dynamics (i.e., the motivation to outcompete other individuals within the group), leading to heightened conflict among group members. We propose that groups collectively low in testosterone (see top row of Fig. 1) will perform optimally when diversity is high because their cooperative focus will create the cohesion often missing from diverse groups. To summarize, our theory of hormone-diversity fit proposes that diversity will boost performance among groups collectively low in testosterone but harm performance among groups collectively high in testosterone.

Fig. 1. Theoretical model of hormone-diversity fit.
The present research provided an initial test of our theory of hormone-diversity fit. Our study was designed to test the primary phenomenon that the model proposes, which is an interaction between collective hormone levels and diversity in determining group performance. However, we leave an investigation of the processes outlined in our model for follow-up research. We examined our hypothesis that group-level testosterone moderates the effect of diversity on group performance by randomly assigning individuals to groups and using a statistical methodology that takes into consideration diversity on multiple categories of difference across group members. Specifically, rather than purely measuring one dimension of group member diversity (e.g., ethnicity), we employed a faultline framework (Lau & Murnighan, 1998; Zanutto, Bezrukova, & Jehn, 2011) that examines the interaction of multiple attributes of group members and its effect on group performance while taking into consideration the collective hormonal profile of group members.
Method
Participants were 370 master of business administration students (age: M = 27.5 years, SD = 1.93; 64.1% male, 35.9% female) enrolled in both a leadership and an operations management course at Columbia Business School. The sample size was determined by the overall size of the class and the willingness of students to participate. The ethnic composition of our sample was diverse: 54.9% White, 16.5% Asian, 10.8% Hispanic, 9.5% South Asian, 4.6% Black, 1.4% South East Asian, and 2.4% other. Participants were randomly assigned to 74 groups that ranged in size from three to six people. All procedures were approved by the Columbia University Institutional Review Board. The data and analysis syntax for R Version 3.3.0 (R Core Team, 2008) are provided on the Open Science Framework (https://osf.io/8eqtc).
One week prior to engaging in the group decision-making exercise, participants provided a saliva sample, later assayed for testosterone (Salimetrics, Carlsbad, CA). Average intra- and interassay coefficients of variation were 2.5% and 5.6%, respectively. Testosterone values were log-transformed prior to analysis and centered around the grand mean. Unbiased mean levels of testosterone were calculated for each group (Croon & van Veldhoven, 2007). We chose unbiased mean levels of testosterone to capture collective hormonal profiles because the average can be considered the central tendency of normally distributed variables. We also wanted to capture the testosterone levels of all group members, which we were best able to do by examining the group mean. However, we also conducted exploratory analyses using testosterone standard deviation, minimum, and maximum.
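The preprocessing described above can be illustrated with a minimal sketch. It assumes the simplest single-predictor form of an adjusted group mean, in which each observed group mean is shrunk toward the grand mean by a reliability weight, with variance components estimated by one-way ANOVA. The function name, the weighting formula as written here, and the testosterone values are illustrative assumptions; this is not the code of the MicroMacroMultilevel package or the authors' analysis script.

```python
import math
from statistics import mean

def adjusted_group_means(groups):
    """Shrink each observed group mean toward the grand mean by its reliability.

    groups: list of lists of individual scores (one inner list per group).
    Returns one adjusted mean per group. Illustrative only.
    """
    all_scores = [x for g in groups for x in g]
    n_total, n_groups = len(all_scores), len(groups)
    grand = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # One-way ANOVA variance components (method of moments).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ms_within = ss_within / (n_total - n_groups)
    ms_between = ss_between / (n_groups - 1)
    # Effective group size for unbalanced designs.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    tau2 = max((ms_between - ms_within) / n0, 0.0)  # between-group variance

    adjusted = []
    for g, m in zip(groups, group_means):
        denom = tau2 + ms_within / len(g)
        w = tau2 / denom if denom > 0 else 0.0  # larger groups get more weight
        adjusted.append((1 - w) * grand + w * m)
    return adjusted

# Hypothetical raw testosterone values for three groups of unequal size.
raw = [[60.0, 75.0, 90.0],
       [120.0, 150.0, 110.0, 140.0],
       [80.0, 95.0, 70.0, 85.0, 100.0]]
logged = [[math.log(x) for x in g] for g in raw]          # log-transform
grand_mean = mean(x for g in logged for x in g)
centered = [[x - grand_mean for x in g] for g in logged]  # grand-mean center
adj = adjusted_group_means(centered)
```

Because the weight depends on group size, smaller groups are pulled more strongly toward the grand mean than larger ones, which matters here because the groups ranged from three to six members.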
Diversity was computed using group faultline analysis, which examines how group members differ across multiple attributes (Lau & Murnighan, 1998; Zanutto et al., 2011). Faultline analysis often offers more explanatory power than examining single-issue demographic characteristics (Lau & Murnighan, 2005). To illustrate our faultline approach to computing diversity, Table 1 highlights the degree of diversity of five groups and categorizes these groups by high and low diversity. For instance, the group in our sample with the lowest diversity was a five-person group consisting of three White men from the United States and two White women, one of whom was from the United States and the other from Eastern Europe (see Table 1, Group 1). This group is the least diverse with regard to ethnicity, gender, and country of origin relative to other groups. The group with the greatest diversity was a six-person group consisting of four White men (two from the United States, one from Europe, and another from Eurasia), one Hispanic man from yet another country, and one White woman, whose country also differed from that of the five men (see Table 1, Group 4). This group can therefore be considered very diverse.
Table 1. Examples of Groups With Low Diversity and High Diversity
Note: Diversity was calculated using the faultline (Fau) approach, which focuses on the number of demographic characteristics that are aligned in the group (denoted as “align”) and the possible ways in which the group can be divided on the basis of these demographic characteristics (denoted as “ways”), with the number of characteristics per group fixed at three (ethnicity, gender, and country of origin). We classified diversity on the basis of the maximum number of aligned characteristics: high = 0 or 1 aligned characteristics, low = 2 or more aligned characteristics.
The three demographic characteristics used in this study to calculate diversity using the faultline approach were ethnicity, gender, and country of origin. For ethnicity, 2.7% of groups were mono-ethnic, 23.0% had two ethnicities, 45.9% had three ethnicities, 27.0% had four ethnicities, and 1.4% had five different ethnicities. With regard to gender, 1.4% of the groups had no women, 18.9% of the groups had one woman, and the remaining 79.7% had two or more women. Finally, for country of origin, 6.8% of groups represented five countries, 18.9% of groups represented four countries, 54.1% of groups represented three countries, 18.9% of groups represented two countries, and 1.4% of groups were all from the same country. Diversity was calculated with the equation below (Zanutto et al., 2011) using the asw.cluster package for R (Meyer & Glenz, 2013). According to Zanutto and colleagues (2011), “the first step is to calculate
where x_ijk is the value of the jth characteristic of the ith member of subgroup k,
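As a rough illustration of this kind of calculation, the sketch below computes a faultline strength score as the ratio of between-subgroup to total sum of squares, maximized over every split of a group into two subgroups. This ratio form, the plain one-hot coding of categorical attributes, and the function names are assumptions made for illustration; the asw.cluster package may code and rescale attributes differently, so the values will not necessarily match those in Table 1.

```python
from itertools import combinations

def dummy_code(members):
    """One-hot encode tuples of categorical attributes into numeric rows."""
    n_attrs = len(members[0])
    levels = [sorted({m[j] for m in members}) for j in range(n_attrs)]
    return [
        [1.0 if m[j] == lev else 0.0 for j in range(n_attrs) for lev in levels[j]]
        for m in members
    ]

def fau(members):
    """Largest between-subgroup SS / total SS over all two-subgroup splits."""
    rows = dummy_code(members)
    n, p = len(rows), len(rows[0])
    overall = [sum(r[j] for r in rows) / n for j in range(p)]
    total_ss = sum((r[j] - overall[j]) ** 2 for r in rows for j in range(p))
    if total_ss == 0:
        return 0.0  # all members identical, so there is nothing to split on
    best = 0.0
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            a = [rows[i] for i in subset]
            b = [rows[i] for i in range(n) if i not in subset]
            between = 0.0
            for sub in (a, b):
                for j in range(p):
                    sub_mean = sum(r[j] for r in sub) / len(sub)
                    between += len(sub) * (sub_mean - overall[j]) ** 2
            best = max(best, between / total_ss)
    return best

# Hypothetical groups: (ethnicity, gender, country of origin) per member.
aligned = [("White", "M", "US"), ("White", "M", "US"),
           ("Asian", "F", "China"), ("Asian", "F", "China")]
crosscut = [("White", "M", "US"), ("White", "F", "China"),
            ("Asian", "M", "China"), ("Asian", "F", "US")]
fau_aligned = fau(aligned)    # all three attributes line up on one split
fau_crosscut = fau(crosscut)  # every attribute cuts across the others
```

In the first hypothetical group all three attributes divide the members the same way, so the score is at its maximum; in the second, no split aligns more than one attribute, so the score is lower. Under the coding used in this article, a higher score therefore corresponds to lower diversity.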
Groups engaged in an interdependent week-long computerized decision-making exercise (Littlefield Labs, Responsive Learning Technologies, Los Altos, CA) simulating the supply chain process of blood-testing laboratories. Group members played the role of employees at a blood-testing laboratory who were responsible for managing several aspects of the lab, with the goal of maximizing performance relative to other groups in the class. Each group had the responsibility of managing one laboratory outside of class time over 7 days. On average, groups spent 20 to 30 hr on the group decision-making task over the course of the 7 days. The task was interdependent because groups were encouraged to involve all group members in both developing and executing a strategy that would maximize the performance of the laboratory. To this end, groups made decisions together, either in person or via e-mail, and would decide which group member would physically execute the strategy (i.e., by logging into the simulation platform and implementing the chosen strategy) on a given day. In most cases, the responsibility for physically executing the strategy rotated across group members. Importantly, no strategic decisions were made unilaterally, without collective agreement across group members.
Group performance on Day 7 of the simulation (simulating 315 days of laboratory operations) was our key dependent variable. We selected performance on Day 7 as the key dependent variable because we wanted to understand the interplay of diversity and testosterone on the outcome that ultimately determined group status; groups were competing to win the exercise as determined by their Day 7 performance, which had implications for their grades and status in the class. However, for 52 of the 74 groups, we also captured performance on Day 5 of the exercise (simulating 170 days of laboratory operations), which allowed us to conduct exploratory analyses to examine the stability of our predicted effect (see Table S4 in the Supplemental Material available online). Group performance was a composite of the following measures: profitability, number of contracts, number of reorders on existing contracts, and group rank relative to other groups. These measures were standardized and then averaged to create the aggregated group performance metric (α = .86).
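The construction of a composite like this can be sketched as follows, assuming each measure is z-scored across groups, the z-scores are averaged within each group, and internal consistency is summarized with Cronbach's alpha. The scores are made up, and reversing the sign of rank so that higher values mean better performance is an assumption; the article does not state how rank was scored.

```python
from statistics import mean, pstdev, pvariance

def zscores(values):
    """Standardize one measure across groups."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def cronbach_alpha(items):
    """items: list of k lists, each holding one measure's scores across groups."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / pvariance(totals))

# Hypothetical scores for five groups on the four measures.
profit = [1.2, 0.8, 1.5, 0.4, 1.0]
contracts = [3, 2, 4, 1, 3]
reorders = [5, 3, 6, 2, 4]
rank = [2, 4, 1, 5, 3]              # 1 = best group
rank_reversed = [-r for r in rank]  # assumption: flip so higher = better

items = [zscores(m) for m in (profit, contracts, reorders, rank_reversed)]
composite = [mean(scores) for scores in zip(*items)]  # one value per group
alpha = cronbach_alpha(items)
```

Because every measure is standardized before averaging, the composite has a mean of zero across groups, and a group's score reflects its standing relative to the other groups rather than any raw unit.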
Results
We conducted a micro-macro multilevel analysis (Croon & van Veldhoven, 2007) that modeled group performance as a function of an unbiased group mean for testosterone, group diversity, and the interaction between group diversity and group testosterone. Groups differed in the time of day of saliva collection and in size; however, neither of these variables moderated our effects, so we included them as covariates. We also controlled for the percentage of women in each group given that testosterone levels differ reliably between men and women. All predictors were mean-centered prior to analysis.
We had nested data (i.e., individuals nested within groups), for which multilevel modeling is a proper analysis because it accounts for the dependence of individuals within the same group. However, multilevel modeling is traditionally used to model dependent variables at the individual level, whereas our dependent variable, group performance, was measured at the group level. We therefore employed the micro-macro multilevel modeling method (Croon & van Veldhoven, 2007), which we implemented using the MicroMacroMultilevel package (Lu, Page-Gould, & Xu, 2017) in R Version 3.3.3 (R Core Team, 2008). The micro-macro method treats group-level testosterone as a latent variable, of which the individual testosterone values are assumed to be manifestations. After the unbiased means are estimated, they can be used in a linear regression with other group-level variables. If groups are different sizes, as our groups were, the micro-macro method also requires that the standard errors of the slopes are corrected in the final linear regression. In addition, we estimated effect size by converting the slope statistics into partial R² values (Edwards, Muller, Wolfinger, Qaqish, & Schabenberger, 2008).
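For a test of a single slope, a conversion of this kind reduces to a simple function of the t statistic and its degrees of freedom. The sketch below assumes that one-degree-of-freedom simplification, partial R² = t² / (t² + df), rather than reproducing the authors' syntax. Applied to the test statistics reported in this section, it returns the reported effect sizes.

```python
def partial_r2(t, df):
    """Partial R^2 for a single slope: t^2 / (t^2 + residual df)."""
    return t ** 2 / (t ** 2 + df)

# t statistics and degrees of freedom as reported in the Results.
r2_interaction = partial_r2(6.14, 67)     # Testosterone x Diversity interaction
r2_low_diversity = partial_r2(3.95, 67)   # simple slope, low diversity
r2_high_diversity = partial_r2(-3.21, 67) # simple slope, high diversity
```

Rounded to two decimals, these values are .36, .19, and .13.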
As predicted, the interaction between group testosterone and group diversity was significant, b = 19.75, SE = 3.22, t(67) = 6.14, p < .01, R² = .36 (see Table 2). Consistent with our hypothesis, when group diversity was low (Fau score was 1 SD above the mean), group testosterone significantly positively predicted performance, b = 1.79, SE = 0.45, t(67) = 3.95, p < .01, R² = .19 (Fig. 2). That is, groups that were collectively high in testosterone outperformed groups collectively low in testosterone when group members had greater alignment in ethnicity, gender, and country of origin. However, when group diversity was relatively high (Fau score was 1 SD below the mean), group testosterone significantly negatively predicted performance, b = −1.77, SE = 0.55, t(67) = −3.21, p < .01, R² = .13 (Fig. 2).
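The relation between the interaction coefficient and the two simple slopes can be worked through as back-of-envelope arithmetic. With mean-centered predictors, the slope of testosterone at a given Fau value is the testosterone main effect plus the interaction coefficient times the centered Fau value. The sketch below assumes that standard simple-slopes formula and back-calculates two quantities from the rounded coefficients reported above: the standard deviation of the Fau score and the testosterone main effect. Both are inferences for illustration, not values taken from Table 2.

```python
# Coefficients as reported in the text (rounded to two decimals).
b_interaction = 19.75
slope_low_div = 1.79    # testosterone slope at Fau = mean + 1 SD
slope_high_div = -1.77  # testosterone slope at Fau = mean - 1 SD

# The two slopes differ by 2 * SD * b_interaction and average to the main effect.
implied_sd = (slope_low_div - slope_high_div) / (2 * b_interaction)
implied_main_effect = (slope_low_div + slope_high_div) / 2

def simple_slope(fau_centered):
    """Testosterone slope at a given mean-centered Fau value."""
    return implied_main_effect + b_interaction * fau_centered
```

Under these assumptions the implied standard deviation of the Fau score is roughly 0.09 and the implied main effect is close to zero. This helps explain why an interaction coefficient as large as 19.75 corresponds to simple slopes of under 2 in absolute value, and why testosterone would show little effect at average diversity.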
Table 2. Results of the Multilevel Model Predicting Group Performance
Note: N = 74 groups for final performance measured on Day 7. Diversity was calculated using faultline analysis (Zanutto, Bezrukova, & Jehn, 2011). Higher numbers denote lower diversity in the group because the group has many characteristics that are aligned.

Fig. 2. Group performance in the decision-making exercise as a function of group testosterone and group diversity. Group performance was an aggregate of four items, which were standardized and averaged. The shaded areas indicate 95% confidence intervals.
In other words, groups that were collectively low in testosterone outperformed groups collectively high in testosterone when group members were less aligned with regard to ethnicity, gender, and country of origin. Importantly, we observed no significant effects when examining the interaction between testosterone and ethnicity alone, b = −1.34, SE = 0.69, t(67) = −1.94, p = .06, R² = .05; gender alone, b = 9.53, SE = 6.19, t(68) = 1.54, p = .13, R² = .03; or country of origin alone, b = −0.91, SE = 0.86, t(67) = −1.06, p = .29, R² = .02. These findings are consistent with research demonstrating that faultline analysis can have more explanatory power than single-issue demographic characteristics (Lau & Murnighan, 2005).
Furthermore, to ensure that we properly controlled for gender, we also ran our analyses with log testosterone values standardized within gender as our testosterone measure. We observed the same pattern of results: The interaction between group testosterone and group diversity was significant, b = 8.82, SE = 2.18, t(67) = 4.05, p < .01, R² = .20 (see Table S3 in the Supplemental Material). The same analysis without controlling for the percentage of women in each group yielded a similarly significant interaction between group testosterone and group diversity, b = 9.27, SE = 2.21, t(68) = 4.19, p < .01, R² = .20. In addition, we calculated a diversity score, removing gender and including only ethnicity and nationality. Again, we observed a significant interaction between group testosterone and group diversity (excluding gender), b = 8.77, SE = 2.19, t(67) = 4.01, p < .01, R² = .19. We reran this same analysis, controlling for the Diversity (excluding gender) × Percentage of Women in each group interaction, and the interaction between group testosterone and group diversity (excluding gender) remained significant, b = 7.96, SE = 2.18, t(66) = 3.64, p < .01, R² = .17 (see Table S4). Taken together, these results demonstrate the robustness of our effect when taking gender into account in multiple ways.
We also repeated our primary analysis using testosterone standard deviation, minimum, and maximum in our model. The interaction between group testosterone standard deviation and group diversity was not significant, b = −2.06, SE = 6.20, t(67) = −0.33, p = .74, R² < .01. However, we did observe a significant interaction using group minimum testosterone, b = 6.99, SE = 3.16, t(67) = 2.21, p = .03, R² = .07, and group maximum testosterone, b = 10.79, SE = 3.44, t(67) = 3.13, p < .01, R² = .13. Importantly, when we included unbiased average group levels of testosterone in our model, as well as minimum and maximum testosterone levels and their interactions with diversity, only the interaction between mean group levels of testosterone and diversity remained a reliable predictor of group performance (see Table S2 in the Supplemental Material). Furthermore, a Bayesian model comparison (Raftery, 1995; for details, see the Supplemental Material) suggested that there was strong evidence for using the unbiased mean of testosterone over the alternate quantifications tested.
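One common way to carry out a comparison of this kind is through the Bayesian information criterion (BIC), where half the BIC difference between two models approximates the log Bayes factor. The sketch below assumes that BIC-based approach and uses the verbal evidence grades associated with Raftery (1995); whether the authors' comparison took exactly this form is an assumption, and the BIC values are hypothetical.

```python
import math

def approx_bayes_factor(bic_reference, bic_alternative):
    """Approximate Bayes factor in favor of the reference model."""
    return math.exp((bic_alternative - bic_reference) / 2)

def evidence_grade(delta_bic):
    """Verbal label for a BIC difference favoring the reference model."""
    if delta_bic <= 0:
        return "none"
    if delta_bic <= 2:
        return "weak"
    if delta_bic <= 6:
        return "positive"
    if delta_bic <= 10:
        return "strong"
    return "very strong"

# Hypothetical BIC values, not taken from the article (lower = preferred).
bic_mean_model = 200.0  # model using the unbiased group mean
bic_max_model = 207.5   # model using the group maximum
delta = bic_max_model - bic_mean_model
bf = approx_bayes_factor(bic_mean_model, bic_max_model)
grade = evidence_grade(delta)
```

With these made-up values, a BIC difference of 7.5 corresponds to an approximate Bayes factor of about 42 in favor of the mean-based model, which falls in the "strong" range.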
Discussion
Our findings provide preliminary support for our theoretical model of hormone-diversity fit presented in Figure 1. We demonstrated that groups collectively high in testosterone perform optimally when group diversity is relatively low. Low diversity may allow high-testosterone groups to focus their status attainment motivations toward outcompeting other groups, facilitating overall group performance. In contrast, high diversity may lead groups collectively high in testosterone to focus their status attainment motives toward outcompeting other individuals within the group, creating intragroup conflict that undermines group performance.
Conversely, we also found that groups collectively low in testosterone performed better when diversity was high. Groups low in collective testosterone may experience greater intragroup cohesion as a result of the motive to cooperate (Josephs et al., 2006; Mehta et al., 2009; Wright et al., 2012). Thus, when diversity is high, the dissimilar identities among group members may allow the group to focus attention on cooperative intragroup processes, leading to greater intragroup cohesion and better group performance. This finding is aligned with studies demonstrating that the disruptive effects of diversity can be eliminated when members of diverse groups focus on collective goals, for instance, by having a culture that emphasizes collectivism, or when the task requires interdependence (Chatman, Sherman, & Doerr, 2015; Jehn et al., 1999). Importantly, our study design included random assignment of individuals to groups, making it clear that our results are not due to self-sorting into groups (e.g., based on diversity dimensions). Furthermore, the moderating effect of collective testosterone on the diversity-performance relationship could not be explained by gender differences in testosterone levels; our results remained robust using multiple ways to account for gender.
Notably, we found similar effects using testosterone minimum and maximum, but these effects were no longer significant when we included mean testosterone levels in the model. However, because mean testosterone was significantly correlated with minimum and maximum testosterone (see the Supplemental Material), these findings suggest that these three different quantifications of collective hormonal profiles likely reflect similar psychological processes at play in groups. Because we did not include any intra- or intergroup process variables in this study, future research can build on these findings and our theorizing by incorporating process measures to more directly test the predictions highlighted in our model of hormone-diversity fit. Specifically, process measures that capture group cohesion and cooperation would seem especially relevant because cohesion and cooperation can mitigate the negative effects of diversity on group performance and can enhance performance in homogeneous groups (Chatman et al., 2015; Jehn et al., 1999).
In addition, future research should focus on the process by which group-level testosterone and diversity affect performance over time by examining the effect of multiple days of performance on group decision-making tasks. Although our finding that time of performance (examining both Days 5 and 7) did not moderate our effects suggests that performance may have been stable toward the end of the task (see the Supplemental Material), it is possible that group performance shifted over the course of the week. Our theoretical model predicts that groups collectively low in testosterone but high in diversity perform well because their cooperative focus creates cohesion. Because it can take time for groups to become cohesive (Jehn et al., 1999; Watson, Kumar, & Michaelsen, 1993), it is possible that these groups performed poorly at the beginning of the week but gained momentum, outperforming other groups as the week progressed. Conversely, our theory would predict that high-testosterone, high-diversity groups may have performed well at the beginning of the week because of status attainment motivations but may have experienced decrements in performance over the course of the week because of intragroup competition stemming from diversity. Further exploration of these potential effects of performance time is an important avenue for future research.
Our research also demonstrates that the configuration of group members’ characteristics along multiple attributes can be an even stronger determinant of group performance than are individual characteristics alone. Diversity is not a unitary construct but, rather, an intersection of identities (Gopaldas, 2013). By incorporating this intersectionality perspective into research on diversity, we contribute to theory by considering the impact of different social category configurations on group performance.
In sum, by demonstrating that collective hormonal profiles implicated in status attainment and cooperation motivations moderate the effect of diversity on group performance, we open up new avenues for research on biological factors that help explain how configurations of diversity can differentially impact group performance. At the same time, we acknowledge that the current research provides only initial support for the proposed model of hormone-diversity fit. We encourage replications and new studies that explore group process-related mechanisms.
Supplemental Material
AkinolaOpenPracticesDisclosure and AkinolaSupplementalMaterial: supplemental material for Hormone-Diversity Fit: Collective Testosterone Moderates the Effect of Diversity on Group Performance by Modupe Akinola, Elizabeth Page-Gould, Pranjal H. Mehta, and Zaijia Liu in Psychological Science
Footnotes
Acknowledgements
We thank A. D. Galinsky and W. B. Mendes for their valuable feedback.
Action Editor
Bill von Hippel served as action editor for this article.
Author Contributions
M. Akinola designed the study and collected the data. E. Page-Gould analyzed the data. M. Akinola, Z. Liu, P. H. Mehta, and E. Page-Gould wrote the manuscript. All the authors approved the final manuscript for publication.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This research was funded in part by the Eugene M. Lang Fund (grant to M. Akinola), the Canada Research Chairs program, the Social Sciences and Humanities Research Council of Canada, and an Early Researcher Award from the Ontario Ministry of Research and Innovation (to E. Page-Gould). P. H. Mehta was supported by National Science Foundation Grant No. 1451848.
Open Practices
All data have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/8eqtc. The complete Open Practices Disclosure for this article can be found at https://journals-sagepub-com.web.bisu.edu.cn/doi/suppl/10.1177/0956797617744282. This article has received the badge for Open Data.
