Abstract
In light of renewed debate regarding publication rigor and ethics, this commentary raises questions about the subjectivity of the peer review process. We argue that the same biases organizational scientists study in their own research (such as confirmation bias, negativity bias, anchoring and adjustment, overconfidence bias, and social dynamics) may infect the scholarship process. In addition to these general phenomena, we examine subtle biases that may be unique to or exacerbated within diversity management scholarship. We describe the theoretical basis of such biases and offer preliminary evidence of their nuanced manifestations before outlining suggestions for their reduction.
“It would probably also be beneficial to find one or two male biologists to work with (or at least obtain internal peer review from, but better yet as active co-authors), in order to serve as a possible check against interpretations that may sometimes be drifting too far away from empirical evidence into ideologically biased assumptions.”
The public and scientific communities are engaging in important conversations about the fabrication (Singal, 2015) and reproducibility (Nosek et al., 2015) of findings in the scientific literature, and these conversations must continue. Overlooked in the current debate, though discussed in prior decades, is a critical gatekeeping process that is vulnerable to systematic problems: peer review. Rather than being scrutinized, the process by which scientific experts evaluate scholarship before its publication is sometimes heralded as a hallmark of good science.
The overarching purpose of this commentary is to revisit the implicit assumption that peer reviews offer impartial or objective evaluations of science. In so doing, we hope to ensure that the peer review process is part of emerging discussions about ethics in publishing. In addition, we point to a specific subdiscipline within science exemplified in the opening quote—the study of diversity in organizations—that may be hampered by subtle, yet systematic, subjectivity in the peer review process. Finally, we offer initial ideas about how to limit bias and strengthen the peer review system.
Subtle Bias in the Review Process
Let’s start with the proverbial “elephant in the room” concerning this commentary: we, the authors of this commentary on peer review biases, are ourselves biased peer reviewers and editors. We are not immune to the cognitive and social processes that influence judgments. And let’s be clear: neither are you.
Cognitive biases, which may serve important self-regulatory functions given people’s limited cognitive capacity, can nonetheless interfere with accurate attention, memory, and judgment. At least five fundamental biases are directly relevant to the peer review process. First, confirmation bias is the tendency to look for, notice, and remember information that is consistent with one’s expectations (Plous, 1993). Whatever people (or, in this case, reviewers) believe about a particular phenomenon (e.g., “personality is related to job performance,” “emotional intelligence is a key competency,” or “performance management systems are inappropriately complex”), they will tend to favor arguments, data, and manuscripts that confirm these beliefs, sometimes irrespective of the actual quality of the scientific design or empirical data.
Second, negativity bias, the tendency to notice, recall, and process negative information more than positive information, can also infect reviews. Reviewers approach manuscripts, consciously or unconsciously, through a critical lens; one of their primary tasks is to identify problems. As such, a manuscript with several weaknesses as well as several strengths may be more likely to be rejected than one with few weaknesses but also few strengths. This bias, then, may be part of what drives a general tendency toward conservatism in science (see Pfeffer, 2007) and certainly makes it more difficult for a manuscript’s strengths to outweigh its weaknesses.
Third, anchoring and adjustment describes people’s tendency to fixate on one factor as an anchor for their judgments (e.g., Tversky & Kahneman, 1974). In the review process, reviewers may unwittingly use information gleaned early in their reading (e.g., mention of a particular theory, a grammatical mistake at the beginning of a manuscript) as a lens through which they view the rest of the paper. Epley and Gilovich (2006) argue that adjustments away from such anchors often prove insufficient, particularly because people are often not motivated, willing, or able to search for maximal accuracy. In conjunction with negativity bias, a reviewer is likely to hold onto weaknesses identified early in a paper and base ratings and comments on these initial judgments. In conjunction with confirmation bias, a reviewer who does not concur with some of the tenets of diversity research may regard reliance on these tenets as a weakness and, again, base ratings on these initial judgments.
A fourth, particularly pernicious issue in the review process may be overconfidence (see Moore & Healy, 2008). People tend to overestimate their understanding and knowledge, which leads to misplaced certainty (Hastie & Dawes, 2010). In the review process, this effect might manifest in reviewers’ misguided beliefs about such things as their reading of the literature (similar to a false consensus effect) or the most appropriate statistical technique. Their critiques, then, may be less accurate than their confidence suggests.
Fifth, social processes can also influence reviewers’ comments and evaluations. For example, reviewers may conform socially, adjusting their own evaluations in later rounds of the review cycle after reading other reviewers’ critiques (as in a revise-and-resubmit letter). As another example, reviewers could engage in social loafing, assuming that other readers will identify problems in the methods or missed areas of scholarship. As human beings, reviewers are susceptible to each of these cognitive and social biases, which can produce critiques that seem objective but are in fact steeped in subtle, subjective biases.
Subtle Bias in the Diversity Scholarship Review Process
Scholarly consensus suggests that it is not overt, conscious biases but rather covert, subtle biases that influence a range of individual and organizational outcomes (see Dipboye & Colella, 2005). We suggest that subtle biases may be particularly prevalent in reviews of diversity scholarship. We acknowledge, however, that this observation may in and of itself be a function of a self-serving bias. That is, as diversity scholars ourselves, we are primarily exposed to and aware of scrutiny specific to this topic. Moreover, making attributions of negative feedback to reviewers’ biases (rather than the quality of our scholarship) can serve a self-protective function.
It is certainly possible that scholars in other areas would make similar arguments for similar reasons, in which case this section of the paper could be read as an example of the norm rather than an anomaly. Yet we point to theory and evidence suggesting that diversity scholarship may be uniquely subject to particularly problematic forms of systematic subjectivity. We also describe examples from our own experience to illustrate these possibilities. Our intent in using these examples is not to vent or accuse but to depict how such biases, including identity threats and status legitimizing ideologies, might appear.
Subtle Biases Against Diversity Scholarship
One such bias stems from the fact that diversity research is, by nature, often identity threatening in a way that other forms of research are not. This bias may emerge in reaction to (a) the content of diversity scholarship or (b) the social identity characteristics of its authors. With regard to the former, scholarship suggesting that organizations can, should, or even must consider diversifying their personnel is likely to evoke unique concerns among those whose identity groups have a history of overrepresentation (e.g., White men in the United States). From a social identity perspective, in- and out-group classifications could become particularly salient when a reviewer encounters this type of research, and if the reviewer (or editor) is a member of the demographic majority, it is conceivable that the paper may be viewed as advancing the cause of the out-group at the potential expense of the in-group. To the extent that this occurs, one would anticipate hypercriticism of the paper as a psychological defense against the perceived identity threat. This is a prototypical example of a derogation response to identity threat (Petriglieri, 2011).
With regard to the latter, reviewers and editors may respond to articles in which an author’s gender and/or race is identifiable (or presumed) with heightened reactions that can be discriminatory. For instance, in one of our recent projects (Trump-Steele, Nittrouer, & Hebl, 2016), we examined men’s and women’s reactions to articles about gender equity written by either male or female authors. Men responded more negatively (by discrediting the author and article) when the article was authored by a woman rather than a man. Was the woman perceived as acting in her own interest, as a member of an underrepresented group arguing on the group’s behalf? Was she simply seen as less competent because of societal norms, stereotypes, and prejudice? We cannot say, but these results suggest a bias based on the demographic characteristics of the authors themselves. Furthermore, despite blind review procedures, reviewers seem to make assumptions about authors’ identities; the opening quote of this paper, in which a reviewer presumed that a paper about gender had been written by female scholars, is one example.
Status Legitimizing Ideologies
Research examining diversity-relevant phenomena also stands to threaten the belief systems of some reviewers. For instance, if a paper examines demographic differences to shed light on how a certain identity group may be disadvantaged relative to another in the workplace, this may threaten the belief systems of reviewers belonging to the advantaged group. Establishing that one group enjoys an advantage relative to another indicates a form of privilege that potentially calls into question the legitimacy of the advantaged group’s relatively higher status (Nkomo & Ariss, 2014). It is no coincidence that members of advantaged groups tend to perceive the world (and the inequalities in it) as fairer (i.e., have greater belief in a just world; Hunt, 2000). Doing so provides both a means to cognitively justify the advantaged position they enjoy and a sense of identity affirmation that accompanies seeing one’s own group as somehow superior to another. Research that challenges the legitimacy of an unbalanced status quo by demonstrating the existence or validity of alternative explanations threatens such a worldview and is likely to engender heightened resistance to and scrutiny of the threatening ideas.
As illustrative examples, each of us has authored work on various forms of discrimination (e.g., gender, pregnancy, racioethnic, sexual orientation) that we felt elicited reviewer responses minimizing or denying the existence or importance of prejudice. In one case, a reviewer called the population of pregnant women “too small” to be of scholarly interest, despite statistics suggesting that as many as 90% of women are working when they become pregnant. The reviewer may have been focusing on the likelihood that a particular unit would include a pregnant worker at a particular point in time and thus unconsciously overlooked the overall base rate. This can be contrasted with papers in top journals that focus on objectively smaller populations, such as workers in a single organization, a limited industry, or CEOs of Fortune 500 companies. When studying racioethnicity, we have on the one hand been criticized for small subgroup sample sizes when our data were collected randomly (even though the proportions were representative of the larger U.S. population) and on the other hand chastised about threats to external validity when we oversampled minorities. It is interesting to note the resemblance between these experiences and one described by a diversity scholar who participated in Cox’s work a quarter century ago: “People raise questions about the reality of findings that refer to conditions that they want to deny. An example is racism” (1990: 18). Perhaps the persistence of such experiences is what led the journal that published that work to reprint it as one of its most influential articles 15 years after it first appeared.
This denial of discrimination may arise from a confluence of related ideologies, including beliefs in a just world and system justification. These belief systems sustain preferences for those in power and justify why such people do and should hold power. As such, one outcome might be that reviewers hold researchers doing diversity-related research to higher, or at least different, standards. In an illustrative instance, two of the present authors submitted research empirically documenting the discrimination that pregnant women face when they apply for jobs. The study was conducted in a large metropolitan city in the southern United States. One reviewer thought the methodology was strong but cited as a problem that the data were collected only in the South, so it was not clear whether such discrimination would generalize to other regions (e.g., the North, the West) of the United States. The manuscript was rejected until data from a second experiment confirmed the patterns of interest. If this level of scrutiny were applied to all studies published in management and applied journals, imagine how many articles based on data collected within a single organization, a single building, or a single college laboratory would fall victim to the same charges.
Preliminary Evidence of Subtle Bias Against Diversity Scholarship
These arguments and anecdotes point to the need to examine evidence regarding the influence of subtle biases on diversity scholarship. One fact that leads us to believe diversity research may be hyperscrutinized relative to more mainstream topics is the extent to which it is underrepresented in the scholarly literature, particularly in the most prestigious outlets. For instance, a 45-year review of the Journal of Applied Psychology (JAP) and Personnel Psychology indicated that the percentage of published articles on “demographics” ranked among the lowest of all the topics coded (Cascio & Aguinis, 2008). This relative dearth of diversity publications in top journals led Cascio and Aguinis (2008) to conclude that diversity is an area deserving of greater research attention. Likewise, a more recent and more general review concluded that scholarship on ethnic minority issues (one facet of diversity) “continues to be underrepresented in the literature, particularly in top-tier journals” (Hartmann et al., 2013: 243). The scarcity of such publications is clearly not due to a lack of interest. For perspective, a topical examination of presentations at the 2011 Society for Industrial and Organizational Psychology annual conference (Highhouse & Schmitt, 2012) revealed that diversity ranked third (tied with personality), behind only testing/assessment and leadership. Similarly, an examination of organizational behavior–related submissions to the Academy of Management Journal (AMJ) from 2007 to 2009 (Morrison, 2010) revealed that diversity ranked third, behind only leadership and teams, suggesting that any underrepresentation in top journals is probably not simply a “pipeline” issue. We probed this possibility through archival and experimental data.
Archival data
Because underrepresentation in the literature cannot be equated directly with bias in the review process, we gathered data to provide an initial test of our intuition that diversity-related manuscripts are reviewed in subtly different ways from nondiversity manuscripts. With invaluable assistance from the staff at the Journal of Management (JOM), we traced the outcomes of 7 years of manuscript submissions for which authors either included “diversity” as a topic area (n = 390) or did not (n = 7,442). The rates of acceptance (15.38% vs. 14.27%) and rejection (61.28% vs. 62.51%) for these two categories of submissions were highly similar. Consistent with these nearly identical rates, there was no difference in the number of rounds of review, and the means and standard deviations of the reviewer recommendations received for diversity and nondiversity submissions were also statistically indistinguishable (all t values were less than 1; see Table 1). At first glance, then, papers appeared to receive virtually identical treatment throughout the process whether or not they focused on diversity.
Table 1. Descriptive Statistics and Correlations From Archival Study
p < .01.
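As a rough check on the statement that overall outcomes were highly similar, the short Python sketch below back-calculates approximate acceptance and rejection counts from the reported percentages and sample sizes and applies a pooled two-proportion z-test. Two assumptions apply: the counts are reconstructed by rounding, since only percentages are reported, and the z-test is an illustrative choice rather than an analysis reported in this commentary.

```python
import math


def recover_count(n: int, pct: float) -> int:
    """Approximate the count behind a reported percentage (assumes simple rounding)."""
    return round(n * pct / 100)


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


N_DIV, N_NON = 390, 7442  # diversity vs. nondiversity JOM submissions

# Acceptance: 15.38% vs. 14.27%; rejection: 61.28% vs. 62.51%
acc_div, acc_non = recover_count(N_DIV, 15.38), recover_count(N_NON, 14.27)
rej_div, rej_non = recover_count(N_DIV, 61.28), recover_count(N_NON, 62.51)

z_accept = two_proportion_z(acc_div, N_DIV, acc_non, N_NON)
z_reject = two_proportion_z(rej_div, N_DIV, rej_non, N_NON)

print(f"acceptances: {acc_div} vs {acc_non}, z = {z_accept:.2f}")
print(f"rejections:  {rej_div} vs {rej_non}, z = {z_reject:.2f}")
```

Both z values fall well inside ±1.96, consistent with the conclusion that, on the surface, diversity and nondiversity submissions fared alike.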
However, a more nuanced examination of the data revealed that publication decisions differed as a function of diversity focus depending on the round of review. Namely, a multinomial logistic regression (1 = rejection, 2 = revision opportunity offered, 3 = accepted) indicated two significant interactions between diversity focus and round of review on publication outcomes (b = −1.63, odds ratio [OR] = 0.20, p = .02 for rejection vs. acceptance; b = −1.80, OR = 0.17, p < .01 for revision vs. acceptance). The Diversity × Round of Review interaction was not significant when comparing rejection to revision. Importantly, nearly 25% of all acceptance decisions (n = 256) came in Round 1 or 2, wherein diversity papers were less likely to garner acceptances than nondiversity papers (0.04% vs. 1.2% in Round 1 and 14.8% vs. 18.7% in Round 2). Indeed, examination of the simple slopes for the significant interactions revealed that in early rounds (b = 2.53, OR = 12.59, p = .02 rejection vs. acceptance; b = 2.74, OR = 15.43, p = .01 revision vs. acceptance), focusing on diversity significantly and adversely influenced the odds of garnering acceptance but had no significant impact in later rounds of review (b = −0.33, OR = 0.72, p = .46 rejection vs. acceptance; b = −0.44, OR = 0.65, p = .21 revision vs. acceptance). In other words, in early rounds, diversity papers (compared with nondiversity papers) had roughly 12 times the odds of being rejected rather than accepted and 15 times the odds of being offered a revision rather than accepted.
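In a logistic model, each odds ratio follows from its coefficient via OR = exp(b). The Python sketch below recomputes exp(b) for every coefficient reported in the preceding paragraph and checks, assuming both b and the OR were rounded to two decimals, that each printed odds ratio is consistent with its printed coefficient. It also makes the interpretation explicit: an OR of about 12.6 is a ratio of odds (rejection vs. acceptance), not of probabilities.

```python
import math


def odds_ratio(b: float) -> float:
    """Odds ratio implied by a logit coefficient: OR = exp(b)."""
    return math.exp(b)


def consistent_with_rounding(b: float, reported_or: float, decimals: int = 2) -> bool:
    """True if reported_or could come from a coefficient that rounds to b.

    Assumes both the coefficient and the odds ratio were rounded to `decimals` places.
    """
    half = 0.5 * 10 ** -decimals
    low = odds_ratio(b - half) - half
    high = odds_ratio(b + half) + half
    return low <= reported_or <= high


# (coefficient, reported odds ratio) pairs as printed in the text
reported = [
    (-1.63, 0.20), (-1.80, 0.17),  # Diversity x Round of Review interactions
    (2.53, 12.59), (2.74, 15.43),  # simple slopes, early rounds
    (-0.33, 0.72), (-0.44, 0.65),  # simple slopes, later rounds
]

for b, reported_or in reported:
    ok = consistent_with_rounding(b, reported_or)
    print(f"b = {b:+.2f}  exp(b) = {odds_ratio(b):6.2f}  reported = {reported_or:5.2f}  consistent: {ok}")
```

Every pair passes the check, and the early-round odds ratios exceed 12 and 15, which is where the 12- and 15-fold figures come from.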
It is also noteworthy that the number of rounds did not differ significantly between nondiversity and diversity papers (M = 1.46 vs. 1.47, respectively). Integrating this finding with the interactions we detected, however, helps illustrate why the reduced likelihood of acceptance in Rounds 1 and 2 is potentially problematic: with a mean of fewer than 1.5 rounds, most papers never get beyond the first two rounds. Diversity papers fare well if they reach Round 3, but the overwhelming majority never get there. In the end, the overall numbers are nearly equivalent because diversity authors who persevere through the first couple of rounds (where so many papers are rejected) are eventually accepted (at higher rates, albeit not significantly so), restoring balance. In summary, whereas simple comparisons of outcomes for diversity and nondiversity papers revealed little difference in treatment on the surface, considering the paper’s stage in the process indicates a more covert form of bias against diversity scholarship.
Experimental data
In addition, because causal inferences cannot be drawn from archival data, we conducted an experiment examining reviewers’ evaluations and recommendations of submissions that varied in whether they focused on diversity and whether they were high or low in quality. We wanted to test the reactions of “real” reviewers but did not want to burden them with full manuscripts, so we asked editorial board members of JOM, AMJ, and JAP to evaluate genuine abstracts that had been experimentally modified. The study used a 2 (Diversity, Not Diversity) × 2 (High Quality, Low Quality) mixed design; participants read four different (randomly assigned) abstracts, edited to reflect the manipulations, each representing one of the four experimental conditions. A total of 130 reviewers (response rate approximately 20%) responded, but only 75 provided complete data indicating their recommendation for each stimulus (1 = accept; 2 = accept pending minor revisions; 3 = reject, encourage revision; 4 = reject, high risk revision; 5 = reject outright) and their evaluations of each submission on the eight-item JOM manuscript reviewing scale (literature review, theory, design, analysis, discussion, practical implications, conclusion, and clarity; α = .90). A principal component factor analysis indicated a single factor with an eigenvalue greater than 1, which accounted for 59.84% of the variance.
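Two scale statistics are reported above: Cronbach’s alpha and the variance explained by the first principal component. The minimal Python sketch below computes alpha as α = k/(k − 1) × (1 − sum of item variances / variance of the total score). It also converts the reported 59.84% into an eigenvalue, using the fact that in a principal component analysis of standardized items the eigenvalues sum to the number of items, so one component explaining 59.84% of eight items’ variance has an eigenvalue of about 4.79. The item scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def eigenvalue_from_share(share: float, n_items: int) -> float:
    """Eigenvalue of a component explaining `share` of the variance of standardized items."""
    return share * n_items


# Hypothetical ratings: 3 items x 5 respondents (not data from the study)
hypothetical_items = [
    [2, 3, 4, 4, 5],
    [2, 3, 3, 4, 5],
    [1, 3, 4, 5, 5],
]
print(round(cronbach_alpha(hypothetical_items), 2))
print(eigenvalue_from_share(0.5984, 8))  # JOM scale: 8 items, 59.84% explained
```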
Given our limited power, we split our sample into objectively (i.e., experimentally manipulated) lower and higher quality submissions and examined two stages of a possible indirect effect capturing bias against diversity papers (see Table 2 for descriptive statistics, correlations, and internal consistency estimates in both subsamples). Because each participant evaluated multiple submissions, assessments were nested within reviewers (intraclass correlation coefficients = .30 and .43 for lower and higher quality, respectively), necessitating multilevel modeling. At Stage 1, we examined the main effect of diversity focus on reviewer perceptions of quality (i.e., the JOM reviewing scale). This effect was not significant for objectively lower (b = 0.06, p = .61) or higher (b = 0.22, p = .08) quality papers.
Table 2. Descriptive Statistics and Correlations From Experimental Study
Note: Numbers on the diagonal represent internal consistency estimates (Cronbach’s alpha).
p < .05.
p < .01.
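The intraclass correlations reported in the Stage 1 analysis indicate how much of the variance in ratings lies between reviewers rather than within them, which is why ratings nested in reviewers call for multilevel models. The Python sketch below uses one common estimator, the one-way ANOVA ICC(1) for balanced groups; the multilevel software used in the study may instead have estimated the variance ratio τ00 / (τ00 + σ²) from a null model, so treat this as an illustration. Within each quality subsample, each reviewer contributed two ratings (one diversity and one nondiversity abstract); the ratings below are hypothetical.

```python
from statistics import mean


def icc1(groups: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for balanced groups (one list of ratings per reviewer)."""
    n = len(groups)     # number of reviewers (clusters)
    k = len(groups[0])  # ratings per reviewer; assumed equal across reviewers
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical quality ratings: each inner list is one reviewer's two abstracts
ratings = [[3.1, 3.4], [2.2, 2.9], [4.0, 3.6], [2.8, 2.5], [3.5, 3.9]]
print(round(icc1(ratings), 2))
```

An ICC near 0 would mean reviewers are interchangeable; values like .30 and .43 mean a sizable share of rating variance is attributable to who the reviewer is.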
At Stage 2, we examined whether diversity focus moderated the relationship between perceived quality and reviewer recommendation. When objective quality was higher, the moderating effect of diversity focus on the quality–recommendation linkage was not significant (b = −0.20, p = .18). When objective quality was lower, however, this moderating effect was significant (b = −0.43, p = .05). The conditional effects show that perceived quality related significantly to recommendations for manuscripts examining diversity issues (b = −0.56, p < .01) but not for manuscripts on other topics (b = −0.13, p = .33); because lower values on the recommendation scale are more favorable, these negative coefficients mean that higher perceived quality went with more favorable recommendations. This finding is consistent with previously established forms of “stricter standards” bias, whereby perceived worthiness proves more impactful for stigmatized than nonstigmatized individuals (Lyness & Heilman, 2006). Thus, a nondiversity submission deemed relatively mediocre might nonetheless receive a favorable recommendation, whereas a marginal diversity paper would not.
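The conditional effects in the preceding paragraph follow from the moderation model’s coefficients. Assuming diversity focus is dummy-coded (0 = nondiversity, 1 = diversity), which the text does not state but which the reported numbers fit, the slope of recommendations on perceived quality is b_quality + b_interaction × D. For the lower quality subsample, −0.13 + (−0.43) = −0.56, matching the reported slope for diversity papers. The names below are hypothetical labels for the reported coefficients.

```python
def conditional_slope(b_quality: float, b_interaction: float, diversity: int) -> float:
    """Simple slope of recommendation on perceived quality at a dummy-coded moderator value."""
    return b_quality + b_interaction * diversity


# Lower quality subsample, values from the text; 0/1 coding of diversity is an assumption
B_QUALITY = -0.13      # slope when diversity = 0 (nondiversity papers)
B_INTERACTION = -0.43  # Diversity x Perceived Quality interaction

for label, d in [("nondiversity", 0), ("diversity", 1)]:
    print(label, round(conditional_slope(B_QUALITY, B_INTERACTION, d), 2))
# prints: nondiversity -0.13, then diversity -0.56
```

Because the recommendation scale runs from 1 (accept) to 5 (reject outright), the steeper negative slope for diversity papers means their recommendations track perceived quality much more tightly.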
Collectively, these results indicate that although participants paid little attention to whether a paper focused on diversity when assessing its quality, they used these assessments differently to arrive at recommendations for diversity and nondiversity papers. Though reviewers appeared motivated to suppress any biases they may have held against diversity work when rating the manuscripts, they may have generally lower expectations for diversity papers, which create a higher threshold (see Foschi, 2000). Accordingly, greater evidence of worthiness is necessary for diversity than for nondiversity papers. Alternatively, suppressing potential bias against diversity scholarship at Stage 1 may actually have made individuals more susceptible to its influence at Stage 2 in a form of “rebound” (Macrae, Bodenhausen, Milne, & Jetten, 1994). Although the trend was less pronounced for objectively higher quality papers, its direction was the same: recommendations were based more on perceived quality for diversity papers than for papers on other topics. In fact, recommendations for objectively lower quality papers on topics other than diversity were not significantly based on reviewer perceptions of quality at all. This pattern, together with the archival data, could be interpreted to suggest that nondiversity papers get the “benefit of the doubt” to a greater extent than diversity papers.
These findings are far from conclusive. The archival data come from a single journal, the experimental data are limited by a small sample size and by the limited information provided for decisions (abstracts rather than full manuscripts), and conclusions from both should be interpreted cautiously. These are serious limitations, and the particular patterns of results reported here call for constructive replication. Furthermore, the patterns suggestive of bias are nuanced, subtle, and small in magnitude. Nevertheless, their potential practical significance should not be underestimated. Molehills can become mountains: previous research has demonstrated that seemingly small empirical effects compound over time into significant disparities (Martell, Lane, & Emrich, 1996; Valian, 1998). The career of an assistant professor approaching tenure review or of a graduate student on the job market can be made or broken by a single decision letter. We welcome and encourage additional empirical examinations (beyond the scope of a commentary) of the impact of bias on science and the scholarly review process. To reiterate, we believe these data and arguments, taken together, provide only preliminary evidence of subtle biases that work to the disadvantage of diversity scholars as they attempt to publish their work.
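The claim that small effects compound can be made concrete with a deliberately simplified multiplicative sketch in Python. It is not a reproduction of Martell, Lane, and Emrich’s (1996) simulation, and the pass rates are hypothetical: it assumes independent sequential hurdles (for example, review rounds, a tenure file, a job offer) and a group whose chance of clearing each one is only 5% lower in relative terms.

```python
def cumulative_pass(p_per_stage: float, stages: int) -> float:
    """Probability of clearing `stages` independent hurdles, each passed with p_per_stage."""
    return p_per_stage ** stages


# Hypothetical pass rates; group B faces a 5% relative disadvantage at every stage
P_A, P_B = 0.50, 0.475

for stages in range(1, 6):
    ratio = cumulative_pass(P_B, stages) / cumulative_pass(P_A, stages)
    print(f"{stages} stage(s): group B's success rate is {ratio:.0%} of group A's")
```

Under these assumptions the relative success rate after five stages is 0.95^5, or about 0.77, so a 5% disadvantage per stage grows into a gap of roughly 23%.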
Initial Solutions and Conclusions
So what can we do about these pernicious but largely unconscious processes? Reactive processes, such as formal appeals in cases of perceived unfairness, are certainly options for authors facing overt forms of bias. One proactive approach to more subtle bias in peer review may be simply to call attention to it. Many editors and reviewers may be unaware that their identities, beliefs, and cognitive structures can lead them to react in a biased manner to certain types of research. Notably, a recent survey of editorial board members of various journals in the organizational sciences (including JOM) indicated that many already consider diversity and discrimination among the most relevant topical areas for research (Loignon, Myers, & Rogelberg, 2013). However, to the extent that this recognition of topical importance is coupled with unawareness of the psychological processes described in this commentary, the underrepresentation of diversity scholarship in the literature may persist. At least one recent study suggests that mere awareness that bias might color one’s actions can sometimes dissuade people from acting in a biased manner (Pope, Price, & Wolfers, 2013). We hope that this commentary can play such a role.
A more actionable part of the solution, however, may involve structural changes to the review process. For example, efforts to train and reward fair reviewers may be helpful. Editorial board members could benefit from sessions ensuring that they have acquired basic competencies before beginning their service. These competencies might include content knowledge (e.g., up-to-date statistical procedures, experimental methodologies common in studies of intergroup bias) but could also involve training in equitable, fair, and constructive review practices. Furthermore, individuals currently receive very little incentive (other than professional recognition and personal satisfaction) to serve on editorial boards. Individuals may be more motivated to be unbiased if they were not only made aware of such biases but also rewarded for conducting thorough and fair assessments that rely on consistent implicit and explicit criteria for judgment.
As another structural change, ongoing efforts to ensure that boards adequately represent the diversity of scholars and scholarship could be strengthened. A study of editorial board gender composition in 57 management journals between 1989 and 2004 found that, on average, nearly 80% of journals had editorial boards with fewer than 20% women (including JOM; Metz & Harzing, 2009). An updated analysis through 2009 showed an increasing trend in the representation of women on editorial boards, but the proportion of female editors and associate editors was still about 22% overall (Metz & Harzing, 2012). We are all susceptible to the general cognitive biases described earlier, but biases stemming from privileged identity-based ideologies and identity threats may be reduced among reviewers from underrepresented groups.
It is noteworthy that the first author of this commentary likely served as the action editor on many of the diversity manuscripts included in the archival analysis presented here. In other words, a person with relevant knowledge, awareness, and motivation nonetheless contributed to the subtle disadvantaging of diversity scholarship. Despite this fact, and perhaps as exemplified by it, we see this as a unique moment in which scrutiny of scientific rigor and of intergroup relations (e.g., #blacklivesmatter, anti-Muslim bigotry) looms large in public discourse. We therefore have the opportunity and responsibility to take timely action to strengthen our scientific process, particularly as it pertains to diversity scholarship.
Acknowledgements
This article was accepted under the editorship of Patrick M. Wright.
