Abstract
This study empirically examined the statistical and methodological issues raised in the reviewing process to determine what the “gatekeepers” of the literature, the reviewers and editors, really say about methodology when making decisions to accept or reject manuscripts. Three hundred and four editors’ and reviewers’ letters for 69 manuscripts submitted to the Journal of Business and Psychology were qualitatively coded using an iterative approach. Systematic coding generated 267 codes from 1,751 statements that identified common methodological and statistical errors by authors and offered themes across these issues. We examined the relationship between the issues identified and manuscript outcomes. The most prevalent methodological and statistical topics were measurement, control variables, common method variance, factor analysis, and structural equation modeling. Common errors included the choice and comprehensiveness of analyses. This qualitative analysis of methods in reviews provides insight into how current methodological debates reveal themselves in the review process. This study offers guidance and advice for authors to improve the quality of their research and for editors and reviewers to improve the quality of their reviews.
Organizational, management, and applied psychology research has always placed a great deal of emphasis on research methodology, with many literature reviews examining trends and frequencies of particular methods over the years (Aguinis, Pierce, Bosco, & Muslin, 2009; Austin, Scherbaum, & Mahlman, 2002; Scandura & Williams, 2000; Stone-Romero, Weaver, & Glenar, 1995). While it is essential to examine what is published in the literature, it is equally critical to understand how methods are evaluated in the manuscript review process, as editors and reviewers ultimately decide which methods are acceptable for publications. Previous opinion-based articles by editors have emphasized the importance of research methods in the manuscript review process (Bono & McNamara, 2011; Desrosiers, Sherony, Barros, Ballinger, Senol, & Campion, 2002); however, few studies have empirically examined the statistical and methodological issues raised by reviewers and the influence of these specific issues in editors’ manuscript decisions. As editors and reviewers are the gatekeepers of journals, it is important to answer the following question: Are there particular methodological and statistical issues that are more prominent, and thus raised more often, in the reviewing process? The purpose of the present study is to expand our understanding of how editors and reviewers approach research methods and statistics through a quantitative and detailed qualitative analysis of peer reviews. Specifically, we used an iterative approach to code editors’ and reviewers’ letters for the methodological and statistical issues that they raised concerning the manuscripts and the suggestions that they provided to authors. The specific codes and themes identified provide insight into the peer review process and give authors guidance to improve the quality of their research in their own submissions.
Past Research
Relatively few studies have examined empirically the peer review process in terms of the editors’ and reviewers’ decisions (Fogg & Fiske, 1993; Gilliland & Cortina, 1997). Fiske and Fogg (1990) used content analysis to analyze reviewers’ reports for 153 papers submitted for the first time to American Psychological Association journals in late 1985 and in 1986. They coded reviews of manuscripts as well as the editors’ letters to the authors for every critical point of weakness. These codes were categorized into areas such as Introduction, Design and Procedures, Methods and Results, and Discussion. In addition, the authors also coded the comments into one of the two categories regarding the research process: (a) planning and execution of the research and (b) presentation of the manuscript. A follow-up study by Fogg and Fiske (1993) tested the relationship between these two categories and editors’ decisions and reviewers’ recommendations. They found a significant correlation between severe planning and execution criticisms and reviewers’ final recommendations, demonstrating the importance of research methods and advanced planning in research design (Fogg & Fiske, 1993). However, the authors did not examine relationships between specific methodological issues, as listed in Fiske and Fogg (1990), and manuscript decisions (Fogg & Fiske, 1993).
Similar to Fogg and Fiske’s (1993) finding that severe planning and execution criticisms were important in reviewers’ recommendations, Gilliland and Cortina (1997) found that reviewers’ research design evaluations were the most predictive of their recommendations, followed by evaluations of conceptual or theoretical arguments, operationalization of constructs, appropriateness of topic, and adequacy of data analysis. In addition to the analyses of reviewers’ evaluations (i.e., numerical ratings on general manuscript dimensions), the authors coded manuscript submissions for statistical techniques (e.g., factor analysis, analysis of variance) in order to determine what analytical factors were most predictive of manuscript decisions (Gilliland & Cortina, 1997). Manuscripts that predominantly used factor analysis or analysis of variance received less favorable recommendations than papers that relied on correlations, regression, Linear Structural Relations (LISREL), confirmatory factor analysis (CFA), path analysis, or other methods. A limitation of this paper mentioned by its authors was that the content of reviews and decision letters was not examined (Gilliland & Cortina, 1997). This limits the ability to determine whether reviewers and editors raised issues pertaining to these analytical techniques and whether such issues had a subsequent impact on their recommendations.
To date, a qualitative analysis by Rogelberg, Adelman, and Askay (2009) provides the most detailed account of issues raised in the peer review process. Through content analysis of 131 reviewers’ comments for nearly 100 manuscripts submitted to the Journal of Business and Psychology over a four-month time period, the authors identified areas of concern commonly expressed by reviewers. The general categories established by Rogelberg and colleagues were similar to those of Fiske and Fogg (1990), such as Methods and Results, Analyses, and Discussion. Rogelberg et al. also provided illustrative comments for each general category. Although Fiske and Fogg (1990) included subcategories such as “a statistical error is specified, and a better statistic is suggested or implied” and “a specified statistical analysis should be added” (p. 593), they did not specify which additional analyses were requested. In addition to stating that reviewers recommended adding statistical analyses, Rogelberg et al. provided specific examples of the analyses that reviewers requested, such as reporting betas for a regression or using the Sobel test for mediation. Rogelberg et al.’s content analysis of reviewers’ comments provides the most detailed empirical assessment of reviewers’ comments to date, but it does not identify themes among these manuscript issues, nor does it compare the frequency of issues in comments to manuscript decisions to reveal how reviewers and editors evaluate these issues.
The Current Study
The current study builds on previous work by (a) expanding on the specificity of statistical and methodological issues raised in reviews while also providing unifying themes for these topics, (b) relating both these specific issues and general themes to editors’ decisions, and (c) linking reviewers’ and editors’ comments to methodological debates in the literature. The present study used systematic procedures to qualitatively analyze editors’ and reviewers’ letters. This approach allowed for a more extensive and detailed qualitative analysis of peer reviews compared to previous analyses of manuscripts and reviewers’ comments (e.g., Fiske & Fogg, 1990). We also expanded on the methodological issues and suggestions raised by Rogelberg et al. (2009) by not only providing common methodological issues raised in peer reviews but also offering unifying themes for these issues. We suggest that it is important to examine methodological comments at both a level of specificity and a level of generality, as each provides unique information about what reviewers and editors care about. Specific issues can inform authors about particular errors to avoid and techniques to use in order to address these issues. Themes can help to clarify the perspectives of reviewers and editors (such as the request for comprehensive analyses) that might be driving specific comments about particular issues, enabling us to provide advice to authors that can be applied more broadly to other methods and statistical analyses.
Many researchers have analyzed and coded journal articles to examine methodological challenges acknowledged in articles (e.g., Aguinis & Lawal, 2012; Brutus, Aguinis, & Wassmer, 2013) and to see if articles’ methods and statistics reflect best practices (e.g., Becker, 2005; Carlson & Wu, 2012; Cortina, 2003). To our knowledge, no study has related gatekeepers’ comments to best practices. To provide authors with recommendations to improve their quality of research, and thus enhance their manuscript submissions, we not only provide findings from our iterative coding of reviewers’ and editors’ letters but also relate these comments to current debates and practices in the literature. Our recommendations to address common methodological and statistical issues raised in reviews should also help reviewers and editors with their reviews and decisions.
Method
Sample
The sample consisted of reviews for all relevant first submissions of manuscripts to the Journal of Business and Psychology (JBP) whose date of first decision was between July 2011 and December 2011. Manuscripts that were solely literature reviews, focused exclusively on methodological procedures, or submitted for special features were excluded from the sample. The final sample from 69 manuscripts consisted of 304 letters including 138 letters from reviewers to authors, 97 letters from reviewers to editors, and 69 letters from editors to authors. Forty-one reviewers did not provide specific comments for the editor separate from their comments to the authors. Thirteen different editors and 107 reviewers reviewed the manuscripts. Of the 107 reviewers, 27 reviewed two manuscripts, 2 reviewed three manuscripts, and 78 reviewed only one manuscript in the sample.
Although our sample is drawn from the comments of editors and reviewers from one journal, their ideas likely reflect those of many other journals. It is typical for associate editors and the editorial board to also serve other journals. Therefore, the thoughts of editors and reviewers reflected in JBP likely reflect the thoughts of editors and reviewers from other journals. The 165 editorial board members at JBP during the time of the current study now comprise 24.47% (n = 58), 13.04% (n = 33), 15.07% (n = 11), 4.67% (n = 14), and 16.05% (n = 13) of the current editorial boards at Journal of Applied Psychology, Journal of Management, Organizational Research Methods, Academy of Management Journal, and Personnel Psychology, respectively. Additionally, between 20% and 30% of the current associate editors at Journal of Applied Psychology, Journal of Management, Organizational Research Methods, Personnel Psychology, and Psychological Methods were JBP board members at the time of this study. Therefore, our findings regarding editors’ and reviewers’ comments from JBP can likely be generalized to the comments of reviewers and editors at other management or applied psychology journals.
For each peer-reviewed manuscript, quantitative data were also gathered. The editor’s decision (reject the article or revise and resubmit) was recorded for each of the 69 manuscripts, yielding 69 decisions made by 13 editors. We used logistic regression to see if there were differences across editors and reviewers with regard to issues raised and manuscript rejection rate. The logistic regression analyses suggested that there were no editor or reviewer effects. With 12 editor dummy codes as predictors, we ran a logistic regression with the manuscript outcome (reject or revise and resubmit) as the criterion. We also ran 11 logistic regressions with the presence or absence of each manuscript issue (for all 11 code families, see Table 1) as the criterion. For all of these analyses, we found nonsignificant regression weights, meaning that the editor reviewing the manuscript did not significantly predict whether an issue was raised, nor did it predict the manuscript outcome. Additionally, we ran similar logistic regressions for reviewers and did not find any significant regression weights for dummy-coded reviewers. Thus, for all common methodological and statistical issues, the editor or reviewer reviewing the manuscript did not significantly predict whether an issue was raised, nor did the manuscript rejection rate depend on the editor.
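The dummy coding step described above can be illustrated with a minimal sketch. It shows how 13 editors yield 12 indicator columns once one editor serves as the reference level. The editor labels, the assignment of editors to manuscripts, and the choice of reference editor are all hypothetical, since the source does not report them. Fitting the logistic regression itself would require a statistics package and is not shown.

```python
def dummy_code(labels, reference):
    """Return (column_names, rows) with one 0/1 indicator per non-reference level."""
    levels = sorted(set(labels))
    if reference not in levels:
        raise ValueError("reference level must appear in labels")
    # One column per level except the reference, so k levels give k - 1 columns.
    columns = [level for level in levels if level != reference]
    rows = [[1 if label == column else 0 for column in columns] for label in labels]
    return columns, rows


# Hypothetical assignment of 13 editors to 69 manuscripts (not the study's data).
editors = [f"editor_{i % 13:02d}" for i in range(69)]
columns, design = dummy_code(editors, reference="editor_00")
```

Each row of `design` would then serve as the predictor vector for one manuscript, with the reference editor represented by a row of zeros.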
Table 1. Correlation Matrix of Code Families Raised in Manuscripts.
Note: N = 69 manuscripts. A code family was present in a manuscript if at least one issue falling within the family was raised in an editor’s and/or reviewer’s letter. HLM = hierarchical linear modeling; SEM = structural equation modeling.
*Correlation is significant at the .05 level (two-tailed). **Correlation is significant at the .01 level (two-tailed).
Procedure
This study examined the methodological issues most noteworthy to editors and reviewers through qualitative analysis of all 304 editors’ and reviewers’ letters (approximately five documents for each manuscript). The inductive approach involved systematic procedures to analyze editors’ and reviewers’ letters and further define these gatekeepers’ views on methodology.
ATLAS.ti
To code methodological issues in the 304 letters, the present study used ATLAS.ti, a computer program that assists with the analysis and coding of text documents by organizing codes into categories and superordinate categories (Pollach, 2011; Weitzman & Miles, 1995). Once the documents are uploaded into the software, the user can code phrases or words, reuse these codes later, and create links among codes. Relations for links between codes can be assigned logical properties using preprogrammed semantic relations or original relations, such as “is associated with” or “is part of” (ATLAS.ti, n.d.; Weitzman & Miles, 1995). The codes can also be grouped into categories, and these categories can be further grouped into families. Additionally, ATLAS.ti provides quotation counts for the various codes, meaning that ATLAS.ti output shows how many documents have a particular code and how many times each document has the code. In the present study, we uploaded each of the 304 reviewers’ and editors’ letters into ATLAS.ti as separate documents.
Qualitative coding
Through a constant comparative method (Locke, 2002), the current study coded the reviewers’ and editors’ letters to build conceptual categories, general themes, and overarching dimensions about research methods and statistics in the peer review process. Other examples of this iterative type of approach can be found in Clark, Gioia, Ketchen, and Thomas (2010); Pratt (2000); and Rerup and Feldman (2011). The analytic process started small, first with open coding of issues regarding research methods and then building the issues into conceptual categories, which could then be developed into themes. In first-order, or open, coding, researcher expectations do not play a role, such that the codes are specific, detailed, and directly represent the text (Gioia, Corley, & Hamilton, 2013; Strauss & Corbin, 1994). For instance, a quote from one editor’s letter read, “It is not appropriate to conduct both exploratory and confirmatory factor analyses on the same sample.” In the open coding stage of the analysis, this quote was coded as “do not conduct EFA [exploratory factor analysis] and CFA on the same data set.” As additional letters were coded, other comments relating to CFA emerged, such as “why EFA instead of CFA” and “only need CFA not EFA for established measure.” The open coding process generated 1,751 statements from the letters. The same statement could be counted more than once because statements could be coded into multiple categories. As an example, one reviewer’s comment regarding a one-item control variable reflected concerns about single-item measures, scale choice, and a potential missing control variable. In order to reflect all these issues raised by the reviewer, this comment was assigned multiple open codes, including “inappropriate choice of measure” and “failure to include control variable.”
This study used two coders to analyze the data. After coding 25 letters (approximately 5 manuscripts) with these specific, first-order codes directly representing the text, both coders reviewed the code list and conducted focused coding. Focused coding, or second-order coding, requires the grouping and categorization of similar codes into concepts (Lee, Mitchell, & Harman, 2011). Referring to the previous example of codes, “do not conduct EFA and CFA on the same data set,” “why EFA instead of CFA,” and “only need CFA not EFA for established measure,” the coders categorized these codes into an overarching conceptual code called “choice of factor analysis.” This focused coding enabled a comprehensive organization of the codes into conceptual categories. During this focused coding, any discrepancies between the coders regarding the open codes were discussed and resolved prior to comparing, contrasting, and subsequent organization of the codes. All categories and grouping of codes were based on consensus between the two coders. Consensus coding is a commonly used procedure for iterative coding (e.g., Koppman & Gupta, 2014; Stigliani & Ravasi, 2012).
After focused coding was complete for a set of manuscript documents (i.e., letters), open coding would continue for 25 more letters (approximately 5 manuscripts). As the open coding continued on these new letters, the coder could use previous conceptual categories to code methodological and statistical comments within the editors’ and reviewers’ letters or could create new first-order codes if none of the existing categories fit the data. ATLAS.ti software helped revise previous codes such that the renaming or combining of first-order codes in later letters would be applied to all letters that had those codes, including earlier letters. The coding of new letters helped to evaluate the current categories and reveal if there were new dimensions not yet articulated (Lee et al., 2011; Locke, 2002). After the coding of the new letters was complete, the coders would meet again to discuss new open codes and reevaluate the coding categories. Continuous comparison of codes and conceptual categories is an integral part of inductive research in order to refine the categories and ensure they reflect the data (Frost, 2011). This process continued until all of the 304 letters were analyzed and there were no new first-order, open codes or codes that did not fit into existing categories. The two coders used consensus throughout the categorization process to generate 267 second-order codes from a total of 1,751 coded statements.
After all the manuscript letters were coded, the coders evaluated the 267 final second-order codes and organized them into a broader framework using the families function in ATLAS.ti (Locke, 2002). Families allow the combination of categories into overarching groups. For instance, the “choice of factor analysis” category was related to other categories, such as “compare models” and “techniques to improve fit.” These could be grouped into the family of “Factor Analysis/SEM.” Other families are further described in the Results section and include, for example, common method variance, measurement, and control variables. Figure 1 provides an example of the reduction of codes from first-order, open codes to second-order categories and, finally, to families or themes. In order to further create a comprehensive framework, comments within families regarding data analysis were grouped into three larger overarching dimensions: comprehensiveness of analyses, choice of analyses, and data analytic errors.

Figure 1. An example of the constant comparative qualitative coding process.
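The reduction from open codes to categories to families can be sketched as two lookup tables. The open codes, the category, and the family named here are the examples quoted in the text. The dictionary layout and the function names are assumptions made for illustration and do not represent the ATLAS.ti data model.

```python
from collections import defaultdict

# First-order (open) code -> second-order category; examples quoted in the text.
OPEN_TO_CATEGORY = {
    "do not conduct EFA and CFA on the same data set": "choice of factor analysis",
    "why EFA instead of CFA": "choice of factor analysis",
    "only need CFA not EFA for established measure": "choice of factor analysis",
}

# Second-order category -> family; grouping described in the text.
CATEGORY_TO_FAMILY = {
    "choice of factor analysis": "Factor Analysis/SEM",
    "compare models": "Factor Analysis/SEM",
    "techniques to improve fit": "Factor Analysis/SEM",
}


def family_of(open_code):
    """Resolve an open code to its family by way of its second-order category."""
    category = OPEN_TO_CATEGORY[open_code]
    return CATEGORY_TO_FAMILY[category]


def codes_by_family():
    """Group every known open code under the family it rolls up to."""
    grouped = defaultdict(list)
    for open_code in OPEN_TO_CATEGORY:
        grouped[family_of(open_code)].append(open_code)
    return dict(grouped)
```

Keeping the two levels in separate tables mirrors the iterative process: renaming or merging a category changes one entry in the second table without touching the open codes beneath it.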
The families and three overarching dimensions represent the core patterns of research method comments in reviews. The following description of the findings from the coding process is organized to reflect these families and themes. Additionally, we discuss the ATLAS.ti output that provided frequency counts for the codes in the letters. These frequency counts were used to relate letter content to the editors’ final decisions on the manuscripts, providing further insight about the impact of these topics on manuscript outcomes. In our discussion of these common issues and themes, we relate editors’ and reviewers’ comments to best practices as recommended by the literature to provide guidance and recommendations to authors for future manuscript submissions.
Results
Overview
Systematic coding for methodological and statistical comments in reviewers’ and editors’ letters resulted in 267 conceptual codes from 1,751 statements that were further reduced into 11 families. Reviewers’ and editors’ comments for topics spanning design, measurement, and analysis typically referred to perceptions of authors’ mistakes or suggestions for improvement. Occasionally, these comments represented praise; however, comments of praise rarely provided specific methodological and statistical detail that could be coded. Table 1 provides a correlation matrix of the presence of issues in code families (e.g., measurement, moderation) in the manuscripts. A code family was present in a manuscript if at least one issue falling within the family was raised in an editor’s and/or reviewer’s letter. In general, the presence of an issue in one code family in an editor’s and/or reviewer’s letter for a manuscript positively related to the presence of issues in other code families for that manuscript.
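As a sketch of how one entry in such a matrix could be computed, the function below correlates two presence indicators (1 = family raised for the manuscript, 0 = not raised). The source does not name the coefficient; this example assumes a Pearson correlation applied to 0/1 indicators, which is equivalent to the phi coefficient. The two indicator vectors are hypothetical and far shorter than the 69 manuscripts in the study.

```python
from math import sqrt


def phi(x, y):
    """Pearson correlation of two equal-length 0/1 indicator lists (phi coefficient)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant indicator")
    return cov / sqrt(var_x * var_y)


# Hypothetical presence indicators for eight manuscripts (not the study's data).
measurement = [1, 1, 1, 0, 1, 0, 1, 1]
cmv = [1, 0, 1, 0, 1, 0, 0, 1]
r = phi(measurement, cmv)
```

A positive value indicates that manuscripts drawing comments in one family also tended to draw comments in the other, which is the pattern the paragraph above describes.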
We focus our analyses on particular categories and families because these issues were most frequently raised in the sample of manuscripts. The frequencies of methodological and statistical issues raised in reviewers’ and editors’ letters are presented in Table 2. It is important to note that frequency refers to whether a reviewer or editor mentioned an issue pertaining to a topic in his or her letter, not the number of times the issue was mentioned in an individual letter. Although the content of the manuscripts themselves was not coded, an issue was considered present in a manuscript if it was raised in an editor’s and/or reviewer’s letter. Our focus on the more common issues raised does not mean that they are necessarily the most important issues in terms of a manuscript’s outcome, as a rarely raised issue could actually be more critical in the outcome than these common issues. We do nevertheless link the presence of these common issues in manuscript reviews to the final manuscript decision, as summarized in Table 2.
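The counting rule described above, presence per letter and per manuscript rather than number of mentions, can be sketched as follows. The tuple layout, the identifiers, and the sample statements are hypothetical and serve only to show that repeated mentions within one letter are counted once.

```python
from collections import defaultdict

# Hypothetical coded statements: (manuscript_id, letter_id, code_family).
statements = [
    ("m1", "m1-editor", "measurement"),
    ("m1", "m1-editor", "measurement"),  # second mention in the same letter
    ("m1", "m1-rev1", "measurement"),
    ("m1", "m1-rev1", "control variables"),
    ("m2", "m2-rev1", "measurement"),
]


def presence_counts(coded_statements):
    """Count letters and manuscripts in which each family appears at least once."""
    letters = defaultdict(set)
    manuscripts = defaultdict(set)
    for manuscript_id, letter_id, family in coded_statements:
        # Sets discard duplicates, so repeated mentions do not inflate the counts.
        letters[family].add(letter_id)
        manuscripts[family].add(manuscript_id)
    letter_counts = {family: len(ids) for family, ids in letters.items()}
    manuscript_counts = {family: len(ids) for family, ids in manuscripts.items()}
    return letter_counts, manuscript_counts


letter_counts, manuscript_counts = presence_counts(statements)
```

In this toy input, "measurement" is mentioned four times but is present in three letters and two manuscripts.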
Table 2. Frequencies and Percentages of Issues Related to Methodological and Statistical Topics in Reviews and Editors’ Decisions.
Note: In total, editors rejected 75.36% (n = 52) of the manuscripts. HLM = hierarchical linear modeling; SEM = structural equation modeling.
ᵃ Reviewer pair means that the issue was raised by at least one of the two manuscript reviewers.
ᵇ An issue was considered present in a manuscript if it was raised in an editor’s and/or reviewer’s letter.
As is typical in the peer review process, editors rejected more first-time manuscript submissions than they invited for revision and resubmission, with editors’ initial decisions reflecting a rejection rate of 75.36% of the manuscripts (n = 52). For comparison purposes, Table 2 includes not only the rejection rate for manuscripts in which an issue was present in an editor’s and/or reviewer’s letter but also the rejection rate for manuscripts in which the issue was not present. Because of the uneven numbers of manuscripts in which an issue was present versus not present and the low base rates for certain issues, comparisons among these rates are difficult. As a result, we only make comparisons between these rejection rates when the number of manuscripts in which the issue was present was similar to the number of manuscripts in which the issue was not present, as was the case for common method variance (CMV) and factor analysis. To provide further information regarding these topics and manuscript decisions, Table 3 provides the percentages of the 52 rejected manuscripts and the percentages of the 17 revise and resubmit manuscripts in which issues related to these topics were present. In the Discussion section, we relate editors’ and reviewers’ comments to current methodological discussions in the literature in order to offer suggestions and recommendations to improve the research quality of future manuscript submissions (see Table 4).
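The comparison of rejection rates for manuscripts with and without a given issue can be sketched as below. The overall rate uses the figures reported in the text (52 of 69 manuscripts rejected). The per-issue records are hypothetical toy data, and the function name and record layout are assumptions made for illustration.

```python
def rejection_rates(records):
    """Return (rate_when_issue_present, rate_when_issue_absent) as percentages.

    records is an iterable of (issue_present, rejected) boolean pairs,
    one pair per manuscript. A rate is None when its group is empty.
    """
    records = list(records)
    present = [rejected for issue_present, rejected in records if issue_present]
    absent = [rejected for issue_present, rejected in records if not issue_present]

    def rate(outcomes):
        if not outcomes:
            return None
        return round(100 * sum(outcomes) / len(outcomes), 2)

    return rate(present), rate(absent)


# Overall rejection rate from the counts reported in the text.
overall_rate = round(100 * 52 / 69, 2)

# Hypothetical records for one issue: four manuscripts with it, four without.
toy_records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, True), (False, False), (False, False),
]
rate_present, rate_absent = rejection_rates(toy_records)
```

With groups this balanced the two rates are directly comparable; when one group contains only a handful of manuscripts, a single decision shifts its rate substantially, which is why comparisons are restricted to issues with similar group sizes.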
Table 3. Percentage of Rejected/Revise and Resubmit Manuscripts Raising Each Issue.
Note: HLM = hierarchical linear modeling; SEM = structural equation modeling.
ᵃ Out of 52 rejected manuscripts.
ᵇ Out of 17 revise and resubmit manuscripts.
Table 4. Common Issues in Reviews, Corresponding Recommendations, and Supporting References.
Note: CFA = confirmatory factor analysis; CMV = common method variance; DV = dependent variable; RCM = random coefficient modeling; HLM = hierarchical linear modeling; OLS = ordinary least squares; EFA = exploratory factor analysis.
ᵃ There tends to be widespread disagreement for issues and recommendations pertaining to common method variance and control variables.
Three prevalent families (measurement, control variables, and CMV) are described in the following sections, along with their relevant categories. In addition to these three families, six other families of codes covering frequently used data analytic techniques (factor analysis/SEM, correlated variables, hierarchical linear modeling [HLM], moderation, mediation, and regression) are discussed as they pertain to three overarching data analytic themes: comprehensiveness of analyses, choice of analyses, and common errors in data analysis. Other families of codes, including theory and study design, also resulted from the coding process. Although our primary focus pertains to methods and statistical techniques, we briefly discuss the common issues pertaining to study design.
Theory and Design
Comments pertaining to theory and design issues were raised in an editor’s and/or reviewer’s letter corresponding to 53.62% and 92.75% of the manuscripts, respectively. Typical remarks concerning theory pertained to weak theoretical bases for subsequent hypotheses. For example, an editor noted “the need for a stronger theoretical foundation for the paper as a whole … and for the development of specific hypotheses.” Similarly, other concerns related to whether the design was capable of testing the theory behind the hypotheses. A reviewer reflected this mindset when expressing “worry that there is a bit of a disconnect between the context that is alluded to throughout the introduction and that represented by the methodology.” In general, editors and reviewers noted a disconnect among theory, research questions, and research design.
Three of the most common design issues raised were threats to validity (n = 15 manuscripts), issues with causality (n = 31), and sampling (n = 13). Examples of threats to validity raised by editors and reviewers included history threat, regression to the mean, and selection biases. Causality issues primarily included comments that authors’ designs did not allow conclusions regarding causal effects. One editor conveyed to the authors that it was necessary to test the causal link between two of their constructs but “unfortunately, no causal inferences can be drawn from your data because all measures were collected from a single source at one point in time.” Data that cannot support causal predictions are an issue because, as one reviewer mentioned, there is “the likelihood of reverse causality.” One sampling issue raised was an overall low response rate, while another reviewer expressed concern over a base-rate problem such that “it is difficult to draw conclusions about a phenomenon that is not widely manifested in the sample.” The rejection rate for manuscripts in which at least one design issue was raised (75%) differed little from the overall rejection rate of manuscripts in the sample (75.36%). However, when an issue regarding sampling in particular was raised, the manuscript rejection rate rose to 84.62%.
Measurement
The most frequently raised methodological concerns by editors and reviewers were those involving measurement. Comments for 62 of the 69 manuscripts (89.86%) discussed issues regarding construct measurement. These comments targeted choice of constructs, conceptualization and operationalization of constructs, convergent and discriminant validity, and tests of measurement models.
Choice of construct
In total, reviews of 46 of 69 manuscripts (66.67%) discussed the need to justify constructs, suggesting that editors and reviewers are often unconvinced by author justifications for inclusion of constructs. Comments pertaining to the choice of construct and theoretical justification for constructs are consistent with Edwards’s (2010) call to enhance theoretical progress in organizational research. In order to test meaningful propositions, Edwards and Berry (2010) suggest that authors not only use rigorous methods but also develop more precise theories. In the present study, when editors and reviewers criticized authors’ choice of constructs, they typically asked that the authors theoretically justify their choices, including, specifically: why did they choose those particular constructs, why were these constructs chosen over others, and why did they use only some components of the construct. For example, one editor remarked that “Overall, a more convincing rationale is needed to justify your choice of outcome variables,” and a reviewer commented that “I felt there was very little justification for why the problem investigated was important, and for why the variables selected were chosen.”
Conceptualization and operationalization of constructs
Another common issue raised by reviewers and editors involved authors’ conceptualization and operationalization of constructs. Issues regarding conceptualization and operationalization were mentioned in the editor’s and/or reviewer’s letters for 52 manuscripts (75.36%).
In terms of conceptualization, reviewers and editors expressed a desire for authors to provide a more detailed and clear definition of their constructs. Conceptualization issues were raised in reviews of 29 manuscripts (42.03%). A comment exemplifying this sentiment was, “The definition of [the construct] is not clearly defined and given this is a focal construct to your paper, both of the reviewers and I were very troubled by this.”
Editors and reviewers mentioned issues with operationalization of measures for 46 manuscripts (66.67%). Issues with operationalizations included ambiguity in construct measures (n = 5 manuscripts), a misalignment between conceptualization and operationalization (n = 21), and issues with the choice of operationalization (n = 29). Sometimes editors and reviewers questioned the appropriateness of a construct’s operationalization by noting a disconnect between the conceptualization and operationalization. A comment reflecting these concerns referred to a paper on teams: “I am not convinced that the two operationalizations of [the construct] … used in the current manuscript are, in fact, the optimal operationalizations of [the construct]…. Several plausible alternative operationalizations exist.”
Convergent and discriminant validity
Another commonly critiqued aspect of measurement was a lack of convergent and discriminant evidence provided by the authors. This issue was raised in letters corresponding to a little over half of the manuscripts (n = 39; 56.52%). Editors and reviewers questioned the convergent and discriminant validity of constructs when authors did not provide evidence of validity. For example, one reviewer wrote, “Given the high correlations among the three forms of conflict, I would like to see evidence of discriminant validity (i.e., CFA).”
When measures were related, reviewers and editors often suggested that they might be indicators of the same construct. One reviewer reflected these thoughts for a paper on climate, “The two … scales share a manifest level correlation of .66…. Maybe the two scales should be used as two indicators of a common construct.” Some of reviewers’ and editors’ requests for factor analyses and structural equation modeling (as discussed further in the Data Analytic Errors section) were, in part, an attempt to make sure that the constructs were distinct, such as one reviewer’s request that the authors “at least employ [an] exploratory factor analysis with all of the studies’ items to provide additional evidence that the measures are distinct.” Additionally, reasons provided by two reviewers to conduct a CFA were to “make sure the items load on the measures appropriately” and to “confirm that [constructs] are in fact different and appropriately measured.”
Tests of measurement models
One strategy for establishing the appropriateness of measures is to test and compare measurement models. To urge authors to provide more information about the appropriateness of their measures, editors and/or reviewers for 13 manuscripts (18.84%) recommended factor analyses (e.g., CFA or EFA) when none had been conducted. In cases where analyses had been conducted, editors and reviewers raised issues regarding the authors’ choice of factor analysis (n = 10 manuscripts). For instance, in half of these cases (n = 5), editors and reviewers specifically questioned why authors used EFAs instead of CFAs, given that CFA procedures provide a stronger a priori test. As one editor noted, “EFA is used when the researcher has no prior theory regarding the proper number of factors (latent variables).” Thus, when the authors “seemed to have reasonable bases … for a hypothesized factor structure that could be tested,” an editor asked, “Why did you use an EFA as opposed to a CFA to understand the factor structure of your scale?” In particular, editors and reviewers preferred CFAs to EFAs for testing measurement model fit. We discuss the specific issues pertaining to the implementation of factor analyses in our review of data analytic errors.
Control Variables
In our analysis of the peer review process, the issue of control variables appeared in the editor’s and/or reviewer’s letters for less than half of the manuscripts. While not as frequent as measurement topics, the issue of control variables reflected two contrary perspectives: (a) the failure to include control variables and (b) the inclusion of unnecessary controls. For 39.13% of the manuscripts (n = 27), reviewers and editors found it problematic that certain control variables were not included in the study, particularly ones that the editors and reviewers believed were theoretically relevant. As one reviewer wrote to the authors, “Perhaps the most concerning part of the current manuscript was the failure to control for endogenous factors that could account for the observed relationships.”
Not all reviewers and editors thought it necessary to include more control variables. For 14.49% of the manuscripts (n = 10), editors and reviewers asked authors to justify their inclusion of control variables. Here, they were more concerned with the choice and inclusion of control variables than with the omission of controls. For instance, one reviewer stated, “It wasn’t clear to me why the control variables … were added into the analyses. A more detailed explanation would be very helpful in this regard.” Editors and reviewers appeared to be looking for theoretical justification for the inclusion of control variables that, without justification, may be deemed unnecessary.
Though one may think it easier to remove unnecessary controls or justify the inclusion of controls than to add control variables in a study, editors rejected similar percentages of manuscripts that included unnecessary controls (80%; n = 8 out of 10 manuscripts) and manuscripts that failed to include controls (74.07%; n = 20 out of 27 manuscripts). Although the failure to include controls was more common in the manuscripts, these rejection rates suggest that authors should be wary of including more control variables simply to avoid omitting variables, as editors and reviewers also viewed too many controls as problematic. One reason that unnecessary control variables may be an issue is that their inclusion may prompt perceptions of weak theoretical justification for the entire study.
Common Method Variance
Common method variance can occur when similarities in measurement methods produce biased estimates of reliability and validity and result in inaccurate estimates of relationships among variables (Podsakoff, MacKenzie, & Podsakoff, 2012; Spector & Brannick, 2010). Editors and reviewers mentioned issues with CMV in their letters for 42.03% (n = 29) of the manuscripts. In terms of more specific CMV concerns, issues regarding self-report data were raised for 18 of these 29 manuscripts. Editors and reviewers were aware of situations in which CMV posed an issue, as exemplified by an editor’s explanation to the authors: “I would have liked to see you address this problem in Study 2 by employing different procedural and/or statistical controls for CMV in order to rule out its potential biasing effects. However, the data in Study 2 were collected from a single source at one point in time using survey items with identical response scales. Such a design creates conditions that are ripe for CMV.”
There was not a significant difference between the proportion of reject and revise and resubmit decisions for manuscripts in which CMV was present (reject = 82.76%; revise and resubmit = 17.24%) compared to those manuscripts in which CMV was not present (reject = 70%; revise and resubmit = 30%). Viewed the other way, 46.15% of all rejected manuscripts in our sample had issues with CMV, while only 29.41% of the manuscripts that received more favorable recommendations had CMV present. These results provide partial support for prior findings that editors and reviewers find issues with common method and self-report data (Chan, 2009) and may be quick to conclude that common methods inherently mean inflation of relationships between the variables.
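The comparison above can be illustrated with a small sketch. It rebuilds the 2 × 2 table implied by the reported percentages (24 of 29 manuscripts with a CMV issue rejected; 28 of 40 without) and applies a two-sided Fisher's exact test. The counts are back-calculated from the percentages, and the choice of Fisher's exact test is an assumption made for illustration; the significance test actually used is not stated in this section.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_obs * (1 + 1e-9))

# Counts back-calculated from the reported percentages (an assumption):
# CMV raised:     24 rejected, 5 revise and resubmit (82.76% / 17.24% of 29)
# CMV not raised: 28 rejected, 12 revise and resubmit (70% / 30% of 40)
cmv_reject, cmv_rr = 24, 5
no_cmv_reject, no_cmv_rr = 28, 12

p_value = fisher_exact_two_sided(cmv_reject, cmv_rr, no_cmv_reject, no_cmv_rr)
print(f"Rejection rate, CMV raised:     {cmv_reject / (cmv_reject + cmv_rr):.2%}")
print(f"Rejection rate, CMV not raised: {no_cmv_reject / (no_cmv_reject + no_cmv_rr):.2%}")
print(f"Two-sided Fisher exact p = {p_value:.3f}")
```

An exact test is used here only because the revise and resubmit cell is small (n = 5); a chi-square test on the same table would be another reasonable choice.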
Authors of two manuscripts attempted to address CMV using the Harman single factor test. Recent research has shown this technique to be largely ineffective at identifying CMV (Richardson, Simmering, & Sturman, 2009). Therefore, editors and reviewers appropriately appeared to be discouraging this practice. However, results suggest that more sophisticated statistical analyses to account for CMV may resolve the issues. Editors and reviewers recommended statistical remedies for CMV, citing Podsakoff, MacKenzie, Lee, and Podsakoff (2003; n = 2 manuscripts); Conway and Lance (2010; n = 6); and Johnson, Rosen, and Djurdjevic (2011; n = 2). Still, although editors and reviewers are proposing some post hoc analyses to address issues associated with common methods, results suggest that if they find issues with CMV in the study design (e.g., same source design), they will likely reject the manuscript.
Data Analytic Errors
Analysis of editors’ and reviewers’ comments revealed common errors in data analysis made by authors. Three major themes arose in the analyses relating to (a) the comprehensiveness of the analysis, (b) the choice of analysis, and (c) errors relating to specific procedures.
Comprehensiveness of analyses
One of the most common errors identified in the peer review process concerned analyzing one’s data in a piecemeal fashion rather than using a more comprehensive approach. These comments fell into two categories: correlations between variables and testing for interactions.
Correlations between variables
For roughly one fifth of the manuscripts (21.74%; n = 15), editors and reviewers requested that authors account for the correlations between endogenous variables by conducting more comprehensive analyses. One reviewer’s comment exemplified this request by saying that “rather than addressing any systematic issues … [the author] essentially breaks what should be a complex multivariate analysis into a large set of narrow comparisons.” In comparison to the overall 75.36% rejection rate for all manuscripts, editors rejected only 53.33% of the manuscripts in which this issue was raised in their and/or the reviewers’ letters. Of all 69 manuscripts, 41.18% of the manuscripts that received a revise and resubmit contained this issue, while only 15.38% of the rejected manuscripts contained this issue. Because the failure to account for correlations between endogenous variables can be addressed post hoc with an alternate analysis, editors and reviewers may have been more forgiving of this error and willing to give authors a chance to confirm their original results.
Testing for interactions
Moderation was raised in editors’ and reviewers’ letters corresponding to 21 of the manuscripts (30.43%). Some of their comments suggested that authors test for moderators (n = 7 manuscripts) and look for interactions by including product terms rather than running separate additive analyses for different groups or levels of a potential moderator. Editors and reviewers also critiqued authors for taking a piecemeal approach to the testing of a model when authors considered moderation and mediation in isolation rather than testing them in a single moderated mediation (n = 4 manuscripts). As one editor explained to the authors, “Both reviewers point out that the analyses also lack integration that is suggested by the hypotheses, namely an analysis of moderated mediation.” Reviewers and editors recognized in their comments that “testing the mediation separately from the moderation” means “taking a piecemeal approach to the testing of a model.” When editors and reviewers mentioned the need to test for moderators or test for moderated mediation, these manuscripts had high rejection rates of 85.71% (n = 6) and 100% (n = 4), respectively. We propose reasons for these high rejection rates in the Discussion section.
Choice of analyses
A related yet different concern expressed frequently in the peer review process concerned choice of analyses. While these comments may be viewed by some as pointing to more comprehensive analytical approaches, we categorized these comments as reflecting a desire for more appropriate procedures.
Mediation
When editors and reviewers raised issues with manuscript authors’ tests for mediation (n = 13 manuscripts), they often suggested more appropriate analytical procedures (69.23%; n = 9). Reviewers and editors criticized authors for using outdated tests for mediation, in particular the piecemeal portion of Baron and Kenny’s (1986) approach (n = 3 manuscripts). Instead, reviewers requested advanced statistical procedures for testing for mediation including the Sobel test (n = 5 manuscripts), bootstrapping standard errors for indirect effects (n = 5), and structural equation modeling (SEM; n = 5). For these manuscripts where editors and/or reviewers raised issues with mediation, the rejection rate was 69.23% (n = 9), suggesting that editors and reviewers recognized this issue can be addressed by the authors through more sophisticated analyses.
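To make the product-of-coefficients logic behind the Sobel test concrete, the following minimal sketch tests the indirect effect in a single-mediator model. The simulated data, the variable names, and the `sobel_test` helper are hypothetical and are not drawn from the reviewed manuscripts. The sketch uses the first-order Sobel standard error, sqrt(b²·se_a² + a²·se_b²), where a is the slope of the mediator on the predictor and b is the slope of the outcome on the mediator, controlling for the predictor.

```python
import random
from math import sqrt, erfc

def sobel_test(x, m, y):
    """Sobel z test for the indirect effect of x on y through m."""
    n = len(x)
    mean_x, mean_m, mean_y = sum(x) / n, sum(m) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dm = [v - mean_m for v in m]
    dy = [v - mean_y for v in y]
    sxx = sum(v * v for v in dx)
    smm = sum(v * v for v in dm)
    sxm = sum(p * q for p, q in zip(dx, dm))
    sxy = sum(p * q for p, q in zip(dx, dy))
    smy = sum(p * q for p, q in zip(dm, dy))

    # Path a: slope of m regressed on x
    a = sxm / sxx
    sse_a = sum((q - a * p) ** 2 for p, q in zip(dx, dm))
    se_a = sqrt(sse_a / (n - 2) / sxx)

    # Path b: slope of m in the regression of y on both x and m
    det = sxx * smm - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det  # direct effect of x
    sse_y = sum((r - c_prime * p - b * q) ** 2 for p, q, r in zip(dx, dm, dy))
    se_b = sqrt(sse_y / (n - 3) * sxx / det)

    indirect = a * b
    z = indirect / sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p from the standard normal
    return indirect, z, p_value

# Hypothetical simulated data: x -> m -> y with a small direct effect
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + random.gauss(0, 1) for xi, mi in zip(x, m)]

indirect, z, p_value = sobel_test(x, m, y)
print(f"indirect effect = {indirect:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

Unlike the causal steps approach, this calculation focuses directly on the indirect effect a·b, although it still relies on a normal approximation for the sampling distribution of that product.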
HLM
More appropriate analyses were also suggested by editors and reviewers to account for issues with nested data and non-independence. For 15 (21.74%) manuscripts, editors and reviewers suggested HLM. For example, one editor wrote, “At the least, you need to account for organizational-level variance. A more sophisticated analysis of this type could also potentially examine some variables as a shared experience at the organizational level.” Similar to tests for mediation, rejection rates for manuscripts containing this issue were relatively low, 60% (n = 9). Therefore, the request for authors to conduct an HLM was not necessarily a fatal flaw but rather a recommendation by editors and reviewers to account for the nested nature of the data.
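One common way to gauge whether nested data carry enough group-level variance to warrant a multilevel analysis is the intraclass correlation, ICC(1), from a one-way ANOVA. The sketch below is an illustration under stated assumptions rather than a procedure described in the reviews: the ratings are hypothetical, the `icc1` helper is introduced here, and the formula assumes equal group sizes.

```python
def icc1(groups):
    """ICC(1) from a one-way ANOVA; this sketch assumes equal group sizes."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes balanced groups")
    n_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / (n_groups * k)
    group_means = [sum(g) / k for g in groups]
    # Between-group and within-group sums of squares
    ss_between = k * sum((mu - grand_mean) ** 2 for mu in group_means)
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, group_means) for v in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical ratings from employees nested in three organizations
orgs = [
    [4.0, 4.2, 3.9, 4.1],
    [2.9, 3.1, 3.0, 2.8],
    [3.5, 3.6, 3.4, 3.7],
]
print(f"ICC(1) = {icc1(orgs):.2f}")
```

A value well above zero indicates that observations within the same organization are not independent, which is the situation in which editors and reviewers asked authors to account for organizational-level variance.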
SEM
In cases when dependent variables were highly related or alternative models needed to be tested (n = 5 manuscripts), editors and reviewers recommended that authors use SEM to estimate relationships among latent variables and simultaneously model multiple relationships. One reviewer wrote, “Authors, do not kill me, but you really need to run a SEM on this study!! You’ve got 3 DVs that are highly related and latent variables that have measurement issues.” Another reviewer stated, “One way you could make a greater contribution [is] to take the results from the meta-analysis and use structural equation modeling to provide a comprehensive test of the … model.” Consistent with the results for mediation and HLM issues, the rejection rate for manuscripts with SEM issues was 60% (n = 3), although the issue arose relatively infrequently.
Errors relating to specific procedures
In addition to the two aforementioned general categories, we identified a number of common errors authors tended to make with regard to certain analytical procedures.
Factor analysis/SEM
Some of the most common comments pertaining to data analysis concerned the use of EFA, CFA, or SEM. These issues were raised in the editor’s and/or reviewer’s letters for over half of the manuscripts (53.62%; n = 37). These comments included inappropriate use of modification indices (n = 2 manuscripts), correlated residuals (n = 2), failure to test alternative models (n = 10), and/or improper elimination of items (n = 4). Of these 14 manuscripts containing such issues, editors rejected 85.71% (n = 12) of them.
Our analyses also identified some common errors with EFA. When authors used EFA, reviewers and editors questioned the factor analytic method being used (n = 3 manuscripts) and the choice of rotation method (n = 1). Notably, both of these issues have been identified as common errors when conducting an EFA (Bandalos & Boehm-Kaufman, 2009). Another issue raised in letters corresponding to two manuscripts had to do with authors conducting an EFA and CFA on the same data set. When manuscript authors conducted an EFA and CFA on the same sample, a reviewer explained that, “These confirmatory results do not provide evidence of confirmation, but merely indicate that two modeling approaches on the same data converge.” Of the 37 manuscripts that had issues with factor analyses and/or SEM, 83.78% (n = 31) were rejected, which was higher than the 65.63% rejection rate (n = 21) of the 32 manuscripts that did not have these issues. These findings suggest that editors and reviewers are taking issues with factor analyses and SEM seriously.
Moderated regression
Editors and reviewers raised issues pertaining to moderated regression in their letters for 14.49% (n = 10) of the manuscripts. In 30% of the manuscripts where issues with moderated regression were raised (n = 3), editors and reviewers requested that authors center variables prior to testing for moderation, as is commonly recommended (Aiken & West, 1991). Additionally, authors were asked to include follow-up tests on significant interactions, as emphasized by one reviewer who wrote, “without simple slope tests we do not know which levels are actually different from one another.” Simple slopes identify changes in the predicted relationship between variables at different levels of the moderator (Aiken & West, 1991). It is important to note that simple slopes are not an appropriate method if the moderator is continuous with no meaningful cutoff values because the different levels of the moderator would have no theoretical meaning (Dawson, 2014).
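The centering and follow-up steps described above can be sketched as follows. The code mean-centers the predictor and moderator, forms their product term, fits the moderated regression by ordinary least squares, and reports the slope of the outcome on the predictor at selected moderator values. The data are simulated, the helper names are hypothetical, and probing at one standard deviation below and above the moderator mean is a convention assumed here for illustration; as noted above, such levels may not be meaningful for every continuous moderator.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def simple_slopes(x, z, y):
    """Slope of y on x at -1 SD, the mean, and +1 SD of the moderator z."""
    xc, zc = center(x), center(z)
    product = [a * b for a, b in zip(xc, zc)]  # interaction term
    b0, b1, b2, b3 = ols([xc, zc, product], y)
    sd_z = (sum(v * v for v in zc) / (len(zc) - 1)) ** 0.5
    return {level: b1 + b3 * level * sd_z for level in (-1, 0, 1)}

# Hypothetical simulated data with a true interaction of 0.4
random.seed(42)
x = [random.gauss(0, 1) for _ in range(300)]
z = [random.gauss(0, 1) for _ in range(300)]
y = [0.5 * xi + 0.3 * zi + 0.4 * xi * zi + random.gauss(0, 1)
     for xi, zi in zip(x, z)]

for level, slope in simple_slopes(x, z, y).items():
    print(f"slope of y on x at {level:+d} SD of z: {slope:.2f}")
```

A full follow-up analysis would also test whether each simple slope differs from zero, which requires the coefficient covariance matrix and is omitted here for brevity.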
Discussion
This study elaborates on previous empirical studies (Fiske & Fogg, 1990; Rogelberg et al., 2009) by providing a detailed examination of methodological and statistical comments in reviewers’ and editors’ letters, identifying specific areas of concern, and relating them to manuscript decisions. We highlighted common errors made by authors both generally (e.g., lack of comprehensiveness of analyses) and for specific analyses (e.g., poor methods to improve model fit).
Overview of Findings
In the current study, editors and reviewers frequently commented on measurement topics (e.g., theoretical justification of constructs, conceptualization, and operationalization of measures) for manuscripts submitted for publication in JBP between July 2011 and November 2011. Issues with theoretical justification of constructs resulted in high manuscript rejection rates. Also common in editors’ and reviewers’ letters were comments on theoretical justification of study design and hypotheses. Additionally, editors and reviewers raised concerns about the failure to include controls more often than about the inclusion of unnecessary controls; however, both issues related to the theoretical justification of control variables and resulted in similar rejection rates.
Issues with common methods, particularly self-report data, were found to often result in manuscript rejection. Other common data analytic errors related to higher rejection rates (e.g., failure to test for moderated mediation, inappropriate factor analysis test), although results also suggested that some data analytic errors might not necessarily constitute fatal flaws (e.g., failure to account for the correlations between endogenous variables, failure to conduct a SEM or HLM, failure to test for mediation). In the following section on data analytic errors, we discuss the apparent inconsistencies in editors’ manuscript decisions for these issues, which could be resolved with revisions.
In order to provide recommendations to editors, reviewers, and authors concerning the methodological and statistical topics covered in the results, we relate our findings to previous research as well as current debates in the literature. Table 4 provides an overview of the common issues raised by editors and reviewers as well as recommendations and sources to remedy these issues. Authors who want to improve their current methodological practices, or who are concerned with an editor’s or reviewer’s comment, can refer to this table for appropriate solutions and references. Editors and reviewers can also use this table to provide comments that are consistent with recommended methodological and statistical practices. Table 4 notes that there is disagreement in the literature on the topics of CMV and control variables. We have provided sources for both sides of the debate, but in our recommendations, we identified a middle ground that does justice to both. Editors and reviewers should be aware that there does not appear to be much agreement in the literature for these topics, and authors should note these widespread disagreements when addressing issues pertaining to these topics.
Relationship to Current Methodological Practices and Recommendations
Design
Design issues were commonly raised for manuscripts within this study, with issues corresponding to threats to validity, threats to causal inference, and sampling appearing most often. Our study’s findings are in line with previous findings from reviews of limitation sections in empirical articles published in Journal of Business Venturing between 2005 and 2010 (Aguinis & Lawal, 2012) and in empirical articles published in various top journals from 1982 to 2007 (Brutus et al., 2013). These studies found that common design limitations mentioned by empirical articles included small sample size and lack of confidence regarding causality (Aguinis & Lawal, 2012), as well as external validity issues, particularly concerning lack of generalizability (Brutus et al., 2013). This suggests that the design issues commonly raised by editors and reviewers are also reflected in the limitation sections of published manuscripts. Given the high frequency of design issues raised in the limitation section of published manuscripts, it is likely that these manuscript authors could not have mitigated these issues even if editors and reviewers raised them during the peer review process. Although manuscripts can be published with design limitations, authors should consider sampling and validity threats in the planning stage of their study to avoid as many design issues as possible.
Previous research has found that adequacy of design had the biggest influence on editorial recommendations for a manuscript (Gilliland & Cortina, 1997). Similarly, Fogg and Fiske (1993) found a significant correlation between severe planning and execution criticisms and reviewers’ negative final recommendations. Although design issues were common in the current study’s editors’ and reviewers’ letters, we did not find a higher rejection rate for manuscripts with design issues compared to the other statistical issues. Advanced planning in research design is important, but it is likely that editors and reviewers recognize that every study has some limitations. We recommend that authors follow Brutus et al.’s (2013) guidelines on how to report limitations and address the threats to validity they face with their studies.
Measurement
This study identified measurement concerns as a critical issue that authors must anticipate if they want to improve their chances of success in the publication process. Our field has a long history of emphasizing high quality measurement (SIOP, 2003). Nevertheless, methodologists have lamented the seeming lack of emphasis being placed on high quality measurement. This concern is exemplified in a recent exchange on the electronic mailing list of the Research Methods division of the Academy of Management (RMNET) regarding construct validity: “This seems like it would be a great subject for a new Urban Legends piece. I can see the title now: ‘My measure was used in a top tier publication, so of course it’s reliable and valid!’ … The other thought this string triggered is a potentially interesting disconnect between elements of professional practice and scientific inquiry in our field. In some cases, particularly in the case of high-stakes testing or personnel selection, standards for psychometric quality of a measure seem to be different [APA’s Standards and SIOP’s Principles] … from the standards that measures are held to in journals (almost regardless of tier). For example, with few exceptions, a practicing I-O psychologist in the area of personnel selection would be hard pressed to simply cite an article in a top tier journal as an adequate defense for the psychometric properties of a selection measure s/he plans to use in a hiring process” (Putka, 2011).
While some construct validity issues could be more fatal flaws than others, we cannot say definitively that some hold more weight in the final manuscript decision. However, results from previous research have suggested that issues with operationalization of constructs are frequently mentioned in the limitations sections of empirical articles (Aguinis & Lawal, 2012) and are strongly negatively related to reviewers’ manuscript evaluations and the manuscript decision (Gilliland & Cortina, 1997), suggesting that authors should pay close attention to their operationalization of constructs.
Control variables
Our analysis of comments pertaining to control variables reflected two conflicting points of view: more controls are needed and fewer controls are needed. These findings elaborate on previous research that only covered reviewers’ comments when manuscript authors had no or poor controls (Fiske & Fogg, 1990). The two perspectives that emerged from our analyses mirror the debate in the statistical literature concerning the appropriate use of control variables. One side of the debate generally argues for the inclusion of more control variables (Antonakis, Bendahan, Jacquart, & Lalive, 2010), while the other typically recommends including a more limited set of control variables that can be theoretically justified (Spector & Brannick, 2011).
Those methodologists who argue for more controls warn against omitted variable bias, such that omitting variables leads to biased estimates in regression coefficients and may lead to estimations of the wrong model (Antonakis et al., 2010). With regard to the inclusion of irrelevant regressors, these researchers suggest, “It is always safer to err on the side of caution by including more than fewer control variables” (Antonakis et al., 2010, p. 1092). In contrast, Carlson and Wu (2012) argue that the choice to include or not to include controls can heavily influence one’s findings and that it is a common misconception that the inclusion of control variables always allows for the isolation of the effect of the independent variable on the dependent variable, holding other influences constant. Additionally, there is the concern that authors can conduct analyses with all possible subsets of a set of potential controls until they find the subset that produces the most flattering results for the constructs of interest. In order to reduce the belief that statistical controls always lead to more accurate estimates of variable relationships, Spector and Brannick (2011) suggest that “Editors and reviewers should be the first line of defense against the purification principle” by recommending that authors refrain from including too many control variables (p. 302).
Previous studies have examined journal articles for authors’ use of control variables (Becker, 2005; Carlson & Wu, 2012) in order to provide authors with suggestions and guide editors’ and reviewers’ commentary, but to our knowledge, our study is the first to examine the peer review process itself to see if reviewers’ and editors’ comments reflect leading methodologists’ suggestions. Becker (2005) and Carlson and Wu (2012) stressed the need for better justification for the inclusion of controls. Our findings show that reviewers and editors similarly wanted explanations as to why the authors included certain control variables. One reviewer’s suggestion to “include as control variables those that make most sense on theoretical grounds,” such that “a more compelling rationale should be given for the variables you have statistically controlled for in each study” exemplifies the idea that “authors should be asked to thoroughly explain and justify what they have done. They should not get away with merely saying they included a control variable just because it might affect the variables of interest” (Spector & Brannick, 2011, p. 302). Our findings fit Spector and Brannick’s (2011) recommendation that when editors and reviewers request that authors include control variables, they should provide the same amount of theoretical justification that they expect from the authors. The present study’s results suggest that although editors and reviewers recognize the issues of unmodeled endogeneity (Antonakis et al., 2010), they are demonstrating vigilance by asking authors for theoretical justification of control variables and by providing justification for their suggestions to include control variables.
Both sides of the debate suggest that control variables should be included on the basis of theoretical importance; however, they differ as to whether one should be wary of including too few or too many control variables. The finding that editors and reviewers more frequently raised issues with manuscripts having too few compared to too many control variables suggests that authors need to think carefully about which variables to include and provide justification for their omission of relevant control variables. Similarly, editors and reviewers should theoretically support any control variables they recommend for manuscripts.
Common method variance
Another current methodological debate reflected in our results concerns CMV and the bias that can be associated with it. In their content analysis of reviewers’ letters, Rogelberg and colleagues (2009) made mention of CMV but provided only brief recommendations (e.g., CMV should be considered in the design of the study) that do not disentangle or delve into the extent of the debate on issues concerning CMV. Many researchers believe that CMV is an important issue and must be controlled, while others suggest the issues are not inherent in CMV and that these concerns are misplaced. Those who express concern regarding CMV argue that CMV can threaten construct validity, influence the structure of constructs, obscure relationships between constructs by inflating reliability estimates and convergent validity, and deflate the relationship between different constructs (e.g., Bagozzi & Yi, 1990; Baumgartner & Steenkamp, 2001; Doty & Glick, 1998; Podsakoff et al., 2003). Self-report data, the most common CMV culprit, has been characterized as creating low construct validity due in part to social desirability (Spector & Brannick, 2010) and is posited as a red flag for issues with CMV. In an editorial for the Journal of Applied Psychology, Kozlowski (2009) writes, “Most desk-rejected manuscripts (aside from posing a trivial research question) are single-shot, cross-sectional, self-report survey designs. There are very rare occasions where such a design may be warranted and defensible. In such situations, authors need to make every effort to address the concerns of common source variance (see Podsakoff, MacKenzie, Podsakoff, & Lee, 2003)” (p. 3).
The finding that editors and reviewers raised issues with CMV suggests that editors and reviewers may not agree with Spector’s (2006) argument that “the assumption that method alone is sufficient to produce biases, so that everything measured with the same method shares some of the same biases” is an urban legend (p. 223). Spector’s side of the CMV debate suggests that the construct and its measurement, not just the source of data, determine the amount of shared bias a measure will have with others. Conway and Lance (2010) caution against reviewers assuming “a) that relationships between self-reported variables are necessarily and routinely upwardly biased, b) other-reports (or other methods) are superior to self-reports, and c) rating sources (e.g., self, other) constitute measurement methods” (p. 235). Although they acknowledge that there is potential for CMV to bias findings, they suggest that gatekeepers should not base their criticisms purely on these assumptions. Rather, editors, reviewers, and authors should evaluate measurement within the context of the research situation (Conway & Lance, 2010).
In line with the current CMV debate, authors are encouraged not to ignore CMV but to address the threats in their justifications, procedures, and if necessary, post hoc techniques. Though having data with CMV is not necessarily a fatal flaw, authors should carefully consider in the design process if they are able to defend their choice of measures. If authors can avoid sole reliance on a single method, then they should. If authors believe that measures with common method are necessary, then they should be aware of the situations in which CMV may or may not be an issue. For instance, research suggests that CMV inflates correlations in multilevel and cross-level studies (Ostroff, Kinicki, & Clark, 2002) but that CMV does not generate artificial interaction effects (Evans, 1985).
Based on our findings, editors and reviewers seem prone to find issues with CMV, and thus, the potential for manuscript rejection is high. Authors who determine that common methods are necessary for their study should consider strategies to attenuate CMV’s negative effects and be prepared to address the potential threats that CMV poses. Some techniques to address the potential threats that CMV poses include spreading data collection over time to reduce temporary affective states and memory effects (Ostroff et al., 2002), splitting the sample in half (i.e., using half the sample to measure one construct, half the other construct; Ostroff et al., 2002), and providing post hoc analyses that control method biases (e.g., Podsakoff et al., 2003). Authors should avoid the Harman single factor technique to remedy potential problems with CMV and aim to use techniques by Podsakoff et al. (2003), Conway and Lance (2010), or Johnson et al. (2011).
Data analytic errors
Data analytic errors related to three general themes in reviewers’ and editors’ comments: (a) the comprehensiveness of the analysis, (b) the choice of analysis, and (c) errors relating to specific procedures.
Choosing the most appropriate statistical technique mitigates the risk of errors in the comprehensiveness of data analyses. When testing for mediation, authors should use procedures that focus on the indirect effect, such as difference-in-coefficients and product-of-coefficients tests (e.g., Sobel’s test), bootstrapping, and SEM, instead of Baron and Kenny’s (1986) causal steps, which do not. Authors must also be aware of the distributional assumptions on which tests are based (e.g., the Sobel test assumes multinormality). Although additive analyses are more parsimonious, authors should conduct multiplicative analyses when appropriate. For instance, if authors propose both mediation and moderation in the same study, they should be aware that tests for moderated mediation might be necessary and should use appropriate statistical techniques to test these models (Edwards & Lambert, 2007; Hayes, 2013; Sardeshmukh & Vandenberg, 2016). However, authors should be wary of instances when multiplicative analyses are not meaningful (see, e.g., Murphy & Russell, 2016).
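The bootstrapping approach to the indirect effect can be sketched as follows. This is a minimal illustration under stated assumptions, not an analysis from the present study: the data are simulated, the function names (`ols_coefs`, `indirect_effect`, `bootstrap_indirect`) are hypothetical, and a simple percentile interval is used rather than a bias-corrected one.

```python
import numpy as np


def ols_coefs(y, predictors):
    """OLS coefficients (intercept first) for y regressed on the predictors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate a*b for the path X -> M -> Y."""
    a = ols_coefs(m, [x])[1]       # path a: X predicting M
    b = ols_coefs(y, [x, m])[2]    # path b: M predicting Y, controlling for X
    return a * b


def bootstrap_indirect(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return indirect_effect(x, m, y), (lower, upper)


# Simulated data (assumption): true a = 0.5, true b = 0.4, so a*b = 0.2.
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + 0.1 * x + rng.normal(size=n)

point, (ci_low, ci_high) = bootstrap_indirect(x, m, y)
print(f"indirect effect = {point:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval that excludes zero is usually read as support for an indirect effect. Because the interval comes from resampling, it does not rely on the normality assumption behind the Sobel test.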
Our findings suggest that authors should account for correlations between endogenous variables with comprehensive analyses (e.g., MANOVAs or step-down technique). Additionally, to analyze highly related dependent variables or latent variables with measurement issues, authors should consider conducting SEM analyses, although there are considerations to take into account (see recommendations and references in Table 4). In order to account for issues with nested data, such as the non-independence of data, authors are encouraged to use random coefficient modeling (RCM), as the traditional ordinary least squares (OLS) approach should not be used with non-independent data. As these analyses can be addressed in revisions, editors and reviewers did not seem to reject as many of these manuscripts compared to the overall rejection rate. However, this was not the case for all issues that could be addressed through revision.
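One common way to gauge how much non-independence nested data contain, before choosing between OLS and RCM, is the intraclass correlation ICC(1) from a one-way ANOVA. The sketch below is an illustration only, not a procedure taken from the reviews analyzed here; the function name `icc1` and the toy data are assumptions.

```python
import numpy as np


def icc1(values, groups):
    """ICC(1) from a one-way ANOVA: share of variance attributable to group membership."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    k = len(labels)                 # number of groups
    n_total = len(values)
    grand_mean = values.mean()

    sizes = np.array([np.sum(groups == g) for g in labels])
    means = np.array([values[groups == g].mean() for g in labels])

    ss_between = np.sum(sizes * (means - grand_mean) ** 2)
    ss_within = sum(
        np.sum((values[groups == g] - values[groups == g].mean()) ** 2)
        for g in labels
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Average group size, adjusted so the formula also works for unequal groups.
    n0 = (n_total - np.sum(sizes ** 2) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy example (assumption): three teams of three employees each.
scores = [4.0, 4.2, 3.9, 2.1, 2.4, 2.0, 3.0, 3.3, 3.1]
teams = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(f"ICC(1) = {icc1(scores, teams):.3f}")
```

A value well above zero indicates that observations within the same group resemble one another, which is the situation in which the independence assumption of OLS is violated.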
Though other analyses could also be conducted post hoc, authors’ failure to conduct these analyses may have raised questions about their study’s conclusions or the manuscript itself, as demonstrated by high rejection rates for the failure to include certain analyses (e.g., moderation) and for improper implementation of factor analyses and/or SEM. For instance, when authors failed to test for interactions that would have made their hypotheses or analyses more comprehensive, the majority of these manuscripts were rejected, perhaps because authors did not include the necessary moderators in their studies or because the interpretation of their data would change if they tested for a moderator. Similarly, high rejection rates resulted when authors improperly implemented an SEM or factor analysis, possibly because the fit or interpretation of the model would change when properly tested.
Reviewers and editors also provided particular suggestions for errors relating to factor analyses and SEM. While previous research found that reviewers and editors often requested CFAs or EFAs for measures (Rogelberg et al., 2009), the current study expands on these findings by including more specific issues and recommendations regarding the proper implementation of factor analyses and structural models. For instance, authors should be sure to appropriately use modification indices, test alternative models, explain and support the elimination of items, and avoid correlating residuals unless they can justify the choice (e.g., an endogenous variable measured at different times). When conducting an EFA, authors should avoid common errors by justifying the use of an EFA instead of a CFA, not conducting an EFA and CFA on the same data set, and supporting the choice of rotation method (see Bandalos & Boehm-Kaufman, 2009, for more on these common errors).
Regarding moderated regression, editors and reviewers suggested that authors include tests for simple slopes and center their variables. The request for simple slopes was made in several cases where issues with moderated regression were raised. This pattern is notable given the viewpoint of some researchers that tests of simple slopes may not be appropriate in most situations. As Dawson (2014) explains, tests of simple slopes often tell us little about the effect of interest and should only be used in certain circumstances (e.g., where the conditional effect at a certain value of the moderator would be particularly meaningful). Moreover, the use of arbitrary values such as one standard deviation above or below the mean to plot simple effects offers little utility. When moderators are continuous with no meaningful cutoffs, authors should use, and editors and reviewers should suggest, the Johnson-Neyman technique or regions of significance as alternatives to simple slopes (Aiken & West, 1991; Preacher, Rucker, & Hayes, 2007). Authors, editors, and reviewers should recognize the utility of simple slopes but also be aware of the circumstances in which this test should be used.
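The logic of the Johnson-Neyman technique can be sketched as follows: rather than testing the slope of the predictor at arbitrary moderator values, one solves for the moderator values at which the conditional slope crosses the significance threshold. This is a minimal illustration under stated assumptions: the data are simulated, the function names are hypothetical, and the critical value 1.96 is a large-sample normal approximation rather than the exact t critical value.

```python
import numpy as np


def johnson_neyman(y, x, z, crit=1.96):
    """Fit y = b0 + b1*x + b2*z + b3*x*z and return (coef, cov, boundaries).

    boundaries holds the moderator values at which the conditional slope of x
    has |slope / SE| equal to crit; it is empty if no real boundaries exist.
    """
    X = np.column_stack([np.ones(len(y)), x, z, x * z])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * xtx_inv          # covariance matrix of the coefficients

    b1, b3 = coef[1], coef[3]
    v11, v13, v33 = cov[1, 1], cov[1, 3], cov[3, 3]

    # Solve (b1 + b3*z)^2 = crit^2 * (v11 + 2*z*v13 + z^2*v33) for z.
    qa = b3 ** 2 - crit ** 2 * v33
    qb = 2 * (b1 * b3 - crit ** 2 * v13)
    qc = b1 ** 2 - crit ** 2 * v11
    disc = qb ** 2 - 4 * qa * qc
    if qa == 0 or disc < 0:
        return coef, cov, ()
    roots = sorted([(-qb - np.sqrt(disc)) / (2 * qa),
                    (-qb + np.sqrt(disc)) / (2 * qa)])
    return coef, cov, tuple(roots)


def conditional_slope(coef, cov, z0):
    """Slope of x and its standard error at moderator value z0."""
    slope = coef[1] + coef[3] * z0
    se = np.sqrt(cov[1, 1] + 2 * z0 * cov[1, 3] + z0 ** 2 * cov[3, 3])
    return slope, se


# Simulated data (assumption): the effect of x on y depends on z.
rng = np.random.default_rng(7)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.2 * z + 0.5 * x * z + rng.normal(size=n)

coef, cov, bounds = johnson_neyman(y, x, z)
print("region-of-significance boundaries on z:", bounds)
```

When the interaction term itself is significant, the slope of x is significant outside the two boundaries and non-significant between them, which describes the moderation pattern across the whole range of the moderator.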
Editors and reviewers also occasionally requested that authors center variables prior to tests of moderation. Conversely, a recent article in the literature suggests that centering variables may be misguided (Dalal & Zickar, 2012). Authors should realize that centering variables in moderation only reduces non-essential collinearity, making the main effects easier to interpret. Authors should not misinterpret reviewers’ suggestions to center variables as ones that will improve fit, impact power, or change reliability of the interaction term (Dalal & Zickar, 2012). Authors should center their variables prior to testing for moderation, but they, as well as reviewers and editors, should realize that this technique does not resolve problems such as poor model fit.
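The point that centering changes interpretation but not fit can be checked numerically. The sketch below uses simulated data and a hypothetical helper, `fit_moderation`; it is an illustration under those assumptions, not a reanalysis of any manuscript in the sample.

```python
import numpy as np


def fit_moderation(y, x, z):
    """OLS fit of y on x, z, and x*z; returns the coefficients and R-squared."""
    X = np.column_stack([np.ones(len(y)), x, z, x * z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - resid.var() / y.var()
    return coef, r2


# Simulated data (assumption): predictors with non-zero means.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(5, 1, n)
z = rng.normal(3, 1, n)
y = 1 + 0.5 * x + 0.3 * z + 0.2 * x * z + rng.normal(size=n)

raw_coef, raw_r2 = fit_moderation(y, x, z)

xc, zc = x - x.mean(), z - z.mean()          # mean-centered predictors
cen_coef, cen_r2 = fit_moderation(y, xc, zc)

# Correlation between a predictor and the product term, before and after centering.
raw_corr = np.corrcoef(x, x * z)[0, 1]
cen_corr = np.corrcoef(xc, xc * zc)[0, 1]

print(f"interaction coefficient: raw = {raw_coef[3]:.4f}, centered = {cen_coef[3]:.4f}")
print(f"R-squared:               raw = {raw_r2:.4f}, centered = {cen_r2:.4f}")
print(f"corr(x, x*z):            raw = {raw_corr:.3f}, centered = {cen_corr:.3f}")
print(f"coefficient on x:        raw = {raw_coef[1]:.4f}, centered = {cen_coef[1]:.4f}")
```

The interaction coefficient and R-squared match across the two fits, while the coefficient on x changes: after centering, it is the slope of x at the mean of z rather than at z = 0.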
When considering the totality of issues raised, there was considerable overlap between methodology and theory, as demonstrated by the presence of theoretical concerns raised in the majority of manuscript reviews, suggesting that authors’ choices in methodology should be theory driven. The emphasis on theoretical justification in editors’ and reviewers’ comments pertaining to control variables, measures, and analyses confirms that authors should start with theory and then select the method and analyses that best allow tests of the theory (Wilkinson, 1999). As previously mentioned, authors are encouraged to reference Table 4 for an overview of issues commonly raised by reviewers and editors, recommendations, and additional sources.
Limitations and Future Research
There are several limitations within this study that should be noted. Although we generated 267 codes from 1,751 statements in 304 letters, the letters came from only 69 manuscripts. The sample of 69 manuscripts was a relatively good size given that we incorporated both reviewers’ and editors’ letters for each manuscript, allowing us to look at comments and concerns among reviewers and editors for the same manuscript. In previous studies, Fiske and Fogg (1990) coded 402 reviewers’ letters, and Rogelberg et al. (2009) analyzed 131 reviewers’ letters, but they did not examine all editors’ and reviewers’ letters for every manuscript included in their studies.
It should also be noted that the sample of editors and reviewers was a select group of people, and thus, the opinions of these editors and reviewers may not be representative of the field as a whole. Moreover, though the perspectives of multiple editors (N = 13) and many reviewers (N = 107) are represented, the sample consists of multiple letters from the same editors and from 29 of the 107 distinct reviewers. Additionally, while JBP is a highly respected journal, the goals of this journal may differ from those of other journals. Still, as previously mentioned, it is typical for associate editors and board members to serve other journals, and therefore, the thoughts reflected in this journal should be consistent with those reflected in other journals. Future research should expand the qualitative analysis of reviews to other journals.
An additional limitation arises from the lack of multiple independent coders. Though the initial codes were reviewed, organized, and categorized by two coders, consensus coding was used, meaning that statistics about interrater agreement and reliability cannot be computed. Because the present study is cross-sectional, the study cannot support or attest to trends in methods comments over time. Future studies should look at the peer review process over time to examine if methodological trends in reviews match those of the literature.
Conclusion
In conclusion, the present study provides prospective authors with detailed information regarding what the gatekeepers say about research methods and analysis in the peer review process. Our hope is that reviewers and editors can use these findings to improve on their own reviews and decisions. Similarly, researchers can use this information and our subsequent recommendations to enhance the quality of their current methodological practices and, most importantly, conduct effective and appropriate research. Doing so should ultimately enhance their chances of publication success.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
