Introduction to the Special Series on Results-Blind Peer Review: An Experimental Analysis on Editorial Recommendations and Manuscript Evaluations

Abstract

Publication bias occurs when studies with statistically significant results and large effects are more likely to be published than similarly rigorous studies with null and mixed findings. Results-blind peer review requires peer reviewers to consider only the “Introduction” and “Method” sections of submitted manuscripts prior to making editorial recommendations. This process ensures recommendations for publication focus on methodological rigor and not the direction, significance, or magnitude of the reported effects. The current investigation experimentally tested whether reviewers’ editorial recommendations and perceptions of manuscript importance, quality, and rigor varied as a function of type of review (i.e., results-blind or results-included) among 44 reviewers. Results indicated reviewer recommendations did not vary as a function of review type. However, reviewers found results-blind manuscripts less rigorous than results-included manuscripts and reported less confidence in their recommendations on results-blind manuscripts. Descriptive findings on results-blind reviewing were mixed, with some support for the method but a lack of confidence in its overall effectiveness. We discuss findings in relation to the conceptual benefits of results-blind reviewing and the increased focus on open and transparent science within special education and preview the papers included in the special section.

Keywords

policy issues, single-case experimental design, randomized trial

Publication bias occurs when studies with statistically significant or large findings are more likely to be published than similarly rigorous studies with null or mixed findings (Rosenthal, 1979). Evidence indicates publication bias is widespread across multiple fields of research (Chong et al., 2016; Dwan et al., 2013; Findley et al., 2016; Francis, 2012), including education and special education (e.g., Chow & Ekholm, 2018; Gage et al., 2017; Polanin et al., 2016). It appears that publication is driven in part by journal audience, editor, and reviewer preference for research resulting in statistically significant results (Heene & Ferguson, 2017; Shadish et al., 2016). As such, researchers may not attempt to publish studies yielding null effects (Franco et al., 2014) or may engage in “p-hacking” (Cook et al., 2018), data fishing (Humphreys et al., 2013), or other activities that make significant findings more likely.

Aggregated across studies, publication bias may distort a body of evidence by inflating estimated intervention effects and providing an inaccurate depiction of the available research (Cook & Therrien, 2017). In addition, publication bias has likely contributed to the “replication crisis,” in which previous empirical findings have not been replicated consistently in subsequent attempts (Open Science Collaboration [OSC], 2015; Shrout & Rodgers, 2018). For example, if a researcher attempted an experiment 3 times, but only published the one instance with statistically significant results, it would not be surprising if subsequent attempts did not replicate the published findings (Francis, 2012).

Special education research (Cook, 2014), and specifically research investigating effective practices for students with emotional and behavioral disorders (EBD), has not been immune to publication bias. Indeed, EBD research and single-case design research show evidence of preference for publication of large effects (Cook & Therrien, 2017), questionable research practices (Shadish et al., 2016), and researchers not attempting to publish studies with null effects (Tincani & Travers, 2019). As a result, meta-analyses of effective practices for students with EBD may overestimate effects, and special education teachers may find practices touted as evidence- or research-based less effective than indicated, contributing to the research-to-practice gap and poorer outcomes for students with EBD (Cook, 2014; Cook & Therrien, 2017).

Results-blind peer review is an approach proposed to reduce publication bias. Results-blind peer review refers to journals conducting reviews with peer reviewers blind to study findings (Findley et al., 2016). In results-blind peer review, reviewers evaluate the quality of the study based on its theoretical and methodological rigor alone; the assumption being that research quality relates to the study’s theoretical underpinnings and methodological rigor, not the direction or magnitude of the findings. Research journals from a number of fields, including political science (Findley et al., 2016), organizational psychology (Woznyj et al., 2018), medicine (Button et al., 2016), and neuroscience (Chambers et al., 2014), have piloted or instituted results-blind peer review as a method of counteracting publication bias. Evaluations of these pilots have generally focused on author and reviewer perceptions after participating in results-blind peer review and yielded mixed results. For example, both Findley et al. (2016) and Woznyj et al. (2018) found that researchers supported the potential for results-blind review in theory, but had varying concerns about logistics and unintended consequences of implementation.

Given the recent acknowledgment on issues of publication bias and advancing of new approaches and methods to increase the transparency and openness of research, special education researchers should investigate the effects of proposed practices to inform decisions to adopt, adapt, or discard these reforms (Cook et al., 2018). Moreover, the tradition of using experimental single-case research methods in special education poses interesting issues and challenges to the development of processes and protocols that emphasize methodological rigor over experimental results (Johnson & Cook, 2019). The intent of this special series, therefore, is to pilot results-blind peer review with special education research in EBD, test its effects on reviewers’ decisions, and assess reviewers’ perceptions of the results-blind peer-review process. We examined whether the editorial perceptions and recommendations of peer reviewers differed when evaluating results-blind and results-included manuscripts, and investigated reviewers’ perceptions of results-blind reviewing. The research is exploratory given the limited amount of research available on results-blind reviewing, and the lack of established methods for investigating and measuring peer review. The specific research questions were as follows:

Research Question 1: Do reviewer editorial recommendations vary as a function of review type (results-blind or results-included)?

Research Question 2: Do reviewer perspectives on the importance and rigor of the manuscript vary as a function of review type?

Research Question 3: Does reviewer confidence in their editorial recommendation vary as a function of review type?

Research Question 4: How effective and equitable do reviewers find the results-blind review process?

Method

Participants

Table 1 presents the characteristics of the 44 reviewers who participated in the study. Participants were individuals who had previously served as reviewers for Behavioral Disorders. Each participant held a doctoral degree and worked at an institution of higher education. Participating reviewers were largely male and White, with most serving as either assistant or associate professors at the time of the study. Moreover, the reviewers were experienced with the peer-review process, with most serving on three or more editorial boards and completing 11 or more reviews in the past year. Finally, half the sample had experience as an associate editor, and a minority had experience as an editor-in-chief.

Table 1.

Reviewer Characteristics.

Characteristic	n (%)
Gender
Male	27 (61.36)
Female	17 (38.64)
Ethnicity
Asian	2 (4.55)
Black	3 (6.82)
Latino	1 (2.27)
White	38 (86.36)
Professional role
Assistant professor	13 (29.55)
Associate professor	21 (47.73)
Full professor	9 (20.45)
Other^a	1 (2.27)
Editorial boards
1–2	15 (34.09)
3–4	14 (31.82)
5+	15 (34.09)
Associate editor experience
Yes	22 (50.00)
No	22 (50.00)
Editor in chief experience
Yes	7 (15.91)
No	37 (84.09)
Reviews completed in past year
1–5	4 (9.09)
6–10	13 (29.55)
11+	27 (61.36)

^a Recently retired professor.

Procedures

Manuscript recruitment

Manuscripts were solicited from authors with a broad call advertised through professional contacts and networks. The call provided a brief overview of the purpose and underlying rationale for results-blind reviews. Prospective authors were instructed to submit two versions of their manuscript: one being a traditional, results-included paper; and the other with the results, discussion, and all references to the research findings removed (i.e., results-blind version). There were no restrictions on the types of research designs eligible. Following the submission of papers, the journal’s editorial assistant and a member of the research team reviewed each paper to ensure that both a results-included and results-blind version were submitted, and that the results-blind version did not provide information on the results of the study. In total, 14 unique manuscripts were submitted for review. The research designs used across these manuscripts included one meta-analysis, two group-experimental designs, and 11 single-case research studies.

Reviewer assignment

Following the submission of the 14 manuscripts, we randomly selected and sent email invitations to reviewers through the journal’s online management system. Random selection consisted of assigning a number to all individuals in the journal’s review system and matching these to a list of random numbers generated with Microsoft Excel’s random number function. The reviewer whose number corresponded with the randomly generated list was contacted with a standard email describing the purpose and procedures of the study (see Appendix A). The email requested reviewers to conduct reviews of two distinct manuscripts, one of which was results-blind and the other results-included. In addition, the email notified reviewers that after submission of their editorial recommendation, they would be receiving a brief survey designed to gain additional information on the manuscript and review process. Reviewers were given the option to decline participation, at which point the research team sent a new solicitation to another randomly selected reviewer. In all, we contacted 58 potential reviewers with nine declining to review and five providing no response. If reviewers agreed to participate, we randomly assigned them an initial manuscript that was either results-blind or results-included. Upon completion of the first review, we assigned a second manuscript that was the opposite of the previous manuscript type and removed the reviewer from the solicitation pool. That is, each participating reviewer reviewed (a) one results-included manuscript and (b) one different results-blind manuscript. This process was repeated until there were three reviewers for each results-blind and results-included version of the papers. In two instances, a reviewer agreed to participate and reviewed an initial manuscript, but subsequently declined to review a second paper due to time constraints or personal issues. When this occurred, we randomly selected a previously unsolicited reviewer to complete the review. As such, we recruited 44 total reviewers for 14 manuscripts.
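
The selection and assignment steps described above can be sketched in code. The authors used Microsoft Excel’s random number function; the Python below is a swapped-in illustration only. The reviewer ID pool, the seed, the manuscript labels, and the function name `assign_reviews` are all hypothetical, and the sketch leaves out the quota of three reviewers per manuscript version.

```python
import random

CONDITIONS = ("results-blind", "results-included")

def assign_reviews(reviewer_id, manuscripts, rng):
    """Give one reviewer two different manuscripts, one per condition.

    The starting condition is random; the second review always uses
    the opposite condition, mirroring the procedure in the text.
    """
    first_ms, second_ms = rng.sample(manuscripts, 2)  # two distinct manuscripts
    first_cond = rng.choice(CONDITIONS)
    second_cond = CONDITIONS[1 - CONDITIONS.index(first_cond)]
    return [
        (reviewer_id, first_ms, first_cond),
        (reviewer_id, second_ms, second_cond),
    ]

rng = random.Random(2019)                 # arbitrary seed, for repeatability only
pool = list(range(1, 201))                # hypothetical reviewer IDs
invited = rng.sample(pool, 5)             # random selection without replacement
manuscripts = [f"MS{i:02d}" for i in range(1, 15)]  # 14 manuscripts, labels assumed
plan = [assign_reviews(r, manuscripts, rng) for r in invited]

for first, second in plan:
    print(first, second)
```

Sampling without replacement plays the role of removing a reviewer from the solicitation pool once contacted.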

Peer-review process

Behavioral Disorders uses a double-blind peer-review process in which the identities of authors and reviewers remain confidential throughout the process. Following the submission of all results-included (n = 3) and results-blind (n = 3) reviews for a manuscript, we rendered an initial editorial decision. Because the focus of the study is on results-blind reviewing, the editorial decision was based solely on the recommendations and comments from reviews of the results-blind version of the manuscript. Consistent with the journal’s editorial practice, authors were notified that their paper was accepted, accepted pending minor revision, rejected with an opportunity to resubmit, or rejected. For papers that were not rejected, we sent reviewer comments on both the results-blind and results-included manuscripts, and made additional suggestions for strengthening the manuscripts. Reviewers were given the opportunity to provide additional feedback on the revised manuscript, but were not required to do so.

Measures

The measures used for the current investigation included the editorial recommendations provided by each reviewer and responses from postreview surveys.

Editorial recommendations

Reviewers’ editorial recommendations were collected to determine if manuscript perceptions varied as a function of the type of manuscript reviewed. Following the completion of reviews, each reviewer submitted a recommendation for publication through the journal’s submission system. Options available to reviewers included accept, accept with revision, reject and resubmit, and reject. Research team members coded each recommendation on an ordinal scale ranging from 0 to 3, with higher values corresponding to more positive recommendations.
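
As a minimal sketch of this coding step: the text gives only the 0 to 3 range and its direction, so the specific label-to-code mapping below is an assumption, as is the helper name `code_recommendation`.

```python
# Assumed mapping: the source states only that codes run from 0 to 3
# and that higher values mean more positive recommendations.
RECOMMENDATION_CODES = {
    "reject": 0,
    "reject and resubmit": 1,
    "accept with revision": 2,
    "accept": 3,
}

def code_recommendation(label: str) -> int:
    """Return the ordinal code for a recommendation label (case-insensitive)."""
    return RECOMMENDATION_CODES[label.strip().lower()]

print([code_recommendation(x) for x in ("Reject", "Accept with revision", "Accept")])
```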

Postreview survey

We developed postreview surveys that were administered following the completion of each review (Appendix B). Following the submission of an editorial recommendation, we sent each reviewer an email with a link to the survey through the journal’s submission portal. Three subscales—importance, rigor, and confidence—were used in the inferential data analysis. The fourth subscale on perspectives of the results-blind reviewing process was only administered following review of results-blind papers and used for descriptive purposes.

Importance scale

The importance subscale consisted of five items designed to understand the reviewer perception of the importance of the research to the discipline. Each item was rated on a 5-point scale ranging from strongly agree to strongly disagree. Items included (a) “the research represents an important topic area,” (b) “the research advances our knowledge in the topic area,” (c) “the research will have a meaningful impact on the field,” (d) “the journal’s audience will benefit from dissemination of this research,” and (e) an item on the educational significance of the research (see Table 3). The alpha for this scale was .80.

Rigor scale

The rigor subscale consisted of four items designed to capture reviewer perception of the rigor of the reviewed research. Each item was rated on a 5-point scale ranging from strongly agree to strongly disagree. Items included (a) “the research design was appropriate to address the research questions,” (b) “the researchers implemented the design procedures appropriately,” (c) “the research represents a rigorous evaluation of the questions,” and (d) “there are methodological problems with the study that threaten the validity of findings.” The final item was reverse coded. The alpha for this scale was .43.
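
Reverse coding and the alpha coefficient can be illustrated with a short sketch. The ratings below are hypothetical, not the study’s data, and the helper names are assumptions. Alpha is computed with the usual formula, k/(k − 1) × (1 − sum of item variances / variance of total scores).

```python
from statistics import variance

def reverse_code(rating, scale_max=5):
    """Flip a rating on a 1..scale_max scale (5 becomes 1, 4 becomes 2, ...)."""
    return scale_max + 1 - rating

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses to four items; the fourth is negatively worded.
raw = [
    [4, 4, 3, 2],
    [5, 4, 4, 1],
    [2, 3, 2, 4],
    [3, 3, 3, 3],
    [4, 5, 4, 2],
]
scored = [row[:3] + [reverse_code(row[3])] for row in raw]
alpha = cronbach_alpha(scored)
print(f"alpha = {alpha:.3f}")
```

Skipping the reverse-coding step for a negatively worded item would pull alpha down, which is why the recode is applied before the scale is scored.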

Confidence scale

The confidence subscale consisted of three items designed to measure reviewer confidence in their editorial recommendation. Each item was rated on a 5-point scale ranging from strongly agree to strongly disagree. Items include (a) “I feel confident that I identified the primary methodological weaknesses in the study,” (b) “I feel confident in my recommendation to the editor,” and (c) “the manuscript contained all the information I needed to make an informed decision.” The alpha for this scale was .44.

Results-blind perspectives

The results-blind perspectives subscale consisted of seven items focused on understanding reviewer perceptions of the review process. This subscale was only administered following review of results-blind papers. Each item was rated on a 5-point scale ranging from strongly agree to strongly disagree. Example items included (a) “results-blind reviewing is a fair method,” (b) “I would like to review results-blind manuscripts in the future,” (c) “I recommend results-blind reviewing for other journals,” and (d) “I endorse the use of results-blind reviewing for the field.” The subscale concluded with three open-ended response questions to allow reviewers to provide additional insight about the process. Open-ended questions prompted reviewers to describe the challenges they confronted with results-blind reviewing, to describe the perceived strengths of the approach, and to share any additional insight. The alpha for this scale was .88.

Experimental Design

We conducted a randomized controlled trial, with the results-included version of the manuscript serving as the counterfactual condition for the results-blind version. Each participant was exposed to both conditions except for the two instances where reviewers did not complete the second review, as noted previously.

Data Analysis

Descriptive statistics included tallying the frequencies of each response and computing proportions of responses across reviewers. Following the computation of descriptive statistics, we compared reviewer item-level responses for results-blind and results-included manuscripts using a chi-square analysis to evaluate the relationship of reviewer responses as a function of review type (results-blind or results-included). Following the initial analyses, we used multilevel modeling to compare recommendations from results-blind and results-included reviews. That is, we conceptualized peer-review recommendations and perspectives as depending on the manuscript quality and unmeasured aspects of the reviewer such as training, experience, and judgment. As such, we used three-level models with reviewer responses at Level 1, reviewer variance at Level 2, and manuscript variability at Level 3. We implemented two types of multilevel models, including (a) an ordered logistic regression for reviewer recommendations and (b) a mixed-effects, means-as-outcomes model for continuous outcomes obtained from survey responses.

For the ordered logistic regression models, the model assumptions included (a) the dependent variable was ordered, (b) there was no multicollinearity between the independent variables, and (c) the proportional odds for each outcome were the same (Fullerton, 2009). For the current analysis, only the proportional odds of each outcome were a concern because we coded the recommendation variable to ensure ordering and the analyses included only one independent variable, making multicollinearity moot. As such, we tested the proportional odds assumption with the Brant test of parallel regression. Results supported the assumption that the relationship between each pair of outcomes was the same, χ² = 2.14, p = .34.

Mixed-effects models used reviewer responses on the scales as the dependent variable, review type as the fixed effect, and the reviewer and manuscript as the random effects. Model assumptions were tested, with results indicating that the data were linear, due to the dichotomous independent variable of review type, and homogeneous, based on Levene’s test for equal variances. Moreover, the reviewer responses were independent due to the blinding of manuscripts, though unmeasured reviewer variance was accounted for in the second level of the model, as noted. We fit null models for each outcome, then added manuscript type as a covariate to investigate its impact, and evaluated whether the covariate improved model fit using the likelihood-ratio test. Finally, we computed odds ratios for the likelihood of acceptance at the initial decision based on whether the study used a single-case research design or an alternative design such as meta-analysis or group experimental. Inferential data analyses were conducted using Stata’s meologit and xtmixed commands.
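
One way to write the two model types described above is sketched here, assuming random intercepts only. The notation is ours rather than the authors’: $Y_{rm}$ is the response of reviewer $r$ on manuscript $m$, $\text{Blind}_{rm}$ equals 1 for a results-blind review and 0 otherwise, and $u_r$ and $v_m$ are reviewer and manuscript random intercepts. The exact parameterization used by the software may differ.

```latex
% Ordered logistic model for the 0-3 recommendation (cut points \kappa_j, j = 0, 1, 2)
\operatorname{logit}\, P(Y_{rm} \le j)
  = \kappa_j - \left( \beta\,\text{Blind}_{rm} + u_r + v_m \right),
\qquad u_r \sim N(0, \sigma_u^2), \quad v_m \sim N(0, \sigma_v^2)

% Linear mixed model for a scale score; the null model omits \gamma_1
Y_{rm} = \gamma_0 + \gamma_1\,\text{Blind}_{rm} + u_r + v_m + e_{rm},
\qquad e_{rm} \sim N(0, \sigma_e^2)

% Likelihood-ratio test of the covariate (null vs. covariate model, 1 df)
\mathrm{LR} = -2\left( \ell_{\text{null}} - \ell_{\text{covariate}} \right) \sim \chi^2_1
```

The proportional odds assumption corresponds to the single coefficient $\beta$ being shared across all three cut points.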

Results

Editorial Recommendations

The sample for editorial recommendations consisted of 84 unique reviews across 44 total reviewers. As noted, we recruited two additional reviewers because two reviewers who initially agreed to participate later withdrew due to either time constraints or personal issues. Table 2 provides an overview of the recommendations provided on the manuscripts by results-blind and results-included review procedures. As can be seen, the initial comparison, which did not account for the nesting of reviews within reviewers, revealed no difference in the recommendations. Because editorial recommendations likely vary as a product of reviewer training, experience, and perspective, as well as manuscript characteristics, such as methodological quality and reporting, multilevel analyses were undertaken to account for these sources of variability. Results of the multilevel analyses are available in the supplemental documents. Results supported initial analyses that there were no statistical differences in the recommendations as a function of the type of review. Finally, we conducted post hoc comparisons regarding the difference in acceptance rates between studies using single-case methods and those using other research designs. Results indicated no differences in the odds of acceptance (OR = 0.29, 95% CI = [0.02, 4.24]).
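
For readers unfamiliar with how an odds ratio and its interval are obtained from a 2 × 2 table, the sketch below uses a Wald interval on the log scale. The cell counts are an assumption: the source does not report them, and they were chosen only because they fit 11 single-case and 3 other manuscripts and yield values close to those reported. They should not be read as the study’s data, and the authors’ software may have used a different interval method.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b) / (c/d) with a Wald interval computed on the log scale.

    a, b: accepted / not accepted in the group of interest
    c, d: accepted / not accepted in the comparison group
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# ASSUMED counts (not reported in the source), for illustration only.
single_case_accepted, single_case_not = 4, 7
other_accepted, other_not = 2, 1

estimate, lower, upper = odds_ratio_wald_ci(
    single_case_accepted, single_case_not, other_accepted, other_not
)
print(f"OR = {estimate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With only three manuscripts in the comparison group, the standard error of the log odds ratio is large, which is why an interval of this kind spans values well below and well above 1.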

Table 2.

Frequency and Comparison of Recommendations by Manuscript Type.

Review type	Reject	Reject and resubmit	Accept pending revision	Accept	χ² (p-value)
Results blind	10 (23.81%)	26 (61.90%)	5 (11.90%)	1 (2.38%)	2.50 (.48)
Results included	11 (26.19%)	20 (47.62%)	10 (23.81%)	1 (2.38%)	2.50 (.48)
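
The chi-square comparison in Table 2 can be reproduced directly from the tabled frequencies. The sketch below computes the Pearson statistic in plain Python. The helper names are ours, and the p-value function uses a closed form that is valid only for 3 degrees of freedom, which is what a 2 × 4 table gives.

```python
import math

def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Frequencies from Table 2: reject, reject and resubmit,
# accept pending revision, accept.
table2 = [
    [10, 26, 5, 1],   # results blind
    [11, 20, 10, 1],  # results included
]
stat, df = pearson_chi_square(table2)
p_value = chi_square_sf_df3(stat)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.2f}")
# prints: chi-square = 2.50, df = 3, p = 0.48
```

This comparison treats the 84 reviews as independent, which is the limitation the multilevel models are meant to address.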

Postreview Survey Responses

The target sample for these analyses was the same as for the editorial recommendations. One reviewer of a results-blind manuscript and two reviewers of results-included manuscripts did not complete the postreview survey. As such, the total number of responses is 41 for results-blind papers and 40 for results-included papers, representing 98% and 95% response rates, respectively. In addition, as noted in Tables 3–5, some respondents did not complete all items on both surveys, resulting in fewer responses for specific items.

Table 3.

Importance Scale Responses and Comparison by Results-Blind and Results-Included Review Procedures.

Item	Review type	Strongly disagree	Disagree	Neither agree nor disagree	Agree	Strongly agree	χ² (p-value)
Research represents an important contribution	Results blind	0 (0.00%)	5 (12.20%)	10 (24.39%)	24 (58.54%)	2 (4.88%)	3.22 (.52)
Research represents an important contribution	Results included	1 (2.50%)	9 (22.50%)	6 (15.00%)	22 (55.00%)	2 (5.00%)	3.22 (.52)
Research advances our knowledge in the topic area	Results blind	2 (4.88%)	8 (19.51%)	14 (34.15%)	15 (36.59%)	2 (4.88%)	1.34 (.86)
Research advances our knowledge in the topic area	Results included	2 (5.00%)	10 (25.00%)	10 (25.00%)	17 (42.50%)	1 (2.50%)	1.34 (.86)
Research will have meaningful impact on field	Results blind	1 (2.44%)	7 (17.07%)	22 (53.66%)	9 (21.95%)	2 (4.88%)	6.05 (.20)
Research will have meaningful impact on field	Results included	4 (10.00%)	11 (27.50%)	15 (37.50%)	10 (25.00%)	0 (0.00%)	6.05 (.20)
Journal’s audience will benefit from the research	Results blind	0 (0.00%)	10 (24.39%)	12 (29.27%)	16 (39.02%)	3 (7.32%)	5.59 (.23)
Journal’s audience will benefit from the research	Results included	4 (10.00%)	7 (17.50%)	11 (27.50%)	17 (42.50%)	1 (2.50%)	5.59 (.23)
Research is educationally significant	Results blind	1 (2.44%)	6 (14.63%)	15 (36.59%)	15 (36.59%)	4 (9.76%)	8.39 (.08)
Research is educationally significant	Results included	2 (5.00%)	11 (27.50%)	8 (20.00%)	19 (47.50%)	0 (0.00%)	8.39 (.08)

Note. Postreview surveys were completed for 41 of 42 results-blind reviews and 40 of 42 results-included reviews.

Table 4.

Rigor Scale Responses and Comparison by Results-Blind and Results-Included Review Procedures.

Item	Review type	Strongly disagree	Disagree	Neither agree nor disagree	Agree	Strongly agree	χ² (p-value)
Research design was appropriate	Results blind	2 (4.88%)	7 (17.07%)	8 (19.51%)	21 (51.22%)	3 (7.32%)	0.42 (.98)
Research design was appropriate	Results included	1 (2.50%)	6 (15.00%)	8 (20.00%)	22 (55.00%)	3 (7.50%)	0.42 (.98)
Research was implemented appropriately	Results blind	1 (2.44%)	9 (21.95%)	23 (56.10%)	7 (17.07%)	1 (2.44%)	13.33 (.01)
Research was implemented appropriately	Results included^a	1 (2.56%)	7 (17.95%)	9 (23.08%)	21 (53.85%)	1 (2.56%)	13.33 (.01)
Research represents a rigorous evaluation	Results blind	3 (7.32%)	14 (34.15%)	16 (39.02%)	6 (14.63%)	2 (4.88%)	10.06 (.04)
Research represents a rigorous evaluation	Results included	3 (7.50%)	14 (35.00%)	7 (17.50%)	16 (40.00%)	0 (0.00%)	10.06 (.04)
Methodological problems exist that threaten findings	Results blind	1 (2.44%)	6 (14.63%)	14 (34.15%)	12 (29.27%)	8 (19.51%)	3.53 (.47)
Methodological problems exist that threaten findings	Results included	0 (0.00%)	9 (22.50%)	9 (22.50%)	16 (40.00%)	6 (15.00%)	3.53 (.47)

Note. Postreview surveys were completed for 41 of 42 results-blind reviews and 40 of 42 results-included reviews.

^a Missing response for this item.

Table 5.

Confidence Scale Responses and Comparisons by Results-Blind and Results-Included Review Procedures.

Item	Review type	Strongly disagree	Disagree	Neither agree nor disagree	Agree	Strongly agree	χ² (p-value)
Reviewer confidence in identifying primary weaknesses	Results blind	0 (0.00%)	6 (14.63%)	8 (19.51%)	24 (58.54%)	3 (7.32%)	5.25 (.16)
Reviewer confidence in identifying primary weaknesses	Results included	0 (0.00%)	1 (2.50%)	7 (17.50%)	25 (62.50%)	7 (17.50%)	5.25 (.16)
Reviewer confidence in their recommendation	Results blind	1 (2.44%)	1 (2.44%)	5 (12.20%)	29 (70.73%)	5 (12.20%)	4.36 (.36)
Reviewer confidence in their recommendation	Results included	0 (0.00%)	2 (5.00%)	1 (2.50%)	33 (82.50%)	4 (10.00%)	4.36 (.36)
Manuscript contained all necessary information	Results blind	10 (24.39%)	13 (31.71%)	9 (21.95%)	9 (21.95%)	0 (0.00%)	28.17 (<.001)
Manuscript contained all necessary information	Results included	0 (0.00%)	8 (20.00%)	2 (5.00%)	25 (62.50%)	5 (12.50%)	28.17 (<.001)

Note. Postreview surveys were completed for 41 of 42 results-blind reviews and 40 of 42 results-included reviews.

Importance scale

Table 3 presents reviewer responses on the importance subscale for results-blind and results-included manuscripts. Results indicated no differences in reviewer ratings of manuscript importance on the five items. For the total importance scale, the multilevel analyses accounting for reviewer and manuscript variability indicated no difference between the perceived importance of the results-blind and results-included manuscripts (see Supplemental materials for tables and output).

Rigor scale

Table 4 presents reviewer responses on the rigor scale for results-blind and results-included manuscripts. Results indicated no differences in reviewer perceptions of the research design and the presence of methodological problems. In contrast, reviewer ratings regarding the implementation of the research were significantly more positive for the results-included manuscripts. Moreover, reviewers’ ratings were significantly more positive regarding the rigor of the research for results-included papers. For the total rigor scale, the multilevel analyses indicated that reviewers rated results-blind manuscripts as less rigorous than results-included manuscripts (see Supplemental materials for tables and output).

Confidence scale

Table 5 presents reviewer responses on the confidence subscale for results-blind and results-included manuscripts. Results indicated no differences in reviewer confidence in identifying primary methodological weaknesses of the research and confidence in their editorial recommendations. However, there were differences in reviewer perceptions that the manuscript contained all the information needed to make a recommendation, with more agreement that results-included papers contained the necessary information. For the total confidence scale, the multilevel analyses indicated that reviewers reported less confidence in their recommendations for results-blind manuscripts than results-included manuscripts (see Supplemental materials for tables and output).

Results-blind perspectives

Table 6 presents the results of reviewer perceptions on the results-blind reviewing process. Results indicated a small majority of reviewers agreed results-blind reviewing is a fair approach to peer review (56%), would like to review results-blind manuscripts in the future (61%), and endorsed the process (51%), whereas just under half (46%) would recommend results-blind reviewing for other journals. Interestingly, most reviewers did not agree that results-blind reviewing is an effective peer-review method or an improvement on traditional methods.

Table 6.

Reviewer Perspectives of Results-Blind Reviewing.

Item	Strongly disagree	Disagree	Neither agree nor disagree	Agree	Strongly agree
Results-blind reviewing is a fair method	3 (7.32%)	8 (19.51%)	7 (17.07%)	17 (41.46%)	6 (14.63%)
Results-blind reviewing is an effective method^a	2 (5.00%)	9 (22.50%)	10 (25.00%)	12 (30.00%)	7 (17.50%)
Results-blind reviewing is an improvement on traditional methods^a	2 (5.00%)	9 (22.50%)	15 (37.50%)	10 (25.00%)	4 (10.00%)
I would like to review results-blind papers in the future	4 (9.76%)	6 (14.63%)	6 (14.63%)	16 (39.02%)	9 (21.95%)
Results-blind reviewing can strengthen the field	3 (7.32%)	6 (14.63%)	10 (24.39%)	14 (34.15%)	8 (19.51%)
I recommend results-blind review for other journals	2 (4.88%)	9 (21.95%)	11 (26.83%)	11 (26.83%)	8 (19.51%)
I endorse the use of results-blind reviewing for the field	2 (4.88%)	9 (21.95%)	9 (21.95%)	12 (29.27%)	9 (21.95%)

Note. Postreview surveys were completed for 41 of 42 results-blind reviews.

^a Missing response for this item.

Discussion

The purpose of this investigation was to evaluate results-blind reviewing to determine if editorial recommendations and reviewer perceptions of study importance, study rigor, and confidence in recommendations vary as a function of the presence or absence of results in the manuscript reviewed. In addition, the research included a descriptive component focused on reviewer perceptions of the results-blind reviewing process. Results indicated that reviewers’ editorial recommendations and perceptions of manuscript importance did not vary as a function of the type of review conducted. However, the perceived rigor of the research and confidence in recommendations did differ by type of manuscript. Results of the descriptive analyses indicated mixed perceptions on the fairness and effectiveness of results-blind reviewing, though there appears to be interest in and support for results-blind review of special education research. In the following sections, we describe the results in relation to previous research and commentaries on results-blind reviewing and provide considerations for the adoption of methods to promote the dissemination of research based on methodological rigor, and not the direction or magnitude of the results, with specific considerations for single-case research methods.

Results-Blind Review

Editorial recommendations

This exploratory investigation suggests that results-blind reviewing does not lead to differing editorial recommendations from reviewers. However, results-blind reviews did result in acceptance and publication of multiple studies with null findings. Based on the editorial recommendations of results-blind reviews (we did not consider the recommendations of results-included reviews when making editorial decisions), we accepted eight papers, three of which had null effects, two with small or variable effects, and three with large or moderate effects. Thus, although reviewer recommendations did not vary by manuscript type, it appears that the use of results-blind review resulted in the submission and acceptance of studies with null results. This suggests results-blind peer review may be an effective way to address publication bias (Lee & Moher, 2017).

Reviewer perceptions

Findings suggest that the perceived importance of the studies did not vary based on the inclusion of results. Regarding the lack of differences in perceived importance, Locascio (2017) noted that research importance is a main consideration for peer review and is typically communicated in the introduction section, where a study’s relevance, contribution, and relationship to previous research are established. Both results-blind and results-included submissions included identical “Introduction” sections, which might explain similar assessments of the importance of the research across review types. That is, reviewers of results-blind and results-included manuscripts worked from the same information for this particular section, resulting in similar perceptions.

Whereas findings revealed no differences in editorial recommendations and perceived importance of results-blind and results-included papers, reviewer perceptions of study rigor and their confidence in editorial recommendations did vary as a function of the type of manuscript reviewed. For instance, reviewers rated results-blind manuscripts, on average, approximately one point less rigorous than results-included manuscripts (see Supplemental materials). Item-level analyses suggest that results-included reviews were associated with higher ratings for rigor of design procedure and rigorous evaluation of research questions, as well as confidence that the reviewer was provided all necessary information, despite both types of reviews including the same method sections. When considered in the context of no differences between manuscript types for ratings of the presence of methodological problems, it appears the presence of results influenced reviewer perceptions of the execution of the research. Although future research on results-blind reviewing is needed to parse the impact of results on reviewers’ perceptions of rigor, we offer two possible explanations. First, it is possible that reviewers sought additional methodological details in the results-blind manuscripts that they felt may have been included in the results section. For example, designing a rigorous study does not guarantee that the study was conducted rigorously, and reviewers may have wanted to see the results section to examine how the study was actually conducted. As such, a possible approach moving forward is for results-blind papers to receive conditional acceptance pending review of the entire manuscript. This might alleviate reviewer concerns about study rigor by ensuring researchers implemented key methodological components as described previously (similar to registered reports; Chambers, 2019). Second, the differences in perceived rigor might suggest the need for training on results-blind reviewing to orient reviewers to the method of evaluating rigor based solely on the information presented in the “Method” section (e.g., research design, implementation validity, reliability, and validity of outcome measures; Grand et al., 2018).

Results-blind reviewing process

The descriptive results indicated mixed perceptions of results-blind reviewing, with most respondents suggesting that the method is fair and has the potential to strengthen the field. However, only a minority of the participants found results-blind reviewing effective, and most either did not recommend the practice for other journals or were unsure. Responses on the results-blind reviewing process are consistent with previous research that has found openness to the process but some reticence toward widespread adoption (Woznyj et al., 2018).

Implications for Open and Transparent Single-Case Research

Single-case research methods remain prominent and frequently used within special education to examine the effects of practices for students with EBD. It is not surprising, therefore, that the submissions to the special issue consisted mostly of single-case research designs. The number of submissions to this special series using single-case research provides an opportunity to consider issues related to open and transparent reporting of single-case research. Specifically, the applied, inductive tradition of single-case research methods poses some interesting issues for open and transparent research, because internal validity traditionally depends on the presence of a functional relation (see Johnson & Cook, 2019). Participating reviewers echoed these sentiments in open-ended comments, with many noting the difficulty in evaluating a single-case design without knowing the results. In particular, reviewers reported needing graphed data to support visual analysis, a hallmark of single-case methods, to determine if fatal issues with the design existed. Fortunately, reviewers provided recommendations for improving the results-blind review process for single-case research methods, and we highlight four of the most common.

First, reviewers suggested that authors include example graphs that are either blank or include mock data, which depict the hypothesized relationship between the student’s functional need and the intervention. Consistent with conceptual descriptions on the function of graphs in single-case research (Kennedy, 2005), reviewers noted that an example graph would allow for a clearer understanding of the research design used. That is, reviewers noted that graphs, even those that included only the phase lines but not data, would provide important information regarding the appropriateness of the design to address the research questions, and facilitate a better understanding of the proposed relationship between the intervention and student function.

Second, reviewers posited that explicit description of the hypothesized data patterns and changes to the data patterns following intervention would provide important context for evaluating results-blind single-case designs. That is, some results-blind single-case reviewers suggested they would benefit from a detailed overview of the form of the anticipated data pattern during baseline and the expected change once intervention is applied. Johnson and Cook (2019) provided a framework for describing the hypothesized relationship between the independent and dependent variables that details the predicted direction of the dependent variable across each of the phases. Formal descriptions of the data patterns prior to revealing the results support the development of stronger hypotheses and underscore the proposed relationship between the behavior of interest and the selected intervention. When coupled with an example graph, the information can support reviewers evaluating (a) predictions of the relation between the independent and dependent variables in single-case designs and (b) the extent to which the research design addresses the research questions.
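As one illustration of how such hypothesized data patterns might be written down before results are revealed, the following minimal Python sketch records a predicted direction of change for each phase of a hypothetical ABAB design and compares those predictions with phase means. The phase labels, the `PhasePrediction` structure, the mock data, and the use of phase means are all assumptions made for illustration; they are not drawn from Johnson and Cook (2019) and are no substitute for visual analysis.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class PhasePrediction:
    """Hypothetical record of what the authors expect in one phase."""
    phase: str                 # phase label, e.g. "A1" (baseline) or "B1" (intervention)
    direction: Optional[str]   # "increase", "decrease", "none"; None for the first phase


def check_predictions(predictions, observed):
    """Compare the change in phase means with each stated prediction.

    Returns a dict mapping phase label -> True/False for every phase after the
    first. Comparing means is a deliberately crude stand-in for visual analysis.
    """
    results = {}
    previous_mean = None
    for prediction in predictions:
        current_mean = mean(observed[prediction.phase])
        if previous_mean is not None:
            diff = current_mean - previous_mean
            if diff > 0:
                actual = "increase"
            elif diff < 0:
                actual = "decrease"
            else:
                actual = "none"
            results[prediction.phase] = (actual == prediction.direction)
        previous_mean = current_mean
    return results


# Hypothetical predictions for a behavior the intervention is expected to reduce.
PREDICTIONS = [
    PhasePrediction("A1", None),        # baseline: no prior phase to compare with
    PhasePrediction("B1", "decrease"),  # intervention introduced
    PhasePrediction("A2", "increase"),  # intervention withdrawn
    PhasePrediction("B2", "decrease"),  # intervention reintroduced
]

# Invented mock data (e.g., counts of the target behavior per session).
MOCK_DATA = {
    "A1": [8, 9, 7, 8],
    "B1": [4, 3, 3, 2],
    "A2": [7, 8, 8],
    "B2": [2, 3, 2],
}

if __name__ == "__main__":
    for phase, matched in check_predictions(PREDICTIONS, MOCK_DATA).items():
        print(phase, "matches prediction" if matched else "does not match prediction")
```

A list such as `PREDICTIONS` could accompany an example graph in a results-blind submission, giving reviewers the anticipated pattern for each phase without exposing any real data.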

Third, reviewers recommended that authors provide more detailed descriptions of the baseline and intervention conditions to assist reviewers in conceptualizing the researchers’ procedural manipulations and their relationship to the hypothesized function of the target behavior. Procedural differences between baseline and intervention conditions provide readers with the information needed to determine the key variables responsible for changes in behavior and are often underreported in single-case studies (Ledford & Gast, 2014). Given that reviewers of results-blind single-case papers do not have access to the graphed results, the reporting of procedural differences provides additional information for reviewers to determine if the modifications represent meaningful alterations to the environment to support behavioral change (Lane et al., 2007).

Fourth, reviewers emphasized the need for authors to better attend to critical aspects of methodological rigor in single-case designs, such as the reporting of interobserver agreement, intervention fidelity, and participant attrition. Although reporting standards in special education exist (Odom et al., 2005), it is incumbent on authors to include relevant information in their reports. As the field continues to explore the potential of open and transparent research practices to address sources of potential bias in the research, developing additional guidance for authors and reviewers on the most critical methodological features to address might lead to stronger submissions and more valid reviews.

Limitations

The results of the current investigation must be interpreted in light of the following limitations. First, the investigation is based on a unique sample of studies, which introduces questions regarding the comparability of the studies to those typically published in Behavioral Disorders and other journals in the field. For example, the call for manuscripts for a results-blind special issue resulted in an atypically large proportion of submissions with null findings. In addition, whereas the large proportion of single-case studies is reflective of the journal content, results-blind reviewing poses a number of conceptual challenges as previously discussed. Second, the stringency of the research design in which reviewers were exposed to both conditions required us to account for the nested structure of the data and reduced power more than if reviewers were responsible for a single manuscript. It is worth noting that results not accounting for nesting also resulted in nonsignificance, supporting the finding that review type did not influence editorial recommendations. Third, the response rate on the postreview surveys was high, but resulted in a design that was unbalanced. Fourth, the reliability for two of the four perception scales was low and therefore the results should be interpreted with caution (Gleser, 1992). As such, we deemphasized these findings and relied primarily on frequency tests of item-level reliability to draw conclusions. Finally, reviewers were informed that they were participating in an investigation of results-blind reviewing, which may have affected their reviews, including results-included reviews of studies with null effects.
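Scale reliability of the kind referred to in the fourth limitation is often summarized with an internal-consistency coefficient. The sketch below computes Cronbach's alpha in plain Python. That this coefficient is the one used here is an assumption (the section does not name the statistic), and the ratings shown are invented solely to make the example runnable.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: list of items, each a list of ratings given by the same
    respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Total scale score for each respondent (sum across items).
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    item_variance_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented ratings: three items, five respondents (not data from this study).
MOCK_ITEMS = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [3, 5, 2, 4, 1],
]

if __name__ == "__main__":
    print(round(cronbach_alpha(MOCK_ITEMS), 3))
```

Low values of such a coefficient indicate that the items on a scale do not hang together well, which is one reason to lean on item-level analyses rather than scale scores.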

Summary

The current investigation represents an initial attempt to experimentally evaluate results-blind reviewing and serves as an opening to the special series on the topic. In coming issues, papers accepted as part of the special series will be published with a banner indicating that the paper underwent a results-blind reviewing process. Regarding the current investigation, it appears that whereas results-blind reviewing did not lead to differences in editorial recommendations, the number of papers accepted with null results suggests that results-blind reviewing might serve as an effective means for focusing reviewer attention on methodological rigor rather than the significance or size of the results. We recommend further research to examine the effects of results-blind reviews and other reforms intended to increase the openness and transparency of research.

Supplemental Material

Supplemental material, Supplemental_Materials, for Introduction to the Special Series on Results-Blind Peer Review: An Experimental Analysis on Editorial Recommendations and Manuscript Evaluations by Daniel M. Maggin, Rachel E. Robertson, and Bryan G. Cook in Behavioral Disorders

Acknowledgements

The authors would like to thank the reviewers who participated in this research and provided their evaluations of both results-blind and results-included papers and for completing the surveys.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: The lead author for this manuscript acknowledges that he is the editor of the journal which might be perceived as a conflict of interest.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

References

Button, K. S., Bal, Clark, & Shipley (2016). Preventing the ends from justifying the means: Withholding results to address publication bias in peer-review. BMC Psychology, 4, Article 59. https://doi.org/10.1186/s40359-016-0167-7

Chambers (2019). What’s next for registered reports? Nature, 573, 187–189. https://doi.org/10.1038/d41586-019-02674-6

Chambers, Feredoes, E. D., Muthukumaraswamy, S. J., & Etchells (2014). Instead of “playing the game” it is time to change the rules: Registered reports at AIMS neuroscience and beyond. AIMS Neuroscience, 1, 4–17. https://doi.org/10.3934/Neuroscience.2014.1.4

Chong, S. W., Collins, N. F., C. Y., Liskaser, G. M., & Peyton, P. J. (2016). The relationship between study findings and publication outcome in anesthesia research: A retrospective observational study examining publication bias. Canadian Journal of Anesthesia, 63(6), 682–690. https://doi.org/10.1007/s12630-016-0631-0

Chow, J. C., & Ekholm (2018). Do published studies yield larger effect sizes than unpublished studies in education and special education? A meta-review. Educational Psychology Review, 30, 727–744. https://doi.org/10.1007/s10648-018-9437-7

Cook, B. G. (2014). A call for examining replication and bias in special education research. Remedial and Special Education, 35(4), 233–246. https://doi.org/10.1177/0741932514528995

Cook, B. G., Lloyd, J. W., Mellor, Nosek, B. A., & Therrien, W. J. (2018). Promoting open science to increase the trustworthiness of evidence in special education. Exceptional Children, 85(1), 104–118. https://doi.org/10.1177/0014402918793138

Cook, B. G., & Therrien, W. J. (2017). Null effects and publication bias in special education research. Behavioral Disorders, 42(4), 149–158. https://doi.org/10.1177/0198742917709473

Dwan, Gamble, Williamson, P. R., & Kirkham, J. J. (2013). Systematic review of the empirical evidence of study publication bias and outcome reporting bias: An updated review. PLOS ONE, 8(7), Article e66844. https://doi.org/10.1371/journal.pone.0066844

Findley, M. G., Jensen, N. M., Malesky, E. J., & Pepinsky, T. B. (2016). Can results-free review reduce publication bias? The results and implications of a pilot study. Comparative Political Studies, 49(13), 1667–1703. https://doi.org/10.1177/0010414016655539

Francis (2012). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review, 19(2), 151–156. https://doi.org/10.3758/s13423-012-0227-9

Franco, Malhotra, & Simonovits (2014). Publication bias in the social sciences: Unlocking the file drawer. Science, 345, 1502–1505. https://doi.org/10.1126/science.1255484

Fullerton, A. S. (2009). A conceptual framework for ordered logistic regression models. Sociological Methods & Research, 38(2), 306–347. https://doi.org/10.1177/0049124109346162

Gage, N. A., Cook, B. G., & Reichow (2017). Publication bias in special education meta-analyses. Exceptional Children, 83(4), 428–445. https://doi.org/10.1177/0014402917691016

Gleser, L. J. (1992). The importance of assessing measurement reliability in multivariate regression. The Journal of the American Statistical Association, 87, 696–707. https://doi.org/10.1080/01621459.1992.10475271

Grand, J. A., Rogelberg, S. G., Banks, G. C., Landis, R. S., & Tonidandel (2018). From outcome to process focus: Fostering a more robust psychological science through registered reports and results-blind reviewing. Perspectives on Psychological Science, 13, 448–456. https://doi.org/10.1177/1745691618767883

Heene, & Ferguson, C. J. (2017). Psychological science’s aversion to the null, and why many of the things you think are true, aren’t. In S. O. Lilienfeld & I. D. Waldman (Eds.), Psychological science under scrutiny: Recent challenges and proposed solutions (pp. 34–52). Wiley-Blackwell. https://doi.org/10.1002/9781119095910.ch3

Humphreys, De la Sierra, R. S., & Van der Windt (2013). Fishing, commitment, and communication: A proposal for comprehensive nonbinding research registration. Political Analysis, 21(1), 1–20. https://doi.org/10.1093/pan/mps021

Johnson, A. H., & Cook, B. G. (2019). Preregistration in single-case design research. Exceptional Children, 86(1), 95–112. https://doi.org/10.1177/0014402919868529

Kennedy, C. H. (2005). Single-case designs for educational research. Allyn & Bacon.

Lane, Wolery, Reichow, & Rogers (2007). Describing baseline conditions: Suggestions for study reports. The Journal of Behavioral Education, 16(3), 224–234. https://doi.org/10.1007/s10864-006-9036-4

Ledford, J. R., & Gast, D. L. (2014). Measuring procedural fidelity in behavioural research. Neuropsychological Rehabilitation, 24(3-4), 332–348. https://doi.org/10.1080/09602011.2013.861352

Lee, C. J., & Moher (2017). Promote scientific integrity via journal peer review data. Science, 357, 256–257. https://doi.org/10.1126/science.aan4141

Locascio, J. J. (2017). Results blind science publishing. Basic and Applied Social Psychology, 39, 239–246. https://doi.org/10.1080/01973533.2017.1336093

Odom, S. L., Brantlinger, Gersten, Horner, R. H., Thompson, & Harris, K. R. (2005). Research in special education: Scientific methods and evidence-based practices. Exceptional Children, 71(2), 137–148. https://doi.org/10.1177/001440290507100201

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Polanin, J. R., Tanner-Smith, E. E., & Hennessy, E. A. (2016). Estimating the difference between published and unpublished effect sizes: A meta-review. Review of Educational Research, 86(1), 207–236. https://doi.org/10.3102/0034654315582067

Rosenthal (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Shadish, W. R., Zelinsky, N. A. M., Vevea, J. L., & Kratochwill, T. R. (2016). A survey of publication practices of single-case design researchers when treatments have small or large effects. The Journal of Applied Behavior Analysis, 49, 656–673. https://doi.org/10.1002/jaba.308

Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487–510. https://doi.org/10.1146/annurev-psych-122216-011845

Tincani, & Travers (2019). Replication research, publication bias, and applied behavior analysis. Perspectives on Behavior Science, 42, 59–75. https://doi.org/10.1007/s40614-019-00191-5

Woznyj, H. M., Grenier, Ross, Banks, G. C., & Rogelberg, S. G. (2018). Results-blind review: A masked crusader for science. European Journal of Work and Organizational Psychology, 27(5), 561–576. https://doi.org/10.1080/1359432X.2018.1496081
