Abstract
The implementation of research-based practices by teachers in public school classrooms is required under federal law as expressed in the Individuals With Disabilities Education Improvement Act of 2004. To aid teachers in identifying such practices, researchers conduct systematic reviews of the educational literature. Although recent attention has been given to changes in the quality of these reviews, there has been minimal discussion about changes in the quality of the studies that comprise them. Specifically, to what extent have educational policies leading to the creation of experimental design standards resulted in a change in the rigor of educational research? Using a subset of the single-case literature commonly published in special education journals, we estimate the impact of What Works Clearinghouse single-case design standards on the trend in the rigor of single-case studies using a comparative interrupted time series framework. Within this subset of single-case studies, our estimation strategy did not detect a change in the trend of the rigor of single-case research following the establishment of What Works Clearinghouse single-case design standards. Implications are discussed for practitioners and researchers.
In this article, we overview the movements and policies resulting in federal mandates requiring public-school teachers to use educational practices supported by rigorous scientific research. Given the impetus on researchers to support teachers by identifying practices meeting criteria to be considered research-based, we estimate the impact of educational policies leading to the creation of quality standards on the rigor of a subset of special education research. Specifically, we examined research using single-case experimental designs to evaluate the effects of the system of least prompts (SLP) response prompting procedure, as this body of research extends across decades and comprises studies with students and children with disabilities. Our estimation method uses a comparative interrupted time series framework to estimate the impact of design standards created for single-case experiments on the trend in the rigor of published single-case studies evaluating SLP. The introduction of this article discusses policies related to the evidence-based practice movement in education. The “Method” section presents our analysis procedures and is followed by the “Results” section. The final section describes the implications of our findings for practitioners and researchers. In addition, we have included a series of graphic displays to assist readers in interpreting our findings and hosted our data and syntax in Microsoft Excel and Stata formats on Open Science Framework at https://osf.io/xp7wv/. Supplemental tables and figures referenced throughout this article are also hosted on Open Science Framework.
Research-Based Practices in Education
Over the last 20 years, educational policy has brought scientifically supported practice to the forefront of teacher responsibilities. Although preference and familiarity with practices are important considerations for many teachers, policy dictates that teachers use practices that are “research-based” (Individuals With Disabilities Education Improvement Act [IDEA], 2004, p. 2787). This notion of research supporting instruction derived from evidence-based medicine (Sackett et al., 1996) whereby the use of scientifically supported medical practices and treatments guides decisions about patient care. Within schools, the evidence-based practice (EBP) movement attempts to bring scientifically supported educational practices, curricula, and services to all students. Origins of the EBP movement are evident in the 1997 reauthorization of IDEA. In the reauthorization, school personnel were tasked with identifying the functions of a student’s challenging behavior, and subsequently developing a reinforcement-based behavior plan based on identified functions. In response to this mandate, many in the field of special education expressed concerns that the policy requirements extended beyond the available resources allotted to students through school-based services (Nelson et al., 1999; Reid & Nelson, 2002); these concerns brought to light the difficulties in translating research to practice. In the years to follow, implementation science emerged to promote the uptake of research-based practices in medicine, policy, education, and other fields. In 2002, the Education Sciences Reform Act echoed the goals put forth in the field of implementation science by establishing resources to help transition research into practice and support the implementation of EBPs in schools. These resources were housed under the newly created and federally funded Institute of Education Sciences (IES) and its affiliated centers and agencies.
In 2002, IES established the What Works Clearinghouse (WWC) to synthesize and identify educational practices that have been rigorously evaluated and found to lead to improved student outcomes.
Given that school personnel were required to use research-based practices, WWC served as a guide for personnel to identify such practices. That is, rather than have teachers and administrators comb the research literature and make their own evaluations about which practices may be considered research-based, WWC did the job for them. To support this endeavor and promote non-biased, transparent, and replicable procedures when conducting a review, WWC developed standards by which to evaluate the rigor and effects of identified studies (WWC, 2008). In initial reviews, WWC relied heavily on randomized control trials to identify research-based practices. For teachers, special educators, and related service providers who worked primarily with students with low incidence disabilities, WWC’s reliance on randomized control trials was insufficient as there were few trials that evaluated practices with these populations of students. Rather, single-case designs (SCDs) were commonly employed as they allowed for experimental evaluations with few participants. Single-case methodology was not new, as findings from bodies of literature comprised primarily of single-case studies were used to inform the previously mentioned mandates found in the 1997 reauthorization of IDEA. In addition, robust evaluations of practices to improve outcomes for early childhood and grade school students began in the 1980s in the fields of applied behavior analysis and special education (Odom, 1988). The necessity for SCDs to identify effective practices for populations of students with disabilities was well documented prior to the establishment of WWC and the EBP movement (Wolery, 2013).
In 2005, Horner and colleagues created standards to evaluate the rigor and effects of SCDs to identify research-based practices within the field of special education. In 2010, WWC followed suit and developed standards specific to SCDs to identify research-based practices evaluated through single-case methodology within WWC affiliated reviews (Kratochwill et al., 2010). Although other SCD standards developed as a result of the EBP movement, the WWC standards quickly became prominent in published SCD syntheses and reviews of the literature (Kiuhara et al., 2017). In addition, the WWC’s standards were the only SCD standards developed under the auspices of the U.S. Department of Education. The recognition of SCD research as a valid methodology in education paralleled an increase in the number of syntheses and reviews comprised of single-case studies (Jamshidi et al., 2018). The increased attention provided to SCDs brought about further discussions concerning the rigor and quality of single-case research. Arguably, the most meaningful point of consensus reached by single-case researchers was that evaluations of single-case studies should be done separately for rigor and outcomes (Ledford et al., 2018; Maggin et al., 2014; O’Keefe et al., 2012; Wendt & Miller, 2012). Evaluating SCD rigor separately from outcomes may be critically important given the design logic of SCDs. That is, outcomes cannot be evaluated if the design on which the outcomes are based failed to meet minimum standards to indicate that the study is internally valid (Ledford et al., 2018). In response to the separation of rigor and outcomes when evaluating SCDs, gating procedures were recommended when identifying research-based practices, such that if a study lacked adequate rigor, then any outcomes should not be evaluated or used within a determination regarding the classification of a practice as research-based.
In recent years, additional discussions surrounding SCDs and corresponding standards for determining research-based practices have continued (e.g., Ledford et al., 2018; Wolery, 2013). These discussions have led to the iterative development of additional quality standards by WWC (2014, 2017), professional organizations (Council for Exceptional Children; Cook et al., 2014), and SCD researchers (Ledford et al., 2016; Reichow et al., 2018; Tate et al., 2016). Recent reviews of meta-analyses have found that present day researchers synthesizing single-case literature are adhering to rigorous standards more than they have in the past (Jamshidi et al., 2018). However, despite (a) the attention and recognition given to SCD methodology, (b) nuanced discussions concerning standards for evaluating SCDs, and (c) increasing trends in reviews of the single-case literature, there has been little, if any, discussion concerning the impact of the EBP movement on the rigor of single-case studies. That is, the extent to which single-case researchers are responding to the EBP movement and as a result conducting more rigorous research is unknown. Ideally, the rigor of SCD studies should have increased in response to the creation, availability, and dissemination of design standards. Furthermore, given mandates within IDEA 2004 dictating that teachers use research-based practices, an increase in the rigor of single-case research would allow for improved confidence in outcomes by practitioners and administrators selecting practices for use in educational settings. Empirical research on trends in the rigor of single-case research has either focused on designs for which there are no established standards (adapted alternating treatments design; Shepley et al., 2020) or time periods that barely extend past the creation of WWC single-case standards (Burns et al., 2012).
Therefore, the purpose of this study is to (a) examine the trend in the rigor of SCD studies over time and (b) analyze the impact of WWC SCD standards on the rigor of single-case research. We use a subset of the single-case literature that has a rich tradition in special education research and practice: studies evaluating SLP with individuals with disabilities (discussed further in the “Method” section). Research questions for this study were as follows:
Method
Data
For this study, we used data from a review of SLP conducted by Shepley et al. (2019). The data set is appropriate for our study given the range of years for which studies were searched in the review. That is, evaluating the possible variability in study rigor over time requires the identification of a practice that has been thoroughly evaluated both before and after the establishment of SCD standards. Shepley and colleagues’ review spans 22 years prior to WWC’s establishment of SCD standards and 6 years after. In addition, SLP has a rich history in the special education literature given its prevalence compared with other instructional strategies (Wolery et al., 1986), with researchers highlighting that it is the “prompting strategy most commonly used by special education teachers” (Fisher et al., 2007, p. 490). As an example of its continued relevancy to special education teachers, WWC commissioned their own review of the system of least prompts and recently disseminated their findings (WWC, 2018). All studies meeting WWC standards for rigor to be evaluated for effects utilized a SCD. However, it should be noted that the WWC evaluated 60% fewer studies against design standards than did Shepley and colleagues. Despite the low number of studies, we attempted to examine WWC’s data for use; however, we were informed by the WWC that their “master review record (coding reports) are not publicly available” (WWC Help Desk, 2018). Regarding accessibility of data, Shepley and colleagues’ data set is publicly available on Open Science Framework. Finally, the data set contains a relatively large number of studies with significant covariance present in the data, thus allowing for methods of analysis that better account for bias than the procedures used in previous studies examining rigor in single-case research (e.g., chi-square tests [Burns et al., 2012]; ordinary least squares [OLS] regression [Shepley et al., 2020]).
To form variables for our analyses, we extracted relevant information from Shepley and colleagues’ Microsoft Excel and SPSS tables.
Shepley and colleagues’ (2019) data set contains information on 123 peer-reviewed studies published between 1988 and 2016, in which SLP was used with individuals with disabilities. Their data indicate that SLP was used across various settings and implementers, with studies mostly conducted in schools (n = 65, 52%). Review authors also evaluated the rigor of 217 SCDs across 86 studies (e.g., three multiple baseline designs within one study) using standards developed by WWC. Of the 86 studies, 44 (52%) did not contain a design meeting WWC standards (code of does not meet), whereas 42 studies (48%) did contain a design meeting WWC standards (code of either meets with reservations or meets without reservations). Regarding the reliability of their coding, 20% of studies (n = 25) were independently coded by two reviewers, with a mean interrater agreement of 89% specific to outcome variables which included the coding of rigor.
Analytical sample
From Shepley and colleagues’ (2019) review, we extracted each study that was evaluated against WWC design standards, resulting in 86 studies. Four of the studies were from two published manuscripts, as Shepley and colleagues coded experiments “separated by appropriate headings within [a] manuscript (e.g., Study 1, Study 2, Study 3)” as individual studies (p. 2). For our purposes, a study was defined as a published manuscript, regardless of the number of experiments reported in that manuscript. Thus, we had 84 studies (i.e., published manuscripts) in our sample.
Variables
We used three variables coded by Shepley and colleagues (2019). First was whether each study contained a design meeting WWC standards with or without reservations for rigor (RigorMeets). This variable was dichotomous, with a value of 1 indicating that a study contained a design meeting WWC standards with or without reservations and a value of 0 indicating that a study did not meet WWC design standards. The RigorMeets variable was used as our outcome variable in analyses; all other variables were treated as independent variables. The second variable was the year a study was published (T), which took an integer value and ranged from 1988 to 2016; we recoded this variable by subtracting 1988 from each value, so that T ranged from 0 to 28. The last variable indicated the SCD type employed within each study (Z). This was a categorical variable with an exclusive coding system indicating if a study contained withdrawal, alternating treatments, multiple baseline, or multiple probe designs. We recoded this variable as binary, to indicate if a study contained a multiple probe design (value of 0) or other design (value of 1).
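As a minimal sketch of the recoding described above (this is not the authors' Stata syntax; the function name and the design label are hypothetical and chosen only for illustration), the year and design-type variables could be derived as follows:

```python
BASE_YEAR = 1988  # first publication year in the data set

# The label below is an assumption; the source data may code design types differently.
MULTIPLE_PROBE = "multiple probe"


def recode_study(year: int, design: str) -> dict:
    """Return the recoded T and Z values for one study."""
    t = year - BASE_YEAR                      # T: years since 1988
    z = 0 if design == MULTIPLE_PROBE else 1  # Z: 0 = multiple probe, 1 = other design
    return {"T": t, "Z": z}
```

For example, a withdrawal-design study published in 2016 would receive T = 28 and Z = 1 under this coding.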
Estimation Strategy
Trend
To analyze the trend in the rigor of single-case studies over the years (Research Question 1), we first examined descriptive data by creating a series of line graphs to allow for visual analysis of (a) the number of studies published each year, (b) the number of studies published each year that contained a design meeting WWC standards for rigor, and (c) the percentage of studies published each year that contained a design meeting WWC standards for rigor. Next, we conducted OLS regression analyses using the data depicted in each graph, by regressing the number or percentage of studies published (dependent variable), on years (independent variable). The linear regression analyses provided a trend line (i.e., line of best fit) to illustrate the (a) average change in the number of studies published each year, (b) average change in the number of studies containing a design meeting WWC standards for rigor published each year, and (c) average change in the percentage of studies containing a design meeting WWC standards for rigor published each year.
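The trend line from each of these regressions is an ordinary least-squares fit of a yearly count or percentage on the year. A minimal sketch of that calculation is given below; the example values are hypothetical and are not the study data.

```python
def ols_trend(years, values):
    """Return (intercept, slope) of the least-squares line values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sum of squared deviations of x, and cross-product of x and y deviations.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx                      # average change in the outcome per year
    intercept = mean_y - slope * mean_x    # fitted value when year = 0
    return intercept, slope


# Hypothetical yearly counts, for illustration only.
example_years = [0, 1, 2, 3, 4]
example_counts = [1, 1, 2, 3, 3]
intercept, slope = ols_trend(example_years, example_counts)
```

The slope is the quantity the text interprets as the average yearly change; each year contributes one observation regardless of how many studies were published in it.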
It should be noted that these linear models used the number of years as the sample size in the analyses (i.e., 1988–2016); as such, the sample size was limited to 29 and did not utilize each study as the sample, which would have provided a sample size of 84. To account for each study as part of the sample in analyses examining the trend in rigor over the years, we created a logistic regression model (Model 1). We regressed whether a study contained a design meeting WWC standards for rigor, on the year in which a study was published:
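The model equation does not appear in this text. Based on the variable and parameter definitions given in this section, and assuming the linear-index notation implied by the error term, Model 1 can be sketched as:

```latex
\mathit{RigorMeets}_{s} = \beta_{0} + \beta_{1} T_{s} + u_{s}
```

Because the model is estimated by logistic regression, the right-hand side is read on the log-odds scale, and an exponentiated coefficient is an odds ratio.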
Subscript s served as an identifier for each study and u was the error term. In this model, parameter β0 represents the likelihood of a study containing a design meeting WWC standards for rigor in 1988. Although researchers have generally used linear regression models to examine the trend in the rigor of publications in education (e.g., Sundell & Åhsberg, 2018), logistic regression is appropriate for our data set given our binary dependent variable (i.e., whether a study contained a design meeting WWC standards for rigor).
Impact
Previous studies examining the impact of WWC standards on the rigor of publications have used chi-square tests for their analysis (e.g., Burns et al., 2012). When using this test, studies are separated into groups based on when they were published (e.g., before or after the establishment of WWC standards) and their level of rigor (e.g., meets without reservations, meets with reservations, or does not meet). The chi-square test is then conducted to determine if when a study was published is related to the rigor of the study; that is, was there a change in the percentage of studies meeting rigor standards between the two time periods? The test does not assess for changes in trend between the time periods, and therefore, when using the test it is assumed that an establishment of standards for rigor will have an impact solely on changes in level. Regarding our research questions, we did not assume that WWC standards would have an impact on the level of rigor in publications in different time periods. Rather, we hypothesized that if there was an impact, it would be detected on the trend in the rigor of publications. Our hypothesis assumed that changes in research practices happen gradually, with adoption and accurate implementation of new research practices requiring training, feedback, and revisions. Support for this assumption comes from IES and their regular dissemination of resources, workshops, and training grants that aim to help researchers conduct studies that adhere to WWC standards.
To analyze the impact of WWC standards on the rigor of published research (Research Question 2), we created a series of logistic regression models (Models 2 and 3) using an interrupted time series framework (for an accessible overview of time series analyses, we refer readers to Bernal et al., 2017). Model 2 examined the impact of WWC standards on the trend in the rigor of single-case studies:
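The equation for Model 2 does not appear in this text. A sketch consistent with the parameters β0, β1, and β2 and the variables T and XT defined in this section, under the same notation assumptions as the study-level trend model, is:

```latex
\mathit{RigorMeets}_{s} = \beta_{0} + \beta_{1} T_{s} + \beta_{2} XT_{s} + u_{s}
```

Here XT is the post-standards time counter, so β2 captures only a change in slope; no term for an immediate change in level is included.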
It should be noted that we did not include a variable to account for immediate change in level between the time periods given our previously discussed assumptions. For the XT variable, studies published before 2011 received a value of 0, and studies published in 2011 or later received the number of years elapsed since 2011 (e.g., 2011 = 0, 2012 = 1, 2013 = 2, and 2014 = 3). The XT variable indicated when a study was published in relation to the expected impact of WWC standards for rigor. Given the duration of the peer review process and publication lags (i.e., the amount of time between when a study is submitted for publication and when it is published), we assumed that studies published in the same year as initial WWC SCD standards (i.e., 2010) were not influenced by the standards, as the studies were likely conceived and conducted prior to researchers encountering the standards. Therefore, we assumed that the earliest impact of WWC standards would begin for studies published in 2011. Parameter β0 represents the likelihood of a study containing a design meeting WWC standards for rigor in 1988; β1 represents the trend in the rigor of single-case studies prior to 2011; and β2 represents the difference in trend between studies conducted prior to 2011 and starting in 2011. Parameter β2 serves as our estimate of the impact of WWC standards on the trend in the rigor of single-case studies.
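The XT coding described above can be expressed compactly. The following is a minimal sketch that follows the worked values given in the text (2011 = 0, 2012 = 1, and so on); the function and constant names are hypothetical.

```python
CUTOFF_YEAR = 2011  # first publication year assumed to reflect the 2010 WWC standards


def code_xt(year: int) -> int:
    """XT: 0 for studies before 2011, then years elapsed since 2011."""
    return max(0, year - CUTOFF_YEAR)
```

Under this coding, XT contributes nothing to the model for any study published through 2011 and grows by one for each later publication year.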
A significant limitation of our data and their use in Model 2 is that we treated all studies equally, regardless of the type of SCD employed in a study. This is problematic when considering that the initial WWC SCD standards published in 2010 did not contain specific criteria for evaluating the rigor of multiple probe designs. It was not until 2014 that WWC released criteria specific to the multiple probe design; therefore, it may be inaccurate to expect the 2010 WWC standards to have an impact on the rigor of SCD studies that used multiple probe designs. To account for the type of design used in a single-case study, we used studies employing multiple probe designs as a comparison group (see also comparative interrupted time series analysis [Shadish et al., 2002]). We compared changes in rigor for studies using withdrawal, multiple baseline, and alternating treatments designs (i.e., treatment group) with changes in rigor for studies using multiple probe designs (i.e., comparison group):
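The equation for Model 3 does not appear in this text. A sketch consistent with the four slope parameters and the interaction terms ZT and ZXT described in this section is given below. Whether the original model also included a main effect for Z cannot be recovered from the text, so its omission here is an assumption.

```latex
\mathit{RigorMeets}_{s} = \beta_{0} + \beta_{1} T_{s} + \beta_{2} XT_{s}
  + \beta_{3} (Z_{s} \times T_{s}) + \beta_{4} (Z_{s} \times XT_{s}) + u_{s}
```

With Z = 0 for multiple probe designs, the interaction terms vanish for the comparison group, leaving β1 and β2 to describe that group alone.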
ZT represents the interaction between variables Z and T; and ZXT represents the interaction between variables Z and XT. Parameters β1 and β2 represent studies that used multiple probe designs, and parameters β3 and β4 represent studies that used withdrawal, multiple baseline, or alternating treatments designs. With this model, we do not expect to see significant differences between the trends of studies containing a design meeting WWC standards prior to the publication of WWC’s design standards for either group. If WWC design standards resulted in a change in publication practices, we would expect to see the change reflected on the trends of the groups following the publication of WWC design standards. β1 should be interpreted as the trend in rigor prior to 2011 for studies in the comparison group; β2 should be interpreted as the trend in rigor starting in 2011 for studies in the comparison group; β3 should be interpreted as the difference in the trend in rigor between the comparison and treatment groups prior to 2011; and β4 should be interpreted as the difference in the trend in rigor between the groups starting in 2011. Parameter β4 serves as our estimate of the impact of WWC standards on the rigor of SCDs in the treatment group relative to the comparison group.
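To illustrate how these parameters combine, the sketch below computes a predicted probability and a group-specific yearly slope from coefficients on the log-odds scale. It assumes a linear predictor containing only T, XT, ZT, and ZXT (no main effect for Z), which is an assumption rather than a confirmed feature of the authors' model, and any coefficient values passed in are hypothetical.

```python
import math


def predicted_probability(t, xt, z, b0, b1, b2, b3, b4):
    """Probability of meeting standards under the assumed linear predictor."""
    log_odds = b0 + b1 * t + b2 * xt + b3 * z * t + b4 * z * xt
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse logit


def yearly_log_odds_slope(z, post_2011, b1, b2, b3, b4):
    """Change in log-odds per year for a group (z = 0 comparison, z = 1 treatment)."""
    slope = b1 + b3 * z          # pre-2011 trend for the group
    if post_2011:
        slope += b2 + b4 * z     # additional change in trend from 2011 onward
    return slope
```

For the comparison group (z = 0) the slope is b1 before 2011 and b1 + b2 afterward; for the treatment group the corresponding slopes add b3 and b4, which is why b4 isolates the post-2011 difference between the groups.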
Results
Trend
The number of SLP studies published each year between 1988 and 2016 ranged from zero to six, with an average of 2.89 studies published per year. Visual analysis of the number of SLP studies published each year suggests an accelerating trend with modest variability (see Supplemental Figure S1). Our visual analysis is supported by an OLS regression analysis indicating that each year the number of SLP studies published increased on average by 0.143 studies (SE = 0.025, p ≤ .001). Regarding studies containing a design meeting WWC standards for rigor, an average of 1.45 studies (range zero to five studies) were published each year (see Supplemental Figure S1). Visual analysis of the number of studies containing a design meeting WWC standards for rigor suggests an accelerating trend, again with modest variability. An OLS regression analysis indicates that each year the number of studies containing a design meeting WWC standards for rigor increased on average by 0.098 studies (SE = 0.026, p ≤ .001).
In contrast to the number of SLP studies containing a design meeting WWC standards for rigor, visual analysis of the percentage of studies containing a design meeting WWC standards suggests substantial variability across the years with no clear trend (see Figure 1). Our visual analysis is supported by an OLS regression analysis indicating that, on average, the percentage of studies containing a design meeting WWC standards for rigor increased by less than half of a percentage point each year (B = 0.429). In addition, the standard error of this estimate was relatively large resulting in a lack of statistical significance (SE = 0.807, p = .600). The lack of statistical significance supports the interpretation that the actual trend is no more likely to be accelerating or decelerating than it is to be zero-celerating.

Figure 1. Percentage of studies from 1988 to 2016 containing a single-case design that meets What Works Clearinghouse standards for rigor.
As noted in the “Estimation Strategy” section of this article, the use of an OLS regression analysis with years as the sample does not adequately account for each study, as the year in which one study was published is weighted equally to the year in which six studies were published. To better account for each study in the sample, we used a logistic regression model (Model 1). Descriptive statistics for the variables in Model 1 are presented in Supplemental Table S1 and results from the model are presented in Supplemental Table S2. Results from Model 1 indicate that on average, each additional year since 1988 is associated with an increase in the odds of a SLP study containing a design meeting WWC standards for rigor of 1.041. In other words, a SLP study published in 2016 is 1.041 times as likely to contain a design meeting WWC standards for rigor as a SLP study published in 2015. The average coefficient estimate is positive, which suggests an accelerating trend in the likelihood that a study contains a design meeting WWC standards for rigor. However, the standard error of this estimate yields a 95% confidence interval for which the actual average trend may be decelerating or zero-celerating. Supplemental Figure S2 depicts (a) the average estimated trend in the probability that a study contains a design meeting WWC standards for rigor and (b) the trend in the probability of a study containing a design meeting WWC standards for rigor when using the lower-bound estimate of the 95% confidence interval.
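A yearly odds ratio multiplies the odds, not the probability, so translating it into a change in probability requires a baseline. The sketch below shows the conversion; the baseline probability used in any call is a hypothetical input, not a value reported in this article.

```python
def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


def compound_odds_ratio(yearly_or, years):
    """Odds ratio accumulated over a number of years at a constant yearly ratio."""
    return yearly_or ** years


def probability_after(baseline_probability, yearly_or, years):
    """Probability after `years`, starting from a baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    return odds_to_probability(baseline_odds * compound_odds_ratio(yearly_or, years))


# Example call: a hypothetical 1988 baseline of 0.40 and the reported yearly ratio.
p_2016 = probability_after(0.40, 1.041, 28)
```

Because the ratio compounds, a small yearly odds ratio can still imply a visible change over the 28 years from 1988 to 2016, although the confidence interval reported above means the true change may be zero or negative.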
Impact
Model 2 examines the impact of the WWC standards on the trend in the rigor of single-case SLP publications. Supplemental Table S1 contains descriptive statistics pertaining to each variable in the model. Supplemental Table S3 details the results of the model. The coefficient estimate for XT indicates that on average, each additional year starting in 2011 (i.e., after the publication of WWC SCD standards) is associated with an increase in the odds of a SLP study containing a design meeting WWC design standards of 1.096 relative to the trend prior to the publication of the standards. The average coefficient estimate of XT suggests an accelerating trend after the publication of WWC design standards; however, the standard error of this estimate yields a 95% confidence interval indicating that the actual relative difference in the trend post-WWC may be decelerating or zero-celerating compared with the trend prior to the publication of WWC standards (see Supplemental Figure S4).
To account for the absence of multiple probe designs in initial WWC standards, we used SLP studies employing multiple probe designs as a comparison group in our analyses. This allowed us to examine differences in the trends of SLP studies employing non-multiple probe designs (i.e., withdrawal, multiple baseline, alternating treatments) relative to SLP studies that included multiple probe designs, both before and after the publication of WWC design standards. Descriptive statistics for the variables in Model 3 are presented in Supplemental Table S1, and Table 1 details the results of the analysis. The coefficient estimate for ZT indicates a minimal average difference between the trends of the groups prior to the establishment of WWC SCD standards. An interpretation of this coefficient indicates that on average, for each additional year prior to 2011, the difference in the odds of a study in the treatment group containing a design meeting WWC standards for rigor was 0.988 (SE = 0.038, p = .752), when controlling for other variables in the model. In examining the trends of the groups represented using average coefficient values from Model 3 to obtain estimated probabilities across years (refer to the top graph in Figure 2), visual analysis suggests that the trend for studies containing multiple probe designs is accelerating only slightly faster than the trend for studies without multiple probe designs prior to the establishment of WWC design standards. The coefficient estimate for ZXT indicates that on average, for each additional year starting in 2011, the difference in the odds of a SLP study in the treatment group (i.e., studies that did not contain multiple probe designs) containing a design meeting WWC standards for rigor was 0.747 (SE = 0.288, p = .449), when controlling for other variables in the model.
Although the average estimate of ZXT reflects a decelerating trend for SLP studies in the treatment group published starting in 2011, the 95% confidence interval for the estimate indicates that the actual trend may be accelerating or zero-celerating (see Figure 2).
Table 1. Logistic Regression Model Estimating the Impact of WWC Standards on the Trend in Rigor of Single-Case Studies for Treatment and Comparison Groups.
Note. Studies in the comparison group contained multiple probe designs; studies in the treatment group did not contain multiple probe designs. Coefficient estimates are reported as odds ratios (OR); standard errors (SE) are shown in parentheses. CI = confidence interval, VIF = variance inflation factor, WWC = What Works Clearinghouse.

Figure 2. Graphs display the estimated trend lines indicating the probability of a study containing a design meeting What Works Clearinghouse (WWC) standards for rigor when studies contain (i.e., comparison group) and do not contain (i.e., treatment group) a multiple-probe design; top graph trend lines are calculated using average coefficient estimates from Model 3; bottom graph trend lines are calculated using the upper-bound 95% confidence interval estimate for ZXT variable and all other variables held constant from Model 3.
Discussion
Using a quasi-experimental design, this study provides empirical data to suggest that WWC design standards arising from the EBP movement and U.S. Department of Education policies have not resulted in a change in the rigor of single-case SLP research. When viewed alongside increases in the number of studies published over time, the absence of improvements in the rigor of single-case SLP studies suggests (a) practitioners are likely to encounter SCD studies on SLP with limited rigor, (b) the peer-review process in its current form may not adequately evaluate critical components of SCDs examining SLP, and (c) researchers may not be consistently applying design standards when planning or evaluating SLP single-case studies. Each of these implications is explored in detail below.
Practitioner Implications
There is a low probability that teachers and related school-based practitioners attempting to utilize research during instructional planning will identify a rigorous SLP study from which they may draw reliable conclusions. As an example, using the data from our study, if a practitioner identified two SLP studies, there is a 0.25 probability that neither study will contain a design meeting WWC standards. If a practitioner identified three SLP studies, there is a 0.12 probability that none of the studies will contain a design meeting WWC standards. Given that we did not detect a change in the trend of the rigor of single-case SLP research, this probability does not improve if practitioners look only at recently published SLP studies. Thus, teachers who are required by federal policies under IDEA to use research-based practices in schools may interpret positive findings reported in a study as evidence of effect without adequately assessing the rigor supporting those findings. For practitioners who receive training in single-case research methodology, such as board certified behavior analysts, the situation is still problematic, as these practitioners must continually wade through more research every year to identify a rigorous study without the expectation that recent publications will be more rigorous.
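The two- and three-study probabilities cited above are consistent with drawing studies at random, without replacement, from a pool in which half lack a design meeting standards. The sketch below assumes that 42 of the 84 sampled studies fall in that category; this count is inferred from the yearly averages reported in the “Results” section and is not stated directly in the article.

```python
from math import comb


def prob_none_meet(total, not_meeting, k):
    """Probability that k studies drawn without replacement all fail to meet standards."""
    return comb(not_meeting, k) / comb(total, k)


# Assumed counts: 84 studies in the sample, 42 without a design meeting standards.
p_two = prob_none_meet(84, 42, 2)
p_three = prob_none_meet(84, 42, 3)
```

Under these assumed counts the function gives roughly 0.25 for two studies and 0.12 for three, matching the values in the text.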
Difficulty associated with locating rigorous research to support instructional decision-making in applied educational settings may contribute to the frequent use of alternative sources, such as web-based resources, for decision-making for students in special education (e.g., Pinterest and Teachers Pay Teachers; Beahm et al., 2019; Cleaver & Wood, 2018). Because these web-based resources are unlikely to include research-supported practices (Beahm et al., 2019), and in light of the findings in this article, future studies should investigate how practitioners differentiate high- and low-quality studies when selecting research to support instructional decision-making.
Reviewing SCDs
On a positive note, our findings indicate the number of rigorous SLP studies published each year has increased. However, the number of non-rigorous studies has also increased, a finding supported by reports from special education journal editors and associate editors (Ganz & Ayres, 2018). We interpret the increases in non-rigorous SLP studies and the lack of change in the proportion of published rigorous studies as possibly indicating that journalistic standards for publishing single-case SLP research have remained constant over the years. As an example, in reviewing special education journals that routinely publish SLP research, we found guidelines pertaining to reporting outcomes from SCDs (e.g., requirement that effect sizes be included for SCD studies), but we did not identify guidelines concerning the rigor of SCDs required for publication. We recognize that journalistic standards have been proposed (Ganz & Ayres, 2018); however, we did not find that journals were incorporating them into author guidelines. If the number of non-rigorous studies published each year were decreasing or stable, then the proportion of published rigorous studies would indicate an increasing trend. We propose that this could be accomplished if journals employed a gating system similar to those used for meta-analyses of single-case literature (Maggin et al., 2014; O’Keefe et al., 2012; Wendt & Miller, 2012) in which studies that do not meet an adequate level of rigor are not evaluated in outcomes analyses. In other words, reviewers should evaluate SCD rigor first, and then evaluate outcomes only if the rigor of the study meets quality standards.
We recognize that a move to a gating system for publication may be prohibitive or may prevent the dissemination of meaningful research, and therefore suggest an alternative. Alongside discussing changes in a study’s dependent variable in the “Results” section of a manuscript, we think authors should begin discussing the rigor of their SCDs by descriptively addressing the five main components of rigor in SCD research: design appropriateness, opportunities for sufficient replication, reliability, fidelity, and data sufficiency (Ledford et al., 2018). In doing so, authors can detail how their designs meet ratings based on different frameworks for assessing rigor that, overall, share the aforementioned components (e.g., WWC, Council for Exceptional Children). Specifically labeling the rigor of a design with a rating may better convey to consumers how a study’s findings should be used. For example, authors may note that findings should be interpreted with caution given the limited reliability data collected across conditions. Providing a description of a study’s rigor may also help other researchers identify areas to be addressed through direct and systematic replications.
Evaluating SCDs
Despite a lack of guidelines provided by journals on publishing rigorous and non-rigorous single-case research, we are surprised that the corresponding autonomy allotted to peer reviewers has not resulted in a greater proportion of rigorous SLP studies being published following the establishment of design standards. In considering this, we question whether the recent attention provided to single-case research has been too heavily focused on making quantitative outcomes analyses more accessible, resulting in less attention focused on enhancing consumers’ understanding of the rigor of SCDs. If this is the case, then peer reviewers may be responding more to the outcomes reported in SCDs than to the rigor on which those outcomes are based. We strongly urge journal editors to recruit reviewers with experience conducting applied single-case research to minimize the possibility that unqualified peer reviewers contribute to the lack of improvement in the rigor of single-case research.
Further Considerations
If changes are to occur to improve the rigor of published single-case studies, then there remain various issues that warrant discussion: (a) lack of agreement among SCD design frameworks for analyzing rigor, (b) identification of a common set of essential SCD quality standards required for designs to be considered experimental and valid to evaluate outcomes, and (c) creation of rigor standards for complex and combination SCDs. Numerous comparisons of quality frameworks for evaluating SCDs have yielded a common conclusion: the identification of research-based practices can be impacted by the tool researchers select to evaluate bodies of literature containing SCDs (Maggin et al., 2014; Zimmerman, Ledford, et al., 2018; Zimmerman, Pustejovsky, et al., 2018). Some essential elements of SCD rigor are not present across all frameworks, whereas other elements are consistently present but measured in various forms that result in inconsistent conclusions regarding the sufficiency of the element in designs (Zimmerman, Ledford, et al., 2018). For example, procedural fidelity measurement is absent from WWC standards (Wolery, 2013), but present in other commonly referenced rigor analyses (e.g., Council for Exceptional Children; Horner et al., 2005), whereas data sufficiency is present in all frameworks, but measured via inconsistent methods (e.g., total number of data points in WWC [2014] standards; visual analysis of the stability of data using the Single Case Analysis and Review Framework [Ledford et al., 2016]). Comparisons of rigor frameworks suggest conclusions regarding the summative rigor of a body of literature vary contingent on if and how each of these indices is measured (Zimmerman, Ledford, et al., 2018). Consensus regarding the indices to be measured in SCD syntheses should be established to support researchers in using frameworks to evaluate the quality and rigor of SCDs in syntheses designed to identify research-based practices.
As the complexity of SCDs increases, guidance is also needed to evaluate the rigor of complex and combination designs. Evaluations and comparisons of research-based practices are conducted in the context of complex SCDs such as adapted alternating treatments designs (Shepley et al., 2020). Comparing the efficiency of two effective interventions (e.g., Swain et al., 2015) allows for the refinement of research-based practices given the context, participants, and behaviors of interest. However, standards for evaluating the rigor of comparison designs are consistently absent from quality frameworks (Shepley et al., 2020). Furthermore, standards are also absent for combination designs that may evaluate the effectiveness of interventions while simultaneously comparing their efficiency, such as adapted alternating treatments designs embedded in multiple probe designs (Ledford et al., 2018). Standards for evaluating the rigor of comparison and combination SCDs should be established to better ensure that the overall rigor of single-case research improves.
Limitations
The findings of this study should be evaluated in the context of some limitations. The data for our study come from a subset of the SCD literature; specifically, studies in which researchers evaluated the effects of SLP for individuals with disabilities. Given a lack of research examining the impact of design standards on publication practices, we are unsure how the findings from this body of literature compare to others. In addition, readers should be cautious of inferring causation as we cannot dismiss the impact of other established design standards. For example, Horner and colleagues’ (2005) initial design standards for single-case research were published 5 years prior to WWC’s standards. As such, it may be that single-case researchers responded to Horner and colleagues’ standards more favorably than those proposed by the WWC (see also Maggin et al., 2014; Zimmerman, Ledford, et al., 2018). Another potential limitation is that we did not evaluate the extent to which studies were rigorous; rather, we used a dichotomous outcome indicating whether a study meets (i.e., with or without reservations) or does not meet WWC design standards. Although some past studies have used a continuous outcome variable to indicate the number of standards met by an SCD (e.g., Shepley et al., 2020), we think our use of a binary variable is appropriate given that it aligns with gating criteria used by WWC, in which a study must meet a specific threshold of rigor/quality standards to evaluate outcomes. As a last point of discussion regarding our limitations, there may be concerns surrounding the cut-point we used for the year in which we assumed that researchers had the opportunity to incorporate WWC standards into published SCD studies (i.e., 2011). To address this concern, we conducted sensitivity analyses with the years 2010 and 2012 as the cut-points; the statistical significance and implications of the findings were unchanged.
We recognize that there are likely limitations we have not discussed and have not recognized; therefore, we ask readers to be skeptical and examine our posted data.
Conclusion
Our estimation strategy did not detect a significant change in the trend of the rigor of SCD studies evaluating SLP following the establishment of WWC’s design standards. Moving forward, we recommend (a) improved journalistic standards for publishing single-case research, (b) greater attention and resources devoted to establishing widespread understanding among researchers about the logic and internal validity of SCDs, and (c) adequate planning by all single-case researchers before beginning their studies to ensure they have the resources necessary to conduct those studies within rigorous experimental designs.
Supplemental Material
Supplemental material, sj-docx-1-dps-10.1177_1044207320934048 for Estimating the Impact of Design Standards on the Rigor of a Subset of Single-Case Research by Collin Shepley, Kathleen N. Zimmerman and Kevin M. Ayres in Journal of Disability Policy Studies
Supplemental material, sj-pdf-2-dps-10.1177_1044207320934048 for Estimating the Impact of Design Standards on the Rigor of a Subset of Single-Case Research by Collin Shepley, Kathleen N. Zimmerman and Kevin M. Ayres in Journal of Disability Policy Studies
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
