Abstract
Demonstration of experimental control is considered a hallmark of high-quality single-case research design (SCRD). Studies that fail to demonstrate experimental control may not be published because researchers are unwilling to submit these papers for publication and journals are unlikely to publish negative results (i.e., the file drawer effect). SCRD studies comprise a large proportion of intervention research in special education. Consequently, the existing body of research, comprised mainly of studies that show experimental control, may artificially inflate efficacy of interventions. We discuss how experimental control evolved as the standard for high-quality SCRD; why, in the era of evidence-based practice, rigorous studies that fail to fully demonstrate experimental control are important to include in the body of published intervention research; the role of non-replication studies in discovering intervention boundaries; and considerations for researchers who wish to conduct and appraise studies that fail to yield full experimental control.
Successful applied science depends on the capacity of researchers to build a sound body of empirical studies that progressively increase knowledge about how interventions work, given individual characteristics and circumstances. For decades, researchers have utilized single-case research designs (SCRDs) to evaluate the efficacy of educational and behavioral interventions. The origins of contemporary SCRD are traceable to Skinner’s (1938) experimental work detailed in his first book, The Behavior of Organisms. His unique experimental approach was characterized by (a) a focus on the individual’s behavior rather than a group, (b) repeated measurement of behavior over time, (c) reliance on experimenter control to understand the relationship between behavior and the environment, (d) visual analysis of graphically depicted behavior to determine experimental effect, and (e) induction, using a series of specific observations to form more general knowledge (i.e., principles of behavior). Although SCRD has substantially evolved since Skinner’s early work, contemporary SCRDs retain most of these essential elements (Gast & Ledford, 2014).
Skinner expanded his early experimental work in the following decades (Ferster & Skinner, 1957), which subsequently influenced the emergent field of behavioral therapy (Barlow & Hersen, 1984). This progression led researchers in the 1960s to begin applying Skinner’s behavior analysis to treat children and adults with disabilities (e.g., Ferster & DeMyer, 1962; Lovaas, Freitas, Nelson, & Whalen, 1967; Sherman, 1965; Wolf, Risley, & Mees, 1964). These preliminary studies culminated in Baer, Wolf, and Risley’s (1968) seminal paper, Some Current Dimensions of Applied Behavior Analysis, in which they codified essential features of SCRD, including descriptions of now commonly used reversal and multiple-baseline designs to demonstrate experimental control.
The group research design approach, which dominated psychology during this early period and is still regarded as the standard of educational and social and behavioral sciences research, relies on a fundamentally different experimental logic (Perone, 1999). The group research approach is characterized by application of an intervention to one or more treatment groups, non-application of an intervention to one or more comparison or control groups, and the use of inferential statistics to detect greater-than-chance differences between the groups on one or more aggregated dependent measures (Campbell & Stanley, 1963).
Recently, group design researchers have raised concerns over the apparent “replication crisis” in psychology, the inability of researchers to replicate the positive findings of previous studies (Pashler & Wagenmakers, 2012). Supporting concerns about a replication crisis, Makel, Plucker, and Hegarty (2012) reported that only 1.07% of articles published in the top 100 psychology journals were replication studies. Similarly, Makel and Plucker (2014) found that only .13% of articles published in the top 100 education journals were explicitly described as replications. No similar concern has been raised about replication in SCRD, specifically, although examinations of replication research in the special education literature have revealed that very few studies are explicit replications (Coyne, Cook, & Therrien, 2016). This apparent lack of concern could be attributed to belief that the inductive nature of SCRD effectively prevents publication bias, and that external validity through systematic replication of SCRD findings is immune to problems associated with deductive inferences and statistical significance of group research. Nonetheless, given the prominence of SCRD in special education, further attention to the issue of replication may help researchers better understand whether similar publication biases could influence scientific progress of an evidence-based special education.
Experimental Control, Baseline Logic, and Causal Determination in SCRD
In SCRD research, experimental control is demonstrated when the experimenter’s application of an intervention (independent variable [IV]) reliably produces a change in behavior (dependent variable [DV]), and this change is not otherwise explained by confounding or extraneous variables (Cooper, Heron, & Heward, 2007). Sidman (1960) introduced the “baseline technique” (i.e., baseline logic; Cooper et al., 2007; Wolery & Ezell, 1993) to clarify the role of experimental control in SCRD. Baseline logic enables the experimenter to make a causal determination about an intervention if a treatment effect is evident through visual analysis. Following collection of sufficient and stable baseline data to determine that a behavior is unlikely to improve, a change in behavior coinciding with an intervention is causally attributed to the intervention. Figure 1 illustrates baseline logic in SCRD. The left panel of Figure 1 shows a hypothetical steady state baseline in which the data are unlikely to improve without treatment, as denoted by the dotted lines which depict the current range (left panel) and predicted future range (right panel) of the behavior without treatment. The right panel also depicts a single demonstration of experimental effect, in which the frequency of behavior increases above the predicted range with treatment.

Illustration of baseline logic in SCRD.
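The prediction step in baseline logic can be loosely sketched in code. The Python sketch below is illustrative only. It assumes the predicted range is simply the minimum-to-maximum band of the baseline observations, which is a crude stand-in for visual analysis (visual analysis also weighs level, trend, and variability). The function names and data are hypothetical and do not come from the cited sources.

```python
def predicted_range(baseline):
    """Return the (low, high) band of the baseline data.

    Assumption: this band is treated as the predicted future range of
    the behavior if no treatment were introduced.
    """
    return min(baseline), max(baseline)


def exceeds_prediction(baseline, intervention):
    """Return True when every intervention data point lies above the
    band projected from baseline."""
    _, high = predicted_range(baseline)
    return all(point > high for point in intervention)


# Hypothetical session-by-session frequencies of a target behavior.
baseline_data = [2, 3, 2, 3, 2]
intervention_data = [5, 6, 7, 7, 8]

print(predicted_range(baseline_data))
print(exceeds_prediction(baseline_data, intervention_data))
```

A single result of this kind corresponds to one demonstration of effect; it does not by itself establish replication.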
Causal determination in SCRD is strengthened by replication of experimental control, which enables the experimenter to confirm functional relations between IV and DV (Baer et al., 1968; Cooper et al., 2007; Sidman, 1960). For example, hypothetical data in the reversal design depicted in Figure 2 show three demonstrations of experimental control. Although reversal designs and other SCRD designs that allow for three demonstrations of experimental control have been commonly used in SCRD for decades, more recently, SCRD researchers have proposed that three demonstrations of experimental control is a critical feature of high-quality studies (Horner et al., 2005) and that SCRD design standards should permit only designs that show replication of experimental effect, namely, reversal, multiple-baseline, and alternating treatments (Kratochwill et al., 2013).

Reversal design depicting three demonstrations of experimental control.
Two implicit conventions are key to understanding causal determination and experimental control with baseline logic in SCRD. Foremost, causal determination and experimental control are regarded as part of the same process. Accordingly, if experimental control is demonstrated via a change in a behavior coinciding with the intervention, then the experimenter may conclude with some confidence that the intervention caused the change in behavior. Replication of effect increases the experimenter’s confidence in making a causal determination. However, it is argued, if a change in behavior does not coincide with the intervention, or if the initial demonstration of experimental control is not replicated, then the experimenter can make no causal determination. As Baer et al. (1968) explain, “An experimenter has achieved an analysis of a behavior when he can exercise control over it” (p. 94).
Relatedly, experimental control in SCRD is conceived as a unidirectional process. From this perspective, if the experimenter can produce a reliable change in behavior from baseline to intervention, then she or he may confidently conclude that the intervention was effective. However, if the intervention fails to yield a change in behavior, or if the change is not replicated, then convention dictates that she or he cannot draw any conclusion about the relationship between intervention and behavior (e.g., she or he cannot conclude that the treatment was ineffective). This is because lack of experimental control in an SCRD experiment is attributed to uncontrolled variables extraneous to the experiment, problems with treatment fidelity, or other threats to internal validity, all of which, it is argued, undermine functional relations and render results uninterpretable (Cooper et al., 2007). Therefore, demonstration of experimental control has been regarded as an essential feature of SCRD (Baer et al., 1968; Horner et al., 2005; Sidman, 1960).
The experimental control requirement of SCRD studies represents an essential distinction in logic between SCRD research and group design research. As explained above, experimental control in SCRD is necessary to determine causal or functional relations. This means that an SCRD study that does not demonstrate experimental control may be perceived to have little informative value to evidence-based practice (EBP). Conversely, group research is advantageous in the sense that inherent design features and procedures (e.g., randomization, blinding, treatment integrity, sample size to attain sufficient statistical power, and use of multivariate techniques to reveal moderating variables) function as controls for threats to internal validity. Generally speaking, if various relevant threats to internal validity are controlled or otherwise accounted for in a group research study, then the experimenter is able to make inferences about treatment effects (Council for Exceptional Children [CEC], 2014). Consequently, if a well-designed group research experiment yields statistically nonsignificant differences between groups, between-group effect sizes are marginal, or countertherapeutic effects are evident, then the presence of various study design elements may permit the experimenter to draw conclusions about the limited effects of an intervention. By contrast, absence of full experimental control in an SCRD study may be perceived to render the study poor quality or uninterpretable, even when quality design standards are otherwise sufficiently present (Horner et al., 2005; Kratochwill et al., 2013).
Contemporary SCRD Research and the EBP Movement
In the 30 years following the publication of Some Current Dimensions of Applied Behavior Analysis (Baer et al., 1968), the nascent field of applied behavior analysis (ABA) exponentially grew into a distinct academic discipline. Since the 1970s, SCRD research has flourished in the pages of academic journals, especially, though not exclusively, those featuring interventions for children and adults with intellectual and developmental disabilities (e.g., Focus on Autism and Other Developmental Disabilities, Education and Training in Autism and Developmental Disabilities, Journal of Positive Behavior Interventions). Lovaas’ (1987) seminal article, Behavioral Treatment and Normal Educational and Intellectual Functioning in Young Autistic Children, used a group design to examine the effects of a systematic application of ABA treatments for children with autism spectrum disorder (ASD). His findings constituted a watershed moment for the field and spurred widespread dissemination of early intensive ABA as the treatment of choice for young children with ASD (Smith & Eikeseth, 2011).
Separately, in the 1990s, the field of medicine formally recognized the necessity of treatments based on sound scientific research versus those supported by opinion, professional preference, and anecdote. Sackett, Rosenberg, Gray, Haynes, and Richardson (1996) described evidence-based medicine as “integrating individual clinical expertise with the best available external clinical evidence from systematic research” (p. 71). Scholars and policy makers in other clinical fields, including psychology (American Psychological Association Presidential Task Force on Evidence-Based Practice, 2006; American Psychological Association Task Force on Evidence-Based Practice for Children and Adolescents, 2008), social work (McNeece & Thyer, 2004), and, more recently, ABA (Slocum et al., 2014), have supported the main tenets of EBP, including adherence to rigorous research in clinical decision making. Influenced by the EBP movement, in 2009, the National Autism Center (NAC) published a review of research to establish empirically supported treatments for individuals with ASD, which was updated in 2015. Wong et al. (2015) also published a comprehensive review of the ASD intervention research with a similar purpose and obtained 456 methodologically rigorous articles that supported 27 practices. These reviews involved extracting peer-reviewed ASD intervention studies from the literature, evaluating the methodological quality of the studies, eliminating studies that did not meet quality indicators, categorizing the studies into specific interventions, and appraising the quality of evidence in support of the interventions. Interventions with a minimum number of studies by different research teams that reported positive effects were deemed established (NAC, 2009, 2015) or evidence-based (Wong et al., 2015), whereas those without similar evidence were relegated to subordinate categories (e.g., emerging, partially supported, unsupported, and harmful).
Importantly, the large majority of studies in each of these empirically supported treatment reviews used SCRD methodology. Publication of these reviews mirrors a progression of efforts by the CEC (2014) to develop standards for appraisal of EBPs through establishment of quality indicators for group design and SCRD research (Cook, Tankersley, & Landrum, 2009; CEC, 2014; Odom et al., 2005).
One proposed criterion for qualifying an intervention as an EBP is that no studies indicate the intervention is ineffective (CEC, 2014; Odom, Dunst, & Horner, 2012, as cited in Ledford, Wolery, & Gast, 2014). For example, the CEC’s (2014) Standards for Evidence-Based Practices in Special Education stipulates that presence of a single group design or SCRD study with negative (i.e., non) effects prevents a practice from being categorized as an EBP. The merits of disqualifying a practice with a large and rigorous body of positive supporting studies based on non-effects of a single study are debatable, especially when the single study uses a small number of subjects, as is common in SCRD. Nonetheless, a valuable concept conveyed in the CEC Standards document, that of considering all studies with positive, mixed, and negative results, is consistent with a thorough and rigorous appraisal of the available evidence for a specific intervention. We believe this criterion highlights the necessity of publishing SCRD studies that show partial and no evidence of experimental control in identifying specific EBPs.
The File Drawer Effect and SCRD
Given the evolution of SCRD with an emphasis on experimental control as a necessary criterion for a high-quality study, researchers, reviewers, and journal editors may regard studies that fail to demonstrate robust experimental control as poor quality and therefore unpublishable. As Cooper et al. (2007) asserted, “An experiment is interesting and convincing, and yields the most useful information for application, when it provides an unambiguous demonstration that the independent variable was solely responsible for the observed behavior change” (p. 230). Reflecting this view on the necessity of experimental control as a criterion for high-quality SCRD studies, widely disseminated quality indicators have included experimental control, or replication of experimental effect, as a condition for quality. Horner et al.’s (2005) influential paper on use of SCRD research to identify EBP in special education indicates that three demonstrations of experimental effect is a definitive feature of SCRD research studies. Similarly, Kratochwill et al.’s (2013) single-case intervention research standards indicate that a study must include at least three attempts to demonstrate experimental control to meet their design standard, along with “at least three demonstrations of the intervention effect, each occurring at a different point in time, combined with no failures to observe an effect” (p. 32) to provide strong evidence of functional relations.
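The two thresholds attributed above to Kratochwill et al. (2013) can be written as a simple decision rule. The following Python sketch is a hypothetical encoding of only what the paragraph states. It does not model the requirement that each demonstration occur at a different point in time, and the function names are assumptions introduced for illustration.

```python
def meets_design_standard(attempts):
    """Design standard as described in the text: at least three
    attempts to demonstrate experimental control."""
    return attempts >= 3


def strong_evidence(demonstrations, failures):
    """Strong evidence as described in the text: at least three
    demonstrations of effect and no failures to observe an effect."""
    return demonstrations >= 3 and failures == 0


# Hypothetical study: three phase changes, all showing an effect.
print(meets_design_standard(3), strong_evidence(3, 0))
# Hypothetical study: three phase changes, one showing no effect.
print(meets_design_standard(3), strong_evidence(2, 1))
```

Under this rule the second hypothetical study meets the design standard yet cannot provide strong evidence, which is the kind of study at risk of going unpublished.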
Given the historical and current emphasis on experimental control as a necessary condition for quality SCRD studies, some preliminary evidence of the file drawer effect on SCRD has emerged. The file drawer effect occurs when studies that fail to show experimental effects are disproportionately unpublished (Rosenthal, 1979). Shadish, Zelinsky, Vevea, and Kratochwill (2016) presented SCRD research experts with a series of hypothetical datasets that demonstrated varying degrees of experimental effect. They found a majority of SCRD researchers were more likely to submit or recommend for publication datasets that showed large experimental effects; a minority of researchers were also willing to drop cases which demonstrated small treatment effects prior to submitting them for publication. More direct evidence of the file drawer effect was provided by Sham and Smith (2014). They compared effect sizes of pivotal response treatment (PRT) in published peer-reviewed journals to unpublished dissertation studies. PRT has been deemed an EBP for ASD in recent treatment reviews (NAC, 2009, 2015; Wong et al., 2015). Sham and Smith (2014) reported greater treatment effects in published outcome studies compared with unpublished outcome studies, suggesting the possibility of inflated intervention effects in published studies and the presence of publication bias resulting from the file drawer effect.
The impact of the file drawer effect may also manifest in meta-analyses of SCRD studies that intend to establish an aggregate effect size for an intervention. Meta-analyses of SCRD research have used quality indicators, including demonstration of experimental effect (Horner et al., 2005), as criteria for inclusion in datasets (e.g., Tincani & De Mers, 2016). Although meta-analyses may not represent direct evidence of the file drawer effect, the reliance on procedures that may exclude studies with sound methods and no clear demonstration of experimental control illustrates how aggregate effect size for an intervention may be artificially inflated. In turn, this may discourage future replications, inflate confidence about intervention effectiveness, and threaten the validity of a body of research (Travers, Cook, Therrien, & Coyne, 2016).
Beyond publication bias, there are critical and broad consequences of the file drawer effect on the progression of research. Academic publication has become associated with potent reinforcers including attainment of more prestigious academic appointments, extramural funding, promotion, tenure, and salary increases (Hilmer, Ransom, & Hilmer, 2015). Experiments investigating novel interventions or innovative modifications of existing interventions can yield interesting findings that lead to better treatment outcomes, but these pursuits may be compromised by decisions to avoid studies that have outcomes that are difficult to predict and less likely to produce robust demonstrations of experimental control (i.e., high-risk studies). Conversely, publication pressures may influence researchers to conduct more conventional replication studies more likely to produce large magnitude effects consistent with standards of acceptance (Shadish et al., 2016).
Consumers rely on a community of researchers who publish results of methodologically sound intervention research, both positive and negative, to draw sound conclusions about the efficacy of specific interventions. The product of the file drawer effect is a biased literature that artificially inflates perceived effectiveness of specific interventions because studies that fail to show treatment effects are disproportionately unpublished. The file drawer effect is particularly problematic in the EBP era, especially given the increase in the number of methodological reviews and meta-analyses used to gauge the evidence in support of interventions.
Preventing the File Drawer Effect in SCRD
We have discussed how experimental control evolved as a condition for high-quality SCRD studies, how researcher preference for experimental control may lead to non-publication of otherwise quality studies that fail to yield therapeutic treatment outcomes (i.e., the file drawer effect), and how this is problematic in the development and evaluation of EBP. In the following section, we offer tentative guidelines on ways researchers might conduct and appraise the methodological rigor of SCRD studies that fail to demonstrate complete experimental control. Our aim is to encourage researchers to reconsider the ways they conduct their own SCRD research and how they evaluate others’ SCRD research toward contributing to the larger body of EBP in special education.
Experimental Control as an Outcome and Not a Condition in High-Quality SCRD
From the earliest SCRD experiments, experimental control was regarded as a hallmark feature of quality SCRD (Tawney & Gast, 1984). The purpose of early applied SCRD experiments was to evaluate how the basic principles of behavior could be applied to improve socially significant behavior (Baer et al., 1968). As the technology of behavior change evolved, the focus of research shifted to evaluating whether specific intervention approaches, strategies, and packages, as derived from basic principles of behavior, could change socially significant behavior (Barlow & Hersen, 1984). The expanding body of SCRD research has led contemporary scholars to develop quality indicators for evaluating SCRD studies individually and collectively for the purpose of identifying specific EBPs (Horner et al., 2005; Kratochwill et al., 2013). We applaud these efforts as quality indicators are no doubt critical for evaluating the rigor of SCRD. However, we believe high-quality studies that fail to demonstrate experimental control should also be published. Specifically, studies that have the characteristics associated with strong internal validity may simply show that an intervention does not work given specific conditions and parameters (i.e., boundary conditions). Publication of these studies is critical for highlighting the boundary conditions under which interventions are more or less effective.
The Search for Boundary Conditions in SCRD
Establishing experimental control in SCRD is a multi-step process of applying tools of visual analysis to determine whether and to what degree an intervention caused changes in the target behavior(s) (Ledford & Gast, 2014). Given the complexity of environment-behavior relations, experimental control is not an all-or-nothing process in which an intervention is or is not effective for a particular population under most conditions (cf. NAC, 2015; Wong et al., 2015). Rather, variables including procedural fidelity (Warren, Fey, & Yoder, 2007) and intervention intensity (Codding & Lane, 2015), along with individual participant characteristics within a population, moderate experimental control and influence whether a specific intervention yields evidence that ranges from no evidence of experimental control to very strong evidence of experimental control. Identifying the boundary conditions of an intervention requires understanding the role of each of these elements on the DV(s), as well as how they bear on visual analysis of resulting data.
Recent treatment reviews have deemed specific intervention strategies as EBPs based on a sufficient body of high-quality supporting studies. An intervention deemed an EBP may be accompanied by a practice guide that includes stepwise instructions for applying the intervention (e.g., Sam & AFIRM Team, 2015). In these cases, the explicit series of steps for applying the intervention are derived from procedures used in the experimental studies that found the intervention to be effective. When new replication studies of an EBP that exhibit the characteristics of methodologically sound SCRD (Kratochwill et al., 2013) do not obtain similar positive results (i.e., do not attain experimental control), the finding may be explained by at least three factors: (a) procedural fidelity was insufficient for producing expected results, (b) intervention intensity was inconsistent with previous studies, or (c) the researcher has found a limitation of intervention efficacy given specific participant characteristics.
For example, Figure 3 depicts a series of hypothetical ABAB reversal designs for a specific intervention across three participants. The top dataset illustrates strong evidence of an experimental effect as judged by visual analysis of level, trend, and variability of data between baseline and intervention conditions. However, the second dataset illustrates weaker evidence of an experimental effect, and the third dataset illustrates no evidence of experimental effect. If procedural fidelity and intervention intensity across the three participants were appropriately measured and consistent with previous studies that reported positive results, then lack of a strong effect in the second and third datasets may constitute a limitation or boundary of the intervention. That is, if procedural fidelity and intervention intensity were reasonably consistent with previous studies, the lack of response may be attributed to the limited utility of the intervention for a particular learner. For example, intensity (i.e., dosage) of the intervention for Participants 2 and 3 may have been insufficient to yield the positive effects observed for Participant 1, or Participants 2 and 3 differed in some way from Participant 1 such that the intervention was less effective. This is an important discovery with practical and empirical significance. Professionals with students who have similar profiles to those who responded less favorably to the intervention may select a different intervention more likely to produce desired effects during the clinical decision-making process (Slocum et al., 2014). Researchers also increase their understanding about the boundary conditions of an intervention while uncovering new questions about why the intervention appears to be less effective or ineffective for some learners (and more effective for others).
Researchers in future studies may manipulate intervention intensity or apply modifications to an existing intervention based on characteristics of participants who do not respond as expected. Drawing such conclusions from SCRD studies may be difficult not only because evidence for an absent effect is inherently more difficult to detect but also because SCRD studies have historically depended on demonstrations of experimenter control to rule out threats to internal validity. Nevertheless, it is useful to articulate the design features necessary for drawing conclusions about the limitations of an intervention when studies generate partial or no evidence of experimental control.

Reversal designs depicting strong (top panel), weaker (middle panel), and no evidence of experimental control (bottom panel).
Procedural Fidelity in Studies That Do Not Demonstrate Experimental Control
Treatment integrity describes the degree to which an intervention is implemented as intended (Hagermoser Sanetti & Kratochwill, 2014). Although the importance of treatment integrity in the efficacy of behavioral interventions has been recognized for some time (Peterson, Homer, & Wonderlich, 1982), only recently have researchers begun to strongly focus on evaluation of treatment integrity within experimental intervention research. Treatment integrity is conceived as a multidimensional construct comprising variables such as interventionist competence, treatment differentiation, and accurate and consistent adherence to treatment procedures (Gresham, 2014). Whereas treatment integrity entails collecting data about adherence to treatment procedures, procedural fidelity entails collecting data about procedures used in all conditions of a study, including baseline (Ledford et al., 2014). Specifically, Ledford and Gast (2014) explained that procedural fidelity involves measures to evaluate (a) whether the IV was applied as intended in the treatment conditions (i.e., treatment integrity), (b) whether the IV was withheld as intended during baseline conditions (i.e., control condition integrity), and (c) whether there were no differences in procedural fidelity in baseline and intervention conditions that might explain changes in responding between the conditions (e.g., restricted or excessive opportunities to respond; differences in baseline and intervention agents). Presence of sound procedural fidelity increases the researcher’s confidence that lack of experimental control is a function of an intervention’s boundary conditions. In contrast, unplanned changes in procedural fidelity constitute threats to internal validity and diminish or prevent causal conclusions regardless of participants’ responding.
When a replication study of an EBP fails to demonstrate full experimental control, compelling evidence is required to show that procedural differences unrelated to the intervention (i.e., threats to internal validity) did not account for lack of responding. Alternatively, strong evidence of procedural fidelity in all conditions in the absence of experimental effect may permit the researcher to consider whether she or he has identified boundary conditions of the intervention. Confidence in conclusions about the discovery of boundary conditions depends on evidence that all procedures in previous studies were consistently followed in the failed replication. This requires researchers to report procedural fidelity data by step, participant, and condition in conjunction with DV data (i.e., a moderator analysis; Ledford & Gast, 2014).
A moderator analysis has typically referred to a statistical method used to investigate whether a relationship between two variables (i.e., IV and DV) depends on the presence or effect of a third variable (Baron & Kenny, 1986). For example, a correlational study investigating racial disparity in ASD diagnosis might include IQ as a moderating variable to determine whether IQ and race (rather than race alone) more reliably predict diagnosis (e.g., Mandell et al., 2009). The purpose of the moderator analysis in correlational research is to clarify relationships between unique and combined IVs on a DV.
Ledford and Gast (2014) proposed that the logic of a moderator analysis could be helpful to SCRD researchers interested in more clearly understanding the relation between IVs and DVs. A moderator analysis of an SCRD study would entail consideration of control condition integrity data and treatment integrity data in relation to the DV, and evidence that changes other than the presence and absence of the IV (i.e., confounds) do not explain observed differences in DV responding. Ledford and Gast explained that reporting procedural fidelity data by step, participant, and condition (rather than providing a single summary statistic) and plotting those data in a manner similar to Wood, Umbreit, Liaupsin, and Gresham (2007) would support moderator analyses and more reliable conclusions about relationships between variables of interest. For example, if a researcher reported procedural fidelity by step, participant, and condition, and plots of those data indicated acceptable levels alongside low levels of responding to the intervention, then this may suggest detection of the limited ability of the intervention to produce desired effects. This discovery may, in turn, help clarify for whom and under what conditions an intervention is effective. Conversely, if plotted data suggest low levels of treatment integrity during some, most, or all steps of the intervention, this would not highlight a boundary condition of the intervention, but rather a failure of the intervention as a function of poor treatment fidelity (Wood et al., 2007). Furthermore, close examination of the steps in which the intervention was more or less adhered to could provide valuable information for future investigators who wish to modify the intervention to improve treatment fidelity and participant responsiveness to the intervention.
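Reporting fidelity by step, participant, and condition, rather than as a single summary statistic, amounts to a simple grouping of observation records. The Python sketch below shows one way this tabulation might be done. The record layout, labels, and function name are hypothetical and are not taken from Ledford and Gast (2014) or Wood et al. (2007).

```python
from collections import defaultdict


def fidelity_by_cell(records):
    """Return the percentage of correctly implemented opportunities
    for each (participant, condition, step) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [correct, observed]
    for participant, condition, step, correct in records:
        cell = totals[(participant, condition, step)]
        cell[0] += int(correct)
        cell[1] += 1
    return {key: 100.0 * c / n for key, (c, n) in totals.items()}


# Hypothetical records: (participant, condition, step, implemented correctly?)
records = [
    ("P1", "baseline", "step 1", True),
    ("P1", "baseline", "step 1", True),
    ("P1", "intervention", "step 1", True),
    ("P1", "intervention", "step 2", True),
    ("P1", "intervention", "step 2", False),
    ("P2", "intervention", "step 1", True),
]

for cell, pct in sorted(fidelity_by_cell(records).items()):
    print(cell, f"{pct:.0f}%")
```

A single summary statistic over these records would hide the fact that all of the shortfall sits in one step for one participant, which is the information a moderator analysis needs.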
The importance of procedural fidelity in behavioral research has long been recognized (Billingsley, White, & Munson, 1980; Gast, 2014; Vollmer, Sloman, & Pipkin, 2008; Wolery, 1994). A precise presentation of procedural fidelity may help clarify sources of variability (Sidman, 1960), as well as the absence of an expected response upon introduction to an established intervention. To ensure the accuracy of procedural fidelity data when an effect is observed, researchers may report fidelity by step, participant, and condition for all sessions and include reliability checks of fidelity data for at least 33% of sessions (Ledford & Gast, 2014). When intervention effects are not observed, we propose that fidelity at all levels closely approximate or reach 100% adherence, with acceptable interrater reliability (e.g., ≥ 80% for at least 33% of sessions). Collecting procedural fidelity interrater reliability data for more sessions (e.g., 50%, 66%, or 100% of sessions) increases confidence that procedural fidelity was in fact high in the absence of an experimental effect. That is, procedural fidelity and reliability of fidelity should be sufficiently high to allay concerns that inconsistent intervention implementation (i.e., varied procedures between studies) and/or variation in control/comparison conditions (i.e., varied procedures within a study) explain the failed replication.
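As an illustration of reporting fidelity by step, participant, and condition rather than as a single summary statistic, the following is a minimal sketch in Python. The record layout, the function names, and the use of point-by-point agreement as the interrater index are assumptions made for this example; they are not procedures specified by the sources cited above, and the data are invented.

```python
from collections import defaultdict

def fidelity_by_group(records):
    """Percent of correctly implemented steps per (participant, condition, step).

    Each record is a hypothetical tuple:
    (participant, condition, step, implemented_correctly).
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for participant, condition, step, correct in records:
        tally = counts[(participant, condition, step)]
        tally[0] += int(correct)
        tally[1] += 1
    return {key: 100.0 * c / n for key, (c, n) in counts.items()}

def point_by_point_agreement(primary, secondary):
    """Percent of step-level scores on which two observers agree."""
    if len(primary) != len(secondary) or not primary:
        raise ValueError("observers must score the same non-empty set of steps")
    agreements = sum(a == b for a, b in zip(primary, secondary))
    return 100.0 * agreements / len(primary)

# Invented example data: one participant, two conditions, two steps.
records = [
    ("P1", "baseline", "step1", True),
    ("P1", "intervention", "step1", True),
    ("P1", "intervention", "step1", True),
    ("P1", "intervention", "step2", True),
    ("P1", "intervention", "step2", False),
]
summary = fidelity_by_group(records)

# Two observers scoring the same five steps (1 = implemented, 0 = not).
agreement = point_by_point_agreement([1, 1, 0, 1, 1], [1, 1, 0, 1, 0])
```

In this invented data set, a single summary statistic (4 of 5 steps, or 80%) would hide the fact that all of the error falls on one intervention step, which is the kind of detail a step-level report is meant to expose.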
Intervention Intensity in Studies That Do Not Demonstrate Experimental Control
Warren et al. (2007) suggested that intervention intensity is a critical, albeit neglected, feature of ascertaining intervention efficacy. They described how intervention dose (i.e., the number of teaching episodes provided within a single treatment session), dose frequency (i.e., the number of times a treatment session is provided per day and per week), and total intervention duration (i.e., the total time period over which the intervention is presented), as well as cumulative intervention intensity (i.e., Dose × Dose Frequency × Total Intervention Duration), are ways to measure the effects of treatment intensity. Their rationale is predicated on the premise that intervention efficacy must be measured not only according to whether a desired effect is obtained but also according to whether more (or less) of the intervention yields greater (or diminished) effects (Warren et al., 2007).
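The cumulative intensity product described above can be made concrete with a short sketch. The class name, the choice of weeks as the time unit, and all numeric values are assumptions for illustration only; Warren et al. (2007) define the terms, not this code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Intensity:
    dose: int               # teaching episodes per session
    sessions_per_week: int  # dose frequency (assumed unit: sessions per week)
    weeks: int              # total intervention duration (assumed unit: weeks)

    def cumulative(self) -> int:
        """Dose x Dose Frequency x Total Intervention Duration."""
        return self.dose * self.sessions_per_week * self.weeks

# Hypothetical comparison of an original study and a replication attempt.
original = Intensity(dose=20, sessions_per_week=5, weeks=10)
replication = Intensity(dose=10, sessions_per_week=3, weeks=10)

# Proportion of the original cumulative intensity delivered in the replication.
ratio = replication.cumulative() / original.cumulative()
```

Under these invented values the replication delivers 300 teaching episodes against 1,000 in the original, so a null result could reflect reduced intensity rather than a boundary of the intervention itself.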
Similarly, Codding and Lane (2015) explained how intervention intensity likely influences efficacy and adoption of interventions. They described how intervention intensity can be altered by varying the number of learning trials presented per intervention session (i.e., dose), session length, session frequency (e.g., daily, weekly, or monthly), length of intervention, and cumulative intervention intensity (i.e., Dose × Session Frequency × Total Length of Treatment). They contended that interventions dosed too weakly could lead students to become resistant to behavior change, reduce student motivation to engage with interventions, and facilitate resistance from intervention agents. Conversely, interventions dosed too strongly could incur unnecessary costs, lower treatment acceptability, and thereby reduce students’ future access to potentially effective interventions. For example, functional communication training (FCT; Carr & Durand, 1985) is considered an EBP for learners with ASD (Wong et al., 2015), but the effectiveness of FCT for a particular learner is influenced by factors other than interventionist adherence to FCT procedures. The number of opportunities to respond with a functionally equivalent replacement response (i.e., dose) influences rate of acquisition (i.e., how quickly the replacement behavior becomes a conditioned response; Carr et al., 1994; Fisher et al., 1993; Richman, Wacker, & Winborn, 2001). Similarly, inconsistent contingencies of reinforcement for the replacement behavior diminish rate of acquisition, fluency, and generalization (Kelley, Lerman, & Van Camp, 2002; Shirley, Iwata, Kahng, Mazaleski, & Lerman, 1997; Worsdell, Iwata, Hanley, Thompson, & Kahng, 2000). If only a few opportunities are provided over a long period of time, or if too many opportunities are provided during a very short period of time, then alternative responses may not be acquired in a manner consistent with desired treatment outcomes.
Similarly, if insufficiently reinforcing consequences are used (e.g., requests are reinforced only once every 15 attempts), then FCT will appear to be ineffective. Any of these changes could cause interventionists to resist or abandon FCT entirely. It could be reasonably argued that such deviations mean FCT is not actually being used. This is precisely the point; demarcation of intervention boundaries is of practical significance and defines an EBP. If a researcher (or professional) uses lower intervention intensity, the results may inaccurately suggest that the intervention does not work rather than clarifying that reduced intensity alters treatment effects (Yoder & Woynaroski, 2015).
The lack of intervention intensity reporting in SCRD studies suggests that researchers are not always collecting these data (Codding & Lane, 2015) and, consequently, not always considering intervention intensity when evaluating the literature to qualify an EBP. Limited reporting of intervention intensity also means that the effects of varying parameters of intervention intensity cannot be replicated, which further limits the identification of intervention boundaries. Thus, reporting intervention intensity seems critical for clarifying which interventions may qualify as EBPs, and is also necessary to conduct replication studies that explicitly examine varying parameters of intensity (Codding & Lane, 2015; Warren et al., 2007; Yoder & Woynaroski, 2015).
Studies that systematically examine intervention intensity, including SCRD experiments using multiple replication attempts comparing two or more parameters of intensity (e.g., ABABCACAC), will help clarify for whom and/or under what conditions an intervention is (in)effective. Variables such as dosage intensity (e.g., trials, session frequency, and session length), teacher-to-student ratio, intervention design (e.g., amount and quality of feedback, pace, and opportunities to respond), and interventionist level of expertise (e.g., amount of training, special certifications) may affect learner responding (Codding & Lane, 2015; Warren et al., 2007; Yoder & Woynaroski, 2015), and should thus be examined. When studies that report this information demonstrate absent or diminished responding contingent on changes in intervention intensity, our knowledge about the boundaries of intervention efficacy increases.
Importantly, no EBP is guaranteed to produce previously obtained results for every learner. Rather, an EBP is one that is more likely to be beneficial than an intervention that has not been subjected to experimental scrutiny (Cook, Cook, & Collins, 2016). An evidence-based special education depends on knowing what works, for whom, and under what conditions. If sound experimental studies that find low or no effects for an intervention are not readily available to research and professional communities, then students with disabilities may be subjected to interventions that result in limited or no benefit; student, teacher, and school resources will be lost. If (a) procedures for baseline and intervention conditions are similar to those in the studies used to qualify the intervention as an EBP, (b) procedural fidelity is sufficiently high, and (c) treatment intensity is similar to previous studies, then a failure to respond to the intervention may be the result of the limited utility of that intervention. SCRD experiments that demonstrate procedural fidelity and intervention intensity consistent with previous studies, but find limited or no benefit, are important to researchers because they clarify the potential boundaries of the EBP. Thus, the research community should not immediately consider SCRD experiments that fail to demonstrate full experimental control as fatally flawed, especially if intervention intensity is described with replicable precision along with procedural fidelity by step, participant, and condition. We therefore propose that study characteristics listed in Table 1 may support research claims that an intervention study was methodologically rigorous despite failing to demonstrate experimental control and has possibly detected an intervention boundary.
Table 1. Types of Evidence to Support Claims That a Failed Replication May Result From Intervention Boundaries.
Conclusion
For decades, SCRD has been widely used to evaluate the efficacy of educational and behavioral interventions in special education. In the era of EBP, it is more important than ever for the published body of SCRD research to accurately reflect the effects of special education interventions given the variety of individual student characteristics and complex natural environments and circumstances in which interventions occur. We have highlighted concerns about the file drawer effect in SCRD, and suggested strategies for researchers to prevent the file drawer effect in conducting and appraising SCRD studies based on detailed attention to procedural fidelity, examination of intervention dosage in relation to individual participant responding, and use of moderator analysis.
We modestly hope that researchers who conduct SCRD studies will shift their focus to searching for the boundary conditions of interventions, given the complexity of implementation variables. Similarly, we hope that researchers who appraise others’ work will shift their focus from evaluating whether a study has demonstrated robust evidence of experimental control to fuller consideration of boundary conditions of demonstrably effective interventions. We believe this shift in focus will ultimately lead to better outcomes for individuals in special education contexts.
Footnotes
Acknowledgements
The authors wish to thank Dr. Donald Hantula for his feedback on this paper.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
