Abstract
Many students with learning disabilities (LD) do not master basic reading skills, which affects later reading fluency and reading comprehension development. Single-case experimental design (SCED) research yields unique opportunities to better understand which aspects of a basic reading intervention are effective with a particular student, including the individual’s rate of growth, stability, or maintenance of acquired skills, and whether or not modifications need to be made to the intervention under study. In this article, we use a case study illustration to highlight unique considerations for using SCED research to investigate basic reading interventions for students with LD. Finally, we provide a discussion of future directions and a potential shift in SCED methodology that is responsive to the comprehensive and multiple skill nature of reading instruction.
The ability to fluently recognize words is requisite for understanding printed text, which is the ultimate purpose of reading (Berkeley & Ray, 2020). Basic reading skills are foundational to fluent reading, which is defined as reading quickly, accurately, and with expression (Kuhn et al., 2010). Fluency rate (i.e., reading speed) is often used to assess student reading outcomes (e.g., Wanzek et al., 2014), but the accuracy component (i.e., word recognition) of fluent reading is less often measured at a specific skill level. This is particularly the case within group-design research where numerous basic reading skills are taught simultaneously over a period of time. Furthermore, group experimental design research evaluates whether reading interventions change the average growth of groups of participating students, yet effectiveness for each participant can be better understood through the use of single-case experimental design (SCED). Specifically, SCEDs provide unique opportunities to explore the effectiveness of an intervention under particular conditions and for particular types of learners and adaptations (Leko, 2015; Leko et al., 2015).
SCED research yields unique opportunities to better understand which aspects of a basic reading intervention are effective with a particular student, including the individual’s rate of growth, stability, or maintenance of acquired skills, and whether or not modifications need to be made to the intervention being investigated. This is especially important for understanding the reading development of students with learning disabilities (LD) because a large body of research shows that even with systematic instruction in the primary grades, many students do not master basic reading skills. This, in turn, affects both reading fluency and reading comprehension (e.g., Brasseur-Hock et al., 2011; Cirino et al., 2013; Denton et al., 2008).
In addition, because of the inherent characteristics of SCED research, critical insights can be obtained for populations of students with uneven learning profiles, including students with LD. Among recent key findings, the science of learning indicates that opportunities for growth and change persist across the development continuum; variability among individuals is the norm; and each individual’s development is nonlinear due to neural plasticity and brain malleability in response to learning tasks and environment (Cantor et al., 2019). Specific to adolescent literacy, some struggling readers reach adolescence with gaps in several areas (e.g., phonics, decoding, vocabulary, fluency, and comprehension) and have profiles of mixed strengths and weaknesses within and across these different reading component skills (Hock et al., 2009). Analysis of SCED studies can inform researchers and practitioners about which aspect(s) of a word-level intervention causes change for individuals who tend to be variable in their skill development.
The purpose of this article is to present issues that researchers should consider when using SCEDs to evaluate basic reading skills. We begin with a brief overview of experimental research methodologies, including SCED designs commonly used in basic reading research. Next, we have purposefully selected a published SCED study of basic reading skills of students with LD (Regan et al., 2014) as a case study illustration of unique considerations when conducting this type of research. Finally, we provide a discussion of future directions for research and practice.
Experimental Research Methodologies
Experimental research methodologies fall into two categories—group designs and SCEDs. Group experimental designs compare the mean performance of one group of students to another group of students in a highly controlled context that allows the researcher to systematically control threats to internal validity so that causal claims can be made from statistical analysis of resulting data. Two of the most rigorous group experimental designs are randomized control trials (RCTs) and regression discontinuity designs (RDDs). An RCT reduces bias through the random assignment of participants to condition and is considered to be the “gold standard” in experimental research (What Works Clearinghouse [WWC], 2017). In contrast, an RDD compares the performance of participants in the intervention group (or groups) to a predetermined cut-value on a selected measure. Group experimental designs provide critical information about the probability of the effect of the intervention on students with similar characteristics as well as the magnitude of the expected effect. However, group designs provide limited insights about the conditions under which the intervention might be effective for students who score well below the average (i.e., nonresponders), who are often students with LD. As White (2016) aptly stated, as researchers we also need to look beyond averages and “chase the tail.”
When researchers seek to understand how specific interventions work with individual students, SCED studies may provide the information they seek (Horner et al., 2005, 2012; Kratochwill et al., 2010; Ledford & Gast, 2018; Shadish et al., 2015). Specifically, SCED methodology can supplement findings from RCT and RDD experimental designs by providing researchers with a deeper understanding of what works for students with LD who receive a reading intervention (Eckert et al., 2000; Kratochwill & Levin, 2015).
SCEDs have a long history of use in multiple fields such as medicine, psychology, and education. Much of the early work in psychology and education focused on interventions designed to promote positive behavior or eliminate negative behavior (e.g., Baer et al., 1968). Today, SCED studies are used to determine the effectiveness of a variety of educational interventions as well. This experimental methodology establishes a functional relation between independent and dependent variables through within-subject comparisons (i.e., intrasubject replication). That is, within SCED studies, a student’s performance during a baseline or probe phase (referred to as condition “A”) is compared with their performance during an intervention phase (referred to as condition “B”). Thus, the student acts as his or her own control. All of the data collected during both conditions are charted on line graphs and analyzed through visual, and in some cases statistical, analysis (Shadish et al., 2015). These comparisons apply both when a single student serves as a case and when a case is made up of a group of students.
In this article, we review SCEDs and their use in determining the efficacy of various programs, practices, and interventions. The WWC (2020) Standards Handbook provides a detailed description of the standards used by the WWC when reviewing studies using one of the following eligible designs: (a) RCTs, (b) quasi-experimental designs, (c) RDDs, and (d) single-case designs (SCD). In this article, we have chosen to call SCDs SCED as recommended by experts who contributed to the Single-Case Reporting Guideline in BEhavioural Interventions (SCRIBE) 2016 Statement (Tate et al., 2016). Although multiple SCEDs and design variations exist, we discuss only those recognized by the WWC as eligible for review. Acceptable SCEDs include reversal or withdrawal (i.e., ABA or ABAB), multiple-baseline and multiple-probe, and alternating treatment.
The strongest SCEDs allow intersubject and systematic replications (e.g., Horner et al., 2005; Ledford & Gast, 2018; Shadish et al., 2015). The design selected for an SCED study is closely aligned with the research questions researchers seek to answer and the nature of the intervention. Although withdrawal designs are frequently used in behavioral research, they are rarely used to evaluate academic interventions because student performance would not be expected to return to baseline levels due to the permanent effects of the intervention (Horner et al., 2005). In other words, learned skills will (hopefully) not be “unlearned” immediately upon the removal of instruction. Although there are specific situations where alternating treatments and changing criteria designs may be appropriate for answering specific educational research questions, these occurrences are rare in the academic intervention literature, especially in comparison to multiple-baseline and multiple-probe designs that occur frequently.
Multiple-baseline and multiple-probe designs allow researchers to evaluate interventions in which students are taught skills and strategies that are retained once the intervention ends. In multiple-baseline studies, researchers continuously monitor multiple cases (i.e., behaviors, conditions, or participants) at the same point in time before introducing the intervention. The intervention is then introduced at different times to evaluate experimental control. Multiple-probe designs are a variation of multiple-baseline designs in which intermittent rather than continuous data are collected on each case prior to introducing the intervention (Ledford & Gast, 2018). In both multiple-baseline and multiple-probe designs, an intervention is considered to have a positive effect when one case improves while other cases remain at or near initial preintervention levels. In contrast to multiple-baseline designs, multiple-probe designs require fewer preintervention days (or sessions), which makes them more practical in educational classroom research (Gast et al., 2018). The inherent characteristics of multiple-baseline and multiple-probe designs make them appropriate for investigating a wide range of academic skills, including basic reading skills (Barger-Anderson et al., 2004). Furthermore, they have been employed with a range of basic academic skills (e.g., science, math, vocabulary) and populations (e.g., LD, autism, intellectual disability) to add to the research base on effective interventions for students with disabilities (e.g., Knight et al., 2013).
Although the basic structure of SCED is straightforward to comprehend, methodological standards for rigor in SCED studies should be met regardless of the chosen design. For example, the influential report SCRIBE established a minimum set of items that should be addressed and reported within any SCED study with the goal to “improve the clarity, completeness, transparency, and accuracy of reporting single-case research in the behavioral sciences” (Tate et al., 2016, p. 8). Similarly, in the field of educational research, a study must meet minimum standards for review by the WWC. These design criteria include: (a) systematic manipulation of the independent variable, (b) systematic reliable measurement of the dependent variable over time by multiple assessors, (c) inclusion of a specified number of data points, (d) at least three attempts to demonstrate an effect at three different points in time, and (e) absence of confounding factors. These standards are widely used by researchers when designing SCED studies. In the field of special education, quality indicators for SCED studies with students with disabilities have been established (see Cook et al., 2015; Horner et al., 2012; Lane et al., 2007). In this article, we explore SCEDs for furthering the research base on effective interventions for improving word-level reading skills of students and illustrate the unique opportunities and challenges that arise when using SCEDs to this end.
Methodological Considerations for SCED Studies of Basic Reading
In their 2014 study, Regan and colleagues utilized an SCED to investigate the effects of computer-assisted instruction (CAI) for struggling elementary readers. Specifically, researchers focused on the impact of CAI on word-level reading skills of four elementary school students as part of Tier 2 reading instruction. Word-level reading was targeted because reviews of the research base on CAI for improving basic reading skills found empirical evidence to be limited and inconclusive (e.g., Hall et al., 2000), despite the potential to support students through individualized repetitive practice (Maccini et al., 2002). Outcomes from this study showed that all students were able to demonstrate mastery of targeted basic reading skills, although some students needed a “double dose” with additional direct instruction. As such, findings supported the use of CAI as supplemental support to basic reading instruction. The Regan et al. (2014) study serves here as an illustrative example of challenges that must be addressed by researchers of basic reading skills and/or researchers investigating academic outcomes for students who display uneven learning development.
The crux of SCED research is determining the functional relation between the intervention and change in a student’s behavior. In the case of academic skills, one would expect to see increased student performance in the target skill area as a result of the intervention. In this article’s illustrative example study (i.e., Regan et al., 2014), researchers faced challenges setting up the study in such a way that met current research standards (e.g., Horner et al., 2012). Specifically, researchers expected the targeted basic reading skills of students to improve after receiving intervention. The following research questions were addressed: Does Lexia SOS improve outcomes for four elementary school students with mild disabilities in (a) word reading, (b) mastery of word reading, (c) maintenance of word reading, and (d) generalization of word reading skills to novel words in isolation and/or within passages? (Regan et al., 2014, p. 108)
As in most applied SCED studies, outcomes included direct measures of target skills, as well as measures of maintenance and generalization of these skills. In addition, researchers added a research question that addressed “mastery of word reading” because of the importance of accuracy (i.e., 93%–97%) to later fluency development (e.g., Burns, 2002).
Research Design
To answer these research questions, Regan et al. implemented a multiple-probe across behaviors design (Gast et al., 2018) with staggered introduction of the Lexia SOS™ (Lexia Learning Systems, 2001) intervention across three targeted decoding skills. Within this design, participants needed to reach criterion (>90% accuracy) to demonstrate a socially valid effect. In cases where mastery was not met after initial CAI on a skill (condition “B”), CAI was introduced a second time (i.e., a double dose of the intervention) with supplemental direct instruction procedures (condition “BC”). The design allowed for effects to be evaluated at three different points in time, across three individual skills, and across four different students (Horner et al., 2005; Kratochwill et al., 2010). A functional relation was demonstrated when percentage accuracy improved only after Lexia was introduced.
A withdrawal design was ruled out as a design choice because once words were learned, they would not be unlearned with the removal of the intervention. To address concerns about students spending prolonged time in baseline (e.g., Barger-Anderson et al., 2004), researchers opted to implement a variation of the multiple-baseline design, the multiple-probe design. This enabled all students to begin instruction on Skill 1 immediately following an initial baseline probe phase rather than waiting for another student to achieve mastery. This design also allowed the intervention to be individualized to specific learning deficits of each participant. Because the participants were students with LD, it is not surprising that each had strengths and weaknesses in different skill areas. In addition, use of the multiple-probe design across skills reduced the number of probe sessions required of students prior to introducing Lexia.
Participants
In Regan et al. (2014), information was provided about each participant including age, grade, ethnicity, description of how the student qualified for special education (intelligence quotient [IQ] and discrepancy with achievement), time spent in special and general education settings, any related services, and the most recent norm-referenced literacy information from the student’s educational record. Student performance on researcher-administered norm-referenced assessments of decoding (real words, nonsense words; Test of Word Reading Efficiency) and fluency (Reading Fluency Benchmark Assessor), given immediately prior to the start of the CAI, was also reported. This type of detailed description of participants is consistent with recommended reporting practices for SCED research.
When conducting SCED studies, clearly understanding the characteristics of participants is critical if researchers are to understand what works with whom and under what conditions, as well as knowing if findings will generalize to other students with similar characteristics. For this reason, SCED studies characteristically contain elaborate descriptions of participants, and these descriptions are at the individual rather than aggregate level. In 1991, the Council for Learning Disabilities Research Committee undertook the task of clarifying LD participant descriptions in SCED research (Wolery & Ezell, 1993). The committee took on this task for the primary reason of supporting external validity in SCED studies and made several broad recommendations for participant descriptions. First, preintervention (baseline) conditions and variables that influence baseline performance should be described, including (a) method and criteria used to identify participating students as “learning disabled”; (b) Individualized Education Program (IEP) goals aligned with the intervention (e.g., reading goals); (c) achievement levels related to IEP goals; and (d) context in which the student experiences his or her education. Second, information should be provided about how participants were included or excluded from the study (e.g., general education classroom). Information in this area may include the severity of the LD, percentile, or standard scores on skill measures that meet criteria for inclusion or exclusion for a study, and other information that indicates for which individuals similar interventions might be targeted. Finally, the committee recommended that researchers provide “status variables,” such as participant age, gender, and ethnic/racial background.
Measures
In Regan et al. (2014), researchers used the teacher guide for the Lexia CAI program to create upward of 30 unique generalization probe words of comparable difficulty that comprised words directly taught within each targeted game in the software. This process ensured direct alignment to the target skill for each game (e.g., short and long vowels, two-syllable words).
One of the challenges associated with SCED studies is the development or selection of generalization probe items of targeted academic skills. Although RCT and RDD designs usually require pre/post measurement, SCEDs require multiple ongoing measures of similar, non-taught items to assess skill generalization. For example, some studies may have five or more preintervention word probes, 5 to 20 intervention word probes, and three to five maintenance word probes. Thus, there could be 30 or more measurement occasions. Because of time constraints when assessing so frequently, probe items need to be concise yet highly accurate in reflecting a student’s learning of the target skill. For these reasons, curriculum-based measures are often used in SCED research. Curriculum-based measures have been used extensively in monitoring student progress and are considered an efficient assessment method to monitor students’ progress with the acquisition of basic reading skills (Berkeley & Riccomini, 2017).
For researchers of basic reading skills, navigating the tension between the interconnected nature of skills during reading development and the need for assessment specificity within SCEDs is a distinct challenge. These challenges are especially pronounced when using multiple-baseline and multiple-probe designs across behaviors because there is an added challenge of selecting dependent variables that will not covary (Barger-Anderson et al., 2004). For example, when developing probe words representing a reading skill, researchers must ensure that each word directly assesses the taught skill (e.g., medial vowel sounds, two-syllable words, suffixes) and that all example probe words are of equal length and comparable difficulty. Despite the most careful procedures, researchers may not be able to account for the influence of sight word acquisition as part of the reading development process. As students are exposed to patterns of words, many of these words will become part of their sight word vocabularies through the course of the intervention or even through other exposures to print outside of the instructional context. Automaticity of word recognition is the overall goal of basic reading instruction; thus, attempting to limit sight word acquisition for the purpose of experimental control would be suspect at best in applied research. Similar challenges arise for research on reading comprehension where student background and vocabulary knowledge can support or thwart understanding of a specific text (Berkeley & Ray, 2020).
Procedures
Within the example study, Regan et al. (2014) employed strict protocols to establish experimental control. First, instructors received training prior to the study including shadowing the lead researcher, reviewing steps within an instructional checklist prior to each session, and receiving retraining or clarification as needed throughout the study. In addition, 20% of lessons in each condition for each of the four students were observed by an independent external evaluator who reported 99% to 100% interobserver reliability.
Internal validity is assured when experimental control is demonstrated (Gast & Ledford, 2018). The presence of internal validity permits confidence in the functional relation that exists between the independent and dependent variables (i.e., changes in student performance are the result of the intervention rather than other confounding variables). In SCED research, strict adherence to established research procedures is extremely important because deviations pose significant threats to a study’s internal validity and have the potential to compromise the accurate interpretation of findings. Reliability procedures include both maintaining an adequate level of interobserver agreement (in academic research, this includes scoring of probe items) and adhering to study procedures (procedural fidelity).
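The two reliability checks named above are usually reported as simple percentages. The following is a minimal sketch in Python, assuming item-by-item agreement between two scorers and a yes/no procedural checklist; the function names and the example data are hypothetical and are not taken from Regan et al. (2014).

```python
def percent_agreement(scorer_a, scorer_b):
    """Item-by-item interobserver agreement, as a percentage.

    Each argument is a list of item scores (e.g., 1 = word read correctly,
    0 = incorrect) recorded independently by two scorers for the same probe.
    """
    if len(scorer_a) != len(scorer_b):
        raise ValueError("Both scorers must rate the same number of items.")
    if not scorer_a:
        raise ValueError("At least one scored item is required.")
    agreements = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return 100.0 * agreements / len(scorer_a)


def procedural_fidelity(checklist):
    """Percentage of checklist steps observed as completed.

    `checklist` maps a step description to True (completed) or False.
    """
    if not checklist:
        raise ValueError("The checklist must contain at least one step.")
    return 100.0 * sum(checklist.values()) / len(checklist)


# Hypothetical 10-word probe scored by two observers (illustrative only).
observer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
observer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(percent_agreement(observer_1, observer_2))
```

Other agreement indices exist (for example, chance-corrected ones); simple percentage agreement is shown only because it is the easiest to interpret.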
Despite involving only a few participants, conducting SCED studies requires a substantial time commitment, intensive and time-sensitive assessment procedures, and meticulous communication among researchers and instructors. For example, performing reliability checks requires the researcher to be in the classroom using procedural and intervention checklists often during each condition for each student. In addition, assessment procedures must ensure timely communication patterns through administering, scoring, and interpreting results. First, the instructor, researcher, or technology system gathers student performance data on reading component skill(s). Then, the researcher(s) scores the assessment, using procedures to document interobserver agreement. During the preintervention condition, the researcher evaluates student performance to determine the appropriate time for condition change (i.e., after stability among data points is established) and communicates the next steps to the instructor. Likewise, during the intervention condition, the researcher communicates student performance trend and level to the instructor to determine if any adaptations to the original intervention are needed to improve student learning.
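The decision about when preintervention data are stable enough for a condition change can be made explicit with a stability rule. The sketch below assumes one commonly used rule of thumb, a "stability envelope" in which a set share of data points must fall within a band around the phase median. The band width (25%), the required share (80%), and the three-point minimum are assumptions chosen for illustration; they are not values reported in the example study.

```python
from statistics import median


def is_stable(values, band=0.25, required_share=0.80, min_points=3):
    """Return True if the phase data fall within a stability envelope.

    The envelope is the phase median plus or minus `band` times the median.
    The phase is judged stable when at least `required_share` of the points
    lie inside the envelope. All three parameters are assumed defaults.
    """
    if len(values) < min_points:
        return False  # too few points to judge stability
    mid = median(values)
    half_width = band * mid
    inside = sum(1 for v in values if abs(v - mid) <= half_width)
    return inside / len(values) >= required_share


# Hypothetical baseline probe accuracies (percent correct).
print(is_stable([40, 45, 42, 44, 41]))   # tightly clustered scores
print(is_stable([10, 60, 20, 70, 30]))   # highly variable scores
```

A rule like this supports, but does not replace, the researcher's visual judgment, particularly for students whose baseline performance is characteristically variable.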
Social Validity
Social validity, a secondary measure in most applied SCED studies, was addressed in Regan et al. (2014) by soliciting student feedback about games played using interview questions to determine the extent to which students liked or found features of the Lexia CAI program to be helpful. This type of participant feedback is consistent with recommendations for applied SCED research. Since researchers implemented the intervention, information from teachers (e.g., feasibility) was not obtained.
Social validity is one indicator of the extent to which an intervention has practicality and is viewed as being useful and helpful by participants in research studies (Leko, 2015). Although social validity is a valuable addition to any applied research, it is especially common within SCEDs. Furthermore, consideration of whether an intervention is practical and/or cost-effective and whether student outcomes are socially meaningful are indicators of high-quality applied research (e.g., Cook et al., 2015; Horner et al., 2005).
Displaying and Interpreting Results
Within SCED research, the heart of establishing evidence of an effect is the comparison of a student’s behavior after receiving intervention to the student’s behavior before the intervention (e.g., “B” performance compared with “A” performance). Traditionally, visual analysis approaches have been used, in some cases with data overlap metrics to quantify and evaluate evidence of a functional relation (Barton et al., 2018). More recently, statistical analyses have begun to be explored that enable SCED data to be aggregated across studies. Each of these approaches will be discussed next using excerpts from Regan et al. (2014) to discuss implications for basic reading research with students who display uneven skill development.
Visual Analysis
Criteria for visual analysis include an evaluation of changes in six domains: (a) level, (b) trend, (c) variability, (d) immediacy of effects, (e) overlap, and (f) consistency of data patterns across similar conditions. These characteristics can be examined between each condition for each participant or skill depending on the design. If improvements in reading outcomes occur when, and only when, an intervention is initiated, and those changes are replicated across behaviors/skills/participants, one would conclude that the intervention is responsible for the observed change.
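Several of these domains can be summarized numerically to accompany, not replace, inspection of the graph. The sketch below is one possible operationalization under stated assumptions: level as the phase mean, trend as a least-squares slope across sessions, variability as the standard deviation, and immediacy as the difference between the mean of the first three intervention points and the mean of the last three baseline points. The function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def ols_slope(values):
    """Least-squares slope of values against session index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def summarize_phase(values):
    """Level, trend, and variability for a single phase."""
    return {
        "level": mean(values),
        "trend": ols_slope(values),
        "variability": pstdev(values),
    }


def compare_phases(baseline, intervention, k=3):
    """Change in level and trend from A to B, plus immediacy of effect."""
    a = summarize_phase(baseline)
    b = summarize_phase(intervention)
    return {
        "level_change": b["level"] - a["level"],
        "trend_change": b["trend"] - a["trend"],
        # last k baseline points versus first k intervention points
        "immediacy": mean(intervention[:k]) - mean(baseline[-k:]),
    }


# Hypothetical percent-correct data for one skill (illustrative only).
result = compare_phases([40, 50, 40, 50], [60, 70, 80, 90])
print(result)
```

Overlap and consistency across similar conditions are not covered here; they require comparing data points across phases and across cases, respectively.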
These visual analysis criteria were used by Regan et al. (2014) to identify a clear effect for 8 out of 12 opportunities to replicate an effect from Lexia (3 Reading Skills × 4 Participants). Although some students reached mastery with CAI alone, others needed a double dose of the intervention with supplemental instruction. Researchers probed during all conditions (probe, instruction, postinstruction, maintenance, and generalizations) and analyzed data accordingly. Had researchers decided not to collect data during the instructional condition (as is common for interventions that take time for skill acquisition to occur), evaluation using traditional visual analysis techniques would have been more conclusive, but important information related to skill acquisition would have been lost. This is illustrated in data displayed in Figure 1 (left panel) from the example study.

Figure 1. Comparison of data displayed as baseline versus intervention and baseline versus postintervention condition.
Figure 1 displays baseline probe and intervention data (both during and after intervention) for “Doug.” The following visual analysis findings were reported: For Skill 1, an immediate effect was not demonstrated but the slight downward trend during baseline was reversed with minimal latency on introduction of Lexia instruction. For Skill 2, although Doug exhibited variability during baseline and instruction, data exhibited a clear upward trend after Lexia was introduced. For Skill 3, Doug displayed variability in baseline with a slight level change when Lexia was introduced; however, variability in performance continued. Doug displayed mastery (>90% for 3 days) after Lexia alone for Skills 1 and 2, and after Lexia Modified for Skill 3. However, Doug needed twice as many sessions to do so for Skill 3. (Regan et al., 2014, pp. 113–114)
Notice in Figure 1 (right panel) how the data display for Doug differs when only postintervention data are graphed. The description of study findings for the same data would have more clearly demonstrated an effect for each skill. The description of this data presentation might have been: “Across all three target skills there was a clear and immediate level change that met the required mastery level (>90% over 3 consecutive days) following intervention.” Note that making the a priori decision to collect data only after intervention was completed and students had time to master the requisite skill would have resulted in effects that could be more clearly evaluated; however, information about the intensity of instruction and additional needed instructional supports would not have been collected, thus limiting the value of the outcomes.
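The mastery criterion quoted above (>90% over 3 consecutive days) also shows why data collected during instruction are informative: they reveal how many sessions a student needed to reach criterion. The following is a minimal sketch of that rule; the function name and the session data are hypothetical and do not reproduce Doug's scores.

```python
def first_mastery_session(scores, threshold=90.0, run_length=3):
    """Return the 1-based session on which mastery is first met, else None.

    Mastery is defined here as `run_length` consecutive sessions with
    accuracy strictly greater than `threshold` percent.
    """
    streak = 0
    for session, score in enumerate(scores, start=1):
        # extend the streak on a qualifying score, otherwise reset it
        streak = streak + 1 if score > threshold else 0
        if streak >= run_length:
            return session
    return None


# Hypothetical percent-correct scores across instructional sessions.
sessions = [70, 92, 95, 88, 93, 96, 100]
print(first_mastery_session(sessions))
```

Comparing the session count returned for each skill is one way to quantify differences in instructional intensity that a postintervention-only data display would hide.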
The WWC has recently made changes to address the competing demands of evaluating outcomes versus gaining information from the instructional process by intentionally separating the Intervention Phase into an Instructional or Training Phase and a Post-Intervention Phase (WWC, 2017). Although this may seem like a subtle distinction, it has important implications for maximizing the value of SCED research. Inclusion of an instructional phase is primarily used when researchers have reason to believe that the intervention is complex and comprehensive and will take more than two or three sessions to learn (which applies to many academic interventions, including reading). Specifically, effects of the intervention and individual probe data may be reported during instruction and after instruction has been completed, thus allowing researchers to use postintervention data to determine the effectiveness of the intervention without sacrificing knowledge that might be gained during the intervention itself. Figure 2 displays data from another student (“Gus”) in a way that reflects these WWC guideline changes.

Figure 2. Baseline versus intervention conditions (with instruction and postintervention delineated).
WWC guideline changes do not address all unique methodological considerations that reading researchers are likely to confront due to the nature of basic reading development and students with uneven skill development. For example, notice the variability of Gus’ data in the probe phase (see Figure 2). Such variability is common for students with uneven learning profiles and is a reason that reaching consistent automaticity in basic reading is critical to their overall success with both accessing text and comprehending meaning. However, this data instability pattern interferes with many data overlap metrics. Although researchers might take steps to adjust for such data instability (e.g., a Tau-U correction), this sort of data variability is likely to be looked at as a study flaw by those expecting stable data.
Another methodological consideration when conducting and evaluating basic reading interventions relates to the nature of reading development. Notice Probe 3 data variability for “Gus” on Skill 3 (see Figure 2). For each skill, student performance improves after introduction of the intervention demonstrating automaticity at a mastery level both immediately following intervention and after a delay. The student does not, however, reach the desired skill automaticity at the established target mastery level during Probe 3, though there is an improving trend. As noted earlier, Regan et al. (2014) took steps to ensure that each word directly assessed skills taught (e.g., medial vowel sounds, two-syllable words, suffixes) and that all probe words were of equal length and comparable difficulty. As students are exposed to word patterns, many of these words will become part of their sight word vocabularies through the course of the intervention or through exposures to print outside of the instructional context.
Data Overlap Metrics
In addition to visual analysis, SCED researchers sometimes use standard overlap-based metrics, such as percentage of nonoverlapping data (Scruggs & Mastropieri, 1998) and percentage of data points exceeding the median (Ma, 2006), that assist in establishing evidence of an effect. Because nonoverlap techniques are based on a comparison of individual data points, a large data set is not needed. With the exception of Tau-U, most overlap-based metrics do not have a mechanism to correct for a baseline trend (Parker, Vannest, & Davis, 2011), and as such, there is growing consensus that Tau-U is an effective data-overlap metric for SCED research (Parker, Vannest, Davis, & Sauber, 2011). Tau-U has several advantages that make it useful for analysis of SCED research: (a) it is more powerful than parametric techniques for most single-case study data that do not conform to parametric assumptions; (b) it follows the “S” sampling distribution, so p values and confidence intervals are available; (c) it can control for preintervention trends when present; (d) its power is similar to that of linear regression techniques when data meet parametric assumptions; and (e) data for several phase contrasts can be analyzed independently and then combined to examine an overall or omnibus effect size (Parker, Vannest, Davis, & Sauber, 2011; Parker et al., 2014; Vannest et al., 2016).
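The arithmetic behind these metrics can be sketched in a few lines of Python. The sketch below is illustrative only: the probe scores are invented for the example (they are not data from Regan et al., 2014), higher scores are assumed to indicate improvement, ties are simply not counted as exceeding, and the Tau-U function implements one commonly described variant (baseline-versus-treatment nonoverlap with baseline trend subtracted, divided by the number of cross-phase pairs). Published calculators may handle ties and scaling differently.

```python
from statistics import median


def pnd(baseline, treatment):
    """Percentage of treatment points above the highest baseline point."""
    ceiling = max(baseline)
    return 100.0 * sum(x > ceiling for x in treatment) / len(treatment)


def pem(baseline, treatment):
    """Percentage of treatment points above the baseline median."""
    mid = median(baseline)
    return 100.0 * sum(x > mid for x in treatment) / len(treatment)


def s_between(baseline, treatment):
    """Cross-phase S: improving pairs minus deteriorating pairs."""
    return sum((b > a) - (b < a) for a in baseline for b in treatment)


def s_within(phase):
    """Within-phase S: a Kendall-style count of monotonic trend."""
    n = len(phase)
    return sum(
        (phase[j] > phase[i]) - (phase[j] < phase[i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def tau_nonoverlap(baseline, treatment):
    """Nonoverlap between phases with no trend correction."""
    return s_between(baseline, treatment) / (len(baseline) * len(treatment))


def tau_u(baseline, treatment):
    """Nonoverlap between phases with baseline trend subtracted (assumed variant)."""
    pairs = len(baseline) * len(treatment)
    return (s_between(baseline, treatment) - s_within(baseline)) / pairs


# Hypothetical probe scores (words read correctly), invented for illustration.
baseline_scores = [2, 3, 2, 4]
treatment_scores = [5, 6, 4, 7, 8]

print("PND:", pnd(baseline_scores, treatment_scores))
print("PEM:", pem(baseline_scores, treatment_scores))
print("Tau (no correction):", tau_nonoverlap(baseline_scores, treatment_scores))
print("Tau-U (trend corrected):", tau_u(baseline_scores, treatment_scores))
```

Because the invented baseline drifts upward, the trend-corrected value is lower than the uncorrected one, which is the adjustment that distinguishes Tau-U from the other overlap metrics.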
Table 1 shows how data might have been reported for “Doug” in the Regan et al. (2014) study using nonoverlap metrics. The values vary with the chosen metric and with decisions about which data to collect and report.
Table 1. Data Overlap Metrics for “Doug.”
Note. PND = percentage of nonoverlapping data; PEM = points exceeding the median.
Median metric.
Although overlap metrics can be useful in understanding the functional relation of single-subject data, Moeyaert and colleagues (2018) caution against referring to overlap metrics as “effect sizes.” Overlap metrics are not comparable to group effect sizes and these metrics cannot be combined with effect sizes from group studies within meta-analysis.
Meta-Analysis
Some researchers have placed value on the use of meta-analysis in educational research as a means to empirically evaluate findings from a body of research (e.g., Cooper et al., 2019). Meta-analysis allows researchers to combine knowledge from multiple studies on a specific topic, which supports increased confidence in outcome findings (Horner et al., 2012). To include SCED studies in meta-analyses, an effect size statistic is required; however, there is no agreement about the best way to accomplish this (Shadish et al., 2015). Effect size formulas used in group experimental research (e.g., Cohen’s d, Hedges’ g) allow for aggregation of findings across studies; however, these formulas have historically been considered inappropriate for use with SCED research because (a) they do not accommodate trend, and (b) power is substantially affected by small sample size, potentially resulting in effect sizes that are so overinflated as to be meaningless (Horner et al., 2012; Shadish et al., 2015). Advances in research methodology are starting to challenge this assumption as it relates to between-case effect sizes. When appropriate between-case effect statistics are used, researchers can compare effect sizes from SCED and group experiments (Shadish et al., 2015). There are also emerging approaches (regression, hierarchical linear modeling [HLM]) that may be more appropriate for meta-analysis than calculating mean-difference effect sizes, but these approaches are not yet widely used (Moeyaert et al., 2018).
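To make the concern about group formulas concrete, the following sketch applies the textbook Cohen’s d (pooled standard deviation) and the usual small-sample correction for Hedges’ g to the two phases of a single hypothetical case. The scores are invented for illustration, and treating within-case phases as if they were independent groups is exactly the naive use the literature cautions against; this is not the between-case effect size described by Shadish et al. (2015).

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / sqrt(pooled_var)


def hedges_g(group_a, group_b):
    """Cohen's d multiplied by the approximate small-sample correction."""
    df = len(group_a) + len(group_b) - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(group_a, group_b) * correction


# Hypothetical probe scores for one student, invented for illustration.
baseline_scores = [2, 3, 2, 4]
treatment_scores = [5, 6, 4, 7, 8]

d_value = cohens_d(baseline_scores, treatment_scores)
g_value = hedges_g(baseline_scores, treatment_scores)
print("Cohen's d:", round(d_value, 2))
print("Hedges' g:", round(g_value, 2))
```

With only nine data points, both values exceed 2, far larger than effects typically reported in group studies, and neither formula takes the rising baseline into account.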
Discussion
In this article, we highlighted the unique characteristics of SCED research that can enhance our understanding of the effects of a basic reading intervention beyond outcomes for average-performing students. As illustrated in this article, linear graphic representation of student SCED data, visual analysis of these data, and statistical analysis methods will vary based on the SCED employed. Regardless of the design selected, SCED studies yield critical information about the effects of an intervention for a range of learners. This helps reading researchers identify which skills in a learner’s profile change, which components of an intervention trigger that change, and the conditions under which it does so. In particular, SCEDs allow researchers to “chase the tail” (White, 2016) by unpacking the learning experiences of those outside the margins of average performance. Understanding the impact of an intervention for an individual is something that group designs are inherently unable to do. Although not all SCED options are appropriate for academic research, including basic reading research (e.g., withdrawal designs), thoughtful application of multiple-probe and multiple-baseline designs has the potential to yield meaningful information that adds to the research base on the effectiveness of an intervention. Furthermore, the contribution of information from SCED studies is enhanced by the detailed description of individual participants and social validity measures that are characteristic of the methodology.
In this article, we have also illustrated why SCED research is not for the faint of heart. Although SCED studies do not include large numbers of participants, the methodology is labor-intensive and requires that procedures be strictly followed. Although we believe that information gained using such designs outweighs the commitment of needed resources, we have illustrated that careful consideration is needed by researchers who use these designs to evaluate basic reading interventions for students with LD. These considerations are also important for journal and funding reviewers who evaluate the merits of the research and research proposals (see also Tate et al., 2016).
Future Directions
We have highlighted three areas that need careful consideration when conducting SCED research on the effectiveness of basic reading interventions for students with LD: covariation of dependent variables, variability of preintervention baseline/probe data, and immediacy of change. Some issues around probable, and potentially unavoidable, covariation of basic reading skills were presented. Specifically, we highlighted the role that exposure/repetition of basic reading skills may have in drawing the learner’s attention to the targeted discrete skill (e.g., individual phonemes in words) resulting in memorization and incorporation of this new knowledge into the learner’s background knowledge. This learning is difficult to control, and more importantly, it would be unethical to attempt to do so.
Another issue we illustrated through our example study is the likelihood of variability of preintervention probe data when investigating the academic performance of students with LD. Tau-U has emerged as an approach to statistically correct for such variability. However, we would argue that there is a need for a broader understanding that, in the area of basic reading, reducing data variability is often an instructional goal in itself. Although SCEDs tend to provide this type of empirical data, reviewers not well versed in SCED behavioral research might view such preintervention data variability as a fatal study flaw preventing important findings from being disseminated.
A final issue that we highlighted in this article is the challenge of establishing immediacy of change upon the introduction of an intervention. A potential solution to this issue is addressed in the WWC (2020) recommendations described in this article. We are hopeful that the practice of collecting data in both intervention and postintervention phases and using postintervention data to evaluate a functional relation of the intervention will become standard best practice for academic research. Whether intervention or postintervention data are used has implications for use of overlap metrics and consistency in determining evidence of an effect both within and across studies (e.g., meta-analysis). Furthermore, collection and reporting of data during intervention is important for retaining information about the intervention itself and adjustments needed for individual learners.
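The dependence of overlap metrics on which phase is analyzed can be illustrated with a small sketch. The scores below are invented for the example, the phase labels follow the instruction and postintervention distinction discussed in this article, and percentage of nonoverlapping data is computed in its simplest form (higher scores assumed to be better, ties not counted).

```python
def pnd(baseline, comparison):
    """Percentage of comparison points above the highest baseline point."""
    ceiling = max(baseline)
    return 100.0 * sum(x > ceiling for x in comparison) / len(comparison)


# Hypothetical probe scores for one student, invented for illustration.
baseline_scores = [2, 3, 2, 4]
instruction_scores = [3, 4, 5, 6]       # collected while the skill is still being taught
postintervention_scores = [7, 8, 7, 9]  # collected after instruction has ended

pnd_instruction = pnd(baseline_scores, instruction_scores)
pnd_post = pnd(baseline_scores, postintervention_scores)
pnd_combined = pnd(baseline_scores, instruction_scores + postintervention_scores)

print("PND, instruction phase only:", pnd_instruction)
print("PND, postintervention phase only:", pnd_post)
print("PND, both phases pooled:", pnd_combined)
```

In this invented case the same student yields three different values depending on the phase analyzed, which is why consistent reporting of the phase used matters when results are compared across studies.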
Implications for Research and Practice
Extensive rationales for the use of SCED with students, including those with LD or at risk of LD, have been provided by multiple researchers (Horner et al., 2005, 2012; Kratochwill et al., 2010; Ledford & Gast, 2018; Shadish et al., 2015; WWC, 2020). These researchers recommend the use of SCEDs when studying academic interventions to identify what works with individual students across a variety of academic skill areas. We support this recommendation. We also believe that further examination of methodological considerations for basic reading research would be a worthwhile endeavor, especially for students with LD or at risk of LD.
Footnotes
Acknowledgements
We would like to express our sincere appreciation to David Gast for his feedback on drafts of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
