Abstract
This synthesis extends a report of research on extensive interventions in kindergarten through third grade (Wanzek & Vaughn, 2007) to students in Grades 4 through 12, recognizing that many of the same questions about the effectiveness of reading interventions with younger students are important to address with older students, including (a) how effective are extensive interventions in improving reading outcomes for older students with reading difficulties or disabilities and (b) what features of extensive interventions (e.g., group size, duration, grade level) are associated with improved outcomes. Nineteen studies were synthesized. Ten studies met criteria for a meta-analysis, reporting on 22 distinct treatment/comparison differences. Mean effect sizes ranged from 0.10 to 0.16 for comprehension, word reading, word reading fluency, reading fluency, and spelling outcomes. No significant differences in student outcomes were noted among studies related to instructional group size, relative number of hours of intervention, or grade level of intervention.
Schoolwide models for literacy instruction and intervention, such as response to intervention (RTI) models, are designed to address the large numbers of students struggling with reading in our schools by ensuring effective, evidence-based general education instruction; early identification of students struggling with reading; high-quality intervention; and progress monitoring for informed decision making (Fletcher, Lyon, Fuchs, & Barnes, 2007; Jimerson, Burns, & VanDerHeyden, 2007). A key component of RTI models is the provision of increasingly more intensive interventions (e.g., use of smaller instructional groups, more time spent in intervention, more individualized intervention) for students who demonstrate insufficient response to instruction (Bradley, Danielson, & Doolittle, 2005). In 2007, Wanzek and Vaughn published a review of research-based implications from extensive reading interventions (i.e., provided for 100 or more sessions), reporting the relative effects of these interventions for students with reading difficulties or disabilities in kindergarten through third grade. That research synthesis addressed some of the fundamental questions related to effective implementation of RTI models in the early elementary grades for students with reading difficulties and/or disabilities. Specifically, the Wanzek and Vaughn synthesis examined (a) outcomes for students after participating in extensive early reading interventions and (b) features of interventions associated with high effect sizes, including instructional group size, duration, and whether the intervention was standardized across students or individualized to meet individual student goals.
For students with reading difficulties, understanding outcomes and the associated features of instruction contributing to those outcomes can inform instructional practice and assist educators in making decisions regarding students’ response to intervention. The Wanzek and Vaughn (2007) early elementary synthesis indicated that the vast majority of studies yielded positive reading outcomes, particularly when students were instructed in the smallest group sizes (e.g., one on one, small groups) and when interventions targeted kindergarten or first grade. The extant research provided little data to inform questions about whether standardized or individualized interventions were more effective, as the identified studies did not describe any individualized interventions. In the literature, highly standardized interventions and those with less standardization (i.e., more responsive to individual student needs) were not associated with differential impact.
The purpose of the synthesis provided in this article is to extend the examination of extensive interventions conducted by Wanzek and Vaughn (2007) beyond third grade to students in Grades 4 through 12, recognizing that many of the same questions about the effectiveness of reading interventions with younger students, including ways to intensify intervention for students with reading difficulties, are important to address with upper-elementary and secondary students. Also, although the evidence base for RTI models was derived primarily from studies conducted in kindergarten to third grade, increasingly, states, districts, and schools are implementing RTI approaches in Grades 4 through 12 (Vaughn & Fletcher, 2012), and research-based guidance for effective, intensive intervention implementation is required. Therefore, we selected Grades 4 through 12 as the target of this synthesis for two reasons: (a) The previous synthesis of extensive interventions was conducted for students in kindergarten through third grade and thus did not include data for the upper-elementary and secondary grades, and (b) although RTI was initially based on research in early elementary grades, it has increasingly been recommended for use by state departments of education and school districts for students after Grade 3 and through the secondary grades.
Reading Difficulties With Students in Grades 4 to 12
After third grade, the emphasis on instruction in learning to read often begins to fade from instruction in the general education classroom, meaning students who do not read proficiently by the end of Grade 3 may face serious consequences in their academic achievement. Chall and Jacobs (1983) noted that many low-income third graders reading at grade level experience a drop in reading scores by fourth grade. The researchers referred to this phenomenon as the fourth-grade slump, indicating that these students fail to thrive and can no longer meet grade-level expectations. More recent research has documented that reading difficulties can either persist in upper-elementary and later grades or, in some cases, have their initial onset in Grade 4 or beyond (Compton, Fuchs, Fuchs, Elleman, & Gilbert, 2008; Leach, Scarborough, & Rescorla, 2003; Vaughn, Cirino, et al., 2010); thus, school personnel are increasingly interested in the efficacy of reading interventions as a means of remediating these students’ reading problems. Findings from intervention studies with students in fourth grade and higher can provide educational decision makers with the knowledge of research-based practices associated with improved reading outcomes for students in these grades, including data on the features of instruction associated with the improved outcomes.
As would be expected, students in Grades 4 to 12 manifest a wide range of reading difficulties, including students who demonstrate reading achievement just below grade-level expectations; these students often require direct support for vocabulary and comprehension but are generally able to read and learn from text (Torgesen et al., 2007). Other students in Grades 4 and beyond demonstrate reading achievement more than two grades below expectations and are unable to read grade-level text, thus demonstrating more significant word reading and fluency problems as well as vocabulary and comprehension difficulties (Cirino et al., 2012). Studies conducted with students in Grades 4 and older have included students identified with reading difficulties based on reading below grade level (e.g., low word reading and/or low reading comprehension) and students identified with more significant reading disabilities, including students with and without identified learning disabilities (LD).
Interventions for Students With Reading Difficulties in Grades 4 to 12
Several syntheses related to reading interventions provide an understanding of the effects of instructional reading practices on the reading outcomes of students with reading difficulties in Grades 4 through 12; however, none of these syntheses address extensive interventions. In 2008, the Institute of Education Sciences (IES) issued a research-based practice guide on adolescent literacy (Grades 4–12), making recommendations for effective instruction (Kamil et al., 2008). Evidence used for this synthesis was not restricted to students with reading difficulties. Strong evidence was found in the research to support three instructional recommendations: (a) providing explicit vocabulary instruction, (b) using direct and explicit comprehension strategy instruction, and (c) providing struggling readers with intensive and individualized interventions delivered by trained specialists. Moderate evidence was provided for including opportunities for extended discussion and interpretation of text meaning in instruction and for increasing student motivation and engagement in literacy learning. Based on the evidence, the authors recommended intensive intervention efforts for students with reading difficulties in Grades 4 through 12 who do not perform at or near grade level. The authors also recommended supplemental, small-group instruction for extended periods of time but did not address how extensive these interventions should be.
Edmonds et al. (2009) conducted a synthesis to address the efficacy of reading interventions that include instruction in decoding, fluency, vocabulary, or comprehension—alone or in combination—on the reading comprehension outcomes of secondary students (Grades 6–12) with reading difficulties and reading disabilities. A subset of studies from this synthesis (13 studies) met criteria for a meta-analysis, yielding an effect size (ES) of 0.89 for the weighted average of the difference in comprehension outcomes between treatment and comparison students. The synthesis demonstrated that many intervention types—including multicomponent (e.g., vocabulary, comprehension), comprehension strategy instruction, fluency, and even word-level interventions—were associated with improved comprehension outcomes, providing an optimistic view of the overall effects of targeting secondary students with reading difficulties for further reading intervention. In these interventions, word reading instruction often included spelling instruction and activities. In addition, comprehension activities sometimes incorporated instruction and practice in written responses. Spelling and writing are incorporated in some reading interventions because the skills associated with successful reading—such as phonological knowledge, text structure knowledge, and reasoning—also play a role in spelling and writing (Abbott & Berninger, 1993; Graham, Harris, & Chorzempa, 2002; Wanzek et al., 2006). The studies did not analyze the features of instruction—such as instructional group size, duration, and grade level—and only one of the studies was implemented for more than 40 sessions.
Related to the Edmonds et al. (2009) synthesis are the findings from a meta-analysis Scammacca et al. (2007) conducted to address a range of reading outcomes from reading interventions for struggling students (including students with LD) in Grades 4 to 12. Findings indicated that many intervention types—including those that focused on word reading/spelling, fluency, vocabulary, comprehension, and multiple components—were associated with improved outcomes, with an overall effect size of 0.95. The largest effects on reading comprehension outcomes were from studies with a multicomponent emphasis (ES = 0.80) or a comprehension focus (ES = 1.35). The studies that included students with LD had higher effects across outcome measures than the studies with only struggling readers, but reading comprehension outcomes were similar for studies with and without students with LD.
Both Edmonds et al. (2009) and Scammacca et al. (2007) noted smaller mean effects when considering only norm-referenced outcome measures. These smaller effects were also documented in a recent meta-analysis of intervention studies for students with reading difficulties in Grades 5 to 9 that focused only on studies measuring outcomes with norm-referenced reading measures (Flynn, Zheng, & Swanson, 2012). The authors reported a mean effect size of 0.41, noting no significant differences in effects as a function of the intervention focus, type of outcome measure, age, grade, number of sessions, length of sessions, or weeks of intervention. However, the studies included in the synthesis provided a mean of 10 weeks of intervention with an average of 41 instructional sessions. We anticipate that the current synthesis can make important contributions to the field by examining the currently unaddressed relative effects of more extensive interventions for students beyond Grade 3 with reading difficulties, assisting both researchers in identifying promising intervention practices and practitioners and policymakers in making intervention decisions.
To better inform practices related to extensive interventions implemented in the upper-elementary and secondary grades, we used the same criteria for identifying articles (the Method section of this article specifies precise criteria) as the previous synthesis (Wanzek & Vaughn, 2007), with one exception. For students in Grades 4 through 12 (the grade levels excluded from the previous synthesis), we identified extensive interventions as those with 75 or more sessions, contrasting with the 100 or more sessions criterion of the previous early elementary synthesis. Our rationale for using 75 sessions as the cutoff criterion is that the majority of extensive intervention studies have been conducted with young readers; thus, we anticipated that the extant literature in the upper-elementary and secondary grades would yield fewer studies with extensive interventions. Additionally, 75 sessions is very close to what an interventionist can implement in one semester in a secondary setting, thus better aligning criteria with schooling practices for readers in Grades 4 through 12 with reading difficulties. However, we disaggregate and discuss the findings for the studies with 100 or more sessions as a means of contrasting the current synthesis findings for Grades 4 through 12 with the previous early elementary synthesis findings.
Research Questions
None of the previous syntheses of intervention provided in the upper-elementary or secondary grades examined the effects of extensive interventions on students’ outcomes. As a result, this synthesis extends the previous kindergarten to third grade synthesis of extensive interventions (Wanzek & Vaughn, 2007) by addressing two questions related to extensive interventions for struggling readers in Grades 4 through 12:
Research Question 1: How effective are extensive interventions in improving reading outcomes for students with reading difficulties or disabilities?
Research Question 2: What features of extensive interventions (e.g., group size, duration, grade level) are associated with improved outcomes for students?
Method
We conducted a comprehensive search of the literature through a three-step process. We first conducted a computer search of ERIC and PsycINFO to locate studies published between 1995 and 2011. We selected this range to reflect the most current research on this topic and as a way to continue the work of the Wanzek and Vaughn (2007) synthesis on extensive reading interventions for students in kindergarten to Grade 3, which included studies from 1995 to 2005. Thus, by extending the previous synthesis, we identified studies from more than 15 years of reading intervention research. We used key disability search terms or roots (reading difficult*, learning disabil*, disorder*, at-risk, high risk, disabil*, dyslex*) in combination with key reading terms and roots (reading interven*, instruction, special educ*, phon*, fluency, vocab*, comprehen*) to capture the greatest possible number of articles. Second, to assure coverage, we conducted a 2010 and 2011 hand search of eight major journals commonly reporting reading intervention research for students with reading difficulties or disabilities (Exceptional Children, Journal of Educational Psychology, Journal of Learning Disabilities, Journal of Special Education, Learning Disabilities Research and Practice, Reading Research Quarterly, School Psychology Review, Scientific Studies of Reading). Third, we searched for studies conducted by IES and posted on its website: www.ies.ed.gov.
We selected studies based on the following criteria:
The study was reported in English in a peer-reviewed journal or published on the IES website.
Participants were students with LD or reading difficulties (i.e., below expected grade level in reading achievement). We included studies with additional participants if disaggregated data were provided for the students with LD or reading difficulties.
The participants were enrolled in Grades 4 through 12 (ages 10–18). We included studies with additional participants if data were disaggregated for participants in Grades 4 through 12 or when more than 50% of the participants were in Grades 4 through 12.
Interventions targeted reading in an alphabetic language, were provided for 75 or more sessions, and were not part of the general education curriculum provided to all students. We did not require studies to provide information related to student response to previous interventions.
Interventions were provided as part of the school day programming (not home, clinic, or camp programs).
Dependent variables addressed reading outcomes.
The research design was experimental, quasi-experimental, single group, or single case.
The initial search yielded 24,720 abstracts. This search was intentionally broad to ensure we captured all possible articles. As a result, many studies in the initial search were topically (e.g., fluency) related but did not meet criteria for this synthesis (e.g., examinations of biomarkers, psychoses, or Alzheimer’s). Explicit statements in abstracts indicated 16,157 studies did not meet at least one of the criteria (e.g., commentaries; book reviews; studies of infants, elderly, or cadavers). We examined study details for the remaining 8,563 studies to determine whether criteria were met. A total of 8,290 studies did not provide a reading intervention or did not include students in Grades 4 through 12; another 259 studies did not provide supplemental intervention to students with reading difficulties, did not disaggregate data for students in Grades 4 through 12, or did not provide intervention for 75 sessions or more. Thus, a total of 14 studies from the initial electronic search met criteria. We located three additional studies in the hand search and two additional studies in the IES search (Somers et al., 2010; Torgesen et al., 2006). Accordingly, a total of 19 studies met selection criteria for the synthesis.
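The screening counts reported above can be re-traced with simple arithmetic. The following is a minimal sketch using only the numbers stated in the preceding paragraph; the variable names are illustrative and do not come from the original report.

```python
# Counts taken from the screening narrative; variable names are hypothetical.
initial_abstracts = 24_720
excluded_from_abstract = 16_157             # excluded on explicit abstract statements
excluded_no_intervention_or_grade = 8_290   # no reading intervention or no Grades 4-12
excluded_other_criteria = 259               # not supplemental, not disaggregated, or < 75 sessions
hand_search_added = 3
ies_search_added = 2

# Studies whose details were examined after abstract screening.
full_review = initial_abstracts - excluded_from_abstract

# Studies from the electronic search that met all criteria.
electronic_included = (
    full_review - excluded_no_intervention_or_grade - excluded_other_criteria
)

# Final corpus after adding the hand search and IES search.
total_included = electronic_included + hand_search_added + ies_search_added

print(full_review, electronic_included, total_included)  # 8563 14 19
```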
Coding Procedures
We employed extensive coding procedures to extract and organize pertinent information from each study. We used the same code sheet that Wanzek and Vaughn (2007) used in their early elementary extensive intervention synthesis. The code sheet was developed based on elements specified in the What Works Clearinghouse Design and Implementation Assessment Device (IES, 2008) and used in previous research (Edmonds et al., 2009; Wanzek et al., 2006). Data were collected on (a) participants (e.g., age, gender, exceptionality), (b) methodology (e.g., research design, assignment), (c) intervention and comparison descriptions, (d) clarity of causal inference, (e) measures, and (f) findings. Participant information was coded by using four forced-choice items (socioeconomic status, use of criteria for classifying students with disabilities, risk type, and gender) and two open-ended items (age or grades as described in text, risk type as described in text). Similarly, design information was gathered by using a combination of forced-choice items (e.g., research design, assignment method, fidelity of implementation, pretest scores) and open-ended items (selection criteria). Intervention and comparison group information was coded by using nine open-ended items (e.g., site of intervention, role of person implementing intervention, hours of intervention, duration of intervention). A written description of the treatment and comparison conditions was also provided. Information on clarity of causal inferences was gathered by using six items for studies with random assignment (e.g., sample sizes, attrition) and nine items for quasi-experimental designs (e.g., equating procedures, attrition rates). Additional items allowed coders to describe the measures, indicate measurement contaminants, and record findings, including data for effect size calculation.
Three people coded the studies—two faculty researchers with doctoral degrees and experience coding and publishing several syntheses and one doctoral student. The training on the code sheet consisted of four parts: (a) instruction on the meaning of each code and indicator with several examples provided, (b) modeling by the trainer (one of the experienced coders) of the coding process with an article from a previous study while thinking aloud about the coding categories, (c) practice coding with discussion of two articles, and (d) a reliability test with the three coders coding the same article independently. Interrater reliability was calculated as the number of agreements divided by the number of agreements plus the number of disagreements. Agreement was calculated separately for each coding category (e.g., participants, design) and ranged from 92% to 100% across categories. Each study was then independently coded by two raters. When discrepancies occurred, meetings were held to discuss the coding, with final judgments reached by consensus.
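The agreement index described above (agreements divided by agreements plus disagreements) can be sketched as follows. The function name and the example codes are hypothetical and are shown only to illustrate the calculation for a single coding category.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage agreement: agreements / (agreements + disagreements) * 100."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must rate the same set of items.")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    disagreements = len(coder_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical example: 25 coded items in one category, 2 disagreements.
coder_a = ["random"] * 25
coder_b = ["random"] * 23 + ["matched"] * 2
print(percent_agreement(coder_a, coder_b))  # 92.0
```

This index does not correct for chance agreement; it is shown here only because it is the index the coding procedure describes.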
Effect Size Calculation
For all studies, the Hedges (1981) procedure for calculating unbiased estimates of Cohen’s d was used (this statistic is also known as Hedges’s g). Hedges’s g was calculated by using the means and standard deviations for treatment and comparison groups when such data were provided. In some cases, Cohen’s d was reported and means and standard deviations were not available. For these effects, Cohen’s d and the treatment and comparison group sample sizes were used to calculate Hedges’s g. Each estimate of Hedges’s g was weighted by the inverse of its variance to account for potential bias in studies with smaller samples.
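As an illustration of the two routes to Hedges's g described above, the sketch below uses the commonly cited small-sample correction factor J = 1 - 3 / (4df - 1) and the usual large-sample variance approximation for d. These specific formulas are assumptions drawn from standard meta-analysis references; the article itself does not print them, and the function names are hypothetical.

```python
import math


def hedges_g_from_d(d, n_t, n_c):
    """Convert Cohen's d to Hedges's g and return (g, variance of g)."""
    df = n_t + n_c - 2
    j = 1.0 - 3.0 / (4.0 * df - 1.0)           # small-sample correction (assumed form)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, (j ** 2) * var_d


def hedges_g_from_means(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Compute Hedges's g from group means and standard deviations."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    return hedges_g_from_d(d, n_t, n_c)


# Hypothetical data: treatment mean 105, comparison mean 100, SD 10, n = 50 per group.
g, var_g = hedges_g_from_means(105, 100, 10, 10, 50, 50)
weight = 1.0 / var_g   # inverse-variance weight used when pooling effects
print(round(g, 3), round(var_g, 4), round(weight, 1))
```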
Meta-Analysis Procedures
We included studies in the meta-analysis if they used a treatment-comparison experimental or quasi-experimental design and reported sufficient information to allow effect sizes to be computed. Some research reports contained studies with multiple treatment/comparison groups. In total, we included 10 research reports in the meta-analysis (Calhoon, 2005; Cantrell, Almasi, Carter, Rintamaa, & Madden, 2010; Lang et al., 2009; Somers et al., 2010; Spencer & Manis, 2010; Torgesen et al., 2006; Vaughn, Cirino, et al., 2010; Vaughn et al., 2011; Vaughn, Wanzek, et al., 2010; Wanzek, Vaughn, Roberts, & Fletcher, 2011) containing 22 distinct treatment/comparison group contrasts. Because all studies used multiple outcome measures, these measures were categorized by type (reading comprehension, reading fluency, word reading, word reading fluency, and spelling), and a separate meta-analysis was conducted for each measure type. Reading comprehension included measures that required students to read text and answer questions or complete sentences in the text. Fluency included measures that assessed students’ rate and accuracy in reading text. Word reading included measures of students’ word reading accuracy, with words or nonwords presented in isolation (not text). Word reading fluency included word reading measures with a rate component. Spelling included measures that assessed students’ ability to encode. For studies that included more than one outcome measure within each category, effect sizes were averaged and the average and its standard error were included in the meta-analysis.
A random-effects model was used to analyze effect sizes. Recent methodological innovations in meta-analysis, such as multilevel modeling (Hox, 2002) and structural equation modeling (Cheung, 2008), were considered as approaches to the random-effects analyses of the effect sizes. However, the small number of effect sizes in each of the analyses (n = 5–22 effect sizes, depending on outcome type) significantly limited the ability to implement multilevel modeling or structural equation modeling. Therefore, a traditional approach was taken to the meta-analysis. Mean effect size statistics and their standard errors were computed and heterogeneity of variance was evaluated by using the Q statistic. When statistically significant variance was found, moderator variables were introduced into the random-effects models, resulting in mixed-effects models. Categorical moderator variables were used in all cases due to the way that the variables of interest were reported in the studies included in the meta-analysis. Moderators included (a) total hours of intervention (less than 115 vs. 115 or more), (b) size of instructional group (one to five students vs. six or more students), and (c) grade level of students (elementary school vs. middle school).
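The traditional random-effects approach described above can be sketched as follows. The between-study variance estimator shown here is the DerSimonian-Laird method-of-moments estimator, which is an assumption: the article does not state which estimator was used. The function name and the example inputs are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Q statistic: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "mean": mean,
        "se": se,
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
    }


# Hypothetical inputs: two effects with equal sampling variance.
result = random_effects_summary([0.0, 0.5], [0.01, 0.01])
print(result)
```

The Q value would then be compared with a chi-square distribution on k - 1 degrees of freedom to judge heterogeneity. The moderator (mixed-effects) step is not shown here.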
Results
A total of 19 studies with a range of study designs were represented in this synthesis. To fully explore the data, we conducted several types of analyses. First, we present a synthesis of the study features (e.g., sample, intervention) to highlight salient elements across the corpus of studies. Second, we provide the results of the meta-analysis of the treatment-comparison design studies to determine the overall effects of extensive reading interventions (Research Question 1) and possible moderators of the effects (Research Question 2). Third, we synthesize the findings for the nine studies that did not provide sufficient data for inclusion in the meta-analysis (Benner, Nelson, Stage, & Ralston, 2011; England, Collins, & Algozzine, 2002; Gabor, 2010; Graham et al., 2002; Mercer, Campbell, Miller, Mercer, & Lane, 2000; Rankhorn, England, Collins, Lockavitch, & Algozzine, 1998; Snider, 1997; Torgesen et al., 2001; Wilson & Fredrickson, 1995). We use the findings from these additional studies to support or refute findings from the meta-analysis to fully address the research questions regarding the effectiveness of extensive interventions and the features that accompany these interventions and outcomes.
Study Features
Of the 19 studies on extensive interventions that met criteria for inclusion in the synthesis, 10 were published in the past 3 years. Eleven of the studies were published in journals specific to students with disabilities. Table 1 describes the key features of each study.
Table 1. Features of intervention studies
Note. SR = struggling readers; NR = not reported; LD = learning disabilities; disab. = disabilities.
Study included in the meta-analysis.
This study was conducted outside the United States; therefore, Years 4 through 7 may be a slightly different age range than Grades 4 through 7 in the United States.
Study design
The corpus of studies included 11 treatment and comparison, 1 multiple-treatment, 1 single-subject, and 6 single-group design studies. Twelve of the group design studies randomly assigned participants to conditions, and 10 of these studies included a comparison group with sufficient data for calculating effect sizes. Eleven of the studies reported the fidelity of implementation for the intervention. Seventeen of the studies used standardized reading assessments to measure student outcomes following intervention. Random assignment to treatment, fidelity data collection, and use of reliable and valid standardized measures are three elements of high-quality studies that improve the validity of the findings (Gersten et al., 2005; IES, 2008). The 10 studies included in the meta-analysis randomly assigned students to conditions, reported implementation fidelity, and measured student outcomes with standardized measures.
Sample
The 19 studies included 9,371 students, with sample sizes ranging from 4 to 5,595. Nine of the 19 samples included only students with identified disabilities, largely students with LD. The majority of the studies were conducted in Grades 6 to 8, with 10 studies including one or more of these grades. Six studies included students below Grade 6, and 3 studies included students in ninth grade. No studies examined extensive interventions in Grades 10 through 12.
Intervention implementation
The duration of intervention implementation ranged from 2 to 25 months; however, 17 of the studies implemented interventions ranging from 5 to 9 months. Five studies provided the number of sessions or hours that students received intervention, ranging from 68 to 111.3 hours of intervention. As can be seen in Table 1, the majority of studies provided daily intervention to students; however, the session length varied from 5 to 90 minutes. School staff members implemented the interventions in 12 of the studies—teachers in 8 studies, teachers and paraprofessionals in 3 studies, and paraprofessionals only in 1 study. Training was provided for all implementation. Six studies used trained research staff members to implement the interventions, and one study did not report on the intervention implementers. The large majority of studies included multicomponent interventions consisting of three or more of the following: phonics/word recognition/spelling, fluency, vocabulary, and comprehension. Seven studies provided at least one of the treatment groups with an intervention that emphasized phonics/word recognition or fluency only. Table 2 summarizes the interventions, measures, and key findings by study.
Table 2. Summary of study findings
Note. T = treatment; C = control; WJ-III = Woodcock Johnson Tests of Academic Abilities, Third Edition; comp. = comprehension; BRC = Basic Reading Cluster; LST = Linguistics Skills Training; PALS = Peer Assisted Learning Strategies; SRA = Science Research Associates; ES = effect size; LSC = Learning Strategies Curriculum; GRADE = Group Reading Assessment and Diagnostic Examination; TRTS = Teaching Reading Through Spelling; CPSS = Cognitive Process Strategies for Spelling; RA = reading age; SA = spelling age; PAT = Progressive Achievement Tests; CASS = Cognitive Aptitude Assessment System; NA = not available; FCAT = Florida Comprehensive Assessment Test; RISE = Reading Intervention Through Strategy Enhancement; SOAR = School Offered Accelerated Reading; CBM = curriculum-based measurement; ORF = oral reading fluency; wcpm = words correct per minute; RAAL = Reading Apprenticeship Academic Literacy; WRMT = Woodcock Reading Mastery Test; GORT-III = Gray Oral Reading Test III; TOWRE = Test of Word Reading Efficiency; PDE = Phoneme Decoding Efficiency; SWE = Sight Word Efficiency; ADD = Auditory Discrimination in Depth Program; EP = Embedded Phonics; TAKS = Texas Assessment of Knowledge and Skills; TOSREC = Test of Silent Reading Efficiency and Comprehension; BAS = British Abilities Scale.
Study included in the meta-analysis.
*p < .05. **p < .01. ***p < .001.
Meta-Analytic Findings
Reading comprehension outcomes
The estimate of the mean effect size across the 22 reading comprehension effects included in the analysis was 0.10 (p < .001; 95% confidence interval [CI] [0.06, 0.19]), indicating a small positive effect of intervention on students’ reading comprehension. The variance as measured by the Q statistic was statistically significant (Q = 35.94, p = .022). Analyses were conducted to determine whether differences in mean effect size between studies could be explained by moderator variables. No statistically significant differences were found between groups based on any moderator variable, meaning there was no evidence that intervention effectiveness differed by instructional group size, relative number of hours of intervention, or grade level of intervention. However, it is possible that the moderator analyses were not significant due to the small number of studies that could be included (not all moderators could be coded for all effects due to lack of information in the published research reports). Table 3 presents the effect sizes by moderator, standard errors, and Qbetween statistics.
Table 3. Results from moderator analysis of reading comprehension outcomes
Reading fluency outcomes
The mean effect size estimate for the nine effect sizes from fluency outcome measures was 0.16 (p = .004; 95% CI [0.05, 0.26]), indicating a small positive effect of intervention on students’ reading fluency ability. The variance associated with the effect sizes was not statistically significant (Q = 5.03, p = .76).
Word reading outcomes
The 12 effect sizes from word reading outcome measures had a mean effect size estimate of 0.15 (p = .003; 95% CI [0.05, 0.24]), indicating a small positive effect of intervention on students’ word reading outcome scores. The variance was not statistically significant (Q = 9.78, p = .55).
Word reading fluency outcomes
Eleven effect sizes were analyzed from word reading fluency outcome measures. The mean effect size estimate was 0.16 (p = .001; 95% CI [0.06, 0.26]), indicating a small positive effect of intervention on students’ word reading fluency ability. The variance was not statistically significant (Q = 3.70, p = .96).
Spelling outcomes
Five effect sizes were available from spelling outcome measures. Their mean effect size estimate was 0.15 (p = .014; 95% CI [0.03, 0.27]), indicating that the interventions had a small positive effect on students’ spelling ability. The variance was not statistically significant (Q = 4.00, p = .406).
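The p values attached to the Q statistics can be checked approximately. The sketch below assumes that Q is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of effect sizes; this is the conventional test, but the degrees of freedom are not stated in the text. It uses a closed-form upper-tail probability that holds only for even degrees of freedom, so it covers the spelling (k = 5) and word reading fluency (k = 11) analyses. The helper name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(q, df):
    """Upper-tail probability P(X >= q) for a chi-square variable.

    Uses the exact closed form exp(-q/2) * sum_{i < df/2} (q/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = q / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (q/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Assumption: df = k - 1, with k taken from the number of effect sizes reported.
p_spelling = chi2_sf_even_df(4.00, 5 - 1)
p_word_reading_fluency = chi2_sf_even_df(3.70, 11 - 1)
print(round(p_spelling, 3), round(p_word_reading_fluency, 2))
```

Under these assumptions, both computed values agree with the reported p values to the precision given. The analyses with odd degrees of freedom would need a general chi-square routine instead.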
Publication bias
Publication bias was evaluated by using the trim-and-fill approach (Card, 2012). This approach builds on a visual inspection of a funnel plot of effect sizes for asymmetry. A trim-and-fill analysis is an iterative process that seeks to correct asymmetry in a funnel plot of effect sizes that can result from omission of nonpublished studies that found a null result or a very small effect size. The analysis deletes the effect sizes causing the asymmetry, calculates a mean effect size, and then returns the deleted effect sizes. Imputed effect sizes are added for nonpublished studies that may have been omitted, and the iterative analysis continues until the plot is symmetrical. This analysis seeks to determine whether estimates of mean effect size were biased by the exclusion of effect sizes from nonpublished research and published studies that might have been missed in the literature search. Results indicated that publication bias did not affect the mean effect size estimates for the comprehension, reading fluency, and word reading fluency outcome measures meta-analyses. For the spelling and word reading outcome measures meta-analyses, where the number of effect sizes analyzed was small, the trim-and-fill analyses found evidence of publication bias. Four studies are estimated to be missing from the word reading outcomes meta-analysis, and two studies are estimated to be missing from the spelling outcomes analysis. The mean effect size estimate for word reading measures, including imputed values for missing studies, was 0.10 (95% CI [–0.01, 0.21]). For spelling measures, the estimated mean effect size, including imputed values for missing studies, was 0.11 (95% CI [–0.01, 0.23]). As a result of the adjustment for publication bias, the confidence intervals for both word reading and spelling measures include zero, meaning that it is possible that extensive interventions had no effect on performance in these domains.
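The link between a confidence interval that includes zero and a nonsignificant effect can be illustrated with a rough back-calculation from the adjusted values reported above. The sketch assumes a symmetric normal-theory 95% interval (mean ± 1.96 standard errors) and works from rounded published figures, so the results are approximate. It does not reproduce the trim-and-fill procedure itself, and the function name `z_test_from_ci` is hypothetical.

```python
import math


def z_test_from_ci(mean, lower, upper, z_crit=1.96):
    """Recover an approximate standard error, z value, and two-sided p value
    from a mean and its confidence interval (normal-theory assumption)."""
    se = (upper - lower) / (2.0 * z_crit)
    z = mean / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return se, z, p


# Adjusted estimates after trim-and-fill, as reported in the text.
se_word, z_word, p_word = z_test_from_ci(0.10, -0.01, 0.21)
se_spell, z_spell, p_spell = z_test_from_ci(0.11, -0.01, 0.23)
print(f"word reading: z = {z_word:.2f}, p = {p_word:.3f}")
print(f"spelling:     z = {z_spell:.2f}, p = {p_spell:.3f}")
```

Under these assumptions both adjusted estimates fall short of the conventional .05 threshold, consistent with intervals that include zero.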
Synthesis of Additional Studies
Reading comprehension outcomes
Five studies that did not provide sufficient data for the meta-analysis also examined comprehension outcomes following extensive intervention. These studies examined gains from pretest to posttest for students participating in the interventions. In all five of the additional studies examining comprehension outcomes, statistically significant gains on standardized comprehension measures were noted between pretest and posttest following extensive intervention. Three of these studies reported gains in standard scores, suggesting that students with reading difficulties in these interventions were closing the gap with their average-achieving peers and supporting the significant effect resulting from the meta-analysis.
Torgesen et al. (2001) reported significant mean standard score gains from pretest to posttest of 6.6 to 12.3 on two measures of passage comprehension for two treatments implemented 1:1 for approximately 68 hours for upper-elementary students with LD. There was not a significant difference in comprehension outcomes between students receiving the treatment with extended instruction and practice in phonemic awareness and phonics (Auditory Discrimination in Depth) and students receiving explicit instruction in phonics with more time spent reading and comprehending text (Embedded Phonics). However, gains in standard scores suggest that students accelerated learning above expectations.
Graham, Bellert, Thomas, and Pegg (2007) also reported significant differences between pretest and posttest raw scores on a standardized measure of comprehension for struggling readers in Grades 5 through 7 following a fluency-based intervention provided three times per week for 26 weeks (30-minute sessions) in groups of two students. The comprehension of a comparison sample of average- and high-achieving readers did not improve significantly over the same time period, suggesting students receiving the intervention accelerated their learning; however, comprehension levels of the struggling readers were still significantly lower than the average- and high-achieving readers at posttest.
Three single-group studies (Benner et al., 2011; England et al., 2002; Rankhorn et al., 1998) also reported significant differences between pretest and posttest scores on standardized measures of passage comprehension. England et al. (2002) and Rankhorn et al. (1998) reported standard score gains of 11 to 12 points from pretest to posttest. Each of these interventions was provided daily (30–45 minutes) over a full school year in small groups to students with LD in the upper-elementary grades (England et al., 2002; Rankhorn et al., 1998) or upper-elementary and middle grades (Benner et al., 2011).
Reading fluency outcomes
Three studies examined pretest to posttest gains for students’ reading fluency following intervention (Mercer et al., 2000; Snider, 1997; Torgesen et al., 2001). Supporting the small effect noted in the meta-analysis, gains in words correct per minute (wcpm) ranged from 30 to 65 across the studies, but most students remained below grade-level expectations after intervention. Torgesen et al. (2001) noted significant mean standard score increases of 4.1 points for students receiving the Auditory Discrimination in Depth intervention and 0.6 points for the Embedded Phonics intervention. The mean increase in wcpm across two passages was 62 to 63 wcpm, with ending levels of reading fluency above 100 wcpm for the upper-elementary students participating in the interventions. The Auditory Discrimination in Depth treatment demonstrated significantly higher reading fluency scores than the Embedded Phonics group following treatment.
Snider (1997) conducted a single-case study of Reading Mastery and Corrective Reading intervention provided daily for 30 to 45 minutes in small groups for one school year. Four of the students in the study were in fourth grade and were identified with LD. These students demonstrated gains of approximately 30 to 65 wcpm. Although three students continued to perform below fourth-grade fluency expectations following intervention, one student obtained 126 wcpm by the end of the intervention.
Mercer et al. (2000) implemented the Great Leaps fluency intervention 1:1 in short, daily sessions of 5 to 6 minutes for either 6 to 9 months, 10 to 18 months, or 19 to 25 months for students with LD in middle school. Significant differences between pretest and posttest scores on curriculum-based oral reading fluency measures were noted for all groups, regardless of the amount of intervention received. Mean oral reading fluency gains ranged from 32 to 40 wcpm from pretest to posttest. Students in the intervention for 19 to 25 months improved the most in wcpm; however, the mean reading rate was 69 wcpm at posttest, still well below expected reading rates for eighth-grade students.
Word reading outcomes
Six studies that could not be included in the meta-analysis measured the effects of extensive intervention on word reading and decoding (Benner et al., 2011; England et al., 2002; Gabor, 2010; Rankhorn et al., 1998; Torgesen et al., 2001; Wilson & Frederickson, 1995). Most studies indicated significant differences from pretest to posttest in student word reading achievement. In addition, three studies noted gains in standard scores from pretest to posttest, suggesting intervention participants moved toward closing the gap with grade-level expectations. Wilson and Frederickson (1995) conducted a quasi-experimental study comparing a phonics, word reading, and fluency intervention to a business-as-usual comparison condition for struggling readers in Grades 4 to 7. The intervention was implemented in 20-minute sessions, 4 days a week for 20 weeks in groups of six students. Although data for calculating effect sizes were not included, significant differences in favor of the treatment condition were reported on raw scores for reading word list accuracy. However, no significant differences were noted for nonword reading.
Torgesen et al. (2001) noted more consistent results for two treatments provided for approximately 68 hours. Standard score gains of 27.9 on decoding and 13.5 on word reading for the Auditory Discrimination in Depth intervention and 20.2 on decoding and 14.1 on word reading for the Embedded Phonics intervention were reported, suggesting substantial gains relative to expected progress from the normative group. The four single-group studies implemented intervention for approximately one school year (30–45 minutes, 3–4 days per week) in small groups and also reported significant differences between pretest and posttest on standardized measures of decoding and word reading. Two of these studies noted gains in standard scores of 6 to 10 points from pretest to posttest (England et al., 2002; Rankhorn et al., 1998).
Word reading fluency outcomes
Two studies that could not be included in the meta-analysis examined the impact of extensive intervention on word reading or decoding fluency (Torgesen et al., 2001; Wilson & Frederickson, 1995). The study with the largest word reading gains (Torgesen et al., 2001) also noted significant differences from pretest to posttest in word reading fluency outcomes for both treatment groups, with increased standard scores of 4.8 to 9 points. However, Wilson and Frederickson (1995) did not note significantly higher levels of word reading fluency for students receiving a treatment focused on phonics, word reading, and fluency when compared to a business-as-usual condition. The Wilson and Frederickson result differs from the small positive effect noted in the meta-analysis for word reading fluency. Torgesen et al. provided approximately 68 hours of 1:1 intervention over 8 weeks for upper-elementary students with LD. Wilson and Frederickson provided upper-elementary and middle school students with reading difficulties approximately 27 hours of intervention over 20 weeks in small groups of six students.
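The contrast in intervention dosage between these two studies can be made concrete with a short calculation. The sketch below uses the schedule reported for Wilson and Frederickson (20-minute sessions, 4 days a week, 20 weeks) and the approximate totals reported for Torgesen et al. It assumes no missed sessions, and the function names are hypothetical.

```python
def total_sessions(sessions_per_week, weeks):
    """Number of sessions delivered, assuming none were missed."""
    return sessions_per_week * weeks


def total_hours(minutes_per_session, sessions_per_week, weeks):
    """Total instructional time in hours."""
    return minutes_per_session * total_sessions(sessions_per_week, weeks) / 60.0


# Wilson and Frederickson (1995): 20-minute sessions, 4 days a week, 20 weeks.
wf_sessions = total_sessions(4, 20)
wf_hours = total_hours(20, 4, 20)
wf_hours_per_week = wf_hours / 20

# Torgesen et al. (2001): approximately 68 hours over 8 weeks.
torgesen_hours_per_week = 68 / 8

print(wf_sessions, round(wf_hours, 1), round(wf_hours_per_week, 2))
print(torgesen_hours_per_week)
```

On these figures, the Wilson and Frederickson intervention amounts to 80 sessions and roughly 27 hours, or about 1.3 hours a week. The Torgesen et al. intervention delivered about 8.5 hours a week, a much more concentrated dose.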
Spelling outcomes
Five additional studies measured spelling outcomes following extensive intervention (England et al., 2002; Gabor, 2010; Rankhorn et al., 1998; Torgesen et al., 2001; Wilson & Frederickson, 1995). Four of the studies reported significant differences between student pretest and posttest scores for spelling outcomes. In the Torgesen et al. (2001) study, only the Embedded Phonics group demonstrated significant increases from pretest to posttest on a standardized measure of spelling (5.6 standard score increase). Wilson and Frederickson (1995) noted significant differences in spelling age gains from pretest to posttest for students receiving a multicomponent treatment compared to typical practice gains. Rankhorn et al. (1998) reported pretest to posttest standard score gains of 10 points following 7 months of daily small-group instruction (30-minute sessions) using the Failure Free Reading program.
In contrast, England et al. (2002) reported no significant pretest to posttest gains in spelling following implementation of Failure Free Reading for one school year. Gabor (2010) implemented one of two interventions with students with dyslexia. One intervention focused on synthetic phonics instruction with articulation and speech training and analysis of the structure of language, including syntactic and semantic knowledge. This intervention was provided to eight of the students in the sample. Four students who had basic alphabetic principle knowledge were provided the Cognitive Process Strategies for Spelling program, but this intervention was not further described. Data were presented for all 12 students aggregated, regardless of which intervention they received. Significant differences between pretest and posttest were noted on a standardized measure of spelling, but specific scores were not provided.
Discussion
This synthesis reports on the effects of extensive reading interventions provided for 75 or more sessions to students with reading difficulties or disabilities in Grades 4 and higher. Our purpose was to update a similar synthesis of extensive reading interventions for students in kindergarten through third grade (Wanzek & Vaughn, 2007) by extending the work beyond Grade 3. Overall, the findings for students in Grades 4 through 12 indicated a small effect for extensive interventions on reading comprehension, reading fluency, word reading, word reading fluency, and spelling outcomes. In addition, the quality of the studies with sufficient data to be included in the meta-analysis was high, increasing confidence in the results. Thus, the findings suggest extensive interventions can have a small, positive impact on student learning across a variety of reading outcomes.
Following a pattern of findings in which studies that are more rigorous yield smaller effects than those that are less rigorous (Swanson, Hoskyn, & Lee, 1999), the small effects noted for extensive interventions were notably lower than effects reported in previous syntheses of reading interventions for adolescents. Edmonds et al. (2009) reported an overall effect of 0.89 on comprehension outcomes for interventions provided to students in Grades 6 to 12, and Scammacca et al. (2007) reported an overall effect of 0.95 across several reading outcome measures for interventions provided in Grades 4 to 12. Although previous meta-analyses have reported substantially lower effect sizes when only standardized outcome measures were included—ES = 0.47 (Edmonds et al., 2009), ES = 0.41 (Flynn et al., 2012), and ES = 0.42 (Scammacca et al., 2007)—these effect sizes were still higher than those noted in the current synthesis of extensive interventions.
The previous syntheses largely included studies providing intervention for fewer than 40 sessions, and only one of the included studies provided extensive intervention (more than 75 sessions). Although we do not suggest that the differences in reported outcomes between syntheses are due to the duration of the interventions, the finding of shorter interventions demonstrating higher effects has been noted in other syntheses (Elbaum, Vaughn, Hughes, & Moody, 2000; Wanzek et al., 2006). It may be that students experience an initial boost in learning early in the intervention, perhaps due to the addition of instructional time or the novelty of a new intervention, but further research on this finding would provide better explication of this phenomenon.
The conclusion that extensive interventions for students with reading difficulties or disabilities in the upper grades can yield a small effect on a variety of reading outcomes is important as a means of verifying the value of continued reading intervention for students beyond Grade 3. The large majority of the studies in this synthesis implemented multicomponent interventions, which could be a reflection of the needs of many adolescent struggling readers who demonstrate difficulties in lower-level skills such as word recognition as well as problems in higher-level skills such as vocabulary or comprehension. We noted very little variance in effects across the interventions for outcomes related to reading fluency, word reading, word reading fluency, and spelling. Although there was more variance in effects on comprehension outcomes across the interventions, this variance was not explained by differences in the intensity of intervention (number of hours or instructional group size) or grade level. Thus, there was no evidence that student outcomes differed in relation to the relative number of hours in intervention, whether the intervention was provided in small or large groups, and whether the intervention was provided in upper-elementary or secondary grades. It should be noted that due to lack of information provided in the studies, we were unable to examine the number of hours of intervention as a continuous variable, which would provide more specific information regarding the effect of hours in intervention. In examining the effect sizes across the studies, larger effect sizes for comprehension measures did appear more often in studies that included students with LD. However, only three studies disaggregated the data for students with LD, and we do not have evidence that the larger effect sizes are meaningful differences, suggesting this trend requires further research.
Although instructional group size is often noted as an important intervention variable for early elementary students (Elbaum et al., 2000; Vaughn et al., 2003; Wanzek & Vaughn, 2007), this synthesis did not find support for instructional group size as a significant moderator of effects for students in Grades 4 through 12, despite the variety of group sizes noted in the corpus of studies. This finding aligns with an experimental study that directly compared large- (10–15 students) and small-group (3–5 students) extensive intervention at the sixth-grade level, reporting no differences in student outcomes based on group size (Vaughn, Wanzek, et al., 2010). There are several possible interpretations of this finding. First, perhaps group size needs to be reduced further to yield effects. Second, teachers in these studies may not have differentiated instruction sufficiently for adjustments in group size to be associated with differential outcomes. Third, for students struggling with reading after Grade 3, receiving the same instruction in a smaller group size may not be sufficient for improving student outcomes. Continued research examining the types of instruction and the features of instruction needed to significantly accelerate learning for adolescents with reading difficulties and disabilities is needed, and effective grouping practices may still need to be defined within these interventions.
Studies With 100 or More Sessions
The Wanzek and Vaughn (2007) synthesis of extensive interventions for kindergarten to Grade 3 included studies with 100 or more sessions of intervention. To compare findings, we present in Table 4 effect size information from the present synthesis of Grades 4 through 12 for the studies with 100 or more sessions (Calhoon, 2005; Cantrell et al., 2010; Lang et al., 2009; Somers et al., 2010; Torgesen et al., 2006; Vaughn, Cirino, et al., 2010; Vaughn et al., 2011; Vaughn, Wanzek, et al., 2010; Wanzek et al., 2011) contrasted with the findings of the previous K–3 synthesis. Overall, the mean effect sizes from the early elementary synthesis were moderate across reading outcomes, and the effect sizes for the upper grades were small.
Table 4. Comparison of effect sizes in early elementary and upper grades for studies with 100 or more sessions
Note. ES = effect size.
The effect sizes from the early intervention studies show a decreasing trend in impact from the shortest to the longest interventions; however, there is an increasing trend in effect size for longer interventions in the upper grades. It is important to note that length of intervention was not a statistically significant moderator for the upper grades, so these differences may not be reliable and further research is needed. Only one upper-grade study examined intervention provided for more than one year (Vaughn et al., 2011). In this study, only the students who demonstrated insufficient response to previous intervention continued to receive intervention for an additional year, so the effect sizes reported are only for the students with the most persistent reading difficulties. Nevertheless, large effects for reading comprehension outcomes were noted for these students after receiving 2 years of intervention, due largely to the group of students not receiving the treatment falling further behind over time. For secondary students with significant reading difficulties, very intense and sustained interventions may be required to maintain reading growth each year of school.
Due to the small number of studies that provided group instruction in the early grades, Wanzek and Vaughn (2007) reported that they could not compare group instruction to 1:1 instruction and could not contrast small-group with large-group instruction. However, all of the studies implemented for 100 or more sessions in the upper grades provided instruction in groups. Therefore, we cannot compare the findings on group size from the Wanzek and Vaughn synthesis with the current synthesis. Nevertheless, group size was not a statistically significant moderator of outcomes for students after Grade 3.
One additional variable examined in the Wanzek and Vaughn (2007) synthesis related to the level of standardization for intervention implementation. Wanzek and Vaughn initially sought to contrast standardized interventions (use of research-based instructional programs delivered in a specified, sequenced manner) with individualized interventions (designing and adjusting interventions individually, based on identified student difficulties and identified goals to address the difficulties); however, the authors did not find any early elementary studies that investigated individualized interventions. Instead, the authors examined differing levels of standardization implemented in the interventions, though they found no notable differences in effect sizes for studies with high versus low levels of standardization.
In the current synthesis for students in Grades 4 through 12, we were unable to adequately examine this variable because only three treatments implemented interventions with lower levels of standardization. However, one study (Vaughn et al., 2011) directly compared a highly standardized intervention with a more individualized approach to intervention. This study noted no differences in student outcomes between students who received the standardized versus the individualized intervention, except for students with identified LD, who benefited significantly more in the standardized condition. These findings highlight the lack of research across grade levels regarding individualized, or problem-solving, approaches to intervention. Despite the lack of available research, at least in terms of extensive interventions, individualizing instruction is considered an important characteristic of instruction for students with the most significant needs (Fuchs, 1996; Fuchs, Mock, Morgan, & Young, 2003).
Limitations
This synthesis provides findings from all peer-reviewed publications of extensive interventions in Grades 4 through 12 since 1995. One key finding of our work is the small number of studies examining extensive interventions for students with reading difficulties after Grade 3. Further research in this area is needed, particularly considering the significant reading difficulties some students present in the upper grades—difficulties that brief interventions are unlikely to resolve. An increase in research on extensive interventions would allow further examination of moderators of student reading achievement following intervention and could better confirm or alter some of the findings reported in this synthesis. Relatedly, additional research is needed examining RTI models in the upper grades. Only four of the studies in this synthesis reported on general education classroom practices or reported findings from previous interventions participants had received. Providing information on previous instruction could better inform not only extensive interventions, but also ways in which highly intensive interventions can be provided for students who demonstrate insufficient response to previous interventions.
Like the early elementary extensive intervention synthesis, we used duration in sessions to define extensive intervention for students in Grades 4 to 12. As mentioned previously, we were unable to examine the total number of hours of intervention as a continuous variable moderator due to a lack of information provided in many of the studies. Although we provide some preliminary information, based on studies providing more and fewer hours in intervention, more consistent reporting of the time students spend in intervention would provide important data to examine this moderator more thoroughly.
Conclusions
The findings from this synthesis suggest that accelerating reading growth in the upper grades may be more challenging than in the earliest grades, even when extensive interventions are implemented. It is not entirely clear why such minimal effects are obtained from interventions for students after Grade 3. Students in the upper grades may have well-established deficits that have persisted despite participation in interventions during the early grades, whereas students in the earliest grades could be inaccurately identified as having a reading difficulty (false positives) when in fact they would have demonstrated on-track reading over time without intervention. In addition, reading expectations for students in the upper-elementary and secondary grades often require more cognitively demanding tasks related to word meanings, background knowledge, and understanding of complex text than expectations for readers in the earliest grades, particularly kindergarten through second grade, where goals may relate more to basic word recognition and lower-level reading comprehension skills. Nonetheless, the overall small effects noted on standardized measures in high-quality studies illustrate that adolescence is not too late to intervene in reading and that student achievement in comprehension, word recognition, fluency, word reading fluency, and spelling can be improved in small amounts through extensive interventions.
In elementary RTI models, increasing time in intervention and decreasing instructional group size are two research-based recommendations for increasing the intensity of intervention (Harn, Kame’enui, & Simmons, 2007; Wanzek & Vaughn, 2008). Because of the limited research on RTI practices related to the secondary grades, secondary schools may look to findings from reading interventions conducted at the early elementary grades. The findings of this synthesis suggest caution in assuming that elementary RTI practices for reading interventions apply at the secondary level. Within the corpus of studies available for this synthesis, we did not find that differences in relative number of hours of implementation or instructional group size moderated student outcomes. We suggest that defining intensity variables for adolescent interventions and their role in a system of supports for students through RTI models requires further research to extend our understanding of these variables at the secondary level.
Authors
JEANNE WANZEK, PhD, is on the research faculty at the Florida Center for Reading Research and is an assistant professor in special education at Florida State University, 1107 W. Call Street, P.O. Box 306-4304, Tallahassee, FL 32306.
SHARON VAUGHN is the H.E. Hartfelder/Southland Regents Chair and the executive director of the Meadows Center for Preventing Educational Risk at The University of Texas at Austin. Her research addresses academically related interventions, primarily in reading, for students with reading and learning difficulties.
NANCY K. SCAMMACCA, PhD, is a research associate at the Meadows Center for Preventing Educational Risk in the College of Education at The University of Texas at Austin. Her primary research interests include meta-analysis methodology and other quantitative methods.
KRISTINA METZ is currently a doctoral candidate in school psychology at The University of Texas at Austin. Her current research interests include the implementation and dissemination of evidence-based interventions into school and community systems.
CHRISTY S. MURRAY is a project manager at the Meadows Center for Preventing Educational Risk at The University of Texas at Austin. She is co-principal investigator of the English Learner Institute for Teaching and Excellence (Project ELITE), project director of Tier I services for the Middle School Matters Institute, and previously served as deputy director of the Center on Instruction’s special education and response to intervention work.
GREG ROBERTS is director of the Vaughn Gross Center for Reading and Language Arts and the associate director of the Meadows Center for Preventing Educational Risk, both at The University of Texas. His interests include statistical modeling, program evaluation, and reading disability.
LOUIS DANIELSON, PhD, is a managing director at the American Institutes for Research. His research interests include response to intervention, learning disabilities, and assessment accommodations.
