Abstract
States are increasingly recommending that districts and schools use multi-tiered systems of support (MTSS) to improve reading outcomes for all students. States have also suggested MTSS is a viable service delivery model in response to new state legislation to screen, identify, and treat students with word-level reading disability (i.e., dyslexia). One model of MTSS that utilizes Enhanced Core Reading Instruction (ECRI MTSS) has demonstrated significant increases in students’ early acquisition of foundational reading skills (Smith et al., 2016). The purpose of this study was to conduct a conceptual replication of Smith et al.’s (2016) original impact study. In a cluster-randomized controlled trial, 44 schools were randomly assigned to the ECRI MTSS treatment or a business-as-usual (BAU) MTSS control condition. Across conditions, 754 students were assigned to receive Tier 2 intervention in addition to Tier 1 instruction. Impact data indicate moderate to strong effects on student decoding, word reading, and fluency skills for students in the ECRI MTSS schools. Results suggest that schools can use ECRI MTSS to improve foundational reading skills for struggling early readers, including students with or at risk for word-level reading disabilities (i.e., dyslexia).
Multitiered systems of support for early reading (MTSS-R) is proposed as an approach to service delivery that spans general, remedial, and special education (Baker et al., 2010; Fien et al., 2015). MTSS-R emphasizes high-quality instruction in general education, interventions for at-risk students that are matched to instructional need, and the use of data on student response to instruction and intervention to make educational decisions (Gersten et al., 2009). Student response data are used to assess whether students are in appropriate tiers of support, to determine whether interventions should be intensified, and in some cases to make specific learning disability (SLD) eligibility determinations (Fletcher et al., 2018). Toward the goal of identifying students with SLD, many experts suggest that response to intervention (RTI) within an MTSS-R approach offers significant advantages over models of identification that involve analysis of IQ–achievement discrepancy scores (e.g., Berninger & Abbott, 1994; Donovan & Cross, 2002; Fletcher et al., 2005). Experts cite discarding a “wait to fail” discrepancy model and provision of early intervention services as a particular advantage of RTI (Al Otaiba, 2014). However, a major assumption that underlies the validity of SLD eligibility determination is that schools are delivering high-quality classroom instruction (i.e., Tier 1) and providing at-risk students scientific, research-based interventions (i.e., Tier 2; Individuals With Disabilities Education Act [IDEA], 2004).
Although the assumption that schools are providing high-quality instruction and access to scientific, research-based interventions is debatable, the 2004 reauthorization of IDEA permits schools to use RTI in SLD eligibility determinations (IDEA, 2004). Additionally, Part B of the law allows schools to use special education funds to deliver early intervention supports prior to students being found eligible for special education, thus allowing such funds to be used to implement core components of MTSS-R (IDEA, 2004). Indeed, despite the special education attention associated with RTI, implementing RTI and MTSS-R has more to do with the nature of assessment and instructional practices in general education settings than in special education. That is, multitiered approaches require a shift in the way reading instruction is delivered in general education classrooms because MTSS-R models assume that the nature and quality of reading instruction in those classrooms, and the efficacy of interventions for struggling students, are causally linked to the rate of student progress and responsiveness to instruction and intervention (Baker et al., 2010).
There are strong and long-standing theoretical and conceptual bases for implementing models of MTSS-R (Fien et al., 2015). In fact, many states have begun to implement MTSS-R as a core feature of State Personnel Development Grants (SPDG; e.g., California, Colorado, Illinois, Nebraska, Oregon, Utah) to improve literacy scores for the entire population of elementary students, including students with or at risk for reading difficulties or disabilities. These MTSS-R approaches are purported to better support students at risk of failing to attain full literacy skills due to learning, attention, or behavioral difficulties and disabilities when compared with more traditional service delivery models. Additionally, many states have recently developed regulations and guidance for schools to use MTSS-R as an approach to screen, identify, and treat students with dyslexia (Gearin et al., 2018). Technically, dyslexia would be invoked as a word-level reading disability under SLD (Fletcher et al., 2018). Although there is ample evidence supporting key components of MTSS models (e.g., in particular, providing intensive, systematic intervention in Tier 2; Gersten et al., 2009), there is less and mixed evidence for specific models of MTSS for improving population-level achievement and for identifying and treating students with SLD, including dyslexia (Balu et al., 2015).
One recent and prominent study was a rigorous evaluation of RTI (Balu et al., 2015). In this large-scale study, schools were identified that had been utilizing RTI within MTSS for at least three years. Although the researchers did not try to manipulate or control the nature or type of interventions being delivered, they did attempt to control student assignment to intervention. Overall, the study found no positive effects, and some negative effects, for at-risk first graders assigned to reading interventions (Balu et al., 2015). However, there have been a number of criticisms of the study, including methodological and implementation issues (Fuchs & Fuchs, 2017; Gersten et al., 2017).
One MTSS-R model, enhanced core reading instruction (ECRI; Fien et al., 2015), has demonstrated strong promise and efficacy for improving teacher and student outcomes (Nelson et al., 2013; Smith et al., 2016). ECRI “enhances” core reading programs by increasing the explicitness and quality of Tier 1 instruction. Additionally, ECRI provides systematic Tier 2 reading intervention that is highly aligned with the scope and sequence of the core reading program. Tier 1 and Tier 2 material is coupled with intensive professional development and coaching to support implementation of instruction, intervention, and data-based decision making (Baker et al., 2015). The primary reasons ECRI was developed were to (a) make Tier 1 instruction more accessible to at-risk readers based on the premise that these students spend the vast majority of their time in core instruction, (b) provide more vertical practice in the same instructional content via small-group intervention using the same content from Tier 1, and (c) support the validity of using RTI as a method to make SLD eligibility determinations by examining student response when provided high-quality Tier 1 instruction and scientific, research-based Tier 2 interventions (Baker et al., 2010).
Prior ECRI research
In a cluster-randomized controlled trial (Smith et al., 2016), schools were randomly assigned to ECRI or a business-as-usual (BAU) comparison condition. To participate, all schools had to have essential features of MTSS-R in place (e.g., Tier 1 and 2 instruction, student assessments). Impact analyses targeted outcomes measured by the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) Nonsense Word Fluency (NWF) and Oral Reading Fluency (ORF), the Woodcock Reading Mastery Test (WRMT) Word Identification (Word ID) and Word Attack subtests, and the Stanford Achievement Test, 10th Edition (SAT10) Total Reading, Word Reading, and Sentence Reading subtests. All effect sizes (Hedges’s g) favored ECRI students versus comparison students and ranged in value from 0.12 (ORF, fall to spring) to 0.32 (WRMT Word Attack). Differences on WRMT Word Attack and SAT10 Word Reading were statistically significant. In practical terms, 42% of comparison group Tier 2 students reached or exceeded the 30th percentile on SAT10 Word Reading at posttest, whereas 55% of ECRI Tier 2 students reached or exceeded the 30th percentile.
In a regression discontinuity study, Baker et al. (2015) used the 22 ECRI treatment schools in the cluster-randomized controlled trial to examine differences in learning between ECRI Tier 2 students and ECRI Tier 1 students. The study addressed whether students in ECRI Tier 2 were “catching up” to their peers who received ECRI Tier 1 instruction alone. Analyses revealed statistically significant intervention effects on SAT10 Total Reading (d = 0.22) and on each SAT10 subtest: Word Study (d = 0.18), Word Reading (d = 0.15), Sentence Reading (d = 0.27), and Reading Comprehension (d = 0.17). In practical terms, at the cut score, Tier 1 students performed at the 31st percentile on the SAT10 Total Reading subtest, whereas Tier 2 students scored at the 39th percentile. This corresponds to a What Works Clearinghouse (WWC; 2017) +8 improvement index.
Nelson et al. (2013) investigated whether ECRI increased the quality and intensity of explicit instruction in Tier 1 first-grade classrooms. Direct observation using the Quality of Explicit Instruction Scale and the Classroom Observations of Student–Teacher Interactions instrument (Smolkowski & Gunn, 2012) revealed that ECRI teachers provided higher-quality explicit instruction than comparison teachers. The magnitude of this difference was large (g = 1.31, pseudo R² = .74). In addition, ECRI teachers provided higher rates of group practice than comparison teachers, and the effect was also large (g = 1.63, pseudo R² = .63). This finding is important because group practice offers a viable way of substantially increasing the opportunities that students in Tier 1, in particular struggling readers, have to participate in and practice key lesson objectives.
The Present Study
The first aim of the present study was to conduct a conceptual replication of the Smith et al. (2016) study to examine the effect of ECRI MTSS-R on first-grade at-risk readers’ foundational reading skills relative to a BAU MTSS-R control condition. Coyne et al. (2016) argue that conceptual replications should start with direct replications, if feasible, or with closely aligned conceptual replications and move toward more distal conceptual replications over time. Toward that end, we conducted a highly aligned replication study in this first follow-up of the original Smith et al. study. The only substantive change between the current study and the prior study is that the current study includes two new waves of students, attending the same schools as the original study. In both the original and present study, we chose to use the 30th percentile as the upper bound for placement in Tier 2 and the 10th percentile as the lower bound (students below the 10th percentile were placed in their school’s Tier 3 intervention). We chose these cutoffs for the following three reasons. First, we believe the 30th percentile and below more accurately reflects students as being at risk for mild to moderate forms of word-level reading disability (i.e., dyslexia). Second, the districts had a higher-than-usual number of students indicated as at risk and could not afford to serve the sheer number of students scoring as high as the 40th percentile. Third, recent findings have shown that the 40th percentile may be too high for students to benefit from Tier 2 intervention (Balu et al., 2015). Given the major similarities with the original study (i.e., setting, grade level, intervention delivery, outcome measures, research design, analyses) and using Coyne et al.’s framework, the present study represents a closely aligned conceptual replication that varies the participants to include two new cohorts of students from the same setting of schools as the original study. 
Table 1 summarizes the key similarities and differences between the original Smith et al. study and the current replication study.
Original Study and Replication Study Comparison.
Note. Original study refers to Smith et al. (2016). Bolded text identifies primary differences between the original study and the replication study. COSTI = Classroom Observations of Student Teacher Interactions; CTOPP = Comprehensive Test of Phonological Processing; DIBELS = Dynamic Indicators of Basic Early Literacy Skills; ECRI = enhanced core reading instruction; IPS = Instructional Practices Survey; MTSS-R = multitiered systems of support for early reading; NWF = Nonsense Word Fluency; ORF = Oral Reading Fluency; PD = professional development; PPVT-4 = Peabody Picture Vocabulary Test–4th Edition; QEI = Quality of Explicit Instruction Scale; RCMIS = Ratings of Classroom Management and Support; SAT10 = Stanford Achievement Test, 10th Edition; TKS = Teacher Knowledge Survey; WRMT = Woodcock Reading Mastery Test–Revised.
Consistent with the original study, in the present study we hypothesized that students in ECRI treatment schools would outperform students in the control schools on important foundational reading outcomes that are crucial for students with or at risk for dyslexia, including decoding, word reading, and reading fluency with connected text. The second aim of the study was, similar to Smith et al. (2016), to examine predictors of differential response to ECRI to understand if student, classroom, or school factors moderated ECRI intervention effects (see Table 1 for a list of the moderators tested). One major difference between this replication and Smith et al. is that we did not test instructional practices as a moderator of ECRI effects. Complete observations of explicit instructional practices were not possible due to budget constraints. For the most part, we did not hypothesize any student factors to moderate intervention effects. In fact, we assumed that students would respond equally well to ECRI regardless of limited English proficiency status, special education status, or gender. Our one hypothesized student-level moderator was initial skill status on pretest foundational reading skill tests, such that students at the upper end of the distribution of at-risk readers would respond more strongly to ECRI than students at the lower end of the distribution of at-risk readers. This hypothesis is based on prior ECRI findings (Fien et al., 2015; Smith et al., 2016) and the fact that ECRI was designed to meet the needs of students using moderate levels of support that can be feasibly provided as a supplement to Tier 1 classroom instruction (e.g., Tier 2) as opposed to targeting the more intensive needs of students with protracted reading difficulties (e.g., Tier 3).
Our interest in examining the effect of ECRI on students with and at risk for word-level reading disabilities (i.e., dyslexia) is timely and relevant. Over the past decade, the great majority of states in the United States (N = 43) have enacted, and several more are currently considering enacting, guidance or statutes to implement dyslexia screening, identification, and intervention supports, and some states have mandated raising teachers’ awareness of dyslexia through preservice and in-service provisions. To be clear, “dyslexia” is synonymous with “word-level reading disability” and as such can be invoked as SLD under IDEA. There is a strong literature base (e.g., Fletcher et al., 2018) indicating that we have the knowledge to screen students at risk for word-level reading disabilities and can effectively treat students if we identify them early in their schooling and provide them foundational reading interventions (Fletcher et al., 2005; Fuchs et al., 2008). The original Smith et al. (2016) study found that we can implement this research base within an MTSS reading model and can substantively improve word reading and reading fluency outcomes with this important subgroup. The current replication study seeks to answer whether we can replicate these findings with a new cohort of students. This finding would be important as many districts and schools seek models to serve these students. Many researchers hypothesize that MTSS offers a unified approach for serving students who are at general risk for reading difficulties as well as those students who are at specific risk for word-level reading disabilities (Al Otaiba et al., 2018; Gearin et al., 2018). This study has the potential to confirm or disconfirm this hypothesis.
Method
The present study was designed to examine the efficacy of the ECRI multitiered reading intervention in first-grade classrooms in a cluster-randomized controlled trial that nested students and teachers within schools. In this study, we report results from two waves of schools (N = 44) that implemented the ECRI intervention for 2 years each. The present study evaluates outcomes from both waves’ 2nd year of implementation of ECRI, which we call Study 2.
Recruitment and Assignment Procedures
Schools implementing a multitiered service delivery model for reading instruction were recruited to participate in the ECRI project in the spring prior to the 1st year of implementation (Study 1). To be eligible to participate, schools agreed to (a) use a published, comprehensive core reading program identified and adopted through standard district procedures during a 90-min reading block for Tier 1 and (b) provide students identified for Tier 2 with an additional 30 min of small-group instruction.
In the first project wave, we recruited 22 schools in three Oregon school districts to participate in Project ECRI. Four of these schools elected to not participate in the study due to changes in school leadership between recruitment and the beginning of the study. The remaining 18 schools were randomly assigned to the treatment or comparison condition. We blocked on district before random assignment to control for core curricula and other important factors, ensuring similar schools in each condition. Because there was one district with only one participating school, that school joined another district for randomization. After random assignment, two schools (one treatment and one comparison school) left the project, leaving 16 schools participating in Wave 1. In the second project wave, we recruited 20 schools in three districts in Oregon and eight schools in three districts in Massachusetts to participate and randomly assigned all schools to condition. All recruited Wave 2 schools participated in the study; thus, the combined wave sample included 44 schools. All schools that participated in Study 1 (Waves 1 and 2) continued to implement ECRI for a 2nd year. This study (Study 2) summarizes data from the ECRI project in schools’ 2nd year of implementation.
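The blocking step described above can be illustrated with a minimal sketch. It is not the project's actual randomization procedure: the function name, school identifiers, block labels, block sizes, and seed are all hypothetical, and the sketch simply assumes that schools are shuffled within each district block and split as evenly as possible between conditions.

```python
import random
from collections import defaultdict


def assign_within_blocks(schools, seed=0):
    """Randomly assign schools to conditions within each block.

    `schools` is a list of (school_id, block) pairs; returns a dict
    mapping school_id to "treatment" or "comparison".
    """
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for school_id, block in schools:
        by_block[block].append(school_id)

    assignment = {}
    for block in sorted(by_block):
        members = sorted(by_block[block])
        rng.shuffle(members)
        # Randomize which condition gets the extra school in an odd-sized block.
        first, second = rng.sample(["treatment", "comparison"], 2)
        half = (len(members) + 1) // 2
        for i, school_id in enumerate(members):
            assignment[school_id] = first if i < half else second
    return assignment


# Hypothetical roster of 18 schools in two blocks; the single-school district
# is assumed to have been merged into another district's block beforehand.
# The block sizes (10 and 8) are illustrative, not taken from the study.
roster = [(f"school_{i:02d}", "block_1" if i < 10 else "block_2") for i in range(18)]
assignment = assign_within_blocks(roster, seed=42)
```

Blocking in this way keeps the number of treatment and comparison schools balanced within each district, so that district-level factors such as the adopted core program are not confounded with condition.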
Assignment to Tiers
Students enrolled in first-grade classrooms (N = 3,560) were recruited to participate in the project (1,761 in treatment schools and 1,799 in comparison schools). Of these students, we obtained parental consent for 3,547 students (1,756 in treatment schools and 1,791 in comparison schools). Fall scores on the reading portion of the SAT10 were used to assign students to Tier 1, Tier 2, or Tier 3. Students who scored at or above the 10th percentile and at or below the 30th percentile on the SAT10 in the fall of first grade were assigned to Tier 2, using normative criteria from the SAT10 (2007) technical manual. Students who scored above the 30th percentile on the SAT10 were assigned to Tier 1, and those below the 10th percentile were assigned to Tier 3. Across the two waves of the study, 3,161 students participated in the fall assessment (1,533 in the treatment condition and 1,628 in the comparison condition). In total, 1,551 students were assigned to Tier 1 (n = 745, treatment; n = 806, comparison), 754 students were assigned to Tier 2 (n = 352, treatment; n = 402, comparison), and 827 students were assigned to Tier 3 (n = 417, treatment; n = 410, comparison) in the fall of first grade. Of the students who took the SAT10 in the fall, 29 did not finish the assessment due to absences and were not included in the analytic sample.
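The tier-assignment rule and the sample counts reported above can be expressed as a short sketch. The function and variable names are illustrative assumptions; the cut points and counts are taken directly from the text.

```python
def assign_tier(percentile):
    """Map a fall SAT10 percentile rank to a tier using the reported cut points."""
    if percentile < 10:
        return 3  # below the 10th percentile
    if percentile <= 30:
        return 2  # 10th through 30th percentile, inclusive
    return 1      # above the 30th percentile


# Counts as reported in the text, by tier and condition.
tier_counts = {
    1: {"treatment": 745, "comparison": 806},
    2: {"treatment": 352, "comparison": 402},
    3: {"treatment": 417, "comparison": 410},
}
fall_assessed = 3161   # students who participated in the fall assessment
incomplete = 29        # students who did not finish the fall SAT10

assigned_total = sum(sum(by_cond.values()) for by_cond in tier_counts.values())
tier2_total = sum(tier_counts[2].values())
```

Under these figures, the three tier totals sum to 3,132, which equals the 3,161 students assessed in the fall minus the 29 who did not finish, and the Tier 2 total matches the 754 students in the analytic sample.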
Participants
Study 2 took place in 44 elementary schools, 22 in each condition. A total of 146 Grade 1 reading teachers participated in this study: 72 in the treatment condition and 74 in the comparison condition. Across conditions, on average, participating teachers reported 13.84 total years of teaching experience (SD = 9.82). Teaching experience was similar across conditions (M = 14.16, SD = 10.28 years, for treatment teachers; M = 13.58, SD = 9.52 years, for comparison teachers). Nearly all of the participating teachers (97%) were female. A total of 754 students at risk for reading difficulty were assigned to Tier 2 and included in this analysis (n = 352, treatment; n = 402, comparison). Schools reported that on average, 5.0% of these students received special education services (5.3% in treatment; 4.8% in comparison) and 18.2% were English learners (24.3%, treatment; 12.7%, comparison). Using data from the National Center for Educational Statistics (2011) to calculate sample averages, 19.8% of students identified as Hispanic (22.8% in treatment; 16.9% in comparison), and 3.9% of students identified as African American (4.8% in treatment; 3.0% in comparison). Just over half of all students in the sample were eligible for free or reduced-price lunch (50.3%: 54.6% in treatment; 46.0% in comparison). Independent-sample t tests at the school level revealed no statistically significant differences between conditions in any of these demographic characteristics.
Implementation
Treatment and comparison teachers provided daily reading instruction using a published, comprehensive core reading program, which was identified and adopted through standard district procedures, during a 90-min reading block for Tier 1. Students identified for Tier 2 received an additional 30 min of small-group reading instruction each day. Tier 3 intervention was not a focus of the project (i.e., schools were permitted to select and provide interventions to students assigned to Tier 3 following typical practice in both conditions); thus, we do not describe or analyze Tier 3 support in this study.
Treatment Condition
The ECRI intervention was designed to (a) enhance the quality and explicitness of instruction provided in Tier 1 and Tier 2 using lesson maps that prioritized essential content from the core reading program, (b) increase the specificity of the instructional materials used in both tiers using explicit teaching routines, and (c) increase alignment and cohesion between Tier 1 and Tier 2 to support the effectiveness of intervention and struggling readers’ access to Tier 1 instruction. The ECRI intervention includes three primary components: Tier 1 enhanced core reading instruction, Tier 2 small-group instruction, and initial and ongoing professional development and coaching. In addition, in Study 2, school staff engaged in regular data-based decision making to inform adjustments to instruction within and across tiers of support across the school year. The primary components are described in greater detail next.
Tier 1
ECRI treatment classrooms provided 90 min of daily Tier 1 whole-group instruction using the district adopted core reading program (see a list of core programs used across districts in the Comparison Condition section), enhanced through the use of lesson maps. Lesson maps that overlaid the core program guided teachers to prioritize essential reading content (e.g., phonics instruction, vocabulary and comprehension development), supplant activities from the core that relied on inefficient means for teaching reading content, and use explicit teaching routines in the delivery of instructional activities (Baker et al., 2010). Using teaching routines, core program activities were restructured to emphasize the following elements of explicit instruction: (a) clear learning objectives; (b) increased modeling of key content through visual models, verbal directions, and clear explanations; (c) explicit connections between new and previous content; (d) increased opportunities for guided and independent practice; and (e) deliberate and carefully designed review of previous content (Carnine & Kameenui, 1992; Coyne et al., 2011). The explicit teaching routines supplied teachers in the treatment condition with language for explicitly modeling content, providing frequent choral practice opportunities, and delivering immediate feedback (Fien, 2009–2013).
Tier 2
The Tier 2 intervention was highly aligned with Tier 1 instruction to support cohesion and improve access to content for students struggling to learn to read. Students receiving Tier 2 intervention were pretaught content in daily, 30-min lessons in small groups of three to five students. That is, students in the treatment condition assigned to receive Tier 2 intervention were instructed in small groups using the same content and instructional formats they would encounter the following day during Tier 1 foundational skills instruction. The majority of Tier 2 small-group instructors were instructional assistants. The content of small-group lessons emphasized foundational reading skills, including phonemic awareness, decoding, and fluency and accuracy in reading connected text. Because the Tier 2 intervention was aligned to each school’s core program, the scope and sequence and introduction of foundational skills content varied slightly by core program (e.g., the sequence for introducing particular letter–sound correspondences varied by core program), but lessons contained the same activities each day. Each 30-min, daily lesson included seven activities focused on essential aspects of reading content: irregular word reading, phonemic awareness, sound spelling introduction and review, blending and word reading, accuracy and fluency reading decodable text, encoding practice, and reteaching of challenging words.
Professional development and coaching
Teachers in the treatment condition participated in 3 days of professional development focused on the ECRI instructional model prior to the beginning of schools’ 1st year of participation and 2 days of follow-up activities in October of the 1st year. New teachers in participating schools received the same training in preparation for and during schools’ 2nd year of participation. Professional development for teachers emphasized (a) an overview of research on beginning reading content and skills, including phonemic awareness, phonics, vocabulary, comprehension, and fluency in reading connected text; (b) implementing instructional teaching routines with fidelity, including opportunities for practice with feedback from ECRI expert coaches that delivered the training; and (c) strategies for increasing student engagement in lessons. Small-group instructors received 2 days of professional development in the fall and 1 day in January of schools’ 1st year of participation. New small-group instructors in participating schools received the same training in preparation for and during schools’ 2nd year of participation. Professional development for small-group instructors emphasized implementing instructional teaching routines and strategies for increasing student engagement in lessons. All professional development included teaching demonstrations by ECRI expert coaches and participant practice with coach feedback.
New and returning teachers and small-group instructors also received comprehensive coaching support through classroom and small-group visits conducted once per month and regular study-group meetings, facilitated by an ECRI coach. Classroom and small-group visits consisted of the ECRI coach observing instruction, collecting data on fidelity of implementation to be used in debrief meetings, and modeling lessons or parts of lessons to provide live examples for improving fidelity of implementation. Regular study-group meetings involved the coach meeting with teachers and interventionists to (a) discuss strategies for improvement on the basis of observations and instructor-identified areas of struggle and (b) engage in goal setting to support prioritization of areas of focus between coaching visits.
Comparison Condition
Comparison schools used the standard, adopted core reading program for Tier 1 instruction (e.g., Scott Foresman, Reading Street; Houghton Mifflin, Journeys; Macmillan, Treasures; Harcourt, Storytown and Success for All). Comparison teachers reported that Tier 1 classrooms spent an average of 52.5 min (SD = 31.0) in whole-group instruction, 34.5 min (SD = 26.3) in small-group instruction, and 27.9 min (SD = 15.6) in independent work. Comparison schools continued with BAU to deliver Tier 2 small-group instruction (i.e., they were not trained in or provided access to ECRI Tier 2 intervention materials as part of the study), which may have included regrouping students across tiers based on progress-monitoring data. Teachers in the comparison condition reported that Tier 2 intervention materials included a range of supplemental and intervention materials, including published, standardized protocol intervention materials (e.g., publisher-sponsored supplemental programs intended to be used in conjunction with the adopted core program) as well as teacher-developed materials (e.g., mini-lessons or additional practice with specific reading skills students were identified as struggling to learn or demonstrate). In the comparison condition, 62% of teachers reported that someone in their school provided them with instructional support, guidance, and coaching about teaching students how to read.
Fidelity of Implementation
Observations of implementation fidelity, conducted by trained data collectors three times per year (fall, winter, and spring) using a standardized protocol across conditions, indicated that all treatment teachers used ECRI intervention materials during instruction (M = 1.00, SD = 0.00). As expected, observations conducted in comparison classrooms indicated comparison teachers rarely used ECRI intervention materials during instruction across the year (M = 0.07, SD = 0.26). Although treatment diffusion across the year was minimal, some comparison teachers did have access to intervention materials, which may have resulted from teachers sharing ECRI materials or from comparison teachers being familiar, through other literacy initiatives, with the instructional routines that are a core feature of ECRI. In addition, in treatment classrooms, the mean score for quality of explicit instruction was 0.89 (SD = 0.17), whereas it was 0.49 (SD = 0.25) in comparison classrooms. Thus, teachers in ECRI treatment classrooms provided higher-quality instruction, on average, when compared with teachers in comparison classrooms. In ECRI treatment classrooms, overall fidelity of implementation was 0.90 (SD = 0.13).
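As a rough consistency check on the materials-use statistics, the sketch below assumes that use of ECRI materials was coded as a 0/1 indicator per observation. That coding is an assumption; the text does not state how the item was scored. For a binary indicator with mean p, the population standard deviation is the square root of p(1 − p).

```python
import math


def binary_sd(p):
    """Population standard deviation of a 0/1 indicator with mean p."""
    return math.sqrt(p * (1.0 - p))


# Reported means for use of ECRI materials (0/1 coding is assumed, not stated).
sd_comparison = binary_sd(0.07)  # comparison classrooms, M = 0.07
sd_treatment = binary_sd(1.00)   # treatment classrooms, M = 1.00
```

Under this assumption, the implied values round to the reported SDs of 0.26 and 0.00, which is consistent with a yes/no item, although a sample SD or a per-teacher average across the three observations would differ slightly.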
Data Collection
Students were assigned to tiers based on their performance on the SAT10, administered in the fall of first grade. Outcome measures included curriculum-based assessments of foundational reading skills and comprehensive measures of reading achievement. We selected a range of other assessments to measure student and classroom attributes we hypothesized would moderate response to the ECRI intervention, including data from teacher surveys. Student-level demographic data, including special education status, gender, and limited English proficiency status, were reported by districts in the spring of each academic year. Class size, school size, and number of Tier 2 students per first-grade classroom were derived from tier assignment files and based on student participation in the fall screening assessment.
Student Assessment Measures and Procedures
Prior to fall assessment administration, assessors attended 3 days of training targeting administration and scoring procedures. Across the winter and spring, assessors attended 4 days of additional assessment training. For individually administered student measures, assessment coordinators evaluated interrater agreement by shadow scoring with assessors and providing feedback on test administration. Interrater agreement is reported for assessments in the next paragraphs.
SAT10
The SAT10 (Harcourt Educational Measurement, 2002) is a group-administered, norm-referenced test of reading proficiency. Total scaled scores, derived from grade-based norms, were used in all analyses. Trained data collectors administered the Stanford Early School Achievement Test (SESAT) 2 in the fall—the appropriate version of SAT10 in the fall of Grade 1—and the Primary 1 in the spring of Grade 1 to all participating students. The test manual indicates that Kuder-Richardson reliability coefficients, a measure of internal consistency, are .94 for SESAT 2 and .97 for Primary 1. Also, SESAT 2 and Primary 1 Total Reading scores are moderately correlated with Otis-Lennon School Ability Test, Eighth Edition, Total scores (r = .68 and .61, respectively). SESAT 2 subtests administered included Sounds and Letters, Word Reading, and Sentence Reading; Primary 1 subtests administered included Word Study Skills, Word Reading, Sentence Reading, and Reading Comprehension. Testing time ranged from 110 to 155 min. Results of the fall administration were used to assign students to tiers, and spring results served as the primary reading outcome measure in the study.
DIBELS
DIBELS assessments were used to measure phonemic decoding skill and passage reading fluency. NWF and ORF measures were administered in the fall, winter, and spring using the DIBELS-specified fall, winter, and spring benchmark measures, respectively. Trained data collectors administered DIBELS assessments in all districts. Average interrater agreement between data collectors was 92.9% (range = 87%–100%) for NWF across the study; average interrater agreement for ORF was 97.9% (range = 94%–100%) across the study.
NWF
NWF (Kaminski & Good, 1996) is an individually administered, 1-min, timed measure of student skill in reading consonant-vowel and consonant-vowel-consonant pseudowords. Students can either give the sounds of the individual letters or read the pseudoword as a unit. The score for Correct Letter Sounds (NWF-CLS) was obtained by counting the number of correct letter sounds students provided. The score for Words Recoded Completely and Correctly (NWF-WRC) was obtained by counting the number of nonwords students recoded accurately. Alternate-form reliability coefficients range from .67 to .80, and concurrent validity coefficients with readiness subtests of the Woodcock-Johnson Psycho-Educational Test ranged from .35 to .55 (Good & Kaminski, 2002).
ORF
DIBELS ORF (Good & Kaminski, 2002) is an individually administered, 1-min, timed measure of student skill in accurately and fluently reading connected text. The number of words read correctly in 1 min is the student’s score on a single passage. To determine a student’s benchmark score, the student is administered three grade-level passages at a single benchmark assessment time point during the school year (beginning, middle, or end), and the median score is recorded. In the beginning grades, alternate-form reliability coefficients range from .89 to .94, and test-retest reliability coefficients range from .92 to .97 (Good & Kaminski, 2002).
WRMT
The WRMT (Woodcock, 1998) is a standardized, comprehensive battery of tests that measures multiple aspects of reading ability, including comprehension, word recognition, and word analysis. Two subtests were administered in this study: Word ID and Word Attack. As reported in the testing manual, the correlation between Total Reading scores on the WRMT and the Woodcock-Johnson Psycho-Educational Battery is .88 at Grade 1. Internal consistency for the subtests ranges from .94 to .98 in Grade 1. We administered Form H of the WRMT in the fall and spring for Wave 1 only. Average interrater agreement between data collectors across the study was 99% (range = 95%–100%).
Teacher Surveys
Two teacher surveys were administered in the spring of first grade to assess teachers’ knowledge of beginning reading instruction and espoused instructional practices in reading across the school year. The surveys were administered online in the same session. The completion rate for the surveys was 73% (n = 108 teachers responded).
Teacher Knowledge Survey (TKS)
The TKS (see Teaching Reading Essentials survey; Moats, 2006) was designed to assess teacher knowledge of reading concepts (e.g., “How many spoken syllables are in the word ‘rhythm’?”) and reading instructional practices (e.g., “True or false? In whole-word blending, all the phonemes of a word are articulated in order and then blended in sequence.”). The survey contains 17 multiple-choice questions and 18 true-or-false questions. Two items from the original TKS were adapted for ECRI administration to explore teacher knowledge related to key features of the intervention. The total score, a sum of the number of correct responses, was used for analysis.
Statistical Analysis
We assessed intervention effects on each of the primary outcomes with a mixed-model (multilevel) Time × Condition analysis (Murray, 1998) to account for the intraclass correlation associated with students nested within schools, the level of random assignment. The analysis tests net differences between conditions on change in outcomes from the fall (T1) to spring (T2) of Grade 1, with gains for individual students clustered within schools. The test of net differences provides an unbiased and straightforward interpretation of the results (Cribbie & Jamieson, 2000; Fitzmaurice et al., 2004). The specific model tests time, T, coded 0 at T1 and 1 at T2; condition, C, coded 0 for control and 1 for ECRI; and the interaction between the two with the following composite model:
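Written out from the coefficient definitions that follow, the composite model takes roughly the form below. The four fixed-effect terms come directly from those definitions. The symbols for the four error terms (u_{0k}, u_{1k}, r_{0jk}, e_{tjk}) are assumed notation for illustration, chosen to match a conventional multilevel layout with a school-level intercept, a school-level gain, a student-level intercept, and a residual.

```latex
Y_{tjk} = \gamma_{000} + \gamma_{001} C_{k} + \gamma_{100} T_{tjk} + \gamma_{101} T_{tjk} C_{k}
          + u_{0k} + u_{1k} T_{tjk} + r_{0jk} + e_{tjk}
```

With T coded 0 at T1 and 1 at T2 and C coded 0 for control and 1 for ECRI, the expected gain is \gamma_{100} in control schools and \gamma_{100} + \gamma_{101} in ECRI schools, so \gamma_{101} is the net difference in gains.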
Ytjk represents a score for assessment occasion t on student j in school k. The model includes three predictors: time, Ttjk; condition, Ck; and their interaction. Given the coding of C and T, the model included the pretest intercept for the control condition, γ000; the difference between conditions at pretest, γ001; the estimate of gains for the control condition, γ100; and the difference in gains between conditions, γ101, the primary estimate of intervention efficacy. The model also includes four error variances: the school-level intercept,
Analyses included the students assigned to Tier 2. For schools in recruitment Wave 1, Tier 2 included students who scored in the 10th to 30th percentiles on the SAT10 Total Reading scale in the fall of first grade. For schools in recruitment Wave 2, the assignment criteria changed and included students between the 15th and 40th percentiles (exclusively, i.e., >15th and <40th percentile). The WRMT was not administered in Wave 2 schools. The secondary aim required an analysis to explore differential response due to various student-, classroom-, and school-level variables (see Table 1 for a list of moderators). We expanded the model to test interaction effects. The statistical model included a predictor and its interaction with condition, time, and the Time × Condition term, resulting in a three-way interaction, all corresponding two-way interactions, and individual (conditional) effects. The three-way interaction of the predictor, time, and condition provides an estimate of whether condition effects vary by the predictor. The analysis included dichotomous and continuous predictors, and we used continuous variables whenever possible.
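As a sketch of the expanded moderation model described above, the fixed-effect part can be written as follows. The predictor symbol P and all coefficient subscripts are assumed notation for illustration, not taken from the source; P may vary at the student, classroom, or school level, so its subscripts are left off, and the error terms are omitted for brevity.

```latex
E[Y] = \gamma_{000} + \gamma_{001} C + \gamma_{010} P + \gamma_{011} P C
     + \gamma_{100} T + \gamma_{101} T C + \gamma_{110} T P + \gamma_{111} T P C
```

Under this coding, the difference in gains between conditions at a given value of the predictor is \gamma_{101} + \gamma_{111} P, so \gamma_{111} carries the test of differential response.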
Model Estimation
We fit models to our data with SAS PROC MIXED Version 9.2 (SAS Institute, 2009) using restricted maximum likelihood and included all available data, whether or not students’ scores were present at both time points. Maximum likelihood estimation with all available data produces potentially unbiased results even in the face of substantial attrition, provided the missing data were missing at random (Schafer & Graham, 2002). In the present study, we did not believe that attrition or other missing data represented a meaningful departure from the missing-at-random assumption, meaning that missing data did not likely depend on unobserved determinants of the outcomes of interest (Little & Rubin, 2002). In addition, data missing not at random is “often not sufficient to affect the internal validity of an experimental study” (Graham, 2009, p. 568). Most missing data involved students who were absent on the day of assessment due to illness or other reasons unrelated to the study or who had transferred to a new school.
The models assume independent and normally distributed observations. We addressed the first, more important assumption (van Belle, 2008) by explicitly modeling the multilevel nature of the data. Regression methods have been found quite robust to violations of normality, and outliers have a limited influence on the results in a variety of multilevel modeling scenarios (Bloom et al., 1999; Donner & Klar, 1996; Fitzmaurice et al., 2004; Hannan & Murray, 1996; Murray et al., 2006). Murray and colleagues (2006) showed that violations of normality at either or both the individual and group levels do not bias results as long as the study is balanced at the group level.
Hedges’s g values were calculated to characterize the magnitude of treatment effects, and we used the Benjamini-Hochberg procedure to adjust for the false discovery rate across tests of ECRI efficacy (WWC, 2017).
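The two computations named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the exact quantities entered into Hedges's g for the mixed-model estimates are not stated here, so `hedges_g` uses the textbook two-group form with a pooled standard deviation, and all input values in the usage lines are hypothetical.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges's small-sample correction.

    Textbook two-group form (an assumption; the study's mixed-model
    version may differ in how the numerator and SD are defined).
    """
    df = n_t + n_c - 2
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    return d * correction

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical inputs, for illustration only.
print(round(hedges_g(10.0, 8.0, 4.0, 4.0, 20, 20), 3))
print([round(p, 3) for p in benjamini_hochberg([0.01, 0.04, 0.03])])
```

Each adjusted p value is the raw p value scaled by the number of tests divided by its rank, capped so that adjusted values never decrease as raw p values increase.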
Results
Table 2 provides descriptive statistics (M, SD, n) for all primary measures used for tests of the impact of the ECRI intervention. Next, we test for differences in attrition by condition and differential scores for students missing data by condition. We then test the efficacy of ECRI and explore the potential for differential response to condition by student characteristics.
Table 2. Descriptive Statistics for Primary Outcome Measures.
Note. The full sample included 757 students in 44 schools, with 351 students in the 22 ECRI schools and 406 students in the 22 control schools. WRMT measures were assessed in Wave 1 schools only. ECRI = enhanced core reading instruction; NWF-CLS = Nonsense Word Fluency–Correct Letter Sounds; NWF-WRC = Nonsense Word Fluency–Words Recoded Completely and Correctly; ORF = Oral Reading Fluency; SAT10 = Stanford Achievement Test, 10th Edition; WRMT = Woodcock Reading Mastery Test–Revised.
Attrition
Student attrition was defined as students with data at T1 but missing data at T2, and we examined attrition with respect to the Tier 2 sample of 757 students, 406 in comparison schools and 351 in ECRI schools. For DIBELS data, we experienced 8.5% attrition at T2, with 39 students missing T2 data in comparison schools and 25 students missing T2 data in ECRI schools, χ2(1) = 1.50, p = .2207. SAT10 scores were missing for 6.7% of students at T2, with 31 students missing T2 data in comparison schools and 20 students missing T2 data in ECRI schools, χ2(1) = 1.12, p = .2889. We conclude that the different attrition rates were consistent with the assumption of equal attrition across study conditions. See Table 2 for additional details.
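The attrition tests above can be checked from the reported counts. The sketch below assumes a Pearson chi-square test on a 2 × 2 table without a continuity correction; the source does not say which variant was used, but under that assumption the reported statistics are reproduced to rounding. The function name is hypothetical.

```python
import math

def attrition_chi_square(missing_a, total_a, missing_b, total_b):
    """Pearson chi-square (df = 1, no continuity correction) for a 2 x 2
    table of missing vs. retained students in two conditions."""
    observed = [
        [missing_a, total_a - missing_a],
        [missing_b, total_b - missing_b],
    ]
    n = total_a + total_b
    row_totals = [total_a, total_b]
    col_totals = [missing_a + missing_b, n - missing_a - missing_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For df = 1, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts reported in the text: comparison schools first, then ECRI schools.
dibels = attrition_chi_square(39, 406, 25, 351)
sat10 = attrition_chi_square(31, 406, 20, 351)
print(f"DIBELS: chi2 = {dibels[0]:.2f}, p = {dibels[1]:.4f}")  # 1.50, 0.2207
print(f"SAT10:  chi2 = {sat10[0]:.2f}, p = {sat10[1]:.4f}")   # 1.12, 0.2889
```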
Although differential rates of attrition are undesirable, differential scores on literacy tests by condition present a greater threat to validity (Barry, 2005). We conducted an analysis to test whether student scores were differentially affected by attrition across conditions. We examined the effects of condition, attrition status, and the interaction between the two on pretest scores within a mixed-model analysis of variance (Murray, 1998), which nests students’ T1 scores within schools and condition. We tested scores for NWF-CLS, NWF-WRC, ORF, SAT10 Total Reading score, SAT10 Word Reading, and SAT10 Sentence Reading. We found no evidence of differential attrition for any of our dependent variables, p > .35 for all tests.
ECRI Efficacy
We tested the hypothesis that students in ECRI schools would perform better than those in comparison schools with six measures. We first examined differences between conditions in gains on NWF-CLS, NWF-WRC, and ORF from fall to winter and report the results in Table 3. Students in schools that implemented ECRI outperformed students in comparison schools on the two NWF measures, with effect sizes of g = 0.31 (pBH = .0220) and 0.37 (pBH = .0109). The difference between conditions for ORF was marginally significant (g = 0.20, pBH = .0806). We also report the intraclass correlation for gains as described by Murray (1998; see p. 301).
Table 3. Results from Mixed-Model Time × Condition Analysis for Tests of Condition Effects on Fall-to-Winter Gains in Student Achievement.
Note. Table entries show parameter estimates with standard errors in parentheses except for ICCs, Hedges’s g values, and p values. Tests of fixed effects (first four rows) used 42 degrees of freedom to account for the school as the unit of analysis. The student variance component reflects the student-level covariation between pretest and posttest assessments. P values also provided with the Benjamini-Hochberg correction (pBH). ICC calculated as per Murray (1998, p. 301). ICC = intraclass correlation coefficient; NWF-CLS = Nonsense Word Fluency–Correct Letter Sounds; NWF-WRC = Nonsense Word Fluency–Words Recoded Completely and Correctly; ORF = Oral Reading Fluency.
p < .10. *p < .05. **p < .01. ***p < .001. ****p < .0001.
Next, we examined differences between conditions in gains from fall to spring with NWF-CLS, NWF-WRC, ORF, WRMT Word ID and Word Attack, and SAT10 Total Reading, Word Reading, and Sentence Reading. Table 4 presents the results for these eight models. Students in ECRI schools outpaced their peers in comparison schools on NWF-CLS, NWF-WRC, ORF, and WRMT Word ID and Word Attack. All differences favored the ECRI condition, and effect sizes (g) ranged from 0.25 for ORF to 0.48 for WRMT Word Attack. The effect sizes for Total Reading, Word Reading, and Sentence Reading were 0.12, 0.06, and 0.01, respectively, but the analyses did not produce statistically significant differences for these three measures.
Table 4. Results from Mixed-Model Time × Condition Analysis for Tests of Condition Effects on Fall-to-Spring Gains in Student Achievement.
Note. Table entries show parameter estimates with standard errors in parentheses except for ICCs, Hedges’s g values, and p values. Tests of fixed effects (first four rows) used 42 degrees of freedom to account for the school as the unit of analysis (14 degrees of freedom for WRMT outcomes assessed in Wave 1 only). The student variance component reflects the student-level covariation between pretest and posttest assessments. P values also provided with the Benjamini-Hochberg correction (pBH). ICC calculated as per Murray (1998, p. 301). DIBELS = Dynamic Indicators of Basic Early Literacy Skills; ICC = intraclass correlation coefficient; NWF-CLS = Nonsense Word Fluency–Correct Letter Sounds; NWF-WRC = Nonsense Word Fluency–Words Recoded Completely and Correctly; ORF = Oral Reading Fluency; SAT10 = Stanford Achievement Test, 10th Edition; WRMT = Woodcock Reading Mastery Test–Revised.
p < .10. *p < .05. **p < .01. ***p < .001. ****p < .0001.
Differential Response to ECRI
Each test of differential response included a predictor (i.e., a moderator) and its interactions with time, condition, and Time × Condition. The Predictor × Time × Condition term indicated differential response to the ECRI intervention due to the moderator. For each outcome measure administered in both Wave 1 and Wave 2 schools, we tested differential response for two sets of pretest measures. We tested SAT10 Total Reading, as this variable was used to determine the assignment to Tier 2. We also tested the pretest measure for each of the dependent variables. Next, we tested three other student characteristics: limited English proficiency status, special education status, and gender. Finally, we tested six teacher, class, or school characteristics: number of years teaching, teacher knowledge, number of at-risk readers per class, class size, school size, and school recruitment wave. Because these exploratory analyses involved 11 predictors of nine outcome measures, or 99 total tests, we expect five Type I errors (false positives) with a 95% chance of between 1 and 10. We will therefore discuss only patterns of results, which we emphasize must be interpreted cautiously and in an exploratory manner given the number of tests involved. Nine interactions indicated the possibility of differential response. Seven of the statistically significant interactions, however, were found for the winter outcomes, with just two interactions for spring outcomes.
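The expected number of false positives across the 99 tests can be sketched with a binomial model. This assumes the tests are independent and each is run at α = .05; independence is a simplifying assumption that the source does not state, since tests on correlated outcomes are not strictly independent, so the result is only an approximation.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_tests = 11 * 9   # 11 predictors crossed with nine outcome measures
alpha = 0.05       # per-test Type I error rate (assumed)

# Expected count of false positives if every null hypothesis were true.
expected_false_positives = n_tests * alpha

# Probability that the number of false positives falls between 1 and 10.
prob_1_to_10 = sum(binomial_pmf(k, n_tests, alpha) for k in range(1, 11))

print(f"tests: {n_tests}")
print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(1 <= false positives <= 10): {prob_1_to_10:.3f}")
```

Under these assumptions the expected count is 4.95, which rounds to the five Type I errors cited above, and the probability of observing between 1 and 10 false positives exceeds .95.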
For gains from fall to spring, we found that SAT10 Total Reading moderated NWF-CLS (p = .0036) and that pretest ORF moderated ORF gains (p = .0024). The results of these interactions produced similar patterns, with greater gains among the students who demonstrated higher proficiency at the beginning of the year. Figure 1 depicts the moderation effect for the SAT10 Total Reading on NWF-CLS. Students above the 37th sample percentile (i.e., the upper 63% of students in Tier 2) demonstrated greater gains in ECRI schools than in control schools. The results for pretest ORF were similar: Students with fall ORF scores above 14, the top 32% of the sample, made greater gains in ECRI schools than control schools.

Figure 1. Differences in student gains between conditions on NWF-CLS plotted by fall SAT10 Total Reading scaled score. The heavy line depicts the estimate of the difference between conditions on gains in NWF-CLS across the range of the SAT10 Total Reading scores, from the 5th to the 95th sample percentile (454 to 497). The two thin lines show the 95% confidence interval around the mean estimate. Students with scores below 470 did not differ between conditions because the 95% confidence bounds include zero. The difference between conditions becomes statistically significantly different from zero at values of 470 or greater, where the confidence bounds exclude zero. NWF-CLS = Nonsense Word Fluency–Correct Letter Sounds; SAT10 = Stanford Achievement Test, 10th Edition.
Discussion
The primary aim of the present study was to conduct a closely aligned conceptual replication of Smith et al.’s (2016) study and to evaluate whether the ECRI MTSS-R model improved the foundational reading outcomes for students at risk for word-level reading disabilities (i.e., dyslexia) in a follow-up, rigorous, cluster-randomized controlled study. Similar to the Smith et al. study, ECRI MTSS-R was compared with a BAU MTSS-R condition such that control schools had to implement key features of MTSS-R in ways similar to the ECRI treatment but using typical practices, including (a) conducting universal screening three times per year; (b) providing Tier 1 core reading instruction for 90 min per day, 5 days per week; (c) providing Tier 2 small-group reading intervention for students determined to be at risk on universal screening using the same cut score as the treatment condition; and (d) engaging in data-based decision making to monitor student progress and making instructional changes accordingly. ECRI MTSS-R also provided the same key elements of MTSS but with a highly specified model that included detailed Tier 1 and Tier 2 materials and data-based decision-making protocols, supported through extensive professional development and coaching. In contrast, professional development and coaching in control schools consisted of BAU supports provided by the district or school.
Summary of Results
First, we examined student gains on proximal measures from fall to winter and from fall to spring, and on distal measures from fall to spring. Consistent with the findings from Smith et al. (2016), we expected at-risk students in the ECRI MTSS-R condition to outperform their peers in the BAU MTSS-R condition on each of the proximal measures of foundational reading skills (i.e., decoding and reading fluency) and on distal measures of basic reading skills (i.e., word attack and word identification). Analysis of proximal measures indicated a statistically significant difference favoring ECRI MTSS-R students from fall to winter on NWF-CLS and NWF-WRC, with effect sizes of g = 0.31 and 0.37, respectively, and a marginally significant difference (p = .058) favoring ECRI intervention students from fall to winter on ORF, with an effect size of g = 0.20.
These results largely align with the Smith et al. (2016) findings of significant differences favoring ECRI treatment students from fall to winter on NWF-WRC and ORF but no significant differences from fall to winter on NWF-CLS. However, whereas there were no significant differences favoring ECRI treatment students from fall to spring on NWF-CLS, NWF-WRC, or ORF gains in the Smith et al. study, the current study found significant differences favoring ECRI students on each of these metrics, with effect sizes of g = 0.39, 0.41, and 0.25, respectively.
Although we collected pre-/posttest data on WRMT Word ID and Word Attack only with the first wave of students (n = 252), power was sufficient, and results indicated significant differences favoring ECRI MTSS-R students from fall to spring on Word ID, with an effect size of g = 0.41, and similar gains on Word Attack, with an effect size of g = 0.48. These results from the present study are quite a bit stronger than effects reported in the Smith et al. (2016) study of g = 0.32 on Word Attack gains and a nonsignificant effect size of g = 0.24 on Word ID. Oddly, given the current findings on Word ID and Word Attack, Smith et al. found significant effects on SAT10 Word Reading gains, whereas the present study found no significant differences.
On its face, this is an unusual finding. SAT10 Word Reading and the Word ID and Word Attack subscales are ostensibly measuring similar constructs, with most overlap between Word ID and Word Reading. It is even more unusual that Smith et al. (2016) found a positive effect on both Word ID and Word Reading and that the current study found a positive effect only on Word ID. Our hypothesis for the different and seemingly contradictory findings (positive effect on Word ID but not on Word Reading) is the difference in how each test assesses the word-reading construct. The WRMT, an individually administered measure where students are asked to read words, is an arguably lower inference measure. The assessor scores words read as correct or incorrect. In contrast, SAT10 Word Reading is a group-administered measure, and students must select the correct written word in a multiple-choice format from a word read by the assessor. This test format confounds word reading with other skills (e.g., listening, encoding word in memory, matching encoded word in memory to printed word) and makes the measure a higher inference test of word reading. Another plausible hypothesis for the disparity in studies is the slight differences in the samples. Whereas the Smith et al. sample and the current sample performed commensurately on most pretests, the current sample was almost 10 points higher on pretest Word Reading relative to the Smith et al. sample. Although the current sample treatment group outgained the control group, its growth was not as pronounced as the treatment group’s gains in the Smith et al. sample. Similar to the Smith et al. study, we found no significant differences on SAT10 Sentence Reading or Total Reading gains.
Next, we examined predictors of differential response to ECRI to determine if student factors moderated ECRI MTSS-R intervention effects. Based on the Smith et al. (2016) study, we expected a similar pattern of findings such that treatment effects might be moderated by initial skill level (favoring the top half of the distribution of at-risk readers); however, we did not expect other moderators at the student level to be significant. As we have noted, we will discuss only patterns of findings, as these findings should be interpreted cautiously and as exploratory, given the number of statistical tests conducted. In terms of patterns of findings within the current study, we found that in two cases (pretest SAT10 Total Reading and pretest ORF), students’ scores moderated intervention effects on decoding and reading fluency gains across the school year. Students in the upper half of the pretest distribution responded more strongly to the ECRI intervention than did students in the lower end of the distribution. Students scoring between the 10th and 30th percentiles were included in the analytic sample, so ECRI appears to benefit those students who the field would generally agree would benefit from a Tier 2 intervention (relative to those students who would likely benefit from a Tier 3 intervention). This pattern is also consistent with previous studies, which suggests that it may be robust.
Implications and Directions for Future Research for At-Risk Readers
This study adds to the converging evidence base that ECRI MTSS-R represents a viable service delivery model for schools and districts to adopt to improve foundational reading skills for students at risk for reading difficulties and disabilities (Baker et al., 2015; Fien et al., 2015; Smith et al., 2016). ECRI has been consistently identified as a systemic intervention approach with some of the strongest levels of evidence currently available in early reading—for example, by the National Center on Intensive Intervention’s Intervention Tools Chart, Evidence for ESSA, and the WWC (to qualify for the new Institute of Education Sciences Systematic Replication competition). The present study adds a strong replication finding in further support of ECRI MTSS-R. This has major implications for the field as schools and districts seek to identify and implement evidence-based models of MTSS to improve students’ acquisition of foundational reading skills, including the areas of decoding, word reading, and reading fluency with connected text. These are crucial constructs for most students at risk for reading difficulties and SLD but in particular for students at risk for dyslexia (Fletcher et al., 2018). Importantly, schools can use the ECRI MTSS-R model as a way to improve outcomes for all students at risk for reading problems as well as students at risk for SLD. In other words, schools can use a uniform MTSS-R model for both purposes and do not need to implement a new or different MTSS-R model solely to meet the needs of students with or at risk for dyslexia.
ECRI MTSS-R as a Viable Model for Implementing New Dyslexia Legislation
In what some would describe as a remarkable policy phenomenon, over the last several years, 43 states have enacted new dyslexia legislation addressing screening, identification, intervention, and teacher-training requirements for students at risk for dyslexia. Of these 43 states, 22 states require screening for dyslexia risk, and 16 states have intervention requirements. Researchers have noted that MTSS offers an approach to address these new state requirements and that the requirements are consistent with current MTSS models for addressing general reading risk and disability (Fletcher et al., 2018). Further, in the Office of Special Education Programs’ (2017) “Dear Colleague: Dyslexia Guidance” letter, the U.S. Department of Education also identifies MTSS as a viable model to address the needs of students with or at risk for specific learning disabilities, including dyslexia. In other words, research and federal policy are consistent in promoting the notion that we do not need another, separate system from MTSS-R to address the needs of students with or at risk for dyslexia.
We contend that the results support the use of ECRI MTSS-R with students with mild to moderate forms of dyslexia and dyslexia risk. We believe the findings from the present study have substantial implications for schools and districts attempting to respond to new state dyslexia legislation, for example, in the area of screening students for dyslexia risk and supporting the provision of interventions on the basis of such risk. Although schools in the study used the SAT10 to assign students to tiers of support, schools also used DIBELS at each benchmarking period to monitor student progress across the school year and inform instructional decision making. Many states include DIBELS 8th Edition and other curriculum-based measurement (CBM; e.g., Aimsweb) as a viable measure to screen for dyslexia risk; thus, this study provides support for the current dyslexia policy context related to risk screening. CBMs that measure both decoding skills through quasi-word reading (e.g., Nonsense Word Fluency) and word reading fluency (e.g., Word Reading Fluency, Word Identification Fluency) seem particularly promising for screening students with or at risk for word-level reading disability (i.e., dyslexia).
IDEA requires that for schools to use RTI for SLD eligibility determinations, students must be provided high-quality reading instruction (i.e., Tier 1) and have access to scientific, research-based reading interventions (i.e., Tier 2). Further, lack of adequate student response must be in the context of generally effective instruction and intervention. Put another way, schools need to implement evidence-based, high-quality instruction and intervention supports across Tiers 1 and 2 to distinguish those students who might have a protracted reading difficulty or disability from those students who have not had the opportunity to benefit from such robust MTSS-R implementation. We contend that schools must provide better and stronger documentation that they are in fact providing generally effective instruction and intervention supports to justify their use of RTI for SLD eligibility determinations. This may become more salient as states implement statutes and guidance on screening, identifying, and treating students at risk for dyslexia, potentially opening schools and districts to legal challenges. Schools and districts would be on firm empirical ground to implement and document the use of evidence-based MTSS-R models, such as ECRI MTSS-R, as part of their RTI approach for SLD eligibility determinations. For example, if a school could demonstrate that ECRI MTSS-R was delivered with fidelity in Tiers 1 and 2, it could serve as strong documentation that students who demonstrated inadequate response were provided high-quality instruction and scientific, research-based intervention, meeting the spirit and intent of IDEA’s allowance of using an RTI framework to determine students’ eligibility for special education under SLD.
Next Steps for Research
In a recent special issue on replication research and special education research, Travers et al. (2016) make a strong case for the necessity of replication research to further our knowledge base in the field. Further, they claim that the identification of evidence-based practices depends on replications to affirm or disconfirm intervention efficacy (Travers et al., 2016). Coyne et al. (2016) present a program of research that extends from original studies to include a continuum of closely aligned conceptual replications to more distal conceptual replications to eventually conducting meta-analyses across intervention studies. The U.S. Department of Education has also noted the need for more systematic replication studies in education research. For example, in summer 2019, the Institute of Education Sciences invited applicants to submit to a new competition titled Research Grants Focused on Systematic Replication in Special Education. As part of the request for applications (RFA), applicants were required to submit only replication studies that evaluated interventions that meet the WWC’s strongest evidence of effectiveness standards and that plan to systematically vary at least one aspect from the original impact study. Because ECRI meets the WWC requirements specified in the RFA, applicants were invited to submit an ECRI application, and therefore, we have begun conceptualizing a series of systematic replications of the ECRI MTSS-R approach, including one application currently under review, to contribute to the field’s understanding of the features of MTSS-R that maximize learning outcomes for students with or at risk for reading disabilities, including dyslexia.
For example, in our effort to further systematically vary aspects of the original ECRI study, it is important to consider the impact of MTSS-R in other grade levels. ECRI MTSS-R effects have been studied experimentally in Grade 1 alone. However, the ECRI intervention is a K–2 system, and the effects in kindergarten and second grade have not been rigorously studied. Although there is strong evidence that interventions in kindergarten can improve student reading outcomes from a range of other studies (e.g., Connor et al., 2013; Foorman et al., 2003; Torgesen et al., 1999; Vellutino et al., 2006), we have not, to date, generated evidence for ECRI MTSS-R in kindergarten or longitudinally across grades. Because ECRI MTSS-R is a comprehensive reading approach for use across Grades K–2, generating evidence of impact in kindergarten is a crucial first step to establish the foundation of the program and further support the use of MTSS-R as a prevention-oriented approach to serving students with or at risk for dyslexia. Evidence suggests there are kindergarten students at risk for reading difficulty for whom high-quality core reading instruction and an effective supplemental intervention will enable them to get on track for successful reading outcomes in kindergarten and subsequent grades. We want to know if ECRI MTSS-R can help produce this outcome.
Additionally, it is important to understand the longitudinal effects of multiple years of ECRI: specifically, what are the 2- and 3-year effects of ECRI MTSS-R? Based on the field’s knowledge of student response patterns and an understanding that some students will require sustained periods of high-quality core instruction and supplemental intervention, it is important to examine the effects of multigrade interventions (i.e., K–2) on student reading performance and the likelihood students will get and stay on track to achieve long-term reading goals (e.g., in Grade 3 and beyond). Unlike the strong evidence of 1-year kindergarten effects in reading, there is limited research on multiyear effects in the early grades. Although few studies document the effect of multiple years of intervention across the early grades, the small but converging pattern is that effects are much stronger for longitudinal interventions than for single-year interventions (Connor et al., 2013; Simmons et al., 2008; Vellutino et al., 2006). For example, Connor et al. (2013) attempted to determine the effects of 1, 2, or 3 years of reading intervention support on student reading outcomes. Results indicated an effect size of d = 0.20 for 1 year, d = 0.40 for 2 years, and d = 0.60 for 3 years of intervention. By design, ECRI MTSS-R has the potential to provide multiple years of high-quality early reading instruction and intervention that many students need.
Limitations
One limitation of the present study was that we were able to collect WRMT data from only the first wave of students and not from the second wave. Even though we had enough statistical power to detect the rather large effects observed on Word Attack and Word ID, it would have been more compelling if the effects were observed for the full analytic sample across Waves 1 and 2. However, given there was no intervention moderation effect for wave on any of the other dependent measures, we would expect the same pattern of performance for the second wave of students, in which Tier 2 students in the treatment group outperformed Tier 2 students in the control group.
A second limitation of the study was our inability to deconstruct the total effect of the ECRI MTSS-R approach into its key components (e.g., Tier 1 instruction, Tier 2 intervention, professional development, coaching). Although we can state with a high level of confidence that the complete ECRI MTSS-R model had a positive effect on student reading outcomes, similar to the original Smith et al. (2016) study, we cannot attribute any aspect of the overall effect of ECRI MTSS-R to any particular component. We have published an analysis of the independent effect of ECRI Tier 2 using a regression discontinuity design (Baker et al., 2015); however, we would like to engage in component analysis to further unpack the independent contributions of each key component of the MTSS-R approach.
A final limitation is our limited focus on, and ability to interpret, factors related to ECRI implementation quality and fidelity. Following the field of public health (Curran et al., 2012), the field of education research is moving toward hybrid efficacy–implementation science studies and encouraging researchers to examine school and teacher factors that may hinder or enhance implementation quality and fidelity. In future studies of ECRI, we will expand our theory of change to include factors that we hypothesize will positively or negatively affect implementation quality and fidelity, and we will include a measurement net to assess these factors and examine how they might predict quality and fidelity.
Conclusion
In the present study, we sought to conduct a closely aligned conceptual replication of the Smith et al. (2016) study of the effect of ECRI MTSS-R on at-risk students’ learning of foundational reading skills in first grade. We confirmed our hypothesis that at-risk students in the ECRI treatment group would outperform their at-risk peers in BAU MTSS-R control schools. Furthermore, we replicated the strong findings of the Smith et al. study. The findings in the present study provide further evidence that ECRI MTSS-R is a valid and evidence-based approach for supporting strong foundational skills learning for students at general reading risk, relative to a strong counterfactual (BAU MTSS-R that implemented many of the same key features of ECRI MTSS-R). Additionally, the present study provides initial evidence that schools and districts can utilize ECRI MTSS-R as a model to screen students at risk for mild to moderate forms of dyslexia and improve their word reading, decoding, and reading fluency skills, which are crucial constructs for students with or at risk for dyslexia. This finding is encouraging given the number of states with new dyslexia legislation to screen students for dyslexia risk in the early grades and to intervene accordingly. Finally, we believe ECRI MTSS-R could serve as a valid model for RTI decision making to distinguish those students who respond to evidence-based interventions from those students who demonstrate inadequate response and may be ultimately determined to be eligible for special education under the category of SLD.
