Abstract
This review synthesizes research on English reading outcomes of all types of programs for Spanish-dominant English language learners (ELLs) in elementary schools. It is divided into two major sections. One focuses on studies of language of instruction and one on reading approaches for ELLs holding constant language of instruction. A total of 13 qualifying studies met the inclusion criteria for language of instruction. Though the overall findings indicate a positive effect (effect size = .21) in favor of bilingual education, the largest and longest term evaluations, including the only multiyear randomized evaluation of transitional bilingual education, did not find any differences in outcomes by the end of elementary school for children who were either taught in Spanish and transitioned to English or taught only in English. The review also identified whole-school and whole-class interventions with good evidence of effectiveness for ELLs, including Success for All, cooperative learning, Direct Instruction, and ELLA. Programs that use phonetic small group or one-to-one tutoring have also shown positive effects for struggling ELL readers. What is in common across the most promising interventions is their use of extensive professional development, coaching, and cooperative learning. The findings support a conclusion increasingly being made by researchers and policymakers concerned with optimal outcomes for ELLs and other language minority students: Quality of instruction is more important than language of instruction.
The number of English language learners (ELLs) has been increasing rapidly in the United States and will no doubt continue to rise. According to the National Clearinghouse for English Language Acquisition (2011), there were over 5 million ELLs in the United States in 2009, making up 10% of all K–12 students, compared to 3.5 million a decade ago. The percentage is expected to rise to 25% by 2030. Based on the 2005 American Community Survey by the Modern Language Association, there were about 52 million speakers of languages other than English in the United States. Among all non-native English speakers, Spanish speakers were by far the largest group (32 million or 62%). No other language was spoken by more than 3%.
In comparison to their non-ELL counterparts, ELLs tend to be at higher risk of performing poorly in early literacy. As their oral English improves, so does their English reading, but many ELLs do not catch up with their non-ELL counterparts over time. On the most recent National Assessment of Educational Progress (2011), only 7% of fourth-grade ELLs scored at or above the proficient level, while 46% of non-ELLs scored this well. Among eighth graders, only 3% of ELLs scored at or above the proficient level, as compared to 39% of non-ELLs.
Teachers are facing enormous challenges in knowing how to best serve and educate ELLs in their schools. A critical and contentious issue in the education of ELLs is the language of instruction. In the 1970s and 1980s, bilingual programs to teach ELLs were common in most states. With the English-only movement in the late 1990s, however, several states enacted policies against the use of bilingual education, including California in 1998, Arizona in 2000, and Massachusetts in 2002. Though these propositions usually included waivers for parents who wanted their children to be in bilingual education programs, they were designed to make such waivers difficult to obtain. For example, after Proposition 227 was passed in California, the proportion of ELLs receiving primary language instruction with English language development dropped from 30% to 8%. To evaluate the effects of the implementation of Proposition 227 on ELLs, the California Department of Education contracted with the American Institutes for Research (AIR) and WestEd to carry out a nonexperimental evaluation. No sizable effect of Proposition 227 (in either direction) was found on ELL students’ academic achievement in English (Parrish et al., 2006). Similar results were found in a study of Question 2, the Massachusetts English immersion law, on third-grade ELL students’ reading achievement (Guo, in press).
With the passage of the No Child Left Behind Act (NCLB) in 2002, the use of bilingual education was further discouraged throughout the United States. For one thing, NCLB required all states to include all ELLs in state testing programs in English, usually by third grade. Wright (2007) argued that “the high-stakes testing policies of NCLB, along with the accountability provisions which demand that ‘limited English proficient’ students learn English as quickly as possible, ultimately serve to discourage schools from offering heritage language programs” (p. 3).
The fundamental question has been whether ELLs ultimately become better English readers if they are taught using their native language partially or entirely or in an English-only learning environment. Opponents of bilingual education argue that ELLs are better served by early and intensive exposure to all-English teaching (e.g., Rossell & Baker, 1996). On the other hand, bilingual advocates believe that ELLs should be gradually transitioned from their native language to English-only, because they can start with success in a language they understand and then what they learn in their native language can transfer as they learn English (Goldenberg, 1996; Thomas & Collier, 1997). Several reviews have been conducted on the relative effectiveness of bilingual education and structured immersion programs (e.g., Greene, 1997; Rossell & Baker, 1996; Slavin & Cheung, 2005; Willig, 1987). Conclusions of these reviews, summarized below, have been quite diverse. However, the focus of research and policy has shifted toward identifying effective strategies for helping ELLs succeed in English rather than focusing on initial language of instruction (August & Hakuta, 1997; Christian & Genesee, 2001; Slavin, Madden, Calderon, Chamberlain, & Hennessy, 2011).
The purpose of this review is to examine effective reading interventions for Spanish-dominant ELLs, including native-language instruction as one among an array of means of potentially improving English reading. The overall focus on Spanish-dominant ELLs is justified by two factors. First, Spanish-speaking students are by far the largest minority ELL group in our public school systems. In addition, they have historically low educational attainment and a high dropout rate. According to 2009 data, the high school dropout rate for U.S. Hispanics was highest (17.6%) among all minority groups, as compared with African American (9.6%), White (5.2%), and Asian (2.1%) students (Child Trends Data Bank, 2011). Clearly, the United States cannot reach its national educational goals unless educators can greatly improve outcomes for this large and growing group.
Working Definitions of ELLs and Types of Language of Instruction
Descriptors of ELLs
The term English language learners describes students who are in the process of acquiring English language skills and knowledge. Some educators and researchers refer to these students using the term limited English proficient (LEP), and the term English learners (ELs) is also becoming common. The term language minority students is used to refer to students whose parents speak a language other than English at home, but who may or may not have limited English proficiency themselves. This broader term is often used to define study populations when individual data on English proficiency are not available.
Programs Serving ELLs
English-immersion or English-only programs focus mainly on English language development, and all instruction and activities are conducted in English. A typical equivalent is structured English immersion (SEI), reflecting the idea that even when native language plays little or no role in reading instruction, ELLs are supported in their acquisition of English reading and speaking. Transitional bilingual programs provide most instruction in students’ native language (L1) in the early grades and then gradually transition into an all-English (L2) learning environment in later grades. Two-way bilingual immersion programs provide instruction in both English and (usually) Spanish for ELLs and non-ELLs in the same classes. The goal is for both ELLs and native English-speaking students to become bilingual and biliterate (Genesee, 1999; Genesee, Paradis, & Crago, 2004). Finally, paired bilingual programs provide reading instruction to ELLs in both Spanish and English at different times of the day. They differ from two-way bilingual programs mainly in that English-proficient students are not taught in Spanish. For example, paired bilingual programs may have English reading in the morning and Spanish reading in the afternoon.
Previous Reviews on Language of Instruction
Several major meta-analyses of the impact of bilingual education on reading have been conducted in the past two decades (Greene, 1997; Rossell & Baker, 1996; Slavin & Cheung, 2005; Willig, 1987). The conclusions have differed widely. For example, Rossell and Baker (1996) examined 72 studies from the 1960s onward by using a vote-counting method. They concluded that most studies did not favor bilingual education. However, Greene (1997), Willig (1987), and Slavin and Cheung (2005) concluded that bilingual programs produced better reading results for ELLs. For example, Greene used a meta-analysis to examine the same studies that were included in the Rossell and Baker study. He reported that only 11 out of the 72 studies included in the Rossell and Baker review were methodologically adequate. Greene found an overall effect size (ES) of .21 in support of bilingual programs among the methodologically adequate studies. Consistent with Greene’s findings, Slavin and Cheung found a positive effect of bilingual programs, especially paired bilingual programs, on English reading achievement, with an overall effect size of .31 among methodologically rigorous evaluations. It is important to mention, however, that few long-term randomized studies were included in these reviews. Also, most of the studies were done long ago, especially in the 1970s.
The Present Study
In 2005, in an effort to produce a more satisfying answer to the long-standing debate on bilingual education, the U.S. Department of Education funded three large-scale longitudinal studies that were intended to use rigorous research designs to examine the relative effectiveness of transitional bilingual education (TBE) and structured English immersion. Results of these three longitudinal studies have appeared in the past few years (Francis & Vaughn, 2009; Irby et al., 2010; Slavin et al., 2011). With this new evidence, there is a need to revisit the review of research on language of instruction. This review focuses on two main questions:
For English language learners, what approaches to language of instruction are most beneficial for development of proficiency in English reading: bilingual, English-only, or dual language?
Holding constant language of instruction, which reading programs and approaches are most effective for building the English reading of English language learners?
Method
The current review employed the best evidence synthesis review technique proposed by Slavin (1986), which seeks to apply consistent, clear standards to identify unbiased, meaningful information from experimental studies and then discusses each qualifying study, computing effect sizes, but also describing the context, design, and findings of each study. Comprehensive Meta-Analysis Software Version 2 (Borenstein, Hedges, Higgins, & Rothstein, 2005) was used to calculate effect sizes and to carry out various meta-analytical tests, such as Q statistics and sensitivity analyses. Like many previous research reviews, this study follows five key steps: (a) locating all possible studies, (b) screening potential studies for inclusion using preset criteria, (c) coding all qualified studies based on their methodological and substantive features, (d) calculating effect sizes for all qualified studies for further combined analyses, and (e) carrying out comprehensive statistical analyses covering both average effect sizes and the relationships between effect sizes and study features.
Literature Search Procedures
In an attempt to locate every study that could possibly meet the inclusion criteria, a literature search of articles written between 1970 and 2012 was carried out. Electronic searches were made of educational databases (e.g., JSTOR, ERIC, EBSCO, PsycINFO, Dissertation Abstracts), Web-based repositories (e.g., Google Scholar), and ELL reading program providers’ Web sites, using different combinations of key words. Descriptors included bilingual education, structured immersion programs, English language learners, language of instruction, language minority students, English immersion, dual language, two-way bilingual education, English as a second language, effective reading program, reading intervention, elementary reading, and secondary reading.
We also conducted searches by program name. We attempted to contact producers and developers of ELL reading programs to check whether they knew of studies that we had missed. References from other reviews of language of instruction and effective reading programs for ELLs were further investigated, including What Works Clearinghouse (2007, 2012) reviews on programs for ELLs. We also conducted searches of recent tables of contents of key journals from 2000 to 2012: Reading Research Quarterly, American Educational Research Journal, Journal of Educational Research, Journal of Adolescent and Adult Literacy, Journal of Educational Psychology, Bilingual Research Journal, and Reading and Writing Quarterly. Citations in the articles from these and other current sources were located.
Effect Size Calculation and Statistical Analyses
In general, effect sizes were computed as the difference between experimental and control individual student posttests after adjustment for pretests and other covariates, divided by the unadjusted posttest pooled SD. Procedures described by Lipsey and Wilson (2001) and Sedlmeier and Gigerenzer (1989) were used to estimate effect sizes when unadjusted standard deviations were not available, such as when the only standard deviation presented was already adjusted for covariates or when only gain score SDs were available. If pretest and posttest means and SDs were presented but adjusted means were not, effect sizes for pretests were subtracted from effect sizes for posttests. F ratios and t ratios were converted to effect sizes when means and standard deviations were not reported. If a study reported multiple outcome measures or grades, an overall mean effect size was then calculated to avoid dependence of effect sizes within each study (Hedges, Tipton, & Johnson, 2010).
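As a rough illustration of the computations described above, the following Python sketch shows one way such effect sizes could be calculated. The function names and all numeric inputs are hypothetical. The t and F conversions are the standard two-group formulas and are assumed, not confirmed, to match the procedures the review drew from Lipsey and Wilson (2001).

```python
import math


def pooled_sd(sd_e, n_e, sd_c, n_c):
    """Pooled standard deviation of the unadjusted posttests."""
    return math.sqrt(
        ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    )


def adjusted_effect_size(adj_mean_e, adj_mean_c, sd_e, n_e, sd_c, n_c):
    """Covariate-adjusted posttest difference over the unadjusted pooled SD."""
    return (adj_mean_e - adj_mean_c) / pooled_sd(sd_e, n_e, sd_c, n_c)


def pre_post_effect_size(pre_e, pre_c, pre_sd, post_e, post_c, post_sd):
    """Posttest effect size minus pretest effect size, for studies that
    report pretest and posttest means and SDs but no adjusted means."""
    return (post_e - post_c) / post_sd - (pre_e - pre_c) / pre_sd


def effect_size_from_t(t, n_e, n_c):
    """Standard conversion of an independent-groups t ratio (assumed formula)."""
    return t * math.sqrt(1 / n_e + 1 / n_c)


def effect_size_from_f(f, n_e, n_c, favors_experimental=True):
    """Two-group F ratio (1 numerator df), so t = sqrt(F); the sign must be
    supplied by the coder because F carries no direction."""
    d = math.sqrt(f) * math.sqrt(1 / n_e + 1 / n_c)
    return d if favors_experimental else -d


def study_mean_effect_size(effect_sizes):
    """Average several outcomes or grades into one effect size per study."""
    return sum(effect_sizes) / len(effect_sizes)


# Hypothetical study: adjusted means 52 vs. 50, posttest SD 10 in both groups.
example = adjusted_effect_size(52, 50, 10, 50, 10, 50)
```

Averaging within a study, as in `study_mean_effect_size`, means each study contributes a single value to the pooled analysis, which is the purpose the text gives for computing an overall mean effect size.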
Inclusion Criteria: Language of Instruction
As noted earlier, this review is divided into two major sections. The first section focuses on studies of language of instruction (e.g., bilingual vs. English-only instruction); the second section focuses on reading approaches for ELLs other than bilingual education. In order to be included in the review of language of instruction, studies had to meet the following inclusion criteria.
First, the studies compared children taught reading in bilingual classes to those taught in English immersion classes, as defined earlier. Second, either random assignment to conditions was used, or pretesting or other matching criteria established the degree of comparability of bilingual and English immersion groups before the treatments began. If these matching variables were not identical at pretest, analyses adjusted for pretest differences or data permitting such adjustments were presented. Studies without control groups, such as pre-post comparisons or comparisons to expected scores or gains, were excluded. Studies with pretest differences exceeding half of a standard deviation were excluded.
A special category of studies was rejected based on the requirement of pretest measurement before treatments began. These were studies in which the bilingual and immersion programs were already under way before pretesting or matching. For example, Danoff, Coles, McLaughlin, and Reynolds (1978), in a widely cited study, compared 1-year reading gains in many schools using bilingual or English immersion methods. The treatments began in kindergarten or first grade, but the pretests were administered to children in second grade. Because the bilingual children were primarily taught in their native language in K–1 and the immersion children were taught in English, their pretests in second grade would surely have been affected by their different treatment before pretesting. Additional studies of this kind include those by Curiel, Stenning, and Cooper-Stenning (1980) and Thomas and Collier (2002). Meyer and Fienberg (1992, p. 24) noted the same problem with reference to the widely cited Ramirez, Pasta, Yuen, Billings, and Ramey (1991) study, which also obtained pretests after students had been in bilingual or English-only programs: “It is like watching a baseball game beginning in the fifth inning: If you are not told the score from the previous innings, nothing you see can tell you who is winning the game.” Studies that tested children in upper elementary or secondary grades who had experienced bilingual or English-immersion programs in earlier years were included if premeasures were available from before the programs began, but in most cases such premeasures were not reported, so there is no way to know if the groups were equivalent beforehand.
Third, the subjects were Spanish-dominant English language learners in elementary schools in the United States. Studies that identified children as language minority (i.e., they came from homes in which Spanish was spoken but may or may not have been ELLs themselves) were included if data were not available on the language proficiency of individual children. Studies that mixed ELLs and English monolingual students in a way that did not allow for separate analyses were excluded (e.g., Skoczylas, 1972). Studies of children learning a foreign language were not included (e.g., monolingual English speakers studying Spanish). In addition, studies that involved languages other than Spanish were excluded (Morgan, 1971).
Fourth, the dependent variables included quantitative measures of English reading performance, such as standardized tests and informal reading inventories. If experimenter-made measures were used, they were included only if there was evidence that all groups focused equally on the same outcomes. Measures of outcomes other than reading, such as language arts, writing, and spelling, were not included. Fifth, the treatment duration was at least one school year. For the reasons discussed later, even 1-year studies of transitional bilingual education are less than ideal because students taught in their native language are unlikely to have transitioned to English by the end of the study. Studies even shorter than this do not address the question in a meaningful way.
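The five criteria above amount to a screening rule applied to each candidate study. The sketch below is only an illustration of that rule: the `CandidateStudy` fields, the function name, and the two example records are hypothetical and are not taken from the review's coding materials.

```python
from dataclasses import dataclass


@dataclass
class CandidateStudy:
    name: str
    compares_bilingual_to_immersion: bool
    has_control_group: bool
    pretest_before_treatment: bool   # premeasures taken before programs began
    pretest_difference_sd: float     # pretest gap between groups, in SD units
    spanish_dominant_ells: bool
    measures_english_reading: bool
    duration_school_years: float


def meets_language_of_instruction_criteria(study):
    """Return True only if every stated inclusion criterion is satisfied."""
    return (
        study.compares_bilingual_to_immersion
        and study.has_control_group
        and study.pretest_before_treatment
        and abs(study.pretest_difference_sd) <= 0.5   # exclude gaps over half an SD
        and study.spanish_dominant_ells
        and study.measures_english_reading
        and study.duration_school_years >= 1.0        # at least one school year
    )


# Hypothetical records for illustration only.
study_a = CandidateStudy("Study A", True, True, True, 0.20, True, True, 2.0)
study_b = CandidateStudy("Study B", True, True, False, 0.10, True, True, 1.0)

included = [s.name for s in (study_a, study_b)
            if meets_language_of_instruction_criteria(s)]
```

Here the hypothetical Study B fails because its pretests were collected after the programs were already under way, the situation the text describes for the special category of rejected studies.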
Study Coding
To examine the relationship between effects and studies’ methodological and substantive features, studies were coded. Methodological and substantive features included grade levels, types of programs, evidence of initial equivalence, year of publication, research design, sample characteristics, and duration. Study coding was conducted by two researchers working independently. The interrater agreement was 95%. When disagreement arose, both researchers reexamined the studies in question together until they reached a consensus.
Results: Language of Instruction
Study Characteristics
A total of 13 qualifying studies based on approximately 2,000 elementary school children met the inclusion criteria for language of instruction. A study was defined as a unique comparison of experimental and control conditions. Several articles reported more than one study. For example, the report by Campeau et al. (1975) included five separate treatment-control studies in different locations. The characteristics and findings of these studies appear in Table 1. Of these, 2 were published articles and 11 were unpublished reports such as technical reports or dissertations. The majority of the studies (n = 10) were carried out in the 1970s, 1 in the 1990s, and 2 after 2000. Only 3 used random assignment and the rest were matched control studies. There were only two 5-year longitudinal studies (Maldonado, 1977; Slavin et al., 2011). It is important to note that the majority of the included studies used a paired bilingual model (n = 9) that is quite different from the models that have been commonly used since the 1990s. Unlike later bilingual transition programs, students in these paired bilingual programs were taught reading in both English and Spanish at different times of the day. Two used a two-way bilingual approach and the other 2 were traditional transitional bilingual programs.
Table 1. Language of reading instruction: Descriptive information and effect sizes for qualifying studies
Note. C = control, English immersion program; E = experimental, bilingual program.
These effect sizes were slightly different than those reported in Slavin and Cheung (2005). For Maldonado (1977), Slavin and Cheung stated that data for exact computation for effect sizes were not available and reported ES = .00 for all measures. However, we were able to locate additional information in the report to estimate the effect sizes. For Alvarez (1975), the effect sizes reported in Slavin and Cheung were +.12 and –.25 for vocabulary and comprehension, respectively, but these were posttest-only effect sizes, not adjusted posttest effect sizes. After adjusting for initial differences, the effect sizes were +.08 and –.15 for vocabulary and comprehension, respectively.
Overall Effects
As shown in Table 2, the findings indicate a positive but modest effect (ES = .21, p < .01) in favor of bilingual education under a random-effects model. Substantial variation was found in this set of studies (Q_B = 24.53, df = 12, p < .05). We found that effect sizes of matched control studies (ES = .26) were generally higher than those of randomized experiments (ES = .05). In addition, paired bilingual programs (ES = .30) produced a higher effect size than two-way bilingual programs (ES = .10) and transitional bilingual programs (ES = –.01). However, comparisons among these effect sizes must be made with caution, as there were only two studies of two-way bilingual education and two studies of transitional bilingual programs.
Table 2. Language of instruction
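For readers unfamiliar with how a random-effects mean and a Q statistic are obtained, the following Python sketch shows one common approach, the DerSimonian-Laird estimator. This is an assumption made for illustration: the review reports using Comprehensive Meta-Analysis software and does not state which estimator was applied. The function name and the input values are hypothetical and are not the study-level data from Table 1.

```python
import math


def random_effects_summary(effects, variances):
    """DerSimonian-Laird random-effects pooling with Cochran's Q.

    effects   -- one effect size per study
    variances -- the sampling variance of each effect size
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)

    # Q: weighted squared deviations from the fixed-effect mean (df = k - 1).
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Method-of-moments estimate of between-study variance, floored at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2}


# Hypothetical two-study example, not data from the review.
summary = random_effects_summary([0.0, 0.4], [0.01, 0.01])
```

A Q value that is large relative to its degrees of freedom, as in the results reported above, indicates that the studies vary more than sampling error alone would predict, which is why the subgroup contrasts by design and program type are examined.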
As mentioned earlier, of the 13 qualifying studies, there were only 2 long-term longitudinal studies (Maldonado, 1977; Slavin et al., 2011). Of these, only Slavin et al. (2011) used random assignment to bilingual or English-only conditions. Only part of the 4-year matched study of early-exit TBE carried out by Ramirez et al. (1991) was included. The longitudinal aspect of the study was excluded due to inadequate controls for pretest differences (Meyer & Fienberg, 1992; Slavin & Cheung, 2005). Random assignment is particularly important in studies of language of instruction because it avoids selection bias, a serious problem when parents or teachers decide whether children within the school are initially taught in Spanish or English. In addition, many studies comparing TBE and SEI were too brief to have given students in TBE sufficient time to make their transition to English. The Maldonado (1977) and Slavin et al. (2011) studies are described in detail below; for a detailed description of most of the other included studies, see Slavin and Cheung (2005). Barnett, Yarosz, Thomas, Jung, and Blanco (2007) was published subsequently and so was not included in that review. Descriptive information for this study can be found in Table 1.
The Maldonado (1977) study involved Spanish-speaking Mexican American elementary school children in Corpus Christi, Texas. The main objective of the study was to investigate how well ELL students were able to succeed in the regular education program of the school district after they had left a bilingual program. A total of 126 children in six elementary schools participated in the study. The treatment group comprised 47 children who had participated in the bilingual program for 4 consecutive years, from first grade to fourth grade. The control group consisted of 79 students, matched on socioeconomic status (SES) and number of years in school, who were enrolled in regular English-only classrooms for the same 4 years. These two groups were followed until they reached fifth grade, 1 full year after the treatment group left the bilingual program.
During the 4-year period, treatment students received a minimum of 2 hours of instruction in Spanish daily in language arts, reading, mathematics, and social studies. However, the author did not provide specific information about the control condition. After controlling for pretest differences, no statistically significant differences were found between the treatment and control group at any grade. The final effect size for the fifth-grade results was .11 (favoring bilingual education). It is important to mention that teachers in both conditions were bilingual. However, it is not stated if or how much the bilingual teachers in the control condition used Spanish in their classrooms to help children who were in need of bilingual explanations. As the author stated, “It is highly possible that the control group bilingual teachers might have used the Spanish language for clarification of some concepts. This, in turn, would not only assist those students in the comprehension of those concepts but at the same time lower the difference between the groups in the areas of mathematics and reading” (Maldonado, 1977, p. 104).
The second 5-year longitudinal study, conducted by Slavin and colleagues (2011), was one of three longitudinal studies funded by the U.S. Department of Education in 2005. The other two studies used 2 × 2 factorial designs to examine the effects of both language of instruction and an enhanced classroom intervention within each language of instruction (Francis, York, August, & Vaughn, 2009; Irby et al., 2010). In each case, the bilingual versus English-only factor involved matching, not random assignment. We included the parts of the studies that examined the effectiveness of the enhanced intervention within each language of instruction, and these comparisons will be discussed in the second part of this article. However, the comparisons that assessed the relative effectiveness of TBE and SEI did not meet the inclusion criteria. There were large pretest differences (ES > 1.00) in oral language composite scores between the SEI and bilingual groups in the Francis et al. (2009) study, suggesting that children were more likely to be selected into SEI if their English was already good. In the Irby et al. (2010) study, only a select group of TBE students (25%) participated in the English Texas Assessment of Knowledge and Skills (TAKS) posttesting as compared to all students in the SEI group. This created possibilities for bias in that the TBE students who took the English TAKS were sure to be more proficient in English than those who were not deemed to be ready to take the TAKS.
Slavin et al. (2011) also compared the effectiveness of TBE and SEI. Six schools located in Los Angeles, California; Denver, Colorado; Albuquerque, New Mexico; St. Paul, Minnesota; Rockford, Illinois; and Alamo, Texas, participated in the study. All participating schools had both transitional bilingual and structured English immersion programs. The study used a randomized within-school design in which kindergarteners were randomly assigned to either a transitional bilingual program or a structured immersion program. To increase the sample size, three successive cohorts (children entering kindergarten in 2004, 2005, and 2006) were included and were pretested in the fall of kindergarten and then assessed each spring on both Spanish and English reading tests. Children in the TBE classes were initially taught reading in Spanish. Transition to English reading could begin as early as first grade, but the majority of the participating schools did not start the transition until second grade. By fourth grade, all children were taught in English entirely. Children in the SEI classes were taught in an English-only environment except for occasional Spanish explanations.
Since children in both conditions used the Success for All reading program as their instructional materials, the content being taught was basically the same. The only difference between the two treatments was the language of instruction. The initial sample size was 247 (130 TBE, 117 SEI), and the final sample size was 115 (60 TBE, 55 SEI) due to attrition. No statistically significant difference was found between the two groups in terms of their attrition rate or the pretest scores of the final samples. Both the Peabody Picture Vocabulary Test and its Spanish equivalent (Test de Vocabulario en Imagenes Peabody) were used as pretests and as covariates in the final analyses. As expected, first graders in the TBE classes scored significantly higher on the Spanish Woodcock reading posttest (ES = .60) and significantly lower in English (ES = –.41) than their SEI counterparts. However, the differences between the two conditions started to narrow by second and third grades. By fourth grade, no significant differences were found between conditions on any of the three English reading measures. Similarly, there were no significant differences in Spanish posttests. The English differences had a mean effect size of –.26.
Inclusion Criteria: Effective Reading Programs for ELLs
In the next section, we review research on reading programs for ELLs other than use of native language. To be included in the effective reading programs section of this review, studies had to meet the following inclusion criteria. First, the studies involved K–6 students identified as Spanish-dominant ELLs. Studies that mixed ELLs and English monolingual students in a way that did not allow for separate analyses were excluded (e.g., Hurley, Chamberlain, Slavin, & Madden, 2001; Skoczylas, 1972). Studies of children learning a foreign language were not included. Second, either random assignment to conditions was used, or pretesting or other matching criteria established the degree of comparability of the treatment and control groups. Analyses adjusted for pretest differences or data permitting such adjustments were presented. Studies without control groups, such as pre-post comparisons or comparisons to expected scores or gains, were excluded. Studies with pretest differences exceeding half of a standard deviation were excluded.
Third, the languages of instruction were the same in both experimental and control groups. Fourth, the dependent measures included quantitative measures of English reading performance, such as standardized reading measures. In all cases, measures included assessment of comprehension, not just phonics or decoding. Measures of content taught in the treatment group but not the control group, such as a specific set of target words taught in a vocabulary intervention, were excluded. Fifth, a minimum treatment duration of 12 weeks was required.
Findings: Effective Reading Programs for ELLs
Study Characteristics
Twenty-two qualifying studies involving over 4,300 students were included in the final analysis. Characteristics and findings of all the included studies of reading programs for ELLs are summarized in Table 3. Of these, 15 were published articles and 7 were unpublished studies such as technical reports or dissertations. The majority of the studies (n = 15) were conducted in the 2000s, 1 in the 1980s, 2 in the 1990s, and 4 in the 2010s. Seven of the 15 studies used an experimental design. There were two main types of interventions: whole-school and whole-class intervention (n = 14) and small group and one-to-one supplemental intervention (n = 8).
Effective reading programs: Descriptive information and effect sizes (ES) for qualifying studies
Overall Effects
As indicated in Table 4, the overall effect size for all 22 studies was .23 (p < .01). The large Q value (Q_B = 52.07, df = 21, p < .01) indicates substantial variation in effect sizes across this set of studies, and type of program may explain some of it. For example, approaches that used cooperative learning and small group instruction (e.g., Success for All, Bilingual Cooperative Integrated Reading and Composition, Peer Assisted Learning Strategies, Literacy Express Curriculum) generally produced larger effects than other approaches (see Table 5).
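To make the heterogeneity statistic concrete, the following is a minimal sketch of how an inverse-variance weighted mean effect size and a Q statistic are conventionally computed under a fixed-effect model. Whether the review used exactly this weighting is an assumption, and the function name is illustrative.

```python
def weighted_mean_and_q(effect_sizes, variances):
    """Fixed-effect pooled effect size and Cochran's Q (illustrative sketch).

    effect_sizes: one standardized mean difference per study
    variances:    the sampling variance of each effect size
    """
    if len(effect_sizes) != len(variances) or not effect_sizes:
        raise ValueError("need one variance per effect size")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled value.
    q = sum(w * (es - pooled) ** 2 for w, es in zip(weights, effect_sizes))
    df = len(effect_sizes) - 1
    return pooled, q, df
```

For example, weighted_mean_and_q([0.1, 0.5], [0.01, 0.01]) pools two invented studies of equal precision; these inputs are not values from Table 4. A Q value that is large relative to its degrees of freedom, as in the result reported above, signals more between-study variation than sampling error alone would be expected to produce.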
Effective reading programs
Effect sizes (ES) by types of programs
Whole-School and Whole-Class Interventions
Success for All
Among the reading studies that met the inclusion criteria, three evaluated the Success for All program (Slavin, Madden, Chambers, & Haxby, 2009). Success for All is a comprehensive reform model that provides schools with well-structured curriculum materials emphasizing systematic phonics in Grades K–1 and cooperative learning, direct instruction in comprehension skills, and other elements in Grades 2–6. It also provides extensive professional development and follow-up for teachers, frequent assessment and regrouping, one-to-one tutoring for children who are struggling in reading, and family support programs. A full-time facilitator helps all teachers implement the model.
English language development (ELD) adaptation of Success for All
Ross, Smith, and Nunnery (1998) conducted a 1-year matched control study on the ELD adaptation of Success for All (SFA) in six schools in an Arizona school district. Participants were 540 first-grade Spanish-dominant students in two SFA schools using an ELD adaptation of SFA and four schools using locally developed Title I schoolwide projects. Students were pretested on the English Peabody Picture Vocabulary Test (PPVT) and then posttested on the Woodcock Word Identification, Word Attack, and Passage Comprehension scales, and the Durrell Oral Reading Test. After adjusting for initial differences, Spanish-dominant SFA students scored significantly higher than control students on all measures, with a mean effect size of .52.
Success for All with embedded video
Chambers, Slavin, Madden, Cheung, and Gifford (2004) investigated the effectiveness of an adaptation of SFA that incorporated embedded video. A variety of video materials were used: animations to present letter sounds, puppet vignettes to present sound blending, live-action skits to present vocabulary, and segments from the television program, Between the Lions, to reinforce various skills. The brief video segments were interspersed in teachers’ lessons in Grades K–1. Spanish-dominant students were expected to benefit in particular from the embedded video treatment because the videos included vocabulary presentations and clear, visual reinforcements of reading skills. A total of 455 K–1 Hispanic students (311 treatment and 144 control) in eight schools in New York City, Washington, DC, rural Arizona, and southern California participated in this 1-year matched control study. The two groups were well matched on their pretest scores. Analyses of covariance, using pretests as covariates, found that students in schools using SFA with embedded video scored significantly higher than controls using reading approaches other than SFA on Woodcock Word Identification (ES = .40), Word Attack (ES = .36), and Passage Comprehension (ES = .21), with an overall mean effect size of .32.
Bilingual transition with Success for All
A 1-year matched control experiment was carried out by Calderón et al. (2004) evaluating an enriched transition program for children who had been taught in Spanish using SFA and were moving to the English program in third grade. The enriched program was a modified version of Bilingual Cooperative Integrated Reading and Composition (BCIRC), which consisted of components of the SFA beginning reading program (Reading Roots), including the embedded videos described earlier, and explicit instruction in vocabulary using strategies similar to those used by Carlo et al. (2004). Participants were 238 Spanish-dominant students in eight schools in El Paso, Texas. The study compared students who received the full program to matched students in similar control schools. After controlling for Spanish and English Woodcock Scales, treatment students scored higher than control students on Woodcock Word Attack (ES = .21), Passage Comprehension (ES = .16), and Picture Vocabulary (ES = .11), with a mean effect size of .16. Across all three studies, the overall weighted effect size for SFA on the achievement of Spanish-dominant ELLs was .35.
Embedded multimedia in Success for All
A study of the embedded multimedia component of SFA was conducted by Chambers, Cheung, Madden, Slavin, and Gifford (2004). It compared SFA schools using the embedded video materials described previously to schools also implementing SFA but without the embedded videos. Since all 10 participating schools used SFA, this was not a study of SFA but of the added embedded video treatment. A total of 172 first-grade Hispanic students in inner-city Hartford, Connecticut, were randomly assigned to SFA plus embedded video or SFA-only (control) conditions for a 1-year experiment. Results for Spanish-dominant children, who were 66% of the sample, found positive effects controlling for the PPVT and the Woodcock Word Identification scale on Woodcock Word Identification (ES = .23), Word Attack (ES = .36), and Passage Comprehension (ES = .16), and fluency on Dynamic Indicators of Basic Early Literacy Skills (ES = .07), with an overall mean effect size of .20. Because this was a study of a program variation and did not have a control group using traditional methods, it does not appear in Table 3.
Literacy Intervention With Cooperative Learning
Bilingual Cooperative Integrated Reading and Composition
An experiment by Calderón, Hertz-Lazarowitz, and Slavin (1998) evaluated a cooperative learning program called Bilingual Cooperative Integrated Reading and Composition. BCIRC is an adaptation of Cooperative Integrated Reading and Composition (CIRC), an upper elementary reading program based on principles of cooperative learning that has been successfully evaluated in several studies (see Stevens, Madden, Slavin, & Farnish, 1987). BCIRC was adapted to meet the needs of limited English proficient children in bilingual programs who are transitioning from Spanish to English reading.
In CIRC and BCIRC, students work in four-member heterogeneous teams. After a teacher introduction, students engage in a set of activities related to a story they are reading. These include partner reading in pairs and team activities focused on vocabulary, story grammar, summarization, reading comprehension, creative writing, and language arts. BCIRC adds to these activities transitional texts (in this study, Macmillan’s Campanitas de Oro and Transitional Reading Program) and ESL strategies, such as total physical response, realia, and appropriate use of cognates, to help children transfer skills from Spanish to English reading. Control teachers also used the same Campanitas de Oro and Transitional Reading Program textbooks and received training in generic cooperative learning strategies. None of the control teachers used cooperative learning consistently, although all of them made occasional use of these strategies.
Participants were 222 Hispanic children in the Ysleta Independent School District in El Paso, Texas. Seven of the highest poverty schools in the district were assigned to experimental (3 schools) or control (4 schools) conditions. The experimental and control groups were well matched on pretest and demographics. Two cohorts were assessed, one of which was involved for just 1 year (Grade 2) and the other for 2 years (Grades 2–3). Analyses of covariance controlling for Bilingual Syntax Measure scores found significantly higher scores for students in BCIRC classes in both cohorts, with a mean effect size of .54.
Peer-Assisted Learning Strategies (PALS)
A small matched control study was carried out by Saenz (2002) to evaluate the effectiveness of PALS for limited English proficient (LEP) students with learning disabilities and their ELL peers. A total of 132 students and 12 teachers from 12 classrooms participated in this study. Students and teachers were well matched on demographic characteristics and achievement data. The duration of the study was 15 weeks. Key components of PALS included partner reading with story retell, paragraph shrinking, prediction relay, and terms and points. Teachers in the treatment condition received a full-day workshop on PALS. Teachers in the control condition were asked to conduct their reading instruction in their normal fashion. At the conclusion of the study, significant differences were detected between the treatment and control conditions on all three measures: words correct (ES = .17), questions correct (ES = .76), and maze choice (ES = .16), with an overall mean effect size of .36. The overall effect size for the two studies using literacy intervention with cooperative learning was .47.
Direct Instruction (DI)
Direct Instruction (DI), or Distar (Adams & Engelmann, 1996), is a reading program that starts in kindergarten with very specific instructions to teachers on how to teach beginning reading skills. It uses reading materials with a phonetically controlled vocabulary, rapidly paced instruction, regular assessment, and systematic approaches to language development. Like SFA, DI provides extensive professional development and coaching to all teachers. DI was not specifically written for ELL students, but it is often used with them.
The most important evaluation of DI was the Follow Through study of the 1970s, in which nine early literacy programs were evaluated (Stebbins, St. Pierre, Proper, Anderson, & Cerva, 1977). In sites throughout the United States, matched experimental and control schools were compared on various measures of reading. One of the sites was in Uvalde, Texas, which primarily served Hispanic students. Becker and Gersten (1982) carried out a follow-up of the Follow Through study when the children who had experienced the treatments in Grades K–3 were in Grades 5–6. Participants were 225 Hispanic English language learners. The Uvalde DI students were well matched on demographic factors with their control group. After 2 years, the treatment group scored significantly higher than the controls on both Wide Range Achievement Test (WRAT) and Metropolitan Achievement Test (MAT) subtests. Effect sizes averaged .47 for two scales of the individually administered WRAT and .16 across three MAT subscales, for a mean effect size across five tests in two grades of .28.
Vocabulary Intervention
Direct Instruction With Key Vocabulary
Carlo et al. (2004) conducted a 2-year evaluation of a vocabulary teaching intervention with 142 Spanish-dominant ELL fifth graders (94 treatment and 48 control students) in California, Massachusetts, and Virginia. The intervention involved introducing 12 vocabulary words each week using a variety of strategies, such as charades, 20 questions, discussions of Spanish cognates, word webs, and word association games. The experimental students were taught in one 5-week unit and two 6-week units in the first year and three 5-week units in the second year. Matched control students continued their usual instruction. Experimental and control students were not significantly different on any of an extensive set of English pretests such as the Peabody Picture Vocabulary Test and reading comprehension. At the end of the first year, experimental students showed greater gains from pretest than controls, but surprisingly, gains were lower after 2 years of intervention. The mean effect size across five English measures in Year 2 was .17.
Improving Comprehension Online (ICON)
A quasi-experimental study conducted by Proctor and his colleagues (2011) evaluated the effectiveness of an Internet-delivered vocabulary and comprehension intervention that targeted both English-speaking and Spanish-dominant students. The ICON intervention was integrated into the existing curriculum and consisted of two 50-minute sessions per week for 16 weeks in a school computer lab. A total of 12 classrooms and teachers were assigned to either ICON or a traditional literacy curriculum. A total of 240 fifth-grade students from four schools in three primarily Hispanic districts in a northeast metropolitan area participated in the study. One hundred and eighteen of them were bilingual students (59 treatment and 59 control). The treatment students worked in a strategic digital reading intervention that was designed to improve both vocabulary and reading comprehension in a whole class setting. The main features of strategic digital reading included Spanish translations of all texts, human read-alouds of each text in English and Spanish, a revisable electronic work log that collected student responses, and a multimedia glossary. While students worked on ICON, teachers monitored and reviewed students’ work in the electronic work logs. No statistically significant differences were found for Spanish-dominant or English-dominant students. For the Spanish-dominant students, the effect size on the Gates-MacGinitie test of reading vocabulary was .02, and it was also .02 for comprehension, after adjusting for initial pretest differences.
Academic Language Instruction for All Students
Another study of an academic vocabulary program was carried out by Lesaux and her colleagues (Lesaux, Kieffer, Faller, & Kelley, 2010). Teachers in 21 classes were randomly assigned to treatment or control classes within each school. Participants were 476 sixth-grade students in seven middle schools in urban districts in California. Over 70% of them were language minority students and approximately 60% of them listed Spanish as their home language. The intervention was a text-based academic language program designed to “build knowledge of the words incrementally over time by providing multiple exposures to the words in different forms and in different meaningful contexts” (Lesaux et al., 2010, p. 202). The program was 18 weeks long and consisted of eight 2-week units, each built around an 8-day lesson cycle, and two 1-week review units. The two groups were well matched on their pretest scores.
After adjusting for initial pretest differences, the treatment students scored higher than the control students on all four researcher-developed measures, significantly so on three of them: target word mastery (ES = .39, p < .001), morphological decomposition (ES = .22, p < .001), word-meanings-in-context (ES = .20, p < .05), and target word association (ES = .15, ns). On the two standardized measures, the treatment group scored only slightly higher than the controls: Gates-MacGinitie reading (ES = .15, ns) and SAT-10 reading vocabulary (ES = .01, ns). We excluded the four researcher-developed measures due to their excessive alignment with the treatment. The effects were similar for language minority students and English-only students. The mean effect size for language minority students across the two standardized measures was .08. The overall effect size for all three vocabulary interventions was .09.
English Language and Literacy Acquisition (ELLA)
A project called English Language and Literacy Acquisition provided students with an intervention composed of three tiers. Tier I was regular language arts, mathematics, science, and social studies instruction in Spanish in kindergarten and first grade. Tier II was English intervention, including three integrated strands: (a) daily small group tutorial instruction from the Santillana Intensive English program (Ventriglia & Gonzalez, 2000), (b) a storytelling and retelling activity (Irby, Lara-Alecio, Quiros, Mathes, & Rodriguez, 2004), and (c) teacher-conducted academic oral language in kindergarten and academic oral language in science in first grade. Tier III was intensive English tutorials delivered in small groups by highly qualified paraprofessionals for low-performing students. Teachers were provided with regular professional development workshops by the research team.
Three closely linked ELLA studies were included in this review. Irby and her colleagues (2010) conducted a 4-year longitudinal study to examine the impact of ELLA. It used a 2 × 2 design (Treatment × Language of Instruction), in which classes within each language of instruction were randomly assigned to either enhanced intervention (ELLA) or traditional instruction. Participants were 381 Spanish-dominant ELLs from 22 low-SES elementary schools in an urban school district in Southeast Texas. In order to qualify for the study, schools had to house programs of either SEI or TBE. Schools were randomly assigned to either the treatment (enhanced) or control (typical) condition. For the SEI comparison, the results were mixed. The treatment group scored higher than the control group on all three English measures: TAKS Reading (ES = .14), Woodcock Listening Comprehension (ES = .13), and Woodcock Passage Comprehension (ES = .15). The mean effect size across three measures was .14. For the TBE comparison, the treatment group scored higher than the controls on three Spanish outcome measures with a mean effect size of .18. Since the control group for the TBE comparison was not tested in English, no comparison was made on their English outcomes.
A report by Tong, Lara-Alecio, Irby, and Mathes (2011) was part of the larger experimental longitudinal study mentioned previously (see Irby et al., 2010). The main focus of this study was to investigate ELLs’ performance in English and Spanish oral language and reading skills from kindergarten to the end of first grade across treatment and gender. A total of 140 students (70 in each condition) were randomly selected from the larger longitudinal study. All of these students were placed in TBE classes in 10 schools and 12 classrooms. Students were tested three times on oral language (fall kindergarten, spring kindergarten, and spring of Grade 1) and twice on literacy skills (fall and spring of Grade 1). Results showed that students who received the ELLA enhanced TBE program outperformed their counterparts in the control condition on only two of the six English outcome measures: IDEA Oral Language Proficiency Test (ES = .48) and Woodcock Passage Comprehension (ES = .15). The mean effect size across all six English measures was .04. However, the treatment group scored significantly higher than the controls on five out of the six Spanish outcome measures, with a mean effect size of .28.
The third ELLA study was a 3-year (K–2) longitudinal randomized study (Tong, Irby, Lara-Alecio, & Mathes, 2008) derived from the larger 4-year longitudinal study mentioned previously. The main objective of this study was to look at the effectiveness of ELLA in the TBE classrooms. Nineteen schools were randomly assigned to either ELLA (N = 10) or control (N = 9) conditions. Treatment students received an enhanced developmental bilingual education program, which used a 70% Spanish and 30% English model, whereas the control group used a more typical 80% Spanish and 20% English bilingual model. The initial sample size was 502 and the final sample was 262, with an attrition rate of 48%. No differential attrition rate was found between the two groups. The findings indicated that treatment students outperformed their control counterparts in the areas of oral language, pre-literacy skills, and reading fluency and comprehension on English measures, with effect sizes ranging from .13 to .70. The mean effect size across all 14 English measures was .23. The treatment students also scored significantly higher than the control students on 7 of the 14 Spanish outcome measures, with a mean effect size of .12. Across the three linked studies, ELLA produced a weighted mean effect size of .15.
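The attrition figure reported for Tong et al. (2008) follows directly from the initial and final sample sizes. As a quick check, here is a small Python sketch; the function name is illustrative and not from the study.

```python
def attrition_rate(initial_n: int, final_n: int) -> float:
    """Proportion of the initial sample lost by the final assessment."""
    if initial_n <= 0 or not 0 <= final_n <= initial_n:
        raise ValueError("require initial_n > 0 and 0 <= final_n <= initial_n")
    return (initial_n - final_n) / initial_n


rate = attrition_rate(502, 262)   # sample sizes reported for Tong et al. (2008)
print(f"{rate:.0%}")              # prints 48%
```

Overall attrition of this size matters less for internal validity when, as reported here, it does not differ between the treatment and control groups.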
Language and Literacy Curriculum
Francis et al. (2009) conducted a 4-year longitudinal study (K–3) to examine the effects of the Language and Literacy Curriculum. The intervention focused on developing students’ literacy skills as well as their oral language proficiency skills by providing instruction in listening comprehension and vocabulary as well as providing more practice reading connected text. The materials developed for the structured immersion program were in English whereas those developed for the bilingual group were in Spanish. Cooperative learning strategies were used and professional development was provided to all treatment teachers throughout the school year on a monthly basis.
The study employed a 2 × 2 factorial design (Treatment × Language of Instruction). Teachers within each language of instruction were randomly assigned to either the treatment or control condition. A total of 1,271 kindergarteners and 55 teachers from 13 public schools in Brownsville, Texas, participated in Year 1. At third grade, the sample size was reduced to 744 students and 11 schools due to attrition. Students were assessed twice a year (fall and spring) from kindergarten to third grade. Out of the 13 original participating schools, 5 used only transitional bilingual education, 1 school used only the structured immersion program, and the other 7 had both SEI and TBE programs. Results were mixed. For the SEI group, after adjusting for pretest differences, the treatment group scored lower than the control group on Basic Reading (ES = –.18), Broad Reading (ES = –.26), and Oral Language Composition (ES = –.51), with a mean effect size of –.30 across all three English measures. The treatment group scored significantly higher than the control group on all three Spanish measures with a mean effect size of .58. For the bilingual group, the treatment students scored only slightly higher than the controls on two of the three English measures, with a mean effect size of .04, but significantly higher on all three Spanish measures, with a mean effect size of .81. The overall effect size for the English measures for all ELLs was –.12.
Small Group Supplemental Interventions
Small Group Tutorials With Direct Instruction
A randomized experiment was carried out by Gunn, Biglan, Smolkowski, and Ary (2000) to evaluate the effects of a small group supplemental tutorial program that used two forms of DI, Reading Mastery and Corrective Reading. Participants were Hispanic and non-Hispanic at-risk early elementary school children (K–3) who were struggling in reading; they were selected from nine rural Oregon elementary schools in three school districts. After screening for eligibility, children were randomly assigned to experimental or control conditions. Those children assigned to the experimental group were taught in homogeneous groups of one to three children using Reading Mastery if they were in Grades K–2 or Corrective Reading if they were in Grades 3–4. They were taught daily by instructional assistants for 2 years. Only 19 of the 122 Hispanic students were considered non–English speaking; the oral English skills of the remaining students were not specified.
The experimental and control groups were well matched on their pretest scores. After the first year, the treatment students outperformed control students on all three measures, Woodcock Letter-Word ID (ES = .22), Word Attack (ES = .70), and Fluency (ES = .16), with an overall mean effect size of .36. At the end of the second year, after 15 to 16 months of instruction, effect sizes for gains from pretest on these measures were .46, .91, and .43, respectively. The mean effect size across these three measures was .60. In addition, there were positive effects on Woodcock Reading Vocabulary (ES = .44) and Passage Comprehension (ES = .48), given as posttests only (but adjusted for pretests). Experimental-control differences on all five measures were substantial after 2 years, with a mean effect size of .54 on the English measures.
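The mean effect sizes in this paragraph are simple unweighted averages of the per-measure values, which can be verified directly. The sketch below uses only the effect sizes reported above for Gunn et al. (2000); the variable names are illustrative.

```python
from statistics import mean

# Effect sizes reported above for Gunn et al. (2000).
year1 = [0.22, 0.70, 0.16]                  # Letter-Word ID, Word Attack, Fluency
year2_same_measures = [0.46, 0.91, 0.43]    # the same three measures after Year 2
# Add Reading Vocabulary and Passage Comprehension, given as posttests only.
year2_all = year2_same_measures + [0.44, 0.48]

year1_mean = round(mean(year1), 2)
year2_three_mean = round(mean(year2_same_measures), 2)
year2_five_mean = round(mean(year2_all), 2)
print(year1_mean, year2_three_mean, year2_five_mean)   # prints 0.36 0.6 0.54
```

The same averaging reproduces the per-study means reported elsewhere in this section wherever all of the individual measure-level effect sizes are listed.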
An Explicit, Systematic Supplemental Reading Intervention
Vaughn, Mathes, Linan-Thompson, and Francis (2005) evaluated a supplemental program in which trained bilingual teachers provide systematic and explicit instruction in phonemic awareness and phonics applied to word and text reading in English. Students meet in small groups (3–5) for 50 minutes daily, 5 days a week. Extensive professional development is provided to teachers before and during the intervention.
Vaughn, Mathes, et al. (2006) carried out a small-scale randomized study examining the effectiveness of this intervention for at-risk first-grade ELLs. To be eligible for the study, students needed to score below the 25th percentile on the Woodcock Letter-Word Identification pretests in both English and Spanish at the beginning of first grade. Fifty-six of the 216 students in four low SES schools from two districts in Texas were eligible for the study, which lasted from October to May. The intervention was provided in addition to core reading lessons. Students in the control classrooms also received one or more types of supplemental reading intervention in addition to their core reading instruction. The treatment and control students were well matched on both English and Spanish pretest measures. At the end of the 7-month study, treatment students scored significantly higher than control students on 7 of the 14 English outcome measures. Strongest effects were in the areas of phonemic awareness (ES = 1.24), Word Attack (ES = 1.09), Passage Comprehension (ES = 1.08), and phonological processing (ES = 1.01). The mean effect size for all 14 English measures was .68, and the Spanish measures were also in favor of the treatment group.
Vaughn, Cirino, et al. (2006) replicated a pair of similar randomized experiments with two separate samples of at-risk first-grade ELLs to investigate the effectiveness of both the English and Spanish interventions as reported in the two previous studies (Vaughn, Linan-Thompson, et al., 2006; Vaughn, Mathes, et al., 2006). Due to the focus of this review, only the study that used English outcome measures is reported here. Participants were 90 students from four schools and 20 classrooms from three sites in Texas. Students in the study were first screened and then randomly assigned to either the treatment or the control condition (using the methods described in Vaughn, Mathes, et al., 2006). Treatment students were provided about 115 sessions of supplemental reading for 50 minutes every day in a small group setting (3–5 students). At the end of the study, treatment students outperformed their counterparts in the control condition on measures of phonological awareness, word attack, word reading, and spelling, with effect sizes ranging from .40 to .75. The mean effect size across all 16 English measures was .27.
Literacy Express Preschool Curriculum
An evaluation was conducted by Farver, Lonigan, and Eppe (2009) to examine the effects of the Literacy Express Preschool Curriculum. A total of 94 Spanish-dominant ELL preschoolers from a Head Start preschool program in Los Angeles participated in this 6-month-long randomized study. Students were randomly assigned to one of the three conditions: High/Scope Curriculum (control = 32), the Literacy Express Preschool Curriculum in English only (treatment English = 31), and the Literacy Express Preschool Curriculum initially in Spanish then transitioning to English (treatment transition = 31). The Literacy Express Preschool Curriculum is a comprehensive preschool program designed to improve young children’s oral language, emergent literacy skills, and socio-emotional development. The curriculum is structured around 10 thematic units and provides intensive small group (4–5 students) instruction that focuses on dialogic reading activities, phonological awareness activities, and print knowledge activities.
During the study, both treatment groups received a 20-minute small group intervention four times a week for 6 months. The first treatment group received the intervention from the Literacy Express Preschool Curriculum in English. The transition group received the same intervention in Spanish for 2 months and was then transitioned to English instruction. The transition period took place over 3 to 4 weeks. The treatment groups and control group were well matched on pretest measures. Analyses of covariance indicated that both treatment groups outperformed the control group on all five English measures, with mean effect sizes of .41 and .71, respectively, for a combined effect size of .49. The transition treatment group also scored significantly higher than the control group on all five Spanish language outcomes. No significant difference was found on Spanish outcome measures between the English treatment and control group. Across the four small group supplemental studies, the weighted effect size was .48.
One-to-One Tutoring Supplemental Interventions
Read Well and Read Naturally
Read Well used as a one-to-one tutoring approach combines systematic, explicit phonics instruction with practice in decodable text and contextualized vocabulary and comprehension instruction. Activities in the Read Well program include tutor-directed decoding practice, practice reading decodable text with pre-reading and during-reading discussion and questioning designed to build vocabulary and comprehension, and completion of simple comprehension worksheets (Sprick, Howard, & Fidanque, 1998). Read Naturally, an alternative tutoring approach, is a supplementary reading program aimed at improving reading fluency using a combination of books, audiotapes, and computer software for elementary and middle school students. Key strategies used include repeated reading of text for developing oral reading fluency, teacher modeling of story reading, and systematic monitoring of student progress by teachers and the students themselves (Ihnot, 1992).
Read Well and Read Naturally were studied in an experiment by Denton, Anthony, Parker, and Hasbrouck (2004). Spanish-dominant students in Grades 2–5 in a bilingual program in Texas were assigned to one of two separate experiments. Those scoring lower than the first-grade level on the Woodcock Word Attack scale were randomly assigned to Read Well or to an untutored control group. Those scoring higher than this were randomly assigned to a tutoring program using Read Naturally or to an untutored control group. Tutors were undergraduate education majors. All tutoring was done in English. The final sample of students in the Read Well evaluation included 19 experimental and 14 control children. Those in the experimental group received an average of 22 tutoring sessions. In the Read Naturally comparison, there were 32 tutored and 28 nontutored children. The results indicated substantially higher achievement for the Read Well students than for controls, with a mean effect size of .47 across six measures. Differences were statistically significant only on the Woodcock Word Attack scale (p < .05) and an oral reading accuracy scale (p < .001). In contrast, there were no differences between the children tutored with Read Naturally and those who were not tutored (ES = .09).
Kemp (2006) carried out a randomized experiment to examine how Read Naturally influenced reading and reading-related skills. Forty-two third-grade ELLs from three schools were randomly assigned to Read Naturally or to a control condition. All students participated in 20 minutes of independent reading 4 days a week. The treatment group received Read Naturally whereas the control group received scaffolded sustained silent reading for the same amount of time. After the 4-month intervention, the two groups did not differ significantly on any of the five outcome measures, with an overall mean effect size of .10.
Phonics-Based Supplemental Reading Intervention
Vadasy and Sanders (2011) evaluated a supplemental phonics-based tutoring intervention. Participants included both language minority students and non–language minority students. Given the focus of this review, only results from the language minority students are reported here. Ninety-eight first graders who performed in the lower half of their class were randomly assigned within each classroom to either the treatment (N = 48) or the control condition (N = 50). The treatment students received 30 minutes of individual tutoring in English 4 days a week for 6 months. Scripted lessons were provided to each tutor, and each lesson focused on several key components, including letter-sound correspondences, phoneme decoding, irregular words, spelling, and oral reading practice. Tutors also received ongoing coaching and modeling of appropriate scaffolding from research staff. Treatment and control students were fairly well matched on key demographic characteristics and pretest scores. The majority of the participants were Spanish-dominant. After the intervention, the treatment students scored higher than the control students on all outcome measures, with effect sizes ranging from .03 to .40. The mean effect size across all six measures was .21. Overall, the four one-to-one tutoring studies produced a combined weighted effect size of .19.
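As a rough illustration of how a combined effect size of this kind can be obtained, the sketch below averages the four one-to-one tutoring effect sizes reported above, weighting each by its total sample size. The sample sizes and effect sizes are taken from the study descriptions; the use of simple, unadjusted sample-size weights is an assumption, and the review's exact weighting procedure may differ, so the result should be read as approximate rather than as a reproduction of the reported value.

```python
# Per-study values as reported in the study descriptions above.
# Assumption: each study is weighted by its total sample size with no
# further adjustment; the review's exact weighting scheme may differ.
studies = [
    # (label, total N, mean effect size)
    ("Read Well (Denton et al., 2004)", 19 + 14, 0.47),
    ("Read Naturally (Denton et al., 2004)", 32 + 28, 0.09),
    ("Read Naturally (Kemp, 2006)", 42, 0.10),
    ("Phonics-based tutoring (Vadasy & Sanders, 2011)", 48 + 50, 0.21),
]


def weighted_mean_effect_size(rows):
    """Return the sample-size-weighted mean of the effect sizes in rows."""
    total_n = sum(n for _, n, _ in rows)
    return sum(n * es for _, n, es in rows) / total_n


combined = weighted_mean_effect_size(studies)
print(f"Combined weighted effect size: {combined:.2f}")
```

Under these assumptions, the result lands close to, though not necessarily identical to, the combined value reported for the four tutoring studies; small discrepancies would be expected from rounding of the per-study effect sizes or from a different weighting rule.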
Discussion
The purpose of this review was to synthesize research on outcomes of all types of programs likely to improve English reading outcomes for Spanish-dominant English language learners in elementary school. In the past, the focus in this area has been on language of instruction, with reviewers debating the merits of bilingual versus English-only approaches. In the present review, however, we have treated various forms of bilingual education as interventions for ELLs and have considered their outcomes on essentially the same basis as we consider other types of interventions intended to improve English reading outcomes for ELLs. This is the first systematic review to consider all alternatives together in this way, focusing on studies that meet high, consistent standards of methodological rigor.
The findings of the review support a conclusion increasingly being made by researchers and policymakers concerned with optimal outcomes for ELLs and other language minority students: Quality of instruction is more important than language of instruction. Combining outcomes from 13 qualifying studies of bilingual education going back to the 1970s, average effect sizes weighted by sample size favor bilingual education, with an effect size of +.21. However, most of these studies are quite old and used forms of bilingual education that are not common today. The largest and longest modern-day study, which is also the only multiyear randomized evaluation of transitional bilingual education, did not find any differences in outcomes by the end of elementary school for children who were either taught in Spanish and transitioned to English or taught only in English.
More important, there are several types of interventions that have been found to be effective in improving reading outcomes for Spanish-dominant ELLs. Some of these are widely used or capable of being used broadly. For example, SFA, a whole-school reform approach with specific adaptations for English language development, was found to have positive effects with a weighted mean effect size of .35 across three studies. Two forms of cooperative learning had positive effects on ELLs; one, BCIRC, had an effect size of .54, and the other, PALS, had an effect size of .36 for ELLs. Positive effects were also reported for DI (ES = .28) and ELLA (ES = .15). SFA, cooperative learning, DI, and ELLA are all cost-effective whole-class or whole-school interventions that exist at some scale in schools serving many ELLs.
Another category of promising and scalable interventions includes small group (ES = .48) and one-to-one tutoring (ES = .19) for English language learners who are struggling in reading. Programs of this kind focusing on phonics and language development have shown promise, especially the structured small group programs evaluated by Vaughn, Cirino, et al. (2006); Vaughn, Linan-Thompson, et al. (2006); Gunn et al. (2000); and Farver et al. (2009).
Looking across the most promising interventions, there are several themes that appear. First, all of the proven programs provide extensive professional development and coaching to help teachers effectively implement promising models. Effective programs provide explicit manuals, videos, and simulations to start teachers off in the right direction and then have experienced coaches visit teachers using new strategies to offer feedback and support. Second, almost all of the effective strategies make extensive use of cooperative learning, which gives English language learners extensive, daily opportunities to use their developing language skills in meaningful contexts. In particular, cooperative learning appears to be able to build confidence in the use of school-specific English, of the kind students are unlikely to hear on the playground or in their communities.
Language of instruction remains an important question, if for no other reason than that building on students’ home language gives them skills in that language sure to be important in their lives. However, when English reading is the goal, different approaches may work equally well, bilingual as well as structured English immersion. We now have many approaches that can be used in either bilingual or English-only settings with evidence of effectiveness from rigorous evaluations.
Conclusions and Policy Implications
Policies and practices for English language learners should be based on the best evidence available about how to improve outcomes. For reading outcomes in the elementary grades, the evidence summarized in this review supports a focus on professional development in strategies such as cooperative learning, small group and one-to-one tutoring, and comprehensive school reform. Policies should be designed to broadly disseminate proven approaches among schools serving many ELLs and to develop, rigorously evaluate, and disseminate additional effective methods.
For example, in competitive grants for schools serving many ELLs, government might provide competitive preference points for schools proposing to implement proven programs with fidelity. Investments in development and evaluation of practical programs for ELLs are needed, perhaps patterned on the Investing in Innovation program currently underway. What should matter for policy is evidence of effectiveness for programs and practices that can be scaled up to serve many ELLs. Effective programs may make use of native language to a greater or lesser extent, but it is time to move beyond a main focus on language of instruction to consider all possibilities to enhance outcomes for Spanish-dominant children.
Authors
ALAN C. K. CHEUNG is an associate professor in the Department of Educational Administration and Policy at The Chinese University of Hong Kong, Ho Tim Building, Shatin, CUHK, Hong Kong; e-mail:
ROBERT E. SLAVIN is the director of the Center for Research and Reform in Education at Johns Hopkins University. His areas of specialization include cooperative learning, comprehensive school reform, evidence-based policy, English language learners, and research methods.
