Abstract
This article investigates the differences in academic achievement trajectories from elementary through middle school among English Learner (EL) students in four different instructional programs: English Immersion (EI), Transitional Bilingual (TB), Developmental Bilingual (DB), and Dual Immersion (DI). Comparing students with the same parental preferences but who attend different programs, we find that the English Language Arts (ELA) test scores of ELs in all bilingual programs grow at least as fast as, if not faster than, those in EI. The same is generally true of math, with the exception of DB programs, where average student scores grow more slowly than those of students in EI. Furthermore, Latino ELs perform better longitudinally in both subjects when in bilingual programs than their Chinese EL counterparts. We find no differences in program effectiveness by ELs’ initial English proficiency.
Given these patterns, it is critical to determine what the best and most effective instructional methods are for ELs. Despite a large body of research on the topic, the long-running debate over whether bilingual education (in contrast to English-only instruction) is beneficial for ELs’ academic development continues. As a result, there is much variability across states and school districts in the kinds of programs available to ELs (Goldenberg, 2008). Some offer several instructional options such as bilingual education or English immersion (EI) instruction, whereas others have effectively banned bilingual education altogether (Rolstad, Mahoney, & Glass, 2005).
On one hand, some data and theory suggest that ELs benefit most from being immersed in English-only classrooms, because spending more time on task practicing English results in quicker English language development (Baker, 1998; Porter, 1990; Rossell & Baker, 1996). On the other hand, some theory and evidence suggest that to learn a new language, children require a fundamental literacy base in their first language, and that fostering the continued development of children’s first language will later transfer to the development of the second language because languages share common underlying proficiencies (Cummins, 1979, 2000; Goldenberg, 1996; see also Genesee, Geva, Dressler, & Kamil, 2008). This perspective also stresses that academic content in subjects such as math and science may be lost in translation when instruction is not in students’ first language.
There is slightly more empirical support for the latter argument, suggesting that bilingual education is superior to English-only instruction for ELs (Goldenberg, 2008; Greene, 1998; Rolstad et al., 2005; Willig, 1985), or at a minimum not detrimental (see Slavin & Cheung, 2005). However, much of the research on the issue is not very rigorous (see Rossell & Baker, 1996). Most research on EI versus bilingual education is not based on randomized experiments or rigorous quasi-experiments; most looks at short-term rather than long-term outcomes (for an exception, see Slavin, Madden, Calderon, Chamberlain, & Hennessy, 2011), and much of it is based on studies conducted on French Immersion programs in Canada (Cummins, 1999) or exclusively with Spanish-speaking ELs. Furthermore, “bilingual” instruction is implemented differently in different studies, complicating any synthesis of results. For example, some bilingual models serve ELs in classrooms separate from native English-speaking students, whereas others serve both ELs and non-ELs in the same classroom with the goal of creating biliteracy among both groups.
In this article, we address these gaps in the literature by using longitudinal student-level data from a large school district and more rigorous methods to address two main research questions:
Review of the Literature
Theoretical Perspective
Two theoretical perspectives frame the debate about the benefits and drawbacks of bilingual education. One perspective argues that bilingual education and the use of a student’s home language are essential to fostering English language acquisition and continued academic development in other subject areas (Cummins, 1979; Goldenberg, 1996). The contrary perspective argues that spending more time on task with maximum exposure to English language instruction results in quicker acquisition of and better performance in English (Baker, 1998; Porter, 1990; Rossell & Baker, 1996).
The first perspective—that bilingual education is preferable to EI instruction—is based on two arguments. First, if students are immersed in English-only instruction but have not developed a minimum level of competency in English, there will likely be a discrepancy between what is taught and what is understood (Goldenberg, 1996). Furthermore, children need a knowledge base to be effective readers and speakers. They may be able to continue expanding that knowledge base more quickly if they are taught in a language that they are more familiar with than if they are learning in a language that they do not fully understand.
Second, the continued development of children’s first language may facilitate acquisition of the second language, as academic language skills may be developmentally linked to similar underlying proficiencies that are common across languages (Cummins, 1979, 2000; Genesee et al., 2008). For instance, Collier and Thomas (1989) found evidence that immigrant students with 2 to 3 years of initial schooling in their country of origin tend to perform better academically than those who start their schooling in a new country. These findings are consistent with the idea that children should learn to read in their home language first, rather than learning to read in general and read in a new language simultaneously (Cummins, 1999).
The second perspective—that EI classrooms are better for ELs than bilingual classrooms—is based on the argument that spending instructional time on a language other than English necessarily detracts from students’ exposure to English. Given that the primary language of instruction in U.S. schools is English, the argument goes, delaying students’ development of English skills delays their opportunity to learn academic material.
Research to date has not consistently supported either hypothesis. In part, this is because much of the research relies on designs that do not provide a strong causal warrant. Moreover, bilingual education is implemented in different ways in different studies. These factors make it difficult to draw firm conclusions about the relative benefits of bilingual instruction and EI instruction. In the section that follows, we highlight the most rigorous studies to date.
Effectiveness of Bilingual Instruction
Bilingual education has been shown to influence a number of student outcomes. These include both oral and written language development, rate of reclassification as fluent English proficient, and academic course-taking patterns (Jepsen, 2010; Riches & Genesee, 2006; Saunders & O’Brien, 2006; Umansky, 2014; Umansky & Reardon, 2014). As this article considers academic outcomes in ELA and math, we focus here on reviewing literature that considers effects on academic outcomes.
There is a sizable body of literature documenting the effects of bilingual education compared with EI instruction on ELs’ academic performance (Lindholm-Leary & Borsato, 2006). A handful of reviews and meta-analyses have tried to summarize the literature, but the conclusions of these meta-analyses vary depending on the study inclusion criteria they use. In a review of studies comparing bilingual programs with EI ones, Rossell and Baker (1996) found that about 30% of the studies show that bilingual education is worse than EI for reading outcomes, 20% show that it is better than EI, and the remaining 50% find that there is no difference between the two. Findings for math are similar. Although comprehensive and effective at highlighting the mixed nature of research on bilingual education, this review relies heavily on the effectiveness of French immersion programs in Canada, the results of which may not generalize to bilingual programs in the United States. Furthermore, although studies were restricted to those including a comparison group, most did not rely on experimental or quasi-experimental research designs.
The two meta-analyses that used the most stringent study inclusion criteria generally conclude that ELs who attended bilingual programs outperformed their peers who attended EI programs by anywhere from 0.18 to 0.33 SD per year in academic subjects. Furthermore, when restricted to only randomized experiments or only studies conducted in the United States, effect sizes were on the higher end of this range (about 0.3 SD per year) in each case (Greene, 1997; Slavin & Cheung, 2005).
Although the findings from these meta-analyses suggest that bilingual instruction leads to academic outcomes equal to or better than those of English-only instruction, most of the underlying studies relied on small, locally specific samples, which limits their generalizability, and most tracked student outcomes for only a few years at best. Furthermore, much of this literature does little to tease apart the differential effectiveness of specific bilingual instructional models (e.g., TB vs. DB), making it difficult to disentangle which components make bilingual programs work. There are three main models of two-language instruction (TB, DB, and DI), and there is no conclusive evidence that each model provides equally beneficial effects. We review the evidence on the differential effectiveness of these models below.
Transitional Bilingual
TB classrooms serve only ELs, separate from their non-EL peers. Instruction starts primarily in students’ home language in kindergarten and increases in the amount of English used for instructional purposes at a rapid pace in the early elementary years, with the intention of transitioning ELs into EI programs quickly—usually by Grade 2 or 3. Transitional programs use ELs’ home languages to support learning, but do not have a goal of promoting bilingualism prior to transitioning to EI.
In a longitudinal quasi-experimental study, Matsudaira (2005) used a regression-discontinuity design to estimate the effect of enrolling in a TB education class. The analysis finds negligible effects of bilingual education in ELA and math across Grades 3 through 8. However, because the estimates are based on a regression-discontinuity design, the findings apply only to ELs with relatively high levels of English proficiency (i.e., those scoring just below the cut score of EL classification), making it difficult to know whether the findings would generalize to ELs with lower initial English proficiency. Furthermore, in the district studied, there was considerable movement of students in and out of bilingual programs; only 30% of ELs remained in a bilingual program for 2 or more years. It is possible that the effects would be different if compliance were higher, that is, if students attended their programs for more years.
Slavin et al. (2011) randomly assigned students to a TB or an EI program and tracked Spanish-speaking ELs’ reading and vocabulary achievement from kindergarten through fourth grade. They found that in the early grades, ELs enrolled in an EI program outperformed their peers who attended TB programs in academic outcomes in English, but by fourth grade, no significant differences in these assessments emerged (Slavin et al., 2011). These findings suggest that in early grades, some forms of bilingual instruction may slow the process of English language development, simply because much instructional time is spent on home-language development, but that ultimately transfer may occur from the home language to English, which is why ELs in bilingual instruction ultimately catch up. Among other things, the findings point to the importance of long-term follow-up to determine “effectiveness.”
Developmental Bilingual
Other research has compared the effects of TB programs with those of DB programs. DB education programs are similar to TB programs, in that they incorporate EL students’ home language into classrooms and exclusively enroll ELs, but these programs are longer term, often lasting through the fifth grade or later, and have the goal of helping students develop competency in English while maintaining and continuing to develop competency in their native language.
Ramirez, Yuen, Ramey, and Pasta (1991) compared both TB and DB programs with EI programs among Spanish-speaking ELs. Similar to Slavin et al. (2011), the authors found that in early grades, students attending transitional and DB programs performed worse in ELA than their peers enrolled in EI classrooms, but by second grade, this significant difference disappeared. They also found that by sixth grade, ELs in EI actually appeared to fall further behind their peers in bilingual programs. The findings from this study should, however, be interpreted with caution, as the authors’ matching algorithm did not account for students’ pretest scores.
Dual Immersion
The above studies do little to shed light on the potential benefits of DI instructional programs, which to date have not been as extensively researched. DI programs are more similar to DB than TB instruction because they hold a goal of facilitating biliteracy through longer term programs, but they differ in that they enroll both native English speakers and ELs in the same classroom. In some ways, DI programs can be thought of as a hybrid approach of EI and DB instruction, as they are based on the notion that the integration of native speakers of both languages into a single classroom offers students the opportunity to learn with students who model high-quality language in the language they are not yet proficient in (Valdés, 1998). In some DI models, regardless of grade, approximately 50% of instructional time is spent on English and the other 50% is spent on the ELs’ native language (often referred to as the target language). In others, and in the case of the DI model in this article, the majority of instruction occurs in the target language in the early elementary grades. This gradually becomes more balanced across each grade until late elementary school, at which point about half of the instructional time is spent in the target language and the other half is spent in English (Christian, 1998).
Two noteworthy studies consider the effects of such programs on students’ outcomes. Thomas and Collier (2002) found that across five large school districts, ELs attending DI programs almost always performed higher academically in English, Spanish, and math than their peers in TB and DB programs. Furthermore, in all districts, the students attending the DB programs always performed at least as well as and in some districts better than those in the TB programs. This study provides good descriptive evidence of differences in EL students’ performance across programs, but only controlled for a very limited set of student-level variables. It is possible that the observed differences across programs were due to the fact that students enrolling in different types of programs differ systematically on characteristics related to their later academic outcomes.
The second study randomly assigned preschool students to either DI or English-only preschool classrooms and found that by the end of the first grade, DI instruction led to significant gains in the Spanish language development of both language minority students and native English-speaking children without loss to their development of academic skills in English (Barnett, Yarosz, Thomas, Jung, & Blanco, 2007). It is unclear whether the results of this study generalize to elementary school DI programs, however, because the randomized treatment assignment was maintained only through the preschool year. Moreover, the study focused on language minority children in general, only some of whom might have been classified as ELs once they enrolled in kindergarten.
Taken together, these studies yield quite mixed results, but suggest that at the very least, bilingual education (generally defined) does not hinder academic performance in English in the medium term.
Motivation for the Current Study
Long-Term Effects by Subject
Although there is a sizable body of literature comparing the effectiveness of bilingual education with EI instruction among ELs, there are still many gaps in the literature. First, the overwhelming majority of studies tracking elementary-aged ELs consider outcomes for only 1 to 3 years after initial program attendance, and even the few exceptions to this still track differences in academic abilities only through fourth (Slavin et al., 2011) or fifth grades (Collier & Thomas, 2004; Maldonado, 1977). Tracking outcomes beyond these grades is particularly important in light of the fact that children initially enrolled in bilingual programs need time to develop English skills (Hakuta, Butler, & Witt, 2000) and may actually realize the largest gains from program attendance in the longer term. Furthermore, most current studies almost exclusively consider outcomes in English and/or ELs’ home languages, without considering the impact of bilingual instruction on academic development in other core subjects (for exceptions, see Barnett et al., 2007; Ramirez et al., 1991; Willig, 1985).
In this study, we add a longitudinal and multisubject perspective by looking at outcomes from kindergarten through late middle school in both ELA and math. We hypothesize that the two-language instructional programs will lead to slower initial growth, but faster later growth in ELA than will EI instruction because more exposure to English will lead to quicker acquisition of English language skills initially, but the transfer of skills across languages will allow students in bilingual programs to catch up after a few years. For math, however, several competing hypotheses seem plausible. On one hand, we expect that two-language programs should enable faster acquisition of math skills than English-only programs because instruction in EL students’ home languages will allow access to academic content. On the other hand, two-language programs may spend more instructional time on ELA than EI classrooms, and less time on math instruction, particularly if two-language programs enroll students with lower levels of English proficiency than EI programs. Finally, because math tests are administered in English, performance on them may be partly mediated by language skills. If ELs in two-language programs initially develop English language skills more slowly than those in EI programs (as we hypothesize above), test scores in early elementary school may reflect math skills less accurately for ELs in two-language programs than for those in EI programs. This would make it appear that two-language programs lead to lower initial math skills than do EI programs. Because it is not clear which of these different mechanisms might dominate, we have no clear hypotheses about the effects of EL instructional programs on math.
Effects by Subgroup
Most research on EL instruction in the United States focuses exclusively on the effectiveness of different instructional programs for Spanish-speaking ELs. Further, some studies treat all ELs as one undifferentiated category, without considering differences in students’ home language and initial English proficiency. Evidence generally suggests that supporting a child’s home-language development can ultimately transfer to second language proficiency because some features of language, such as reading comprehension, are universal across languages (Goldenberg, 2008). However, other research indicates that the degree of transfer across languages may vary depending on the structures of the languages in question. When languages are typologically distant (such as English and many character-based East Asian languages), procedural literacy skills may be less likely to transfer (Genesee et al., 2008; Lado, 1964). One potential reason for this is that visual processes are more dominant when learning to read a character-based language like Japanese than when learning an alphabetic language such as English or Spanish (Geva, 2006). When there are typological language differences, it is thus unlikely that all features of learning language, such as letter–sound correspondence, phonological awareness, and reading comprehension, will be identical (and thus transfer) across languages; such transfer is more likely between typologically similar languages.
Motivated by this background research, we disaggregate findings by Chinese and Latino ELs. Because Spanish and English share many structural similarities, we hypothesize that Latino ELs in two-language programs, particularly programs that foster continued development of students’ home language over several years, will do significantly better than their Latino peers who are enrolled in EI programs. However, because Chinese and English have very different phonological structures and distinct writing systems, we hypothesize that Chinese ELs in EI programs will perform better than their Chinese peers in bilingual programs. To our knowledge, only one study to date has specifically estimated the differential effectiveness of bilingual instruction for Latino and Chinese ELs. Conger (2010) found that bilingual instruction has a negative effect on English proficiency for both Latino and Chinese ELs. She argues, however, that the apparent similarity in program effects may be driven by differential selection processes, rather than by true similarities in the effects of bilingual education. We build on Conger’s work by estimating program effects on academic trajectories (rather than English proficiency) separately for Latino and Chinese ELs.
In addition to estimating our models separately by ethnicity, we also test whether the effects of EL instructional programs differ by students’ initial English proficiency. To our knowledge, there is little research to date on this question, with the exception of a study by Jepsen (2010), which found that bilingual programs had positive effects on English proficiency among those students with high prior English listening/speaking proficiency, and negative effects among those with low prior proficiency. Jepsen did not examine academic outcomes, however. Because of the limited prior research in this area, we have no clear hypotheses about whether and how EL instructional program effects may differ in relation to ELs’ initial English proficiency.
Rigorous Methods
One challenge in the study of EL instruction is potential selection bias. Many of the studies reviewed here include only a small set of control variables in regression models to reduce selection bias, but because the selection process is generally unknown, it is not clear whether these variables provide sufficient controls. In our analyses, we use random coefficients growth models with a relatively robust set of controls. Importantly, we are able to include a set of variables that directly control for parental preferences regarding the type of EL program they would like their child enrolled in. The school district where our research is based uses a complicated student assignment algorithm to assign EL students to schools and, within schools, to instructional programs. The algorithm takes parental preferences into account, but when schools and programs are oversubscribed, it relies on random assignment. Our models use this feature of the assignment process to estimate the effects of different programs, comparing the academic outcomes of ELs whose parents preferred the same school and program but who attended different programs. Because we can control explicitly for the parental preferences used in the algorithm, our results arguably have a somewhat stronger causal warrant than if we could control only for observable student characteristics.
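The logic of comparing students within the same preference group can be illustrated with a minimal Python sketch. It is not the random coefficients growth model estimated in this article, and it is not a full fixed-effects regression (which would also demean the program indicators). It only shows how removing each preference group's mean restricts comparisons to students whose parents requested the same school and program. All records, field names, and scores below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (parental preference group, program attended, score in SD units)
students = [
    ("school_A/DI", "DI", 0.10),
    ("school_A/DI", "EI", -0.05),
    ("school_A/DI", "DI", 0.20),
    ("school_B/TB", "TB", -0.30),
    ("school_B/TB", "EI", -0.10),
    ("school_B/TB", "EI", -0.20),
]


def demean_within_preference(records):
    """Subtract each preference group's mean score from its members' scores."""
    scores_by_group = defaultdict(list)
    for pref, _program, score in records:
        scores_by_group[pref].append(score)
    group_means = {pref: mean(vals) for pref, vals in scores_by_group.items()}
    return [(pref, program, score - group_means[pref])
            for pref, program, score in records]


def program_means(demeaned):
    """Average the within-group deviations separately for each program attended."""
    deviations = defaultdict(list)
    for _pref, program, deviation in demeaned:
        deviations[program].append(deviation)
    return {program: mean(vals) for program, vals in deviations.items()}


# Mean deviation from the preference-group average, by program attended
adjusted = program_means(demean_within_preference(students))
print(adjusted)
```

Because each student's score is expressed relative to others with the same stated preference, differences between preference groups (for example, families who request DI differing from families who request TB) do not contribute to the program contrasts.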
Taken together, this study adds to the literature on the effects of EL instructional programs in several ways: (a) It estimates the effects of four different EL programs; (b) it examines long-term program impacts on academic trajectories; (c) it examines differences in program effects by student ethnicity/home language and initial English proficiency; and (d) it uses a set of models that provide a stronger causal warrant than much of the research to date.
Data and Method
Data
The data used in the current study come from a large urban district that serves a sizable EL population. Our analytic sample includes 13,750 EL students who entered the district in kindergarten between the 2001–2002 and 2009–2010 academic years. Approximately 1,500 ELs enter our sample each year. Our outcome data come from the state standardized tests in ELA and math that students took each year from second grade through, at most, eighth grade. We standardize these ELA and math scores relative to the state distribution within each grade and year, so all outcome test scores are reported in SD units relative to the statewide mean. Although we use ELA scores through eighth grade, we analyze math scores only through sixth grade because, starting in seventh grade, students may take a subject-specific math test (e.g., general math vs. Algebra). Because not all students enroll in the same level of math class in seventh and eighth grade, math scores in these grades are not comparable across students. All ELs in our analysis are observed through at least third grade, but we do not observe all students in our sample through sixth or eighth grade, because the later cohorts of kindergarteners had not yet reached the later grades by fall of 2012, the last year for which we have outcome data.
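The standardization described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: the state means and SDs, the scale scores, and the function and variable names are all hypothetical.

```python
# Hypothetical state-level (mean, SD) of scale scores, keyed by (grade, year, subject)
STATE_DISTRIBUTION = {
    (2, 2005, "ELA"): (340.0, 50.0),
    (3, 2006, "ELA"): (350.0, 40.0),
}


def standardize(scale_score, grade, year, subject, state=STATE_DISTRIBUTION):
    """Express a scale score in SD units relative to the statewide mean
    for the same grade, year, and subject."""
    state_mean, state_sd = state[(grade, year, subject)]
    return (scale_score - state_mean) / state_sd


# A second grader scoring 315 when the state mean is 340 and the state SD is 50
z_grade2 = standardize(315.0, 2, 2005, "ELA")  # (315 - 340) / 50 = -0.5
# The same student scoring 370 in third grade (state mean 350, SD 40)
z_grade3 = standardize(370.0, 3, 2006, "ELA")  # (370 - 350) / 40 = 0.5
print(z_grade2, z_grade3)
```

Standardizing within grade and year places scores from different test forms on a common scale, so a change in a student's standardized score reflects movement relative to the state distribution rather than growth in raw scale points.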
Program Preferences
Prior to the start of kindergarten (but after they have been assigned to a school and EL program), students are assessed to determine their English proficiency. The district of study implements a choice model for school selection, in which families rank their preferences among 191 instructional programs located within schools. Students are then assigned to schools by a complex algorithm that attempts to assign students to the school/program combination requested by their parents, subject to a set of school diversity constraints and a set of priority rules. The district’s algorithm attempts to give applicants their highest possible choice, but uses a number of “tie-breakers” to determine who gets into programs that have more applicants than slots (which many do). Among students with the same priority rankings, ties are broken using random assignment. Importantly, teachers and administrators, who might have knowledge of students’ skills or needs, do not play a role in assigning students to schools or instructional programs within schools. As a result, there are students whose parents requested the same school/program combinations, but who were assigned to different EL programs through the priority rules or random assignment. By controlling for program preference fixed effects in our models, we can compare students who had the same school-by-program preferences, but attended different programs and/or schools due to the use of tie-breakers.
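The role of random tie-breaking in an oversubscribed program can be sketched in Python. The district's actual algorithm is more complex (it also handles ranked choices across programs and diversity constraints) and is not reproduced here; the function name, priority coding, and applicant data below are hypothetical.

```python
import random


def fill_program(applicants, capacity, seed=0):
    """Fill one oversubscribed program.

    applicants: list of (student_id, priority), where a lower number means
    higher priority. Ties within a priority level are broken by a random draw.
    """
    rng = random.Random(seed)
    # Attach a random draw to every applicant; sorting by (priority, draw)
    # orders students by priority first and by lottery within each level.
    keyed = sorted((priority, rng.random(), student_id)
                   for student_id, priority in applicants)
    admitted = [student_id for _p, _draw, student_id in keyed[:capacity]]
    not_admitted = [student_id for _p, _draw, student_id in keyed[capacity:]]
    return admitted, not_admitted


# Four applicants for two slots: s1 has top priority; s2, s3, and s4 are tied.
applicants = [("s1", 1), ("s2", 2), ("s3", 2), ("s4", 2)]
admitted, not_admitted = fill_program(applicants, capacity=2)
print(admitted, not_admitted)
```

In this sketch, s1 is always admitted, while which of the three tied applicants receives the second slot depends only on the random draw. Tied applicants who end up in different programs are the kind of students the comparison described above relies on.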
One concern related to our strategy is that families may be able to tamper with the lottery and/or may differentially leave the district if they are not assigned to one of their top school/program preferences. In our district of study, there is little concern about tampering, as all school/program assignments are made by the algorithm, which is administered in the district’s central office. Families can, however, submit a formal appeal of extenuating circumstances (e.g., medical issues) to be granted a new assignment. The district reported to us that such appeals affect a negligible portion (less than 1%) of students assigned to schools/programs each year. In addition to the primary assignment process, there is also a second much smaller lottery (involving roughly 10% of students) that occurs after the initial assignment process to accommodate (a) late district entrants, and (b) families who wish to enter a lottery of remaining slots because other individuals who entered the lottery neglected to enroll. Through this additional lottery process, approximately 5% of all students receive a higher choice than they were initially assigned. Finally, although another study in the district found evidence that families whose child did not receive their first-choice school are less likely to enroll than those who did receive their first choice, this differential attrition pattern is largely driven by White (non-EL) students, and so has little effect on the students in our sample (Kasman, 2014). EL students enroll in the district at a high rate, regardless of whether they are assigned to their first-choice school and program. These patterns suggest that manipulation of the assignment process and differential enrollment/attrition patterns likely have little impact on our estimates.
Initial Program
EI, TB, DB, and DI program definitions, including the mission of each program, the population of students served, and the amount of instructional time spent on English versus the target language, are found in Table 1. We classify students according to the initial EL instructional program they attended, and interpret our findings as the effect of one’s initial EL program. The majority of our sample attends the same program for at least 3 (99.5%) or 4 (95.2%) years, from kindergarten through third grade, indicating that there is little movement in and out of programs once ELs enroll in a particular program during their kindergarten year. A student’s initial program is, in most cases, the program he or she attends for at least 4 years. After third grade, the proportion of students who are still enrolled in their initial program begins to drop at rates that differ by program. For instance, TB programs are designed to reclassify students as fluent English proficient and transition them into EI programs more quickly than the DB and DI programs. The proportion of ELs who were initially enrolled in TB and are still enrolled in TB drops by 32 percentage points (from 90% to 58%) from Grade 3 to 4, compared with a 15- and 3-percentage-point drop between these grades for DB and DI, respectively. This difference is simply an artifact of the program design rather than a reflection of a lack of compliance. Across programs, students are generally transitioned into EI programs by middle school.
Table 1. Description of the Four EL Academic Programs Offered in the District of Study
Source. District Program Guide (2014).
Note. EL = English learners.
Sample Descriptives
As can be seen in Table 2, of our analytic sample, approximately 33% are Latino ELs, approximately 45% are Chinese ELs, and the remaining are ELs of a variety of other ethnic backgrounds, including approximately 5% of Japanese, Korean, or Filipino backgrounds. The majority of students in our sample (57%) are initially enrolled in EI programs. Approximately 21% of ELs in EI are Latino, while approximately 47% are Chinese. About equal proportions of EL students are enrolled in the TB and DB programs—20% and 17%, respectively. More specifically, approximately 37% of those initially attending the TB programs are Latino ELs and 56% are Chinese, while these figures are 50% and 43%, respectively, in the DB program. The DI program enrolled the smallest portion of ELs in our sample (8%), in part because there are fewer such programs available and in part because up to half of the slots in DI programs are reserved for non-EL students. Latino ELs make up the majority of ELs enrolled in DI (71%), followed by Chinese (14%) ELs.
Table 2. Proportions of ELs of Each Ethnicity and of Total ELs Initially Attending Each Program; Average Pretreatment Variables, by Program; and Proportion of ELs With Each Initial Preference, by Program
Note. Initial English proficiency is standardized around the sample average. EL = English learners; ELA = English language arts.
Students initially enrolled in each of the two-language instructional programs have lower initial English proficiency in the fall of kindergarten than those in EI. This may in part be because in kindergarten, the two-language programs spend much instructional time on the target language. Parents may choose these programs for their children partly because of their incoming level of proficiency. Furthermore, in second grade, ELs in EI and TB score above their peers in DB and DI in both ELA and math. Those in DI score substantially below their peers in all of the other programs in both subjects in second grade. This pattern remains in the middle school grades but is slightly less pronounced. Also noteworthy, relative to the state average in those grades, the average ELA and math scores of those in all programs increase from second through sixth/seventh grade.
Method
Research Question 1
To answer the first research question regarding the differential effect of each instructional program on ELs’ academic growth through middle school, we estimate four separate random coefficient student growth models (a special case of what are sometimes called mixed models, multilevel models, or hierarchical linear models): the first without student controls, the second with added student controls, the third with added student controls and school fixed effects, and the fourth including student controls, school fixed effects, and fixed effects for parent preferences. Although Model 3 adjusts for a set of observable student and school characteristics that are undoubtedly related to students’ academic growth trajectories and students’ choice of programs to attend, these characteristics alone may not fully account for student selection into programs. The fourth model, which adds pretreatment controls for parental preferences regarding the type and location of the EL instructional program, is our preferred model for identifying the effect of programs on students’ outcomes. This model identifies the effects of the instructional programs by comparing students whose parents requested the same school-by-program combination but who were assigned to different programs by the algorithm. Allowing p to index 191 school-by-instructional program combinations, i to index students, and t to index grades, we fit random coefficients models of the following form:
where both the intercept
The random effects are assumed to be mean 0 and multivariate normal among students and among programs. Likelihood ratio tests of the null hypotheses that the variance for each random effect is equal to 0 indicated that, in all of our models, each of the random effects improves the model fit (p < .001 in all cases).
In the above model,
In this model,
We fit several versions of this model. Model 1 does not include any student-level covariates (no vector
In Model 3, we add initial school of attendance fixed effects to Model 2. This allows us to adjust for any school-specific factors that might account for observed differences across programs. The program coefficients in this model are identified off of within-school variation in program enrollment. Finally, in Model 4 we include a vector of dummy variables indicating which of 191 school-by-program options parents listed first on their school-entry application. We add this set of additional school-by-program preference fixed effects to our existing vector of student/family controls,
Because school-choice data are only available for students who entered the district in kindergarten since 2004, we only analyze academic outcomes through seventh grade in ELA for these models to ensure that we have adequate sample sizes in all grades. Because of this, and also the fact that we have to restrict our sample to those students for whom we have preferences data, the sample in Model 4 is roughly half the size of the sample in Models 1 and 2. To ensure that any differences between Models 3 and 4 are not due to the difference in samples, we also fit Model 3 with the smaller sample used in Model 4. These are presented as “Model 3: Restricted Sample” in our results tables.
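For orientation, a generic random-coefficient growth model of the kind described in this section might be written as below. This is only a sketch in assumed notation, not the authors' own equation: every symbol here is introduced for illustration, and whether the covariates enter the slope equation as well as the intercept equation is an assumption.

```latex
% Illustrative notation only (assumed, not taken from the article):
%   y_{itp}  standardized test score of student i in grade t, initial school-by-program p
%   D_i      dummies for TB, DB, and DI (EI is the omitted reference category)
%   X_i      student controls and, in later models, school and preference fixed effects
%   u, v     student-level and school-by-program-level random effects
\begin{align}
y_{itp}   &= \pi_{0ip} + \pi_{1ip}\,(\mathrm{grade}_{t} - 2) + \varepsilon_{itp} \\
\pi_{0ip} &= \beta_{0} + \mathbf{D}_{i}^{\top}\boldsymbol{\gamma}_{0} + \mathbf{X}_{i}^{\top}\boldsymbol{\delta}_{0} + u_{0i} + v_{0p} \\
\pi_{1ip} &= \beta_{1} + \mathbf{D}_{i}^{\top}\boldsymbol{\gamma}_{1} + \mathbf{X}_{i}^{\top}\boldsymbol{\delta}_{1} + u_{1i} + v_{1p}
\end{align}
```

Centering grade at 2 makes the intercept terms interpretable as second-grade differences, and the elements of the assumed vectors $\boldsymbol{\gamma}_{0}$ and $\boldsymbol{\gamma}_{1}$ would then correspond to program differences in second-grade scores and in growth rates relative to EI.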
Research Question 2
To test whether program effects vary by ethnicity, we fit the same models for Latino and Chinese students separately. All control variables in each of these models are centered around their ethnicity-specific sample means. To test whether program effects vary by initial English proficiency, we add a set of two-way interactions between program type dummies and standardized initial English language proficiency score, and a set of three-way interactions between program dummies, initial proficiency, and grade. A full set of model estimates is available in the online appendix (http://epa.sagepub.com/supplemental).
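The interaction structure described above can be sketched as a small helper that builds the program-related regressors for one student-by-grade observation. This is a hedged illustration rather than the authors' code: the function name, the column names, the choice to center grade at 2, and the inclusion of a proficiency-by-grade term are all assumptions made for the example.

```python
# Hypothetical sketch: names and centering choices are assumptions, not the authors' code.
PROGRAMS = ("TB", "DB", "DI")  # EI is treated as the omitted reference category


def interaction_features(program, proficiency_z, grade, base_grade=2):
    """Build program, proficiency, and grade interaction terms for one observation."""
    t = grade - base_grade  # years since second grade, so intercepts refer to Grade 2
    row = {
        "grade": t,
        "prof": proficiency_z,
        "prof_x_grade": proficiency_z * t,  # assumed lower-order term
    }
    for p in PROGRAMS:
        d = 1.0 if program == p else 0.0
        row[p] = d                                      # program main effect
        row[f"{p}_x_grade"] = d * t                     # program-specific slope
        row[f"{p}_x_prof"] = d * proficiency_z          # two-way interaction
        row[f"{p}_x_prof_x_grade"] = d * proficiency_z * t  # three-way interaction
    return row


example = interaction_features("DI", proficiency_z=0.5, grade=4)
```

In a model built on such columns, the coefficients on the two-way terms would describe how second-grade program differences vary with initial proficiency, and the coefficients on the three-way terms would describe how program differences in growth vary with it.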
Interpretation of Coefficients
Recall that the coefficients of interest in our models are the vectors
Although these control variables might account for much of the selection bias one might worry about, they may not fully capture any differences among programs in EL students’ initial academic skill. Despite the fact that we control for initial English ability, which is associated with later academic performance, we may not fully capture important variations between programs in academic skill; some ELs may be low in English language proficiency but otherwise perform high academically in their home language. If this initial academic skill were correlated with program enrollment, net of the other variables in our models, our estimates may be biased. Although prekindergarten or kindergarten measures of ELA and math skill are not available (because state tests are first administered in second grade, not kindergarten), the school district did administer a general early childhood developmental inventory (ECDI) in the fall of kindergarten in the last 3 years. We cannot include this variable as a control in our models due to the limited years of availability, but Table 1a of the online appendix shows that average ECDI scores in the fall of kindergarten do not differ significantly among the EL programs, and that the inclusion of ECDI as a control variable does not significantly change the second-grade ELA and math coefficients for this sample after adjusting for our existing set of controls. This analysis suggests that our results do not suffer from omitted variable bias due to the omission of an unobserved measure of prekindergarten academic skill.
Results
Differences in Academic Trajectories Among EL Instructional Programs
Results for our first research question, regarding the differential effect of each instructional program on ELs’ academic growth through middle school, are presented in Table 3. The table includes estimates from the five models described above (Models 1–4, plus a second version of Model 3 based on the Model 4 sample). For each model, we tested the null hypotheses that the program-specific intercepts are equal and that the program-specific grade slopes are equal; p values for these joint tests are at the bottom of Table 3. In general, the coefficient estimates are relatively similar across the specifications. For the sake of parsimony, and because it includes the most extensive set of control variables, we focus primarily on Model 4 in our discussion of the results below.
Table 3. Estimated Parameters of Average ELA and Math Second-Grade Scores and Growth Trajectories, by Initial Program Attended
Note. Stable student controls include gender, ethnicity, special education status, and initial English proficiency score. All models allow students’ individual intercepts and slopes to vary. The reference category is English immersion, and as such the intercept and grade terms represent the average starting point and trend for those initially attending this program. Grade slopes for ELA represent an effect from Grades 2 to 8 for Models 1 and 2 and Grades 2 to 7 for Models 3 restricted and 4. School-program random effects represent the initial program (e.g., dual immersion program in School A) that students were enrolled in. ELA = English language arts; FE = fixed effects; TB = transitional bilingual; DB = developmental bilingual; DI = dual immersion; EL = English learners; RE = random effects.
p < .10. *p < .05. **p < .01. ***p < .001.
ELA
The estimated intercepts indicate the differences in average ELA scores in second grade among the programs. By second grade, students in EI classrooms have average ELA scores that are not statistically distinguishable from the performance of the average student in the state. Relative to students in EI classrooms, and net of the covariates and fixed effects in the model, students in TB score significantly higher (by 0.08 SD) on the ELA test in second grade, whereas those in DB do not score significantly differently, and those in DI score significantly lower (by about 0.19 SD).
The estimated differences between programs in rates of growth in ELA scores from second through seventh grade show a somewhat different pattern. In general, the test scores of ELs in EI increase at a rate that is significantly slower than the rate of the average student in the state (recall that, because test scores are standardized relative to the state distribution in each grade, the average student in the state has a growth rate of exactly 0). Furthermore, the rate at which the ELA test scores of ELs in TB increase is significantly faster than that of ELs in EI, whereas the rate for DB is not significantly distinguishable from that of students in EI, conditional on the covariates in the model. Finally, although ELs in DI classrooms have ELA scores well below those of their peers in EI classrooms in second grade, from second through seventh grade the ELA test scores of ELs in DI increase at a rate that is 0.064 SD faster per grade than those in EI. This rate is sufficiently faster than that of EI students that by sixth grade the average ELA scores of DI-enrolled students match the state average, and surpass those of observationally similar ELs in EI and DB (see Figure 1). These findings suggest that although DI programs may have a negative effect on ELA performance in the early years of attendance, in the long term these short-term negative effects are more than offset by the positive effects on test score growth.
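The catch-up described above can be checked with simple arithmetic on the reported point estimates. The sketch below assumes a strictly linear trajectory, treats the second-grade DI deficit as exactly 0.19 SD (the text says "about"), and uses hypothetical function names; it ignores sampling error and all other model terms.

```python
# Back-of-the-envelope sketch using point estimates quoted in the text.
# Assumptions: linear growth, a DI-minus-EI gap of -0.19 SD in Grade 2,
# and a DI growth advantage of 0.064 SD per grade.

def gap_vs_ei(grade, intercept_gap=-0.19, slope_gap=0.064, base_grade=2):
    """Linear DI-minus-EI gap in ELA (in state SD units) at a given grade."""
    return intercept_gap + slope_gap * (grade - base_grade)


def crossover_grade(intercept_gap=-0.19, slope_gap=0.064, base_grade=2):
    """Grade at which the linear gap reaches zero."""
    return base_grade - intercept_gap / slope_gap


for g in range(2, 8):
    print(g, round(gap_vs_ei(g), 3))
print("crossover grade:", round(crossover_grade(), 2))
```

Under these assumptions the gap closes just before fifth grade and stands at roughly 0.07 SD in favor of DI by sixth grade, which is in line with the statement that DI students surpass observationally similar EI students by then.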

Figure 1. Estimated average ELA and math achievement trajectories, relative to state average: EL kindergarten entrants, by instructional program
One thing to be noted in Table 3 is that the estimates are generally consistent in the models with and without controls for parental program preferences (i.e., in Model 3: Restricted and Model 4). This suggests that differences in parental preferences are not highly confounded with ELs’ potential academic trajectories. Although it is possible that there are still other factors that we did not observe that affect selection into programs and that are correlated with academic trajectories, this pattern of results, in conjunction with the ECDI results presented in Online Appendix A, suggests that the coefficients might be interpreted as largely unbiased estimates of the effects of the different EL instructional programs in this district.
Math
In math, Models 3 and 4 likewise yield similar results. By second grade, the math scores of EL students enrolled in EI classrooms are significantly higher than the state average (by about 0.15 SD), whereas the scores of observationally similar ELs in TB and DB classrooms are even higher (by about 0.21 and 0.12 SD, respectively). The scores of those in DI do not differ significantly from those in EI in second grade, which indicates that students in these programs, like those in EI, score above the state average in math in second grade.
The slope estimates in Table 3 indicate that the math test scores of students receiving EI instruction grow significantly more slowly than the state average. The math scores of EL students in DB classrooms grow significantly more slowly than those in EI, by about 0.04 SD per grade; the growth rates of the scores of those in TB and DI programs are not statistically distinguishable from those of similar students in EI classrooms (see Figure 1).
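To make these magnitudes concrete, the reported point estimates can be combined into a rough linear trajectory for the DB-minus-EI gap in math. This is a back-of-the-envelope sketch rather than a result from the analysis: it assumes that the second-grade difference of about 0.12 SD is measured relative to EI, that the slope difference of about 0.04 SD per grade is constant, and it introduces the symbol $\Delta_{DB}(g)$ purely for illustration.

```latex
% Assumed linear gap between DB and EI math scores at grade g (illustrative only)
\begin{align}
\Delta_{DB}(g) &\approx 0.12 - 0.04\,(g - 2) \\
\Delta_{DB}(5) &\approx 0.12 - 0.04 \times 3 = 0 \\
\Delta_{DB}(6) &\approx 0.12 - 0.04 \times 4 = -0.04
\end{align}
```

Under these assumptions, the early DB advantage in math would be exhausted around fifth grade, after which DB students would fall slightly behind observationally similar EI students.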
Differences in Program Effects by Ethnicity and Initial English Proficiency
Estimates from models designed to determine whether program effects vary by EL students’ ethnicity or initial level of English proficiency are presented in Table 4. Here, we report the results from only Model 4 (estimates from the other models are available in Online Appendix B). Table 4 clearly shows that the academic trajectories differ sharply between Latino and Chinese ELs; among those enrolled in EI, for example, the typical Latino EL has ELA and math scores about 0.8 and 1.0 SD, respectively, below those of the typical Chinese EL student in second grade. This large achievement gap is evident in Figures 2 and 3.
Table 4. Estimated Parameters of Average ELA and Math Growth Trajectories, by Initial Program Attended and Ethnicity (Left Panel 1) and Initial English Proficiency (Right Panel 2)
Note. All coefficients estimated from Model 4 (controls, school and preference fixed effects). Stable student controls include gender, ethnicity, special education status, initial English proficiency score, and initial program preferences. All covariates, including the fixed effects, are group-mean centered within the sample used in each model (i.e., in the Latino models, initial English proficiency is centered around the mean initial English proficiency for Latinos, whereas in the Chinese models it is centered around the mean initial English proficiency for Chinese ELs). All models allow students’ individual intercepts and slopes to vary. The reference category is English immersion, and as such the intercept and grade terms represent the average starting point and trend for those initially attending this program. Grade slopes for ELA represent an effect from Grades 2 to 7 and in math Grades 2 to 6. School-program random effects represent the initial program (e.g., dual immersion program in School A) that students were enrolled in. ELA = English language arts; TB = transitional bilingual; EP = English proficiency; DB = developmental bilingual; DI = dual immersion; EL = English learners; RE = random effects.
p < .10. *p < .05. **p < .01. ***p < .001.

Figure 2. Estimated average ELA achievement trajectory relative to the state average: EL kindergarten entrants, by instructional program and ethnicity

Figure 3. Estimated average math achievement trajectory relative to the state average: EL kindergarten entrants, by instructional program and ethnicity
In addition to these large between-group differences in average scores, the effects of all three bilingual programs, relative to EI, also appear to vary between the two groups. For Latino ELs, the second-grade ELA scores of those in DB and DI are significantly lower than those attending EI, whereas the scores of those attending TB are not significantly different from the scores of those attending EI. However, the estimated growth rates in Table 4 indicate that although Latino ELs in all three of the bilingual programs score significantly lower than (or at best no different than) those in EI in second grade, their rates of growth in ELA are significantly faster than the rate of growth of their Latino peers in EI. As can be seen in the left panel of Figure 2, this means that although in second grade Latino ELs in two-language instructional programs score below or the same as their Latino peers in EI, by seventh grade, Latino ELs in all of these programs score above those in EI on average. The growth rate for Latino students in DI classrooms is roughly twice the growth rate of Latino ELs in the TB and DB programs.
The pattern of differences among programs in ELA trajectories for Chinese ELs is very different. In the second grade, Chinese ELs in TB have scores that are significantly higher than Chinese ELs in EI, and the ELA scores of those in DB and DI are not significantly different from the scores of those in EI. However, the average growth rates of ELA scores of Chinese ELs in TB and DI classrooms do not significantly differ from that of observationally similar students in EI classrooms, and the average growth rate for Chinese ELs in DB classrooms is significantly slower than that of their Chinese EL peers in EI. This indicates that in general, the ELA score trajectories of Chinese ELs are most positive for those in DI, followed by EI. It is also noteworthy, as Figure 2 shows, that regardless of program, the ELA scores of Chinese ELs are almost always above those of the average student in the state.
In math, the coefficients on the grade-by-program interaction variables (but not the program intercept differences) for Latino ELs are somewhat similar to those in the ELA models. Latino ELs in EI score significantly below the state average in math in second grade and the rate at which their math scores grow over time is significantly slower than the average rate of math score growth in the state. The second-grade scores of Latino ELs in TB and DI are not significantly different from those in EI, whereas the second-grade math scores of those in DB are significantly higher than those in EI. However, the rate of test score growth of Latino ELs in DI is significantly faster than the rate at which the math scores of those in EI increase. The slopes for Latino ELs in TB and DB do not differ from (or at best are marginally significantly better than) the average slope of their Latino peers in EI. By sixth grade, Latino ELs in each of the two-language programs have higher average math scores than their observationally similar peers in EI classrooms, a pattern similar to the patterns in ELA scores (see Figure 3).
Chinese ELs show almost exactly the same pattern of results in math as they do in ELA, with one exception: the second-grade math test scores of Chinese ELs in DB are significantly higher than the second-grade math scores of their Chinese peers in EI. The by-ethnicity results suggest that Latino ELs perform best in both ELA and math in the long term when they are enrolled in any of the bilingual programs, and have the best long-term outcomes in DI. While Chinese ELs do best longitudinally in ELA and math when enrolled in DI, they also do very well in EI, the program that uses no home-language instruction. They perform worst longitudinally in DB in both subjects, but especially in math.
For both Latino and Chinese ELs and both math and ELA, separate significance tests of the null hypotheses that program Grade 2 intercepts are jointly equal to 0 and that program slopes are jointly equal to 0 are found at the bottom of Table 4. All tests indicate significant between-program differences in intercepts and slopes. Finally, we note that tests of whether the patterns of program effects differ between Chinese and Latino students (estimated by fitting a fully interacted model on the full sample of Latino and Chinese students; results not shown) indicate that program-specific intercepts and rates of test score growth among Chinese ELs differ significantly from those of Latino ELs.
The right panel of Table 4 reports the estimated differences in program effects by ELs’ initial level of English proficiency. Note that in this table, results from a single model are presented across two columns (main effects and by initial EP side-by-side). There is little evidence of significant differences in program effects by initial proficiency, as evidenced by the large p values (at the bottom of Table 4) from the tests of the null hypotheses that the Grade 2 program effects are equal and that the program effects on growth rates are equal.
Discussion
In this article, we estimate the associations between elementary school EL instructional programs and EL students’ longitudinal academic outcomes in ELA and math. We build on prior research on the topic by focusing on academic outcomes in two subjects through middle school, by comparing the effectiveness of four different instructional models, and by evaluating whether these EL programs are differentially effective for students of different ethnicities or language backgrounds. In addition, our models arguably provide more plausible estimates of program effects than much of the existing literature, as we are able to address two key potential sources of selection bias: the confounding of program enrollment with parental preferences (a common unobservable characteristic in other similar studies) and the confounding of program enrollment with differences in academic preparation prior to kindergarten.
Four key findings are worth noting in this study. First, we find that in the short run (by second grade), there are substantial differences in the academic performance in ELA and math among EL students who start with different instructional programs in kindergarten. By second grade, ELs in DI classrooms have ELA test scores that are well below those of their peers in EI. At the same time, ELs in TB have test scores well above those of ELs in EI in both ELA and math, and those in DB have math test scores that are significantly higher than their peers in EI.
Second, the effects of EL instructional programs on longer term academic trajectories (into middle school) differ from the apparent effects on shorter term academic outcomes. For example, in the short term (through second grade), ELs in DI score substantially below their EL counterparts attending other instructional programs in ELA. By seventh grade, however, students in DI and TB programs have much higher ELA scores than those in EI classrooms. This pattern of a reversal in the relative effects of EL programs is consistent with other research showing that, for Latino ELs, both the development of English proficiency and reclassification are slower in early elementary school for those in bilingual EL programs than for those in EI programs, but that ELs in two-language programs catch up to or surpass their EI-enrolled peers by middle school (Umansky & Reardon, 2014).
In some ways, these patterns are not particularly surprising; indeed, they are likely at least partly an artifact of the programs’ designs. ELs in DI spend more time early on in the target language (e.g., Spanish or Cantonese) than ELs in any of the other programs do (about 80%–90% of their instructional time in kindergarten through first grade; see Table 1). As a result, they develop English proficiency more slowly in the early grades. This may partly explain their lower ELA performance. Moreover, because the ELA and math tests are administered solely in English, students in DI classrooms may not be able to fully demonstrate their knowledge, particularly in math. Thus, although ELs in DI score poorly on tests administered in English in the early grades, this does not necessarily indicate that they are failing to develop important content knowledge and literacy skills that will ultimately transfer to English language and other academic development.
Furthermore, the test score growth rates of ELs in DI far outpace those of ELs in the other programs. It is possible that DI programs have this effect because they combine both EI and bilingual instructional models into one program. Specifically, DI instruction (a) exposes ELs to native English-speaking peers, while still (b) providing instruction in ELs’ home language to support continued development of that language. The first piece is important because having classmates, one third of whom are native English speakers, may prove useful for modeling English language use. The second piece is important for two key reasons: first, because use of ELs’ home languages will help ensure that they do not fall behind in core academic subjects due to a lack of understanding, and second, because ELs might benefit from transfer of language skills from one language to the other if continued development of literacy in their home language is supported. More specifically, there is evidence that languages share core underlying structures that require similar proficiency skills, and that children who are just beginning to learn to read and write can benefit from continued support of their home-language development because such underlying proficiency skills ultimately transfer across languages (Cummins, 1979, 2000; Genesee et al., 2008; Goldenberg, 1996). Given this argument, however, one might be surprised that the ELA test scores of ELs in DB increase more slowly than those in EI, as this seems inconsistent with theory and research on transfer across languages. However, as is evident in Table 4, this negative effect is driven by the effects among Chinese ELs, which we discuss in further detail below.
One implication of the comparison of short- and long-term effects is that EL programs should be evaluated using both short- and long-term outcomes. Measuring EL programs’ “effectiveness” by looking at only short-term outcomes might lead one to conclude that DI programs are the least effective of the four models, and that programs that emphasize more English instruction earlier (TB and EI) are superior. An examination of longer term findings yields a different conclusion, however, which highlights the need to include longer term outcomes in the evaluations of EL programs.
A third notable finding is that the effects of the different EL instructional programs appear to differ for Latino and Chinese ELs. For instance, we generally find that, compared with Latino ELs in EI classrooms, Latino ELs in bilingual programs initially score lower on ELA tests in second grade and improve their ELA scores faster following second grade. The reverse pattern was observed for Chinese ELs in transitional and DB programs, although not for those in DI programs. Indeed, the one commonality between the Latino and Chinese patterns is that for both groups, in both math and ELA, EL students in DI programs had the fastest growth rates from second grade into middle school (although in the case of the Chinese ELs, growth rates in DI classrooms were not significantly faster than those of children in EI).
The significant negative effects of TB and DB relative to EI instruction on Chinese ELs’ test score growth have two plausible explanations. The first comes from evidence suggesting that the extent to which home-language use in the classroom transfers to second language acquisition depends on the structural similarity of the two languages (Genesee, Lindholm-Leary, Saunders, & Christian, 2006; Lado, 1964). Transfer is more likely if the first and second languages are typologically similar (e.g., Spanish or French and English), but less likely if the languages are typologically distant (e.g., Japanese or Chinese and English). In the latter case, because alphabets, phonemes, and overall language structures are mismatched, bilingual education may be less effective in promoting English language development. This could, in turn, mean that more time spent “on task” in English may be a more effective means of academic instruction for Chinese ELs than it is for Latino ELs (if, of course, the outcomes of interest are measured by tests administered in English). This might explain why Chinese students in DI classrooms have ELA and math trajectories that are not statistically distinguishable from those in EI, given that DI classrooms include native English speakers and some instructional time in English. Although our results seem consistent with this explanation, it is not clear that typological similarity entirely accounts for the difference, especially given the apparent positive early effects of TB education for Chinese students. Moreover, some researchers have argued that even if transfer is less likely among some languages than others, there may still be benefits of bilingual education across language types because there are underlying proficiencies that are common across all languages such as language processing and reading comprehension (Goldenberg, 2008).
Another potential explanation is that the Chinese and Spanish language bilingual programs are implemented differently in this district. We were not able to directly observe EL classrooms as part of this study, but it may be that finding well-qualified teachers for Chinese bilingual programs is harder than for Spanish language programs (a difficulty some district officials have described to us); as a result, the Chinese programs may not be implemented with the same fidelity as the Spanish programs, leading to different patterns of effects.
A fourth notable result is that the effects of the EL instructional models appear to be similar for ELs at all levels of initial English proficiency. This is in contrast to Jepsen’s (2010) findings that bilingual instruction had a positive effect on English proficiency among ELs with high prior proficiency, and negative effects among those with low prior proficiency. However, Jepsen considered differences in program effectiveness for English proficiency rather than academic outcomes, which could be one explanation for the divergence in results. Furthermore, his measure of prior English proficiency was defined as proficiency in the year prior, while our measure considered initial English proficiency.
Concluding Remarks and Study Limitations
Although this study provides some suggestive evidence about the effects of different EL instructional program models, it has a number of limitations. First, our estimated program effects are not based on a randomized experiment, so we cannot draw full causal conclusions. Our estimates are interpretable as “effects” of the programs only to the extent that the models include sufficient control variables to render program enrollment ignorably assigned. We are able to include not only a standard set of demographic controls, but also controls for initial English proficiency, school fixed effects, and a rich set of parental preference control variables. In addition, our supplemental analyses based on the subsample of students with ECDI scores suggest that our main estimates are not biased by the exclusion of measures of prekindergarten academic skills. These features of the analysis suggest that we might think of our estimates as largely, but not completely, unbiased. They provide a useful piece of evidence on what should surely be a more extensive and ongoing research agenda.
Second, the data we use come from a single school district, one which is somewhat unique in terms of its ethnic and linguistic diversity and its historical commitment to providing multiple different types of EL instructional models. It is not clear whether the patterns we observe here generalize to other settings, particularly given the heterogeneity of the EL population and of the design and delivery of two-language instructional models across the United States. For instance, some bilingual programs begin in kindergarten providing instruction half of the time in each language, whereas others start heavily (about 90% of instructional time) weighted toward instruction in the EL students’ home language (Collier & Thomas, 2004). Our study speaks to the effectiveness of four distinct and very specific program models that primarily serve Latino and Chinese EL students in one large school district.
Third, our interpretation of “program effectiveness” is limited to outcomes measured by tests administered in English. We cannot estimate the effects of the programs on other important outcomes that matter for EL students’ development. For example, we find that the test scores of Chinese ELs in DB programs grow at a rate that is significantly slower than that of their peers in EI classrooms. However, ELs enrolled in bilingual programs for 6 years or more may reap the added benefits of bilingualism and biliteracy, potentially important skills for both personal development and future labor market success (Gándara & Rumberger, 2009). Because we have no measure of home-language proficiency or literacy, we cannot estimate the programs’ effects on these outcomes.
A fourth limitation of the study is that we are blind to differences among the programs in the quality of instruction and classroom environments. Our inclusion of school fixed effects in the models adjusts for differences in classroom and instructional quality across schools, but it does not eliminate bias due to systematic differences within schools. To the extent that classroom quality differs systematically across programs within schools, or that teacher qualifications and skills differ among the programs, our estimates may capture differences in teaching quality rather than the differences in effectiveness that the four instructional models would show if each were well implemented and staffed.
In sum, the results here suggest, in broad strokes, that there are meaningful differences in the effects of different models of EL instruction. These effects are not simple to characterize, as they vary as children progress through school; they differ for Latino and Chinese EL students; and they differ somewhat between math and reading outcomes. In particular, the findings suggest that, for Latino students, two-language programs lead to better academic outcomes than EI programs in the long term. Nonetheless, we do not think these findings, by themselves, should lead all districts to exclusively adopt two-language programs. Our estimates are not identified sharply enough; the sample is not generalizable enough; and the mechanisms driving these patterns are not clear enough to warrant strong policy recommendations. Instead, we hope they contribute to a robust, empirically grounded discussion about how best to educate our EL students.
Footnotes
Acknowledgements
The authors acknowledge the substantive contributions made by their district partners to help clean and acquire data and for providing valuable feedback to help interpret research findings. They also thank Kenji Hakuta, Ilana Umansky, Sandy Nader, Camille Whitney, Christopher Candelaria, and Lindsay Fox for their invaluable feedback on earlier versions of this research and article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Grant Award R305A110670 from the Institute of Education Sciences (IES), U.S. Department of Education. Preparation of this manuscript by Rachel A. Valentino was also supported in part by the IES, U.S. Department of Education, through Grant R305B090016 to Stanford University.
Authors
RACHEL A. VALENTINO is a doctoral candidate in administration and policy analysis at Stanford University. She studies the effects of early childhood education policies and practices on a variety of child outcomes, with a particular focus on the measurement and implications of high-quality instructional approaches for English learners (ELs), and racially underrepresented and socioeconomically disadvantaged children.
SEAN F. REARDON is the endowed professor of poverty and inequality in education and is a professor (by courtesy) of sociology at Stanford University. His research focuses on the causes, patterns, trends, and consequences of social and educational inequality; the effects of educational policy on educational and social inequality; and applied statistical methods for education research.