Teacher Effects on Chilean Children’s Achievement Growth: A Cross-Classified Multiple Membership Accelerated Growth Curve Model

Abstract

We investigated teacher effects (magnitude, predictors, and cumulativeness) on primary students’ achievement trajectories in Chile, using multilevel cross-classified (accelerated) growth models (four overlapping cohorts, spanning Grades 3 to 8; n = 19,704 students, and 851 language and 812 mathematics teachers, in 156 schools). It was found that teacher effects on achievement growth are large, exceeding school effects. Also, the contribution of teachers to student achievement growth was found to accumulate over time. The study advances the field by exploring teacher effects in the context of an emerging economy, contributing further evidence on the properties of teacher effects on student achievement growth and demonstrating the combined use of accelerated longitudinal designs, growth curve approaches, and cross-classified and multiple membership models.

Keywords

school/teacher effectiveness, teacher characteristics, achievement, hierarchical linear modeling, longitudinal studies

Introduction

A growing number of studies have investigated the effect of teachers on a range of cognitive and noncognitive student outcomes (Blazar & Kraft, 2017; Muijs et al., 2014; Sammons, Davis, & Gray, 2016). In this literature, the term “teacher effect” usually refers to the proportion of variance in student outcomes that is attributable to the assigned teacher. Thus, teacher effects are an estimate of how much the teacher matters in predicting differences in student outcomes compared with other sources of variation (such as the student and the school). Small teacher effects indicate a weak contribution of teachers to variation in student outcomes and can be interpreted as homogeneity among teachers (this could be due to, for example, similarities in their initial training, knowledge and skills, and pedagogical practices, among other factors). Researchers have used a variety of analytic procedures to estimate the size of teacher effects, and these alternative procedures have produced markedly different conclusions. There is growing recognition of the relationship between methodological advances and increased theoretical understanding in educational effectiveness research (EER) and teacher effects research (TER; Creemers, Kyriakides, & Sammons, 2010). This article illustrates how using a combination of statistical modeling approaches provides robust new evidence on the size and importance of teacher effects in Chile.

In Latin America, the study of teacher effects has been hampered by the scarcity of suitable longitudinal data on students’ achievement that can be linked to the teacher(s) who taught them during the period of time under study. Furthermore, very few studies have been designed that permit the estimation of these effects on student achievement growth, that is, that have measured student achievement on at least three occasions during their school trajectories and that provide equated achievement scores. As has been stressed in previous reviews of the Latin American literature, more longitudinal evidence is needed to understand the effect of schools and teachers on children’s cognitive growth over time in the region (Murillo, 2007).

This study examines teacher effects in Chile, with a focus on their magnitude, predictors, and the extent to which they accumulate over time. The following research questions are addressed:

Research Question 1: How large are the effects of teachers on the achievement of their students and on their changes in achievement over time?

Research Question 2: What teacher characteristics account for the variation in student trajectories over time?

Research Question 3: To what extent do teacher effects accumulate over time?

Conceptual Framework

Measuring Teacher Effects

When studying teacher effects in real life contexts using a nonexperimental research approach, it is important to consider that teachers are neither distributed randomly among schools nor within schools. This selection effect implies that if better-qualified teachers tend to teach in more affluent schools, for example, due to the possibility of experiencing better working conditions, then a simple analysis based on unconditional models and cross-sectional data would yield an overestimation of teacher effects on student achievement.

Because research based on cross-sectional data is not likely to overcome this issue (Clotfelter, Ladd, & Vigdor, 2006; Hanushek, 1997), more recent studies analyze the impact of teachers in promoting student academic achievement using longitudinal data. Under this approach, data that follow students’ achievement over time and identify the teachers who taught them in each stage are used, allowing researchers to separate and identify the contribution of schools, teachers, and students to student achievement over school years (Rivkin, Hanushek, & Kain, 2005).

When estimating educational effects, two of the most common empirical approaches to value-added estimates of teacher effects on student achievement are the covariate-adjustment model (i.e., models with current scores regressed on prior scores and other control variables, such as student background variables) and the gain scores model (i.e., current year score less prior year score as the dependent variable in the model, with adjustment for background variables; McCaffrey, Lockwood, Koretz, Louis, & Hamilton, 2004). However, these models do not capture the complex structure of relationships between students and teachers, as pupils can be taught by a different teacher each year and the effects of these teachers might accumulate over time. When this situation occurs (i.e., students change teachers during the period of time under study), data on student outcomes do not follow a traditional nested design of hierarchical models (i.e., students’ achievement scores are not perfectly nested within students, who in turn are not perfectly nested within teachers), and alternative model formulations are necessary. Estimating the proportion of variance in students’ achievement growth rate that lies among teachers, when students change teachers over time, requires the use of cross-classified random effects models (Raudenbush, 1995).
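The two conventional specifications described above can be written compactly. The notation is an illustrative assumption rather than the notation of the cited studies: y_{ij}^{(t)} is the score of student i taught by teacher j in year t, x_{ij} is a vector of background controls, u_j is a teacher random effect with variance \sigma_u^2, and e_{ij} is a student residual with variance \sigma_e^2 (each model has its own variance components).

```latex
\begin{align}
  % Covariate-adjustment model: current score regressed on prior score
  y_{ij}^{(t)} &= \beta_0 + \beta_1\, y_{ij}^{(t-1)}
                 + \boldsymbol{\beta}_2^{\top}\mathbf{x}_{ij} + u_j + e_{ij} \\
  % Gain-score model: the year-to-year gain is the dependent variable
  y_{ij}^{(t)} - y_{ij}^{(t-1)} &= \gamma_0
                 + \boldsymbol{\gamma}_1^{\top}\mathbf{x}_{ij} + u_j + e_{ij} \\
  % Teacher effect expressed as a proportion of variance
  \rho_{\text{teacher}} &= \frac{\sigma_u^{2}}{\sigma_u^{2} + \sigma_e^{2}}
\end{align}
```

Both specifications attach each student to a single teacher j, which is why they cannot represent a student who is taught by different teachers in successive years.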

The development of cross-classified multilevel modeling techniques, and their implementation in statistical packages, have allowed the expansion of TER in recent years (Creemers et al., 2010). However, these models are still only rarely used in education research (Beretvas, 2008; Luo & Kwok, 2012), often because of the lack of longitudinal data on students and teachers over several time points.

Although there have been a few applications and methodological studies using schools as clustering units to model longitudinal cross-classified data (e.g., Choi & Wilson, 2016; Goldstein & Sammons, 2006; Grady, 2010; Grady & Beretvas, 2010; Leckie, 2009; Leroux & Beretvas, 2018; Luo & Kwok, 2012), in this article, we focus on research that has modeled longitudinal cross-classified data with teachers as clusters. As shown in Table 1, only four studies (to the best of our knowledge) have applied models that account for the crossed grouping factors in longitudinal data with teachers as clusters (i.e., the student classification crossed with the teacher classification).

Table 1

Synthesis of Studies Estimating Teacher Effects on Student Achievement Growth Using Cross-Classified Random Effects Models

Study	Country	Number of cohorts	School level	Grades	Number of schools	Number of teachers	Number of students	Number of occasions
Raudenbush (1993)	United States	1	Primary school	Grades 1 to 4	—	1,553	3,250	4
Rowan, Correnti, and Miller (2002)	United States	2	Primary school	Cohort 1: Grades 1 to 3 Cohort 2: Grades 3 to 6	138–166	1,378–2,033	5,454–6,153	3
Kyriakides and Creemers (2008)	Cyprus	1	Primary school	Grades 1 to 4	28	61	1,681	5
Palardy (2010)	United States	1	Preprimary and primary school	Beginning and end of kindergarten and Grade 1	Not specified	1,553	3,250	4

The first application of the cross-classified model to estimate teacher effects from repeated measurements of student achievement (where students encounter multiple teachers over time) was proposed by Raudenbush (1993). This model specification of teacher effects consisted of two levels but was later extended by Rowan, Correnti, and Miller (2002) to incorporate schools as a third level.

Much research on teacher effects has been conducted in the United States, linked to the development of complex teacher accountability systems in some states with the often-elusive aim of reliably estimating individual teachers’ value-added scores. These studies have advanced the field by implementing more sophisticated statistical models, although their use for accountability purposes has been criticized (Kupermintz, 2003; Papay, 2011; Rothstein, 2009).

The Magnitude of Teacher Effects

Under the covariate-adjustment model approach, it has been found that 4% to 16% of the variance in students’ adjusted reading achievement, and 8% to 18% of the variance in adjusted mathematics achievement, lies among classrooms (depending on the grade at which the analyses are conducted; Rowan et al., 2002). Using the same approach, studies in the Latin American context have estimated the percentage of variance in student achievement at the classroom/teacher level to be slightly larger: around 11% in language and 22% in mathematics (Murillo, 2007; United Nations Educational, Scientific and Cultural Organization [UNESCO], 2015a). The larger teacher effects in the region can be interpreted as an indication of stronger variation across teachers in their ability to promote student achievement. These differences may reflect systematic differences in teachers’ individual characteristics (e.g., demographic characteristics, teaching experience, subject specialization, etc.) and/or in the teaching processes implemented by the teachers in Latin American countries, which often feature more segregated and stratified educational systems than those in postindustrialized economies. The magnitude of teacher effects, using multilevel gain scores models, has been found to be between 7% and 21% of the variance in achievement gains, depending on the subject assessed and the grade level of the students (Nye, Konstantopoulos, & Hedges, 2004).

Cross-classified random effects models produce very different estimates of the overall magnitude of teacher effects than do simple covariate adjustment and gain scores models. Raudenbush (1993) found that the classroom contribution, of which teacher effects are a dominant part (Wright, Horn, & Sanders, 1997), is estimated to be about 47% of the individual component of the variance of increments to learning in mathematics per year¹ (although part of this variance may be due to school-to-school differences, not specified in that model). Other TER studies using growth models have also concluded that teachers vary substantially in their effects on individual student learning growth. For example, Rowan et al. (2002) reported that the classrooms to which students were assigned in a given year accounted for roughly 60% to 61% of the variance in students’ rates of academic growth in reading achievement, and 52% to 72% of the variance in students’ rates of academic growth in mathematics achievement, in primary school.² Using an equivalent linear crossed random effects growth model, Palardy (2010) found that the percentage of reading achievement growth between classrooms within schools for kindergarten and first grade was 70%.³ These estimates are 2 to 10 times the magnitude established in the literature using covariate adjustment and gain scores models. The authors attribute the larger estimates to better measurement properties of growth curve models.

Also, numerous international studies have suggested greater between-classroom than between-school variance in student achievement and larger teacher effects in primary than in secondary schools for both language and mathematics (e.g., Hill & Rowe, 1996; Luyten, 2003; Muijs et al., 2014; Reynolds et al., 2014).

The Predictors of Teacher Effects

TER has demonstrated that classroom-level variance can be predicted, to a great extent, by variation in teachers’ practice, based on indicators of quality of teaching obtained from either student ratings or systematic observations by trained researchers (Carnoy, 2007; Muijs et al., 2014; Pianta, Hamre, & Mintz, 2011). However, in this section the effects of other types of teacher characteristics are discussed, as they are the focus of subsequent analyses.

Several studies have investigated the impact of teacher characteristics, such as teaching experience, qualifications, certification, and knowledge, as well as the effect of their initial training and working conditions, providing rather mixed evidence (Darling-Hammond, 2000; Goldhaber & Brewer, 2000; Hanushek, 1997; McCaffrey, Lockwood, Koretz, & Hamilton, 2003; Wayne & Youngs, 2003). For example, Hanushek, Kain, O’Brien, and Rivkin (2005) used panel data of students and teachers in Texas to estimate the variation in teacher effects through a value-added approach based on student progress. The authors found a positive relationship between teacher effects and teacher characteristics such as certification, qualifications, and teaching experience (with higher gains during the first years of teaching). Similar results were obtained by Clotfelter, Ladd, and Vigdor (2007) and Goldhaber and Brewer (2000), in terms of the relevance of teacher credentials, certification processes and outcomes as predictors of student achievement. Based on data from the state of Kentucky, Kukla-Acevedo, Streams, and Toma (2009) found other significant predictors of student achievement, such as teacher’s prior achievement, and identified interactions between teacher-level variables, such as experience, and student-level variables, such as socioeconomic status (SES) and ethnicity.

Rockoff (2004) used matched student–teacher data from the state of New Jersey, where both student achievement and teacher data were collected in multiple years. A random-effects meta-analysis approach was adopted to measure the variance of teacher fixed effects while taking explicit account of estimation error and revealed that teaching experience significantly raised student test scores, particularly in reading. Finally, the study by Muñoz and Chang (2007) in the state of Kentucky used a multilevel growth curve model and found that teacher experience, education, and race did not predict high school reading achievement growth.

Overall, research suggests that teacher characteristics have significant but small effects on student achievement (Hanushek & Rivkin, 2010). For example, a review carried out by Greenwald, Hedges, and Laine (1996) found that for teacher test scores, the average effect size was d = 0.12, and for years of experience and postgraduate studies the average effect sizes were less than d = 0.05. Hattie’s (2009) review, in turn, found that the overall effect sizes for teacher training and for teacher subject matter knowledge were d = 0.11 and d = 0.09, respectively.

Teacher Cumulative Effects

Although extensive research has been conducted on the issue of how teachers impact students’ academic outcomes, less attention has been paid to the continuity of teacher effects measured at different stages of a student’s school career. This gap in TER can be attributed to how demanding investigating teacher cumulative effects can be, as it requires analyzing high-quality longitudinal student data linked to teachers and applying advanced statistical models. Indeed, apart from the research carried out in the United States noted above, studies on the measurement of teacher effects using value-added models that allow the analysis of cumulative effects are scarce, due to the lack of annually administered standardized tests linked to teacher information.

Raudenbush and Bryk (2002) used a cross-classified model and incorporated a multiple membership component to estimate teachers’ cumulative effects over time. This model fitted their data significantly better than a cross-classified model that did not incorporate the multiple membership of students to teachers. Using a similar method, Kyriakides and Creemers (2008) investigated the long-term effect of schools and teachers in mathematics using longitudinal data from Cypriot students during their first 4 years of primary school. They concluded that, in conventional approaches used in EER, the short-term effects of teachers and student background factors are overestimated and the long-term effects of both teachers and schools are underestimated.

Other studies that have examined the cumulative effects of teachers over time have found significant effects of varying sizes of earlier teachers on students’ subsequent academic success (Antoniou, 2012; Hill & Rowe, 1998; Pustjens, Van de gaer, Van Damme, Onghena, & Van Landeghem, 2007; Rivkin et al., 2005; Rowan et al., 2002; Thum, 2003). In this line of research, Tymms, Merrell, and Henderson (2000) showed that effective classroom experiences in the first years of schooling continued to have a positive influence on students 2 years later. However, the evidence available is inconsistent with regard to the magnitude of the long-term effects of teachers, and more research is needed in this area.

The following conclusions, relevant to the present study, emerge from the literature: (a) Teacher effects tend to be larger when achievement growth over time, rather than achievement status, is studied; (b) teacher effects exceed school effects, in terms of magnitude; (c) teacher characteristics tend to have a significant, although small, effect on student achievement gains; and (d) the effects of teachers seem to accumulate over time. In the following section, the current state of TER in Chile is discussed, and relevant knowledge gaps are identified.

TER in Chile

The aim of this study is to analyze teacher effects on student achievement trajectories in Chile. Despite being among the highest-performing Latin American countries in international assessments, Chile is also one of the systems with the highest within-country variability in outcomes in the region (UNESCO, 2015b). The strength of the relationship between student performance and SES in the country is above the Organisation for Economic Co-Operation and Development (OECD) average and one of the strongest in Latin America (OECD, 2013; UNESCO, 2015a). In this context, where social background is a strong predictor of students’ school destination and achievement status, it is relevant to investigate to what extent schools and teachers can ameliorate or exacerbate existing inequalities by affecting students’ achievement growth. This situation warrants an in-depth investigation of the magnitude and sources of variation in performance in the country.

With regard to the teaching force in Chile, an important source of variation is their initial training. Teacher education in Chile takes place within a decentralized, unregulated, and highly privatized tertiary education system (Brunner & Uribe, 2007; Matear, 2007). Consequently, the Ministry of Education has little control over the curriculum and arrangements of the programs offered by teacher training institutions, which vary significantly in terms of their duration, content focus, and subject specialization (Avalos & Matus, 2010). To be appointed to a teaching position, the only requirement is to hold a teaching qualification from a university or a professional institute. No other type of certification or registration is required.

Chilean primary teachers at the end of their teacher education showed variable and overall poor results in the mathematics and pedagogy content knowledge Teacher Education and Development Study in Mathematics (TEDS-M) tests, where the country ranked 15th out of the 16 participant countries (Tatto et al., 2012). The national diagnostic assessment for recently graduated teachers “Inicia” has also revealed large variation in subject and pedagogical knowledge across training institutions (Ministerio de Educación [MINEDUC], 2015). Furthermore, previous research has shown that initial teacher training (ITT) programs in the country differ significantly in terms of their effectiveness in promoting preservice teachers’ pedagogical and subject knowledge, after controlling for student intake (Manzi, Lacerna, Meckes, Ramos, & Ortega, 2012).

Few studies have explored other teacher characteristics associated with student achievement. In these studies, teacher gender, certification, years of teaching experience, and being trained in programs with subject specialization and strong practicum components are factors that have been found to correlate with student outcomes (Lara, Mizala, & Repetto, 2010; Ortúzar, Flores, Milesi, & Cox, 2009; Velez, Schiefelbein, & Valenzuela, 1993).

Research on the impact of teachers on student achievement in Chile is scarce, in part due to the difficulty of linking student and teacher data and the lack of longitudinal data suitable for applying teacher value-added models. Indeed, most of the existing TER in Chile has used cross-sectional data (e.g., Alvarado, Cabezas, Falck, & Ortega, 2012; Lara et al., 2010; León, Manzi, & Paredes, 2009; Ortúzar et al., 2009; Ramírez, 2006; Willms & Somer, 2001).

Previous research on teacher effects in Chile has also been restricted by technical difficulties in modeling relationships between students and their successive classroom settings. Most of these studies have not been able to disentangle the teacher contribution from that of the school, nor have they used value-added approaches to estimate teacher effects and, therefore, their results are likely to be biased (McCaffrey et al., 2003).

Method

Data

Several data sets were linked to form a unique database of student, teacher, and school records. The data used in these analyses derive from the Sistema de Evaluación de Progreso del Aprendizaje (SEPA)⁴, developed by the MIDE UC Assessment Center of the Pontificia Universidad Católica, as well as from the Sistema de Medición de la Calidad de la Educación (SIMCE)⁵, the Student Enrolment Recording System (SERS), the Sistema de Información General de Estudiantes (SIGE)⁶, and the Teacher Census, maintained by the Chilean Ministry of Education.

An important match was that between each student’s SEPA test score in a specific subject in a given year and the teacher who taught that particular subject to that student that year. This link was made possible by the grade level and class group identification data available in both the SERS and SIGE data sets.

Measures

The dependent variables of the study are the language and mathematics test scores obtained from the SEPA project. Both the language and mathematics tests consist of 35 multiple-choice items in Grade 3, 40 in Grades 4 to 7, and 50 in Grade 8. For each year and grade level considered, the language and mathematics achievement scales present satisfactory estimates of internal consistency (Cronbach’s α > .85). Scores were vertically and horizontally equated using Item Response Theory (IRT), which makes scores comparable across both grade levels and cohorts.
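For readers unfamiliar with the internal consistency statistic reported above, the following minimal sketch computes Cronbach’s α from a respondents-by-items score matrix. The function name and the toy responses are hypothetical, the data are not from SEPA, and the sample variance (n − 1 denominator) is an assumption about the convention used.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Sample variance of the total (sum) score across respondents.
    total_var = variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong responses: 4 respondents, 3 items.
toy_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
```

With only three toy items the value is far below the α > .85 reported for the 35- to 50-item SEPA tests; the sketch only illustrates the formula.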

The student-level control variables introduced into the models are described below. Female is a dichotomous variable that distinguishes boys (0) from girls (1). Age refers to student age, calculated in years and months as of December 2010 and cohort-mean centered. SES is a family SES index obtained from a factor analysis of mother’s education, father’s education, and family monthly income, and standardized to have a mean of zero and a standard deviation of unity. This index shows high internal consistency (Cronbach’s α = .88). Finally, Number of books at home (books), a proxy variable for cultural capital and the value of scholarly culture, was reported by parents and categorized in five values (1 = none, 2 = less than 10 books, 3 = between 10 and 50 books, 4 = between 51 and 100 books, and 5 = more than 100 books).
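The two transformations mentioned above, cohort-mean centering of age and standardization of the SES index, can be sketched as follows. The helper names and the toy values are hypothetical, and the use of the population standard deviation in the z-score is an assumption, since the text does not state which denominator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev


def center_within_group(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_means = {group: mean(vals) for group, vals in by_group.items()}
    return [value - group_means[group] for value, group in zip(values, groups)]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(value - m) / s for value in values]


# Hypothetical ages (in years) for four students in two cohorts.
ages = [8.5, 9.5, 9.5, 10.5]
cohorts = [1, 1, 2, 2]
ages_centered = center_within_group(ages, cohorts)

# Hypothetical raw factor scores for the SES index.
ses_z = standardize([2.0, 4.0, 6.0, 8.0])
```

After centering, an age of zero means "average age for one's own cohort", so the age coefficient is not confounded with the grade differences between cohorts.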

School-level predictors were included to depict composition effects. The school-level variables used were Achievement Mean, indicating school mean score on the SIMCE Assessment System test for the relevant subject, Achievement SD, referring to the within-school standard deviation in SIMCE test scores for the relevant subject, a measure of diversity in the levels of achievement of the student body, and School SES, a composite indicator created and calculated by the Chilean Ministry of Education.⁷

The teacher-level variables used are Female Teacher, a dichotomous variable that distinguishes male teachers (0) from female teachers (1); ITT Duration, indicating the duration of the teacher’s ITT program in semesters; Experience, denoting the number of years that the educator has been teaching in any school; and, finally, Major, a dichotomous variable indicating whether the teacher has undertaken specialized training in the subject assessed (1) or not (0).

The student-, teacher-, and school-level variables were treated as time-invariant covariates. Descriptive statistics are presented in Appendix 1 (available in the online version of the journal). The table shows that the sample of the study is diverse, including students living in urban and rural areas, attending public and private schools, and from a wide range of socioeconomic backgrounds. However, there are some differences between the study’s sample and the population, which are likely to be an artifact of both the way in which the SEPA project operates and Chile’s highly socially stratified education system.

The SEPA project, the main source of data for this study, is a low-stakes assessment initiative designed to inform individual schools about their students’ overall progress in comparison to that of students in other schools in the system. Thus, SEPA is not a school census nor is it a survey of randomly sampled schools. Instead, individual schools, or municipalities that administrate groups of public schools, voluntarily decide to participate in the project and have to pay for this service.

This self-selection process may introduce bias and an equalizing force across the sample, in the sense that those school and municipality administrators who are more confident about their schools’ academic performance, have more sophisticated assessment practices in place, are less averse to external assessment and more motivated about improving their students’ academic performance, and can fund the implementation of this assessment, are more likely to participate in the project. Students who come from higher SES backgrounds and whose families show higher levels of cultural capital are, in turn, more likely to attend those schools.

Missing Data

The largest proportions of missing data were found in the student-level variables retrieved from the SIMCE Assessment System. Family income (19%), mother’s educational level (19%), and father’s educational level (22%), the three variables used for creating a student SES indicator, had missing data, as did the variable number of books at home (19%). The school-level variables, in turn, presented negligible proportions of missing data (below 1%).

Also, due to student and school attrition, as well as to the incorporation of new students and schools into the project each year, scores had a considerable proportion of missing values at each time point. For language test scores, the percentage of missing data was 31%, 44%, and 45% in 2010, 2011, and 2012, respectively. Similarly, for mathematics test scores, the percentage of missing data was 33%, 42%, and 42% in 2010, 2011, and 2012, respectively.

From the analysis of missing data mechanisms, it was concluded that data were at least missing at random (MAR; Little & Rubin, 2002). Thus, the results presented in this article were obtained after performing Bayesian multiple imputation (MI) via Mplus (Muthén & Muthén, 2010). Based on recommendations by Rubin (1987), five imputed data sets were generated. The missing data were imputed from an unrestricted two-level model, and the hierarchical structure was accommodated by means of imputing data with test scores in wide format, students as Level 1, and schools as Level 2. The language and mathematics databases were linked together so the imputation of data in the language data set would benefit from information on mathematics test scores as auxiliary variables, and vice versa. Finally, all the results obtained from the five multiply imputed data sets were combined using Rubin’s (1987) rules.
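As a reminder of what combining results with Rubin’s rules involves for a single scalar parameter, the sketch below pools five hypothetical estimates and their squared standard errors. The function name and all numbers are illustrative assumptions; the text does not detail how pooling was carried out for each parameter type in the Bayesian models, so this shows only the generic rule.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, squared_ses):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)          # pooled point estimate
    within = mean(squared_ses)        # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (n - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficient estimates from m = 5 imputed data sets.
estimates = [0.50, 0.52, 0.48, 0.51, 0.49]
squared_ses = [0.01] * 5
pooled_estimate, pooled_se = pool_rubin(estimates, squared_ses)
```

The (1 + 1/m) factor inflates the between-imputation variance to reflect that only a finite number of imputations was drawn.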

It was decided to perform MI on all the variables in the models, including the dependent variables (i.e., language and mathematics test scores). The decision was based on well-established missing data treatment evidence that indicates that (a) MI is an appropriate method under general MAR conditions that, compared with listwise deletion (LW; that is, complete-case analysis), makes better use of the observed information, increases robustness to nonignorable missingness and improves estimation precision (Schafer & Graham, 2002); (b) the dependent variable should be included in the models used to impute independent variables, otherwise it would be tacitly assumed in imputation that there is no relationship between the independent and dependent variables and, when the imputed data are analyzed, the estimated slope of the dependent variable on the independent variable would be biased toward zero (Allison, 2000; Von Hippel, 2007); and (c) imputing outcome data is common practice and leads to correct inference when performed using MI (Groenwold, Donders, Roes, Harrell, & Moons, 2012; Little, 1992; Sullivan, Salter, Ryan, & Lee, 2015).

Simulation studies have shown that, as long as the outcome is included in the imputation model, there are very small performance differences between the possible MI approaches: no outcome imputation, imputation, or imputation and deletion (Kontopantelis, White, & Sperrin, 2017). Still, to check the reliability of the findings obtained using MI, the analyses were also run using two alternative approaches for dealing with missing data: (a) LW and (b) multiple imputation, then deletion (MID), an approach introduced by Von Hippel (2007), where all cases are used for imputation but, following imputation, imputed values on the dependent variables are excluded from the analysis (i.e., the dependent variables are used in the imputation model but kept as missing in the analyses). The results obtained for Model 1 under LW and MID are presented in Appendices 2 and 3 (available in the online version of the journal), respectively. The three approaches produce very similar estimates. The direction, magnitude, and significance of the fixed effects are generally consistent across the different approaches, and, as shown by the overlap of the credible intervals for the variance components, the student, teacher, and school variances, which are used to calculate teacher effects, do not differ significantly from those obtained for Model 1 when using MI for all the variables (see Table 4). The only exception is the variance for linear growth in language achievement in the student classification, which is significantly larger under the MI approach and leads to the estimation of somewhat smaller teacher effects on student achievement growth (53.4%), when compared with LW and MID (60.4% and 63.5%, respectively).
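The deletion step of MID can be illustrated with a minimal sketch. It assumes a hypothetical long-format imputed data set in which each row carries a flag recording whether the outcome was originally missing; the field names and values are invented for illustration.

```python
def multiple_imputation_then_deletion(imputed_rows):
    """Keep only rows whose outcome was observed; imputed covariates are retained."""
    return [row for row in imputed_rows if not row["outcome_was_missing"]]


# Hypothetical rows from one imputed data set.
imputed_rows = [
    {"student": "s1", "score": 351.2, "ses": 0.3, "outcome_was_missing": False},
    {"student": "s2", "score": 347.9, "ses": -0.1, "outcome_was_missing": True},
    {"student": "s3", "score": 360.4, "ses": 1.2, "outcome_was_missing": False},
]
analysis_rows = multiple_imputation_then_deletion(imputed_rows)
```

Under listwise deletion, by contrast, a row with any missing value would be dropped before analysis and no imputed covariate values would be used.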

Sample

The sample consists of students in Grades 3 to 8, who took the SEPA language or mathematics tests. Analyses were carried out considering only those schools with 22 or more students and those teachers with at least five students in each of the 3 years assessed.⁸ After performing MI, the language and mathematics samples were balanced, that is, the number of time-point observations is three for each of the students. The sample for both academic subjects comprises 59,112 measurement occasions nested in 19,704 students, nested, in turn, in 156 schools. Only the number of teachers is different between the language (n = 851) and the mathematics (n = 812) samples.

Accelerated Longitudinal Data

As mentioned above, the data included participants belonging to four different student cohorts, each followed over 3 years. The grade levels in which these cohorts are located each year are presented in Table 2, where cohorts are identified by number. Cohort 1 comprises students who were third graders in 2010, Cohort 2 those who were fourth graders in 2010, and so on. The last cohort (Cohort 4) covers students who entered the estimation sample as sixth graders in 2010.

Table 2

Descriptive Statistics of Language and Mathematics Achievement by Grade Level and Cohort

	Grade level
	3	4	5	6	7	8
Cohort 1
n	5,782	5,782	5,782	—	—	—
M language (SD)	350.601 (21.342)	377.004 (21.648)	387.978 (19.901)	—	—	—
M mathematics (SD)	346.168 (20.383)	368.356 (20.948)	374.612 (20.147)	—	—	—
Cohort 2
n	—	5,108	5,108	5,108	—	—
M language (SD)	—	373.423 (21.159)	393.327 (20.769)	403.865 (19.122)	—	—
M mathematics (SD)	—	366.066 (20.722)	380.887 (22.146)	397.011 (20.033)	—	—
Cohort 3
n	—	—	4,269	4,269	4,269	—
M language (SD)	—	—	387.064 (21.585)	406.427 (20.705)	416.093 (19.366)	—
M mathematics (SD)	—	—	378.354 (20.820)	401.178 (22.206)	419.984 (19.748)	—
Cohort 4
n	—	—	—	4,545	4,545	4,545
M language (SD)	—	—	—	403.841 (20.996)	423.353 (22.430)	440.272 (19.430)
M mathematics (SD)	—	—	—	399.711 (21.284)	424.657 (21.908)	446.473 (20.449)

Note. Dashes indicate that data are not available in that grade for that cohort.

As shown in Table 2, created from the pooled imputed data, the sample sizes vary by cohort, ranging from n = 4,269 for Cohort 3 to n = 5,782 for Cohort 1. The data resemble a 3-year accelerated longitudinal design with four overlapping cohorts, permitting the study of Grades 3 to 8. Descriptive statistics of SEPA language and mathematics scores are summarized by cohort and grade level.
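The overlapping-cohort layout can be checked with a short sketch. The cohort sizes and starting grades below are transcribed from Table 2; the variable and function names are illustrative only and are not part of the study's own code.

```python
# Cohort sizes and first observed grade, transcribed from Table 2.
COHORTS = {
    1: {"n": 5782, "first_grade": 3},
    2: {"n": 5108, "first_grade": 4},
    3: {"n": 4269, "first_grade": 5},
    4: {"n": 4545, "first_grade": 6},
}
WAVES = 3  # each cohort is followed for 3 consecutive years


def grades_observed(first_grade, waves=WAVES):
    """Grades in which a cohort is observed, one grade per yearly wave."""
    return list(range(first_grade, first_grade + waves))


coverage = {c: grades_observed(v["first_grade"]) for c, v in COHORTS.items()}
grades_covered = sorted({g for grades in coverage.values() for g in grades})
n_students = sum(v["n"] for v in COHORTS.values())
n_occasions = n_students * WAVES  # balanced data: three occasions per student

print(coverage)
print(grades_covered, n_students, n_occasions)
```

Although no single cohort is observed for more than 3 years, the four cohorts together span Grades 3 to 8, and the totals match the student and occasion counts reported for the sample.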

Cross-Classified Data

The structure of longitudinal educational data is such that a lower level unit (i.e., a measurement occasion) is perfectly nested in one (or more) higher level unit (i.e., a student and a school). However, when persons cross contextual boundaries during the study, the data no longer have a perfectly nested structure. Rather, the structure involves cross-classification of persons by social setting as it occurs in attempts to study the effects of teachers on children’s cognitive growth across years (Raudenbush & Bryk, 2002). In real life contexts, students can often change teachers during their school career. In this situation, where lower level units (i.e., an occasion) belong to different higher level units (i.e., students and teachers) at the same time, data are cross-classified. Cross-classified random effects models were developed for analyzing data with such structure (Goldstein, 1987; Raudenbush, 1993).

The longitudinal data in the sample are partially crossed because most students changed teachers at least once during the 3 years under study (87.45% in language and 88.34% in mathematics), but not all students did so. Table 3 shows how many students were taught by one, two, and three teachers over the 3 years studied, with A indicating the first teacher, B the second, and C the third. Five logical patterns were possible, and the data in this study presented four of these.

Table 3

Sample Size by Number of Teachers Associated to Student

	Language	Mathematics
1 Teacher (AAA)	2,472 (12.55%)	2,297 (11.66%)
2 Teachers (AAB)	4,754 (24.13%)	4,828 (24.50%)
2 Teachers (ABB)	4,363 (22.14%)	4,570 (23.19%)
2 Teachers (ABA)	0 (0.00%)	0 (0.00%)
3 Teachers (ABC)	8,115 (41.18%)	8,009 (40.65%)
Total	19,704 (100.00%)	19,704 (100.00%)

Not all occasions from the same student are necessarily linked to the same teacher, nor do all occasions from the same teacher belong to the same student. Thus, the data have a two-way cross-classified nonhierarchical structure; the student and teacher hierarchies are crossed with one another. Also, there are two classifications in Level 2 because students are not nested within teachers or teachers within students; thus, the assumptions of a pure multilevel structure do not hold.
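As a minimal sketch of how the patterns in Table 3 can be derived, the function below relabels a student's yearly teacher identifiers as A, B, and C in order of first appearance. The teacher identifiers are hypothetical; the counts are the language figures transcribed from Table 3.

```python
def teacher_pattern(teachers):
    """Relabel yearly teacher IDs as A, B, C in order of first appearance."""
    labels = {}
    pattern = []
    for teacher_id in teachers:
        if teacher_id not in labels:
            labels[teacher_id] = "ABC"[len(labels)]
        pattern.append(labels[teacher_id])
    return "".join(pattern)


# Language counts per pattern, transcribed from Table 3.
LANGUAGE_COUNTS = {"AAA": 2472, "AAB": 4754, "ABB": 4363, "ABA": 0, "ABC": 8115}

total = sum(LANGUAGE_COUNTS.values())
changed = total - LANGUAGE_COUNTS["AAA"]  # students with more than one teacher
share_changed = 100 * changed / total

print(teacher_pattern(["t17", "t17", "t42"]))  # hypothetical teacher IDs
print(total, round(share_changed, 2))
```

Every pattern other than AAA breaks the pure nesting of occasions within a single teacher, which is why the student and teacher classifications have to be treated as crossed.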

Multiple Membership Data

A further complication in using cross-classified models arises with student mobility: if students change teachers, then more than one teacher is related to their school performance at a given time point. This situation represents another type of imperfect clustering that requires an extension of the conventional multilevel model. Because some students are taught by multiple teachers, the data are said to have a multiple membership structure, that is, a situation where lower level units (i.e., students’ scores) belong to more than one higher level unit of a population of interest (i.e., teachers). As shown above, about 88% of students were taught language and mathematics by more than one teacher during the period studied.

In multiple membership data structures, the degree to which each lower level unit belongs to each higher level unit will often vary across those higher level units. When fitting multiple membership models, multiple membership weights should be applied to quantify this phenomenon (Leckie, 2013). In our study, students may spend more time with some teachers than others. Thus, multiple membership weights are defined as the proportion of time spent with each teacher. For example, if a student is taught for 1 year by teacher A and then for 2 years by teacher B, multiple membership weights would be assigned proportionally: 0.33 and 0.67 for teachers A and B, respectively. These weights reflect the assumption that we might expect teacher B to be more influential in determining the student’s outcome than teacher A.
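The time-proportional weighting described above can be sketched as follows. The function name and the teacher labels are illustrative assumptions; exact fractions are used so that the weights sum to 1 without rounding error.

```python
from collections import Counter
from fractions import Fraction


def membership_weights(teachers):
    """Weight each teacher by the share of observed years spent with them."""
    years = Counter(teachers)
    total = len(teachers)
    return {teacher: Fraction(count, total) for teacher, count in years.items()}


# One year with teacher A, then two years with teacher B.
weights = membership_weights(["A", "B", "B"])
print({t: round(float(w), 2) for t, w in weights.items()})
```

For this student the weights are one third for teacher A and two thirds for teacher B; a student who keeps the same teacher throughout receives a single weight of 1.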

Researchers typically use one of the following two procedures for handling complex multilevel data structures such as cross-classified and multiple membership data (Beretvas, 2011). As a first strategy, researchers might delete from the analysis the sets of units that prevent the data from being a pure hierarchy (i.e., deleting mobile students from the data sets being analyzed). As an alternative strategy, researchers could ignore one of the cross-classified factors or all but one higher level unit associated with multiple member units.

However, deleting cases reduces power and can affect generalizability and validity of inferences (Meyers & Beretvas, 2006). Also, ignoring one of the classification factors or all but one higher level unit associated with multiple member units can lead to inaccurate variance component estimation (Fielding & Goldstein, 2006; Rasbash & Browne, 2001). Indeed, fitting the nearest equivalent hierarchical model to cross-classified data will misattribute response variation to the included levels (Moerbeek, 2004; van den Noortgate, Opdenakker, & Onghena, 2005). This, in turn, may lead to misleading findings about the relative importance of different sources of influence on the outcome measure. Similarly, if we were to assign students to the first teacher that teaches them and then fit a students-within-teachers model of student achievement, this will likely underestimate the importance of teachers and overestimate the importance of students as sources of variation in student achievement.

Thus, deleting cases and ignoring complex structures compromise the validity of inferences. Instead, it is important that both the cross-classified and multiple membership nature of the data are modeled, when present, using combined cross-classified and multiple membership random effects models, as illustrated below.

Models

Models in this study were implemented to accommodate the multiple measures of the same student (via growth curve models), and multiple cohorts of students (using accelerated longitudinal models). Thus, the proportion of variance of primary student achievement growth at the teacher level is estimated using accelerated growth curve models. However, as some students change teachers across years, student outcomes do not follow the traditional nested design of hierarchical models and an alternative specification (i.e., cross-classified random effects) was adopted in all the four models reported.⁹

Model 0 is a baseline three-level growth model (measurement occasions nested within students, nested in turn within schools) against which subsequent models are compared. Model 0, as well as the subsequent models, incorporates cohort effects because previous analyses showed that the four cohorts studied cannot be assumed to follow a common developmental trajectory (Ortega, 2016; Ortega, Malmberg, & Sammons, 2018). Model 1 is a two-way cross-classified model in which student and teacher hierarchies are crossed with one another and nested within schools. We use this model to analyze the magnitude of teacher effects. Model 2 includes the effect of teacher-level predictors on student achievement status and growth. Finally, Model 3 is a cross-classified multiple membership model that assumes that the effects of teachers from previous years are carried forward to the following years. Throughout the article, multiple subscript notation, introduced by Rasbash and Browne (2001), is adopted as it facilitates the description of multilevel models with combinations of hierarchical, crossed, and multiple membership structures.

Model 0

In Equation 1, Model 0 is shown. This is a between-school model with student- and school-level characteristics. The importance of controlling for student background and school characteristics in statistical models, before comparisons across schools and teachers can be made, is well-established in the EER field (Sammons & Luyten, 2009). Independent student and school variables (i.e., Female, Age, SES, Number of books at home, Achievement Mean, Achievement SD, and School SES) are introduced in the model as both fixed effects and in interaction with the time variable. At the first level of the model (t), each person’s observed development is conceived as a quadratic function of grade level plus random error.¹⁰ At the second level of the model (i), the individual intercept and linear growth rate coefficients are assumed to vary as a function of cohort plus person-specific random effects. Thus, separate mean trajectories are estimated for each cohort. The specification is

Y_{t i j} = β_{0 i j} + β_{1 i j} t_{t i j} + β_{2} t_{t i j}^{2} + e_{t i j},

\begin{array}{l} β_{0 i j} = β_{00 j} + β_{010} Cohort 2_{i j} + β_{020} Cohort 3_{i j} \\ + β_{030} Cohort 4_{i j} + β_{040} {Female}_{i j} + β_{050} {SES}_{i j} \\ + β_{060} {Books}_{i j} + β_{070} {Age}_{i j} + u_{0 i j}, \end{array}

\begin{array}{l} β_{1 i j} = β_{10 j} + β_{110} Cohort 2_{i j} + β_{120} Cohort 3_{i j} \\ + β_{130} Cohort 4_{i j} + β_{140} {Female}_{i j} + β_{150} {SES}_{i j} \\ + β_{160} {Books}_{i j} + β_{170} {Age}_{i j} + u_{1 i j}, \end{array}

\begin{array}{l} β_{00 j} = β_{000} + β_{200} Achievement {Mean}_{j} \\ + β_{300} Achievement S D_{j} + β_{400} School {SES}_{j} + u_{00 j}, \end{array}

\begin{array}{l} β_{10 j} = β_{100} + β_{500} Achievement {Mean}_{j} \\ + β_{600} Achievement S D_{j} + β_{700} School {SES}_{j} + u_{01 j}, \end{array}

(\begin{matrix} u_{0 i j} \\ u_{1 i j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 0}^{2} \\ σ_{u 0 u 1} & σ_{u 1}^{2} \end{matrix})],

e_{t i j} ~ N (0, σ_{e}^{2}),

where $Y_{t i j}$ is the language/mathematics test score at occasion t of individual i in school j. At the first level, student achievement is described by a linear and a quadratic function of time, and the second and third levels describe the variability in the individual and school growth curves. The student-specific residuals ( $u_{0 i j}$ , $u_{1 i j}$ ) are assumed to be generated by a bivariate normal distribution with mean zero, variances $σ_{u 0}^{2}$ and $σ_{u 1}^{2}$ , and covariance $σ_{u 0 u 1}$ . The school residuals ( $u_{00 j}$ , $u_{01 j}$ ) are also assumed to be generated by a bivariate normal distribution with mean zero, variances $σ_{u 00}^{2}$ and $σ_{u 01}^{2}$ , and covariance $σ_{u 00 u 01}$ . The error term at Level 1 ( $e_{t i j}$ ) is the error at time t for the ith individual in school j. These within-person residuals are assumed to be mutually independent and normally distributed with mean zero and constant variance $σ_{e}^{2}$ .

The parameter $β_{0 i j}$ denotes the achievement score of student i in school j at time t = 0. In this time-structured design, “grade level” is the time metric chosen. The data are ordered by data collection occasion, with difference in age at the first measurement occasion introduced as a student-level predictor. In the models, the time variable “grade level” is scaled with the midpoint of the three grades observed for each cohort coded as zero, that is, with the intercept as achievement status at the second measurement occasion.
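The centering of the time variable and the quadratic Level-1 function can be illustrated with a small sketch. The coefficients below are hypothetical values chosen for readability, not estimates from this study, and the function names are illustrative.

```python
def centered_time(grades):
    """Center a cohort's grade levels on the midpoint of the grades observed."""
    midpoint = (min(grades) + max(grades)) / 2
    return [g - midpoint for g in grades]


def quadratic_trajectory(intercept, linear, quadratic, times):
    """Evaluate intercept + linear * t + quadratic * t**2 at each time score."""
    return [intercept + linear * t + quadratic * t**2 for t in times]


times = centered_time([3, 4, 5])  # Cohort 1 is observed in Grades 3 to 5
# Hypothetical coefficients for illustration only (not study estimates).
scores = quadratic_trajectory(350.0, 20.0, -5.0, times)
print(times, scores)
```

With this coding the time scores are -1, 0, and 1 for every cohort, so the intercept is the expected score at the second measurement occasion and a negative quadratic term bends the trajectory downward at both ends.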

Model 1

Crossed-random effects are introduced in Model 1 to investigate the magnitude of teacher effects. Equation 2 below denotes this model:

Y_{t (i_{1}, i_{2}) j} = β_{0 (i_{1}, i_{2}) j} + β_{1 (i_{1}, i_{2}) j} t_{t (i_{1}, i_{2}) j} + β_{2} t_{t (i_{1}, i_{2}) j}^{2} + e_{t (i_{1}, i_{2}) j},

\begin{matrix} β_{0 (i_{1}, i_{2}) j} = β_{000 j} + β_{0010} Cohort 2_{i_{1} j} + β_{0020} Cohort 3_{i_{1} j} \\ + β_{0030} Cohort 4_{i_{1} j} + β_{0040} {Female}_{i_{1} j} \\ + β_{0050} {Age}_{i_{1} j} + β_{0060} {SES}_{i_{1} j} + β_{0070} {Books}_{i_{1} j} \\ + u_{0 i_{1} j} + u_{00 i_{2} j}, \end{matrix}

\begin{matrix} β_{1 (i_{1}, i_{2}) j} = β_{010 j} + β_{0110} Cohort 2_{i_{1} j} + β_{0120} Cohort 3_{i_{1} j} \\ + β_{0130} Cohort 4_{i_{1} j} + β_{0140} {Female}_{i_{1} j} \\ + β_{0150} {Age}_{i_{1} j} + β_{0160} {SES}_{i_{1} j} + β_{0170} {Books}_{i_{1} j} \\ + u_{1 i_{1} j} + u_{10 i_{2} j}, \end{matrix}

\begin{array}{l} β_{000 j} = β_{0000} + β_{0100} Achievement {Mean}_{j} \\ + β_{0200} Achievement S D_{j} \\ + β_{0300} School {SES}_{j} + u_{000 j}, \end{array}

\begin{array}{l} β_{010 j} = β_{1000} + β_{1100} Achievement {Mean}_{j} \\ + β_{1200} Achievement S D_{j} \\ + β_{1300} School {SES}_{j} + u_{100 j}, \end{array}

(\begin{matrix} u_{0 i_{1} j} \\ u_{1 i_{1} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 0}^{2} \\ σ_{u 0 u 1} & σ_{u 1}^{2} \end{matrix})],

(\begin{matrix} u_{00 i_{2} j} \\ u_{10 i_{2} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 00}^{2} \\ σ_{u 00 u 10} & σ_{u 10}^{2} \end{matrix})],

(\begin{matrix} u_{000 j} \\ u_{100 j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 000}^{2} \\ σ_{u 000 u 100} & σ_{u 100}^{2} \end{matrix})],

e_{t (i_{1}, i_{2}) j} ~ N (0, σ_{e}^{2}) .

The number of letters in the subscript identifies the number of classifications (here, there are four: occasion, student, teacher, and school). Subscripts sharing a common letter (here, $i_{1}$ and $i_{2}$ ) appear in parentheses, separated by a comma, to identify cross-classified factors at the same level. The student identifier, $i_{1}$ , appears before the teacher identifier, $i_{2}$ . $Y_{t (i_{1}, i_{2}) j}$ represents the score at time $t$ of student $i_{1}$ , with teacher $i_{2}$ , from school $j$ . Covariances among residuals in different classifications are assumed to be zero.

The variance partition coefficient (VPC) can be used as a measure of overall magnitude of school and teacher effects when dealing with complex random effect structures. In this study, teacher effects are calculated as the percentage of variation that lies between teachers for both the initial status and growth, as recommended by Palardy (2010). Thus, for Equations 2 to 4 (i.e., Models 1 to 3), the magnitude of teacher effects on student achievement status is defined as

Teacher VPC on Initial Status = \frac{σ_{u 00}^{2}}{σ_{u 000}^{2} + σ_{u 00}^{2} + σ_{u 0}^{2}},

and the magnitude of teacher effects on student achievement growth is defined as

Teacher VPC on Growth = \frac{σ_{u 10}^{2}}{σ_{u 100}^{2} + σ_{u 10}^{2} + σ_{u 1}^{2}} .

This allows the simultaneous comparison of teacher-to-teacher differences in achievement level and teacher-to-teacher differences in growth.¹¹
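As a worked check of these definitions, the sketch below computes the VPCs from the school, teacher, and student variances of one random coefficient. The variances are the posterior means for language transcribed from Table 4 (Model 1); the function name is an illustrative assumption.

```python
def vpc(school_var, teacher_var, student_var):
    """Return (school VPC, teacher VPC) for one random coefficient."""
    total = school_var + teacher_var + student_var
    return school_var / total, teacher_var / total


# Posterior mean variances for language, transcribed from Table 4 (Model 1).
school_status, teacher_status = vpc(13.826, 8.028, 187.42)  # intercept variances
school_growth, teacher_growth = vpc(4.680, 11.865, 5.666)   # linear slope variances

print(round(school_status, 3), round(teacher_status, 3))
print(round(school_growth, 3), round(teacher_growth, 3))
```

Note that the occasion-level residual variance does not enter either denominator: each VPC compares only the school, teacher, and student variances of the same coefficient.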

Model 2

In Model 2, shown in Equation 3, teacher-level variables (i.e., teachers’ years of experience, gender, subject specialization, and ITT duration) are introduced both as main effects and in interaction with the time variable. The random part of the model remains as in Model 1:

Y_{t (i_{1}, i_{2}) j} = β_{0 (i_{1}, i_{2}) j} + β_{1 (i_{1}, i_{2}) j} t_{t (i_{1}, i_{2}) j} + β_{2} t_{t (i_{1}, i_{2}) j}^{2} + e_{t (i_{1}, i_{2}) j},

\begin{matrix} β_{0 (i_{1}, i_{2}) j} = β_{000 j} + β_{0010} Cohort 2_{i_{1} j} + β_{0020} Cohort 3_{i_{1} j} \\ + β_{0030} Cohort 4_{i_{1} j} + β_{0040} {Female}_{i_{1} j} + β_{0050} {Age}_{i_{1} j} \\ + β_{0060} {SES}_{i_{1} j} + β_{0070} {Books}_{i_{1} j} \\ + β_{0080} Female {Teacher}_{i_{2} j} \\ + β_{0090} ITT {Duration}_{i_{2} j} + β_{0100} {Major}_{i_{2} j} \\ + β_{0110} {Experience}_{i_{2} j} + u_{0 i_{1} j} + u_{00 i_{2} j}, \end{matrix}

\begin{matrix} β_{1 (i_{1}, i_{2}) j} = β_{010 j} + β_{0120} Cohort 2_{i_{1} j} + β_{0130} Cohort 3_{i_{1} j} \\ + β_{0140} Cohort 4_{i_{1} j} + β_{0150} {Female}_{i_{1} j} + β_{0160} {Age}_{i_{1} j} \\ + β_{0170} {SES}_{i_{1} j} + β_{0180} {Books}_{i_{1} j} \\ + β_{0190} Female {Teacher}_{i_{2} j} \\ + β_{0200} ITT {Duration}_{i_{2} j} + β_{0210} {Major}_{i_{2} j} \\ + β_{0220} {Experience}_{i_{2} j} + u_{1 i_{1} j} + u_{10 i_{2} j}, \end{matrix}

\begin{array}{l} β_{000 j} = β_{0000} + β_{1000} Achievement {Mean}_{j} \\ + β_{1100} Achievement S D_{j} \\ + β_{1200} School {SES}_{j} + u_{000 j}, \end{array}

\begin{array}{l} β_{010 j} = β_{1300} + β_{1400} Achievement {Mean}_{j} \\ + β_{1500} Achievement S D_{j} \\ + β_{1600} School {SES}_{j} + u_{100 j}, \end{array}

(\begin{matrix} u_{0 i_{1} j} \\ u_{1 i_{1} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 0}^{2} \\ σ_{u 0 u 1} & σ_{u 1}^{2} \end{matrix})],

(\begin{matrix} u_{00 i_{2} j} \\ u_{10 i_{2} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 00}^{2} \\ σ_{u 00 u 10} & σ_{u 10}^{2} \end{matrix})],

(\begin{matrix} u_{000 j} \\ u_{100 j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 000}^{2} \\ σ_{u 000 u 100} & σ_{u 100}^{2} \end{matrix})],

e_{t (i_{1}, i_{2}) j} ~ N (0, σ_{e}^{2}) .

Model 3

In Chile, students are frequently assigned to different teachers each year and prior research has not been able to incorporate the potential cumulative effects of teachers on student outcomes. The complexity of the models applied is expected to match the system being studied to a larger extent than models used in previous research.

The hypothesis of cumulative effects of teachers was tested by comparing the fit of the model that assumes that teachers from prior years make no contributions to current achievement growth (Model 1, Equation 2) with a model that assumes that previous teachers’ effects persist undiminished in future years (Model 3, Equation 4). The latter model specifies the teacher effect on student achievement growth in a given year as the joint effect of all of the previous teachers the student had for that subject, during the period considered in the study, with the contribution of each teacher being assigned an equal weight.¹² This is done by adding a multiple membership component to the cross-classified accelerated growth model, as shown in Equation 4.

\begin{array}{l} Y_{t (i_{1}, {i_{2}}) j} = β_{0 (i_{1}, {i_{2}}) j} + β_{1 (i_{1}, {i_{2}}) j} t_{t (i_{1}, {i_{2}}) j} \\ + β_{2} t_{t (i_{1}, {i_{2}}) j}^{2} + e_{t (i_{1}, {i_{2}}) j}, \end{array}

\begin{matrix} β_{0 (i_{1}, {i_{2}}) j} = β_{000 j} + β_{0010} Cohort 2_{i_{1} j} + β_{0020} Cohort 3_{i_{1} j} \\ + β_{0030} Cohort 4_{i_{1} j} + β_{0040} {Female}_{i_{1} j} \\ + β_{0050} {SES}_{i_{1} j} + β_{0060} {Books}_{i_{1} j} \\ + β_{0070} {Age}_{i_{1} j} + u_{0 i_{1} j} + \sum_{h \in {i_{2}}} w_{t h j} u_{00 h j}, \end{matrix}

\begin{array}{l} β_{1 (i_{1}, {i_{2}}) j} = β_{010 j} + β_{0110} Cohort 2_{i_{1} j} \\ + β_{0120} Cohort 3_{i_{1} j} + β_{0130} Cohort 4_{i_{1} j} \\ + β_{0140} {Female}_{i_{1} j} + β_{0150} {SES}_{i_{1} j} \\ + β_{0160} {Books}_{i_{1} j} + β_{0170} {Age}_{i_{1} j} \\ + u_{1 i_{1} j} + \sum_{h \in {i_{2}}} w_{t h j} u_{10 h j}, \end{array}

\begin{array}{l} β_{000 j} = β_{0000} + β_{0100} Achievement {Mean}_{j} \\ + β_{0200} Achievement S D_{j} + u_{000 j}, \end{array}

\begin{array}{l} β_{010 j} = β_{1000} + β_{1100} Achievement {Mean}_{j} \\ + β_{1200} Achievement S D_{j} + u_{100 j}, \end{array}

(\begin{matrix} u_{0 i_{1} j} \\ u_{1 i_{1} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 0}^{2} \\ σ_{u 0 u 1} & σ_{u 1}^{2} \end{matrix})],

(\begin{matrix} u_{00 i_{2} j} \\ u_{10 i_{2} j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 00}^{2} \\ σ_{u 00 u 10} & σ_{u 10}^{2} \end{matrix})],

(\begin{matrix} u_{000 j} \\ u_{100 j} \end{matrix}) ~ MVN [(\begin{matrix} 0 \\ 0 \end{matrix}), (\begin{matrix} σ_{u 000}^{2} \\ σ_{u 000 u 100} & σ_{u 100}^{2} \end{matrix})],

e_{t (i_{1}, i_{2}) j} ~ N (0, σ_{e}^{2}) .

Here, $Y_{t (i_{1}, {i_{2}}) j}$ represents the score at time $t$ of student $i_{1}$ , who was taught by the set of teachers ${i_{2}}$ in school $j$ . Thus, $t$ indexes the Level-1 unit (occasions) that is a member of multiple units of the Level-2 classification $i_{2}$ (teachers); $u_{00 h j}$ and $u_{10 h j}$ are the residuals associated with teacher $h$ , and $w_{t h j}$ is the weight assigned to the association of occasion $t$ with teacher $h$ .

As explained above, multiple membership data are modeled using weighting, where the membership weights are usually proportional to the time a lower level unit spent at a higher level unit, with the weights summing to 1. In Model 3, a pupil who was taught by the same teacher from Grade 3 to Grade 5 has a membership weight of 1 for that teacher and 0 for all other teachers. A pupil taught by a different teacher each of the 3 years in which data were collected has a membership weight of 1/3 for each of them and 0 for all other teachers.

Thus, Model 3 assumes that teacher effects persist undamped into the future. The validity of this assumption has not been fully explored in the literature, and while there is evidence that teacher effects are long-lasting, it is also reasonable to hypothesize that a teacher’s effect will dampen over time as students grow and are exposed to other teachers and learning experiences (McCaffrey et al., 2004). This issue will be explored by comparing alternative teacher weighting schemes, as part of a sensitivity analysis for the multiple membership model.

Estimation and Model Fit

Models were estimated using Bayesian Markov chain Monte Carlo (MCMC) methods implemented in the software MLwiN (Browne, 2012).¹³ MLwiN was operated via the Stata command runmlwin (Leckie & Charlton, 2013). The means and standard deviations of the sampled parameters from the monitoring period were used as parameter estimates and standard errors, while the 2.5th and 97.5th percentiles of the MCMC chain provided Bayesian 95% credible intervals, analogous to 95% confidence intervals.
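The way a monitored chain is summarized can be sketched as follows. The draws are simulated stand-ins for a chain, since the study's chains are not reproduced here, and the function name is an illustrative assumption. The study itself used MLwiN through runmlwin, not Python.

```python
import random
import statistics


def summarize_chain(draws):
    """Posterior mean, SD, and 95% credible interval from monitored MCMC draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "ci95": (cuts[0], cuts[-1]),
    }


# Simulated draws standing in for a monitored chain (assumption, not study output).
rng = random.Random(2024)
draws = [rng.gauss(19.0, 0.35) for _ in range(5000)]
summary = summarize_chain(draws)
print(summary)
```

The interval is read directly from the empirical percentiles of the draws rather than computed as the mean plus or minus a multiple of the standard error, so it need not be symmetric around the posterior mean.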

The Bayesian Deviance Information Criterion (DIC) is recommended for comparing model fit, with lower values reflecting superior models. Differences in DIC values of more than 5 units between two models are regarded as strong evidence in favor of the model with the smaller DIC (Lunn, Jackson, Best, Thomas, & Spiegelhalter, 2012).
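The decision rule can be written as a small helper. This is a sketch under the 5-unit threshold cited above; the function name, labels, and DIC values are hypothetical.

```python
def compare_dic(dic_a, dic_b, threshold=5.0):
    """Name the model favored by DIC, or report no clear preference."""
    diff = dic_a - dic_b
    if abs(diff) <= threshold:
        return "no clear preference"
    return "model A" if diff < 0 else "model B"


# Hypothetical DIC values for illustration.
print(compare_dic(1000.0, 1012.5))
print(compare_dic(1000.0, 1003.0))
```

Because DIC penalizes the effective number of parameters, adding predictors or random effects lowers DIC only when the gain in fit outweighs the added complexity.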

Results

In this study, we investigate teacher effects in Chile. The main aspects addressed are the size of teacher effects on students’ achievement growth in language and mathematics, the teacher-level predictors of these effects, and whether these effects accumulate over time.

As explained above, the complex structure of the data was acknowledged in two ways: a cross-classified model addresses the fact that repeated measurements of achievement test scores are nested within both students and teachers, and a cross-classified multiple membership model tests the hypothesis of teachers’ cumulative effects. Combining these structures with an accelerated growth model yields a cross-classified accelerated growth model and a cross-classified multiple membership accelerated growth model. Results derived from these models are presented in this section.

The Magnitude of Teacher Effects

Table 4 shows the results of Model 1: the cross-classified accelerated growth model. The DIC values indicate that there is a sizable improvement in the model fit when the teacher classification is incorporated and the cross-classified structure of the data acknowledged, as shown by the reduction in DIC values from Model 0, the three-level accelerated growth model without the student–teacher cross-classification (see results in Appendix 4, available in the online version of the journal), to Model 1 (ΔDIC = −3,481 in language, and ΔDIC = −3,043 in mathematics).

Table 4

Results From Model 1

	Language		Mathematics
	M (SE)	95% CI	M (SE)	95% CI
Fixed part
Intercept	375.095 (0.51)	[374.078, 376.113]	368.379 (0.557)	[367.240, 369.518]
Time	19.148 (0.355)	[18.433, 19.862]	15.259 (0.656)	[13.691, 16.826]
Time²	−7.568 (0.591)	[−8.998, −6.138]	−8.133 (0.758)	[−10.040, −6.227]
Cohort 2	17.946 (0.551)	[16.812, 19.081]	15.523 (0.649)	[14.114, 16.933]
Cohort 3	29.798 (0.515)	[28.771, 30.824]	34.152 (0.776)	[32.395, 35.910]
Cohort 4	47.491 (0.561)	[46.355, 48.627]	58.428 (0.782)	[56.655, 60.201]
Time × Cohort 2	−4.321 (0.314)	[−4.953, −3.689]	0.539 (0.307)	[−0.084, 1.162]
Time × Cohort 3	−4.718 (0.354)	[−5.430, −4.007]	5.219 (0.431)	[4.285, 6.154]
Time × Cohort 4	−1.379 (0.330)	[−2.031, −0.726]	7.620 (0.447)	[6.649, 8.591]
Time² × Cohort 2	3.035 (0.572)	[1.792, 4.278]	8.055 (0.613)	[6.696, 9.414]
Time² × Cohort 3	2.694 (0.581)	[1.423, 3.966]	5.521 (0.941)	[3.193, 7.849]
Time² × Cohort 4	6.306 (0.584)	[5.026, 7.585]	6.128 (0.912)	[3.883, 8.374]
Female student	2.325 (0.265)	[1.794, 2.856]	−2.760 (0.231)	[−3.217, −2.303]
Age	−2.047 (0.240)	[−2.526, −1.567]	−2.126 (0.211)	[−2.541, −1.710]
SES	0.699 (0.182)	[0.311, 1.087]	0.750 (0.173)	[0.380, 1.120]
Number books at home	2.134 (0.166)	[1.789, 2.478]	1.927 (0.138)	[1.650, 2.204]
Time × Female	0.378 (0.203)	[−0.078, 0.833]	−0.664 (0.145)	[−0.963, −0.364]
Time × Age	−0.351 (0.128)	[−0.609, −0.093]	−0.461 (0.208)	[−0.955, 0.032]
Time × SES	−0.062 (0.278)	[−0.751, 0.627]	−0.120 (0.138)	[−0.407, 0.166]
Time × Number Books at Home	−0.205 (0.107)	[−0.443, 0.033]	−0.096 (0.087)	[−0.278, 0.087]
Achievement M	0.164 (0.034)	[0.095, 0.233]	0.239 (0.033)	[0.168, 0.309]
Achievement SD	−0.353 (0.126)	[−0.612, −0.094]	−0.223 (0.110)	[−0.450, 0.004]
School SES	2.276 (0.641)	[0.978, 3.575]	1.124 (0.809)	[−0.610, 2.859]
Time × Achievement Mean	−0.052 (0.023)	[−0.099, −0.006]	−0.036 (0.028)	[−0.098, 0.025]
Time × Achievement SD	−0.145 (0.112)	[−0.391, 0.101]	−0.081 (0.081)	[−0.244, 0.082]
Time × School SES	−0.174 (0.438)	[−1.055, 0.707]	0.701 (0.664)	[−0.766, 2.168]
Random part
Random Level 4 (school)
Intercept	13.826 (2.272)	[9.355, 18.297]	12.880 (2.704)	[7.309, 18.451]
Time/intercept	1.412 (1.578)	[−1.994, 4.818]	1.267 (1.773)	[−2.661, 5.194]
Time	4.680 (1.699)	[0.994, 8.366]	5.395 (1.535)	[2.290, 8.499]
Random Level 3 (teacher)
Intercept	8.028 (1.193)	[5.491, 10.565]	10.574 (1.279)	[7.989, 13.160]
Time/intercept	−0.501 (0.884)	[−2.297, 1.295]	0.186 (1.315)	[−2.681, 3.053]
Time	11.865 (1.968)	[7.550, 16.181]	12.055 (2.318)	[6.849, 17.262]
Random Level 2 (student)
Intercept	187.42 (2.688)	[182.046, 192.795]	160.072 (2.454)	[155.097, 165.047]
Time/intercept	−5.723 (1.247)	[−8.396, −3.051]	−1.484 (0.928)	[−3.373, 0.405]
Time	5.666 (1.135)	[3.338, 7.994]	0.802 (0.244)	[0.323, 1.280]
Random Level 1 (occasion)
Intercept	109.281 (1.311)	[106.676, 111.886]	105.458 (0.902)	[103.666, 107.250]
School VPC initial status	0.066		0.070
School VPC growth	0.211		0.296
Teacher VPC initial status	0.038		0.058
Teacher VPC growth	0.534		0.660
DIC	464,400		460,165
pD	18,967		17,157
Units: Schools	156		156
Units: Teachers	851		812
Units: Students	19,704		19,704
Units: Occasions	59,112		59,112

Note. CI = confidence interval; SES = socioeconomic status; VPC = variance partition coefficient; DIC = Deviance Information Criterion; pD = effective number of parameters.

In addition, the variance components for achievement growth rates in Table 4 show that most of the variance in growth lies at the teacher level (53.4% and 66.0% for the linear component in language and mathematics, respectively). The variance in growth at the school level remains sizable (21.1% and 29.6% in language and mathematics, respectively) but is notably smaller than the variance in growth at the teacher level, which is in line with previous research (Hattie, 2009; Muijs et al., 2014; Scheerens & Bosker, 1997). Furthermore, the teacher effects yield a large d-type effect size of 0.731 and 0.813 for achievement growth in language and mathematics, respectively.¹⁴ Because the school-level variance is still substantial and larger than estimates in models fitted across two time points in the literature, the results show that both schools and teachers are important, and incorporating both levels into educational effectiveness models remains essential.

In both language and mathematics, teacher effects on the linear growth rate (slope) are much larger than those on students’ achievement status (intercept), which are 3.8% and 5.8%, respectively. In addition, teacher effects on achievement growth are larger for mathematics than for language. This result is consistent with previous studies suggesting that school and teacher effects are larger in subject areas that are typically learned at school, as with mathematics, where exposure is limited in the family and the community (Teddlie & Reynolds, 2000; Thomas, Sammons, Mortimore, & Smees, 1997b). Furthermore, these teacher effects are similar in magnitude to those found in previous studies using comparable model specifications for similar outcomes (e.g., Palardy, 2010; Rowan et al., 2002).

To illustrate the magnitude of teacher effects on achievement status and growth, the 851 teacher residuals in the language sample and the 812 residuals for teachers in the mathematics sample are plotted in Figure 1. These caterpillar plots graph each residual, obtained from Model 1, against their rank order, accompanied by error bars corresponding to confidence intervals.

Figure 1.

Teacher residuals for language and mathematics based on Model 1.

In both subjects, there is considerable overlap of intervals, so that only widely separated teachers can be judged as having significantly different effects on students’ achievement growth. All in all, teacher effects are not estimated with great precision. Nonetheless, it is possible to distinguish some outliers at each end. In language, the confidence intervals of the residuals do not overlap zero for a group of about 16 teachers at the lower end of the intercept residual plot and for 23 at the upper end. In addition, about 25 and 32 teachers are at the lower and upper end of the language slope residual plot. In mathematics, the numbers of outlier teachers are 31 and 23 at the lower and upper end of the intercept residual plot, respectively. In the mathematics slope residual plots, in turn, there are 18 and 16 at the lower and upper end. This means that approximately 4% to 7% of the teachers differ significantly from the average teacher effect at the 0.05 significance level.
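The 4% to 7% range follows directly from the counts just reported, as the short check below shows. The counts and teacher totals are transcribed from the text; the variable names are illustrative.

```python
N_TEACHERS = {"language": 851, "mathematics": 812}

# (lower end, upper end) counts of teachers whose intervals exclude zero,
# transcribed from the paragraph above.
OUTLIERS = {
    ("language", "intercept"): (16, 23),
    ("language", "slope"): (25, 32),
    ("mathematics", "intercept"): (31, 23),
    ("mathematics", "slope"): (18, 16),
}

# Share of teachers, per subject and residual type, that differ from the average.
shares = {
    key: 100 * (low + high) / N_TEACHERS[key[0]]
    for key, (low, high) in OUTLIERS.items()
}
for key, pct in shares.items():
    print(key, round(pct, 1))
```

The smallest share is for the mathematics slope residuals and the largest for the language slope residuals.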

Teacher-Level Predictors

So far, the focus has been on the overall size of teacher effects on student achievement. These estimates, although informative about the large variation in teacher effects on achievement growth in the Chilean education system, do not indicate why some teachers appear to be more effective than others. This section examines the effect of teacher characteristics that might explain part of the large variance found across teachers.

In Model 2, the teacher-level variables Female Teacher, ITT Duration, Major, and Experience were introduced. The inclusion of these variables does not lead to a significant improvement in model fit, as the DIC values increase in comparison to Model 1, in both subjects (ΔDIC = 19 in language, and ΔDIC = 24 in mathematics). Furthermore, the addition of the teacher-level variables explained only 8% and 6% of the teacher-level variance in achievement status observed in Model 1 in language and mathematics, respectively, and a negligible proportion of the teacher-level variance on achievement growth in both subjects.

As shown in Table 5, the interaction effect of the variable Major and the Time variable was found to be associated with achievement growth in mathematics, indicating that students with teachers that hold a major in mathematics show higher growth rates in the subject. This suggests that subject expertise in mathematics is important and associated with higher student achievement growth rates in primary school. The rest of teacher-level variables were neither significant predictors of student achievement status nor of student achievement growth.

Table 5

Results From Model 2

	Language		Mathematics
	M (SE)	95% CI	M (SE)	95% CI
Fixed part
Intercept	374.519 (0.645)	[373.233, 375.805]	367.715 (0.661)	[366.361, 369.069]
Time	19.838 (0.674)	[18.433, 21.242]	15.485 (0.630)	[14.139, 16.832]
Time²	−7.551 (0.616)	[−9.052, −6.050]	−8.194 (0.784)	[−10.171, −6.216]
Cohort 2	18.038 (0.539)	[16.930, 19.146]	15.670 (0.646)	[14.262, 17.078]
Cohort 3	29.925 (0.504)	[28.923, 30.926]	34.326 (0.772)	[32.581, 36.070]
Cohort 4	47.614 (0.554)	[46.495, 48.732]	58.629 (0.767)	[56.898, 60.359]
Time × Cohort 2	−4.317 (0.320)	[−4.965, −3.669]	0.506 (0.302)	[−0.106, 1.119]
Time × Cohort 3	−4.799 (0.363)	[−5.535, −4.062]	4.975 (0.404)	[4.121, 5.829]
Time × Cohort 4	−1.477 (0.358)	[−2.197, −0.756]	7.362 (0.429)	[6.445, 8.279]
Time² × Cohort 2	2.970 (0.561)	[1.758, 4.183]	7.910 (0.615)	[6.548, 9.273]
Time² × Cohort 3	2.659 (0.589)	[1.366, 3.952]	5.575 (0.950)	[3.227, 7.924]
Time² × Cohort 4	6.284 (0.584)	[5.003, 7.565]	6.171 (0.910)	[3.932, 8.409]
Female student	2.321 (0.268)	[1.784, 2.857]	−2.771 (0.235)	[−3.236, −2.305]
Age	−2.037 (0.237)	[−2.510, −1.563]	−2.125 (0.210)	[−2.540, −1.711]
SES	0.692 (0.181)	[0.307, 1.077]	0.753 (0.173)	[0.384, 1.123]
Number books at home	2.141 (0.168)	[1.793, 2.490]	1.931 (0.138)	[1.654, 2.208]
Time × Female	0.378 (0.203)	[−0.078, 0.833]	−0.662 (0.147)	[−0.967, −0.357]
Time × Age	−0.353 (0.129)	[−0.613, −0.093]	−0.465 (0.209)	[−0.961, 0.032]
Time × SES	−0.072 (0.278)	[−0.759, 0.616]	−0.117 (0.140)	[−0.406, 0.173]
Time × Number Books at Home	−0.204 (0.107)	[−0.440, 0.032]	−0.096 (0.086)	[−0.277, 0.085]
Achievement M	0.172 (0.033)	[0.106, 0.239]	0.236 (0.031)	[0.168, 0.303]
Achievement SD	−0.338 (0.125)	[−0.595, −0.081]	−0.241 (0.106)	[−0.460, −0.023]
School SES	2.165 (0.646)	[0.852, 3.478]	1.116 (0.805)	[−0.626, 2.858]
Time × Achievement M	−0.052 (0.024)	[−0.102, −0.002]	−0.042 (0.027)	[−0.101, 0.016]
Time × Achievement SD	−0.134 (0.117)	[−0.393, 0.125]	−0.095 (0.077)	[−0.249, 0.060]
Time × School SES	−0.182 (0.457)	[−1.107, 0.744]	0.657 (0.664)	[−0.814, 2.128]
Female teacher	0.783 (0.446)	[−0.102, 1.668]	1.066 (0.507)	[−0.018, 2.150]
ITT duration	−0.015 (0.119)	[−0.249, 0.220]	0.086 (0.165)	[−0.267, 0.440]
Major	−0.527 (0.397)	[−1.355, 0.301]	−0.748 (0.429)	[−1.687, 0.190]
Experience	0.032 (0.017)	[−0.001, 0.066]	0.006 (0.017)	[−0.028, 0.040]
Time × Female Teacher	−0.782 (0.741)	[−2.433, 0.869]	−0.561 (0.512)	[−1.619, 0.498]
Time × ITT Duration	0.292 (0.153)	[−0.019, 0.604]	0.274 (0.177)	[−0.109, 0.657]
Time × Major	0.335 (0.566)	[−0.936, 1.607]	1.175 (0.323)	[0.538, 1.812]
Time × Experience	0.031 (0.021)	[−0.013, 0.075]	−0.025 (0.021)	[−0.071, 0.021]
Random part
Random Level 4 (school)
Intercept	12.155 (2.241)	[7.714, 16.597]	11.596 (2.499)	[6.499, 16.692]
Time/intercept	0.684 (1.594)	[−2.769, 4.137]	0.986 (1.925)	[−3.380, 5.352]
Time	4.837 (1.830)	[0.830, 8.843]	5.611 (1.689)	[2.099, 9.123]
Random Level 3 (teacher)
Intercept	7.429 (1.234)	[4.761, 10.098]	9.898 (1.269)	[7.309, 12.488]
Time/intercept	−0.405 (0.892)	[−2.228, 1.417]	0.256 (1.278)	[−2.516, 3.029]
Time	11.901 (1.974)	[7.586, 16.216]	11.914 (2.316)	[6.685, 17.143]
Random Level 2 (student)
Intercept	187.410 (2.688)	[182.026, 192.794]	159.934 (2.436)	[155.011, 164.857]
Time/intercept	−5.705 (1.246)	[−8.383, −3.027]	−1.413 (0.850)	[−3.159, 0.334]
Time	5.484 (1.146)	[3.115, 7.852]	0.315 (0.178)	[−0.035, 0.664]
Random Level 1 (occasion)
Intercept	109.425 (1.298)	[106.849, 112.001]	105.868 (0.884)	[104.114, 107.622]
School VPC initial status	0.059		0.064
School VPC growth	0.218		0.315
Teacher VPC initial status	0.036		0.055
Teacher VPC growth	0.536		0.668
DIC	464,419		460,189
pD	18,920		17,001
Units: Schools	156		156
Units: Teachers	851		812
Units: Students	19,704		19,704
Units: Occasions	59,112		59,112

Note. CI = confidence interval; SES = socioeconomic status; ITT = initial teacher training; VPC = variance partition coefficient; DIC = Deviance Information Criterion; pD = effective number of parameters.
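The variance partition coefficients in Table 5 can be reproduced from the variance estimates in its random part. The sketch below assumes that each VPC is the ratio of a level's variance to the sum of the school, teacher, and student variances for the same parameter, leaving out the occasion-level variance. That denominator is inferred from the fact that it recovers the tabled values; it is not a formula stated in this section.

```python
def vpc(components):
    """Return each level's share of the summed variance components."""
    total = sum(components.values())
    return {level: variance / total for level, variance in components.items()}


# Variance estimates copied from the random part of Table 5 (Model 2).
# "status" uses the Intercept variances, "growth" the Time variances.
table5 = {
    "language": {
        "status": {"school": 12.155, "teacher": 7.429, "student": 187.410},
        "growth": {"school": 4.837, "teacher": 11.901, "student": 5.484},
    },
    "mathematics": {
        "status": {"school": 11.596, "teacher": 9.898, "student": 159.934},
        "growth": {"school": 5.611, "teacher": 11.914, "student": 0.315},
    },
}

results = {
    (subject, outcome): vpc(parts)
    for subject, outcomes in table5.items()
    for outcome, parts in outcomes.items()
}

for (subject, outcome), shares in results.items():
    print(subject, outcome, {level: round(v, 3) for level, v in shares.items()})
```

Under this assumption, the teacher share of growth variance comes out at about 0.536 in language and 0.668 in mathematics, matching the Teacher VPC growth row.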

Teachers’ Cumulative Effects

The use of a cross-classified multiple membership model (Model 3) allows the depiction of teachers’ cumulative effects by capturing both the cross-classification of students and teachers and the multiple membership of students’ scores to teachers. Results from Model 3 are shown in Table 6.

Table 6

Results From Model 3

	Language		Mathematics
	M (SE)	95% CI	M (SE)	95% CI
Fixed part
Intercept	374.793 (0.582)	[373.649, 375.936]	368.070 (0.750)	[366.523, 369.617]
Time	19.160 (0.398)	[18.373, 19.946]	15.135 (0.618)	[13.768, 16.503]
Time²	−7.412 (0.461)	[−8.474, −6.349]	−7.805 (0.736)	[−9.679, −5.930]
Cohort 2	18.329 (0.558)	[17.229, 19.430]	15.515 (0.853)	[13.611, 17.420]
Cohort 3	30.439 (0.635)	[29.193, 31.686]	35.297 (0.992)	[33.087, 37.508]
Cohort 4	48.006 (0.665)	[46.695, 49.317]	59.340 (0.976)	[57.180, 61.501]
Time × Cohort 2	−3.998 (0.341)	[−4.678, −3.318]	0.766 (0.392)	[−0.054, 1.586]
Time × Cohort 3	−4.559 (0.444)	[−5.470, −3.649]	5.551 (0.399)	[4.738, 6.364]
Time × Cohort 4	−1.297 (0.402)	[−2.099, −0.494]	7.696 (0.432)	[6.806, 8.585]
Time² × Cohort 2	2.639 (0.440)	[1.729, 3.550]	8.120 (0.675)	[6.526, 9.713]
Time² × Cohort 3	2.241 (0.405)	[1.435, 3.048]	4.960 (0.728)	[3.233, 6.687]
Time² × Cohort 4	5.972 (0.486)	[4.965, 6.980]	5.674 (0.801)	[3.741, 7.607]
Female student	2.212 (0.265)	[1.682, 2.742]	−2.863 (0.226)	[−3.308, −2.418]
Age	−2.008 (0.236)	[−2.478, −1.538]	−2.141 (0.208)	[−2.551, −1.732]
SES	0.712 (0.165)	[0.370, 1.053]	0.774 (0.163)	[0.432, 1.115]
Number books at home	2.117 (0.167)	[1.768, 2.465]	1.902 (0.138)	[1.624, 2.180]
Time × Female	0.377 (0.207)	[−0.089, 0.842]	−0.626 (0.147)	[−0.928, −0.324]
Time × Age	−0.342 (0.132)	[−0.608, −0.076]	−0.453 (0.211)	[−0.955, 0.049]
Time × SES	−0.032 (0.265)	[−0.684, 0.620]	−0.143 (0.134)	[−0.418, 0.133]
Time × Number Books at Home	−0.200 (0.104)	[−0.429, 0.028]	−0.095 (0.086)	[−0.276, 0.085]
Achievement M	0.141 (0.043)	[0.053, 0.229]	0.215 (0.032)	[0.151, 0.280]
Achievement SD	−0.378 (0.135)	[−0.649, −0.107]	−0.125 (0.124)	[−0.379, 0.129]
School SES	2.148 (0.770)	[0.573, 3.723]	1.530 (0.828)	[−0.160, 3.219]
Time × Achievement M	−0.044 (0.026)	[−0.095, 0.007]	−0.016 (0.030)	[−0.079, 0.047]
Time × Achievement SD	−0.162 (0.138)	[−0.460, 0.136]	−0.110 (0.090)	[−0.287, 0.067]
Time × School SES	−0.152 (0.542)	[−1.238, 0.934]	0.153 (0.706)	[−1.332, 1.638]
Random part
Random Level 4 (school)
Intercept	15.601 (3.319)	[8.943, 22.258]	16.600 (4.268)	[7.648, 25.551]
Time/intercept	−3.996 (1.792)	[−7.558, −0.434]	−5.284 (2.462)	[−10.544, −0.024]
Time	10.558 (3.260)	[3.133, 17.982]	11.545 (2.167)	[7.236, 15.854]
Random Level 3 (teacher)
Intercept	11.307 (1.296)	[8.670, 13.945]	14.844 (1.933)	[10.705, 18.982]
Time/intercept	−6.831 (1.305)	[−9.804, −3.858]	−9.945 (2.205)	[−15.322, −4.569]
Time	8.189 (1.541)	[4.495, 11.883]	10.445 (2.531)	[4.020, 16.869]
Random Level 2 (student)
Intercept	183.929 (2.651)	[178.630, 189.227]	156.414 (2.604)	[151.019, 161.810]
Time/intercept	−6.199 (1.322)	[−9.098, −3.299]	−1.861 (0.901)	[−3.688, −0.033]
Time	5.594 (1.177)	[3.145, 8.043]	0.851 (0.276)	[0.306, 1.395]
Random Level 1 (occasion)
Intercept	108.650 (1.243)	[106.201, 111.099]	104.649 (1.169)	[102.146, 107.152]
School VPC initial status	0.074		0.088
School VPC growth	0.434		0.505
Teacher VPC initial status	0.054		0.079
Teacher VPC growth	0.336		0.457
DIC	464,012		459,998
pD	18,926		17,141
Units: Schools	156		156
Units: Teachers	851		812
Units: Students	19,704		19,704
Units: Occasions	59,112		59,112

Note. CI = confidence interval; SES = socioeconomic status; VPC = variance partition coefficient; DIC = Deviance Information Criterion; pD = effective number of parameters.

In both subjects, Model 3, which incorporates the multiple membership of students to teachers, fits the data better than Model 1, which does not include this multiple membership component and therefore assumes that teacher contributions disappear from 1 year to the next. This is suggested by a substantial reduction in DIC values from Model 1 to Model 3 (ΔDIC = −388 in language and ΔDIC = −167 in mathematics). These results indicate that previous teachers continue to influence student achievement scores later in time.
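The DIC differences quoted here can be cross-checked against the tabled values. The Model 1 DIC is not shown in Tables 5 and 6, so the sketch below derives it from the Model 2 DIC and the ΔDIC reported for the step from Model 1 to Model 2 (19 in language, 24 in mathematics). The resulting Model 1 values are implied figures, and the variable and function names are illustrative.

```python
# DIC values reported in Table 5 (Model 2) and Table 6 (Model 3).
dic_model2 = {"language": 464_419, "mathematics": 460_189}
dic_model3 = {"language": 464_012, "mathematics": 459_998}

# Reported change in DIC when moving from Model 1 to Model 2.
delta_m1_to_m2 = {"language": 19, "mathematics": 24}


def implied_model1_dic(subject):
    """Model 1 DIC implied by the Model 2 DIC and the reported increase."""
    return dic_model2[subject] - delta_m1_to_m2[subject]


def delta_m1_to_m3(subject):
    """Change in DIC from Model 1 to Model 3 (negative means better fit)."""
    return dic_model3[subject] - implied_model1_dic(subject)


for subject in dic_model3:
    print(subject, implied_model1_dic(subject), delta_m1_to_m3(subject))
```

The implied differences agree with the reductions reported in the text for both subjects.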

Because only three time points are available in this study, growth can be examined across just 2 academic years for each student. Within this time scale, the results indicate that teacher effects are cumulative in both language and mathematics for the multiple cohorts covered, from Grade 3 to Grade 8.

Model 3 assumes that teacher effects persist undamped into the future. We tested this assumption by comparing seven alternative teacher weighting schemes. These multiple membership weighting schemes assume that the order in which teachers taught a given student is important, with the final teacher having a greater influence than earlier teachers, or with each teacher being more important than the previous one. When we compare the DIC of these models (see Appendix 5, available in the online version of the journal) with that of Model 3, the model with alternative Weighting Scheme 3, which assumes contributions to teacher effects of 10%, 30%, and 60% from the teachers in the first, second, and third year, respectively, fits the data better in language (ΔDIC = 165). In turn, the model with alternative Weighting Scheme 2, which assumes contributions of 20%, 30%, and 50% from the teachers in the first, second, and third year, respectively, fits the data better in mathematics (ΔDIC = 116). Interestingly, the rate of decay of teacher effects is similar but not identical across academic subjects, as earlier teachers seem to have a slightly larger effect on students’ later achievement in mathematics.
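To make the weighting schemes concrete, the following sketch shows how a student's combined teacher contribution would be formed as a weighted sum of the effects of the teachers in the first, second, and third year. Only the weights come from the text. The teacher effects are hypothetical numbers, the function name is ours, and the requirement that the weights sum to 1 follows the usual multiple membership convention rather than anything stated in this section.

```python
import math

# Membership weights for the teachers in Years 1, 2, and 3, as given in the text.
SCHEMES = {
    "scheme_2": (0.20, 0.30, 0.50),
    "scheme_3": (0.10, 0.30, 0.60),
}


def weighted_teacher_contribution(teacher_effects, weights):
    """Weighted sum of teacher effects under a multiple membership scheme."""
    if len(teacher_effects) != len(weights):
        raise ValueError("one weight is needed per teacher")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("membership weights are assumed to sum to 1")
    return sum(w * u for w, u in zip(weights, teacher_effects))


# Hypothetical effects (in score points) of a student's three successive teachers.
effects = [4.0, -2.0, 1.0]

for name, weights in SCHEMES.items():
    print(name, round(weighted_teacher_contribution(effects, weights), 3))
```

With these hypothetical effects, the strong first-year teacher counts for more under Scheme 2 (weight 0.20) than under Scheme 3 (weight 0.10), which illustrates how the choice of scheme encodes the rate at which earlier teachers' influence decays.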

Discussion

This study has examined teacher effects on student achievement trajectories with more advanced methods than most previous research. First, a three-time-point longitudinal design was used. Previous studies have largely relied on cross-sectional data or two-time-point designs, which provide very limited information on intraindividual variability for the study of change. As previous research has shown, the precision of the parameter estimates is improved with three time points (Raudenbush & Liu, 2000; Rogosa, Brand, & Zimowski, 1982).

Second, the effect of teachers on student achievement trajectories was studied by means of multilevel models that incorporated cross-classified and multiple membership random effects over accelerated growth curves. The literature reviewed shows that most methodological examinations have neglected the crossed nature of the data when modeling teacher effects. The multilevel models applied distinguished within-student, between-student, between-teacher, and between-school variability in a partially crossed design. Thus, the study is methodologically innovative and advances prior research.

Also, as we studied teacher and school effects with nonexperimental data, it is important to consider that teachers and students are distributed randomly neither among schools nor within schools. The estimated effect of teachers would be biased if the allocation of teachers and students to schools and classes induces a correlation between teacher characteristics and unobserved variables that affect student achievement. Although research based on cross-sectional data is not likely to overcome this issue, analyzing the impact of teachers on student academic achievement using longitudinal data and controlling for relevant covariates can help to address it. Thus, an important advantage of the longitudinal approach used is that it can enhance the validity of causal inferences in nonexperimental research by making possible some control over selection effects.

Although the methodological approaches discussed above make a contribution to TER, theoretical implications are also important. The analyses presented indicate that teacher effects are substantially larger than previously reported in Chile, due to the use of more appropriate models and measurement. The study also confirms that teacher effects at primary school level exceed school effects. It is clear that educational effects are larger when achievement progress over time, rather than achievement status, is studied, which confirms that teachers and, to a lesser extent, schools make an important contribution to variations in student achievement growth. Also, the contribution of teachers to student achievement growth seems to accumulate over time, at least in the time scale studied (2 academic years), and this holds for multiple cohorts from Grade 3 to Grade 8.

These results also reveal that teacher effects in an emerging economy such as Chile are similar in magnitude to those found in postindustrialized countries. However, the specific mechanisms generating these large teacher effects could not be explored in this study.

There are two main limitations in this study: the lack of representativeness of the sample for some important variables (see Appendix 1, available in the online version of the journal) and missing data. With regard to the former, students who are younger, come from more advantaged home environments, and attend high-SES, private, and large schools are slightly overrepresented in the analytical sample, and those attending low-SES, public, and private subsidized schools are slightly underrepresented. The implication of this for the study is a potential reduction in variance at the school and teacher levels, which would mean that the estimates of school and teacher effects might appear lower than they otherwise would. Thus, the differences between the characteristics of the population and the composition of the study’s sample demand caution with regard to the interpretation and generalization of findings.

With regard to the treatment of missing data, as explained above, there were particularly large proportions of missing data on the dependent variables, due to attrition. This was the result of students changing schools as well as schools leaving (while others joined) the SEPA project during the period under study. As data were MAR, the issue of missing data was dealt with by implementing a suitable strategy for the data, namely, multilevel MI on all the variables in the models, including the dependent variables. Results obtained under this method were consistent with those obtained with alternative approaches for dealing with missing data (i.e., LW and MID). We accommodated the hierarchical structure of the data, as much as possible, in the imputation process by imputing data with test scores in wide format, students as Level 1, and schools as Level 2. However, to our knowledge, the available software has not yet implemented MI for models more complex than two-level models and, to date, there is no evidence on the effects of imputing cross-classified data as purely hierarchical. An important avenue for future research would be to explore which imputation procedures are more appropriate when dealing with cross-classified and multiple membership data structures.

Conclusion

The aim of this study was to investigate the magnitude of teacher effects in Chile, their predictors, and the extent to which they accumulate over time. This study addressed three important gaps in the literature as it, first, explores teacher effects in the context of an emerging economy using a dynamic perspective, second, contributes further evidence on the properties of teacher effects on student achievement growth (i.e., magnitude, predictors and cumulativeness), and, third, advances the field methodologically by demonstrating the combined use of accelerated longitudinal designs, growth curve models, and cross-classified and multiple membership specifications.

One clear implication of these analyses is that TER should move toward a dynamic perspective to estimate both teacher and school effects on student achievement, that is, an approach that focuses on their contribution to students’ achievement trajectories. A promising strategy shown in this study is to use a cross-classified random effects model, as Raudenbush (1993) proposed. The analyses reported here suggest that cross-classified random effects models lead to findings of larger teacher effects. The magnitudes found are comparable with those reported in the few previous studies that have also accounted for crossed grouping factors in the data when estimating teacher effects (Palardy, 2010; Raudenbush, 1993; Rowan et al., 2002). Furthermore, the study contributes evidence to a key debate in EER (Luyten, 2003), as it shows that teacher effects outweigh school effects, although both are significant.

The study also reveals that only relatively small numbers of individual teachers can be reliably distinguished as either significantly more or less effective than others. Thus, using achievement data for high stakes accountability purposes such as rewarding or sanctioning individual teachers would be inappropriate, due to issues of reliability.

Also, the teacher-to-teacher differences in effects on student achievement growth imply that some students make less academic progress than they would otherwise be expected to make, due to the influence of teachers and schools in which they are taught. It is then important to explain why these differences in effectiveness occur as this could provide indications on how to improve teaching effectiveness broadly.

Although the input teacher variables tested in this study (i.e., teacher gender, duration of ITT, and years of teaching experience) did not account for an important proportion of the variation in teacher effects, holding a major in mathematics was found to be predictive of student mathematics achievement growth in primary school. This variable can be seen as an indirect measure of teachers’ knowledge of the content being taught, and this result is in line with prior research, which has found positive effects of measures of teachers’ knowledge on student achievement (e.g., Greenwald et al., 1996). Although more research is needed in this area, this is a first indication that actions toward improving teachers’ levels of subject specialization in Chile can promote student achievement progress, and that differences in the subject-specific training offered by teacher education programs should be monitored.

Although not addressed in this study, the international evidence also suggests that classroom process variables, if well measured, hold promise for explaining differences in teacher effectiveness. Indeed, the fact that a sizable percentage of between-school and between-teacher variance remains unexplained after controlling for the available student, school and teacher variables indicates that malleable educational conditions rather than merely student selection factors are likely to account for differences in student achievement growth. EER has drawn attention to the centrality of classroom processes in determining schools’ overall academic effectiveness. In this line of research, quality of teaching and teacher expectations have been shown to play an important role in promoting students’ learning (Carnoy, 2007; Muijs et al., 2014; Pianta et al., 2011).

Finally, the cross-classified multiple membership random effects model showed that the effects of teachers accumulate over time as previous teachers continue to influence student achievement scores in subsequent years. The rate of decay underlying these cumulative teacher effects was explored, favoring models that assume that teacher effects dampen over time as students grow and are exposed to other teachers and learning experiences.

Supplemental Material

Supplemental material, DS_10.3102_0162373718781960 for Teacher Effects on Chilean Children’s Achievement Growth: A Cross-Classified Multiple Membership Accelerated Growth Curve Model by Lorena Ortega, Lars-Erik Malmberg and Pam Sammons in Educational Evaluation and Policy Analysis

Footnotes

Acknowledgements

The authors thank the MIDE UC Assessment Center at the Pontificia Universidad Católica de Chile and the Chilean Ministry of Education for granting access to the SEPA and SIMCE data, respectively.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The completion of the article was supported by the National Commission of Scientific and Technological Research of Chile, through the project Center of Advanced Studies on Educational Justice (CONICYT–PIA CIE 160007). This work was supported by a PhD scholarship awarded to the first author by the Program of Advance Human Capital, National Commission of Scientific and Technological Research of Chile (CONICYT–PFCHA 72100834).

Authors

LORENA ORTEGA is an associate researcher at the Center for Advanced Studies in Educational Justice (CJE) of the Pontificia Universidad Católica de Chile. Prior to this position, she was a postdoctoral researcher at the Department of Sociology, University of Tübingen, Germany, and completed her PhD at the Oxford University Department of Education. Her research interests involve the application of quantitative methods to educational research and the modeling of multilevel and longitudinal data to investigate educational effectiveness and inequalities.

LARS-ERIK MALMBERG is an associate professor of quantitative methods in education at the Department of Education, University of Oxford. His current research interests are in intrapersonal approaches to learning processes and modeling of intrapersonal data. He has published on effects of education, child care, and parenting on developmental and educational outcomes, and teacher development. He applies advanced quantitative models to the investigation of substantive research questions in education.

PAM SAMMONS is a professor of education at the Department of Education, University of Oxford, and a senior research fellow at Jesus College, Oxford. She has been involved in educational research for the last 30 years with a special focus on the topics of school effectiveness and improvement, leadership, and equity in education. She has a particular interest in the evaluation of education policy initiatives including both formative and summative approaches.

References

Allison

P. D.

(2000). Multiple imputation for missing data: A cautionary tale. Sociological Methods & Research, 28, 301–309.

Alvarado

Cabezas

Falck

Ortega

M. E.

(2012). La evaluación docente y sus instrumentos: Discriminación del desempeño docente y asociación con los resultados de los estudiantes [The teacher evaluation system and its instruments: Teacher performance prediction and association with students’ results]. Santiago, Chile: Ministerio de Educación de Chile (MIDEUC) - Programa de las Naciones Unidas para el Desarrollo (PNUD).

Antoniou

(2012). The short- and long-term effects of secondary schools upon students’ academic success and development. Educational Research and Evaluation, 18, 621–639.

Avalos

Matus

(2010). La formación inicial docente en Chile. Informe nacional del estudio internacional IEA TEDS-M [Initial teacher training in Chile. National report of the international study IEA TEDS-M]. Santiago: Gobierno de Chile.

Beretvas

(2008). Cross-classified random effects models. In O’Connell

A. A.

McCoach

D. B.

(Eds.), Multilevel modeling of educational data (pp. 161–197). Charlotte, NC: Information Age.

Beretvas

S. N.

(2011). Cross-classified and multiple membership models. In Hox

J. J.

Roberts

J. K.

(Eds.), Handbook of advanced multilevel analysis (pp. 313–334). New York, NY: Routledge.

Blazar

Kraft

M. A.

(2017). Teacher and teaching effects on students’ attitudes and behaviors. Educational Evaluation and Policy Analysis, 39, 146–170.

Browne

W. J.

(2012). MCMC estimation in MLwiN, v2.26. Bristol, UK: Centre for Multilevel Modelling, University of Bristol.

Brunner

J. J.

Uribe

(2007). Mercados universitarios: El nuevo escenario de la educación superior [University markets: The new higher education scenario]. Santiago, Chile: Universidad Diego Portales.

10.

Carnoy

(2007). La mejora de la calidad y equidad educativa: una vision realista [The improvement of educational quality and equity: A realistic view]. Revista Pensamiento Educativo, 40, 87–102.

11.

Choi

Wilson

(2016). Incorporating mobility in growth modeling for multilevel and longitudinal item response data. Multivariate Behavioral Research, 51, 120–137.

12.

Clotfelter

C. T.

Ladd

H. F.

Vigdor

J. L.

(2006). Teacher-student matching and the assessment of teacher effectiveness. Cambridge, MA: National Bureau of Economic Research.

13.

Clotfelter

C. T.

Ladd

H. F.

Vigdor

J. L.

(2007). Teacher credentials and student achievement: Longitudinal analysis with student fixed effects. Economics of Education Review, 26, 673–682.

14.

Creemers

B. P. M.

Kyriakides

Sammons

(2010). Methodological advances in educational effectiveness research. London, England: Routledge

15.

Darling-Hammond

(2000). Teacher quality and student achievement: A review of state policy evidence. Education Policy Analysis Archives, 8, 1–50.

16.

Fielding

Goldstein

(2006). Cross-classified and multiple membership structures in multilevel models: An introduction and review. London, England: Department for Education and Skills.

17.

Goldhaber

D. D.

Brewer

D. J.

(2000). Does teacher certification matter? High school teacher certification status and student achievement. Educational Evaluation and Policy Analysis, 22, 129–145.

18.

Goldstein

(1987). Multilevel covariance component models. Biometrika, 74, 430–431.

19.

Goldstein

Sammons

(2006). The influence of secondary and junior schools on sixteen year examination performance: A cross-classified multilevel analysis. School Effectiveness and School Improvement, 8, 219–230. doi:10.1080/0924345970080203

20.

Grady

M. W.

(2010). Modeling achievement in the presence of student mobility: A growth curve model for multiple membership data (Unpublished doctoral dissertation). The University of Texas at Austin.

21.

Grady

M. W.

Beretvas

S. N.

(2010). Incorporating student mobility in achievement growth modeling: A cross-classified multiple membership growth curve model. Multivariate Behavioral Research, 45, 393–419.

22.

Greenwald

Hedges

L. V.

Laine

R. D.

(1996). The effect of school resources on student achievement. Review of Educational Research, 66, 361–396.

23.

Groenwold

R. H.

Donders

A. R.

Roes

K. C.

Harrell

F. E.

Jr. Moons

K. G.

(2012). Dealing with missing outcome data in randomized trials and observational studies. American Journal of Epidemiology, 175, 210–217.

24.

Guldemond

Bosker

R. J.

(2009). School effects on students’ progress—A dynamic perspective. School Effectiveness and School Improvement, 20, 255–268.

25.

Hanushek

E. A.

(1997). Assessing the effects of school resources on student performance: An update. Educational Evaluation and Policy Analysis, 19, 141–164.

26.

Hanushek

E. A.

Kain

O’Brien

Rivkin

S. G.

(2005). The market for teacher quality. Cambridge, MA: National Bureau of Economic Research.

27.

Hanushek

E. A.

Rivkin

S. G.

(2010). Generalizations about using value-added measures of teacher quality. The American Economic Review, 100, 267–271.

28.

Hattie

(2009). Visible learning: A synthesis of over 800 meta-analyses relating to achievement. New York, NY: Routledge.

29.

Hill

P. W.

Rowe

K. J.

(1996). Multilevel modelling in school effectiveness research. School Effectiveness and School Improvement, 7, 1–34.

30.

Hill

P. W.

Rowe

K. J.

(1998). Modelling student progress in studies of educational effectiveness. School Effectiveness and School Improvement, 9, 310–333.

31.

Kontopantelis

White

I. R.

Sperrin

(2017). Outcome-sensitive multiple imputation: A simulation study. BMC Medical Research Methodology, 17(1), Article 2.

32.

Kukla-Acevedo

Streams

Toma

E. F.

(2009). Evaluation of teacher preparation programs: A reality show in Kentucky. Lexington: University of Kentucky.

33.

Kupermintz

(2003). Teacher effects and teacher effectiveness: A validity investigation of the Tennessee value added assessment system. Educational Evaluation and Policy Analysis, 25, 287–298.

34.

Kyriakides

Creemers

B. P. M.

(2008). A longitudinal study on the stability over time of school and teacher effects on student outcomes. Oxford Review of Education, 34, 521–545.

35.

Lara

Mizala

Repetto

(2010). Una mirada a la efectividad de los profesores en Chile [An overview of teacher effectiveness in Chile]. Estudios Públicos, 120, 147–182.

36.

Leckie

(2009). The complexity of school and neighbourhood effects and movements of pupils on school differences in models of educational achievement. Journal of the Royal Statistical Society: Series A, 172, 537–554.

37.

Leckie

(2013). Multiple membership multilevel models: Concepts (LEMMA VLE Module 13). Retrieved from http://www.bristol.ac.uk/cmm/learning/course.html

38.

Leckie

Charlton

(2013). Runmlwin: A program to run the MLwiN multilevel modeling software from within Stata. Journal of Statistical Software, 52(11), 1–40.

39.

León

Manzi

Paredes

(2009). Calidad docente y rendimiento escolar en Chile: Evaluando la evaluación [Teacher quality and student achievement in Chile: Evaluating the evaluation]. Santiago: Pontificia Universidad Católica de Chile.

40.

Leroux

A. J.

Beretvas

S. N.

(2018). Estimation of a latent variable regression growth curve model for individuals cross-classified by clusters. Multivariate Behavioral Research, 53, 231–246.

41.

Little

R. J. A.

(1992). Regression with missing X’s: A review. Journal of the American Statistical Association, 87, 1227–1237.

42.

Little

R. J. A.

Rubin

D. B.

(2002). Statistical analysis with missing data (2nd ed.). Hoboken, NJ: John Wiley.

43.

Lockwood

J. R.

McCaffrey

D. F.

Mariano

L. T.

Setodji

(2007). Bayesian methods for scalable multivariate value-added assessment. Journal of Educational and Behavioral Statistics, 32, 125–150.

44.

Lunn

Jackson

Best

Thomas

Spiegelhalter

(2012). The bugs book: A practical introduction to Bayesian analysis. Boca Raton, FL: CRC Press.

45.

Luo

Kwok

(2012). The consequences of ignoring individuals’ mobility in multilevel growth models. Journal of Educational and Behavioral Statistics, 37, 31–56.

46.

Luyten (2003). The size of school effects compared to teacher effects: An overview of the research literature. School Effectiveness and School Improvement, 14, 31–51.

47.

Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 1, 86–92.

48.

Manzi, Lacerna, Meckes, Ramos, & Ortega (2012). ¿Qué características de la formación inicial de los docentes se asocian a mayores avances en su aprendizaje de conocimientos disciplinarios? [What are the characteristics of teacher initial training associated with teachers’ progress in disciplinary knowledge?]. Santiago, Chile: Ministerio de Educación.

49.

Matear (2007). Equity in education in Chile: The tensions between policy and practice. International Journal of Educational Development, 27, 101–113.

50.

McCaffrey, D. F., Lockwood, J. R., Koretz, Louis, T. A., & Hamilton (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29, 67–101.

51.

McCaffrey, D. F., Lockwood, J. R., Koretz, D. M., & Hamilton (2003). Evaluating value-added models for teacher accountability. Santa Monica, CA: RAND Corporation.

52.

Meyers, J. L., & Beretvas, S. N. (2006). The impact of inappropriate modeling of cross-classified data structures. Multivariate Behavioral Research, 41, 473–497.

53.

Ministerio de Educación. (2015). Resultados de la evaluación Inicia 2014 [Inicia assessment results 2014]. Santiago, Chile: Author.

54.

Moerbeek (2004). The consequence of ignoring a level of nesting in multilevel analysis. Multivariate Behavioral Research, 39, 129–149.

55.

Muijs, Kyriakides, van der Werf, Creemers, Timperley, & Earl (2014). State of the art: Teacher effectiveness and professional learning. School Effectiveness and School Improvement, 25, 231–256.

56.

Muñoz, M. A., & Chang, F. C. (2007). The elusive relationship between teacher characteristics and student academic growth: A longitudinal multilevel model for change. Journal of Personnel Evaluation in Education, 20, 147–164.

57.

Murillo (2007). Investigación iberoamericana sobre eficacia escolar (IIEE) [Ibero-American research on school effectiveness (IIEE)]. Bogotá, Colombia: Convenio Andrés Bello.

58.

Muthén, L. K., & Muthén, B. O. (2010). Mplus user’s guide. Los Angeles, CA: Author.

59.

Nye, Konstantopoulos, & Hedges, L. V. (2004). How large are teacher effects? Educational Evaluation and Policy Analysis, 26, 237–257.

60.

Organisation for Economic Co-Operation and Development. (2013). PISA 2012 results: Excellence through equity—Giving every student the chance to succeed (Vol. II). Paris, France: Author.

61.

Ortega (2016). Educational effectiveness and inequalities in Chile: A multilevel accelerated longitudinal study of primary school children’s achievement trajectories (Unpublished doctoral thesis). University of Oxford, UK.

62.

Ortega, Malmberg, L.-E., & Sammons (2018). School effects on Chilean children’s achievement growth in language and mathematics: An accelerated growth curve model. School Effectiveness and School Improvement, 29, 308–337. doi:10.1080/09243453.2018.1443945

63.

Ortúzar, Flores, Milesi, & Cox (2009). Aspectos de la formación inicial docente y su influencia en el rendimiento académico de los alumnos [Aspects of initial teacher training and its influence on student academic achievement]. In Camino al bicentenario, propuestas para Chile (pp. 155–186). Santiago, Chile: Ediciones Universidad Católica.

64.

Palardy, G. J. (2010). The multilevel crossed random effects growth model for estimating teacher and school effects: Issues and extensions. Educational and Psychological Measurement, 70, 401–419.

65.

Papay (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48, 163–193.

66.

Pianta, Hamre, & Mintz (2011). Classroom Assessment Scoring System (CLASS): Upper elementary manual. Charlottesville, VA: Teachstone.

67.

Pustjens, Van de gaer, Van Damme, Onghena, & Van Landeghem (2007). The short-term and the long-term effect of primary schools and classes on mathematics and language achievement scores. British Educational Research Journal, 33, 419–440.

68.

Raftery, A. E., & Lewis, S. M. (1992). How many iterations in the Gibbs sampler? In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics 4 (pp. 763–773). Oxford, UK: Oxford University Press.

69.

Ramírez, M. J. (2006). Factors related to mathematics achievement in Chile. In S. J. Howie & Plomp (Eds.), Context of learning mathematics and science: Lessons learned from TIMSS (pp. 97–112). London, England: Routledge.

70.

Rasbash & Browne, W. J. (2001). Modelling non-hierarchical structures. In A. H. Leyland & Goldstein (Eds.), Multilevel modelling of health statistics (pp. 93–105). Chichester, UK: John Wiley.

71.

Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18, 321–349.

72.

Raudenbush, S. W. (1995). Hierarchical linear models to study the effects of social context on development. In J. M. Gottman (Ed.), The analysis of change (pp. 165–201). Mahwah, NJ: Lawrence Erlbaum.

73.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: SAGE.

74.

Raudenbush, S. W., & Liu (2000). Statistical power and optimal design for multisite randomized trials. Psychological Methods, 5, 199–213.

75.

Reynolds, Sammons, De Fraine, Van Damme, Townsend, Teddlie, & Stringfield (2014). Educational effectiveness research (EER): A state-of-the-art review. School Effectiveness and School Improvement, 25, 197–230.

76.

Rivkin, S. G., Hanushek, E. A., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73, 417–458.

77.

Rockoff, J. E. (2004). The impact of individual teachers on student achievement: Evidence from panel data. The American Economic Review, 94, 247–252.

78.

Rogosa, Brand, & Zimowski (1982). A growth curve approach to the measurement of change. Psychological Bulletin, 92, 726–748.

79.

Rothstein (2009). Student sorting and bias in value added estimation: Selection on observables and unobservables. Education Finance and Policy, 4, 537–571.

80.

Rowan, Correnti, & Miller, R. J. (2002). What large-scale, survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104, 1525–1567.

81.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: John Wiley.

82.

Sammons, Davis, & Gray (2016). The scientific properties of school effects. In Chapman, Muijs, Reynolds, Sammons, & Teddlie (Eds.), The Routledge international handbook of educational effectiveness and improvement: Research, policy, and practice (pp. 25–76). New York, NY: Routledge.

83.

Sammons & Luyten (2009). Editorial article for special issue on alternative methods for assessing school effects and schooling effects. School Effectiveness and School Improvement, 20, 133–143.

84.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.

85.

Scheerens & Bosker, R. J. (1997). The foundations of educational effectiveness. New York, NY: Pergamon Press.

86.

Singer & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford, UK: Oxford University Press.

87.

Sullivan, T. R., Salter, A. B., Ryan, & Lee, K. J. (2015). Bias and precision of the “multiple imputation, then deletion” method for dealing with missing outcome data. American Journal of Epidemiology, 182, 528–534.

88.

Tatto, M. T., Schwille, Senk, S. L., Ingvarson, Rowley, Peck, . . . Reckase, M. D. (2012). Policy, practice, and readiness to teach primary and secondary mathematics in 17 countries: Findings from the IEA teacher education and development study in mathematics (TEDS-M). Amsterdam, The Netherlands: International Association for the Evaluation of Educational Achievement.

89.

Teddlie & Reynolds (Eds.). (2000). The international handbook of school effectiveness research. London, England: Falmer Press.

90.

Thomas, Sammons, Mortimore, & Smees (1997). Stability and consistency in secondary schools’ effects on students’ GCSE outcomes over three years. School Effectiveness and School Improvement, 8, 169–197.

91.

Thum, Y. M. (2003). Measuring progress toward a goal: Estimating teacher productivity using a multivariate multilevel model for value-added analysis. Sociological Methods & Research, 32, 153–207.

92.

Tymms, Merrell, & Henderson (2000). Baseline assessment and progress during the first three years at school. Educational Research and Evaluation, 6, 105–129.

93.

United Nations Educational, Scientific and Cultural Organization. (2015a). Tercer estudio regional comparativo y explicativo. Factores asociados [Third regional comparative and explicative study. Associated factors]. Paris, France: Author.

94.

United Nations Educational, Scientific and Cultural Organization. (2015b). Tercer estudio regional comparativo y explicativo. Logros de aprendizajes [Third regional comparative and explicative study. Learning achievement]. Paris, France: Author.

95.

van den Noortgate, Opdenakker, M.-C., & Onghena (2005). The effects of ignoring a level in multilevel analysis. School Effectiveness and School Improvement, 16, 281–303.

96.

Velez, Schiefelbein, & Valenzuela (1993). Factors affecting achievement in primary education. Washington, DC: The World Bank.

97.

Von Hippel (2007). Regression with missing Ys: An improved strategy for analyzing multiply imputed data. Sociological Methodology, 37, 83–117.

98.

Wallace, M. L. (2015). Modeling cross-classified data with and without the crossed factors’ random effects’ interaction (Unpublished doctoral dissertation). The University of Texas at Austin.

99.

Wayne, A. J., & Youngs (2003). Teacher characteristics and student achievement gains: A review. Review of Educational Research, 73, 89–122.

100.

Willms, J. D., & Somers, M.-A. (2001). Family, classroom, and school effects on children’s educational outcomes in Latin America. School Effectiveness and School Improvement, 12, 409–445.

101.

Wright, S. P., Horn, S. P., & Sanders, W. L. (1997). Teacher and classroom context effects on student achievement: Implications for teacher evaluation. Journal of Personnel Evaluation in Education, 11, 57–67.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
