Abstract
In 2003, Chicago launched “Double-Dose Algebra,” requiring students with pretest scores below the national median to take two periods of math: algebra and supplemental coursework. In many schools, assignment to Double Dose changed the peer composition of the algebra classroom. Using school-specific instrumental variables within a regression-discontinuity design (RDD), we find that attending a lower skill classroom reduced math achievement for median-skill students. As a result, the Double-Dose policy had little or no effect for median-skill students in schools that exposed them to low-skill classrooms. However, the effects of Double Dose were substantially positive in schools that did not do so. We consider policy implications and interpretations of the results from RDDs.
Keywords
However, as we demonstrate in the current article, the “Double-Dose Algebra” policy also substantially increased classroom sorting based on prior math skill, on average, across Chicago’s 60 neighborhood high schools. As a result, for many students whose prior skill was near the median, assignment to Double Dose entailed attending an algebra class composed of comparatively low-skill peers whereas assignment to single dose entailed attending a class of comparatively high-skill peers. This suggests that, for many median-skill students, “Double-Dose Algebra” is not a single treatment, but rather a mix of two concurrent treatments: doubled instructional time and reduced classroom peer skill. An important question is whether taking a class with comparatively low-skill peers enhanced or undermined the Double-Dose reform for students of median prior skill.
If the reform had increased sorting to the same extent in every school, we would have no basis to answer this question. However, this was not the case: In some schools, students assigned to Double Dose were concentrated in classrooms characterized by low prior math skill; in contrast, other schools managed to implement the policy without resorting to skill-based sorting, such that classroom peer skill was relatively similar in Double-Dose and Single-Dose classrooms.
To study the impact of extended instructional time and peer math skill on algebra achievement, we exploit a “fleet of natural experiments,” one occurring within each of 60 neighborhood public high schools. The large heterogeneity in how schools implemented the policy enables us to employ an instrumental variable strategy that, under assumptions we discuss in detail in the “Assumptions” section, identifies the impact of classroom instructional time and peer math skill on students of average prior math skill.
We conclude that the Double-Dose policy was modestly effective, on average, for median-skill students attending these 60 neighborhood high schools that serve most of Chicago’s low-income students. However, the reform was substantially effective for these students only in schools where the policy did not induce large shifts in classroom peer math skill. The large policy impacts in those schools reveal the potential promise of the Double-Dose reform, a promise that is obscured by considering only the average impact across all schools. A caveat is that our results apply only to students whose prior achievement is in the neighborhood of the national median on the pretest. However, because the pretest was measured with error, we also find that the fraction of students to whom our results apply is surprisingly large. Nevertheless, to understand how these policies may affect the entire achievement distribution, more needs to be known about how increased instructional time and classroom peer skill affect very low and very high achievers.
In the next section, we consider the relevance of this study broadly with respect to inequality in math achievement in the United States and more narrowly for curricular reforms now under way in many school districts. We review research investigating how assignment to low-skill classrooms may affect learning and, based on this literature, develop our hypotheses. We then describe our data and measures, our theoretical model, statistical methods and assumptions, findings, and implications for policy and program evaluation and reform. Last, we consider broad implications of our experience for conceiving and estimating causal effects using regression-discontinuity designs (RDDs) in education.
Background
Developed nations depend increasingly on high levels of human capital to compete in sectors of the economy that generate, organize, and use knowledge and data (de la Fuente & Ciccone, 2002; Murnane, Willett, & Levy, 1994). Especially important are skills in math, science, and engineering. The current low standing in mathematics of U.S. students relative to students in other developed nations therefore generates concerns about the long-term prospect for the U.S. economy. Moreover, international studies that compare the entire distribution of achievement reveal substantial inequality in mathematics achievement in the United States as compared with the highest achieving countries (Park, 2013). These studies also reveal alarmingly low performance of the lowest performing U.S. students as compared with the lowest performing students in the highest scoring countries. The lowest performing U.S. students are disproportionately from low-income, minority backgrounds. Thus, increasing achievement among the most disadvantaged young people, who tend to live in large U.S. cities characterized by high levels of poverty and racial segregation (Sampson, 2012), is potentially important to increasing the life chances of those students and to improving the achievement overall in the United States.
Recent Policy Initiatives
Over the past decades, the problem of low math achievement among U.S. high school students has inspired a national movement calling for more rigorous high school course requirements. The National Governors Association Center for Best Practices (2005) has recommended toughening high school graduation requirements, which include 3 to 4 years of mathematics starting with algebra in ninth grade or earlier. A college-prep curriculum for all is now in place in 20 states (Achieve, 2009). The most recently constructed “Common Core State Standards” are also consistent with these goals.
In urban school systems, which enroll the overwhelming majority of low-achieving and disadvantaged U.S. students, recent reform efforts have emphasized engaging all students in academic coursework. For example, in Chicago where this study takes place, “Algebra for All” in 1997 expanded algebra to low-performing students who traditionally enrolled in remedial arithmetic (Allensworth, Nomi, Montgomery, & Lee, 2009). Subsequently, the “Double-Dose Algebra” policy offered extra time and support to low-achieving students (Nomi & Allensworth, 2009). Programs like Double Dose, in place in nearly half of U.S. urban districts today (Council of the Great City Schools, 2009), respond to the widespread concern that struggling students need extra supports to succeed in academic courses. Other related efforts include the Talent Development Program (Kemple, Herlihy, & Smith, 2005; Mac Iver, Balfanz, & Plank, 1998), offering college-prep courses but with slower pacing for low-achieving students (White, Gamoran, Smithson, & Porter, 1996) and offering transition courses as a bridge to college-prep coursework (White et al., 1996).
High School Math Reform in Chicago
Prior to 1997, algebra course taking in most Chicago high schools depended largely on a student’s incoming math skill: High-skill students took algebra and low-skill students took remedial arithmetic. In 1997 and thereafter, all students were required to take algebra. This “Algebra for All” policy not only changed the content of instruction for low-skill students, it also decreased the segregation of classrooms based on prior math skill. Using an interrupted time-series design with nonequivalent control groups, Allensworth et al. (2009) showed that the new “Algebra for All” policy, as intended, dramatically increased algebra enrollment for low-skill students. However, on average, low-skill students’ math learning did not improve, plausibly because many students lacked sufficient mathematical background to benefit from instruction in algebra (Allensworth et al., 2009). The policy also led to declines in test scores for high-skill students (Nomi, 2012), suggesting that taking algebra with comparatively low-skill peers had a negative impact for high-skill students.
In 2003, the district launched the “Double-Dose Algebra” policy, a deliberate attempt to overcome the shortcomings of Algebra for All. The Double-Dose policy required all ninth-grade students scoring below the national median on the eighth-grade math test to take two periods of algebra: regular algebra and an algebra support class that focused on building foundational math skills. Also, in an attempt to create consistency between algebra and support coursework, the district recommended that schools offer the two classes sequentially, that the same teacher teach the two classes, and that the same set of students take the two classes together. To follow the district’s recommendations, many but not all schools created separate algebra classes: classes composed of Double-Dose students and classes composed of students not assigned to Double Dose. Thus, the Double-Dose policy had just the opposite effect on classroom composition from that generated by the Algebra for All policy. Whereas Algebra for All tended to reduce classroom segregation based on prior peer skill, Double Dose tended to intensify sorting by students’ skill levels (Nomi, 2012; Nomi & Allensworth, 2013).
Nomi and Allensworth (2009) used an “RDD” (see Cook, 2008; Thistlethwaite & Campbell, 1960) to assess the impact of the Double-Dose policy on math achievement for students near the cut score by exploiting the cut-score-based course assignment rule using the data on two cohorts of postpolicy students (the 2003 and 2004 cohorts). Their intent-to-treat (ITT) analysis, comparing students scoring just below with those scoring just above the cut point, found a significant positive ITT effect (Nomi & Allensworth, 2009). However, not everyone took the assigned course; thus, the ITT effect underestimated the impact of actually participating in Double-Dose coursework. To estimate the impact of actual program participation, Nomi and Allensworth (2009) used the cut score as an instrumental variable, providing the impact on those who were induced to take the course by virtue of scoring below the cut point, that is, the “complier-average causal effect” (CACE), sometimes called the “local average treatment effect” (Angrist, Imbens, & Rubin, 1996). Earlier studies have found significant positive effects of taking Double Dose on both short-term and long-term outcomes, including high school graduation and college enrollment (Cortes, Goodman, & Nomi, 2015; Nomi & Allensworth, 2009).
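To make the distinction between the ITT effect and the CACE concrete, the following minimal Python sketch simulates a fuzzy RDD and computes both quantities. It is illustrative only: the data are simulated, the take-up rates and the effect size are assumed values chosen for the example, the helper `discontinuity` is hypothetical, and the piecewise-linear specification is a simplification rather than the model estimated in the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated (not real) data: pretest percentile with a cut point at 50.
n = 40_000
score = rng.integers(1, 100, size=n).astype(float)   # percentiles 1..99
below = (score < 50).astype(float)                    # assigned to Double Dose

# Imperfect compliance (assumed rates): most, but not all, assigned students enroll.
p_take = np.where(below == 1, 0.82, 0.04)
takes_dd = (rng.random(n) < p_take).astype(float)

true_effect = 0.20                                    # assumed effect of enrolling
y = 0.02 * (score - 50) + true_effect * takes_dd + rng.normal(0, 1, n)

def discontinuity(outcome, score, below, cut=50.0):
    """Jump at the cut point from a piecewise-linear regression on the pretest."""
    x = score - cut
    X = np.column_stack([np.ones_like(x), below, x, below * x])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]                                    # coefficient on `below`

itt = discontinuity(y, score, below)                  # assignment -> outcome
first_stage = discontinuity(takes_dd, score, below)   # assignment -> enrollment
cace = itt / first_stage                              # instrumental variable (Wald) ratio

print(f"ITT = {itt:.3f}, first stage = {first_stage:.3f}, CACE = {cace:.3f}")
```

Because compliance is imperfect, the first-stage discontinuity is smaller than 1, so the CACE is larger in magnitude than the ITT effect.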
Although these studies revealed a benefit of Double Dose, the interpretation of this finding is ambiguous in light of the fact that the policy significantly increased classroom segregation (or “sorting”) based on prior math skill. Did Double Dose improve students’ outcomes strictly because it doubled instructional time? Or did assignment to comparatively low-skill classrooms undermine the impact of the policy for students whose skill was near the national median?
Hypotheses and Rationale
Past research gives us reason to predict a negative effect of attending a low-skill classroom. Argys, Rees, and Brewer (1996); Hoffer (1992); and Loveless (2009) found that high-skill students’ achievement suffered in detracked schools where classmates’ abilities were comparatively low. Nomi (2012) also found a negative effect on high achievers of having lower skill classmates in her evaluation of “Algebra for All” in Chicago. Recent studies on peer effects found positive effects of being assigned to peers with higher skills (Duflo, Dupas, & Kremer, 2011) and negative effects of having low-skill peers (Imberman, Kugler, & Sacerdote, 2012). 1 We caution that the literature does not speak with a single voice on this question. Burris, Hubert, and Levin (2006) studied the impact of expanding the pace of math instruction for all students in mixed-ability middle school classes. Attending a class with lower skill peers did not reduce the learning of higher skill students in that study (see also Boaler & Staples, 2008; Oakes, 2005; B. C. Rubin, 2008).
Researchers have offered several explanations for the observed negative impacts of low classroom peer skill. Most fundamentally, the effect would come from instructional change: Teachers have been found to pitch the level, expectation for content mastery, and pace of instruction to the median level of prior skill of the students in the classroom (Barr & Dreeben, 1983; Pallas, Entwisle, Alexander, & Stluka, 1994). For this reason, higher skill students may tend to be bored in a class with low-skill peers (Rosenbaum, 1999). Student participation and pedagogical practices are also related to academic composition of classrooms; students in high-skill classes are more likely to actively participate in the class and be engaged in discussions (Gamoran, Nystrand, Berends, & LePore, 1995). In contrast, low-track or low-skill classes tend to be more disruptive and have an overall low-quality classroom instructional environment (Oakes, 2005; Page, 1991; Rosenbaum, 1976; Wheelock, 1992). Classroom management is also problematic, perhaps in part because schools tend to assign less skilled teachers to low-skill classrooms (Kelly, 2004). Finally, assignment to a low-skill class may convey stigma or negative expectations about students’ capacity to learn math. If students internalize these expectations, it may tend to undermine their motivation to learn and hence their outcomes (Dweck, 2006; Oakes, 2005; Schafer & Olexa, 1971).
For these reasons, we hypothesize the following for median-skill students:
If H1 and H2 are correct, the impact of the policy may be highly heterogeneous for students in the middle of the prior achievement distribution. Such a result would change our belief about the magnitude of impact that we might achieve with this kind of reform while also raising tricky questions about how best to organize instruction under the policy. We turn to these questions in the concluding section of this article.
Method
Sample and Data
The Chicago Public Schools (CPS) system is the third-largest school system in the United States. The district serves predominantly low-income and minority students; approximately 85% of students are eligible for free/reduced-price lunch, and racial composition is about 50% African American, 38% Latino, 9% White, and 3% Asian and Other races/ethnicities.
We use data on 11,296 first-time ninth graders attending 60 nonselective, comprehensive neighborhood public high schools in Chicago during the 2003–2004 academic year (Table 1). 2 The total number of algebra classes is 969, with an average class size of 24 students. We excluded students with disabilities, who comprised 18% of the first-time ninth graders. Many schools exempted students with disabilities from taking Double-Dose algebra and instead assigned these students to take algebra with other students with disabilities. We also excluded students without valid classroom data. 3 Four schools were excluded because they did not offer Double-Dose algebra at all or they put all students in Double-Dose algebra. Of students included in our analysis, approximately 86% were eligible for free or reduced-price lunch; 55% were African American and 34% were Hispanic (see Table 1). 4
Table 1. Descriptive Statistics (N = 11,296)
Note. The analytic sample consists of students without disabilities who attend regular high schools.
Our outcome variables are the algebra subtest of the PLAN math test, developed by the American College Testing Service and administered to all students during the fall of their 10th-grade year, and algebra course grades. The subtest contains 22 items; raw scores were converted to a scale score, which was standardized to a mean of 0 and a standard deviation of 1. The key covariate for our analysis is the percentile score on the Grade 8 Iowa Test of Basic Skills (ITBS) in math, which is used to determine Double-Dose eligibility. To measure classroom peer achievement, we first created a latent math score, using a vector of ITBS math scores from third through eighth grade. 5 We then computed the average of a student’s classmates on this latent math score.
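The classroom peer measure can be illustrated with a short Python sketch. It assumes that a student’s own score is excluded from the classroom average (a “leave-one-out” mean), which is one natural reading of “the average of a student’s classmates”; the function name `peer_mean` and the toy data are hypothetical.

```python
import numpy as np

def peer_mean(latent, classroom):
    """Mean latent score of each student's classmates, excluding the student.

    Students who are alone in a classroom get NaN (no classmates to average).
    """
    latent = np.asarray(latent, dtype=float)
    _, idx = np.unique(classroom, return_inverse=True)   # classroom index per student
    class_sum = np.bincount(idx, weights=latent)          # total score per classroom
    class_n = np.bincount(idx)                            # students per classroom
    n_others = class_n[idx] - 1
    out = np.full(latent.shape, np.nan)
    ok = n_others > 0
    out[ok] = (class_sum[idx][ok] - latent[ok]) / n_others[ok]
    return out

# Toy example: three students in classroom "a", two in classroom "b".
latent = [0.5, -0.5, 1.0, 0.0, 2.0]
classroom = ["a", "a", "a", "b", "b"]
peers = peer_mean(latent, classroom)
print(peers)
```

Excluding the student’s own score keeps the peer measure from being mechanically correlated with the student’s own prior achievement.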
Implementation of Double Dose
Students scoring below the 50th percentile on the ITBS during the spring of their eighth-grade year were expected to take Double-Dose algebra. In fact, 82% of those scoring below the cut point took Double Dose, whereas only 4% of those scoring above the cut point, who were therefore not required to take Double Dose, did so. Thus, compliance with the policy, though not perfect, was high, as Figure 1 suggests. The horizontal axis of Figure 1 displays the values of student eighth-grade ITBS test scores. The vertical axis is the fraction of students taking Double Dose. Each dot on the plot is the proportion of students taking Double Dose conditional on their shared ITBS score. Students scoring below the cut point exhibited a very high probability of taking Double Dose whereas those scoring above the cut point were very unlikely to take the course. There is a marked drop in Double-Dose enrollment at the cut point set by the district. 6

Figure 1. Double-Dose algebra enrollment rate by math percentile scores.
Crucially, scoring below the cut point also substantially reduced, on average, the classroom mean prior skill of one’s peers. This dramatic effect on classroom segregation is apparent in Figure 2. The left panel of the figure displays classroom average prior math achievement (vertical axis) as a function of a student’s own ITBS score (horizontal axis) 1 year prior to the implementation of the Double-Dose policy. The slope of the line describing this association is an index of segregation. To see this, note that a slope near zero would indicate a student’s prior skill does not predict the average skill of his or her classmates, that is, classrooms are not segregated based on prior skill. The positive slope of this line indicates that, in the present case, in the year prior to the policy, students with higher incoming math skills tended to have classroom peers with higher math skills.
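The interpretation of the slope as an index of segregation can be illustrated with simulated data. In the sketch below, which is not based on the Chicago data, the same students are assigned to classrooms either at random or strictly by prior score; the function `classroom_means` and all parameter values are hypothetical, and the classroom mean here includes the student’s own score for simplicity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_students, class_size = 2400, 24
score = rng.uniform(1, 99, n_students)          # simulated prior math percentile

def classroom_means(score, order, class_size):
    """Fill classrooms in the given student order; return each student's class mean."""
    class_id = np.empty(len(score), dtype=int)
    class_id[order] = np.arange(len(score)) // class_size
    sums = np.bincount(class_id, weights=score)
    counts = np.bincount(class_id)
    return (sums / counts)[class_id]

# Unsorted classrooms: students placed at random.
mixed = classroom_means(score, rng.permutation(n_students), class_size)
# Fully sorted classrooms: students placed in order of prior score.
tracked = classroom_means(score, np.argsort(score), class_size)

# Slope of classroom mean skill on own skill: the segregation index.
slope_mixed = np.polyfit(score, mixed, 1)[0]
slope_tracked = np.polyfit(score, tracked, 1)[0]
print(f"random assignment: {slope_mixed:.2f}, full sorting: {slope_tracked:.2f}")
```

Under random assignment the slope is close to zero, and under complete sorting it approaches 1; observed slopes between these extremes indicate partial sorting.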

Figure 2. Classroom average skill levels by math percentile scores.
The right panel of the figure displays the same association, but now during the year of the implementation of the Double-Dose policy. We see a marked discontinuity at the enrollment cut score of 50. On average, scoring below the cut point sharply reduced the classroom mean prior skill of one’s peers. This discontinuity can be regarded as the impact of the Double-Dose policy on classroom peer skill, generating, in principle, a powerful natural experiment that enables us to assess the impact of classroom peer skill for students in the neighborhood of the cut point.
In sum, scoring below the cut point on the pretest had two effects, on average: It strongly increased the probability of taking Double Dose and increased the chance of taking algebra with low-skill classmates. If these processes worked the same way in every school, we would have no basis for separating the impact of increased instructional time from the impact of classroom peer skill. However, policy implementation varied remarkably across schools in both the degree of compliance with the cut-score-based Double-Dose assignment and the degree to which schools created skill-based classroom segregation. This heterogeneity in implementation enables us to separately identify the impact of taking Double Dose from the effect of having low-skill peers under assumptions described in the next section.
Theoretical Model and Analytic Approach
Our primary analysis uses school-specific instrumental variables within a parametric model for the association between the pretest and the outcome to identify, for each school, the discontinuity at the cut point. This approach provides a statistically precise summary of all key causal effects under assumptions that we delineate. We also conduct a nonparametric sensitivity analysis using methods recommended by Imbens and Lemieux (IL; 2008).
The conceptual model for both analyses is displayed in Figure 3. For median-skill students, scoring below the cut point (“T”) increases the probability of taking Double-Dose algebra (“D”), on average, by an amount denoted by

Figure 3. Causal model.
Here
Primary Analysis: A School-by-School Parametric Approach
Following an approach devised by Kling, Liebman, and Katz (2007; see also Duncan, Morris, & Rodrigues, 2011), we use 60 school-specific instrumental variables to identify the impact of taking Double-Dose algebra and the impact of classroom peer skill on algebra achievement and grades.
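The logic of using school-specific instruments can be sketched in Python with simulated data. This is a simplified two-step illustration under stated assumptions, not the estimation procedure used in this article: each simulated school has its own compliance rate and its own effect of Double Dose on peer skill, the school-specific discontinuities in enrollment, peer skill, and the outcome are estimated by piecewise-linear regression, and the outcome discontinuities are then regressed (without an intercept, reflecting the exclusion restriction) on the other two. The names `jump`, `delta_true`, and `beta_true` and all numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
J, n_j = 60, 4000                       # simulated schools, students per school
delta_true, beta_true = 0.20, 0.30      # assumed effects of D and C on Y

def jump(outcome, x, below):
    """School-specific discontinuity at x = 0 from a piecewise-linear fit."""
    X = np.column_stack([np.ones_like(x), below, x, below * x])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return coef[1]

gamma_hat, theta_hat, itt_hat = [], [], []
for j in range(J):
    gamma_j = rng.uniform(0.5, 0.95)    # effect of T on enrollment (compliance)
    psi_j = rng.uniform(-0.8, 0.0)      # effect of enrollment on peer skill
    x = rng.integers(1, 100, n_j) - 50.0            # pretest centered at the cut
    below = (x < 0).astype(float)                   # T: scored below the cut
    d = (rng.random(n_j) < 0.03 + gamma_j * below).astype(float)   # D: enrolled
    c = 0.01 * x + psi_j * d + rng.normal(0, 0.3, n_j)             # C: peer skill
    y = delta_true * d + beta_true * c + 0.02 * x + rng.normal(0, 1, n_j)
    gamma_hat.append(jump(d, x, below))             # ITT effect on enrollment
    theta_hat.append(jump(c, x, below))             # ITT effect on peer skill
    itt_hat.append(jump(y, x, below))               # ITT effect on the outcome

# Second step: cross-school variation in the two first stages separates the
# effect of enrollment from the effect of peer skill.
Z = np.column_stack([gamma_hat, theta_hat])
(delta_hat, beta_hat), *_ = np.linalg.lstsq(Z, np.array(itt_hat), rcond=None)
print(f"effect of D: {delta_hat:.2f}, effect of C: {beta_hat:.2f}")
```

The second step works only because the two sets of first-stage effects vary across schools and are not perfectly correlated; if every school had the same compliance and sorting, the two columns of `Z` would be constant and the two effects could not be distinguished.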
Notation and Model
Let
Here
Our theoretical model for the impact of Double Dose and classroom peer skill on algebra achievement has a similar form
except that we cannot now estimate school-specific values
Reduced Form
To see how the procedure just described corresponds to the parameters described in Figure 3 (and Equation 1), we can derive the “reduced form,” that is, the model for ITT effect of T on Y, often called the “total effect” of T on Y. We obtain the reduced form by substituting expressions for
where
Assumptions
Reardon and Raudenbush (2013) derived the assumptions that must be met in multisite randomized trials that use site-specific instruments to identify the impact of multiple mediators of a treatment. We describe how these apply in our RDD study in Appendix A (available in the online version of the journal) in detail and provide a brief summary here.
Stable Unit-Treatment Value Assumption (SUTVA)
Under this assumption (D. B. Rubin, 1986), each participant possesses a single potential outcome under each possible treatment assignment. This implies that a participant’s potential outcome does not depend on the treatment assignment of one’s peers (D. B. Rubin, 1986). This assumption would appear naïve in an analysis that considers only the impact of Double Dose because the impact of scoring below versus above the cut point may well depend not only on taking Double Dose but also on the treatment assignment of other students. Thus, following Hong and Raudenbush (2006), we have effectively modified the conventional SUTVA to include the possible influence of classroom mean skill on potential outcomes.
Exclusion Restrictions
We assume that scoring below the cut point cannot influence classroom peer skill except by inducing a student to take Double Dose (i.e., no direct path between T and C in Figure 3); scoring below the cut point cannot influence algebra learning except by changing coursework or classroom peer skill (no direct path between T and Y in Figure 3). At one level, these exclusion restrictions are quite reasonable: There is little reason to think that scoring below versus above the cut point would have much influence unless it affects the setting in which the child learns or the course content. However, the interpretation of the effect of “classroom peer skill” is open to considerable debate. A negative effect might reflect not only peer influences but also institutional processes that influence the cognitive level, amount, or quality of instruction occurring in classrooms that vary in prior skill as described in our section on hypotheses above. We revisit this in the concluding section.
No Confounding of Treatment Assignment
This assumption is satisfied in the RDD study because treatment assignment is known (i.e., determined by the observed pretest scores). However, we must assume that the functional form of association between the variable that determines treatment assignment and the outcome is correctly specified. Although our graphical analyses shown in the “Findings” section suggest that the piecewise linear model (Equations 2–4) adequately captures the pretest and outcome relationship, we relaxed this assumption in our nonparametric sensitivity analysis.
Linearity and Additivity of Classroom Peer Effects
Our theoretical model is a linear additive function of Double-Dose participation and classroom peer effects (Equation 1). We checked these assumptions graphically and in our sensitivity analyses.
Monotonicity
We assume that scoring below the cut point cannot reduce the probability of taking Double Dose or increase classroom peer skill. We regard these as plausible assumptions, and ones that accord with our school-by-school graphical analyses.
The next two assumptions are required for estimating the independent impact of Double Dose and classroom peer skill.
The Association Between Our Instrument and at Least One of Our Two Endogenous Variables (Double Dose and Classroom Peer Skill) Must Vary Across Schools
This is demonstrated in the Findings subsection “ITT Impact on Taking Double-Dose Algebra and Classroom Peer Skill.”
The School-Specific Impact of Scoring Below the Cut Point on Double-Dose Enrollment Cannot Be Perfectly Correlated With the School-Specific Impact of Scoring Below the Cut Point on Classroom Peer Composition
As discussed in the Findings subsection “ITT Impact on Taking Double-Dose Algebra and Classroom Peer Skill,” this assumption is met: the overall correlation is r = −.33.
Independence of Site-Mean Compliance and Effect Assumptions
Identification of our theoretical model (Equation 1) using Equations 2 to 4 depends on several additional assumptions regarding the relationship between school-specific implementation of the policy and the impact of implementation. With regard to course compliance, we must assume that schools that comply with the Double-Dose policy (and therefore have large values of
Findings
ITT Impacts
ITT Impact on Outcomes 9
Controlling for the pretest (see Equation 4), scoring below the cut point increased algebra achievement, on average, by an estimated standardized effect size of
ITT Impact on Taking Double-Dose Algebra and Classroom Peer Skill
To what extent does scoring below the cutoff point induce shifts in the probability of taking Double Dose and classroom peer skill? To answer this question, we estimated Models 2a and 2b. Both effects are strong, on average, but vary substantially and significantly from school to school (see Table 2). Specifically, we estimate the average effect of scoring below the cut point on Double-Dose enrollment to be
Table 2. Effect of Cut Point on Double-Dose Algebra Enrollment and Classroom Academic Composition: All 60 Schools (N = 10,131)
Note. Students whose ITBS scores are above the 75th percentile are excluded from the analysis. ITBS = Iowa Test of Basic Skills.
p < .1. **p < .01. ***p < .001.

Figure 4. The effect of cut score on Double-Dose algebra enrollment by the effect of cut score on classroom peer ability.
For students whose true prior achievement is around the cut point, scoring below the cut point also induces a substantial reduction in classroom peer skill, on average. Across all 60 schools, the average impact is estimated to be
As Figure 4 shows, these two effects are not strongly correlated with an estimated correlation coefficient of
Instrumental Variable Results
Effects of Double Dose and Classroom Peer Skill
Our analysis using the instrumental variable method (Equations 2 and 3) finds a significant positive impact of taking Double-Dose algebra,
Table 3. Estimated Effect of Classroom Peer Ability, Controlling for Double-Dose Algebra and Effect of Double-Dose Algebra, and Controlling for Classroom Peer Ability on Algebra Scores
Note. Students whose ITBS scores are above the 75th percentile are excluded from the analysis. ITBS = Iowa Test of Basic Skills.
p < .1. **p < .01. ***p < .001.
These results suggest that the ITT impacts of the policy depend on both the level of course compliance and the degree of sorting, each of which varied considerably across schools. To present the ITT effects graphically for schools with different levels of course compliance and degrees of sorting, we plotted the expected ITT effects along with their 95% confidence intervals (the y-axis) by the degree of sorting (θ) for schools with a moderate course compliance rate of .6 (Figure 5) and a high course compliance rate of .9 (Figure 6). These figures suggest that, at a given level of course compliance, the ITT effects on algebra scores are larger for schools that did not segregate classrooms by students’ pretest scores. Also, comparing moderate-compliance schools (Figure 5) and high-compliance schools (Figure 6), we see that the ITT effects are larger for high-compliance schools with the same level of sorting (θ).
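The dependence of the ITT effect on compliance and sorting can be written compactly. The expressions below are a sketch under the linear, additive model and the exclusion restrictions described in the “Assumptions” section. Here γ_j is the effect of scoring below the cut point on Double-Dose enrollment in school j, θ_j is the (typically negative) effect of scoring below the cut point on classroom peer skill, and δ and β are placeholder symbols, introduced here for illustration and not necessarily the article’s notation, for the effects of taking Double Dose and of classroom peer skill on the outcome.

```latex
\begin{aligned}
\mathrm{ITT}_j  &= \delta\,\gamma_j + \beta\,\theta_j, \\
\mathrm{CACE}_j &= \frac{\mathrm{ITT}_j}{\gamma_j} = \delta + \beta\,\frac{\theta_j}{\gamma_j}.
\end{aligned}
```

If δ and β are both positive, the first term grows with compliance while the second term becomes more negative as sorting intensifies, which is the pattern described for Figures 5 and 6. The ratio θ_j/γ_j can be read as the effect of taking Double Dose on classroom peer skill for compliers.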

Figure 5. Predicted ITT effects on algebra scores by the effect of cut score on classroom mean peer skill for moderate-compliance schools (course compliance rates of .6).

Figure 6. Predicted ITT effects on algebra scores by the effect of cut score on classroom mean peer skill for high-compliance schools (course compliance rates of .9).
CACEs
Our results imply that, for compliers, the impact of taking Double Dose on classroom peer skill was

Figure 7. Predicted CACE on algebra scores by the effect of taking Double Dose on classroom mean peer skill for high-compliance schools.
Results for Course Grades
We conducted the same analysis using algebra course grades as an outcome. Prior research by Nomi and Allensworth (2009) showed that Double-Dose algebra led to higher algebra grades. Consistent with their findings, we found a statistically significant ITT effect; on average, scoring below the cut point would lead to higher algebra grades by .26 (SE = .05, t = 5.09, p < .001). Also, controlling for classroom peer ability, the average effect of taking Double-Dose algebra is .19 although this did not reach statistical significance (SE = .12, t = 1.60, p = .110). However, controlling for Double-Dose enrollment, we found a negative effect of classroom peer ability on algebra course grades,
Sensitivity Analyses
We conducted sensitivity analyses to see whether our results were sensitive to failure of the “independence of site-mean compliance and effect” assumption, the assumption of a linear additive association between C and Y, and the functional form assumption of our parametric model. Here, we present only the result on the impact of classroom peer skills—the key causal impact of this study that relies on all of these assumptions. The complete sensitivity analyses are presented in Appendix B (available in the online version of the journal). Overall, the results strongly corroborated the findings just described.
Sensitivity of Inferences About the Impact of Classroom Peer Skill on Algebra Achievement
Our parametric analysis isolated the impact of classroom peer skill by parametrically identifying ITT effects on Double-Dose enrollment and classroom peer skill school by school (Equations 2a and 2b). To check the sensitivity of our estimate to the violation of three assumptions stated above, we estimate the ITT effect and CACE—the effect of scoring below the cut point and taking Double-Dose algebra on the outcome—within a given level of course compliance γ by using nonparametric RDD as recommended by IL (2008). We then compare these impacts among schools that are similar in course compliance γ, but differ in the degree of skill-based sorting θ (i.e., high vs. low sorting). Note that these ITT and CACE estimates do not rely on the linear and additivity assumption of Equation 3 or the mean independence assumption. 11 Nor does the IL method depend on the parametric RDD model of Equations 2 and 3. Thus, if all other RDD assumptions hold, the difference in the ITT and CACE estimates between two groups of schools with the same level of course compliance, but differing in the degree of skill-based sorting, is attributable to the difference in the degree of sorting.
However, the IL method uses only the data on students who scored near the cut point. As a result, we do not have enough data to estimate ITT effects school by school. Instead, we stratified schools based on parametric estimates of
Number of Schools by Compliance Status
The estimated ITT effects on algebra scores within the 10-percentile bandwidth using the IL method were .146 for low-sorting schools and −.001 for high-sorting schools; the estimated CACEs were .192 and −.001 for low- and high-sorting schools, respectively (Table 5). 12 These point estimates are qualitatively similar to those estimated by our parametric model but are less precise.
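As a rough consistency check, suppose the nonparametric CACE is the usual Wald ratio of the ITT effect on the outcome to the ITT effect on Double-Dose enrollment (an assumption about the estimator; the subscripted symbols below are illustrative notation). The reported low-sorting estimates then imply the size of the enrollment discontinuity in that stratum:

```latex
\widehat{\mathrm{CACE}} = \frac{\widehat{\mathrm{ITT}}_{Y}}{\widehat{\mathrm{ITT}}_{D}}
\quad\Longrightarrow\quad
\widehat{\mathrm{ITT}}_{D} = \frac{\widehat{\mathrm{ITT}}_{Y}}{\widehat{\mathrm{CACE}}}
= \frac{.146}{.192} \approx .76
```

The same calculation is uninformative for high-sorting schools, because both reported estimates (−.001) are essentially zero and rounded.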
Comparisons of Parametric and Nonparametric Estimates of ITT and CACE Among High-Compliance Schools
Note. Parametric estimates are the pooled average impacts across schools within stratum (i.e., high-sorting and low-sorting schools with high course compliance). Standard errors are in parentheses. ITT = intent to treat; CACE = complier-average causal effect.
Discussion
Our study can be regarded as a synthesis of 60 independent natural experiments. The Double-Dose policy mandated that all students take academic algebra in ninth grade, but that students scoring below the national median on the eighth-grade math test take two periods of ninth-grade math: one period of academic algebra and a second period of math coursework designed to support algebra learning of students who had fallen behind. We found remarkable heterogeneity in how these 60 schools implemented the policy. In most but not all schools, compliance with assignment to Double-Dose algebra was reasonably high. However, the schools varied enormously in the extent to which implementation induced classroom peer segregation based on prior math skill. This heterogeneity enabled us to disentangle the effects of access to course content via increased instructional time from the effects of classroom peer math skill.
Our evidence shows that for students with median skill in Chicago neighborhood high schools, the overall impact of the Double-Dose policy depended on the expansion of instructional time afforded by Double-Dose algebra and on the tendency of the policy to assign median-skill students to low-skill classrooms. For those median-skill students, the impact of Double Dose on algebra learning, had their classroom peer skill remained unchanged, can be quite substantial. Specifically, those induced to take Double Dose by virtue of scoring below the cut point gained about .20 SDs in math achievement in schools where the policy did not segregate low-skill students in Double-Dose classrooms. In contrast, the benefit of Double Dose was very small or null when taking Double Dose meant attending a class with low-skill peers. The effect size of .20 in algebra is substantial, given that the average annual gain from Grades 9 to 10 is reported to be .25 SD on the nationally normed tests measuring broader content knowledge in mathematics (Hill, Bloom, Black, & Lipsey, 2007). 13
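To put that comparison in concrete terms, the ratio of the two figures quoted above is:

```latex
\frac{\text{Double-Dose effect (low-sorting schools)}}{\text{average annual gain, Grades 9 to 10}}
= \frac{.20\ \text{SD}}{.25\ \text{SD}} = .80
```

That is, roughly four fifths of a typical year's growth. This is only a rough benchmark, because the .25 SD figure comes from nationally normed tests of broader mathematics content rather than from the algebra measure used here.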
It appears that careful attention to course scheduling can enhance the effectiveness of expanding instructional time for median-skill students. This reasoning is based on further analyses that explored whether “mechanical factors” strongly determine the degree to which schools created sorted algebra classes. Specifically, we used the following school-level variables to predict the school-specific impact of scoring below the cut point on classroom peer ability: the school’s prior test score mean, prior test score standard deviation, neighborhood disadvantage, 14 racial–ethnic composition, fraction of children in special education, and compliance with Double Dose. Our descriptive analysis showed that, on average, schools that created greater sorting tended to have somewhat lower school mean pretest scores, to enroll a larger percentage of African American students and a smaller percentage of Hispanic students, and to have a higher concentration of poverty. However, these predictors together explained less than 17% of the variance in the ITT impact on classroom peer ability. 15 The fact that such mechanical factors do not determine the degree of sorting associated with implementation of the policy implies that there is a role for school administrators in designing the Double-Dose schedule, and hence in shaping the impact of the policy.
A limitation of our design is that our findings apply only to those near the median of the pretest distribution. However, we find that this group is likely to be a fairly large segment of the high school population of interest here. Using data on prior achievement, ethnicity, and social background, we estimate that roughly one quarter of the students had a near-zero probability of scoring above the cut point whereas a similar number had virtually no chance of scoring below the cut point. Our findings do not provide information on how these very low-scoring students (who have a negligible chance of scoring above the cut point) or very high-achieving students (who have a negligible risk of scoring below the cut point) might respond to shifts in classroom peer achievement or extra instruction. Identifying these effects will require an alternative research strategy and will constitute a topic for subsequent research. Yet, in our data, nearly 50% of all students have a nonnegligible chance of scoring either above or below the cut point. This reflects the fact that the pretest score measured at one time point (i.e., eighth grade) contains measurement error.
In addition to providing insight into the varied impacts of the Double-Dose policy, our results contribute to the literature on the impact of classroom peer skill on math learning in secondary school. We found a quite substantial negative impact on median-skill students of taking algebra with low-skill peers. This finding is consistent with Nomi’s (2012) finding that high-skill students lost ground when assigned to low-skill classrooms under the Algebra for All policy. Past research suggests that a negative impact of attending a low-skill class might result from several plausible mechanisms, including not only peer effects but also institutional effects.
One plausible explanation for such effects is that the average prior skill of classmates constrains the pace and conceptual level of math instruction that a teacher can provide, at least under conventional pedagogical approaches (e.g., Barr & Dreeben, 1983; Gamoran, 2010; Oakes, 2005; Page, 1991). This might be particularly true if the teacher engages in whole-class instruction, the norm in high school math instruction. In addition, Gamoran et al. (1995) found that students in classes with higher achieving students are more likely to engage in discussions and student-initiated activities. There is also evidence that school leaders often assign comparatively low-skill teachers to low-skill classrooms (Kelly, 2004) or that low-skill peers become discouraged and display low motivation for learning that negatively influences the classroom climate (e.g., Oakes, 2005; Page, 1991). There is also good reason to suppose that assignment to a low-skill class can convey negative expectations for math learning to those so assigned. If such students internalize a negative stigma, their motivation for learning may decline, leading to lower math achievement (Oakes, 2005; Schafer & Olexa, 1971).
However, our results showed that the effect of low-skill classroom peers runs in the opposite direction for algebra course grades: Median-skill students tend to earn higher grades in classes with low-achieving peers. One possible reason is that these median-skill students would be the highest achieving students in the Double-Dose algebra class if schools segregated students on the basis of the cut point (i.e., “fish-pond effects”). Another is that these students exceed expectations in low-skill classrooms if teachers set different expectations according to students’ overall skill levels (e.g., Kelly, 2008).
A key question with regard to algebra content mastery (measured by test scores) that we have not answered is whether very low-skill students benefit from taking algebra with higher skill peers. If they do, school leaders face a difficult trade-off unless new methods of instruction are found that can optimize learning for students of heterogeneous skill. Prior research has demonstrated some evidence of successful detracking (e.g., Burris et al., 2006); however, this evidence typically comes from high-achieving and well-resourced schools as compared with a typical school in an urban district like Chicago. Thus, generalization of this research, particularly in understanding how very low-achieving students would respond, may be quite limited. 16 However, if low-skill students do as well in homogeneous classes as in heterogeneous classes with higher achieving peers, then our findings, combined with those of Nomi (2012), would seem to suggest that math instruction works best overall when conducted in classrooms composed of students who are homogeneous in prior skill. Unfortunately, the RDD method cannot assist us in studying the impact of classroom peer skill on math learning for students with very low or high skills.
Our results also have implications for RDD as a strategy for obtaining valid causal inference. When a pretest determines assignment to a treatment, we have an unusually strong quasi-experiment. For those induced to take up a treatment by scoring below a cut point, we can rule out the standard concern that unobserved covariates are generating selection bias. However, when the treatment is administered in a group setting (e.g., schools), we can anticipate that the cut-score-based treatment assignment is likely to cluster low-scoring students together for instruction. This could, in turn, influence the nature of the intervention. Gibbs (2015) brought this problem to light in her study of half- versus full-day kindergarten. Her multidistrict randomized trial showed a positive effect of full-day kindergarten on emerging literacy skills, with particularly pronounced effects for disadvantaged children. In contrast, in districts using the cut-score-based assignment approach, where only low-income students were provided full-day kindergarten, the impacts were nonsignificant. A plausible explanation is that, in the “means-tested” RDD study, low-income students were clustered together for instruction, which was not the case in the randomized study, where the intervention applied universally, regardless of income.
Technically, an RDD study like ours and that of Gibbs is two-dimensional: Falling below a cut point not only increases instructional time but also shapes the peer composition of the social setting in which instruction occurs. In these studies, it becomes important to disentangle different sources of influence.
The broad theoretical lesson in our study seems important: Instructional time and class composition jointly shape learning opportunities and therefore influence the distribution of human capital during adolescence.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This study was supported by a grant from the U.S. Department of Education’s Institute of Education Sciences entitled “Making a Success of ‘Algebra for All’: A Fleet of National Experiments in Urban Curricular Reform” (Grant # R305A120640) and a grant from the WT Grant Foundation entitled “Building Capacity for Evaluating Group-Level Interventions.”
Notes
Authors
TAKAKO NOMI is an assistant professor of education at Saint Louis University. Her areas of specialization include urban education, education policy, inequality in education, social organization of schools, and research methodology. Her recent work examines how district or state education policies affect the distribution of academic outcomes, variation in the intervention effects, and mechanisms to account for impact variation. She received a PhD in educational policy and theory from Penn State University.
STEPHEN W. RAUDENBUSH is the Lewis-Sebring Distinguished Service Professor in the Department of Sociology, the College and the Harris School of Public Policy, and chairman of the Committee on Education at the University of Chicago. He is interested in statistical models for child and youth development within social settings such as classrooms, schools, and neighborhoods. He is best known for his work developing hierarchical linear models, with broad applications in the design and analysis of longitudinal and multilevel research. He is currently studying the development of literacy and math skills in early childhood with implications for instruction, and methods for assessing school and classroom quality. He is a member of the National Academy of Sciences, the American Academy of Arts and Sciences, and the recipient of the American Educational Research Association award for Distinguished Contributions to Educational Research. He was awarded an EdM in administration, planning, and social policy, and an EdD in policy analysis and statistics from Harvard University.
