Abstract
What is schools’ role in the stratification system? One view is that schools are an important mechanism for perpetuating inequality because children from advantaged backgrounds (white and high socioeconomic status) enjoy better school learning environments than their disadvantaged peers. But it is difficult to know this with confidence because children’s development is a product of both school and nonschool factors, making it a challenge to isolate schools’ role. A novel approach for isolating school effects is to estimate the difference in learning when school is in versus out, a quantity called impact. Scholars employing this strategy have come to a remarkable conclusion: that schools serving disadvantaged children produce as much learning as those serving advantaged children. The empirical basis for this position is modest, however, and so we address several shortcomings of the previous research by analyzing a nationally representative sample of about 3,500 children in 270 schools from the Early Childhood Longitudinal Study–Kindergarten Cohort of 2011. With more comprehensive data and better scales, we also find no difference in impact on reading scores across schools serving poor or black children versus those serving nonpoor or white children. These patterns challenge the view that differences in school quality play an important role in shaping achievement gaps and prompt us to reconsider theoretical positions regarding schools and inequality.
What role do schools play in the stratification system? The dominant view in the sociology of education has been a critical one—advantaged children enjoy substantially better schools than disadvantaged children, and so schools play an important role in generating and maintaining inequality. A subset of scholarship, however, challenges this position. Research that carefully isolates school effects (e.g., Downey, Von Hippel, and Hughes 2008) questions whether children experience more learning in schools that serve primarily advantaged versus disadvantaged children. Downey and colleagues’ (2008) “failing schools” study found that schools serving advantaged children produced no more school-based learning than did schools serving disadvantaged children. This empirical result is foundational to a broader, theoretical argument that schools mostly reflect rather than generate inequality and in some ways are even compensatory (Downey and Condron 2016).
Given how important it is to know whether schools promote more learning for advantaged versus disadvantaged children, there has been surprisingly little debate over the results from the failing schools study. Instead, the view that the school system generally favors the advantaged has largely persisted. For example, the opening paragraph of the latest Handbook of the Sociology of Education for the 21st Century states: “We need to understand why the educational needs of those with limited economic and social resources remain unheeded while the institutions that serve them remain woefully inadequate” (Schneider 2018:xvii). Similarly, in the leading sociology of education textbook for college students (Schools and Society: A Sociological Approach to Education, 6th ed.), the six readings in the “Stratification and Schools” section all emphasize ways schools reproduce or exacerbate inequality (Ballantine and Spade 2017). The failing schools study does not seem to have prompted scholars to rethink how schools influence inequality in cognitive skills.
The disjuncture between the dominant, critical narrative about schools and inequality and emerging scholarship on cognitive skills is significant. Advancing our understanding depends heavily on directly engaging contradictory evidence. The failing schools study is now more than a decade old, yet we are aware of no research that has assessed its robustness or critiqued its assumptions. This is unfortunate because there are several reasons to believe the failing schools patterns may not accurately portray the relationship between schools and the opportunity to learn. First, the study has not been replicated. This may be due to the scarcity of seasonally collected data, but it leaves the field in an awkward position because such a crucial pattern rests on a limited empirical foundation. Estimates of summer or school-year learning can be unstable (von Hippel and Hamrock 2019), and results from the failing schools study are based on a single summer (after kindergarten) and a single academic year (first grade). Second, the failing schools study used a number-right scale that lacked interval properties. It is therefore not clear that gains at the bottom of the scale represent the same amount of learning as gains at the top of the scale, undermining the study’s impact measure (the difference in learning observed during the school year vs. the summer). Finally, impact estimates rely on the assumption that differences in nonschool investments across groups are constant across seasons. The failing schools study did not test the plausibility of this assumption.
As a result, there is significant value in reevaluating the claims from the original failing schools study and assessing their robustness. We do so by analyzing another nationally representative data set, the Early Childhood Longitudinal Study–Kindergarten Cohort of 2011 (ECLS-K:2011), which has several advantages over the data assessed in the 2008 study. First, it has three years of academic data (kindergarten, first grade, and second grade) and two summers, allowing for a more comprehensive evaluation of school impact. Second, the data have item-response theory-based theta reading and math scores, which have a greater theoretical claim to interval-level status than do the number-right scores used in the original study. And third, these data allow us to evaluate whether nonschool investments are constant across seasons, an assumption that went untested in the original study.
The stakes are high. If school quality is distributed in the way traditionally thought, with high-socioeconomic and white children enjoying the best opportunities to learn, then efforts to improve the schools serving disadvantaged children are key to improving their life opportunities. School inequalities may be a fundamental cause of broader social inequalities. If so, efforts to reduce achievement gaps should include finding ways to attract better teachers to schools serving the disadvantaged, improving the resources and efficiency of these schools, and combatting the segregation of advantaged and disadvantaged children within and between schools. But if schools serving disadvantaged children are performing better than previously thought and achievement gaps are mostly formed prior to kindergarten, we would want to direct greater energy toward improving children’s early childhood conditions to prevent large gaps from emerging in the first place. 1
Background
The notion that schools make a meaningful contribution to inequality prompts us to consider how school quality is distributed across schools serving various social groups in the United States. On its surface, this question hardly seems worth asking: It appears clear that advantaged children enjoy the best schools. But scholars have debated this issue ever since the 1966 Coleman Report (Coleman et al. 1966) determined that indicators of school characteristics correlate only modestly with children’s cognitive skills. Coleman’s provocative conclusion, that schools play a mostly neutral role in shaping achievement gaps, struck many critical scholars as misplaced and energized a line of research that highlights the many school processes that reproduce or exacerbate inequality. This literature is too extensive to review fully here, but key points include Bowles and Gintis’s (1976) argument that schools create workers in a capitalist economy who know their place; Bourdieu and Passeron’s (1977) view that schools favor children who exhibit the “cultural capital” (e.g., speech, style, dress) of the elite; and Collins’s (1979) observation that school credentials serve to legitimate the reproduction of inequality.
Supplementing these theoretical arguments, the critical perspective describes a wide range of concrete ways schools provide better learning environments for children from advantaged versus disadvantaged families. For example, funding plans heavily dependent on local property taxes result in widely varying financial resources for children attending schools in well-to-do versus poor areas (Kozol 1991). And schools serving advantaged children are better positioned to attract and retain top teachers, expose children to more advanced coursework, and produce an atmosphere with success-oriented peers (Duncan and Murnane 2014). Residential segregation results in high-income white families concentrated in similar neighborhoods, producing segregated schools (Reardon 2018), which may exacerbate inequality because children often learn more when they are in classrooms with advantaged peers (Hanushek et al. 2003). And discriminatory practices, especially those related to discipline, result in harsher punishments and a more difficult path for educational success for disadvantaged social groups (Morris and Perry 2016; Noguera 2008; Noguera, Hurtado, and Fergus 2013; Skiba et al. 2002).
Schools as Neutral or Compensatory
In contrast to the critical view that schools exacerbate inequality, some scholars have agreed with Coleman’s original claim, that schools are mostly neutral and the primary engine of inequality lies outside of schools (Jencks 1972; Rothstein 2004). And a minority have even suggested that schools may be compensatory institutions, at least under some conditions (Downey and Condron 2016). This group emphasizes how socioeconomic achievement gaps in children’s math and reading skills are mostly formed by kindergarten entry and do not grow appreciably during the school years (Reardon 2011). In addition, some studies find that once in school, socioeconomic achievement gaps grow faster during the summer than during the school year, a pattern consistent with the position that schools do more to reduce than increase inequality (Downey, von Hippel, and Broh 2004; Entwisle and Alexander 1992; Heyns 1978; Quinn et al. 2016; von Hippel, Workman, and Downey 2018).
The prospect that schools reduce inequality seems difficult to fathom, in part because so little theoretical work describes compensatory school processes. One possibility is that schools are simply less unequal than the inequality produced by nonschool institutions (Downey et al. 2004). The argument here is not that schools serving disadvantaged children are as effective as those serving advantaged children but that schools are compensatory because they are relatively more equal environments than what children experience elsewhere.
A stronger version of the compensatory argument is that schools are more effective at serving disadvantaged students than advantaged students. Again, this position is difficult to imagine given the challenges schools serving disadvantaged children face in attracting and retaining teachers (Ronfeldt, Loeb, and Wyckoff 2012) and because exposure to disadvantaged peers is associated with lower learning rates (Perry and McConney 2010). But the observation that socioeconomic gaps sometimes grow faster when school is out versus in prompts a consideration of potentially compensatory school mechanisms.
Schools might reduce inequality through several avenues (Downey and Condron 2016). One possibility is curriculum consolidation. Readers may be more familiar with the term curriculum differentiation, in which children are exposed to different material and learning conditions through ability grouping, tracking, and retention processes. But it is worth considering how schools also consolidate children’s learning experiences, grouping them together even when their skills are unequal. As one example, schools tend to organize children by chronological age, even when their skill levels are highly disparate. At a broader level, standardized curriculums, such as the Common Core, are likely compensatory relative to systems with less standardized curriculums. Schools may also reduce inequality by targeting resources toward disadvantaged children via policies such as Title I, Head Start, the Rehabilitation Act of 1973, and the 1990 Americans with Disabilities Act. These policies were initiated with the intent of helping disadvantaged children, and to some extent, they have succeeded (see Ludwig and Miller 2007). Finally, although some teachers exhibit discriminatory behavior, teachers may on average be mostly egalitarian actors. For example, teachers report spending more time and effort helping “struggling students” than helping “advanced students” whose success they may take for granted (Duffett, Farkas, and Loveless 2008).
The notion that meaningful compensatory school mechanisms may exist has prompted some scholars to call for a rethinking of schools’ role in the stratification system. Downey and Condron (2016) encourage a theoretical shift away from thinking of schools as a generator of inequality. They suggest that when children arrive at kindergarten, they are already on strong trajectories that schools “refract”—sometimes increasing inequality, sometimes reproducing inequality, and other times reducing it. From this perspective, school is more of an institution where inequalities are observed (and slightly altered) rather than created.
Schools and the Black/White Achievement Gap
The compensatory view of schools is currently debated (Carter 2016; Schneider 2016; Torche 2016) in part because even the evidence from seasonal comparison studies does not consistently support it. Most notably, seasonal studies often find that black/white gaps grow faster when school is in versus out (Downey et al. 2004; Kuhfeld, Condron, and Downey 2019; Quinn et al. 2016; von Hippel et al. 2018; von Hippel and Hamrock 2019), a pattern consistent with the critical view of schools. These race patterns undermine the notion that schools are a “great equalizer,” at least across all dimensions, and they alert scholars to school mechanisms, possibly school racial segregation (Condron 2009), that may compromise the learning of black versus white children. 2
The original failing schools study, however, found no relationship between school-based learning and a school’s racial composition. Further exploration of this relationship can contribute to our understanding of whether school racial composition is a mechanism by which black/white achievement gaps are maintained.
Isolating Schools’ Contribution to Learning
Adjudicating among these various views of how schools matter is especially challenging because children spend so much of their time outside of school. It is thus hard to know, for example, if the higher test scores generally produced by schools serving children from advantaged backgrounds are due to something about the schools or something about the kinds of children (and their families) they happen to serve. To understand how schools vary in promoting learning requires a research design where it is reasonable to assume that “all else is equal” about children’s lives outside of school.
Random assignment studies meet this requirement and can be especially informative about the kinds of school characteristics that most effectively promote learning. These studies suggest children learn more in smaller versus larger classrooms (Mosteller 1995), and schools that use comprehensive reform, such as the Harlem Children’s Zone (Dobbie and Fryer 2011), the Knowledge is Power Program (KIPP), and the University of Chicago Charter School (Hassrick, Raudenbush, and Rosen 2017), may produce more learning than traditional public schools. School reforms at charter schools (e.g., increased school exposure, greater sharing of information across teachers and staff) reduced black/white gaps among children in Chicago public schools by nearly two-thirds (Hassrick et al. 2017). And Boston preschools that used the Opening the World of Learning (OWL) literature program closed the black/white gap by one-third (Duncan and Murnane 2014). 3
The primary limitation of this experimental evidence, however, is that although it suggests schools could reduce inequality, it does not reveal whether schools, as currently constituted, are responsible for producing gaps in the first place. In addition, experimental studies typically have limited generalizability. Determining how learning opportunities in schools are distributed across social groups requires that we assess current conditions in schools with data that are representative of a broad population of interest.
These issues lead us to nationally representative survey data, which are better positioned to describe schools’ current role in shaping inequality and produce a generalizable assessment of schools. Observational survey data, however, have more limited leverage for isolating school effects. A common approach for dealing with this challenge is to statistically equalize children on a wide range of nonschool characteristics (e.g., socioeconomic status, family structure, race, gender). Yet this strategy is limited because the indicators available in most large data sets fail to fully measure all the ways children’s nonschool environments vary.
A frequent solution to the weaknesses of covariate adjustment is to estimate learning (or changes in skills) rather than skills at one point in time. The advantage to this strategy is that it subtracts out earlier influences on students’ initial achievement levels that are often unobserved or poorly measured in typical surveys (e.g., stress level in the home, provision of health care, intellectual stimulation provided to the child, family wealth, grandparents’ education). Growth models control for these factors (to the extent they are time-invariant) and can change our ideas about how schools matter. For example, cross-sectional analyses reveal a strong negative association between exposure to classroom poverty and children’s skills, but Lauen and Gaddis (2013) find that most of this association disappears when estimating growth in skills, and it is eliminated in student fixed-effects models that even more effectively address confounds. Similarly, Lubienski and Lubienski (2013) predict math learning from kindergarten until the end of fifth grade among the ECLS:1998 kindergarten cohort. They find that although children in private schools had higher math scores at the descriptive level, in multivariate models predicting growth, children in public schools actually outperformed their private school counterparts. 4
Models predicting growth in reading and math skills surely provide a more accurate assessment of how school-based learning varies across school types, but even this approach may not produce a fair assessment of schools. One problem is that these models typically rely on annual data and therefore mix summer learning in with school-year learning. Summers bias estimates of school effects because children learn at different rates during the summer for reasons unrelated to schools (Soland and Thum 2019).
An alternative approach to the problem of estimating how schools matter is to use information about nonschool periods as leverage rather than as a confound. For example, schools’ contribution to learning could be thought of in terms of how much schools change children’s learning trajectories. In practical terms, one could evaluate the difference between learning observed during the school year versus the summer. Under certain assumptions, observing how learning rates change when children are in school versus not in school could isolate the school “effect” without having to identify and measure all the school and nonschool factors at stake. This analytic approach is similar to a crossover design in medical research, where the effects of a drug are determined by observing differences in patients’ outcomes while off and on treatment. In this design, one can think of schools as the treatment, and the school effect is gleaned by observing how outcomes change when the treatment is applied versus not.
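The logic of this comparison can be written as a short identity. The following is only a sketch under an additivity assumption; the symbols $s_j$ (monthly learning attributable to school $j$) and $n_j$ (monthly learning attributable to the nonschool environments of its students, in the school year "sy" or the summer "su") are illustrative notation, not taken from the studies cited.

```latex
\begin{aligned}
\text{school-year rate}_j &= s_j + n_j^{\text{sy}} \\
\text{summer rate}_j      &= n_j^{\text{su}} \\
\text{impact}_j           &= \text{school-year rate}_j - \text{summer rate}_j
                           = s_j + \left(n_j^{\text{sy}} - n_j^{\text{su}}\right)
\end{aligned}
```

If nonschool learning proceeds at the same rate in both seasons, the term in parentheses vanishes and impact recovers $s_j$. For comparisons across schools, a weaker condition is enough: the seasonal difference in nonschool learning must not vary systematically with the school characteristics being compared.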
To date, studies using this kind of impact measure are rare, in part because it requires data collected over multiple years and at both the beginnings and ends of school years. In one of the few studies to use this method, Downey and colleagues (2008) analyzed children in the ECLS-K:98 and gauged schools on the basis of the difference in average reading gains during the summer after kindergarten versus the first-grade school period. Compared to school rankings based on test scores at one point in time or learning over a 12-month period, this approach changed ideas about which schools were performing well. The percentage of children eligible for free lunch and the percentage of disadvantaged minority students at the school were strongly correlated (negatively) with test scores at the end of first grade, weakly correlated (negatively) with growth between the end of kindergarten and the end of first grade, and uncorrelated with impact. Downey and colleagues thus concluded that we should question the assumption that schools serving advantaged children provide considerably better learning opportunities than schools serving disadvantaged children.
Extending Past Research
Our study furthers work assessing how school-based learning is distributed across social groups in several ways. We substantially increase the empirical basis of the impact measure with newer and better data. We analyze the ECLS-K:2011, which has several key advantages over the ECLS-K:98 data used by Downey and colleagues (2008). First, this newer data set allows us to develop a more comprehensive impact measure of schools. Whereas the failing schools study relied on a single academic year and a single summer, our estimates of school impact take advantage of three academic years (kindergarten, first grade, and second grade) and two summers.
Second, determining the distribution of school-based learning via an impact measure requires an interval-level scale in which gains at the bottom of the scale are similar to gains at the top. The number-right scale used in the failing schools study did not meet this requirement as gains at different points on the scale did not mean the same thing (von Hippel and Hamrock 2019). Exactly how this problem affected the impact estimates from the failing schools study is unclear, however.
We analyze a theta scale, which has a stronger claim to interval-like characteristics than does the number-right score used in the failing schools study. When model assumptions hold, the theta scale is vertically aligned across waves (e.g., we can compare gains made in kindergarten, first grade, and second grade) and specifically designed to allow comparisons of groups starting at different places on the scale. Although this scale is clearly an improvement over the number-right scale, it may not meet assumptions precisely, and so it should be thought of as “approximately” interval (Reardon 2008). 5
Finally, an important assumption of the impact measure is that the nonschool effect on children’s learning is constant across both school and nonschool periods. We do not know, however, whether nonschool influences on children’s learning are constant across seasons. For example, suppose high-SES parents outinvest their low-SES counterparts during the summer and increase this advantage further during the school year. To assess the degree to which we are violating this assumption, we measure several parental investments during school and summer periods and statistically control for the most critical of these as a way of assessing the robustness of our models.
Research Methods
Sample
We analyze restricted-use data from the Early Childhood Longitudinal Study–Kindergarten Cohort 2010–11. The ECLS-K:2011 used a three-stage sampling procedure, in which 90 Primary Sampling Units (PSUs, counties or groups of contiguous counties) were drawn first, followed by a sampling of 968 schools within PSUs, and then roughly 19 students within each school (Tourangeau et al. 2015). Children were observed at the beginning and end of kindergarten and the end of first and second grades. A random subsample of roughly 3,800 students was also observed at the beginning of first and second grades, allowing for estimates of school-year versus summer learning across three school periods. Our analyses include only students who attended schools with a traditional nine-month calendar, did not switch schools during this time period, and had complete data on the relevant variables. This results in 17,140 observations from 3,000 students in 230 schools for our main models (counts rounded to the nearest 10 per NCES requirements). 6
Reading skills
Reading tests assessed children’s basic skills (print familiarity, letter recognition, beginning and ending sounds, rhyming words, word recognition), vocabulary knowledge, and reading comprehension (Tourangeau et al. 2015). The test followed a two-stage adaptive assessment: Children were first given an initial routing test and then, depending on their performance, guided toward a main test of low, high, or (in some grades) medium difficulty.
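Students’ abilities were estimated using a three-parameter logistic item response model in which the probability that child c answers item i correctly is a scaled inverse cumulative logistic function of the child’s ability $\theta_c$:
$$P(y_{ci} = 1) = g_i + \frac{1 - g_i}{1 + \exp[-a_i(\theta_c - b_i)]},$$
where $a_i$, $b_i$, and $g_i$ are the item’s discrimination, difficulty, and guessability.
By fitting this model, psychometricians working on the ECLS-K produced estimates $\hat{\theta}_c$ of each child’s ability, the theta scores.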
In theory, when model assumptions hold, theta scores are vertical interval measures of ability, at least in the technical sense that they are linear in the log odds that a student will correctly answer an item of given difficulty, discrimination, and guessability (Ballou 2009). Early impact studies analyzed “scale” scores (which estimate the number of items students would have answered correctly had they been presented with every item in the item bank) because the ability scores had not yet been released (Downey et al. 2008). This kind of scale score falls short of meeting interval requirements because it treats each additional item correct as representing equal increments of ability (Reardon 2008; von Hippel and Hamrock 2019). Theta scores were included in later releases of data for the older ECLS-K cohort, and in all data releases for the new cohort, but have yet to be analyzed via an impact model.
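To make the interval claim concrete, here is a minimal sketch of a generic three-parameter logistic item response function. The function names and the item parameter values are illustrative assumptions, not the ECLS-K item parameters. It shows that once the guessing floor is removed, the log odds of a correct answer change by the same amount for each unit of ability, wherever the child starts on the scale.

```python
import math


def p_correct(theta, discrimination, difficulty, guessing):
    """Probability of a correct answer under a generic 3PL model.

    All parameter names are illustrative; they follow the standard
    difficulty / discrimination / guessing description of the model.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def adjusted_log_odds(p, guessing):
    """Log odds of a correct answer after removing the guessing floor."""
    p_star = (p - guessing) / (1.0 - guessing)
    return math.log(p_star / (1.0 - p_star))


if __name__ == "__main__":
    # Hypothetical item: discrimination 1.2, difficulty 0.0, guessing 0.2.
    for theta in (-1.0, 0.0, 1.0):
        p = p_correct(theta, 1.2, 0.0, 0.2)
        print(theta, round(p, 3), round(adjusted_log_odds(p, 0.2), 3))
```

In this sketch each one-unit gain in theta raises the adjusted log odds by the item's discrimination, which is the sense in which theta is linear in the log odds. The raw probability is not linear in theta, which is one way to see why number-right scores can distort gains at the extremes of the scale.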
Analytic Strategy
Similar to Downey and colleagues (2008), we analyze school-based learning through three measures: achievement, growth, and impact. Achievement is simply a school’s deviation from the grand mean reading score at the end of second grade. We view this as a naïve estimate of school-based learning because it is clearly the product of both school and nonschool factors. Nevertheless, we present it as a baseline for how ideas about the distribution of school-based learning change as we use methods that more persuasively isolate school effects. Our second measure we call “growth.” This indicator represents the average learning rate a school’s children experienced during three nine-month school periods (kindergarten, first grade, and second grade). To target school-based learning, changes in skills during the two intervening summers are omitted.
Finally, we assess school-based learning via impact. We gauge schools’ impact by observing their students’ average learning rates while out of school and then estimating how much faster children learn while in school. A school’s impact is thus captured by the average difference between school period and summer learning among its students.
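The three measures can be illustrated with a deliberately simplified calculation on one school's mean scores at six assessment points. This is a descriptive sketch, not the multilevel model used in the analysis. It assumes tests fall exactly at the start and end of each nine-month school period and that summers last three months; the function name and the example scores are hypothetical.

```python
from statistics import mean

SCHOOL_MONTHS = 9.0  # nine-month school periods, as described in the text
SUMMER_MONTHS = 3.0  # assumption: summers fill the rest of the calendar year


def school_measures(wave_means, grand_mean):
    """Return achievement, growth, summer rate, and impact for one school.

    wave_means: mean scores at fall K, spring K, fall 1st, spring 1st,
                fall 2nd, spring 2nd (hypothetical ordering).
    grand_mean: mean end-of-second-grade score across all schools.
    """
    fall_k, spring_k, fall_1, spring_1, fall_2, spring_2 = wave_means
    school_rates = [
        (spring_k - fall_k) / SCHOOL_MONTHS,
        (spring_1 - fall_1) / SCHOOL_MONTHS,
        (spring_2 - fall_2) / SCHOOL_MONTHS,
    ]
    summer_rates = [
        (fall_1 - spring_k) / SUMMER_MONTHS,
        (fall_2 - spring_1) / SUMMER_MONTHS,
    ]
    growth = mean(school_rates)  # monthly school-year learning, summers omitted
    summer = mean(summer_rates)  # monthly learning while school is out
    return {
        "achievement": spring_2 - grand_mean,  # deviation from the grand mean
        "growth": growth,
        "summer": summer,
        "impact": growth - summer,  # how much faster children learn in school
    }


if __name__ == "__main__":
    # Made-up scores in fall-kindergarten SD units.
    print(school_measures([0.0, 1.35, 1.38, 2.28, 2.25, 2.97], grand_mean=2.5))
```

The design point is that achievement reflects everything that happened before the final test, growth removes the starting level, and impact additionally subtracts the rate at which the same children learn when school is out.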
We arrive at our school effectiveness measures by fitting multilevel models that estimate the overall means and standard deviations for achievement and growth. Through post hoc combinations of the model estimates, we estimate the mean and standard deviation of impact across schools. By adding school-level characteristics to the model (school percent free or reduced-price lunch [FRL] and student racial/ethnic demographics), we test whether school characteristics predict variation in these measures of school effectiveness.
With three school years and two summers of data available, we have multiple options for constructing the learning and impact measures and for parameterizing the fitted model. We use different models for different purposes, and we test the sensitivity of our results across various modeling strategies. We present an overview of our analytic strategy here, and we provide extended discussions in the appendices in the online Supplemental Material.
To summarize learning and impact across seasons and schools, we fit models of the following form:
where
When
Despite the interpretive usefulness of these coefficients, empirically monthly learning rates decline each school year in the
To test whether school characteristics predict each measure of school effectiveness, we add to Model 1 the main effects of the proportion of a school’s sampled students by race/ethnicity (non-Hispanic black, Hispanic, Asian, and “other race,” with white as the omitted group) and school proportion FRL as well as the interactions of these variables with each time period variable (school year and summer). In the multilevel formulation of the model, these main effects predict school random intercepts (or the “achievement” measure, school mean performance at the end of second grade). Their interactions with the
Our study is novel in the way we address an assumption of the impact model—that parental investments are constant across seasons. We are able to assess this assumption in a limited way by testing whether seven parental investments (family member read to child; visited a library or bookstore; attended a play, concert, or live show; attended a museum; visited the zoo; used a computer for educational purposes; and child had an individual tutor) varied by season and by SES. After exploratory analyses, we decided to focus on how frequently a family member reads to a child because it had the strongest associations with children’s reading growth and had consistent wording across waves. As a result, we produce adjusted impact measures based on models controlling for the amount of book reading parents reported doing with their children each season. These supplemental analyses provide a way to assess the vulnerability of the impact model to the assumption that nonschool investments are constant across season (see Appendix D in the online Supplemental Material).
Results
Table 1 presents results from fitting Model 1 with reading theta scores divided by the fall kindergarten SD as the outcome. Consistent with past research, we find that on average, students make reading gains over the school year but not over summer vacation. The average monthly rate of growth across school years is .131 SD; the mean growth rate across the two summers is close to zero and not statistically significant. This yields an average school impact of .130 SDs.
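As a rough check on how these figures relate, the reported impact and school-year rate imply a summer rate of about .001 SD per month. The second line is an illustration only: it assumes a nine-month school period and uses the rounded estimates, so it approximates rather than reproduces the model output.

```latex
\begin{aligned}
\text{summer rate} &\approx \text{school-year rate} - \text{impact} = .131 - .130 = .001 \ \text{SD per month} \\
\text{learning over one school period} &\approx 9 \times .131 \approx 1.18 \ \text{SD}
\end{aligned}
```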
Table 1. Estimates from Mixed-Effects Longitudinal Model for Average Monthly School-Year Learning Rates across Grades, Average Monthly Summer Learning Rates, End of Second-Grade Achievement, and Impact.
Note: n = 17,140; child-level n = 3,000; school-level n = 230 (rounded to nearest 10 as required by NCES). Standard errors/95 percent Wald confidence intervals are in parentheses. Outcome is vertically equated theta score divided by (reliability-adjusted) fall kindergarten SD for interpretability. Estimated from mixed-effects model where observations are nested within students who are nested within schools. Model includes random intercepts for students (not shown). School random slopes are allowed to covary with school-level random intercepts and with each other (intercorrelation estimates not shown). Time period variables (school year and summer) are set to a maximum of 0 to allow intercept to be interpreted as end of second grade mean achievement.
***p < .001.
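The impact figure reported for Table 1 is the gap between the two seasonal rates. A minimal sketch of that arithmetic follows. The function name is illustrative, and the summer rate of .001 is an assumption backed out from the reported .131 and .130, since the text says only that the summer rate is close to zero.

```python
def school_impact(school_year_rate: float, summer_rate: float) -> float:
    """Impact = monthly school-year learning rate minus monthly summer rate."""
    return school_year_rate - summer_rate


# Reported: school-year rate of .131 fall-kindergarten SD per month.
# Assumed: summer rate of .001 (implied by the reported impact of .130).
impact = school_impact(0.131, 0.001)
print(round(impact, 3))
```

Because the summer rate is near zero, nearly all of the measured school-year growth counts as impact in this sample.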
Variation in School-Based Learning
Table 2 presents results from Model 1 with wave-standardized scores as the outcome. We feature the model’s random effects and the intercorrelations among random effects (given that the fixed effects are less meaningful for the wave-standardized outcome).
Table 2. Estimates from Mixed-Effects Longitudinal Model Estimating Average Monthly z-Score Change.
Note: Estimates shown for standard deviation of school-level random slopes across school years and summers, end of second grade achievement, and school impact. n = 17,140; child-level n = 3,000; school-level n = 230 (rounded to nearest 10 as required by NCES). Ninety-five percent Wald confidence intervals are in parentheses. Outcome is (reliability-adjusted) wave-standardized theta score. Estimated from mixed-effects model where observations are nested within students who are nested within schools. Model includes random intercepts for students (not shown). Time period variables (school year and summer) are set to a maximum of 0 to allow intercept to be interpreted as end of second grade mean achievement.
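The two outcome scalings used in Tables 1 and 2 answer different questions, and a small sketch may make the contrast concrete. The code below is an illustration under stated assumptions, not the authors' procedure: the function names and the toy theta scores are hypothetical, the reliability adjustment mentioned in the table notes is omitted, and the population SD (`pstdev`) is an arbitrary choice.

```python
from statistics import mean, pstdev


def scale_by_baseline_sd(scores_by_wave, baseline_wave):
    """Divide every score by the baseline wave's SD (growth stays visible)."""
    sd0 = pstdev(scores_by_wave[baseline_wave])
    return {w: [x / sd0 for x in xs] for w, xs in scores_by_wave.items()}


def standardize_within_wave(scores_by_wave):
    """z-score each wave separately (tracks relative status, not growth)."""
    out = {}
    for w, xs in scores_by_wave.items():
        m, s = mean(xs), pstdev(xs)
        out[w] = [(x - m) / s for x in xs]
    return out


# Hypothetical theta scores for four children at two waves.
theta = {"fall_k": [-1.0, 0.0, 0.0, 1.0], "spring_k": [0.0, 1.0, 1.0, 2.0]}
learning = scale_by_baseline_sd(theta, "fall_k")
relative = standardize_within_wave(theta)
```

In this toy case every child gains the same amount, so the baseline-SD scale shows growth between waves while the wave-standardized scores are identical across waves: no child's relative position changed.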
Table 3 presents results from the model with reading theta scores (divided by the fall kindergarten SD) as the outcome while also incorporating the school characteristics. Again, we include the proportion of sampled students in each school who are eligible for FRL (measured in Wave 2) and who are black, Asian, Hispanic, or “other race” (which includes Native American, Alaska Native, Pacific Islander, and multiracial [collapsed due to their small sample sizes]). In this table, columns represent the random school-level intercept, slope, or impact measure being predicted, and rows represent the school-level variable predicting these school random effects (in the collapsed model formulation, estimates in the “end of second grade” [i.e., intercept] column report the main effects of each school-level variable, and the “School Year” and “Summer” columns report the coefficients for the interactions of each school-level variable with the respective time period variable). Table 4 displays the results from the same model using wave-standardized scores as the outcome. Again, we include these results to provide more interpretable random-effect estimates while also testing the sensitivity of results to measures of learning versus change in relative status.
Table 3. Estimates from Mixed-Effects Longitudinal Model for Average Monthly School-Year Learning Rates, Average Monthly Summer Learning Rates, End of Second Grade Achievement, Impact (in fall kindergarten SD units), and Random Effects.
Note: n = 17,140; child-level n = 3,000; school-level n = 230 (rounded to nearest 10 as required by NCES). Standard errors/95 percent Wald confidence intervals are in parentheses. Outcome is vertically equated theta score divided by (reliability-adjusted) fall kindergarten SD for interpretability. School proportion white omitted. Estimated from mixed-effects model where observations are nested within students who are nested within schools. Model includes random intercepts for students (not shown). Time period variables (school year and summer) are set to a maximum of 0 to allow intercept to be interpreted as end of second grade mean achievement. FRL = free or reduced-price lunch.
~p < .10. *p < .05. ***p < .001.
Table 4. Estimates from Mixed-Effects Longitudinal Model for Average Monthly School-Year z-Score Change, Average Monthly Summer z-Score Change, End of Second Grade Achievement, School Impact, and Random Effects.
Note: n = 17,140; child-level n = 3,000; school-level n = 230 (rounded to nearest 10 as required by NCES). Standard errors/95 percent Wald confidence intervals are in parentheses. Outcome is (reliability-adjusted) wave-standardized theta score. School proportion white omitted. Estimated from mixed-effects model where observations are nested within students who are nested within schools. Model includes random intercepts for students (not shown). Time-period variables (school year and summer) are set to a maximum of 0 to allow intercept to be interpreted as end of second grade mean achievement. Percentage reduction variance = percentage reduction in the random-effects variance from Table 2. FRL = free or reduced-price lunch.
~p < .10. *p < .05. **p < .01. ***p < .001.
Three Different Measures of Schools, Three Different Stories
The results in Table 2 show that the various measures of school quality operate differently. First, there is much more variation in the achievement measure (i.e., end of second grade achievement, SD = .508) than in school year z-score changes (SD = .018). Not surprisingly, however, there is more variation in impact (SD = .065) than in school year changes. The achievement measure has a small (and nonsignificant, imprecisely estimated) correlation with school year changes (r = .144, n.s.). In contrast, achievement has a modest correlation with summer changes (r = .383). This suggests average school-level performance at the end of second grade is due more to the learning students experience outside of school (over the summer and likely prior to school entry) than to learning over the school year. The negative correlation between achievement and impact (r = –.275) shows these two measures capture different aspects of school “effectiveness.” That is, students in schools with lower mean performance at the end of second grade tend to make larger relative gains over the school year compared to the summer; students in schools with higher mean performance at the end of second grade tend to have relatively smaller gains over the school year compared to the summer. This is also consistent with the negative correlation between school year z-score changes and summer z-score changes (r = –.553).
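One way to see why impact varies more than school-year change is the variance of a difference. The identity below assumes that each school's impact is its school-year slope minus its summer slope; the symbols are introduced here for illustration and are not the paper's notation.

```latex
\operatorname{Var}(\text{impact})
  = \sigma_{SY}^{2} + \sigma_{SU}^{2}
  - 2\,\rho_{SY,SU}\,\sigma_{SY}\,\sigma_{SU}
```

With a negative correlation between school-year and summer changes (r = –.553), the last term adds to the total, so the variance of impact exceeds the sum of the two slope variances and therefore exceeds the school-year variance alone.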
Another way to gauge differences across school effectiveness measures is to compare the extent to which school-level variables explain the variation in these measures. The bottom row of Table 4 includes the percentage reduction in variance of the random effects from the baseline model in Table 2 (without any school-level predictors). School proportion FRL and school race/ethnicity predict over half of the variation in the achievement measure (57.8 percent), but they predict only 6.9 percent of the variation in school year test score changes. In fact, the school-level variables are associated with a higher portion of the variation in summer test score changes (13.6 percent) than school year test score changes. The collection of variables predicts a similar amount of variation in school impact (14.3 percent).
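The percentage reduction in variance can be sketched as the proportional drop in a random effect's variance once school-level predictors are added. In the code below the function names are illustrative, and the residual SD it prints is a back-calculation from the reported figures, not a value taken from Table 4.

```python
import math


def pct_variance_reduction(baseline_sd: float, conditional_sd: float) -> float:
    """Percentage drop in random-effect variance after adding predictors."""
    return 100 * (1 - conditional_sd**2 / baseline_sd**2)


def implied_conditional_sd(baseline_sd: float, pct_reduction: float) -> float:
    """Invert the formula: residual SD implied by a reported reduction."""
    return baseline_sd * math.sqrt(1 - pct_reduction / 100)


# Reported: baseline achievement SD = .508 (Table 2), reduction = 57.8 percent.
sd_after = implied_conditional_sd(0.508, 57.8)
print(round(sd_after, 3))
```

Because the reduction is computed on variances, a 57.8 percent drop in variance corresponds to a smaller proportional drop in the SD.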
Note that schools with higher proportions of Asian and Hispanic students tend to show more reading growth over summer vacation (compared to schools with higher percentages of white students). This is consistent with previous analyses of the ECLS-K:2011 showing that some racial/ethnic achievement disparities tend to narrow over the summer (Quinn et al. 2016). Next, we examine the extent to which school characteristics covary with each measure of school quality.
Achievement
If we compare schools on the basis of children’s achievement scores, in this case their average reading score at the end of second grade, then our understanding of which schools are doing well is very close to the traditional narrative. Controlling for school race/ethnicity, schools with high proportions of children eligible for FRL have lower achievement levels, on average. We see this in the cell where the “School proportion FRL” row and “End of Second Grade” column meet in Table 3: Schools with 100 percent FRL are predicted to have average achievement .55 fall kindergarten SD lower than schools with 0 percent FRL (controlling for school race/ethnicity). And Table 4 shows that schools with 100 percent FRL are predicted to have average achievement .85 wave-specific SD lower than schools with 0 percent FRL (controlling for school race/ethnicity). Controlling for school FRL, schools with higher percentages of Asian students also have higher achievement, on average. Each 10-percentage point increase in the share of a student body that is Asian (vs. white, holding constant school FRL) predicts higher achievement by approximately 2.5 percent of a fall kindergarten SD (p < .10, Table 3, “School proportion Asian” row) or approximately 4.6 percent of a wave-specific SD (Table 4). Schools with higher percentages of Hispanic students are predicted to have lower achievement after controlling for school FRL, with each 10-percentage point increase in a school’s share of Hispanic (vs. white) students predicting lower achievement by approximately 1.9 percent of a fall kindergarten SD (Table 3).
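Because the school-composition predictors are proportions, a coefficient describes the predicted difference between a school at 0 percent and a school at 100 percent, and smaller contrasts scale linearly. A minimal sketch of that conversion follows; the function name is illustrative, and the Asian coefficient of .25 is back-derived from the reported "2.5 percent of an SD per 10 points" rather than read from Table 3.

```python
def predicted_difference(coef_per_unit_proportion: float,
                         change_in_points: float) -> float:
    """Predicted achievement difference (in SD units) for a change in a
    school-composition proportion, expressed in percentage points."""
    return coef_per_unit_proportion * change_in_points / 100


# Reported: FRL coefficient of -.55 fall kindergarten SD (Table 3).
frl_full_range = predicted_difference(-0.55, 100)
# Assumed: Asian coefficient of .25, implied by 2.5 percent of an SD per 10 points.
asian_10_points = predicted_difference(0.25, 10)
print(round(frl_full_range, 3), round(asian_10_points, 3))
```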
Of course, we have criticized the achievement measure as conflating school and nonschool effects in unknown ways. We need to look beyond achievement measures when gauging schools’ role in shaping learning.
Growth during school periods
A better measure of school quality observes the growth that occurs during school periods, subtracting out the summers, over which schools have little control. Growth indicators tell a different story from the achievement measure. Although achievement is negatively correlated with school proportion FRL, school proportion FRL positively predicts average school year learning (b = .013, p < .10) (controlling for school race/ethnicity; see Table 3) as well as change in relative status (Table 4). In contrast to the positive association between school proportion Asian and achievement, school proportion Asian is negatively associated with average school year learning (b = –.02, p < .10, Table 3) and change in z-scores (b = –.033; controlling for school FRL). School proportion Hispanic and black also show significant negative relationships with average school year change in z-scores (Table 4) but with somewhat smaller magnitudes (b = –.026 for black students, b = –.017 for Hispanic students).
Impact
Finally, even the nine-month growth model may not adequately account for nonschool factors because children continue to be exposed to nonschool influences during the school year. Indeed, the typical child spends only about one-third of their waking hours in school during the nine-month academic period (Downey et al. 2008). When we predict our impact measure with school characteristics, we see that the composition of high-impact schools differs somewhat from that of schools with higher achievement levels and higher nine-month learning rates. Specifically, controlling for school race/ethnicity, schools with higher percentages of students who are FRL-eligible have higher impact scores when measured through change in z-scores (Table 4; FRL does not significantly predict impact measured through absolute learning [Table 3]). Similarly, the sign of the association between impact and school proportion Asian is opposite that of the association between the achievement measure and school proportion Asian: Schools with higher proportions of Asian students have lower impact (controlling for school FRL) when measured by learning (Table 3) and change in relative status (Table 4).
Robustness Checks
The impact model assumes that nonschool investments are constant across seasons, so subtracting the summer learning rate from the school year rate should isolate schools’ contribution. This approach may be misleading, however, if nonschool factors matter differentially during the school year and the summer. We found that how frequently parents read books to children varied across seasons, with greater book reading during the summer than during school. We also found some evidence that the book reading gap between advantaged and disadvantaged groups (high SES vs. low SES) was greater during the summer than during school years, calling into question whether nonschool investments (and their ratios across groups) are similar when school is out versus in. We therefore refit Model 1 to include a control for the amount of book reading parents reported doing with their children at each wave. The book reading variable is available from parent surveys for each round except spring of kindergarten, so we excluded the kindergarten period from these analyses. In these adjusted estimates, the school-level variables are similarly predictive of achievement as estimates from the model without the book reading control and sometimes less predictive of impact. See Appendix D in the online Supplemental Material for full results.
Another concern is that schools with students who begin kindergarten with relatively lower average scores will show larger average gains over the school year simply due to differing expected rates of growth among students with different starting points. To investigate this possibility, we fit additional models that control for fall kindergarten test score and its interaction with the school year and summer variables. These models do not include outcome data for kindergarten due to the inclusion of fall kindergarten scores on the right side of the equation. These models show similar results to those in Table 3: School proportion FRL, black, Hispanic, and “other race” do not predict school impact, and school proportion Asian has marginally significant predictive value (b = –.076, p < .10; see Appendix E in the online Supplemental Material).
Finally, we estimated our models for math outcomes. Recall that for reading, school proportion FRL was a strong negative predictor of school effectiveness as measured by end of second grade achievement, but when measuring effectiveness by school year learning or the impact measure, high-FRL schools no longer looked so ineffective. This same general finding appears with math but to an even greater degree. Controlling for school race/ethnicity, school percentage FRL negatively predicts end of second grade math achievement and positively predicts school year test score changes and impact (in absolute learning and change in z-score). Although school proportion Hispanic and proportion black negatively predict school mean achievement at the end of second grade, neither significantly predicts school impact measured through learning (although school proportion black has a negative and significant relationship with impact measured by change in z-scores, b = –.105, p < .05). No other school-level variables predict impact. See Appendix F in the online Supplemental Material for full results.
Discussion
Our results suggest school-based learning is more evenly distributed across social groups than many scholars have thought. That is not to say there are no differences in school-based learning—schools vary considerably in impact—but simply that the variation is not allocated in the way we might assume. Our study adds credibility to patterns from the “Are ‘Failing’ Schools Really Failing?” impact study, whose persuasiveness was compromised by reliance on a single academic year and summer and a noninterval scale of reading skills. With considerably better data that address those weaknesses, our results contribute to the empirical foundation of impact studies and lend greater weight to the position that school-based learning is, on average, roughly similar across schools serving advantaged and disadvantaged children.
The impact results represent a significant challenge to the critical view of schools, at least with respect to how schools shape cognitive skills. Analyses of two nationally representative samples, one from 1998 and one from 2010, both find that learning in schools serving disadvantaged children is largely on par with learning in schools serving advantaged children. Of course, one can easily show that schools serving advantaged children have higher test scores than their disadvantaged counterparts, but those differences are due to a mixture of school and nonschool factors, and so we cannot confidently attribute them to differences in schools. And while some of our impact models produce results consistent with the critical view of schools (e.g., schools with high percentages of black students produce lower impact than schools with high percentages of white students), these differences do not persist across different model specifications.
If impact studies are producing a more accurate view of how schools affect students’ math and reading skills, then the implications are significant. At a theoretical level, the patterns are more consistent with the idea that schools play a mostly neutral role with respect to inequality in children’s math and reading skills rather than the more active role suggested by critical theorists. Downey and Condron (2016) raised the possibility that education scholars may have overestimated the magnitude of schools’ exacerbatory effects and underestimated schools’ compensatory effects. Schools likely influence achievement gaps in ways that increase and decrease inequality, so the value of the impact measure is that it weighs the magnitude of all these influences and produces an overall evaluation of schools’ role.
Outcomes observed in schools, like gaps in test scores, are often shaped by features of society outside of schools. One strength of the sociological perspective is its contextual emphasis, a crucial element for a more accurate assessment of how schools matter. This recognition does not undermine school-based studies of inequality (e.g., peer effects, cultural capital, teacher discrimination), but it directs our attention toward potential nonschool sources of achievement gaps that may have received insufficient attention. What features of children’s early childhood environments lead to such large inequalities at kindergarten entry? What characteristics of nonschool environments continue to matter after children begin formal schooling? Based on patterns from impact studies, we recommend a broader sociology of education, one that energetically explores how the consequences of nonschool inequalities produce and maintain achievement gaps observed in schools. Schools remain important institutions within the stratification system, but a school-centric approach to studying their role can produce distorted understandings.
The impact results raise serious questions about the validity of traditional value-added measures of school performance that do not isolate school year and summer learning. Most value-added measures are based on data collected annually and so may be biased against schools serving disadvantaged children because summers are mixed in with nine-month school periods. Schools have less control over children’s learning in the summer. The reliance on value-added models that may underestimate the quality of schools serving disadvantaged children is disconcerting because market-based reforms are supposed to improve schools by providing parents with information about which schools are performing the best. If the information these reforms provide is based on value-added growth models, it may be steering parents in the wrong direction.
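A toy calculation can illustrate how an annual growth measure folds summer learning into a school's apparent performance. Everything in the sketch below is hypothetical: the two schools, their rates, and the nine-month/three-month split are assumptions chosen for illustration, not estimates from the study.

```python
def annual_gain(school_rate, summer_rate, school_months=9, summer_months=3):
    """Total 12-month gain when school-year and summer learning are pooled."""
    return school_rate * school_months + summer_rate * summer_months


# Hypothetical schools: identical school-year rates, different summer rates.
school_a = {"school_rate": 0.13, "summer_rate": 0.02}   # summers add learning
school_b = {"school_rate": 0.13, "summer_rate": -0.02}  # summers lose ground

gain_a = annual_gain(**school_a)
gain_b = annual_gain(**school_b)
impact_a = school_a["school_rate"] - school_a["summer_rate"]
impact_b = school_b["school_rate"] - school_b["summer_rate"]
print(round(gain_a, 2), round(gain_b, 2), round(impact_a, 2), round(impact_b, 2))
```

Under these assumptions the annual measure ranks school A above school B even though their school-year learning rates are identical; the whole difference comes from the summer months, and the impact measure ranks school B higher.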
Of course, the scope of these conclusions is limited by features of our data. We were only able to study children during the beginning stages of schooling, from kindergarten through second grade. It is possible, some might argue plausible, that schools’ role changes as children progress through the school system (Gamoran 2016). Extending impact studies into later grades would provide a significant contribution to our understanding of schools and inequality. In addition, scaling issues remain a concern for this kind of research. One of the significant improvements we made was to use a scale that has stronger claims to interval-level properties than that used in the failing schools study. Of course, to dismiss scaling concerns entirely, the theta scale would need to be perfectly interval, something we simply cannot determine with confidence (Ballou 2009).
The impact results are noteworthy, but it is far too soon to abandon the critical view of schools entirely. The critical view is about much more than how schools influence children’s cognitive skills, so the results of impact studies, although contrary, only challenge a subset of the critical perspective. Schools likely reproduce and exacerbate inequality in a number of ways that impact studies fail to capture. Some scholars even argue that schools that focus on improving children’s cognitive skills via “no excuses” mechanisms (i.e., a highly structured disciplinary system) may raise test scores but end up reproducing inequality by creating “worker-learners” who are ill prepared for the more wide-ranging demands of higher education and the work world (Golann 2015). Beyond schools’ effect on test scores, schools shape students’ opportunities to make network connections and their exposure to college-bound peers (Jennings et al. 2015) and sometimes provide help applying to college. Schools serving advantaged children typically provide more honors and Advanced Placement courses and more strategic preparation for the SAT and college essay. In these ways, schools might continue to exacerbate inequality through mechanisms not captured by our study.
But it is also possible that the impact results extend further than the current empirical reach and reveal how schools play a fundamentally different role within the stratification system than typically thought. Prior to collecting seasonal data, most scholars would have predicted that children enjoy more learning in schools serving advantaged versus disadvantaged children. Perhaps if seasonal comparison methods were extended to a broader range of outcomes and through later years of schooling, we would find even greater evidence that schools’ role in shaping inequality is more modest than we thought. We already have some hints of this. For example, scholars who have applied the seasonal comparison method to children’s social/behavioral skills (e.g., paying attention in class, getting along with peers, following classroom rules) have concluded that schools play a neutral role in shaping socioeconomic, racial, and gender gaps (Downey, Workman, and von Hippel 2019). And schools appear to play an active role reducing socioeconomic and racial gaps in body mass index (von Hippel et al. 2007; von Hippel and Workman 2016).
Some might view the impact results as justification for disinvesting in schools serving disadvantaged children. Why bother increasing resources to these schools, the argument would be, if they are already promoting learning on par with schools serving advantaged children? This would be an unfortunate interpretation of the impact results. Schools serving disadvantaged children may be producing as much school-based learning as schools serving advantaged children, but that does not preclude leveraging schools even further as a compensatory institution. As it stands, schools mostly prevent achievement gaps from increasing while children are in school. It may be possible to create schools that play a more active role in reducing gaps. In supplemental work, we note meaningful heterogeneity in how schools shape achievement gaps—some are highly compensatory while others exacerbate inequality. An important next step is to identify the school mechanisms associated with this heterogeneity.
But we also note that impact studies prompt a reconsideration of school-based solutions to achievement gaps. They highlight the magnitude of gaps at kindergarten entry and how exposure to school does little to increase them. School reform might produce even more compensatory results, but these efforts are necessarily remedial because the gaps are already well established. We see more opportunity, therefore, in exploring how early childhood reforms can prevent large achievement gaps from emerging in the first place.
An accurate understanding of schools’ role in the stratification system is foundational to the sociology of education. But the fact that the average 18-year-old in the United States has spent 87 percent of their waking hours outside of school (Walberg 1984) poses a significant methodological challenge in attempting to understand schools’ role. Traditional methods that try to statistically equalize children’s nonschool environments as a way of isolating school effects have well-known limitations, so it is especially important for scholars to use alternative analytic techniques, like those that compare school year and summer rates of learning. Two impact studies now contradict the critical view that schools serving advantaged children produce more school-based learning than schools serving disadvantaged children. Reconciling the impact patterns with the critical view of schools is increasingly important for developing an accurate theoretical understanding and providing policymakers with useful advice on how to reduce achievement gaps.
Supplemental Material
Supplemental material, DS_10.1177_0038040719870683 for The Distribution of School Quality: Do Schools Serving Mostly White and High-SES Children Produce the Most Learning? by Douglas B. Downey, David M. Quinn and Melissa Alcaraz in Sociology of Education
Supplemental material is available in the online version of this journal.