Abstract
The use of test scores as a performance measure in high-stakes educational accountability has become increasingly popular since the enactment of the No Child Left Behind Act of 2001 (NCLB), which imposed sanctions such as the threat of losing federal funds unless a state implemented a school accountability system that measures student progress continuously. Since then, many in the education community have questioned whether differences in student test scores reflect actual discrepancies in the long-term well-being of individuals. In this review, we try to address this question in the light of the extant literature that examines the relationship between test scores and later life outcomes. We show that while there are certainly studies that contradict the causality of this relationship, there is also abundant evidence suggesting a causal link between test scores and later life outcomes. We conclude that any debate about the use of test scores in educational accountability (1) should be framed by use of all relevant empirical evidence, (2) should also consider the predictive validity of nontest measures of student success, and (3) should keep in mind that the predictive validity of test scores could be stronger in some contexts than others.
Questioning Standardized Tests as a Measure of Success
The use of standardized tests as a measure of student success and progress in school goes back decades but became more widespread after the No Child Left Behind Act of 2001 (NCLB), which mandated the use of test scores as a measure of school quality in state accountability systems (Vinovskis, 2008). 1 The 2009 Race to the Top (RttT) federal grant program further expanded the use of standardized tests in educational accountability by promoting teacher evaluation reforms that included test scores as a component of a teacher’s evaluation (Goldhaber, 2015). 2
But there has been pushback against the use of tests in educational accountability. Academics and advocates, prominently including the teachers’ unions (Taylor & Rich, 2015), have raised various concerns about the consequences of reliance (or overreliance) on test scores for both school and teacher accountability purposes. One strand of criticism in this context relates to the psychometric properties of measures derived from standardized tests: Recent examples of this strand include the statements by the American Statistical Association (ASA) and the American Educational Research Association (AERA) cautioning against the use of value-added scores as the main factor in high-stakes decisions regarding educators (AERA, 2015; ASA, 2014). Another concern is that overreliance on test scores could corrupt the educational process and be harmful to student learning (Koretz, 2017).
While there is academic and policy disagreement about the efficacy of using test scores for accountability purposes, 3 there is no doubt that policymakers are placing less weight on student test scores in high-stakes decisions. As a chief example, the 2015 passage of the Every Student Succeeds Act (ESSA) continued NCLB’s requirement that students be tested annually in Grades 3 through 8 but introduced a number of new measures to be used in school accountability systems, lessening the role that tests play overall. 4
More recently, policy scholars have even begun to question whether we should use test scores as a measure of success at all, a question that is gaining broad public attention. 5 Much of this argument is based on a claim that test score gains are not always associated with changes in other schooling outcomes. One example of this argument is a recent report by Hitt, McShane, and Wolf (2018) that examines the use of test scores to evaluate school choice programs. In particular, the report focuses on a number of studies that examine the effects of different school choice programs on both student test scores and long-term outcomes (such as high school graduation and college enrollment) and assesses how well the test score impacts in these studies align with the attainment impacts. The authors suggest there is little correlation between the two and conclude that “test scores should be put in context and should not automatically occupy a privileged place over parental demand and satisfaction as short-term measures of school choice success or failure” (p. 20). 6
One might argue that test scores are only an intermediate measure of what we really care about: the extent to which students are gaining knowledge in school that enhances their later life prospects. 7 In other words, it is reasonable to argue that we should hold schools/teachers accountable for the test performance of their students but only if test scores at least partially reflect their causal impact on the underlying learning that is important for later life success. In this review, using Hitt et al. (2018) as an illustrative example, we examine the evidence on the relationship between test scores and later life outcomes, especially the extent to which we can infer causality in that relationship, and then discuss what this might imply for the use of test scores as a component in educational accountability systems.
Test Scores and Later Life Outcomes
There is a vast literature linking test scores and later life outcomes, such as educational attainment, health, and earnings. Hanushek (2009) provides a review of the extant literature on the relationship between cognitive skills, as proxied by test scores, and individual incomes in developed and developing countries and concludes that there is considerable evidence that test scores are directly related to later life outcomes. 8 Similarly, Heckman, Stixrud, and Urzua (2006) find that test scores are significantly correlated not only with educational attainment and labor market outcomes (employment, work experience, and choice of occupation) but also with risky behavior (teenage pregnancy, smoking, and participation in illegal activities). However, as Hanushek (2009) notes, these observed correlations do not necessarily reflect causal effects of schools on later life outcomes.
For example, the observed differences in later life outcomes between students with higher and lower test scores could be driven by differences in unobservable attributes of students such as their levels of grit. Test achievement is also likely to be significantly influenced by learning opportunities outside of school, such as the supportiveness of families or the communities in which students live. This is an important reason why some scholars doubt that static measures of test performance alone are reflective of contributions schools or teachers make toward student learning (Tienken, 2017). 9
Establishing causal links between test scores and adult outcomes is challenging; it would be unethical to design an experiment where we randomly provide better education to some students, measure their test scores, and assess whether improvements in test scores lead to better life outcomes. Thus, what we know about the causality of this relationship comes from a limited number of studies that examine the causal effects of different educational inputs (e.g., schools, teachers, classroom peers) on both student test scores and later life outcomes. If a study finds test score impacts and adult outcome impacts that are not in the same direction, this might be regarded as evidence that test scores do not affect the later life outcomes we care about. This is also the approach utilized by Hitt et al. (2018) in the context of school choice program evaluations.
So what does the broader literature (beyond school choice) say about whether there is a causal link? While there are studies that find test-score and long-term outcome effects that are not in the same direction (such as the ones cited in Hitt et al., 2018), there is also abundant evidence suggesting a causal link between test scores and later life outcomes. Perhaps the most influential study connecting schooling, test scores, and later life outcomes was conducted by Chetty, Friedman, and Rockoff (2014a). Examining the long-term effects of teacher quality, as measured by teachers’ effects on student test scores, the authors find that students who are assigned to highly effective teachers in elementary school are more likely to attend college and earn higher salaries. 10
Another study, by Chetty et al. (2011), uses the Tennessee Student Teacher Achievement Ratio (STAR) experiment to examine the long-term effects of peer quality in kindergarten, as proxied by test scores, and finds that students who are assigned to classrooms with higher quality peers have higher college attendance rates and adult earnings. Similarly, using the Tennessee STAR experiment, Dynarski, Hyman, and Schanzenbach (2013) look at the effects of smaller classes in primary school and find that the test score effects at the time of the experiment are an excellent predictor of long-term improvements in postsecondary outcomes. Lafortune, Rothstein, and Schanzenbach (2018) and Jackson, Johnson, and Persico (2016) investigate the effects of school finance reform on test scores, educational attainment, and earnings and find significant benefits of an increase in school spending on both test scores and adult outcomes.
Finally, there are a number of studies in the school choice context that show certain school choice programs having positive effects on both test scores and later life outcomes. For example, Angrist, Cohodes, Dynarski, Pathak, and Walters (2016) examine the effects of Boston’s charter high schools and conclude that charter effects on college-related outcomes are strongly correlated with gains on earlier tests. Dobbie and Fryer (2015) find that attending a high-performing charter school not only increases test scores but also significantly reduces the likelihood of engaging in risky behavior.
Accountability Without Test Scores
Overall, these studies suggest that interventions that move the needle on test scores also improve later life outcomes, which lends support to the argument for using test scores as a measure of success in education systems. 11 This does not mean that test score effects of educational interventions will always align with their effects on adult outcomes. 12 It is easy to make the case that interventions can and do improve later life outcomes without affecting the cognitive skills of children. For example, choice schools may have stronger pipelines into college, leading to better college-going results while not affecting test results. In short, test scores will not encompass the full impact of schools and teachers on students, and hence we should not expect them to fully capture all the contributions that schools and teachers make toward influencing long-term student outcomes. 13
But we need to think carefully about what abandoning the use of test scores altogether might mean for education policy and practice. 14 From a practical perspective, we cannot wait many years to get long-term measures of what schools are contributing to students. This does not mean that test scores ought to be the exclusive or even primary short-term measures, but if one believes in educational accountability and that test scores ought to be down-weighted, it is important to consider what alternative measures of success are out there and how reliable they are.
ESSA, for instance, encourages states to rely more on nontest outcomes (e.g., high school graduation rates, kindergarten readiness, college readiness, and chronic absenteeism) to assess school performance. But there are increasing concerns that these measures are “gameable.” For example, as of this writing, the District of Columbia Public Schools (DCPS) has been under investigation by the U.S. Department of Education and the FBI for awarding high school diplomas to hundreds of students who failed to meet the high school graduation requirements. 15 Similarly, while some studies find that high school GPA is a better predictor of college success than standardized test scores (e.g., Geiser & Santelices, 2007), there is recent evidence suggesting grade inflation in high schools, especially in wealthier settings, which casts doubt on the reliability of high school GPA in school accountability (Gershenson, 2018).
Perhaps more importantly, we know less empirically about the causal connections between some of these new ESSA measures or other means of school or teacher accountability and long-term student prospects. Are students assigned to teachers who get good classroom observation ratings likely to have better future prospects? Perhaps, but there is less evidence about this type of measure than there is about test-based measures.
In the end, where one lands on the use of test scores to measure student or school success is a matter of subjective judgment. But the debate about this (1) should be framed by use of all relevant empirical evidence, (2) should also consider the predictive validity of nontest measures of student success, and (3) should keep in mind that the predictive validity of test scores could be stronger in some contexts than others.
