Abstract
One question posed continually over the past century of education research is the extent to which school resources affect student outcomes. From the turn of the 20th century to the present, a diverse set of actors, including politicians, physicians, and researchers from a number of disciplines, has studied whether and how money provided to schools translates into increased student achievement. The authors discuss the historical origins of the question of whether school resources relate to student achievement and report the results of a meta-analysis of studies examining that relationship. They find that policymakers, researchers, and other stakeholders have addressed this question using diverse strategies. The way the question is asked, and the methods used to answer it, are shaped by history, as well as by the scholarly, social, and political concerns of any given time. The diversity of methods has produced a body of literature too heterogeneous and too inconsistent to yield reliable inferences through meta-analysis. The authors suggest that a collaborative approach addressing the question from a variety of disciplinary and practice perspectives may lead to more effective interventions to meet the needs of all students.
An enduring question in education research is how, and sometimes even whether, the resources provided to schools relate to student achievement. This issue can be distilled into a seemingly simple question: Does money matter for student outcomes? A close examination of the historical origins of this question, as well as of recent studies that examine the influence of resources on student achievement, highlights the tension between the competing priorities of efficiency and equity in U.S. public schooling. It also raises issues about the ways in which certain questions develop and become central to education research.
This chapter is organized into three sections. The first section presents historical background on research into school resources and student achievement. The second section reports the results of a systematic review of quantitative studies examining the relationship between per-pupil expenditure (PPE) and student achievement. This review updates work first conducted by Hanushek (1989) and later reanalyzed by Greenwald, Hedges, and Laine (1996) by incorporating relevant literature published since 1966. The concluding section draws from both the historical narrative and the meta-analysis to discuss the limitations of contemporary research into the relationship between resources and student achievement, and it suggests ways that the field might develop better, more valuable questions to pursue.
Historical Background on Research Into School Resources and Student Achievement
Like Hanushek (1989), we consider Equality of Educational Opportunity (Coleman et al., 1966), known as the “Coleman Report,” a historical milestone. The Coleman Report marks the start of the current era of research into relationships between schooling inputs and outputs, a period characterized by the increasingly sophisticated use of inferential statistics with large-scale data sets. This chapter briefly reviews the 100-plus years of history leading up to the Coleman Report to provide context for how statistical studies of educational resources and student achievement developed in relation to contemporaneous scholarly and social concerns. The history is divided into three eras, each organized around a central question posed during that era: 1867 to 1891 (What is the state of U.S. schooling?), 1892 to 1965 (Can the field of education measurement assist in directing school resources in more efficient ways?), and 1966 to the present day (Can education measurement indicate if money matters?). Each period witnessed the introduction of new statistical methods and debates about the nature of schooling in a socioeconomically diverse, multilingual, and multiracial society; the purposes of public education; and which academic fields are best equipped to answer questions about how resources relate to achievement.
What Is the State of U.S. Schooling? Collecting Data on U.S. Schools (1867–1891)
The U.S. Department of Education was formed in 1867 as part of the Reconstruction Acts passed by a Republican-controlled Congress. The Republicans, especially the radical faction led by Charles Sumner and Thaddeus Stevens, held strong views on education, including the idea that widespread, publicly supported school systems were essential for the country. Indeed, Sumner noted the lack of public schools in the Southern states and insinuated that it was a cause of their recent “rebelliousness” (Tyack, James, & Benavot, 1987). Although the “Radical Republicans” failed in their attempt to pass a federal law guaranteeing a public education to all citizens, they were able to establish the Department of Education to, in the words of Republican Congressman from Minnesota Ignatius Donnelly, “enforce education, without regard to race or color, upon the population of all such States that fall below a standard to be established by Congress” (quoted in Tyack et al., 1987, p. 141).
An immediate problem facing their effort was that little was known about the state of American education nationally. To address this need, Congress provided the Department of Education with the purpose of
collecting such statistics and facts as shall show the condition and progress of education in the several States and Territories, and of diffusing such information respecting the organization and management of schools and school systems, and methods of teaching, as shall aid the people of the United States in the establishment and maintenance of efficient school systems, and otherwise promote the cause of education throughout the country. (quoted in W. V. Grant, 1993, p. 1)
At the time, the modern field of statistics was in its infancy, though advances in mathematics (Feinberg, 1992) and epidemiology (Freedman, 1999) were attracting wide interest for their potential use to describe phenomena and make predictions. The notion that statistical information and facts about education can help support efficient and widespread schooling represents the influence of Horace Mann. Sumner, a self-described friend of and regular correspondent with Mann, unsuccessfully ran for the Boston School Committee in 1855 following Mann’s encouragement (Reese, 2013). A year later, Mann introduced the nation’s first system of standard examinations in an effort to gather objective information about the comparative quality of Boston schools and whether or not students were qualified to graduate (Gallagher, 2003). Mann promoted this use of examinations and statistical information nationally, later connecting it to the abolition movement through advocacy for multiracial common schools. The support for abolition helped garner Mann the Free Soil Party’s 1852 nomination for governor of Massachusetts. (The short-lived political party had been established in Massachusetts by Sumner; by 1856, it was folded into the nascent Republican Party.)
By the time the first national commissioner of education was installed in 1869, Sumner had been marginalized within his party. He was unable to prevent legislation that downgraded the Department of Education to the Office of Education within the Department of the Interior and cut the office’s staff from three clerks to two. Nonetheless, the Office of Education developed and distributed a survey in 1870 to solicit information ranging from student enrollment totals to school expenditures, numbers of teachers, tallies of high school graduates, and attendance figures. These efforts were hamstrung by sizable gaps in basic information, such as complete lists of schools and colleges. But the office persisted, hiring its first statistician in 1872 and publishing its first public report in 1875. Although Congressman Donnelly’s vision for a vigorous, forceful federal role in education never found sufficient political backing, the Office of Education was able to meet its mandate by progressively expanding the scope of its surveys and increasing the detail of published data. In 1890, the office inquired about the subject areas taken by students, sources of public revenue, and the value of facilities and physical equipment from both public and private schools (W. V. Grant, 1993). A basic yet robust statistical portrait of American education was emerging.
Can the Field of Education Measurement Assist in Directing School Resources in More Efficient Ways? (1892–1965)
The late 19th century marked the developmental period of education measurement, an era characterized by trial and error, experimentation, and wide-ranging uses of this new tool. As in any new discipline, seemingly contradictory perspectives coexisted within it. The late 19th and early 20th centuries were also known as the Progressive Era, and the field of education measurement grew as part of broader progressive efforts to develop and use scientific and social scientific methods to solve social problems (Feinberg, 1992; Freedman, 1999). Some of these efforts applied progressive reform methods in ways that limited opportunities (e.g., eugenics and IQ testing), whereas others sought to extend education to all students (e.g., early efforts to develop special education). In both cases, the central issue was how schools could efficiently educate all children to become productive citizens in an era of compulsory schooling mandates.
In the late 19th and early 20th centuries, district officials, researchers, and concerned citizens (almost exclusively businessmen) used or encouraged the use of descriptive statistics to investigate two central questions:
How can one use statistics to understand what is happening in schools?
How can one use information gleaned from statistical analyses to best direct resources?
Education measurement did not yet distinguish between theory and practice. Education researchers developed scientific methods for the explicit purpose of improving education, and they worked diligently to integrate their ideas into formal education policy with great immediacy. This urgency came about in large part from the perceived inefficiency of schools and the resulting need for reform.
Demands for school reform came from several sectors, with business and industry, themselves engaged in a reform effort known as scientific management, pushing schools to become more efficient. This reform idea derived from concepts about industrial efficiency and scientific management put forth by Frederick Taylor and popularly known as Taylorism. The rise of the social efficiency movement in schools at the turn of the 20th century resulted in large part from Taylorism (Kliebard, 2004).
One of the more noted figures who attempted to bring Taylor’s ideas of scientific management to education was Joseph Mayer Rice. Kliebard (2004) dubs Rice “the father of comparative methodology” (p. 19) on the strength of the surveys of American schools that he began in 1891. He published his findings in the periodical The Forum beginning the next year. Trained as a physician, Rice devoted his work to understanding the status of American education: curriculum, teaching, and the performance of students. He became interested in comparing student performance and education conditions through administrative school surveys, and he used the results of those surveys to advocate for better education conditions for American students. According to Callahan (1962), Rice’s application of statistics reflected limited knowledge and produced questionable results, although he was taken seriously at the time and considered a pioneer in the field of measurement.
Rice published an expanded version of his work in a 1913 book, Scientific Management in Education, in which he proposed holding administrators and teachers accountable both for defining education goals and for measuring, through scientific methods, the results of their efforts to meet those goals (as cited in Kliebard, 2004). He grounded these ideas in industrialism and the social efficiency culture that had begun to seep into American education.
Taylorism made a large impression on early 20th-century education reformers. They saw its emphasis on efficiency as a ready solution to the challenges faced by school systems serving an expanding school population with a multitude of needs. Rice’s surveys signaled the beginning of a broader trend. The school survey took hold in districts and found support not only from business interests but also from academics and professional education associations (Ryan, 2011).
In the 1910s, the American School Board Journal promoted the use of school surveys to examine the return on investments in schools, the efficiency and quality of teachers, and, to some degree, the efficiency of students (Callahan, 1962). Much of the work around efficiency stemmed from Taylorism, but it also stemmed from the work of academics like Arthur C. Boyce of the University of Chicago’s Department of Education, a colleague of Franklin Bobbitt (Callahan, 1962). Teachers voiced concern about these rating systems but had to accept them in most districts because they lacked bargaining power. Callahan (1962) notes that there was little resistance from professional circles to the movement to make schools more efficient. Conducting full school and district surveys required public support, and to garner such support, school boards often enlisted the help of business groups or groups that represented taxpayers and appealed to the public’s desire to use funds wisely to provide education resources. George D. Strayer, a professor of educational administration at Teachers College Columbia and a key figure in the survey movement, played a large role in developing and conducting district-wide surveys well into the 1930s. These surveys left a lasting impression on how district and school administrators approached their positions, putting data about administrative and management concerns at the forefront (Callahan, 1962).
The heightened focus on the efficiency of schools, teachers, and eventually students and their achievement led in part to a movement toward standardized education testing (Callahan, 1962). Statistics tracked the number of students who repeated grades or dropped out altogether, the chief concerns in larger districts such as New York and Chicago in the early 20th century (Tyack, 1974). A district’s goal in collecting such statistics was to determine how schools would deal with “backward” children, or the “feeble-minded.” For example, in 1899, the Chicago public schools established a Department of Child Study, which, in 1911, added “educational research” to the department’s name. By 1918, Chicago had an entire department devoted to standards and statistics (Ryan, 2011). Departments such as these emerged alongside a growing school population and compulsory schooling laws, as districts sought to manage and sort their students. Simultaneously, calls to better differentiate the curriculum increased. Many educators sought to better meet the needs of their students and placed their hopes in IQ and other testing, as well as in a stratified curriculum that would prepare children for what they might be best “suited for” in life.
Gould (1996) examined the introduction of intelligence testing in the United States and its European origins. His seminal work, The Mismeasure of Man, addresses how the key figures who brought measurement and testing to American education, through the promotion of IQ testing and other standardized forms of testing, rejected the cautions of French psychologist Alfred Binet, who believed the “aim of his scale was to identify in order to help and improve, not to label in order to limit” (p. 182). Henry H. Goddard of the Vineland Training School in New Jersey, Lewis M. Terman of Stanford University, and Robert M. Yerkes of Harvard University and then Yale University were early and renowned figures in the field of testing in the United States. These psychologists had a significant impact on the growth and use of measurement. Terman, more so than the others, was responsible for the spread of testing in schools and across districts through his development of the Stanford–Binet Scale. This instrument, although focused on measuring the “intelligence” of individual children, broadened into other tests designed to assess all children by the late 1910s and early 1920s (Gould, 1996). According to Gould (1996), researchers like Terman took more interest in the “science” of hereditarianism (eugenics) than in the burgeoning field of statistics. When confronted with information that contradicted his beliefs, for example, a “correlation of 0.4 between social status and IQ,” Terman advanced a multifaceted argument in support of nature over environment (Gould, 1996, p. 219). Terman eventually retreated from some of his earlier arguments, but not until the late 1930s, after eugenics had largely been discredited.
E. L. Thorndike, a professor at Teachers College Columbia and an influential psychologist in the early 20th century, adhered to eugenic beliefs about intelligence and heavily influenced ideas about the curriculum (Kliebard, 2004). Bobbitt and others who supported a curriculum that would stratify American children and prepare them for their “station in life” based on IQ test results found confirmation in Thorndike’s conclusions (Kliebard, 2004). Both Terman and Thorndike believed that intelligence was inherited and fixed, but other educators questioned that notion. Harold Rugg (1917) of Teachers College Columbia believed teachers could achieve societal change through education and that students could learn and grow through curriculum (Kliebard, 2004). In 1917, Rugg published a textbook on statistics for teachers with the hope that they would learn to use statistics as a tool of social science.
The educators in the Progressive Education Association held similar beliefs, and the association was one of the more prominent users of statistics during the World War II era. The Eight-Year Study (1932–1940), directed by Ralph Tyler of the Ohio State University, examined 30 schools (the final tally was 29), with 15 given curricular freedom and the remaining schools following traditional curricula. By the close of the study, almost 1,500 students from across the study group had attended college. Differences in their academic performance, as measured by grade point average and other factors, were small, with students from experimental schools slightly edging out those from traditional schools (Kliebard, 2004). This comparative study, in which education researchers employed an experimental design, provided a good example of a large-scale investigation beyond a school survey.
Although the use of education statistics was primarily centered on how to better use resources and reduce waste in K–12 school districts, researchers began to see how statistics could be used to address unequal conditions in education more broadly. Organizations such as the National Association for the Advancement of Colored People (NAACP) drew on statistics to challenge school segregation and to equalize resources in graduate education. In 1935, Charles Hamilton Houston of the NAACP began efforts to desegregate law schools, arguing that separate but equal law schools would become prohibitively expensive for states. He saw law school desegregation as a strategy for eventually demanding equal schooling at other levels. In 1938, the Supreme Court ruled in State of Missouri ex rel. Gaines v. Canada that the state must provide Gaines with an equal legal education. This case led to a series of cases brought by Thurgood Marshall, culminating in the landmark 1954 decision in Brown v. Board of Education, which declared segregated public schools unconstitutional. This shift in thinking about how education measurement and statistics could be marshaled to equalize, and perhaps even secure, resources for those denied equal access would shape the next period in the field of measurement.
Can Education Measurement Indicate if Money Matters? (1966–Present)
With the election of President John F. Kennedy, two ideas were paired as central to federal social policy: a strong belief in the value of scholarly research for effectively designing social policies and a commitment to social welfare in the form of the expansion of civil rights and the alleviation of poverty (Featherman & Vinovskis, 2001, p. 49). In the 1960s, prominent academics from leading universities, particularly those with personal ties to members of the Kennedy and later the Johnson administrations, were sometimes consulted directly and often solicited to prepare reports in support of key policy initiatives (Halberstam, 1993). These tendencies led to the emergence of two parallel approaches to education policy, traditions that are still present today. The first is the “compensatory” approach, codified by Bloom, Davis, and Hess (1965), which primarily seeks to design and implement programs and policies that improve education for students in poverty and minority students. The second is the “efficiency” approach, embodied in the Coleman Report (Coleman et al., 1966), which seeks to evaluate programs and policies in order to promote the most effective and resource-efficient among them.
The divergence began in the earliest weeks of the Johnson administration as the president and his aides began pressing Congress to enact comprehensive civil rights legislation. The Kennedy administration had proposed the Civil Rights Act, but Johnson saw the bill through to law in the wake of Kennedy’s assassination. A small provision was written into early drafts requiring the federal government to conduct a thorough national assessment of educational opportunities for children from all backgrounds. After a flurry of negotiations, Section 402 of the Civil Rights Act read,
The Commissioner [of Education] shall conduct a survey and make a report to the President and the Congress within two years of the enactment of this title concerning the lack of availability of equal educational opportunity for individuals by reason of race, color, religion, or national origin in public educational institutions at all levels in the U.S., its territories, and possessions, and the District of Columbia.
The completed survey would come to be known as the Coleman Report (G. Grant, 1973).
After the passage of the Civil Rights Act, the Johnson administration began to focus on comprehensive education legislation independent of the Coleman Report. John W. Gardner, president of the Carnegie Corporation and a psychologist by training, was tapped to form a commission to draft a new education bill. The Gardner Commission put forth a proposal to categorically direct federal education spending, with a significant entitlement program addressing the needs of children from poor families. This concept became the basic structure of the Elementary and Secondary Education Act of 1965, and the provision of aid directly to school districts educating children in poverty became Title I. Following the passage of the Elementary and Secondary Education Act, Gardner was appointed secretary of Health, Education, and Welfare (Thomas & Brady, 2005). In turn, he contracted with eminent educational psychologist Benjamin Bloom to organize a conference and publish its proceedings to make recommendations as to how Title I monies might be invested.
Bloom and his colleagues at the University of Chicago hosted the 5-day Research Conference on Education and Cultural Deprivation in June 1965, recruiting 30 leading education scholars. The vast majority were psychologists, although several sociologists and two public school officials were included. In the wake of the Brown v. Board of Education ruling, which drew heavily on the Clarks’ “doll tests” (Clark & Clark, 1947) to demonstrate the injury done by segregated schooling, cognitive psychology took a central position in discussions about desegregation and education policy. These issues were typically framed in terms of “cultural deprivation”; as Bloom et al. (1965) explain in the introduction to the conference proceedings, the cultural deprivation discourse rejected the idea of natural intelligence deficits among certain races in favor of emphasizing “homes which do not transmit the cultural patterns necessary for the types of learning characteristic of the school and the larger society” (p. 4). These problems were to be addressed through “compensatory education,” which sought to “prevent or overcome earlier deficiencies in the development of each individual” (p. 6). Frank Reisman (1963), the conference’s opening speaker, had previously argued that the goal of compensatory education was not “to train the disadvantaged to become ‘good middle class’ children” (p. 345) but rather to change the way schools and teachers engaged culturally deprived students and families in order to better equip these children for success in society using a variety of programs and curricular changes. Policy suggestions published in the conference proceedings (Bloom et al., 1965) ranged from providing free breakfasts and annual physical examinations, to increasing contact between home and school, to identifying curricula and pedagogies that would effectively educate “disadvantaged” youths.
The compensatory approach sought to develop and install targeted programs aimed at improving the education outcomes of students in poverty and minority students.
David Seeley, assistant commissioner of education, was a listed observer at the conference. Seeley came to the Office of Education as a Yale-trained lawyer with a particular interest in modernizing the office’s historic data collection and publication functions. In 1964, Seeley successfully lobbied the commissioner to hire Alexander Mood, a mathematician and former executive at the RAND Corporation, to apply his expertise in inferential statistics and computers as assistant commissioner for education statistics. One of Mood’s first tasks was to contract a principal investigator for the Section 402 survey, and Mood’s immediate recommendation was James Coleman (G. Grant, 1973). Mood had been impressed by Coleman’s 1961 book, The Adolescent Society, in which Coleman and a team of researchers surveyed more than 4,000 students across nine Chicago-area high schools. A 175-item questionnaire was paired with informal observation and interviews to present a portrait of the American teenager as overly influenced by peers and steered away from academic and mature social responsibilities and toward superficial entertainments and immature peer relationships.
Coleman agreed to lead the survey notwithstanding the short timeline for delivering a report (less than 2 years) and the office’s numerous contentious relationships with state and district leaders across the country, which stemmed from attempts to enforce desegregation orders. Despite these challenges, Coleman and his researchers were able to administer their survey to more than 650,000 teachers and students across more than 3,000 schools over 3 days in October 1965. Defining “equality of educational opportunity” as the “equality of results, given the same individual input” (Coleman et al., 1966, p. 14), the survey generated data about individual students, school contexts, and academic performance. This massive data set allowed the researchers to construct a variety of sample groups using results from the 1960 census and to apply an analytic method that was relatively new and uncommon outside of economics: input-output analysis.
Input-output analysis was developed by Soviet economists during the 1920s as a way to inform socialist economic planning. The codification and popularization of the method are attributed to Wassily Leontief, a Russian Jew who left the USSR for Germany in 1925 at age 19 with a master’s degree in economics. After earning his doctorate in economics in Berlin, Leontief fled rising anti-Semitism in Germany to take a position with the U.S. National Bureau of Economic Research in 1931. From 1932 through 1975, Leontief also held a faculty appointment at Harvard, where he taught input-output analysis to successive generations of economists (Kaliadina, 2006; Kaliadina & Pavlova, 2006). Carl Christ (1955) was one of Leontief’s early acolytes, publishing an influential paper on input-output analysis in 1955 and becoming a colleague of Coleman at Johns Hopkins in 1959. The report’s use of input-output concepts in its regression analysis was groundbreaking, and other scholars soon adopted the approach (e.g., Entwisle & Conviser, 1969) to open new lines of inquiry. Yet the regression methods Coleman employed were poorly equipped to support causal inferences (Hoxby, 2016) and were better suited to measuring correlations between phenomena.
The Coleman Report was released on July 1, 1966. The report ran to more than 700 pages, and its three major conclusions were that racial segregation was widespread in public schools, that there were distinct disparities in academic achievement between racial groups, and that school effects on student achievement were much smaller than the effects of individual background, particularly social class (Gamoran & Long, 2006). The claim that “schools are not acting as a strong stimulus independent of the child’s background, or the level of the student body” (Coleman et al., 1966, p. 311) was the result of regression analysis and became the report’s most noteworthy argument. In the short term, the Coleman Report was ignored by the Johnson administration, whose major foci were school desegregation and poverty alleviation; was questioned by other academics; and was met with confusion by the news media, which found the report technical, dense, and difficult to summarize (G. Grant, 1973). The report had no notable policy influence until 1968, when Daniel Patrick Moynihan wrote a laudatory review in the Harvard Educational Review. Moynihan brought his enthusiasm for the report into the Nixon administration, where he arranged for Coleman to become an advisor to the Cabinet Committee on Desegregation as well as a favored expert witness before congressional committees. The Coleman Report’s findings became central to the Emergency School Aid Act of 1970, which initiated two key changes in federal education policy: a shift from punishing school districts that did not desegregate to rewarding districts that complied with desegregation mandates, and targeted cuts in education spending under the rationale that school effects are comparatively small (G. Grant, 1973). Subsequent education policies under Nixon, Ford, and Reagan would adopt similar “efficiency” approaches, thereby establishing the Coleman Report as foundational to recent education policy and scholarship.
The Study
The Coleman Report’s conclusion that schools have a comparatively minor influence on student achievement spurred research examining the relationship between school outcomes such as achievement and school inputs using input-output analysis, also known as education production functions. Perhaps most reflective of the efficiency mind-set in the post-Coleman era are the Reagan administration’s report A Nation at Risk (1983) and the ensuing budget cuts to the Department of Education. The report argued for a “back to basics” approach to education focused on streamlined academic inputs in hopes of raising student achievement in core subject areas. Left unaddressed were any compensatory concerns about racial or socioeconomic inequities and how these might be addressed through targeted programs or differentiated curricular reforms.
Twenty years after the Coleman Report, during a renewed focus on efficiency in the 1980s, the economist Eric Hanushek reviewed the existing literature on educational production functions. Hanushek has a personal history with the Coleman Report; as a graduate student at Harvard, he participated in a yearlong series of weekly seminar meetings among researchers from various backgrounds to parse the report’s data, methods, and findings. Hanushek (2016) has written that this experience set him on a path to researching education policy. He published a series of articles (1981, 1986, 1989, 1991) reviewing the educational production function literature, in which studies typically use ordinary least squares regression analysis to predict student achievement from a number of covariates, including measures of school inputs such as PPE. The assumption in these analyses is that student background variables such as race, prior achievement, and socioeconomic status can be adequately controlled so that one can infer a causal relationship between school resource inputs and student outcomes. Across these reviews, Hanushek concludes that school resources do not have a consistent relationship with student achievement: essentially, that money does not matter for student outcomes.
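The production-function logic described above can be sketched as an ordinary least squares regression of achievement on PPE plus background controls. Everything in this Python sketch is a hypothetical assumption for illustration: the variable names, the simulated data, and a built-in "true" PPE effect of 0.8 score points per $1,000. Because the simulated confounders (socioeconomic status and prior achievement) are included as covariates by construction, the PPE coefficient can be read causally here; with real data, that is exactly the assumption the production function literature must defend.

```python
import random

TRUE_PPE_EFFECT = 0.8  # hypothetical: score points per $1,000 of per-pupil expenditure


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each beginning with 1.0 for the intercept.
    Returns the coefficient vector b as a list.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def simulate_students(n, seed=0):
    """Hypothetical data in which SES raises both spending and achievement."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        ses = rng.gauss(0, 1)                       # socioeconomic status (standardized)
        prior = 0.5 * ses + rng.gauss(0, 1)         # prior achievement (standardized)
        ppe = 10 + 1.0 * ses + rng.gauss(0, 1)      # per-pupil expenditure, $1,000s
        score = (50 + TRUE_PPE_EFFECT * ppe + 3.0 * ses
                 + 2.0 * prior + rng.gauss(0, 2))
        X.append([1.0, ppe, ses, prior])
        y.append(score)
    return X, y


X, y = simulate_students(2000)
intercept, b_ppe, b_ses, b_prior = ols(X, y)
print(f"estimated PPE coefficient: {b_ppe:.2f}")
```

The normal equations are solved by hand only to keep the sketch dependency-free; an applied analysis would use a statistics package, which would also report standard errors for each coefficient.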
Hanushek used a method of research synthesis called vote counting. Vote counting sorts studies into groups according to the direction and statistical significance of their results. The analyses counted the number of studies that found a positive relationship between school resources and achievement, no relationship, and a negative relationship. Hanushek found insufficient evidence that a majority of studies identified a positive relationship between school resources and achievement. Since his reviews, methodological developments in meta-analysis have provided more robust and statistically defensible alternatives to vote counting. A series of papers by Larry Hedges, Robert Greenwald, and Richard Laine (Greenwald et al., 1996; Hedges, Laine, & Greenwald, 1994; Laine, Greenwald, & Hedges, 1995) used meta-analytic techniques to reexamine Hanushek’s conclusions. These analyses synthesized the actual values of the estimated relationships between school resources and achievement, instead of characterizing the studies by the direction of their results. Greenwald et al. (1996) found a small but consistent positive relationship between school resources and student achievement.
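The contrast between the two approaches can be illustrated with a minimal Python sketch. The coefficient estimates and standard errors below are hypothetical, invented for illustration rather than drawn from any reviewed study, and the pooled estimate is a simple fixed-effect (inverse-variance weighted) mean chosen for brevity, not the specific procedure Greenwald et al. (1996) used. Every hypothetical estimate is positive but individually nonsignificant, so vote counting places each study in the “not significant” column, while combining the estimates themselves yields a clearly positive pooled value.

```python
import math

def vote_count(estimates, std_errors, z_crit=1.96):
    """Tally studies as significantly positive, not significant, or significantly negative."""
    votes = {"positive": 0, "not significant": 0, "negative": 0}
    for b, se in zip(estimates, std_errors):
        z = b / se  # Wald z statistic for each study's estimate
        if z > z_crit:
            votes["positive"] += 1
        elif z < -z_crit:
            votes["negative"] += 1
        else:
            votes["not significant"] += 1
    return votes

def fixed_effect_mean(estimates, std_errors):
    """Inverse-variance weighted mean and its standard error (assumes independent estimates)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    mean = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))

# Hypothetical estimates: every one is positive, but each is imprecise.
estimates = [0.10, 0.12, 0.08, 0.15, 0.09, 0.11]
std_errors = [0.07, 0.08, 0.06, 0.09, 0.07, 0.08]

print(vote_count(estimates, std_errors))
mean, se = fixed_effect_mean(estimates, std_errors)
print(f"pooled = {mean:.3f}, z = {mean / se:.2f}")
```

Because vote counting discards the magnitude and precision of each estimate, it loses power as individual studies become noisier, which is the weakness the meta-analytic reanalyses addressed.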
Although Greenwald et al.’s (1996) methods followed the most current guidelines for research synthesis at the time, the authors encountered a number of difficulties in analyzing the education production function literature. Two of the major issues were the diversity of models used across the studies and the number of models presented in each study. Researchers in the education production function literature have not agreed on a set of covariates that should be included. Thus, when predicting academic achievement, researchers control for a wide range of student and school characteristics such as gender, race, socioeconomic status, and prior achievement. Greenwald et al. included only studies that controlled for socioeconomic status or prior achievement in order to reduce the possibility that student background characteristics would confound the findings. Hanushek’s (1989) vote-counting method, by contrast, did not account for the influence of other covariates in the studies’ models.
The second issue concerns dependencies among the estimates of the relationship between achievement and PPE within studies. Studies included in the review typically reported more than one education production function model. Greenwald et al. (1996) used the median regression coefficient within each study to ensure that the coefficients used in the analysis were computed from independent samples.
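A minimal Python sketch of the median-per-study reduction described above; the study labels and coefficients are hypothetical. Each study contributes exactly one value to the synthesis, however many models it reports.

```python
from collections import defaultdict
from statistics import median

def median_per_study(coefficients):
    """Collapse (study, coefficient) pairs to one median coefficient per study."""
    by_study = defaultdict(list)
    for study, coef in coefficients:
        by_study[study].append(coef)
    return {study: median(values) for study, values in by_study.items()}

# Hypothetical PPE coefficients from several models per study.
reported = [
    ("Study A", 0.0002), ("Study A", 0.0005), ("Study A", 0.0004),
    ("Study B", -0.0001), ("Study B", 0.0003),
    ("Study C", 0.0006),
]
print(median_per_study(reported))
```

The trade-off is that the information in the remaining models is discarded, which is one reason the present study turned to robust variance estimation.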
Twenty years have passed since Greenwald et al.’s (1996) work, and new meta-analytic techniques now exist for handling some of the difficulties faced in the original work. The present study drew on a subset of a larger project to update the synthesis of education production function studies. We focused on the subset of studies measuring the impact of PPE on achievement, because this line of research flows directly from the historical concerns around efficiency; Hanushek (1989) also examined other resources, such as teacher/pupil ratio, teacher education, and teacher salary.
Method
Background
The present study builds on the systematic review conducted by Greenwald et al. (1996) that expanded Hanushek’s (1989) article examining the relationship between school resources and student achievement. In addition to including the studies used by Hanushek, Greenwald et al. (1996) conducted a search of electronic databases in economics, education, and psychology and examined the references from several narrative reviews of this literature. The final sample of studies in Greenwald et al. included 29 studies from Hanushek’s review and an additional 31 studies.
The present study was designed to update the Greenwald et al. (1996) review. Using the same search terms as the original study, we searched for studies published from 1993, the last year of the search in Greenwald et al., through 2014, and identified studies that directly examined the relationship between school resources and student achievement. Our search did not identify any studies in which school expenditures were used as a control variable. A list of search terms used in the current study is provided in the appendix.
Inclusion Criteria
We generally followed Greenwald et al.’s (1996) inclusion criteria for the additional studies, although we also included unpublished research. We included studies
conducted in the United States,
where the outcome measure was some form of K–12 student academic achievement, and
that included a measure of education expenditures, such as PPE or teacher salary.
We included unpublished research given the changes in systematic review practice since 1996. Current guidelines for systematic reviews such as those in Cooper (2009) include both published and unpublished research. We focused exclusively on studies that included a measure of PPE in the models examining correlates of academic achievement. All studies included used independent samples. In some cases, studies used the same database; we used only the study that included the most complete model for the analysis.
Coding
All studies included in our analysis were coded by three of the authors. Coding categories included type of publication, year of publication, and demographic characteristics of participants such as race, socioeconomic status, gender, and grade level. We coded every model within each study, recording descriptive statistics if provided, descriptions of each predictor variable and associated outcome variable, the estimated regression coefficients and their standard errors if provided, measures of the quality of the model such as R², and the level of the analysis, such as district or student level.
Analysis
The focus of the analysis was the synthesis of the regression coefficient for PPE, a measure of the relationship between school expenditures and academic achievement. All included studies used some form of regression analysis to predict academic achievement from a set of covariates, including PPE. Studies typically reported more than one regression model, resulting in dependencies among the coefficients within the studies. Greenwald et al. (1996) computed the median value of the PPE regression coefficient for each study reporting more than one regression model. Since 1996, researchers have developed more sophisticated meta-analytic strategies for handling dependent effect sizes within studies.
Becker and Wu (2007) outline three conditions that must hold for multiple regression slope estimates to be combined directly. First, the outcome must be measured on a common scale across studies. Second, the predictor of interest, whose slope is the focal slope, must be measured on a common scale across studies. Third, each study must estimate the partial relationship between that focal predictor and the outcome using the same model (i.e., with an identical set of additional predictors). In practice, satisfying all three conditions in a synthesis is almost always impossible.
An alternative approach that requires few assumptions and no additional information is robust variance estimation. Hedges, Tipton, and Johnson (2010) and Tipton (2013) identify three important features of this estimator. First, and most important, the covariance structure of the effect size estimates does not need to be known. Second, parameter estimates converge on the target parameter as the number of studies, not the number of cases within studies, increases. These authors show that accurate standard errors are produced with as few as 10 to 20 studies, and Tipton (2013) provides a small-sample correction for cases with fewer than 10 studies. Third, the robust variance estimator is unbiased for any set of weights. Williams (2012) conducted a simulation study that examined using robust variance estimation in synthesizing sample-dependent focal slope estimates and as a means of synthesizing regression models across multiple samples. His results indicate that the robust variance estimator provides accurate standard errors across a wide range of circumstances. All analyses were conducted in R (R Development Core Team, 2008) using the robumeta package (Fisher & Tipton, 2014).
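To show why the covariance structure is not needed, here is a minimal Python sketch of a correlated-effects robust variance estimate for an intercept-only model (a weighted mean effect size), following the general form of the estimator in Hedges, Tipton, and Johnson (2010). It is an illustration under stated assumptions, not robumeta’s implementation: the function name and arguments are hypothetical, the between-study variance `tau2` is supplied by the user rather than estimated, and the small-sample adjustment is a simple m/(m − 1) factor rather than the more elaborate correction and adjusted degrees of freedom that robumeta can apply. The key step is that residuals are summed within each study before squaring, so within-study correlations never have to be specified.

```python
import math
from collections import defaultdict

def rve_intercept(effects, variances, study_ids, tau2=0.0, small_sample=True):
    """Weighted mean effect size with a cluster-robust (sandwich) standard error.

    Hypothetical helper, not the robumeta API. Uses correlated-effects
    weights: every estimate in study j gets w_j = 1 / (k_j * (v_bar_j + tau2)),
    where k_j is the number of estimates in study j and v_bar_j is their mean
    sampling variance. tau2 is taken as given here (assumption).
    """
    by_study = defaultdict(list)
    for t, v, s in zip(effects, variances, study_ids):
        by_study[s].append((t, v))

    weights = {}
    for s, rows in by_study.items():
        k = len(rows)
        v_bar = sum(v for _, v in rows) / k
        weights[s] = 1.0 / (k * (v_bar + tau2))

    # Weighted mean: each estimate in study s carries weight weights[s].
    total_w = sum(weights[s] * len(rows) for s, rows in by_study.items())
    b = sum(weights[s] * t for s, rows in by_study.items() for t, _ in rows) / total_w

    # Sandwich variance: sum weighted residuals within each study, then square,
    # so the within-study covariance structure never has to be specified.
    meat = sum((weights[s] * sum(t - b for t, _ in rows)) ** 2
               for s, rows in by_study.items())
    var_b = meat / total_w ** 2

    m = len(by_study)  # number of independent studies (clusters)
    if small_sample and m > 1:
        var_b *= m / (m - 1)  # simple correction, not robumeta's small-sample method
    return b, math.sqrt(var_b), m

# Hypothetical data: study A reports two estimates, study B one.
print(rve_intercept([0.1, 0.3, 0.5], [0.01, 0.01, 0.04], ["A", "A", "B"]))
```

With these hypothetical inputs, each estimate in study A receives a weight of 50 and the estimate in study B a weight of 25, giving a mean of 0.26 and a robust standard error of 0.096 (up to floating-point rounding).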
Several studies used a log transformation of the PPE variable in the model, which complicates synthesizing the PPE coefficient across studies because a coefficient on ln(PPE) is not on the same per-dollar scale as a coefficient on PPE itself. To correct for this problem, we divided the PPE regression coefficient from these models by the mean PPE reported within each study, which approximates the effect of a $1 change in PPE at the study mean. All the regression models included in the analysis reported the mean PPE and could therefore be included in the analysis.
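The rescaling rests on a one-line derivative. Assuming, for illustration, a log model of the form below (the covariate vector x and coefficients γ are notation introduced here to stand in for each study’s other controls), the marginal effect of a $1 change in PPE depends on the level of PPE, so evaluating it at the study’s mean PPE yields a per-dollar coefficient comparable to those from untransformed models:

```latex
y_i = \alpha + \beta_{\log}\,\ln(\mathrm{PPE}_i) + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_i
\quad\Longrightarrow\quad
\frac{\partial y_i}{\partial\,\mathrm{PPE}_i} = \frac{\beta_{\log}}{\mathrm{PPE}_i}
\approx \frac{\beta_{\log}}{\overline{\mathrm{PPE}}}
```

For example, a hypothetical log coefficient of 5 scale points in a study with a mean PPE of $5,000 corresponds to roughly 5 / 5,000 = 0.001 points per dollar. The approximation is exact at the mean and less accurate for districts far from it.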
Results
Our analysis focused on models that predicted some measure of academic achievement, included a measure of PPE as a predictor, and controlled in some manner for race and for either socioeconomic status or prior achievement. The meta-analysis was conducted separately for studies at the district level and studies at the student level.
Figure 1 is a flowchart of the search process for the included studies. We identified 2,641 potential studies in the search covering 1993 to 2014. After screening titles and abstracts, we retrieved 56 studies for full-text screening, and 35 of these met the eligibility criteria and were coded.

Results of Search
Of the 95 eligible studies (60 from Greenwald et al., 1996, plus 35 studies from our search), 24 included a measure of PPE as a covariate in a regression model predicting some form of academic achievement. The other 71 studies typically included some measure of teacher salary or administrative expenses rather than PPE. Most of the 24 studies were published in economics journals. For a study to be eligible for our analysis, the regression model needed to include as covariates a measure of students’ race or the racial composition of the sample and a measure of either prior achievement or the socioeconomic status of the participants. The race requirement was an addition to the covariates required by Greenwald et al. (1996). As displayed in Table 1, 12 of the 24 studies were missing the requisite control variables for inclusion in the meta-analysis. Eleven of these 12 studies were missing a control variable for racial background or composition, and most of them were also missing a measure of prior achievement as a covariate.
Table 1. Characteristics of Studies Excluded From the Meta-Analysis
Note. The sources are listed in the order of their discussion in the text.
The second inclusion criterion for the meta-analysis concerned the information needed to synthesize the PPE coefficients across studies. We used Greenwald et al.’s (1996) strategy to synthesize the PPE coefficients across studies, which requires the standard deviation of the achievement outcome in each study. We used the half-standardized partial regression coefficient for PPE as our measure of effect size: we divided the estimate of the regression coefficient for PPE by the standard deviation of the achievement outcome variable. The half-standardized partial regression coefficient measures the number of standard deviations of change in achievement associated with a $1 change in PPE. As shown in Table 1, 3 of the 11 studies with the requisite control variables failed to provide the standard deviation of the achievement outcome variable.
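A minimal Python sketch of this effect size computation, assuming each study supplies a PPE coefficient, the standard deviation of its achievement outcome, and (for models that entered ln(PPE)) its mean PPE. The function name and the example values are hypothetical.

```python
def half_standardized_ppe(coef, outcome_sd, mean_ppe=None, log_ppe=False):
    """Express a PPE coefficient as SDs of achievement per $1 of PPE.

    Hypothetical helper. If the model entered ln(PPE), the coefficient is
    first converted to a per-dollar slope at the study's mean PPE.
    """
    if outcome_sd <= 0:
        raise ValueError("outcome_sd must be positive")
    if log_ppe:
        if mean_ppe is None or mean_ppe <= 0:
            raise ValueError("a positive mean_ppe is required for log-PPE models")
        coef = coef / mean_ppe  # per-dollar slope evaluated at mean PPE
    return coef / outcome_sd  # half-standardize: divide by outcome SD only

# Hypothetical study: +0.02 scale points per $1, outcome SD of 20 points.
d = half_standardized_ppe(0.02, outcome_sd=20)
print(d, d * 1000)  # SDs per $1 and per $1,000
```

Only the outcome is standardized; PPE stays in dollars, which is why the coefficients can be read as standard deviations of achievement per dollar and rescaled to any spending increment.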
The third inclusion criterion was related to a study’s level of analysis. Six of the eight studies that reported all requisite control variables and the standard deviation of the achievement outcome collected and analyzed data at the level of the school district or the student. The other two studies were at the school or classroom level. We did not conduct a separate analysis of these two studies, which left six studies meeting all of the following criteria:
A model that controls for race and either prior achievement or socioeconomic status
The reporting of the standard deviation of the outcome achievement measure
Data collected and analyzed at the student or district level
A list of ineligible studies is provided in Table 1.
Table 2 provides descriptions of the six studies included in the meta-analysis. Three of these studies included data at the district level, and three included data at the student level. All six studies focused on high school students, with one study also including achievement measures from middle schoolers. Two of the studies used the Test of Economic Literacy as an outcome; the remaining studies used either other achievement measures or measures of college readiness such as the SAT and ACT. Four of the studies used national samples of students, and two focused on single states (Virginia and Michigan).
Table 2. Characteristics of Studies Included in the Meta-Analysis
Note. The sources are listed in the order of their discussion in the text.
In this discussion, we present the results of the robust variance meta-analytic model separately for the district- and student-level data sets. For the three studies that included data at the district level, we extracted 13 effect sizes. The results yielded a positive but statistically nonsignificant mean effect size (b = .00114, SE = .000287, t = 3.97, p = .13, 95% confidence interval [CI; −.00159, .00387]); with only three studies, the small-sample degrees of freedom are very low, so even a t of 3.97 is not significant. To put the mean effect size in context, the point estimate implies that every $1,000 increase in PPE would be associated with a 1.14 standard deviation increase in achievement. However, the confidence interval is wide and includes zero, so these data cannot distinguish the district-level association between PPE and academic achievement from zero. For the three studies that included data at the student level, we estimated eight effect sizes using the half-standardization procedure. The meta-analytic results again indicated a very small, positive, but statistically nonsignificant effect size (b = .000067, SE = .000035, t = 1.91, p = .29, 95% CI [−.0003, .00043]). Based on this very limited data set, one cannot conclude that PPE is related to academic achievement. In comparison, Greenwald et al. (1996) found a median PPE effect of 0.0003.
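The arithmetic behind the $1,000 interpretation can be checked with a short Python sketch that uses only the estimates reported above: it rescales each per-dollar estimate and its confidence interval to a $1,000 change and recomputes t as b/SE. The helper name is hypothetical. The p values are not recomputed, because they depend on the small-sample degrees of freedom of the robust variance estimator, which are not reported here.

```python
def per_thousand(b, se, ci_low, ci_high, dollars=1000):
    """Rescale a per-$1 half-standardized estimate and its CI; return t = b / se."""
    return {
        "effect": b * dollars,
        "ci": (ci_low * dollars, ci_high * dollars),
        "t": b / se,  # rescaling b and se by the same factor leaves t unchanged
    }

# Values as reported for the district- and student-level models.
district = per_thousand(0.00114, 0.000287, -0.00159, 0.00387)
student = per_thousand(0.000067, 0.000035, -0.0003, 0.00043)
for label, res in [("district", district), ("student", student)]:
    print(label, round(res["effect"], 3), [round(x, 3) for x in res["ci"]], round(res["t"], 2))
```

Rescaling multiplies the point estimate and both interval endpoints by the same factor, so it changes the units but not the conclusion: both intervals still include zero.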
Summary of Findings
The models used in the education production function literature are so diverse that they limited our efforts to conduct a quantitative synthesis. Researchers focusing on the relationship between PPE and student achievement do not agree on a standard set of covariates, nor do they use similar measures of achievement. Of the 24 identified studies that examine PPE, 11 did not include any control for race in the model, a critical omission given the Coleman Report findings that inspired this area of research. The studies eligible for the analysis all focused on high school students, and most relied on a single subject-area achievement measure, such as economics or math. Generalizations from this set of studies to U.S. schools as a whole are thus not warranted.
Our major finding, a statistically nonsignificant relationship between PPE and academic achievement, is based on a small set of studies at both the district and the student levels. Although we are confident that our meta-analytic results are representative of the education production function literature, they are, as with all meta-analyses, not necessarily representative of the population of students or districts in the United States. Our finding, while statistically consistent with Hanushek’s (1989) original argument, does not rest on a strong evidence base. The studies identified for this review use narrow achievement measures, employ cross-sectional designs or short time frames, and use coarse controls for race, socioeconomic status, and prior achievement. Jackson, Johnson, and Persico (2014) note that research on educational production functions relies on statistical methods (e.g., ordinary least squares) that cannot isolate the causal effects of PPE because of unresolved endogeneity bias.
Many research studies have examined school inputs and outputs, but the literature is too diverse and too inconsistent to support a reliable meta-analytic estimate. Even if we had been able to obtain a defensible estimate of the magnitude of the relationship between PPE and achievement, the studies included in the meta-analysis do not shed light on the underlying mechanisms of that relationship or on how PPE could be used to increase achievement. A more important finding of our synthesis is that half of the studies identified do not control for basic student background differences, a major flaw in this literature.
In these ways, the recent literature fits squarely in the tradition set out by the Coleman Report. It is a legacy that is both enlightening and confounding. The Coleman Report found distinct disparities in academic achievement among racial groups, and yet many of the studies in our sample failed to account for race in their models. Since the Coleman Report came out, the broader education research field focused on student outcomes has recognized the importance of race, socioeconomic status, and prior achievement in understanding student performance. Furthermore, policymakers and researchers worked for years under the assumption that schools had little influence on student achievement; numerous scholars set out to test this proposition even though the methodology used in the Coleman Report was inadequate to justify the claims it put forward (Hoxby, 2016). The question of whether monetary resources directly translate into achievement gains has not been addressed adequately in the literature and may be impossible to answer, given the complexity of schools and school districts and the critical importance of student background in examining student performance. Instead, researchers should reframe the question as one about how school resources could influence student outcomes across a wide range of school contexts and student needs.
One productive line of research centers on the impact of school finance reform. Prior to the 1970s, local property taxes funded most schools, leading to large within-state differences in PPE among districts (Howell & Miller, 1997; Hoxby, 1996). Since 1971, many states have implemented school finance reform through court or legislative action (Jackson et al., 2014). These efforts have succeeded, to varying degrees, in equalizing school spending between low- and high-income districts. Jackson et al. (2014) show that low-income children born between 1955 and 1985 in districts that implemented school finance reform completed more years of education, earned higher incomes, and were less likely to experience poverty than poor children in districts that did not implement reform. Because Jackson et al. examine a broader set of outcomes, their findings open room for new questions and for research on how resources can be deployed to support student outcomes in a socioeconomically diverse, multilingual, and multiracial society.
Conclusion
The question of how resources relate to achievement is a recurring one in American education. It dates as far back as the 1867 law establishing the federal Department of Education to promote the “establishment and maintenance of efficient schools”; the question is also tied to debates about race, equity, and the purposes of schooling in American society. The way the question is asked and the methods used to answer it are shaped by history and reflect the scholarly, social, and political concerns of any given time. There is no “best method” to answer the question unequivocally. Educators, researchers, policymakers, and other stakeholders should join forces to consider carefully which questions are the best and most effective to ask in pursuit of shared goals in the interest of the education welfare of children and public education.
In examining the question of how resources have related to achievement over the past century and a half, one may conclude that responses have been driven by disciplines outside of education: Rice as a physician; Thorndike, Terman, and Bloom as psychologists; Coleman as a quantitative sociologist; and Hanushek as an economist. Most of these researchers had little connection to, or intimate knowledge of, the inner workings of schools. Rice attempted to understand the work of schools and how they used resources but did not have the perspective of a teacher or administrator. Psychologists focused on children and whether they could be taught; in other words, whether intelligence was fixed or malleable. Quantitative sociologists and economists created models and functions to isolate the impact of particular resources on achievement. The Coleman Report narrowed the definition of equality of educational opportunity in just this way: “equality of results, given the same individual input” (Coleman et al., 1966, p. 14). The result was to exclude or radically simplify the complex roles of social factors such as race, class, and gender that provide the inextricable context for schooling. The many social scientists who tried to ascertain whether student achievement could be tied to the resources devoted to a school or school district rarely included researchers from the field of education with substantial experience in and familiarity with schools and school systems.
This lack of understanding of the problem and its context on the part of those researching the perceived problem (a mismatch of resources and results) may have produced a flawed research question. In the late 19th century, the focus was on understanding what kinds of schooling were available and where, so that public education could be promoted nationally. The early 20th century brought the rise of efficiency and a new business model in the service of creating systems of public schools to educate all American youth. Research on resources and achievement has been conducted in the vein of this efficiency approach, which is characterized by evaluating the inputs and outputs of schooling. This approach was countered by a “compensatory” approach (Bloom et al., 1965), which focuses on identifying and implementing interventions for more equitable schooling, with efficiency as a secondary concern (Coleman et al., 1966). Modern scholarly and political debates about education are often caught between these two approaches, whether aligning clearly with one side or attempting to lay claim to both (e.g., Reading Recovery has been identified as a “what works” intervention, one reported to be highly efficient and highly effective in supporting literacy development for students in poverty and minority students; Institute of Education Sciences, 2013).
A critically important point is that the statistical models used to examine the relationship between school inputs and student outcomes are not consistent across studies and do not support causal inferences. Policy has been made on the basis of these studies without appreciation of their limitations, despite prescient warnings (Murnane, 1991). Moreover, some policymakers seek research that supports their preexisting views without acknowledging the implications of selecting research for ideological purposes (Plank, 2011). Clearer measurement of education constructs, well-defined and clearly articulated methods, and comprehensive reporting of results are needed. Without such efforts, data will be limited and the conclusions drawn from them will be suspect.
When one considers the questions that can be asked about education resources and student achievement, especially in this era of “big data,” one must not confine consideration to a narrow sphere of experts, funders, and the various public and private entities that generate massive data sets. Researchers must continuously and vigorously engage with the stakeholders that will benefit from the work—policymakers, school districts, communities, and families of all backgrounds—to ensure that the questions asked have shared value in the pursuit of better education outcomes for all children. The history of this research, from the “Radical Republicans” of the late 1860s to the present, illustrates the dangers of failing to do so.
Modern researchers understand the value of asking complex, sophisticated questions and considering a range of factors in their attempts to understand school systems and student achievement. These questions must be generative. How does one reimagine research on school resources and student achievement as part of a concerted, deliberate collaboration among scholars, practitioners, policymakers, and communities? What processes can help researchers develop questions reflecting shared goals for the education welfare of children and in the best interests of school systems? Scholarship must be critical, research projects must be interdisciplinary, and engagements must be with a diverse range of stakeholders in public education. Researchers must endeavor to be rooted in the realities of those who understand schools at the ground level and those who work with students from all backgrounds and learning styles. They must build partnerships that allow them to ask questions about education that best serve all children in a diverse society.
Appendix
Acknowledgements
The meta-analysis reported in this chapter was partially supported by a grant from the National Science Foundation, NSF DRL-0723543
