Abstract

Research thrives on focus. The idea is to bring empirical evidence to bear on precise questions about relationships among a few clearly defined variables, holding constant the vast network of other relationships that constitute what William James called the “blooming buzzing confusion” of the world we negotiate in everyday life. It’s a powerful idea, and recent research on teacher effectiveness is a prime example.
In contrast, effective practical work requires expert judgment about how varied courses of action might bear on a host of interrelated conditions. That’s why specific studies don’t tell us what to do, even if they sometimes have large potential for informing expert judgment. Researchers want their work to be used, so we flirt with the idea that value-added research tells us how to improve schooling. I think this volume has some potential to subdue this flirtation. The hard question is how to integrate the new research on teachers with other important strands of research in order to inform rather than distort practical judgment.
We now know that teachers vary dramatically in their impact on student learning as measured by test scores (Chetty et al., 2011; Kane & Staiger, 2008; Nye, Hedges, & Konstantopoulos, 2004), with long-term consequences for educational attainment, employment, earnings, and social behavior (Chetty et al., 2011). The most convincing evidence comes from studies that randomly assigned students to teachers within schools. The impacts on test scores tend to fade over time, but the more important, long-term outcomes appear nonetheless (see Raudenbush, 2014a, 2014b, for a review).
We also have better evidence than ever before on what good teaching looks like from the point of view of trained observers and the students themselves. Ingenious research design again made a big difference. The Measures of Effective Teaching Project (MET, 2012) randomly assigned rosters of students to teachers within each of many schools and computed value-added statistics for each teacher. The project correlated these experimental measures of teacher impact with classroom observations, student perceptions, and value-added computed in a different, nonexperimental year. The relationships, though not strong, were statistically reliable and large enough to be of practical significance.
Once an important result has been obtained, a standard ritual in education research is to investigate “the implications for policy and practice.” Perhaps we should base decisions about the hiring, retention, and professional development of teachers on these apparently validated measures of teaching quality.
However, a fundamental question arises: Does the answer to a precisely focused research question, by itself, have implications for practical action? What happens when “other things” are not held constant? For example, if we give school district leaders the authority to base personnel decisions on value added or other measures of teacher effectiveness, will we undermine the authority of the school principal? And if we do, will that be good or bad for students? Will we encourage or discourage teacher collaboration in solving problems of instruction, as Johnson (this issue, pp. 117–126) reasons, and how will that affect student learning? Applying James Coleman’s ideas, Johnson asks whether we will augment or undermine our capacity to mobilize the “social capital” of the school to strengthen the human capital of the teacher.
Recent Research on School Effects
One of the uncertainties that should arise in thinking about how to use findings from teacher effectiveness research is how to integrate the results with other research findings. Over the past few years, we have accumulated exceptionally strong evidence about the impact of the school as an organization on student learning, with a special focus on the learning of ethnic minority children from low-income families.
Recent randomized experiments quite firmly establish that specific schools have highly varied effects. Some of these studies capitalize on the fact that new charter schools are often oversubscribed: More students apply than can be admitted. By law, applicants to these charter schools are offered admission on the basis of a randomized lottery. Researchers are now following the outcomes of winners and losers of such lotteries. For example, Gleason, Clark, Tuttle, Dwoyer, and Silverberg (2010) studied 36 charter schools that used random lotteries and found that being admitted to a charter school made little difference in outcomes—on average. However, the variation in the impact of being so assigned was substantial. This result gives strong causal evidence not that charter schools per se are particularly effective but that some schools are substantially more effective than others. Dobbie and Fryer (2013) developed a model to predict this variation and found that five policies, including “frequent feedback to teachers, the use of data to guide instruction, high-dosage tutoring, increased instructional time, and high expectations,” (p. 3) explain approximately 50% of the variation in experimentally estimated school effectiveness. Leaders in these effective charter schools take pains to ensure that teachers follow school-wide procedures and norms. These randomized studies corroborate the work of Bryk, Sebring, Allensworth, Easton, and Luppescu (2010), who found that effective school leadership, professional work communities, and even school safety during a base year predict subsequent changes in school-level value added to learning. A recent large-scale experimental study of small public schools of choice in New York (Bloom & Unterman, 2012) provides exciting evidence of the capacity of well-designed schools to improve the outcomes of disadvantaged students on a large scale.
Your Foreground Is My Background: Combining Research on Teachers and Research on Schools
The MET study is often described as a study of many hundreds of teachers, as if those teachers were not clustered together within schools. Though counterintuitive, this notion makes some sense because, like other convincing studies of teacher value added, MET randomly assigned rosters of students to teachers within schools. All school differences in average effectiveness were eliminated by means of a statistical technique known as regression with “school fixed effects.” This is a smart strategy for isolating teacher differences, the purpose of the study. However, policy makers want to use value-added statistics and observations to compare teachers who work in different schools. This poses several problems. First, comparisons of teachers working in different schools confound teacher effectiveness with school effectiveness, school context (e.g., neighborhood safety), and school composition (e.g., student social background) (see Raudenbush, 2013). Second, and even more fundamentally, the notion of effective teaching in these studies is entirely relative. When we cite this research as evidence that some teachers are highly effective, we are simply saying that teachers within schools are heterogeneous. We have no evidence that any teacher is producing the learning required to achieve a clearly stated social objective, though we are encouraged to think that rewarding the best teachers and releasing the worst will move us in the right direction. However, it is plausible that teachers are heterogeneous in part because the current way we organize schools encourages autonomy without providing much guidance (Cohen & Moffitt, 2009; Johnson, this issue; Lortie & Clement, 1975). If so, it makes sense to think more about how the school can organize instruction in a way that reduces heterogeneity while enabling children to better achieve clearly stated learning goals.
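The logic of “school fixed effects” can be illustrated with a minimal sketch. The records below are invented for illustration and are not MET data; the function name and the school and teacher labels are hypothetical. Subtracting each school's mean from its teachers' scores is the demeaning step that is equivalent to giving every school its own intercept in a regression.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (school, teacher, value-added) records; all numbers are invented.
records = [
    ("A", "t1", 0.30), ("A", "t2", 0.10), ("A", "t3", 0.20),
    ("B", "t4", -0.10), ("B", "t5", -0.30), ("B", "t6", -0.20),
]

def within_school_deviations(rows):
    """Return each teacher's score minus the mean score of that teacher's school."""
    by_school = defaultdict(list)
    for school, _, score in rows:
        by_school[school].append(score)
    # One mean per school; removing it discards all between-school differences.
    school_means = {s: mean(scores) for s, scores in by_school.items()}
    return {teacher: score - school_means[school]
            for school, teacher, score in rows}

deviations = within_school_deviations(records)
```

In this toy example, teachers t1 and t4 receive the same within-school deviation even though their raw scores differ, because the gap between School A and School B has been removed. That is what makes the measure relative: it ranks teachers against colleagues in the same school and says nothing about how teachers in different schools compare.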
In contrast, the most convincing studies of the impact of school organization exploit lotteries that assign children at random to schools. The idea is to demonstrate the impact of well-conceived school-wide regimes of instruction and social behavior, revealing the power of a coherent social environment to reshape the life chances of disadvantaged children. However, such studies are, by design, incapable of revealing the impact of individual teachers; a skeptic may argue that particularly effective schools are simply collections of talented, motivated teachers.
In sum, recent research on value added tells us that, by using data from student perceptions, classroom observations, and test score growth, we can obtain credible evidence of the relative effectiveness of a set of teachers who teach similar kids under similar conditions. Recent careful research on school effects tells us that committed educators working collectively can create a learning environment in which low-income minority youth thrive academically. Taken together, these two lines of research inspire optimism about educational improvement. But the two strands of research are insufficient guides for action. To act, we need more informative evidence and a theory to synthesize all of the evidence.
Incorporating Additional Evidence
Perhaps the most informative additional evidence comes from systematic attempts to change instruction. Borman et al. (2007, 2008) provide a partial answer to the assertion that an effective school is nothing more than a collection of good teachers. In these studies, researchers assigned schools at random to innovative school-wide instructional regimes, revealing substantial positive effects on student learning. The teaching force remained stable, so the findings cannot be attributed simply to the aggregate personal qualities of the teaching force. In contrast, Allen, Pianta, Gregory, Mikami, and Lun (2011) provide convincing evidence that expert observation and feedback at the level of the individual teacher can substantially improve instruction. So teacher improvement without school organizational change is also possible. And Tyler, Taylor, Kane, and Wooten (2010) found that repeated classroom observation and feedback connected to teacher evaluation improved learning. In that study, certain aspects of the school organization are modified to improve the work of individual teachers: The culture of the organization and the practice of teaching change together.
Theory
The articles in this volume raise questions that a powerful theory of school improvement would have to answer. Goldhaber (this issue, pp. 87–95) raises key questions about how to use new advances in assessing teaching in order to improve the supply of teachers, improve teacher effectiveness, keep effective teachers in the classroom, and release those who are not effective. A key question his article does not answer is who will use this information and at what level of the education hierarchy. Like the Race-to-the-Top legislation, the article probes whether states and districts might become more selective about which teachers are effective, which teachers stay in the work force, and which teachers earn more. As Goldhaber notes, answering these questions affirmatively requires that we tolerate fairly substantial errors in classifying teachers. This will sound forbidding to critics of high-stakes use of data, though conventional decision-making practice is probably based on much less credible evidence.
The problem is that the amount of information a district can collect on each teacher is small. If a district administrator uses data like that collected in MET, we can anticipate that an attempt to classify teachers for personnel decisions will be characterized by intolerably high error rates (see Raudenbush & Jean, 2012). And because districts can collect very limited information, a reliance on district-level data collection systems will likely generate the kinds of distorted behavior described by Ballou and Springer (this issue, pp. 77–86), in which teachers attempt to “game” the comparatively simple indicators. In contrast, the amount of information that a well-functioning school can collect on each teacher is large, a key point made by Goldring et al. (this issue, pp. 96–104). An effective school will likely be characterized by effective “distributed” leadership, meaning that expert teachers share responsibility for classroom observation, feedback, and frequent formative assessments of student learning. Intensive professional development combined with classroom follow-up generates evidence about teacher learning and teacher improvement. Such local data collection efforts have some potential to gain credibility among teachers, a virtue that seems too often absent, as described by Jiang, Sporte, and Luppescu (this issue, pp. 105–116).
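The link between limited information and classification error can be sketched with a small simulation. This is an illustration under stated assumptions, not the analysis in Raudenbush and Jean (2012): true effectiveness and measurement noise are both assumed to be normally distributed, and the function name, the `reliability` parameter, and the bottom-20% cutoff are hypothetical choices.

```python
import random

def misclassification_rate(reliability, cutoff_fraction=0.2,
                           n_teachers=10000, seed=0):
    """Share of teachers flagged in the bottom `cutoff_fraction` on a noisy
    measure who are not truly in the bottom `cutoff_fraction`."""
    rng = random.Random(seed)
    # True effectiveness has variance 1. The noise variance is set so that
    # reliability = var(true) / var(observed).
    noise_sd = ((1.0 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    observed = [t + rng.gauss(0.0, noise_sd) for t in true]
    k = int(cutoff_fraction * n_teachers)
    # Teachers who really belong in the bottom group.
    truly_low = set(sorted(range(n_teachers), key=lambda i: true[i])[:k])
    # Teachers the noisy measure places in the bottom group.
    flagged = sorted(range(n_teachers), key=lambda i: observed[i])[:k]
    wrong = sum(1 for i in flagged if i not in truly_low)
    return wrong / k
```

Calling `misclassification_rate(0.3)` and `misclassification_rate(0.9)` compares a measure built on little information per teacher with one built on much more. Under these assumptions the less reliable measure flags a larger share of teachers who do not belong in the bottom group, which is the sense in which sparse district-level data invites high error rates.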
If the school is potentially rich in information about teacher effectiveness and teacher improvement, it seems to follow that key personnel decisions should be located firmly at the school level. In the effective schools described earlier, the school principal has considerable authority in making these decisions. Yet we must face the reality that school leadership is highly variable in quality. If the school is to become the locus of dense information and sound decisions, we must adopt a strategy for improving school leadership.
The question of theory looms large. A vision of educational improvement rooted solely in improving state and district use of measures of teacher effectiveness seems limited. This vision overstates what state and district officials can learn from data and fails to incorporate the lessons of a broad range of research on school effectiveness and instructional improvement. A more powerful theory would hold the multiple levels of the education bureaucracy accountable for the work each level can realistically perform. States can develop standards and assessments. Districts can put in place coherent curricula and staff development to support them, while holding schools accountable for implementation and student outcomes. If schools are accountable for their outcomes, then expert teachers have an incentive to share their expertise with novice teachers, and school leaders (including expert teachers) will have an incentive to hire and retain a strong faculty. If all teachers have an interest in the overall quality of the school, teachers have an interest in holding themselves and their peers accountable for doing their best. This sense of collective efficacy seems to be a key feature of the highly effective schools described earlier.
My aim is not to propose a developed theory but, by sketching how the various levels of the education hierarchy might collaborate, to question a simplistic notion of how policy makers might make effective use of research. Research on value added has no implications for action in isolation from other research about effective schooling because, as in any research program, the narrow conditions that make value-added research convincing limit its direct applicability in practice. However, in combination with a wide range of related research and a coherent theory of action, research on teacher effectiveness indicators can increase the potential for educational improvement.
