Abstract
We studied how six high-performing, high-poverty schools in one large Massachusetts city implemented the state’s new teacher evaluation policy. The sample includes traditional, turnaround, restart, and charter schools, each of which had received the state’s highest accountability rating. We sought to learn how these successful schools approached teacher evaluation, including classroom observations, feedback, and summative ratings. We interviewed 142 teachers and administrators and analyzed data using sensemaking theory, which considers how individuals’ knowledge and beliefs, the context in which they work, and the policy stimuli they encounter affect implementation. All schools prioritized the goal of developing their teachers over holding them accountable. The spillover effects of additional policies affected how these schools approached implementation.
The problems with traditional teacher evaluation had been well documented before Race to the Top (RTTT). Teachers were unevenly observed and intermittently evaluated, and meaningful feedback was in short supply. Most teachers received the highest rating despite evidence that many of their students were failing, and few teachers gained tenure or were dismissed on the basis of performance (Donaldson, 2009; McLaughlin & Pfeifer, 1988; Toch & Rothman, 2008; Weisberg et al., 2009). In response, policymakers, the media, and reformers turned to teacher evaluation as a means to improve schools.
Analysts and advocates who acted as “policy entrepreneurs” (Kingdon, 2002) when these teacher evaluation policies were being enacted typically marshaled evidence on behalf of a particular definition of the problem that they thought evaluation policy should address. Some focused on the need for greater professional accountability and criticized school officials for failing to hold teachers responsible for their performance or dismissing those who were ineffective (Thomas, Wingert, Conant, & Register, 2010; Weisberg et al., 2009). Others faulted current evaluation systems for not supporting teachers’ development, instead offering only cursory observations and shallow, irrelevant feedback (Curtis, Weiner, & Aspen, 2012; von Frank, 2013; Weingarten, n.d.). RTTT guidelines called for states to address both development and accountability for teachers by creating “rigorous, transparent, and fair” annual teacher evaluation systems that would “include timely and constructive feedback,” “provide teachers with data on student achievement growth,” “differentiate effectiveness using multiple ratings,” and be used “to inform decisions about staff development, compensation, promotion, tenure, certification, and removal of ineffective teachers” (Institute for Education Science [IES], 2014, p. 5).
Once new evaluation policies were adopted, analysts differed in their descriptions and assessments of them. For example, Steinberg and Donaldson (2016), who reviewed all 46 new policies, concluded that most took “a developmental stance towards evaluation” by requiring a summative rating to be linked to professional development for teachers, including those judged to be underperforming (p. 350). In contrast, the federal IES (2014) provided convincing evidence of movement toward accountability in the new laws, reporting that the policies that most aligned with RTTT priorities “focused on using multiple measures to evaluate teacher performance (30 states); using multiple rating categories to classify teacher performance (31 states) and conducting annual evaluations (25 states)” (p. 1).
However, new laws and regulations alone will not determine whether evaluation reform ultimately leads to better development and/or more accountability for teachers. Decades of research about policy implementation show that changes in policy do not necessarily produce intended changes in practice (McDonnell & Weatherford, 2013; McLaughlin, 1987; Pressman & Wildavsky, 1973). In the case of teacher evaluation, the financial inducements and mandates of RTTT, which exerted such strong influence on state-level decision makers, might not hold sway in local districts and schools, especially in states with less centralized education systems. Local officials might disregard the state’s requirements, feign endorsement, or fail to fund new programs. Even if district officials were to support the state’s agenda, school-based administrators might not be able or willing to implement it. As Lipsky (2010) explains, it is “street-level bureaucrats,” such as principals and teachers, who decide the ultimate fate of policies enacted by higher levels of government. Moreover, because policy entrepreneurs and policymakers at the federal and state levels promoted different problem definitions during their debates and subsequent drafting of laws and regulations, principals implementing new evaluations might favor one or the other problem definition (lack of accountability or development), and adopt different practices in response. Ultimately, to understand whether and how new teacher evaluation policies affect teachers and their work, we must investigate day-to-day responses by those within the schools.
Therefore, this research addresses the current lack of knowledge about the process by which principals and teachers together implement teacher evaluation. We focus on three key aspects of evaluation—classroom observations, follow-up feedback, and summative ratings of teachers’ instruction. We explore how the educators in these schools interpreted and acted on the new state policy’s opportunities and requirements and, overall, whether they used evaluation to promote greater accountability, more opportunities for development, or both.
Our study makes three important substantive and theoretical contributions to the literature. First, it analyzes implementation practices within the context of particular school work environments, which have been found to significantly influence teachers’ improvement over time (Kraft & Papay, 2014). Most earlier studies of teacher evaluation, which we review below, focus on principals’ approaches or teachers’ responses to evaluation. Although a few report on the responses of both groups, they do not examine their interaction or account for school-specific contexts. The only exception is our own earlier study of evaluation practices in six schools of various achievement levels within one urban school district (Reinhorn & Johnson, 2014). In the current analysis, we examine how principals and teachers within schools approached evaluation and how, in the context of their particular professional environment, they interacted to shape the character and impact of evaluation.
Second, the sample of schools that we selected enabled us to make valuable contributions to the literature on teacher evaluation and policy implementation. To identify promising practices in the most challenging environments, we chose to study only high-poverty schools that had demonstrated success, based on both the state’s accountability system and public reputation. We selected six public schools located in one city, where students from high-poverty communities attended a wide range of traditional and charter schools as well as several schools under state supervision. Although all schools were required to implement the state policy on teacher evaluation, each also was subject to different sets of laws and regulations and, thus, operated in a distinct policy context. This feature of the sample enabled us to detect important spillover effects of other state and local policies during the process of implementing the new teacher evaluation law.
Third, our use of sensemaking theory allowed us to illuminate the complex process by which those in the schools implement evaluation policy. Although earlier research about evaluation rarely used sensemaking theory, findings from prior studies discussed below suggest that the practices documented by scholars were substantially affected by sensemaking, including administrators’ beliefs about the purpose of evaluation and its potential to improve instruction (Kraft & Gilmour, 2016; Reinhorn & Johnson, 2014), the capacity and commitment of evaluators to provide accurate assessments and useful feedback (Halverson, Kelley, & Kimball, 2004; Kraft & Gilmour, 2016; Rigby, 2015; Taylor & Tyler, 2012), teachers’ views about the value of the process (Donaldson & Peske, 2010; O’Pry & Schumacher, 2012; Sartain, Stoelinga, & Brown, 2011), and the institutional conditions that enabled or constrained those implementing the policy to follow through on their intentions (Donaldson & Papay, 2015; Drake et al., 2015; Firestone, Nordin, Shcherbakov, Kirova, & Blitz, 2014).
Relying on Spillane, Reiser, and Reimer’s (2002) formulation of sensemaking, we considered how individual cognition, situated cognition, and policy stimuli influenced participants’ responses to the policy. Informed and guided by this theory, our analysis revealed how evaluators and teachers in these successful schools shared many of the same beliefs about the purpose and best practices for evaluation (individual cognition), yet experienced and responded to evaluation policy from different contexts (situated cognition). Principals were influenced by their overall assessment of teachers’ strengths and needs within their school as well as by the surrounding requirements and resources provided by the district and the state. For their part, teachers responded to a context that was largely determined by the priorities, skills, and beliefs of their principal. By distinguishing among these contexts and the responses they evoked and provoked among the two key groups of participants in the process, we demonstrate the value and versatility of sensemaking theory in explaining complex processes of policy implementation.
Overall, we found that, despite important differences among the six successful schools we studied (e.g., size, curriculum and pedagogy, student discipline codes), administrators responded to the state evaluation policy in remarkably similar ways, giving priority to the goal of development over accountability. Most schools not only complied with the new regulations of the law but also went beyond them to provide teachers with more frequent observations, feedback, and support than the policy required. Teachers widely corroborated their principal’s reports that evaluation in their school was meant to improve their performance, and they strongly endorsed that priority. In fact, some, including those with considerable teaching experience, suggested changes that might lead to more frequent and more content-specific feedback. However, we also learned that, because the schools operated in distinct policy contexts, they were differentially affected by the spillover effects of additional policies, which created different opportunities and constraints during implementation.
Theoretical Framework
Sensemaking theory, grounded in the scholarship of cognitive and social psychologists such as Weick (1995) and Greeno (1998), provides an informative perspective for understanding the complex ways in which individual and social cognition shape a policy’s ultimate impact within schools and classrooms (Coburn, 2001, 2005; Spillane et al., 2002). As Spillane et al. (2002) observe, sensemaking has been used largely to explain the failed outcomes of policy implementation, which some earlier scholars had attributed to opposition by implementing agents. However, as Spillane and his colleagues observe, it is incorrect to portray those who fail to implement policies with fidelity as “resisters and saboteurs working to circumvent policy proposals that do not advance their self-interest” (p. 391). They suggest that “most conventional theories fail to take account of the complexity of human sensemaking” (p. 391).
Rather than assuming that implementers deliberately dodge policymakers’ intentions, scholars who use sensemaking theory seek to understand how these individuals’ beliefs and mental models influence what they do and how they do it. In education, most studies that rely on sensemaking theory focus on curriculum and instruction (Coburn, 2005; Cohen, 1990; Cohen & Hill, 2001) or related topics, such as data analysis by teachers (Bertrand & Marsh, 2015) and schools’ efforts to improve when they are under sanction (Anagnostopoulos & Rutledge, 2007; Finnigan & Daly, 2012). With very few exceptions (e.g., Halverson et al., 2004; Rigby, 2015), sensemaking theory has not been used systematically to analyze the implementation of policies designed to improve the performance of the teaching force. Yet, it has great promise for doing so.
Sensemaking theory allows us to go beyond findings about compliance and noncompliance and to learn how those who interpret and act on policies are influenced by their beliefs and mental models drawn from prior experience, the social and institutional context that shapes the options and obligations that they have, and the expectations of policymakers and others who highlight the purposes of the policy as it is promulgated.
Spillane et al. (2002) identify three components of sensemaking that together determine what educators do in response to reform initiatives. As they explain, their framework is designed “to make transparent the cognitive component of the implementation process by identifying a set of constructs and the relations among those constructs” (p. 388). The first construct, “individual cognition,” designates the ways in which individual implementers “notice and interpret stimuli and how prior knowledge, beliefs, and experiences influence construction of new understanding.” The second, “situated cognition,” includes elements of the implementers’ social and organizational context that affect their responses. The third, “policy stimuli,” refers to messages and signals that implementers receive about policymakers’ purposes and how practitioners ought to respond to them (pp. 388–389).
Using this framework, we can see how an individual principal responsible for teacher evaluation might be influenced by these three components of sensemaking. First, in the past, the principal might have developed skills in supervising teachers and, therefore, might believe strongly in the potential of evaluation to improve teachers’ practice. However, such skills and beliefs alone are not likely to determine this principal’s actions because her work is embedded in the social and institutional context of a state education agency and a school district, which set standards for how she should implement the policy and provide (or fail to provide) resources to support her efforts. Second, therefore, this principal’s understanding of the opportunities and constraints that her context affords (her situated cognition) further shapes her response. Third, she will be aware of the messages conveyed by policymakers in promulgating the law and its regulations. Those policy stimuli might give priority to the goal of achieving greater accountability or providing more support for teachers’ development, or both. Given explicit, and sometimes competing, priorities promoted by policy entrepreneurs and policymakers at several levels, this principal might respond to one message, while discounting others. Together, these three sources of influence contribute to the principal’s response.
Teachers, too, bring to implementation prior knowledge and beliefs about the worth of evaluation; they are influenced by their particular school and district context; and they respond to policy stimuli in the media or memoranda from state or local officials. For example, teachers might welcome being observed by a knowledgeable, constructive critic (individual cognition) but doubt that their principal has the time to visit their classrooms and complete the required number of observations (situated cognition). Furthermore, they may hear conflicting messages about the new policy’s goal (policy stimuli). Therefore, they might be unlikely to invest much hope or confidence in the new evaluation process, instead rejecting it, passively complying, or taking a wait-and-see attitude.
If research is to inform practice and subsequent policymaking, as well as future research, we must acknowledge this complexity and turn our attention to those in schools and classrooms. This is complicated terrain for researchers because interactions occur not only among the three components that influence individual actors, but also among the actors themselves. However, by better understanding this interaction as principals and teachers together implement evaluation policy, we can also deepen our use of sensemaking theory in research.
Therefore, for this study, we adopted an inductive, bottom-up approach to data collection and analysis, considering whether the six successful schools of our sample were implementing the new policy; how evaluation was being conducted within each school; whether the school’s educators focused on goals of development, accountability, or both; and how administrators and teachers viewed the strengths and weaknesses of their evaluation process. Throughout our analysis, we noted where participants’ views aligned, conflicted, or diverged. When their views differed, we used sensemaking theory to identify and understand the source of those differences. We recognized that, because each school functioned in a distinct policy context as it implemented the new evaluation policy, unintended spillover effects of additional state and local policies might further affect what the schools could and would do in supervising and evaluating teachers, and we sought to investigate those.
Literature Review
Over a decade ago, value-added analyses documented wide variation in teachers’ effectiveness within schools (Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004), suggesting that administrators did not routinely base employment decisions (reappointment, tenure, and dismissal) on evidence about teachers’ performance. Since that time, researchers, analysts, and the media have attributed this weak role of evaluation to various factors in the assessment process—limitations of the instruments used, irregular and incomplete teacher observations, uninformed summative judgments, and administrators’ reluctance to dismiss teachers in the face of political opposition (Donaldson, 2009; McLaughlin & Pfeifer, 1988; Toch & Rothman, 2008; Weisberg et al., 2009).
Much recent research about evaluation has focused on factors that might support or compromise the use of evaluation to achieve greater accountability, such as how teachers’ instruction is assessed and whether evaluators’ ratings are valid and reliable. In their influential 3-year Measures of Effective Teaching Project (Cantrell & Kane, 2013), researchers found that combining multiple measures—standards-based observations, student surveys, and student achievement gains—yields the best estimates of teacher effectiveness, as measured by standardized tests. However, studies included in this project did not investigate how the perspectives of principals and teachers influenced their approaches and responses to evaluation, specifically, how those implementing the process shaped its use and outcomes.
A few studies conducted in local districts and charter management organizations (CMOs) before the new laws were enacted provide insight into both principals’ approaches and teachers’ responses to evaluation. In Cincinnati, Taylor and Tyler (2012) examined the effects on individual teachers’ instruction of being evaluated by trained and experienced evaluators—including administrators and peers—who combined “multiple, detailed classroom observations and a review of work products” with face-to-face feedback (p. 2). Midcareer math teachers’ effectiveness improved and those gains persisted, and even increased, several years after the evaluation cycle. Although the authors did not conduct interviews that might have illuminated those findings, they suggest that these positive effects were due to the rich feedback and encouragement that teachers received.
Donaldson and Papay (2015) interviewed 95 administrators and teachers in New Haven, Connecticut, during the second year of implementing TEVAL, a locally negotiated system in which evaluators assessed teachers based on classroom observations and students’ academic growth. TEVAL offered support for ineffective teachers’ development and, if that failed, a clear pathway to dismissal. Prior to TEVAL, no tenured New Haven teachers had been dismissed within recent years for poor performance. Two years after its enactment, all teachers who were notified that they would be dismissed chose to resign (1% of tenured teachers, 3% of nontenured teachers), suggesting that changes in the context and explicit policy stimuli that emerged during negotiations influenced both principals’ and teachers’ responses. The policy clearly increased accountability for teachers’ performance in the district, although teachers reported that it did not lead them to change their instruction.
When Donaldson and Peske (2010) interviewed teachers about evaluation in five charter schools of three prominent CMOs, teachers voiced very positive views about weekly or biweekly observations and follow-up coaching sessions with their evaluator. Evaluation in these schools served primarily as a process for professional growth, with far less emphasis on summative assessment and accountability. Dobbie and Fryer (2011), who studied a number of practices in 15 New York City charter schools, found benefits for students in frequent observations. Those whose teachers received formal and informal feedback 10 times or more per semester showed higher learning gains than the students of other teachers. These researchers did not investigate how the participants’ expectations and understandings affected what they did.
In several studies, scholars explored how using a standards-based instrument influenced both principals’ actions and their teachers’ responses. Notably, they found wide variation across schools. For example, Halverson et al. (2004), who interviewed principals and teachers in 14 schools of one large school district, found that, although most principals wanted to make the evaluation system work, they found it “very time consuming” and lacked the skills needed to provide worthwhile feedback, especially for “accomplished teachers” (p. 177). Teachers generally appreciated the feedback they received. However, implementation “varied substantially from school to school,” depending on the principals’ understanding of their role, their context, and the standards-based instrument they were required to use (p. 179). Kimball and Milanowski (2009) interviewed two subgroups of principals, whose ratings of teachers correlated either well or poorly with their students’ achievement. Although the researchers sought to understand whether differences in principals’ motivation, knowledge, skill, and/or school context explained differences in the validity of their ratings, they found no clear explanation. Sartain and colleagues (2011), who interviewed 37 Chicago principals, all trained to conduct observations using a standards-based framework, concluded that the improved evaluation tools and training could support principals in conducting reliable assessments and engaging teachers in reflective, developmental conversations. Subsequent case studies suggested that, from the teacher’s perspective, the benefits of the evaluation process depended less on their principal’s having strong knowledge of the evaluation framework than on having well-developed coaching skills and high engagement in the process. 
Similarly, O’Pry and Schumacher (2012) interviewed new teachers in Houston and found that their views of evaluation were determined largely by the value they thought their principal placed in the process. Rigby (2015) studied the factors that influenced six first-year principals’ understanding of their role in teacher evaluation and found that they responded to a variety of influences, including messages about instructional leadership conveyed by their preparation programs.
Seeking further insight into these findings about the variety of principals’ approaches to teacher evaluation, Kraft and Gilmour (2016) interviewed 24 principals in one large urban district about how they conducted observations and feedback. Participants said that, in feedback sessions with teachers, they focused largely on the evidence that supported their ratings but that they had little to suggest about what teachers should do to improve. This suggests that principals who are inclined to promote teachers’ development may not be sure how to do so.
In 2014, we analyzed teacher evaluation as part of a larger study of school-based human capital practices. The sample for that study included six low-income schools with varying levels of student achievement, all part of one large urban district (Reinhorn & Johnson, 2014). We found wide variation in principals’ approaches and teachers’ responses to teacher evaluation, even though the principals were using a common standards-based instrument and receiving the same memoranda and training from the central office. One principal, whose primary goal was to increase accountability, concentrated on building dismissal cases for a few weak teachers. Four others treated evaluation largely as an inconsequential, bureaucratic obligation, which, therefore, had little impact on either teachers’ development or accountability. In only one school did the principal integrate formative and summative assessment in a process that promoted both development and accountability. Notably, teachers in that school reported that their experience with evaluation helped them to improve their instruction.
Together, these studies, which focused on how evaluation practices were conducted in single districts or schools, reveal the wide range of principals’ practices and teachers’ responses. Recently, with the enactment of new state laws, researchers have expanded their perspective to explore how local district administrators and school leaders interpret and implement a single policy.
Early Evidence About the Implementation of New Evaluation Policies
Preliminary studies of new state teacher evaluation policies find similar patterns of response across a number of states and large districts. Although that research does not examine school-based practices, it does offer initial insight into how principals are responding to the new laws. It suggests that principals tend to focus primarily on development rather than accountability and that contextual conditions, especially constraints on administrators’ time, affect what they do.
For example, Donaldson and Cobb (2015) studied the pilot implementation of Connecticut’s new evaluation policy in 14 districts and learned that principals had trouble finding time to complete the six required observations (three formal, three informal) for all teachers. Still, teachers appreciated receiving more feedback than they had in the past. In a similar study in New Jersey (Firestone et al., 2014), principals also reported that their state’s requirements for observations exceeded what they could complete. Whereas the Connecticut teachers interviewed by Donaldson and Cobb generally were optimistic about what their new system might provide, many New Jersey teachers voiced distrust and concern about their policy’s threat to job security, possibly in response to policy stimuli from the state that stressed the goal of accountability over development. Only about one third of New Jersey teachers said that the process “helped them improve some aspect of their teaching” (p. 22).
Drake et al. (2015), who interviewed central office personnel in five large, urban school systems located in different states, learned from district administrators that principals concentrated first on developing teachers throughout the school year and only subsequently focused on dismissing those who did not improve. Dismissal, they found, was a “byproduct of a long support process” (p. 119). However, they also reported that the time required to complete the dismissal process was a barrier to principals’ pursuing it, indicating that this constraint may have affected how principals chose to spend their time.
Across all the studies discussed above, we have found only two analyses of teacher evaluation conducted before or after RTTT—Halverson et al. (2004) and Rigby (2015)—where scholars used sensemaking theory to interpret data. However, we can see in the findings of other studies evidence of the prominent role that cognition plays in what principals do, what teachers expect, and how they respond. For example, we see how individual cognition affects the process. Principals vary in their ability to observe and assess teachers’ instruction, as well as in their confidence about offering advice for improvement. Whatever their beliefs about the potential promise of a good evaluation system (individual cognition), teachers assess their principal’s goals, how well he seems to understand instruction, and whether he can offer constructive feedback and good support (situated cognition). Principals, too, are influenced by the opportunities and constraints they perceive in their context (situated cognition), including the strengths and weaknesses of their teachers, as they decide where to focus their efforts and commit their limited time. Finally, there is evidence about the effects of policy stimuli on principals’ and teachers’ views about whether evaluation should support development, increase accountability, or both.
Therefore, in our effort to better understand how evaluation functions in schools that successfully serve students from high-poverty communities, we chose to use sensemaking theory because it could illuminate the views and activities of principals and teachers as they implemented the new law together. We began our research aware that administrators’ and teachers’ responses showed great variation across earlier studies, which seemed to result from differences in both individual and situated cognition. We expected to find comparable variation in our sample, given that each school was affected not only by the new state evaluation policy but also by additional state policies that established work rules in traditional schools, regulated accountability, or authorized charter status. This variety meant that principals’ practices were shaped by particular combinations of individual cognition, situated cognition, and policy stimuli. As expected, important differences in these schools’ day-to-day evaluation practices did emerge, largely due to the spillover effects of other state and local policies, which granted some administrators considerably more resources and management discretion than others. Overall, however, we were surprised to find more similarity than variation across the sample.
Method
This article is based on a qualitative, comparative analysis (Maxwell, 1996) of data drawn from a larger study examining how six high-performing, high-poverty, urban schools attract, develop, and retain teachers. Here we focus on teachers’ and administrators’ approaches to and experiences with teacher evaluation in their school. We ask the following:
Sample of Schools
Evaluation policies are intended to serve all students, but especially those who have been ill-served by public education, including those living in high-poverty, high-minority communities. Therefore, we decided to study implementation of the new state policy in a single urban center, Walker City, where the effects of local factors, such as cost of living, housing patterns, and community resources would be roughly comparable across schools. (All names used in this article are pseudonyms.) As in many large U.S. cities, students from high-poverty, high-minority communities in Walker City attend a wide selection of public schools, some of which function independent of the Walker City School District (WCSD). Therefore, as we developed our sample, we considered all publicly funded schools within the city limits. Because we initiated this study in the hope of identifying promising or exemplary practices that other school leaders might learn from, we decided to study only schools that were successfully educating their students, rather than a more typical assortment of successful and unsuccessful schools. Therefore, we considered only schools that had achieved the highest rating in the state’s accountability system and were widely viewed as high performing. Although we recognized that the state’s substantial reliance on standardized test scores in rating schools raised legitimate questions about the validity of its decisions about the schools’ standings, these ratings were the best proxy available for identifying schools that were having a positive impact on students’ academic progress.
In seeking to identify elementary and middle schools within Walker City that met our criteria (high-poverty, high-minority student populations and the highest level of academic achievement in the state’s accountability system), we found a relatively short list of candidates. However, among these district and charter schools, even further variety existed because each was also affected by a particular set of policies, including local school board and administrative requirements, the WCSD teachers contract, state accountability regulations, and state charter laws, which further influenced principals’ options and decisions. Therefore, as we selected our sample of schools that successfully served low-income students, we also sought to incorporate variation in school-based policy contexts. We drew up a proposed sample, including one traditional district school, two former turnaround schools, one restart school, and two state charter schools. We asked the heads of these six schools to participate in the study and all agreed. (For descriptive statistics of the sample schools, see Table 1.)
Table 1. Selected Characteristics of Six Sample Schools
Note. Percentages are approximated for confidentiality purposes.
Before we began data collection, we were generally familiar with these schools’ history, organization, and curriculum, but we knew nothing about how the administrators viewed or conducted evaluation. The purposive nature of our sample allows us to examine the practices of this atypical, but informative, set of schools and to consider the implications of their practices and our findings for others, but it does not permit causal inferences or generalizations beyond the sample.
Data Collection
To understand how principals and teachers interpreted and implemented the evaluation policy, we conducted interviews, analyzed documents, and informally observed in the schools. Between March and June 2014, we conducted 142 semistructured interviews with teachers, administrators, and other staff, including curriculum coaches and program coordinators. We solicited teachers’ participation by email and flyers and followed up on recommendations from those we interviewed about others we should contact. We interviewed between 33% and 56% of the teachers at each school, depending on its size and complexity. (For descriptive statistics about the interviewees, see Appendix A, available in the online version of the journal.) We used semistructured protocols to guide our interviews so that our data would be comparable across sites and across interviewers, while permitting participants to pursue the topics that they thought important. We asked participants about an array of practices that might affect teachers’ work; a major topic of inquiry was teacher evaluation. We promised participants confidentiality and anonymity. All interviews were recorded and transcribed. In the course of visiting the schools to conduct interviews, we informally observed practices in classrooms, corridors, and offices, which we recorded in field notes. We also gathered and analyzed relevant documents, including teacher evaluation frameworks and rubrics; teacher handbooks; school, district, and state policies; and examples of observation feedback to teachers.
Data Analysis and Validity
Before closely analyzing our interview data, we wrote structured thematic summaries following all interviews and used them to identify common themes and differences within and across the sites. We then transcribed all interviews. In developing topical codes for analyzing our data (Miles & Huberman, 1994), we supplemented the etic codes drawn from the literature (e.g., “AdminTeach” for quotes referring to the relationship between administrators and teachers) with emic codes that emerged from the data (e.g., “Demands” for quotes about teachers’ professional responsibilities within the school). The broader study included a wide range of human capital practices at these schools, which led us to create 45 topical codes. Twenty-nine of those are relevant to this analysis of teacher evaluation and are noted in Appendix B, available in the online version of the journal.
To ensure that we would use the topical codes consistently, we independently coded a subset of transcripts and then met to compare and discuss our coding decisions. Through several rounds of this iterative process, we arrived at a detailed understanding of each code and how we would use it. Because we were analyzing data from semistructured interviews rather than structured surveys, several codes often applied to the same data segment. For example, a teacher’s description of a meeting with her evaluator to debrief a classroom observation might be tagged with the codes AdminTeach, CurriculumPedagogy, Eval, and StudentTeach. We used the software Dedoose, which allowed us to attach multiple codes to a single segment of interview data and then to systematically review participants’ responses, not only by topic but also by descriptors such as role, school, gender, and years of teaching experience. See Appendix B, available in the online version of the journal, for a complete list of descriptors.
Based on coded interview data, we then created analytic matrices (Miles & Huberman, 1994) so that we could closely examine our emerging findings about practices within and across schools. For example, we sought to understand whether administrators and teachers within schools thought that evaluation was being used for the purpose of accountability, development, or both. For each school, we organized the relevant interview segments by role (administrators or teachers) and by purpose (accountability, development, or both). Across all six schools, virtually all participants focused on development as the primary enacted purpose of evaluation; many fewer identified accountability as a secondary purpose. Based on this analysis, we concluded that participants across all schools widely viewed the purpose of teacher evaluation to be improving teachers’ instruction.
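As a rough illustration of the matrix logic described above, the following minimal sketch tallies coded segments for one school by role and by purpose. The school labels, the sample segments, and the function name `build_matrix` are hypothetical and do not come from the study data or from Dedoose; the sketch shows only the general form of such a grid, not the procedure we actually followed.

```python
from collections import Counter

# Hypothetical coded segments for illustration only (not study data).
segments = [
    {"school": "School A", "role": "teacher", "purpose": "development"},
    {"school": "School A", "role": "administrator", "purpose": "development"},
    {"school": "School A", "role": "teacher", "purpose": "both"},
    {"school": "School B", "role": "teacher", "purpose": "development"},
    {"school": "School B", "role": "administrator", "purpose": "accountability"},
]

ROLES = ("administrator", "teacher")
PURPOSES = ("accountability", "development", "both")


def build_matrix(coded_segments, school):
    """Count one school's segments in a role-by-purpose grid."""
    counts = Counter(
        (seg["role"], seg["purpose"])
        for seg in coded_segments
        if seg["school"] == school
    )
    # Fill every cell so that empty combinations appear as zero.
    return {
        role: {purpose: counts.get((role, purpose), 0) for purpose in PURPOSES}
        for role in ROLES
    }


matrix_a = build_matrix(segments, "School A")
for role, row in matrix_a.items():
    print(role, row)
```

A grid of this kind makes it easy to compare, cell by cell, how often each role group described each purpose; in a qualitative analysis the cells would hold the interview excerpts themselves rather than counts.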
In addition to identifying similarities and differences in participants’ accounts of evaluation practices across role and school, we wanted to understand why they responded to the new policies as they did. What evidence was there that they interpreted, implemented, and responded to the evaluation policy based on their own knowledge and beliefs (individual cognition), the school and district context in which they worked (situated cognition), and/or their interpretations of policymakers’ intentions in enacting the policy (policy stimuli)? Therefore, we used these theoretical constructs of sensemaking to further hand-code the data. Such codes were not mutually exclusive. For example, a teacher’s response to a question about whether she found her evaluator’s feedback helpful might include her views about what might be better (individual cognition), a judgment about whether her evaluator had experience teaching her subject (situated cognition), and what she thought the policy was supposed to accomplish (policy stimuli). To illustrate our analytic process, we include in Appendix C, available in the online version of the journal, a small matrix from one school, comparing several responses by administrators and teachers about the purposes of evaluation, which are further organized to illustrate the relevant constructs of sensemaking.
In reporting our findings below, we have been as precise as possible about the extent to which they apply across our sample. When the data permit, we state clearly whether a finding applies to all six schools or only some, noting the specific exceptions. At other times, we specify the number of individuals who mentioned or discussed a particular idea, to demonstrate the extent of the findings, which were often surprisingly consistent or notably rare. However, responses cannot always be quantified. Because we conducted semistructured interviews, in which we posed an initial set of questions to all participants and then allowed them to offer various descriptions and explanations in response, not all participants commented on every topic we discuss here and some offered additional observations. For example, although we can confidently say that 91 of the 97 teachers provided a positive assessment of the evaluation practices in their school, 11 of those 91 went on to suggest that this process could be improved with more observations and/or a better match between teacher and evaluator by subject area. Notably, the comments from these 11 interviewees were spontaneous suggestions; other participants might have held the same views, but did not express them. As we present and discuss our findings, we explain in as much detail as our data allow the extent to which they apply in this particular sample.
The Massachusetts Evaluation Policy
The Massachusetts teacher evaluation policy was initially developed in 2011 by a 41-member Task Force, which included public education administrators, teachers, and representatives from universities, foundations, business, unions, and nonprofit agencies. In its report (Massachusetts Task Force on the Evaluation of Teachers & Administrators, 2011), the Task Force explains that it sought to “transform educator evaluation from an inconsistently applied compliance mechanism into a statewide catalyst for educator development and continuous professional growth” (p. 5). The policy’s five-step cycle for continuous improvement called for teachers to set “specific, actionable and measurable” goals for improving their practice and students’ learning. Subsequent state regulations gave evaluators the final say in what the teachers’ goals would be and required them to conduct midyear formative and end-of-year summative assessments, selecting from four ratings for each of the four standards of professional practice, an assessment of progress toward meeting their goals, and an overall rating.
In recommending the report to the State Board, the Commissioner of Education urged that in establishing regulations the Board should balance the proposed policy’s focus on teachers’ development with the schools’ obligation to ensure accountability—to “dismiss educators who, despite the opportunity [to improve], continue weak performance” (Chester, 2011, p. 6). Thus, the state’s policy stimuli signaled the importance of both development and accountability.
Following the policy’s enactment, state officials conducted an ambitious program of information and support for districts and schools as they implemented the policy. They created a comprehensive Model System incorporating the statewide standards for a teacher’s effectiveness in four areas (curriculum, planning, and assessment; teaching all students; family and community engagement; and contribution to professional culture). In response to the policy’s requirement that schools use “a rigorous and comprehensive” rubric, the Model System provided rubrics for each standard, describing practice in detail at each level of effectiveness (proficient, exemplary, needs improvement, and unsatisfactory). Districts were also required to incorporate multiple measures of effectiveness in their assessments, including standardized test scores, although that feature of the evaluation system was not yet required during the year that we conducted our study and, therefore, we did not include it. In partnership with a statewide teachers union, state officials also developed model contract language, which local districts could use to facilitate efficient and effective collective bargaining about evaluation. State-level education staff also created templates, planning and implementation guides, and other resources that districts and schools could use as they enacted the five components of the evaluation cycle: self-assessment, goal setting and educator plan development, evidence collection, formative evaluation, and summative evaluation. The state agency then created an extensive website that made these resources readily available online (http://www.doe.mass.edu/edeval/model/), and professional staff conducted a series of webinars for teachers and administrators. In implementing the new Massachusetts teacher evaluation policy, districts and schools could “adopt or adapt” the state’s comprehensive Model System or “revise” their existing evaluation system to meet the new regulations.
Importantly, the new evaluation policy was not the only policy that influenced how principals implemented evaluation. After the state introduced the Massachusetts Comprehensive Assessment System (MCAS) in 1993, it took an increasingly active role in monitoring school performance. In 2012, following RTTT, it began to rate schools and districts from Level 1 (highest performing) to Level 5 (lowest performing). The commissioner could designate chronically low-performing schools at Level 3 or 4 for turnaround, restart, or closure. If a turnaround school failed to improve, it could be placed at Level 5, triggering state receivership.
Most schools worked hard to avoid public censure for receiving a low rating or being designated a turnaround or restart school. However, if the state intervened, the process of reopening the school with a new principal and newly constituted faculty provided flexibility and resources that other schools did not have. Principals not only had the right to hire, fire, or transfer their teachers but also had discretion in allocating teachers’ time. Furthermore, a school in which the state had intervened could apply for additional funding through federal grants, which these schools won and then used to expand instruction, employ additional staff (including administrators), and/or fund more professional development time for teachers. State-sponsored charter schools had the authority to hire and fire staff and allocate their time, much as principals of turnaround and restart schools could do. They also could extend the school day and year. Furthermore, state charter schools could rely on their board of trustees for fund-raising that would supplement their allocation from the state.
One City, Different Policies
Despite their proximity within Walker City and the fact that they were required to adhere to the same evaluation policy, each of the six schools functioned in a distinct policy context.
One Traditional District School
Dickinson School PK–5, a century-old neighborhood school, served a largely immigrant student population. Well regarded within WCSD, Dickinson experienced very low teacher turnover; in 2014, more than half of Dickinson’s teachers had taught there for more than 20 years. Dickinson’s Principal Davila, the school’s sole administrator, complied with the WCSD teachers contract, as well as other district and state policies. She had no special autonomy over staffing.
Two Turnaround Schools
In 2010, state officials intervened in both Hurston School (PK–8) and Fitzgerald School (PK–5) due to persistently poor performance. At the time, Hurston was functioning as a school with special status in WCSD, which gave the principal broad autonomy over staffing, curriculum, budget, and schedule. Nevertheless, the school was failing. Under RTTT guidelines, the newly appointed principals could replace all teachers, but could retain no more than half. Hurston’s Principal Hinds replaced about 80% and Fitzgerald’s Principal Forte replaced about 65%. Each school continued to enroll students from the same local community. By 2013, both had shown substantial growth on the MCAS, allowing them to exit turnaround status at Level 1 of the state’s accountability rankings.
After turnaround, both Hurston and Fitzgerald remained WCSD district schools, although each retained significant school-based control of its organization and management, making it possible to continue many of its initiatives. Hurston reverted to being a school with special status in WCSD, whereas Fitzgerald successfully applied to become a state innovation school within the district, which brought with it many of the management autonomies previously available during turnaround.
Two State Charter Schools
Naylor Charter School (K–8) and Rodriguez Charter School (PK–8) had opened in Walker City 10 and 20 years earlier, respectively. Each was responsible not only to the state but also to its own charter board. In 2014, Naylor was one of three schools in the expanding Naylor Charter Network. Although located within WCSD boundaries, these schools were exempt from local district policies. The state required both to meet accountability standards and the new requirements for teacher evaluation.
One Restart School
Kincaid Charter School (6–8) had been a failing WCSD middle school in 2011 when the state selected the Kincaid Charter Network, a local CMO, to restart the school consistent with RTTT guidelines. All the school’s teachers could reapply for positions in the new school, but few did and none was rehired. When it reopened, all administrators, teachers, and staff were new, although approximately 80% of the students returned, a higher proportion than typically had re-enrolled in the past. As a restart school, Kincaid functioned as an in-district charter school; the local union represented its teachers, whose pay aligned with WCSD’s negotiated scale. However, it was exempt from other contract provisions. Within 2 years, Kincaid made significant gains in student test scores and achieved a Level 1 rating from the state.
Findings
WCSD, and therefore its three schools (Dickinson, Fitzgerald, and Hurston), along with Rodriguez Charter, adopted the Model System. Kincaid Charter and Naylor Charter revised their existing process to meet the state’s new regulations. All schools used detailed, standards-based frameworks for observations and assessments. Teachers participated actively in setting individual goals for student performance and professional practice, and in completing self-assessments prior to formal evaluation meetings. Every teacher eventually received a summative rating at one of four levels of proficiency.
As the findings presented below explain, in implementing the policy, principals were strongly influenced by their own beliefs and knowledge about teacher evaluation and school improvement (individual cognition). However, their priorities and practices were also shaped by the broader context of laws and regulations described above as well as resources and training provided by their district or CMO (situated cognition). Finally, they were attentive to the national debate about evaluation, federal RTTT guidelines, and the Massachusetts law, guidelines, and support for implementation (policy stimuli). Although teachers had their own views about what evaluation could provide (individual cognition), they responded primarily to their principal’s skills and approach to evaluation, which established the context in which they experienced the new policy (situated cognition). Messages from education officials about the purpose of evaluation (policy stimuli) seemed to have far less prominence for them than what their principal conveyed through words and actions. Thus, sensemaking substantially influenced evaluation practices within each of these schools.
The Schools Focused Primarily on Development
In all six schools, administrators said the primary purpose of evaluation was to develop their teachers, many of whom they had hired. In fact, several suggested that developing teachers was their main responsibility as principal. For example, the head of the Naylor Charter Network explained that the Naylor Schools’ commitment to improvement called for frequent observations:
We do believe that our whole mission is to be a human capital organization. We are here to develop our kids. We are here to develop our teachers. We are here to develop our administrators. This is what we do and what we’re all about.
As a result, she said, Naylor’s administrators focused their time on conducting observations and providing feedback: “[W]e think that the most transformational thing is just being in people’s classrooms, talking with them afterwards.”
Across all six schools, when we asked about teacher evaluation, administrators began by describing their approach to formative, rather than summative, evaluation. An administrator at Kincaid Charter explained, “We believe that teachers, or just people in general, grow with immediate feedback and real-time instruction on how they are performing and giv[ing] them an opportunity to fix that in the moment.” Kincaid’s Principal Kain realized that some teachers would encounter more challenges than others, but he expressed confidence that those in his school had the expertise needed to support them: “If we have teachers who are struggling, it’s often times . . . rooted in a lack of skill. Our job as coaches is to help them with that.” Administrators in this sample provided detailed explanations about how, as one Fitzgerald evaluator said, they “coach [teachers] or find them the help they need.”
Virtually all the 97 teachers we interviewed (94%) confirmed their administrators’ accounts, explaining that evaluation focused primarily on promoting their growth. For example, a Naylor Charter teacher said she appreciated administrators at her school “continuing to develop [her] as a professional.” Her colleague said that teachers wanted to improve: “[I]n order to be an employee here, regardless of if you’re an academic teacher, co-curricular teacher, even a staff member, you need to want feedback . . . to get better . . . [to] help [our] kids.”
Teachers across all six schools said that their evaluation process was embedded in a professional culture that promoted continuous improvement. Many described their school much as this Rodriguez instructional coach did: “[T]here is a culture here that is about continually getting better . . . that means that every teacher, whether they’re getting feedback from an administrator or not, is trying to get better in their own practice.”
Observation and Feedback Practices Supported Development
Under the state’s Model System (Massachusetts Department of Elementary & Secondary Education, 2012), the number of required observations depended on the teacher’s summative rating in prior years. Principals had to observe a new teacher or a teacher who had been rated “unsatisfactory” once during an announced visit and four times during unannounced visits. A returning teacher with a history of “proficient” or “exemplary” ratings had to be observed only once, unannounced. A teacher who had received a “needs improvement” rating had to be observed at least twice, unannounced.
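The minimum observation requirements just described amount to a simple lookup by teacher status. The sketch below encodes only the counts stated in this paragraph; the function name `required_observations` and the status labels used as keys are hypothetical conveniences, not terms drawn from the Model System documents.

```python
# Minimum observations as summarized in the text: (announced, unannounced).
# The status labels are illustrative keys, not official categories.
MINIMUM_OBSERVATIONS = {
    "new": (1, 4),
    "unsatisfactory": (1, 4),
    "needs improvement": (0, 2),
    "proficient": (0, 1),
    "exemplary": (0, 1),
}


def required_observations(status):
    """Return the (announced, unannounced) minimum for a teacher status."""
    try:
        return MINIMUM_OBSERVATIONS[status]
    except KeyError:
        raise ValueError(f"Unknown teacher status: {status!r}") from None


for status in MINIMUM_OBSERVATIONS:
    announced, unannounced = required_observations(status)
    print(f"{status}: {announced} announced, {unannounced} unannounced")
```

Under this reading, a new teacher would need at least five observations in a year, whereas a returning teacher with a history of "proficient" ratings would need only one.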
In this sample, most teachers, regardless of their summative rating from the prior year, described a year-long intense cycle of observations, followed soon after by written or oral critique and recommendations. Approximately 40% of the 97 teachers we interviewed said they were observed and received feedback at least twice per month. Approximately 20% estimated that they were observed and given feedback between five and 10 times per year. The final 40% estimated that they had been observed one to four times per year, consistent with state and district policies. Although all schools met or exceeded the state’s recommendations, the frequency of observations varied within schools, with novice teachers and new hires being observed more often than others.
Kincaid Charter and Naylor Charter administrators expected that every teacher would be observed and provided with face-to-face feedback at least twice per month, and all teachers interviewed said that evaluators met, and often exceeded, that standard. At Hurston PK–8 and Rodriguez Charter, administrators aspired to observe every teacher and provide feedback at least once per month, although participants said their school did not have the resources to maintain that level of supervision for all teachers. All administrators routinely conducted “walk-throughs” for quick observations, often providing feedback after they did. Dickinson and Fitzgerald teachers said their principals spent a great deal of time in classrooms throughout the school, but most described receiving formal feedback no more than a few times per year, consistent with the Model System.
It is notable that principals in the six schools expressed similar beliefs about the benefits of frequent observation and feedback, which was not the case in studies summarized earlier, where principals’ views varied widely. Moreover, all principals in our study were, themselves, recognized for being strong, experienced teachers, and therefore, they brought to the process not only beliefs about the benefits of developing teachers but also knowledge and skills about how to do so.
Three schools (Naylor Charter, Kincaid Charter, and Hurston PK–8) had sufficient administrative resources so that principals could spend their time observing teachers and providing feedback, while other administrators in their school handled responsibilities such as student discipline, family interaction, and operations (e.g., building maintenance, bus schedules, data analysis, and budgeting). The director of operations at Hurston PK–8 explained, “My role has been to block and tackle so that [the evaluators] can spend their time in the classroom coaching teachers and at [teacher] team meetings.” Notably, however, Hurston’s administrative team was the same size as Kincaid’s, even though Hurston served twice as many students and teachers. Therefore, regardless of Principal Hinds’s intentions, Hurston’s evaluators could not provide the same level of intense supervision for all teachers as their counterparts at Kincaid and Naylor charter schools could. A school with no more than a principal and assistant principal—and in the case of Dickinson only a principal—could not reassign management responsibilities to others so that the principal could spend most of her time observing classes. Therefore, in addition to the principals’ knowledge about instruction and their beliefs about the benefits of evaluation (individual cognition), the realities of their context (situated cognition) affected what they could do. All principals in this study thought that observing and providing feedback to teachers fostered improvement. However, whatever their beliefs and intentions, principals with less administrative support coped with more demands and greater constraints on their time, affecting how they implemented the policy and how teachers responded to it.
Teachers’ Responses to Observation and Feedback Were Overwhelmingly Positive
Although some principals expressed concern about not being able to observe teachers more often, virtually all teachers (91 of 97) endorsed the observations and feedback they received as a positive part of their professional experience, and some went on to suggest ways to improve the process. Many praised their school’s current approach to evaluation with phrases such as “hugely helpful” or “super supported.” One Naylor teacher in her seventh year of teaching and her fifth at the school viewed “the constant feedback” she received as a highlight of her job: “I constantly feel like I’m getting better.” Similarly, a Fitzgerald teacher said that evaluation kept her “on [her] toes” and “helped [her] to do better as a teacher.” A Dickinson teacher echoed, “It’s helpful always. A second person can notice things that you, yourself, in the job miss.” Others suggested that administrators demonstrated their commitment to teachers by observing their classes often. A third-year teacher at Rodriguez Charter said, “Just the fact that my administrators are in my classroom on a weekly or bi-weekly basis, I think shows a lot. It means that they care, and they’re here to help us.” Across various levels of experience, teachers said that evaluators had greater credibility and gained a better understanding of individuals’ professional experience and struggles if they observed them teaching often. A teacher at Rodriguez Charter with 12 years of experience explained, “He knows my flaws. He knows what I need to work on. He knows me better than I know myself as a teacher.”
Many teachers suggested that the professional culture within their school encouraged them to view evaluation as a developmental process. One from Naylor Charter explained,
[I]n my old school . . . you’d find out they were coming in [to observe]. It was like you were ready for a performance. You had to do it perfectly and then they never came in again until three or four months later. [Here], they’re just always in and out of the room, so it’s nice. It’s a good way to just always keep getting better.
A colleague offered a similar perspective: “When I know something isn’t going well, I will ask to be observed so that I can get help on that. That’s totally the mentality here. I don’t like someone seeing me doing something wrong.” However, she said she would “prefer that . . . [to] not getting any guidance on it.” Therefore, teachers’ responsiveness to the evaluation process was nurtured by their beliefs that they could improve if their evaluator understood instruction and could offer thoughtful recommendations (individual cognition). They also were encouraged by evidence from their school context that observations were a priority and that investing in teachers’ development paid off (situated cognition). Had the professional norms of these teaching environments discouraged rather than promoted hard work and risk taking, teachers might have viewed their evaluator with suspicion, rather than openness and optimism.
Teachers Reported Receiving Detailed, Helpful Feedback
Most teachers said that their evaluator provided detailed feedback about a range of topics including classroom management and pedagogical strategies. A teacher at Naylor Charter said that her supervisor had helped her improve the questions she asked during read-alouds, so that she could promote students’ higher order thinking. Naylor administrators provided written feedback on a Google Doc shared with the teacher, a practice that teachers appreciated because it helped them track their progress over time. A Kincaid Charter teacher described the feedback she received about the ratio of teacher talk to student talk during her lesson and a Fitzgerald teacher said that she received helpful feedback about pacing lessons. Hurston PK–8 evaluators emailed their feedback within 24 hours and also recorded observations on a Google Doc shared with the administrative team. Hurston teachers repeatedly described the postobservation feedback as timely, specific, and relevant. Several showed us the comments they received. Principal Hinds wrote to one about the pacing of a lesson and to another about the interaction during class discussion as the teacher responded to every student’s contribution before the next student spoke. In both cases, he suggested changes that the teachers found helpful.
Across schools, teachers often said that the observation and feedback process had led them to change their pedagogy, which they thought was improving. A Kincaid Charter teacher with 6 years of experience said that she had become “a drastically better teacher” in the 3 years that she had worked at the school, “because it’s been this really close cycle of being observed and then feedback on what to work on, and then observed again and then feedback again.” An elementary teacher at Rodriguez Charter with 10 years of experience described how his principal provided him with observational feedback over time, which supported him in dramatically shifting his instructional approach.
She kind of said, “Why don’t you think about doing this, that and the other thing?” I said, “Okay” and that first two, three, four weeks of changing my entire teaching style was a disaster. . . . I started tweaking it and figuring it out and she would come in and observe and critique and give good positive comments and negative ones. . . . Looking back I can’t even imagine how much of a disservice I was doing to kids back then in the way that I was teaching.
Integration With Other Practices
Evaluation did not stand alone, but rather, was coordinated with other professional learning opportunities (e.g., instructional coaching, teacher teams, whole school professional development, and peer observation), all part of an integrated strategy for improving teachers’ practice across the school. The evaluator’s approach to instruction was, therefore, situated in and responsive to other current practices. Although teachers experienced evaluations primarily as individuals, they often looked to colleagues on their instructional teams for additional feedback about their teaching and further suggestions about how to improve. Teachers also reported that some administrators remained informed about their professional practice by reviewing unit and lesson plans and participating in team meetings, which focused on data analysis and curriculum planning. In explaining the support they received, teachers often did not distinguish between practices that were part of the evaluation system and others intended to improve their practice; as they saw it, all were part of an ongoing, integrated improvement process. However, many identified classroom observations and feedback as the most valuable component of their school’s developmental process. Teachers expressed confidence that administrators and coaches would provide support if they, as teachers, began to shift their practice in response to feedback (situated cognition). For their part, principals believed that it was their responsibility to support teachers’ development (individual cognition) and they organized their time to make that happen as best they could (situated cognition).
The goals that teachers were required to set in the evaluation process served to integrate elements of the individual, team, and whole school improvement processes. The state’s Model System asserted, “Connecting individual educator goals to larger school and district priorities is critical to effective implementation. Strong vertical alignment between individual, team, school and district goals will accelerate progress on the goals.” In the four schools using the Model System, teachers explained that they chose goals that were related to team-based and school-wide goals. An administrator at Hurston PK–8 described the advantage of explicitly linking these processes. “So there’s . . . an alignment from the individual to the team to the school that makes sense to people, and it doesn’t feel like they’re pulling [evaluation] goals out of the hat.” Thus, as administrators in these schools implemented the evaluation policy, they were influenced not only by their own beliefs and the context in which they worked but also by policy stimuli, conveyed by the state through both its regulations and Model System.
Evaluation for Accountability Was Grounded in Evaluation for Development
The school leaders’ focus on teachers’ development was not seen to be in tension with the summative evaluation process, which included mid- and end-of-year meetings to discuss teachers’ ratings on the evaluation rubric as well as their progress toward designated goals. Although participants realized that formal evaluation could be used to inform current and future employment decisions, the use of evaluation for accountability did not dominate the process.
Teachers widely said that the formal evaluation process provided an accurate assessment of their professional practice; only two teachers we interviewed said that it did not. Unlike ongoing formative supervision, which usually focused on no more than a few issues at a time, summative evaluation was comprehensive and detailed. Evaluators rated teachers’ performance on all four standards, each including a number of indicators defined by specific elements and descriptors. In five schools, accompanying rubrics depicted typical performance at each of four levels of accomplishment. Several teachers said that they respected the fact that even the summative rating process encouraged improvement. This was especially true when teachers believed that their evaluators had a deep understanding of learning and teaching. A Naylor teacher explained that he was graded on “a rubric from 1 to 4, just like the students are.” He noted that, despite receiving “mostly 1.5s, some 2s and a 3,” he was not discouraged, although “in another context, I would have felt like they were starting a paper trail to fire me.” He explained that his current administrators had different expectations for beginning teachers. “They expect their first-year, maybe even second-year teachers, to be working hard, but not really mastering all the things they want you to master.” This assessment of the school-based context for teacher evaluation profoundly influenced his response (situated cognition). Overall, teachers trusted that the summative process, like the formative process, was intended to support their growth and, therefore, they could accept tough assessments of their practice.
“No Surprises” in Formal Evaluations
Teachers in the sample frequently suggested that formal evaluation, as it was implemented in their school, was simply an outgrowth of day-to-day supervisory practice. One teacher described formal evaluation as “just a tiny piece of what we already do on a daily basis.” Another teacher echoed many in explaining that the summative evaluation process “shouldn’t be a big deal. It really hasn’t [been].” Another expanded, “I know exactly what my goals are and what I’m doing, so it wasn’t surprising how she graded me. I graded myself really hard, but I knew what I was working on, so it made sense to me.” This teacher’s individual beliefs about the legitimacy of the rating system led her to take it seriously, although the situational component of sensemaking also came into play; she could be confident that acknowledging her shortcomings would not lead to reprimand. As Principal Hinds explained, his intentions matched the teachers’:
I think evaluation without ongoing supervision is meaningless. It becomes only the way that you terminate employment. And so my belief is that I and every member of my administrative team needs to be in classrooms all the time, giving feedback, asking questions, pushing people. And then all of that just gets rolled into an evaluation. No surprises.
Therefore, despite the focus on development, teachers and administrators across the sample said that evaluation could be used to hold teachers accountable for meeting professional expectations. Teachers believed that, when warranted, evaluators did give teachers low ratings in summative assessments, which could lead to dismissal. At all but one school, teachers and administrators spoke of teachers who were, or had been, on improvement plans with goals that they had to meet to keep their job. Administrators at all schools also told of teachers who were not offered a position the following year or had been dismissed midyear for being ineffective. This contributed to a sense of accountability and made the evaluation process a serious one, but it did not seem to generate fear or undermine the teachers’ trust in their evaluator or the system. Therefore, the fact that these administrators were intent on developing their teachers did not mean that they avoided dismissing or counseling out those they thought should leave. They were aware of, and responded to, the policy stimuli endorsing greater accountability through evaluation.
For most teachers across schools, sensemaking influenced how they responded to formal evaluation. Because their summative assessment grew out of frequent informal and formal observations with feedback, they granted it legitimacy, which they might have withheld if classroom visits were rare or they found feedback vague or off the mark. Based on their acquired understanding of their school context, teachers did not expect or want a rubber stamp of approval, nor did they think that falling short of the highest rating would be the first step out the door. Teachers widely expressed confidence that they were beneficiaries, rather than casualties, of their school’s evaluation process. By contrast, teachers surveyed or interviewed in various studies discussed earlier differed widely in whether they thought evaluation in their school served either them or their students.
Shortcomings in Supervision and Evaluation Processes
Despite the overwhelmingly positive views of supervision and evaluation, teachers and administrators encountered challenges in implementing the policy. Among the most important were subject-based mismatches between evaluator and teacher and the demands that frequent observations placed on evaluators’ scarce time. Both were contextual factors that affected teachers’ views about the quality and potential usefulness of their evaluators’ feedback and assessment (situated cognition). Administrators and teachers thought that their evaluation process could be substantially improved by successfully addressing those limitations.
Mismatches Between Evaluators and Teachers
Across schools, teachers expressed confidence in their evaluator’s knowledge about classroom management and general pedagogy. However, 11 of the 91 teachers, while expressing respect for their evaluator’s pedagogical expertise, were disappointed that he or she lacked instructional experience in the subject they taught and, therefore, could not offer detailed, subject-specific recommendations. These were spontaneous comments, and others may have held similar views but did not mention them. Notably, however, 8 of these 11 teachers had eight or more years of teaching experience. For example, a middle school math teacher at Hurston PK–8 said that, although she found her administrator’s comments “affirming,” she found her math colleague’s feedback more helpful. “He just knows more about the content. He can tell if students are understanding or not a little bit more than [administrators] can because not everybody’s an expert in everything.” A history teacher at Kincaid Charter who was supervised by a former English teacher said that the feedback often focused on how to teach writing through history, but neglected the “nitty gritty of history.” At Kincaid Charter, teachers of students with special needs expressed concern that their supervisors lacked experience and knowledge about special education. One called the feedback “very standard . . . cookie-cutter.” Another said, “I think there’s still huge amounts of growth I could make, but it’s hard accessing that growth when the [evaluators] don’t know what you’re doing.” Several administrators acknowledged that they could not realistically provide pedagogical advice in every subject, at every grade level. At Fitzgerald and Rodriguez Charter, instructional coaches supported teachers in planning and teaching mathematics and literacy, but those content experts did not conduct formal evaluations.
Insufficient Time
Some participants also said that evaluators lacked the time they needed to provide comprehensive observation and feedback for all teachers, confirming findings in several studies discussed earlier (Donaldson & Cobb, 2015; Drake et al., 2015; Firestone et al., 2014; Halverson et al., 2004). Virtually all teachers we interviewed suggested that they were grateful for whatever attention they received, but six of those (five having seven or more years of experience) explicitly said that they wanted more attention than their supervisors could provide. As one Fitzgerald teacher said, “I think it would be a lot more powerful if administrators were able to be in the classrooms a lot more.” Principals at Dickinson, Fitzgerald, Hurston, and Rodriguez Charter talked about the daunting demands of conducting frequent observations and providing detailed feedback for all teachers. Most evaluators had between 15 and 20 teachers to supervise, but some had more. Principal Hinds at Hurston had the most, 39. Principal Forte at Fitzgerald said, “We just can’t keep up. We’re lucky to have two of us [principal and assistant principal].” Similarly, an administrator at Rodriguez Charter said, “I have 20 people I evaluate and supervise, and it feels like too many to me. I’m always thinking, ‘Oh, I haven’t been there for so long!’”
At these schools, as in those studied by Halverson et al. (2004), teachers generally said that administrators spent more time supervising new and struggling teachers than proficient, experienced teachers. Veterans suggested that they understood why novices’ needs took precedence, but nevertheless, some wanted more support for themselves, because they knew they could improve. For example, a Hurston teacher with 10 years of experience said, “I would like more feedback [from] someone who knows my classroom, has seen Student A in October and now can tell me how Student A progressed in March.” Her colleague with 9 years of experience wanted to be observed more often so that she could have in-depth discussions about her “delivery of instruction,” such as, “Did it make sense to do that activity . . . in groups?” Although these teachers pointed to limitations in the current evaluation process, they still appreciated their school’s focus on development and valued the feedback they received.
Summary and Discussion
Much recent quantitative research about teacher quality is intended to identify causal connections between a policy and its outcomes; however, such research rarely illuminates the process of implementation. Those who make policy and those who implement it need to know more than whether a policy “works.” They also need to understand how those effects are achieved and what factors promote or compromise successful implementation. As this study shows, sensemaking theory can support and guide such inquiry.
Drawing upon sensemaking theory (Spillane et al., 2002) and employing qualitative methods, we conducted this comparative case analysis to learn how one state’s teacher evaluation policy was implemented day to day. By choosing to study six successful schools enrolling many students from high-poverty communities, we hoped to identify effective practices that others might learn from and possibly use in their own schools. The Massachusetts teacher evaluation policy, like those in many states, specified two purposes, developing teachers’ professional skills and increasing accountability in employment decisions. However, the policy gave priority to the goal of development, which the state further reinforced with its Model System and numerous other supports for implementation.
Informed by sensemaking theory, we found that, as they implemented evaluation policy, these six principals drew upon both their knowledge and skills about good teaching and a commitment to use strategies that would support teachers’ development (individual cognition). They also had a clear understanding of what their school’s particular policy context encouraged and allowed, and they capitalized on the opportunities it provided (situated cognition). Furthermore, they recognized the state’s policy stimuli, which highlighted the importance of teachers’ continuous development, while not ignoring the importance of dismissing weak teachers. This focus, conveyed by the state’s Task Force report, its Model System, and additional training and supports, aligned with these principals’ professional priorities. Although no principal suggested that the policy stimuli dictated his or her focus on development, we did find that they were influenced by some of the state’s supports, especially its Model System. To the extent that the principals raised concerns about how to provide an evaluation system centered on teachers’ development, it was because they thought they lacked the time required to do that well, especially given several schools’ limited administrative capacity.
Across all schools, virtually all teachers affirmed their principal’s commitment to developing teachers and reported that they received frequent, useful feedback about their instruction, which they said helped them to improve. Although teachers often spoke about their own needs for improvement (individual cognition), they were most influenced by their principal’s views, priorities, and approach to evaluation (situated cognition). If their principal had not been willing or able to invest in their development, evaluation would not have provided benefits, whatever teachers hoped for.
Teachers recognized that poor performance and failure to improve might lead to dismissal, but they widely expressed confidence in the validity and fairness of their evaluator’s summative assessments, largely because they were grounded in frequent observations by evaluators who had deep knowledge of instruction and offered detailed feedback, and professional support from various sources. Furthermore, by setting goals and completing the self-assessments, teachers played an active role in the process leading to their annual formal assessment. Overall, teachers said that their school’s developmental approach to evaluation was well intentioned and useful, and they judged the state’s approach, in which they willingly and actively participated, to be legitimate. Together, principals’ and teachers’ beliefs and knowledge (individual cognition), their understanding of the context in which they worked (situated cognition), and their endorsement of using the required evaluation process to develop teachers’ talent (policy stimuli) interacted in ways that supported the evaluation policy.
Importantly, this sample of successful schools is unique, and it would be foolhardy to imply that a similar teacher evaluation policy could or would succeed in any or all contexts. Various factors combined to determine the fate and effects of the Massachusetts policy in these schools. Some emanated from the state, including the fact that the policy was informed by a broad array of interests and a rich set of skills among Task Force members and their advisors. Schools benefited from the state’s Model System and additional supports. Other factors embedded in both state and local policy influenced implementation, such as the fact that at one time or another, five of these principals had the authority to choose their teachers. All invested substantially in an intensive, informative hiring process to ensure that teachers were sufficiently skilled and eager to improve. Other policies, including those that authorized charter schools and those that empowered the state to intervene in failing schools, gave several principals control over key elements of teachers’ work, such as the length of the school day and allocation of teachers’ time. Also, schools in turnaround and restart were eligible for supplementary grants, which they had used to fund additional administrators or more time for teachers’ professional development. Unfortunately, some of these autonomies and benefits—especially supplementary funding—disappeared when turnaround schools exited state supervision, making it difficult to continue funding programs that many thought were worthwhile. This study allows us to see how different allocations of autonomy and funding granted by one set of policies can contribute to successful implementation of another policy.
Although many other principals do not have the same degree of flexibility or access to comparable resources, this study suggests that they can still be valued and effective supervisors for teachers. These teachers’ responses document how important a principal’s beliefs and skills can be for teachers, even when the context provides little additional support for the evaluation process. These principals were widely said to have strong instructional skills and, therefore, teachers appreciated the support they received and respected assessments of their practice. Some wanted more frequent observations and better subject matches with their evaluator, but their overall response to evaluation—and therefore, their readiness to invest in it—was positive.
Implications for Policy, Practice, and Research
Readers will see in our findings many implications for how evaluation policy can be developed and successfully implemented in a range of settings. Some lessons have broad relevance, whereas others depend on specific features of districts and schools.
Policy
First, this study provides strong evidence that an evaluation policy focusing on teachers’ development can be effectively implemented in ways that serve the interests of schools, students, and teachers. These cases also suggest that the goals of development and accountability are compatible when summative evaluations are well grounded in the observations, feedback, and support of a formative evaluation process.
This study also reveals the important role that state education officials can play in setting the direction of implementation both before and after a policy is enacted. Many implementation studies find that a particular policy ultimately has little effect on practice. In this case, however, the state relied on capacity building in addition to mandates to promote effective implementation (McDonnell & Elmore, 1987). These schools clearly benefited from the state’s Model System and other supports, which increased principals’ opportunities for agency and leadership as they used evaluation to improve their school. Thus, situated cognition and policy stimuli clearly had their effects on practice. It is also notable that, in implementing the state’s evaluation policy, these schools benefited from the spillover effects of additional policies, including the WCSD teachers contract, which provided for special status schools in WCSD; charter school laws, which granted them considerable autonomy in staffing; and school accountability regulations, which had expanded principals’ discretion over staffing and increased the resources available to build administrative capacity. It would be worthwhile for state policymakers to better understand the interaction of various laws and regulations, which may originate in separate policies, but converge at the school. Mapping how current policies interact within schools could better inform policymaking. As a result of such analysis, policymakers might grant schools more autonomy in staffing than they currently have and ensure that resources for expanded administrative support are widely available, especially to schools with extensive needs.
Practice
These case studies of successful schools demonstrate the pivotal role of the principal in implementing evaluation policies. Unfortunately, districts often do not assign their best principals to the schools that need them most. Our findings suggest that doing so is probably the most important thing district officials can do to ensure that teacher evaluation will be a constructive, productive process.
A principal who knows instruction well can engage the teachers in a process of inquiry, self-reflection, and improvement, which ultimately benefits the entire school. However, principals must help teachers see opportunity in a comprehensive evaluation system and feel confident about seeking help, taking risks, and acting on good advice. How principals frame the purpose and character of evaluation for their teachers will influence teachers’ readiness to benefit from it. In sensemaking terms, principals’ individual cognition is a major factor in determining teachers’ situated cognition. Furthermore, principals can amplify the benefits of evaluation by integrating it with other components of their professional growth system (e.g., instructional coaching or teacher teams) and, thus, provide ongoing, comprehensive support for instruction. Therefore, CMOs and school districts would be wise not only to attend to principals’ belief systems when considering whether to hire them but also to provide ongoing professional learning experiences that build principals’ skills and knowledge as instructional leaders.
We also concluded that, in addition to knowing instruction well, principals must recognize that, among their many important responsibilities, selecting teachers is probably the most consequential. Each of these schools had an intensive hiring process, which ensured that new teachers would expect to improve in response to feedback (Simon, Johnson, & Reinhorn, 2015). Furthermore, principals should do their best to see that teachers are matched with evaluators who know their content area and can model exemplary practices. Some districts successfully do this by assigning peer evaluators who are responsible for both supporting and assessing colleagues (Papay & Johnson, 2012), an option included in the Massachusetts policy, but not part of any of these schools’ approaches. For their part, teachers can step up to new leadership roles as they become available.
Research
Recent research has found that combining multiple measures of teachers’ performance yields more valid assessments of their effectiveness (Cantrell & Kane, 2013). Similarly, our understanding of the factors that contribute to good policy and effective implementation could be enriched if researchers were to rely on a broader array of methods and theory. If research is to inform policymakers and practitioners, then it must investigate and report much more about how policies are implemented within schools. This can be done by drawing upon teacher surveys, administrative data about employment, interviews, observations, and records of day-to-day activities, which together can provide a rich account of policy implementation and effects. In-depth, comparative case studies, such as this one, can yield a rich understanding of how and why a policy works as it does, particularly when the decisions and activities of teachers and principals are viewed from the perspective of sensemaking.
This study yields findings about what works in promoting evaluation for development in a set of successful schools. It would be worthwhile to conduct similar, fine-grained studies in different types of schools, for example, those where principals have little control over hiring and assignment, or where the evaluation policy requires that student achievement constitute a fixed percentage of every teacher’s rating. By analyzing particular policies in context, we can come to understand and explain how implementation in different schools happens at the “street level,” where students are most directly affected. Moreover, such analysis can show how other policies spill over and affect implementation.
Sensemaking theory provided a useful set of tools for interpreting our data. The three components identified by Spillane et al. (2002), namely individual cognition, situated cognition, and policy stimuli, help us to see how complex and multifaceted the implementation process is at the school level. But, as this study makes clear, policies that are intended to improve schooling depend on both administrators and teachers for their effective implementation. By analyzing the three components of sensemaking for each group, we can see that teachers and administrators experience and affect policy implementation differently. For example, we found that situated cognition for principals was affected substantially by autonomy and resources derived from other policies, whereas situated cognition for teachers was largely shaped by the principal’s beliefs and actions. This interaction of subgroups in the process of implementation opens a new aspect of sensemaking theory that warrants further attention.
Teacher evaluation policies currently leave much unspecified. They set the basics, such as designating the assessment tool, requiring a certain number of announced or unannounced observations, and stating whether and how student achievement must be incorporated into summative ratings. Within those boundaries, many outcomes are possible. This flexibility may accommodate variation in local needs and priorities, with some schools focusing implementation on accountability and others on development. In deciding how policies should be written, implemented, and monitored, state officials can benefit from knowing much more about how schools actually implement policies. As we have seen, implementation is profoundly affected by practitioners’ beliefs and knowledge, their assessment of the context in which they work, and their attention to the purposes signaled by policymakers. District and school administrators need to know much more about how best to achieve desired policy goals in evaluation—to develop the teachers they have, to inform employment decisions, and to skillfully combine both.
Acknowledgements
We are grateful to the administrators and teachers who generously participated in this study. Also, we are indebted to the Spencer Foundation and to the Harvard Graduate School of Education for funding this project. Andrés Alonso, Megin Charner-Laird, David K. Cohen, Judith Warren Little, and Educational Evaluation and Policy Analysis’s (EEPA) peer reviewers provided valuable feedback on earlier drafts of this article. Ultimately, we are responsible for all views presented here.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant from the Spencer Foundation. In addition, Stefanie Reinhorn and Nicole Simon received support from the Harvard Graduate School of Education’s stipend for dissertation completion.
Authors
STEFANIE K. REINHORN is an educational consultant working with schools and districts on instructional improvement. She is a faculty co-chair for the Instructional Rounds Institute at Harvard Graduate School of Education and teaches in the Teacher Leadership Graduate Program at Brandeis University. She earned her doctorate at the Harvard Graduate School of Education, where she continues as a research affiliate with the Project on the Next Generation of Teachers.
SUSAN MOORE JOHNSON is the Jerome T. Murphy Research Professor in Education at the Harvard Graduate School of Education. She studies, teaches, and consults about teacher policy, organizational change, and administrative practice. She created and directs the Project on the Next Generation of Teachers, where she and colleagues examine how best to recruit, support, develop, and retain a strong teaching force.
NICOLE S. SIMON is director of Strategic Initiatives at CUNY John Jay College of Criminal Justice, where she develops pathways for high school and community college students into John Jay and out into careers. She studies the recruitment and selection of teachers and career pathways within teaching. She earned her doctorate at the Harvard Graduate School of Education, where she continues as a research affiliate with the Project on the Next Generation of Teachers.