Measuring Teacher Effectiveness Through Meaningful Evaluation

Abstract

While teacher quality is recognized as a critical component in school reform and the pursuit of new teacher evaluation systems has gained national attention, the question of whether proposed teacher assessment models recognize and account for the unique roles and responsibilities of special education teachers has gone largely unnoticed. The purpose of this article is to (a) provide a review of current efforts to reform practices in teacher assessment, (b) describe recommendations for emerging teacher evaluation systems that accurately distinguish between effective and ineffective teachers, and (c) consider the difficulties of implementing these reform measures in the evaluation of teachers who serve students with disabilities. Important consideration is given to understanding the unique roles and responsibilities of the special educator, as well as the use of observation protocols to evaluate instructional practices in the general and special education setting. In addition, this article elucidates the difficulties of incorporating valid measures of student performance as a component of the teacher evaluation process for special education teachers. A summary of recommendations for policy makers serves as the conclusion.

Keywords

teacher evaluation, special education teacher, value-added, school reform, teacher effectiveness, student performance measures

Reflecting America’s history of economic and political dominance and responding to public opinion, accountability systems have placed enormous pressure on school leaders and educators to meet rising expectations and to prepare students who are well equipped to lead the nation in the years to come. Consequently, tangible outcomes for all students, including the differences among student populations, are closely analyzed and scrutinized. Among those populations, the challenges that students with disabilities face in achieving academic success are well documented (Council for Exceptional Children [CEC], 2012c; Holdheide, Browder, Warren, Buzick, & Jones, 2012; National Center for Education Statistics [NCES], 2009).

Considerable differences have been found between general education students and students served through special education programs in standardized test scores, graduation rates, postsecondary enrollment, and employment rates (McLaughlin, Smith, & Wilkinson, 2012; Wagner, Newman, Cameto, Levine, & Garza, 2006). Despite more inclusive practices and greater numbers of students with disabilities gaining access to curriculum aligned to grade-level standards, the achievement gap persists. According to the 2009 National Assessment of Educational Progress (NAEP), the difference between the average reading scores of students in general education and those in special education was 35 points in fourth grade and 36 points in eighth grade. In math, the gap widened far more sharply across grades, from 21 points in fourth grade to 58 points in eighth grade (NCES, 2009).

Central to any educational improvement effort is a focus on teacher quality. Darling-Hammond (2012) described the importance of teacher quality, saying,

Educators know—and research confirms that every aspect of school reform depends for its success on highly skilled teachers and principals, especially when the expectations of schools and the diversity of the student body increase. This may be the most important lesson learned in more than two decades of varied reforms to improve schools. Regardless of the efforts or initiative, teachers tip the scale toward success or failure. (p. 8)

Research does, in fact, confirm that an effective teacher makes a positive impact on student achievement gains, as demonstrated by a number of investigations using value-added models (Brownell, Billingsley, McLeskey, & Sindelar, 2012; Carey, 2004; Hanushek, Rivkin, & Kain, 2005; Kane, Rockoff, & Staiger, 2008; Rivkin, 2007). In addition to positive student outcomes, an effective teacher influences the quality of the learning environment. In their analysis of value-added research and experimental studies that investigated the effect of teacher influences, Pianta and Hamre (2009) noted that “teachers play a major role in determining the value of the classroom environment for student learning and development” (p. 110), further confirming that a teacher’s skill and expertise directly affect the learning environment and student outcomes. Together, these research findings have illuminated the critical importance of constructing an accurate and comprehensive set of characteristics to define an effective teacher and to identify teacher qualities that are linked specifically to positive student outcomes.

This article gives careful consideration to current practice and future challenges in teacher evaluation. It describes initiatives and reform efforts designed to distinguish effective from ineffective teachers and discusses in depth the challenges inherent in using a standardized evaluation protocol to assess both general education and special education teachers. The article concludes with a call for educational leaders to expand their thinking toward a more accurate and equitable determination of teacher quality and effective instruction for all students, including those with disabilities, and with recommendations for policy makers.

Measuring Teacher Effectiveness

Traditionally, differences in teacher quality have gone largely unnoticed. These differences, as well as our failure to recognize them, however, were described with great clarity in The Widget Effect (Weisberg, Sexton, Mulhern, & Keeling, 2009). In research that included 15,000 teachers in 12 districts of various sizes, Weisberg et al. (2009) uncovered a telling educational reality: poor performers are ignored and effective teachers go unrecognized.

The widget effect derives its name from the authors’ conclusion that school administrators consistently rate teacher performance at the same level, in effect judging all teachers to be equally effective in the classroom. Teachers are treated as interchangeable “widgets,” much like identical mechanical parts that can be swapped within a system. Despite specified goals to improve teacher practice and an espoused desire to maintain high expectations for teacher performance, compliance-driven requirements for teacher evaluation have produced a process marked by indifference. District and school leaders pay little attention to variations in teacher performance and fail to recognize and support teachers’ individual differences. As a result, the majority of teachers receive positive evaluations with little regard for their varying contributions to student success. In addition, within their district or building-level evaluation systems, most administrators fail to provide frequent, specific, and rigorous feedback, so teachers rarely receive the coaching and support needed for professional growth.

Weisberg et al. (2009) recommended the design and implementation of a comprehensive performance evaluation system that fairly, accurately, and credibly differentiates teachers based on their effectiveness in promoting student achievement. Accordingly, teachers should be evaluated on their ability to fulfill their core professional responsibility: delivering instruction that results in student learning. Evaluation systems must delineate clear performance standards, use multiple rating options, regularly monitor administrator judgments, and require appraisers to deliver frequent feedback to teachers. Furthermore, teacher evaluation systems must be aligned to performance standards that are linked to differentiated professional development opportunities (Culbertson, 2012; Darling-Hammond, 2012; Mead, Rotherham, & Brown, 2012; National Council on Teacher Quality [NCTQ], 2012; Weisberg et al., 2009).

Measures of Student Progress

Designing an evaluation system that meets these specifications represents no small task. Traditionally, evaluation measures have relied almost exclusively on observation protocols and a handful of other measures, such as teacher portfolios, professional goals, and, at times, evaluations of teacher personality traits, professionalism, or contributions to the school (Holdheide, Goe, Croft, & Reschly, 2010; Mead et al., 2012). In the early 2000s, the existence of extensive statewide databases of standardized test results for thousands of students and the accompanying technology led to an increase in the use of value-added measures. These measures are calculated by using prior student performance to predict academic outcomes. A comparison of actual student outcomes to predicted student outcomes is used to determine the value-added score. Based on the belief that value-added measures can objectively differentiate among teachers’ impact on student outcomes, reformers began to explore the possibilities and implications of value-added measures of student growth (Carey, 2004; Glazerman et al., 2010; Mead et al., 2012; Rivkin, 2007). In turn, they suggested that there might be advantages to incorporating these measures into teacher evaluation systems.
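To make the calculation concrete, the following is a minimal Python sketch of the simplest form of the idea, assuming a single prior-year score as the only predictor and hypothetical scores. An ordinary least squares line predicts each student’s current score from the prior score, and a teacher’s value-added estimate is the average gap between actual and predicted scores for that teacher’s students. Operational value-added models typically add more prior years, student covariates, and further statistical adjustments, so this illustrates the logic rather than any particular state’s model.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def value_added(records):
    """records: list of (teacher_id, prior_score, current_score).

    Returns {teacher_id: mean(actual - predicted)} over that teacher's students.
    Simplified, hypothetical model: one prior score, no covariates, no shrinkage.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = a + b * prior  # score expected from prior performance alone
        residuals[teacher].append(current - predicted)
    return {t: sum(rs) / len(rs) for t, rs in residuals.items()}


# Hypothetical data: both teachers' students start from the same prior scores.
records = [("A", 50, 62), ("A", 60, 72), ("B", 50, 58), ("B", 60, 68)]
print(value_added(records))  # {'A': 2.0, 'B': -2.0}
```

Here teacher A’s students exceed their predicted scores by 2 points on average and teacher B’s fall 2 points short, so the two estimates are +2 and -2 even though both classes began at the same level.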

The drive to incorporate value-added scores as a measure of teacher effectiveness accelerated greatly as a result of financial incentives provided through Race to the Top (RTT) initiatives and applications (U.S. Department of Education, 2009), which required states to “design and implement rigorous, transparent, and fair evaluation systems for teachers … that differentiate effectiveness using multiple rating categories and that take into account data on student growth as a significant factor” (p. 34). Likewise, A Blueprint for Reform (U.S. Department of Education, 2010) offered grant funds that incentivized states and school districts to implement reforms that would identify top-performing teachers “based in significant part on student growth” (p. 1).

According to a comprehensive review of states’ progress toward implementing new teacher evaluation systems (NCTQ, 2012), 30 states require documented evidence of student learning as a component of teacher assessment. Of those 30 states, 20 require that student achievement be a significant factor, or the most significant factor, in evaluating teacher performance. The NCTQ recommended that all states adopt teacher evaluation systems in which evidence of student gains is the most significant criterion in determining a teacher’s performance rating, and it outlined the databases and systems needed to support the use of student performance measures.

Teacher Effectiveness Models

Teacher evaluation systems have moved into the forefront of educational reform with a lively discussion centered on the question of how to develop fair and reliable measures of effective teaching. The Bill and Melinda Gates Foundation sought to study this question by funding the Measures of Effective Teaching (MET) Project, a comprehensive proposal to “establish which teaching practices, skills, and knowledge positively impact student learning” (Bill & Melinda Gates Foundation, 2010, p. 1). The intent of the project was to consider the full range of responsibilities and contexts in which teachers do their work by collecting information from more than 3,000 teachers over a 2-year period.

The research team reviewed multiple data sources on teacher performance: (a) video-based classroom observations, (b) evaluations of teachers’ content knowledge and their ability to recognize student misunderstandings, (c) student achievement data, (d) student survey data, and (e) the teachers’ own perceptions of the school-based support they receive. They found that observation rubrics are related to effective instructional practices when the protocols are used with fidelity and are accompanied by careful training and norming procedures. In addition, a combination of measures can increase reliability in identifying effective teachers. For example, combining observation ratings obtained from a series of classroom visits with value-added scores and student perceptions produced a more reliable measure of effectiveness than any single measure. Finally, the research team concluded that an evaluation system does not reach its true potential unless it is used as a tool to support teachers in their professional growth and development (Bill & Melinda Gates Foundation, 2010).
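One way to see why a composite can be more reliable than any single measure is a small simulation. The sketch below rests on idealized assumptions and is not the MET Project’s method: each hypothetical teacher has a true effectiveness value, each of three measures (think observation rating, value-added score, and student survey) equals that true value plus independent random error of equal size, and the composite is a simple equal-weighted average. Under these assumptions the errors partly cancel when averaged, so the composite tracks true effectiveness more closely than one measure does; if the measures’ errors were correlated, the gain would be smaller.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n_teachers=2000, n_measures=3, noise_sd=1.0, seed=1):
    """Return (r_single, r_composite): correlations with true effectiveness.

    Hypothetical model: measure = true effectiveness + independent N(0, noise_sd) error.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    measures = [[t + rng.gauss(0, noise_sd) for t in true_effect]
                for _ in range(n_measures)]
    # Equal-weighted composite of all measures for each teacher.
    composite = [sum(m[i] for m in measures) / n_measures
                 for i in range(n_teachers)]
    return pearson(measures[0], true_effect), pearson(composite, true_effect)


r_single, r_composite = simulate()
print(f"single measure: {r_single:.2f}, composite: {r_composite:.2f}")
```

With equal error variances, theory predicts correlations near 0.71 for one measure and near 0.87 for the three-measure average; the simulated values land close to these and vary slightly with the random seed.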

Failure to Consider Special Education Teachers in Design of Teacher Effectiveness Models

The Widget Effect (Weisberg et al., 2009) and the MET Project (Bill & Melinda Gates Foundation, 2010) have provided insight into the inadequacies of present teacher evaluation systems, as well as the possibilities of implementing more comprehensive and meaningful processes to assess teacher quality. Yet any reference to special education is glaringly absent from their reports. Applications for RTT funds make no distinction between regular and special education teachers. The MET Project’s comprehensive review, which aimed to “provide a new knowledge base for practitioners and policymakers” (Bill & Melinda Gates Foundation, 2010, p. 4), made no mention whatsoever of teachers who serve students with disabilities. Likewise, the NCTQ (2011) published a detailed report on the progress states are making toward the implementation of performance-based teacher evaluation systems, but made only a brief mention of the challenges related to designing measures of student growth for special education teachers and others for whom standardized achievement data are not available.

Prevailing leadership practice also reflects ambivalence toward the distinctions between the professional roles of general and special education teachers. In 2010, the National Comprehensive Center for Teacher Quality partnered with the CEC to survey state and district-level special education directors for three purposes: (a) to ascertain practices used to evaluate teachers of at-risk student populations, (b) to identify difficulties associated with developing accurate measures of student outcomes for these student groups, and (c) to provide examples of promising practices. Of the 1,100 respondents who completed the survey, the majority (71.9%) indicated that their states or districts did not allow the use of a different or modified evaluation system tailored specifically for special education teachers. Yet nearly half (49.9%) stated that they did not believe special education teachers and general education teachers should be evaluated with the same system (Holdheide et al., 2010).

In 2009 and again in 2011, the CEC (2012c) convened to focus on teacher evaluation and to make recommendations for systems that would evaluate special education teachers. Its report described the nation’s current policy and practice metaphorically as “a patchwork of approaches” in which “all states and local districts are grappling with how to measure student growth, especially for students with disabilities” (CEC, 2012c, p. 2). Emerging teacher evaluation methods share a common expectation that teacher effectiveness should be linked with measures of student progress, yet the CEC expressed deep concern about the absence of proven methods for using student progress measures to evaluate teachers of students with disabilities. The CEC also acknowledged that there is little agreement or research on how best to evaluate these teachers and noted that most states and districts developing new assessment systems have largely overlooked this issue.

Until now, insufficient attention has been devoted to teacher evaluation systems and the challenge of linking teacher behaviors to academic gains for students with disabilities. In terms of special education and teacher quality, most research has focused on preservice preparation, certification, and content knowledge (Boe, Shin, & Cook, 2007; Brownell et al., 2009; Nougaret, Scruggs, & Mastropieri, 2005; Sindelar, Daunic, & Rennells, 2004). To provide an adequate and equitable evaluation system that documents and accounts for the various duties, responsibilities, and functions that special education teachers perform throughout the year, we ask policy makers and state and district-level decision makers to consider the following: (a) How well do current reform models for measuring teacher effectiveness account for the responsibilities and challenges faced by special educators? (b) To what extent should new systems for teacher evaluation address the differences between general education teachers and special education teachers?

The purpose of this article is to consider the challenges related to the design and implementation of teacher evaluation systems for special education teachers. More specifically, to meet the promise of identifying highly effective teachers as well as supporting special education teachers in their effort to improve instructional practice and increase student achievement, we contend that the call for new and innovative teacher evaluation must account for the variation that exists between and among general and special education professionals. We compare and contrast the roles and responsibilities of special education teachers with those of general education teachers; articulate the advantages and challenges of various approaches used to measure teacher effectiveness, including observation protocols and measures of student progress; and attend to the difficulties of implementing these measures of effectiveness in evaluating special education teachers. Recommendations for teacher evaluation systems for special education teachers that adhere to the same high standards being proposed for evaluating general education teachers are provided, followed by suggestions for policy makers to consider regarding teacher evaluations.

Applying Measures of Teacher Effectiveness to Regular and Special Education

Teacher roles and responsibilities

The metrics used to develop fair and reliable systems to evaluate teacher effectiveness must be grounded in a clear understanding of the professional roles and responsibilities teachers are expected to perform within their designated position. How, then, do the roles and responsibilities of the regular education and special education teacher compare? To what extent are these differences reflected in the teacher evaluation system? To what extent should they be reflected? Special education teachers are typically asked to assume responsibilities that may resemble those of general education teachers yet are also distinctly different. For instance, special education teachers must (a) collaborate with general education teachers and other special education service providers; (b) engage in regular and ongoing communication with parents, beyond what is expected of general education teachers; (c) develop and oversee the implementation of a student’s individualized education program (IEP); (d) demonstrate knowledge of special education laws and policies; and (e) guide and supervise paraprofessionals.

Another important difference between special education teachers and regular education teachers lies in the variation among teacher preparation programs in the knowledge, skills, and dispositions that are “necessary to demonstrate positive impact on all P-12 students’ learning” (Council for the Accreditation of Educator Preparation [CAEP], 2013, p. 5), including the education of students with disabilities (National Academy of Sciences, 2010). Generally, every preservice teacher is expected to graduate with a “deep understanding of the critical concepts and principles of their discipline” (CAEP, 2013, p. 5) and with the ability to “use discipline-specific practices flexibly to advance the learning of all students toward the attainment of college and career-readiness standards” (p. 5). In addition, special education teachers are expected to possess expertise in the distinct characteristics of various disability categories, as well as in the ways a particular student’s disability may manifest in different situations. Special education teachers are also expected to acquire the knowledge and skills needed to (a) provide individualized instruction for students with disabilities, (b) teach appropriate social skills, (c) manage difficult behaviors, (d) provide personal care, and (e) demonstrate sensitivity to the challenges that students with disabilities may face (CEC, 2012a, 2012b).

Despite similarities between the instructional practices of general education and special education teachers, there are times when a differentiated instructional delivery model is needed to meet the individual needs of students with disabilities. According to the CEC (2012c), the degree to which special education teachers’ instructional practices actually differ has received some attention; nevertheless, the widely varying needs of students with disabilities have made it difficult to generalize about the specific instructional practices best suited to meet their needs.

In addition to the distinct responsibilities and expertise required of special educators, many take on a variety of roles on the school campus. For example, some work with small groups, others serve as case managers, and many provide instruction in the general education classroom through a coteaching model. In other contexts, they are assigned as content mastery teachers, resource teachers, or self-contained teachers. Many special education teachers perform more than one role in the same day, and the expectation for collaboration between regular and special education teachers continues to increase. Moreover, the implementation of response to intervention models has increased the frequency and complexity of regular and special education teachers’ combined efforts to meet students’ needs (Simonsen et al., 2010). Designing an evaluation system that accounts for this expanding array of responsibilities and adequately reflects the importance of the various roles and functions special education teachers perform can be especially difficult. The task is further complicated by the fact that special education teachers often share responsibility for providing instruction and coordinating support services for students with disabilities (Burdette, 2011a, 2011b; Holdheide et al., 2010; Quigney, 2010). Blanton, Sindelar, and Correa (2006) summarized these complexities: “[T]he relationship between special education teacher quality and student outcomes is unclear and potentially tenuous” (p. 117).

The Use of Observation Protocols in Teacher Evaluation

Standard observation protocols

In general, observation protocols are the most common approach to teacher evaluation. In a survey of state and local special education directors, Holdheide et al. (2010) found that 94% of local districts included teacher observations as part of the evaluation process. They also noted, however, that observation protocols are often unreliable predictors of teacher quality, because appraisers interpret the instrument differently, the instrument may not be aligned to best practices, and the evaluator may not implement the evaluation process and instrument with fidelity. While 85% of the respondents indicated that they used the same observation protocol for all teachers, more than half (56%) reported that they modified the observation protocol to reflect the unique role and specialized skills of the special educator. Only 12% of the respondents had access to a different observation protocol for special education teachers, and in most cases these protocols were applied only to teachers of students with low-incidence disabilities. These results suggest that many respondents believe the standard observation protocols used to evaluate special education teachers do not provide a true representation of the diverse roles and functions these teachers perform. The protocols are not tailored to the instructional setting or the learning environment and culture that special education teachers help to create; thus, they are not suited to capture the nuances of instructing students with disabilities. When evaluators modify the protocols, however, the standards from which the protocols are derived may be applied unsystematically and subjectively, which reduces the accuracy of teacher evaluations.

Evaluator knowledge of special education

The accuracy of teacher evaluations depends greatly on the evaluators’ instructional expertise, which may vary widely with regard to special education. Regardless of whether the observation protocol provides metrics tailored to the specific knowledge and skills of special education teachers, some evaluators lack a knowledge base in special education practices and may not possess the instructional expertise needed to accurately assess the special education teacher’s performance. It is not unusual for the special education teacher to know more than the school administrator about learner characteristics that may be linked to a particular disability, as well as about evidence-based practices recommended for specific students who are receiving services. Such a lack of knowledge may undermine teachers’ confidence in the principal’s ability to provide a comprehensive and accurate evaluation of their performance. Survey results of state and local special education directors revealed that only 12% of respondents had received training on how to implement the evaluation system when assessing special education teachers. Most (77%) believed that evaluators should have training specific to evaluating special education teachers, yet in practice this expectation is seldom met (Holdheide et al., 2010). Notably, similar concerns have been raised for more than 30 years, including those related to the frequency of principal observations and the absence of meaningful feedback, as well as the principal’s lack of knowledge regarding special education programs and unique student needs (Frudden & Manatt, 1986; Katims & Henderson, 1990; Moya & Gay, 1982; Sweeney & Twedt, 1993).

Importance of classroom observations

Nevertheless, classroom observations provide rich detail and description of a teaching process that is inherently interactive and complex. As an evaluation method, classroom observation makes it possible to capture the essence of classroom learning experiences and to provide insight into the nuances of the exchanges between teachers and their students. Both large- and small-scale research projects have concluded that, under appropriate conditions, classroom observation data can be linked to student outcomes (Bill & Melinda Gates Foundation, 2010; Jacob & Lefgren, 2008; Pianta & Hamre, 2009; Sindelar, Espin, Smith, & Harriman, 1990).

After analyzing the results of standard observations carried out in approximately 2,500 classrooms and reviewing related literature, Pianta and Hamre (2009) determined that teaching behaviors can be accurately assessed and analyzed to identify sources of error, can be valid predictors of positive student outcomes, and can be improved when teachers are provided support and exposure to best practices. These results were consistent across investigators, teachers, and student samples, which varied by grade, socioeconomic status, and geographic location. Furthermore, Jacob and Lefgren (2008) found convincing evidence that principals were able to recognize good teaching through classroom observation and to accurately identify teachers whose students demonstrated the largest and smallest achievement gains. Although principals were less accurate in distinguishing among teachers whose students’ gains fell in the middle of the distribution, the findings were compelling enough for the authors to recommend that policy makers include principal observations in personnel decisions.

Educators, too, agree on the benefits of using a well-crafted observation protocol. Teachers and administrators reported they were able to develop a common language and shared understanding of effective teaching when quality protocols were used to capture essential teaching behaviors during a series of classroom observations, followed by in-depth teacher conferences. Moreover, they were able to calibrate their expectations to improve practice (Culbertson, 2012; Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein, 2012). In summary, research has provided evidence that protocols can be used successfully to identify effective teaching practices, can be linked to student achievement, and can be used to support meaningful dialogue between administrators and teachers. The next section examines the challenges of incorporating measures of student progress in the teacher assessment process.

The Use of Student Performance Measures in Teacher Evaluation

Reform efforts aimed at developing more effective teacher evaluation systems have undergone a shift in perspective, wherein a teacher is evaluated not only on inputs (e.g., certification, degrees, and instructional practice) but also on student outputs (e.g., achievement measures). The most common method recommended for emerging teacher evaluation systems is the value-added model. Proponents of value-added models cite the advantage of quantifying student growth rather than student achievement, a distinction that allows for equitable comparisons among teachers regardless of the student populations they serve. In theory, therefore, value-added models benefit teachers of at-risk populations because they provide information that would be difficult to obtain from achievement data alone (Ahearn, 2009; Buzick & Laitusis, 2010).
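The distinction between achievement and growth can be shown with a small hypothetical example. The sketch below uses simple fall-to-spring gain scores, a cruder stand-in for the regression-based value-added estimates discussed in this article; the class names and scores are invented for illustration.

```python
def summarize(classes):
    """classes: dict mapping class name -> list of (fall_score, spring_score).

    Returns dict mapping class name -> (mean spring score, mean gain).
    """
    out = {}
    for name, pairs in classes.items():
        spring = sum(s for _, s in pairs) / len(pairs)    # achievement (status)
        gain = sum(s - f for f, s in pairs) / len(pairs)  # growth (simple gain)
        out[name] = (spring, gain)
    return out


# Hypothetical classes: X starts high, Y starts low.
classes = {
    "Class X": [(78, 81), (80, 81)],
    "Class Y": [(50, 64), (52, 66)],
}
print(summarize(classes))  # {'Class X': (81.0, 2.0), 'Class Y': (65.0, 14.0)}
```

Ranked on spring achievement, Class X looks stronger; ranked on growth, Class Y’s students gained far more. This is the kind of difference proponents argue growth measures capture for teachers of lower-achieving populations.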

Criticisms of Value-Added

Despite their purported advantages, value-added measures are not without their critics. Growth models have been slow to gain credibility among practitioners for a number of reasons, including (a) their reliance on standardized test scores, (b) the variability in teacher scores from year to year, (c) the difficulty in understanding the mathematical model, (d) the challenges of applying large-scale measurement to individual teachers and students, and (e) the use of value-added measures for high-stakes decisions (Buzick & Laitusis, 2010; Corcoran, 2010; Darling-Hammond et al., 2012; Mead et al., 2012; Quigney, 2010).

Inadequate data collection and calculations

The use of value-added measures has been hampered by inadequate data collection systems. The Council of Chief State School Officers (CCSSO, 2012) noted inconsistencies in how data are gathered, in data quality, and in methods for analyzing data sets, as well as the absence of common data definitions and indicators. Rothstein (2010) concluded that value-added calculations fail to adequately distinguish between the short-term and long-term influences of a teacher. Disparities have also been noted among value-added models; for example, some value-added measures are calculated using a more robust history of student assessment data than others. These disparities can influence individual teacher scores, so that personnel decisions based on value-added scores may inadvertently punish effective teachers or reward ineffective ones. In short, the use of value-added measures is still in its infancy, and critical challenges remain if value-added models are to be used for teacher evaluation. These shortcomings are further exacerbated when value-added models are used to measure the academic progress of students with disabilities.

Incomplete data sets

Because consistent and complete data sets are often unavailable, calculating value-added measures for students with disabilities has been difficult (Ahearn, 2009; Brownell et al., 2012; Buzick & Laitusis, 2010; CEC, 2012c; Corcoran, 2010; Feng & Sass, 2010; Holdheide et al., 2010; Quigney, 2010). Value-added scores are derived by linking each student’s standardized test scores from one year to the next. According to district and state special education directors, only 41% of students identified for special education services participated in standardized testing (Holdheide et al., 2010). Because students with disabilities may take a different version of the test from year to year, or may be exempted from testing in a given year, their test records are often incomplete, which makes the year-to-year linkage a difficult challenge for educators. Value-added systems depend on complete and consistent data, yet not all systems account for these inconsistencies in the same way, which may affect the accuracy of a value-added score or preclude calculating a value-added score at all for some teachers.
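To make the linkage problem concrete, here is a minimal sketch of a deliberately simplified value-added calculation: a one-predictor regression of current-year score on prior-year score, with each teacher’s estimate taken as the mean residual (actual minus predicted) of that teacher’s students. The record layout, field names, and scores are hypothetical, and operational models are considerably more elaborate (more prior years, more covariates, statistical adjustments). The point it illustrates is that a student with no matched prior-year score, for example because of an exemption or a different test form, is simply dropped and contributes nothing to the teacher’s estimate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StudentRecord:
    # Hypothetical record layout; real state data systems differ.
    student_id: str
    teacher_id: str
    prior_score: Optional[float]    # None if exempted or tested on a different form
    current_score: Optional[float]  # None if not tested this year


def fit_prediction(records: List[StudentRecord]) -> Tuple[float, float]:
    """Ordinary least squares fit of current-year score on prior-year score."""
    n = len(records)
    if n < 2:
        raise ValueError("need at least two linked students to fit a prediction")
    xs = [r.prior_score for r in records]
    ys = [r.current_score for r in records]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit a prediction")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def value_added(records: List[StudentRecord]) -> Tuple[Dict[str, float], int]:
    """Mean residual per teacher, plus the number of students who could not be linked."""
    linked = [r for r in records
              if r.prior_score is not None and r.current_score is not None]
    dropped = len(records) - len(linked)
    intercept, slope = fit_prediction(linked)
    residuals: Dict[str, List[float]] = {}
    for r in linked:
        predicted = intercept + slope * r.prior_score
        residuals.setdefault(r.teacher_id, []).append(r.current_score - predicted)
    scores = {teacher: sum(vals) / len(vals) for teacher, vals in residuals.items()}
    return scores, dropped


# Hypothetical example: student s5 was exempted last year, so no prior score exists.
records = [
    StudentRecord("s1", "A", 400, 420),
    StudentRecord("s2", "A", 450, 465),
    StudentRecord("s3", "B", 420, 430),
    StudentRecord("s4", "B", 380, 392),
    StudentRecord("s5", "B", None, 410),
]
scores, dropped = value_added(records)
print(scores, "students dropped:", dropped)
```

In this made-up example, teacher B’s estimate rests on only two of three students. With the small caseloads typical of special education, the same kind of attrition can leave too few linked scores to produce an estimate at all.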

Small sample sizes

Another challenge is the sample size needed to produce reliable value-added estimates. In many situations, the total number of students with disabilities is too small to support the same kinds of statistical predictions that are made for students without disabilities. The relatively small number of students with disabilities, especially those with low-incidence disabilities, makes the analysis more difficult and less trustworthy, and the number of usable test scores shrinks further when results are disaggregated by grade level and type of assessment. Because each state uses its own assessment system, states cannot simply combine their data sets to enlarge the sample. In addition, statisticians have noted that student descriptors often change over time (e.g., a student’s disability classification, eligibility for special education services, and placement in general or special education classes may all change). Furthermore, statewide databases do not always clearly identify teachers as special education or general education teachers (Buzick & Laitusis, 2010; Feng & Sass, 2010; Holdheide et al., 2010).
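Why small samples are less trustworthy can be shown with the standard error of a mean, SE = SD / √n, applied to a teacher’s average residual gain. The sketch below assumes, purely for illustration, that individual student residuals are independent and have a standard deviation of 10 scale-score points; neither figure comes from the studies cited here.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean of n independent values with standard deviation sd."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sd / math.sqrt(n)


def confidence_interval_95(mean: float, sd: float, n: int):
    """Approximate 95% interval using the normal critical value 1.96."""
    half_width = 1.96 * standard_error(sd, n)
    return mean - half_width, mean + half_width


# Hypothetical illustration: the same assumed residual SD of 10 points,
# with the number of linked students shrinking as results are disaggregated.
for n in (100, 25, 4):
    low, high = confidence_interval_95(0.0, 10.0, n)
    print(f"n={n:>3}: SE={standard_error(10.0, n):.1f}, 95% CI=({low:.1f}, {high:.1f})")
```

Going from 100 students to 4 makes the interval five times wider, so a teacher with a handful of tested students could land almost anywhere in the ranking by chance. With samples this small the normal approximation actually understates the uncertainty; a t critical value would widen the interval further.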

Inaccurate calculations of value-added scores

Value-added scores are based on a projected growth model of student achievement. The mathematical formulas that generate value-added projections, or predicted growth scores, rely on careful analysis of student population trends. Students with disabilities typically score lower than the general population of students on standardized assessments, and statisticians warn that value-added scores are more difficult to predict for students who score very high or very low in the distribution of results. This leaves unanswered questions about the validity of comparing value-added scores from different points in the distribution. Some researchers have asked, for example, whether a 10-point gain near the middle of the distribution is equivalent to a 10-point gain at the upper or lower end of the bell curve (Buzick & Laitusis, 2010; Feng & Sass, 2010; Holdheide et al., 2010).

Variations in testing conditions

Testing conditions for students with disabilities are another factor to consider, because conditions vary with the accommodations each student is allowed. Accommodations differ in type and number across students and subjects, and they can change from year to year. Variations occur because of changes in students’ IEPs, changes in state policy, limits on available resources, inconsistent implementation of accommodations, and changes in teachers’ ability to select and implement appropriate accommodations (Ahearn, 2009; Buzick & Laitusis, 2010; Holdheide et al., 2010). It is unclear how year-to-year changes in accommodations affect student results and value-added scores.

Alternative assessments for students with severe disabilities

Students with severe cognitive disabilities usually take a highly individualized alternative assessment. Results from alternative assessments are not currently included in value-added models, and value-added systems cannot at present combine scores from different types of tests to measure student growth (Ahearn, 2009; Buzick & Laitusis, 2010; Holdheide et al., 2010; Quigney, 2010). With this in mind, Ahearn (2009) explicitly stated the following:

The psychometric barriers to adding students who take an alternative achievement standards assessment to calculations that are designed for large group assessment results are significant and attempts to make them fit into the schema now available under growth models hold little promise for yielding meaningful information about the academic development of these students. (p. 10)

Thus, alternative assessment results are not compatible with value-added models or other measures typically used to assess student progress on a large scale.

Difficulty assigning teachers to student scores

Many students with disabilities receive instruction in the same subject from more than one teacher. Sometimes this occurs in a single classroom through a coteach model; at other times, a student is taught the same subject by two different teachers during two different class periods, one a general education teacher and the other a special education teacher. Measuring each teacher’s contribution to the student’s academic growth has proven difficult (Blanton et al., 2006; Brownell et al., 2012; Burdette, 2011a, 2011b; CEC, 2012c; Feng & Sass, 2010; Holdheide et al., 2010; Mead et al., 2012; Quigney, 2010).

Battelle for Kids, a national not-for-profit organization that provides value-added measures for local and state agencies, uses a system that links individual teachers with students by asking teachers who share responsibility for the same student to agree on a percentage that represents each teacher’s contribution to the student’s learning. Teachers are encouraged to base this estimate on the percentage of time each of them spent with the student, and the organization stresses that teacher dialogue is fundamental to the determination and fosters a deeper understanding of shared responsibilities (Holdheide et al., 2010). Nevertheless, making these judgments and linking teachers with students remains inherently difficult. The amount of time each educator spends with a student can be quantified, but the quality of that instruction is much harder to measure. For example, does the model assume that both educators provide the same level of instruction? What factors outside a teacher’s control might affect student learning? These concerns have led some leaders in the field of special education to conclude that determining the contributions of individual teachers may be “nearly impossible” (Brownell et al., 2012, p. 274). Others, however, continue to search for solutions. The CCSSO (2012) recommended that statewide databases create unique educator identifiers that link teachers to their students and include systems for identifying the impact of a teaching team.
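One way to picture shared attribution is a weighted calculation: teachers who share a student agree on percentage shares, and each teacher’s estimate then counts that student’s residual gain in proportion to the agreed share. The sketch below is hypothetical. It is not Battelle for Kids’ actual software or formula, and the function names, the roster, and the idea of weighting a residual gain by the agreed share are assumptions made for illustration. It also makes the limitation discussed above visible: the shares encode how much time each teacher spent with the student, not the quality of that instruction.

```python
from collections import defaultdict


def validate_shares(shares: dict) -> None:
    """Agreed shares for one student must be non-negative and total 100 percent."""
    if any(p < 0 for p in shares.values()):
        raise ValueError("shares cannot be negative")
    if abs(sum(shares.values()) - 100.0) > 1e-9:
        raise ValueError(f"shares total {sum(shares.values())}, not 100")


def weighted_teacher_estimates(students):
    """
    students: list of (residual_gain, {teacher_id: percent_share}) pairs.
    Returns each teacher's share-weighted mean residual gain.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for residual, shares in students:
        validate_shares(shares)
        for teacher, percent in shares.items():
            weight = percent / 100.0
            weighted_sum[teacher] += weight * residual
            total_weight[teacher] += weight
    # Teachers whose only agreed share is 0% receive no estimate.
    return {t: weighted_sum[t] / total_weight[t]
            for t in weighted_sum if total_weight[t] > 0}


# Hypothetical roster: a general educator (GEN) and a special educator (SPED)
# share two students; a third student is taught by GEN alone.
roster = [
    (6.0, {"GEN": 60, "SPED": 40}),
    (-2.0, {"GEN": 50, "SPED": 50}),
    (4.0, {"GEN": 100}),
]
estimates = weighted_teacher_estimates(roster)
print(estimates)
```

Two teachers who each report 50% of a student’s time receive identical credit for that student’s result, whatever the quality of their instruction. That is the question the text raises about whether both educators are assumed to contribute the same level of instruction.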

Challenges in differentiating teacher impact from campus impact

Separating the effects of school-based decisions, policies, and culture from the individual contribution of the teacher is a troubling aspect of the value-added model (Corcoran, 2010; Darling-Hammond et al., 2012; Feng & Sass, 2010; Holdheide et al., 2010; Mathis, 2012). Feng and Sass (2010) drew attention to the influence of nondisabled peers on the academic gains of students with disabilities with the following hypothetical scenario. At School A, students with disabilities are routinely assigned to general education classes and receive instruction in a coteach model. At School B, most students with disabilities spend a large portion of their day in self-contained classes with few opportunities to learn alongside their nondisabled peers. Educational research would predict that students who spend more time in inclusive classes typically outperform peers assigned to self-contained classrooms, so students at School A would be expected to demonstrate greater achievement than those at School B. A value-added model would nonetheless hold the teachers at both schools to the same measure of accountability, even though their results were shaped by decisions outside their control. Darling-Hammond et al. (2012) raised similar concerns about factors outside the teacher’s control that influence student achievement, such as (a) class size, (b) curriculum, (c) student attendance, (d) peer influences, (e) previous teachers, and (f) the type of student assessments used to obtain the scores.

Certification and Teacher Preparation

Recently, policy makers have begun to question the long-held belief that traditional preparation programs and professional teaching credentials, such as teacher certification and degrees in education, are valid markers of teacher quality. After reviewing student outcomes for more than 10,000 teachers hired by the New York City Department of Education, Kane and Rockoff (2007) found no difference in math achievement among students assigned to teachers who were traditionally certified, teachers certified through an alternative program, and teachers who were not certified.

In contrast, current research in special education appears to indicate that traditional teacher preparation, including certification, is linked to teacher effectiveness. In a recent investigation of the relationship between teacher preparation and academic gains, Feng and Sass (2010) used value-added models to study student achievement data over a 5-year period. They found that teachers who had completed postbaccalaureate studies were more effective in increasing math achievement for students with disabilities. They also reported that student achievement gains were positively related to the following teacher characteristics: (a) preservice training, (b) special education course hours, (c) a degree in special education, and (d) certification in special education. At a time when general education is questioning the value of traditional markers of teacher qualifications such as certification and degrees earned, these indicators may be worth considering in the evaluation of special educators.

Expanding Our Thinking

Closing the achievement gap for students with disabilities is critically important. Ensuring that the teachers who serve these students are highly effective is a key component of reform and innovation intended to improve academic outcomes for these at-risk students. The unique responsibilities and challenges of special education teachers call for a careful review of teacher evaluation practices. As with policies for general education teachers, evaluations for special education teachers must incorporate multiple measures of teacher effectiveness (Blanton et al., 2006; Brownell et al., 2012; CEC, 2012c; Holdheide et al., 2010). Evaluation standards must be valid measures of teacher effectiveness that differentiate teachers’ roles and responsibilities, provide teachers with meaningful feedback, support their continued professional growth, and balance rigor with practicality. In addition, they must identify teachers whose students demonstrate academic gains, and do so in a way that is fair and credible.

Several research teams have approached the question of how best to evaluate special educators. Blanton et al. (2006) assessed various methods for evaluating beginning special education teachers, taking into account three measures of effectiveness: (a) classroom observation protocols; (b) evaluations of teacher competencies, knowledge, and skills; and (c) teachers’ self-reports of their background and experiences (e.g., certification and years in teaching). Each of these measures was analyzed in terms of its utility, credibility, comprehensiveness, generality, soundness, and practicality. These researchers recommended that teacher evaluation systems use multiple measures of effectiveness, acknowledging that the usefulness of a particular model depends on the specific purpose and context in which it is implemented. They also stressed the need to link measures of student progress to teacher quality and to educate policy makers about the complexities of the special education context. Finally, they warned against the temptation to impose standard solutions on distinct problems.

After analyzing extensive survey results from practitioners across the country, Holdheide et al. (2010) described examples of promising practices and offered a number of suggestions for designing an effective teacher evaluation system for special education teachers: (a) begin with a common framework that defines effective teaching and includes differentiated criteria, where appropriate, for special education teachers; (b) include evidence-based practices; (c) make use of standardized assessment data and other evidence of student outcomes; and (d) align the evaluation framework to professional development opportunities that are likely to result in improved practice.

To be meaningful and effective, these broad-based recommendations must be applied to the task of implementing effective teacher evaluation systems for special education teachers. In addition, they must address the need to tailor or modify evaluation protocols—including classroom observation criteria and the method to observe and provide feedback to teachers—and identify how to implement measures of student progress.

Improving Observation Protocols

Several recommendations have been proposed for improving the quality and consistency of observation protocols for evaluating special education teachers. One recommendation is to replace or modify the observation protocol with a rubric that is explicitly designed with clear expectations and performance criteria for special education teachers (Holdheide et al., 2010). Another recommendation is to provide training for assessors to guide school leaders in developing the expertise they need to accurately assess teacher effectiveness and provide meaningful feedback to teachers of students with disabilities (CEC, 2012c; Holdheide et al., 2010). A third recommendation is that teacher-to-teacher observations be incorporated into the evaluation process. For example, some districts are experimenting with models that incorporate peer evaluations whereby master teachers serve as a second appraiser. They observe the teacher and, afterward, collaborate with the school leader to develop the summative evaluation and design relevant professional development (Culbertson, 2012; Holdheide et al., 2010; SRI International, 2011). Similarly, a special education administrator might partner with the principal in completing teacher evaluations (Frudden & Manatt, 1986).

These suggestions have the potential to improve the accuracy and consistency of teacher evaluations, provided that evaluators are given appropriate training and support in using observation protocols. Implementing these recommendations would not be excessively complicated and would most likely be perceived by teachers as credible. Moreover, these approaches could strengthen the evaluators’ understanding of evidence-based practices related to special education and lead to meaningful professional collaboration.

Incorporating Measures of Student Progress

How can measures of student progress be incorporated effectively into systems for evaluating special educators? Clearly, student outcomes matter, yet the feasibility of applying the value-added model universally to measure academic gains for students with disabilities remains uncertain, at best. Systems based on growth data rather than achievement data are essential, but they must take into account the unique and individualized nature of the teaching and learning needs of students with disabilities. The value-added model does not appear to be well suited for this purpose and it seems unlikely that a single data source could effectively measure student progress, especially when one considers the wide range of performance levels among students with disabilities.

Several other types of data have been offered as possible solutions to the special education data dilemma. Holdheide et al. (2010) highlighted several school districts that use student learning objectives as a basis for measuring student growth through a criterion-referenced assessment or a curriculum-based evaluation. Survey respondents expressed support for this approach: 60% agreed that achievement gains would be an acceptable component of teacher evaluation, and 73% of educators reported that they would support using data on a student’s progress toward IEP goals as a measure of student outcomes. Another approach to measuring teacher effectiveness makes use of professional development goals, an alternative already incorporated into many state (56%) and district (62%) evaluation systems.

These recommendations account for the exceptional context of special education and could be implemented with relative ease. However, they also present several difficulties. For example, teachers’ skill in writing and implementing effective IEPs varies greatly, so using IEP progress to evaluate teachers could be subjective and lack sufficient rigor. Likewise, evaluations based on professional goals will be ineffective unless those goals, too, are sufficiently challenging and linked to substantial student outcomes.

These shortcomings point to the fact that data used to determine teacher effectiveness must be credible to teachers and hold up to public scrutiny. It is not clear that student IEPs and teacher performance goals would meet the public’s expectation of rigor. Moreover, at present, there is no research base to verify that the successful completion of these performance goals is linked to significant gains in student achievement.

Policy Recommendations

How, then, do we develop and implement systems that support and identify special education teachers who are highly effective? How can we be assured that the systems we create are accurate and equitable? The drive to implement more effective processes for differentiating teacher quality will be incomplete without careful consideration of special education teachers and the students they serve. We suggest that policy makers be guided by these considerations:

Continue to pursue the development of observational protocols that are designed or modified to include clear expectations and performance criteria aligned to the delivery of instruction for students with disabilities.

Consider innovative ways to include peers and other trained personnel in collaborating with building administrators to coach and evaluate teachers.

Engage in dialogue with practitioners to identify data sets that provide evidence of academic gains for students with disabilities, studying the ways in which local and state education agencies are demonstrating their ability to close the achievement gap for students with disabilities. How are data informing their practice, and what can we learn from them?

Continue to consider teacher certification, teacher preparation, and advanced degrees in the area of special education as important components for identifying highly qualified educators and support programs that facilitate access to these credentials.

Insist that teachers and school-based administrators be involved in the development of teacher evaluation systems for special education teachers so that the complexities of their responsibilities are reflected in all components of the teacher evaluation.

Recognize that teacher evaluation systems require a considerable investment of resources, not only to develop the assessment instrument and protocols but also to train and support evaluators in their use. Ongoing professional development must be provided to ensure consistency in the application of evaluation criteria and the delivery of effective feedback.

Professional learning opportunities must be relevant, accessible, and aligned to teacher competencies. Policy makers must work with urgency yet recognize that capacity building takes an investment of time and commitment.

We cannot afford to wait. Our students need excellent teachers, and the design and implementation of teacher evaluation systems to identify excellent teachers is well underway. Unless we act now, however, we run the risk that others will design systems that are ill suited to the unique roles and responsibilities assigned to special education teachers and that, consequently, overlook the distinct needs of the students they serve. We must move quickly to the forefront to advance these concerns and help create a teacher evaluation process that leads to the meaningful and accurate assessment of all teachers.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author Biographies

Ann Sledge is the Head of Schools for Student Academic Services in KIPP Houston Public Schools. She previously served as a teacher, principal, school improvement officer, and assistant superintendent in the Houston Independent School District. She is also a doctoral student in the Cooperative Superintendency Program at The University of Texas at Austin.

Barbara L. Pazey is an Assistant Professor at The University of Texas at Austin in the Departments of Special Education and Educational Administration. Her research focuses on the training and development of socially just administrators and teachers in general and special education, urban education, the impact of turnaround schools on diverse student populations, 21st century skills, and the empowerment of student voice.

References

Ahearn (2009). Growth models and students with disabilities: Report of state interviews. Retrieved from http://www.projectforum.org/docs/GrowthModelsandStudentswithDisabilities-ReportofStateInterviews.pdf

Bill & Melinda Gates Foundation. (2010). MET Project: Working with teachers to develop fair and reliable measures of effective teaching. Retrieved from http://www.metproject.org/downloads/met-framing-paper.pdf

Blanton, L. P., Sindelar, P. T., & Correa, V. I. (2006). Models and measures of beginning teacher quality. Journal of Special Education, 40, 115-127.

Boe, E. E., Shin, & Cook, L. H. (2007). Does teacher preparation matter for beginning teachers in either special or general education? Journal of Special Education, 41, 158-170.

Brownell, M. T., Billingsley, B. S., McLeskey, & Sindelar, P. T. (2012). Teacher quality and effectiveness in an era of accountability: Challenges and solutions in special education. In J. B. Crockett, B. S. Billingsley, & M. L. Boscardin (Eds.), Handbook of leadership and administration for special education (pp. 260-280). New York, NY: Routledge Press.

Brownell, M. T., Bishop, A. G., Gersten, Klingner, J. K., Penfield, R. D., Dimino, & Sindelar, P. T. (2009). The role of domain expertise in beginning special education teacher quality. Exceptional Children, 75, 391-411.

Burdette (2011a, April). Performance-based compensation: Focus on special education teachers. Retrieved from http://projectforum.org/docs/Performance-basedCompensation-FocusOnSpecialEducationTeachers-final.pdf

Burdette (2011b, November). Special education value-added performance evaluation systems: A state-level focus. Retrieved from http://projectforum.org/docs/StateEducatorValue-AddedPerformanceEvaluationSystems-AState-levelFocus.pdf

Buzick, H. M., & Laitusis (2010). Using growth for accountability: Measurement challenges for students with disabilities and recommendations for research. Educational Researcher, 39, 537-544.

Carey (2004). The real value of teachers: Using new information about teacher effectiveness to close the achievement gap. Thinking K-16, 8(1). Retrieved from http://www.edtrust.org/sites/edtrust.org/files/publications/files/Spring04_0.pdf

Corcoran, S. P. (2010). Can teachers be evaluated by their students’ test scores? Should they be? The use of value-added measures of teacher effectiveness in policy and practice (Education Policy for Action Series). New York, NY: Annenberg Institute for School Reform at Brown University.

Council for Exceptional Children. (2012a). CEC initial level special educator preparation standards. Retrieved from http://www.cec.sped.org/sitecore/shell/Controls/Rich%20Text%20Editor//~/media/Files/Standards/Professional%20Preparation%20Standards/Initial%20Preparation%20Standards%20with%20Elaborations.pdf

Council for Exceptional Children. (2012b). CEC special education specialist advanced preparation standards. Retrieved from http://www.cec.sped.org/~/media/Files/Standards/Professional%20Preparation%20Standards/Advanced%20Preparation%20Standards%20with%20Elaborations.pdf

Council for Exceptional Children. (2012c). The Council for Exceptional Children’s position on special education teacher evaluation. Retrieved from http://sped.org/~/media/Files/Policy/CEC%20Professional%20Policies%20and%20Positions/Position_on_Special_Education_Teacher_Evaluation_Background.pdf

Council for the Accreditation of Educator Preparation. (2013). CAEP commission on standards and performance reporting: Draft recommendations for the CAEP board. Retrieved from http://caepnet.files.wordpress.com/2013/02/draft_standards3.pdf

Council of Chief State School Officers. (2012). Our responsibility, our promise: Transforming educator preparation and entry into the profession. Retrieved from http://www.ccsso.org/Documents/2012/Our%20Responsibility%20Our%20Promise_2012.pdf

Culbertson (2012). Putting the value in teacher evaluation. Kappan, 94(3), 14-18.

Darling-Hammond (2012). The right start: Creating a strong foundation for the teaching career. Kappan, 94(3), 8-13.

Darling-Hammond, Amrein-Beardsley, Haertel, & Rothstein (2012). Evaluating teacher evaluation. Kappan, 93(6), 8-15.

Feng & Sass, T. R. (2010). What makes special education teachers special? Teacher training and achievement of students with disabilities (Working Paper No. 49). Retrieved from http://www.caldercenter.org/publications.cfm

Frudden, S. J., & Manatt, R. P. (1986). Performance evaluation of special education teachers: Is it different? Planning and Changing, 17, 216-223.

Glazerman, Loeb, Goldhaber, Staiger, Raudenbush, & Whitehurst (2010). Evaluating teachers: The important role of value-added. Brookings Institution. Retrieved from http://www.brookings.edu/research/reports/2010/11/17-evaluating-teachers

Hanushek, E. A., Rivkin, S. G., & Kain, J. F. (2005). Teachers, schools, and academic achievement. Econometrica, 73, 417-458.

Holdheide, Browder, Warren, Buzick, & Jones (2012, January). Summary of using student growth to evaluate educators of students with disabilities: Issues, challenges, and next steps. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from http://isbe.net/peac/pdf/using_student_growth_summary0112.pdf

Holdheide, Goe, Croft, & Reschly, D. J. (2010). Challenges in evaluating special education teachers and English language learner specialists. Washington, DC: National Comprehensive Center for Teacher Quality.

Jacob, B. A., & Lefgren (2008). Can principals identify effective teachers? Evidence on subjective performance evaluation in education. Journal of Labor Economics, 26, 101-136. doi:10.1086/522974

Kane & Rockoff (2007). Photo finish: Certification doesn’t guarantee a winner. Education Next, 7, 60-67.

Kane, Rockoff, & Staiger, D. O. (2008). What does certification tell us about teacher effectiveness? Evidence from New York City. Economics of Education Review, 27, 615-631.

Katims, D. S., & Henderson, R. L. (1990). Teacher evaluation in special education. NASSP Bulletin, 74(527), 47-52.

Mathis (2012). Research-based options for education policymaking. Retrieved from http://nepc.colorado.edu/publication/options

McLaughlin, M. J., Smith, A. F., & Wilkinson, T. G. (2012). Challenges for leaders in the not-so-new era of standards. In J. B. Crockett, B. S. Billingsley, & M. L. Boscardin (Eds.), Handbook of leadership and administration for special education (pp. 361-376). New York, NY: Routledge Press.

Mead, Rotherham, & Brown (2012). The hangover: Thinking about the unintended consequences of the nation’s teacher evaluation binge (Special Report 2, Teacher Quality 2.0). Washington, DC: American Enterprise Institute.

Moya, S. A., & Gay (1982). Evaluation of special education teachers. Teacher Education and Special Education, 5(1), 37-41.

National Academy of Sciences. (2010). Preparing teachers: Building evidence for sound policy. Retrieved from http://www.nap.edu/catalog.php?record_id=12882

National Center for Education Statistics (NCES). (2009). NAEP-2009 reading: Grade 4 national results. Retrieved from http://nationsreportcard.gov/reading_2009/nat_g4.asp?subtab_id=Tab_6&;tab_id=tab1#tabsContainer

National Council on Teacher Quality. (2011). State of the states: Trends and early lessons on teacher evaluation and effectiveness policies. Retrieved from http://www.nctq.org/p/publications/docs/nctq_stateOfTheStates.pdf

National Council on Teacher Quality. (2012). What teacher preparation programs teach about K-12 assessment: A review of coursework on K-12 assessment from a sample of teacher preparation programs. Retrieved from http://www.nctq.org/edschoolreports/assessment/report.jsp

Nougaret, A. A., Scruggs, T. E., & Mastropieri, M. A. (2005). Does teacher education produce better special education teachers? Exceptional Children, 71(3), 217-229.

Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes: Standardized observation can leverage capacity. Educational Researcher, 38, 109-119.

Quigney, T. A. (2010). ESEA’s proposed pay-for-performance option: Potential issues regarding the evaluation of special educators. Academic Leadership, 8(4), 1-8. Retrieved from http://contentcat.fhsu.edu/cdm/compoundobject/collection/p15732coll4/id/537

Rivkin, S. G. (2007). Value-added analysis and education policy (Brief No. 1). Retrieved from http://www.urban.org/publications/411577.html

Rothstein (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125, 175-214.

Simonsen, Shaw, S. F., Faggella-Luby, Sugai, Coyne, M. D., Rhein, & Alfano (2010). A school-wide model for service delivery: Redefining special educators as interventionists. Remedial and Special Education, 31, 17-23.

Sindelar, P. T., Daunic, & Rennells, M. S. (2004). Comparisons of traditionally and alternatively trained teachers. Exceptionality, 12, 209-223.

Sindelar, P. T., Espin, C. A., Smith, M. A., & Harriman, N. E. (1990). A comparison of more and less effective special education teachers in elementary-level programs. Teacher Education and Special Education, 13, 9-16.

SRI International. (2011). The search for teacher effectiveness: A study of exemplary peer review programs. Retrieved from http://policyweb.sri.com/cep/projects/displayProject.jsp?Nick=PARPeer

Sweeney & Twedt (1993). A comparison of regular and special education teachers’ perceptions of the teacher evaluation process. Journal of Personnel Evaluation in Education, 7, 43-53.

U.S. Department of Education. (2009). Race to the top application for initial funding. Washington, DC: Author. Retrieved from http://www2.ed.gov/programs/racetothetop/application.doc

U.S. Department of Education. (2010). A blueprint for reform: The reauthorization of the elementary and secondary education act. Washington, DC: Author. Retrieved from http://www2.ed.gov/policy/elsec/leg/blueprint/blueprint.pdf

Wagner, Newman, Cameto, Levine, & Garza (2006). An overview of findings from wave 2 of the National Longitudinal Transition Study-2 (NCSER 2006-3004). Menlo Park, CA: SRI International. Retrieved from http://policyweb.sri.com/cehs/publications/nlts2_report_2006_08.pdf

Weisberg, Sexton, Mulhern, & Keeling (2009). The widget effect: Our national failure to acknowledge and act on differences in teacher effectiveness. New York, NY: The New Teacher Project. Retrieved from http://widgeteffect.org/downloads/TheWidgetEffect.pdf