Examining the Complexity of Assessment and Accountability in Teacher Education

Abstract

The theme of accountability currently permeates conversations about education at every level, including teacher education and professional development. In our Call for Manuscripts for this theme issue, we invited empirical or conceptual manuscripts addressing assessment and accountability in teacher education that would move the community forward in considering the topics both more precisely and with greater complexity. The range of suggested subtopics and questions within assessment and accountability in teacher education was broad to elicit a wide range of responses. In general, we asked, “Who is to be held accountable? For what? And by whom?” As we reviewed the many excellent submissions, one predominant response to these questions centered on value-added modeling (VAM) approaches to accountability. Although this is only one of several answers to the questions we asked, it is one that has important intended and unintended consequences for various stakeholders in teacher education, including beginning teachers, mentor teachers, administrators, teacher educators, higher education institutions, and policy makers.

Emergence of Value-Added Models for Teacher Education

The rise of interest in VAM for teacher education is related to the search for a definition of teacher quality, which has emerged as a primary factor in determining K-12 student performance, probably through a confluence of events (see Knight, 2011). The Tennessee study (Sanders & Horn, 1998), a landmark study using random assignment of teachers and students to classrooms, firmly established the advantage for students of having a high-quality teacher over a number of years. For many years, educators had struggled with the seemingly intractable socioeconomic factors related to poor student performance. The notion of teacher quality appeared to be manipulable and constituted a viable approach for closing student achievement gaps. Not only do teachers make a difference, but their effectiveness can potentially be assessed, rewarded, and improved through recruitment, incentives, and/or professional development. No Child Left Behind and Race to the Top funding and legislation served to popularize the VAM approach for determining teacher quality.

Given the emphasis on teacher professional development as a contributor to teacher quality, the application of VAM approaches for teacher education accountability may have appeared to be an obvious connection to policy makers (see Floden, 2012 [this issue]). Despite the rather limited evidence of the impacts of teacher preparation programs on student achievement (Duckworth, Quinn, & Seligman, 2009), many policy makers concluded that the quality of teacher education preparation, as with individual teachers, can potentially be assessed, rewarded, and improved—or removed if necessary. However, just as VAM applied to determination of individual teacher quality resulted in a number of methodological, ethical, and other concerns, VAM used to determine the quality of teacher education programs carries similar concerns as well as others specific to teacher education.

Possibilities and Limitations of VAM for Teacher Education

In February 2012, the Journal of Teacher Education (JTE) sponsored a Major Forum at the American Association of Colleges for Teacher Education (AACTE) Annual Meeting in Chicago. The forum featured research using VAM to assess the quality of teacher education programs in several states, along with commentaries that focused participants on both the possibilities and the limitations of this approach. Participants agreed that the framing of VAM as a policy issue is important and that much is at stake as we consider the education of teachers and the education of young people in this country. Federal and state governments are investing a great deal of money in value-added assessment systems, and the consequences of policy implementation are complex and significant. As might be expected, issues related to feasibility, validity, and reliability of the various VAM approaches received considerable attention at the forum and are well represented in the articles in this volume.

However, as we consider the possibilities and limitations of VAM, it seems important to also consider what is missing. What alternatives haven’t we considered? Is there another way to approach the problem of assessing teachers and teacher education programs? As forum participants noted, we should not be making these decisions out of convenience. Instead, we need to ask what evidence we have that value-added assessment systems such as those described in this issue are the best approach. Our decisions about these matters should be based on a full and thoughtful consideration of all the evidence available to us.

If we are certain that value-added assessments are an appropriate approach, then it is important to ask whether student standardized test scores are the best way to determine the value a teacher adds to a child’s education. Reliability and validity are important constructs in test design and use and should not be ignored. A valid test measures what it purports to measure. Tests of reading should actually look like reading (face validity) and measure reading skill and ability (construct validity), and tests of mathematics should look like math and test mathematical understandings. If we follow this logic, should tests of teaching ability and effectiveness look like teaching and measure teaching performance, skills, and understandings? Can valid tests of reading and math administered to K-12 students actually measure how well their teachers are prepared to teach these skills and abilities? The selection of the dependent variable in studies of teacher education program quality raises a range of questions about the uses and potential misuses of tests, and the responsibility of educators, researchers, and policy makers to use data well and fully according to intent.

Even if we determine that student standardized test scores provide an adequate measure of teacher education program quality, we are still left with ethical and moral concerns about the unintended consequences of the use of VAM in teacher education. What is the toll of the public posting of value-added evaluations of teacher education programs on teacher educators and their students and on the relationships between teacher educators, mentor teachers, and preservice teachers within and outside the program? (See Henry, Kershaw, Zulli, & Smith, 2012 [this issue], for a discussion of reporting decisions.) Will teacher education institutions, either overtly or subtly, guide their students away from employment at schools that experience problems with test scores, making these schools even more difficult to staff? Will the focus on standardized test scores of students as indicators of program quality limit teacher education curriculum to only those aspects that are measured by the tests? If inservice teachers are concerned that mentoring preservice teachers will affect their students’ test scores and ultimately their own evaluations, could this limit the opportunities for mentored field experiences so critical to the regeneration of the profession, resulting in a downward spiral of teacher education program quality overall? These questions, and many others, provide opportunities for future research related to the use of VAM in teacher education.

Overview of Features and Articles

This issue is divided into two parts. The first part represents the theme discussed previously and features research using VAM from three different contexts. The statistics associated with these models are quite complex and we have decided to place the technical information for each article in an appendix that is accessible online through a link in the abstract. The first article, by Gansle, Noell, and Burns, represents a state that is probably the most advanced in application of VAM for determination of teacher education program effectiveness. Louisiana has been engaged for over a decade in integration of the needed databases, discussion of analytic techniques, and consideration of differences in VAM with single teachers versus programs with many teachers. The authors discuss VAM as a tool for program improvement and acknowledge the complexities of determining the independent variable for these studies and particularly the complexity of ferreting out the features of programs that contribute to differences.

The second VAM article, by Plecki, Elfers, and Nakamura, represents a context very different from Louisiana because there is limited data in this state to be used for purposes of accountability and program improvement. The authors explore what can be learned from the application of VAM as one form of evidence about program quality in states with limited capacity for data collection and integration. As in the Gansle et al. article, Plecki et al. seek to extend the focus beyond accountability to what can be learned for internal improvement of teacher education programs.

Henry et al. (2012) provide a broader look at VAM through presentation and discussion of a framework for decisions needed to produce estimates of teacher preparation program effectiveness. They review the Race to the Top proposals submitted by states in relation to decisions about selection, estimation, and reporting of VAM and discuss insights and consequences of particular decisions using a specific set of criteria for evaluation. The online appendix for this article features more detailed examples of non-value-added and value-added models, including Florida, Tennessee, and North Carolina.

To conclude this section, Robert Floden provides a commentary on the limitations of VAM for determining the effectiveness of teacher preparation programs and the inferences they do or do not support. He examines different definitions of quality, highlighting that VAM is not the only measure of interest or importance in determination of teacher preparation program effectiveness. In addition, he discusses how the labor market affects the inferences that can be drawn from VAM scores.

The second part of this issue also focuses on improving teacher quality but from a very different perspective. Hiebert and Morris (2012) published an article in the March/April issue of JTE suggesting that the conventional strategy of improving particular characteristics of teachers is not the most productive and that we should instead be promoting the development and use of instructional products such as annotated lesson plans and common assessments that are continually improved by teachers through implementation and refinement. During this process, teachers focus on teaching and acquire the skills needed for teaching through use of the products. This section publishes three responses to Hiebert and Morris’s original proposal as well as a brief rejoinder to the three responses by Hiebert and Morris. The discussion engendered through the multiple responses deepens our understanding of strategies for improvement of teaching, provides concrete examples of these strategies, and raises important issues to be considered for implementation and evaluation.

While Lampert acknowledges the value of the approach, she also finds the choice between improving teaching and improving teachers problematic for several reasons she outlines in the article. She further suggests that viewing their approach through a practice-based lens would be advantageous and that attention needs to be given to interpersonal relationships, maintaining high expectations of teachers, and developing the collective will for improvement. Lewis, Perry, Friedkin, and Roth suggest that the focus on common assessments and lesson plans is not sufficient and, like Lampert, argue that attention to both teachers and teaching is necessary. They describe how approaches such as lesson study provide this dual focus. Zeichner characterizes Hiebert and Morris’s proposal as an extension of practice-based teacher education and places it and similar movements within a historical context to extract what we have learned in previous efforts to refocus teacher education and professional development on core teaching practices. Several issues emerge that need to be considered when attempting to implement the proposal by Hiebert and Morris, including whether the deliberate teaching of instructional practices, as it occurs in many models of practice-based teacher education, should take place and where it would occur in the program.

In their response to the set of commentaries, Hiebert and Morris focus on two points that they identified as salient: willingness and motivation to improve teaching through the use of artifacts, and the importance of prioritizing, within teacher education, improving teaching rather than training teachers. They take issue with Zeichner’s sequence for the deliberate teaching of individual practices and suggest that the reverse would be more appropriate.

We hope that the articles included in this issue expand your thinking about the topics of assessment and accountability in teacher education and teacher and teacher education program effectiveness and provide ideas for future research. We encourage extended discussion, whether formally in journal submissions or conference presentations or informally with colleagues, of the implications of these studies for your own research and practice and for education policy. AACTE has recently provided an opportunity for discussion of JTE topics through its Online Community website. Readers are encouraged to participate in the conversations presented in the journal or to offer ideas on anything related to JTE (http://community.aacte.org). We look forward to receiving manuscripts from you in the future as well as your ideas directed toward improvement of JTE.

References

Duckworth, Quinn, & Seligman (2009). Positive predictors of teacher effectiveness. Journal of Positive Psychology, 4, 540-547.

Floden (2012). Teacher value added as a measure of program quality: Interpret with caution. Journal of Teacher Education, 63(5), 356-360.

Henry, Kershaw, Zulli, & Smith (2012). Incorporating teacher effectiveness into teacher preparation program evaluation. Journal of Teacher Education, 63(5), 335-355.

Hiebert & Morris (2012). Teaching, rather than teachers, as a path toward improving classroom instruction. Journal of Teacher Education, 63(2), 92-102.

Knight (2011). Evaluation of teacher quality. In Secolsky (Ed.), Handbook on measurement, assessment, and evaluation in higher education (pp. 584-592). New York, NY: Routledge.

Sanders & Horn (1998). Research findings from the Tennessee value-added assessment system (TVAAS) database: Implications for educational evaluation and research. Journal of Personnel Evaluation in Education, 12(3), 247-256.