Abstract

Assessment outcomes are used in many high-stakes decisions concerning students, teachers, and schools, including for accountability purposes (Darling-Hammond, 2004). For students, decisions regarding promotion, graduation, and curriculum are often made based on the results of assessments. Teachers also use information from assessments to enhance student learning. Assessments of teachers are used for improvement and personnel decisions regarding pay and advancement. For schools, sanction decisions are made based on student performance on mandatory state and district assessments. Clearly, assessments play an integral role in instruction, placement, promotion, and efforts to ensure that students and teachers receive the support they need for success.
At the same time, serious consequences can result from all of these assessments if they are not constructed and used properly. The use of assessment outcomes for high-stakes decision making is especially pronounced for students with special needs, such as English language learners (ELLs/ELs) and students with disabilities (SWDs). For example, the process of initial identification, classification, and reclassification of ELL students is based extensively on the results of a single English language proficiency (ELP) test administered at a single time. If, for any reason, these test results are not dependable because the tests are unreliable or invalid, a student’s academic path can be jeopardized. An ELL student who is improperly classified as fluent English proficient may miss the opportunity to receive English language development services, and therefore may not receive the academic support necessary for success. Similarly, an ELL student at the lower level of English proficiency may mistakenly be classified as a student with learning or reading disabilities, which can also prevent him or her from receiving the appropriate support for his or her specific language needs. Therefore, caution must be exercised when making high-stakes decisions based on assessments that may have questionable content and psychometric properties.
The purpose of this volume is to bring awareness to specific considerations necessary in the use of high-stakes assessments, particularly for students with diverse learning needs and for teachers. In addition, this volume attempts to shed light on the decisions made based on the results of assessments and explores the implications of using high-stakes assessments for students with special needs. The volume includes exemplary chapters addressing major issues in the content and psychometric characteristics of assessments both for students with diverse learning needs and for teachers.
Assessment of Students With Diverse Learning Needs
Although there are many groups of students who could potentially be labeled as students with diverse learning needs, the primary focus of this volume is on ELLs and SWDs (Shimoni, Barrington, Wilde, & Henwood, 2013). For these students to succeed, it is imperative that their specific academic needs be recognized and addressed. ELLs and SWDs need additional attention in the areas that can impede their academic progress, such as limited English proficiency or their disabilities. To be fair to these students, it is essential to make instruction and assessments accessible for them so they may fully demonstrate what they know and what they are able to do academically. For example, unnecessary linguistic complexity of content-based assessments (e.g., mathematics and science) can unfairly affect the academic performance of ELLs, particularly those at the lower level of English proficiency, as well as students with reading or learning disabilities. Many ELL students may possess the content knowledge being assessed but may not be at the level of English proficiency needed to understand the complex linguistic structure of the test items; the items therefore block these students from showing their knowledge of the focal construct. Accordingly, the performance gap between ELLs and their native English-speaking peers due to unnecessary linguistic complexity of test items is a serious equity issue in assessment that must be addressed properly (Abedi, in press; Solano-Flores, 2008).
Similarly, many SWDs can perform at the same level as their peers if assessments are made accessible to them by controlling for sources of construct-irrelevant factors in their assessments. Issues such as fatigue and frustration due to the presentation of a large number of test items, crowded pages, and complex tables and charts can affect the performance of SWDs. Therefore, assessments must be constructed to focus on student knowledge and aptitude, while controlling for these inhibiting distractions (Ketterlin-Geller, 2008).
A New Generation of Assessments
Attention to the assessment of students with diverse learning needs is of paramount importance as the nation moves toward the development and implementation of a new generation of assessments. Currently, many states across the nation are implementing major changes to their assessment and accountability systems. The Race-To-The-Top College and Career Readiness standards-based assessment system, currently under development by two consortia of states, the Smarter Balanced Assessment Consortium (Smarter Balanced) and the Partnership for Assessment of Readiness for College and Career (PARCC), presents a historical juncture to address issues related to the content and psychometric characteristics of the assessment of students with diverse learning needs. The issues identified by research in assessment and accountability systems must be addressed in the new generation of assessment systems as they will have a substantial impact on the validity of the interpretation of test scores for students with diverse learning needs.
Teacher Assessments
Assessment outcomes for teachers in both cognitive (e.g., teachers’ content knowledge of subject areas) and noncognitive domains (e.g., teachers’ level of motivation and engagement and other psychological factors) provide valuable information in understanding teachers’ instructional strategies and students’ performance in school. As Gitomer and Zisk (Chapter 1 in this volume) have indicated, teachers need to have deep knowledge and understanding of the content they are teaching, but the literature lacks specifics on how teachers should and actually do apply their knowledge during teaching. Moreover, much of the literature on teacher knowledge omits attention to the kinds of pedagogical language knowledge teachers need to have when teaching content to ELs (Faltis & Valdés, in press). Although there are different instruments that are commonly used for measuring teachers’ content knowledge and attention to academic language (e.g., Performance Assessment for California Teachers) and for measuring teacher effectiveness, there is a need for research to judge the content and psychometric quality of these instruments (Tretter, Brown, Bush, Saderholm, & Holmes, 2013). Furthermore, research on the impact of teachers’ cognitive and noncognitive assessment outcomes on student performance is scarce.
Volume Overview
Chapter 1 of this volume by Drew Gitomer and Robert Zisk focuses on the development of assessments of teacher knowledge. The chapter not only discusses assessments that serve licensing functions but also focuses on broader issues, including advances in assessment design and new ideas for promoting teacher knowledge and effectiveness. Gitomer and Zisk also discuss the domain-general rules of pedagogy, including knowledge of child development, classroom management, teaching methods, and classroom assessment. The authors distinguish domain-general pedagogical knowledge from the pedagogical content knowledge that provides the foundation for the assessment of pedagogical content knowledge. The authors acknowledge that limited information is available about the outcomes of and research into these content validation processes. Furthermore, they indicate that by understanding the relationships among these measures of teacher content knowledge, a more developed understanding of the nature and use of teacher knowledge can be articulated in carrying out the work of teaching.
In Chapter 2, Ayesha Madni, Eva Baker, Kirby Chow, Girlie Delacruz, and Noelle Griffin present the literature on the assessment of teachers’ social psychological factors, as well as common strategies for assessing these variables. The authors suggest that the assessment of social psychological constructs for teachers might be useful in assigning and representing competent teachers in classrooms where they could model and incite appropriate student behaviors. The chapter examines various social psychological teacher factors such as motivation, intra- and interpersonal skills, and how these variables affect teacher effectiveness. The authors discuss teacher motivation as broken down into beliefs, efficacy, expectations, attribution, and goal orientation. They find that teacher efficacy, intrapersonal skills, and goal orientation can positively affect teacher beliefs about teaching and their instructional practices. The authors indicate that some teachers tend to underestimate the academic ability of students from lower income families or minority groups. The chapter also discusses technical quality issues of measurement, validity models, and the role of technology in measuring social psychological variables.
Chapter 3 by Kip Téllez and Eduardo Mosqueda discusses teachers’ knowledge and skills as they relate to language learners and language assessments. The chapter begins with a discussion of the challenges of assessing ELLs in nondominant languages and the need to create accurate assessment tools. The authors also describe the misdiagnosis of ELLs who are placed into special needs classes and question whether the assessment is related to the students’ language knowledge at the time of the assessment or other learning issues. They offer suggestions for more accurate assessments of ELLs. For example, the authors indicate that the Bilingual Verbal Ability Test helps identify ELLs who are academically skilled when all other linguistic capacity tests fail to recognize these talents. Next, the chapter discusses issues related to teacher preparedness for working with ELLs. Based on the literature presented, the authors indicate that bilingual teachers who knew the home language of their ELL students were more accurate than English-only teachers in assessing their ELLs’ academic abilities. The chapter also highlights a major concern that low income and minority students will continue to be underserved in schools and concludes with a note about the realities of testing ELLs and the potential that these populations may continue to be underserved due to the fact that a majority of ELLs attend underfunded schools that enroll few English speakers from other ethnic groups (Gifford & Valdés, 2006; Powers, 2014).
Chapter 4 by Timothy Boals, Dorry Kenyon, Alissa Blair, Elizabeth Cranley, Carsten Wilmes, and Laura Wright presents a comprehensive literature review on the concept and construct of ELP assessments. This chapter examines the ELP test design and the evolution of the ELP assessment. The authors examine the literature on the merits and shortcomings of ELP test design and testing as they have evolved over time. In the first section of the chapter the authors explain the role of language testing in its broader historical and policy context. In the second section the authors examine the evolving construct and operationalization of academic English, and use the ACCESS for ELLs as an example of how the conceptualization of academic language is operationalized in the design, development, administration, and use of a large-scale ELP assessment. This section also examines the issues of accessibility and accommodations in large-scale ELP testing. In the final section of this chapter the authors explore expanded conceptualizations of ELP assessment in the era of College and Career Readiness standards and conclude with recommendations for future research.
Chapter 5 by Suzanne Lane and Brian Leventhal presents a comprehensive review of the literature on psychometric issues in the assessment of ELLs and SWDs. The chapter is structured in four sections. The first section presents general concepts in the assessment of ELLs and SWDs, such as the inclusion of ELLs and SWDs in large-scale assessments, validity issues in the assessment of ELLs and SWDs, the concept and application of test accommodations in assessment of these students, and a discussion of psychometric challenges, which includes issues related to the validity and fairness of assessing ELLs and SWDs. The second section presents a comprehensive literature review on the efficacy of test accommodations and modifications for SWDs and ELLs. The third section examines the extent to which the psychometric properties (e.g., reliability and score precision, internal structure evidence, external structure evidence, and evidence of equating) of a test are invariant across groups of students. The authors indicate that the establishment of measurement invariance for ELLs and SWDs is required to make valid and fair score interpretations for these students and for group comparisons. The chapter concludes with the presentation of issues related to growth measures for ELLs and SWDs. The authors explain the challenges and goals of the assessment consortia to address these issues by designing better tests and by providing accommodations that are more effective in making assessments more accessible to students without altering the focal construct.
Chapter 6 by Stephen Sireci and Molly Faulkner-Bond presents a thorough discussion of the validity of assessments for ELLs. Acknowledging the complex nature of assessing the knowledge, skills, and abilities of ELLs, the authors discuss how a validation framework for evaluating the inferences derived from ELLs’ test performance can be developed. Since accommodations used for ELLs play a major role in the validity of assessments for these students, the authors present a review of the literature on the effectiveness and validity of accommodations used for ELLs and how construct-irrelevant sources due to inappropriate use of accommodations can affect the validity of assessments. The authors include a discussion of the most common accommodations provided to ELLs and the level of improvement in ELL student performance due to the use of these accommodations. The authors then discuss best practices that test developers can use to promote more valid assessments of ELLs, as well as future directions for research and practice in assessment and accommodations for ELLs.
Chapter 7 by Alison Bailey and Patricia Carroll presents a comprehensive view of current language assessment policies and practices for ELL students and discusses relevant research studies to evaluate their technical quality. The authors discuss a system with four main components to determine students who should and those who should not receive services under Title III of the No Child Left Behind Act. The first component of the system involves administration of a home language survey to identify potential ELL students. In the second phase of the system, screening tools or placement tests are used to determine ELL status and instructional placement. The third phase involves monitoring English language progress and proficiency with classroom assessment approaches for instructional purposes and for annual accountability of the student’s English language progress and level of proficiency as measured by an ELP assessment. In the last phase, ELL students are evaluated based on their level of English proficiency for reclassification as English proficient and exit from ELL programming based on the state and local district formulas. The authors also discuss validity concerns regarding the criteria used in the different phases of the system. The authors examine the intersection of language assessment and academic content assessment in terms of their purposeful interpretation and use (including accommodation use) by educators in decision making at the federal, state, and local levels. Bailey and Carroll conclude by providing recommendations for further research that focuses on improving the ELL assessment system to validly identify students who should receive services under Title III of No Child Left Behind with its focus on ELL language support.
Chapter 8 by Ryan Kettler provides a broad review of research on the role of testing adaptations as well as examples of item modifications and testing accommodations. The chapter begins by introducing the role of testing adaptations in the accessibility of tests for a diverse population of students. The chapter documents the substantial research that has been completed on testing adaptations, along with critical findings. Based on the summary of research presented, the chapter recommends reexamination of the methods that are used to answer questions about the appropriateness of testing adaptations. The final section of the chapter introduces a new practice and research paradigm that recognizes the many related variables that must be considered to draw sophisticated inferences from achievement test scores.
Chapter 9 by Martha Thurlow and Rebecca Kopriva describes the concept and application of accessibility and accommodations in the assessment of special needs student populations. The primary focus of this chapter is on large-scale content assessments used at the state and national levels and the direct implications that these assessments have on students’ academic careers, and on district and state policies and practices. The chapter provides a history of the push for assessment accessibility and accommodations in national and state assessments, particularly in the new generation of assessments. The authors explain the dramatic shifts that have occurred in the participation of ELLs and SWDs in national and state assessments, and the legal basis for many of the changes in inclusion, accessibility, and accommodations. The authors then discuss the foundational assumptions and research findings regarding accessibility and accommodations for SWDs and ELLs, and the implications for district and classroom assessments.
Chapter 10 by Randy Bennett focuses on the transition of educational assessments from a paper-based to an electronic format. The author indicates that this transition is not a simple change of assessment presentation but is a more substantive process and involves major restructuring in the content, construct, and presentation of the assessment items. The chapter begins by describing the different developmental stages associated with the evolution of assessment programs, including the new generation of assessments being created by PARCC and Smarter Balanced. The author then discusses innovations made possible by electronic learning environments. In this section, the author presents some of the advanced features that the Common Core State Assessment consortia are actively incorporating in the new assessments.
Acknowledgements
The editors wish to express their appreciation to individuals who made this work possible. First, we would like to thank the authors for their great contribution to this volume. We are also grateful to the consulting editors and all the outside reviewers for their time and the valuable suggestions and advice they provided to the authors. Special thanks to Kelsey Krausen, our editorial assistant, who performed an extraordinary job of managing the various tasks related to the volume. She communicated so gracefully with authors, consulting editors, and reviewers and kept everyone on track. She also provided excellent suggestions for revisions of this introduction and different chapters of the volume. We also wish to thank Felice Levine, AERA Executive Director; John Neikirk, AERA Director of Publication; and other members of the AERA Publication Committee for their guidance and support. Last, we want to express our gratitude to Sara Sarver, the project editor at Sage, for her efforts toward publication of this work.
