Abstract

Argument-Based Validation in Testing and Assessment is the 184th volume of the Quantitative Applications in the Social Sciences series, published by SAGE in 2020. Unlike most titles in the series, which focus on statistical techniques, this book elaborates on an approach to test validation. Since its introduction in the 1980s, the argument-based approach to validation (e.g., Kane, 2013) has become one of the most influential validation frameworks in educational measurement (Im, Shin, & Cheng, 2019). However, actual applications of argument-based validation are rare (Chapelle & Voss, 2014). This book sets out to narrow that theory-practice gap. Written by Carol Chapelle, a prominent figure in the development of argument-based validity, the book provides test developers and researchers with guidelines for designing validation research using an argument-based approach.
The book comprises seven chapters. In Chapter 1, the author introduces the key concepts needed to construct a validity argument. She sketches the academic roots of argument-based validity in the evolving definitions of validity and validation and reviews some basic concepts of testing, equipping readers with the background knowledge needed to understand validity arguments.
Following the introductory chapter, three tests developed for different purposes (Test of English as a Foreign Language Internet-Based Test, Mayer–Salovey–Caruso Emotional Intelligence Test, Iowa Assessments mathematics achievement test) are presented as examples in Chapter 2 to show how to design parts of validity arguments for test interpretations and uses by structuring claims and inferences together. The chapter also offers a detailed introduction to the basic elements (warrants, assumptions, backings, rebuttals) that support or undermine inferences and claims.
Chapters 3 to 6 elaborate on the inferences mentioned in the previous chapter and introduce additional ones, resulting in a total of seven logically connected inferences in a validity argument: consequence implication and utilization (Chapter 3), explanation and extrapolation (Chapter 4), generalization and evaluation (Chapter 5), and domain definition (Chapter 6). Specifically, Chapter 3 addresses test score uses and consequences, which are the starting point of the argument-based approach to validation in view of their central role in argument-based validity, through two types of inferences: consequence implication and utilization. This is done by illustrating the specific warrants required to support these inferences in the three example tests.
Chapter 4 moves on to the explanation and extrapolation inferences, which relate to the test construct, that is, the substantive sense of test scores. It starts with a brief description of the construct from the perspective of argument-based validation, which demonstrates the importance of construct specifications for various tests. Then, using the sample tests, the author explores how the two types of inferences can be applied to investigate specified constructs of different types: a trait-type construct with the explanation inference, a performance-type construct with the extrapolation inference, and an interactionalist construct with a combination of both.
After the exemplification of inferences regarding constructs, readers are shown two types of inferences associated with test score consistency in Chapter 5: generalization and evaluation inferences. The chapter first describes the concept of consistency as reliability in classical test theory and its importance in score interpretation. Then, it examines how consistency is related to validity arguments based on a summary of claims, warrants, and assumptions for the generalization and evaluation inferences for each of the three test examples.
Next, Chapter 6 covers the domain definition inference, the fundamental inference in a validity argument. First, the chapter presents the reason for including test content or test domain and its development process as the basis for a validity argument. Then, it explains how content development is taken into account in argument-based validity, as illustrated by the different types of warrants that support the domain definition inference for the test examples.
Chapter 7, the concluding chapter, provides practitioners with practical guidance on how to build a validity argument. Summaries of the seven inferences discussed in the previous chapters are first presented to show the logic of argument-based validation. The role of the validity argument in its social context is then addressed to show its context-specific nature, followed by the presentation of a three-stage argument-based validation process based on Kane (2006) for two different contexts, namely, validating existing tests and developing new tests.
Given the difficulty of conducting validation research in real life, which Kane (2012) captured in the remark that “validation is simple in principle, but difficult in practice” (p. 15), this 160-page methodological book offers practitioners an efficient way to get started. Both its structure and its content are strong. Structurally, apart from the introductory chapter, the remaining six chapters are organized sequentially following the three steps of the argument-based approach to validation, with interpretation/use argument creation addressed in Chapter 2, evidence identification in Chapters 3 to 6, and validity argument development in Chapter 7. This organization consolidates readers' understanding of the three-stage validity argument development process outlined in the final chapter. With regard to content, Chapelle uses three tests as examples throughout the book to illustrate clearly how validity arguments are designed for tests in particular contexts, which makes it easier for readers to apply the argument-based approach to validation to the tests they are interested in. In addition, to deal with the complex relations among elements in validity arguments, a considerable number of schematic diagrams are provided to visualize how the elements are connected. These diagrams not only help readers understand the content but also highlight the key information. Overall, the book gives accessible guidance on how to apply argument-based validation in testing practice.
However, the book could be improved by addressing two minor problems. First, although Chapter 2 introduces how claims and inferences are connected to articulate parts of interpretation/use arguments, it does not present the overall structure of an interpretation/use argument to show why seven inferences are needed and how they interrelate in argument-based validity before going deep into each inference. We suggest including in Chapter 2 a diagram, with accompanying paragraphs, that briefly describes how the seven inferences are connected, which might make the inferences delineated in the following chapters clearer to readers. As the author herself noted, “with the overall framework laid out, points of connection with issues in language assessment are apparent” (Chapelle, 2012, p. 21).
Second, by including three test examples, the book illustrates an argument-based approach to validation for tests with different purposes. The test examples, however, are all large-scale standardized tests. It would be preferable if classroom-based assessments were also included, both to reflect different assessment contexts and to draw more attention to the application of argument-based validation in classroom-related assessment (e.g., Bachman & Damböck, 2018).
Overall, this book accomplishes its intended purpose of providing a practical and actionable guide to getting started with argument-based validation of tests. The book should appeal to students in educational and psychological testing courses as well as to researchers and practitioners embarking on argument-based validity.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
