Abstract

The chapters in this volume were originally presented as papers at a conference held to honor Lyle F. Bachman in 2017. Professor Bachman’s far-reaching book, Fundamental Considerations in Language Testing (1990), had a profound impact on the field of applied linguistics. Co-edited by Gary Ockey and Brent Green, this volume compiles insightful papers from Bachman’s former students that continue his contribution to the field of language assessment, providing a comprehensive discussion of some major issues in second language assessment. Not only does the Festschrift reflect Bachman’s endeavor to explore fundamental topics in language assessment, such as construct, validity, and validation, but it also inspires the next generation to continue his work in developing, analyzing, and evaluating effective language assessments.
The volume comprises 14 chapters and two forewords contributed by Adrian S. Palmer and Lyle F. Bachman, respectively. Bachman’s foreword is particularly noteworthy as it provides useful guidance on mentoring Ph.D. students. Besides the editors’ introduction (Chapter 1) and conclusion (Chapter 14), the remaining chapters are organized into three parts: assessment of evolving language ability constructs, validity and validation of language assessments, and understanding internal structures of language assessments. The introductory chapter provides a brief overview of each chapter, and the concluding chapter summarizes the issues discussed and recommends directions for future research. Despite focusing on different issues, the remaining chapters are interconnected and arranged in a similar structure, forming a coherent discussion of essential considerations in language assessment. Five conceptual chapters (Chapters 2, 3, 4, 6, and 7) are organized into the following sections: Introduction; Constructs; Historical Perspectives; Critical Issues; and Conclusions, Implications, and Future Directions, establishing an explicit relationship with Bachman’s original work. Seven empirical chapters (Chapters 5, 8, 9, 10, 11, 12, and 13) adhere to the formal IMRAD research paper format.
Part I (Chapters 2–5) explores some of the evolving constructs that have been developing and expanding in the 21st century. It starts with a discussion of pressing issues in defining and assessing English as a lingua franca (ELF) (Chapter 2). With increasing globalization, ELF contexts are becoming ubiquitous. However, the construct of ELF has not been extensively tapped in assessments of English as a second or foreign language. This chapter, therefore, provides a framework to determine how well an assessment measures ELF. Likewise, as instructional approaches that integrate content and language have become common in many English-medium contexts, Chapter 3 calls for a rethinking of the role of content in defining language assessment constructs and concludes with an agenda for researchers to explore innovative ways to assess language and content integratively. In line with the dominant role of digital videotext in language teaching and learning, Chapter 4 explores the construct of multimodal listening and its possible implications for listening assessments. Chapter 5 extends Bachman’s statements about the priority of explicit construct definitions in language assessment to providing construct-centered feedback on test takers’ strengths and weaknesses. It argues that such systematic feedback can have a positive impact on learning.
Part II (Chapters 6–9) addresses issues concerning the validity and validation of language assessments. The four chapters in this part provide and/or employ rationales, frameworks, and models to steer language assessments toward fair and ethical practice. To remedy the situation in which the two dominant approaches (the Standards-based approach and the Argument-based approach) generally fail to provide an articulated philosophical foundation for the evaluation of language tests, Chapter 6 proposes an Ethics-based approach to conducting more justifiable research on language assessment evaluation, explicitly explaining which principles should underlie the design and validation of language assessments, and why. Focusing on some critical alignment issues in standards-based K–12 English language proficiency (ELP) assessments, Chapter 7 argues that evaluating the alignment between assessment content and the target domain should consider not only comparisons between the assessment content and ELP but also the correspondence of ELP and content in and across standards, assessments, curriculum, and instruction. Chapter 8 demonstrates the effectiveness of the Assessment Use Argument (AUA) in framing the validation of a practical scoring procedure for short answer reading comprehension questions. Chapter 9 wraps up this part by reminding readers of some specific issues related to model selection in language testing research, calling on researchers to acknowledge and justify their processes of statistical model selection.
Part III (Chapters 10–13) includes four empirical studies examining the internal structures of language assessments with advanced statistical procedures. Chapter 10 probes the functioning of two methods of summary content scoring: content point scores and a holistic summary content rating scale called Integration. The results of a multivariate generalizability theory analysis show that the Integration rating scale was satisfactorily reliable for the intended uses. Chapter 11 investigates the extent to which scoring keys produced by different authors are comparable. Using multivariate generalizability studies, it reveals that single-author scoring keys may be insufficiently dependable for high-stakes decisions, implying that the scoring key author may introduce construct-irrelevant variance. To address the difficulty of differentiating language abilities from contexts, Chapter 12 and Chapter 13 apply confirmatory factor analysis and multivariate generalizability theory, respectively, to illuminate the relationships between language ability and context in speaking tests. Chapter 12 finds that language ability and contextual factors can be measured separately, providing new insights into the “fundamental dilemma” (Bachman, 1990, p. 288) of language testing. Chapter 13 finds that non-heritage learners tended to perform better when they were paired with heritage learners in paired speaking tasks than in individual tasks, improving the understanding of the co-constructed nature of test constructs in oral assessment.
Overall, this volume makes a significant contribution to the field of language assessment. It provides a transparent discussion of some up-to-date issues in second language assessment. One strength lies in its clarity in defining the constructs that language assessment researchers aim to investigate. As Bachman and Palmer (2010) illustrated, the construct definition forms the basis for the interpretations of assessment performance in a particular assessment setting. Following this line of thought, each conceptual chapter contains a section that discusses the construct. Moreover, all empirical chapters make pertinent remarks in the literature review about the relevant constructs assessed. Not only do the chapters explore some of the evolving language ability constructs from interdisciplinary perspectives, such as ELF (Chapter 2) and systemic functional linguistics (Chapter 4), but they also endeavor to examine some of the assessment conundrums, for instance, the role of content in language assessment and construct definition (Chapter 3) and the relationship between the language ability construct and context (Chapter 12). Another strength is the balance between theoretical discussions and empirical applications. Besides the discussions of internal structures drawn from AUA, this volume covers several empirical studies on how to use AUA in developing and evaluating a particular language assessment (e.g., the comparison of the validity of content point scores to that of the Integration rating scale in Chapter 10).
Despite the breadth of scope and the complexity of concepts, this volume is cohesive, well-structured, and reader-friendly. Although the chapters are independent of each other, they cover most relevant topics and follow a similar structure, which should appeal to a broad readership. The introductory chapter may serve as a guide for readers with different interests in language assessment. In addition, researchers and graduate students in the field of language assessment would gain valuable insights from the recommendations for future directions presented in the concluding chapter. Furthermore, language test developers and users can also benefit because the volume provides several examples showing how to apply theory to specific assessment practice, such as the evaluation of an oral communication placement test based on the ELF framework (Chapter 2) and the validation of holistic rubric scoring for reading comprehension assessments (Chapter 8). Finally, language educators and teachers will find the volume useful, especially the first part, for seeing how to employ various approaches to steer tests toward facilitating language learning (Chapters 2, 3, and 4) and generating fine-grained feedback (Chapter 5).
On a more critical note, the ELF test framework in Chapter 2 could be improved by incorporating central elements of an ELF construct such as adaptability and negotiation of meaning (Harding & McNamara, 2018).
Notwithstanding this minor flaw, the book is a particularly apt tribute to the contributions that Bachman has made to language assessment. It will no doubt inspire the new generation of language assessment researchers and practitioners to move the field forward.
