Abstract

The Handbook of Item Response Theory is an extensive three-volume collection with contributions from leading researchers in the field. This review focuses on Volume 1 (Models). Aside from the Introduction, each of the 33 chapters provides a self-contained presentation of an item response theory (IRT) modeling framework. The chapters share a common notation as well as a uniform organization (Introduction, Model Presentation, Parameter Estimation, Goodness of Fit, an Empirical Example, and a Discussion). Many chapters are lead- or single-authored by original developers of the research, and in all cases, the lead authors are highly regarded as experts in the field.
The Volume is organized into eight sections, each containing between two and seven chapters focused on types of data (dichotomous responses, polytomous responses, response times) or on types of models (multidimensional, nonparametric, nonmonotone, hierarchical and multilevel, as well as generalized modeling approaches that include but are not limited to IRT applications). The coverage of models for polytomous data is especially strong, with seven chapters devoted to this topic. In other areas, the coverage is already appearing somewhat thin in light of recent research trends. For example, a large amount of work has been devoted to the analysis of response times since the publication of the Volume. The three chapters on response times in the Volume provide the foundations of this more recent research, focusing on the early work of Rasch, approaches based on cognitive models of decision making, and models for lognormal response times; the last of these is extended to the joint modeling of responses and response times in a separate chapter. Generalized modeling is another area that, in retrospect, could have received more thorough coverage of topics such as Bayesian IRT, psychometric applications of networks and graphs, or approaches based on machine learning. There is only one chapter addressing models with categorical latent variables. Despite the inevitable nit-picking about specific omissions, the Handbook certainly provides a thorough characterization of the breadth of active research on statistical models used in the IRT literature.
It is important to emphasize that Volume 1 focuses exclusively on statistical models, with only brief motivating remarks and empirical illustrations in each chapter. Readers looking for detailed discussions of applied topics in testing (e.g., item calibration, test design, scoring) should instead consult Volume 3 of the Handbook. Similarly, readers interested in more general discussions of topics like identification, missing data, estimation, and model fit would be better served by Volume 2—such details are often (but not always) skimmed over in Volume 1.
Readers may wonder how the current Handbook compares to the Handbook of Modern Item Response Theory edited by van der Linden and Hambleton (1997). Most obviously, the content has been vastly expanded and updated. But, for the most part, Volume 1 of the new Handbook shares the overall spirit of the 1997 contribution, with a similar emphasis on up-to-date, authoritative, and comprehensive coverage. Comparison might also be made to Rao and Sinharay’s (2006) edited volume, Psychometrics. That contribution addressed a number of psychometric models in addition to those associated with IRT, as well as topics such as causal inference and value-added modeling. The current Handbook is more directly focused on IRT, although Volume 1 contains occasional glimmers of factor analysis (e.g., Chapters 10, 11, and 31) and latent class models (e.g., Chapter 23).
In the preface, the editor identifies three principal audiences for the Handbook: researchers, practitioners, and students. I will consider the strengths and weaknesses of Volume 1 from the perspective of each of these groups.
Researchers whose work focuses on statistical methodology will find Volume 1 useful for its succinct summaries of models commonly encountered in the IRT literature. Each chapter cites the original research and other precedents thoroughly, making it easy to seek out additional information. Chapters vary in how much technical detail is given on parameter estimation and goodness of fit. For example, Chapter 2 provides four paragraphs discussing the literature on the estimation of the two-parameter logistic model, whereas Chapter 3 provides four pages of derivations for conditional maximum likelihood estimation of the Rasch model. More generally, despite the uniform organization of each chapter, the authors found plenty of leeway to emphasize topics according to their own style and interests. This was one of the pleasures of reading the Volume—to see similarities and differences in how leading scholars approach their craft. Other researchers in IRT may also enjoy this aspect of the Volume, although it is much more likely to serve as a reference.
There is extensive cross-referencing within the Handbook, but the chapters in Volume 1 generally do not discuss how the focal models compare to other approaches. Some notable exceptions include Dr. Masters’s discussion of the partial credit model in Chapter 7 and Dr. Tutz’s discussion of the sequential model in Chapter 9, both of which devote sections to contextualizing the presented model in terms of other models for polytomous data. However, such context is often omitted in favor of other details, which is certainly understandable given the space constraints for each chapter (approximately 20 pages). In general, it is fair to say that the Volume does not provide a lot of insight about which model to use in a given application, but, once the reader has a model in mind, they will find an authoritative discussion about its use and interpretation.
I have my doubts about the suitability of this Volume for practitioners who do not possess a relatively thorough statistical training. The Handbook aims to be statistically rigorous, and most of the material sets a relatively high bar. Some chapters are especially challenging in terms of the level of abstraction used to discuss model properties or the sheer density of mathematical statements and will likely have even the most seasoned researcher reaching for pencil and paper. This, combined with the relatively minimal advice on when to use a given model, may leave practitioners feeling more comfortable with Volume 3 or a textbook designed to provide an overall introduction to IRT.
Even practitioners with a strong mathematical background may find certain aspects of the Volume challenging due to the succinctness of each chapter. There is a lot that must be taken for granted in a 20-page summary. Important topics are occasionally mentioned only tangentially, and specific terminology is often deferred to a cross-reference or simply left undefined. The topic of identification in latent variable models is a good example. Chapter 2 provides a detailed discussion, but more often, parameter restrictions are introduced to address model identification without any discussion at all. As another example, the IRT literature retains a surprising number of procedures referred to as “maximum likelihood” with which readers outside our field may not be familiar. These issues could be addressed by the inclusion of a glossary, both for technical notation shared across chapters and for terminology that is idiomatic. However, the persistent reader will be able to overcome these stumbling blocks.
Third, let us consider the use of this Volume for the training of graduate or advanced undergraduate students. The Handbook is not intended to be a textbook. Volume 1 would be suitable for providing companion readings in the latter half of an introduction to IRT or in a second course that focuses on advanced topics. Students who do not have sufficient mathematical training in calculus and linear algebra will not find the Volume very accessible. However, it would serve very well as required reading for the comprehensive examinations of any methodologically oriented student seeking to specialize in IRT.
Let me offer another way of grouping the readership of the Handbook, which struck me as important while reading Volume 1: the initiated and the uninitiated. Readers familiar with IRT research will have a sense of the differences of perspective among this collection of authors and, in particular, of two long-standing schools of thought on the Rasch model. As mentioned above, I think the initiated will enjoy and learn from the different perspectives taken by the authors of this Volume. However, the uninitiated would benefit from an introductory chapter outlining how some of these differing perspectives align with the presentation of IRT models and their associated terminology. For example, do IRT models focus on the probability of an item response conditional on the respondent or the probability of an item response vector marginalized over respondents? Are person-level quantities parameters or random effects? Other topics that might benefit from more background discussion include perspectives on the speed–accuracy trade-off and the relation of IRT to other statistical approaches such as factor analysis, mixture modeling, nonlinear mixed-effects models, and so on. This could complement the current introductory chapter, which focuses more on historical precedents set by individual scholars but is less explicit about how such precedents continue to influence current practice.
In summary, Volume 1 of the Handbook succeeds in providing an up-to-date, authoritative, and comprehensive treatment of statistical modeling approaches used in the IRT literature. There are some additions that would make the Volume more accessible to the uninitiated, such as a glossary for technical notation and idiomatic terminology, as well as a background chapter about different theoretical perspectives in the field and how these play out in the conceptualization and presentation of the models. Researchers looking to update previous handbooks will be rewarded by the breadth of topics covered and benefit from the insights offered by leading scholars. The Handbook is not a suitable replacement for a textbook, but Volume 1 could augment course readings by providing a deeper look at specific modeling approaches.
