Abstract

In 1997, Wim van der Linden and Ronald Hambleton edited the Handbook of Modern Item Response Theory, which gave an overview of the state-of-the-art item response theory models at that time. In 2016, 19 years later, a new handbook was published: Handbook of Item Response Theory, edited by Wim van der Linden. Whereas the 1997 handbook contains 28 chapters, the new handbook contains 85 chapters divided over three volumes. Volume I: Models is in essence the updated version of the 1997 handbook: It contains an overview of traditional and state-of-the-art item response theory models including, for instance, the Rasch model and the generalized partial credit model but also more recently developed models like models for response times, multi-level models, two-tier models, and explanatory item response theory models. In Volume II: Statistical Tools, various statistical aspects underlying item response theory models are discussed including, for instance, the normal distribution, identification of item response theory models, model estimation, and model fit assessment. Finally, in Volume III: Applications, practical issues are covered including person fit, test design, cognitive diagnosis, and various computer packages that are available for application of item response theory. The present review focusses on Volume II: Statistical Tools, which is divided into four sections each containing multiple chapters. Below, first, the topics covered in each chapter are briefly discussed, and next, the book is evaluated as a whole.
The Book
The first section of the book, Basic Tools, kicks off with Chapter 1 by James Albert about the logit, probit, and other response functions. In Chapter 2 by Jodi Casabianca and Brian Junker, various discrete distributions are discussed including their conjugate prior distribution and resulting posterior distribution. In Chapter 3 by the same authors, the multivariate normal distribution is discussed including its conjugate family of prior distributions with resulting posterior distributions. In addition, some skewed generalizations of the (multivariate) normal distribution are discussed. In Chapter 4 by Shelby Haberman, the exponential family of distributions is discussed for polytomous variables, continuous variables, functions of variables, conditional variables, and mixtures of variables. Chapter 5 by Tim Moses covers loglinear models for observed sum scores including their assumptions and estimation. In addition, the models are illustrated in a real data application. In Chapter 6 by Wim van der Linden, distributions for sums of nonidentical random variables are derived for discrete random variables (e.g., item scores) and continuous random variables (e.g., response times). In Chapter 7 by Hua-Hua Chang, Chun Wang, and Zhiliang Ying, Fisher information and Kullback-Leibler information are discussed for both one-dimensional and multidimensional binary item response theory models. In addition, Kullback-Leibler information and the Shannon entropy are discussed for cognitive diagnostic models.
In the second section, Modeling Issues, Chapter 8 by Ernesto San Martín is about the identification of the fixed and random effects specifications of the parametric and semiparametric formulation of the one-parameter, the two-parameter, and the one-parameter guessing logistic models. In Chapter 9 by Shelby Haberman, it is discussed how joint, conditional, and marginal maximum likelihood deal with issues related to incidental parameters using the (generalized) partial credit model and a special case of the additive exponential response model for nominal responses as examples. In Chapter 10 by Robert Mislevy, an overview is given of missing data approaches under various practical testing situations including random assignment of items, targeted testing, adaptive testing, not-reached items, intentional omits, and examinee choice of items.
The third section, Parameter Estimation, starts with Chapter 11 by Cees Glas on joint and marginal maximum likelihood estimation of the item parameters, and (weighted) maximum likelihood, maximum a posteriori, and expected a posteriori estimation of the person parameters. In addition, these techniques are illustrated on a simulated data set. In Chapter 12 by Murray Aitkin, the expectation maximization (EM) algorithm is explained using an application to normal mixture models and normal multilevel models. In addition, applications to item response theory are discussed. In Chapter 13 by Matthew Johnson and Sandip Sinharay, Bayesian point and interval estimation is discussed together with Bayesian hypothesis testing. In Chapter 14 by Frank Rijmen, Minjeong Jeon, and Sophia Rabe-Hesketh, variational approximation methods are discussed for parameter estimation and illustrated using the Rasch model with fixed and random item effects. In Chapter 15 by Brian Junker, Richard Patz, and Nathan VanHoudnos, an overview of Markov chain Monte Carlo (MCMC) modeling tools and issues is given including Gibbs sampling, the Metropolis-Hastings algorithm, and various tools to monitor and tune the sampling scheme. In addition, the chapter contains applications to item response theory including the derivation of an MCMC algorithm (with R code) for the two-parameter logistic model, which is subsequently applied to simulated and real data. In Chapter 16 by Heinz Holling and Rainer Schwabe, optimal design theory is discussed for (generalized) linear models, nonlinear models, and item response theory models including different objectives and criteria for optimality and Bayesian optimal design.
Finally, in the fourth section, Model Fit and Comparison, Chapter 17 by Cees Glas discusses Lagrange multiplier tests, Wald tests, likelihood ratio tests, uniformly most powerful tests, and limited-information tests to assess model fit in a frequentist framework. In Chapter 18 by Allan Cohen and Sun-Joo Cho, Kullback-Leibler–based fit indices AIC, TIC, CAIC and AICc are discussed together with Bayesian fit indices BIC and DIC. In Chapter 19 by Sandip Sinharay, various Bayesian approaches to model fit are discussed including residual analysis, prior and posterior predictive checks, the DIC fit index, and the (partial) Bayes factor. Finally, in Chapter 20 by Craig Wells and Ronald Hambleton, graphical and statistical approaches are discussed to assess model fit using the model residuals including parametric and nonparametric approaches.
Evaluation
The Handbook of Item Response Theory, Volume II: Statistical Tools contains 20 interesting and well-written chapters by an impressive list of experts in the field. It takes some time for the book to get going. In Chapters 1 through 6, the tools described are mainly general. Although the topics of the chapters all have obvious connections to item response theory, the tools are so general that these chapters felt more like chapters from a general statistics book. Chapter 7 on information theory is a turning point in the handbook with a spot-on discussion on how information theory is, and can be, used in item response theory to facilitate testing. In addition to this chapter on information theory, three other chapters stand out in particular: Chapter 8 with a thorough discussion on identification of item response theory models, Chapter 10 with a very comprehensive overview of various missing data approaches, and Chapter 15 on MCMC estimation with a very comprehensive and accessible overview of the MCMC modeling tools and issues. It is very illustrative that in this chapter, as discussed above, an MCMC algorithm is derived, and R code is provided for the two-parameter logistic model. Chapter 11 on frequentist model fit approaches, Chapter 13 on Bayesian estimation, and Chapter 19 on Bayesian model fit also stood out because of the insightful data illustrations.
It is hard to criticize individual chapters of the book as all chapters are of high quality. In addition, the book is very well edited, as reflected by the comprehensively aligned notation across the different chapters and the thorough cross-referencing to chapters within this volume and chapters from the other two volumes. The book also seems relatively complete as there are no important omissions from the topics covered. One or two chapters on item factor model approaches might have been interesting as there are a number of useful statistical tools originating from this class of models (e.g., fitting item response theory models by weighted least squares estimation and model fit assessment by fit measures like the RMSEA), but generally, the most important topics are covered.
The key question from my perspective is, however, how large the need is for a book about statistical tools underlying item response theory. The need for Volume I: Models seems obvious because in the last 19 years many new item response theory models have been developed. Volume III: Applications also contains many new developments with respect to model applications and item response theory software that was very limited in 1997. The need for Volume II: Statistical Tools seems less obvious. In the preface of the book, van der Linden motivates the second volume of the handbook by pointing out that although item response theory depends heavily on various statistical tools, no systematic overview has been given in the literature before. Personally, I am not sure how successful Volume II will be as a reference book providing a systematic overview of the statistical tools underlying item response theory. As holds for many edited books, the focus and technical detail vary greatly across the chapters. As a result, it is challenging to pinpoint a concrete audience for which this book is suitable. Some of the chapters are very suitable for researchers or students without a background in item response theory. This includes the chapters on response functions, discrete distributions, multivariate normal distributions, loglinear models, maximum likelihood estimation, Bayesian estimation, MCMC, frequentist model fit tests, Bayesian model fit, and model fit using model residuals. Other chapters are suitable as a reference for researchers with experience in item response theory modeling. This includes the chapters on sums of nonidentical variables, information theory, missing data, the EM algorithm, variational approximation methods, optimal design theory, and information criteria. Finally, the chapters on the exponential family, identification, and incidental parameters are suitable for the very technically interested researchers.
Another argument for the need for Volume II of the handbook, as discussed in the preface by Wim van der Linden, is that with respect to parameter estimation and model fit assessment, many new developments have occurred in the last two decades. I can only agree that since 1997, many important developments in model estimation have occurred. Whereas in 1997 applying a basic item response theory model was arguably a challenge for the applied researcher, nowadays, various multilevel models, multidimensional models, mixture models, and so on, can be fit relatively easily due to the advances in computer technology and the accompanying developments of various efficient fitting algorithms. The three chapters on Bayesian statistics and the chapter on variational approximation methods can thus be seen as reflecting the more major and recent developments in the statistical tools underlying item response theory, with the other chapters of Volume II reflecting statistical tools that have already been established for some time.
Thus, Handbook of Item Response Theory, Volume II: Statistical Tools is an interesting and well-edited book that contains high-quality chapters about both established and new statistical tools underlying item response theory. However, while Volume I: Models and Volume III: Applications will probably quickly become the new standards to replace the 1997 Handbook of Modern Item Response Theory, time will tell how large the demand for Volume II: Statistical Tools actually is.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research by Dylan Molenaar was made possible by a grant from the Netherlands Organization for Scientific Research (NWO VENI-451-15-008).
