Abstract
In “compensatory” multidimensional item response theory (IRT) models, latent ability scores are typically assumed to be independent and to combine additively to influence the probability of responding to an item correctly. However, testing situations arise where modeling an additive relationship between latent abilities is not appropriate or desired. In these situations, “noncompensatory” models may be better suited. Unfortunately, relatively few estimation studies have been conducted using these types of models, and effective estimation of their parameters by maximum likelihood has not been well established. In this article, the authors demonstrate how noncompensatory models may be estimated with a Metropolis–Hastings Robbins–Monro hybrid (MH-RM) algorithm and perform a computer simulation study to determine how effectively this algorithm recovers population parameters. Results suggest that although the parameters are generally not recovered accurately, the empirical fit was consistently better than that of a competing product-constructed IRT model, and latent ability scores were also more accurately recovered.
Item response theory (IRT) is a family of statistical models that relates unobservable psychological traits (known as ability in the educational testing literature) to observable test behaviors, such as responding to a question on an aptitude test. IRT attempts to model how an individual’s observed response pattern can be explained by how the participant’s underlying latent trait level (
IRT models are best known for their application to dichotomous item response data (e.g., correct versus incorrect) drawn from ability and aptitude tests (Embretson & Reise, 2000). Historically, these models were estimated using a normal ogive response curve, but software developers have often preferred a nearly identical logistic response curve with a scaling constant of 1.702 because it is simpler to manipulate and program (Thissen & Steinberg, 2009). The classic three-parameter logistic model (3PL) introduced by Birnbaum (1968) for person
where
where the slopes
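To make the 1.702 scaling concrete, the sketch below compares the standard normal ogive with the scaled logistic curve and evaluates a unidimensional 3PL. It assumes the textbook parameterization (slope a, difficulty b, lower asymptote c, scaling constant D), which may differ in notation from the article's own equations; the function names are illustrative.

```python
import math


def normal_ogive(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def scaled_logistic(z: float, D: float = 1.702) -> float:
    """Logistic curve 1 / (1 + exp(-D z)) with scaling constant D."""
    return 1.0 / (1.0 + math.exp(-D * z))


def three_pl(theta: float, a: float, b: float, c: float, D: float = 1.702) -> float:
    """Textbook 3PL: lower asymptote c, slope a, difficulty b (assumed notation)."""
    return c + (1.0 - c) * scaled_logistic(a * (theta - b), D)


# Largest absolute gap between the two curves on a grid over [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_ogive(z) - scaled_logistic(z)) for z in grid)
print(f"max |Phi(z) - logistic(1.702 z)| on the grid: {max_gap:.4f}")
```

On this grid the largest gap between the two curves stays below 0.01, which is the sense in which the logistic and normal ogive curves are nearly identical.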
Compensatory and Noncompensatory Models
In general, there are two classes of MIRT models: compensatory models (e.g., Bock & Aitkin, 1981), which are typically additive and contain no latent variable interactions, and noncompensatory models (e.g., Whitely, 1980), which incorporate products of individual item response probability curves. The former has received an overwhelming amount of attention in the IRT literature because of its intimate connection to nonlinear factor analysis (see Lord, 1980). Compensatory models may be most appropriate when test items follow a disjunctive component process, meaning that the abilities or factors can be combined additively to influence an item response probability, whereas noncompensatory models may be more appropriate for items that follow a conjunctive component process (Maris, 1999). A compensatory model for Equation 2 with
Here, we see that
The compensatory nature of Equation 3 raises theoretical issues for testing situations that require simultaneous contributions from multiple abilities and in which the probability of correct endorsement should be limited by the lowest ability. For compensatory models, if a subject has one very low factor score, she can still have a very high probability of correct response if the remaining factor scores are sufficiently high. Sympson (1977) argued that this hypothesized compensation is not realistic for some types of test items and gave the example of a mathematics test item that requires both arithmetic computation and reading skills. Sympson reasoned that a subject with poor reading skills would not be able to determine which problem needed to be solved and, conversely, that a subject with poor mathematical skills would not be able to solve the problem despite adequately understanding the question. To correct this limitation, Sympson proposed that the probability of correct endorsement for an item be modeled as the product of the individual response curves from separate unidimensional models. The product of these models results in a bounded nonlinear response surface where the probability of a correct response cannot exceed the smallest of the unidimensional response probabilities, and is in this sense noncompensatory with respect to increasing values of
More formally, the noncompensatory version of Equation 2 can be expressed as
where
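Sympson's compensation argument can be checked numerically. The sketch below assumes a slope–intercept parameterization: the compensatory item applies a single intercept d to the weighted sum of abilities, while the noncompensatory item multiplies one logistic curve per dimension, each with its own intercept d_k, and both share an optional lower asymptote g. These symbols and function names are illustrative assumptions, not the article's notation.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def compensatory_p(theta, a, d, g=0.0):
    """Compensatory MIRT: one logistic curve of the weighted sum a'theta + d."""
    z = sum(ak * tk for ak, tk in zip(a, theta)) + d
    return g + (1.0 - g) * logistic(z)


def noncompensatory_p(theta, a, d, g=0.0):
    """Noncompensatory MIRT: product of one logistic curve per dimension."""
    prod = 1.0
    for ak, tk, dk in zip(a, theta, d):
        prod *= logistic(ak * tk + dk)
    return g + (1.0 - g) * prod


# Very weak on factor 1 (say, reading), very strong on factor 2 (say, arithmetic).
theta = [-2.0, 4.0]
p_comp = compensatory_p(theta, a=[1.0, 1.0], d=0.0)          # logistic(2), about 0.88
p_nc = noncompensatory_p(theta, a=[1.0, 1.0], d=[0.0, 0.0])  # about 0.12
print(f"compensatory: {p_comp:.3f}, noncompensatory: {p_nc:.3f}")
```

With these abilities and unit slopes, the strong second ability offsets the weak first one in the compensatory item, whereas the noncompensatory probability stays low and, with g = 0, can never exceed the smaller of the two per-dimension probabilities.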
To determine the observed data likelihood function for compensatory and noncompensatory models, let
Here,
where there are
IRT parameter estimation for exploratory compensatory models has progressed over the past 60 years, moving from heuristic estimation techniques to more computationally intensive Bayesian Markov chain Monte Carlo (MCMC) methods (Baker & Kim, 2004). The early focus was on estimating the item-specific parameters of unidimensional models, and until Bock and Aitkin (1981) introduced an expectation-maximization (EM) solution, IRT applications were largely limited to small testing situations (Baker & Kim, 2004). The EM algorithm appeared to be a reasonable solution for lower dimensional models without forfeiting numerical accuracy. Unfortunately, this technique quickly becomes inefficient as the number of dimensions increases, because the number of quadrature points required for the “E-step” grows exponentially with the number of dimensions and must be kept manageable by decreasing the number of quadrature nodes per dimension. A solution for a moderate number of dimensions was described by Schilling and Bock (2005), who demonstrated that adaptive quadrature could provide better accuracy when fewer quadrature points per dimension are used. However, the problem of high-dimensional solutions remained, and without supplementary EM techniques, parameter standard errors could not be computed from the information matrix.
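The exponential growth of the E-step can be illustrated with a fixed tensor-product grid. This sketch uses equally spaced nodes with normalized normal-density weights, a simplification in the spirit of fixed quadrature rather than the Gauss–Hermite or adaptive schemes used in production software, and it assumes uncorrelated standard normal factors so that each multivariate weight is a product of univariate weights. The item model is a noncompensatory 2PL with hypothetical parameters, and all function names are illustrative.

```python
import itertools
import math


def normal_quadrature(n_points, lo=-4.0, hi=4.0):
    """Equally spaced nodes on [lo, hi] with normalized N(0, 1) density weights."""
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + i * step for i in range(n_points)]
    dens = [math.exp(-0.5 * x * x) for x in nodes]
    total = sum(dens)
    return nodes, [w / total for w in dens]


def tensor_grid(n_points, n_dims):
    """Tensor-product grid with n_points ** n_dims (node vector, weight) pairs."""
    nodes, weights = normal_quadrature(n_points)
    grid = []
    for idx in itertools.product(range(n_points), repeat=n_dims):
        theta = [nodes[i] for i in idx]
        weight = math.prod(weights[i] for i in idx)
        grid.append((theta, weight))
    return grid


def item_p(theta, a, d):
    """Noncompensatory 2PL item: product of per-dimension logistic curves."""
    return math.prod(1.0 / (1.0 + math.exp(-(ak * tk + dk)))
                     for ak, tk, dk in zip(a, theta, d))


def marginal_likelihood(pattern, items, grid):
    """Sum over nodes of weight * prod_j P_j^y_j * (1 - P_j)^(1 - y_j)."""
    total = 0.0
    for theta, weight in grid:
        lik = 1.0
        for y, (a, d) in zip(pattern, items):
            p = item_p(theta, a, d)
            lik *= p if y == 1 else 1.0 - p
        total += weight * lik
    return total


# The E-step cost is driven by the grid size, which grows as n_points ** n_dims.
for dims in (1, 2, 3, 4):
    print(dims, "dimension(s):", len(tensor_grid(10, dims)), "nodes")
```

With 10 nodes per dimension the grid holds 10, 100, 1,000, and 10,000 points for one through four dimensions, and each person's response pattern must be evaluated at every point on every E-step.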
Although interesting in their own right, noncompensatory models have received relatively little attention in theoretical and applied research, mainly because they present a greater estimation challenge, especially in exploratory applications. Knol and Berger (1991) state that “the disadvantage of noncompensatory models is that no efficient algorithms for estimation of the item parameters are available,” and to date this statement largely remains true. Because noncompensatory models often include factor-specific difficulty parameters, their estimation requires sufficient variability in the relative difficulties of factors across items to identify the dimensions (Bolt & Lall, 2003). When Sympson (1977) first introduced noncompensatory models, he attempted to estimate a noncompensatory two-parameter logistic (N2PL) model with a heuristic regression method. In the same conference proceedings, Lord (1977) suggested that a maximum-likelihood or Bayesian estimation framework would be more appropriate for obtaining model parameters. Shortly thereafter, Whitely (1980) estimated the item coefficients for an N1PL model by maximum likelihood (ML), but this implementation required that the factor scores be known a priori and hence error-free.
The first attempt at estimating the item parameters without prior knowledge of the factor scores was made by Maris (1995), who derived a feasible EM solution for an N1PL model termed the “conjunctive Rasch model.” Unfortunately, standard errors for the estimated parameters were not available for this method, the factors were required to be orthogonal, and the method itself was difficult to implement. The main problem was that the EM algorithm was not used to marginalize over the missing factor scores, as is typical in IRT estimation (e.g., Bock & Aitkin, 1981), but was instead used to handle the missing latent response processes (Maris, 1995). Consequently, this approach has made the integration of the compensatory and noncompensatory frameworks difficult.
Bolt and Lall (2003) were the first to demonstrate that Bayesian MCMC methods offer a feasible and flexible estimation approach for noncompensatory models. In their study, the C2PL model was compared with the N1PL model using simulated data and an empirical data set from an English usage test. Whereas the C2PL model was estimated accurately in many scenarios, the N1PL suffered when latent correlations were greater than .6, often requiring larger sample sizes (n > 3,000) to estimate the parameters within a reasonable tolerance. Also, the standard errors for the N1PL model were much larger than those for the compensatory model in nearly all conditions, and estimation took approximately 12 hr to complete on a 1.7 GHz processor for 31 items. Babcock (2011) most recently used MCMC to estimate noncompensatory models and was also the first to attempt to estimate an N2PL model. His simulation results again revealed that higher correlations between the factors led to less effective parameter recovery overall, that at least six unidimensional items per factor were needed to help stabilize the model, and that large sample sizes (e.g., n = 4,000) were necessary for adequate parameter estimation. As Babcock demonstrated, a Bayesian implementation of an N2PL model required substantial modifications to typical prior parameter distributions, and even ad hoc stochastic acceptance schemes (determined by many prior trial-and-error runs) were required to control the factor correlation parameter.
Noncompensatory IRT Models With the MH-RM Algorithm
Only recently have confirmatory IRT methods arisen in the literature, mainly because programming effective confirmatory methods is difficult due to the inherent “curse of dimensionality,” but also because appropriate standard errors for estimated parameters have not been readily available. Confirmatory IRT models place certain restrictions on the model a priori, such as fixing certain item slopes to zero, and these restrictions are subsequently tested for their fit to data. Estimation methods have mainly been drawn from Bayesian MCMC techniques, as they are very flexible and can readily handle user-declared restrictions (see Edwards, 2010).
Recently, however, Cai (2010a) introduced a flexible framework for parameter estimation by combining the Metropolis–Hastings (MH; Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller, 1953) and Robbins–Monro (RM; Robbins & Monro, 1951) algorithms into a joint estimation framework that circumvents many of the less attractive features of strictly Bayesian MCMC or ML approaches. The Metropolis–Hastings Robbins–Monro (MH-RM) algorithm estimates the item- and group-level parameters by using stochastically imputed complete data under an assumed population distributional form (typically, multivariate normal) to capitalize on the more manageable complete-data likelihood:
For exploratory item factor analysis, the population mean vector
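Once latent scores have been imputed, the complete-data log-likelihood is a plain sum with no integral over the latent space. The sketch below assumes two dimensions, noncompensatory 2PL items, and a population distribution with zero means, unit variances, and correlation rho; the item values and the imputed scores are hypothetical placeholders for one set of MH draws, and the function names are illustrative.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def n2pl_p(theta, a, d):
    """Noncompensatory 2PL item: product of per-dimension logistic curves."""
    return math.prod(logistic(ak * tk + dk) for ak, tk, dk in zip(a, theta, d))


def bivariate_normal_logpdf(theta, rho):
    """log density of N(0, [[1, rho], [rho, 1]]) evaluated at theta = (t1, t2)."""
    t1, t2 = theta
    det = 1.0 - rho * rho
    quad = (t1 * t1 - 2.0 * rho * t1 * t2 + t2 * t2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def complete_data_loglik(Y, thetas, items, rho):
    """Sum over persons of log P(y_i | theta_i) + log phi(theta_i; 0, Sigma).

    The thetas are treated as observed (imputed) values, so no integral is needed.
    """
    total = 0.0
    for y_i, theta_i in zip(Y, thetas):
        for y, (a, d) in zip(y_i, items):
            p = n2pl_p(theta_i, a, d)
            total += math.log(p) if y == 1 else math.log(1.0 - p)
        total += bivariate_normal_logpdf(theta_i, rho)
    return total


items = [([1.0, 0.8], [0.5, 0.0]), ([1.2, 1.0], [0.0, -0.5])]  # hypothetical items
Y = [[1, 0], [1, 1]]
imputed = [(0.3, -0.2), (1.1, 0.9)]  # stands in for one set of MH draws
print(complete_data_loglik(Y, imputed, items, rho=0.4))
```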
Cai (2010a) demonstrated that when using a stochastically imputed complete-data model, and properly accounting for the error in the stochastic imputations, ML estimation and observed parameter standard errors can be calculated. The iterative algorithm works well when partitioned into three stages: perform
Given
Using the imputed
where
Update the parameter estimates using the noise-corrected Hessian and gradient vector
If
The MH-RM algorithm handles the inherent noise in the complete-data imputations by using the RM root-finding algorithm to stabilize both the parameter updates and the information matrix. In this way, the inaccuracies arising from the MH sampler are properly accounted for when maximizing the likelihood, and standard errors can subsequently be computed using Louis’s (1982) complete-data method (see Cai, 2010a). Also, during the MH-RM stage, multiple sets of
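The imputation, approximation, and Robbins–Monro update steps can be sketched on a deliberately simple latent-variable model rather than an IRT model: theta_i ~ N(mu, 1) and y_i | theta_i ~ N(theta_i, 1). This toy is chosen only because the marginal ML estimate of mu is the mean of y, so convergence can be checked. The gain schedule (1 during burn-in, then 1/j), the proposal scale, and the function name are illustrative assumptions; the full algorithm described by Cai (2010a) works with vectors of item and group parameters, a matrix-valued information estimate, and Louis’s (1982) method for standard errors, none of which is reproduced here.

```python
import math
import random


def mh_rm_toy(y, n_burn=150, n_rm=1000, proposal_sd=1.0, seed=1):
    """Toy MH-RM for theta_i ~ N(mu, 1) and y_i | theta_i ~ N(theta_i, 1).

    The marginal ML estimate of mu is mean(y), so the output can be checked.
    """
    rng = random.Random(seed)
    n = len(y)
    mu = 0.0
    theta = list(y)       # starting values for the imputed latent scores
    info_est = float(n)   # running (RM-smoothed) complete-data information

    def log_target(t, yi, m):
        # log p(theta_i | y_i, mu) up to an additive constant
        return -0.5 * (yi - t) ** 2 - 0.5 * (t - m) ** 2

    for k in range(1, n_burn + n_rm + 1):
        # Step 1 (imputation): one random-walk Metropolis-Hastings move per person.
        for i in range(n):
            prop = theta[i] + rng.gauss(0.0, proposal_sd)
            log_ratio = log_target(prop, y[i], mu) - log_target(theta[i], y[i], mu)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                theta[i] = prop
        # Step 2 (approximation): complete-data gradient and information for mu.
        grad = sum(t - mu for t in theta)
        cd_info = float(n)  # minus the second derivative; constant in this toy
        # Step 3 (Robbins-Monro update): gain 1 during burn-in, then 1/j.
        gain = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        info_est += gain * (cd_info - info_est)
        mu += gain * grad / info_est
    return mu


data_rng = random.Random(123)
y = [data_rng.gauss(0.5, math.sqrt(2.0)) for _ in range(200)]
est = mh_rm_toy(y)
print(f"MH-RM estimate: {est:.3f}, closed-form ML (mean of y): {sum(y) / len(y):.3f}")
```

Because each RM step averages over fresh MH imputations with a shrinking gain, the noise in individual draws is damped and the estimate settles near the closed-form value instead of wandering with the sampler.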
For the noncompensatory model with a lower asymptote parameter, let
be the probability of a positive item endorsement and
for
Unfortunately, the numerical identification of models that contain noncompensatory items may require slightly more care compared with their compensatory counterparts. While noncompensatory models may be mathematically identified by constraining item and group parameters appropriately (say, constraining the factors to be orthogonal, two slopes to be set to 1, and one additional slope to 0) due to the stochastic nature of how the
The MH-RM algorithm is also suitable when the latent factors are nonlinear or functionally related by products. McDonald (1962) was the first to derive a method to estimate nonlinear factor effects for continuous variables, but modeling categorical variables with nonlinear terms was soon recognized to be problematic due to the “curse of dimensionality” that occurs when many integrals must be evaluated numerically. However, when drawing the
then this is accomplished by simply drawing two random
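Because each MH draw supplies concrete values of the latent scores, a product term costs nothing extra: it is computed from the draw and treated as one more predictor. The sketch assumes the product-constructed item has the form logit P = a1 θ1 + a2 θ2 + a3 θ1θ2 + d; this is an illustrative assumption (chosen so that it has four item parameters, like a two-factor N2PL item with two slopes and two intercepts), not a transcription of the article's product model, and the function names are hypothetical.

```python
import math
import random


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def augment_with_product(theta):
    """Append theta_1 * theta_2 as a third, derived 'factor' value."""
    t1, t2 = theta
    return [t1, t2, t1 * t2]


def product_model_p(theta, a, d):
    """Assumed product form: logit = a1*t1 + a2*t2 + a3*t1*t2 + d."""
    z = sum(ak * tk for ak, tk in zip(a, augment_with_product(theta))) + d
    return logistic(z)


rng = random.Random(7)
draw = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # stands in for one imputed draw
print(augment_with_product(draw), product_model_p(draw, a=[1.0, 0.8, 0.5], d=-0.2))
```

Once the product is appended, the item is evaluated exactly like a compensatory item on a three-element score vector, which is why no additional numerical integration is required.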

The probability surface plots on the top represent two-dimensional compensatory (left) and noncompensatory (right) models with the parameters
Noncompensatory models offer important and desirable characteristics not inherent in common multidimensional IRT, and should be utilized in situations where the additive property in compensatory models is theoretically inappropriate. However, the estimation of these models clearly has been difficult, with mixed results regarding parameter recovery and ease of implementation. The purpose of this article is to approach the estimation of noncompensatory models by using a flexible MH-RM algorithm, which has several desirable properties when estimating mixed item types as well as multidimensional models. To evaluate the performance of the MH-RM algorithm with noncompensatory models, a simulation study was designed and is described in detail below.
Simulation Study
The simulation design was organized for investigating two- and three-dimensional noncompensatory IRT models that were identified by including several packets of unidimensional models. Noncompensatory items that included non-zero
Following Babcock (2011), item response data were simulated for noncompensatory and compensatory items to determine how well parameters could be recovered under known conditions. For simplicity and potentially better stability, the compensatory item types were constrained to load only on one designated factor (i.e., unidimensional), and did not include the lower asymptote parameter,
Design and Simulated Parameters
The simulation design had a
The factor scores were drawn from a multivariate normal distribution with mean
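A data-generating step of this kind can be sketched as follows. The sketch assumes zero means, unit variances, a factor correlation of .4 (one of the correlation levels listed in Appendix B), and hypothetical item parameters; correlated scores are formed from independent normal draws through the Cholesky factor of the 2 × 2 correlation matrix, and the function names are illustrative.

```python
import math
import random


def draw_factor_scores(n, rho, rng):
    """n draws from N(0, [[1, rho], [rho, 1]]) via the Cholesky factor."""
    scores = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        scores.append((z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2))
    return scores


def simulate_n2pl(scores, items, rng):
    """0/1 responses from noncompensatory 2PL items (product of logistic curves)."""
    data = []
    for theta in scores:
        row = []
        for a, d in items:
            p = math.prod(1.0 / (1.0 + math.exp(-(ak * tk + dk)))
                          for ak, tk, dk in zip(a, theta, d))
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


def correlation(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2024)
scores = draw_factor_scores(20000, rho=0.4, rng=rng)
items = [([1.2, 0.9], [0.5, -0.3]), ([0.8, 1.4], [-0.2, 0.6])]  # hypothetical values
responses = simulate_n2pl(scores, items, rng)
print(correlation([t[0] for t in scores], [t[1] for t in scores]))
```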
Estimation Details
During preliminary analyses, it was noticed that the
where the constants
Results
Two statistics were computed to determine how effectively the item parameters were recovered: bias and root mean-squared deviation (RMSD). Bias and RMSD were defined as
and
where
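The two recovery statistics for a single parameter across replications can be computed as below. This is a minimal sketch under the usual definitions (mean signed error and root mean-squared deviation about the population value); how the article aggregated the statistics over items and parameters within a design cell is not reproduced, and the estimates shown are hypothetical.

```python
import math


def bias(estimates, true_value):
    """Average signed error of the estimates across replications."""
    return sum(e - true_value for e in estimates) / len(estimates)


def rmsd(estimates, true_value):
    """Root mean-squared deviation of the estimates from the population value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


# Hypothetical estimates of one slope (population value 1.0) over four replications.
estimates = [1.1, 0.9, 1.3, 0.7]
print(f"bias = {bias(estimates, 1.0):.3f}, RMSD = {rmsd(estimates, 1.0):.3f}")
```

When both statistics are computed over the same replications, RMSD² equals bias² plus the variance of the estimates (with divisor R), so RMSD can be large even when bias is near zero.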
Compensatory and Noncompensatory Parameter Recovery Statistics.
Note. N2PL = noncompensatory two-parameter logistic model; RMSD = root mean-squared deviation; N3PL = noncompensatory three-parameter logistic model.
On average, the compensatory item parameters were recovered adequately, with RMSDs ranging from .067 to .126 and little bias. Compensatory parameters were recovered with an average RMSD of .077 in the two-factor N2PL design, which is consistent with previous findings (e.g., Cai, 2010a, 2010b; Chalmers, 2012). Noncompensatory item parameters, however, were not recovered well for all cells of the design. The two-factor N2PL had the most acceptable parameter recovery for compensatory and noncompensatory designs, although the RMSDs for the noncompensatory parameters were still greater than 0.3. Nevertheless, the inter-factor correlation parameters were recovered well for the designs with a non-zero
The tables in Appendix B indicate that increasing the sample size reduced bias and RMSD. For the compensatory parameters, increasing the number of indicators tended to reduce the RMSD, whereas increasing the inter-factor correlation did not appear to influence RMSD or bias. For the noncompensatory parameters, both the inter-factor correlation and the number of indicators were highly influential: using more indicators tended to decrease the RMSD, and higher inter-factor correlations increased it. In addition, the recovered noncompensatory intercepts and slopes appeared to be consistently biased toward zero, possibly due to the normal prior imposed on the
The N3PL designs demonstrated more difficulty converging compared with the N2PL designs, often reaching the maximum number of allocated MH-RM cycles. The two- and three-factor N2PL designs on average converged in 1,153.2 (
Noncompensatory and Product Models
The product model from Equation 13 was also estimated to evaluate how effectively this model could fit the simulated N2PL data. Estimation of the product-constructed model appeared to be more stable than the N2PL, converging in an average of 282.9 MH-RM iterations (
Given that the N2PL and product model have the same degrees of freedom, only the log-likelihood and
In addition to comparing model statistics, recovery of the
Discussion
This study examined the accuracy of recovering compensatory and noncompensatory IRT parameters using the MH-RM algorithm. While previous studies have demonstrated the complexities of estimating noncompensatory models using purely ML or Bayesian methods, the MH-RM algorithm was expected to provide a suitable compromise between the two frameworks because it borrows strength from both. The algorithm provided a flexible framework for estimating mixed IRT item types for two- and three-dimensional 2PL and 3PL noncompensatory models, and demonstrated some promising results for noncompensatory models that did not freely estimate the lower asymptote parameter.
The results from the simulation study indicated that the N2PL model could provide a better empirical fit to data simulated from the N2PL model compared with a competing product model. In addition, the N2PL model was able to more accurately recover the population ability (
Unfortunately, fitting noncompensatory models by ML with the MH-RM algorithm also raised several concerns. To begin, researchers interested in recovering the population coefficients of noncompensatory models will experience difficulties when estimating all noncompensatory parameters, although the two-factor N2PL model is the most promising. The overall recovery statistics clearly indicate that the population parameters cannot be recovered accurately without substantial effort, and although increasing the sample size and the number of unidimensional indicators does help increase precision, sample sizes might have to be in the tens of thousands before adequate confidence in the estimates can be obtained. Also, for the noncompensatory model, the parameter standard errors were very large under all conditions, which is consistent with past estimation results (e.g., Babcock, 2011; Bolt & Lall, 2003). Increasing the number of latent factors in the model also decreased the estimation recovery accuracy, suggesting that higher dimensional solutions require even larger data sets, and including lower bound parameters (
The size of the factor correlation also played a negative role in parameter estimation accuracy, although this problem was not as severe as Babcock (2011) observed when using an MH-within-Gibbs sampling approach. In fact, the factor correlation estimates were relatively close to the simulated population values and, unlike the MH-within-Gibbs approach, did not require ad hoc prior distributions to facilitate stability. Factor correlations greater than .8, however, may create further estimation problems and should be investigated in the future. Another potential issue in noncompensatory IRT models is that different sets of item parameters could produce item response surfaces very similar to those implied by the population values; therefore, future research should examine how closely the estimated item response surfaces (IRSs) recover the population surfaces (Zhang, 2012).
An interesting area to investigate in future work is the possibility of combining compensatory and noncompensatory models and applying them to uniquely suited items that would not be adequately fitted by either class in isolation. One example of this type of model could be
where One of the most important theorems for understanding Euclidean space and geometry is the Pythagorean theorem,
This question is meant to measure a student’s ability to prove an important theorem using whatever method they are most comfortable with; however, it requires some additional information before the question can be attempted. A potential limiting factor may be the student’s ability to comprehend what the question is asking, for without this important ability the probability of correct endorsement drops rapidly to zero, regardless of how talented the student may be at successfully completing proofs in other contexts.
Although examples such as these can be accommodated by combining noncompensatory and compensatory components, this article showed that accurately recovering population parameters for items that contain a noncompensatory component is very difficult. However, if researchers are more interested in specifying models that optimally fit their data both empirically and theoretically than in recovering population parameters accurately, then these item response models may be warranted, and the flexibility of the MH-RM algorithm for compensatory and noncompensatory response models would make it an effective approach for estimating them by full-information ML.
Footnotes
Appendix A
The observed-data log-likelihood equation used to estimate model parameters is of the form
where
Differentiating the log likelihood with respect to
which can be collected into the gradient vector
Further differentiating the log likelihood, and using
These values are then collected into the
For a noncompensatory model with a lower asymptote parameter, let
Further differentiating the log likelihood gives
which is collected into the
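As a check on derivatives of this kind, the sketch below gives the analytic gradient of a single response's log-likelihood under an assumed slope–intercept noncompensatory 3PL, P = g + (1 − g) P*, where P* is the product of the per-dimension curves Pk = logistic(ak θk + dk). By the chain rule, ∂ℓ/∂P = (y − P)/[P(1 − P)], ∂P/∂dk = (1 − g) P* (1 − Pk), ∂P/∂ak = θk times that same quantity, and ∂P/∂g = 1 − P*. It is not a reconstruction of this appendix's own equations; the parameter ordering and the function names are assumptions.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def n3pl_p(theta, a, d, g):
    """Noncompensatory 3PL: g + (1 - g) * prod_k logistic(a_k * theta_k + d_k)."""
    return g + (1.0 - g) * math.prod(
        logistic(ak * tk + dk) for ak, tk, dk in zip(a, theta, d))


def item_loglik(y, theta, a, d, g):
    """Log-likelihood of a single 0/1 response."""
    p = n3pl_p(theta, a, d, g)
    return math.log(p) if y == 1 else math.log(1.0 - p)


def item_gradient(y, theta, a, d, g):
    """Analytic gradient ordered as (a_1..a_K, d_1..d_K, g)."""
    comps = [logistic(ak * tk + dk) for ak, tk, dk in zip(a, theta, d)]
    pstar = math.prod(comps)           # product of the per-dimension curves
    p = g + (1.0 - g) * pstar
    dl_dp = (y - p) / (p * (1.0 - p))  # derivative of the log-likelihood w.r.t. P
    grad_d = [dl_dp * (1.0 - g) * pstar * (1.0 - pk) for pk in comps]
    grad_a = [gd * tk for gd, tk in zip(grad_d, theta)]
    grad_g = dl_dp * (1.0 - pstar)
    return grad_a + grad_d + [grad_g]


print(item_gradient(1, [0.3, -0.8], [1.2, 0.9], [0.4, 1.1], 0.15))
```

Comparing such analytic derivatives against central finite differences is a cheap way to confirm them before using them in a Newton-type update.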
Appendix B
N2PL for
| Parameters | Correlation | No. of indicators | Compensatory bias | Compensatory RMSD | Noncompensatory bias | Noncompensatory RMSD |
|---|---|---|---|---|---|---|
| | 0.0 | 5 | 0.004 | 0.066 | 0.009 | 0.206 |
| | | 10 | 0.003 | 0.061 | 0.028 | 0.204 |
| | | 15 | 0.005 | 0.061 | 0.016 | 0.162 |
| | 0.2 | 5 | 0.007 | 0.068 | 0.030 | 0.226 |
| | | 10 | −0.004 | 0.063 | 0.026 | 0.195 |
| | | 15 | 0.002 | 0.062 | 0.016 | 0.178 |
| | 0.4 | 5 | 0.000 | 0.069 | 0.039 | 0.258 |
| | | 10 | 0.002 | 0.061 | 0.024 | 0.208 |
| | | 15 | 0.001 | 0.060 | 0.010 | 0.187 |
| | 0.6 | 5 | 0.002 | 0.063 | 0.025 | 0.327 |
| | | 10 | 0.004 | 0.060 | 0.025 | 0.246 |
| | | 15 | 0.000 | 0.059 | 0.015 | 0.206 |
| | 0.8 | 5 | 0.002 | 0.061 | 0.032 | 0.389 |
| | | 10 | 0.008 | 0.058 | 0.042 | 0.341 |
| | | 15 | 0.004 | 0.058 | 0.040 | 0.294 |
| | 0.0 | 5 | −0.007 | 0.051 | 0.003 | 0.298 |
| | | 10 | 0.002 | 0.047 | 0.024 | 0.306 |
| | | 15 | 0.001 | 0.048 | 0.013 | 0.241 |
| | 0.2 | 5 | −0.003 | 0.047 | 0.022 | 0.320 |
| | | 10 | 0.002 | 0.048 | 0.026 | 0.288 |
| | | 15 | 0.002 | 0.048 | 0.021 | 0.269 |
| | 0.4 | 5 | 0.001 | 0.046 | 0.026 | 0.367 |
| | | 10 | 0.002 | 0.046 | 0.019 | 0.299 |
| | | 15 | −0.002 | 0.049 | 0.007 | 0.282 |
| | 0.6 | 5 | −0.007 | 0.046 | 0.009 | 0.460 |
| | | 10 | −0.004 | 0.043 | 0.012 | 0.346 |
| | | 15 | 0.001 | 0.047 | 0.012 | 0.283 |
| | 0.8 | 5 | 0.004 | 0.046 | 0.019 | 0.509 |
| | | 10 | −0.001 | 0.046 | 0.021 | 0.437 |
| | | 15 | −0.005 | 0.049 | 0.020 | 0.407 |
Note. N2PL = noncompensatory two-parameter logistic model; RMSD = root mean-squared deviation.
Acknowledgements
The authors would like to thank four anonymous reviewers for providing insightful feedback, which greatly improved the quality of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
