Abstract
This article describes a multidimensional generalization of the nominal categories model that serves to estimate factorial models from nominal and ordinal observed responses, and includes a structural model for latent variables that distinguishes between endogenous and exogenous factors. The model includes a scale parameter for each response category in each factor. Item parameters relate the logit between categories to the vector of latent variables. The inferential framework is marginal maximum likelihood, implemented via static and adaptive Gauss–Hermite quadrature and Monte Carlo EM. The properties of estimators are investigated in a simulation study. An example with real data illustrates the utility of the model in analyzing local dependencies in testlets composed of multiple-choice items that are clustered in several groups around a common theme.
Keywords
Multidimensional item response models have provided a means to analyze the dimensionality of a set of items since the pioneering work by Bock, Gibbons, and Muraki (1988). The item factor model typically applies to dichotomous and ordinal data (Reckase, 2009), whereas the factorial analysis of nominal responses is more recent (Bock & Gibbons, 2010). For example, Thissen, Cai, and Bock (2010) proposed a flexible multidimensional model that can accommodate a variety of models between ordinal and nominal scales. This article continues this line of research and presents a generalization of the nominal categories model (Bock, 1972) to the multidimensional case. Moreover, a structural model can be imposed on the latent variables to distinguish between exogenous and endogenous factors (Jöreskog & Sörbom, 1979).
The nominal categories model may be regarded as the most popular model for nominal responses because of its unique combination of flexibility and mathematical simplicity. For these reasons, it is taken as the basis to develop a multidimensional nominal categories model (MNCM). The MNCM, in its full generality, imposes no constraints on item parameters. However, the article describes a method of linear constrains that provides particular cases suitable for the analysis of ordinal data or for testing a hypothesis about parameter structure (Revuelta, 2009, 2010).
The MNCM has a factor loading for each category in each factor, instead of having one parameter for each variable in each factor, as in the factor analysis of quantitative and ordinal data. There are a number of new applications in which the factor loadings may differ between categories. For example, consider one multiple-choice item measuring mathematics in which one of the categories has an involved wording. This distractor may have a higher loading on a verbal factor than other categories written in more simple terms. More generally, two-dimensional item response models have been proposed in the psychometric literature to distinguish between a factor of substantive interest and a factor that explains individual differences in solution strategy or response style. For example, Bolt and Johnson (2009) and Johnson and Bolt (2010) analyzed ordinal responses from a self-report of smoking dependence. Their second factor explains response tendencies that are unrelated to the construct measured by the items. A similar idea was translated to maximum performance testing by Bolt and Wollack (2012), who analyzed data from multiple-choice items in which there was a primary factor related to the correct option and a second factor that influences selection among distractors.
The present article includes an example with real data that illustrates the application of the MNCM to investigating local dependencies originated by testlets. In this example, several items are clustered around a common theme, which leads to violation of the local independence assumption in a one-dimensional model. The article shows how a polytomous multidimensional model can be used to analyze associations between categories from one item to another.
The article also includes a number of technical developments, such as a marginal likelihood/EM estimation algorithm (Schilling & Bock, 2005), the computation of the information matrix, a structural latent model, and a Taylor series expansion to approximate the correlation between response categories. Finally, the recovery of theoretical parameters is investigated in a simulation study. Most technical material has been moved to online appendices, and the main body of the article focuses on the description of the model, results of its application, and the conclusions.
MNCM
Suppose that each individual is described by a vector of
where
where
The parameters of Equation 3 are unidentifiable because the logit depends on differences between parameters and does not vary if a constant value is added to all the parameters. The identifiability problem is more easily solved when Equation 3 is written in matrix form (Revuelta, 2008). Let
where
where
Two particular cases of Equation 5 are considered in this article. First, the same constraints may apply to the scale parameters of all dimensions. In that case, Equation 5 reduces to
where
In the context of linearly constrained item response models, identifiability is traditionally assessed by verifying that the augmented matrix
Examples of Linear Constraints
Nominal Models
Models for nominal responses, in their most general form, impose no constraints apart from those that ensure identifiability. This section shows one example of the many forms that matrix
Equation 7 for an item with three categories and two factors reads as follows:
In this example,
has full column rank. The linear restriction reduces the number of effective parameters from the nine elements contained in
The matrices
has full column rank.
Ordinal Models
Ordinal model use different constraints for intercept and scale parameters, as in Equation 6. The category scale parameters are proportional to an item scale parameter,
The matrix
The multidimensional model model by Thissen, Cai and Bock (TCB; 2010) is based on a nonlinear constraint on scale parameters and uses a Fourier basis to specify the matrix of coefficients. The TCB is more flexible than the model in Equation 13 and provides useful and interesting particular cases for the analysis of ordinal data by relaxing the assumption that the response categories are equally spaced. However, the TCB does not reach the generality of the full rank model in Equation 9 because of the effect of the nonlinear constraint.
Inferential Algorithms
The inferential algorithm used in this article assumes that
where
The sample consists of a vector of frequencies (n1,…, nt,…, nT), where the element nt is the number of individuals whose response pattern is
One important difficulty in estimation is the evaluation of the D-dimensional integral in Equation 15 and in its derivatives. Three numerical integration methods have been compared in this investigation: static Gauss–Hermite (GH) quadrature, adaptive GH, and Monte Carlo integration via importance sampling (Monte Carlo EM (MCEM)). These methods are described in Schilling and Bock (2005). The details of the estimation algorithms appear in Online Appendix B and are based on methods for matrix differential calculus described in Magnus and Neudecker (1999) and in Harville (2008).
Simulation Study
A simulation study was conducted to investigate the performance of the estimation algorithm in recovering true parameter values. The performance of the three integration methods for samples of different size was compared in the simulations.
The description of the simulated conditions and the results has been moved to Online Appendix C to preserve journal space, and this section contains only the main conclusions. Taken together, simulations show that adaptive GH and MCEM perform better than static GH when the number of items per factor is large and the structural model is simple. Static GH selects quadrature points to optimize precision in the evaluation of the integral of a function over the population distribution of the factor:
Empirical Example
The use of the MNCM to account for measurement problems is demonstrated with a real data set of an examination about inferential statistics. A sample of 353 second-course undergraduate students of psychology responded to 20 multiple-choice items as part of the final examination of the subject of data analysis, which ranged from one-sample t test to ANOVA, multiple comparisons, and linear regressions. The participants ranged from 19 to 30 years old and had 60 min to complete the test. Because data were collected for a real evaluation, the identity of the participants and any other information apart from the item responses were removed before the data matrix was submitted to investigation. The test was structured in six testlets. Each testlet consisted of two or more items related to a common theme, and each item had three response categories. Identification constraints were as in Equation 9. Table 1 illustrates two of the testlets.
Two Testlets From the Exam of Data Analysis.
Note. Correct categories are B, C, B and A for Items 9, 10, 16, and 17, respectively.
Six models were applied to these data. All the models are variants of a bifactor model, depicted in Figure 1, which has a general factor and six group factors that explain the associations within each testlet. The six models are as follows:
M1: Bifactor oblique. The correlations between the seven factors are free parameters. That is, the latent propensity is
M2: Bifactor partly orthogonal. The correlations between the general factor and each of the group factors are free parameters. The correlations between group factors are set to zero. That is,
M3: Bifactor orthogonal. The seven factors are uncorrelated;
M4: Testlet model. This is a polytomous generalization of the dichotomous testlet model by Bradlow, Wainer, and Wang (1999). In that model, the scale parameters of the general and the group factor have the same value to obtain a correlation between latent propensities with a convenient structure; that is, the latent propensity is
M5: Testlet model, equal variances. That is, zir is as in M4, and
M6: The one-dimensional nominal categories model;

General structure of the factorial models fitted in the empirical example.
These models are obtained as particular cases of the MNCM by using the method for linear constraints described in Online Appendix B. The model was estimated with two quadrature points for each dimension, as recommended by Schilling and Bock (2005) for a model with seven dimensions. However, note that this recommendation is based on the amount of computational resources that would be demanded by a larger number of points and not because two points is an optimal figure in a statistical sense. A number of statistics were used to test the model fit: the likelihood ratio for the hypothesis of exact fit, the likelihood ratio against the most general model, M1, and the statistics Akaike information criterion (AIC), Bayesian information criterion (BIC), and corrected Akaike information criterion (AICc), which consider the model fit and parsimony (Burnham & Anderson, 2004). Table 2 shows the results. M3 was selected for interpretation because of the values G2, AIC, and AICc. Notwithstanding, BIC differs from the other statistics and points to M5 as the preferred model. Table 2 contains also the mean estimation time in minutes from five different runs of the estimation algorithm using different random starting values. Estimation time depends largely on the number of estimated parameters in the structural model in Equation 14.
Goodness of Fit Statistics.
Note. M1 to M6 are described in the text. Log Lik. is the log-likelihood. G2 is the likelihood ratio test against a general multinomial alternative. Params is the number of estimated parameters. The columns ΔG2, Δgl, and p contain the result of the likelihood ratio test of each model against M1. AIC, AICc, and BIC are the information-theoretic statistics. T is the average estimation time in minutes. AIC = Akaike information criterion; BIC = Bayesian information criterion; AICc = corrected Akaike information criterion.
Table 3 presents the parameter estimates for M3. The item responses have been coded such that the correct response is the third one, and Categories 1 and 2 are the distractors. The scale parameters in the six group factors have been placed in the same column, labeled Group factor in the table, although they refer to a different factor for each testlet. All but two of the scale parameters in the general factor were negative, indicating negative correlations between the distractors and the general factor. Although estimates were unbounded in the “Simulation Study,” estimates were bounded between −10 and 10 in the real data analysis because the sample contains insufficient information to estimate some parameters. Table 3 shows that 11 intercept parameters reached the boundary during estimation. This is due to insufficient sample size. For example, the two distractors of Item 6, whose parameters τ are in the boundary, receive six and eight responses. If the estimation is repeated without the bound of the parameters, these estimates converge at very extreme values, −18.4 and −19.8 for Item 6 with first-order derivatives of 0.01 and 0.20 at convergence. Table 3 contains the results with the boundary because their actual estimated value is not important, the real problem is the lack of statistical information, and the boundary helps to identify easily the weakly identifiable parameters.
Parameter Estimates for the MNCM.
Note. The table contains the parameters of the two incorrect categories, the parameters of the correct option are set to zero. There are six group factors, one for each testlet. MNCM = multidimensional nominal categories model.
Testlet 3 presents a number of interesting features. First, Distractor B of Item 10 (in the following, Distractor 10-B) has a positive loading on the general factor (λ10,2,1 = 0.37). That is, the propensity to commit the misconception reflected in Category B increases with θ1, which indicates that this topic was not well clarified during the teaching of the subject. Second, the Categories 9-A and 10-B have remarkable loadings on the group factor, with reversed signs. The group factor θ4 seems to be bipolar in the sense that individuals high in θ4 feel attracted by the error reflected in Category 10-B, and individuals low in θ4 tend toward the irreconcilable error contained in Category 9-A. The association between Categories 9-A and 10-B is quite predictable in relation to item content and is explored more deeply in the following paragraphs.
In the M3, the principle of independence between alternatives conditional on θ1 breaks down. To illustrate, the top row of Figure 2 shows the marginal response functions for Items 10 and 17, with the group factor marginalized out. The response functions vary depending on the response given to the other item in the same testlet (Items 9 and 16, respectively), as shown in the second, third, and fourth rows of Figure 2. The first item of the two testlets entails making a statistical decision about a null hypothesis, and the second item is the conclusion that follows from that decision. The interplay between the decisions and conclusions explains the associations. For example, the conclusion that two population means do not differ (Category 10-B) has a smaller probability for those individuals that reject H0 (Category 9-A) than for those who do not reject H0 (Category 9-B).

Marginal response functions for Items 10 and 17 and response functions conditional on the response given to the preceding item in the test.
Clustered items present an extra covariance beyond that explained by a one-dimensional latent trait. For the one-dimensional model, the covariance between items conditional on θ is zero. Regarding M3, if the group factor is integrated out, the result is a marginal distribution of responses that depends only on θ1 and has a non-null correlation between items. Ip (2010, Corollary 1) showed that the correlation between two dichotomous items, i and j, depends on the product of the scale parameters of the group factor. Online Appendix D extends that result to the polytomous case and shows how to compute item correlations in the marginal distribution that depends only on θ1. The correlation depends largely on the product of the scale parameters but is not proportional to it. The key element that explains correlations in the polytomous case is the difference between a parameter and the mean of the scale parameters of the item. In particular, the correlation between categories r and s of items i and j is proportional to
where
Figure 3 presents the correlation between pairs of categories conditional on θ 1 for the two testlets in Table 1. As expected, Categories 9-B (the correct one) and 10-B (an incorrect one) have a positive relation, which singles out a frequent misconception about the conclusion of a significance test when H0 is maintained. Moreover, the correlation between the two correct categories, 9-B and 10-C, is negative because the distractor 10-B is more attractive for those who pass Item 9 than for those who fail it, which is also predictable from item content. Regarding the second testlet, Categories 16-A and 16-C have similar content and a comparable association pattern with 17-C, which is their logical consequence. Categories 17-A and 17-B have a subtle difference and obtain similar patterns of association with Item 16.

Correlation between categories as a function of θ1 with the group factor integrated out.
Final Remarks
The purpose of the MNCM is to measure factors by using nominal response data. The key aspect of the model is that that there is one slope parameter for each response category on each factor, instead of one slope for each item on each factor, as in factorial models for quantitative and ordinal data. In its more general form, it has minimal parameter constraints to ensure identifiability, although the article shows how to impose more stringent restrictions by using linear constraints.
From a statistical point of view, the combination of multidimensionality and a heavy parameterization presents important challenges for estimation. The empirical example has shown that insufficient sample size may lead to weakly identifiable parameters. Exploratory analyses, in which the number of factors is increased until the model fits and item parameters are unconstrained, demand samples with thousands of examinees and time-consuming computational resources. Problems of weakly identified parameters may arise in exploratory analyses with smaller samples. Weakly identifiable parameters are identifiable in a mathematical sense, but the data provide scarce information about them, originating problems of convergence and large standard errors. Confirmatory analysis seems more tenable because fixed parameters and a smaller number of dimensions per item reduce the dimensionality of the optimization problem involved in estimation and increase statistical information about the remaining parameters.
The model has been developed in a traditional item response theory framework, with strong distributional assumptions for latent factors and a marginal likelihood estimation algorithm. Three integration methods have been compared: adaptive GH, static GH, and MCEM. The simulation results revealed that adaptive GH provides the greatest precision, especially with a large number of items per factor. However, the estimation of a structured variance–covariance matrix for the factors could be more precise with static GH because of the increased stability of this method. The numerical integration problem limits the dimensionality of the latent space to about eight factors, and clearly, an area of further research is the development of more effective estimation methods for higher dimensionality with a reasonable demand for computational resources.
Ideally, an improved algorithm should allow faster estimation and increased dimensionality. The development of better algorithms, within the maximum likelihood framework, may be based on the Metropolis-Hastings Robins-Monro integration algorithm by Cai (2010). Estimation of the bifactor model can also be accelerated by applying the dimension reduction method (Cai, Yang, & Hansen, 2011), which reduces the D-dimensional integration problem to two-dimensional integration. Fully Bayesian simulation techniques are another promising inferential framework (Edwards, 2010). The inclusion of diffuse priors may speed up the estimation algorithm by avoiding cases in which weakly identified parameters use a long sequence of iterations to converge. Nevertheless, Bayesian simulation methods would also benefit from the use of the output of the EM maximization to construct a multivariate normal approximation of the posterior distribution of the parameters, which can be used to obtain samples of parameters. The assumption of multivariate normality of the latent variables has been made to avoid further complications although its relaxation is another topic for future research.
Footnotes
Acknowledgements
The author thanks the editor and three anonymous reviewers for their comments that contributed to the improvement of the manuscript.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article has been supported by Grant PSI2012-31958 from the Spanish Ministry of Economy and Competitiveness (DGICT).
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
