Abstract
The expectation-maximization (EM) algorithm is a commonly used technique for the parameter estimation of the diagnostic classification models (DCMs) with a prespecified Q-matrix; however, it requires O(2 K ) calculations in its expectation-step, which significantly slows down the computation when the number of attributes, K, is large. This study proposes an efficient Metropolis-Hastings Robbins-Monro (eMHRM) algorithm, needing only O(K + 1) calculations in the Monte Carlo expectation step. Furthermore, the item parameters and structural parameters are approximated via the Robbins-Monro algorithm, which does not require time-consuming nonlinear optimization procedures. A series of simulation studies were conducted to compare the eMHRM with the EM and a Metropolis-Hastings (MH) algorithm regarding the parameter recovery and execution time. The outcomes presented in this article reveal that the eMHRM is much more computationally efficient than the EM and MH, and it tends to produce better estimates than the EM when K is large, suggesting that the eMHRM is a promising parameter estimation method for high-dimensional DCMs.
In recent decades, many various diagnostic classification models (DCMs)—alias cognitive diagnostic models (CDMs)—for item responses have been developed, including, for example, the log-linear cognitive diagnosis model (LCDM; Henson et al., 2009) and its submodels. The successful applications of the DCMs for educational measurement involve extracting information about students’ latent attributes (or called latent knowledge status) to evaluate their skill profiles (e.g., mastery/non-mastery). Beyond the educational environment, the DCMs have been applied in many large-scale tests, such as the Reading and Listening data of the TOEFL® Internet-based test (IBT), pathological gambling, and the Third International Math and Science Study (TIMSS) (Sessoms & Henson, 2018).
The marginal maximum likelihood with the expectation-maximization (EM) algorithm (Dempster et al., 1977) is a standard parameter estimation method for DCMs, which is accessible from open-source packages such as GDINA (Ma & de la Torre, 2020). The EM algorithm guarantees monotonic climbing to a local minimum of the likelihood function. However, a potential problem with the EM is that it converges very slowly when there is a large amount of “missing data” (i.e., latent attributes). Another approach is to estimate the parameters using the Markov chain Monte Carlo (MCMC) algorithm, such as the Metropolis-Hastings (MH) sampling or the Gibbs sampling. However, the MCMC algorithms could be computationally intensive, especially for many of the attributes (Wang et al., 2021).
Large-scale tests with a large number of latent attributes have been noticed. Tatsuoka et al. (2004) analyzed the 1999 TIMSS mathematics test items from eighth-grade students in 20 countries through the rule-space method (Tatsuoka, 1983). A team of domain experts was invited to identify the probable attributes of 163 items. They developed a Q-matrix of test items that include six content attributes (e.g., geometry and algebra), 10 process attributes (e.g., rule and logic), 11 skill attributes (e.g., unit conversion and proportional reasoning), for a total of 27 attributes. Statistically, the Q-matrix contains element q ik for item i (i = 1, …, I) and attribute k (k = 1, …K), which can be either 0 or 1, and it is used to indicate whether attribute k is required to solve item i. If attribute k is required, q ik is 1; otherwise, it is 0. Because the attributes are binary (either mastery or non-mastery), the total number of attribute profiles is 227. In another example, Lee et al. (2011) analyzed the 2007 TIMSS fourth-grade mathematics test of 25 items and developed a Q-matrix with eight “number” attributes, four “geometric shapes and measures” attributes, and three “data display” attributes, resulting in 215 attribute profiles. With such a large number of attributes, it is expected that the EM would be slow.
The present study proposes an efficient Metropolis-Hastings Robbins-Monro (eMHRM) algorithm for saturated DCMs (SDCMs), with a prespecified Q-matrix, to address the problem of many latent attributes. Cai (2010) has shown that the MHRM is more computationally efficient than the EM for multidimensional item response theory models. The numerical integration (for continuous latent traits) or summation (for discrete latent attributes) in the expectation-step of the EM method is not required. Also, it does not need high-intensity sampling for all model parameters used in MCMC methods. The MHRM algorithm uses the MCMC method, partly to sample latent traits, and the maximum likelihood method, to estimate item parameters efficiently. Meanwhile, the MHRM algorithm uses the Robbins-Monro (RM) algorithm (Robbins & Monro, 1951) to gradually average out the Monte Carlo errors caused by Monte Carlo sampling. In this article, the eMHRM advances the conventional MHRM to handle the high-dimensional SDCMs with many attributes.
This article is outlined as follows. The SDCMs are briefly introduced, followed by a review of the EM algorithm. The eMHRM is then developed in detail. Simulation studies are conducted to examine the performance between algorithms in terms of parameter recovery and execution time. Finally, the concluding remarks are given for the eMHRM and its future extensions.
Saturated Diagnostic Classification Models
In recent decades, SDCMs have been actively developed, such as the LCDM. In the present study, the following SDCM is adopted (Yamaguchi & Templin, 2021) as the basis for developing the eMHRM:
This SDCM is formulated where y
ni
is the nth respondent’s observed item response to item i (y
ni
= 1 for correct response and y
ni
= 0 for incorrect response), η
ih
denotes the hth probability parameter for the correct response for item i, α
n
is the nth respondent’s latent attribute vector indicating his/her mastering status (0: not mastered vs. 1: mastered),
The SDCM needs monotonicity constraints on
The SDCM framework also includes other simpler models as a special case. For instance, the Deterministic Input Noisy “And” gate model (DINA; Macready & Dayton, 1977) can be obtained by re-designing the G-matrix. As shown in Table A2 of Appendix A, there are only two probability parameters for each item: η1 and η2. The η1 corresponds to the guessing parameter, and the 1–η2 corresponds to the slipping parameter in the DINA model. For illustration purposes, the SDCM and the DINA model will be used as an example in the subsequent simulation studies.
Expectation-Maximization Algorithm
The marginal log-likelihood function of the item and structural parameters is proportional to the following:
Here, N is the number of examinees, I is the number of items,
Here, ζ ϵ (η, π),
The E-step and M-step are carried out alternatively until the sequence
Efficient Metropolis-Hastings Robbins-Monro Algorithm
Yamaguchi and Templin (2021) developed a Gibbs sampler for simulating the attribute pattern for each person. However, that sampler needs to evaluate the multinomial distribution over all attribute patterns (2
K
) in the parameter space, which immediately becomes computationally intensive when K is large. In this study, a modified Metropolis-Hastings sampler (Madigan et al., 1995)—for generating the attribute patterns—is employed for reducing the computational burden, which has been used for Bayesian variable selection and DINA models (Liu et al., 2020). The item and structural parameters are then simulated, conditioned on the sampled
The following steps are iterated until specified convergence criteria are satisfied:
The posterior distribution of the current The modified MH sampler suggests the following proposal distribution for a binary attribute: In the above equation, the two proposal distributions are canceled out due to their symmetry, and α(new) is accepted (i.e., α(t+1) = α(new)) when a ≥ U ∼ Uniform(0, 1), otherwise it is rejected (i.e., α(t+1) = α(t)). Notice that the MH only requires O(K+1) operations in coding for k = 1, …, K, because the kth numerator of the acceptance probability (after updated by the acceptance rule) can be re-used for the (k+1)th denominator, except that for the very first iteration the denominator and numerator need to be computed, respectively.
The posterior distribution for the η
ih
is given by The estimate for the η
ih
is error-corrupted due to the noisy input information of the Here, g
t
is the gain sequence defined as In this case, g
t
has the form
The posterior distribution for the Three steps of the eMHRM algorithm are iterated until user-specified termination criteria are satisfied (Yang & Cai, 2014). The first stage sets b = 1 and w = 0 (i.e., g
t
= 1), which is considered a “burn-in” period because they are noisy because of the rough starting values. Using w = 0 can make a large step, which usually brings the eMHRM in the vicinity of a solution with a few iterations. The second stage continues with w = 0, where the average of the iterates are used as starting values for the next stage. For the final stage, the iterative averaging procedure with Regarding standard error and latent attribute estimation, as the eMHRM is a stochastic variant of the EM algorithm, the observed information matrix of the marginal likelihood function can be obtained by Louis (1982) identity (Cai, 2010). For the
Metropolis-Hastings Algorithm
In addition to the EM and the eMHRM, a new MH algorithm is proposed here as a second baseline for comparison purposes. Yamaguchi and Templin (2021) proposed a Gibbs sampler for the parameter estimation of the SDCMs. However, their Gibbs sampler needs to evaluate the posterior distribution of
Simulation Study
This simulation study focused on comparing the parameter recovery and execution time for the EM, MH, and eMHRM on the SDCM and DINA model. Once the item and structural parameter estimates are obtained, the latent attributes can be estimated by the EAP estimator. The number of attributes was set to 3, 7, and 15, respectively. The sample sizes were 500, 2,000, and 10,000. The true values of
Regarding the settings of the eMHRM, the maximum numbers of iterations (200, 200, 900) were for the burn-in stage, averaging stage, and RM stage, respectively. Due to the use of stochastic imputation for latent variables, it was found that the trajectory usually jumps very quickly to the vicinity of the solution (Cai, 2010); therefore, 200 for the burn-in period would be sufficient. The successive 200 iterates were used as the starting values for the final stage. In the final stage, the stopping criterion is set equal to the maximum absolute differences of the averaged item estimates across successive three iterations less than .0001. The M was set equal to 2 for K = 3 and 7, which could stabilize the parameter estimation (Liu et al., 2020), whereas M was set equal to one for K = 15, because we found that the
For the MH algorithm, the burn-in period was 2,000, and the sample collection period was 3,000, where the length of the burn-in period has been verified sufficient for the SDCM and the DINA model (Wang et al., 2021). The settings of the EM were set by default in the GDINA package. The monotonicity constraints were imposed on the three algorithms.
After algorithm convergence, the person’s attributes were estimated based on the model’s parameter estimates. The EAP estimates were used, where the number of sampling attributes for the eMHRM was 200, which is sufficient, as shown in the subsequent Results section.
The simulation replication number was R = 100 for all conditions, except that for the EM with K = 15 the R was 30 because the EM is too slow and consumed a significant amount of computer memory. Regarding the starting values, the
The parameter estimators were assessed by the bias and root-mean-square error (RMSE). Specifically, the bias and the root mean square error (RMSE) of an estimator
Results
For the DINA model, the parameter recoveries of item parameters,
Attribute Recovery for DINA Model and SDCM for Zero Correlation.
Note. AAR is for attribute-wise agreement rate and PAR is for pattern-wise agreement rate.
Comparison of Average CPU Time (in seconds).
Note. “Cor” is for correlation between attributes, “N” is for sample size, and “K” is for the number of attributes.
Comparison of average iteration counts and average time of an iteration (in seconds).
Note. “Cor” is for correlation between attributes, “N” is for sample size, and “K” is for the number of attributes. For the eMHRM, the maximum numbers of iterations (200, 200, 900) are for the burn-in stage, averaging stage, and Robbins-Monro stage, respectively.
Concluding Remarks
The current study proposes an eMHRM algorithm for high-dimensional DCMs with many attributes. The eMHRM replaces the time-consuming E-step of the EM by a stochastic approximation via the modified MH sampler, which simply needs O(K + 1) calculations rather than O(2
K
) in the EM. For the structural parameters,
In addition to the eMHRM, a new MH algorithm for high-dimensional DCMs is proposed for comparison purposes, a variant of the Gibbs algorithm (Yamaguchi & Templin, 2021). The significant advantage of the MH algorithm is that it only requires O(K + 1) computation in sampling attributes. In contrast, the Gibbs algorithm needs to evaluate 2 K multinomial distribution, which can be slow in practice. The MH algorithm performed well in all simulation conditions regarding the parameter recovery; although not as fast as the eMHRM, its speed might be acceptable in practice for K = 15 (i.e., around 8 minutes).
Regarding computational efficiency, the results presented in Table 2 show that the eMHRM was faster than the MH for a large K (e.g., K = 15), not to mention the EM. Considering the results of attribute recovery (Table 1) and time elapse (Tables 2 and 3), the EM can be adopted for the DINA model with K = 3 as it is faster and yields identical attribute recovery. For K = 7 and K = 15, the EM yields worse attribute recovery than the eMHRM and the MH, though faster in some cases for K = 7; thus it is suggested to adopt the eMHRM for good attribute recovery and fast execution for K = 7 and K = 15.
Two algorithms not included in the present study, the sequential Gibbs sampler (Wang et al., 2021) and the variational EM (Yamaguchi & Okada, 2020), were recently proposed to tackle the parameter estimation of high-dimensional SDCMs with a prespecified Q-matrix. For the SDCM with K = 15, N = 2,000, and I = 40, the eMHRM took 8.79 seconds (under maximum CPU speed of 4.0 GHz), much more efficient than the sequential Gibbs sampler, which requires 106.66 seconds (under maximum CPU speed of 4.7 GHz) (see Table 9 in Wang et al., 2021). Comparing the eMHRM with the variational EM, the variational EM was reported as taking 1000 seconds in N = 5000 and K = 10 conditions, whereas the eMHRM took 31.96 seconds in N = 10,000 and K = 15 conditions. The eMHRM could be faster than the variational EM because the latter must evaluate 2 K classes of the posteriori of the attributes (Yamaguchi & Templin, 2021), which is time-consuming for K = 15.
The parameter recovery performance of the EM was not stable for K = 7, and in most cases, it became worse for K = 15. A possible explanation for the poor performance might be that the EM was trapped in the local maximum of the log-likelihood function, which has been reported by Jiang and Ma (2018). One solution might be using global optimization methods, such as the differential evolution algorithm, to alleviate the local-maximum problem (Jiang & Ma, 2018). However, the execution time would take much longer than the report in Table 2, which is not desirable in practice. This issue is beyond the scope of the current study, so it is left for further research.
The current eMHRM could be extended to more complex issues. For example, estimating the item and structural parameters together with the Q-matrix has been an important issue when some elements of the Q-matrix are not correctly specified by domain experts. As the eMHRM is very efficient in computation, extending the eMHRM for the Q-matrix estimation of the high-dimensional SDCMs is under study for future research. In addition, using the number of Monte Carlo size, M = 2 for K = 3 and 7, and M = 1 for K = 15 seem sufficient for our simulation studies considered; however, more investigation for different combinations of K and M is required to examine the sensitivity of parameter estimates.
Supplemental Material
Supplemental Material - Efficient Metropolis-Hastings Robbins-Monro Algorithm for High-Dimensional Diagnostic Classification Models
Supplemental Material for Efficient Metropolis-Hastings Robbins-Monro Algorithm for High-Dimensional Diagnostic Classification Models by Chen-Wei Liu in Applied Psychological Measurement
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The first author acknowledges the grant support from the Taiwan Ministry of Science and Technology, Grant Number MOST 110-2410-H-003-054-MY2.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
