A General Mixture Model for Cognitive Diagnosis

Abstract

Although the generalized deterministic inputs, noisy “and” gate model (G-DINA; de la Torre, 2011) is a general cognitive diagnosis model (CDM), it does not account for the heterogeneity that stems from latent groups in the population of examinees. To address this, this study proposes the mixture G-DINA model, a CDM that incorporates the G-DINA model within the finite mixture modeling framework. An expectation–maximization algorithm is developed to estimate the mixture G-DINA model. To determine the viability of the proposed model, an extensive simulation study is conducted to examine its parameter recovery, model fit, and correct classification rates. Responses to a reading comprehension assessment are analyzed to further demonstrate the capability of the proposed model.

Keywords

cognitive diagnosis, G-DINA, mixture model

1. Introduction

Cognitive diagnosis models (CDMs) are statistical models that identify a set of fine-grained examinee attributes or skills for diagnostic purposes (e.g., de la Torre & Minchen, 2014). While CDMs originated in education for the purpose of analyzing formative assessments, they have also been found useful in analyzing clinical data (e.g., de la Torre et al., 2015) and responses from competency-based situational judgment tests (e.g., Garcia et al., 2014; Sorrel et al., 2016).

In the CDM literature, the attributes of an examinee i are represented by a vector $α_{i} = (α_{i 1}, α_{i 2},..., α_{i K})$ , where K is the number of attributes measured by an exam. For the case wherein the attributes are dichotomous, $α_{i k} = 1$ if examinee i has mastered attribute k and $α_{i k} = 0$ otherwise. The required attributes in answering the items in the exam are represented by the $J \times K$ matrix called Q-matrix (Tatsuoka, 1983), where J is the test length, $q_{j k} = 1$ if attribute k is required to answer item j, and $q_{j k} = 0$ otherwise. The response data of the examinees are represented by the $N \times J$ matrix X, where N is the number of examinees. On the assumption that the responses are dichotomous, $X_{i j} = 1$ if an examinee i managed to answer item j correctly, and $X_{i j} = 0$ otherwise.

Several CDMs have been developed over the years. Some noteworthy models include the deterministic inputs, noisy, “and” gate model (DINA; Haertel, 1989; Junker & Sijtsma, 2001); deterministic inputs, noisy, “or” gate model (DINO; Templin & Henson, 2006); additive CDM (A-CDM; de la Torre, 2011); linear logistic model (LLM; Maris, 1999); and reduced, reparametrized unified model (RRUM; DiBello et al., 2007; Hartz, 2002). These models have been shown to be special cases of the generalized deterministic inputs, noisy “and” gate model (G-DINA; de la Torre, 2011).

Although the G-DINA model can be thought of as a general CDM, it does not account for the heterogeneity that stems from possible latent groups in the population. The heterogeneity in the data can be caused by differential item functioning (DIF), that is, items for which the probability of a correct response is unequal for examinees with the same attribute profile but from different latent groups (Huo et al., 2014). The heterogeneity can also be brought about by different strategies in answering multiple-strategy items, represented by different Q-matrices (Mislevy & Verhelst, 1990; Wang & Xu, 2015; Zhang et al., 2021). Another possible source of heterogeneity is different latent groups of examinees following different distributions in terms of item parameters or prior probabilities (von Davier, 2007). Based on von Davier (2010), the heterogeneity in a mixture population can also be reflected in the different skill distributions across latent groups. This study introduces an extension of the G-DINA model that incorporates it within the finite mixture modeling framework. Finite mixture models are a flexible way of modeling data sets that come from different latent groups of the population (McLachlan & Basford, 1988). The proposed model, called the mixture G-DINA model, aims to provide a better fit to response data from examinees coming from multiple latent groups inherent in the population.

In the CDM literature, von Davier (2007) introduced a mixture CDM named the mixture generalized diagnostic model, which combines the mixture modeling methodology with the generalized diagnostic models (GDMs; von Davier, 2005). von Davier (2010) also proposed the hierarchical GDM, which extends the mixture GDM to handle multilevel data. It is important to note that the GDM and the G-DINA model have some notable differences, the most evident being their parameterization and the interpretation of the parameter estimates. This study introduces another mixture model, one that is built upon the modeling process of the G-DINA model. Zhang et al. (2021) also introduced a mixture multiple-strategy DINA model to investigate individual differences in the selection of response categories in multiple-strategy items. Although it is possible to extend the mixture G-DINA model to multiple-strategy items with different Q-matrices, doing so is not within the scope of the present study.

The main appeal of extending CDMs to include mixture modeling is that it can be used in determining whether different distributions must be assumed for different latent groups. This means that it can improve model fit and the correct classification rate of examinees compared to using the traditional G-DINA model in fitting data sets with inherent heterogeneity. Several studies have also noted its viability in solving some popular problems in CDMs. According to von Davier (2007), it can be a stepping stone in determining DIF items. In item response theory, mixture modeling has already been used in uncovering different strategies in answering a problem-solving question (Mislevy & Verhelst, 1990; Wang & Xu, 2015). Additionally, Aitkin and Tunnicliffe Wilson (1980) stated that mixture modeling can be used in identifying outliers in a data set, which opens the possibility of applying it to the identification of aberrant response patterns.

Another advantage of extending the G-DINA model specifically to mixture modeling application is the fact that a mixture model that builds upon the G-DINA model will be helpful for researchers who have employed the model in their studies and have extensive knowledge on the G-DINA model framework. The G-DINA model is one of the most common CDMs as it can be implemented freely in R (Ma, 2020). Aside from that, there has already been extensive research built upon this model framework.

This study employs the expectation–maximization (EM) algorithm to obtain the maximum likelihood estimates of the parameters of the mixture G-DINA model and evaluates the capability of the proposed model across different factors, such as test length, number of examinees, and mixture composition of the population. Specifically, this study examines the performance of the mixture G-DINA model in estimating the item parameters of the different latent groups and in correctly classifying examinees into latent groups. Furthermore, the study investigates whether the model improves the classification of the attribute profiles of examinees from different latent groups compared to the usual G-DINA model. The study also identifies which relative model fit criteria can be used in assessing the number of latent groups in the population when that number is unknown. Finally, the proposed model is applied to real-world data to demonstrate its viability.

2. The G-DINA Model

The G-DINA model has been proposed to loosen the assumptions of the DINA model and to allow the differing subsets of attributes to have different success probabilities for an item (de la Torre, 2011). In the DINA model, the population is partitioned into two latent groups per item: those who have all the required attributes and those who lack at least one required attribute in answering the item; whereas in the G-DINA model, the population is divided into $2^{K_{j}^{*}}$ latent groups, where $K_{j}^{*} = \sum_{k = 1}^{K} q_{j k}$ is the number of required attributes for item j (e.g., de la Torre & Lee, 2013).

The reduced attribute vector $α_{l j}^{*} = (α_{l j 1}, α_{l j 2},..., α_{l j K_{j}^{*}})$ is used in place of the full attribute vector $α = (α_{1}, α_{2},..., α_{K})$ , where the reduction is based on row j of the Q-matrix. As an example, if $K = 3$ and item j only requires the first and third attributes $α_{1}$ and $α_{3}$ [i.e., $q_{j}$ , row j of the Q-matrix, is equal to $(1, 0, 1)$ ], the reduced attribute vector $α_{l j}^{*}$ is equal to $(α_{l j 1}^{*} = α_{1}, α_{l j 2}^{*} = α_{3})$ . Several link functions can be utilized for the G-DINA model, namely, the identity, logit, and log links. The item response function (IRF) of the G-DINA model under the identity link is given in the following:

P (α_{l j}^{*}) = δ_{j 0} + \sum_{k = 1}^{K_{j}^{*}} δ_{j k} α_{l j k} + \sum_{k' = k + 1}^{K_{j}^{*}} \sum_{k = 1}^{K_{j}^{*} - 1} δ_{j k k'} α_{l j k} α_{l j k'} + ... + δ_{j 12... K_{j}^{*}} \prod_{k = 1}^{K_{j}^{*}} α_{l j k},

where $P (α_{l j}^{*})$ is the probability of a correct response to item j for examinees with reduced attribute vector $α_{l j}^{*}$ , $δ_{j 0}$ is the probability of a correct response if the examinee has not mastered any attribute required for item j, $δ_{j k}$ is the change in the probability of a correct response due to the mastery of $α_{l j k}$ , $δ_{j k k^{'}}$ is the change in the probability of a correct response due to the mastery of $α_{l j k}$ and $α_{l j k^{'}}$ that is over and above the additive impact of the mastery of the same two attributes, and $δ_{j 12... K_{j}^{*}}$ is the interaction effect due to $α_{l j 1},..., α_{l j K_{j}^{*}}$ , or the change in the probability of a correct response due to the mastery of all the required attributes that is over and above the additive impact of the main and lower order interaction effects. As derived in de la Torre (2011), the G-DINA model subsumes the DINA, DINO, A-CDM, RRUM, and LLM models with proper restrictions in the parameterization.
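As an illustration of the identity-link IRF, the following minimal Python sketch sums the effects of every subset of mastered attributes. The function name `gdina_prob`, the dictionary layout of `delta`, and the numerical values are hypothetical choices made for this example only; they are not taken from de la Torre (2011).

```python
from itertools import combinations

def gdina_prob(alpha_star, delta):
    """Identity-link G-DINA success probability for one item.

    alpha_star: tuple of 0/1 values for the attributes required by the item.
    delta: dict mapping a tuple of attribute positions (0-based) to its
           effect; the empty tuple () holds the intercept delta_j0.
    """
    k_star = len(alpha_star)
    prob = 0.0
    for order in range(k_star + 1):
        for subset in combinations(range(k_star), order):
            # A term contributes only when every attribute in it is mastered.
            if all(alpha_star[k] == 1 for k in subset):
                prob += delta.get(subset, 0.0)
    return prob

# Hypothetical item requiring two attributes (K_j* = 2).
delta = {(): 0.1, (0,): 0.2, (1,): 0.3, (0, 1): 0.3}
probs = {a: gdina_prob(a, delta) for a in [(0, 0), (1, 0), (0, 1), (1, 1)]}
```

With these illustrative values, an examinee who has mastered neither attribute succeeds with probability 0.1, and one who has mastered both succeeds with probability 0.1 + 0.2 + 0.3 + 0.3 = 0.9.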

2.1. Estimation of the G-DINA Model

The maximum likelihood estimator (MLE) of $P (α_{l j}^{*})$ in the G-DINA model was shown by de la Torre (2011) to be equal to

\hat{P} (α_{l j}^{*}) = \frac{R_{α_{l j}^{*}}}{N_{α_{l j}^{*}}},

where $N_{α_{l j}^{*}} = \sum_{i = 1}^{N} p (α_{l j}^{*} | X_{i})$ is the estimated expected number of examinees in the latent group $α_{l j}^{*}$ , $R_{α_{l j}^{*}} = \sum_{i = 1}^{N} p (α_{l j}^{*} | X_{i}) \times X_{i j}$ is the estimated expected number of examinees in the latent group $α_{l j}^{*}$ that will answer item j correctly, and $p (α_{l j}^{*} | X_{i})$ represents the posterior probability that examinee i is in the latent group $α_{l j}^{*}$ .
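The ratio $R_{α_{l j}^{*}} / N_{α_{l j}^{*}}$ can be illustrated with a small Python sketch. The posterior probabilities in `post` and the responses in `x_j` are made-up numbers for a hypothetical item with two reduced latent groups; in practice the posteriors come from the estimation algorithm.

```python
# post[i][l]: hypothetical posterior probability that examinee i belongs to
# reduced latent group l of item j (each row sums to 1).
post = [[0.9, 0.1],
        [0.2, 0.8],
        [0.5, 0.5],
        [0.1, 0.9]]
x_j = [0, 1, 1, 1]  # hypothetical responses of the four examinees to item j

n_groups = len(post[0])
# Expected number of examinees in each reduced latent group (N).
n_exp = [sum(row[l] for row in post) for l in range(n_groups)]
# Expected number of those examinees answering item j correctly (R).
r_exp = [sum(row[l] * x for row, x in zip(post, x_j)) for l in range(n_groups)]
# Estimated success probability for each reduced latent group.
p_hat = [r / n for r, n in zip(r_exp, n_exp)]
```

Here the expected group sizes are 1.7 and 2.3, the expected numbers of correct responses are 0.8 and 2.2, and the estimates are their ratios.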

For ability parameter estimation, three methods can be used, namely, MLE, maximum a posteriori (MAP), and expected a posteriori (EAP; Huebner & Wang, 2011). The estimated attribute vector using MLE is the attribute vector given by ${\hat{α}}_{M L E}$ , which maximizes the likelihood function $L (X_{i} | α_{l})$ .

On the other hand, the MAP estimator denoted by ${\hat{α}}_{M A P}$ is the attribute vector that maximizes the posterior probability

P (α_{l} | X_{i}) = \frac{L (X_{i} | α_{l}) P (α_{l})}{\sum_{m = 1}^{L} L (X_{i} | α_{m}) P (α_{m})} .

Lastly, the EAP estimator uses the following formula:

{\hat{α}}_{k} = \sum_{l = 1}^{L} P (α_{l} | X_{i}) I (α_{l k} = 1),

where $I (α_{l k} = 1)$ is the indicator function that is equal to 1 if $α_{l k} = 1$ , 0 otherwise. To obtain the dichotomous attributes, ${\hat{α}}_{k}$ is rounded at 0.5. This method is referred to as the EAP because it averages over all the posterior probabilities instead of just picking the attributes that maximize the posterior probabilities or the likelihoods.
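The three estimators can be contrasted in a short Python sketch. The function `classify`, the three-item test, the success probabilities, and the prior below are all hypothetical values chosen so that the estimators disagree; they do not come from the studies cited above.

```python
from itertools import product
from math import prod

def classify(x, patterns, p_correct, prior):
    """Return (MLE, MAP, EAP) attribute estimates for one response vector.

    x: list of 0/1 responses; patterns[l]: attribute vector alpha_l;
    p_correct[l][j]: P(X_j = 1 | alpha_l); prior[l]: P(alpha_l).
    """
    like = [prod(p if xj == 1 else 1 - p for p, xj in zip(row, x))
            for row in p_correct]
    joint = [lk * pr for lk, pr in zip(like, prior)]
    total = sum(joint)
    post = [v / total for v in joint]  # posterior P(alpha_l | X_i)
    mle = patterns[like.index(max(like))]
    map_est = patterns[post.index(max(post))]
    n_attr = len(patterns[0])
    # Marginal posterior probability of mastery, rounded at 0.5.
    marginal = [sum(pt for pt, a in zip(post, patterns) if a[k] == 1)
                for k in range(n_attr)]
    eap = tuple(int(m >= 0.5) for m in marginal)
    return mle, map_est, eap

patterns = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)
# Item 1 needs attribute 1, item 2 needs attribute 2, item 3 needs both.
p_correct = [[0.2, 0.2, 0.2], [0.2, 0.9, 0.2], [0.9, 0.2, 0.2], [0.9, 0.9, 0.9]]
prior = [0.7, 0.1, 0.1, 0.1]
mle, map_est, eap = classify([1, 0, 0], patterns, p_correct, prior)
```

In this example the likelihood alone favors the pattern (1, 0), whereas the strong prior on (0, 0) pulls both the MAP and the EAP estimates to (0, 0).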

3. On Mixture Models

Finite mixture modeling is a flexible way of modeling heterogeneous data that come from different latent groups inherent in the population (McLachlan & Peel, 2000). Mixture modeling has been applied in several areas, such as biometrics, genetics, medicine, and marketing (Frühwirth-Schnatter, 2006). McLachlan and Basford (1988) utilized a maximum likelihood approach in estimating the parameters of a finite mixture model.

Suppose H is the number of latent groups in a population. Let $π_{h}$ , often called the mixing proportion, be the proportion of the population that belongs to latent group h, where $\sum_{h = 1}^{H} π_{h} = 1$ , and $π_{h} \geq 0$ for all $h = 1, 2, ..., H$ . The density function of a vector of observations $X_{i}$ is given by

f (X_{i}; φ) = \sum_{h = 1}^{H} π_{h} f_{h} (X_{i} | θ_{h}),

where $f_{h} (X_{i} | θ_{h})$ is the density function of the random vector $X_{i}$ in latent group h, $θ_{h}$ is a vector of unknown parameters associated with latent group h, and $φ$ is the vector of unknown parameters consisting of the model parameters associated with the latent groups and the mixing parameters, that is, $φ = (π', θ')^{'}$ , where $π = (π_{1}, π_{2},..., π_{H})^{'}$ and $θ = (θ'_{1}, θ'_{2},..., θ'_{H})^{'}$ .

To determine the latent group classification of the observations, Bayes’s rule is commonly utilized (Bayes, 1763). The posterior probability that element i of the population is in latent group h given the random vector $X_{i}$ is given by

τ_{i h} = \frac{π_{h} f_{h} (X_{i} | θ_{h})}{\sum_{h = 1}^{H} π_{h} f_{h} (X_{i} | θ_{h})} .
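The posterior membership probability can be computed directly from the mixing proportions and the component densities. The Python sketch below is a minimal illustration; the function name `posterior_membership` and the numbers are hypothetical.

```python
def posterior_membership(mix, dens):
    """tau_h = pi_h f_h / sum_h pi_h f_h for a single observation.

    mix: mixing proportions pi_h; dens: component densities f_h(X_i | theta_h).
    """
    weighted = [p * f for p, f in zip(mix, dens)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical case: group 2 is smaller but fits the observation better.
tau = posterior_membership([0.8, 0.2], [0.01, 0.06])
```

In this example the observation is assigned posterior probabilities 0.4 and 0.6, so the smaller group is the more likely source even though its mixing proportion is only 0.2.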

3.1. Estimation of Mixture Models

The advent of the EM algorithm has increased the interest in modeling heterogeneous data using finite mixture models. It simplifies the estimation of the parameters because the latent groupings in the population can be treated as missing data, a setting in which the EM algorithm excels (McLachlan & Peel, 2000). The EM algorithm for the estimation of parameters from mixture models is discussed in McLachlan and Basford (1988).

Let $L (X) = \prod_{i = 1}^{N} f (X_{i}; φ)$ , where $L (X)$ is the likelihood function of an entire data set X, which has N random vectors. To obtain the estimates $\hat{φ}$ that maximize this likelihood function, one can differentiate $L (X)$ with respect to $φ$ and set the derivative equal to 0. Define the latent grouping vector for random vector i as $Z_{i} = (Z_{i 1}, Z_{i 2},..., Z_{i H})$ , with elements

Z_{i h} = \begin{cases} 1, & \text{if } X_{i} \text{ is in latent group } h \\ 0, & \text{if } X_{i} \text{ is not in latent group } h, \end{cases}

where $Z_{1}, Z_{2},..., Z_{N}$ are independent and identically distributed random vectors following a multinomial distribution with parameters $π = (π_{1}, π_{2},..., π_{H - 1})$ . $X_{1}, X_{2},..., X_{N}$ and $Z_{1}, Z_{2},..., Z_{N}$ are assumed to be conditionally independent with the following log density:

l (X_{i}) = \sum_{h = 1}^{H} Z_{i h} log f_{h} (X_{i} | θ_{h}) .

The log likelihood for the complete data, $X_{1}, X_{2},..., X_{N}$ and $Z_{1}, Z_{2},..., Z_{N}$ , is given by

L_{c} (φ) = \sum_{h = 1}^{H} \sum_{i = 1}^{N} Z_{i h} [log π_{h} + log f_{h} (X_{i} | θ_{h})] .

For the E-step, we need to calculate

Q (φ, φ^{(0)}) = E (L_{c} (φ) | X; φ^{(0)}) = \sum_{h = 1}^{H} \sum_{i = 1}^{N} τ_{i h} [log π_{h} + log f_{h} (X_{i} | θ_{h})] .

For the M-step, we need to choose the value of $φ$ that maximizes $Q (φ, φ^{(0)})$ . In mixture modeling, this amounts to solving the following equations (McLachlan & Basford, 1988):

{\hat{π}}_{h} = \sum_{i = 1}^{N} \frac{{\hat{τ}}_{i h}}{N} and \sum_{h = 1}^{H} \sum_{i = 1}^{N} {\hat{τ}}_{i h} \frac{\partial log f_{h} (X_{i} | {\hat{θ}}_{h})}{\partial {\hat{θ}}_{h}} = 0.

The E and M steps are alternated repeatedly until the convergence criterion is met.
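The alternation of E and M steps can be sketched for a simple case. The Python code below assumes, purely for illustration, that each component density $f_{h}$ is a product of independent Bernoulli distributions, for which the M-step has a closed form; the function names, data, and starting values are hypothetical and this is not the estimation routine of the mixture G-DINA model.

```python
from math import log, prod

def bernoulli_density(x, theta):
    """f_h(X_i | theta_h) for independent 0/1 items."""
    return prod(t if xj == 1 else 1 - t for t, xj in zip(theta, x))

def em_bernoulli_mixture(data, mix, theta, n_iter=50):
    """EM for a finite mixture of independent-Bernoulli components.

    data: list of 0/1 response vectors; mix: starting mixing proportions;
    theta[h][j]: starting success probability of item j in component h.
    Returns the final mix, theta, and the log-likelihood trace.
    """
    n, n_items, n_comp = len(data), len(data[0]), len(mix)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior memberships tau[i][h] under current parameters.
        tau, loglik = [], 0.0
        for x in data:
            weighted = [mix[h] * bernoulli_density(x, theta[h])
                        for h in range(n_comp)]
            total = sum(weighted)
            loglik += log(total)
            tau.append([w / total for w in weighted])
        trace.append(loglik)
        # M-step: mixing proportions are the mean memberships; item
        # probabilities are membership-weighted proportions correct.
        weight = [sum(t[h] for t in tau) for h in range(n_comp)]
        mix = [w / n for w in weight]
        theta = [[sum(t[h] * x[j] for t, x in zip(tau, data)) / weight[h]
                  for j in range(n_items)] for h in range(n_comp)]
    return mix, theta, trace

data = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 1],
        [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]
mix, theta, trace = em_bernoulli_mixture(
    data, mix=[0.5, 0.5], theta=[[0.6] * 3, [0.4] * 3])
```

The recorded log likelihood should not decrease from one iteration to the next, which is a convenient check when implementing EM for any mixture model.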

4. On Model Fit Criteria

Several model fit criteria have been used for relative fit evaluation and for determining the number of latent groups in the population when fitting mixture models. For this study, the performance of the following relative model fit criteria was examined: Akaike information criterion ( $A I C$ ), small-sample AIC ( $A I C_{c}$ ), Bayesian information criterion ( $B I C$ ), Kullback information criterion ( $K I C$ ), bias-corrected KIC ( $K I C_{c}$ ), approximation of KIC ( $A K I C_{c}$ ), large sample approximation of integrated classification likelihood ( $I C L - B I C$ ), classification likelihood criterion ( $C L C$ ), and approximate weight of evidence ( $A W E$ ). This section gives a quick description of these criteria.

The AIC (Akaike, 1974) selects the model that minimizes

A I C = - 2 log L (\hat{φ} | X) + 2 d,

where d is the number of unknown model parameters and $L (\hat{φ} | X)$ is the likelihood function of an entire data set. If the number of parameters d is large relative to the sample size n, a small-sample version ( $A I C_{c}$ ; Hurvich and Tsai, 1989) is given in the following:

A I C_{c} = - 2 log L (\hat{φ} | X) + \frac{2 d n}{n - d - 1} .

The BIC (Schwarz, 1978) picks the model with the lowest

B I C = - 2 log L (\hat{φ} | X) + d log N,

where N is the number of observations in the dataset. Cavanaugh (1999) developed an asymptotic unbiased estimator of the KIC given by

K I C = - 2 log L (\hat{φ} | X) + 3 (d + 1) .

The bias correction of $K I C$ ( $K I C_{c}$ ; Seghouane & Maiza, 2004) can be expressed as

K I C_{c} = - 2 log L (\hat{φ} | X) + \frac{2 (d + 1) n}{n - d - 2} - n ψ (\frac{n - d}{2}) + n log (\frac{n}{2}),

where $ψ (.)$ is the digamma or psi function. The $A K I C_{c}$ (Seghouane et al., 2005) is given by

A K I C_{c} = - 2 log L (\hat{φ} | X) + \frac{(d + 1) (3 n - d - 2)}{n - d - 2} + \frac{d}{n - d} .

The large sample approximation of the ICL-BIC (Biernacki et al., 1998) selects the model with the minimum

I C L - B I C = - 2 log L (\hat{φ} | X) + 2 E N (\hat{τ}) + d log N,

where $\hat{τ}$ is an $N \times H$ matrix, in which $\hat{τ} = (τ_{i h})$ and the entropy term $E N (\hat{τ})$ is defined as

E N (\hat{τ}) = - \sum_{i = 1}^{N} \sum_{h = 1}^{H} τ_{i h} log (τ_{i h}) .

The entropy term $E N (\hat{τ})$ assesses how well the observations were correctly categorized into latent groups: the lower the entropy, the better the classification of observations (Celeux & Soromenho, 1996).

The CLC (Biernacki & Govaert, 1997) is given by

C L C = - 2 log L (\hat{φ} | X) + 2 E N (\hat{τ}) .

The model with the lowest $C L C$ is considered to be the best model.

The AWE (Banfield & Raftery, 1993) is expressed as

A W E = - 2 log L_{c} + 2 d (\frac{3}{2} + log (n)),

where $log L_{c} = log L (\hat{φ} | X) - E N (\hat{τ})$ is the complete-data log likelihood, so that $- 2 log L_{c}$ coincides with the $C L C$ . $log L_{c}$ is based on Hathaway’s (1986) mixture logarithmic likelihood. The model with the lowest $A W E$ score can be considered the best model.
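Several of these criteria can be computed with a few lines of Python once the maximized log likelihood, the number of parameters, the sample size, and the posterior memberships are available. The sketch below covers only the AIC, the small-sample AIC, the BIC, the KIC, and the CLC; the function names and the inputs are hypothetical, and a single sample size `n` is assumed for all criteria.

```python
from math import log

def entropy(tau):
    """EN(tau) = -sum_i sum_h tau_ih log(tau_ih), taking 0 log 0 as 0."""
    return -sum(t * log(t) for row in tau for t in row if t > 0)

def fit_criteria(loglik, d, n, tau):
    """Selected relative fit criteria; smaller values indicate better fit.

    loglik: maximized log likelihood; d: number of parameters;
    n: sample size; tau: N x H matrix of posterior memberships.
    """
    dev = -2.0 * loglik
    return {
        "AIC": dev + 2 * d,
        "AICc": dev + 2 * d * n / (n - d - 1),
        "BIC": dev + d * log(n),
        "KIC": dev + 3 * (d + 1),
        "CLC": dev + 2 * entropy(tau),
    }

# Hypothetical fitted model: two observations' memberships shown for brevity.
crit = fit_criteria(loglik=-100.0, d=3, n=50, tau=[[1.0, 0.0], [0.5, 0.5]])
```

An observation classified with certainty contributes nothing to the entropy, whereas one split evenly between two groups contributes log 2, which is why the CLC penalizes poorly separated solutions.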

5. The Proposed Model

Similar to the G-DINA model, the reduced attribute vector $α_{l j}^{*} = (α_{l j 1}, α_{l j 2},..., α_{l j K_{j}^{*}})$ is used in place of the full attribute vector $α = (α_{1}, α_{2},..., α_{K})$ for the mixture G-DINA model. K and $K_{j}^{*}$ are defined similarly. Let $P_{h α_{l j}^{*}}$ be the probability that an examinee with reduced attribute vector $α_{l j}^{*}$ will answer the question j correctly if an examinee is in latent group h, and $π_{h}$ be the proportion of examinees in latent group h, $h = 1, 2, ..., H$ . The interpretations of the model parameters are based on de la Torre (2011). Using the identity link, the IRF of the mixture G-DINA model is given by

P_{h α_{l j}^{*}} = δ_{j 0 h} + \sum_{k = 1}^{K_{j}^{*}} δ_{j k h} α_{l j k} + \sum_{k^{'} = k + 1}^{K_{j}^{*}} \sum_{k = 1}^{K_{j}^{*} - 1} δ_{j k k^{'} h} α_{l j k} α_{l j k^{'}} + ... + δ_{j 12... K_{j}^{*} h} \prod_{k = 1}^{K_{j}^{*}} α_{l j k},

where $δ_{j 0 h}$ is the probability of correct response, if an examinee in latent group h does not master any attribute required for item j, $δ_{j k h}$ is the change in probability of a correct response due to the mastery of $α_{k}$ , if an examinee is in latent group h, $δ_{j k k^{'} h}$ is the change in the probability of a correct response due to the mastery of $α_{k}$ and $α_{k^{'}}$ that is over and above the additive impact of the mastery of the same two attributes, if an examinee is in latent group h, $δ_{j 12... K_{j}^{*} h}$ is the interaction effect due to $α_{1}, α_{2},..., α_{K_{j}^{*}}$ , or the change in the probability of the correct response due to the mastery of all the attributes that is over and above the additive impact of the main and lower order interaction effects, if an examinee is in latent group h. Note that when $H = 1$ , then the mixture G-DINA model reduces to the traditional G-DINA model.

Note that the IRF of the proposed model is almost equivalent to that of the multiple group G-DINA model introduced by Ma et al. (2021). The key difference is that the groups in the multiple group G-DINA model are determined a priori, whereas the latent groups in the mixture G-DINA model are not known beforehand (i.e., the latent grouping variable of the examinees is considered another latent variable). The mixture G-DINA model also makes use of the mixing proportion, which the multiple group G-DINA model does not have. The mixing proportion is an additional model parameter that needs to be estimated, providing the proportion of examinees belonging to a specific latent group. The mixing proportion is incorporated in the likelihood function used for the estimation of the model parameters.

5.1. Estimation Procedure

Let $L (X)$ be the likelihood of an entire response dataset X, a matrix of dimension $N \times J$ , where N is the number of examinees, and J is the number of items in the exam. Define $L (X)$ to be

L (X) = \prod_{i = 1}^{N} L (X_{i}),

under the assumption that the response vectors are independent, where $L (X_{i})$ is the likelihood function for the response vector of examinee i with dimension J. In finite mixture modeling, $L (X_{i})$ is equal to

L (X_{i}) = \sum_{h = 1}^{H} π_{h} L (X_{i} | h),

where $L (X_{i} | h)$ is the likelihood function for examinee i assuming that it is in latent group h, and $π_{h}$ , oftentimes called the mixing proportion, is the probability that an examinee is in latent group h, or the proportion of examinees in latent group h, $h = 1, 2, \dots, H$ . $L (X_{i} | h)$ can be further expressed to be equal to

L (X_{i} | h) = \sum_{l = 1}^{L} L (X_{i} | α_{l}, h) P (α_{l} | h),

where $P (α_{l} | h)$ is the probability that an examinee in latent group h has skill vector $α_{l}$ , $L (X_{i} | α_{l}, h)$ is the likelihood function for examinee i in latent group h with skill vector $α_{l}$ , $l = 1, 2, ..., L$ , and L is the total number of latent classes (i.e., $2^{K}$ attribute patterns when the attributes are dichotomous). Assuming that the responses are dichotomous, $L (X_{i} | α_{l}, h)$ can be expressed as

L (X_{i} | α_{l}, h) = \prod_{j = 1}^{J} P_{h α_{l j}^{*}}^{X_{i j}} {(1 - P_{h α_{l j}^{*}})}^{1 - X_{i j}},

where $P_{h α_{l j}^{*}}$ is the probability that examinee i in latent group h answers item j correctly, $l_{j} = 1, 2, ..., 2^{K_{j}^{*}}$ , and $X_{i j}$ is the response of examinee i on item j. To obtain the MLE of the parameters of interest $P_{h α_{l j}^{*}}$ , which are the success probabilities for item j in latent group h across the different reduced attribute profiles, we need to maximize

l (X) = log L (X) = log \prod_{i = 1}^{N} L (X_{i}) = \sum_{i = 1}^{N} log L (X_{i}) .

The MLE of $P_{h α_{l j}^{*}}$ is derived to be equal to

{\hat{P}}_{h α_{l j}^{*}} = \frac{R_{h l j}^{*}}{N_{h l j}^{*}},

where $N_{h l j}^{*} = \sum_{i = 1}^{N} τ_{i h} P (α_{l j}^{*} | X_{i}, h)$ is the estimated expected number of examinees with reduced attribute pattern $α_{l j}^{*}$ in latent group h, and $R_{h l j}^{*} = \sum_{i = 1}^{N} X_{i j} τ_{i h} P (α_{l j}^{*} | X_{i}, h)$ is the estimated expected number of examinees with reduced attribute pattern $α_{l j}^{*}$ in latent group h that answered item j correctly.
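The nested structure of the likelihood, a sum over latent groups of a sum over attribute patterns of a product over items, can be made concrete with a small Python sketch. The function names and all numerical values are hypothetical; the example uses one attribute ($K = 1$), two items, and two latent groups, the second of which responds at chance level.

```python
from itertools import product
from math import prod

def pattern_likelihood(x, p_row):
    """L(X_i | alpha_l, h): product over items of P^x (1 - P)^(1 - x)."""
    return prod(p if xj == 1 else 1 - p for p, xj in zip(p_row, x))

def group_likelihood(x, p_group, prior_group):
    """L(X_i | h) = sum_l L(X_i | alpha_l, h) P(alpha_l | h)."""
    return sum(pattern_likelihood(x, p_row) * pr
               for p_row, pr in zip(p_group, prior_group))

def mixture_likelihood(x, mix, p, prior):
    """L(X_i) = sum_h pi_h L(X_i | h)."""
    return sum(mix[h] * group_likelihood(x, p[h], prior[h])
               for h in range(len(mix)))

# p[h][l][j]: success probability on item j for attribute pattern l in group h.
p = [[[0.2, 0.2], [0.8, 0.8]],   # group 1: non-masters 0.2, masters 0.8
     [[0.5, 0.5], [0.5, 0.5]]]   # group 2: chance-level responding
prior = [[0.5, 0.5], [0.5, 0.5]]  # P(alpha_l | h)
mix = [0.6, 0.4]                  # mixing proportions pi_h

lik_11 = mixture_likelihood([1, 1], mix, p, prior)
# The likelihoods of all possible response vectors should sum to 1.
total = sum(mixture_likelihood(list(x), mix, p, prior)
            for x in product([0, 1], repeat=2))
```

For the response vector (1, 1), the group likelihoods are 0.34 and 0.25, giving a mixture likelihood of 0.6 × 0.34 + 0.4 × 0.25 = 0.304.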

The following are the steps of the EM algorithm in estimating the parameters of the mixture G-DINA model:

1. At iteration 0, assign initial values $P_{1 α_{1}^{*}}^{(0)}, P_{2 α_{1}^{*}}^{(0)},..., P_{H α_{1}^{*}}^{(0)},..., P_{1 α_{L_{j}}^{*}}^{(0)}, P_{2 α_{L_{j}}^{*}}^{(0)},..., P_{H α_{L_{j}}^{*}}^{(0)},$ $τ_{11}^{(0)},$ $τ_{12}^{(0)},..., τ_{1 H}^{(0)},..., τ_{N 1}^{(0)}, τ_{N 2}^{(0)},..., τ_{N H}^{(0)}$ , where $0 < P_{1 α_{1}^{*}}^{(0)}, P_{2 α_{1}^{*}}^{(0)}$ ,…, $P_{H α_{1}^{*}}^{(0)},$ …, $P_{1 α_{L_{j}}^{*}}^{(0)}, P_{2 α_{L_{j}}^{*}}^{(0)}$ ,…, $P_{H α_{L_{j}}^{*}}^{(0)} < 1,$ $0 \leq τ_{11}^{(0)}, τ_{12}^{(0)},..., τ_{1 H}^{(0)},..., τ_{N 1}^{(0)}, τ_{N 2}^{(0)},..., τ_{N H}^{(0)} \leq 1$ , and $\sum_{h = 1}^{H} τ_{i h}^{(0)} = 1$ . Note that if any of the initial values for $P_{h α_{l j}^{*}}$ in this step is equal to 1 or 0, the iterated likelihood function values will be undefined as some of the values will be divided by zero for every iteration.

2. Compute $π_{1}^{(0)}, π_{2}^{(0)},..., π_{H}^{(0)}$ using the $τ_{11}^{(0)}, τ_{12}^{(0)},..., τ_{1 H}^{(0)},..., τ_{N 1}^{(0)}, τ_{N 2}^{(0)},..., τ_{N H}^{(0)}$ , where $π_{h}^{(0)} = \sum_{i = 1}^{N} \frac{τ_{i h}^{(0)}}{N}, h = 1, 2, ..., H$ .

3. At iteration t, compute $(R_{h l j}^{* (t)}, N_{h l j}^{* (t)})$ for every latent group $h = 1, 2, ..., H$ and every reduced attribute pattern $α_{l j}^{*}$ of every item j, using $P_{1 α_{1}^{*}}^{(t - 1)}$ , $P_{2 α_{1}^{*}}^{(t - 1)}$ ,…, $P_{H α_{1}^{*}}^{(t - 1)}$ ,…, $P_{1 α_{L_{j}}^{*}}^{(t - 1)}$ , $P_{2 α_{L_{j}}^{*}}^{(t - 1)}$ ,…, $P_{H α_{L_{j}}^{*}}^{(t - 1)}$ and $τ_{11}^{(t - 1)}$ , $τ_{12}^{(t - 1)}$ ,…, $τ_{1 H}^{(t - 1)}$ ,…, $τ_{N 1}^{(t - 1)}$ , $τ_{N 2}^{(t - 1)}$ ,…, $τ_{N H}^{(t - 1)}$ , where

$N_{h l j}^{* (t)} = \sum_{i = 1}^{N} P^{(t - 1)} (α_{l j}^{*} | X_{i}, h) τ_{i h}^{(t - 1)}$ and $R_{h l j}^{* (t)} = \sum_{i = 1}^{N} X_{i j} P^{(t - 1)} (α_{l j}^{*} | X_{i}, h) τ_{i h}^{(t - 1)} .$

Update $P_{1 α_{1}^{*}}^{(t)}$ , $P_{2 α_{1}^{*}}^{(t)}$ ,…, $P_{H α_{1}^{*}}^{(t)}$ ,…, $P_{1 α_{L_{j}}^{*}}^{(t)}$ , $P_{2 α_{L_{j}}^{*}}^{(t)}$ ,…, $P_{H α_{L_{j}}^{*}}^{(t)}$ using

$P_{h α_{l j}^{*}}^{(t)} = \frac{R_{h l j}^{* (t)}}{N_{h l j}^{* (t)}} .$

4. At iteration t, update $τ_{11}^{(t)}$ , $τ_{12}^{(t)}$ ,…, $τ_{1 H}^{(t)}$ ,…, $τ_{N 1}^{(t)}$ , $τ_{N 2}^{(t)}$ ,…, $τ_{N H}^{(t)}$ using

$τ_{i h}^{(t)} = \frac{π_{h}^{(t - 1)} L^{(t - 1)} (X_{i} | h)}{\sum_{h = 1}^{H} π_{h}^{(t - 1)} L^{(t - 1)} (X_{i} | h)} .$

Then compute $π_{1}^{(t)}, π_{2}^{(t)},..., π_{H}^{(t)}$ using the $τ_{11}^{(t)},$ $τ_{12}^{(t)},$ …, $τ_{1 H}^{(t)},$ …, $τ_{N 1}^{(t)},$ $τ_{N 2}^{(t)},$ …, $τ_{N H}^{(t)}$ , where $π_{h}^{(t)} = \sum_{i = 1}^{N} \frac{τ_{i h}^{(t)}}{N}, h = 1, 2, ..., H$ (McLachlan & Basford, 1988).

5. Repeat Steps 3 and 4 until convergence, that is, until the maximum change in any of the model parameters $P_{1 α_{1}^{*}}, P_{2 α_{1}^{*}},..., P_{H α_{1}^{*}},..., P_{1 α_{L_{j}}^{*}}, P_{2 α_{L_{j}}^{*}},..., P_{H α_{L_{j}}^{*}},$ and $π_{1}, π_{2},..., π_{H}$ between two consecutive iterations does not exceed 0.0001.

The priors are estimated using the information from the data by employing the empirical Bayes method. For each iteration, the prior probabilities for the attribute vectors are updated using the following expression:

P^{(t)} (α_{l} | h) = \sum_{i = 1}^{N} \frac{P^{(t - 1)} (α_{l} | X_{i}, h)}{N} .
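One pass through Steps 3 and 4, together with the prior update above, can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names, the data layout (success probabilities stored per latent group, item, and reduced attribute pattern), the toy data, and the starting values are all hypothetical, and the mixing proportions are taken as the mean of the $τ_{i h}$ as in Section 3.1.

```python
from itertools import product
from math import prod

def reduce_pattern(alpha, q_row):
    """Keep only the attributes that the item requires."""
    return tuple(a for a, q in zip(alpha, q_row) if q == 1)

def em_iteration(data, q_matrix, p, prior, tau, mix):
    """One pass of Steps 3 and 4 plus the empirical Bayes prior update.

    data[i][j]: 0/1 responses; p[h][j][reduced pattern]: success
    probabilities; prior[h][l]: P(alpha_l | h); tau[i][h]: posterior
    group memberships; mix[h]: mixing proportions.
    """
    n, n_items, n_groups = len(data), len(q_matrix), len(mix)
    patterns = list(product([0, 1], repeat=len(q_matrix[0])))
    n_pat = len(patterns)
    reduced = [[reduce_pattern(a, q) for q in q_matrix] for a in patterns]

    # Likelihoods L(X_i | h) and posteriors P(alpha_l | X_i, h) at t - 1.
    like_group = [[0.0] * n_groups for _ in range(n)]
    post_attr = [[None] * n_groups for _ in range(n)]
    for i, x in enumerate(data):
        for h in range(n_groups):
            joint = []
            for l in range(n_pat):
                like = prod(p[h][j][reduced[l][j]] if x[j] == 1
                            else 1 - p[h][j][reduced[l][j]]
                            for j in range(n_items))
                joint.append(like * prior[h][l])
            like_group[i][h] = sum(joint)
            post_attr[i][h] = [v / like_group[i][h] for v in joint]

    # Step 3: expected counts N and R, then P = R / N.
    new_p = []
    for h in range(n_groups):
        new_p.append([])
        for j in range(n_items):
            n_exp = {r: 0.0 for r in p[h][j]}
            r_exp = {r: 0.0 for r in p[h][j]}
            for i, x in enumerate(data):
                for l in range(n_pat):
                    w = tau[i][h] * post_attr[i][h][l]
                    n_exp[reduced[l][j]] += w
                    r_exp[reduced[l][j]] += w * x[j]
            new_p[h].append({r: r_exp[r] / n_exp[r] for r in n_exp})

    # Step 4: update tau with the previous pi, then recompute pi.
    new_tau = []
    for i in range(n):
        weighted = [mix[h] * like_group[i][h] for h in range(n_groups)]
        total = sum(weighted)
        new_tau.append([w / total for w in weighted])
    new_mix = [sum(t[h] for t in new_tau) / n for h in range(n_groups)]

    # Prior update: average of the attribute posteriors over examinees.
    new_prior = [[sum(post_attr[i][h][l] for i in range(n)) / n
                  for l in range(n_pat)] for h in range(n_groups)]
    return new_p, new_prior, new_tau, new_mix

# Hypothetical toy problem: K = 2 attributes, J = 3 items, H = 2 groups.
q_matrix = [[1, 0], [0, 1], [1, 1]]
data = [[1, 1, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]

def start(low, high):
    """Starting values: `high` only when all required attributes are mastered."""
    return [{r: (high if all(r) else low)
             for r in set(reduce_pattern(a, q)
                          for a in product([0, 1], repeat=2))}
            for q in q_matrix]

p0 = [start(0.2, 0.8), start(0.4, 0.6)]
prior0 = [[0.25] * 4, [0.25] * 4]
tau0 = [[0.6, 0.4] for _ in data]
mix0 = [0.6, 0.4]
p1, prior1, tau1, mix1 = em_iteration(data, q_matrix, p0, prior0, tau0, mix0)
```

In a full implementation this function would be called repeatedly, feeding each output back in as input, until the convergence criterion in Step 5 is met. Starting values strictly between 0 and 1 keep every division well defined, as noted in Step 1.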

Note that this algorithm assumes that the number of latent groups is already known. If the number of components in the population is unknown, one way to estimate it is through the use of relative model fit criteria. This is done by fitting mixture models at $H = 1$ , then at $H = 2$ , and so on, and selecting the value of H that gives the lowest value of the relative model fit criterion. Simulation Study 2 examined the performance of the different relative model fit criteria used for estimating the number of components in mixture models.

In the estimation of the latent group membership of the examinees, Bayes’s rule was used. The latent group membership of examinee i was determined by the maximum $τ_{i h}$ across all h. If there is a tie, that is, if at least two latent groups share the maximum posterior probability for an examinee, the examinee is randomly assigned to one of the tied latent groups. The attributes of an examinee were estimated using the usual MLE, EAP, and MAP.
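The assignment rule with random tie-breaking can be written compactly in Python. The function name `assign_group` and the optional `rng` argument are hypothetical conveniences for this sketch.

```python
import random

def assign_group(tau_row, rng=random):
    """Return the 0-based index of the largest tau_ih.

    When several latent groups share the maximum, one of them is
    drawn uniformly at random.
    """
    best = max(tau_row)
    tied = [h for h, t in enumerate(tau_row) if t == best]
    return rng.choice(tied)

clear_case = assign_group([0.2, 0.7, 0.1])
tied_case = assign_group([0.5, 0.5, 0.0], rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the tie-breaking reproducible across simulation replications.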

6. Simulation Studies

Two simulation studies were conducted in this research. Simulation Study 1 investigated the viability of the mixture G-DINA model when fitted to heterogeneous response data. Simulation Study 2 focused on the estimation of the number of components in the population using relative model fit criteria. The monotonicity property of the success probabilities was not assumed in the simulation studies and the real data application. This means that it is possible for examinees who possess more of the attributes needed to answer an item to have a lower success probability than examinees who possess fewer of them.

6.1. Simulation Study 1: Evaluation of the Performance of the Proposed Model

6.1.1. Design

The factors manipulated in Simulation Study 1 were the test length, the number of examinees, the mixing proportions, the item quality (IQ), the different generating models (genmod), and the number of latent groups.

For the number of items, $J = 15$ and $J = 30$ were considered. For the number of examinees, $N = 1,500$ and $N = 3,000$ were considered. These levels were chosen because both are divisible by two and three, which facilitates the equal-weight conditions wherein the population is divided equally into two or three latent groups. For the number of latent groups, $H = 2$ and $H = 3$ were considered. This study did not deal with the condition of an extremely unbalanced mixture with a small sample (e.g., a mixture of 450 and 50 examinees). Preliminary simulation experiments on these cases showed that the average absolute bias of the item parameters was already extremely high (around 0.25) even when the simulated conditions had high item quality and a longer test length. Aside from this, several papers have supported the observation that 500 is a small sample size for CDMs, and several of them have recommended having at least 1,000 examinees to adequately estimate the model parameters (Bradshaw & Madison, 2016; Madison & Bradshaw, 2018; Sen & Cohen, 2021). Since the mixture G-DINA model has more parameters than most of the other usual CDMs, estimating the proposed model with a sample size lower than 1,000 would likely be challenging.

The performance of the mixture G-DINA model was compared across different partitionings of the mixing proportions: even and uneven. For two latent groups with even partitioning, 50% of the examinees were generated from each of Latent Groups 1 and 2. For two latent groups with uneven partitioning, two mixing proportion specifications were investigated: 80%–20% and 95%–5%. The 95%–5% specification is intended to examine the performance of the model in cheating scenarios, where 5% of the examinees cheated. For three latent groups with even partitioning, one third of the examinees were generated from each of Latent Groups 1 through 3. For uneven partitioning of the three latent groups, 50% were generated from Latent Group 1, 30% from Latent Group 2, and 20% from Latent Group 3.

For item quality, the combinations of good, moderate, and poor item quality were considered for a single test. Items with guessing and slip parameters generated from a uniform distribution with mean equal to 0.1, 0.2, and 0.3 were considered to be items with good, moderate, and poor quality, respectively. Lastly, the generating model was manipulated in the simulation design to form mixtures. Combinations of different generating models (G-DINA, DINA, DINO, and A-CDM) were considered in the study. These scenarios were based on Santos et al. (2015), who examined the impact of aberrant examinees in CDMs using a forward search algorithm. In generating aberrant examinees, they assumed that the response patterns came from another CDM with different parameter specifications. For instance, the typical response patterns were generated from the DINA model with guessing and slip parameters equal to 0.10, and aberrant response patterns were generated from the DINA model but with guessing and slip parameters equal to 0.25. Similar scenarios were considered in this study to investigate the performance of the mixture G-DINA model when outlying response patterns are present in the assessment data.

The number of attributes was fixed at $K = 5$ . The Q-matrix used in the simulation study was based on de la Torre (2011). The Q-matrix for $J = 30$ is given in Table 1. For $J = 15$ , the q-vectors used are those of the items marked with asterisks. The Q-matrix for $J = 15$ was chosen so that the attributes were spread evenly across the items and so that items requiring one, two, and three attributes were all represented, as in the original Q-matrix from de la Torre (2011). The identifiability of the mixture G-DINA model is assured by the Q-matrix used. Existing identifiability research states that the Q-matrices for $J = 30$ and $J = 15$ ensure generic identifiability of the G-DINA model (Gu & Xu, 2020; Xu, 2017). Additionally, to avoid nonidentifiability of the mixture model due to invariance to relabeling of the components and to potential overfitting, the following formal identifiability constraints from Frühwirth-Schnatter (2006) were implemented: (1) $π_{h} > 0$ for $h = 1, 2, \dots, H$ and (2) the collections of parameters from different latent groups, $θ_{h}$ and $θ_{h^{'}}$ , differ in at least one element. For each condition, 100 data sets were generated.

Table 1.

Q-Matrix Used for the Simulation Studies

Items	Attributes					Items	Attributes					Items	Attributes
	a1	a2	a3	a4	a5		a1	a2	a3	a4	a5		a1	a2	a3	a4	a5
1*	1	0	0	0	0	11*	1	1	0	0	0	21*	1	1	1	0	0
2*	0	1	0	0	0	12	1	0	1	0	0	22	1	1	0	1	0
3*	0	0	1	0	0	13	1	0	0	1	0	23*	1	1	0	0	1
4*	0	0	0	1	0	14*	1	0	0	0	1	24	1	0	1	1	0
5*	0	0	0	0	1	15*	0	1	1	0	0	25	1	0	1	0	1
6	1	0	0	0	0	16	0	1	0	1	0	26*	1	0	0	1	1
7	0	1	0	0	0	17	0	1	0	0	1	27*	0	1	1	1	0
8	0	0	1	0	0	18*	0	0	1	1	0	28	0	1	1	0	1
9	0	0	0	1	0	19	0	0	1	0	1	29	0	1	0	1	1
10	0	0	0	0	1	20*	0	0	0	1	1	30*	0	0	1	1	1

Note: Items with * are used for J = 15.
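
The structure of Table 1 can be checked with a short Python sketch. Reading the table, the $J = 30$ Q-matrix consists of two copies of each single-attribute item, every two-attribute combination, and every three-attribute combination; the variable names below are hypothetical.

```python
from itertools import combinations

K = 5  # number of attributes


def q_vector(attributes):
    """Return a 0/1 q-vector of length K with ones at the given 0-based positions."""
    return [1 if k in attributes else 0 for k in range(K)]


singles = [q_vector({k}) for k in range(K)]
pairs = [q_vector(set(c)) for c in combinations(range(K), 2)]
triples = [q_vector(set(c)) for c in combinations(range(K), 3)]

# Items 1-5 and 6-10 are single-attribute, 11-20 pairs, 21-30 triples.
Q30 = singles + singles + pairs + triples

# Items marked with an asterisk in Table 1 (1-based item numbers).
STARRED = [1, 2, 3, 4, 5, 11, 14, 15, 18, 20, 21, 23, 26, 27, 30]
Q15 = [Q30[j - 1] for j in STARRED]

# How often each attribute is required, and how many items need 1, 2, or 3 attributes.
attribute_counts_30 = [sum(row[k] for row in Q30) for k in range(K)]
attribute_counts_15 = [sum(row[k] for row in Q15) for k in range(K)]
items_by_size_15 = [sum(1 for row in Q15 if sum(row) == s) for s in (1, 2, 3)]

print(attribute_counts_30)  # [12, 12, 12, 12, 12]
print(attribute_counts_15)  # [6, 6, 6, 6, 6]
print(items_by_size_15)     # [5, 5, 5]
```

Each attribute is required by 12 items when $J = 30$ and by 6 items when $J = 15$ , and the shorter test contains five items at each of the three complexity levels.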

To examine whether the proposed estimation procedure can recover the parameters of the mixture G-DINA model, the overall absolute bias and the componentwise absolute bias of the estimates were measured across the different conditions. The overall absolute bias, denoted by $\mathrm{BIAS}_{o}$ , and the componentwise absolute bias for component h, denoted by $\mathrm{BIAS}_{h}$ , are given by the following formulas:

\mathrm{BIAS}_{o} = \frac{\sum_{h = 1}^{H} \sum_{j = 1}^{J} \sum_{lj = 1}^{L_{j}} | P_{h α_{lj}^{*}} - {\hat{P}}_{h α_{lj}^{*}} |}{\sum_{h = 1}^{H} \sum_{j = 1}^{J} L_{j}},

\mathrm{BIAS}_{h} = \frac{\sum_{j = 1}^{J} \sum_{lj = 1}^{L_{j}} | P_{h α_{lj}^{*}} - {\hat{P}}_{h α_{lj}^{*}} |}{\sum_{j = 1}^{J} L_{j}},

where $P_{h α_{lj}^{*}}$ is the generating success probability on item j for examinees in latent group h with attribute vector $α_{lj}^{*}$ , and ${\hat{P}}_{h α_{lj}^{*}}$ is the estimated value of $P_{h α_{lj}^{*}}$ .
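
A minimal Python sketch of the two bias measures follows. The nested-list layout (`true_p[h][j]` holding the $L_{j}$ success probabilities of item j in latent group h) and the function name `absolute_bias` are assumptions chosen for illustration, and the numbers in the example are made up.

```python
def absolute_bias(true_p, est_p):
    """Return (overall absolute bias, list of componentwise absolute biases).

    true_p[h][j] and est_p[h][j] are lists with one success probability per
    item-specific attribute pattern (L_j entries) for item j in group h.
    """
    comp_bias = []
    total_abs, total_count = 0.0, 0
    for true_h, est_h in zip(true_p, est_p):
        abs_sum = sum(
            abs(p - p_hat)
            for item_true, item_est in zip(true_h, est_h)
            for p, p_hat in zip(item_true, item_est)
        )
        count = sum(len(item_true) for item_true in true_h)
        comp_bias.append(abs_sum / count)
        total_abs += abs_sum
        total_count += count
    return total_abs / total_count, comp_bias


# Hypothetical example: H = 2 groups, J = 2 items with L_1 = 2 and L_2 = 4.
true_p = [
    [[0.1, 0.9], [0.1, 0.1, 0.1, 0.9]],
    [[0.2, 0.8], [0.2, 0.2, 0.2, 0.8]],
]
est_p = [
    [[0.15, 0.85], [0.1, 0.1, 0.2, 0.9]],
    [[0.2, 0.8], [0.2, 0.2, 0.2, 0.5]],
]
overall, by_component = absolute_bias(true_p, est_p)
```

Here the componentwise values are 0.2/6 and 0.3/6, and the overall value is 0.5/12, because every group contributes the same number of item parameters.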

To determine how well the mixture G-DINA model classifies examinees into their latent groups and their attribute latent classes, the correct latent group classification rate ( $\mathrm{CCG}$ ) and the correct attribute latent class classification rates, both vectorwise ( $\mathrm{CCL}_{vec}$ ) and elementwise ( $\mathrm{CCL}_{el}$ ), were measured. These measures are given by the following:

\mathrm{CCG} = \frac{1}{N} \sum_{i = 1}^{N} I_{i [h = \hat{h}]},

\mathrm{CCL}_{vec} = \frac{1}{N} \sum_{i = 1}^{N} I_{i [α = \hat{α}]},

\mathrm{CCL}_{el} = \frac{1}{K \times N} \sum_{k = 1}^{K} \sum_{i = 1}^{N} I_{i [α_{k} = {\hat{α}}_{k}]},

where $I_{i [h = \hat{h}]} = 1$ if the estimated latent group of examinee i, denoted by $\hat{h}$ , is equal to the true latent group h, and 0 otherwise; $I_{i [α = \hat{α}]} = 1$ if the estimated attribute vector $\hat{α}$ of examinee i is equal to the true attribute vector $α$ , and 0 otherwise; and $I_{i [α_{k} = {\hat{α}}_{k}]} = 1$ if the estimated value ${\hat{α}}_{k}$ of attribute k is equal to the true value $α_{k}$ , and 0 otherwise.
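
The three classification rates can be computed as in the following Python sketch. The function name and the toy data are hypothetical, and the sketch assumes that the estimated group labels have already been aligned with the generating labels (mixture components are identified only up to relabeling).

```python
def classification_rates(true_groups, est_groups, true_alphas, est_alphas):
    """Return (CCG, CCL_vec, CCL_el) for N examinees and K attributes."""
    n = len(true_groups)
    k = len(true_alphas[0])
    ccg = sum(h == h_hat for h, h_hat in zip(true_groups, est_groups)) / n
    ccl_vec = sum(
        list(a) == list(a_hat) for a, a_hat in zip(true_alphas, est_alphas)
    ) / n
    ccl_el = sum(
        a_k == a_k_hat
        for a, a_hat in zip(true_alphas, est_alphas)
        for a_k, a_k_hat in zip(a, a_hat)
    ) / (k * n)
    return ccg, ccl_vec, ccl_el


# Hypothetical example with N = 4 examinees and K = 3 attributes.
true_groups = [1, 1, 2, 2]
est_groups = [1, 2, 2, 2]
true_alphas = [[1, 0, 1], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
est_alphas = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 1, 1]]

ccg, ccl_vec, ccl_el = classification_rates(
    true_groups, est_groups, true_alphas, est_alphas
)
```

In this example three of four group assignments are correct, two of four attribute vectors match exactly, and 10 of the 12 individual attributes match, so the elementwise rate is never lower than the vectorwise rate.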

The correct latent class classification rate of the mixture G-DINA model was compared with that of the usual G-DINA model to determine whether the mixture model improves attribute classification. For the estimation of the attribute vector, maximum likelihood estimation (MLE), expected a posteriori (EAP) estimation, and maximum a posteriori (MAP) estimation were considered.
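
To make the difference between the three attribute estimators concrete, the following Python sketch applies their usual CDM definitions to a single examinee: MLE picks the attribute pattern with the largest likelihood, MAP the pattern with the largest posterior probability, and EAP classifies each attribute as mastered when its marginal posterior probability exceeds 0.5. The function name, the 0.5 cutoff as written here, and the numbers are assumptions for illustration; how the mixture G-DINA model combines these quantities with latent group membership is not shown.

```python
def classify(likelihood, prior):
    """Return (MLE, MAP, EAP) attribute patterns for one examinee.

    likelihood and prior map attribute tuples to L(X | alpha) and p(alpha).
    """
    patterns = list(likelihood)
    joint = {a: likelihood[a] * prior[a] for a in patterns}
    total = sum(joint.values())
    posterior = {a: joint[a] / total for a in patterns}

    mle = max(patterns, key=lambda a: likelihood[a])
    map_ = max(patterns, key=lambda a: posterior[a])

    k = len(patterns[0])
    # Marginal posterior probability of mastering each attribute.
    marginals = [sum(posterior[a] for a in patterns if a[i] == 1) for i in range(k)]
    eap = tuple(1 if m > 0.5 else 0 for m in marginals)
    return mle, map_, eap


# Hypothetical likelihoods and prior for K = 2 attributes.
likelihood = {(0, 0): 0.02, (0, 1): 0.10, (1, 0): 0.30, (1, 1): 0.25}
prior = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.6}

mle, map_, eap = classify(likelihood, prior)
print(mle, map_, eap)  # (1, 0) (1, 1) (1, 1)
```

The example shows that the estimators can disagree: the likelihood alone favors (1, 0), whereas the prior weight on (1, 1) shifts both posterior-based estimates.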

6.1.2. Results

Due to space limitations, only the results for some of the treatment combinations are presented in this subsection.

6.1.2.1. Overall and componentwise absolute bias

Tables 2 through 4 show the average overall and componentwise absolute biases of estimated success probabilities across items and across different treatment combinations of mixtures of generating model, combinations of item quality, number of items, and number of examinees across replicates.

Table 2.

Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and $H = 2$ )

		$J = 30$ and $N = 1,500$			$J = 15$ and $N = 1,500$
Model 1	Model 2	Overall	Comp1	Comp2	Overall	Comp1	Comp2
Item Quality 1—Good; Item Quality 2—Moderate
DINA	DINA	.0630	.0393	.0866	.1536	.0476	.2597
DINO	DINO	.0710	.0364	.1256	.1953	.0395	.3511
A-CDM	A-CDM	.1291	.1266	.1315	.1816	.1105	.2528
GDINA	GDINA	.0940	.0510	.0369	.1817	.1067	.2566
Item Quality 1—Moderate; Item Quality 2—Bad
DINA	DINA	.1114	.0278	.1951	.1878	.0476	.3281
DINO	DINO	.1188	.0278	.2098	.1964	.0501	.3427
A-CDM	A-CDM	.1437	.0374	.2500	.2088	.1490	.2687
GDINA	GDINA	.1328	.0347	.2308	.2057	.1202	.2911
Item Quality 1—Good; Item Quality 2—Bad
DINA	DINA	.1566	.0760	.2372	.2514	.2072	.2957
DINO	DINO	.1831	.0432	.3231	.2670	.2116	.3223
A-CDM	A-CDM	.1836	.1381	.2290	.3157	.3061	.3252
GDINA	GDINA	.1799	.1008	.2591	.2735	.2492	.2979

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Overall is the overall absolute bias for both components, whereas Comp1 and Comp2 refer to the componentwise absolute bias for Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 3.

Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and $H = 2$ )

		$J = 30$ and $N = 1,500$			$J = 15$ and $N = 1,500$
Model 1	Model 2	Overall	Comp1	Comp2	Overall	Comp1	Comp2
Item Quality 1—Good; Item Quality 2—Good
DINA	DINO	.0236	.0235	.0237	.1153	.0535	.1771
DINA	A-CDM	.0306	.0261	.0351	.1206	.0544	.1599
DINA	GDINA	.0273	.0254	.0360	.0687	.0515	.0858
DINO	A-CDM	.0304	.0256	.0352	.1307	.0547	.1563
DINO	GDINA	.0283	.0259	.0307	.1019	.0593	.1444
A-CDM	GDINA	.0465	.0492	.0437	.1621	.2313	.0929
Item Quality 1—Moderate; Item Quality 2—Moderate
DINA	DINO	.0464	.0463	.0466	.2546	.2244	.2849
DINA	A-CDM	.0805	.0805	.2089	.2648	.2198	.3023
DINA	GDINA	.0561	.0524	.0597	.2315	.2237	.2393
DINO	A-CDM	.1363	.0607	.2118	.2680	.2131	.3229
DINO	GDINA	.1255	.0616	.1894	.2360	.1977	.2743
A-CDM	GDINA	.1714	.2757	.0671	.2585	.2688	.2483
Item Quality 1—Bad; Item Quality 2—Bad
DINA	DINO	.2271	.2289	.2252	.3230	.3118	.3343
DINA	A-CDM	.2329	.2566	.2093	.3596	.3545	.3647
DINA	GDINA	.2255	.2606	.1904	.3325	.3321	.3330
DINO	A-CDM	.2406	.2059	.2753	.3482	.3471	.3493
DINO	GDINA	.2369	.2087	.2651	.3312	.3311	.3313
A-CDM	GDINA	.2327	.2591	.2062	.3451	.3457	.3446

Table 4.

Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and $H = 2$ )

		$J = 30$ and $N = 3,000$			$J = 15$ and $N = 3,000$
Model 1	Model 2	Overall	Comp1	Comp2	Overall	Comp1	Comp2
Item Quality 1—Good; Item Quality 2—Good
DINA	DINO	.0167	.0166	.0168	.1091	.0396	.1785
DINA	A-CDM	.0212	.0180	.0247	.0925	.0439	.1411
DINA	GDINA	.0192	.0171	.0213	.0556	.0386	.0725
DINO	A-CDM	.0212	.0178	.0246	.1100	.0479	.1721
DINO	GDINA	.0202	.0182	.0222	.0830	.0462	.1198
A-CDM	GDINA	.0298	.0312	.0283	.1983	.2859	.1106
Item Quality 1—Moderate; Item Quality 2—Moderate
DINA	DINO	.0322	.0322	.0322	.2562	.2364	.2759
DINA	A-CDM	.0474	.0378	.0570	.2380	.1856	.2904
DINA	GDINA	.0387	.0354	.0421	.2199	.2390	.2007
DINO	A-CDM	.0817	.0436	.1197	.1985	.0687	.3284
DINO	GDINA	.0649	.0419	.0879	.2296	.1499	.3093
A-CDM	GDINA	.1887	.3255	.0519	.2144	.2940	.1348
Item Quality 1—Bad; Item Quality 2—Bad
DINA	DINO	.2003	.2404	.1601	.3270	.3108	.3433
DINA	A-CDM	.2158	.2393	.1922	.3341	.3223	.3460
DINA	GDINA	.2156	.2735	.1576	.3349	.3221	.3477
DINO	A-CDM	.2216	.1408	.3023	.3729	.3544	.3914
DINO	GDINA	.2189	.1335	.3044	.3623	.3413	.3832
A-CDM	GDINA	.2276	.2979	.1573	.3684	.3730	.3639

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Overall is the overall absolute bias for both components, whereas Comp1 and Comp2 refer to the component-wise absolute bias for Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Based on Tables 2 through 4, if the two latent groups in the population differ only in item quality, better item quality for both latent groups implies lower overall absolute bias. This pattern is seen across all the treatment combinations in the study. Additionally, if the two latent groups differ only in the generating model, the absolute bias is generally lower for latent groups with higher item quality. Note that the average absolute bias is unacceptably high when the item quality for both components is bad, so the proposed methodology is not advisable in that case.

The results also showed that, when the latent groups come from different generating models, the number of examinees has less impact on the reduction of absolute bias than the number of items. In terms of test length, longer tests yield lower overall and componentwise absolute bias. Generally, for latent groups that differ in the generating model, the overall absolute bias is lower for equally weighted populations than for unequally weighted ones.

The overall absolute bias is generally higher when one of the components comes from the A-CDM. The absolute bias is often highest when both the A-CDM and the G-DINA model are involved in the population, for both $H = 2$ and $H = 3$ . This is probably because the A-CDM and the G-DINA model with the same item quality have the closest item parameterizations. Another consideration is that these two generating models are the most complex models included in this study; the proposed estimation procedure may have difficulty estimating that many parameters.

One can see from the componentwise absolute bias that components with higher item quality have lower absolute bias than components with lower item quality. Lastly, the absolute bias of components generated from the DINA and DINO models is lower than that of components generated from the G-DINA model and the A-CDM. The componentwise absolute bias of A-CDM components is also consistently the highest among the four generating models.

6.1.2.2. Correct latent group classification rates

Given in Tables 5 through 7 are the mean correct latent group classification rates across different treatment combinations of mixtures of different generating models, mixtures of different item qualities, number of items, and number of examinees across replicates.

Table 5.

Average Correct Latent Group Classification Rate Across Examinees (Equal Weights and $H = 2$ )

Item Quality 1		Good	Good	Moderate	Good	Good	Moderate
Item Quality 2		Moderate	Bad	Bad	Moderate	Bad	Bad
Model 1	Model 2	$J = 30$ and $N = 1,500$			$J = 15$ and $N = 1,500$
DINA	DINA	.6866	.8795	.5837	.5808	.7491	.5356
DINO	DINO	.5999	.8775	.5953	.5643	.7427	.5284
A-CDM	A-CDM	.5252	.7573	.5268	.5285	.5583	.5126
GDINA	GDINA	.9315	.8541	.5743	.5734	.6430	.5275
Model 1	Model 2	$J = 30$ and $N = 3,000$			$J = 15$ and $N = 3,000$
DINA	DINA	.7363	.8955	.5664	.5500	.7580	.5344
DINO	DINO	.6735	.8933	.5913	.5460	.7553	.5422
A-CDM	A-CDM	.5278	.7618	.5184	.5189	.5718	.5126
GDINA	GDINA	.8342	.8580	.5610	.5637	.6595	.5243

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 6.

Average Correct Latent Group Classification Rate Across Examinees (Equal Weights and $H = 2$ )

Item Quality 1		Good	Moderate	Bad	Good	Moderate	Bad
Item Quality 2		Good	Moderate	Bad	Good	Moderate	Bad
Model 1	Model 2	$J = 30$ and $N = 1,500$			$J = 15$ and $N = 1,500$
DINA	DINO	.9286	.8253	.5629	.7413	.5469	.6295
DINA	A-CDM	.8930	.7050	.5484	.6895	.5492	.6068
DINA	GDINA	.9262	.7950	.5637	.7820	.5708	.6014
DINO	A-CDM	.8940	.6361	.5360	.6868	.5470	.5798
DINO	GDINA	.8857	.6167	.5276	.6755	.5397	.5527
A-CDM	GDINA	.7363	.5201	.5128	.5373	.5211	.5213
Model 1	Model 2	$J = 30$ and $N = 3,000$			$J = 15$ and $N = 3,000$
DINA	DINO	.9307	.8323	.5667	.7533	.5466	.7149
DINA	A-CDM	.8967	.7562	.5269	.7205	.5387	.6714
DINA	GDINA	.9275	.8109	.5457	.7885	.5578	.6840
DINO	A-CDM	.8985	.7117	.5295	.7016	.7544	.6120
DINO	GDINA	.8910	.7134	.5181	.6936	.5238	.5697
A-CDM	GDINA	.7690	.5117	.5076	.5247	.5126	.5213

Table 7.

Average Correct Latent Group Classification Rate Across Examinees ( $H = 2$ and $N = 1,500$ )

	Item Quality 1	Good	Moderate	Bad	Good	Moderate	Bad
	Item Quality 2	Good	Moderate	Bad	Good	Moderate	Bad
Model 1	Model 2	$J = 30$ and 80%–20% Mixture			$J = 15$ and 80%–20% Mixture
DINA	DINO	.9333	.8595	.6925	.8408	.6835	.7097
DINA	A-CDM	.9264	.7411	.6905	.8312	.6705	.6204
DINA	GDINA	.9429	.8428	.6785	.8551	.6777	.6367
DINO	DINA	.9321	.8653	.6958	.8356	.7210	.6758
DINO	A-CDM	.9280	.8192	.7073	.8336	.7335	.6058
DINO	GDINA	.9193	.8099	.6900	.8222	.7299	.6000
A-CDM	DINA	.9105	.7912	.6876	.7883	.6594	.5872
A-CDM	DINO	.9096	.7476	.7015	.7279	.6615	.5940
A-CDM	GDINA	.7163	.7574	.6898	.7465	.6480	.5704
GDINA	DINA	.9414	.8408	.7187	.8375	.7133	.6269
GDINA	DINO	.9106	.7554	.7105	.7346	.6885	.5905
GDINA	A-CDM	.8194	.7800	.7142	.7444	.6895	.5716
Model 1	Model 2	$J = 30$ and 95%–5% Mixture			$J = 15$ and 95%–5% Mixture
DINA	DINO	.9650	.7152	.8010	.8374	.6368	.6169
DINA	A-CDM	.9486	.7491	.7929	.7746	.6212	.6274
DINA	GDINA	.9702	.7203	.7781	.8458	.6284	.6311
DINO	DINA	.9622	.9349	.8494	.9445	.8461	.6342
DINO	A-CDM	.9593	.9118	.8239	.9272	.8265	.6354
DINO	GDINA	.9540	.9076	.8332	.9241	.8259	.6520
A-CDM	DINA	.9191	.8892	.8165	.8084	.7156	.6384
A-CDM	DINO	.9058	.8843	.8232	.7616	.7010	.6385
A-CDM	GDINA	.8024	.8843	.8111	.7626	.7121	.6341
GDINA	DINA	.9627	.9096	.8402	.8839	.7728	.6579
GDINA	DINO	.9158	.8845	.8410	.7852	.7706	.6503
GDINA	A-CDM	.8240	.8903	.8408	.7989	.7733	.6596

Tables 5 through 7 reveal that the correct latent group classification rate is generally higher for latent groups with unequal weights than for those with equal weights. It is important to note, however, that with unequal weights most of the correctly classified examinees come from the majority group.
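
The caveat about unequal weights can be illustrated with a small Python sketch that separates the overall rate from the rate within each true latent group. The function name and the data are hypothetical; the example is a deliberately extreme case in which every examinee is assigned to the majority group.

```python
def per_group_rates(true_groups, est_groups):
    """Return the overall CCG and the correct classification rate per true group."""
    overall = sum(t == e for t, e in zip(true_groups, est_groups)) / len(true_groups)
    within = {}
    for h in sorted(set(true_groups)):
        members = [e for t, e in zip(true_groups, est_groups) if t == h]
        within[h] = sum(e == h for e in members) / len(members)
    return overall, within


# Hypothetical 95%-5% population in which everyone is assigned to Group 1.
true_groups = [1] * 95 + [2] * 5
est_groups = [1] * 100

overall, within = per_group_rates(true_groups, est_groups)
print(overall, within)  # 0.95 {1: 1.0, 2: 0.0}
```

An overall rate of .95 is therefore attainable under the 95%–5% condition without identifying a single member of the minority group, so overall rates under unequal weights should be read alongside group-specific rates.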

The results in Tables 5 through 7 also indicate that a larger difference in item quality between latent groups yields a higher latent group classification rate across the population. The latent group classification rate is also higher when the latent groups differ in the generating model than when they differ only in item quality. Both results suggest that the more pronounced the differences between the mixture components, the better the method classifies the examinees.

Item quality, test length, and the number of latent groups are the main factors that affect the correct latent group classification rate. For latent groups that differ only in the generating model, higher item quality yields a higher latent group classification rate. Longer tests also result in higher latent group classification rates. Additionally, the correct latent group classification rate is generally higher for two latent groups than for three. In contrast, increasing the number of examinees improves the latent group classification rate only marginally.

For $H = 3$ , including both the A-CDM and the G-DINA model in the same mixture results in a lower correct latent group classification rate; that is, the mixture G-DINA model has more difficulty classifying examinees into the A-CDM or G-DINA components. This may be because, among the four generating models involved, the A-CDM and the G-DINA model have the closest item parameterizations. Another consideration is that these two models, when combined, form the most complex of the mixtures.

For equal weights with the same generating model but different item qualities, the majority of the examinees correctly assigned to latent groups are from components with better item quality. For equal weights with the same item quality but different generating models, the majority of the correctly assigned examinees are from components with DINA or DINO generating models. For unequal weights with the same generating model but different item quality across components, most of the examinees correctly assigned to their true latent groups are from the majority latent group with better item quality. For unequal weights with the same item quality but different generating models, most of the correctly assigned examinees are still from the largest component. However, the generating model clearly affects the percentage of examinees that are correctly assigned to the component: mixtures in which the majority component is DINA or DINO have higher correct classification rates than mixtures in which the A-CDM or the G-DINA model is the majority.

Lastly, for the 95%–5% mixture condition in Table 7, the ability of the proposed method to correctly identify the cheating group depends largely on the same factors discussed earlier. Performance is higher when the test is longer, when both components have good item quality, and when the mixture does not involve both G-DINA and A-CDM components. Because most of the correctly classified examinees belong to the majority class (the 95% class), the proposed method classifies members of the noncheating latent group more accurately than members of the cheating latent group.

6.1.2.3. Correct latent class classification rates

Tables 8 through 11 show the mean vectorwise correct latent class classification rates across replicates, with attribute profiles estimated by maximum likelihood estimation. Only the MLE results are presented because, among the three methods of estimating the attribute profiles of examinees, MLE consistently had the highest performance in mixture G-DINA modeling. Because the vectorwise and elementwise correct attribute classification rates follow the same pattern, it suffices to present only the vectorwise rates.

Table 8.

Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights, $H = 2$ )

Item Quality 1		Good		Good		Moderate
Item Quality 2		Moderate		Bad		Bad
Model 1	Model 2	$H = 1$	$H = 2$	$H = 1$	$H = 2$	$H = 1$	$H = 2$
$J = 30$ and $N = 1,500$
DINA	DINA	.8019	.7840	.6480	.5544	.5221	.4667
DINO	DINO	.8029	.7646	.6471	.5495	.5210	.4617
A-CDM	A-CDM	.7112	.6968	.5645	.5017	.4009	.3497
GDINA	GDINA	.7341	.7234	.5790	.5113	.4387	.3922
$J = 15$ and $N = 1,500$
DINA	DINA	.5926	.5416	.4608	.3808	.2861	.1247
DINO	DINO	.5894	.5418	.4573	.3701	.2773	.1173
A-CDM	A-CDM	.4779	.3924	.3668	.2251	.1948	.0536
GDINA	GDINA	.4974	.4062	.3880	.2612	.2133	.0935

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. $H = 1$ and $H = 2$ pertain to the average correct latent class classification rate using the usual G-DINA model and the mixture G-DINA model, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 9.

Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights, $H = 2$ )

Item Quality 1		Good		Moderate		Bad
Item Quality 2		Good		Moderate		Bad
Model 1	Model 2	$H = 1$	$H = 2$	$H = 1$	$H = 2$	$H = 1$	$H = 2$
$J = 30$ and $N = 1,500$
DINA	DINO	.4538	.8998	.3169	.5883	.1302	.1018
DINA	A-CDM	.7742	.8595	.5160	.5088	.2269	.1073
DINA	GDINA	.7331	.8868	.4810	.5533	.2105	.1195
DINO	A-CDM	.7736	.8603	.5155	.4815	.2190	.1161
DINO	GDINA	.8297	.8833	.5643	.5375	.2458	.1313
A-CDM	GDINA	.8630	.8608	.5745	.5519	.2289	.1228
$J = 15$ and $N = 1,500$
DINA	DINO	.3121	.4164	.1717	.1071	.0942	.0413
DINA	A-CDM	.5331	.5079	.2905	.1468	.1344	.0252
DINA	GDINA	.4479	.5392	.2657	.1478	.1299	.0325
DINO	A-CDM	.5314	.5068	.2638	.1523	.1014	.0313
DINO	GDINA	.6023	.5720	.3102	.1900	.1093	.0376
A-CDM	GDINA	.6338	.5910	.3209	.1526	.1285	.0309

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. $H = 1$ and $H = 2$ pertain to the average correct latent class classification rate using the usual G-DINA model and the mixture G-DINA model, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 10.

Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights, $H = 2$ )

Item Quality 1		Good		Moderate		Bad
Item Quality 2		Good		Moderate		Bad
Model 1	Model 2	$H = 1$	$H = 2$	$H = 1$	$H = 2$	$H = 1$	$H = 2$
$J = 30$ and $N = 3,000$
DINA	DINO	.4564	.9042	.3326	.6014	.1395	.1229
DINA	A-CDM	.7732	.8650	.5216	.5438	.2554	.1586
DINA	GDINA	.7378	.8904	.4935	.5816	.2351	.1310
DINO	A-CDM	.7752	.8655	.5244	.5211	.2560	.1672
DINO	GDINA	.8319	.8864	.5731	.5807	.2818	.1978
A-CDM	GDINA	.8663	.8687	.5856	.5733	.2645	.1553
$J = 15$ and $N = 3,000$
DINA	DINO	.3131	.4019	.1765	.1110	.0964	.0419
DINA	A-CDM	.5438	.5339	.3136	.2007	.1504	.0299
DINA	GDINA	.4594	.5532	.2720	.1711	.1380	.0326
DINO	A-CDM	.5418	.5248	.3001	.3235	.1139	.0207
DINO	GDINA	.6076	.5881	.3394	.2527	.1230	.0235
A-CDM	GDINA	.6434	.6222	.3485	.2374	.1424	.0224

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. $H = 1$ and $H = 2$ pertain to the average correct latent class classification rate using the usual G-DINA model and the mixture G-DINA model, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 11.

Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Unequal Weights, $H = 2$ )

	Item Quality 1	Good		Moderate		Bad
	Item Quality 2	Good		Moderate		Bad
Model 1	Model 2	$H = 1$	$H = 2$	$H = 1$	$H = 2$	$H = 1$	$H = 2$
$J = 30$ and $N = 1,500$ , 80% and 20% mixture
DINA	DINO	.7308	.8988	.5360	.5757	.2482	.1435
DINA	A-CDM	.8337	.8792	.6080	.5825	.2841	.1599
DINA	GDINA	.8049	.8932	.5908	.5780	.2756	.1384
DINO	DINA	.7288	.8956	.5351	.6093	.2410	.1303
DINO	A-CDM	.8339	.8793	.6075	.5858	.2818	.1698
DINO	GDINA	.8606	.8913	.6246	.5975	.2877	.1613
A-CDM	DINA	.8130	.8454	.5231	.5068	.2083	.1034
A-CDM	DINO	.8160	.8450	.5259	.4944	.2093	.1062
A-CDM	GDINA	.8553	.8468	.5605	.5385	.2153	.1088
GDINA	DINA	.7826	.8829	.5462	.5597	.2295	.1349
GDINA	DINO	.8510	.8815	.5867	.5657	.2515	.1342
GDINA	A-CDM	.8794	.8719	.6028	.5817	.2452	.1476
$J = 30$ and $N = 1,500$ , 95% and 5% mixture
DINA	DINO	.8742	.8682	.6502	.6234	.3166	.2136
DINA	A-CDM	.8919	.8906	.6658	.6391	.3241	.2063
DINA	GDINA	.8886	.8943	.6650	.6383	.3222	.1982
DINO	DINA	.8727	.9036	.6550	.6496	.3179	.2376
DINO	A-CDM	.8939	.8946	.6684	.6433	.3198	.2260
DINO	GDINA	.8996	.8986	.6710	.6448	.3215	.2236
A-CDM	DINA	.8454	.8402	.5521	.5288	.2213	.1338
A-CDM	DINO	.8440	.8346	.5475	.5372	.2227	.1383
A-CDM	GDINA	.8570	.8481	.5599	.5372	.2228	.1347
GDINA	DINA	.8650	.8817	.6119	.5948	.2672	.1777
GDINA	DINO	.8831	.8766	.6172	.5959	.2679	.1770
GDINA	A-CDM	.8914	.8838	.6218	.6008	.2702	.1771

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. $H = 1$ and $H = 2$ pertain to the average correct latent class classification rate using the usual G-DINA model and the mixture G-DINA model, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

The results presented in Tables 8 through 11 suggest that the mixture G-DINA model yields only a slight improvement in the latent class classification rates (vectorwise or elementwise), and mainly in cases where the latent groups differ in the generating model, the test is longer, and the item quality is good or moderate. This suggests that the usual G-DINA model is robust to heterogeneity caused by item quality when estimating the attribute profiles of examinees. It is therefore more advisable to use the G-DINA model than the mixture G-DINA model when the latent groups differ only in item quality and not in the generating model.

As with the absolute bias and the correct latent group classification rates, item quality and test length have a substantial effect on the correct latent class classification rate. Higher item quality across the test yields higher latent class classification rates when using the mixture G-DINA model. Longer tests also yield higher correct latent class classification rates, vectorwise or elementwise. However, the number of examinees has little impact on the latent class classification rates.

It is still important to note that for the case of equal weights with the same generating model but different item qualities, most of the correctly assigned examinees to attribute vectors are from components with better item quality. On the other hand, for the case of equal weights with the same item quality but different generating models, the majority of the examinees with correctly assigned attribute vectors are from components with DINA or DINO generating models.

For unequal weights with the same generating model but different item qualities across components, most of the examinees correctly assigned to their true attribute profiles are from the majority component with better item quality, which further supports the finding that higher correct latent class classification rates are expected when the items have good quality. Note that there is still a considerable improvement in ability parameter estimation for the 80%–20% mixing proportions, especially when the item qualities are good for both components. For the 95%–5% mixing proportions, however, there is no longer any improvement in the estimation of ability parameters. This result indicates that the mixture G-DINA model might not be applicable when the difference between the weights of the two latent groups is large.

Lastly, for the case of unequal weights, the same item quality, but different generating models, most of the correctly assigned examinees’ attribute vectors are still from the largest component. Additionally, mixtures that involve majority components that are DINA or DINO have higher correct latent class classification rates than mixtures wherein A-CDM and G-DINA are the majority.

6.2. Simulation Study 2: Estimation of the Number of Components

6.2.1. Design

The factor levels in Simulation Study 2 were the same as those in Simulation Study 1 for the test length, the number of examinees, the mixing proportions, the item quality, and the generating model. For the number of generated latent groups, $H = 1$ (a population that is not heterogeneous), $H = 2$ , and $H = 3$ were considered.

To investigate how well the different model fit criteria estimate the number of components in the population, the generated responses were fitted assuming that the number of latent groups was $H = 1$ , $H = 2$ , $H = 3$ , and $H = 4$ , and the correct selection rates of the criteria were measured. The model fit criteria are $\mathrm{AIC}$ , $\mathrm{AIC}_{c}$ , $\mathrm{BIC}$ , $\mathrm{KIC}$ , $\mathrm{KIC}_{c}$ , $\mathrm{AKIC}_{c}$ , $\mathrm{ICL}\text{-}\mathrm{BIC}$ , $\mathrm{CLC}$ , and $\mathrm{AWE}$ . The correct selection rate ( $\mathrm{CSR}$ ) is given by the following:

\mathrm{CSR} = \frac{1}{M} \sum_{m = 1}^{M} I_{m [H = \hat{H}]},

where $I_{m [H = \hat{H}]} = 1$ if the estimated number of components for replicate m, $\hat{H}$ , is equal to the generating number of components H, and 0 otherwise, and M is the number of data replicates.
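
The selection procedure can be sketched in Python for two of the criteria, using the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p ln N. The log-likelihoods, parameter counts, and function names below are made up for illustration and do not come from the study; the remaining criteria are omitted.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_examinees):
    """Bayesian information criterion."""
    return -2.0 * loglik + n_params * math.log(n_examinees)


def select_h(fits, criterion):
    """fits maps each candidate H to (loglik, n_params); return the minimizing H."""
    return min(fits, key=lambda h: criterion(*fits[h]))


def correct_selection_rate(true_h, selected):
    """Proportion of replicates in which the selected H equals the generating H."""
    return sum(h_hat == true_h for h_hat in selected) / len(selected)


# Hypothetical fits for M = 2 replicates generated with H = 2 and N = 1,500.
replicates = [
    {1: (-5200.0, 100), 2: (-5000.0, 201), 3: (-4990.0, 302), 4: (-4985.0, 403)},
    {1: (-5300.0, 100), 2: (-4900.0, 201), 3: (-4895.0, 302), 4: (-4893.0, 403)},
]

aic_choice = [select_h(fits, aic) for fits in replicates]
bic_choice = [select_h(fits, lambda ll, p: bic(ll, p, 1500)) for fits in replicates]

csr_aic = correct_selection_rate(2, aic_choice)
csr_bic = correct_selection_rate(2, bic_choice)
print(aic_choice, bic_choice, csr_aic, csr_bic)  # [2, 2] [1, 2] 1.0 0.5
```

In this made-up example the heavier BIC penalty leads to underselection in the first replicate, the type of behavior that the correct selection rate is designed to quantify.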

As in Simulation Study 1, the number of attributes was fixed at $K = 5$ , and the number of replicates was set to $M = 100$ . The same Q-matrices in Simulation Study 1 for $J = 15$ and $J = 30$ were employed in Simulation Study 2.

6.2.2. Results

Similar to Simulation Study 1, only the results for selected treatment combinations are presented in this subsection.

6.2.2.1. Correct selection rates

Tables 12 through 16 give the correct selection rates from different treatment combinations of mixtures of generating model, mixtures of item quality, number of items, and number of examinees, across replicates.

Table 12.

Average Correct Selection Rate ( $H = 1$ , $J = 30$ , and $N = 1,500$ , Equal Weights)

Model	IQ	AIC	AICC	BIC	KIC	KICC	AKICC	ICLBIC	CLC	AWE
DINA	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
DINO	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.96	1.00
A-CDM	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.84	1.00
GDINA	G	1.00	0.98	1.00	1.00	0.89	0.88	1.00	0.79	1.00
DINA	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.96	1.00
DINO	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.83	1.00
A-CDM	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.76	1.00
GDINA	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.73	1.00
DINA	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.99	1.00
DINO	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.99	1.00
A-CDM	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.98	1.00
GDINA	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.99	1.00

Note. Model pertains to the generating model. IQ refers to the item quality for latent groups: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 13.

Average Correct Selection Rate ( $H = 1$ , $J = 15$ , and $N = 1,500$ , Equal Weights)

Model	IQ	AIC	AICC	BIC	KIC	KICC	AKICC	ICLBIC	CLC	AWE
DINA	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
DINO	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
A-CDM	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
GDINA	G	0.98	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
DINA	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.84	1.00
DINO	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
A-CDM	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
GDINA	M	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
DINA	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.84	1.00
DINO	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
A-CDM	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00
GDINA	B	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00	1.00

Table 14.

Average Correct Selection Rate ( $H = 2$ , $J = 30$ , and $N = 1,500$ , Equal Weights)

Model 1	Model 2	IQ 1	IQ 2	AIC	AICC	BIC	KIC	KICC	AKICC	ICLBIC	CLC	AWE
DINA	DINO	G	G	0.99	1.00	1.00	1.00	1.00	1.00	1.00	0.65	1.00
DINA	A-CDM	G	G	1.00	1.00	1.00	1.00	1.00	1.00	0.55	0.35	0.54
DINA	G-DINA	G	G	1.00	1.00	1.00	1.00	1.00	1.00	1.00	0.64	1.00
DINO	A-CDM	G	G	1.00	1.00	1.00	1.00	1.00	1.00	0.62	0.48	0.64
DINO	G-DINA	G	G	1.00	1.00	0.93	1.00	1.00	1.00	0.27	0.62	0.27
A-CDM	G-DINA	G	G	0.96	0.24	0.00	0.15	0.00	0.00	0.00	0.00	0.00
DINA	DINO	M	M	1.00	1.00	1.00	1.00	1.00	1.00	0.11	0.68	0.10
DINA	A-CDM	M	M	0.88	0.06	0.00	0.03	0.00	0.00	0.00	0.00	0.00
DINA	G-DINA	M	M	1.00	1.00	0.00	1.00	0.88	0.94	0.00	0.15	0.00
DINO	A-CDM	M	M	0.43	0.01	0.00	0.01	0.00	0.00	0.00	0.01	0.00
DINO	G-DINA	M	M	0.34	0.08	0.00	0.08	0.01	0.02	0.00	0.02	0.00
A-CDM	G-DINA	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.26	0.00
DINA	DINO	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINA	A-CDM	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.01	0.00
DINA	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINO	A-CDM	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.01	0.00
DINO	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.02	0.00
A-CDM	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.01	0.00

Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. IQ 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.

Table 15.

Average Correct Selection Rate ( $H = 2$ , $J = 15$ , and $N = 1,500$ , Equal Weights)

Model 1	Model 2	IQ 1	IQ 2	AIC	AICC	BIC	KIC	KICC	AKICC	CLC
DINA	DINO	G	G	0.49	0.56	0.59	0.57	0.70	0.67	0.38
DINA	A-CDM	G	G	0.42	0.18	0.00	0.03	0.01	0.01	0.00
DINA	G-DINA	G	G	0.98	0.98	0.02	0.93	0.84	0.87	0.00
DINO	A-CDM	G	G	0.54	0.21	0.00	0.06	0.00	0.00	0.00
DINO	G-DINA	G	G	0.58	0.28	0.00	0.11	0.02	0.03	0.00
A-CDM	G-DINA	G	G	0.16	0.08	0.00	0.06	0.02	0.03	0.03
DINA	DINO	M	M	0.01	0.00	0.00	0.00	0.00	0.00	0.00
DINA	A-CDM	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINA	G-DINA	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINO	A-CDM	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINO	G-DINA	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00
A-CDM	G-DINA	M	M	0.00	0.00	0.00	0.00	0.00	0.00	0.01
DINA	DINO	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINA	A-CDM	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINA	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINO	A-CDM	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
DINO	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
A-CDM	G-DINA	B	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00

Table 16.

Average Correct Selection Rate ( $H = 2$ , $J = 30$ , and $N = 1,500$ , Equal Weights)

Model 1	Model 2	IQ 1	IQ 2	AIC	AICC	BIC	KIC	KICC	AKICC	CLC
DINA	DINA	G	M	0.59	0.00	0.00	0.00	0.00	0.00	0.00
DINO	DINO	G	M	0.15	0.00	0.00	0.00	0.00	0.00	0.01
A-CDM	A-CDM	G	M	0.00	0.00	0.00	0.00	0.00	0.00	0.00
GDINA	GDINA	G	M	0.98	0.92	0.00	0.89	0.29	0.39	0.07
DINA	DINA	G	B	1.00	1.00	0.16	1.00	1.00	1.00	0.99
DINO	DINO	G	B	0.99	1.00	0.23	1.00	1.00	1.00	0.97
A-CDM	A-CDM	G	B	1.00	0.34	0.00	0.00	0.00	0.00	0.00
GDINA	GDINA	G	B	1.00	1.00	0.00	1.00	0.96	0.99	0.97
DINA	DINA	M	B	0.00	0.00	0.00	0.00	0.00	0.00	0.01
DINO	DINO	M	B	0.00	0.00	0.00	0.00	0.00	0.00	0.01
A-CDM	A-CDM	M	B	0.00	0.00	0.00	0.00	0.00	0.00	0.00
GDINA	GDINA	M	B	0.00	0.00	0.00	0.00	0.00	0.00	0.06

The correct selection rates for $H = 1$ are acceptable for all of the treatment combinations, implying that the existing relative model fit criteria can correctly identify when a data set comes from a homogeneous population.

For $H = 2$, Tables 14 through 16 show that test length substantially affects how well the relative model fit criteria estimate the number of components: the longer the test, the better most of the criteria perform. Better item quality across the entire exam likewise improves their performance. Among the generating models, the involvement of the A-CDM in mixture G-DINA modeling decreases the performance of most of the criteria. When the latent groups share the same generating model but differ in item quality, the larger the difference in item quality between the latent groups, the better most of the criteria perform in estimating the number of latent groups.

Among the relative model fit criteria examined, AIC is the most viable for estimating the number of components, as it has the highest correct selection rate in most treatment combinations. Note that, in the conditions where the criteria cannot reliably identify $H = 2$, fitting the mixture model also does not improve the estimation of the attribute vectors. In those same conditions, it is more advisable to employ the G-DINA model because it is more robust and performs better in attribute estimation than the mixture G-DINA model.

It is important to note that the difference between a criterion's value when fitting the correct number of components and its value when fitting an incorrect number mostly does not exceed 1% of the value obtained with the correct number of components.

7. Real Data Application

To demonstrate the applicability of the mixture G-DINA model, the model was fitted to the responses of 1,095 German students to 26 items from the 2000 Program for International Student Assessment, as analyzed by Chen and de la Torre (2014). The five attributes included in this real data application are (1) locating information, (2) forming broad general understanding, (3) developing a logical interpretation, (4) evaluating a number-rich text with number sense, and (5) evaluating the quality or appropriateness of a text (Chen & de la Torre, 2014). Both the traditional and mixture G-DINA models were fitted to the data, and model fit indices were compared.

First, to determine whether latent groups possibly exist in the population, the mixture G-DINA model was fitted to the assessment data on the assumption that $H = 1$ and $H = 2$ . Table 17 shows the values of the different relative model fit criteria on the assumption that $H = 1$ (usual G-DINA) and $H = 2$ .

Most of the relative model fit criteria, namely AIC, KIC, $\mathrm{KIC}_{c}$ , $\mathrm{AKIC}_{c}$ , and CLC, favor $H = 2$ over $H = 1$ . Simulation Study 2 revealed that AIC is the best-performing relative model fit criterion for estimating the number of components in the population, with KIC, $\mathrm{KIC}_{c}$ , $\mathrm{AKIC}_{c}$ , and CLC close behind in terms of performance. Therefore, we can reasonably assume that the mixture G-DINA model with $H = 2$ is a better model than the usual G-DINA model. The next step is to compare the relative model fit criteria for $H = 2$ and $H = 3$ , which are shown in Table 18.

Comparing the relative model fit criteria, we can see that all of the criteria are higher for $H = 3$ than for $H = 2$ , indicating that a third component does not improve the fit. Based on Tables 17 and 18, it can be inferred that there are two homogeneous latent groups and, therefore, we used the mixture G-DINA model with $H = 2$ .

Table 17.

Relative Model Fit Criteria for $H = 1$ and $H = 2$

	AIC	AICC	BIC	KIC	KICC	AKICC	ICL-BIC	CLC	AWE
$H = 1$	14,301.4	14,337.31	14,956.21	14,432.4	14,480.56	14,472.04	14,956.21	14,039.4	16,266.01
$H = 2$	13,928.04	14,095.14	15,242.64	14,191.04	14,400.7	14,362.93	15,761.85	13,921.24	17,353.05

Note. This table shows the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE).
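The values in Table 17 can be cross-checked against the size of the fitted models. The sketch below assumes the standard definitions AIC $= -2 \ln L + 2p$, BIC $= -2 \ln L + p \ln N$, and KIC $= -2 \ln L + 3p$, so that the number of parameters $p$ can be backed out from differences between criteria. It also assumes a saturated G-DINA parameterization ($2^{K_j}$ success probabilities per item), a saturated attribute distribution within each component, and $H - 1$ free mixing proportions. The Q-vectors are those listed in Table 19, and the variable names are illustrative only.

```python
import math

# Q-vectors of the 26 items, copied from Table 19 (five attributes).
q_vectors = [
    "00110", "10010", "00011", "01010", "00110", "10000", "01001",
    "00101", "01001", "10100", "00110", "10010", "00110", "01010",
    "01011", "01001", "10100", "10100", "10100", "01000", "00101",
    "10100", "00100", "10100", "10100", "00100",
]
n_examinees = 1095
n_attributes = 5

# Assumed structure: 2**K_j success probabilities per item, plus
# 2**K - 1 free attribute-vector probabilities per component.
item_params = sum(2 ** q.count("1") for q in q_vectors)
per_component = item_params + 2 ** n_attributes - 1


def n_params(h):
    """Parameter count for h components with h - 1 free mixing proportions."""
    return h * per_component + (h - 1)


# Criterion values copied from Table 17.
table17 = {
    1: {"AIC": 14301.40, "BIC": 14956.21, "KIC": 14432.40},
    2: {"AIC": 13928.04, "BIC": 15242.64, "KIC": 14191.04},
}

recovered = {}
for h, crit in table17.items():
    p_from_kic = crit["KIC"] - crit["AIC"]  # 3p - 2p = p
    p_from_bic = (crit["BIC"] - crit["AIC"]) / (math.log(n_examinees) - 2)
    recovered[h] = (n_params(h), p_from_kic, p_from_bic)
    print(f"H={h}: counted {n_params(h)}, "
          f"KIC-AIC gives {p_from_kic:.2f}, BIC-AIC gives {p_from_bic:.2f}")
```

Under these assumptions, the counted and recovered parameter numbers should agree closely, which suggests that the $H = 2$ model roughly doubles the number of parameters relative to the usual G-DINA model.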

Table 18.

Relative Model Fit Criteria for $H = 1$ , $H = 2$ , and $H = 3$

	AIC	AICC	BIC	KIC	KICC	AKICC	ICL-BIC	CLC	AWE
H = 1	14,301.4	14,337.31	14,956.21	14,432.4	14,480.56	14,472.04	14,956.21	14,039.4	16,266.01
H = 2	13,928.04	14,095.14	15,242.64	14,191.04	14,400.7	14,362.93	15,761.85	13,921.24	17,353.05
H = 3	14,012.53	14,160.08	15,686.94	14,207.53	14,656.49	14,561.55	16,462.8	13,998.39	18,860.49
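The comparison in Table 18 can be tabulated mechanically. The short sketch below assumes that a smaller value of each criterion indicates the preferred model and reports, for every criterion, the number of components it would select. The dictionary layout and variable names are illustrative only.

```python
# Criterion values copied from Table 18, ordered as (H = 1, H = 2, H = 3).
table18 = {
    "AIC":     (14301.40, 13928.04, 14012.53),
    "AICC":    (14337.31, 14095.14, 14160.08),
    "BIC":     (14956.21, 15242.64, 15686.94),
    "KIC":     (14432.40, 14191.04, 14207.53),
    "KICC":    (14480.56, 14400.70, 14656.49),
    "AKICC":   (14472.04, 14362.93, 14561.55),
    "ICL-BIC": (14956.21, 15761.85, 16462.80),
    "CLC":     (14039.40, 13921.24, 13998.39),
    "AWE":     (16266.01, 17353.05, 18860.49),
}

# Assumption: smaller is better, so each criterion selects the H at its minimum.
selected = {
    name: values.index(min(values)) + 1 for name, values in table18.items()
}
votes_for_two = sorted(name for name, h in selected.items() if h == 2)

for name, h in selected.items():
    print(f"{name:8s} selects H = {h}")
print("Criteria selecting H = 2:", ", ".join(votes_for_two))
```

No criterion in this tabulation selects $H = 3$; the criteria that do not select $H = 2$ (BIC, ICL-BIC, and AWE) select $H = 1$ instead.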

Fitting the model where $H = 2$ , 44% of the examinees were classified to Latent Group 1 and 56% were classified to Latent Group 2. Tables 19 and 20 show the success probability item parameter estimates for $H = 2$ for the first and the second components of the population.

Table 19.

Item Parameters for the First Component

		P(0)	P(1)
		P(00)	P(10)	P(01)	P(11)
Q-vector	Item	P(000)	P(100)	P(010)	P(001)	P(110)	P(101)	P(011)	P(111)
00110	1	0.04	0.01	0.59	0.83
10010	2	0.01	0.04	0.40	0.85
00011	3	0.01	0.25	0.01	0.65
01010	4	0.01	0.01	0.70	0.99
00110	5	0.08	0.01	0.41	0.89
10000	6	0.35	0.98
01001	7	0.01	0.01	0.01	1.00
00101	8	0.26	0.69	0.60	0.70
01001	9	0.01	0.15	0.55	0.42
10100	10	0.14	0.26	0.65	0.70
00110	11	0.12	0.23	0.21	0.90
10010	12	0.08	0.14	0.23	1.00
00110	13	0.04	0.01	0.28	1.00
01010	14	0.01	0.06	0.43	0.87
01011	15	0.04	0.01	0.30	0.01	0.90	0.06	0.54	0.92
01001	16	0.35	0.97	0.77	0.95
10100	17	0.44	0.67	1.00	0.97
10100	18	0.26	0.76	0.76	0.90
10100	19	0.38	0.77	0.87	0.90
01000	20	0.24	0.96
00101	21	0.02	1.00	0.41	0.87
10100	22	0.01	0.16	0.92	0.86
00100	23	0.01	0.74
10100	24	0.15	0.48	0.83	0.94
10100	25	0.06	0.33	0.80	0.75
00100	26	0.04	0.48

Table 20.

Item Parameters for the Second Component

		P(0)	P(1)
		P(00)	P(10)	P(01)	P(11)
Q-vector	Item	P(000)	P(100)	P(010)	P(001)	P(110)	P(101)	P(011)	P(111)
00110	1	0.44	0.62	0.72	0.70
10010	2	0.19	0.60	0.59	0.59
00011	3	0.03	0.33	0.20	0.33
01010	4	0.62	0.74	0.50	0.82
00110	5	0.32	1.00	0.69	0.57
10000	6	0.54	0.86
01001	7	0.01	0.01	0.01	1.00
00101	8	0.46	0.49	0.53	0.63
01001	9	0.02	0.23	1.00	0.31
10100	10	0.30	0.54	0.58	0.38
01010	11	0.41	0.54	0.01	0.75
10010	12	0.37	0.01	0.01	1.00
00110	13	0.27	0.01	0.01	1.00
01010	14	0.39	0.55	0.55	0.77
01011	15	0.17	0.01	0.31	0.01	0.65	0.37	0.01	0.67
01001	16	0.78	0.85	0.40	0.83
10100	17	0.69	0.93	0.78	0.86
10100	18	0.47	0.73	0.62	0.76
10100	19	0.53	0.77	0.75	0.81
01000	20	0.58	0.74
00101	21	0.32	0.43	0.56	0.61
10100	22	0.06	0.55	0.33	0.34
00100	23	0.30	0.30
10100	24	0.55	0.80	0.73	0.67
10100	25	0.22	0.56	0.45	0.53
00100	26	0.12	0.15

Examination of the item parameter estimates for the two latent groups revealed that, for the first latent group, the items mostly have low guessing and slipping parameters, indicating that the items generally have good discriminating power. For the second latent group, on the other hand, the items generally have high guessing and slipping parameters, implying that the items are of lower quality for the second latent group than for the first latent group. Because the item quality is better for the first latent group, the attribute vectors estimated for the first latent group are expected to be more accurate than those estimated for the second latent group.

Table 21 shows the prevalence rates of the attributes under the mixture G-DINA model, overall and componentwise (using MLE). It can be seen that the latent group for which the items have better quality has a higher proportion of examinees with Attribute 3 than the second latent group. Meanwhile, the latent group that generally has lower item quality has higher proportions of examinees with Attributes 2, 4, and 5. Both latent groups have the same proportion of examinees with Attribute 1.
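The contrast between the two components can be summarized numerically from Tables 19 and 20. The sketch below is a rough summary, not the authors' procedure: it treats the success probability for examinees with none of the required attributes as a guessing parameter, one minus the success probability for examinees with all required attributes as a slipping parameter, and their gap as a simple discrimination index. These are the first and last entries of each row in the two tables.

```python
from statistics import mean

# (P(no required attribute), P(all required attributes)) for items 1-26,
# read from the first and last entries of each row in Table 19.
group1 = [
    (0.04, 0.83), (0.01, 0.85), (0.01, 0.65), (0.01, 0.99), (0.08, 0.89),
    (0.35, 0.98), (0.01, 1.00), (0.26, 0.70), (0.01, 0.42), (0.14, 0.70),
    (0.12, 0.90), (0.08, 1.00), (0.04, 1.00), (0.01, 0.87), (0.04, 0.92),
    (0.35, 0.95), (0.44, 0.97), (0.26, 0.90), (0.38, 0.90), (0.24, 0.96),
    (0.02, 0.87), (0.01, 0.86), (0.01, 0.74), (0.15, 0.94), (0.06, 0.75),
    (0.04, 0.48),
]
# Same quantities for the second component, read from Table 20.
group2 = [
    (0.44, 0.70), (0.19, 0.59), (0.03, 0.33), (0.62, 0.82), (0.32, 0.57),
    (0.54, 0.86), (0.01, 1.00), (0.46, 0.63), (0.02, 0.31), (0.30, 0.38),
    (0.41, 0.75), (0.37, 1.00), (0.27, 1.00), (0.39, 0.77), (0.17, 0.67),
    (0.78, 0.85), (0.69, 0.86), (0.47, 0.76), (0.53, 0.81), (0.58, 0.74),
    (0.32, 0.61), (0.06, 0.34), (0.30, 0.30), (0.55, 0.67), (0.22, 0.53),
    (0.12, 0.15),
]


def summarize(items):
    """Average guessing, slipping, and discrimination across items."""
    return {
        "guess": mean(p0 for p0, _ in items),
        "slip": mean(1 - p1 for _, p1 in items),
        "discrimination": mean(p1 - p0 for p0, p1 in items),
    }


summary1 = summarize(group1)
summary2 = summarize(group2)
for label, s in (("Group 1", summary1), ("Group 2", summary2)):
    print(label, {k: round(v, 2) for k, v in s.items()})
```

On these averages, the second component shows higher guessing and slipping and a smaller gap between masters and nonmasters, in line with the description above.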

Table 21.

Percentage Distribution of Attributes: Overall and Componentwise Assuming $H = 2$

Attributes	Overall	Group 1	Group 2
Attribute 1	.59	.59	.59
Attribute 2	.73	.68	.76
Attribute 3	.53	.57	.49
Attribute 4	.70	.67	.72
Attribute 5	.59	.58	.60

Note. Overall is the overall percentage of examinees that possesses the attribute, whereas Group 1 and Group 2 refer to the componentwise percentage of examinees that possesses the attribute for Latent Groups 1 and 2, respectively.

Table 22 lists the percentage distribution of the attribute vectors, overall and componentwise. The attribute vector with the highest prevalence rate is (1,1,1,1,1); that is, 33% of the examinees have all of the attributes studied. Next to it, 20% of the examinees have none of the attributes being studied. Comparing the two latent groups suggests that the latent group with higher item quality has more students with attribute vectors (0,0,0,0,0) and (1,1,1,1,1) than the latent group with lower item quality.

Table 22.

Percentage Distribution of Attribute Vectors: Overall and Componentwise Assuming $H = 2$

Attribute Vector	Overall	Group 1	Group 2	Attribute Vector	Overall	Group 1	Group 2
00000	0.20	0.25	0.16	00111	0.00	0.00	0.00
00001	0.01	0.01	0.01	01011	0.01	0.00	0.02
00010	0.05	0.05	0.04	01101	0.00	0.00	0.00
00100	0.00	0.00	0.00	01110	0.01	0.01	0.02
01000	0.00	0.00	0.00	10011	0.00	0.00	0.00
10000	0.00	0.00	0.00	10101	0.00	0.00	0.00
00011	0.00	0.00	0.01	10110	0.00	0.00	0.00
00101	0.00	0.00	0.00	11001	0.03	0.02	0.03
00110	0.00	0.00	0.01	11010	0.03	0.02	0.04
01001	0.05	0.03	0.07	11100	0.00	0.00	0.00
01010	0.00	0.00	0.00	01111	0.06	0.05	0.07
01100	0.00	0.00	0.00	10111	0.00	0.00	0.00
10001	0.00	0.00	0.00	11011	0.08	0.02	0.12
10010	0.00	0.00	0.00	11101	0.01	0.00	0.02
10100	0.00	0.00	0.00	11110	0.11	0.08	0.12
11000	0.00	0.00	0.00	11111	0.33	0.42	0.26
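Tables 21 and 22 are linked: the prevalence of an attribute is the sum of the proportions of all attribute vectors in which that attribute equals 1. The sketch below illustrates this with the overall column, using only the nonzero entries of Table 22. Because the tabled proportions are rounded to two decimals, the sums are expected to match Table 21 only approximately. Variable names are illustrative.

```python
# Nonzero overall proportions from Table 22 (attribute vector -> proportion).
overall = {
    "00000": 0.20, "00001": 0.01, "00010": 0.05, "01001": 0.05,
    "01011": 0.01, "01110": 0.01, "11001": 0.03, "11010": 0.03,
    "01111": 0.06, "11011": 0.08, "11101": 0.01, "11110": 0.11,
    "11111": 0.33,
}
# Overall attribute prevalence as reported in Table 21.
table21_overall = [0.59, 0.73, 0.53, 0.70, 0.59]

# Marginal prevalence of attribute k: sum over vectors with a 1 in position k.
prevalence = [
    sum(p for vec, p in overall.items() if vec[k] == "1")
    for k in range(5)
]

for k, (computed, reported) in enumerate(zip(prevalence, table21_overall), 1):
    print(f"Attribute {k}: computed {computed:.2f}, reported {reported:.2f}")
```

Small discrepancies between the computed and reported values are consistent with rounding in Table 22.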

This analysis demonstrated the viability of the proposed model when fitted to actual data. Moreover, when the data are heterogeneous, more insights on the examinee profiles can be generated because the behavior of the different latent groups in the population can be seen.

8. Conclusions and Recommendations

Inherent heterogeneity can sometimes be present in assessment data. It could be due to the existence of DIF, the presence of different strategies in answering problem-solving questions, or differences in the distributions of the item parameters and prior probabilities. In analyzing formative assessment data, one of the most commonly used CDMs is the G-DINA model. However, the G-DINA model does not account for the possible existence of latent groups in the population. Therefore, one should not stop with insights derived from the G-DINA model if one suspects the presence of latent groups in the population, because doing so could lead to invalid inferences. This study proposed and investigated the viability of the mixture G-DINA model, a novel CDM for fitting assessment data that accounts for the presence of heterogeneity.

As demonstrated in the real data application, the mixture G-DINA model can provide more insights into the profile of the examinees if the population is heterogeneous and is composed of different homogeneous components. However, the extensive simulation studies showed that utilizing the mixture G-DINA model is only advisable in certain cases.

The simulation studies also showed that it is important to have a longer test and good item quality because they lead to more accurate parameter estimation, latent group classification, attribute classification, and performance of the relative model fit criteria in estimating the number of components for mixture G-DINA modeling. On the other hand, the number of examinees generally has a smaller effect than test length on the absolute bias, latent group correct classification rate, attribute classification rate, and performance of the relative model fit criteria in estimating the number of components. These findings suggest that, whenever mixture G-DINA modeling is employed, it is necessary for the test developer to make the test long enough (around 30 questions) and the items of good quality; that is, the items should not be easy to guess for examinees who lack the skills required to answer them, and examinees who have the appropriate skills should be unlikely to slip. This is encouraging, as test length and item quality are both factors that test developers control.

Among the methods of attribute profile estimation, estimation via MLE is the most viable in mixture G-DINA modeling. Although there are instances wherein MLE is not superior to either EAP or MAP, particularly when the components differ in item quality but not in the generating model, the differences in the vectorwise or elementwise correct classification rates of the attribute profiles are generally negligible in those cases. However, the improvement from using MLE in attribute profile estimation when the components differ by generating model is fairly large, hence the recommendation to use the MLE method.

The performance of mixture G-DINA modeling is higher when the components have different generating models with high item quality than when the components share the same generating model and differ only by item quality. It is important to reiterate that the simulation studies exemplified the robustness of the G-DINA model when the components differ only by item quality and not by generating model. It is not advisable to use mixture G-DINA modeling in these cases, as the G-DINA model is shown to be superior to the mixture G-DINA model for attribute estimation, and our main goal is to provide correct attribute profiles of examinees rather than to obtain good parameter estimates or better latent group classification.

AIC is shown to be the most viable relative model fit criterion for estimating the number of components. However, the superiority of AIC over the other relative model fit criteria stems only from its performance in cases wherein the mixture G-DINA model is not powerful enough to induce a large improvement in model fit in comparison with the usual G-DINA model. When the items are of good quality and the test is long enough, it is still more advisable to compare all of the relative model fit criteria, as they all perform exceptionally well in estimating the number of components in these cases. Devising a better relative model fit criterion for estimating the number of components might be an interesting future topic to explore.

Lastly, if two of the generating models are A-CDM and G-DINA, there is an evident decrease in the performance of the mixture G-DINA model in item parameter estimation, attribute profile estimation, latent group classification, and the performance of the relative model fit criteria in estimating the number of components. If two of the components of the population are assumed to behave like the A-CDM and the G-DINA model, it is not advisable to use the mixture G-DINA model, especially when the items are not of good quality. This is probably due to the complexity of the model and how close these two models are in terms of item parameterization. Because the performance of the mixture G-DINA model varies across generating models, it is helpful to assess the profile of the item parameter estimates in order to know which generating model each component resembles.

The natural next step for this research is to examine the performance of the mixture G-DINA model in addressing some of the common problems in CDMs, such as DIF, uncovering different strategies in answering problem-solving questions, and detection of aberrant response patterns, as several studies have stated. The mixture G-DINA model can further be extended to modeling polytomous responses or estimating polytomous attributes. Examining the performance of the mixture G-DINA model under other factors that were not controlled in the study, as well as the impact of the number of attributes on the performance of the model, could be a future research topic to pursue. Additionally, closer examination of model identifiability warrants a separate study.

Introducing constraints in the mixture G-DINA model to introduce new mixture models such as mixture DINA, mixture DINO, and mixture A-CDM might give better performance, especially for cases when the mixture G-DINA is found to be not viable.

Footnotes

Authors’ Note

Supplemental materials (such as codes) related to this study can be requested directly from the corresponding author.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This work was supported by the Philippine Statistical Research and Training Institute Thesis and Dissertation Grant of 2020.

ORCID iD

Joemari Olea

References

Aitkin, & Tunnicliffe Wilson (1980). Mixture models, outliers, and the EM algorithm. Technometrics, 22, 325–331.

Akaike (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

Banfield, J. D., & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.

Bayes (1763). An essay towards solving a problem in the doctrine of chances. Reprinted in Biometrika, 45(1958), 296–315.

Biernacki, Celeux, & Govaert (1998). Assessing a mixture model for clustering with the integrated classification likelihood. Technical Report No. 3521. INRIA.

Biernacki, & Govaert (1997). Using the classification likelihood to choose the number of clusters. Computing Science and Statistics, 29, 2451–2457.

Bradshaw, L. P., & Madison, M. J. (2016). Invariance properties for general diagnostic classification models. International Journal of Testing, 16, 99–118. https://doi.org/10.1080/15305058.2015.1107076

Cavanaugh, J. E. (1999). A large-sample model selection criterion based on Kullback’s symmetric divergence. Statistics and Probability Letters, 44, 333–344.

Celeux, & Soromenho (1996). An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification, 13, 195–212.

Chen, & de la Torre (2014). A procedure for diagnostically modeling extant large-scale assessment data: The case of the programme for international student assessment in reading. Psychology, 5(18), 1967.

de la Torre (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.

de la Torre, & Lee, Y.-S. (2013). Evaluating the Wald test for item-level comparison of saturated and reduced models in cognitive diagnosis. Journal of Educational Measurement, 50, 355–373.

de la Torre, & Minchen (2014). Cognitively diagnostic assessments and the cognitive diagnosis model framework. Psicologia Educativa, 20(2), 89–97.

de la Torre, van der Ark, & Rossi (2015). Analysis of clinical data from a cognitive diagnosis modeling framework. Measurement and Evaluation in Counseling and Development. https://doi.org/10.1177/0748175615569110

DiBello, Roussos, & Stout (2007). Cognitive diagnosis Part I. In Rao, C. R., & Sinharay (Eds.), Handbook of statistics: Psychometrics (Vol. 26, pp. 979–1030). Elsevier.

Frühwirth-Schnatter (2006). Finite mixture and Markov switching models. Springer.

Garcia, Olea, & de la Torre (2014). Application of cognitive diagnosis models to competency-based situational judgment tests. Psicothema, 26, 372–377.

(2020). Partial identifiability of restricted latent class models. The Annals of Statistics, 48(4), 2082–2107.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 333–352.

Hartz (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality [Unpublished doctoral dissertation]. University of Illinois at Urbana–Champaign.

Hathaway, R. J. (1986). Another interpretation of the EM algorithm for mixture distributions. Statistics & Probability Letters, 4, 53–56.

Huebner, & Wang (2011). A note on comparing examinee classification methods for cognitive diagnosis models. Educational and Psychological Measurement, 71(2), 407–419.

Huo, de la Torre, & Ratna (2014). Differential item functioning assessment in cognitive diagnosis modeling: Application of the Wald test to investigate DIF in the DINA model. Journal of Educational Measurement, 51(1), 98–125.

Hurvich, C. M., & Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.

Junker, B. W., & Sijtsma (2001). Cognitive assessment models with few assumptions, and connections with non-parametric item response theory. Applied Psychological Measurement, 25, 258–272.

de la Torre (2020). GDINA: An R package for cognitive diagnosis modeling. Journal of Statistical Software, 93(14), 1–26. https://doi.org/10.18637/jss.v093.i14

Terzi, & de la Torre (2021). Detecting differential item functioning using multiple-group cognitive diagnosis models. Applied Psychological Measurement, 45(1), 37–53.

Madison, M. J., & Bradshaw, L. P. (2018). Assessing growth in a diagnostic classification model framework. Psychometrika, 83, 963–990. https://doi.org/10.1007/s11336-018-9638-5

Maris (1999). Estimating multiple classification latent class models. Psychometrika, 64, 187–212.

McLachlan, & Peel (2000). Finite mixture models. John Wiley & Sons, Inc.

McLachlan, & Basford (1988). Mixture models: Inference and applications to clustering. Marcel Dekker, Inc.

Mislevy, R. J., & Verhelst (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55, 195–215.

Santos, de la Torre, & Barrios (2015, July 13–15). Detecting aberrant examinees in cognitive diagnosis models [Paper presentation]. International Meeting of Psychometric Society. https://www.psychometricsociety.org/imps-2015

Schwarz (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.

Seghouane, A.-K., Bekara, & Fleury (2005). A criterion for model selection in the presence of incomplete data based on Kullback’s symmetric divergence. Signal Processing, 85, 1405–1417.

Seghouane, A.-K., & Maiza (2004). A small sample model selection criterion based on Kullback’s symmetric divergence. IEEE Transactions on Signal Processing, 52, 3314–3323.

Sen, & Cohen, A. S. (2021). Sample size requirements for applying diagnostic classification models. Frontiers in Psychology, 11, 621251. https://doi.org/10.3389/fpsyg.2020.621251

Sorrel, Olea, Abad, de la Torre, Aguado, & Lievens (2016). Validity and reliability of situational judgement test scores: A new approach based on cognitive diagnosis models. Organizational Research Methods. https://doi.org/10.1177/1094428116630065

Tatsuoka (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.

Templin, & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287–305.

von Davier (2005). A general diagnostic model applied to language testing data (Educational Testing Service Research Report No. RR-05-16). ETS.

von Davier (2007). Mixture distribution diagnostic models (Educational Testing Service Research Report No. RR-07-32). ETS.

von Davier (2010). Hierarchical mixtures of diagnostic models. Psychological Test and Assessment Modeling, 52(1), 8.

Wang (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68, 456–477.

(2017). Identifiability of restricted latent class models with binary responses. The Annals of Statistics, 45(2), 675–707.

Zhang, Yang, Zhang, & Sun (2021). Exploring multiple strategic problem solving behaviors in educational psychology research by using mixture cognitive diagnosis model. Frontiers in Psychology, 12, 568348. https://doi.org/10.3389/fpsyg.2021.568348

Items	Attributes					Items	Attributes					Items	Attributes
	a1	a2	a3	a4	a5		a1	a2	a3	a4	a5		a1	a2	a3	a4	a5
1*	1	0	0	0	0	11*	1	1	0	0	0	21*	1	1	1	0	0
2*	0	1	0	0	0	12	1	0	1	0	0	22	1	1	0	1	0
3*	0	0	1	0	0	13	1	0	0	1	0	23*	1	1	0	0	1
4*	0	0	0	1	0	14*	1	0	0	0	1	24	1	0	1	1	0
5*	0	0	0	0	1	15*	0	1	1	0	0	25	1	0	1	0	1
6	1	0	0	0	0	16	0	1	0	1	0	26*	1	0	0	1	1
7	0	1	0	0	0	17	0	1	0	0	1	27*	0	1	1	1	0
8	0	0	1	0	0	18*	0	0	1	1	0	28	0	1	1	0	1
9	0	0	0	1	0	19	0	0	1	0	1	29	0	1	0	1	1
10	0	0	0	0	1	20*	0	0	0	1	1	30*	0	0	1	1	1
