Abstract
Although the generalized deterministic inputs, noisy “and” gate model (G-DINA; de la Torre, 2011) is a general cognitive diagnosis model (CDM), it does not account for heterogeneity arising from latent groups in the population of examinees. To address this, this study proposes the mixture G-DINA model, a CDM that incorporates the G-DINA model within the finite mixture modeling framework. An expectation–maximization algorithm is developed to estimate the mixture G-DINA model. To determine the viability of the proposed model, an extensive simulation study is conducted to examine parameter recovery, model fit, and correct classification rates. Responses to a reading comprehension assessment are analyzed to further demonstrate the capability of the proposed model.
1. Introduction
Cognitive diagnosis models (CDMs) are statistical models that elicit a set of fine-grained examinee attributes or skills for diagnostic purposes (e.g., de la Torre & Minchen, 2014). Although CDMs originated in education for the analysis of formative assessments, they have also proved useful in analyzing clinical data (e.g., de la Torre et al., 2015) and responses from competency-based situational judgment tests (e.g., Garcia et al., 2014; Sorrel et al., 2016).
In the CDM literature, the attributes of an examinee i are represented by a vector
Several CDMs have been developed over the years. Some noteworthy models include the deterministic inputs, noisy “and” gate model (DINA; Haertel, 1989; Junker & Sijtsma, 2001); the deterministic inputs, noisy “or” gate model (DINO; Templin & Henson, 2006); the additive CDM (A-CDM; de la Torre, 2011); the linear logistic model (LLM; Maris, 1999); and the reduced reparametrized unified model (RRUM; DiBello et al., 2007; Hartz, 2002). These models have been shown to be special cases of the generalized deterministic inputs, noisy “and” gate model (G-DINA; de la Torre, 2011).
Although the G-DINA model can be thought of as a general CDM, it does not account for heterogeneity arising from latent groups that may exist in the population. The heterogeneity in the data can be caused by differential item functioning (DIF), that is, items for which the probability of a correct response is unequal for examinees who have the same attribute profile but belong to different latent groups (Huo et al., 2014). The heterogeneity can also be brought about by different strategies in answering multiple-strategy items caused by different Q-matrices (Mislevy & Verhelst, 1990; Wang & Xu, 2015; Zhang et al., 2021). Another possible source of heterogeneity is different latent groups of examinees following different distributions in terms of item parameters or prior probabilities (von Davier, 2007). Based on von Davier (2010), the heterogeneity in a mixture population can also be reflected in different skill distributions across latent groups. This study introduces an extension of the G-DINA model that incorporates it within the finite mixture modeling framework. Finite mixture models are a flexible way of modeling data sets that come from different latent groups of the population (McLachlan & Basford, 1988). The proposed model, called the mixture G-DINA model, aims to provide a better fit to response data from examinees coming from multiple latent groups inherent in the population.
In the CDM literature, von Davier (2007) introduced a mixture CDM named the mixture generalized diagnostic model, which combines mixture modeling methodology with that of the generalized diagnostic models (GDMs; von Davier, 2005). von Davier (2010) also proposed the hierarchical GDM, which extends the mixture GDM to handle multilevel data. It is important to note that the GDM and the G-DINA model have some notable differences. The most evident are the different parameterizations of the two models and the interpretation of the parameter estimates. This study introduces another mixture model that is built upon the modeling process of the G-DINA model. Zhang et al. (2021) also introduced a mixture multiple-strategy DINA model to investigate individual differences in the selection of response categories in multiple-strategy items. Although it is possible to extend the mixture G-DINA model to multiple-strategy items with different Q-matrices, doing so is not within the scope of the present study.
The main appeal of extending CDMs to include mixture modeling is that it can be used to determine whether different distributions must be assumed for different latent groups. This means that it can improve model fit and the correct classification rate of examinees compared to using the traditional G-DINA model when fitting data sets with inherent heterogeneity. Several studies have also noted its viability in addressing some common problems in CDMs. According to von Davier (2007), it can be a stepping stone in identifying DIF items. In item response theory, mixture modeling has already been used to uncover different strategies in answering a problem-solving question (Mislevy & Verhelst, 1990; Wang & Xu, 2015). Additionally, Aitkin and Tunnicliffe Wilson (1980) stated that mixture modeling can be used to identify outliers in a data set, which opens the possibility of applying it to the identification of aberrant response patterns.
Another advantage of extending the G-DINA model specifically is that a mixture model built upon it will be helpful for researchers who have employed the G-DINA model in their studies and have extensive knowledge of its framework. The G-DINA model is one of the most common CDMs, as it can be implemented freely in R (Ma, 2020). In addition, extensive research has already been built upon this model framework.
This study employs an expectation–maximization (EM) algorithm to obtain the maximum likelihood estimates of the parameters of the mixture G-DINA model and evaluates the capability of the proposed model across different factors, such as test length, number of examinees, and mixture composition of the population. Specifically, this study examines the performance of the mixture G-DINA model in estimating the item parameters of different latent groups and in correctly classifying examinees into their latent groups. Furthermore, the study investigates whether the model improves the classification of the attribute profiles of examinees from different latent groups compared to the usual G-DINA model. The study also aims to identify which relative model fit criteria can be used to assess the number of latent groups in the population when that number is unknown. Finally, the proposed model is applied to real-world data to demonstrate its viability.
2. The G-DINA Model
The G-DINA model has been proposed to loosen the assumptions of the DINA model and to allow the differing subsets of attributes to have different success probabilities for an item (de la Torre, 2011). In the DINA model, the population is partitioned into two latent groups per item: those who have all the required attributes and those who lack at least one required attribute in answering the item; whereas in the G-DINA model, the population is divided into
The reduced attribute vector
where
2.1. Estimation of the G-DINA Model
The maximum likelihood estimator (MLE) of
where
For ability parameter estimation, three methods can be used, namely, MLE, maximum a posteriori (MAP), and expected a posteriori (EAP; Huebner & Wang, 2011). The estimated attribute vector using MLE is the attribute vector given by
On the other hand, the MAP estimator denoted by
Lastly, the EAP estimator uses the following formula:
where
3. On Mixture Models
Finite mixture modeling is a flexible way of modeling heterogeneous data that come from different latent groups inherent in the population (McLachlan & Peel, 2000). Mixture modeling has been applied in several areas, such as biometrics, genetics, medicine, and marketing (Frühwirth-Schnatter, 2006). McLachlan and Basford (1988) utilized a maximum likelihood approach in estimating the parameters of a finite mixture model.
Suppose H is the number of latent groups in a population. Let
where
To determine the latent group classification of the observations, Bayes’s rule is commonly utilized (Bayes, 1763). The posterior probability that element i of the population is in latent group h given the random vector
3.1. Estimation of Mixture Models
The advent of the EM algorithm has increased interest in modeling heterogeneous data using finite mixture models. It simplifies the estimation of the parameters because the latent groupings inside the population can be treated as missing data, a setting in which the EM algorithm excels (McLachlan & Peel, 2000). The EM algorithm for the estimation of parameters from mixture models is discussed in McLachlan and Basford (1988).
Let
where
The log likelihood for the complete data,
For the E-step, we need to calculate
For the M-step, we need to choose the value of
The E and M steps are alternated repeatedly until the convergence criterion is met.
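The alternation of E and M steps can be illustrated with a minimal sketch. This is not the estimation routine of this study: as a simplifying assumption, each latent group is given independent Bernoulli item probabilities instead of a CDM, and the function name, starting values, and convergence rule are all hypothetical choices made for illustration.

```python
import math
import random


def em_bernoulli_mixture(data, n_groups, n_iter=200, tol=1e-6, seed=0):
    """EM for a finite mixture of independent Bernoulli items (illustrative).

    data is a list of 0/1 response vectors. Returns the mixing proportions,
    the group-specific item success probabilities, and the posterior group
    probabilities from the last E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    # Assumed starting values: equal weights, random success probabilities.
    weights = [1.0 / n_groups] * n_groups
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_groups)]
    prev_loglik = -math.inf
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability of each latent group (Bayes's rule).
        post = []
        loglik = 0.0
        for x in data:
            joint = []
            for h in range(n_groups):
                like = weights[h]
                for j, xj in enumerate(x):
                    like *= probs[h][j] if xj == 1 else 1.0 - probs[h][j]
                joint.append(like)
            total = sum(joint)
            loglik += math.log(total)
            post.append([v / total for v in joint])
        # M-step: update mixing proportions and item probabilities.
        for h in range(n_groups):
            mass = max(sum(p[h] for p in post), 1e-12)
            weights[h] = mass / len(data)
            for j in range(n_items):
                num = sum(p[h] * x[j] for p, x in zip(post, data))
                # Keep probabilities strictly inside (0, 1).
                probs[h][j] = min(max(num / mass, 1e-6), 1.0 - 1e-6)
        # Stop when the observed-data log likelihood stops changing.
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
    return weights, probs, post
```

The latent group labels play the role of the missing data: the E-step replaces them with posterior probabilities, and the M-step reuses those probabilities as weights in otherwise ordinary maximum likelihood updates.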
4. On Model Fit Criteria
Several model fit criteria have already been used for relative fit evaluation and determining the number of latent groups in the population when modeling mixture models. For this study, the performance of the following relative model fit criteria was examined: Akaike information criterion (
The AIC (Akaike, 1974) selects the model that minimizes
where d is the number of unknown model parameters and
The BIC (Schwarz, 1978) picks the model with the lowest
where N is the number of observations in the dataset. Cavanaugh (1999) developed an asymptotic unbiased estimator of the KIC given by
The bias correction of
where
The large sample approximation of the ICL-BIC (Biernacki et al., 1998) selects the model with the minimum
where
The entropy term
The CLC (Biernacki & Govaert, 1997) is given by
The model with the lowest
The AWE (Banfield & Raftery, 1993) is expressed as
where
5. The Proposed Model
Similar to the G-DINA model, the reduced attribute vector
where
Note that the IRF of the proposed model is almost equivalent to that of the multiple group G-DINA model introduced by Ma et al. (2021). The key difference is that the latent groups in the multiple group G-DINA model are determined a priori, whereas the latent groups in the mixture G-DINA model are not known beforehand (i.e., the latent grouping variable of the examinees is considered another latent variable). The mixture G-DINA model also makes use of the mixing proportion, which the multiple group G-DINA model does not have. The mixing proportion is an additional model parameter that needs to be estimated, providing the proportion of examinees belonging to a specific latent group. The mixing proportion is incorporated in the likelihood function used for the estimation of the model parameters.
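To make the role of the mixing proportion concrete, the following is a sketch of a marginal likelihood of this general form. The notation is assumed for illustration and may differ from the notation used in this paper: \pi_h is the mixing proportion of latent group h, p(\alpha_l | h) is the prior probability of attribute vector \alpha_l within group h, P_{jh}(\alpha_l) is the group-specific success probability on item j, and x_{ij} is the response of examinee i to item j.

```latex
L(\mathbf{X}) = \prod_{i=1}^{N} \sum_{h=1}^{H} \pi_h
  \sum_{l=1}^{2^K} p(\boldsymbol{\alpha}_l \mid h)
  \prod_{j=1}^{J} P_{jh}(\boldsymbol{\alpha}_l)^{x_{ij}}
  \left[ 1 - P_{jh}(\boldsymbol{\alpha}_l) \right]^{1 - x_{ij}},
\qquad \sum_{h=1}^{H} \pi_h = 1 .
```

Under this sketch, setting H = 1 removes the outer sum over latent groups and leaves a single-group marginal likelihood, which is one way to see the mixture model as an extension of the single-population case.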
5.1. Estimation Procedure
Let
under the assumption that the response vectors are independent, where
where
where
where
The MLE of
where
The following are the steps of the EM algorithm in estimating the parameters of the mixture G-DINA model:
1. At iteration 0, assign initial values
2. Compute for
3. At iteration t, compute for
Update
4. At iteration t, update
At the same time, compute for
5. Repeat Steps 3 and 4 until convergence or when the max difference in all the model parameters
The priors are estimated using the information from the data by employing the empirical Bayes method. For each iteration, the prior probabilities for the attribute vectors are updated using the following expression:
Note that this algorithm assumes that the number of latent groups is already known. If the number of components in the population is unknown, one way to estimate it is through the use of relative model fit criteria. This is done by fitting mixture models at
In the estimation of the latent group membership of the examinees, Bayes’s rule was used. The latent group membership of examinee i was determined by the maximum
6. Simulation Studies
Two simulation studies were conducted in this research. Simulation Study 1 investigated the viability of the mixture G-DINA model when fitted to heterogeneous response data. Simulation Study 2 focused on the estimation of the number of components in the population using relative model fit criteria. The monotonicity property of the success probabilities was not assumed in the simulation studies or in the real data application. This means that it is possible for examinees who possess more of the attributes needed to answer an item to have a lower success probability than examinees who possess fewer of them.
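What a violation of monotonicity looks like can be shown with a small check. The sketch below is illustrative only: the function name and the success probabilities are hypothetical, and the check simply compares every pair of reduced attribute patterns in which one pattern contains all attributes of the other.

```python
from itertools import product


def violates_monotonicity(success_prob):
    """Return pairs (a, b) where pattern b contains pattern a but has a
    lower success probability.

    success_prob maps each reduced attribute pattern (tuple of 0/1)
    to the probability of a correct response.
    """
    violations = []
    for a, b in product(success_prob, repeat=2):
        # b dominates a if b has every attribute that a has, and a != b.
        dominates = a != b and all(x <= y for x, y in zip(a, b))
        if dominates and success_prob[b] < success_prob[a]:
            violations.append((a, b))
    return violations


# Hypothetical success probabilities for an item requiring two attributes.
probs = {(0, 0): 0.20, (1, 0): 0.45, (0, 1): 0.30, (1, 1): 0.40}
print(violates_monotonicity(probs))  # [((1, 0), (1, 1))]
```

In this made-up item, examinees who have mastered both attributes succeed with probability 0.40, which is lower than the 0.45 of examinees who have mastered only the first attribute. Estimation without a monotonicity constraint permits such patterns.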
6.1. Simulation Study 1: Evaluation of the Performance of the Proposed Model
6.1.1. Design
The factors manipulated in Simulation Study 1 were the test length, the number of examinees, the mixing proportions, the item quality (IQ), the different generating models (genmod), and the number of latent groups.
For the number of items,
The performance of the mixture G-DINA model was compared across different partitionings of the mixing proportions: even and uneven. For two latent groups with even partitioning, 50% of the examinees were generated from each of Latent Groups 1 and 2. For two latent groups with uneven partitioning, two mixing proportion specifications were investigated: 80%–20% and 95%–5%. The 95%–5% specification was included to examine the performance of the model in simulated cheating scenarios, where 5% of the examinees cheated. For three latent groups with even partitioning, one third of the examinees was generated from each of Latent Groups 1 through 3. For uneven partitioning of the three latent groups, 50% were generated from Latent Group 1, 30% from Latent Group 2, and 20% from Latent Group 3.
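The mixing proportion conditions above can be sketched as follows. This is an illustration under stated assumptions: group membership is drawn at random with the given proportions (the study may instead have fixed the group sizes exactly), and the function name and the sample size of 1,000 are hypothetical.

```python
import random


def assign_latent_groups(n_examinees, proportions, seed=0):
    """Draw a latent group label (1-based) for each examinee."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    rng = random.Random(seed)
    labels = list(range(1, len(proportions) + 1))
    return rng.choices(labels, weights=proportions, k=n_examinees)


# Mixing proportion conditions described in the design.
designs = {
    "two groups, even": [0.50, 0.50],
    "two groups, uneven": [0.80, 0.20],
    "two groups, cheating scenario": [0.95, 0.05],
    "three groups, even": [1 / 3, 1 / 3, 1 / 3],
    "three groups, uneven": [0.50, 0.30, 0.20],
}
for name, props in designs.items():
    groups = assign_latent_groups(1000, props)
    counts = [groups.count(h) for h in range(1, len(props) + 1)]
    print(name, counts)
```

With random draws, the realized group sizes vary around the nominal proportions, which matters most for the 95%–5% condition, where the minority group contains only a few dozen examinees.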
For item quality, combinations of good, moderate, and bad item quality were considered for a single test. Items with guessing and slip parameters generated from a uniform distribution with mean equal to 0.1, 0.2, and 0.3 were considered to be items of good, moderate, and bad quality, respectively. Lastly, the generating model was manipulated in the simulation design to form mixtures. Combinations of different generating models (G-DINA, DINA, DINO, and A-CDM) were considered in the study. These scenarios were based on Santos et al. (2015), who examined the impact of aberrant examinees in CDMs using a forward search algorithm. In generating aberrant examinees, they assumed that the response patterns came from another CDM with different parameter specifications. For instance, the typical response patterns were generated from a DINA model with guessing and slip parameters equal to 0.10, and the aberrant response patterns were generated from a DINA model with guessing and slip parameters equal to 0.25. Similar scenarios were examined in this study to investigate the performance of the mixture G-DINA model when outlying response patterns are present in the assessment data.
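The typical versus aberrant scenario described above can be sketched for the DINA and DINO models. The code is a minimal illustration and not the study's generation procedure: the Q-matrix, the attribute vectors, and all function names are hypothetical, and fixed guessing and slip values are used instead of values drawn from a uniform distribution.

```python
import random


def dina_prob(alpha, q_row, guess, slip):
    """DINA: success requires every attribute the item measures."""
    has_all = all(a >= q for a, q in zip(alpha, q_row))
    return 1.0 - slip if has_all else guess


def dino_prob(alpha, q_row, guess, slip):
    """DINO: success requires at least one attribute the item measures."""
    has_any = any(a == 1 and q == 1 for a, q in zip(alpha, q_row))
    return 1.0 - slip if has_any else guess


def simulate_responses(alphas, q_matrix, prob_fn, guess, slip, seed=0):
    """Draw one 0/1 response per examinee and item."""
    rng = random.Random(seed)
    return [[int(rng.random() < prob_fn(alpha, q_row, guess, slip))
             for q_row in q_matrix]
            for alpha in alphas]


# Hypothetical 3-item, 2-attribute Q-matrix (not the one used in the study).
q_matrix = [[1, 0], [0, 1], [1, 1]]
alphas = [[1, 1], [1, 0], [0, 0]]

# Typical examinees: guessing and slip of 0.10; aberrant examinees: 0.25.
typical = simulate_responses(alphas, q_matrix, dina_prob, guess=0.10, slip=0.10)
aberrant = simulate_responses(alphas, q_matrix, dina_prob, guess=0.25, slip=0.25, seed=1)
```

Passing `dino_prob` instead of `dina_prob` for one of the two groups would give a mixture whose components differ in the generating model and not only in item quality.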
The number of attributes in consideration was fixed at
Q-Matrix Used for the Simulation Studies
Note: Items with * are used for J = 15.
To examine whether the proposed estimation procedure can recover the parameters of the mixture G-DINA model, the overall absolute bias of the estimates and the componentwise absolute bias were measured across different conditions. The overall absolute bias denoted by
where
To determine the viability of mixture G-DINA model in classifying the examinees correctly into their inherent latent groups in the population and the latent attribute classes, the correct latent group classification rates (
where
The correct latent class classification rate of the mixture G-DINA model was compared against that obtained with the usual G-DINA model alone, to determine whether the mixture model improves correct attribute classification. For the estimation of the attribute vector, MLE, EAP, and MAP were considered.
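The classification rates compared here can be computed as simple proportions of agreement. The sketch below is illustrative, with hypothetical function names and made-up attribute vectors: the vectorwise rate counts an examinee as correct only if the whole attribute vector is recovered, the elementwise rate counts individual attributes, and the latent group rate counts correct group assignments.

```python
def vectorwise_rate(true_alphas, est_alphas):
    """Proportion of examinees whose whole attribute vector is recovered."""
    hits = sum(tuple(t) == tuple(e) for t, e in zip(true_alphas, est_alphas))
    return hits / len(true_alphas)


def elementwise_rate(true_alphas, est_alphas):
    """Proportion of individual attributes recovered across examinees."""
    hits = sum(tk == ek
               for t, e in zip(true_alphas, est_alphas)
               for tk, ek in zip(t, e))
    return hits / (len(true_alphas) * len(true_alphas[0]))


def group_rate(true_groups, est_groups):
    """Proportion of examinees assigned to their true latent group."""
    hits = sum(t == e for t, e in zip(true_groups, est_groups))
    return hits / len(true_groups)


true_alphas = [(1, 1, 0), (0, 1, 0), (1, 0, 1), (0, 0, 0)]
est_alphas = [(1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 0, 0)]
print(vectorwise_rate(true_alphas, est_alphas))   # 0.5
print(elementwise_rate(true_alphas, est_alphas))  # 10 of 12 attributes
```

Because component labels in a mixture model are arbitrary, the estimated group labels need to be aligned with the generating labels before `group_rate` is meaningful. How that alignment was handled in this study is not shown here.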
6.1.2. Results
Due to space limitations, only the results for some of the treatment combinations are presented in this subsection.
6.1.2.1. Overall and componentwise absolute bias
Tables 2 through 4 show the average overall and componentwise absolute biases of estimated success probabilities across items and across different treatment combinations of mixtures of generating model, combinations of item quality, number of items, and number of examinees across replicates.
Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Overall is the overall absolute bias for both components, whereas Comp1 and Comp2 refer to the componentwise absolute bias for Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Overall is the overall absolute bias for both components, whereas Comp1 and Comp2 refer to the componentwise absolute bias for Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Absolute Biases of Estimated Success Probabilities Across Items (Equal Weights and
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Overall is the overall absolute bias for both components, whereas Comp1 and Comp2 refer to the component-wise absolute bias for Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Based on Tables 2 through 4, if there are two latent groups in the population that differ in item quality only, better item quality for both latent groups implies lower overall absolute bias. This characteristic is seen across all the treatment combinations in the study. Additionally, if the two latent groups differ only in the generating model, the absolute bias is generally lower for latent groups with higher item quality. Note that the average absolute bias is unacceptably high when the item quality for both components is bad. This implies that the proposed methodology is not advisable when the item quality for both components is bad.
The results also showed that, when the latent groups come from different generating models, the number of examinees has less impact than the number of items on the reduction of absolute bias. In terms of test length, longer tests yield lower overall and componentwise absolute bias. Generally, for latent groups that differ in the generating model, the overall absolute bias is lower for equally weighted populations than for unequally weighted ones.
The overall absolute bias is generally lower when there is a component that is from A-CDM. It is oftentimes evidenced that the absolute bias is lowest when A-CDM and G-DINA are both involved in the population, for both
One can see from the componentwise absolute bias that the absolute bias from components with higher item quality is lower than that from components with lower item quality. Lastly, the absolute bias from components involving the DINA and DINO models is lower than that of G-DINA and A-CDM. The componentwise absolute bias of components with A-CDM is also consistently the highest among the four generating models.
6.1.2.2. Correct latent group classification rates
Given in Tables 5 through 7 are the mean correct latent group classification rates across different treatment combinations of mixtures of different generating models, mixtures of different item qualities, number of items, and number of examinees across replicates.
Average Correct Latent Group Classification Rate Across Examinees (Equal Weights and
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Latent Group Classification Rate Across Examinees (Equal Weights and
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Latent Group Classification Rate Across Examinees (
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively. DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Tables 5 through 7 reveal that the correct latent group classification rate is generally higher for latent groups with unequal weights than for those with equal weights. It is important to note that, in the case of unequal weights, most of the correctly classified examinees come from the majority class.
The results in Tables 5 through 7 also give evidence that a larger difference in item quality between latent groups implies a higher latent group classification rate across the population. The latent group classification rate is also higher when the generating models differ than when the latent groups differ only in item quality. Both results signify that the more prominent the difference between the characteristics of the mixture components, the better the methodology performs in correctly classifying the examinees.
Item quality, test length, and the number of latent groups are the significant factors that affect the correct latent group classification rate. The higher the item quality for latent groups that differ only in generating model, the higher the latent group classification rate. Longer tests also result in higher latent group classification rates. Additionally, the correct latent group classification rate is generally higher for two latent groups than for three. On the other hand, increasing the number of examinees yields little improvement in the latent group classification rate.
For
For the case of equal weights with the same generating model but different item qualities, the majority of the examinees correctly assigned to latent groups are from components with better item quality. For the case of equal weights with the same item quality and different generating models, the majority of the correctly assigned examinees are from components with DINA or DINO generating models. Furthermore, for treatment combinations with unequal weights, the same generating model, and different item quality across components, most of the examinees correctly assigned to their true latent groups are from the majority latent group with better item quality. For mixtures with unequal weights, the same item quality, and different generating models, most of the correctly assigned examinees are still from the largest component. However, it is evident that the generating model affects the percentage of examinees that are correctly assigned within a component. Mixtures in which the majority component is DINA or DINO have higher correct classification rates than mixtures in which A-CDM or G-DINA is the majority.
Lastly, Table 7, for the 95%–5% mixture condition, shows that the performance of the proposed method in correctly identifying the cheating group largely depends on the same factors discussed earlier. Performance in identifying the cheating group is higher when the test is longer, when both mixture components have good item quality, and when the mixture does not involve both G-DINA and A-CDM components. Because most of the correctly classified examinees belong to the majority class (original members of the 95% class), the proposed method performs better at correctly classifying members of the noncheating latent group than members of the cheating latent group.
6.1.2.3. Correct latent class classification rates
The results given in Tables 8 through 11 show the mean vectorwise correct latent class classification rate using maximum likelihood estimation across replicates. Only results from attribute estimation using MLE are presented because, among the three methods of estimating the attribute profile of examinees, MLE consistently had the highest performance in mixture G-DINA modeling. Because the vectorwise and elementwise correct attribute classification rates follow the same pattern, it suffices to present only the vectorwise correct latent class classification rates.
Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights,
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively.
Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights,
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively.
Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Equal Weights,
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively.
Average Correct Latent Class Classification Rate (Vectorwise) via Maximum Likelihood Estimator (Unequal Weights,
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. Item Quality 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively.
The results presented in Tables 8 through 11 suggest that there is only a slight improvement in the latent class classification rates (vectorwise or elementwise) for most cases wherein the latent groups differ by the generating model from an examination with a higher number of items and with good or moderate item quality. This suggests that the usual G-DINA model is robust to heterogeneity caused by item quality in the estimation of the attribute profile of examinees. This implies that it is more advisable to use the G-DINA model than the mixture G-DINA model when the latent groups differ only in item quality and not in the generating model.
As in the results for the absolute bias and the correct latent group classification rates, item quality and test length have a significant effect on the correct latent class classification rate. Higher item quality across the examination implies higher latent class classification rates for the examinees when using the mixture G-DINA model. Additionally, longer examinations yield higher correct latent class classification rates, vectorwise or elementwise. However, the number of examinees has little impact on the latent class classification rates.
It is still important to note that for the case of equal weights with the same generating model but different item qualities, most of the correctly assigned examinees to attribute vectors are from components with better item quality. On the other hand, for the case of equal weights with the same item quality but different generating models, the majority of the examinees with correctly assigned attribute vectors are from components with DINA or DINO generating models.
For the case of unequal weights with the same generating model but different item qualities across components, most of the examinees correctly assigned to their true attribute profile are from the majority component with better item quality, which further supports the result that better correct latent class classification rates are expected when the items in the examination have good item quality. Note that there is still a considerable improvement in the performance of ability parameter estimation for the 80%–20% mixing proportion, especially when the item qualities are good for both components. There is no longer any improvement in the estimation of ability parameters for the 95%–5% mixing proportion. This result indicates that the mixture G-DINA model might not be applicable when the difference between the weights of the two latent groups is wide.
Lastly, for the case of unequal weights, the same item quality, but different generating models, most of the correctly assigned examinees’ attribute vectors are still from the largest component. Additionally, mixtures that involve majority components that are DINA or DINO have higher correct latent class classification rates than mixtures wherein A-CDM and G-DINA are the majority.
6.2. Simulation Study 2: Estimation of the Number of Components
6.2.1. Design
The factor levels in Simulation Study 2 were the same as those in Simulation Study 1 for the test length, number of examinees, the mixing proportions, the item quality, and the generating model. For the number of generated latent groups, the case when
To investigate the capabilities of the different model fit criteria in estimating the number of components in the population, the different generated responses were fitted assuming that the number of latent groups was equal to
where
As in Simulation Study 1, the number of attributes was fixed at
6.2.2. Results
Similar to Simulation Study 1, only the results for selected treatment combinations are presented in this subsection.
6.2.2.1. Correct selection rates
Tables 12 through 16 give the correct selection rates from different treatment combinations of mixtures of generating model, mixtures of item quality, number of items, and number of examinees, across replicates.
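How a correct selection rate of this kind can be obtained is sketched below for two of the criteria, AIC and BIC, in their standard forms. Everything else is an assumption made for illustration: the function names, the log likelihoods, the parameter counts, and the sample size are hypothetical and are not results from this study.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2d."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + d log N."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_n_groups(fits, criterion):
    """Pick the candidate number of latent groups with the lowest value.

    fits maps a candidate number of groups to (log likelihood, n_params).
    """
    return min(fits, key=lambda h: criterion(*fits[h]))


def correct_selection_rate(selected, true_n_groups):
    """Share of replicates in which the true number of groups was chosen."""
    return sum(h == true_n_groups for h in selected) / len(selected)


# Hypothetical fits for one replicate with N = 1000 examinees.
fits = {1: (-5200.0, 40), 2: (-5050.0, 81), 3: (-5030.0, 122)}
n_obs = 1000
by_aic = select_n_groups(fits, aic)
by_bic = select_n_groups(fits, lambda ll, d: bic(ll, d, n_obs))
print(by_aic, by_bic)  # 2 2

# Hypothetical selections over four replicates generated with two groups.
print(correct_selection_rate([2, 2, 1, 2], true_n_groups=2))  # 0.75
```

The other criteria can be plugged into `select_n_groups` in the same way once their formulas are coded, since each one only needs to return a value to be minimized.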
Average Correct Selection Rate (
Note. Model pertains to the generating model. IQ refers to the item quality for latent groups: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Selection Rate (
Note. Model pertains to the generating model. IQ refers to the item quality for latent groups: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Selection Rate (
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. IQ 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Selection Rate (
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. IQ 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
Average Correct Selection Rate (
Note. Models 1 and 2 refer to the generating model for the Latent Groups 1 and 2, respectively. IQ 1 and 2 refer to the item quality for Latent Groups 1 and 2, respectively: G for good, M for moderate, and B for bad. This table shows the average correct selection rates of the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE). DINA = deterministic inputs, noisy “and” gate model; DINO = deterministic inputs, noisy, “or” gate model; A-CDM = additive cognitive diagnosis model; GDINA = generalized DINA.
The correct selection rates for
For H = 2, according to Tables 14 through 16, test length substantially affects the performance of the relative model fit criteria in estimating the number of components: the longer the test, the better most of the criteria perform. Additionally, better item quality across the entire exam yields better performance of the relative model fit criteria in estimating the number of components in the model. Among the generating models, the involvement of the A-CDM in mixture G-DINA modeling decreases the performance of most of the relative model fit criteria. For the case of the same generating model with different item qualities, the larger the difference in item quality between the latent groups, the better the performance of most of the relative model fit criteria in estimating the number of latent groups.
Among the relative model fit criteria examined,
It is important to note that the difference between the value of a relative model fit criterion when fitting with the right number of components and its value when fitting with the wrong number of components mostly does not exceed 1% of the value obtained with the right number of components.
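To make the comparison above concrete, the following is a minimal Python sketch of how a few of the relative model fit criteria could be computed and used to select the number of components. It covers only AIC, AICC, and BIC in their standard textbook forms; the remaining criteria are omitted. The function names, the log-likelihoods, the parameter counts, and the sample size are hypothetical illustration values, not results from this study.

```python
import math

def information_criteria(loglik, n_params, n_obs):
    """Return AIC, AICC, and BIC for one fitted model (smaller is better)."""
    deviance = -2.0 * loglik
    aic = deviance + 2.0 * n_params
    # Small-sample correction; it requires n_obs > n_params + 1.
    aicc = aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    bic = deviance + n_params * math.log(n_obs)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}

def select_components(fits, n_obs):
    """fits maps a number of components H to (log-likelihood, number of parameters)."""
    table = {h: information_criteria(ll, k, n_obs) for h, (ll, k) in fits.items()}
    names = list(next(iter(table.values())))
    # Each criterion selects the H that minimizes its value.
    chosen = {name: min(table, key=lambda h: table[h][name]) for name in names}
    return table, chosen

def relative_gap(table, name, h_right, h_wrong):
    """Gap between wrong and right H, relative to the value under the right H."""
    right = table[h_right][name]
    return abs(table[h_wrong][name] - right) / abs(right)

# Hypothetical fits: H = 1 and H = 2 (values chosen for illustration only).
fits = {1: (-15250.0, 120), 2: (-15000.0, 241)}
table, chosen = select_components(fits, n_obs=1000)
gap_aic = relative_gap(table, "AIC", h_right=2, h_wrong=1)
```

In this toy setting the criteria disagree on the number of components, and the AIC gap between the two fits is below 1% of the AIC value itself, which mirrors the kind of small relative difference described above.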
7. Real Data Application
To demonstrate the applicability of the mixture G-DINA model, the model was fitted to the responses of 1,095 German students to 26 items of the 2000 Program for International Student Assessment (PISA), taken from Chen and de la Torre (2014). The five attributes included in this real data application are (1) locating information, (2) forming broad general understanding, (3) developing a logical interpretation, (4) evaluating a number-rich text with number sense, and (5) evaluating the quality or appropriateness of a text (Chen & de la Torre, 2014). Both the traditional and mixture G-DINA models were fitted to the data, and the model fit indices were compared.
First, to determine whether latent groups possibly exist in the population, the mixture G-DINA model was fitted to the assessment data on the assumption that
Based on most of the relative model fit criteria in estimating the number of components in the population,
Comparing the relative model fit criteria, we can see that all of the criteria are higher for
Relative Model Fit Criteria for
Note. This table shows the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE).
Relative Model Fit Criteria for
Note. This table shows the following relative model fit criteria: Akaike information criterion (AIC), small-sample AIC (AICC), Bayesian information criterion (BIC), Kullback information criterion (KIC), bias-corrected KIC (KICC), approximation of KIC (AKICC), large sample approximation of integrated classification likelihood (ICLBIC), classification likelihood criterion (CLC), and approximate weight of evidence (AWE).
Fitting the model where
Item Parameters for the First Component
Item Parameters for the Second Component
Examination of the profiles of the two latent groups revealed that the first latent group is mainly characterized by items with low guessing and slipping parameters, indicating that the items generally have good discriminating power for this group. The second latent group, on the other hand, is characterized by items that generally have high guessing and slipping parameters, implying that the items are of lower quality for the second latent group than for the first. Because the item quality is better for the first latent group than for the second, the attribute vectors estimated for the first latent group are expected to be more accurate than those estimated for the second latent group.
Table 21 shows the prevalence rates of the attributes under the mixture G-DINA model, overall and componentwise (using MLE). It can be seen that the latent group for which the items have better quality has more examinees who possess Attributes 3 and 4 than the second latent group. Meanwhile, the latent group that generally has lower item quality has more examinees who possess Attributes 2 and 5. Both latent groups have the same percentage of examinees with Attribute 1.
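The link between guessing, slipping, and discriminating power can be illustrated with a short Python sketch. It assumes the common convention that, for an item under the G-DINA model, the guessing parameter is the success probability of examinees with none of the required attributes, and the slip parameter is one minus the success probability of examinees with all of them. The function name and the success probabilities below are hypothetical and are not the estimates reported in the tables.

```python
def guessing_slip(success_probs):
    """Summarize one item from its success probabilities.

    success_probs maps each reduced attribute pattern (a tuple of 0/1 over
    the attributes the item requires) to the probability of a correct response.
    """
    n_required = len(next(iter(success_probs)))
    none_mastered = (0,) * n_required
    all_mastered = (1,) * n_required
    guessing = success_probs[none_mastered]
    slip = 1.0 - success_probs[all_mastered]
    # Discrimination is the gap between full mastery and no mastery.
    discrimination = 1.0 - guessing - slip
    return {"guessing": guessing, "slip": slip, "discrimination": discrimination}

# Hypothetical item requiring two attributes, as estimated in two latent groups.
item_group1 = {(0, 0): 0.10, (1, 0): 0.40, (0, 1): 0.50, (1, 1): 0.90}
item_group2 = {(0, 0): 0.35, (1, 0): 0.45, (0, 1): 0.50, (1, 1): 0.70}

summary1 = guessing_slip(item_group1)
summary2 = guessing_slip(item_group2)
```

Under these made-up values, the same item separates masters from nonmasters much more sharply in the first group than in the second, which is the sense in which item quality can differ across latent groups.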
Percentage Distribution of Attributes: Overall and Componentwise Assuming
Note. Overall is the overall percentage of examinees that possesses the attribute, whereas Group 1 and Group 2 refer to the componentwise percentage of examinees that possesses the attribute for Latent Groups 1 and 2, respectively.
Table 22 lists the percentage distribution of the attribute vectors, overall and componentwise. The attribute vector with the highest prevalence rate is (1,1,1,1,1); that is, the largest share of examinees have all of the attributes studied. Next to it, 20% of the examinees have none of the attributes being studied. Comparing the two latent groups suggests that the latent group with higher item quality has more students with attribute vectors (0,0,0,0,0) and (1,1,1,1,1) than the latent group with lower item quality.
Percentage Distribution of Attribute Vectors: Overall and Componentwise Assuming
Note. Overall is the overall percentage of examinees that possess the attribute vector, whereas Group 1 and Group 2 refer to the componentwise percentage of examinees that possess the attribute vector for Latent Groups 1 and 2, respectively.
This analysis demonstrated the viability of the proposed model when fitted to actual data. Moreover, more insights into the examinee profiles can be generated because the behavior of the different latent groups in a heterogeneous population can be seen.
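The overall and componentwise percentage distributions described above can be tabulated from estimated attribute profiles and latent group assignments. The following Python sketch shows one way to do so. The function names and the toy data (four examinees, three attributes, two groups) are hypothetical and only illustrate the bookkeeping, not the estimation itself.

```python
from collections import Counter

def attribute_prevalence(profiles):
    """Percentage of examinees possessing each attribute."""
    n = len(profiles)
    n_attr = len(profiles[0])
    return [100.0 * sum(p[k] for p in profiles) / n for k in range(n_attr)]

def vector_distribution(profiles):
    """Percentage of examinees with each observed attribute vector."""
    n = len(profiles)
    counts = Counter(tuple(p) for p in profiles)
    return {vec: 100.0 * c / n for vec, c in counts.items()}

def componentwise(profiles, groups, summary):
    """Apply a summary function overall and within each latent group."""
    out = {"Overall": summary(profiles)}
    for g in sorted(set(groups)):
        members = [p for p, label in zip(profiles, groups) if label == g]
        out[f"Group {g}"] = summary(members)
    return out

# Hypothetical estimated profiles and latent group labels.
profiles = [(1, 1, 1), (0, 0, 0), (1, 0, 1), (1, 1, 0)]
groups = [1, 1, 2, 2]

prevalence = componentwise(profiles, groups, attribute_prevalence)
vectors = componentwise(profiles, groups, vector_distribution)
```

Here `prevalence` plays the role of an attribute-level table and `vectors` the role of an attribute-vector table, each with an overall column and one column per latent group.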
8. Conclusions and Recommendations
Inherent heterogeneity can sometimes be present in assessment data. It could be due to the existence of DIF, the use of different strategies in answering problem-solving questions, or differences in the distributions of the item parameters and prior probabilities. In analyzing formative assessment data, one of the most commonly used CDMs is the G-DINA model. However, the G-DINA model does not account for the possible existence of latent groups in the population. Therefore, one should not stop at the insights derived from the G-DINA model if one suspects the presence of latent groups in the population, because doing so could lead to invalid inferences. This study proposed and investigated the viability of the mixture G-DINA model, a novel CDM for fitting assessment data that accounts for the presence of heterogeneity.
As demonstrated in the real data application, the mixture G-DINA model can provide more insights into the profiles of the examinees if the population is heterogeneous and is composed of different homogeneous components. However, the extensive simulation studies showed that utilizing the mixture G-DINA model is only advisable in certain cases.
The simulation studies also showed that longer tests and good item quality are important because they lead to more accurate parameter estimation, higher latent group and attribute classification rates, and better performance of the relative model fit criteria in estimating the number of components for mixture G-DINA modeling. On the other hand, the number of examinees generally has a smaller effect than test length on the absolute bias, latent group correct classification rate, attribute classification rate, and performance of the relative model fit criteria in estimating the number of components for mixture G-DINA modeling. These findings suggest that, whenever mixture G-DINA modeling is employed, the test developer must make the test long enough (around 30 items) and the items of good quality; that is, the items should not be easy to guess for examinees who lack the skills required to answer them, and examinees who have the required skills should be unlikely to slip. This is encouraging because test length and item quality are both factors that test developers control.
Among the methods of attribute profile estimation, estimation via MLE is the most viable in mixture G-DINA modeling. Although there are instances wherein MLE is not superior to either EAP or MAP, particularly when the components differ in item quality but not in generating model, the differences in the vectorwise or elementwise correct classification rates of the attribute profiles are generally negligible in those cases. However, the improvement from using the MLE method in attribute profile estimation when the components differ in generating model is fairly large, hence the recommendation to use the MLE method.
The performance of mixture G-DINA modeling is higher when the components have different generating models with high item quality than when the components differ only in item quality but share the same generating model. It is also important to reiterate that the simulation studies demonstrated the robustness of the G-DINA model when the components differ only in item quality and not in generating model. Mixture G-DINA modeling is not advisable in these cases because the G-DINA model was shown to be superior to the mixture G-DINA model for attribute estimation, and our main goal is to provide correct attribute profiles of examinees rather than good parameter estimates or better latent group classification.
Lastly, if two of the generating models are the A-CDM and the G-DINA model, there is an evident decrease in the performance of the mixture G-DINA model in item parameter estimation, attribute profile estimation, latent group classification, and the performance of the relative model fit criteria in estimating the number of components. If two of the components of the population are assumed to behave like the A-CDM and the G-DINA model, it is not advisable to use the mixture G-DINA model, especially when the items are not of good quality. This is probably due to the complexity of the model and how close these two models are in terms of item parameterization. Because the performance of the mixture G-DINA model varies across generating models, it is helpful to assess the profile of the item parameter estimates in order to know which generating model each component resembles.
The natural next step for this research is to examine the performance of the mixture G-DINA model in addressing some of the common problems in CDMs, such as DIF, uncovering different strategies in answering problem-solving questions, and detecting aberrant response patterns, as several studies have suggested. The mixture G-DINA model can further be extended to modeling polytomous responses or estimating polytomous attributes. Examining the performance of the mixture G-DINA model under other factors that were not controlled in the study, as well as the impact of the number of attributes on the performance of the model, could be a future research topic to pursue. Additionally, closer examination of model identifiability warrants a separate study.
Introducing constraints in the mixture G-DINA model to obtain new mixture models, such as the mixture DINA, mixture DINO, and mixture A-CDM, might give better performance, especially in cases where the mixture G-DINA model is found not to be viable.
Footnotes
Authors’ Note
Supplemental materials (such as code) related to this study can be requested directly from the corresponding author.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This work was supported by the Philippine Statistical Research and Training Institute Thesis and Dissertation Grant of 2020.
