Abstract
Understanding whether or not different types of students master various attributes can aid future learning remediation. In this study, two-level diagnostic classification models (DCMs) were developed to represent the probabilistic relationship between external latent classes and attribute mastery patterns. Furthermore, variational Bayesian (VB) inference and Gibbs sampling Markov chain Monte Carlo methods were developed for parameter estimation of the two-level DCMs. The results of a parameter recovery simulation study show that both techniques appropriately recovered the true parameters; Gibbs sampling in particular was slightly more accurate than VB, whereas VB performed estimation much faster than Gibbs sampling. The two-level DCMs with the proposed Bayesian estimation methods were further applied to fourth-grade data obtained from the Trends in International Mathematics and Science Study 2007 and indicated that mathematical activities in the classroom could be organized into four latent classes, with each latent class connected to different attribute mastery patterns. This information can be employed in educational intervention to focus on specific latent classes and elucidate attribute patterns.
1. Introduction
Diagnostic classification models (DCMs; e.g., Rupp et al., 2010; von Davier & Lee, 2019) are educational measurement models that enable learners and teachers to focus on cognitively weak points and improve the individual learning status of the students. Well-known general models such as the log-linear cognitive diagnostic model (Henson et al., 2009), generalized deterministic input noisy and-gate model (de la Torre, 2011), and general diagnostic model (von Davier, 2008) have been applied to real data such as those obtained by the Programme for International Student Assessment (Chen & de la Torre, 2014) or Trends in International Mathematics and Science Study (TIMSS; e.g., Yamaguchi & Okada, 2018). These analyses have revealed what types of fine cognitive elements, called attributes in the DCM context, are crucial for answering test items. However, DCMs typically use only response data for test items, and their item response and structural parameters represent the assumptions of diagnostic tests. Furthermore, general DCMs do not reveal how attributes are connected with external variables.
One type of extension of DCMs that includes external variables is called explanatory DCM (e.g., Park & Lee, 2014; Park et al., 2018). Explanatory DCMs assume that the attribute mastery probabilities or item response probabilities are predicted with external variables. Through them, one can understand how prediction can be performed for a mathematical attribute, such as the addition or subtraction of fractions, by using the learning time of an individual or their attitude or motivation to learn the attribute. Explanatory DCMs can explicitly model attribute mastery and correct item responses affected by different factors from latent attributes. Park et al. (2018) investigated the effects of mathematics confidence (as measured by items such as “I usually do well in mathematics”), affect in mathematics (as measured by items such as “I enjoy learning mathematics”), and calculator ownership status (1 = having, 0 = not having) as predictors. The results summarized in Table 3 by Park et al. (2018) indicated that owning a calculator and feeling more confident decreased the probability of mastering attributes such as “whole numbers” or “dimensions and locations.” However, an increase in the score of affect in mathematics indicated an increase in the probabilities of mastering attributes. In another related model, Zhan et al. (2018) proposed DCMs incorporating response time, which is another source of information about the latent abilities of individuals. Zhan et al. (2018) developed a hierarchical DCM with higher order latent traits that are associated with response time. Other types of external information were employed by Zhan et al. (2022), who proposed a DCM incorporating not only response times but also biometrics data such as visual fixation counts. These models provide flexible frameworks to deal with additional sources of information to diagnose individual attribute mastery status.
However, explanatory DCMs do not consider the connections between the attribute mastery pattern level and explanatory variables; they instead focus on each attribute level model. This implies that the explanatory DCMs proposed by Park et al. (2018) assume conditional independence among attributes given predictors. Therefore, the attribute mastery pattern probabilities are represented as the product of conditional probability of attributes given covariates. The assumption is that the attribute masteries are independent of each other when the predictors are fixed; however, this assumption is too strong because it is not always possible to include covariates to completely explain the relationships among attributes. Instead, modeling attribute mastery pattern level can moderate the assumption.
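To make the conditional independence assumption concrete, the following minimal sketch computes attribute mastery pattern probabilities as a product of per-attribute probabilities. The probabilities in `p_attr` and all function names are hypothetical and chosen only for illustration; they are not taken from Park et al. (2018).

```python
from itertools import product

# Hypothetical mastery probabilities for K = 3 attributes at fixed covariate values.
p_attr = [0.8, 0.6, 0.5]

def pattern_prob_independent(pattern, p):
    """Probability of a 0/1 mastery pattern when attributes are conditionally independent."""
    prob = 1.0
    for alpha_k, p_k in zip(pattern, p):
        prob *= p_k if alpha_k == 1 else 1.0 - p_k
    return prob

# All 2^K patterns and their probabilities under the independence assumption.
patterns = list(product([0, 1], repeat=len(p_attr)))
probs = {pat: pattern_prob_independent(pat, p_attr) for pat in patterns}

# Number of free probabilities under each modeling choice.
n_free_independent = len(p_attr)          # K
n_free_pattern_level = len(patterns) - 1  # 2^K - 1
```

Under independence, all eight pattern probabilities are determined by three numbers, whereas a pattern-level model leaves seven probabilities free and can therefore represent residual dependence among attributes.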
Furthermore, the explanatory DCMs mainly use continuous variables as covariates rather than categorical latent variables known as latent classes (e.g., Collins & Lanza, 2009; Hagenaars & McCutcheon, 2002; Lazarsfeld & Henry, 1969; White & Murphy, 2014). Both continuous observed and latent variables are generally employed as covariates (e.g., De Boeck, 2004). However, latent classes that connect with attribute mastery patterns can be more useful in group-level educational remedies in a classroom setting.
For example, in math classes, “activeness” forms latent classes, which are active and nonactive commitments determined by the various actual activities in a math class such as involving group discussion or speaking up in class. Here, the nonactive commitment class may be strongly connected to low attribute mastery patterns, while the active commitment class is related to high mastery patterns. In this case, teachers can consider methods to encourage students to be involved in class activities to improve their mathematical attribute mastery. In essence, one latent class layer explains the attribute mastery patterns and the latent class can provide hints to educational remedy. In other words, the attribute mastery patterns and latent classes consist of a hierarchical relationship, and these latent classes can reveal the types of individuals that tend to belong to specific attribute mastery patterns.
DCMs are obtained by constraining general latent class models (Rupp & Templin, 2008), and attribute mastery patterns are latent classes. Therefore, assuming an additional latent class and assessing the relationships between the latent classes and attribute mastery patterns can be considered a special case of a two-level latent class model (Miyazaki et al., 2007). A two-level latent class can directly model the strength of connection between two latent classes as probability parameters. This idea can be applied to the DCM context, and external variables can be considered differently than explanatory DCMs.
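As a rough illustration of how probability parameters can express the strength of connection between two latent class layers, the sketch below combines class mixing proportions with a class-by-pattern relationship matrix to obtain marginal pattern probabilities. The values of `pi` and `tau`, and the row-per-class orientation of `tau`, are assumptions made for this example only.

```python
# Hypothetical mixing proportions for C = 2 exogenous latent classes.
pi = [0.4, 0.6]

# Hypothetical relationship matrix: row c holds P(pattern l | class c)
# over the L = 4 mastery patterns of K = 2 attributes. Each row sums to 1.
patterns = ["00", "10", "01", "11"]
tau = [
    [0.60, 0.20, 0.15, 0.05],  # class 0: mostly low mastery patterns
    [0.05, 0.15, 0.20, 0.60],  # class 1: mostly high mastery patterns
]

def marginal_pattern_probs(pi, tau):
    """Marginal P(pattern l) = sum over classes c of pi[c] * tau[c][l]."""
    n_patterns = len(tau[0])
    return [sum(pi[c] * tau[c][l] for c in range(len(pi)))
            for l in range(n_patterns)]

marginal = marginal_pattern_probs(pi, tau)
```

Each row of `tau` can be read directly as the mastery profile of one latent class, which is the kind of class-specific interpretation a two-level structure is meant to support.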
In this research, we developed two-level DCMs whose attribute mastery patterns are connected to exogenous latent classes. Latent class–related models sometimes exhibit poor parameter estimation because of the sparseness of cells, and utilizing a Bayesian prior can stabilize the estimation in such models (Collins & Lanza, 2009, p. 171). Therefore, we developed two Bayesian estimation methods for the parameter estimation of two-level DCMs: the variational Bayesian (VB) inference and Gibbs sampling methods. The VB inference method (e.g., Ormerod & Wand, 2010) is faster than the Markov chain Monte Carlo (MCMC) approach (e.g., Brooks et al., 2011) but provides parameter estimates similar to those of a fully Bayesian estimation method. The VB method has been applied to test models such as general DCMs (e.g., Yamaguchi & Okada, 2020a). Generally, the MCMC method is computationally heavy; however, the Gibbs sampling method is more efficient than Metropolis–Hastings-type MCMC and can approximate posteriors more precisely than the VB method. Thus, to examine the quality of parameter estimates, we compared these Bayesian methods through a simulation study.
The contribution of this study includes not only developing a DCM that enables us to represent a connection between an attribute mastery pattern and external latent classes but also developing stable Bayesian estimation methods. An expectation–maximization (EM) algorithm to achieve classical maximum-a-posteriori (MAP) estimation, which is a common point-estimation-based Bayesian method, was also derived and is presented in Online Appendix A. The remainder of this article is structured as follows: Section 2 introduces the formulation of two-level DCMs and derivations of the aforementioned Bayesian estimation methods. Section 3 presents a simulation study to check the parameter recovery of the proposed estimation methods. Section 4 provides a real data demonstration of two-level DCMs with TIMSS data. Finally, Section 5 summarizes the conclusions, provides a discussion, and mentions future research possibilities.
2. Formulation of Two-Level DCMs and Bayesian Estimation Procedure
2.1. Elements of Two-Level DCMs
The core idea of two-level DCMs is to combine DCMs and latent class models using relationship probabilities that connect both measurement models. Figure 1 provides a conceptual representation of two-level DCMs, which have three elements: diagnosis measurement, latent class measurement, and structural models. The essential element in two-level DCMs is the structural model among the exogenous latent class and attribute mastery patterns. This structural model represents the strength of the connection between the latent classes and attribute mastery patterns. These models are introduced hereafter. In addition, to derive Bayesian estimation methods, priors for the model parameters are specified, and joint distributions of the observed variables, latent variables, and parameters are defined.

Figure 1. Conceptual representation of the two-level diagnostic classification model.
For the diagnostic measurement, we employed the latent class formulation and notations utilized in Yamaguchi and Okada (2020a) and Yamaguchi and Templin (2022a, 2022b). Attribute mastery patterns
Next, we define the probability of the
Using Gj
and
Assuming conditional independence and exchangeability, the conditional probability of
This is the measurement model of the diagnostic classification approach.
The measurement model of the latent class model assumes unobserved latent groups underlying several categorical indicators (e.g., White & Murphy, 2014). The random categorical indicator
The conditional probability of
The final part is the structural model between the individual attribute mastery pattern indicator
where
where
Furthermore, priors should be set for all parameters. Considering conditional conjugacy, the correct item response probability parameter
where
where
These settings provide the following joint distribution of observed and latent variables and parameters:
This joint distribution was employed to derive the VB inference and Gibbs sampling algorithms, as described in Sections 2.2 and 2.3, respectively.
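The generative structure implied by the three model elements can be sketched as follows: draw a latent class, draw an attribute mastery pattern given that class, and then draw the item responses and the class indicators. This is a simplified sketch under stated assumptions: the names `pi`, `tau`, `theta`, and `lam` and all numeric values are hypothetical, and the item response probabilities are indexed directly by pattern rather than through a Q-matrix.

```python
import random

def draw_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    u = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off

def simulate_respondent(pi, tau, theta, lam, rng):
    """Simulate one respondent: class c, pattern l, item responses x, indicators y."""
    c = draw_categorical(pi, rng)       # exogenous latent class
    l = draw_categorical(tau[c], rng)   # mastery pattern given the class
    x = [1 if rng.random() < theta[j][l] else 0 for j in range(len(theta))]
    y = [1 if rng.random() < lam[h][c] else 0 for h in range(len(lam))]
    return c, l, x, y

# Hypothetical parameter values: 2 classes, 4 patterns, 3 test items, 2 indicators.
pi = [0.4, 0.6]
tau = [[0.60, 0.20, 0.15, 0.05], [0.05, 0.15, 0.20, 0.60]]
theta = [[0.2, 0.8, 0.2, 0.8], [0.2, 0.2, 0.8, 0.8], [0.1, 0.5, 0.5, 0.9]]
lam = [[0.9, 0.2], [0.3, 0.8]]

rng = random.Random(0)
c, l, x, y = simulate_respondent(pi, tau, theta, lam, rng)
```

Repeating `simulate_respondent` for each respondent yields a synthetic data set of the kind used in a parameter recovery study.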
2.2. VB Inference Algorithm
2.2.1. Basic principle of VB inference
The VB method (Bishop, 2006; Blei et al., 2017; Jeon et al., 2017; Nakajima et al., 2019; Yamaguchi & Okada, 2020a, 2020b) is a popular estimation method in machine learning and has also been employed in psychometrics. Maximum likelihood estimation is a gold standard parameter estimation method in psychometrics but is not always appropriate in situations with small sample sizes or complex models. Bayesian estimation with MCMC is also employed to obtain approximate posteriors but is computationally intractable in situations with large sample sizes or large numbers of parameters. The VB method is computationally tractable but provides Bayesian estimates and does so in a short estimation time. We briefly explain the concept of the VB method and provide a well-established formula to calculate the variational posteriors based on Bishop (2006).
In VB estimation, the variational posterior
This decomposition is called mean-field approximation. In addition, let
where
2.2.2. Construction of VB inference algorithm on two-level DCMs
In two-level DCMs, the variational posterior can be expressed as
The second line in Equation 15 can be naturally derived from the independence between latent variables
In addition to the variational posteriors, the evidence lower bound (ELBO; i.e., lower bound of log-likelihood) is required to evaluate model appropriateness and a stopping rule for the estimation algorithm. The ELBO can be calculated to assess convergence and is defined as
Here, the joint distribution
2.2.3. Variational posteriors and VB algorithm
To derive the variational posterior, we borrowed the results of Yamaguchi and Okada (2020a); then, the variational posterior of the correct item response probability parameter becomes a beta distribution again:
where the parameters are
Hereinafter, we take expectation with respect to the VB posteriors. The variational posterior of
where the parameters are
The variational posteriors of
where
Variational distribution
where “
where
Then,
and
Algorithm 1: VB Algorithm
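As an illustration of why a beta prior yields a beta variational posterior, the sketch below performs a generic conjugate update in which each response is weighted by the expected membership of the respondent in one latent group. It is a minimal sketch assuming a Beta(a0, b0) prior with hypothetical hyperparameters and hypothetical membership weights; it is not a transcription of the article's update equations.

```python
def beta_update(x, resp, a0=1.0, b0=1.0):
    """Conjugate update of a Beta(a0, b0) prior for one item and one latent group.

    x    : list of 0/1 responses to the item, one per respondent
    resp : list of expected membership weights in [0, 1] for the group
    """
    a = a0 + sum(r * xi for r, xi in zip(resp, x))        # weighted correct count
    b = b0 + sum(r * (1 - xi) for r, xi in zip(resp, x))  # weighted incorrect count
    return a, b

# Hypothetical responses and membership weights for four respondents.
x = [1, 1, 0, 1]
resp = [0.9, 0.8, 0.2, 0.5]

a, b = beta_update(x, resp)
posterior_mean = a / (a + b)
```

In a full VB algorithm, updates of this kind for the parameters alternate with updates of the membership weights until the ELBO stops increasing.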
2.3. Gibbs Sampling Algorithm
The results of the VB algorithm can be employed to derive the Gibbs sampling algorithm for two-level DCMs. The full conditional distributions are required to construct the Gibbs sampling algorithm and are easily obtained from the VB algorithm results simply by replacing the expectations with MCMC samples. For example, when the tth iteration of MCMC sample
Note that
The full conditional distribution of
Similarly, the full conditional distributions of
Finally, the joint full conditional distribution of zi
and
Assembling these elements, the Gibbs sampling algorithm for two-level DCMs is shown in Algorithm 2.
Algorithm 2: Gibbs Sampling Algorithm
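The idea of replacing expectations with sampled values can be sketched for a single item: given the currently sampled group memberships, the response probability of each group is drawn from a beta full conditional built from hard counts. This is a minimal sketch assuming a Beta(a0, b0) prior; the function name, hyperparameters, and data are hypothetical and do not reproduce the article's Algorithm 2.

```python
import random

def sample_item_probs(x, z, n_groups, rng, a0=1.0, b0=1.0):
    """Draw one response probability per latent group from its beta full conditional.

    x : list of 0/1 responses to the item, one per respondent
    z : list of currently sampled group memberships (integers in 0..n_groups-1)
    """
    draws = []
    for g in range(n_groups):
        n_correct = sum(1 for xi, zi in zip(x, z) if zi == g and xi == 1)
        n_incorrect = sum(1 for xi, zi in zip(x, z) if zi == g and xi == 0)
        draws.append(rng.betavariate(a0 + n_correct, b0 + n_incorrect))
    return draws

# Hypothetical responses and sampled memberships for six respondents.
x = [1, 1, 0, 1, 0, 0]
z = [0, 0, 0, 1, 1, 1]

rng = random.Random(42)
theta_draws = sample_item_probs(x, z, n_groups=2, rng=rng)
```

A complete sampler would cycle through draws of this kind for all parameters and then resample the latent memberships, storing the draws after a burn-in period.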
3. Simulation Study
3.1. Simulation Settings
The simulation study aimed to verify the parameter recovery of the developed VB algorithm and Gibbs sampling method. In other words, the objective of the simulation study was to confirm how the proposed estimation algorithms work. The simulation was conducted on a desktop computer with a Windows 10 Pro operating system, a 3.60-GHz Intel Xeon central processing unit, and 32.0 GB of RAM. The entire simulation code was written in R (Version 4.2.0; R Core Team, 2022). Four factors were manipulated: (1) sample size (200 or 2,000), (2) Q-matrix type (three or four attributes), (3) number of latent classes (three or four), and (4) item quality of X and Y (high or low, which refers to the discrimination power of items for the attribute masteries and latent classes). We also conducted a model comparison simulation between an ordinary general DCM and the proposed two-level DCM to check whether using additional latent classes improves attribute mastery pattern recovery. This comparative simulation was not the primary research objective of this study; thus, related simulation settings and results are presented in Online Supplementary Material C.
The sample size was either small (200) or large (2,000). Sessoms and Henson (2018), in their review of DCM applications, noted that sample sizes are generally large (median = 1,255) with large variation; therefore, we set 2,000 as the large sample size setting. The number of indicators used to define the exogenous latent class was fixed at 24. We controlled the severity of the condition by changing the number of latent classes. If the number of indicators were increased along with the number of latent classes, the information available to distinguish the latent classes would also increase; therefore, the number of indicators was held fixed. Furthermore, the number of latent classes was set as three or four because latent class analysis application studies in an educational setting have sometimes involved four classes (e.g., Lin & Tai, 2015; Toker & Green, 2021). As other examples, Tuominen et al. (2020) analyzed the achievement goal orientation of Finnish sixth- and seventh-grade students and revealed four types of time-invariant classes. Furthermore, Sideridis et al. (2021) applied a multilevel latent class analysis to the achievements of high school students and detected four latent classes connecting GAT science test results and background information about the students. We assumed the same situation as in the previous studies. Three and four latent classes are also considered in the real data analysis in the next section. We employed 24 indicators because 24 is a common multiple of three and four. Twelve is the least common multiple; however, in that case, only three indicators would be strongly connected to each latent class in the four latent class condition. This might be restrictive; thus, we used 24 indicators rather than 12.
Two types of Q-matrices are described in Table 1. The left and right parts of Table 1 show a three-attribute Q-matrix with 20 items and a four-attribute matrix with 22 items, respectively. These numbers of attributes may be small, but the famous Examination for the Certificate of Proficiency in English (e.g., Templin & Hoffman, 2013) data contain three attributes. In another DCM simulation study, Sen and Cohen (2021) also selected three attributes. A situation with four attributes is more challenging than one with three attributes.
Table 1. Three (Left) and Four (Right) Attributes of Q-Matrices
The true
The true
The maximum number of iterations of the VB approach was 1,000, and the convergence criterion was
The bias and root-mean-square error (RMSE) were calculated for each parameter. The bias of the mth parameter
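For readers who want to reproduce this kind of summary, the sketch below computes bias and RMSE for one parameter across simulation replications using the conventional definitions: bias is the mean of the estimation errors, and RMSE is the square root of the mean squared error. The function name and the example values are hypothetical, and the conventional definitions are assumed to match those used here.

```python
import math

def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of replicated estimates of a single parameter."""
    errors = [estimate - true_value for estimate in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(error ** 2 for error in errors) / len(errors))
    return bias, rmse

# Hypothetical estimates from four replications with a true value of 0.8.
estimates = [0.78, 0.82, 0.80, 0.76]
bias, rmse = bias_and_rmse(estimates, true_value=0.8)
```

A small bias combined with a larger RMSE indicates that the estimates are centered near the true value but vary from replication to replication.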
3.2. Results
Table 2 summarizes the means and standard deviations of the estimation times of the VB and Gibbs sampling algorithms, together with the ratio of the mean estimation times of the two methods. In this simulation, the VB estimation time was affected by the item quality and sample size, where a lower item quality and larger sample size increased the estimation time. For a sample size of 200, the VB estimation was completed within 3 seconds on average. Even with a sample size of 2,000 and low item quality, the estimation took ∼20 seconds at most. The time taken by the Gibbs sampling algorithm, in contrast, was mainly affected by the sample size. Specifically, the Gibbs sampling algorithm took approximately 60 seconds and 360–420 seconds when the sample size was 200 and 2,000, respectively. In this simulation study, the VB algorithm was at least ∼20 times faster than the Gibbs sampling method. Under some conditions, such as a sample size of 200 and high item quality, the VB method was more than 100 times faster than the MCMC method. We also checked the mean number of iterations at which the VB algorithm terminated under each condition; the result is shown in Online Supplementary Material D. The values were considerably smaller than the maximum number of iterations (1,000), indicating that VB estimation finished before reaching the maximum. Therefore, we can conclude that VB is considerably faster than the MCMC method.
Table 2. Estimation Times of Variational Bayesian (VB) and Gibbs Sampling Methods
Table 3 summarizes the biases and RMSEs of correct item response probability parameter Θ. The VB and Gibbs sampling methods exhibited similar small biases, where a sample size of 2,000 produced smaller biases than that of 200. The sample size of 200 and lower item quality yielded small negative biases (–0.011 to –0.018). However, the other cases exhibited biases of less than
Table 3. Biases and Root-Mean-Square Errors (RMSEs) of Correct Item Response Probability Parameter Θ of Variational Bayesian (VB) and Gibbs Sampling Methods
Table 4 presents the biases and RMSEs of relationship parameter
Table 4. Biases and Root-Mean-Square Errors (RMSEs) of Relationship Parameter Τ of Variational Bayesian (VB) and Gibbs Sampling Methods
Table 5 displays the biases and RMSEs of population mixing parameter
Table 5. Biases and Root-Mean-Square Errors (RMSEs) of Population Mixing Parameter π of Variational Bayesian (VB) and Gibbs Sampling Methods
Table 6. Biases and Root-Mean-Square Errors (RMSEs) of Response Probability Parameter Λ for Variational Bayesian (VB) and Gibbs Sampling Methods
Figure 2 depicts the recovery of attribute mastery when the Q-matrix has four attributes. The attributes and attribute mastery patterns are generally well recovered, and the two estimation methods provide almost identical results. Lower quality items reduce the attribute recovery rate. Figure 3 presents the latent class recovery results obtained when there are four latent classes. Again, the latent classes are well recovered with both methods. Note that the results are almost the same whether the Q-matrix has three or four attributes and whether there are three or four latent classes. Therefore, the attribute recovery results for the three-attribute Q-matrix and the latent class recovery results for three latent classes are omitted.
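Recovery rates of this kind are commonly computed as agreement proportions between true and estimated classifications. The sketch below computes an attribute-wise rate and a pattern-wise rate; the function name and example patterns are hypothetical, and the exact definition used for the figures is assumed rather than taken from the article.

```python
def recovery_rates(true_patterns, estimated_patterns):
    """Return (attribute-wise rates, pattern-wise rate) as agreement proportions."""
    n = len(true_patterns)
    n_attributes = len(true_patterns[0])
    attribute_rates = [
        sum(t[k] == e[k] for t, e in zip(true_patterns, estimated_patterns)) / n
        for k in range(n_attributes)
    ]
    pattern_rate = sum(
        t == e for t, e in zip(true_patterns, estimated_patterns)
    ) / n
    return attribute_rates, pattern_rate

# Hypothetical true and estimated mastery patterns for four respondents.
true_patterns = [(1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
estimated_patterns = [(1, 0, 1), (0, 1, 0), (1, 1, 1), (0, 1, 1)]

attribute_rates, pattern_rate = recovery_rates(true_patterns, estimated_patterns)
```

The pattern-wise rate can never exceed the smallest attribute-wise rate, because a pattern counts as recovered only when every attribute in it is classified correctly.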

Figure 2. Attribute mastery recovery rate in the case of four attributes.

Figure 3. Latent class recovery rate in the case of four classes.
In summary, the VB method is faster than the Gibbs sampling method, and the two estimation methods are unbiased. The Gibbs sampling method yields more precise and stable results than the VB method. However, both methods provide satisfactory recovery and effectively recover the individual latent variables that are attributes and latent classes. Nevertheless, both methods have advantages and disadvantages; thus, the estimation method should be selected considering these aspects.
4. Real Data Analysis
4.1. Data Analysis Settings
Fourth-grade data from TIMSS 2007 were employed to demonstrate the proposed two-level DCM. Specifically, 25 items from Booklets 4 and 5 of the TIMSS 2007 fourth-grade mathematics assessment were selected for the DCM measurement, which is consistent with previous studies (Lee et al., 2011; Yamaguchi & Okada, 2018). The Q-matrix included in the CDM package (George et al., 2016) is summarized in Table S4 in Online Supplementary Material D. Three domains of test items were used as attributes for simplicity: number, geometric shapes and measures, and data display. The original set defined by Lee et al. (2011) contained 15 attributes, but it was too complex; therefore, we reduced the number of attributes. We selected country data from the United States and Canada, including Massachusetts, Minnesota, Alberta, British Columbia, Ontario, and Quebec.
As indicators of the latent classes, we selected 19 items from the student questionnaires on mathematics in school obtained from TIMSS 2007 Supplemental 1 pp. 16–18 (Foy & Olson, 2009). The items were from AS4MAWEL (“I usually do well in mathematics”) to AS4MHCOM (“I use a computer”). The first eight items were assessed with a four-point Likert-type scale (1 = agree a lot, 2 = agree a little, 3 = disagree a little, and 4 = disagree a lot), and the remaining 12 items were assessed on another four-point Likert-type scale with different labels (1 = every or almost every lesson, 2 = about half the lessons, 3 = some lessons, and 4 = never). We recoded the original four-point responses into dichotomous variables, treating 1 and 2 as 1 and treating 3 and 4 as 0. Therefore, a recoded value of 1 means agreement regarding attitude toward mathematics or performing the mathematical activity in at least about half of the lessons. Note that dichotomizing Likert items leads to loss of information, and we do not believe the procedure is always appropriate. Without dichotomization, however, the interpretation of latent classes based on the estimated item response probabilities became complex and the graph became very messy. Therefore, for the sake of simplicity, we dichotomized the questionnaire items in this study, although latent classes with polytomous responses could be incorporated into the two-level DCMs. The individuals with missing values were eliminated, and the total sample size was 1,061.
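The recoding and listwise deletion described above can be sketched as follows. The function names and the toy responses are hypothetical; missing values are represented here by `None`, which is an assumption about the data format rather than a description of the TIMSS files.

```python
def dichotomize(response):
    """Recode a four-point response: 1 or 2 becomes 1, 3 or 4 becomes 0."""
    if response in (1, 2):
        return 1
    if response in (3, 4):
        return 0
    return None  # anything else is treated as missing

def recode_and_drop_missing(rows):
    """Dichotomize every response and drop respondents with any missing value."""
    recoded = []
    for row in rows:
        new_row = [dichotomize(value) for value in row]
        if None not in new_row:
            recoded.append(new_row)
    return recoded

# Hypothetical responses of three students to three questionnaire items.
raw = [[1, 4, 2], [3, None, 1], [2, 2, 3]]
clean = recode_and_drop_missing(raw)
```

In this toy example the second student is removed because one response is missing, leaving two fully observed binary response vectors.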
Preliminary latent class analysis was conducted to determine the number of latent classes using the poLCA package (Linzer & Lewis, 2011). In addition, we fit two to six classes in the two-level DCMs to determine the number of latent classes. The maximum number of iterations in the VB and MCMC approaches was 500 and 8,000, respectively. The other estimation settings, such as the hyperparameter settings, were the same as those in the simulation study. Both estimation methods were compared in terms of whether they provided similar results. Furthermore, we determined the latent class results and relationships between the latent classes and attribute mastery patterns. Data analysis codes and data are available from the Open Science Framework (OSF: https://osf.io/6nxtc/) page.
4.2. Results
Table 7 presents the fit indices of the preliminary latent class analysis results. The Bayesian information criterion (BIC) indicates that four classes were the best, and the other indices, such as the Akaike information criterion (AIC) or G², continued to decrease with an increasing number of latent classes. However, the changes in these values became small after four classes.
Table 7. Preliminary Analysis to Determine the Number of Latent Classes
We also calculated AIC and BIC of the two-level DCMs based on the VB estimation results, as shown in Table 8. AIC indicated that having four classes was the best, while BIC indicated three. ELBO indicated a similar tendency in the log-likelihood shown in Table 7; the change in ELBO became small after four classes. Therefore, combining the results from Tables 7 and 8, three or four classes are possible. We also considered the interpretability of the latent class and thus selected four classes in this study. The possibility of different numbers of classes will be discussed later.
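For reference, the sketch below evaluates AIC and BIC from a log-likelihood, a parameter count, and a sample size using the standard definitions, AIC = −2 log L + 2k and BIC = −2 log L + k log N. The log-likelihood and parameter count in the example are hypothetical placeholders; how the log-likelihood is obtained from the VB results is not reproduced here.

```python
import math

def aic(log_likelihood, n_parameters):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: -2 log L + k log N."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)

# Hypothetical log-likelihood and parameter count; N = 1,061 as in the real data.
example_aic = aic(log_likelihood=-100.0, n_parameters=10)
example_bic = bic(log_likelihood=-100.0, n_parameters=10, n_observations=1061)
```

Because log N exceeds 2 whenever N is at least 8, BIC penalizes additional parameters more heavily than AIC at this sample size, which is consistent with BIC favoring fewer classes than AIC.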
Table 8. Information Criteria of the Two-Level Diagnostic Classification Models Based on the Variational Bayesian Results
Next, we specified the four latent classes in two-level DCMs and estimated model parameters using the VB and Gibbs sampling methods. The trace plot obtained after a burn-in period, available on OSF, indicated that no systematic trend existed, and we judged that the MCMC chains had converged. Figure 4 provides a collection of scatter plots of the parameter estimates obtained with VB and Gibbs sampling methods for four parameter sets. All correlations between the two methods are greater than .99; thus, the parameter estimates resulting from both methods are almost the same. Therefore, the parameter estimation results of the VB method are reported hereafter.
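An agreement check of this kind can be sketched with a plain Pearson correlation between the two sets of point estimates. The estimate vectors below are hypothetical and serve only to show the computation.

```python
import math

def pearson_correlation(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return covariance / math.sqrt(var_a * var_b)

# Hypothetical point estimates of the same four parameters from each method.
vb_estimates = [0.10, 0.35, 0.62, 0.90]
gibbs_estimates = [0.11, 0.34, 0.63, 0.89]

r = pearson_correlation(vb_estimates, gibbs_estimates)
```

A high correlation shows that the two methods order the parameters in the same way; a scatter plot against the identity line additionally reveals any systematic offset between them.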

Figure 4. Scatter plots of variational Bayesian and Gibbs sampling methods (Markov chain Monte Carlo) for Θ (upper left panel), Λ (upper right panel), Τ (lower left panel), and π (lower right panel).
Figure 5 presents the estimates of latent class response probability parameter

Figure 5. Estimated parameter Λ of four latent classes with variational Bayesian estimation. Note. Each number represents the corresponding class.
The second class exhibits moderate response probabilities for almost all items. It produces relatively high probabilities for the 3rd (AS4MACLM: “Mathematics is harder for me than for many of my classmates”) and 5th (AS4MANOT: “I am just not good at mathematics”) items and lower probabilities for the 1st, 6th (AS4MAQKY: “I learn things quickly in mathematics”), 9th (AS4MHASM: “I practice adding, subtracting, multiplying, and dividing without using a calculator”), 14th, and 17th items. This class disliked mathematics and did not actively work in the classroom.
The third and fourth classes tend to have high probabilities for the first, second, fourth, sixth, and eighth items, implying that both groups like mathematics. However, these classes show discrepancies from the ninth item onward. The third class tends to have lower probabilities for the latter half of the items than the fourth class. Large discrepancies are evident for items such as the 10th (AS4MHWFD: “I work on fractions and decimals”), 11th (AS4MHMCL: “I measure things in the classroom and around the school”), and 15th (AS4MHWSG: “I work with other students in small groups”) items. These items indicate engagement with the mathematical activities in the class. In summary, the fourth class likes mathematics and engages in mathematical activities, and the third class also likes mathematics but does not engage in mathematical activities.
Figure 6 depicts the estimates of relationship matrix

Figure 6. Estimates of relationship matrix Τ for four latent classes with variational Bayesian estimation. Note. Each number represents the corresponding class.
The third class, which liked mathematics but did not engage in mathematical activities, was strongly connected to mastering all three attributes (111) and to mastering only data display (001). Students in this class mastered at least one attribute, and many of them mastered all attributes. The fourth class, which liked mathematics and engaged in mathematical activities, tended to master none of the attributes (000), only data display (001), or all three attributes (111). This finding is interesting because it suggests that mathematical activities in the classroom might not have a strong effect on mastering attributes. Another interpretation is that the mathematical activities in the classroom might be appropriate for students with at most moderate mathematical abilities, and the students who had to engage in the activities did not have high mathematical skills.
In summary, students who felt efficacy or liked mathematics tended to master more attributes, whereas subjective engagement in mathematical activities might not affect attribute mastery. Interestingly, liking mathematics and being actively engaged in classroom activities were associated with opposite attribute mastery statuses. Furthermore, students who disliked mathematics tended to master fewer attributes, so intervening in this class could be effective.
5. Discussion and Conclusion
In this study, two-level DCMs and the VB and Gibbs sampling methods for estimating them were developed. A simulation study showed that both estimation methods provide sufficiently accurate parameter recovery, but the Gibbs sampling method is slightly better than the VB method. However, the VB method is much faster than the Gibbs sampling method. A real data analysis example involving the application of these methods to TIMSS 2007 fourth-grade mathematics data demonstrated how external latent classes are related to attribute mastery patterns. The connection between the external latent class and attribute mastery pattern revealed that the attitudes of students toward mathematics and their engagement in mathematical activities were connected to different attribute mastery patterns. This information can be employed to make teaching plans that consider not only attribute mastery patterns but also factors such as mathematical attitudes.
The simulation study demonstrated that both the proposed estimation methods correctly recovered the model parameters. However, the Gibbs sampling method provided smaller biases and RMSEs than the VB method under some conditions, although the estimation speed of the VB method was much faster than that of Gibbs sampling. Based on these results, the VB method is appropriate for model exploration, and the Gibbs sampling method should be employed in the final estimation step. The estimation methods can be flexibly changed based on the purpose of data analysis and the available computational environment.
The proposed estimation methods were both Bayesian estimation methods. Maximum likelihood estimation is also an alternative estimation method. The EM algorithm can be derived for the ML method because MAP estimation can also be easily derived from the complete data likelihood shown in Equation 12, as discussed in Online Appendix A. Furthermore, the ML method is available via general latent variable modeling software such as Mplus (Muthén & Muthén, 1998–2017) or Latent Gold (Vermunt & Magidson, 2005). Using general latent variable modeling software also extends two-level DCMs to include external covariates. These covariates can be employed to explain the relationship probabilities between latent classes and attribute mastery patterns. One limitation of the proposed Bayesian estimation is that it does not include additional external covariates. Hence, more flexible estimation methods are required in future research.
A two-level DCM is related to various models and can easily be extended. If the measured part of the DCM is the usual latent class model, then two-level DCMs become two-level latent class models (Miyazaki et al., 2007). Furthermore, if the exogenous latent classes are DCMs in the cross-sectional data collection case, then the model represents the strength of the connection between two separate attribute mastery patterns measured by two diagnostic assessments. This model can be used to explore two sets of attribute mastery patterns. For example, the relationships between mathematical attributes and English reading skill sets can be explored simultaneously. Note that there is an arbitrariness as to which attribute set is exogenous when there are no strong theoretical assumptions.
As an extension of these two cross-sectional DCMs, the latent transition or hidden Markov types of longitudinal DCMs (e.g., Chen et al., 2017; Madison & Bradshaw, 2018; Wang, 2021; Yamaguchi & Martinez, in press) can be introduced. Longitudinal DCMs include both endogenous and exogenous latent classes defined by DCM measures in repeated measurement situations. If the diagnostic measurements are performed more than twice, these models can address long-term changes in individual attribute mastery status.
Note that in this example, the exogenous latent class was defined by binary items; however, this measurement model can be changed. For example, polytomous category items or continuous indicators can be employed. Furthermore, general mixture models (e.g., McLachlan & Peel, 2000) can also be employed to define latent classes, which are also called mixture components in the mixture model context. In addition, external continuous observed/latent variables can be included to represent the strength of the connection between the latent class and attribute mastery patterns.
We need several notes on the parameterization of the two-level DCMs. The size of the
In addition, we need to mention the model specification to understand the structural parameter
As an additional note, if the purpose is only to diagnose students’ knowledge status, the external latent classes may not be necessary. It is important to carefully consider the meaning of “necessary” here. If we want to understand the relationship between the attribute mastery patterns and elements outside a diagnostic assessment, the latent classes may be informative. The external information sometimes helps improve the estimation precision of attribute mastery patterns, which was confirmed using an additional simulation, but this point is not the primary research topic of this study. We introduced the latent classes to surmise and deeply understand the attribute mastery patterns via the relationship probability parameters τ. The relationship probabilities help us understand which latent class is strongly connected to which attribute mastery pattern. Such information is not provided by the ordinary DCM framework; thus, we believe that the latent classes are necessary.
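How relationship probabilities link latent classes to attribute mastery patterns can be sketched numerically. The sketch assumes that `tau[c, l]` is the probability of attribute mastery pattern `l` given latent class `c` and that `pi[c]` is the latent class mixing proportion; these array names and all numbers are hypothetical and are not the estimates from the TIMSS analysis.

```python
import numpy as np

# Hypothetical setting: 2 latent classes, 2 attributes, hence 4 mastery patterns.
patterns = ["00", "10", "01", "11"]
pi = np.array([0.6, 0.4])                   # assumed class mixing proportions
tau = np.array([[0.5, 0.2, 0.2, 0.1],       # assumed P(pattern | class 1)
                [0.1, 0.1, 0.2, 0.6]])      # assumed P(pattern | class 2)

# Each row of tau is a probability distribution over mastery patterns.
assert np.allclose(tau.sum(axis=1), 1.0)

# Marginal distribution of mastery patterns, mixing over latent classes.
marginal = pi @ tau

# Pattern most strongly connected to each latent class.
strongest = [patterns[j] for j in tau.argmax(axis=1)]

# Reversed conditional P(class | pattern) by Bayes' rule; this is a derived
# quantity, whereas the model itself specifies the direction class -> pattern.
joint = pi[:, None] * tau
class_given_pattern = joint / marginal

print(marginal, strongest)
```

The reversed conditional is shown only to make the assumed direction of dependence explicit: the model parameterizes pattern given class, and the class-given-pattern probabilities follow from it rather than being specified directly.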
In addition, if the attribute mastery patterns determine the latent classes, the model specification in this study is wrong. In such a case, the relationship matrix is biased because the current model assumes
The interpretation of the
The identification of the two-level DCMs must also be discussed. Swapping of attribute mastery and switching of latent class labels may occur if identification conditions are not satisfied. For identification of diagnostic model parts, we need monotonicity constraints on the correct item response probabilities (e.g., Henson et al., 2009; Yamaguchi & Templin, 2022a). In our model formulation, monotonicity constraints were not employed. Instead, our MCMC method satisfies monotonicity constraints for each iteration. However, the VB method does not strictly satisfy monotonicity constraints, but the prior means were set to satisfy monotonicity relationships. Furthermore, one method to prevent label switching of the latent class part is to order parameter constraints on
In addition to the above discussion, one important aspect of model parameter identifiability is the completeness of the Q-matrix: The identity matrix whose size is
The identifiability of DCM parameters is a very important topic not only from a theoretical perspective but also in real data analysis, and the topic has been actively studied for the past decade. Related work by Xu and Shang (2017) proved identifiability conditions for both model parameters and Q-matrix elements. Identifiability conditions in the context of Q-matrix estimation are an important research topic within the field of DCMs. Moreover, Gu and Xu (2020) extended the identifiability conditions proposed by Xu (2017) under correct model specification and provided strict and partial identifiability conditions that are relatively easily checked. In addition, Chen et al. (2020) modified the model formulation of DCMs to directly utilize the Q-matrix and discussed generic identifiability conditions that are milder than strict identifiability. Culpepper (2023) further relaxed the identifiability condition that requires at least two identity sub-Q-matrices. These studies primarily focused on cross-sectional DCMs. Liu et al. (2023) established strict and generic identifiability conditions for hidden Markov type DCMs, including the estimation of the Q-matrix (see Theorem 3); thus, this work may provide insights into the identifiability of two-level DCMs. Although careful consideration is necessary, based on Liu et al. (2023), the model parameters of two-level DCMs may be identifiable if the population mixing parameter
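The Q-matrix conditions mentioned above can be checked mechanically. The following is a minimal sketch, assuming the usual definition of completeness (after row permutation, the Q-matrix contains an identity matrix with one row per attribute); the function names are hypothetical, and the check covers only this structural condition, not the full identifiability conditions of the cited studies.

```python
import numpy as np

def count_identity_blocks(Q):
    """Number of disjoint identity sub-matrices that can be formed from Q.

    Q: binary item-by-attribute matrix of shape (J, K).
    An identity block needs one single-attribute item per attribute, so the
    count equals the smallest number of single-attribute items over attributes.
    """
    Q = np.asarray(Q, dtype=int)
    single = Q[Q.sum(axis=1) == 1]      # items measuring exactly one attribute
    per_attribute = single.sum(axis=0)  # single-attribute items per attribute
    return int(per_attribute.min())

def is_complete(Q):
    """True if Q contains at least one identity sub-matrix."""
    return count_identity_blocks(Q) >= 1

# Hypothetical Q-matrices with two attributes.
Q_complete = [[1, 0], [0, 1], [1, 1], [1, 0]]
Q_incomplete = [[1, 0], [1, 1]]
print(is_complete(Q_complete), is_complete(Q_incomplete))
```

A condition that requires at least two identity sub-Q-matrices corresponds to `count_identity_blocks(Q) >= 2` in this sketch.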
In application, two-level DCMs offer a different perspective on educational intervention from ordinary DCMs. Ordinary DCMs have been employed to improve individual learning using diagnostic feedback based on attribute mastery probabilities or patterns, which is a type of educational intervention. Furthermore, ordinary DCMs can be used not only for individual self-learning but also in classroom teaching situations. For both situations, the attribute mastery probabilities and patterns are useful information for understanding an individual learner’s cognitive weaknesses and strengths. However, diagnostic information may not be sufficient in a classroom setting, and it does not explain why students do not master the assumed attributes. Furthermore, it may be naive to think that an educational intervention is equally effective for all students who belong to the same attribute mastery pattern. This consideration extends the well-known aptitude–treatment interaction (e.g., Cronbach & Webb, 1975; Snow, 1991). In other words, the effect of an intervention can differ within a specific attribute mastery pattern depending on the variability of students’ background situations.
The two-level DCMs can provide information about attribute mastery status together with other learning activities and individual traits. In the real data analysis, two-level DCMs revealed that students who have already been actively engaged in math class activities do not always master the attributes. This means that some students already participate in the math class but face difficulties in mastering some attributes. For such students, encouraging them to join classroom activities may not be effective for improving their mathematical abilities. Furthermore, the real data analysis also indicated that some students dislike mathematics, do not engage in the activities, and do not master the attributes. For such students, it may be a good choice to encourage them to join activities and let them enjoy mathematics. These latent classes indicated the variety in non-mastery status of attributes. In a classroom setting, it is difficult to customize the lecture for each student, and there is a need to focus on the latent classes that may be most effectively improved. In summary, two-level DCMs can be employed to explore the variability within an attribute mastery status using external information.
Another future research topic relates to determining the number of latent classes in the two-level DCMs. In the empirical data analysis, we fitted both classical latent class models and the proposed two-level DCMs with various numbers of latent classes. The results indicated that including the diagnosis items might change the number of classes. However, this may be the case only for this dataset. Therefore, we need to confirm how the number of latent classes is affected by considering the diagnosis items in future simulation research.
Supplemental Material
Supplemental Material, sj-docx-1-jeb-10.3102_10769986231173594 for Bayesian Analysis Methods for Two-Level Diagnosis Classification Models by Kazuhiro Yamaguchi in Journal of Educational and Behavioral Statistics
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This work was supported by Japan Society for the Promotion of Science KAKENHI 19H00616, 20H01720, 21H00936, and 22K13810.
References