A Bayesian model formulation of the deterministic inputs, noisy “and” gate (DINA) model is presented. Gibbs sampling is employed to simulate from the joint posterior distribution of item guessing and slipping parameters, subject attribute parameters, and latent class probabilities. The procedure extends concepts in Béguin and Glas, Culpepper, and Sahu for estimating the guessing and slipping parameters in the three- and four-parameter normal-ogive models. The ability of the model to recover parameters is demonstrated in a simulation study. The technique is applied to a mental rotation test. The algorithm and vignettes are freely available to researchers as the “dina” R package.
This article includes five sections. The first section introduces the DINA model and discusses benefits of the proposed Bayesian formulation in the context of prior research. The second section presents the Bayesian model formulation and full conditional distributions that are used to estimate person (i.e., latent skill/attribute profiles) and item parameters (i.e., slipping and guessing) using Markov Chain Monte Carlo (MCMC) with Gibbs sampling. The third section reports results from a simulation study on the accuracy of recovering the model parameters. The fourth section presents an application of the model using responses to a mental rotation test (Guay, 1976; Yoon, 2011). The final section includes discussion, future research directions, and concluding remarks.
The DINA Model
Overview
Throughout this article, let $i = 1, \dots, N$ index individuals, $j = 1, \dots, J$ index items, $k = 1, \dots, K$ index skills, and $c = 1, \dots, C$ index latent classes. Cognitive diagnosis models (CDMs) assume that a set of K known skills underlies each item. The skills needed to succeed on each item are collected into a J × K matrix Q, which is a structure matrix with elements that are akin to factor loadings relating skills to items. One exception is that the elements of Q are either 0 or 1 (i.e., $q_{jk} \in \{0, 1\}$), depending upon whether skill k is needed for success on item j. That is, $q_{jk}$ equals 1 if skill k is needed to correctly answer item j and 0 otherwise. Let $q_j$ be the jth row of Q, which indicates the requisite skills for item j.
The DINA model posits that individual test takers are classified into latent classes based upon mastery and nonmastery of the K skills. Let $\alpha_{ik}$ equal 1 if individual i has mastered skill k and 0 otherwise. Furthermore, let the K dimensional latent attribute profile of binary indicators for individual i be $\alpha_i = (\alpha_{i1}, \dots, \alpha_{iK})'$. Given the categorical nature of the latent classes, $\alpha_i$ can equal one of $C = 2^K$ attribute profiles. It will be useful subsequently to define $\alpha_c = (\alpha_{c1}, \dots, \alpha_{cK})'$ as the attribute profile for members of class c such that $\alpha_{ck}$ is 1 if members of class c possess skill k and 0 otherwise.
The DINA model is a conjunctive model, which implies that students need all of the skills in $q_j$ to correctly answer item j without guessing. We let the vector of ideal responses for student i be $\eta_i = (\eta_{i1}, \dots, \eta_{iJ})'$ with $\eta_{ij} = I(\alpha_i' q_j = q_j' q_j)$, where I denotes the indicator function. Notice that $\eta_{ij}$ is 1 if student i has the requisite skills to answer item j and 0 if the student is missing at least one skill. The probability of individual i’s success on item j for the DINA model is a function of individual latent class membership (i.e., $\alpha_i$) and item parameters,
$$P(Y_{ij} = 1 \mid \alpha_i, \Omega_j) = (1 - s_j)^{\eta_{ij}} g_j^{1 - \eta_{ij}}, \tag{1}$$
where $\Omega_j = (s_j, g_j)$ is a row vector of item parameters. By definition, $g_j = P(Y_{ij} = 1 \mid \eta_{ij} = 0)$ is the probability of correctly guessing item j for students who do not have the requisite skills needed for item j. In contrast, $s_j = P(Y_{ij} = 0 \mid \eta_{ij} = 1)$ is the probability of “slipping,” which is defined as the chance that students who have the required skills to answer item j record an incorrect response.
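The item response function described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the “dina” package; the function names `ideal_response` and `p_correct` are hypothetical, and profiles and Q-matrix rows are assumed to be sequences of 0/1 integers.

```python
from typing import Sequence


def ideal_response(alpha: Sequence[int], q: Sequence[int]) -> int:
    """Return 1 if profile `alpha` holds every skill required by row `q`, else 0."""
    return int(all(a >= r for a, r in zip(alpha, q)))


def p_correct(alpha: Sequence[int], q: Sequence[int], s: float, g: float) -> float:
    """DINA success probability: 1 - s if the ideal response is 1, g otherwise."""
    eta = ideal_response(alpha, q)
    return (1.0 - s) ** eta * g ** (1 - eta)
```

For example, a student with profile (0, 1, 0) facing an item that requires the first two skills is missing one skill, so the success probability reduces to the guessing parameter.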
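The DINA item response function can be used to construct a likelihood of observing a sample of item responses. Let $Y_i = (Y_{i1}, \dots, Y_{iJ})'$ denote a J dimensional vector of item responses for individual i, where $Y_{ij}$ is coded as 1 for a correct response and 0 for an incorrect response. Let $\Omega = (\Omega_1, \dots, \Omega_J)$ be a row vector of slipping and guessing parameters for J items. Furthermore, let $\theta_{cj} = P(Y_{ij} = 1 \mid \alpha_i = \alpha_c, \Omega_j)$ be the probability that members of class c correctly answer item j and note that $\alpha_i = \alpha_c$ implies that individual i is in attribute class c. Assuming that item responses are conditionally independent given $\alpha_i$, the probability of observing $Y_i$ given membership in class c (i.e., $\alpha_i = \alpha_c$) is,
$$p(Y_i \mid \alpha_i = \alpha_c, \Omega) = \prod_{j=1}^{J} \theta_{cj}^{Y_{ij}} (1 - \theta_{cj})^{1 - Y_{ij}}. \tag{2}$$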
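Let $\pi_c = P(\alpha_i = \alpha_c)$ be the marginal probability of belonging to class c in the population and define $\pi = (\pi_1, \dots, \pi_C)'$ as a C dimensional vector of class membership probabilities. The Law of Total Probability implies that the probability of observing $Y_i$ given item parameters Ω and latent class probabilities π is given by,
$$p(Y_i \mid \Omega, \pi) = \sum_{c=1}^{C} \pi_c \, p(Y_i \mid \alpha_i = \alpha_c, \Omega). \tag{3}$$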
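The likelihood of observing a sample of N responses to J items is given by,
$$p(Y \mid \Omega, \pi) = \prod_{i=1}^{N} \sum_{c=1}^{C} \pi_c \, p(Y_i \mid \alpha_i = \alpha_c, \Omega), \tag{4}$$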
where $Y = (Y_1, \dots, Y_N)'$ is an N × J matrix of observed responses.
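As a rough illustration of how the marginal likelihood is assembled, the sketch below enumerates all $2^K$ attribute profiles and sums over them for each respondent. The names `response_prob` and `log_likelihood` are hypothetical, the class ordering (lexicographic over 0/1 profiles) is an assumption, and the log of the likelihood is returned for numerical convenience.

```python
import itertools
import math


def response_prob(y, alpha, Q, s, g):
    """p(Y_i = y | alpha, Omega) assuming conditional independence across items."""
    prob = 1.0
    for j, q in enumerate(Q):
        eta = all(a >= r for a, r in zip(alpha, q))  # ideal response for item j
        p = 1.0 - s[j] if eta else g[j]              # success probability
        prob *= p if y[j] == 1 else 1.0 - p
    return prob


def log_likelihood(Y, Q, s, g, pi):
    """Log of the sample likelihood; pi[c] pairs with the c-th profile in
    lexicographic order (an assumed, illustrative class ordering)."""
    profiles = list(itertools.product([0, 1], repeat=len(Q[0])))
    total = 0.0
    for y in Y:
        marginal = sum(p_c * response_prob(y, a, Q, s, g)
                       for p_c, a in zip(pi, profiles))
        total += math.log(marginal)
    return total
```

With a single one-skill item, $s = 0.1$, $g = 0.2$, and equal class probabilities, a correct response has marginal probability $0.5 \times 0.2 + 0.5 \times 0.9 = 0.55$.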
In Bayesian formulations, the primary focus is on the parameter posterior distribution, which for the DINA model is,
$$p(\Omega, \pi \mid Y) \propto p(Y \mid \Omega, \pi) \, p(\Omega) \, p(\pi), \tag{5}$$
where p(Ω) is a prior distribution for item parameters and p(π) is a prior distribution for latent class probabilities. Maris (1999) noted the intractable nature of the CDM parameter posterior distributions and alluded to how Monte Carlo procedures could be used to estimate posterior distributions. Accordingly, subsequent research considered MCMC procedures for estimating posterior distributions of CDM parameters (de la Torre & Douglas, 2004; Hartz, 2002; Henson et al., 2009; Templin, 2004). Roussos, Templin, and Henson (2007) provide an overview of estimation methods (both Bayesian and non-Bayesian) prior researchers have employed to estimate CDM parameters. Furthermore, prior research discussed how applied researchers can estimate CDMs with standard statistical software (de la Torre, 2009b; Henson et al., 2009; Rupp, 2009; Rupp, Templin, & Henson, 2010).
Benefits of the Proposed Bayesian Formulation
The Bayesian model formulation for the DINA CDM developed in this article offers several benefits over existing procedures. First, the model formulation yields analytically tractable full conditional distributions for the person and item parameters that enable the use of Gibbs sampling for efficient computations. Prior applications of the DINA model and other CDMs employed Metropolis-Hastings (MH) sampling to estimate model parameters, which requires tuning of proposal distribution parameters for Ω and π. Consequently, one limitation of MH sampling is that researchers need to tune proposal parameters each time a new data set is analyzed. More specifically, researchers who use MH sampling to estimate DINA model parameters need to choose proposal parameters so that roughly 25% to 50% of candidates are accepted. Specifying tuning parameters may require only a few attempts for a given data set, but the process could become tedious if the goal is to disseminate software for widespread use by applied researchers who have less experience with MH sampling. In contrast, the procedures discussed subsequently employ Gibbs sampling from parameter full conditional distributions and can be easily implemented without manual tuning.
As another example, consider the additional requirements needed to employ MH sampling in a simulation study similar to the one reported subsequently. After developing estimation code and deciding upon simulation conditions, it would be necessary to select proposal parameters used to sample candidates for Ω and π. Proposal parameters must be tuned using trial and error, where a collection of chains is run with different proposal parameter values to ensure 25–50% of candidates are accepted. The tuning process would need to be repeated for every condition of the simulation study, which, depending upon the number of conditions, could require considerable effort to verify and catalogue proposal distribution parameter values. In contrast, employing the Gibbs sampler developed in this article would only require decisions about prior parameters. The procedure introduced subsequently employs uninformative uniform priors for Ω and π, so the developed algorithm can be immediately applied.
Second, the proposed formulation offers advantages over prior Gibbs samplers. For instance, prior CDM research employed Gibbs sampling to estimate the reparameterized DINA model using OpenBUGS (DeCarlo, 2012) and the higher order DINA (HO-DINA) model (de la Torre & Douglas, 2004) using WinBUGS (Li, 2008). However, one disadvantage is that OpenBUGS and WinBUGS do not explicitly impose the monotonicity restriction for the DINA model that $0 \le g_j < 1 - s_j \le 1$ for all items (Junker & Sijtsma, 2001). One consequence is that prior Gibbs samplers did not strictly enforce the identifiability condition, which could impact convergence in some applications. The model discussed subsequently extends the Bayesian formulation of the three-parameter normal-ogive (3PNO) model of Béguin and Glas (2001) and Sahu (2002) and the four-parameter model of Culpepper (in press) to explicitly enforce the monotonicity requirement. Accordingly, the developed strategy could be extended to other CDMs that include guessing and slipping parameters such as the NIDA, DINO, and RUM.
Third, one documented advantage of using Gibbs sampling is improved mixing of the chain in comparison to procedures based upon acceptance/rejection sampling such as MH sampling. That is, prior research suggests that MH sampling mixes best (i.e., the posterior distribution is efficiently explored) if candidates are accepted between 25% and 50% of the iterations (e.g., see chapter 6 of Robert & Casella, 2009). Accordingly, MH sampling may contribute to slower mixing among posterior samples, given that posterior samples change in less than half the total iterations. In contrast, the Gibbs sampler has the potential to improve mixing by sampling values from full conditional distributions. However, as noted by an anonymous reviewer, Gibbs sampling does not guarantee faster mixing for all models and parameters (e.g., see Cowles, 1996, for discussion regarding threshold parameters in polytomous normal-ogive models). The Monte Carlo evidence given subsequently suggests that the proposed MCMC algorithm converges to the parameter posterior distribution within 350 to 750 iterations. In contrast, prior research that employed Bayesian estimation (DeCarlo, 2012; Henson et al., 2009; Li, 2008) reported requiring a burn-in between 1,000 and 5,000 iterations to achieve convergence.
Fourth, as discussed subsequently, students’ latent attribute profiles are modeled using a general categorical prior. One advantage is that specifying a categorical prior provides a natural approach for computing the probability of skill acquisition and classifying students into latent classes (Huebner & Wang, 2011). That is, the posterior distribution for each student’s skill distribution can be summarized to show the probability of membership in the various latent classes.
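To make the classification idea concrete, the sketch below summarizes a set of sampled attribute profiles for one student into class membership probabilities and marginal skill mastery probabilities. It is an illustrative sketch only; `summarize_draws` is a hypothetical helper, and the draws are assumed to be post burn-in samples stored as tuples of 0/1 values.

```python
from collections import Counter


def summarize_draws(draws):
    """Summarize MCMC draws of one student's attribute profile.

    Returns (class_prob, skill_prob): the share of draws in each sampled
    profile and the share of draws in which each skill is mastered.
    """
    n = len(draws)
    class_prob = {profile: count / n for profile, count in Counter(draws).items()}
    n_skills = len(draws[0])
    skill_prob = [sum(d[k] for d in draws) / n for k in range(n_skills)]
    return class_prob, skill_prob
```

A student could then be assigned to the profile with the largest estimated probability, or each skill could be judged separately from its marginal mastery probability.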
Fifth, the developed algorithm is made freely available to researchers as the “dina” R package. The Gibbs sampler was written in C++ and provides an efficient approach for estimating model parameters. In addition, the R package includes documentation and vignettes to instruct applied researchers.
Bayesian Estimation of the DINA Model
This section presents a Bayesian formulation for the DINA model. The first section presents the model and describes the priors and the second section presents the full conditional distributions for implementation in an MCMC Gibbs sampler.
Bayesian Model Formulation
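The definitions for the DINA CDM imply the following Bayesian formulation,
$$Y_{ij} \mid \alpha_i, \Omega_j \sim \text{Bernoulli}\left((1 - s_j)^{\eta_{ij}} g_j^{1 - \eta_{ij}}\right), \tag{6}$$
$$p(\alpha_i \mid \pi) = \prod_{c=1}^{C} \pi_c^{I(\alpha_i = \alpha_c)}, \tag{7}$$
$$\pi \sim \text{Dirichlet}(\delta_0), \quad \delta_0 = (\delta_{01}, \dots, \delta_{0C})', \tag{8}$$
$$p(\Omega_j) \propto s_j^{\alpha_s - 1} (1 - s_j)^{\beta_s - 1} g_j^{\alpha_g - 1} (1 - g_j)^{\beta_g - 1} I\left((s_j, g_j) \in P\right), \quad P = \{(s, g) : 0 \le s + g < 1,\ 0 \le s < 1,\ 0 \le g < 1\}. \tag{9}$$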
Equation 6 assumes a Bernoulli item response function for $Y_{ij}$ when conditioned upon the latent attribute profile $\alpha_i$ and item parameters $\Omega_j$. In fact, Equation 6 simply restates the item response function in Equation 1. Equation 7 assumes a categorical prior for attribute vector $\alpha_i$, with the probability of membership in class c as $\pi_c$.
Equation 8 shows that the conjugate prior for π is a Dirichlet with parameter vector $\delta_0 = (\delta_{01}, \dots, \delta_{0C})'$. Recall that the Dirichlet distribution is a multivariate version of the beta distribution for random variables with support on the unit interval. The values of the elements within δ0 specify prior information about the prevalence of class membership in the population. An uninformative prior can be implemented by setting the elements of δ0 to 1 if there is no a priori information about π. Alternatively, researchers can modify the values within δ0 to reflect the relative prevalence of various attribute profiles. For instance, the marginal distribution for πc under the prior is a $\text{Beta}(\delta_{0c}, \sum_{c' \neq c} \delta_{0c'})$, and the values of δ0 can be chosen to reflect prior knowledge about the probability of observing attribute profile αc. Note that the results from the Monte Carlo simulation discussed in the next section provide evidence that the model is robust to employing an uninformative prior with $\delta_{0c} = 1$ for all c.
Equation 9 includes the joint prior for Ωj. Junker and Sijtsma (2001) established that the DINA model requires $0 \le g_j < 1 - s_j \le 1$ for all j so that each item provides information to discriminate among the latent attribute classes. The prior for Ωj in Equation 9 explicitly imposes the monotonicity restriction on the item parameters. In particular, the prior for Ωj is constructed by taking the product of Beta densities for sj and gj with parameters $(\alpha_s, \beta_s)$ and $(\alpha_g, \beta_g)$, respectively, and imposing the identifiability restriction with linear truncation on the space P of $(s_j, g_j)$ pairs that satisfy $0 \le s_j + g_j < 1$. Prior information regarding the slipping and guessing parameters can be incorporated into the estimation by choosing values for $\alpha_s$, $\beta_s$, $\alpha_g$, and $\beta_g$. Note that choosing an uninformative prior with $\alpha_s = \beta_s = \alpha_g = \beta_g = 1$ is identical to employing a linearly truncated bivariate uniform prior for sj and gj. The simulation results discussed in the next section support the use of uninformative priors for the slipping and guessing parameters.
Full Conditional Distributions
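The full conditional distributions for $\alpha_i$, π, and $\Omega_j$ are,
$$\alpha_i \mid Y_i, \pi, \Omega \sim \text{Categorical}(\tilde{\pi}_i), \quad \tilde{\pi}_i = (\tilde{\pi}_{i1}, \dots, \tilde{\pi}_{iC})', \tag{10}$$
$$\pi \mid \alpha_1, \dots, \alpha_N \sim \text{Dirichlet}(\tilde{N} + \delta_0), \tag{11}$$
$$p(\Omega_j \mid Y_{1j}, \dots, Y_{Nj}, \eta_{1j}, \dots, \eta_{Nj}) \propto s_j^{\tilde{\alpha}_s - 1} (1 - s_j)^{\tilde{\beta}_s - 1} g_j^{\tilde{\alpha}_g - 1} (1 - g_j)^{\tilde{\beta}_g - 1} I\left((s_j, g_j) \in P\right), \tag{12}$$
where $\tilde{\alpha}_s = S_j + \alpha_s$, $\tilde{\beta}_s = T_j - S_j + \beta_s$, $\tilde{\alpha}_g = G_j + \alpha_g$, and $\tilde{\beta}_g = N - T_j - G_j + \beta_g$, with $T_j = \sum_{i=1}^{N} \eta_{ij}$, $S_j = \sum_{i: \eta_{ij} = 1} (1 - Y_{ij})$, and $G_j = \sum_{i: \eta_{ij} = 0} Y_{ij}$.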
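Equation 10 shows that the full conditional for $\alpha_i$ is a categorical distribution with the probability of membership in category c as $\tilde{\pi}_{ic}$. Specifically, Bayes’s theorem implies that the probability that student i belongs to class c given $Y_i$, Ω, and π is,
$$\tilde{\pi}_{ic} = P(\alpha_i = \alpha_c \mid Y_i, \Omega, \pi) = \frac{\pi_c \, p(Y_i \mid \alpha_i = \alpha_c, \Omega)}{\sum_{c'=1}^{C} \pi_{c'} \, p(Y_i \mid \alpha_i = \alpha_{c'}, \Omega)}. \tag{13}$$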
The full conditional for $\alpha_i$ is a categorical distribution with probabilities $\tilde{\pi}_i = (\tilde{\pi}_{i1}, \dots, \tilde{\pi}_{iC})'$.
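The attribute-profile update can be sketched as follows. This is a minimal standard-library illustration rather than the package's C++ implementation; `class_posterior` and `sample_alpha` are hypothetical names, and the classes are assumed to be ordered lexicographically over 0/1 profiles.

```python
import itertools
import random


def class_posterior(y, Q, s, g, pi):
    """Return all profiles and P(alpha_i = alpha_c | Y_i, Omega, pi) for each class."""
    profiles = list(itertools.product([0, 1], repeat=len(Q[0])))
    weights = []
    for pi_c, alpha in zip(pi, profiles):
        like = 1.0
        for j, q in enumerate(Q):
            eta = all(a >= r for a, r in zip(alpha, q))  # ideal response
            p = 1.0 - s[j] if eta else g[j]
            like *= p if y[j] == 1 else 1.0 - p
        weights.append(pi_c * like)                      # prior times likelihood
    total = sum(weights)
    return profiles, [w / total for w in weights]


def sample_alpha(y, Q, s, g, pi, rng):
    """Draw one attribute profile from its categorical full conditional."""
    profiles, probs = class_posterior(y, Q, s, g, pi)
    return rng.choices(profiles, weights=probs, k=1)[0]
```

In a full sampler this draw would be repeated for every student within each MCMC iteration, which is why the cost grows with the number of latent classes.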
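Equation 11 reports the full conditional distribution for π. Specifically, the full conditional distribution for π can be derived as,
$$p(\pi \mid \alpha_1, \dots, \alpha_N) \propto p(\alpha_1, \dots, \alpha_N \mid \pi) \, p(\pi) \propto \prod_{c=1}^{C} \pi_c^{\tilde{N}_c + \delta_{0c} - 1}. \tag{14}$$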
Equation 14 shows that the full conditional for π is proportional to the product of the categorical prior for $\alpha_1, \dots, \alpha_N$ and the Dirichlet prior for π. Let the number of students within the C latent classes for a given MCMC iteration be denoted by the vector $\tilde{N} = (\tilde{N}_1, \dots, \tilde{N}_C)'$ such that $\tilde{N}_c = \sum_{i=1}^{N} I(\alpha_i = \alpha_c)$ is the number of subjects within attribute class c. Accordingly, the full conditional for the latent class probabilities in Equation 14 is $\text{Dirichlet}(\tilde{N} + \delta_0)$.
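The latent class probability update amounts to counting class memberships and drawing from a Dirichlet distribution. The sketch below uses the standard construction of a Dirichlet draw from normalized Gamma variates; `sample_pi` is a hypothetical name, and class memberships are assumed to be stored as integer labels from 0 to C - 1.

```python
import random


def sample_pi(class_labels, delta0, rng):
    """Draw pi from Dirichlet(N_tilde + delta0) using normalized Gamma variates."""
    counts = [0] * len(delta0)
    for c in class_labels:          # N_tilde: number of students per class
        counts[c] += 1
    gammas = [rng.gammavariate(n + d, 1.0) for n, d in zip(counts, delta0)]
    total = sum(gammas)
    return [x / total for x in gammas]
```

With every element of `delta0` set to 1, the draw reflects the uninformative prior described in the text.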
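Equation 12 indicates that the full conditional distribution for the item parameters sj and gj is a truncated bivariate Beta distribution. More precisely, the full conditional distribution for sj and gj is derived as,
$$p(s_j, g_j \mid Y_{1j}, \dots, Y_{Nj}, \eta_{1j}, \dots, \eta_{Nj}) \propto p(\Omega_j) \prod_{i=1}^{N} \left[(1 - s_j)^{Y_{ij}} s_j^{1 - Y_{ij}}\right]^{\eta_{ij}} \left[g_j^{Y_{ij}} (1 - g_j)^{1 - Y_{ij}}\right]^{1 - \eta_{ij}} \propto s_j^{\tilde{\alpha}_s - 1} (1 - s_j)^{\tilde{\beta}_s - 1} g_j^{\tilde{\alpha}_g - 1} (1 - g_j)^{\tilde{\beta}_g - 1} I\left((s_j, g_j) \in P\right). \tag{15}$$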
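The final result in Equation 15 was established by simplifying the product over individuals and collecting terms in the exponents for $s_j$, $1 - s_j$, $g_j$, and $1 - g_j$. Recall that $\tilde{\alpha}_s$, $\tilde{\beta}_s$, $\tilde{\alpha}_g$, and $\tilde{\beta}_g$ are defined after Equation 12. For a given MCMC iteration, let $T_j = \sum_{i=1}^{N} \eta_{ij}$ be the number of students with the attributes necessary to correctly answer item j, $S_j = \sum_{i: \eta_{ij} = 1} (1 - Y_{ij})$ be the number of students who unsuccessfully applied their skills, and $G_j = \sum_{i: \eta_{ij} = 0} Y_{ij}$ be the number of students who correctly guessed the answer to item j. The full conditional for sj and gj is a truncated Beta distribution with parameters $\tilde{\alpha}_s = S_j + \alpha_s$, $\tilde{\beta}_s = T_j - S_j + \beta_s$, $\tilde{\alpha}_g = G_j + \alpha_g$, and $\tilde{\beta}_g = N - T_j - G_j + \beta_g$. The Appendix describes an approach for jointly sampling sj and gj subject to the monotonicity restriction of $0 \le g_j < 1 - s_j \le 1$.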
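One simple way to draw from the truncated full conditional is rejection sampling: draw sj and gj from their untruncated Beta distributions and keep the pair only if it satisfies the monotonicity restriction. This is a stand-in chosen for brevity and is not the joint sampling approach of the Appendix; `sample_item_params` and `max_tries` are hypothetical names.

```python
import random


def sample_item_params(y_col, eta_col, rng, a_s=1.0, b_s=1.0, a_g=1.0, b_g=1.0,
                       max_tries=100000):
    """Draw (s_j, g_j) for one item by rejection sampling subject to s_j + g_j < 1.

    y_col holds the 0/1 responses to item j and eta_col the 0/1 ideal responses.
    """
    n = len(y_col)
    T = sum(eta_col)                                          # students with the skills
    S = sum(1 - y for y, e in zip(y_col, eta_col) if e == 1)  # slips
    G = sum(y for y, e in zip(y_col, eta_col) if e == 0)      # correct guesses
    for _ in range(max_tries):
        s = rng.betavariate(S + a_s, T - S + b_s)
        g = rng.betavariate(G + a_g, n - T - G + b_g)
        if s + g < 1.0:                                       # monotonicity restriction
            return s, g
    raise RuntimeError("no draw satisfied s + g < 1 within max_tries")
```

Rejection sampling is exact for the truncated distribution but can be slow when most of the untruncated mass violates the restriction, which is one reason a dedicated joint sampler may be preferable.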
Monte Carlo Simulation Study
This section presents Monte Carlo simulation results to demonstrate the accuracy of the MCMC algorithm and to provide additional information about convergence. The simulation study is similar to the designs of de la Torre and Douglas (2004), de la Torre (2009), and Huebner and Wang (2011). Specifically, the simulation considers a sample size $N$, $J = 30$ items, and $K = 5$ skills (i.e., $C = 2^5 = 32$ latent classes) with a Q matrix from de la Torre and Douglas (2004) and de la Torre (2009; e.g., Q′ is included in the x-axis of Figures 2 and 3). The simulation study considers three issues. First, following Huebner and Wang (2011), two conditions are studied for item guessing and slipping parameters to understand how the relationship between latent skills and items impacts parameter recovery. A high diagnosticity case (i.e., a stronger relationship between latent skills and items) uses smaller values of $s_j$ and $g_j$ for all j, whereas a low diagnosticity case uses larger values of $s_j$ and $g_j$ for all j.
Second, two cases are considered for latent class probabilities. Recall that π is the population skill distribution and πc denotes the proportion of the population with attribute profile αc. In the first case, the probability of class membership is constant with $\pi_c = 1/32$ for all c (i.e., the examinee skill distribution is flat). In the second case, a class with three or more skills is twice as probable as a class with zero, one, or two skills (i.e., a high skill distribution). That is, $\pi_c = 1/48$ for all αc with two or fewer skills and $\pi_c = 1/24$ for αc with three or more skills.
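The two skill distributions can be written out explicitly. The sketch below assumes five skills (32 classes, consistent with the 32-dimensional prior vector used in the study) and uses exact fractions; `skill_distribution` is a hypothetical helper.

```python
import itertools
from fractions import Fraction


def skill_distribution(n_skills, high=False):
    """Class probabilities for the flat case or the high skill case.

    In the high case, classes with three or more skills get twice the
    weight of classes with two or fewer skills.
    """
    profiles = list(itertools.product([0, 1], repeat=n_skills))
    weights = [2 if high and sum(a) >= 3 else 1 for a in profiles]
    total = sum(weights)
    return {a: Fraction(w, total) for a, w in zip(profiles, weights)}
```

With five skills there are 16 classes with two or fewer skills and 16 classes with three or more, so the weights sum to 48 and the class probabilities are 1/48 and 1/24, respectively.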
Third, the simulation study assessed the number of iterations needed to achieve convergence in the multivariate posterior distribution. For each replication, five chains were run and the multivariate potential scale reduction factor $\hat{R}$ was computed (Brooks & Gelman, 1998) using the R “coda” package (Plummer, Best, Cowles, & Vines, 2006) to assess convergence of Ω and π. Values of $\hat{R}$ less than 1.1 or 1.2 are considered evidence that the chain has converged (Brooks & Gelman, 1998). Note that $\hat{R}$ was computed for chain lengths from 50 to 2,500 to assess the burn-in needed to achieve convergence.
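For intuition about the diagnostic, the sketch below computes a simplified univariate potential scale reduction factor for a single parameter from several equal-length chains. It is not the multivariate statistic used in the study and omits the degrees-of-freedom correction applied by some software; `psrf` is a hypothetical name.

```python
import math
import statistics


def psrf(chains):
    """Simplified univariate potential scale reduction factor.

    chains: list of equal-length lists of draws for one parameter.
    """
    n = len(chains[0])
    within = statistics.mean(statistics.variance(c) for c in chains)          # W
    between = n * statistics.variance([statistics.mean(c) for c in chains])   # B
    pooled = (n - 1) / n * within + between / n                               # pooled variance estimate
    return math.sqrt(pooled / within)
```

Values near 1 indicate that the between-chain variability is small relative to the within-chain variability, whereas chains stuck in different regions produce values well above the 1.1 to 1.2 rule of thumb.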
The MCMC algorithm was implemented with 5,000 iterations, and the first 1,000 were discarded. Estimation accuracy was assessed by computing the expected values and root mean squared error (RMSE) for sj, gj, and π over 50 replications. Uninformative priors were used for the Beta and Dirichlet distributions with $\alpha_s = \beta_s = \alpha_g = \beta_g = 1$ and $\delta_0 = 1_{32}$, where $1_{32}$ is a 32-dimensional vector with elements equal to one.
Figure 1 provides evidence of convergence for the simulation posterior distributions. Specifically, Figure 1 includes eight panels plotting the Brooks–Gelman $\hat{R}$ for 50 replications. Figure 1 provides evidence that the diagnosticity condition impacts convergence, whereas the nature of the population skill distribution (i.e., high vs. flat) does not impact $\hat{R}$. For instance, in the high diagnosticity case, Figure 1 shows that 300 iterations were sufficient for $\hat{R}$ to fall below the convergence threshold. In contrast, Figure 1 provides evidence that in the low diagnosticity condition $\hat{R}$ fell below the threshold after roughly 750 iterations. Consequently, the Monte Carlo evidence suggests that employing a burn-in of 1,000 should be adequate for the simulation study to achieve convergence to the posterior distribution.
Figure 1. Plots of Brooks-Gelman multivariate potential scale reduction factor $\hat{R}$ for Monte Carlo simulation conditions. Note. Each panel includes plots of $\hat{R}$ from 50 replications. $\hat{R}$ is calculated from five independent chains with different starting values and assesses convergence of item parameters, Ω, and latent class probabilities, π. The plots are based upon chain lengths from 50 to 2,500 in increments of 50.
Figures 2 and 3 plot the Monte Carlo estimated expected values and RMSE of the slipping and guessing parameters across population values for sj, gj, and π. Figure 2 provides evidence that the proposed algorithm yielded minimal bias for item parameters across the eight simulation conditions. Figure 3 plots the slipping and guessing RMSE for the 30 items across the eight simulation conditions. Figure 2 demonstrated the algorithm yielded unbiased estimates of item parameters, which implies that the RMSE values in Figure 3 quantify sampling variability. Similar to prior research (de la Torre, 2009a), Figure 3 provides evidence that sampling variability for slipping parameters is associated with the number of requisite skills (i.e., $q_j' q_j$, the number of ones in $q_j$). The sampling variability of sj was nearly twice as large for some items that required three skills as opposed to items that required one skill. Furthermore, the sampling variability of sj was larger in cases where π had a flat versus high skill distribution. Consequently, the results in Figure 3 provide evidence of an interaction effect involving the number of ones in $q_j$ and the skill distribution on the sampling variability of slipping parameters. In contrast, Figure 3 suggests that the sampling variability for gj was relatively smaller for items that required more skills.
Figure 2. Monte Carlo expected slipping and guessing parameters for different population values for sj, gj, and π. Note. The skills required for each item are listed in the Q′ matrix along the x-axis. Expected values were estimated from 50 replications.
Figure 3. Monte Carlo root mean squared error (RMSE) of slipping and guessing parameters for different population values for sj, gj, and π. Note. The skills required for each item are listed in the Q′ matrix along the x-axis. RMSE values were estimated from 50 replications.
The RMSE for slipping parameters exceeded 0.04 for items that require three skills, which, given that bias is near 0, translates to a margin of error of approximately ±.08 (i.e., if the sampling distribution for slipping parameters with N = 1,000 is approximately normal). There are at least two interpretations. One possibility is that the proposed algorithm, while unbiased, may not be efficient. Alternatively, the observed sampling variability in Figure 3 could be characteristic of the DINA model. Note that de la Torre (2009) found a similar relationship between the requisite skills and the RMSE for item parameters with the exception that the maximum standard error for slipping and guessing parameters was 0.031 and 0.017, respectively, for N = 2,000 and a flat π distribution. That is, de la Torre (2009) found a negative relationship between the number of requisite skills, $q_j' q_j$, and the RMSE for gj and a positive relationship between $q_j' q_j$ and the RMSE for sj. An additional simulation study replicated de la Torre (2009) with 100 samples to gather evidence as to the plausibility of either explanation. In the follow-up simulation study, the maximum RMSE for slipping and guessing was 0.032 and 0.017, respectively, which provides evidence for the latter explanation that slipping (guessing) parameters are more variable for items that require more (fewer) skills.
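The margin of error quoted above can be reproduced with a short calculation, assuming a 95% confidence level (the level is an assumption; the text states only approximate normality and near-zero bias, so that the RMSE approximates the standard error):

```latex
$$\text{margin of error} \approx z_{0.975} \times \text{RMSE} \approx 1.96 \times 0.04 \approx 0.08$$
```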
Figures 4 and 5 present Monte Carlo evidence concerning the bias and RMSE for the latent class skill distribution π. Similar to Figure 2, the proposed algorithm yielded unbiased estimates of π across the simulation conditions. Figure 5 plots RMSE for π and provides information about sampling variability. Unlike Figure 3, Figure 5 shows that RMSE for π was largely unrelated to the diagnosticity condition and shape of the latent class skill distribution. However, in panel h of Figure 5 (i.e., the flat π and low diagnosticity case), the sampling variability of $\pi_c$ was elevated for less skilled latent classes.
Figure 4. Monte Carlo expected latent class skill distribution for different population values for sj, gj, and π. Note. $\alpha_c$ is included in the x-axis label, with markers for the absence and the presence of each skill. Expected values were estimated from 50 replications.
Figure 5. Monte Carlo root mean squared error (RMSE) of latent class skill distribution for different population values for sj, gj, and π. Note. $\alpha_c$ is included in the x-axis label, with markers for the absence and the presence of each skill. RMSE values were estimated from 50 replications.
In short, the Monte Carlo simulation study supports several conclusions. First, the new Gibbs sampler accurately estimates model parameters. Second, the algorithm is accurate when employing uninformative priors. Third, the results suggest that the model converges in a reasonable number of iterations.
Application: Revised Purdue Spatial Visualization Tests–Visualization of Rotations
This section reports an application of the DINA model to 388 responses to the Revised Purdue Spatial Visualization Tests–Visualization of Rotations (PSVT-R; see Maeda, Yoon, Kim-Kang, & Imbrie, 2013; Maeda & Yoon, 2013; Yoon, 2011). The PSVT-R is a revision of Guay (1976) and assesses test takers’ ability to mentally rotate three-dimensional objects along the x- and y-axes. Figure 6 includes a practice PSVT-R item. Each PSVT-R item shows a reference object before and after a rotation; subjects are then presented a new object and must determine which of the five options corresponds to its rotated version. For the practice item in Figure 6, subjects must be able to perform 90° rotations on both the x- and the y-axes (the correct answer is “D”). All items include x- and y-axes rotations with objects of varying complexity. Accordingly, the DINA model was applied to the PSVT-R by defining four mental rotation skills: (1) 90° x-axis (i.e., $\alpha_1$), (2) 90° y-axis (i.e., $\alpha_2$), (3) 180° x-axis (i.e., $\alpha_3$), and (4) 180° y-axis (i.e., $\alpha_4$). The defined skills imply a hierarchical structure. Namely, students who have mastery of 180° rotations should also be skilled at 90° rotations. Consequently, the hierarchical structure of the spatial rotation tasks requires only 9 of the 16 possible latent classes. More specifically, the following classes are omitted: “0010,” “0001,” “1001,” “0110,” “0011,” “1011,” and “0111,” because the 180° rotation skills appear without the corresponding 90° rotation skills.
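The set of permissible classes can be checked by enumeration. The sketch below assumes, as the list of omitted classes implies, that the third skill presupposes the first and the fourth presupposes the second; `permissible_classes` is a hypothetical helper.

```python
import itertools


def permissible_classes():
    """List the four-skill profiles that respect the hierarchy:
    skill 3 requires skill 1 and skill 4 requires skill 2."""
    keep = []
    for a1, a2, a3, a4 in itertools.product([0, 1], repeat=4):
        if a3 <= a1 and a4 <= a2:
            keep.append(f"{a1}{a2}{a3}{a4}")
    return keep
```

The enumeration retains nine profiles and drops the seven classes listed in the text.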
Figure 6. Practice item from the Revised Purdue Spatial Visualization Tests–Visualization of Rotations.
Figure 7 plots the Brooks-Gelman multivariate $\hat{R}$ to provide information about posterior convergence for Ω and π. $\hat{R}$ was calculated from five independent chains that employed random starting values. Specifically, starting values for sj and gj were drawn from uniform distributions, π was sampled from a uniform Dirichlet, and $\alpha_i$ was sampled from a uniform categorical distribution. The plot is based upon chain lengths from 50 to 2,500 in increments of 50. A standard rule of thumb is that convergence to a posterior distribution has been reached once $\hat{R}$ is below 1.1 or 1.2. Figure 7 provides evidence of convergence after 350 iterations. The following discussion interprets parameter estimates using a burn-in of 1,000 iterations.
Figure 7. Plot of Brooks-Gelman multivariate potential scale reduction factor, $\hat{R}$, for the Purdue Spatial Visualization Tests–Visualization of Rotations (PSVT-R) application. Note. $\hat{R}$ is based upon five independent chains with different starting values and assesses convergence of item parameters, Ω, and latent class probabilities, π. The plot is based upon chain lengths from 50 to 2,500 in increments of 50.
Table 1 reports the Q matrix and MCMC estimates of the posterior item parameters. First, note that the 30 PSVT-R items demonstrated variability in estimated guessing and slipping parameters. Specifically, $g_j$ ranged from 0.084 to 0.733 and $1 - s_j$ ranged from 0.384 to 0.961. Consequently, heterogeneity in object complexity contributed to heterogeneity in slipping and guessing parameters. For instance, students who have the requisite skills are able to answer 21 of the 30 items with greater than an 80% probability. In contrast, students with the requisite skills have less than a 50% probability of correctly answering Items 29 and 30 (i.e., $1 - s_{29} = 0.492$ and $1 - s_{30} = 0.384$). The ability of subjects to correctly guess with high probability is expected because some items use simpler shapes, and the distractors may be easier to detect (e.g., see the practice problem in Figure 6). For instance, the probability of guessing the correct answer on Items 1 and 2 was $g_1 = 0.617$ and $g_2 = 0.733$, respectively. In contrast, the probability of correctly guessing Item 30, which had a more complex shape, was nearly 0 with $g_{30} = 0.084$.
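Table 1. PSVT-R Q Matrix and MCMC Estimates of $g_j$ and $1 - s_j$

| Item | $\alpha_1$ | $\alpha_2$ | $\alpha_3$ | $\alpha_4$ | $g_j$ Mean | $g_j$ SE | $1 - s_j$ Mean | $1 - s_j$ SE |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0.617 | 0.089 | 0.944 | 0.017 |
| 2 | 0 | 1 | 0 | 0 | 0.733 | 0.047 | 0.920 | 0.018 |
| 3 | 0 | 1 | 0 | 0 | 0.654 | 0.053 | 0.961 | 0.013 |
| 4 | 0 | 1 | 0 | 0 | 0.676 | 0.049 | 0.886 | 0.021 |
| 5 | 0 | 0 | 1 | 0 | 0.627 | 0.041 | 0.950 | 0.018 |
| 6 | 0 | 0 | 1 | 0 | 0.603 | 0.041 | 0.928 | 0.021 |
| 7 | 0 | 1 | 0 | 0 | 0.410 | 0.066 | 0.914 | 0.020 |
| 8 | 0 | 1 | 1 | 0 | 0.560 | 0.037 | 0.765 | 0.034 |
| 9 | 1 | 1 | 0 | 0 | 0.419 | 0.045 | 0.862 | 0.024 |
| 10 | 0 | 0 | 0 | 1 | 0.567 | 0.039 | 0.902 | 0.024 |
| 11 | 1 | 1 | 0 | 0 | 0.545 | 0.044 | 0.881 | 0.023 |
| 12 | 0 | 0 | 0 | 1 | 0.389 | 0.042 | 0.925 | 0.022 |
| 13 | 1 | 1 | 0 | 0 | 0.476 | 0.043 | 0.871 | 0.024 |
| 14 | 1 | 1 | 0 | 0 | 0.377 | 0.043 | 0.853 | 0.025 |
| 15 | 0 | 1 | 0 | 0 | 0.303 | 0.050 | 0.700 | 0.032 |
| 16 | 1 | 1 | 0 | 0 | 0.387 | 0.042 | 0.843 | 0.027 |
| 17 | 0 | 1 | 1 | 0 | 0.502 | 0.039 | 0.849 | 0.028 |
| 18 | 1 | 1 | 0 | 0 | 0.398 | 0.044 | 0.757 | 0.029 |
| 19 | 0 | 0 | 0 | 1 | 0.421 | 0.039 | 0.870 | 0.029 |
| 20 | 1 | 1 | 0 | 0 | 0.378 | 0.043 | 0.848 | 0.026 |
| 21 | 0 | 0 | 0 | 1 | 0.336 | 0.039 | 0.847 | 0.030 |
| 22 | 0 | 0 | 0 | 1 | 0.402 | 0.039 | 0.813 | 0.030 |
| 23 | 1 | 0 | 0 | 1 | 0.423 | 0.036 | 0.830 | 0.033 |
| 24 | 0 | 0 | 1 | 0 | 0.404 | 0.041 | 0.760 | 0.035 |
| 25 | 0 | 1 | 1 | 0 | 0.356 | 0.037 | 0.722 | 0.037 |
| 26 | 0 | 1 | 1 | 0 | 0.293 | 0.035 | 0.704 | 0.039 |
| 27 | 1 | 0 | 0 | 1 | 0.420 | 0.036 | 0.820 | 0.033 |
| 28 | 0 | 1 | 1 | 0 | 0.268 | 0.034 | 0.660 | 0.038 |
| 29 | 1 | 1 | 0 | 0 | 0.209 | 0.034 | 0.492 | 0.034 |
| 30 | 0 | 1 | 1 | 0 | 0.084 | 0.022 | 0.384 | 0.037 |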
Note. The rotation skills are $\alpha_1$ = 90° x-axis, $\alpha_2$ = 90° y-axis, $\alpha_3$ = 180° x-axis, and $\alpha_4$ = 180° y-axis. PSVT-R = Purdue Spatial Visualization Tests–Visualization of Rotations; MCMC = Markov Chain Monte Carlo; SE = standard error.
It is important to recall that items with larger gj and sj provide less diagnostic information. The results in Table 1 suggest that several PSVT-R items have larger gj and sj. One concern with applying CDMs is misspecification of Q. The PSVT-R items require the skill to dynamically rotate objects, and the specified Q matrix is consistent with prevailing spatial cognition theory (Uttal et al., 2013). Consequently, additional research is needed to assess the PSVT-R item quality (e.g., revision of distractors or objects).
Table 2 reports information for π and shows that the majority of subjects were classified into two of the nine classes. Latent class “1111” was the most prevalent with an estimated class probability of 0.456, which provides evidence that nearly half of subjects were skilled at 90° and 180° rotations along the x- and y-axes. The second most prevalent latent class consisted of subjects who were not skilled at rotations (i.e., latent class “0000”), as evidenced by an estimated class probability of 0.188. The results in Table 2 also suggest that 24.6% of subjects were skilled at only 90° rotations (i.e., latent classes “1000,” “0100,” or “1100”). Finally, the results in Table 2 suggest that 2.3% of subjects were only skilled at x-axis rotations (i.e., “1010”), and 3.8% of subjects were only skilled at y-axis rotations (i.e., “0101”).
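Table 2. MCMC Posterior Class Probability Distribution for Spatial Rotation Test

| $\alpha_1$ | $\alpha_2$ | $\alpha_3$ | $\alpha_4$ | Mean | SE |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.188 | 0.053 |
| 1 | 0 | 0 | 0 | 0.083 | 0.051 |
| 0 | 1 | 0 | 0 | 0.073 | 0.029 |
| 1 | 1 | 0 | 0 | 0.089 | 0.028 |
| 1 | 0 | 1 | 0 | 0.023 | 0.018 |
| 0 | 1 | 0 | 1 | 0.038 | 0.016 |
| 1 | 1 | 1 | 0 | 0.029 | 0.019 |
| 1 | 1 | 0 | 1 | 0.018 | 0.015 |
| 1 | 1 | 1 | 1 | 0.456 | 0.033 |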
Note. The rotation skills are $\alpha_1$ = 90° x-axis, $\alpha_2$ = 90° y-axis, $\alpha_3$ = 180° x-axis, and $\alpha_4$ = 180° y-axis. MCMC = Markov Chain Monte Carlo; SE = standard error.
Discussion
This study presented a Bayesian formulation of the widely applied DINA model. An efficient Gibbs sampler was developed to estimate model parameters. The purpose of this section is to discuss the benefits of the current study in the context of existing research and to describe directions for future research.
First, the proposed formulation could be extended to estimate Q. CDMs assume that researchers are able to accurately articulate the skills needed to correctly respond to items in Q. Recent research considered statistical procedures for estimating Q (Chen, Liu, Xu, & Ying, 2014; Liu, Xu, & Ying, 2012, 2013) and for detecting a misspecified Q matrix (Chiu, 2013; de la Torre, 2008; Rupp & Templin, 2008). For instance, Chiu (2013) provides an overview of methods and procedures for refining Q. Furthermore, DeCarlo (2012) and Templin and Henson (2006a) described Bayesian approaches for modeling selected elements of Q as random rather than fixed. Note that Templin and Henson (2006a) employed MH sampling and DeCarlo (2012) used Gibbs sampling with OpenBUGS. The proposed Bayesian formulation can incorporate prior strategies for modeling elements of Q while retaining its advantages, which include (1) improved mixing of Ω and π, (2) enforcement of the monotonicity requirement, and (3) ease of implementation without tuning.
Second, the DINA model is a special case of more general CDMs (e.g., see de la Torre, 2011; Henson et al., 2009; von Davier, 2008). Future research could extend the Bayesian formulation developed in this study to more general CDM models. Specifically, the model formulation described in this article demonstrated how to exploit the fact that CDMs include a finite number of latent classes. Future research should consider implementing a similar strategy for estimating other CDMs.
Third, evidence from the Monte Carlo simulation and the application suggests that the proposed algorithm converged to the posterior distribution within a reasonable number of MCMC iterations. For instance, the results in Figures 1 and 6 suggest that the Gibbs sampler generally converged in fewer than 750 iterations. Furthermore, the simulation study provided evidence that the algorithm is accurate when using uninformative priors for Ω and π.
Computing the full conditional distribution over the latent classes is not without some disadvantages and “ … some simplification might be desired when the dimension of α is larger than or perhaps ” (de la Torre & Douglas, 2004, p. 337). The MCMC program was written in C++ using the Rcpp package (Eddelbuettel et al., 2011; Eddelbuettel, 2013) and completed 5,000 MCMC iterations with , , and in approximately 128 seconds using a laptop with a 2.4 GHz processor and 6 GB of RAM. However, the developed Gibbs sampler likely requires additional computational resources in comparison to procedures that approximate the latent class distribution. For example, de la Torre and Douglas (2004) employed concepts from nonlinear factor analysis to model higher order factors underlying latent skills. de la Torre and Douglas (2004) proposed the HO-DINA where skills are independent given a higher order dimensional continuous latent trait , such that . The HO-DINA approximates the latent classes with parameters for the relationships among the higher order factors and parameters for the latent intercepts and loadings if the factor model has simple structure. If K = 20, there are 2^20 = 1,048,576 latent classes over which the developed algorithm would loop for each student in every MCMC iteration. In contrast, employing an HO-DINA model with would require the estimation of 230 parameters and could, if the approximation is adequate, be computationally advantageous. Additional research is needed to understand best practice in applying CDMs in high-dimensional applications.
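The growth in the number of latent classes can be illustrated with a short sketch. It assumes only that each of the K skills is binary, so that there are 2^K attribute profiles. The function names and the sample size used below are hypothetical and are not taken from the "dina" package.

```python
from itertools import product


def n_latent_classes(K):
    """Number of attribute profiles when each of K skills is 0 or 1."""
    return 2 ** K


def attribute_profiles(K):
    """Enumerate all binary attribute profiles of length K (small K only)."""
    return list(product((0, 1), repeat=K))


def evaluations_per_sweep(n_students, K):
    """Class-probability evaluations in one Gibbs sweep if every
    student's full conditional is computed over all 2^K classes."""
    return n_students * n_latent_classes(K)


# Illustrative values only; n_students = 1000 is an assumed sample size.
for K in (4, 10, 20):
    print(K, n_latent_classes(K), evaluations_per_sweep(1000, K))
```

With K = 20 the count per student matches the 1,048,576 classes noted above, which is why an approximation to the latent class distribution may become attractive as K grows.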
In conclusion, a new Bayesian model formulation was presented for the DINA model. Research regarding CDMs remains a priority, and the methods developed in this article provide an alternative for easily estimating model parameters.
Appendix
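Sampling $s_j$ and $g_j$ With the Monotonicity Restriction
Because of the restriction $0 \le g_j < 1 - s_j \le 1$, the slipping and guessing parameters must be jointly sampled from the full conditional distributions to ensure the monotonicity condition, $g_j < 1 - s_j$. Let $f(s_j, g_j)$ denote the joint density function of $s_j$ and $g_j$, let $f(s_j)$ and $f(g_j)$ be the respective marginal density functions, and let $f(s_j \mid g_j)$ and $f(g_j \mid s_j)$ be the conditional distributions. For the model formulation discussed in this article, the joint distribution is defined as,
$$f(s_j, g_j) \propto s_j^{\tilde{\alpha}_s - 1} (1 - s_j)^{\tilde{\beta}_s - 1} g_j^{\tilde{\alpha}_g - 1} (1 - g_j)^{\tilde{\beta}_g - 1} \, I\big((s_j, g_j) \in \mathcal{P}\big),$$
where $\mathcal{P} = \{(s_j, g_j) : 0 \le g_j < 1 - s_j \le 1\}$, $(\tilde{\alpha}_s, \tilde{\beta}_s)$ and $(\tilde{\alpha}_g, \tilde{\beta}_g)$ are the Beta shape parameters of the full conditional distributions, and $I$ denotes the indicator function. Accordingly, the marginal distribution of $g_j$ is,
$$f(g_j) = \int_0^{1 - g_j} f(s_j, g_j) \, ds_j \propto g_j^{\tilde{\alpha}_g - 1} (1 - g_j)^{\tilde{\beta}_g - 1} F_{s_j}(1 - g_j),$$
where $F_{s_j}(1 - g_j)$ is the cumulative distribution function of $s_j$ evaluated at $1 - g_j$. Note that $f(g_j)$ is a continuous mixture distribution and is analytically intractable (Pham-Gia & Turkkan, 1998). The conditional distribution of $s_j$ given $g_j$ is,
$$f(s_j \mid g_j) = \frac{f(s_j, g_j)}{f(g_j)} \propto s_j^{\tilde{\alpha}_s - 1} (1 - s_j)^{\tilde{\beta}_s - 1} \, I(0 \le s_j < 1 - g_j),$$
which is a right-truncated Beta distribution at $1 - g_j$. A similar argument implies that $f(g_j \mid s_j)$ is a right-truncated Beta distribution at $1 - s_j$. Let $g_j^{(t)}$ and $s_j^{(t)}$ be the guessing and slipping parameters at iteration $t$. $g_j$ and $s_j$ can be sampled from the linearly truncated full conditional distribution in two steps,
1. Sample $g_j^{(t)}$ from $f(g_j \mid s_j^{(t-1)})$, a Beta distribution right-truncated at $1 - s_j^{(t-1)}$.
2. Draw $s_j^{(t)}$ from $f(s_j \mid g_j^{(t)})$, a Beta distribution right-truncated at $1 - g_j^{(t)}$.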
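The two-step update can be sketched in a few lines of Python. This is a minimal illustration rather than the C++ routine used in the article. It assumes that the full conditionals are Beta distributions with hypothetical shape parameters `a_g`, `b_g`, `a_s`, and `b_s`, and it swaps in simple rejection sampling for the truncated draws, which may be slow when the truncation point lies far in the lower tail.

```python
import random


def sample_right_truncated_beta(a, b, upper, rng, max_tries=100000):
    """Draw from Beta(a, b) restricted to [0, upper) by rejection.

    Rejection is an assumed, simple stand-in for whatever truncated
    sampler a production implementation would use.
    """
    if not 0.0 < upper <= 1.0:
        raise ValueError("upper must lie in (0, 1]")
    for _ in range(max_tries):
        x = rng.betavariate(a, b)
        if x < upper:
            return x
    raise RuntimeError("truncation region has too little mass for rejection")


def update_guess_slip(s_prev, a_g, b_g, a_s, b_s, rng):
    """One update of (g_j, s_j) that respects g_j < 1 - s_j."""
    # Step 1: sample g given the previous s, truncated at 1 - s_prev.
    g_new = sample_right_truncated_beta(a_g, b_g, 1.0 - s_prev, rng)
    # Step 2: draw s given the new g, truncated at 1 - g_new.
    s_new = sample_right_truncated_beta(a_s, b_s, 1.0 - g_new, rng)
    return g_new, s_new


# Illustrative run; the shape parameters and starting value are made up.
rng = random.Random(2015)
s = 0.2
draws = []
for _ in range(1000):
    g, s = update_guess_slip(s, a_g=3.0, b_g=12.0, a_s=2.0, b_s=14.0, rng=rng)
    draws.append((g, s))
```

Every stored pair satisfies the monotonicity restriction by construction, because each draw is accepted only if it falls below the bound implied by the other parameter.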
Acknowledgments
This article benefited from the comments and suggestions of Jeffrey Douglas, Aaron Hudson, Li Cai, and three anonymous reviewers. Any remaining errors belong to the author.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
References
Béguin, A. A., & Glas, C. A. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66, 541–561.
Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434–455.
Chen, Y., Liu, J., Xu, G., & Ying, Z. (2015). Statistical analysis of Q-matrix based diagnostic classification models. Journal of the American Statistical Association, 110, 850–866.
Chiu, C. (2013). Statistical refinement of the Q-matrix in cognitive diagnosis. Applied Psychological Measurement, 37, 598–618.
Cowles, M. K. (1996). Accelerating Monte Carlo Markov chain convergence for cumulative-link generalized linear models. Statistics and Computing, 6, 101–111.
Culpepper, S. A. (in press). Revisiting the 4-parameter item response model: Bayesian estimation and application. Psychometrika.
DeCarlo, L. T. (2012). Recognizing uncertainty in the Q-matrix via a Bayesian extension of the DINA model. Applied Psychological Measurement, 36, 447–468.
de la Torre, J. (2008). An empirically based method of Q-matrix validation for the DINA model: Development and applications. Journal of Educational Measurement, 45, 343–362.
de la Torre, J. (2009a). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34, 115–130.
de la Torre, J. (2009b). Estimation code for the G-DINA model. Paper presented at the meeting of the American Educational Research Association, San Diego, CA.
de la Torre, J. (2011). The generalized DINA model framework. Psychometrika, 76, 179–199.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333–353.
de la Torre, J., & Douglas, J. A. (2008). Model evaluation and multiple strategies in cognitive diagnosis: An analysis of fraction subtraction data. Psychometrika, 73, 595–624.
Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 301–321.
Hartz, S. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign.
Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191–210.
Huebner, A., & Wang, C. (2011). A note on comparing examinee classification methods for cognitive diagnosis models. Educational and Psychological Measurement, 71, 407–419.
Junker, B. W., & Sijtsma, K. (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272.
Li, F. (2008). A modified higher-order DINA model for detecting differential item functioning and differential attribute functioning (Unpublished doctoral dissertation). University of Georgia, Athens.
Liu, J., Xu, G., & Ying, Z. (2013). Theory of the self-learning Q-matrix. Bernoulli, 19, 1790.
Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational and Behavioral Statistics, 2, 99–120.
Maeda, Y., & Yoon, S. (2013). A meta-analysis on gender differences in mental rotation ability measured by the Purdue Spatial Visualization Tests: Visualization of rotations (PSVT:R). Educational Psychology Review, 25, 69–94.
Maeda, Y., Yoon, S. Y., Kim-Kang, G., & Imbrie, P. (2013). Psychometric properties of the revised PSVT:R for measuring first year engineering students’ spatial ability. International Journal of Engineering Education, 29, 763–776.
Pham-Gia, T., & Turkkan, N. (1998). Distribution of the linear combination of two general beta variables and applications. Communications in Statistics-Theory and Methods, 27, 1851–1869.
Plummer, M., Best, N., Cowles, K., & Vines, K. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6, 7–11. Retrieved from http://CRAN.R-project.org/doc/Rnews/
Robert, C., & Casella, G. (2009). Introducing Monte Carlo methods with R. New York, NY: Springer.
Roussos, L. A., DiBello, L. V., Stout, W., Hartz, S. M., Henson, R. A., & Templin, J. L. (2007). The fusion model skills diagnosis system. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 275–318). Cambridge, England: Cambridge University Press.
Roussos, L. A., Templin, J. L., & Henson, R. A. (2007). Skills diagnosis using IRT-based latent class models. Journal of Educational Measurement, 44, 293–311.
Rupp, A. A. (2009). Software for calibrating diagnostic classification models: An overview of the current state-of-the-art. Symposium conducted at the meeting of the American Educational Research Association, San Diego, CA.
Rupp, A. A., & Templin, J. L. (2008). The effects of Q-matrix misspecification on parameter estimates and classification accuracy in the DINA model. Educational and Psychological Measurement, 68, 78–96.
Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York, NY: Guilford Press.
Sahu, S. K. (2002). Bayesian estimation and model choice in item response models. Journal of Statistical Computation and Simulation, 72, 217–232.
Templin, J. L. (2004). Generalized linear mixed proficiency models (Unpublished doctoral dissertation). University of Illinois at Urbana-Champaign, Champaign.
Templin, J. L., & Henson, R. A. (2006a). A Bayesian method for incorporating uncertainty into Q-matrix estimation in skills assessment. Symposium conducted at the meeting of the American Educational Research Association, San Diego, CA.
Templin, J. L., & Henson, R. A. (2006b). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11, 287.
Uttal, D. H., Meadow, N. G., Tipton, E., Hand, L. L., Alden, A. R., Warren, C., & Newcombe, N. S. (2013). The malleability of spatial skills: A meta-analysis of training studies. Psychological Bulletin, 139, 352.
von Davier, M. (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287–307.
von Davier, M. (2014). The DINA model as a constrained general diagnostic model: Two variants of a model equivalency. British Journal of Mathematical and Statistical Psychology, 67, 49–71.
Yoon, S. Y. (2011). Psychometric properties of the revised Purdue Spatial Visualization Tests: Visualization of rotations (the revised PSVT-R) (Unpublished doctoral dissertation). Purdue University, West Lafayette.