Abstract
In the typical application of a cognitive diagnosis model, the Q-matrix, which reflects the theory with respect to the skills indicated by the items, is assumed to be known. However, the Q-matrix is usually determined by expert judgment, and so there can be uncertainty about some of its elements. Here it is shown that this uncertainty can be recognized and explored via a Bayesian extension of the DINA (deterministic input noisy and) model. The approach used is to specify some elements of the Q-matrix as being random rather than as fixed; posterior distributions can then be used to obtain information about elements whose inclusion in the Q-matrix is questionable. Simulations show that this approach helps to recover the true Q-matrix when there is uncertainty about some elements. An application to the fraction-subtraction data of K. K. Tatsuoka suggests a modified Q-matrix that gives improved relative fit.
The goal in cognitive diagnosis, as the name suggests, is to “diagnose” which skills examinees have or do not have. This approach potentially offers more useful feedback to examinees than a simple overall score derived from classical test theory or item response theory, and so cognitive diagnosis models (CDMs) have become popular in recent years (for reviews, see DiBello, Roussos, & Stout, 2007; Fu & Li, 2007; Rupp & Templin, 2008b). A useful CDM that has been widely studied is the DINA (deterministic input noisy and) model (Haertel, 1989; Junker & Sijtsma, 2001; Macready & Dayton, 1977).
The use of a CDM requires the specification of a Q-matrix (K. K. Tatsuoka, 1990); the Q-matrix indicates the set of skills that are required to answer a particular item correctly. The Q-matrix is usually determined by expert judgment, and so there can be uncertainty about some of its elements. Here, it is noted that this uncertainty can be recognized by using a Bayesian approach, with some elements of the Q-matrix specified as being random rather than as fixed. Posterior distributions can then be used to obtain information about the inclusion or exclusion of these elements, as shown in the following. The Bayesian approach allows one to explore possible modifications of the Q-matrix, ideally as suggested by expert judgment or by substantive considerations. It is shown that the approach is simple to implement, particularly with a reparameterized version of the DINA model. Simulations that explore benefits and limitations of the approach are presented, and the widely analyzed fraction-subtraction data of K. K. Tatsuoka (1990) are examined.
Uncertainty in the Q-Matrix
Most researchers who have worked with CDMs have recognized that there is often uncertainty with respect to at least some elements of the Q-matrix. For example, virtually every researcher who has analyzed the fraction-subtraction data has suggested possible modifications of the Q-matrix (e.g., DeCarlo, 2011; de la Torre, 2009; de la Torre & Douglas, 2008; Henson, Templin, & Willse, 2009; C. Tatsuoka, 2002; K. K. Tatsuoka, 1990), and so there is clearly uncertainty as to the correct specification of some elements. There are several approaches to this problem. One is to consider a set of alternative Q-matrices and to fit the models associated with these matrices; indices of relative fit, such as the Bayesian information criterion (BIC) or Akaike information criterion (AIC), can then be used to help determine the appropriate Q-matrix. This approach has been used by a number of researchers (e.g., Barnes, Bitzer, & Vouk, 2005; Cen, Koedinger, & Junker, 2005; DeCarlo, 2011; de la Torre & Douglas, 2008; Rupp & Templin, 2008a). A limitation of this approach, however, is that the number of possible Q-matrices grows exponentially as the number of uncertain elements increases: each uncertain element can be either zero or one, so $m$ uncertain elements give $2^m$ candidate Q-matrices. For example, the simulations below examine a situation where there is uncertainty about 12 elements of the Q-matrix (out of 60 elements), which gives $2^{12} = 4{,}096$ possible Q-matrices, and so a large number of models would need to be fitted and compared.
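As a rough illustration of this growth, the following minimal Python sketch enumerates every Q-matrix that an exhaustive search would have to fit when each uncertain element can be either 0 or 1. The helper `candidate_q_matrices` and the all-zero base matrix are hypothetical and serve only to count candidates; the uncertain cells mirror the 12-element case (all four skills for Items 1, 8, and 13).

```python
from itertools import product

def candidate_q_matrices(base_q, uncertain):
    """Yield every Q-matrix obtained by setting each uncertain cell to 0 or 1.

    base_q    -- list of rows (lists of 0/1): the fixed, expert-specified part
    uncertain -- list of (item, skill) index pairs whose value is in question
    """
    for values in product((0, 1), repeat=len(uncertain)):
        q = [row[:] for row in base_q]          # copy so base_q is not modified
        for (j, k), v in zip(uncertain, values):
            q[j][k] = v
        yield q

# Hypothetical 15 x 4 base matrix (contents do not matter for the count).
base = [[0] * 4 for _ in range(15)]
# All four skills uncertain for Items 1, 8, and 13 (0-based rows 0, 7, 12).
cells = [(j, k) for j in (0, 7, 12) for k in range(4)]
n_models = sum(1 for _ in candidate_q_matrices(base, cells))
print(len(cells), n_models)   # 12 4096
```

Each additional uncertain element doubles the count, which is why the search quickly becomes impractical.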
An approach to this problem was offered by de la Torre (2008), in which he suggested using a sequential search algorithm, which helps to avoid the large number of possibilities associated with an exhaustive search. In particular, a sequential search algorithm was used with a fit statistic that minimizes the sum of the average slip and guess parameters. There are, however, some limitations to that approach. For example, de la Torre noted that the approach is based on a particular fit statistic, δ, and that other statistics might be more useful or appropriate. Another problem is that the sequential search algorithm might not lead to the best solution; this is suggested below in a reanalysis of the fraction-subtraction data of K. K. Tatsuoka (1990).
Another way to recognize uncertainty is via a Bayesian approach, as has been noted by several researchers. For example, J. Templin and Henson (2006) used a Bayesian approach to allow for uncertain elements in the Q-matrix, in the same manner as done here. Similarly, Muthén and Asparouhov (2010) discussed using a Bayesian approach to recognize uncertainty in factor analysis models (e.g., to recognize uncertainty about factor loadings) and other (structural equation and regression) models. It is shown here that a Bayesian approach is useful and is simple to implement, particularly with a reparameterized version of the DINA model.
The DINA Model and a Reparameterization
The basic idea underlying the DINA model is that, for each item, certain skills are needed for an examinee to answer the item correctly. The skills are assumed to be either present or absent, and so the skills are latent dichotomous variables that take on values of zero or one, typically denoted as $\alpha_{ik}$ for examinee $i$ and skill $k$. The DINA model is a conjunctive model (Maris, 1999), in that it is assumed that an examinee must have all the required skills to answer an item correctly.
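Let $Y_{ij}$ be a binary variable that indicates whether the response of the $i$th examinee to the $j$th item is correct or incorrect (1 or 0), and let
$$p(Y_{ij} = 1 \mid \eta_{ij}) = (1 - s_j)^{\eta_{ij}}\, g_j^{\,1 - \eta_{ij}},$$
with
$$\eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}},$$
where $s_j$ and $g_j$ are the slip and guess parameters for item $j$, and $q_{jk}$ is the element of the Q-matrix for item $j$ and skill $k$ (1 if the skill is required, 0 otherwise). Substituting the second equation above into the first gives the DINA model,
$$p(Y_{ij} = 1 \mid \boldsymbol{\alpha}_i) = (1 - s_j)^{\prod_{k} \alpha_{ik}^{q_{jk}}}\, g_j^{\,1 - \prod_{k} \alpha_{ik}^{q_{jk}}}.$$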
For the Bayesian generalization, uncertainty is recognized by allowing some of the elements of the Q-matrix (i.e., the $q_{jk}$) to be random rather than fixed, as done by J. Templin and Henson (2006) and discussed in the following. The preceding form of the model, however, is somewhat complex (e.g., the $q_{jk}$ are exponents of exponents). A simpler (but equivalent) form of the model can be obtained by reparameterizing it as follows:
$$\operatorname{logit} p(Y_{ij} = 1 \mid \boldsymbol{\alpha}_i) = f_j + d_j \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}}. \tag{1}$$
Equation 1 is a reparameterization of the DINA model and has been referred to as the RDINA (reparameterized deterministic input noisy and) model (DeCarlo, 2011). The parameter $f_j$ is the false alarm rate on a logit scale (in signal detection theory [SDT] terms, the guess parameter $g_j$ is a false alarm rate, and $f_j = \operatorname{logit}(g_j)$), whereas $d_j$ is a discrimination (detection) parameter that indicates how well the item discriminates between the presence versus absence of the required skill set. The parameter $d_j$ has an interpretation in SDT as a distance measure (e.g., DeCarlo, 2010; Macmillan & Creelman, 2005); a related discrimination parameter, although not a distance measure, has been discussed by Henson, Roussos, Douglas, and He (2008). Note that the DINA parameters are easily recovered from the RDINA parameters,
$$g_j = \operatorname{expit}(f_j) = \frac{\exp(f_j)}{1 + \exp(f_j)},$$
where exp is the exponential function, and
$$s_j = 1 - \operatorname{expit}(f_j + d_j) = 1 - \frac{\exp(f_j + d_j)}{1 + \exp(f_j + d_j)}.$$
An attractive aspect of Equation 1 is that it is a simple logistic regression model with latent (multiplicative) predictors (i.e., the latent dichotomous α), and so it can easily be fit with software for latent class analysis, as previously shown (DeCarlo, 2011). In a similar vein, although the Bayesian generalization can be accomplished with either form of the model, the RDINA version is simple to implement, as shown in the Appendix.
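Because Equation 1 is a logistic model, moving between the RDINA parameters and the DINA guess and slip parameters is a pair of logit/expit transforms. The sketch below is our own plain-Python illustration (the function names are hypothetical); the inputs −2.3 and 4.3 are the mean estimates of $f_j$ and $d_j$ reported for the fraction-subtraction data (DeCarlo, 2011), used here only as illustrative values, not as parameters of any particular item.

```python
import math

def expit(x):
    """Inverse logit: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def rdina_to_dina(f, d):
    """RDINA (f_j, d_j) -> DINA (g_j, s_j)."""
    g = expit(f)             # guess = false alarm rate: P(correct | eta_ij = 0)
    s = 1.0 - expit(f + d)   # slip = 1 - hit rate: 1 - P(correct | eta_ij = 1)
    return g, s

def dina_to_rdina(g, s):
    """DINA (g_j, s_j) -> RDINA (f_j, d_j)."""
    f = logit(g)
    d = logit(1.0 - s) - f   # distance between the hit and false alarm logits
    return f, d

g, s = rdina_to_dina(-2.3, 4.3)
print(round(g, 3), round(s, 3))   # 0.091 0.119
print(dina_to_rdina(g, s))        # approximately (-2.3, 4.3)
```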
Note that, when fitting the DINA or RDINA model, the fixed elements $q_{jk}$ in Equation 1 are simply set to zero or one, according to the Q-matrix specification. For example, suppose that out of a set of four skills, Skills 1 and 3 are considered as necessary to solve the first item, but not Skills 2 or 4. The first row of the Q-matrix is then 1, 0, 1, 0, which means that the model for the first item is
$$\operatorname{logit} p(Y_{i1} = 1 \mid \boldsymbol{\alpha}_i) = f_1 + d_1\, \alpha_{i1}^{1}\alpha_{i2}^{0}\alpha_{i3}^{1}\alpha_{i4}^{0} = f_1 + d_1\, \alpha_{i1}\alpha_{i3},$$
which shows that the $q_{jk}$ elements simply select which terms ($\alpha_{ik}$) appear in the model. Note that the product term following $d_1$ is equal to one only if both skills are present ($\alpha_{i1} = \alpha_{i3} = 1$); otherwise, it is equal to zero, which is the conjunctive aspect of the model. The Bayesian extension of Equation 1 differs in that some $q_{jk}$ explicitly appear in the model as random parameters, rather than being set to zero or one.
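The selecting role of the $q_{jk}$ can be checked directly. In this plain-Python sketch (function names are hypothetical), a skill with $q_{jk} = 0$ contributes $\alpha_{ik}^0 = 1$ to the product, so only the required skills matter; the values $f_1 = -2$ and $d_1 = 4$ in the last line are arbitrary.

```python
import math

def eta(alpha, q_row):
    """Conjunctive term prod_k alpha_k ** q_jk (1 only if all required skills are present)."""
    # A skill with q_jk = 0 contributes alpha ** 0 == 1 (Python also gives 0 ** 0 == 1).
    return math.prod(a ** q for a, q in zip(alpha, q_row))

def p_correct(alpha, q_row, f, d):
    """RDINA probability of a correct response: expit(f_j + d_j * eta_ij)."""
    return 1.0 / (1.0 + math.exp(-(f + d * eta(alpha, q_row))))

q1 = (1, 0, 1, 0)   # first row of the example: Skills 1 and 3 required
for alpha in [(1, 0, 1, 0), (1, 1, 1, 1), (1, 1, 0, 1), (0, 0, 0, 0)]:
    print(alpha, eta(alpha, q1))
# (1, 0, 1, 0) 1
# (1, 1, 1, 1) 1
# (1, 1, 0, 1) 0
# (0, 0, 0, 0) 0

# Lacking Skill 3 gives the false alarm rate expit(f_1) = expit(-2).
print(round(p_correct((1, 1, 0, 1), q1, -2.0, 4.0), 3))   # 0.119
```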
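Equation 1 is the “examinee-level” part of the model, which is concerned with how the examinees’ response patterns are related to the skill patterns. The full model includes a (higher order) model for the skill patterns:
$$p(\mathbf{Y}_i) = \sum_{\boldsymbol{\alpha}} p(\boldsymbol{\alpha})\, p(\mathbf{Y}_i \mid \boldsymbol{\alpha}),$$
where the first term on the right of the equal sign is the “skill-level” model and the second term is the examinee-level model. The skill-level model is concerned with relations among the skills, and the term $p(\boldsymbol{\alpha})$ is the probability of skill pattern $\boldsymbol{\alpha}$. In the simple DINA and RDINA models, the skills are assumed to be independent, so that $p(\boldsymbol{\alpha}) = \prod_k p(\alpha_k)$ and the full model is
$$p(\mathbf{Y}_i) = \sum_{\boldsymbol{\alpha}} \prod_{k=1}^{K} p(\alpha_k) \prod_{j=1}^{J} p(Y_{ij} \mid \boldsymbol{\alpha}), \tag{2}$$
where $p(\alpha_k)$ are the probabilities of the $K$ skills (latent class sizes in latent class analysis). Equation 2 also has a product term over the items, which follows from an assumption of local independence of the indicators given the skills.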
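The sum in Equation 2 runs over all $2^K$ skill patterns. The following sketch evaluates it for one response pattern under the RDINA model with independent skills; the three-item, two-skill Q-matrix and all parameter values are hypothetical. As a sanity check, the probabilities of all $2^J$ response patterns should sum to one.

```python
import math
from itertools import product

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(y, Q, f, d, skill_probs):
    """p(Y_i = y): sum over skill patterns of prod_k p(alpha_k) * prod_j p(Y_ij | alpha)."""
    total = 0.0
    for alpha in product((0, 1), repeat=len(skill_probs)):
        # Skill-level model: independent skills with class sizes skill_probs.
        p_alpha = math.prod(p if a else 1.0 - p for a, p in zip(alpha, skill_probs))
        # Examinee-level model: local independence of the items given alpha.
        p_y = 1.0
        for y_j, q_row, f_j, d_j in zip(y, Q, f, d):
            eta = math.prod(a ** q for a, q in zip(alpha, q_row))
            p_j = expit(f_j + d_j * eta)
            p_y *= p_j if y_j else 1.0 - p_j
        total += p_alpha * p_y
    return total

# Hypothetical 3-item, 2-skill example.
Q = [(1, 0), (0, 1), (1, 1)]
f = [-2.0, -1.5, -3.0]
d = [4.0, 3.5, 5.0]
pi = [0.6, 0.4]
print(marginal_prob((1, 0, 0), Q, f, d, pi))
print(round(sum(marginal_prob(y, Q, f, d, pi) for y in product((0, 1), repeat=3)), 10))  # 1.0
```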
In addition to the simple DINA and RDINA models with independence, various higher order models have also been considered in the literature on CDMs (e.g., de la Torre & Douglas, 2004; J. L. Templin, Henson, Templin, & Roussos, 2008); these models allow for structure in the skill-level part of the model, such as allowing for correlations among the skills. For example, the independence assumption can be replaced with a conditional independence assumption by including a latent continuous variable θ (e.g., examinee ability) in the skill-level part of the model as follows:
$$p(\boldsymbol{\alpha}) = \int \prod_{k=1}^{K} p(\alpha_k \mid \theta)\, p(\theta)\, d\theta.$$
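The latent variable θ accounts for associations among the $K$ skills, as in a factor analysis model. Replacing the independence model for the skills, $\prod_k p(\alpha_k)$, in Equation 2 with this conditional independence model gives a higher order version of the full model.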
A simple logistic model has been used for the higher order model (e.g., DeCarlo, 2011; de la Torre & Douglas, 2004):
$$\operatorname{logit} p(\alpha_k = 1 \mid \theta) = b_k + a_k\theta, \tag{3}$$
where $b_k$ is an item difficulty parameter and $a_k$ is an item discrimination parameter. Using the higher order model with Equations 1 and 2 gives a higher order version of the reparameterized model (higher order reparameterized deterministic input noisy and [HO-RDINA]).
As previously shown, the RDINA and HO-RDINA models are simple to implement in software that can fit logistic models, or more generally, generalized linear models with latent predictors, such as LEM (Vermunt, 1997), Latent Gold (Vermunt & Magidson, 2005), and other software (e.g., GLLAMM; Rabe-Hesketh, Skrondal, & Pickles, 2004); for sample LEM and Latent Gold programs, see DeCarlo (2011). The software can also be used to consider various extensions of the models, such as allowing for more than two response categories (with nominal or ordinal responses) or allowing for skills with more than two categories (either nominal or ordinal). With respect to the (more complex) Bayesian extensions of the RDINA and HO-RDINA models considered here, the freely available software OpenBUGS (Thomas, O’Hara, Ligges, & Sturtz, 2006) can be used; the Appendix provides a sample program.
The Bayesian RDINA/HO-RDINA Models
Bayesian extensions of the above models allow some of the elements of the Q-matrix to be random rather than fixed. In particular, instead of fixing all the $q_{jk}$ to zero or one, which is the usual approach to specifying the Q-matrix, some of the $q_{jk}$ are treated as random variables, and in particular as Bernoulli variables with parameters $p_{jk}$. The choice of Bernoulli variables reflects the fact that the $q_{jk}$ only take on values of zero or one. The current approach relaxes the assumption, for some elements of Q, that the $q_{jk}$ are known. Posterior distributions can then be used to guide decisions about the $q_{jk}$ in question.
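The Bayesian version of the RDINA model can be specified in the same manner as earlier. To start, the $Y_{ij}$ are conditionally independent and Bernoulli distributed:
$$Y_{ij} \sim \operatorname{Bernoulli}(p_{ij}),$$
where $p_{ij}$ is given by the inverse of Equation 1; that is,
$$p_{ij} = \operatorname{expit}\Big(f_j + d_j \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}}\Big),$$
where $\operatorname{expit}(x) = \exp(x)/[1 + \exp(x)]$ is the inverse of the logit function, as defined above. In the Bayesian approach, the parameters $f_j$ and $d_j$ are specified as random. In addition, instead of being fixed, some of the elements of the Q-matrix are now treated as Bernoulli variables:
$$\tilde{q}_{jk} \sim \operatorname{Bernoulli}(p_{jk}),$$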
where the tilde (~) on $q$ is used to indicate that the element is random (compare Lee, 2004); note that the tilde is omitted in the following when it is clear from the context that $q$ is random. As before, elements of the Q-matrix that are assumed to be known are set to zero or one, according to the Q-matrix specification. For uncertain elements, however, the Bernoulli parameter $p_{jk}$ is itself given a Beta prior:
$$p_{jk} \sim \operatorname{Beta}(a, b).$$
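For example, Beta(1,1) is the uniform distribution and is used as a prior below. As is well known in Bayesian analysis, the Beta distribution is a conjugate prior for the Bernoulli distribution, and so the posteriors of $p_{jk}$ given $\tilde{q}_{jk}$ are also Beta distributions; for a Beta($a$, $b$) prior,
$$p_{jk} \mid \tilde{q}_{jk} \sim \operatorname{Beta}(a + \tilde{q}_{jk},\; b + 1 - \tilde{q}_{jk}). \tag{4}$$
The posterior reflects the information that the data provide about whether $\tilde{q}_{jk}$ is zero or one. Equation 4 shows that, for a Beta(1,1) prior, the posterior mean for $p_{jk}$ will be 1/3 if $\tilde{q}_{jk} = 0$ and 2/3 if $\tilde{q}_{jk} = 1$; averaged over the posterior of $\tilde{q}_{jk}$, the posterior mean of $p_{jk}$ therefore lies between 1/3 and 2/3.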
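Under a Beta(1,1) prior, $p_{jk}$ affects the data only through $\tilde{q}_{jk}$, so its posterior mean is $(1 + P(\tilde{q}_{jk} = 1 \mid \text{data}))/3$: an average of 1/3 and 2/3 weighted by the posterior probability that the element is included. A minimal sketch of the conjugate update and of this average follows (function names are hypothetical).

```python
def beta_posterior(q, a=1.0, b=1.0):
    """Beta parameters of p_jk | q_jk for a Beta(a, b) prior and one Bernoulli value q."""
    return a + q, b + 1 - q

def posterior_mean_p(prob_q_is_one, a=1.0, b=1.0):
    """Posterior mean of p_jk: E[p | q] averaged over the posterior probability of q = 1."""
    m1 = (a + 1) / (a + b + 1)   # mean of Beta(a + 1, b), i.e., given q = 1
    m0 = a / (a + b + 1)         # mean of Beta(a, b + 1), i.e., given q = 0
    return prob_q_is_one * m1 + (1 - prob_q_is_one) * m0

print(beta_posterior(0), beta_posterior(1))           # (1.0, 2.0) (2.0, 1.0)
print(posterior_mean_p(0.0), posterior_mean_p(1.0))   # 1/3 and 2/3
```

Read in reverse, under the uniform prior a posterior mean of $p_{jk}$ near 2/3 corresponds to $\tilde{q}_{jk}$ being sampled as 1 in nearly every iteration.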
Implementation in OpenBUGS
The Appendix provides a program that shows how to implement the Bayesian RDINA model using the freely available software OpenBUGS. The program includes uncertainty about the skills for Items 1, 8, and 13, as in the simulations below. The program shows that to include uncertain elements, the latent dichotomous variable $\alpha_k$ (denoted as xk in the program) is raised to the power $q_{jk}$, where $q_{jk}$ is a Bernoulli variable with parameter $p_{jk}$. Each uncertain element has its own probability, with a Beta(1,1) prior used for $p_{jk}$. In addition, “mildly” informative normal priors (mean of 0 and variance of 10; note that OpenBUGS uses the precision, which is the inverse of the variance) are used for $f_j$ and $d_j$, and the bounds function, I(0,), is used to restrict $d_j$ to positive values (more informative priors could also be used, given that there is often information available about values of SDT parameters found in practice).
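OpenBUGS constructs its own samplers from the model specification, so the user never writes this step. Purely to show what the Bernoulli-plus-Beta specification implies, the sketch below computes the full conditional of one uncertain element, with $P(\tilde{q}_{jk} = v \mid \cdot)$ proportional to the prior probability of $v$ times the item likelihood, followed by the conjugate draw of $p_{jk}$ under a Beta(1,1) prior. It is a hypothetical Python illustration, not the Appendix program and not OpenBUGS internals; $f_j$, $d_j$, and the skills are held fixed here, whereas a full sampler would also update them.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def item_loglik(y_col, alphas, q_row, f, d):
    """Log-likelihood of one item's responses under RDINA for a given Q-matrix row."""
    ll = 0.0
    for y, alpha in zip(y_col, alphas):
        eta = math.prod(a ** q for a, q in zip(alpha, q_row))
        p = expit(f + d * eta)
        ll += math.log(p) if y else math.log(1.0 - p)
    return ll

def prob_q_one(y_col, alphas, q_row, k, p_jk, f, d):
    """Full conditional P(q_jk = 1 | responses, skills, f_j, d_j, p_jk), via log-odds."""
    ll = []
    for v in (0, 1):
        row = list(q_row)
        row[k] = v                  # element excluded (0) or included (1)
        ll.append(item_loglik(y_col, alphas, row, f, d))
    log_odds = math.log(p_jk) - math.log(1.0 - p_jk) + ll[1] - ll[0]
    return expit(log_odds)

def gibbs_step(y_col, alphas, q_row, k, p_jk, f, d, rng):
    """Draw q_jk from its full conditional, then p_jk | q_jk ~ Beta(1 + q, 2 - q)."""
    q = int(rng.random() < prob_q_one(y_col, alphas, q_row, k, p_jk, f, d))
    return q, rng.betavariate(1 + q, 2 - q)

# Hypothetical item: Skill 1 known to be required, Skill 2 uncertain.
alphas = [(1, 1), (1, 0), (0, 1), (0, 0)]
y = [1, 0, 0, 0]   # correct only for the examinee who has both skills
print(round(prob_q_one(y, alphas, (1, 0), k=1, p_jk=0.5, f=-3.0, d=6.0), 4))  # 0.9526
print(gibbs_step(y, alphas, (1, 0), 1, 0.5, -3.0, 6.0, random.Random(0)))
```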
Note that, although the model could be fit as in Equation 1 by using a logit transform of the response probabilities, problems can arise with this approach when there are extreme probabilities, because the logits then go toward plus or minus infinity. This problem does not appear to arise if the probability version of the model (i.e., with the expit function) is used, as is done in the program in the Appendix.
Simulations: Uncertainty About Q-Matrix Elements
The utility of the Bayesian approach is examined in a number of small simulations. The basic RDINA model with independence is first examined in several conditions to determine if the Bayesian approach is of use in at least the simplest case. In the first simulation, there is uncertainty about 4 elements of the Q-matrix (for three items), with the remaining elements correctly specified. In the second simulation, there is complete uncertainty about the skills for three items, giving a total of 12 uncertain elements (20% of the 60 Q-matrix elements), as shown by the Bayesian Q-matrix in Table 1. The third simulation introduces a new aspect, in that there is again uncertainty about 12 elements, as in the second simulation; however, 6 other elements of the Q-matrix are now incorrectly specified in the fitted model, as shown in Table 2. Thus, the third simulation examines the effect of misspecification of other parts of the Q-matrix (i.e., parts that are not under question) on recovery of elements that are considered uncertain. One could argue that this is a realistic situation, in that it seems likely that in practice, some elements of the Q-matrix might be misspecified without the researcher’s knowledge (i.e., the elements are not recognized as being uncertain).
Table 1. True Q-Matrix and Bayesian Q-Matrix (Conditions 2 and 7) for Four Skills and Fifteen Items
Note: The Bayesian Q-matrix has complete uncertainty about the four skills for Items 1, 8, and 13.
Table 2. Q-Matrix With Complete Uncertainty for Three Items and Six Misspecified Elements (Conditions 3 and 8)
Note: Misspecified elements are shown with arrows, with original value followed by the misspecified value. For example, Skill 3 ($\alpha_3$) for Item 2 is misspecified as 1, whereas it is 0 in the true Q-matrix shown in Table 1.
The preceding conditions examine the utility of the Bayesian approach for determining uncertain elements in the Q-matrix, with the number of skills correctly specified. The next three conditions provide some information about a different question, which is whether the correct number of skills has been specified. In the fourth condition, the same data sets as in the first three simulations were analyzed, and the fitted model used the correct Q-matrix for the first three skills; however, the fourth skill (which was necessary) was specified as being completely uncertain for all 15 items. In this situation, one is not sure if the fourth skill should be included, but in fact it is necessary. The fifth condition is the same as the fourth condition, except that five other elements of the Q-matrix were misspecified, as shown in Table 2 (although $q_{74}$ was now uncertain instead of misspecified). In the sixth condition, the Q-matrix was correctly specified for the first four skills, but a fifth (nonrequired) skill, with complete uncertainty about its elements, was introduced. In this situation, one is again not sure whether an additional skill is needed, but in this case it should not be included. The fourth through sixth conditions provide information about the use of the Bayesian approach in situations where there is uncertainty as to the number of skills.
Finally, the seventh and eighth conditions examined the utility of the approach in situations where the skills have a higher order structure. In particular, data were generated according to the HO-RDINA model given earlier. In the seventh condition, the fitted model had 12 uncertain elements, as in the second condition, with the difference that the generated data had a higher order structure. In the eighth condition, the fitted model again had 12 uncertain elements and, in addition, 6 misspecified elements, as in the third condition. The simulations provide information about a range of basic situations of interest.
Method
The simulations used a Q-matrix for 15 items that was previously used by Rupp and Templin (2008a) and DeCarlo (2011); the matrix is shown in Table 1. For the generated data, the population values chosen for the detection parameters $d_j$ and the false alarm rates $f_j$ (on the logit scale) were similar to estimates obtained for real data in previous research. For example, an analysis of Tatsuoka’s fraction-subtraction data with RDINA (DeCarlo, 2011) gave estimates of $d_j$ for 21 items that ranged from 1.7 to 6.9 with a mean of 4.3, and estimates of $f_j$ that ranged from −4.8 to −0.1 with a mean of −2.3. For the current simulation, values of $d_j$ from 1.5 to 5.5 and $f_j$ from −4 to 0 were used; the skill class sizes were 0.38, 0.50, 0.62, and 0.73, and the sample size was 1,000. The purpose here was not to examine parameter recovery (which appeared to be good), but rather to explore whether the Bayesian approach helps to recover the true Q-matrix elements when there is uncertainty about them.
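A data-generating step of this kind can be sketched as follows. The Q-matrix here is a hypothetical six-item matrix (not the 15-item matrix of Table 1), and the item parameters are simply spaced evenly over the stated ranges; only the skill class sizes and the sample size follow the text.

```python
import math
import random

def simulate_rdina(n, Q, f, d, class_sizes, seed=0):
    """Generate skill patterns and responses from the RDINA model with independent skills."""
    rng = random.Random(seed)
    alphas, Y = [], []
    for _ in range(n):
        # Skill-level model: each skill present with its own class size.
        alpha = tuple(int(rng.random() < p) for p in class_sizes)
        row = []
        for q_row, f_j, d_j in zip(Q, f, d):
            eta = math.prod(a ** q for a, q in zip(alpha, q_row))
            p = 1.0 / (1.0 + math.exp(-(f_j + d_j * eta)))   # expit(f_j + d_j * eta)
            row.append(int(rng.random() < p))
        alphas.append(alpha)
        Y.append(row)
    return alphas, Y

# Hypothetical 6-item Q-matrix; parameters evenly spaced within the stated ranges.
Q = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1)]
J = len(Q)
d = [1.5 + 4.0 * j / (J - 1) for j in range(J)]    # 1.5 ... 5.5
f = [-4.0 + 4.0 * j / (J - 1) for j in range(J)]   # -4 ... 0
alphas, Y = simulate_rdina(1000, Q, f, d, [0.38, 0.50, 0.62, 0.73])
print([round(sum(a[k] for a in alphas) / len(alphas), 2) for k in range(4)])
```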
The Bayesian RDINA and HO-RDINA models were fitted using OpenBUGS. For each condition, the complete analysis was repeated for each of the 20 generated data sets. For each run, the data were loaded into OpenBUGS, and the syntax program (see the Appendix) was then run with 5,000 burn-in iterations followed by 20,000 iterations for posterior sampling. With this number of iterations, the Monte Carlo errors (see Geyer, 1992), which are given in the OpenBUGS output, were generally less than 5% of the sample standard deviations (also given in the output), which has previously been suggested as a rule of thumb to assess convergence (Spiegelhalter, Thomas, Best, & Lunn, 2003). Multiple chains also appeared to converge.
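The 5% rule of thumb can be illustrated with the method of batch means, one common estimator of the Monte Carlo standard error. OpenBUGS computes and reports its own value, so this is only an illustration; the chain below is a simulated autocorrelated AR(1) series standing in for 20,000 posterior draws, not output from the models above.

```python
import random
import statistics

def batch_means_mc_error(draws, n_batches=50):
    """Monte Carlo standard error of the posterior mean via the method of batch means."""
    m = len(draws) // n_batches   # draws per batch
    means = [statistics.fmean(draws[b * m:(b + 1) * m]) for b in range(n_batches)]
    return statistics.stdev(means) / n_batches ** 0.5

def passes_rule_of_thumb(draws, ratio=0.05):
    """True if the MC error is below `ratio` times the posterior standard deviation."""
    return batch_means_mc_error(draws) < ratio * statistics.stdev(draws)

# Stand-in for 20,000 post-burn-in draws: an autocorrelated AR(1) chain.
rng = random.Random(1)
x, chain = 0.0, []
for _ in range(20000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

ratio = batch_means_mc_error(chain) / statistics.stdev(chain)
print(round(ratio, 3), passes_rule_of_thumb(chain))
```

For independent draws the ratio would be about $1/\sqrt{20{,}000} \approx 0.007$; autocorrelation in the chain inflates it, which is what the rule of thumb guards against.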
In the first condition, four elements in the fitted RDINA model were specified as being uncertain; specifically, elements $q_{13}$, $q_{14}$, $q_{82}$, and $q_{131}$ for Items 1, 8, and 13 were treated as Bernoulli variables. In the second condition, there was complete uncertainty about all the skills for 3 items (a kind of worst-case scenario), as shown by the Q-matrix on the right side of Table 1 with 12 uncertain elements; note that this would require an examination of 4,096 Q-matrices in an exhaustive search approach. In the third condition, there were again 12 uncertain elements in the Q-matrix, but 6 other elements were incorrectly specified in the fitted model, and in particular, three skills were incorrectly included and three skills were incorrectly excluded, as shown by the Q-matrix in Table 2.
The next three conditions examined situations where there was complete uncertainty about a skill. In the fourth condition, the fitted model used the correct Q-matrix (shown on the left side of Table 1); however, all elements for the fourth skill were treated as being uncertain, that is, the $q_{j4}$ were random for all 15 items. This examines a situation where there is uncertainty about whether the skill is needed, when in fact it is needed. The fifth condition was the same except that five other elements of the Q-matrix were misspecified, as shown in Table 2 (with the difference that $q_{74}$ was specified as uncertain rather than misspecified). For the sixth condition, the fitted model used the Q-matrix for the four skills, as shown in Table 1, but a fifth skill was also included, with all elements treated as uncertain, and so the $q_{j5}$ were uncertain for all 15 items. This examines a situation where there is uncertainty about whether an additional skill should be included, when in fact it should not be included.
Finally, in the seventh and eighth conditions, data with a simple higher order structure for the skills were generated. These conditions provide information about Q-element recovery in the presence of a higher order skill structure. In particular, data for the HO-RDINA model were generated, with Equation 3 used to specify the higher order structure; the Q-matrix was the same as that shown in Table 1. The population values for the higher order parameters were suggested by results for an application of the HO-RDINA model (and restricted versions) to the fraction-subtraction data (DeCarlo, 2011), which found estimates of $a_k$ that ranged from 0.7 to 4.0 (and 3.5 for a version with equal $a_k$) and estimates of $b_k$ that ranged from −1 to 4. For the simulation, a common value of $a_k = 3$ was used with values of $b_1$ to $b_4$ of −1, 0, 1, and 2, respectively (which give skill class sizes similar to those used for the RDINA simulation above). In Condition 7, the fitted HO-RDINA model (with equal $a_k$) had 12 elements of the Q-matrix specified as being uncertain, as in Condition 2. In Condition 8, the fitted model had 12 uncertain Q-elements and 6 misspecified elements, as in Condition 3.
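As a check on the statement that these values give class sizes similar to those of the RDINA simulation, the marginal class size $P(\alpha_k = 1)$ can be computed by integrating the higher order model over θ. The sketch assumes that the higher order logistic model has the intercept form $\operatorname{logit} p(\alpha_k = 1 \mid \theta) = b_k + a_k\theta$ and that θ is standard normal; both are assumptions of this illustration. Under them, $a_k = 3$ with $b_k = -1, 0, 1, 2$ gives values close to 0.38, 0.50, 0.62, and 0.73.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def ho_class_size(a, b, lo=-8.0, hi=8.0, n=4000):
    """P(alpha_k = 1) = integral of expit(b + a*theta) * phi(theta) d theta.

    ASSUMPTIONS: intercept form b + a*theta and theta ~ N(0, 1); the integral
    is evaluated with the trapezoidal rule on [lo, hi].
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        w = 0.5 if i in (0, n) else 1.0          # trapezoidal end-point weights
        phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
        total += w * expit(b + a * t) * phi
    return total * h

sizes = [round(ho_class_size(3.0, b), 2) for b in (-1, 0, 1, 2)]
print(sizes)
```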
Results
Condition 1: Four uncertain Q-elements
The main tools used here are means and plots of the posterior distributions of $q_{jk}$ and $p_{jk}$ for the uncertain elements. For example, for the first generated data set, Figure 1 shows plots from OpenBUGS of the posterior distributions of $q_{jk}$ and $p_{jk}$. The true Q-matrix elements in this case are 0, 0, 1, 1, and it is apparent that a similar pattern appears in the plots. For example, the first row of Figure 1 shows that the posteriors for

Figure 1. The top four plots are the posterior distributions for
Table 3. Posterior Means of Q-Matrix Elements and Probabilities for 20 Replications (Condition 1)
Note: The true Q-values are 0, 0, 1, and 1. Posteriors that give incorrect qs are shown in bold.
Table 3 shows, for the first condition, the posterior means for the uncertain Q-matrix elements for all 20 data sets. The true $q_{jk}$ elements are 0, 0, 1, and 1. The third and fourth columns of Table 3 show that the posterior means of
As noted above, a simple procedure is to round the Q-matrix element posterior mean (of
Table 4. Percent-Correct Recovery of Q-Matrix Elements for Conditions 1, 2, and 3
Note: RDINA = reparameterized deterministic input noisy and. Except for the true Q-matrix, entries indicate the percent-correct recovery of the element for 20 data sets. The misspecified Q-matrix had 12 uncertain elements and 6 misspecified elements.
Condition 2: Twelve uncertain Q-matrix elements
In the second condition, there was complete uncertainty for all four skills for Items 1, 8, and 13, giving a total of 12 uncertain elements. Using the approach noted above (i.e., rounding the posterior mean of $q_{jk}$), the matrix at the bottom left of Table 4 shows that the four skills for Item 8 are correctly detected as being zero or one in 100% of the cases, whereas for Item 13, detection is correct for 90% to 100% of the cases. For Item 1, the need to include the first skill is detected correctly in 75% to 100% of the cases. Thus, although it was expected that the inclusion of a greater number of uncertain elements would lead to less accurate detection, the results in Table 4 suggest generally good detection for a situation with 12 uncertain elements.
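The rounding rule and the percent-correct recovery summaries reported in the tables can be sketched as follows. The posterior means below are made-up values for four data sets, not results from the simulations, and rounding a posterior mean of exactly 0.5 up to 1 is an assumption of this sketch.

```python
def point_estimate(post_means):
    """Round posterior means of the uncertain q_jk to 0 or 1 (exactly 0.5 rounds up here)."""
    return [1 if m >= 0.5 else 0 for m in post_means]

def percent_correct(true_q, replications):
    """Percent of replications in which each uncertain element is recovered."""
    hits = [0] * len(true_q)
    for post_means in replications:
        for e, (est, true) in enumerate(zip(point_estimate(post_means), true_q)):
            hits[e] += est == true
    return [100.0 * h / len(replications) for h in hits]

# Hypothetical posterior means of four uncertain elements over four data sets.
true_q = [0, 0, 1, 1]
reps = [[0.02, 0.10, 0.97, 0.99],
        [0.03, 0.62, 0.95, 1.00],
        [0.01, 0.08, 0.41, 0.98],
        [0.05, 0.12, 0.90, 0.97]]
print(percent_correct(true_q, reps))   # [100.0, 75.0, 75.0, 100.0]
```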
Condition 3: Twelve uncertain Q-matrix elements and six misspecifications
In the above conditions, the specification of the Q-matrix was uncertain for some of the elements, but the other Q-elements were correctly specified. In Condition 3, there was again uncertainty about 12 Q-matrix elements, as in Condition 2; however, six other elements of the Q-matrix were incorrectly specified. As shown in Table 2, three elements were incorrectly excluded and three elements were incorrectly included. It is of interest to see how this affects recovery of the uncertain Q-matrix elements in the Bayesian approach.
The posterior means were again used to determine the Q-matrix elements. The lower right of Table 4 shows recovery of the uncertain elements. Recovery is somewhat better than expected, remaining at around 80% or higher for 10 of the 12 elements (and 100% for 4 elements); however, recovery is poor for two elements: 60% and 65% for elements $q_{14}$ and $q_{81}$ (both with true values of zero). This shows that misspecification of other elements of the Q-matrix can affect recovery rates for some of the uncertain elements.
Conditions 4, 5, and 6: Complete uncertainty about a skill
Table 5 shows recovery of the Q-elements for the fourth, fifth, and sixth conditions. The third column shows the percent-correct recovery for the fourth condition, where the fourth skill was necessary but had all 15 elements specified as being uncertain. The table shows that recovery is generally excellent, with 90% to 100% correct recovery for 14 of the 15 elements, whereas one element had only 75% recovery. The second column of Table 5 shows results for the fifth condition, which again had complete uncertainty about the fourth skill, but five other elements of the Q-matrix were also misspecified. In this case, recovery is quite poor for many elements. This shows that, when there is complete uncertainty about a skill, misspecification of other elements of the Q-matrix can have a profound effect on the recovery rates. In the sixth condition, a fifth skill with complete uncertainty was included. Given that the skill is not necessary, one might expect to find posteriors for $q_{jk}$ that are all close to zero, suggesting that the skill should not be included; however, the posteriors instead tended to be around 0.5, and the resulting recovery rates were uniformly poor (i.e., around 50%), as shown in Table 5. An interesting result is that the posterior means (not shown) of the class size for the nonnecessary Skill 5 tended to be large (i.e., >.90), which is consistent with an earlier conjecture that including a nonnecessary skill can lead to a class size close to one (DeCarlo, 2011).
Table 5. Recovery of Q-Matrix Elements for Conditions 4, 5, and 6
Note: Entries indicate the proportion of times the element was correctly recovered for 20 data sets. For Condition 4, the fourth skill was specified as uncertain for all 15 items. Condition 5 was the same as Condition 4, except that five other elements were also misspecified. For Condition 6, the first four skills were correctly specified; however, a fifth skill was (incorrectly) included, with each element specified as uncertain.
Conditions 7 and 8: HO-RDINA
Data in these conditions (20 data sets) were generated according to the HO-RDINA model as described above. In Condition 7, the Bayesian HO-RDINA model (with a common value of $a_k$ in Equation 3) was fit with 12 uncertain elements, as shown by the Bayesian Q-matrix in Table 1. The left side of Table 6 shows that recovery of many elements is quite good, with the same pattern as shown in the lower left part of Table 4 (Condition 2 for the RDINA model). A difference, however, is that some elements are poorly recovered, most notably elements $q_{13}$ and $q_{14}$ for Item 1, which are correctly detected as being zero only 55% of the time (and so the false alarm rate is quite high at 45%). The recovery rates for elements $q_{133}$ and $q_{134}$ are also lower compared with the results shown in the lower left part of Table 4 (Condition 2). In Condition 8, there were again 12 uncertain elements for the higher order model, but six other elements were misspecified (as in Condition 3). The right side of Table 6 shows that recovery was excellent for many elements, but was again poor for Skills 3 and 4 for Item 1 (where the skills are not necessary), and was also poor for Skills 3 and 4 for Item 13 (where the skills are necessary). Thus, as for the model without a higher order structure (Condition 3), the results for Condition 8 suggest that misspecification of other parts of the Q-matrix affects recovery rates for some of the uncertain elements. In addition, recovery rates for data with a higher order structure appear to be lower for some elements than those for data with an independence structure.
Table 6. Recovery of Q-Matrix Elements for a HO-RDINA Model, Conditions 7 and 8
Note: HO-RDINA = higher order reparameterized deterministic input noisy and.
Discussion
The simulations show that the posterior distributions for the random Q-matrix elements provide useful information about which elements should or should not be included (i.e., whether
With respect to situations that involve uncertainty about the number of skills, the Bayesian approach appears to be of some, but limited, utility. The results for Condition 4, for example, showed that when a skill was necessary, the Bayesian approach worked very well with respect to indicating which items the skill should load on. However, when (five) other elements of the Q-matrix were misspecified (Condition 5), the approach started to break down and recovery rates were poor for many elements. Furthermore, when a nonnecessary skill was introduced (Condition 6), the posteriors for the Q-elements were not zero, but tended to vary around 0.5, and so the recovery rates were poor. The results suggest that the Bayesian approach might be of limited utility with respect to determining when to stop adding skills: It correctly reveals the skill loadings when the rest of the Q-matrix is correctly specified, but less so when other elements are misspecified; it also does not clearly indicate when a skill is not needed.
Overall, the results are encouraging and suggest that it is worthwhile to examine the Bayesian approach in more extensive simulations, and to apply it to real-world data.
An Application: The Fraction-Subtraction Data
The Bayesian approach to Q-matrix determination is applied to the widely analyzed fraction-subtraction data of K. K. Tatsuoka (1990), which consist of the responses of 536 examinees. The Q-matrix, which in this case is for 15 of the items, is the same as that used by de la Torre (2008) in a prior study that was concerned with validity of the Q-matrix; the Q-matrix was adapted from Mislevy (1996) and is shown in Table 7. The labels given to the five skills are (a) performing basic fraction-subtraction operation, (b) simplifying/reducing, (c) separating whole numbers from fractions, (d) borrowing one from whole number to fraction, and (e) converting whole numbers to fractions.
Fraction-Subtraction Data, 15 Items, Five Hypothesized Skills
Note: Skill labels are (a) performing basic fraction-subtraction operation, (b) simplifying/reducing, (c) separating whole numbers from fractions, (d) borrowing one from whole number to fraction, and (e) converting whole numbers to fractions.
Prior analyses of these data (e.g., DeCarlo, 2011; de la Torre, 2008) suggested that there is some uncertainty associated with the items that involve whole numbers (without fractions), of which there are four in Table 7 (Items 4, 5, 10, and 14), so these four items are examined more closely here. All four items include Skills 1 and 3; however, they differ with respect to which of the other three skills (Skills 2, 4, and 5), if any, are included. Thus, the Q-matrix considered here has uncertainty about whether Skills 2, 4, and 5 should be included in Items 4, 5, 10, and 14, which gives 4 × 3 = 12 uncertain elements. The Bayesian Q-matrix is shown on the right side of Table 7.
Note that one can make substantive arguments for the possible inclusion or exclusion of different skills in these (and other) items because of ambiguities in how items that involve whole numbers can be solved. For example, the first item with a whole number, Item 4, is 3 − 2 1/5, and the original Q-matrix considers all five skills as being involved in this item. There are, however, several different ways to solve this item. For example, one could simply convert both terms to fractions with a common denominator and then subtract: 15/5 − 11/5 = 4/5. Or one could borrow one from the first whole number, convert it to a fraction, and then subtract the parts: 3 − 2 1/5 = 2 5/5 − 2 1/5, so 2 − 2 = 0 and 5/5 − 1/5 = 4/5. Or one could first subtract the whole numbers, 3 − 2 = 1, and then subtract the remaining fraction, 1/5, from the converted remainder: 1 − 1/5 = 5/5 − 1/5 = 4/5; and so on. The point is that the set of skills required for even this simple item is questionable, and in general there are several approaches to solving the items, which involve different combinations of skills. Of course, this is undesirable; ideally, a test for cognitive diagnosis should use items that clearly require certain skills and not others, without ambiguities, but that is easier said than done. In any case, the Bayesian approach recognizes that there is uncertainty about some elements of the Q-matrix and in essence allows one to consider a large number of possible respecifications simultaneously.
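To show why treating the questionable elements as random is convenient, the following minimal Python sketch enumerates the uncertain elements described above (Items 4, 5, 10, and 14 crossed with Skills 2, 4, and 5) and counts how many distinct Q-matrices they imply if each element can be zero or one. The variable names are illustrative only; the sketch simply makes explicit the number of alternatives that would otherwise have to be fitted one at a time.

```python
from itertools import product

# Items and skills whose Q-matrix entries are treated as uncertain (as described in the text).
UNCERTAIN_ITEMS = [4, 5, 10, 14]
UNCERTAIN_SKILLS = [2, 4, 5]

# Each uncertain element is an (item, skill) pair; Skills 1 and 3 stay fixed at one for these items.
uncertain_elements = [(j, k) for j in UNCERTAIN_ITEMS for k in UNCERTAIN_SKILLS]

# Every 0/1 assignment to the uncertain elements defines one candidate Q-matrix.
candidates = list(product([0, 1], repeat=len(uncertain_elements)))

print(len(uncertain_elements))  # 12
print(len(candidates))          # 4096
```

With 12 binary elements there are 2^12 = 4,096 candidate Q-matrices, which a single Bayesian run explores jointly.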
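The three strategies for Item 4 can be checked with exact rational arithmetic. The short Python sketch below uses the standard-library `fractions.Fraction` type to carry out each route and confirms that all of them reach 4/5; the point it illustrates is that the correct answer alone does not reveal which route, and hence which skills, an examinee used.

```python
from fractions import Fraction

minuend = Fraction(3)
subtrahend = 2 + Fraction(1, 5)  # the mixed number 2 1/5

# Strategy 1: convert both terms to fractions with a common denominator, then subtract.
s1 = Fraction(15, 5) - Fraction(11, 5)

# Strategy 2: borrow one from the whole number (3 = 2 5/5), then subtract wholes and fractions.
whole_part = 2 - 2
fraction_part = Fraction(5, 5) - Fraction(1, 5)
s2 = whole_part + fraction_part

# Strategy 3: subtract the whole numbers first (3 - 2 = 1), then compute 1 - 1/5 = 5/5 - 1/5.
s3 = (3 - 2) - Fraction(1, 5)

direct = minuend - subtrahend
print(s1, s2, s3, direct)  # 4/5 4/5 4/5 4/5
```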
Results
OpenBUGS was again used, with 5,000 burn-in iterations and 20,000 iterations. Table 8 shows the posterior means of q_jk for fits of the model with 12 uncertain elements. In this case, all the posterior means are close to zero or one. Note, for example, that the third and fourth rows of Table 8 suggest patterns of 0, 1, 1 and 1, 1, 1, respectively, which match the original Q-matrix, so the results suggest that Items 10 and 14 are correctly specified. However, the first and second rows suggest three changes to the Q-matrix: Skills α2 and α5 should be included in Item 5, and Skill α2 should be dropped from Item 4.¹ Results of this sort could be useful to substantive experts. For example, of the (at least) three strategies for solving Item 4 noted earlier, the Bayesian results (Skill 2 is not necessary) suggest that the third strategy (3 − 2 = 1, 1 = 5/5, 5/5 − 1/5 = 4/5) might be the one used, because it involves the other skills but not Skill 2 (simplifying/reducing).
Table 8. Posterior Means of Q-Matrix Elements for the Fraction-Subtraction Data
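Because each random element q_jk takes only the values zero and one, its posterior mean is simply the proportion of retained MCMC draws in which it equals one. The minimal Python sketch below illustrates this computation on a simulated chain; the chain is hypothetical (independent Bernoulli draws, not OpenBUGS output), and it assumes that the 5,000 burn-in draws are discarded and 20,000 draws are retained.

```python
import random

def posterior_mean_q(draws, burn_in):
    """Posterior mean of a 0/1 element q_jk: the share of post-burn-in draws equal to 1."""
    kept = draws[burn_in:]
    return sum(kept) / len(kept)

random.seed(0)
# Hypothetical chain for illustration only: 5,000 burn-in draws followed by 20,000 retained
# draws, generated as independent Bernoulli(0.97) values rather than by a real sampler.
chain = [1 if random.random() < 0.97 else 0 for _ in range(5_000 + 20_000)]

q_mean = posterior_mean_q(chain, burn_in=5_000)
print(round(q_mean, 3))
```

A posterior mean near one (as here) favors including the skill for that item; a value near zero favors excluding it.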
To explore the modifications of the Q-matrix suggested by the Bayesian analysis, Table 9 shows results for fits of the original model and of the model with the modified Q-matrix; in both cases, Latent Gold was used (with posterior mode estimation and Bayes constants of 1; see DeCarlo, 2011). The table shows that BIC and AIC are smaller for the modified model (by more than 40), which indicates better relative fit than the original model. Table 10 shows parameter estimates for the original and modified models. For the modified Items 4 and 5, the discrimination parameters are slightly larger in magnitude than for the original model, whereas the false alarm rates are slightly lower in one case and higher in the other. Table 10 also shows that the estimates of the skill probabilities (the latent class sizes) for the respecified Q-matrix are similar to those obtained for the original Q-matrix, and that the standard errors are large for the first item.
Table 9. Information Criteria for Various Models: Fraction-Subtraction Data, N = 536
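Changing which skills an item requires alters only the latent response η_ij, not the number of item parameters, so (assuming the structural part of the model is the same) the original and modified models have equal parameter counts. Their AIC and BIC therefore differ by the same amount, namely the difference in −2 log-likelihood. The Python sketch below illustrates this with hypothetical log-likelihoods and an assumed parameter count; these numbers are not the estimates reported in Table 9.

```python
import math

def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 * (number of parameters)."""
    return -2.0 * log_lik + 2.0 * n_params

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + (number of parameters) * ln N."""
    return -2.0 * log_lik + n_params * math.log(n_obs)

N = 536  # examinees in the fraction-subtraction data

# Hypothetical values for illustration only (not the results in Table 9).
N_PARAMS = 35          # assumed: two parameters for each of 15 items plus 5 skill probabilities
LL_ORIGINAL = -4000.0  # hypothetical log-likelihood, original Q-matrix
LL_MODIFIED = -3979.0  # hypothetical log-likelihood, modified Q-matrix

# Positive differences mean the modified model has the smaller (better) criterion.
d_aic = aic(LL_ORIGINAL, N_PARAMS) - aic(LL_MODIFIED, N_PARAMS)
d_bic = bic(LL_ORIGINAL, N_PARAMS, N) - bic(LL_MODIFIED, N_PARAMS, N)
print(d_aic, d_bic)
```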
Note: BIC = Bayesian information criterion; AIC = Akaike information criterion; RDINA = reparameterized deterministic input noisy and.
Table 10. Parameter Estimates (SEs in Parentheses) for the RDINA Model, Fraction-Subtraction Data
Note: RDINA = reparameterized deterministic input noisy and. p1 to p5 are the skill probabilities (latent class sizes) for the five skills.
de la Torre (2008) presented another study of the validity of the Q-matrix that used the fraction-subtraction data and the same Q-matrix as used here. A sequential search algorithm was used along with a fit statistic, δ, that is basically the sum of the average slip and guess rates (the approach was referred to as the "δ-method"). Based on this analysis, de la Torre concluded that the original Q-matrix was supported. In contrast, the Bayesian approach used here suggested a modified Q-matrix with better relative fit, which indicates that the sequential search missed an improvement in the Q-matrix that the Bayesian approach detected.
A limitation of the δ-method was illustrated by de la Torre (2008), who included an irrelevant skill for one item (Skill 5 was included in Item 1). This led to a smaller value of the proposed fit statistic (indicating better fit), although the inclusion was not theoretically plausible. The Bayesian approach, in contrast, leads to the correct decision: When the Bayesian RDINA model was fitted with uncertainty about Skill 5 in Item 1, the posterior mean for q_15 was close to zero, which correctly indicates that the attribute should not be included. The lower part of Table 9 shows, for the model with the irrelevant skill, the δ fit statistics reported by de la Torre as well as the information criteria. BIC and AIC are larger for the incorrect modification, so they correctly detect that the modification leads to relatively worse fit. Thus, the example illustrates a limitation of the δ-method and supports both the Bayesian approach and the use of conventional information criteria, each of which picks up the error.
Summary and Conclusion
An approach to recognizing uncertainty in the Q-matrix has been presented. Instead of specifying all the elements of the Q-matrix as known (i.e., as zero or one), some elements are specified as random (Bernoulli) variables. The probability parameters for the random elements are given a prior, such as a Beta distribution. The posterior of an element q_jk, or of its probability p_jk, can then be used to help determine whether that element should be set to zero or one.
The utility of the approach was examined in several simulations. In the first two conditions, there was uncertainty about some elements of the Q-matrix (either 4 or 12 of the 60 elements), with the rest of the elements correctly specified. The third condition introduced a novel aspect: There was again uncertainty about 12 elements of the Q-matrix, but 6 other elements were incorrectly specified. Conditions with complete uncertainty about a skill (i.e., for all items) were also examined, as well as data with a higher order structure. The results suggest that the Bayesian approach is generally useful for helping to determine which skills should be included or excluded for each item. The best situation is when the skills are correctly specified for most or all of the Q-matrix, and there are questions about only some skills for some items. Performance in this situation appears to be somewhat robust to misspecification of other parts of the Q-matrix, although misspecification leads to lower recovery rates for some elements. Similar results were found for data with a higher order structure, except that recovery rates for some elements were lower, particularly when other elements of the Q-matrix were misspecified. The approach appears to break down to some degree, however, when there is complete uncertainty about a skill and other elements of the Q-matrix are misspecified (Condition 5), or when an unnecessary skill is included (Condition 6). In short, the Bayesian approach appears to be useful for detecting which skills the items should load on when the Q-matrix and the number of skills are at least generally correctly specified. Of course, the importance of theory in the development and respecification of the Q-matrix cannot be overemphasized.
The present studies also have some limitations, which point to directions for future research. They provide some basic groundwork, but the results are, of course, limited to the particular Q-matrices and parameter values examined. Simulations with other types of Q-matrices, a greater number of attributes, and other higher order structures are needed. The use of more informative priors and a greater number of iterations should also be examined, particularly in the problematic situations found above. It might also be possible to use the approach in a more exploratory manner, for example, by treating most or all of the Q-matrix elements as uncertain. However, the results for Conditions 5 and 6, where there was complete uncertainty about a skill, raise some doubts about using the approach in this manner, although this remains to be explored.
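To make the idea of a random Q-matrix element concrete, the following is a minimal Python sketch of one full-conditional (Gibbs-style) update for a single uncertain element q_jk. It assumes the RDINA form logit P(X_ij = 1) = f_j + d_j η_ij, where η_ij = 1 if examinee i has every skill that item j requires and 0 otherwise, and it treats the skill patterns, the item parameters f_j and d_j, and the prior inclusion probability `prior_p` as known for this step; in a full sampler these would also be updated. The function names and toy data are hypothetical, and this is not the OpenBUGS code used for the analyses reported here.

```python
import math

def eta(alpha_i, q_j):
    """DINA latent response: 1 if the examinee has every skill the item requires."""
    return int(all(a == 1 for a, q in zip(alpha_i, q_j) if q == 1))

def item_loglik(x_j, alphas, q_j, f_j, d_j):
    """Log-likelihood of item j's responses under logit P(X = 1) = f_j + d_j * eta."""
    ll = 0.0
    for x, alpha_i in zip(x_j, alphas):
        p = 1.0 / (1.0 + math.exp(-(f_j + d_j * eta(alpha_i, q_j))))
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def prob_q_one(x_j, alphas, q_j, k, f_j, d_j, prior_p):
    """Full-conditional probability that q_jk = 1, holding everything else fixed."""
    q_with = list(q_j)
    q_with[k] = 1
    q_without = list(q_j)
    q_without[k] = 0
    log1 = math.log(prior_p) + item_loglik(x_j, alphas, q_with, f_j, d_j)
    log0 = math.log(1.0 - prior_p) + item_loglik(x_j, alphas, q_without, f_j, d_j)
    m = max(log1, log0)  # subtract the max for numerical stability
    return math.exp(log1 - m) / (math.exp(log1 - m) + math.exp(log0 - m))

# Toy example (hypothetical): two skills; Skill 0 is fixed as required, Skill 1 is uncertain.
patterns = [(1, 1), (1, 1), (1, 0), (1, 0), (0, 1), (0, 0)]
alphas = patterns * 5
x_both = [1 if a == (1, 1) else 0 for a in alphas]   # correct only with both skills
x_first = [1 if a[0] == 1 else 0 for a in alphas]    # correct whenever Skill 0 is present

p_both = prob_q_one(x_both, alphas, [1, 0], k=1, f_j=-2.0, d_j=4.0, prior_p=0.5)
p_first = prob_q_one(x_first, alphas, [1, 0], k=1, f_j=-2.0, d_j=4.0, prior_p=0.5)
print(round(p_both, 4), round(p_first, 4))
```

With a Beta(1, 1) prior on p_jk, the marginal prior probability that q_jk = 1 is 1/(1 + 1) = 0.5, which is the value used in the toy example; the update then favors inclusion when the responses depend on both skills and exclusion when they depend on Skill 0 alone.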
Effects of using cut points in different ways for the random elements can also be examined. For example, a reviewer suggested a sequential version of the Bayesian approach, where after a first run, only elements with posterior means above or below certain cut points (e.g., below 0.2 or above 0.8) are set to zero or one. A second run can then be used with these elements fixed to see if this helps to set the remaining uncertain elements.
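The reviewer's sequential suggestion can be sketched as a simple classification step between runs. The Python function below is a hypothetical helper (not part of any package) that, given posterior means keyed by (item, skill), fixes elements below the lower cut point at zero and above the upper cut point at one, and returns the rest as still uncertain for a second run. The cut points default to the 0.2 and 0.8 values mentioned above, and the posterior means in the example are made up for illustration.

```python
def classify_elements(posterior_means, lower=0.2, upper=0.8):
    """Split uncertain Q-matrix elements into fixed-at-0, fixed-at-1, and still-uncertain."""
    fixed = {}
    still_uncertain = []
    for element, mean in posterior_means.items():
        if mean < lower:
            fixed[element] = 0
        elif mean > upper:
            fixed[element] = 1
        else:
            still_uncertain.append(element)
    return fixed, still_uncertain

# Hypothetical posterior means from a first run, keyed by (item, skill).
first_run = {(4, 2): 0.05, (5, 2): 0.93, (5, 4): 0.55}
fixed, pending = classify_elements(first_run)
print(fixed)    # {(4, 2): 0, (5, 2): 1}
print(pending)  # [(5, 4)]
```

In a second run, the elements in `fixed` would be entered as known zeros or ones, and only those in `pending` would remain random.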
The results also have important practical implications. Given that CDMs are being more widely used in practice, the Bayesian approach offers a way for researchers to consider theory-guided modifications of their particular Q-matrix, without having to consider the dozens or hundreds of alternatives that arise when more than four elements are in question. The approach is simple to implement, and computational time is fairly short, which should encourage researchers to apply the approach in practical applications of CDMs.
Finally, it should be noted that although sophisticated models are helpful, there is also a need to collect data to experimentally test the Q-matrix under consideration. For example, for the fraction-subtraction task, one could instruct examinees as to which strategy to use (e.g., “solve all problems by converting to a common denominator”). Convincing evidence of the validity of a Q-matrix would be obtained if it could be shown that variations in the instructions (within or across examinees) lead to detectable differences in the Q-matrix. Although the fraction-subtraction data have been analyzed for decades, there has yet to be a basic demonstration of this sort, at least to the author’s knowledge.
Footnotes
Appendix
Acknowledgements
The author thanks Matthew S. Johnson for some helpful discussions.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
