DINA-BAG: A Bagging Algorithm for DINA Model Parameter Estimation in Small Samples

Abstract

Cognitive diagnosis models (CDMs) are assessment tools that provide valuable formative feedback about skill mastery at both the individual and the population level. Recent work has explored the performance of CDMs with small sample sizes but has focused solely on estimates of individual profiles. The current research focuses on obtaining accurate estimates of skill mastery at the population level. We introduce a novel algorithm, the bagging algorithm for the deterministic inputs, noisy “and” gate model (DINA-BAG), which is inspired by ensemble learning methods in the machine learning literature and produces more stable and accurate estimates of the population skill mastery profile distribution for small sample sizes. Using both simulated data and real data from the Examination for the Certificate of Proficiency in English (ECPE), we demonstrate that the proposed method outperforms other methods on several metrics in a wide variety of scenarios.

Keywords

cognitive diagnosis, small sample, ensemble learning, DINA

1. Introduction

Formative assessment is any form of assessment that seeks to obtain evidence about student learning and then use this evidence to provide both learners and educators with feedback that can guide future learning and instruction (Black & Wiliam, 2009). This type of assessment, when applied correctly, can greatly enhance student learning in a classroom setting. For example, a review by Black and Wiliam (1998) highlighted multiple studies that showed that formative assessment and feedback improved student learning and outcomes in classroom settings. One type of formative assessment tool that has proven to be useful is the cognitive diagnosis model (CDM; Leighton & Gierl, 2007; Rupp et al., 2010), which can be used to provide teachers and students with specific feedback regarding skill mastery. This is done by estimating a “skill mastery profile,” which is a binary vector where each element corresponds to a specific skill with 1 indicating mastery of that skill and 0 indicating lack of mastery.

Despite their utility, CDMs are often used only to analyze the results of large-scale assessments. In their review of applications of CDMs, Sessoms and Henson (2018) found that most studies used sample sizes exceeding 1,000. The smallest sample sizes used in these studies exceeded 50, which is still larger than the average classroom size in the United States (National Center for Education Statistics, 2018). In addition, some of the studies with small sample sizes pooled information from several small samples, which might not always be possible. To be useful to more educators, these models need to perform well with fewer than 50 students in a classroom and should not rely on pooling samples. Thus, a new challenge in educational statistics is developing methods that can accurately describe skill mastery in everyday classroom settings using test responses from only a handful of students.

Some solutions to the problem of small sample sizes when performing cognitive diagnosis have been explored. One common CDM, the deterministic inputs noisy “and” gate (DINA) model (Junker & Sijtsma, 2001; Macready & Dayton, 1977), has been tested in simulation studies with smaller sample sizes. Shu et al. (2013) found that the DINA model performed well in terms of classifying students’ skill mastery profiles with samples as small as N = 20. The nonparametric diagnostic classification (NPCD) method, introduced by Chiu and Douglas (2013), also showed potential in performing well with small sample sizes. Finally, the possibility of using neural networks for cognitive diagnosis with small sample sizes has also been explored with some success (Cui et al., 2016; Shu et al., 2013). In a simulation study performed to compare all three methods, researchers found that each approach performed favorably with small sample sizes, with the DINA model and NPCD method outperforming the neural network model (Paulsen & Valdivia, 2022).

Each of these methods has been shown to perform reasonably well with small samples when it comes to estimating individual profiles. However, the accurate estimation of individual profiles is not the only important task that can be performed with CDMs. In some situations, it can be of equal if not greater importance for educators to obtain accurate estimates of skill mastery at the population level. This is often done with large sample assessments to study groups of students to better understand broader trends in learning such as with the Trends in International Mathematics and Science Study assessments (Choi et al., 2015; Lee et al., 2011). For many teachers, it would be equally valuable to have an accurate picture of skill mastery for the entire classroom. De La Torre and Minchen (2014) suggested that teachers can use group-level information provided by CDMs to “tailor subsequent instruction, providing the double benefit of serving students and maximizing the efficiency of classroom instructional time.” While individual profile estimates are useful for personalized learning, group-level estimates can help teachers plan learning activities and develop curriculum that will be most useful for the entire class.

While the benefits of accurate estimates of skill mastery for an entire classroom are clear, obtaining such estimates can be difficult with small sample sizes. Current methods provide accurate estimates of these quantities in large samples (Gu & Xu, 2019; Wang & Douglas, 2015), but their accuracy may suffer in everyday classroom situations where the number of students is small. Furthermore, our results show that methods used to obtain individual profile estimates might not be the most accurate when it comes to estimating population-level parameters related to skill mastery. The goal of the current research is to explore the performance of previously proposed methods in estimating skill mastery for a small group of students and to propose an alternative method that overcomes the weaknesses these methods exhibit in such situations.

In this article, we propose a method that can be used to estimate the true underlying population distribution of skill mastery profiles with higher accuracy than current methods when sample sizes are small. This method is based on the popular machine learning concepts of bagging and random forests introduced by Breiman (1996, 2001). The idea behind these methods is to fit a variety of models to the data, obtain estimates from each model, and then aggregate these estimates into a single estimate. This approach is especially useful for avoiding overfitting, which often occurs when a model with many parameters is estimated from a small sample. We show how this concept can be used with the DINA model to produce more accurate and stable estimates of the skill mastery profile distribution in small samples.

The remainder of this article is structured as follows. In Section 2, we discuss common cognitive diagnosis methods used with small sample sizes and formally introduce the DINA model. In Section 3, we describe the proposed algorithm, provide key details for its implementation, and discuss how our methodology relates to and differs from the ideas behind bagging and random forests. In Section 4, we present a simulation study that compares the proposed algorithm with the DINA model and the NPCD and SANN methods. In Section 5, the proposed method is applied to the ECPE data set to compare its performance with that of other methods on real-life data. Finally, in Section 6, we discuss the advantages of the proposed method and possible directions for future research.

2. Cognitive Diagnosis With Small Sample Sizes

There are several common approaches to cognitive diagnosis with small sample sizes. The DINA model is one of the simplest and most commonly used CDMs and has been shown to perform well with small sample sizes in certain simulations (Shu et al., 2013). Nonparametric alternatives have also been proposed, the most popular of which are the NPCD and general NPCD (G-NPCD) methods (Chiu & Douglas, 2013; Chiu et al., 2018). Finally, neural networks have also been used (Cui et al., 2016; Shu et al., 2013). Because the G-NPCD and NPCD methods are similar, and because recent simulation studies have shown that the G-NPCD method does not outperform the NPCD method in many cases (Chiu et al., 2018), we include only the NPCD method in our comparisons with the proposed method. For the sake of comparison and replication, we also implement a simple artificial neural network (SANN) approach, despite some studies showing that it is often inferior to the DINA and NPCD methods (Paulsen & Valdivia, 2022). In what follows, we briefly describe each of these methods in turn.

2.1. DINA Model

Let $Y_{ij}$ represent a binary response from student $i \in \{1, 2, \dots, N\}$ to item $j \in \{1, 2, \dots, J\}$ on an exam, where $Y_{ij} = 1$ indicates that the student responded correctly to the item. Let $\alpha_i$ be a binary vector of length $K$ that represents the skill mastery profile for student $i$, where $\alpha_{ik} = 1$ if the student has mastered skill $k$ and $\alpha_{ik} = 0$ otherwise. A crucial component of the DINA model is the Q-matrix, a $J \times K$ binary matrix with rows $q_j$, where $q_{jk} = 1$ indicates that skill $k$ is required to respond correctly to item $j$ and $q_{jk} = 0$ indicates that it is not. Let $\eta_{ij} = \mathbb{1}[\alpha_i \succeq q_j]$ indicate whether student $i$ has all the skills required to respond correctly to item $j$, where $\succeq$ denotes elementwise comparison. Finally, letting $s_j = P(Y_{ij} = 0 \mid \eta_{ij} = 1)$ and $g_j = P(Y_{ij} = 1 \mid \eta_{ij} = 0)$ denote the slipping and guessing probabilities for item $j$, the item response function for the DINA model is

$$P(Y_{ij} = 1 \mid \alpha_i, s_j, g_j, Q) = (1 - s_j)^{\eta_{ij}} \, g_j^{1 - \eta_{ij}}. \tag{1}$$

This represents the probability that a student with a given skill mastery profile responds correctly to an item. Instead of treating the person-level skill mastery profiles as parameters to be estimated, we can introduce a population-level skill mastery probability vector $p$, where $p_c$ for $c \in \{0, 1\}^K$ is the overall probability of an individual possessing skill mastery profile $c$. Then, the parameter of interest becomes $p$, and the item response function in Equation 1 can be rewritten as

$$P(Y_{ij} = 1 \mid s_j, g_j, p, Q) = \sum_{c} P(Y_{ij} = 1 \mid \alpha_c, s_j, g_j, Q)\, p_c. \tag{2}$$
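As an illustration of Equations 1 and 2, the following minimal NumPy sketch enumerates all $2^K$ profiles, forms the ideal responses $\eta_{cj}$, and computes the marginal probability of a correct response under a given $p$. Profiles are listed in `itertools.product` order, which is our own choice, and the function names and parameter values are illustrative rather than taken from any package.

```python
import itertools

import numpy as np


def all_profiles(K):
    """All 2^K binary skill profiles as rows, in itertools.product order."""
    return np.array(list(itertools.product([0, 1], repeat=K)), dtype=int)


def ideal_responses(alpha, Q):
    """eta[c, j] = 1 if profile alpha[c] has every skill that item j requires."""
    return np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(int)


def dina_irf(alpha, Q, s, g):
    """Equation 1: P(Y_j = 1 | alpha_c) is 1 - s_j if eta_cj = 1, else g_j."""
    eta = ideal_responses(alpha, Q)
    return np.where(eta == 1, 1.0 - s, g)


def marginal_correct(Q, s, g, p):
    """Equation 2: P(Y_j = 1) = sum_c P(Y_j = 1 | alpha_c) p_c."""
    alpha = all_profiles(Q.shape[1])
    return p @ dina_irf(alpha, Q, s, g)


Q = np.array([[1, 0], [0, 1], [1, 1]])
s = np.array([0.1, 0.2, 0.3])
g = np.array([0.2, 0.2, 0.1])
p = np.full(4, 0.25)  # uniform distribution over the four profiles
print(marginal_correct(Q, s, g, p))  # approximately [0.55, 0.5, 0.25]
```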

Maximum likelihood estimates (MLEs) of model parameters in Equation 2 can be obtained using the expectation–maximization algorithm (De La Torre, 2009). For model parameters to be identifiable, certain conditions on the Q-matrix must be met. These conditions have been outlined in detail by Gu and Xu (2019). Assuming that these conditions have been met, it can be shown that the MLEs will converge in probability to the true values of the model parameters. However, these results assume that the number of students, N, is sufficiently large. When the sample size is small, the estimates of p might be inaccurate and can be heavily influenced by noise in the observed data rather than effectively capturing the true underlying distribution of the population skill mastery profiles.
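The EM approach can be sketched compactly for the DINA model, because both steps have closed forms. The code below is our own minimal illustration of marginal maximum likelihood for Equation 2, not the implementation in the CDM package or in De La Torre (2009). The starting values, the clipping constant `eps`, and the stopping rule are assumptions, and the sketch does not enforce the monotonicity constraint $g_j < 1 - s_j$.

```python
import itertools

import numpy as np


def fit_dina_em(Y, Q, max_iter=500, tol=1e-6, eps=1e-4):
    """Estimate (p, s, g) for the DINA model by EM; also returns the log-likelihood."""
    N, J = Y.shape
    K = Q.shape[1]
    alpha = np.array(list(itertools.product([0, 1], repeat=K)))
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(float)  # (C, J)
    C = alpha.shape[0]
    p = np.full(C, 1.0 / C)  # start from a uniform profile distribution
    s = np.full(J, 0.2)
    g = np.full(J, 0.2)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each profile for each student
        prob = np.where(eta == 1, 1.0 - s, g)  # (C, J)
        log_lik = Y @ np.log(prob).T + (1 - Y) @ np.log(1.0 - prob).T  # (N, C)
        log_joint = log_lik + np.log(np.clip(p, 1e-300, None))
        m = log_joint.max(axis=1, keepdims=True)
        post = np.exp(log_joint - m)
        norm = post.sum(axis=1, keepdims=True)
        ll = float(np.sum(m + np.log(norm)))  # log-likelihood at current parameters
        post /= norm
        # M-step: closed-form updates from expected counts
        p = post.mean(axis=0)
        w = post @ eta  # (N, J): posterior probability that eta_ij = 1
        n1, r1 = w.sum(axis=0), (Y * w).sum(axis=0)
        n0, r0 = (1 - w).sum(axis=0), (Y * (1 - w)).sum(axis=0)
        s = np.clip(1.0 - r1 / np.maximum(n1, 1e-12), eps, 1 - eps)
        g = np.clip(r0 / np.maximum(n0, 1e-12), eps, 1 - eps)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return p, s, g, ll
```

With large $N$ and a Q-matrix that satisfies the identifiability conditions, the estimates from such a sketch should land close to the generating values; with small $N$, the $\hat{p}$ it returns is exactly the kind of noisy estimate discussed in the surrounding text.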

2.2. NPCD Method

The NPCD method is a nonparametric classification method that estimates $\alpha_i$ by comparing the item response vector for student $i$, $y_i = (y_{i1}, y_{i2}, \dots, y_{iJ})$, with each of the $2^K$ ideal response patterns $\eta_c = (\eta_{c1}, \eta_{c2}, \dots, \eta_{cJ})$ using a distance metric and selecting the profile $\alpha_c$ for which this distance is smallest. A common choice is the Hamming distance, in which case the estimate is defined as

$$\hat{\alpha}_i = \operatorname*{arg\,min}_{\alpha_c} \sum_{j=1}^{J} \lvert y_{ij} - \eta_{cj} \rvert. \tag{3}$$

Chiu and Douglas (2013) found that slightly better performance can be achieved by weighting each item's term in the distance by the inverse of that item's sample variance. The estimate of the skill mastery profile for student $i$ is then found by replacing Equation 3 with

$$\hat{\alpha}_i = \operatorname*{arg\,min}_{\alpha_c} \sum_{j=1}^{J} \frac{1}{\bar{p}_j (1 - \bar{p}_j)} \lvert y_{ij} - \eta_{cj} \rvert, \tag{4}$$

where $\bar{p}_j = \frac{1}{N} \sum_{i=1}^{N} y_{ij}$ is the proportion of students that answered item $j$ correctly. After obtaining the estimates of the individual skill mastery profiles, an estimate of $p_c$ can then be obtained as

$$\hat{p}_c = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}[\hat{\alpha}_i = \alpha_c]. \tag{5}$$
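Equations 3 through 5 translate directly into array operations. The sketch below classifies each student by (optionally weighted) Hamming distance to the ideal response patterns and then forms $\hat{p}_c$. Two details are our own assumptions rather than part of the NPCD method or the NPCD R package: ties are broken by taking the first profile in enumeration order, and $\bar{p}_j$ is clipped away from 0 and 1 so that the weights in Equation 4 stay finite.

```python
import itertools

import numpy as np


def npcd_classify(Y, Q, weighted=True):
    """Nearest-ideal-pattern classification (Equations 3-4) and p_hat (Equation 5)."""
    N, J = Y.shape
    K = Q.shape[1]
    alpha = np.array(list(itertools.product([0, 1], repeat=K)))
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(int)  # (C, J)
    if weighted:
        # inverse item variance; clipping keeps the weights finite (our assumption)
        p_bar = np.clip(Y.mean(axis=0), 0.5 / N, 1 - 0.5 / N)
        w = 1.0 / (p_bar * (1.0 - p_bar))
    else:
        w = np.ones(J)
    dist = (np.abs(Y[:, None, :] - eta[None, :, :]) * w).sum(axis=2)  # (N, C)
    idx = dist.argmin(axis=1)  # ties resolved in favor of the first profile
    p_hat = np.bincount(idx, minlength=alpha.shape[0]) / N
    return alpha[idx], p_hat
```

Because $\hat{p}$ here is a vector of sample proportions, any profile that no student is assigned to receives an estimate of exactly zero, which becomes common when $N$ is smaller than $2^K$.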

Similarly, since estimates of s and g are not produced directly, these quantities are estimated by first using ${\hat{α}}_{i}$ to compute ${\hat{η}}_{i j}$ for $i = 1, 2, \dots, N$ and $j = 1, 2, \dots, J$ and then estimating these parameters using the following equations:

$$\hat{s}_j = \frac{\sum_{i=1}^{N} (1 - y_{ij})\, \hat{\eta}_{ij}}{\sum_{i=1}^{N} \hat{\eta}_{ij}}, \tag{6}$$

$$\hat{g}_j = \frac{\sum_{i=1}^{N} y_{ij} (1 - \hat{\eta}_{ij})}{\sum_{i=1}^{N} (1 - \hat{\eta}_{ij})}. \tag{7}$$
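Given estimated profiles from NPCD (or any other classifier), the plug-in estimates in Equations 6 and 7 can be sketched as follows. Returning NaN when no student is classified as a master (or a nonmaster) of an item is our own convention; the text does not specify how that case is handled.

```python
import numpy as np


def plugin_slip_guess(Y, alpha_hat, Q):
    """Equations 6-7: slipping and guessing rates implied by estimated profiles."""
    eta_hat = np.all(alpha_hat[:, None, :] >= Q[None, :, :], axis=2).astype(float)  # (N, J)
    n_master = eta_hat.sum(axis=0)
    n_nonmaster = (1.0 - eta_hat).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        s_hat = ((1 - Y) * eta_hat).sum(axis=0) / n_master  # NaN if no masters
        g_hat = (Y * (1.0 - eta_hat)).sum(axis=0) / n_nonmaster  # NaN if no nonmasters
    return s_hat, g_hat


Q = np.array([[1, 0], [0, 1]])
alpha_hat = np.array([[1, 0], [1, 0], [0, 1], [0, 0]])
Y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
s_hat, g_hat = plugin_slip_guess(Y, alpha_hat, Q)  # s_hat: [0.5, 0.0], g_hat: [0.5, 1/3]
```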

One drawback of this method is that it does not directly estimate the levels of skill mastery at the population level. While an estimate of skill mastery for the entire group can be obtained using individual estimates of skill mastery, such an estimate might not be as accurate as an estimator designed specifically for that purpose, especially when the sample size is small. As with the DINA model, the estimator of $p$ used here can be very accurate for large sample sizes but is highly variable for small sample sizes.

2.3. SANN Method

The SANN method uses a simple feed-forward neural network with an input layer, one hidden layer, and an output layer. As in Cui et al. (2016) and Shu et al. (2013), the network takes an ideal response pattern under the DINA model as input and maps it to the corresponding attribute profile. Let $A$ be an $M \times K$ matrix of $M$ randomly sampled attribute profiles, let $H$ be the $M \times J$ matrix of ideal response patterns computed for a given Q-matrix, such that $H_{mj} = \mathbb{1}[A_{m\cdot} \succeq q_j]$, and let $X = [\mathbf{1}\; H]$, where $\mathbf{1}$ is a column vector of ones of length $M$. Furthermore, let $W_1$ and $W_2$ be weight matrices of dimensions $(J + 1) \times (n_1 + 1)$ and $(n_1 + 1) \times K$, respectively, where $n_1$ is the number of nodes in the hidden layer. Finally, let $\sigma(x) = \exp(x) / (1 + \exp(x))$ denote the sigmoid function, applied elementwise. The entire SANN can then be written as

$$\hat{A} = \sigma\big(\sigma(X W_1)\, W_2\big). \tag{8}$$

The goal of this approach is to find the values of $W_1$ and $W_2$ that minimize an error function comparing $\hat{A}$ with $A$. This is usually done via some type of backpropagation algorithm (Rumelhart et al., 1986). The error function chosen by both Cui et al. (2016) and Shu et al. (2013) is the sum of squared errors. More specifically, writing $\alpha_{mk}$ and $\hat{\alpha}_{mk}$ for the $(m, k)$ entries of $A$ and $\hat{A}$, the goal is to minimize

$$\sum_{m=1}^{M} \sum_{k=1}^{K} (\alpha_{mk} - \hat{\alpha}_{mk})^2. \tag{9}$$

Once this error function is minimized, the SANN is trained and can be used to classify observed response vectors into attribute profiles. This is done by passing an observed response vector $y_i$, with an intercept term prepended to match the input layer, through the trained network with estimated weight matrices $\hat{W}_1$ and $\hat{W}_2$ to obtain the estimate

$$\hat{\alpha}_i^{*} = \sigma\big(\sigma([1\; y_i^{\top}]\, \hat{W}_1)\, \hat{W}_2\big). \tag{10}$$

The final estimate ${\hat{α}}_{i}$ is then computed as

$$\hat{\alpha}_i = \operatorname*{arg\,min}_{\alpha_c} \sum_{k=1}^{K} (\alpha_{ck} - \hat{\alpha}_{ik}^{*})^2, \tag{11}$$
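To make Equations 8 through 11 concrete, the following NumPy sketch trains the network by full-batch gradient descent on the squared-error loss of Equation 9 (scaled by $1/M$ for step-size stability) and then classifies response vectors. It is not the neuralnet package used later in this article; the learning rate, number of epochs, initialization scale, and uniform sampling of training profiles are illustrative assumptions. Because the candidate profiles in Equation 11 include every binary vector, the arg min is equivalent to rounding each coordinate of $\hat{\alpha}_i^{*}$ to 0 or 1.

```python
import itertools

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_sann(Q, n_hidden=20, M=1000, lr=0.5, epochs=5000, seed=0):
    """Fit W1 and W2 of Equation 8 on M simulated (ideal pattern, profile) pairs."""
    rng = np.random.default_rng(seed)
    J, K = Q.shape
    A = rng.integers(0, 2, size=(M, K)).astype(float)  # sampled attribute profiles
    H = np.all(A[:, None, :] >= Q[None, :, :], axis=2).astype(float)  # ideal responses
    X = np.hstack([np.ones((M, 1)), H])  # X = [1 H]
    W1 = rng.normal(scale=0.5, size=(J + 1, n_hidden + 1))
    W2 = rng.normal(scale=0.5, size=(n_hidden + 1, K))
    for _ in range(epochs):
        H1 = sigmoid(X @ W1)
        A_hat = sigmoid(H1 @ W2)
        # backpropagate the squared-error loss (Equation 9, divided by M)
        d2 = 2.0 * (A_hat - A) * A_hat * (1.0 - A_hat) / M
        d1 = (d2 @ W2.T) * H1 * (1.0 - H1)
        W2 -= lr * (H1.T @ d2)
        W1 -= lr * (X.T @ d1)
    return W1, W2


def sann_classify(Y, W1, W2):
    """Equations 10-11: forward pass with an intercept, then the nearest binary profile."""
    X = np.hstack([np.ones((Y.shape[0], 1)), Y])
    a_star = sigmoid(sigmoid(X @ W1) @ W2)
    K = W2.shape[1]
    alpha = np.array(list(itertools.product([0, 1], repeat=K)))
    d = ((alpha[None, :, :] - a_star[:, None, :]) ** 2).sum(axis=2)
    return alpha[d.argmin(axis=1)]
```

Because the network only ever sees noise-free ideal patterns during training, this sketch also illustrates the drawback discussed next: observed responses containing slips or guesses lie outside the training distribution.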

and the estimates for model parameters can again be produced using the same approach given for the NPCD method with Equations 5 through 7.

An advantage of this method is that it can be trained using simulated data, and thus, its accuracy is not as affected by the sample size N as other methods might be. As long as the number of training examples, M, is sufficiently large and the network has enough learning capacity, it will be able to find an accurate mapping from ideal response patterns to the true corresponding attribute profiles. A major drawback of this method is that it is trained using ideal response patterns instead of real response patterns. As such, it has difficulty accounting for deviations from the ideal patterns that can occur due to slipping or guessing. Thus, while it has the potential to work well with any sample size, its performance depends to a large extent on how much the observed data deviate from the patterns used to train the network.

3. Bagging Algorithm for DINA Model (DINA-BAG)

The motivation behind bagging is to use multiple estimates from a variety of models to improve upon estimates from any individual model. By aggregating results from different models, parameter estimates become more stable and are less influenced by noise in the dataset, which can be especially useful when the number of observations is small. We begin by introducing the DINA-BAG algorithm and then proceed to describe key details. After demonstrating how the algorithm works, we will provide some thoughts on how the proposed methodology relates to and differs from traditional bagging algorithms as they are used in the machine learning literature.

First, assume the Q-matrix for the assessment meets the requirements outlined by Gu and Xu (2019) for identifiability of model parameters. More specifically, we assume that the Q-matrix, Q, contains an identity submatrix, that each skill is measured by at least three items, and that no two columns in the Q-matrix are identical. This means that any subset of two or more columns of Q will also result in an identifiable Q-matrix. Now, we define four important sets required for the bagging algorithm:

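$$
\begin{aligned}
A_K &= \{1, 2, \dots, K\} && \text{(set of skill indices for } K \text{ skills)},\\
B_K &= \{\{k\} : k \in \{1, 2, \dots, K\}\} && \text{(set of singletons)},\\
S_K &= \mathcal{P}(A_K) && \text{(power set of } A_K\text{)},\\
M_K &= S_K \setminus \big(B_K \cup \{\emptyset\}\big) && \text{(set of sets with two or more skill indices)}.
\end{aligned}
$$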
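As a concrete check of these definitions, $M_K$ can be enumerated with the Python standard library. Counting its elements shows that $|M_K| = 2^K - K - 1$, so the number of submodels to fit grows quickly with $K$: 4 for $K = 3$ and 57 for $K = 6$. The function name is illustrative.

```python
from itertools import combinations


def skill_subsets(K):
    """M_K: all subsets of {1, ..., K} with two or more skills, as sorted tuples."""
    return [c for r in range(2, K + 1) for c in combinations(range(1, K + 1), r)]


print(skill_subsets(3))  # [(1, 2), (1, 3), (2, 3), (1, 2, 3)]
print(len(skill_subsets(6)))  # 57
```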

Now, let $p$, $s$, and $g$ denote the DINA model parameters for the original item response matrix $Y$ and the original Q-matrix $Q$. Let $M \in M_K$, and let $p^{(M)}$, $s^{(M)}$, and $g^{(M)}$ denote the DINA model parameters for $Q^{(M)}$ and the corresponding $Y^{(M)}$. Let $\hat{p}^{(M)}$, $\hat{s}^{(M)}$, and $\hat{g}^{(M)}$ denote the estimates of $p^{(M)}$, $s^{(M)}$, and $g^{(M)}$ obtained by fitting the DINA model to $Y^{(M)}$ and $Q^{(M)}$. Finally, let $\hat{p}^{(M)*}$, $\hat{s}^{(M)*}$, and $\hat{g}^{(M)*}$ be the estimates of $p$, $s$, and $g$ obtained from $\hat{p}^{(M)}$, $\hat{s}^{(M)}$, and $\hat{g}^{(M)}$. The DINA-BAG algorithm is then as follows:

1. For each $M \in M_K$:

   (a) Compute $Y^{(M)}$ and $Q^{(M)}$.

   (b) Obtain $\hat{p}^{(M)}$, $\hat{s}^{(M)}$, and $\hat{g}^{(M)}$.

   (c) Use $\hat{p}^{(M)}$, $\hat{s}^{(M)}$, and $\hat{g}^{(M)}$ to obtain $\hat{p}^{(M)*}$, $\hat{s}^{(M)*}$, and $\hat{g}^{(M)*}$.

2. Compute the estimates of $p$, $s$, and $g$ using the following equations:

$$\hat{p} = \frac{1}{|M_K|} \sum_{M \in M_K} \hat{p}^{(M)*}, \tag{12}$$

$$\hat{s} = \frac{1}{|M_K|} \sum_{M \in M_K} \hat{s}^{(M)*}, \tag{13}$$

$$\hat{g} = \frac{1}{|M_K|} \sum_{M \in M_K} \hat{g}^{(M)*}. \tag{14}$$
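Once every submodel estimate has been mapped back to the full parameter space, step 2 is a simple average. In the sketch below, missing item parameters (items outside $H^{(M)}$) are stored as NaN and skipped, so each $\hat{s}_j$ and $\hat{g}_j$ is averaged over the submodels in which item $j$ appears; this handling of missing values is our assumption about how Equations 13 and 14 are applied. Assuming every item requires at least one skill, every item appears at least in the submodel with $M = \{1, \dots, K\}$, whose Q-matrix is $Q$ itself, so no average is empty.

```python
import numpy as np


def bag_average(estimates):
    """Average per-submodel estimates of equal length, ignoring NaN (missing) entries."""
    return np.nanmean(np.vstack(estimates), axis=0)


# p^(M)* vectors are complete, so this is the plain mean of Equation 12
p_hat = bag_average([np.array([0.5, 0.5]), np.array([0.3, 0.7])])
# s^(M)* vectors contain NaN for items outside H^(M)
s_hat = bag_average([np.array([0.1, np.nan, 0.3]), np.array([0.2, 0.4, np.nan])])
# p_hat: [0.4, 0.6]; s_hat: [0.15, 0.4, 0.3]
```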

With the general algorithm described, we now proceed to give key details required for performing each step in the algorithm, specifically steps (a) and (c).

3.1. Computing $Y^{(M)}$ and $Q^{(M)}$

For any $M \in M_K$, we can define a matrix $R^{(M)}$ that selects from $Q$ only the columns corresponding to the elements of $M$. More specifically, write the elements of $M$ in increasing order as $M_1 < M_2 < \dots < M_{|M|}$, and let $R^{(M)}$ be the $K \times |M|$ matrix whose row $M_l$ is the unit vector $e_l$ (of length $|M|$) for $l = 1, \dots, |M|$ and whose remaining rows are zero vectors. Next, with $M^c = A_K \setminus M$, define the set $H^{(M)} = \{ j \in \{1, \dots, J\} : q_{jk} = 1 \text{ for some } k \in M \text{ and } q_{jk} = 0 \text{ for all } k \in M^c \}$ of items that require at least one skill in $M$ and no skill outside $M$, and define $P^{(M)}$ as the $|H^{(M)}| \times J$ matrix whose columns indexed by $j \in H^{(M)}$ are, in increasing order of $j$, the unit vectors $e_1, e_2, \dots, e_{|H^{(M)}|}$ and whose remaining columns are zero vectors. Then, the matrix $Q^{(M)}$ can be defined as

$$Q^{(M)} = P^{(M)} Q R^{(M)}. \tag{15}$$

For each Q-matrix $Q^{(M)}$ , we need to define the corresponding dataset $Y^{(M)}$ . These datasets are obtained by starting with the original dataset Y and removing columns that correspond to those rows that were removed from Q to create $Q^{(M)}$ . More specifically, $Y^{(M)}$ is defined as

$$Y^{(M)} = Y \, {P^{(M)}}^{\top}. \tag{16}$$

Example 1: Let $Y$ and $Q$ be as follows:

$$Y = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \qquad Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}.$$

This Q-matrix meets the conditions for identifiability described by Gu and Xu (2019). The number of skills is $K = 3$ and therefore

$$
\begin{aligned}
A_K &= \{1, 2, 3\},\\
B_K &= \{\{1\}, \{2\}, \{3\}\},\\
S_K &= \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\},\\
M_K &= \{\{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}.
\end{aligned}
$$

Now, let $M = \{1, 2\}$ (the first element of $M_K$), so that $R^{(\{1,2\})}$ is a $3 \times 2$ matrix whose rows 1 and 2 are the unit vectors $e_1$ and $e_2$ and whose third row is a vector of zeros. The set $H^{(\{1,2\})}$ is then $\{1, 2, 4\}$, and the corresponding matrix $P^{(\{1,2\})}$ is a $3 \times 6$ matrix whose columns 1, 2, and 4 are the unit vectors $e_1$, $e_2$, and $e_3$ and whose remaining columns are zero vectors. Using Equation 15, the matrix $Q^{(\{1,2\})}$ is

$$Q^{(\{1,2\})} = P^{(\{1,2\})} Q R^{(\{1,2\})} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}.$$

The corresponding item response matrix $Y^{(\{1,2\})}$, obtained from Equation 16, is

$$Y^{(\{1,2\})} = Y \, {P^{(\{1,2\})}}^{\top} = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

Similarly, the matrices $Q^{(\{1,3\})}$, $Q^{(\{2,3\})}$, and $Q^{(\{1,2,3\})}$ are

$$Q^{(\{1,3\})} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad Q^{(\{2,3\})} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}, \quad Q^{(\{1,2,3\})} = Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},$$

and the corresponding item response matrices $Y^{(\{1,3\})}$, $Y^{(\{2,3\})}$, and $Y^{(\{1,2,3\})}$ are

$$Y^{(\{1,3\})} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad Y^{(\{2,3\})} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad Y^{(\{1,2,3\})} = Y = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
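Equations 15 and 16 amount to column and row selection, and the selector matrices can be built explicitly. The sketch below is our own illustration, using 1-based skill and item indices to match the text. It constructs $R^{(M)}$, $H^{(M)}$, and $P^{(M)}$ and is applied here to the $Y$ and $Q$ of Example 1.

```python
import numpy as np


def submatrices(Y, Q, M):
    """Return Q^(M) = P Q R, Y^(M) = Y P^T, and H^(M) for a skill subset M (1-based)."""
    J, K = Q.shape
    cols = [k - 1 for k in sorted(M)]
    # R: K x |M| selector whose row for the l-th smallest skill in M is e_l
    R = np.zeros((K, len(cols)), dtype=int)
    R[cols, np.arange(len(cols))] = 1
    # H^(M): items requiring at least one skill in M and no skill outside M
    outside = np.setdiff1d(np.arange(K), cols)
    H = [j for j in range(J) if Q[j, cols].any() and not Q[j, outside].any()]
    # P: |H| x J selector whose column H[i] is e_i
    P = np.zeros((len(H), J), dtype=int)
    P[np.arange(len(H)), H] = 1
    return P @ Q @ R, Y @ P.T, [j + 1 for j in H]


Y = np.array([[1, 0, 1, 1, 1, 0], [0, 1, 0, 0, 1, 0],
              [1, 1, 0, 1, 0, 1], [1, 0, 0, 0, 0, 1]])
Q = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]])
Q_sub, Y_sub, H = submatrices(Y, Q, {1, 2})
print(H)  # [1, 2, 4]
```

Building $R^{(M)}$ and $P^{(M)}$ explicitly mirrors the notation; in practice, indexing `Q[np.ix_(rows, cols)]` and `Y[:, rows]` gives the same result without forming the selector matrices.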

3.2. Estimating p, s, and g Using $p^{(M)}$ , $s^{(M)}$ , and $g^{(M)}$

A key part of the bagging algorithm will involve fitting the DINA model using the newly created submatrices $Y^{(M)}$ and $Q^{(M)}$ , which will produce MLEs of the corresponding parameters $p^{(M)}$ , $s^{(M)}$ , and $g^{(M)}$ . However, the dimensionality of these parameters will not match the dimensionality of the original model parameters, and therefore, an approach for how these estimates will be used to obtain estimates of p, s, and g must be outlined.

First, we describe a simple approach for obtaining an estimate of $p$ from $p^{(M)}$. The motivation is to treat $p^{(M)}$ as $p$ after marginalizing over the skills that are not included in $Q^{(M)}$. Let $F_K = \{0, 1\}^K$ denote the set of possible skill profiles for all $K$ skills, write the elements of $M$ in increasing order as $M_1 < M_2 < \dots < M_{|M|}$, and, for each $c \in \{0, 1\}^{|M|}$, let $G_c^{(M)} = \{ a \in F_K : a_{M_1} = c_1, a_{M_2} = c_2, \dots, a_{M_{|M|}} = c_{|M|} \}$. Then, $p^{(M)}$ is related to $p$ by $p_c^{(M)} = \sum_{a \in G_c^{(M)}} p_a$.

To obtain an estimate of $p_a$ from $p^{(M)}$, we simply invert this process. To do so, we make the naive assumption that all elements of $G_c^{(M)}$ are equally probable; in other words, $p_a = p_{a'}$ for all $a, a' \in G_c^{(M)}$. It follows that $p_c^{(M)} = |G_c^{(M)}|\, p_a$ for every $a \in G_c^{(M)}$ (note that $|G_c^{(M)}| = 2^{K - |M|}$, since the skills outside $M$ are unrestricted), and therefore the estimate for each $a \in G_c^{(M)}$ is

$$p_a = \frac{p_c^{(M)}}{|G_c^{(M)}|}.$$

Example 2: Using the same Q-matrix as before, but with $M = \{1, 3\}$ and $K = 3$, we have that

$$F_K = \{(0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), (1,1,1)\}.$$

Now, if we want to obtain an estimate of $p_{(0,0)}^{(\{1,3\})}$, we first obtain

$$G_{(0,0)}^{(\{1,3\})} = \{a \in F_K : a_1 = c_1, a_3 = c_2\} = \{a \in F_K : a_1 = 0, a_3 = 0\} = \{(0,0,0), (0,1,0)\}.$$

Therefore, the estimate of $p_{(0,0)}^{(\{1,3\})}$ would be

$$p_{(0,0)}^{(\{1,3\})} = p_{(0,0,0)} + p_{(0,1,0)}.$$

Because we assume that $p_{(0, 0, 0)} = p_{(0, 1, 0)}$ , this implies that

$$p_{(0,0)}^{(\{1,3\})} = |G_{(0,0)}^{(\{1,3\})}|\, p_{(0,0,0)} = |G_{(0,0)}^{(\{1,3\})}|\, p_{(0,1,0)} = 2 p_{(0,0,0)} = 2 p_{(0,1,0)}.$$

Finally, this gives the estimates

$$p_{(0,0,0)} = p_{(0,1,0)} = \frac{p_{(0,0)}^{(\{1,3\})}}{2}.$$
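Because $|G_c^{(M)}| = 2^{K - |M|}$ for every $c$, the uniform-splitting rule for $p_a$ can be written as a single indexing operation. The sketch below assumes that both $p^{(M)}$ and the full vector $p$ are indexed in `itertools.product` order (for $K = 3$: $(0,0,0), (0,0,1), (0,1,0), \dots$), which differs from the order in which $F_K$ is listed above. The values of $p^{(\{1,3\})}$ are made up purely for illustration.

```python
import itertools

import numpy as np


def expand_profile_probs(p_sub, M, K):
    """Map p^(M) over |M| skills to 2^K profiles, splitting each p_c^(M) evenly over G_c^(M)."""
    cols = [k - 1 for k in sorted(M)]
    full = np.array(list(itertools.product([0, 1], repeat=K)))
    weights = 2 ** np.arange(len(cols) - 1, -1, -1)  # binary place values
    idx = full[:, cols] @ weights  # position of each profile's restriction c in p_sub
    return np.asarray(p_sub, dtype=float)[idx] / 2 ** (K - len(cols))


# Example 2 setting: M = {1, 3}, K = 3, with illustrative values of p^({1,3})
p_sub = [0.4, 0.1, 0.2, 0.3]  # profiles (0,0), (0,1), (1,0), (1,1)
p_full = expand_profile_probs(p_sub, {1, 3}, 3)
# p_(0,0,0) = p_(0,1,0) = 0.4 / 2 = 0.2, as in Example 2
```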

Obtaining the estimates of $s$ and $g$ from $s^{(M)}$ and $g^{(M)}$ is more straightforward. When fitting the model with $Y^{(M)}$ and $Q^{(M)}$, we can obtain estimates of item parameters only for the items in $H^{(M)}$; estimates for items not in $H^{(M)}$ are simply treated as missing. More formally, $s_j = s_j^{(M)}$ and $g_j = g_j^{(M)}$ if $j \in H^{(M)}$, and $s_j$ and $g_j$ are missing if $j \notin H^{(M)}$. Here $j$ indexes items in the original test, so $s_j^{(M)}$ refers to the submodel parameter for the column of $Y^{(M)}$ that corresponds to item $j$. Using our example with $M = \{1, 3\}$, we have

$$s_j = s_j^{(\{1,3\})}, \quad g_j = g_j^{(\{1,3\})} \quad \forall j \in \{1, 3, 5\},$$

with the estimates of $s_2$, $s_4$, and $s_6$ and of $g_2$, $g_4$, and $g_6$ treated as missing.
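For item parameters, the mapping is a scatter into a length-$J$ vector, with NaN used as the missing-value marker (our choice of marker). The call below uses $H^{(\{1,3\})} = \{1, 3, 5\}$ from the example together with made-up submodel estimates.

```python
import numpy as np


def expand_item_params(s_sub, g_sub, H, J):
    """Place submodel item estimates at their original positions; NaN marks missing items."""
    s_full = np.full(J, np.nan)
    g_full = np.full(J, np.nan)
    idx = np.asarray(H) - 1  # H holds 1-based item indices
    s_full[idx] = s_sub
    g_full[idx] = g_sub
    return s_full, g_full


s_full, g_full = expand_item_params([0.1, 0.2, 0.3], [0.15, 0.25, 0.35], [1, 3, 5], 6)
# s_full: [0.1, nan, 0.2, nan, 0.3, nan]
```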

3.3. Connections Between DINA-BAG and Ensemble Methods

The algorithm described in the previous sections was inspired by ensemble methods used in the machine learning literature. However, it includes some novel adaptations that deviate from the norm. As such, it is useful to understand how the proposed algorithm relates to current methods as well as where it departs from the current literature. Bagging, short for “bootstrap aggregating,” as originally introduced by Breiman (1996), is a method of training “weak learners” on different bootstrap samples of a dataset and then combining their individual predictions to form a better prediction. Later, Breiman (2001) introduced random forests, which additionally restrict each weak learner (a decision tree) to randomly selected subsets of the predictors, drawing a new subset at each split. This helps to reduce the correlation between the predictions of the weak learners, potentially improving the aggregated predictions.

These methods are inherently nonparametric and as such are typically concerned with prediction rather than parameter estimation. Training algorithms for these methods focus on minimizing some distance metric between the observed data and the model predictions. While some approaches, such as SHAP (Shapley additive explanations) values (Lundberg & Lee, 2017), have been developed to help explain how predictors influence outcomes, it is difficult to know if or how these would apply to specialized latent variable models such as CDMs. Our proposed method seeks to glean wisdom from the ideas behind these techniques while adapting them to provide estimates of model parameters for a hypothesized model, such as the DINA CDM. Thus, while it is similar to these algorithms, it is also different in ways that allow it to better accomplish the task at hand.

The main relationship between our proposed algorithm and traditional bagging/random forest algorithms is that we fit the same model (i.e., the DINA model) using different subsets of “predictors” for each model. With random forests, subsets of predictors are randomly sampled while growing each weak learner (usually a decision tree). With the DINA-BAG algorithm, we do not have traditional predictors; instead, we select subsets of columns from the Q-matrix (see Equation 15). In the DINA model, the predictors are indicators of whether someone has all the skills required to answer a question correctly, and since these quantities are functions of the latent variables and the Q-matrix, selecting different subsets of columns of the Q-matrix redefines the latent space and can be thought of as forming new predictors. Finally, in contrast to random forests, this selection is not random but deterministic: all subsets of columns of the Q-matrix of size two or more are selected.

With random forests, a decision tree is usually trained on each subset of the data using the randomly selected predictors. With the DINA-BAG algorithm, the DINA model is used. This provides a natural way to obtain the estimates of the model parameters we are interested in, instead of just predictions of observed data. Finally, as in bagging where the same model is fit several times to different subsets of data, the DINA-BAG algorithm also fits a model several times to different subsets of data. However, whereas in traditional bagging, these subsets are chosen randomly, with the DINA-BAG algorithm they are chosen in a specific way to allow for the estimation of specific parametric quantities (i.e., p, s, and g) using the DINA model. More specifically, a subset $Y^{(M)}$ is formed by selecting only those columns in Y that correspond to the rows in Q that were selected to form $Q^{(M)}$ (see Equation 16).

4. Simulation Study

A simulation study was conducted to evaluate the effectiveness of the proposed algorithm in estimating the model parameters $p$, $s$, and $g$. In this study, we examine the following factors: sample size ($N = 10$, 30, and 100), number of attributes ($K = 3$ and 6), number of items ($J = 25$, 40, and 55), and the true skill mastery profile distribution ($p$). Five different patterns were chosen for $p$ to represent different populations that one might encounter in testing settings; their formal definitions are given in Table 1, and each is described in more detail in the note to the table. Each data set was generated according to the DINA model, with slipping and guessing parameters sampled randomly from the uniform distribution $U(0.1, 0.4)$, which allowed the items to vary in quality. Finally, for each simulated data set, the original Q-matrix was guaranteed to be complete and identifiable.
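The data-generating step described above can be sketched as follows: profiles are drawn from a given $p$, slipping and guessing parameters are drawn from $U(0.1, 0.4)$, and responses follow Equation 1. The function name and signature are ours, and the Q-matrix used here is the small one from Example 1 rather than a Q-matrix from the study; the completeness and identifiability checks are not reproduced.

```python
import itertools

import numpy as np


def simulate_dina(N, Q, p, rng, low=0.1, high=0.4):
    """Draw (Y, alpha, s, g) from the DINA model with s_j, g_j ~ U(low, high)."""
    J, K = Q.shape
    profiles = np.array(list(itertools.product([0, 1], repeat=K)))
    s = rng.uniform(low, high, size=J)
    g = rng.uniform(low, high, size=J)
    alpha = profiles[rng.choice(len(profiles), size=N, p=p)]
    eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2)
    prob_correct = np.where(eta, 1.0 - s, g)  # Equation 1
    Y = (rng.random((N, J)) < prob_correct).astype(int)
    return Y, alpha, s, g


rng = np.random.default_rng(2022)
Q = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1]])
Y, alpha, s, g = simulate_dina(10, Q, np.full(8, 1 / 8), rng)  # N = 10, uniform p
```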

Table 1.

True Underlying Distribution Equations

Uniform	Increasing	Decreasing	U	Inverted U
$\frac{1}{2^{K}}$ for all $c$	$\sum_{k=1}^{K} c_k$	$\frac{1}{\sum_{k=1}^{K} c_k}$	$\left(\frac{K}{2} - \sum_{k=1}^{K} c_k\right)^2$	$\max_{c'} \left(\frac{K}{2} - \sum_{k=1}^{K} c'_k\right)^2 - \left(\frac{K}{2} - \sum_{k=1}^{K} c_k\right)^2$

Note. Equations give values of $p_c$ prior to normalization.
Uniform: A population of students equally distributed among the different skill mastery profiles.
Increasing: A population in which students are more likely to belong to high skill mastery profiles. One might encounter such a population when testing material over which many students are likely to have medium to high mastery.
Decreasing: A population in which students are more likely to belong to low skill mastery profiles. One might encounter such a population when testing material over which most students have low to medium mastery.
U: A population with both high- and low-ability students but few in between. This might occur when the population is a mixture of two subpopulations, one consisting of students who have little mastery of the subject and the other of students who have high mastery.
Inverted U: A population in which students are likely to have mastered some, but not all, aspects of the subject material. In this situation, one would expect most students to have medium mastery of the material.

For each combination of sample size, test length, number of attributes, and true skill mastery profile distribution, 30 data sets were generated using the same Q-matrix. This resulted in a total of $3 \times 3 \times 2 \times 5 \times 30 = 2{,}700$ data sets. Four models were fit to each data set (the DINA model, the NPCD model, the SANN model, and the DINA-BAG model), and estimates of $p$, $s$, and $g$ were obtained from each. For the DINA model, the CDM package (George et al., 2016; Robitzsch et al., 2022) in R (R Core Team, 2022) was used to obtain parameter estimates directly. For the DINA-BAG algorithm, the CDM package was used to produce the submodel estimates $p^{(M)}$, $s^{(M)}$, and $g^{(M)}$, which were then combined into estimates of the model parameters using Equations 12 through 14. For the NPCD and SANN models, the R packages NPCD (Zheng & Chiu, 2019) and neuralnet (Fritsch et al., 2019), respectively, were used to obtain estimates of $\alpha$, from which the DINA model parameters were computed using Equations 5 through 7 in Section 2. A separate SANN was trained for each of the six Q-matrices (one per combination of test length and number of attributes), with $n_{1} = 20$ nodes in the hidden layer, as suggested by Cui et al. (2016). The estimates of $p$ from each method were compared to the true value of $p$ using three metrics: (1) the maximum absolute deviation, $MAD(p, \hat{p}) = \max_{c} |p_{c} - \hat{p}_{c}|$; (2) the mean-squared error, $MSE(p, \hat{p}) = \frac{1}{2^{K}} \sum_{c} (p_{c} - \hat{p}_{c})^{2}$; and (3) the cross entropy, $CE(p, \hat{p}) = -\frac{1}{2^{K}} \sum_{c} p_{c} \log \hat{p}_{c}$. Each metric captures a different aspect of performance: MAD reflects the largest error for any single profile, MSE reflects the average squared error across all $2^{K}$ profiles, and CE heavily penalizes assigning near-zero probability to profiles that are common in the population. Finally, the estimates of $s$ and $g$ were compared using only the mean-squared error, with $MSE(s, \hat{s}) = \frac{1}{J} \sum_{j=1}^{J} (s_{j} - \hat{s}_{j})^{2}$ and $MSE(g, \hat{g}) = \frac{1}{J} \sum_{j=1}^{J} (g_{j} - \hat{g}_{j})^{2}$.
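The evaluation metrics are straightforward to compute. The following Python sketch is one possible implementation, not the study's code: the function names are hypothetical, the profile-level MSE and the cross entropy are averaged over the $2^K$ profiles, and the small `eps` floor inside the logarithm is an added assumption that keeps the cross entropy finite when an estimate assigns exactly zero probability to a profile.

```python
import math


def mad(p, p_hat):
    """Maximum absolute deviation between true and estimated profile probabilities."""
    return max(abs(a - b) for a, b in zip(p, p_hat))


def mse_profiles(p, p_hat):
    """Squared error averaged over the 2**K profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, p_hat)) / len(p)


def cross_entropy(p, p_hat, eps=1e-12):
    """-(1 / 2**K) * sum_c p_c * log(p_hat_c).

    eps is an assumption of this sketch: it keeps log() finite when an
    estimate assigns exactly zero probability to a profile.
    """
    return -sum(a * math.log(max(b, eps)) for a, b in zip(p, p_hat)) / len(p)


def mse_items(true_params, est_params):
    """Squared error averaged over the J items (used for s and g)."""
    return sum((a - b) ** 2 for a, b in zip(true_params, est_params)) / len(true_params)


# Worked example with K = 2 (four profiles).
p_true = [0.25, 0.25, 0.25, 0.25]
p_est = [0.40, 0.10, 0.25, 0.25]
print(mad(p_true, p_est), mse_profiles(p_true, p_est), cross_entropy(p_true, p_est))
```

In the worked example, two profiles are each off by 0.15, so the MAD is about 0.15 and the profile-level MSE is about $(0.0225 + 0.0225)/4 = 0.01125$.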

4.1. Simulation Study Results

Figure 1 shows how sample size affects the four methods’ estimates of $p$ on each of the three metrics. Overall, the DINA-BAG method outperforms the other methods in our simulation study. The differences in performance are largest for the smallest sample sizes and decrease as $N$ increases. For each sample size, the DINA and NPCD methods perform similarly, with the NPCD method slightly outperforming the DINA method. The SANN method performs much worse than the other three methods. Again, this is likely because the SANN was trained on ideal response patterns rather than on the response patterns that actually occur when slipping and guessing are involved.

Figure 1.

Median values of the three metrics used to evaluate $\hat{p}$ for the four different methods along with bars for first and third quartiles. The bagging algorithm for deterministic inputs noisy “and” gate method outperforms all other methods on each metric for each sample size with the difference between methods decreasing as sample size increases.

Figure 2 shows the performance of each method’s estimates of the slipping and guessing parameters. Differences in performance are less marked than in Figure 1, except for the SANN method, which is clearly outperformed by the other three methods in most scenarios; again, this is likely due to the way the SANN is trained. The DINA-BAG method performs similarly to the DINA and NPCD methods for each sample size, although the MSE is usually lowest for the NPCD method, except when $N = 10$. This suggests that the DINA-BAG method is preferable for accurately estimating the true underlying distribution of skill mastery in the population, but its estimates of the slipping and guessing parameters might not be as accurate as those produced by the NPCD method.

Figure 2.

Median values of mean-squared error for estimates of slipping and guessing parameters for different sample sizes, along with bars for first and third quartiles. The bagging algorithm for deterministic inputs noisy “and” gate method performs similarly to the other methods for each sample size, slightly outperforming them for $N = 10$ and being slightly outperformed by the nonparametric diagnostic classification method for $N = 30$ and $N = 100$.

Figure 3 is similar to Figure 1 but shows the influence of test length on each method’s performance. For each test length, the DINA-BAG method produces the most accurate estimates of $p$. With the exception of the SANN method, test length seems to have little influence on the estimators’ performance. For the SANN method, increasing the test length reduces the accuracy of its estimates of $p$. One possible explanation is that a longer test increases the chance that the observed response pattern for a given attribute profile deviates from its ideal response pattern, which in turn increases the chance that the SANN misclassifies it. Since the SANN method’s estimate of $p$ depends on the accuracy of its classifications, its performance declines for longer tests.
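This explanation can be made concrete with a small calculation. Under the DINA model, the probability that an examinee’s observed responses exactly match their ideal responses is $\prod_{j:\eta_j = 1}(1 - s_j)\prod_{j:\eta_j = 0}(1 - g_j)$, which shrinks geometrically as items are added. The Python sketch below assumes, for illustration only, a common slipping and guessing rate of 0.25 (the midpoint of $U(0.1, 0.4)$) and a hypothetical ideal pattern; it does not reproduce the SANN itself.

```python
import math


def exact_match_probability(eta, s, g):
    """P(observed responses == ideal responses) under DINA for one examinee.

    eta  -- ideal responses (1 if the examinee masters all skills item j needs)
    s, g -- per-item slipping and guessing probabilities
    """
    prob = 1.0
    for e, sj, gj in zip(eta, s, g):
        # A mastered item matches if the examinee does not slip;
        # a non-mastered item matches if the examinee does not guess.
        prob *= (1 - sj) if e == 1 else (1 - gj)
    return prob


rate = 0.25  # assumed common slip/guess rate (midpoint of U(0.1, 0.4))
for J in (25, 40, 55):
    eta = [1] * (J // 2) + [0] * (J - J // 2)  # hypothetical ideal pattern
    p_match = exact_match_probability(eta, [rate] * J, [rate] * J)
    expected_deviations = J * rate  # expected number of items that deviate
    print(f"J={J}: P(exact match)={p_match:.2e}, "
          f"expected deviating items={expected_deviations}")
```

Under these assumptions the expected number of items that deviate from the ideal pattern is $0.25J$ (6.25, 10, and 13.75 items for the three test lengths), so observed patterns drift further from the ideal patterns the SANN was trained on as the test gets longer.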

Figure 3.

In Figure 4, we see the effect of test length on each method’s estimates of the slipping and guessing parameters. For guessing parameters, the DINA, DINA-BAG, and NPCD methods perform similarly, while the SANN method again performs worst and its performance again degrades as test length increases. For slipping parameters, the accuracy of every estimator appears to degrade as test length increases, although this decline is much more pronounced for the SANN method than for the other three methods. Once again, the DINA-BAG method tends to perform better than the DINA method but not as well as the NPCD method.

Figure 4.

Median values of mean-squared error for estimates of slipping and guessing parameters for different test lengths along with bars for first and third quartiles. With the exception of the simple artificial neural network method, the performance of each method is comparable for each test length with the nonparametric diagnostic classification method producing the most accurate estimates of slipping and guessing parameters.

In Figures 5 and 6, we see the effect of the number of attributes on estimator accuracy. When estimating $p$, the DINA-BAG method outperforms the other methods on each metric for both $K = 3$ and $K = 6$. However, for some metrics, the size of this advantage depends on how many attributes are measured. For example, for maximum absolute deviation (MAD) and cross entropy, the gap between the DINA-BAG method and the NPCD and DINA methods is much larger when $K = 6$ than when $K = 3$. The estimates of slipping and guessing parameters show different patterns. For guessing parameters, every method except the SANN method performs similarly. For slipping parameters, all methods except the SANN method perform similarly when $K = 3$, but the NPCD method performs better when $K = 6$. As before, in most situations the DINA-BAG method does well at what it was designed to do (i.e., estimate $p$) but might not do as well when estimating slipping and guessing parameters.

Figure 5.

Median values of the three metrics used to evaluate $\hat{p}$ for the four different methods, along with bars for first and third quartiles. The bagging algorithm for deterministic inputs noisy “and” gate method outperforms the other methods on each metric for both $K = 3$ and $K = 6$, with the differences between methods being somewhat more pronounced when $K = 6$.

Figure 6.

Median values of mean-squared error for the estimates of slipping and guessing parameters for different numbers of attributes, along with bars for first and third quartiles. Each method performs similarly when estimating slipping and guessing parameters for both $K = 3$ and $K = 6$, with the nonparametric diagnostic classification method slightly outperforming the other methods.

Finally, Figures 7 through 9 show how well each method estimates $p$ under the different underlying structures considered in our simulation, for sample sizes of $10$, $30$, and $100$, respectively. When $N = 10$, the DINA-BAG method outperforms every other method for each of the five structures. As the sample size increases, however, some differences emerge. For the increasing, decreasing, and uniform structures, the DINA-BAG method produces the best estimates. For the “U”-shaped structure, however, the NPCD method outperforms the DINA-BAG method when $N = 30$, and both the DINA and NPCD methods outperform it when $N = 100$. In every situation, the SANN method is clearly outperformed. Thus, for very small sample sizes the DINA-BAG method performs well in each situation, but for larger sample sizes its relative performance depends on the underlying shape of the distribution of skill mastery profiles.

Figure 7.

Median values of the three metrics used to evaluate $\hat{p}$ for the four different methods when $N = 10$ along with bars for first and third quartiles. The bagging algorithm for deterministic inputs noisy “and” gate method outperforms other methods on every metric and in each scenario with differences being less pronounced when the true underlying structure is a “U” shape.

Figure 8.

Median values of the three metrics used to evaluate $\hat{p}$ for the four different methods when $N = 30$ along with bars for first and third quartiles. With a larger sample size, the bagging algorithm for deterministic inputs noisy “and” gate method still outperforms other methods in each situation, except for mean-squared error and MAD when the underlying structure is a “U” shape. However, differences in performance are much smaller.

Figure 9.

Median values of the three metrics used to evaluate $\hat{p}$ for the four different methods when $N = 100$, along with bars for first and third quartiles. The differences in performance are even less pronounced when $N = 100$. The bagging algorithm for deterministic inputs noisy “and” gate method still outperforms the other methods when the underlying structure is uniform, increasing, or decreasing but performs worse in terms of MAD and mean-squared error when the structure is a “U” shape.

5. Simulation Study With ECPE Dataset

The DINA-BAG method was also applied to the Examination for the Certificate of Proficiency in English (ECPE) data set to demonstrate its utility in real-world settings. This data set has been used by Templin and Bradshaw (2013, 2014) to demonstrate how to fit CDMs to item response data. It contains the responses of 2,922 students to 28 items designed to measure proficiency in the English language. The assessment measures knowledge of three aspects of English grammar: (1) morphosyntactic rules, (2) cohesive rules, and (3) lexical rules.

Because the true underlying distribution of skill mastery profiles cannot be known in real-life settings, we treat the DINA model’s estimate of $p$ from the entire data set as the “true” distribution. The purpose of this simulation study is then to measure each method’s ability to recover this distribution from a subset of the data; in other words, it examines how well each method can use a small subset of student responses to obtain estimates similar to those that would be obtained if the entire data set were used.

Since the Q-matrix is fixed for this assessment, we study only the effect of sample size on each method’s performance. For each sample size ($N = 10$, $30$, and $100$), we draw 100 different subsets of that size from $Y$. This results in 300 smaller data sets, 100 for each sample size. Each of the four methods is fit to every data set, and the resulting estimates of $p$ are compared with the estimate obtained by fitting the DINA model to the entire data set, using the same metrics as before. Finally, the same procedures and R packages described in Section 4 were used to produce parameter estimates for each method.
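The resampling design can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: `estimate_p` is a hypothetical stand-in for fitting any of the four methods and returning an estimate of $p$, `metric` is any of the comparison metrics, and students are assumed to be drawn without replacement within each subset.

```python
import random


def subsample_study(Y, p_ref, estimate_p, metric,
                    sizes=(10, 30, 100), n_reps=100, seed=0):
    """Score an estimator of p on repeated random subsets of the rows of Y.

    estimate_p -- hypothetical callable mapping a response matrix to an
                  estimate of p (a stand-in for DINA, DINA-BAG, NPCD, or SANN)
    metric     -- callable comparing p_ref with an estimate, e.g. MAD
    Returns a dict mapping each subset size to its list of n_reps scores.
    """
    rng = random.Random(seed)
    scores = {}
    for n in sizes:
        scores[n] = []
        for _ in range(n_reps):
            # Students are drawn without replacement within each subset.
            rows = rng.sample(range(len(Y)), n)
            subset = [Y[i] for i in rows]
            scores[n].append(metric(p_ref, estimate_p(subset)))
    return scores
```

With the default arguments this produces $3 \times 100 = 300$ scores per method, matching the 300 subsets described above; the median and quartiles of each list are the kind of summaries shown in Figure 10.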

5.1. Real Data Study Results

Figure 10 shows the effect of sample size on each method’s ability to recover the value of $p$ that would have been obtained by fitting the DINA model to the entire ECPE data set. On every metric, the DINA-BAG method outperforms all other methods for $N = 10$ and $N = 30$. When $N = 100$, the DINA method begins to outperform the DINA-BAG method. Unlike in the simulation study, where the NPCD method performed similarly to the DINA method, the NPCD method now only slightly outperforms the SANN method for larger sample sizes. This could be partly because the target is not the true $p$ but the DINA model’s estimate of $p$, a target that may favor the DINA model itself.

Figure 10.

Effect of sample size on the performance of each method’s estimator of p for each of the three metrics. With the exception of when $N = 100$ , the bagging algorithm for deterministic inputs noisy “and” gate method produces estimates of p that are closest to those obtained when using the entire Examination for the Certificate of Proficiency in English data set.

Figure 11 compares the methods in terms of their estimates of the slipping and guessing parameters. In contrast to their performance in recovering the DINA model’s estimate of $p$, the SANN and NPCD methods produce more accurate estimates of the guessing parameters, except when $N = 100$. For slipping parameters, however, the NPCD and SANN methods again perform poorly compared with the DINA and DINA-BAG methods. In every situation, the DINA-BAG method outperforms the DINA method, except for guessing parameters when $N = 100$.

Figure 11.

Effect of sample size on each method’s estimators of slipping and guessing parameters. For guessing parameters, the nonparametric diagnostic classification method produces estimates closest to those obtained with the full Examination for the Certificate of Proficiency in English data set, except when $N = 100$, in which case the deterministic inputs noisy “and” gate (DINA) and the bagging algorithm for DINA (DINA-BAG) methods show similar performance. For slipping parameters, the DINA-BAG method outperforms all other methods for each sample size but is closely followed by the DINA method.

Assuming the DINA model fits these data well, these results indicate that when sample sizes are small, the DINA-BAG method can still produce estimates of the true underlying distribution of skill mastery profiles that are more stable and more accurate than those produced by the other three methods. In practice, when the sample size is 30 or smaller, the proposed method can be expected to outperform the other methods in providing an accurate picture of skill mastery in the population. When the sample size is larger than 30, either the DINA or the NPCD method may be preferable, at least when the number of attributes is small. As the number of attributes being measured increases, we would expect the proposed method to outperform the other methods for sample sizes larger than 30 as well.

6. Discussion

Effective instruction and curriculum development depend on an educator’s ability to obtain accurate knowledge of skill mastery in the classroom across a variety of skills. CDMs are useful tools that can provide this information, but they often cannot accurately estimate the true distribution of skill mastery profiles when the sample size is small. The DINA-BAG algorithm, which is inspired by ensemble methods in machine learning, takes advantage of many different aspects of the data and combines estimates from different models to achieve better results than other popular methods. It produces more stable and accurate estimates of the distribution of skill mastery profiles with very small sample sizes in a variety of settings, while still producing reasonably accurate estimates of the other DINA model parameters.

The method proposed in this article offers a better solution for educators who want to use test responses from a very small sample of students to obtain more accurate information about the larger population of students. This information could then inform decisions about curriculum design or instructional practices that generalize more readily to other groups of students in the same population. Through simulation studies, we have shown that, for sample sizes of 100 or smaller, the proposed algorithm generally produces estimates of skill mastery that are closer to the true levels of skill mastery in the population according to a variety of metrics designed to measure “closeness.” This holds for a variety of distributions of skill mastery that might arise in practice.

In most situations with small samples, the DINA-BAG algorithm is preferable to other popular methods. The simulation studies showed that, except when the true underlying structure of skill mastery is U-shaped, the DINA-BAG method outperforms the other methods on every metric when estimating $p$. Overall, aggregating across the underlying structures for $p$, the DINA-BAG algorithm performs better than all other methods for every sample size, test length, and number of attributes we considered. As the sample size increases, however, the differences in performance shrink, suggesting that when the sample size is sufficiently large, it would be preferable to simply use the DINA model. This is to be expected: the DINA-BAG method accepts some bias in exchange for estimates that are more stable and accurate when the sample size is small, but that bias becomes a disadvantage relative to the DINA model’s asymptotically unbiased estimates once the sample size is sufficiently large.

Because the DINA-BAG method is designed to perform well for small sample sizes, it is important to consider what “small” means. One important factor in whether the DINA-BAG method is a better choice than other methods is the number of attributes being measured. When only three attributes are measured, a sample size of 100 is likely more than enough for the DINA model to perform well, especially if the true underlying structure is U-shaped. However, as the number of attributes increases, increasingly large samples are needed for the DINA model to outperform the DINA-BAG method. The present simulation results suggest that when the sample size is 30 or smaller and six attributes are measured, the DINA-BAG method outperforms all other methods, regardless of the true underlying structure for $p$. When the sample size is 100 or only three attributes are measured, however, whether the DINA-BAG method performs best may depend on the true underlying structure.

Many of the benefits observed with simulated data were also realized with real data. The study with the ECPE data set suggested that the DINA-BAG method still outperforms other methods in its ability to obtain accurate information about population skill mastery from just a fraction of the data. In other words, assuming the DINA model is appropriate for the data, the DINA-BAG method can use a very small sample to obtain estimates of population skill mastery that are closer to those that would have been obtained had a larger sample been available. For the ECPE data set, this held even though only three attributes are measured. As the number of attributes increases, we expect the advantages of the proposed method to become even more apparent.

The results presented here are novel and promising, but much work remains to explore the strengths and weaknesses of the proposed algorithm. Future research could examine whether the algorithm improves performance when applied to other CDMs. Furthermore, while the current research assumed that the Q-matrix was complete and identifiable, it might be possible to adapt the algorithm so that it can be used when the Q-matrix is not identifiable. Also, because the true Q-matrix is often unknown, it would be useful to understand how the algorithm performs when the Q-matrix is misspecified or when it is combined with methods developed to estimate the Q-matrix (Chen et al., 2018; Chung, 2019; Liu et al., 2012; Liu et al., 2020; Xu & Shang, 2018). Moreover, while the methods applied here rely on frequentist estimates of model parameters, Bayesian approaches also exist (Culpepper, 2015), and the ideas presented here could potentially be applied within a Bayesian framework to benefit from its inherent strengths.

Because the focus of this research was to develop a method that produces more accurate estimates of the true underlying distribution of skill mastery profiles in the population, we did not study classification accuracy for individuals. However, this is an important aspect of assessment, and future research should examine how the proposed algorithm performs in classifying individual skill mastery profiles. It could be that the approach produces more accurate estimates of skill mastery for the population but not for individuals, in which case it would be worth investigating whether the method can be adapted for individual classification. Finally, since the simulation studies suggested that the performance gain of the DINA-BAG algorithm over other methods depends on an interaction between sample size and the number of attributes being measured, this interaction also merits further study.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Black & Wiliam (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74.

Black & Wiliam (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5–31.

Breiman (1996). Bagging predictors. Machine Learning, 24(2), 123–140.

Breiman (2001). Random forests. Machine Learning, 45, 5–32.

Chen, Culpepper, S. A., Chen, & Douglas (2018). Bayesian estimation of the DINA Q matrix. Psychometrika, 83(1), 89–108.

Chiu, C. Y., & Douglas (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response patterns. Journal of Classification, 30(2), 225–250.

Chiu, C. Y., Sun, & Bian (2018). Cognitive diagnosis for small educational programs: The general nonparametric classification method. Psychometrika, 83(2), 355–375.

Choi, K. M., Lee, Y. S., & Park, Y. S. (2015). What CDM can tell about what students have learned: An analysis of TIMSS eighth grade mathematics. Eurasia Journal of Mathematics, Science and Technology Education, 11(6), 1563–1577.

Chung (2019). A Gibbs sampling algorithm that estimates the Q-matrix for the DINA model. Journal of Mathematical Psychology, 93, 102275.

Cui, Gierl, & Guo (2016). Statistical classification for cognitive diagnostic assessment: An artificial neural network approach. Educational Psychology, 36(6), 1065–1082.

Culpepper, S. A. (2015). Bayesian estimation of the DINA model with Gibbs sampling. Journal of Educational and Behavioral Statistics, 40(5), 454–476.

de la Torre (2009). DINA model and parameter estimation: A didactic. Journal of Educational and Behavioral Statistics, 34(1), 115–130.

de la Torre & Minchen (2014). Cognitively diagnostic assessments and the cognitive diagnosis model framework. Psicología Educativa, 20(2), 89–97.

Fritsch, Guenther, & Wright, M. N. (2019). neuralnet: Training of Neural Networks. R package version 1.44.2.

George, A. C., Robitzsch, Kiefer, Groß, & Ünlü (2016). The R package CDM for cognitive diagnosis models. Journal of Statistical Software, 74(2), 1–24.

(2019). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika, 84(2), 468–483.

Junker, B. W., & Sijtsma (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.

Lee, Y. S., Park, Y. S., & Taylan (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11(2), 144–177.

Leighton & Gierl (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge University Press.

Liu, C. W., Andersson, & Skrondal (2020). A constrained Metropolis–Hastings Robbins–Monro algorithm for Q matrix estimation in DINA models. Psychometrika, 85(2), 322–357.

Liu & Ying (2012). Data-driven learning of Q-matrix. Applied Psychological Measurement, 36(7), 548–564.

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.

Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 2(2), 99–120.

National Center for Education Statistics. (2018). National Center for Education Statistics (NCES), a part of the U.S. Department of Education. Retrieved October 11, 2022, from https://nces.ed.gov/surveys/ntps/tables/ntps1718_fltable06_t1s.asp

Paulsen & Valdivia, D. S. (2022). Examining cognitive diagnostic modeling in classroom assessment conditions. Journal of Experimental Education, 90(4), 916–933.

R Core Team. (2022). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

Robitzsch, Kiefer, George, A. C., & Ünlü (2022). CDM: Cognitive Diagnosis Modeling. R package version 8.2-6.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.

Rupp, A. A., Templin, & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. The Guilford Press.

Sessoms & Henson, R. A. (2018). Applications of diagnostic classification models: A literature review and critical commentary. Measurement: Interdisciplinary Research and Perspectives, 16(1), 1–17.

Shu, Henson, & Willse (2013). Using neural network analysis to define methods of DINA model estimation for small sample sizes. Journal of Classification, 30(2), 173–194.

Templin & Bradshaw (2013). Obtaining diagnostic classification model estimates using MPLUS. Educational Measurement: Issues and Practice, 32(2), 37–50.

Templin & Bradshaw (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339.

Wang & Douglas (2015). Consistency of nonparametric classification in cognitive diagnosis. Psychometrika, 80(1), 85–100.

Xu & Shang (2018). Identifying latent structures in restricted latent class models. Journal of the American Statistical Association, 113(523), 1284–1295.

Zheng & Chiu, C.-Y. (2019). NPCD: Nonparametric Methods for Cognitive Diagnosis. R package version 1.0-11.
