A Class of Cognitive Diagnosis Models for Polytomous Data

Abstract

This article proposes a class of cognitive diagnosis models (CDMs) for polytomously scored items with different link functions. Many existing polytomous CDMs can be considered as special cases of the proposed class of polytomous CDMs. Simulation studies were carried out to investigate the feasibility of the proposed CDMs and the performance of several information criteria (Akaike’s information criterion [AIC], consistent Akaike’s information criterion [CAIC], and Bayesian information criterion [BIC]) in model selection. The results showed that the parameters of the proposed CDMs could be recovered adequately under varied conditions. In addition, CAIC and BIC had better performance in selecting the most appropriate model than AIC. Finally, a set of real data was analyzed to illustrate the application of the proposed CDMs.

Keywords

cognitive diagnosis, polytomously scored items, polytomous CDMs

1. Introduction

Cognitive diagnostic assessments (CDAs) aim to detect whether examinees have mastered a set of attributes or skills of interest. Unlike traditional tests that typically provide only total ability estimates, CDAs can provide detailed diagnostic information about the strengths and weaknesses of students to aid instruction and learning. In recent years, CDAs have received increasing attention in educational and psychological testing.

1.1. Dichotomous Cognitive Diagnosis Models

So far, a large number of cognitive diagnosis models (CDMs) have been proposed to satisfy the demands of the CDAs. Examples of CDMs include the deterministic inputs, noisy “and” gate (DINA) model (Haertel, 1989; Junker & Sijtsma, 2001); the deterministic inputs, noisy “or” gate (DINO) model (Templin & Henson, 2006); the reduced reparameterized unified model (R-RUM; Hartz, 2002); the linear logistic model (LLM; Maris, 1999); the noisy inputs, deterministic “and” gate (NIDA) model (Junker & Sijtsma, 2001); the generalized DINA (G-DINA) model (de la Torre, 2011); and the log-linear CDM (LCDM; Henson et al., 2009). However, most of these CDMs are only suitable for dichotomously scored items.

1.2. Polytomous Cognitive Diagnosis Models

Polytomously scored items, or polytomous items for short, are important for various testing purposes (Embretson & Reise, 2000). Examples of polytomous items include the constructed-response items in cognitive tests and rating scales in personality and attitude tests. Despite some limitations, polytomous items have several advantages over dichotomous items (van der Ark, 2001). First, polytomous items can usually provide more information than dichotomous items and thus yield more accurate parameter estimation (Nering & Ostini, 2010). Second, polytomous items may be more suitable than dichotomous items for some purposes. For example, in noncognitive tests, rating scales with only two options may frustrate respondents (Cox, 1980) and limit the reliability. In cognitive tests, dichotomous items like multiple-choice and true-false items are usually, though not always, believed to elicit only lower level cognitive skills, whereas polytomous items such as essays are more likely to measure higher level cognitive processes (Bandalos, 2018, p. 82). Finally, researchers have shown that open-ended items that are usually polytomously scored are more appropriate for diagnostic purposes because students’ responses can be explicitly observed and students need to really solve the problem, which may not be the case for multiple-choice items due to the availability of options (Birenbaum & Tatsuoka, 1987).

To analyze polytomous data in CDAs, a common strategy is to convert polytomous data into dichotomous data (e.g., Lee et al., 2011; Templin & Henson, 2006). After this conversion, the existing CDMs for dichotomous data can be applied. However, such a method can lead to loss of information and reduce classification accuracy (Ma & de la Torre, 2016; Tu et al., 2017). Therefore, it is necessary to develop CDMs for polytomous data.

At present, only a few CDMs have been developed to deal with polytomous items, such as the general diagnostic model (GDM; von Davier, 2005, 2008), the partial-credit DINA model (PC-DINA; de la Torre, 2010), the polytomous LCDM (P-LCDM; Hansen, 2013), the rating scale diagnostic model (RSDM; R. Liu & Jiang, 2020), and the sequential G-DINA model (Ma & de la Torre, 2016). It is important to note that the GDM allows both polytomous items and polytomous attributes, while the other models focus only on polytomous responses. However, these CDMs, except the sequential G-DINA model, assume that all response categories of an item measure the same set of attributes and thus do not consider the relationship between attributes and response categories. This may result in a loss of diagnostic information because different response categories could measure different attributes. More importantly, the existing polytomous models are based on different theoretical assumptions (or cognitive processes) and therefore belong to different types of models. For example, the P-LCDM belongs to graded response models based on the global (or cumulative) logit; the GDM, PC-DINA, and NRDM are partial-credit models that make use of the local logit; and the sequential G-DINA model is a special case of the sequential process model based on the continuation ratio (CR) logit. Therefore, each of these existing CDMs depends on restrictive theoretical assumptions, which limits its use.

The current article proposes a class of CDMs for polytomous responses with less restrictive assumptions. In the proposed CDMs, three different link functions, namely, the cumulative logit, the local logit, and the CR logit, are considered. Moreover, constraints on the Q-matrix and the parameters across categories may be imposed. As a result, the proposed model contains at least 12 different types of polytomous CDMs and can deal with ordinal polytomous responses of different nature. In contrast, many existing CDMs for polytomous responses, such as the sequential G-DINA and the P-LCDM, can only handle one particular type of polytomous response data and can be viewed as special cases of the proposed model. Therefore, compared with existing CDMs, the proposed model is more flexible and could have a wider range of applications.

The remaining sections of the article are laid out as follows. In Section 2, we introduce a class of CDMs for polytomous data and discuss its relationship with some existing models. In Sections 3 and 4, we introduce the approaches for estimating the parameters of the class of polytomous CDMs and several model fit indices, respectively. Section 5 presents the results of two simulation studies, and Section 6 presents a real data example. We conclude the article with a brief summary and a discussion in the last section.

2. General Polytomous CDMs

The Q-matrix plays a key role in cognitive diagnosis. The traditional Q-matrix is a $J \times K$ binary matrix (J is the number of items and K is the number of attributes) that relates the items to the attributes. The entry in row j and column k is 1 if the jth item requires the kth attribute and 0 otherwise (Tatsuoka, 1983). The Q-matrix for polytomous data, however, can be specified at the item level (Hansen, 2013; Ma & de la Torre, 2016; von Davier, 2008) or the category level (Ma & de la Torre, 2016). A Q-matrix defined at the item level is the same as the traditional Q-matrix and specifies which attributes are required by each item, while a Q-matrix defined at the category level specifies which attributes are required by each category of each item. Although the item-level Q-matrix seems simpler, the category-level Q-matrix should be used whenever possible because more information is conveyed, and therefore, parameters could be estimated more accurately (Ma & de la Torre, 2016). In addition, the item-level Q-matrix can be derived from the category-level Q-matrix by pooling the attributes measured by the categories of each item, but not vice versa.

Take $\sqrt{6.4 / 0.4 - 7} = ?$ as an example. As shown in Table 1, for the category-level Q-matrix, each step or each score category has its corresponding attributes. For example, Step 1 only requires Attribute 2 (division), while Steps 2 and 3 require Attribute 1 (subtraction) and Attribute 3 (square root), respectively. In contrast, the item-level Q-matrix assumes that all the attributes required for an item are required for each category of the item.

Table 1.

An Example of Category and Item-Level Q-Matrix

Item/Step	Response Category	Category Level			Item Level
		$α_{1}$ Subtraction	$α_{2}$ Division	$α_{3}$ Square Root	$α_{1}$ Subtraction	$α_{2}$ Division	$α_{3}$ Square Root
$\sqrt{6.4 / 0.4 - 7} = ?$					1	1	1
Step 1: $6.4 / 0.4 = 16$	1	0	1	0
Step 2: $16 - 7 = 9$	2	1	0	0
Step 3: $\sqrt{9} = 3$	3	0	0	1
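The relation between the two Q-matrix levels in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `item_level_q` is hypothetical, and the rule it applies (an attribute is required by the item if any category requires it) is the one described in the text.

```python
def item_level_q(category_q):
    """Collapse the q-vectors of one item's categories into an item-level q-vector."""
    # An attribute is required by the item if at least one category requires it.
    return [int(any(column)) for column in zip(*category_q)]


# Category-level q-vectors for the three steps of sqrt(6.4 / 0.4 - 7),
# with columns ordered as (subtraction, division, square root) as in Table 1.
steps = [
    [0, 1, 0],  # Step 1: division
    [1, 0, 0],  # Step 2: subtraction
    [0, 0, 1],  # Step 3: square root
]
item_q = item_level_q(steps)
print(item_q)  # [1, 1, 1]
```

The reverse mapping is not available: the item-level vector [1, 1, 1] does not reveal which step needs which attribute.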

2.1. The Proposed Class of Polytomous CDM

Let $X_{j} \in \{0, 1, . . ., m_{j}\}$ denote the response variable for the jth item, where j = 1,…, J. Also, let $P (X_{j} = t | α_{l})$ denote the probability that examinees with attribute pattern $α_{l}$ obtain a score of t on item j. A class of CDMs for polytomous responses can be expressed as

π [P (X_{j} = t | α_{l})] = β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l}), t = 0, 1, 2, . . ., m_{j},

where $π [\cdot]$ is the link function, $m_{j}$ is the maximum score of item j, and $q_{j t}$ is the q-vector for category t of item j, whose kth entry is 1 if category t of item j requires attribute k and 0 otherwise. In addition, $γ_{j t}^{T}$ represents a row vector consisting of $2^{K_{j t}^{*}} - 1$ parameters of category t of item j, where $K_{j t}^{*}$ is the number of attributes required by category t of item j. Also, $h (q_{j t}, α_{l})$ represents a set of linear combinations of $α_{l}$ and $q_{j t}$. Finally, $β_{j t}$ is the baseline parameter representing the baseline level of category t for those who have not mastered any of the attributes corresponding to category t. Similar to the G-DINA model and LCDM, $γ_{j t}^{T} h (q_{j t}, α_{l})$ is defined as

γ_{j t}^{T} h (q_{j t}, α_{l}) = \sum_{u = 1}^{K_{j t}^{*}} γ_{j t, u} (α_{l u} q_{j t, u}) + \sum_{u = 1}^{K_{j t}^{*}} \sum_{v > u}^{} γ_{j t, u v} (α_{l u} α_{l v} q_{j t, u} q_{j t, v}) + . . . + γ_{j t, 12, . . ., K_{j t}^{*}} \prod_{k = 1}^{K_{j t}^{*}} α_{l k},

where $γ_{j t, u}$ is the main effect due to $α_{u}$; $γ_{j t, u v}$ is the two-way interaction effect due to $α_{u}$ and $α_{v}$; and $γ_{j t, 12, . . ., K_{j t}^{*}}$ represents the $K_{j t}^{*}$-way interaction effect due to $α_{1}$ through $α_{K_{j t}^{*}}$.
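To make the structure of the linear predictor concrete, the following Python sketch evaluates $β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l})$ for a single category. It is only an illustration under assumed conventions: the function name and the choice to store effects in a dictionary keyed by tuples of 0-based attribute indices are hypothetical, and the numeric values are invented.

```python
from itertools import combinations


def linear_predictor(beta, gamma, q, alpha):
    """Return beta + gamma^T h(q, alpha) for one category of one item.

    gamma maps a tuple of attribute indices to an effect: (u,) is a main
    effect, (u, v) a two-way interaction, and so on. Missing keys count as 0.
    """
    required = [k for k, flag in enumerate(q) if flag == 1]
    eta = beta
    for order in range(1, len(required) + 1):
        for subset in combinations(required, order):
            # An effect contributes only if every attribute in it is mastered.
            if all(alpha[k] == 1 for k in subset):
                eta += gamma.get(subset, 0.0)
    return eta


q = [1, 0, 1]  # the category requires attributes 1 and 3
gamma = {(0,): 2.0, (2,): 1.5, (0, 2): 0.5}  # invented effects
print(linear_predictor(-2.0, gamma, q, [1, 1, 1]))  # 2.0
print(linear_predictor(-2.0, gamma, q, [1, 1, 0]))  # 0.0
print(linear_predictor(-2.0, gamma, q, [0, 1, 0]))  # -2.0
```

Mastery of an attribute that the category does not require (attribute 2 here) leaves the predictor unchanged, which is the role of the q-vector in the formula.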

In the proposed CDMs, we consider three different link functions: (1) global logits, (2) local (or adjacent categories) logits, and (3) CR logits. To simplify the model, constraints on the Q-matrix and equality constraints on the parameters across categories can be adopted in the proposed CDMs.

Type of link function. Note that under the global, local, and CR logit link functions, Equation 1 can be written as

log \frac{P (X_{j} \geq t | α_{l})}{1 - P (X_{j} \geq t | α_{l})} = β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l}),

log \frac{P (X_{j} = t | α_{l})}{P (X_{j} = t - 1 | α_{l})} = β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l}), and

log \frac{P (X_{j} \geq t | α_{l})}{P (X_{j} = t - 1 | α_{l})} = β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l}),

respectively. It should be noted that $q_{j t}$ has different meanings under different link functions. Specifically, under the global logit, $q_{j t}$ represents the attributes required for obtaining a score of at least t, whereas under the local and CR logits, it represents the attribute(s) required for completing the tth step/category.

Constraints on the main effects and interactions. First, when an attribute, or a set of attributes, has the same impact on different categories, we could assume that the main effects due to this attribute are equal across all categories and that the interaction effects due to this set of attributes are also equal across all categories. More formally, we assume

γ_{j t, W}^{} = γ_{j t^{'}, W}^{} = γ_{j, W}^{}, (t \neq t^{'}, t and t^{'} = 1, 2, . . ., m_{j}),

where W represents the index for main effect or interaction. This constraint, which is referred to as the equality constraint across categories, can substantially decrease the number of parameters of item j.

Constraints on the Q-matrix. Another possible constraint is related to the Q-matrix. The proposed model was defined using a category-level Q-matrix, but as mentioned before, the item-level Q-matrix can be obtained by imposing some constraints on the category-level Q-matrix. An item-level Q-matrix assumes that all the attributes required for the item are required for each category of the item as in

q_{j t}^{} = q_{j t^{'}}^{} = q_{j}^{}, (t \neq t^{'}, t and t^{'} = 1, 2, . . ., m_{j}) .

By combining the above two constraints, we can obtain four different item parameterizations (see Table 2). Each of them can be combined with the global, local, or continuation ratio logit link functions to obtain different types of CDMs for polytomous responses.

Table 2.

Parameterizations of Different Cognitive Diagnosis Models for Polytomous Data

Q-Matrix	Constraint Across Categories	$π_{t} [P (X_{j} = t | α_{l})]$
Item level	Constrained	$β_{j t} + γ_{j}^{T} h (q_{j}, α_{l})$
Item level	Free	$β_{j t} + γ_{j t}^{T} h (q_{j}, α_{l})$
Category level	Constrained	$β_{j t} + γ_{j}^{T} h (q_{j t}, α_{l})$
Category level	Free	$β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l})$

2.2. Relations With Existing Polytomous CDMs

It can be shown that some existing polytomous CDMs are special cases of the proposed CDMs. First, using the item-level Q-matrix, the proposed CDM with global logit link function can be written as

P (X_{j} \geq t | α_{l}) = \frac{exp \{β_{j t} + γ_{j t}^{T} h (q_{j}, α_{l})\}}{1 + exp \{β_{j t} + γ_{j t}^{T} h (q_{j}, α_{l})\}} .

The category response function for item j can be expressed as

P (X_{j} = t | α_{l}) = P (X_{j} \geq t | α_{l}) - P (X_{j} \geq t + 1 | α_{l}),

with

P (X_{j} \geq t | α_{l}) = \begin{cases} 1, & t = 0 \\ 0, & t = m_{j} + 1 \end{cases},

subject to the constraints

\sum_{t = 0}^{m_{j}} P (X_{j} = t | α_{l}) = 1.

With appropriate constraints imposed, this model is equivalent to Hansen’s (2013) P-LCDM.
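The step from cumulative probabilities to category probabilities under the global logit can be sketched as follows. This is an illustrative Python fragment rather than the authors' implementation. The function names are hypothetical, and the explicit check that the linear predictors do not increase with t is an added assumption: without it, the differences of cumulative probabilities could become negative.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_logit_probs(etas):
    """Category probabilities P(X = t), t = 0..m, under the global logit.

    etas[t - 1] is the linear predictor for P(X >= t), t = 1..m.
    """
    # Boundary conventions: P(X >= 0) = 1 and P(X >= m + 1) = 0.
    cumulative = [1.0] + [logistic(eta) for eta in etas] + [0.0]
    probs = [cumulative[t] - cumulative[t + 1] for t in range(len(etas) + 1)]
    if any(p < 0 for p in probs):
        raise ValueError("linear predictors must be non-increasing in t")
    return probs


probs = global_logit_probs([1.0, -1.0])
print(probs)
```

Because the differences telescope, the category probabilities sum to 1 by construction.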

Second, using the category-level Q-matrix, the proposed CDM with the CR logit can be expressed as

P (X_{j} = t | α_{l}) = [1 - \frac{exp \{β_{j, t + 1} + γ_{j, t + 1}^{T} h (q_{j, t + 1}, α_{l})\}}{1 + exp \{β_{j, t + 1} + γ_{j, t + 1}^{T} h (q_{j, t + 1}, α_{l})\}}] \prod_{c = 1}^{t} \frac{exp \{β_{j c} + γ_{j c}^{T} h (q_{j c}, α_{l})\}}{1 + exp \{β_{j c} + γ_{j c}^{T} h (q_{j c}, α_{l})\}} .

Let

M (X_{j} = t | α_{l}) = \frac{exp \{β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l})\}}{1 + exp \{β_{j t} + γ_{j t}^{T} h (q_{j t}, α_{l})\}},

where $M (X_{j} = t | α_{l})$ denotes the probability of examinees with attribute pattern $α_{l}$ answering category t of item j correctly after completing category t − 1, and the category response function can be rewritten as

P (X_{j} = t | α_{l}) = [1 - M (X_{j} = t + 1 | α_{l})] \prod_{c = 1}^{t} M (X_{j} = c | α_{l}),

with

M (X_{j} = t | α_{l}) = \begin{cases} 1, & if t = 0 \\ 0, & if t = m_{j} + 1 \end{cases} .

Different from the global or cumulative logit, here $q_{j t}$ represents the attributes required only by the tth step. This model is equivalent to Ma and de la Torre’s (2016) sequential G-DINA model.
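The sequential (CR logit) category response function can be traced numerically with the short Python sketch below. It is an illustration only; the function names are hypothetical, and each entry of `etas` stands for the linear predictor of one step.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def sequential_probs(etas):
    """Category probabilities P(X = t), t = 0..m, under the CR logit.

    etas[t - 1] is the linear predictor of step t, so logistic(etas[t - 1])
    plays the role of M(X = t): passing step t after completing step t - 1.
    """
    m = len(etas)
    # Boundary conventions: M(X = 0) = 1 and M(X = m + 1) = 0.
    step = [1.0] + [logistic(eta) for eta in etas] + [0.0]
    probs = []
    passed = 1.0  # running product M(X = 1) * ... * M(X = t)
    for t in range(m + 1):
        if t >= 1:
            passed *= step[t]
        # Score t means: pass steps 1..t, then fail step t + 1.
        probs.append((1.0 - step[t + 1]) * passed)
    return probs


print(sequential_probs([0.0, 0.0]))  # [0.5, 0.25, 0.25]
```

With both step probabilities equal to .5, half of the examinees fail the first step, a quarter pass it but fail the second, and a quarter complete both.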

Third, using an item-level Q-matrix, the proposed CDM with the local logit link function can be written as

P (X_{j} = t | α_{l}) = \frac{exp \sum_{c = 0}^{t} π [P (X_{j} = c | α_{l})]}{\sum_{r = 0}^{m_{j}} exp \sum_{c = 0}^{r} π [P (X_{j} = c | α_{l})]} = \frac{exp \sum_{c = 0}^{t} [β_{j c} + γ_{j c}^{T} h (q_{j}, α_{l})]}{\sum_{r = 0}^{m_{j}} exp \sum_{c = 0}^{r} [β_{j c} + γ_{j c}^{T} h (q_{j}, α_{l})]},

with

\sum_{c = 0}^{0} [β_{j c} + γ_{j c}^{T} h (q_{j}, α_{l})] = 0 or exp \sum_{c = 0}^{0} [β_{j c} + γ_{j c}^{T} h (q_{j}, α_{l})] = 1.

When only main effects are considered, the above model is similar to von Davier’s (2005, 2008) GDM for partial-credit data. In addition, R. Liu and Jiang’s (2020) RSDM can also be obtained by imposing appropriate constraints.
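For the local logit, the category probabilities follow from cumulative sums of the adjacent-category linear predictors, with the sum for category 0 fixed at 0. The Python sketch below illustrates this; the function name is hypothetical, and subtracting the largest sum before exponentiating is a numerical-stability choice of this sketch, not something stated in the text.

```python
import math


def local_logit_probs(etas):
    """Category probabilities P(X = t), t = 0..m, under the local logit.

    etas[t - 1] is the linear predictor for log[P(X = t) / P(X = t - 1)].
    """
    sums = [0.0]  # the empty sum for category 0 is defined as 0
    for eta in etas:
        sums.append(sums[-1] + eta)
    peak = max(sums)  # subtracted for numerical stability only
    weights = [math.exp(s - peak) for s in sums]
    total = sum(weights)
    return [w / total for w in weights]


etas = [0.7, -0.3]
probs = local_logit_probs(etas)
# The adjacent-category log odds recover the linear predictors.
print(math.log(probs[1] / probs[0]), math.log(probs[2] / probs[1]))
```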

It should be noted that in this article, the general formulation $π_{t} [P (X_{j} = t | α_{l})]$ is expressed using the LCDM. By setting appropriate constraints, some reduced models can be obtained, for example, DINA, DINO, the additive CDM (de la Torre, 2011), R-RUM, and LLM, though the parameters of these models may need to be interpreted with caution.

3. Parameter Estimation

Item parameters of the proposed class of CDMs can be estimated using the marginal maximum likelihood estimation approach via the expectation–maximization (MMLE/EM) algorithm. The marginal log-likelihood function of the response matrix can be written as

l (x) = log \prod_{i = 1}^{N} \sum_{l = 1}^{2^{K}} L (X_{i} | α_{l}) p (α_{l}),

where $L (X_{i} | α_{l})$ is the likelihood value of the response vector $X_{i}$ of examinee i with attribute pattern $α_{l}$ and K represents the number of attributes. It can be computed as

L (X_{i} | α_{l}) = \prod_{j = 1}^{J} \prod_{t = 0}^{m_{j}} P {(X_{i j} = t | α_{l})}^{I (X_{i j} = t)},

where $X_{i j}$ denotes the response of examinee i on item j, t and m_j denote the score category and the maximum score of the item j, respectively, and $I (X_{i j} = t)$ is an indicator variable. The EM algorithm includes two steps in each iteration: expectation step (E step) and maximization step (M step). The E step calculates the expected number of examinees with attribute profile $α_{l}$ scoring t on item j, which can be written as

R_{l j t} = \sum_{i = 1}^{N} I (X_{i j} = t) P (α_{l} | X_{i}) .

Note that $P (α_{l} | X_{i})$ represents the posterior probability that examinee i is in the latent group $α_{l}$ and can be calculated by

P (α_{l} | X_{i}) = \frac{L (X_{i} | α_{l}) p (α_{l})}{\sum_{l^{'} = 1}^{2^{K}} L (X_{i} | α_{l^{'}}) p (α_{l^{'}})} .

For item j, the M step maximizes the following objective function

f = \sum_{l = 1}^{2^{K}} \sum_{t = 0}^{m_{j}} R_{l j t} log [P (X_{j} = t | α_{l})],

using some general optimization algorithms. In this study, the parameter estimation code was written in R software (R Core Team, 2018) using various functions from the G-DINA R package (Ma & de la Torre, 2019b). The E and M steps are repeated until convergence. Note that the prior distribution of attribute patterns was uniform in the first iteration, and then estimated after each iteration, as in de la Torre (2011). After item parameters are obtained via the EM algorithm, we use the expected a posteriori (EAP) method to estimate the examinee parameters.
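The E step described above can be sketched in plain Python as follows. This is a toy illustration, not the authors' R code: the function names and the nested-list data layout are hypothetical, the likelihood is multiplied directly rather than accumulated on the log scale (which a real implementation would likely need for long tests), and updating the prior as the average posterior is an assumed reading of "estimated after each iteration".

```python
def e_step(responses, cat_probs, prior):
    """One E step.

    responses[i][j]    : observed score of examinee i on item j
    cat_probs[l][j][t] : P(X_j = t | alpha_l) under the current item parameters
    prior[l]           : p(alpha_l)
    Returns (posterior, expected), where posterior[i][l] = P(alpha_l | X_i)
    and expected[l][j][t] = R_ljt.
    """
    n_classes = len(prior)
    n_items = len(cat_probs[0])
    expected = [[[0.0] * len(cat_probs[l][j]) for j in range(n_items)]
                for l in range(n_classes)]
    posterior = []
    for x in responses:
        joint = []
        for l in range(n_classes):
            value = prior[l]
            for j, score in enumerate(x):
                value *= cat_probs[l][j][score]  # L(X_i | alpha_l) p(alpha_l)
            joint.append(value)
        total = sum(joint)
        post = [v / total for v in joint]
        posterior.append(post)
        for l in range(n_classes):
            for j, score in enumerate(x):
                expected[l][j][score] += post[l]
    return posterior, expected


def update_prior(posterior):
    # Assumed update: average posterior probability of each latent class.
    n = len(posterior)
    return [sum(row[l] for row in posterior) / n for l in range(len(posterior[0]))]


# Toy example: one attribute (two latent classes), one item scored 0-2.
cat_probs = [[[0.7, 0.2, 0.1]], [[0.1, 0.2, 0.7]]]
posterior, expected = e_step([[0], [2], [1]], cat_probs, [0.5, 0.5])
print(posterior[0])  # approximately [0.875, 0.125]
```

The M step would then maximize the objective function for each item using the expected counts in `expected`.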

4. Model Selection

The proposed class of CDMs for polytomous data subsumes a number of CDMs with different parameterizations. In practice, to determine the most appropriate model, information criteria such as Akaike’s information criterion (AIC; Akaike, 1974), the consistent Akaike’s information criterion (CAIC; Bozdogan, 1987), and the Bayesian information criterion (BIC; Schwarz, 1978) may be used:

AIC = - 2 log (L) + 2 d,

CAIC = - 2 log (L) + d \times (log (N) + 1),

BIC = - 2 log (L) + d \times log (N),

where L is the likelihood based on maximum likelihood estimation (MLE), d refers to the number of parameters under the assumed model, and N is sample size.
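The three criteria are simple to compute once the maximized log-likelihood is available. The Python sketch below is illustrative only: the function names are hypothetical, and the log-likelihoods and parameter counts in the example are made up and are not results from this study.

```python
import math


def information_criteria(log_lik, d, n):
    """Return AIC, CAIC, and BIC for a maximized log-likelihood log_lik,
    d free parameters, and sample size n."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2 * d,
        "CAIC": deviance + d * (math.log(n) + 1),
        "BIC": deviance + d * math.log(n),
    }


def select_model(fits, n, criterion):
    """Pick the candidate with the smallest value of the chosen criterion.

    fits maps a model label to a (log_lik, d) pair.
    """
    return min(fits, key=lambda name: information_criteria(*fits[name], n)[criterion])


# Made-up log-likelihoods and parameter counts, for illustration only.
fits = {"local": (-10250.0, 100), "global": (-10240.0, 120), "CR": (-10260.0, 100)}
for criterion in ("AIC", "CAIC", "BIC"):
    print(criterion, select_model(fits, n=1000, criterion=criterion))
```

CAIC exceeds BIC by exactly d, so both penalize additional parameters more heavily than AIC whenever log(N) > 2.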

5. Simulation Studies

In this section, two simulation studies were carried out to evaluate the feasibility of the proposed models. For notational convenience, in this article, the category-level Q-matrix is referred to as the cat-Q and the item-level Q-matrix as the item-Q. In addition, the CDMs using the item- and category-level Q-matrices are called item- and cat-CDMs, respectively.

Study 1 aims to examine (1) whether the EM algorithm can accurately estimate the parameters of the proposed models and (2) whether using item-level Q-matrix to analyze data generated by category-level Q-matrix will reduce the accuracy of parameter estimation. Study 2 aims to investigate whether the model-data fit indices can be used to select the appropriate CDMs under various simulation conditions. For simplicity, equality constraints across response categories were not imposed in the simulation studies.

5.1. Simulation Study 1

5.1.1. Design

In Study 1, we manipulated the number of attributes, sample size, and link function. Specifically, the number of attributes was set at K = 5 or 7, test lengths were 20 when K = 5, and 25 when K = 7. The link functions included local logit, global logit, and continuation ratio logit (or CR logit). The category-level Q-matrices when K = 5 and 7 are given in Tables 3 and 4, respectively. Each category in the category-level Q-matrix was constrained to measure a maximum of two attributes. Each attribute was measured the same number of times in the test. For a given item, the number of attributes measured by different categories was independent.

The item-level Q-matrix was created from the category-level Q-matrix by assuming all attributes measured by the categories were needed for the item. Four different sample sizes were used (N = 500, 1,000, 2,000, and 4,000 examinees). The attribute profiles of individuals were simulated using the higher order model (de la Torre & Douglas, 2004). The conditional probability of mastering attribute k can be calculated by the following equation

P (α_{k} | θ) = \frac{exp (λ_{0 k} + λ_{k} θ)}{1 + exp (λ_{0 k} + λ_{k} θ)} .

The simulation method is the same as C. Wang (2013), that is, the higher order latent trait and intercept parameters were drawn from the standard normal distribution, and the slope parameters were drawn from the lognormal (0, 1) distribution. Item responses were generated based on the cat-CDMs with three different link functions. Under each condition, 100 data sets were simulated. To calibrate the data, we fitted the CDMs with three link functions using both category-level and item-level Q-matrices.
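The generation of attribute profiles under the higher order model can be sketched as follows. This Python fragment is an illustration, not the simulation code used in the study; the function name and the seed argument are hypothetical, while the distributions follow the description above (standard normal for the latent trait and intercepts, lognormal(0, 1) for the slopes).

```python
import math
import random


def simulate_attribute_profiles(n_examinees, n_attributes, seed=0):
    """Draw binary attribute profiles from a higher order model."""
    rng = random.Random(seed)
    # lambda_0k ~ N(0, 1) and lambda_k ~ lognormal(0, 1), one pair per attribute.
    intercepts = [rng.gauss(0.0, 1.0) for _ in range(n_attributes)]
    slopes = [rng.lognormvariate(0.0, 1.0) for _ in range(n_attributes)]
    profiles = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # higher order latent trait
        profile = []
        for k in range(n_attributes):
            # P(alpha_k = 1 | theta) from the logistic equation above.
            p = 1.0 / (1.0 + math.exp(-(intercepts[k] + slopes[k] * theta)))
            profile.append(1 if rng.random() < p else 0)
        profiles.append(profile)
    return profiles


profiles = simulate_attribute_profiles(500, 5, seed=1)
print(profiles[:3])
```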

Table 3.

Category-Level Q-Matrix for Simulation Study (K = 5)

Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$
1	1	1	0	0	0	0	11	1	1	1	0	0	0
1	2	0	1	0	0	0	11	2	0	0	0	0	1
2	1	0	0	1	0	0	12	1	0	1	0	0	0
2	2	0	0	1	1	0	12	2	0	0	0	1	0
3	1	1	0	0	0	1	12	3	0	0	0	0	1
3	2	1	0	0	0	0	13	1	0	0	0	0	1
4	1	0	0	0	0	1	13	2	0	0	0	1	0
4	2	0	0	0	1	1	13	3	0	0	1	0	0
5	1	0	0	1	0	0	14	1	1	0	0	0	0
5	2	0	1	0	1	0	14	2	0	1	0	0	0
6	1	1	1	0	0	0	14	3	0	0	1	0	0
6	2	0	0	1	0	0	15	1	0	0	0	1	0
7	1	0	1	0	0	0	15	2	0	0	0	0	1
7	2	0	1	0	1	0	15	3	1	0	0	0	0
8	1	0	0	0	1	0	16	1	1	0	0	0	0
8	2	1	0	1	0	0	17	1	0	1	0	0	0
9	1	0	0	0	1	1	18	1	0	0	1	0	0
9	2	0	0	1	0	1	19	1	0	0	0	1	0
10	1	0	1	1	0	0	20	1	0	0	0	0	1
10	2	1	0	0	0	0

Table 4.

Category-Level Q-Matrix for Simulation Study (K = 7)

Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	$α_{7}$	Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	$α_{7}$
1	1	1	0	0	0	0	0	0	17	1	0	1	1	0	0	0	0
2	1	0	1	0	0	0	0	0	17	2	1	0	0	0	0	0	0
3	1	0	0	1	0	0	0	0	18	1	1	1	0	0	0	0	0
4	1	0	0	0	1	0	0	0	18	2	0	0	0	0	1	0	0
5	1	0	0	0	0	1	0	0	19	1	1	0	0	0	0	0	0
6	1	0	0	0	0	0	1	0	19	2	0	1	0	0	0	0	0
7	1	0	0	0	0	0	0	1	19	3	0	0	1	0	0	0	1
8	1	1	0	0	0	0	0	0	20	1	0	0	0	0	1	0	0
8	2	0	1	0	0	0	0	0	20	2	0	0	0	0	0	0	1
9	1	0	1	1	0	0	0	0	20	3	0	0	0	1	0	0	1
9	2	0	0	1	1	0	0	0	21	1	0	0	1	0	0	0	0
10	1	1	0	0	0	1	0	0	21	2	0	0	0	1	0	0	0
10	2	1	0	0	1	0	0	1	21	3	0	0	0	0	1	1	0
11	1	0	0	0	0	1	0	0	22	1	0	0	0	1	0	0	0
11	2	1	0	0	0	1	0	0	22	2	0	0	0	0	0	0	1
12	1	0	0	0	0	0	1	0	22	3	0	0	0	0	0	1	1
12	2	0	0	0	0	1	0	1	23	1	0	0	0	0	1	0	0
13	1	0	1	0	0	0	0	0	23	2	0	0	0	0	0	1	0
13	2	0	0	1	0	0	1	0	23	3	0	0	0	0	0	1	1
14	1	0	1	0	0	0	0	0	24	1	1	0	0	0	0	1	1
14	2	0	1	0	1	0	0	0	24	2	0	1	0	0	0	0	0
15	1	0	0	0	1	0	0	0	24	3	0	0	0	0	0	1	0
15	2	1	0	1	0	0	0	0	25	1	0	0	1	0	0	0	0
16	1	0	0	0	1	0	1	0	25	2	0	0	0	0	1	0	0
16	2	0	0	1	0	0	1	0	25	3	0	0	0	0	0	0	1

For each item, three types of parameters (intercept, main effect, and interaction effect) were randomly drawn from three normal distributions with a common standard deviation of 0.5 and means of −2, 4, and 0, respectively. Given the item parameters, we can calculate the probability of examinees with attribute pattern $α_{l}$ having a score of t on item j, which is represented by $P (X_{j} = t | α_{l})$. Then, the corresponding responses of examinees with attribute pattern $α_{l}$ on item j were drawn from a categorical distribution. Both the cat-CDMs and item-CDMs were used to analyze data generated from cat-CDMs.
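The two random draws described above, item parameters and categorical responses, can be sketched in Python as follows. The function names and the dictionary layout are hypothetical. The sketch assumes at most two required attributes per category, as in the simulation design, so there is at most one interaction term.

```python
import random


def draw_category_parameters(n_required, rng):
    """Draw the parameters of one category that requires 1 or 2 attributes."""
    return {
        "intercept": rng.gauss(-2.0, 0.5),
        "main": [rng.gauss(4.0, 0.5) for _ in range(n_required)],
        # A two-way interaction exists only when two attributes are required.
        "interaction": [rng.gauss(0.0, 0.5)] if n_required == 2 else [],
    }


def draw_response(category_probs, rng):
    """Draw a score from the categorical distribution P(X_j = t | alpha_l)."""
    scores = range(len(category_probs))
    return rng.choices(scores, weights=category_probs, k=1)[0]


rng = random.Random(1)
params = draw_category_parameters(2, rng)
response = draw_response([0.2, 0.5, 0.3], rng)
print(params, response)
```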

5.1.2. Evaluation criteria

The item parameter recovery was evaluated using bias and root mean square error (RMSE) defined as

Bias = \frac{1}{R \times J} \sum_{r = 1}^{R} \sum_{j = 1}^{J} (τ_{j}^{(r)} - {\hat{τ}}_{j}^{(r)}),

RMSE = \sqrt{\frac{1}{R \times J} \sum_{r = 1}^{R} \sum_{j = 1}^{J} {(τ_{j}^{(r)} - {\hat{τ}}_{j}^{(r)})}^{2}},

where J and R are the number of items and replications, respectively, and $τ_{j}^{(r)}$ and ${\hat{τ}}_{j}^{(r)}$ are the true and estimated item parameter for the rth replication, respectively. The examinee parameter recovery was evaluated using the pattern correct classification rate (PCCR), which reflects the agreement between estimated attribute profiles and the true attribute profiles:

P C C R = \frac{\sum_{r = 1}^{R} \sum_{i = 1}^{N} I^{(r)} (α_{i} = {\hat{α}}_{i})}{N \times R},

where N and R are the sample size and the number of replications, respectively; $I^{(r)} (\cdot)$ is the indicator function for the rth replication; and $α_{i}$ and ${\hat{α}}_{i}$ represent the true and estimated attribute profiles of examinee i, respectively.
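The three evaluation criteria can be computed as in the following Python sketch. The function names are hypothetical, and the inputs are assumed to be already pooled over replications (and, for bias and RMSE, over items). The sign convention for bias follows the formula above, that is, true value minus estimate.

```python
import math


def bias(true_values, estimates):
    # Sign convention as in the text: true value minus estimate.
    return sum(t - e for t, e in zip(true_values, estimates)) / len(true_values)


def rmse(true_values, estimates):
    squared = sum((t - e) ** 2 for t, e in zip(true_values, estimates))
    return math.sqrt(squared / len(true_values))


def pccr(true_profiles, estimated_profiles):
    # A profile counts as correct only if every attribute matches.
    matches = sum(list(a) == list(b) for a, b in zip(true_profiles, estimated_profiles))
    return matches / len(true_profiles)


print(bias([1.0, 2.0], [0.5, 2.5]))  # 0.0
print(rmse([1.0, 2.0], [0.5, 2.5]))  # 0.5
print(pccr([[1, 0], [0, 1]], [[1, 0], [1, 1]]))  # 0.5
```

The first two lines of output show why both measures are reported: errors of opposite sign cancel in the bias but not in the RMSE.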

5.1.3. Results

Recall that in this simulation study, the data were generated using cat-CDMs with three link functions (local, global, and CR logits) but fitted using both cat-CDMs and item-CDMs. The convergence of model calibration was monitored, and a calibration was said to converge normally when the difference in −2 times the log-likelihood between two consecutive EM iterations was less than .01 and the number of EM iterations was less than 1,000. We found that both cat-CDMs and item-CDMs converged normally for all replications under all simulation conditions.

To evaluate parameter estimation accuracy, we only focus on the results of cat-CDMs. Figures 1 and 2 present the PCCRs of CDMs under varied conditions. When K = 5 and 7, the PCCRs of the cat-CDMs with the local and CR logit link functions were greater than .9 and .85, respectively, while the PCCRs of the cat-CDMs with the global logit link function were above .85 and .76, respectively, regardless of the sample size. Under the same conditions, the PCCRs of CDMs with the global logit were slightly lower than those of the CDMs with local and CR logits. This may be because the model based on the global logit link is more complex than the models based on the other two link functions. It can also be observed that, as expected, the PCCRs increased as the sample size increased but decreased as the number of attributes increased. In addition, under all conditions, the cat-CDMs produced higher PCCRs than the item-CDMs. For example, when K = 7 and the sample size was 500, the PCCR of the item-CDM with the local logit was about .84, whereas the PCCR of the cat-CDM under the same condition was about .88.

Figure 1.

The pattern correct classification rates for K = 5 with various Q-Matrices and test lengths. Note. Item-Q = item level-Q; cat-Q = category level-Q; CR = continuation ratio.

Figure 2.

The pattern correct classification rates for K = 7 with various Q-Matrices and test lengths. Note. Item-Q = item level-Q; cat-Q = category level-Q; CR = continuation ratio.

Tables 5 and 6 provide the biases and RMSEs, respectively, when K = 5, and Tables 7 and 8 summarize these results when K = 7. The results in Tables 5 through 8 showed that the intercept and main effect parameters can be more accurately estimated with lower biases and RMSEs than the interaction effects, which is consistent with the findings of Jiang and Ma (2018). The accuracy of the item parameter estimation was improved with the increase of sample size. In addition, the cat-CDMs provided higher estimation accuracy for item parameters (with lower biases and RMSEs) than the item-CDMs. This is because, in general, item-CDMs involve more item parameters than cat-CDMs. For example, under the local logit link function, when K = 5, the cat-CDM had 131 item parameters but the item-CDM had 273 item parameters.
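The parameter counts quoted above can be checked against the Q-matrix in Table 3. The Python sketch below counts, for each category, one intercept plus $2^{K^{*}} - 1$ effects, without equality constraints across categories. One reading is assumed here and is not stated in the text: the reported totals of 131 and 273 appear to include the $2^{5} - 1 = 31$ free proportions of the attribute-pattern distribution, since the item parameters alone sum to 100 and 242. All variable and function names are hypothetical.

```python
# Category-level q-vectors from Table 3 (K = 5), one string per response category.
CAT_Q = {
    1: ["10000", "01000"], 2: ["00100", "00110"],
    3: ["10001", "10000"], 4: ["00001", "00011"],
    5: ["00100", "01010"], 6: ["11000", "00100"],
    7: ["01000", "01010"], 8: ["00010", "10100"],
    9: ["00011", "00101"], 10: ["01100", "10000"],
    11: ["11000", "00001"],
    12: ["01000", "00010", "00001"], 13: ["00001", "00010", "00100"],
    14: ["10000", "01000", "00100"], 15: ["00010", "00001", "10000"],
    16: ["10000"], 17: ["01000"], 18: ["00100"], 19: ["00010"], 20: ["00001"],
}
K = 5


def saturated_size(n_required):
    # One intercept plus 2^K* - 1 main and interaction effects.
    return 2 ** n_required


def pooled_count(categories):
    # Number of attributes required by at least one category of the item.
    return sum(any(q[k] == "1" for q in categories) for k in range(K))


# cat-CDM: every category uses only its own required attributes.
cat_total = sum(saturated_size(q.count("1")) for cats in CAT_Q.values() for q in cats)
# item-CDM: every category uses all attributes required anywhere in the item.
item_total = sum(len(cats) * saturated_size(pooled_count(cats)) for cats in CAT_Q.values())
structural = 2 ** K - 1  # assumed: free attribute-pattern proportions

print(cat_total, item_total)  # 100 242
print(cat_total + structural, item_total + structural)  # 131 273
```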

Table 5.

Biases of Item Parameter Estimates When K = 5

Link Function	N	Cat-Q			Item-Q
Link Function	N	Intercept	Main	Interaction	Intercept	Main	Interaction
Local	500	.196	−.653	−2.062	1.270	−2.356	−2.792
	1,000	.080	−.294	−2.102	0.693	−1.286	−2.923
	2,000	.029	−.140	−1.735	0.385	−0.649	−2.446
	4,000	.015	−.067	−1.149	0.219	−0.355	−1.636
Global	500	.175	−.157	−0.385	0.335	−0.359	−0.502
	1,000	.083	−.064	−0.313	0.146	−0.161	−0.467
	2,000	.030	−.021	−0.221	0.074	−0.093	−0.295
	4,000	.014	−.010	−0.150	0.024	−0.040	−0.233
CR	500	.224	−.595	−2.240	1.290	−2.824	−3.094
	1,000	.097	−.248	−2.150	0.749	−1.738	−2.967
	2,000	.037	−.119	−1.548	0.461	−1.044	−2.217
	4,000	.009	−.038	−0.878	0.243	−0.547	−1.562

Note. Local = local logit; global = global logit; CR = continuation ratio logit; N = sample size; item-Q = item level-Q; cat-Q = category level-Q; intercept = intercept parameters; main = main effect parameters; interaction = interaction effect parameters.

Table 6.

RMSE Values for Simulation Study (K = 5)

Link Function	N	Cat-Q			Item-Q
Link Function	N	Intercept	Main	Interaction	Intercept	Main	Interaction
Local	500	0.965	2.119	4.420	3.368	5.028	6.322
	1,000	0.517	1.281	3.926	2.450	3.504	5.487
	2,000	0.277	0.768	3.361	1.637	2.242	4.478
	4,000	0.181	0.461	2.642	1.100	1.451	3.365
Global	500	0.856	1.849	3.605	1.228	3.080	5.509
	1,000	0.522	1.034	2.445	0.723	1.897	3.802
	2,000	0.278	0.569	1.687	0.389	1.105	2.702
	4,000	0.140	0.303	1.143	0.174	0.546	1.877
CR	500	1.095	2.078	4.785	3.553	5.851	6.222
	1,000	0.619	1.139	4.185	2.670	4.888	5.370
	2,000	0.346	0.728	3.583	1.846	3.121	4.523
	4,000	0.176	0.344	2.425	1.230	2.102	3.613

Note. RMSE = root mean square error; local = local logit; global = global logit; CR = continuation ratio logit; N = sample size; item-Q = item level-Q; cat-Q = category level-Q; intercept = intercept parameters; main = main effect parameters; interaction = interaction effect parameters.

Table 7.

Bias Values for Simulation Study (K = 7)

Link Function	N	Cat-Q			Item-Q
Link Function	N	Intercept	Main	Interaction	Intercept	Main	Interaction
Local	500	.226	−.707	−1.833	1.216	−2.694	−2.402
	1,000	.108	−.333	−1.739	0.750	−1.467	−2.676
	2,000	.049	−.152	−1.427	0.371	−0.692	−2.565
	4,000	.021	−.078	−1.095	0.227	−0.397	−1.919
Global	500	.259	−.269	−0.144	0.452	−0.495	−0.161
	1,000	.070	−.014	−0.213	0.160	−0.156	−0.255
	2,000	.009	.031	−0.132	0.084	−0.114	−0.116
	4,000	.019	−.013	−0.096	0.026	−0.036	−0.171
CR	500	.198	−.616	−2.580	1.112	−3.011	−3.046
	1,000	.093	−.278	−2.243	0.737	−1.962	−2.853
	2,000	.046	−.147	−1.728	0.402	−1.096	−2.525
	4,000	.018	−.061	−1.101	0.225	−0.646	−1.871

Note. Local = local logit; global = global logit; CR = continuation ratio logit; N = sample size; item-Q = item level-Q; cat-Q = category level-Q; intercept = intercept parameters; main = main effect parameters; interaction = interaction effect parameters.

Table 8.

RMSE Values for Simulation Study (K = 7)

Link Function	N	Cat-Q			Item-Q
Link Function	N	Intercept	Main	Interaction	Intercept	Main	Interaction
Local	500	1.118	2.268	4.265	3.370	5.720	6.472
	1,000	0.641	1.382	3.760	2.444	3.739	5.724
	2,000	0.380	0.810	3.197	1.614	2.323	5.019
	4,000	0.224	0.521	2.616	1.108	1.539	4.087
Global	500	1.196	2.409	4.977	1.684	3.687	7.180
	1,000	0.514	1.456	3.735	0.816	2.502	5.900
	2,000	0.211	0.679	3.324	0.469	1.562	4.547
	4,000	0.250	0.517	2.282	0.310	1.023	2.914
CR	500	1.086	2.183	4.869	3.459	6.186	6.572
	1,000	0.669	1.323	4.279	2.637	4.575	5.801
	2,000	0.382	0.828	3.629	1.716	3.148	5.131
	4,000	0.215	0.447	2.833	1.145	2.190	4.189

In conclusion, the results of Simulation Study 1 showed that (1) item and examinee parameters can be recovered reasonably well based on the MMLE/EM algorithm; (2) cat-CDMs always had higher estimation accuracy for both item and examinee parameters than item-CDMs, mainly because the cat-CDMs define the attributes required for each category, which provides additional information and helps improve the estimation accuracy of item and examinee parameters; and (3) as sample size increased, the estimation accuracy of item and examinee parameters improved.

5.2. Simulation Study 2

5.2.1. Design

The purpose of Study 2 was to evaluate how well the model-data fit indices select the appropriate link function for the model. Two factors were manipulated: sample size (500, 1,000, 2,000, and 4,000) and link function (local, global, and CR logits). The number of attributes was fixed at K = 5. The Q-matrix, the generation of attribute profiles, and the item parameters were the same as in Study 1. Under each condition, 100 data sets were generated. The data were generated from the cat-CDMs, and the cat-CDMs with each of the three link functions were then fitted to each data set. Table 9 presents the data generation and fitting methods in Study 2. The proportion of times that the generating (or true) model was selected was used to evaluate the performance of the model fit indices.
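
The evaluation criterion described above can be sketched as follows. This is only an illustration: the function name, the dictionary layout, and the criterion values are hypothetical, and the only rule assumed is that the model with the smallest criterion value is the one selected in each replication.

```python
def selection_rate(replications, generating_model):
    """Share of replications in which the generating model has the smallest
    criterion value. `replications` is a list of dicts mapping a fitted
    model's name to its AIC, BIC, or CAIC in that replication."""
    hits = sum(
        1 for rep in replications
        if min(rep, key=rep.get) == generating_model
    )
    return hits / len(replications)

# Hypothetical BIC values for three replications generated under the global logit.
reps = [
    {"local": 15010.2, "global": 14980.5, "CR": 15002.1},
    {"local": 14950.0, "global": 14961.3, "CR": 14975.8},
    {"local": 15100.4, "global": 15020.9, "CR": 15044.0},
]
print(selection_rate(reps, "global"))
```

In this toy input the generating model is selected in two of the three replications.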

5.2.2. Results

Table 10 gives the proportion of times that the cat-CDMs were chosen by different relative fit indices when data were generated using the cat-CDMs.

Table 9.

The Method of Generating and Fitted Data in Study 2

Generating Method		Fitted Method
Q-Matrix Level	Link Function	Q-Matrix Level	Link Function
Category	Local logit	Category	Local logit
			Global logit
			CR logit
	Global logit	Category	Local logit
			Global logit
			CR logit
	CR logit	Category	Local logit
			Global logit
			CR logit

Note. CR = continuation ratio; category = category-level.

Table 10.

Proportion of Selecting Generating Cat-CDMs

Link Function	N	AIC	CAIC	BIC
Local logit	500	1.00	1.00	1.00
	1,000	1.00	1.00	1.00
	2,000	1.00	1.00	1.00
	4,000	1.00	1.00	1.00
Global logit	500	0.93	0.95	0.95
	1,000	0.98	1.00	1.00
	2,000	0.99	1.00	1.00
	4,000	1.00	1.00	1.00
CR logit	500	1.00	1.00	1.00
	1,000	1.00	1.00	1.00
	2,000	1.00	1.00	1.00
	4,000	1.00	1.00	1.00

Note. CR logit = continuation ratio logit link function; AIC = Akaike’s information criterion; BIC = Bayesian information criterion; CAIC = consistent Akaike’s information criterion; cat-CDMs = category-level cognitive diagnosis models.

As shown in Table 10, when the generating model used the CR logit or the local logit link function, the AIC, CAIC, and BIC always selected the generating model regardless of sample size. When the generating model used the global logit link function and N = 500, the AIC, CAIC, and BIC selected the generating model in 93%, 95%, and 95% of the replications, respectively. These proportions increased with sample size, and at N = 4,000 all three indices always selected the generating model.

6. Real Data Analysis

6.1. Data

In this section, a real data analysis was conducted to illustrate the application of the proposed class of polytomous CDMs. The data consist of the responses of 516 Chinese eighth-grade students to 25 items of a mathematics diagnostic test. Of the 25 items, 6 are polytomous and 19 are dichotomous. Six attributes are measured by the test, namely, ($α_{1}$) basic computing power, ($α_{2}$) multiplication of the same base powers, ($α_{3}$) monomial and polynomial multiplication, ($α_{4}$) formula for the difference of squares, ($α_{5}$) division of the same base powers, and ($α_{6}$) extracting the common factor. The Q-matrix of the test is given in Table 11.

Table 11.

The Q-Matrix of the Real Data

Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{6}$
1	1	0	1	1	1	0	0	18	1	0	0	0	1	0
2	1	0	1	0	0	0	0	19	1	1	1	1	0	0
3	1	1	0	0	0	0	0	19	2	1	0	1	0	0
4	1	0	0	0	1	0	0	19	3	1	0	1	0	0
5	1	0	0	0	0	0	1	20	1	1	0	0	1	0
6	1	1	0	0	0	0	1	20	2	1	0	0	0	0
7	1	0	0	0	1	0	0	21	1	0	0	0	0	1
8	1	0	0	0	1	0	0	21	2	1	0	0	0	0
9	1	1	0	0	1	0	0	22	1	0	0	0	1	0
10	1	1	1	0	0	0	0	22	2	1	0	0	0	0
11	1	1	1	0	0	0	0	22	3	0	0	0	0	1
12	1	0	0	0	1	0	0	23	1	0	0	0	0	1
13	1	0	0	0	0	1	0	24	1	1	0	0	0	0
14	1	0	0	0	0	0	1	24	2	1	0	0	0	0
15	1	1	0	0	0	1	0	25	1	0	0	1	0	0
16	1	0	0	0	0	1	0	25	2	1	0	0	0	0
17	1	0	1	0	0	0	0	25	3	1	0	0	0	0

6.2. Analysis and Results

In this analysis, the proposed class of polytomous CDMs was fitted to the real data and the model fit indices were calculated. The results are shown in Table 12. The cat-CDM with the CR logit and equality constraints across categories had the smallest BIC and CAIC values and was therefore preferred by these two criteria. The AIC, by contrast, was smallest for the cat-CDM with the CR logit and no equality constraints across categories.

Table 12.

The Results of Test-Level Fit

Constraint Across Categories	Link Function	AIC	BIC	CAIC
Free	Local logit	14,702	15,378	15,537
	Global logit	14,670	15,447	15,630
	CR logit	14,642	15,317	15,476
Equal	Local logit	14,767	15,404	15,554
	Global logit	14,753	15,415	15,571
	CR logit	14,670	15,307	15,457

Note. CR = continuation ratio; AIC = Akaike’s information criterion; BIC = Bayesian information criterion; CAIC = consistent Akaike’s information criterion.
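
The entries of Table 12 can be cross-checked with a short sketch. It assumes the usual definitions AIC = -2 ln L + 2p, BIC = -2 ln L + p ln N, and CAIC = -2 ln L + p(ln N + 1), under which CAIC - BIC = p and BIC - AIC = p(ln N - 2). The parameter counts printed below are implied by those assumed definitions and are not reported in the article.

```python
import math

N = 516  # number of students in the real data

# (AIC, BIC, CAIC) as printed in Table 12
table12 = {
    ("free", "local"): (14702, 15378, 15537),
    ("free", "global"): (14670, 15447, 15630),
    ("free", "CR"): (14642, 15317, 15476),
    ("equal", "local"): (14767, 15404, 15554),
    ("equal", "global"): (14753, 15415, 15571),
    ("equal", "CR"): (14670, 15307, 15457),
}

# Under the assumed definitions, CAIC - BIC equals the number of parameters p.
implied_p = {key: caic - bic for key, (aic, bic, caic) in table12.items()}

for key, (aic, bic, caic) in table12.items():
    p = implied_p[key]
    expected_gap = p * (math.log(N) - 2)  # BIC - AIC under the assumed definitions
    print(key, p, bic - aic, round(expected_gap, 1))
```

Under these assumptions the observed BIC - AIC differences agree with p(ln N - 2) to within the rounding of the table, and the models with equality constraints have fewer implied parameters than their unconstrained counterparts.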

The results of relative model fit at the test level indicated that the model with the CR logit, cat-Q, and equality constraints across categories may be the most appropriate for analyzing the empirical data. However, it is still unclear whether it fits the data adequately in an absolute sense. To answer this question, the differences between the observed and model-implied Fisher's z-transformed Pearson correlations were calculated for all pairs of items. The differences were then divided by their corresponding standard errors, resulting in test statistics that approximately follow a standard normal distribution (Chen et al., 2013). Table 13 shows the p value of the maximum test statistic associated with each item. Note that the p values were adjusted using the Bonferroni method because 24 hypothesis tests were conducted for each item (one for each pairing with the other 24 items). The adjusted p values were greater than .05 for all items, indicating that the model had a good fit at the item level. Therefore, the analyses below were based on the model with the CR logit and equality constraints across categories.

Table 13.

The Results of Item-Level Fit

Item	p Value	Item	p Value
Item 1	1.000	Item 14	1.000
Item 2	1.000	Item 15	1.000
Item 3	.432	Item 16	.576
Item 4	1.000	Item 17	.120
Item 5	.960	Item 18	.864
Item 6	1.000	Item 19	1.000
Item 7	.480	Item 20	.864
Item 8	1.000	Item 21	.864
Item 9	.672	Item 22	.432
Item 10	.120	Item 23	.576
Item 11	.120	Item 24	1.000
Item 12	.432	Item 25	.120
Item 13	.120
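
The item-level fit computation can be sketched in a few lines. The sketch assumes that the standard error of a Fisher z-transformed correlation is approximated by 1/sqrt(N - 3) and that the test is two-sided; both are assumptions made here for illustration, and the function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a Pearson correlation."""
    return math.atanh(r)

def pair_statistic(r_observed, r_model, n):
    """z statistic for one item pair; SE = 1/sqrt(n - 3) is an assumption."""
    se = 1.0 / math.sqrt(n - 3)
    return (fisher_z(r_observed) - fisher_z(r_model)) / se

def two_sided_p(z):
    """Two-sided p value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def bonferroni(p, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_tests)

# An unadjusted p of .005 over 24 tests per item gives an adjusted p of .120.
print(round(bonferroni(0.005, 24), 3))
```

Because each item is paired with 24 others, any unadjusted p value above 1/24 (about .042) is reported as 1.000 after adjustment, which is consistent with the many entries of 1.000 in Table 13.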

Table 14 summarizes the attribute-level classification consistency indices using the estimator of Johnson and Sinharay (2018). The classification consistency index is used as an important indicator of the reliability of CDA tests (Cui et al., 2012; W. Wang et al., 2015). For the cat-CDM with the CR logit and equality constraints across categories, the classification consistency of the six attributes ranged from .870 to .924, with an average of .892. Johnson and Sinharay (2018) suggested a very good reliability range of .8–.9 when reporting a single reliability indicator of skill level. This result showed that the selected model had very good reliability on each attribute.

Table 14.

The Classification Consistency for Each Attribute

$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	Mean
.884	.889	.883	.905	.870	.924	.892

Note. Mean denotes the average classification consistency value.

Table 15 gives the average mastery percentages of students on each attribute. For each attribute, at least 45% of the students had mastered it, with $α_{6}$ being the lowest at 46.9% and $α_{2}$ being the highest at 57.6%. Among the six attributes, the mastery percentages of $α_{4}$, $α_{5}$, and $α_{6}$ were similar to one another and lower than those of the other three attributes.

Table 15.

Mean of Students’ Mastery Percentages Across Six Attributes

$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$
.514	.576	.545	.479	.473	.469

Table 16 presents the number and percentage of students with each attribute pattern. The 516 students fell into 58 of the 64 possible attribute patterns. The proportion of students who mastered one or no attributes was about 13%. The percentage of those who mastered two attributes was about 22%. The majority (more than 65%) of the students mastered at least three attributes.

Table 16.

The Number and Percentage of Students With Different Attribute Patterns

Attribute Pattern	Number of Students	Percentage	Attribute Pattern	Number of Students	Percentage
000000	9	1.744	100000	11	2.132
000001	11	2.132	100001	7	1.357
000010	5	0.969	100010	6	1.163
000011	4	0.775	100011	4	0.775
000100	7	1.357	100101	12	2.326
000101	7	1.357	100110	11	2.132
000110	5	0.969	101000	11	2.132
001000	14	2.713	101001	3	0.581
001001	8	1.550	101010	10	1.938
001010	7	1.357	101011	4	0.775
001011	6	1.163	101100	15	2.907
001100	7	1.357	101101	7	1.357
001101	8	1.550	101110	12	2.326
001111	8	1.550	110000	12	2.326
010000	9	1.744	110001	5	0.969
010001	8	1.550	110010	8	1.550
010010	10	1.938	110011	14	2.713
010011	5	0.969	110100	6	1.163
010100	10	1.938	110101	10	1.938
010110	7	1.357	110110	6	1.163
010111	12	2.326	110111	11	2.132
011000	10	1.938	111000	8	1.550
011001	12	2.326	111001	7	1.357
011010	9	1.744	111010	9	1.744
011011	10	1.938	111011	16	3.101
011100	14	2.713	111100	6	1.163
011101	9	1.744	111101	14	2.713
011110	10	1.938	111110	12	2.326
011111	10	1.938	111111	8	1.550
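
The summary statements about Table 16 can be reproduced directly from the student counts. The sketch below transcribes the counts from the table and groups students by the number of attributes they were classified as having mastered; the variable names are illustrative.

```python
from collections import Counter

# Attribute pattern -> number of students, transcribed from Table 16
counts = {
    "000000": 9, "000001": 11, "000010": 5, "000011": 4, "000100": 7,
    "000101": 7, "000110": 5, "001000": 14, "001001": 8, "001010": 7,
    "001011": 6, "001100": 7, "001101": 8, "001111": 8, "010000": 9,
    "010001": 8, "010010": 10, "010011": 5, "010100": 10, "010110": 7,
    "010111": 12, "011000": 10, "011001": 12, "011010": 9, "011011": 10,
    "011100": 14, "011101": 9, "011110": 10, "011111": 10,
    "100000": 11, "100001": 7, "100010": 6, "100011": 4, "100101": 12,
    "100110": 11, "101000": 11, "101001": 3, "101010": 10, "101011": 4,
    "101100": 15, "101101": 7, "101110": 12, "110000": 12, "110001": 5,
    "110010": 8, "110011": 14, "110100": 6, "110101": 10, "110110": 6,
    "110111": 11, "111000": 8, "111001": 7, "111010": 9, "111011": 16,
    "111100": 6, "111101": 14, "111110": 12, "111111": 8,
}

total = sum(counts.values())

# Number of students by how many attributes their pattern contains
by_number = Counter()
for pattern, n in counts.items():
    by_number[pattern.count("1")] += n

share_at_most_one = (by_number[0] + by_number[1]) / total
share_two = by_number[2] / total
share_three_plus = sum(n for k, n in by_number.items() if k >= 3) / total
print(total, len(counts), share_at_most_one, share_two, share_three_plus)
```

The counts sum to 516 students over 58 patterns, with 66 students mastering at most one attribute, 112 mastering exactly two, and 338 mastering three or more.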

7. Discussion

In this article, we proposed a class of polytomous CDMs for polytomous items with different link functions and various constraints on the Q-matrix and item parameters. The proposed class is very general and subsumes several commonly used polytomous CDMs, including the P-LCDM (Hansen, 2013), the sequential G-DINA model (Ma & de la Torre, 2016), and the RSDM (R. Liu & Jiang, 2020). The proposed models also help clarify the relations among existing models. In particular, some existing models, such as the P-LCDM and the sequential G-DINA model, are closely related and differ only in their link functions. This article also developed an MLE procedure for estimating model parameters, which can be used for existing models that are subsumed by the proposed class but were originally estimated using Markov chain Monte Carlo (MCMC) methods, such as the RSDM (R. Liu & Jiang, 2020). Despite some advantages, MCMC tends to be slow, so MLE based on the proposed models could be particularly useful in practice.
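
To make the role of the link function concrete, the following sketch converts the same set of category-level linear predictors into category probabilities under the three links. It assumes the textbook forms of the adjacent-category (local), cumulative (global), and continuation-ratio (CR) logits; the exact parameterization used in this article is given in the model section and may differ in detail. The function names and the input `eta` (one value per nonzero category, for a single item and attribute profile) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def local_logit_probs(eta):
    """Adjacent-category logits: log[P(X=h) / P(X=h-1)] = eta[h-1]."""
    cumulative = [0.0]
    for e in eta:
        cumulative.append(cumulative[-1] + e)
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]

def global_logit_probs(eta):
    """Cumulative logits: log[P(X>=h) / P(X<h)] = eta[h-1].
    eta must be non-increasing so that all probabilities are non-negative."""
    upper = [1.0] + [sigmoid(e) for e in eta] + [0.0]
    return [upper[h] - upper[h + 1] for h in range(len(eta) + 1)]

def cr_logit_probs(eta):
    """Continuation-ratio logits: log[P(X>=h) / P(X=h-1)] = eta[h-1]."""
    probs = []
    reach = 1.0  # probability of having reached category h-1 or higher
    for e in eta:
        step = sigmoid(e)  # P(X >= h | X >= h-1)
        probs.append(reach * (1.0 - step))
        reach *= step
    probs.append(reach)
    return probs

eta = [1.0, -0.5]  # hypothetical predictors for categories 1 and 2
for f in (local_logit_probs, global_logit_probs, cr_logit_probs):
    print(f.__name__, [round(p, 3) for p in f(eta)])
```

For a dichotomous item (a single predictor) the three functions return the same probabilities, which is why the distinction among link functions only matters for polytomous items.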

The simulation results indicated that the EM algorithm can be used to adequately estimate model parameters under varied conditions. The simulation studies also showed that the structure of the Q-matrix was an important factor affecting estimation accuracy. For example, under the same simulation condition, the cat-CDMs produced more accurate item and person parameter estimates than the item-CDMs. This is mainly because the category-level Q-matrix defines the relationship between attributes and response categories and thus provides additional information to aid parameter estimation. This also suggests that identifying the association between attributes and categories when developing polytomously scored items for cognitive diagnostic assessment could be useful. In addition, this article carried out a simulation study to investigate the performance of several model fit indices in selecting the appropriate model under different sample sizes and generating models. According to the simulation study, for data generated using the cat-CDMs with the global logit link function, the AIC, CAIC, and BIC did not always select the correct model when the sample size was small (e.g., 500).

Although the results are promising, several future directions of research can be identified to unlock the potential of the proposed polytomous CDMs. First, some constraints on the Q-matrix design need to be further studied. This study assumed that the Q-matrix is correctly specified. However, in practice, this assumption is not always satisfied. Existing studies have shown that Q-matrix misspecification reduces the accuracy of item and person parameter estimation (Kunina-Habenicht et al., 2012; Lei & Li, 2016). Therefore, it is worth examining how the proposed polytomous CDMs perform when the Q-matrix is misspecified. In addition, the current study did not model the relations among attributes, but, in practice, attributes may have different hierarchical structures, such as linear, convergent, divergent, and mixed structures. To reach more general conclusions, future research should consider these factors.

Second, the same model was fit to all polytomous items in a single test in this study, though, theoretically, researchers do not have to do so, given that the model we proposed is defined at item level and that it is possible to specify different models for different items. The practical challenge, however, is how to determine the most appropriate model for each item. Ostini and Nering (2006) emphasized that data characteristics should be first considered (p. 91). For example, if the problem-solving process is sequential (this might be evaluated by domain experts), the continuation ratio model appears the natural choice. In addition to data characteristics, the evaluation of model-data fit at item level can provide valuable information for model selection. Currently, most studies on item-level model-data fit measures focused on dichotomous data (e.g., de la Torre & Lee, 2013; Y. Liu et al., 2016; Ma et al., 2016; Sorrel et al., 2017; C. Wang et al., 2015). An exception is Ma and de la Torre (2019a), which examined the performance of the Wald statistic in selecting the most appropriate models under the sequential G-DINA model, but it is still unclear how the Wald statistic can be used to determine the best link functions based on the proposed models. Future research could explore how models should be selected at the item level.

Finally, the proposed polytomous CDMs are confirmatory in nature, given that the Q-matrix is assumed to be known a priori. In contrast, some exploratory diagnostic models for polytomous response have been developed recently (e.g., Culpepper, 2019; Fang et al., 2019). An advantage of the exploratory diagnostic models is that they do not require prior knowledge of the underlying structure and provide a framework for researchers to infer the underlying structure and item response process simultaneously (Culpepper, 2019). Future research can explore how confirmatory and exploratory CDMs could be used together to better analyze the data from diagnostic assessments. In addition, although a set of CDMs is developed, the conditions for ensuring model identifiability are still unclear. Culpepper (2019) and Fang et al. (2019) discussed the identifiability conditions for some general CDMs, but further investigation is needed to shed more light on the connection between their findings and the proposed models.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Humanities and Social Science Research Projects in Colleges and Universities in Guizhou Province (grant ID: 2020QN018), National Natural Science Foundation of China (31660278, 31760288, 31960186), and Guizhou Normal University’s 2019 PhD Research Startup Project (GZNUD[2019] No. 27).

ORCID iD

Wenchao Ma

References

Akaike (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.

Bandalos, D. L. (2018). Measurement theory and applications for the social sciences. Guilford Press.

Birenbaum, & Tatsuoka, K. K. (1987). Open-ended versus multiple-choice response formats—It does make a difference for diagnostic purposes. Applied Psychological Measurement, 11, 385–395.

Bozdogan (1987). Model selection and Akaike’s information criterion (AIC): The general theory and its analytical extensions. Psychometrika, 52(3), 345–370.

Chen, de la Torre, & Zhang (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123–140.

Cox, E. P. (1980). The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 17, 407–422.

Cui, Gierl, & Chang, H.-H. (2012). Estimating classification consistency and accuracy for cognitive diagnostic assessment. Journal of Educational Measurement, 49, 19–38.

Culpepper, S. A. (2019). An exploratory diagnostic model for ordinal responses with binary attributes: Identifiability and estimation. Psychometrika, 84, 921–940.

de la Torre (2010, July). The partial-credit DINA model [Paper presentation]. International Meeting of the Psychometric Society, Athens, GA.

de la Torre (2011). The generalized DINA model framework. Psychometrika, 76(2), 179–199.

de la Torre, & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333–353.

de la Torre, & Lee, Y. S. (2013). Evaluating the Wald test for item-level comparison of saturated and reduced models in cognitive diagnosis. Journal of Educational Measurement, 50, 355–373.

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum Associates.

Fang, Liu, & Ying (2019). On the identifiability of diagnostic classification models. Psychometrika, 84, 19–40.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26(4), 301–321.

Hansen (2013). Hierarchical item response models for cognitive diagnosis [Unpublished doctoral dissertation]. University of California at Los Angeles.

Hartz, S. M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality [Doctoral dissertation]. University of Illinois at Urbana–Champaign.

Henson, R. A., Templin, J. L., & Willse, J. T. (2009). Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika, 74, 191–210.

Jiang (2018). Integrating differential evolution optimization to cognitive diagnostic model estimation. Frontiers in Psychology, 9, Article 2142.

Johnson, M. S., & Sinharay (2018). Measures of agreement to assess attribute-level classification accuracy and consistency for cognitive diagnostic assessments. Journal of Educational Measurement, 55(4), 635–664.

Junker, B. W., & Sijtsma (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25(3), 258–272.

Kunina-Habenicht, Rupp, A. A., & Wilhelm (2012). The impact of model misspecification on parameter estimation and item-fit assessment in log-linear diagnostic classification models. Journal of Educational Measurement, 49, 59–81.

Lee, Y.-S., Park, Y. S., & Taylan (2011). A cognitive diagnostic modeling of attribute mastery in Massachusetts, Minnesota, and the U.S. national sample using the TIMSS 2007. International Journal of Testing, 11, 144–177.

Lei, P. W., & Li (2016). Performance of fit indices in choosing correct cognitive diagnostic models and Q-matrices. Applied Psychological Measurement, 40(6), 405–417.

Liu, R., & Jiang (2020). A general diagnostic classification model for rating scales. Behavior Research Methods, 52(1), 422–439.

Liu, Y., Tian, & Xin (2016). An application of M2 statistic to evaluate the fit of cognitive diagnostic models. Journal of Educational and Behavioral Statistics, 41, 3–26.

Ma, & de la Torre (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253–275.

Ma, & de la Torre (2019a). Category-level model selection for the sequential G-DINA model. Journal of Educational and Behavioral Statistics, 44(1), 45–77.

Ma, & de la Torre (2019b). GDINA: The generalized DINA model framework (R package version 2.5). https://CRAN.R-project.org/package=GDINA

Ma, Iaconangelo, & de la Torre (2016). Model similarity, model selection, and attribute classification. Applied Psychological Measurement, 40, 200–217.

Maris (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187–212.

Nering, M. L., & Ostini (2010). Handbook of polytomous item response theory models. Taylor & Francis.

Ostini, & Nering, M. L. (2006). Polytomous item response theory models. Sage.

R Core Team. (2018). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Schwarz (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.

Sorrel, M. A., Abad, F. J., Olea, de la Torre, & Barrada, J. R. (2017). Inferential item-fit evaluation in cognitive diagnosis modeling. Applied Psychological Measurement, 41(8), 614–631.

Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345–354.

Templin, J. L., & Henson, R. A. (2006). Measurement of psychological disorders using cognitive diagnosis models. Psychological Methods, 11(3), 287–305.

Zheng, Cai, Gao, & Wang (2017). A polytomous model of cognitive diagnostic assessment for graded data. International Journal of Testing, 18, 1–21.

van der Ark, L. A. (2001). Relationships and properties of polytomous item response theory models. Applied Psychological Measurement, 25(3), 273–282.

von Davier (2005). A general diagnostic model applied to language testing data. ETS Research Report Series, 2005(2), i–35.

von Davier (2008). A general diagnostic model applied to language testing data. British Journal of Mathematical and Statistical Psychology, 61, 287–307.

Wang (2013). Mutual information item selection method in cognitive diagnostic computerized adaptive testing with short test length. Educational and Psychological Measurement, 73(6), 1017–1035.

Wang, C., Shu, & Shang (2015). Assessing item level fit for the DINA model. Applied Psychological Measurement, 39, 525–538.

Wang, W., Song, Chen, Meng, & Ding (2015). Attribute-level and pattern-level classification consistency and accuracy indices for diagnostic assessment. Journal of Educational Measurement, 52, 457–476.

Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$
1	1	1	0	0	0	0	11	1	1	1	0	0	0
1	2	0	1	0	0	0	11	2	0	0	0	0	1
2	1	0	0	1	0	0	12	1	0	1	0	0	0
2	2	0	0	1	1	0	12	2	0	0	0	1	0
3	1	1	0	0	0	1	12	3	0	0	0	0	1
3	2	1	0	0	0	0	13	1	0	0	0	0	1
4	1	0	0	0	0	1	13	2	0	0	0	1	0
4	2	0	0	0	1	1	13	3	0	0	1	0	0
5	1	0	0	1	0	0	14	1	1	0	0	0	0
5	2	0	1	0	1	0	14	2	0	1	0	0	0
6	1	1	1	0	0	0	14	3	0	0	1	0	0
6	2	0	0	1	0	0	15	1	0	0	0	1	0
7	1	0	1	0	0	0	15	2	0	0	0	0	1
7	2	0	1	0	1	0	15	3	1	0	0	0	0
8	1	0	0	0	1	0	16	1	1	0	0	0	0
8	2	1	0	1	0	0	17	1	0	1	0	0	0
9	1	0	0	0	1	1	18	1	0	0	1	0	0
9	2	0	0	1	0	1	19	1	0	0	0	1	0
10	1	0	1	1	0	0	20	1	0	0	0	0	1
10	2	1	0	0	0	0

Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	$α_{7}$	Item	Category	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	$α_{6}$	$α_{7}$
1	1	1	0	0	0	0	0	0	17	1	0	1	1	0	0	0	0
2	1	0	1	0	0	0	0	0	17	2	1	0	0	0	0	0	0
3	1	0	0	1	0	0	0	0	18	1	1	1	0	0	0	0	0
4	1	0	0	0	1	0	0	0	18	2	0	0	0	0	1	0	0
5	1	0	0	0	0	1	0	0	19	1	1	0	0	0	0	0	0
6	1	0	0	0	0	0	1	0	19	2	0	1	0	0	0	0	0
7	1	0	0	0	0	0	0	1	19	3	0	0	1	0	0	0	1
8	1	1	0	0	0	0	0	0	20	1	0	0	0	0	1	0	0
8	2	0	1	0	0	0	0	0	20	2	0	0	0	0	0	0	1
9	1	0	1	1	0	0	0	0	20	3	0	0	0	1	0	0	1
9	2	0	0	1	1	0	0	0	21	1	0	0	1	0	0	0	0
10	1	1	0	0	0	1	0	0	21	2	0	0	0	1	0	0	0
10	2	1	0	0	1	0	0	1	21	3	0	0	0	0	1	1	0
11	1	0	0	0	0	1	0	0	22	1	0	0	0	1	0	0	0
11	2	1	0	0	0	1	0	0	22	2	0	0	0	0	0	0	1
12	1	0	0	0	0	0	1	0	22	3	0	0	0	0	0	1	1
12	2	0	0	0	0	1	0	1	23	1	0	0	0	0	1	0	0
13	1	0	1	0	0	0	0	0	23	2	0	0	0	0	0	1	0
13	2	0	0	1	0	0	1	0	23	3	0	0	0	0	0	1	1
14	1	0	1	0	0	0	0	0	24	1	1	0	0	0	0	1	1
14	2	0	1	0	1	0	0	0	24	2	0	1	0	0	0	0	0
15	1	0	0	0	1	0	0	0	24	3	0	0	0	0	0	1	0
15	2	1	0	1	0	0	0	0	25	1	0	0	1	0	0	0	0
16	1	0	0	0	1	0	1	0	25	2	0	0	0	0	1	0	0
16	2	0	0	1	0	0	1	0	25	3	0	0	0	0	0	0	1
