Nonparametric Classification Method for Multiple-Choice Items in Cognitive Diagnosis

Abstract

The multiple-choice (MC) item format has been widely used in educational assessments across diverse content domains. MC items purportedly allow for collecting richer diagnostic information. The effectiveness and economy of administering MC items may have further contributed to their popularity, not just in educational assessment. The MC item format has also been adapted to the cognitive diagnosis (CD) framework. Early approaches simply dichotomized the responses and analyzed them with a CD model for binary responses. Obviously, this strategy cannot exploit the additional diagnostic information provided by MC items. De la Torre’s MC Deterministic Inputs, Noisy “And” Gate (MC-DINA) model was the first model designed for the explicit analysis of items having MC response format. However, as a drawback, the attribute vectors of the distractors are restricted to be nested within the key and within each other. The method presented in this article for the CD of DINA items having MC response format does not require such constraints. Another contribution of the proposed method concerns its implementation using a nonparametric classification algorithm, which makes it particularly suited to small-sample settings like classrooms, where CD is most needed for monitoring instruction and student learning. In contrast, default parametric CD estimation routines that rely on EM- or MCMC-based algorithms, despite their effectiveness and efficiency when samples are large, cannot guarantee stable and reliable estimates when sample sizes are insufficient, owing to computational feasibility issues. Results of simulation studies and a real-world application are also reported.

Keywords

cognitive diagnosis; multiple-choice; DINA model; MC-DINA model; general CDM; G-DINA model; nonparametric cognitive diagnosis

Despite mixed reviews of its specific merits (Wood, 2003), the multiple-choice (MC) item format has been widely used in educational assessments across diverse content domains. A typical MC item consists of the stem, which presents the narrative and context that motivate the particular question the examinee is required to answer, and a set of response options, which constitutes the “MC component” of the item. The correct answer, called the “key,” is embedded among several alternatives, the “distractors,” that differ in their degree of “(in)correctness.” Compared with “traditional” items having a dichotomous response format, MC items purportedly allow for collecting richer diagnostic information. In addition, unlike essays and other constructed-response tasks, which are praised for their richer diagnostic information, MC items require examinees to spend less time recording their answers and are less susceptible to subjective scoring. In short, the effectiveness and economy of administering MC items may account for their popularity, not just in educational assessment.

The MC item format was also adapted to the cognitive diagnosis (CD) framework. CD seeks to provide a detailed evaluation of a student’s mastery of the instructional content in terms of skills learned and skills needing study. Early, less sophisticated approaches to analyzing MC items within the CD framework simply dichotomized the responses by scoring the key as 1 and the distractors as 0, and then applied one of the CD models available for binary responses. The disadvantage of such a strategy is obvious: The diagnostic potential of the MC response format is not exploited, and the additional information that can be gleaned from the distractors is ignored. The first CD model for analyzing MC items that explicitly considers the distractors was de la Torre’s (2009) MC Deterministic Inputs, Noisy “And” Gate (MC-DINA) model. Other CD models (henceforth, diagnostic classification models [DCMs]) for analyzing MC items were proposed by Ozaki (2015) and DiBello et al. (2015). The focus of de la Torre (2009) and Ozaki (2015) is on exploring how distractors can be used to improve examinee classification, whereas the generalized DCMs for MC option-based scoring (GDCM-MC) by DiBello et al. (2015) seek to develop a system for identifying students’ misconceptions of the specific tasks posed by test items.

The DCMs developed by de la Torre (2009), Ozaki (2015), and DiBello et al. (2015) all rely on EM- or MCMC-based algorithms for the estimation of the model parameters. These approaches are effective and efficient if large samples are available. However, for small-sample settings like classrooms, where these DCMs are presumably needed most, parametric estimation methods in general often encounter computational feasibility issues and cannot guarantee stable and reliable estimates (e.g., Paulsen, 2019). In response to these difficulties, nonparametric methods for the diagnostic classification of examinees have been developed (Chang et al., 2019; Chiu & Chang, 2021; Chiu & Köhn, 2019; Chiu et al., 2018).

This article presents a nonparametric diagnostic classification algorithm for MC items, called the MC-nonparametric classification (MC-NPC) method, that has been developed explicitly for use with small samples. The next section briefly reviews essential CD concepts. Technical details of the MC-NPC algorithm are described in the third section, followed by a report on the results of simulation studies for evaluating the performance of the proposed MC-NPC algorithm, including a comparison with de la Torre’s EM-based MC-DINA algorithm (implemented in the R package GDINA). The fifth section presents the side-by-side application of MC-NPC and EM-MC-DINA to a real-world data set. The Discussion section concludes with a summary of the key insights and a discussion of some limitations and future research avenues.

Review: Key Technical Concepts of CD

Assume competence (or ability) in an instructional content area is characterized by a set of skills. The explicit purpose of CD is to assess mastery of these skills in order to provide immediate feedback to students on their strengths and weaknesses in terms of skills learned and skills needing study. CD-based tests consist of items whose correct response requires mastery of different skills. From the item responses, students’ skill mastery can be inferred. In CD idiom, skills, specific knowledge, and aptitudes (any cognitive competence that is required to perform tasks) are collectively referred to as “attributes.” In formal terms, ability in a given knowledge domain is characterized as a composite of K attributes that an examinee may or may not have mastered, which is documented in a 0–1 profile having K entries denoted as $α = (α_{1}, α_{2}, \dots, α_{k}, \dots, α_{K})^{'}$ (DiBello et al., 2007; Haberman & von Davier, 2007; Leighton & Gierl, 2007; Nichols et al., 1995; Rupp et al., 2010; Sessoms & Henson, 2018; Tatsuoka, 2009). Distinct binary vectors define the attribute profiles of different proficiency classes, denoted by $C_{m}$. (The terms profile and vector are used interchangeably here.) If attributes are not hierarchically organized, then the index m runs from 1 to $M = 2^{K}$. To reiterate, the primary goal of CD modeling is to assign examinees to one of these M classes based on their performance on a test that targets proficiency in the knowledge domain in question. Said differently, examinees’ individual attribute profiles $α_{i}$ must be estimated ($i = 1, 2, \dots, N$ is the examinee index).

Like examinees, the items of cognitively diagnostic tests are characterized by individual profiles that determine which specific attributes are required for a correct response. The item-attribute profiles are K-dimensional binary vectors $q_{j}$, $j = 1, 2, \dots, J$, where the elements $q_{j k} = 1$ if a correct answer requires mastery of the $k$th attribute $α_{k}$, and 0 otherwise. Notice that, given K binary attributes, at most $2^{K} - 1$ distinct item-attribute profiles can be constructed; item-attribute profiles comprising all zeroes are not legitimate.
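The two counts just stated can be checked by brute-force enumeration. The following Python sketch uses illustrative helper names (`attribute_profiles`, `legitimate_q_vectors`) that are not part of the source; it assumes no attribute hierarchy and lists the $M = 2^{K}$ attribute profiles and the $2^{K} - 1$ legitimate item-attribute profiles for K = 3.

```python
from itertools import product


def attribute_profiles(K):
    """All M = 2**K binary attribute profiles (no attribute hierarchy assumed)."""
    return [tuple(p) for p in product((0, 1), repeat=K)]


def legitimate_q_vectors(K):
    """All 2**K - 1 item-attribute profiles; the all-zero vector is excluded."""
    return [q for q in attribute_profiles(K) if any(q)]


K = 3
profiles = attribute_profiles(K)
q_vectors = legitimate_q_vectors(K)
print(len(profiles), len(q_vectors))  # 8 7
```

With a hierarchy among attributes, some of these profiles would be ruled out, so the first list would shrink; the sketch covers only the unstructured case described in the text.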

The J item attribute profiles of a cognitively diagnostic assessment form the rows of its $J \times K$ Q-matrix, $Q = \{q_{j k}\}_{(J \times K)}$ (Tatsuoka, 1985). As a key prerequisite to CD, the Q-matrix of a test must be known and it must be complete. A Q-matrix is said to be complete if its specific composition can guarantee the identifiability of all realizable proficiency classes among examinees (Chiu et al., 2009; Köhn & Chiu, 2016, 2017, 2019, 2021).

The DINA and MC-DINA Models

DINA model

De la Torre’s (2009) MC-DINA model is an extension of the DINA model (Haertel, 1989; Junker & Sijtsma, 2001; Macready & Dayton, 1977). The item response function (IRF) of the DINA model is

P (Y_{i j} = 1 | α_{i}) = (1 - s_{j})^{η_{i j}} g_{j}^{(1 - η_{i j})},

where $Y_{i j} = 1$ denotes a correct response to item j (otherwise, $Y_{i j} = 0$), and $s_{j}$ and $g_{j}$ are the slipping and guessing parameters, respectively, for item j, subject to $0 \leq g_{j} < 1 - s_{j} \leq 1$. The ideal response (or conjunction parameter) $η_{i j} = \prod_{k = 1}^{K} α_{i k}^{q_{j k}} \in \{0, 1\}$ indicates whether examinee i has mastered all attributes required by item j. The DINA is a conjunctive model, as the probability of a correct response is maximal only if an examinee has mastered all attributes required for a given item. Hence, each DINA item generates a bipartition of the $M = 2^{K}$ proficiency classes of the latent attribute space into two groups: examinees who have mastered the attributes required for said item and those who have not.
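A minimal Python sketch of the DINA ideal response and IRF follows. The q-vector and the slipping and guessing values are made up for illustration, and the function names are hypothetical. Because attributes are binary, $\prod_{k} α_{ik}^{q_{jk}}$ equals 1 exactly when $α_{ik} \geq q_{jk}$ for every k, which is how the ideal response is computed here.

```python
from itertools import product


def dina_eta(alpha, q):
    """Ideal response: 1 if every attribute required by q is mastered in alpha, else 0."""
    return int(all(a >= qk for a, qk in zip(alpha, q)))


def dina_prob_correct(alpha, q, s, g):
    """DINA IRF: P(Y = 1 | alpha) = (1 - s)**eta * g**(1 - eta)."""
    eta = dina_eta(alpha, q)
    return (1 - s) ** eta * g ** (1 - eta)


# Hypothetical item requiring attributes 1 and 2 (K = 3), with illustrative s and g.
q_item = (1, 1, 0)
masters = [a for a in product((0, 1), repeat=3) if dina_eta(a, q_item) == 1]
print(masters)  # the "mastery" half of the bipartition
print(dina_prob_correct((1, 1, 0), q_item, s=0.1, g=0.2))  # equals 1 - s
print(dina_prob_correct((1, 0, 1), q_item, s=0.1, g=0.2))  # equals g
```

Only two of the eight profiles fall on the mastery side for this item; all other profiles share the guessing probability, which is exactly the bipartition described above.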

De la Torre’s MC-DINA model (2009)

Before the advent of de la Torre’s (2009) MC-DINA model, it was common practice to dichotomize the polytomous responses to MC items, so that a student who had picked the key received a score of one, whereas the choice of a distractor was scored as zero. Such a recoding scheme neglects the potential value of the diagnostic information from the distractors (e.g., partial knowledge as indicated by mastery of only a subset of the required attributes; students’ misconceptions of the task required by an item; see Haertel & Wiley, 1993; Nitko, 2001; Sadler, 1998). Thus, the motivation for the MC-DINA model (de la Torre, 2009) was to exploit the unused diagnostic information from the distractors to improve the classification of examinees. But how? Recall that each DINA item results in a bipartition of the latent attribute space. In contrast, an MC item partitions the latent attribute space into as many groups as the number of coded response options plus one, thereby supposedly increasing the accuracy of examinee classification. The following review of de la Torre’s (2009) MC-DINA model follows his original presentation in Applied Psychological Measurement as closely as possible in notation and terminology.

The polytomous response to an MC item is denoted by the random variable $X_{i j}$ taking on values $h = 1, 2, \dots, H_{j}$, with the total number of response options equal to $H_{j}$. MC-DINA jargon distinguishes between “noncoded” and “coded” or “cognitively-based” response options. The total number of coded response options is $H_{j}^{*}$; as not all MC-DINA response options are coded, typically, $H_{j}^{*} \leq H_{j}$. Coded response options are linked to an item attribute vector $q_{j h}$ that specifies the skills an examinee who endorses this option must have mastered. In contrast, “noncoded” response options like “none of these” or “all of the above” are not associated with a particular attribute vector. By convention, their item attribute vectors are written as a K-dimensional zero vector, $q_{j 0} = (0, 0, \dots, 0)^{'}$.

The relation between the different response options is governed by the following rules: The key always has the largest number of attributes; the attribute vectors of the coded response options must be nested within the q-vector of the key. As a further constraint, the attribute vectors of coded response options must be hierarchically nested within each other; said differently, they establish what Leighton et al. (2004) called a “linear hierarchy.” Hence, all response options form an ordinal scale with the key at the top, followed by the coded distractors, and the noncoded options last.

The ideal response of examinee i to MC item j is defined as

g_{i j} = \underset{h^{'} = 0, 1, 2, \dots, H_{j}}{\arg\max} \{ α_{i}^{'} q_{j h^{'}} \mid α_{i}^{'} q_{j h^{'}} = q_{j h^{'}}^{'} q_{j h^{'}} \}, \qquad (1)

(the notation $g_{i j}$ was introduced by de la Torre [2009], who also coined the term “latent group classification” for the ideal response). If an examinee has not mastered the attributes required for any coded option, then her ideal response $g_{i j}$ is zero, and she is assumed to choose randomly among the $H_{j}$ response options. In contrast, an examinee who has mastered all required attributes of at least one coded option has an ideal response $g_{i j} \in \{1, 2, \dots, H_{j}\}$. Finally, de la Torre (2009) defines the set $G_{j}$ as the union of $\{0\}$ and a subset of $\{1, 2, \dots, H_{j}\}$.

Then, the probability that examinee i having ideal response $g_{i j} = g \in G_{j}$ chooses option h of item j is written as

P (X_{i j} = h | g_{i j} = g) = P_{j} (h | g) .

As an illustration of these concepts, consider the following example presented in de la Torre (2009, p. 166), which concerns a test problem modeled after Tatsuoka’s (1990; see also C. Tatsuoka, 2002) famous fraction-subtraction items:

Item: $2 \frac{4}{12} - \frac{7}{12} = ?$
Response option	Role	Answer	$α_{1}$	$α_{2}$	$α_{3}$
1	Distractor A	$2 \frac{3}{12}$	0	1	0
2	Distractor B	$2 \frac{1}{4}$	0	1	1
3	Distractor C	$1 \frac{9}{12}$	1	1	0
4	Key	$1 \frac{3}{4}$	1	1	1
(Required attributes: 1 = required, 0 = not required.)

The attributes are defined as

$α_{1}$ Borrow one from whole number to fraction,

$α_{2}$ Basic fraction subtraction,

$α_{3}$ Reduce/simplify.
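To make the ideal response $g_{ij}$ concrete, the following Python sketch evaluates it for the fraction-subtraction item above, with options indexed 1 to 4 as in the table. It assumes that $g_{ij}$ returns the index of the fully mastered option requiring the most attributes (0 if no coded option is mastered), which is how the argmax over $α_{i}^{'} q_{jh^{'}}$ reads for binary vectors; the function name is hypothetical.

```python
from itertools import product

# q-vectors of the four options of the fraction item (attribute order: alpha1, alpha2, alpha3).
FRACTION_ITEM = {1: (0, 1, 0), 2: (0, 1, 1), 3: (1, 1, 0), 4: (1, 1, 1)}


def mc_dina_ideal_response(alpha, options):
    """Index h of the fully mastered option requiring the most attributes; 0 if none is mastered."""
    best_h, best_size = 0, 0
    for h, q in options.items():
        mastered = all(a >= qk for a, qk in zip(alpha, q))  # alpha' q = q' q for binary vectors
        if mastered and sum(q) > best_size:
            best_h, best_size = h, sum(q)
    return best_h


for alpha in product((0, 1), repeat=3):
    print(alpha, mc_dina_ideal_response(alpha, FRACTION_ITEM))
```

Ties between equally large mastered options cannot occur for this item: an examinee who masters both Distractor B and Distractor C also masters the key. Every examinee lacking $α_{2}$ obtains 0 and is assumed to choose at random.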

As an important detail, note that, despite the MC response format, the MC-DINA model is still a conjunctive model; that is, the choice of any response option is assumed to be guided by the match between the attributes required for that option and those mastered by an examinee. More to the point, in the case of the MC-DINA model, the more attributes an examinee has mastered, the more likely she is to move up the hierarchy of response options as they match her attribute profile. Thus, the MC response format should not be confused with the concept, underlying general DCMs, of an incremental probability of a correct answer as the result of mastering more attributes.

The NPC Method

The algorithm of the NPC method (Chiu & Douglas, 2013) is adapted to form the basis of the MC-NPC method proposed in this article. The key idea underlying the NPC method is to estimate an examinee’s proficiency class, the attribute vector $α_{i} = α_{m}$, by comparing her observed item response vector with each of the ideal response vectors of the $M = 2^{K}$ realizable proficiency classes. The NPC estimator ${\hat{α}}_{i}$ of examinee i is defined as the attribute profile underlying the ideal response vector that, among all ideal response vectors, minimizes the Hamming distance to the manifest item response vector $Y_{i}$. For $C_{m}$, the ideal response vector is $η^{(m)} = (η_{1}^{(m)}, η_{2}^{(m)}, \dots, η_{J}^{(m)})$; the Hamming distance is defined as

d_{h} (Y_{i}, η^{(m)}) = \sum_{j = 1}^{J} | Y_{i j} - η_{j}^{(m)} | .

Depending on the specific data structure, variations such as the weighted Hamming distance may be more suitable (for further details, consult Chiu & Douglas, 2013). The estimated attribute profile of examinee i is identified by minimizing the distance $d_{h} (Y_{i}, η^{(m)})$ between the observed response profile $Y_{i} = y_{i}$ and each of the ideal response profiles $η^{(1)}, η^{(2)}, \dots, η^{(M)}$ (recall $M = 2^{K}$):

{\hat{α}}_{i} = \underset{α_{m} \in \{α_{1}, α_{2}, \dots, α_{M}\}}{\arg\min}\ d_{h} (y_{i}, η^{(m)}) = \underset{α_{m} \in \{α_{1}, α_{2}, \dots, α_{M}\}}{\arg\min} \sum_{j = 1}^{J} | y_{i j} - η_{j}^{(m)} | .
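A brute-force Python sketch of the NPC estimator for binary DINA items is given below. The Q-matrix and response vector are hypothetical, and so is the function name. Ties in distance are broken by keeping the first profile in enumeration order, which is an assumption of this sketch rather than part of the method as described.

```python
from itertools import product


def npc_classify(y, Q):
    """Return (profile, distance) for the profile whose DINA ideal responses are closest to y."""
    K = len(Q[0])
    best_alpha, best_dist = None, None
    for alpha in product((0, 1), repeat=K):
        # eta_j = prod_k alpha_k ** q_jk, i.e., 1 exactly when all attributes of item j are mastered.
        eta = [int(all(a >= qk for a, qk in zip(alpha, q))) for q in Q]
        dist = sum(abs(yj - ej) for yj, ej in zip(y, eta))  # Hamming distance d_h
        if best_dist is None or dist < best_dist:  # ties keep the first profile enumerated
            best_alpha, best_dist = alpha, dist
    return best_alpha, best_dist


# Hypothetical complete Q-matrix for K = 3 (all seven nonzero q-vectors).
Q = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
y = [1, 1, 0, 1, 0, 0, 1]  # ideal pattern of (1, 1, 0) with the last item perturbed
print(npc_classify(y, Q))  # ((1, 1, 0), 1)
```

Exhaustive enumeration is feasible because only $2^{K}$ candidate profiles have to be scored; no model parameters are estimated.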

Given $P (Y_{j} = 1 | η_{j} = 0) < 0.5$ and $P (Y_{j} = 1 | η_{j} = 1) > 0.5$ for item j (Wang & Douglas, 2015) and completeness of the Q-matrix (Chiu et al., 2009; Köhn & Chiu, 2017), the estimator ${\hat{α}}_{i}$ obtained by the NPC method is statistically consistent. Simulation studies (Chiu & Douglas, 2013; Chiu et al., 2018) showed that examinee classification by the NPC method is more accurate than that by parametric methods when samples are small. This remarkable feature also inspired the development of advanced applications of CD tailored to small-scale educational settings, such as CD-CAT for the classroom (Chang et al., 2019).

The MC-NPC Method

The development of the MC-NPC method was inspired by de la Torre’s (2009) MC-DINA model. Several features, however, distinguish the MC-NPC method and warrant a few remarks before its technical details are presented. First, unlike MC-DINA, the MC-NPC method does not require the q-vectors of the coded distractors to be nested within each other and within the q-vector of the key; with these constraints removed, however, the linear ordering of the coded response options is lost. Consequently, for the MC-NPC method, the scale level of the response options changes to nominal, which in turn calls for modifying the ideal response as originally defined in Equation 1 for the MC-DINA model. Second, de la Torre (2009) does not provide the IRF of the MC-DINA model; however, the IRF underlying the MC-NPC method must be defined, as otherwise a comprehensive evaluation of its merits (theoretical and practical) is not possible.

The MC Setting Modified for the MC-NPC Method

To facilitate the implementation of these conceptual modifications and clarify the notation, in the MC-NPC method, the original MC-DINA response option index, $h = 1, 2, \dots, H_{j}$, is replaced by the index $l = 0, 1, 2, \dots, H_{j}^{*}$. As a convention, all noncoded response options are assigned $l = 0$; the key is indexed as $l = H_{j}^{*}$. The indices $l = 1, 2, \dots, H_{j}^{*} - 1$ are reserved for the remaining coded response options. The general notation for a q-vector having index l is $q_{j}^{(l)} = (q_{j 1}^{(l)}, q_{j 2}^{(l)}, \dots, q_{j K}^{(l)})$. Thus, the q-vector of the key is denoted as $q_{j}^{(H_{j}^{*})}$, whereas $q_{j}^{(0)}$ refers to noncoded options. But how should the “in-between” indices, ranging from 1 to $H_{j}^{*} - 1$, be assigned to the q-vectors of the remaining coded responses? Recall that these response options are perceived as a nominal scale; hence, the associated indices serve merely as labels. Still, to avoid ambiguity and arbitrariness, a rule is needed for assigning indices to q-vectors. Let $q$ and $q^{'}$ denote two different q-vectors of the remaining coded responses. The first rule determines that if

\| q_{j}^{(l)} \|_{1} > \| q_{j}^{(l^{'})} \|_{1}, \text{ then } l > l^{'} . \qquad \text{(Rule 1)}

The second rule concerns the case where $q$ and $q^{'}$ contain the same number of required attributes. The response options are then indexed based on a lexicographic comparison: $l > l^{'}$ if the first nonzero entry of $q$ precedes that of $q^{'}$, and $l < l^{'}$ if the reverse holds. Ties, where both q-vectors share the position of the first nonzero entry, are skipped, and the comparison is based on the first position with distinct entries; such a position can always be identified because all coded response q-vectors must be distinct. Formally, define the set $L (q, q^{'}) = \{k \mid q_{k} > q_{k}^{'}, k = 1, 2, \dots, K\}$. Because $q$ and $q^{'}$ are distinct binary vectors with the same number of ones, $L (q, q^{'})$ and $L (q^{'}, q)$ are both nonempty and disjoint; in particular, $L (q, q^{'}) \neq L (q^{'}, q)$. The second rule then determines that if

\| q_{j}^{(l)} \|_{1} = \| q_{j}^{(l^{'})} \|_{1} \ \land\ \min L (q_{j}^{(l)}, q_{j}^{(l^{'})}) < \min L (q_{j}^{(l^{'})}, q_{j}^{(l)}), \text{ then } l > l^{'} . \qquad \text{(Rule 2)}

Here is an example using some arbitrary item j with five response options (i.e., $H_{j} = 5$). Response Options 3 and 5 are noncoded; hence, their q-vectors receive the index $l = 0$ and are denoted as $q_{j}^{(0)}$. Options 1 and 4 are (coded) distractors having q-vectors (110) and (101), respectively. Response Option 2 with q-vector (011) is the key. There are two coded response options in addition to the key; hence, $H_{j}^{*} = 3$. The attribute vector of the key is indexed as $q_{j}^{(H_{j}^{*})} = q_{j}^{(3)} = (011)$. But how should the two unused indices $l = 1, 2$ be assigned to the two coded q-vectors? Comparing $q = (110)$ and $q^{'} = (101)$ yields $L ((110), (101)) = \{2\}$ because $q_{j 2} > q_{j 2}^{'}$; conversely, $L ((101), (110)) = \{3\}$ because $q_{j 3}^{'} > q_{j 3}$. Because $\min L ((110), (101)) = 2 < 3 = \min L ((101), (110))$, the q-vectors are indexed as $(101) = q_{j}^{(1)}$ and $(110) = q_{j}^{(2)}$. Table 1 summarizes these steps.

Table 1.

Example—Determining the Levels l of the Five Response Options of Item j

Option h	q-Vector	Level l of $X_{i j}$	Comments
3, 5	$(000)$	0	Noncoded response options
2	$(011) = q_{j}^{(H_{j}^{*})}$	3	Response Option 2 is the key; thus, $H_{j}^{*} = 3 = l$ , as there are two additional coded response options
1	$(110) = q$	?	$q_{j 2} > q_{j 2}^{'}$ ; hence, $L ((110), (101)) = {2}$
4	$(101) = q^{'}$	?	$q_{j 3}^{'} > q_{j 3}$ ; hence, $L ((101), (110)) = {3}$
			Therefore,
1	$(110) = q$	2	due to $min L ((110), (101)) < min L ((101), (110))$
4	$(101) = q^{'}$	1	$\Rightarrow 2 < 3$
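The two indexing rules can be sketched in Python as follows (the function names are hypothetical). For distinct binary q-vectors with the same number of ones, Rule 2 amounts to ordinary lexicographic comparison: at the first position where the vectors differ, exactly one of them has a 1, and that position is the minimum of its L set. Sorting by the pair (number of ones, vector) therefore implements both rules, under the assumption that all coded q-vectors are distinct.

```python
def L_set(q, q_prime):
    """L(q, q') = {k : q_k > q'_k}, with positions counted from 1."""
    return {k for k, (a, b) in enumerate(zip(q, q_prime), start=1) if a > b}


def assign_levels(key, coded_distractors):
    """Map each coded q-vector to its level l; the key gets l = H* = len(coded_distractors) + 1.

    Rule 1: fewer required attributes -> lower level.
    Rule 2 (equal counts): the lexicographically larger vector gets the higher level.
    """
    ordered = sorted(coded_distractors, key=lambda q: (sum(q), q))
    levels = {q: l for l, q in enumerate(ordered, start=1)}
    levels[key] = len(coded_distractors) + 1
    return levels


print(L_set((1, 1, 0), (1, 0, 1)), L_set((1, 0, 1), (1, 1, 0)))  # {2} {3}
print(assign_levels(key=(0, 1, 1), coded_distractors=[(1, 1, 0), (1, 0, 1)]))
# {(1, 0, 1): 1, (1, 1, 0): 2, (0, 1, 1): 3}
```

The output reproduces the levels in Table 1; noncoded options are not passed in, since they always receive level 0.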

After the indices l of the item response options have been determined, the ideal response $η_{i j}$ of examinee i to item j can be computed as

η_{i j} = \max_{l = 0, 1, 2, \dots, H_{j}^{*}} \Big\{ l \prod_{k = 1}^{K} I [α_{i k} \geq q_{j k}^{(l)}] \Big\} . \qquad (3)

Notice that Equation 3 is equivalent to Equation 1 when all the coded options are nested within each other. Here is an illustration of how to use Equation 3. Suppose examinee i has attribute profile $α_{i} = (110)$. Recall that Response Options 3 and 5 are noncoded; hence, their q-vectors receive index $l = 0$ and are denoted as $q_{j}^{(0)}$. Options 1 and 4 are (coded) distractors having q-vectors $(110) = q_{j}^{(2)}$ and $(101) = q_{j}^{(1)}$, respectively. Response Option 2 with q-vector (011) is the key with attribute vector $q_{j}^{(H_{j}^{*})} = q_{j}^{(3)} = (011)$. Thus, using Equation 3, the ideal response of examinee i is computed as

η_{i j} = \max_{l \in \{0, 1, 2, 3\}} \Big\{ l \prod_{k = 1}^{3} I [α_{i k} \geq q_{j k}^{(l)}] \Big\} = \max \{0, 0, 2, 0\} = 2.
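Equation 3 translates directly into code. The Python sketch below (hypothetical function name; levels taken from Table 1) reproduces $η_{ij} = 2$ for $α_{i} = (110)$ and lists the ideal responses of all eight proficiency classes for this item.

```python
from itertools import product


def mc_npc_eta(alpha, q_by_level):
    """Equation 3: the largest level l whose q-vector is fully mastered (level 0 always qualifies)."""
    return max(
        l * int(all(a >= qk for a, qk in zip(alpha, q)))
        for l, q in q_by_level.items()
    )


# Levels from Table 1: noncoded options -> 0, (101) -> 1, (110) -> 2, key (011) -> 3.
Q_BY_LEVEL = {0: (0, 0, 0), 1: (1, 0, 1), 2: (1, 1, 0), 3: (0, 1, 1)}

print(mc_npc_eta((1, 1, 0), Q_BY_LEVEL))  # 2, as in the worked example
for alpha in product((0, 1), repeat=3):
    print(alpha, mc_npc_eta(alpha, Q_BY_LEVEL))
```

Because the levels are nominal labels, an examinee who masters several coded options (e.g., $α_{i} = (111)$) is assigned the highest such level, which is the key here.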

Given the different instances of $η_{i j}$ for each of $α_{m}$, the IRF of the variant of the MC-DINA model underlying the MC-NPC method can be explicitly formulated. A few conceptual clarifications are necessary. First, recall that in de la Torre’s original concept of the MC-DINA model, an examinee with $η_{i j} = 0$ is not disposed to choose a particular response option (in de la Torre’s [2009] terminology, the examinee is not “attracted” to any of the coded response options). Instead, such an examinee is assumed to choose one of the response options at random; this assumption is also maintained here. Second, if $η_{i j} = l$ with $l \neq 0$ for an examinee, then she is expected to choose the coded response option $X_{i j} = l$ with high probability; however, with nonzero probability, she may also pick one of the other response options. Third, because the manifest and ideal item responses, X and $η$, now have more than two levels, labeling discrepancies between X and $η$ as “slips” and “guesses” does not adequately capture the increased complexity of the MC setting with its multiple item parameters. Instead, whenever observed and ideal responses disagree, the more general term “perturbation” is preferred, which also calls for adjusting the notation. Define $ε_{j l l^{'}}$ as the probability of observing response level l when the ideal response level is $l^{'} \neq l$. The explicit form of the IRF of this variant of the MC-DINA model is then

P (X_{i j} = l \mid α_{i}) = \begin{cases} \frac{1}{H_{j}} + \frac{H_{j} - H_{j}^{*} - 1}{H_{j}} I [l = 0] & \text{if } η_{i j} = 0, \\ \big(1 - \sum_{m \neq l} ε_{j m l}\big)^{I [η_{i j} = l,\, l > 0]} \prod_{l^{'} \neq l} ε_{j l l^{'}}^{I [η_{i j} = l^{'}]} & \text{if } η_{i j} > 0. \end{cases} \qquad (4)

Notice that in the case of the DINA model, the two perturbations, slipping $s_{j}$ and guessing $g_{j}$, are constrained to be less than 0.5 (otherwise, an individual mastering none of the attributes would have a probability greater than 0.5 of providing the correct answer). Of course, if there are more than two perturbation terms, then the desirable property is that $\sum_{m \neq l} ε_{j m l} < 0.5$. As an illustration, consider again the example used earlier. Table 2 reports the item parameters $ε_{j l l^{'}}$ of the MC-DINA model for this item.

Table 2.

Example: Item Perturbation Parameters $ε$

Option h	Level l of $X_{i j}$	Level $l^{'}$ of $η_{i j}$
Option h	Level l of $X_{i j}$	0	1	2	3
3, 5	0	2/5	$ε_{j 01}$	$ε_{j 02}$	$ε_{j 03}$
4	1	1/5	$1 - ε_{j 01} - ε_{j 21} - ε_{j 31}$	$ε_{j 12}$	$ε_{j 13}$
1	2	1/5	$ε_{j 21}$	$1 - ε_{j 02} - ε_{j 12} - ε_{j 32}$	$ε_{j 23}$
2	3	1/5	$ε_{j 31}$	$ε_{j 32}$	$1 - ε_{j 03} - ε_{j 13} - ε_{j 23}$

Recall that Response Options 3 and 5 are noncoded; hence, their q-vectors receive index $l = 0$ and are denoted as $q_{j}^{(0)}$. Options 1 and 4 are (coded) distractors having q-vectors $(110) = q_{j}^{(2)}$ and $(101) = q_{j}^{(1)}$, respectively. Response Option 2 with q-vector (011) is the key with attribute vector $q_{j}^{(H_{j}^{*})} = q_{j}^{(3)} = (011)$. Hence, given $α_{i} = (110)$ and $η_{i j} = 2$, the IRF in Equation 4 yields the following probabilities for the response levels of $X_{i j}$:

P (X_{i j} = 0 | α_{i}) = ε_{j 01}^{I [η_{i j} = 1]} ε_{j 02}^{I [η_{i j} = 2]} ε_{j 03}^{I [η_{i j} = 3]},

= ε_{j 02},

P (X_{i j} = 1 | α_{i}) = (1 - ε_{j 01} - ε_{j 21} - ε_{j 31})^{I [η_{i j} = 1]} ε_{j 12}^{I [η_{i j} = 2]} ε_{j 13}^{I [η_{i j} = 3]},

= ε_{j 12},

P (X_{i j} = 2 | α_{i}) = ε_{j 21}^{I [η_{i j} = 1]} {(1 - ε_{j 02} - ε_{j 12} - ε_{j 32})}^{I [η_{i j} = 2]} ε_{j 23}^{I [η_{i j} = 3]},

= 1 - ε_{j 02} - ε_{j 12} - ε_{j 32},

P (X_{i j} = 3 | α_{i}) = ε_{j 31}^{I [η_{i j} = 1]} ε_{j 32}^{I [η_{i j} = 2]} {(1 - ε_{j 03} - ε_{j 13} - ε_{j 23})}^{I [η_{i j} = 3]},

= ε_{j 32} .
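The IRF in Equation 4 can be sketched in Python as below (hypothetical function name). The perturbation values are made up (every off-diagonal $ε_{jll^{'}}$ is set to 0.05) and serve only to check that each column of Table 2 sums to 1; the column for $η_{ij} = 0$ reproduces the values 2/5, 1/5, 1/5, 1/5.

```python
def mc_irf(l, eta, H, H_star, eps):
    """Equation 4: P(X = l | alpha) for an examinee whose ideal response is eta.

    eps[(l, lp)] is the probability of observing level l when the ideal level is lp != l.
    """
    if eta == 0:
        # Random choice among all H options; level 0 pools the H - H_star noncoded options.
        return 1 / H + (H - H_star - 1) / H * (l == 0)
    if l == eta:
        return 1 - sum(eps[(m, eta)] for m in range(H_star + 1) if m != eta)
    return eps[(l, eta)]


H, H_STAR = 5, 3
# Hypothetical perturbations: every off-diagonal epsilon set to 0.05 (illustrative only).
EPS = {(l, lp): 0.05 for lp in range(1, H_STAR + 1) for l in range(H_STAR + 1) if l != lp}

for eta in range(H_STAR + 1):
    column = [mc_irf(l, eta, H, H_STAR, EPS) for l in range(H_STAR + 1)]
    print(eta, [round(p, 3) for p in column], round(sum(column), 10))
```

For $η_{ij} = 2$, the entries of the printed column correspond to $ε_{j02}$, $ε_{j12}$, $1 - ε_{j02} - ε_{j12} - ε_{j32}$, and $ε_{j32}$, matching the four probabilities derived above.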

Inconsistencies of the MC Setting and Their Remedy

Recall that for the MC-DINA model as presented here, the response options $h = 1, 2, \dots, H_{j}$ are mapped into the response levels $l = 0, 1, \dots, H_{j}^{*}$. If $η_{i j} = l^{'} > 0$, then the probability of a response at level l is large if l and $l^{'}$ agree, and small if they disagree. For all proficiency classes having $η > 0$, the application of the MC-NPC method is a straightforward extension of the NPC method. However, if $η = 0$, then the MC setting can lead to the paradox that the probability $P (X_{i j} = 0 | η_{i j} = 0)$ is less than $P (X_{i j} > 0 | η_{i j} = 0)$. Verbally stated, an examinee with $η_{i j} = 0$ is more likely to choose $X_{i j} > 0$ than $X_{i j} = 0$, which is a profound violation of a key assumption of the DINA model: Examinees who have not mastered the attributes required for an item (i.e., $η_{i j} = 0$) should choose response $X_{i j} = 0$ with maximal probability and a (partially) correct response option with minimal probability. As an illustration of such a violation, consider again the earlier example presented in Table 2. Clearly, $P (X_{i j} = 0 | η_{i j} = 0) = 2 / 5$, whereas $P (X_{i j} > 0 | η_{i j} = 0) = \sum_{l = 1}^{H_{j}^{*}} P (X_{i j} = l | η_{i j} = 0) = 3 / 5$. For the MC-DINA model, this constellation is not a rare anomaly but occurs whenever the ideal response is zero and the ratio of coded to total response options exceeds 0.5 (i.e., $H_{j}^{*} / H_{j} > 0.5$), because $P (X_{i j} = 0 | η_{i j} = 0) = (H_{j} - H_{j}^{*}) / H_{j}$. How does this affect the NPC method? Recall that the NPC method estimates an examinee’s proficiency class by relying on the shortest distance between observed and ideal responses. If a response $X_{i j} > 0$, discrepant from an examinee’s true ideal response $η_{i j} = 0$, is more likely to occur than the concordant response $X_{i j} = 0$, then the NPC method may misclassify the examinee because the more likely, but discrepant, response $X_{i j} > 0$ points to an incorrect proficiency class.
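The condition $H_{j}^{*} / H_{j} > 0.5$ can be checked numerically. This Python sketch (hypothetical function name) uses exact fractions and the random-choice probabilities implied by Equation 4 for $η_{ij} = 0$; for $H_{j} = 5$ the paradox appears from $H_{j}^{*} = 3$ onward, matching the 2/5 versus 3/5 computed above.

```python
from fractions import Fraction


def zero_level_probs(H, H_star):
    """Return P(X = 0 | eta = 0) and P(X > 0 | eta = 0) when an examinee guesses among H options."""
    p_zero = Fraction(H - H_star, H)  # level 0 pools the H - H* noncoded options
    return p_zero, 1 - p_zero


for H_star in range(1, 6):
    p0, p_pos = zero_level_probs(5, H_star)
    print(H_star, p0, p_pos, "paradox" if p_pos > p0 else "ok")
```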

As a remedy to this issue, the distances used by the NPC method are adjusted to reduce the risk of misclassification when the ideal response is $η_{i j} = 0$. A few more technical explanations seem warranted. First, notice that the Hamming distance between observed and ideal responses as defined in Equation 2, which is based on absolute differences of binary entries, is not suitable for the MC item format. However, because the Hamming distance merely counts the entries on which two vectors disagree, Equation 2 can be readily adapted to the MC item format by using the indicator function:

d_{h} (X_{i}, η^{(m)}) = \sum_{j = 1}^{J} I [η_{j}^{(m)} \neq X_{i j}],

which can be written as

d_{h} (X_{i}, η^{(m)}) = \sum_{j = 1}^{J} I [η_{j}^{(m)} > 0, X_{i j} \neq η_{j}^{(m)}] + \sum_{j = 1}^{J} I [η_{j}^{(m)} = 0, X_{i j} \neq η_{j}^{(m)}],

thereby distinguishing explicitly between $η_{j}^{(m)} > 0$ and $η_{j}^{(m)} = 0$. In the latter case, if $X_{i j} \neq η_{j}^{(m)}$, then an item-specific penalty $w_{j} \in [0, 1]$ is applied to shrink the value of the associated indicator function, resulting in the penalized Hamming distance

d_{p} (X_{i}, η^{(m)}) = \sum_{j = 1}^{J} I [η_{j}^{(m)} > 0, X_{i j} \neq η_{j}^{(m)}] + \sum_{j = 1}^{J} w_{j} I [η_{j}^{(m)} = 0, X_{i j} \neq η_{j}^{(m)}],

which reduces the impact of those instances possibly pointing to an incorrect $η_{j}^{(m)}$. But how should $w_{j}$ be determined?

Recall that the potential misclassification by the NPC method is related to the critical incident $P (X_{i j} = 0 | η_{i j} = 0) < P (X_{i j} > 0 | η_{i j} = 0)$ . The difference between these two probabilities—in casual terms, the “severity” of the incident—can be shown to be a direct function of the number of coded response options of an item, $H_{j}^{*}$ : The larger $H_{j}^{*}$ , the more likely an undesirable impact on $d_{p} (X_{i}, η^{(m)})$ . Specifically, the following scenarios can be distinguished. First, if there are no coded distractors—that is, $H_{j}^{*} = 1$ —then the MC-NPC method reduces to the familiar NPC method for binary items, as can be illustrated using the earlier example, where the $4 \times 4$ Table 2 is collapsed into the $2 \times 2$ Table 3 corresponding to a binary setting.

Table 3.

Example: Item Perturbation Parameters $ε_{j l l^{'}}$ When $H_{j}^{*} = 1$

	Level $l^{'}$ of $η_{i j}$
Level l of $X_{i j}$	0	1
0	4/5	$ε_{j 01}$
1	1/5	$1 - ε_{j 01}$

Notice that now $P (X_{i j} = 0 | η_{i j} = 0) = (H_{j} - 1) / H_{j} = 4 / 5$ exceeds $P (X_{i j} = 1 | η_{i j} = 0) = 1 / H_{j} = 1 / 5$ , as it should.

Next, consider $H_{j}^{*} > 1$. Each time $H_{j}^{*}$ is increased by 1, the probability $P (X_{i j} = 0 | η_{i j} = 0) = (H_{j} - H_{j}^{*}) / H_{j}$ decreases by $1 / H_{j}$. If $H_{j}^{*} = H_{j} - 1$, then $P (X_{i j} = 0 | η_{i j} = 0)$ is $1 / H_{j}$. Finally, if $H_{j}^{*} = H_{j}$, then all response options are coded and $X_{i j} > 0$ must hold for every examinee on item j. Said differently, the choice of $X_{i j} = 0$ is impossible regardless of whether the ideal response $η_{i j}$ is zero or not. Thus, $P (X_{i j} = 0 | η_{i j} = 0) = 0$, which implies $P (X_{i j} \neq 0 | η_{i j} = 0) = 1$. This particular constellation is the ultimate violation of the conjunctive rule, which invalidates the NPC method. Hence, the penalty $w_{j}$ must be set to 0. In summary, these different scenarios suggest the following formal definition of the penalty term $w_{j}$:

w_{j} = \begin{cases} 1 - \frac{H_{j}^{*} - 1}{H_{j}} & \text{if } H_{j}^{*} < H_{j}, \\ 0 & \text{if } H_{j}^{*} = H_{j} . \end{cases}
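A direct Python transcription of the penalty (hypothetical function name, exact fractions) shows how $w_{j}$ falls from 1, where the MC-NPC method reduces to the NPC method, to 0 when every option is coded.

```python
from fractions import Fraction


def penalty_weight(H, H_star):
    """Item-specific penalty w_j = 1 - (H* - 1)/H if H* < H, and 0 if all H options are coded."""
    if not 1 <= H_star <= H:
        raise ValueError("expected 1 <= H_star <= H")
    if H_star == H:
        return Fraction(0)
    return 1 - Fraction(H_star - 1, H)


for H_star in range(1, 6):
    print(H_star, penalty_weight(5, H_star))
```

For $H_{j} = 5$ the weights decrease in steps of $1/H_{j}$, mirroring the decrease of $P (X_{i j} = 0 | η_{i j} = 0)$ described above.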

After w_j has been identified, the attribute profile of examinee i is estimated by minimizing the distance $d_{p} (X_{i}, η^{(m)})$ across all ideal response profiles $η^{(1)}, η^{(2)}, \dots, η^{(M)}$ and observed response profile $X_{i} = x_{i}$ :

\hat{α}_{i} = \underset{α_{m} \in \{α_{1}, α_{2}, \dots, α_{M}\}}{\arg\min}\, d_{p}(x_{i}, η^{(m)}) = \underset{α_{m} \in \{α_{1}, α_{2}, \dots, α_{M}\}}{\arg\min} \left( \sum_{j = 1}^{J} I[η_{j}^{(m)} > 0,\ x_{ij} \neq η_{j}^{(m)}] + \sum_{j = 1}^{J} w_{j}\, I[η_{j}^{(m)} = 0,\ x_{ij} \neq η_{j}^{(m)}] \right).
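The estimator above amounts to an exhaustive search over all $2^{K}$ attribute profiles. The following Python sketch illustrates it under stated assumptions: responses are already recoded to coded categories $0, 1, \dots, H_{j}^{*}$; each item is described by the q-vectors of its coded options (key first); and the ideal response of a profile is taken to be the label of the mastered coded option requiring the most attributes (0 if none is mastered). That ideal-response rule, the helper `ideal_response`, and all other names are illustrative assumptions, not the article's code.

```python
import itertools

import numpy as np


def penalty(h, h_star):
    """Penalty w_j of Equation 6."""
    return 0.0 if h_star == h else 1.0 - (h_star - 1) / h


def ideal_response(alpha, coded_q):
    """Hypothetical helper: ideal response eta_j of profile alpha on item j.

    coded_q holds the q-vectors of the coded options, labelled 1, ..., H_j*
    (key first). ASSUMPTION: eta_j is the label of the mastered coded option
    requiring the most attributes (ties: first listed), or 0 if none.
    """
    best_label, best_size = 0, -1
    for label, q in enumerate(coded_q, start=1):
        q = np.asarray(q)
        if np.all(alpha >= q) and q.sum() > best_size:
            best_label, best_size = label, q.sum()
    return best_label


def mc_npc_classify(x, items, n_options, k):
    """Return (alpha_hat, distance) minimising d_p(x_i, eta^(m)) over all 2^K profiles."""
    x = np.asarray(x)
    w = np.array([penalty(h, len(q)) for h, q in zip(n_options, items)])
    best_alpha, best_dist = None, np.inf
    for profile in itertools.product([0, 1], repeat=k):
        alpha = np.array(profile)
        eta = np.array([ideal_response(alpha, q) for q in items])
        miss = x != eta
        # unit cost when a coded ideal response is missed, weight w_j when eta_j = 0
        dist = np.sum(miss & (eta > 0)) + np.sum(w * (miss & (eta == 0)))
        if dist < best_dist:  # strict '<': ties keep the first profile visited
            best_alpha, best_dist = alpha, dist
    return best_alpha, float(best_dist)


# Toy example (K = 2, H_j = 4): Items 1 and 2 have only a key; Item 3 has the
# key (1, 1) plus one coded distractor (1, 0), which carries coded label 2.
items = [[(1, 0)], [(0, 1)], [(1, 1), (1, 0)]]
print(mc_npc_classify([1, 0, 2], items, [4, 4, 4], 2))  # (array([1, 0]), 0.0)
```

Exhaustive enumeration is cheap for the attribute counts used here ($2^{5} = 32$ profiles); ties between profiles would need an explicit tie-breaking rule, which this sketch resolves by visiting order only.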

Theoretical Legitimization of the MC-NPC Method

The legitimacy of the nonparametric MC-NPC method as a tool for the analysis of MC-DINA data is established by demonstrating that minimizing the distance between the observed and the ideal responses in Equation 5, as done by the MC-NPC method, is equivalent to maximizing the likelihood of the MC-DINA model. Recall the MC-DINA IRF in Equation 4 (repeated here for convenience):

P(X_{ij} = l \mid α_{i}) = \begin{cases} \frac{1}{H_{j}} + \frac{H_{j} - H_{j}^{*} - 1}{H_{j}} I[l = 0] & \text{if } η_{ij} = 0, \\ \left(1 - \sum_{m \neq l} ε_{jml}\right)^{I[η_{ij} = l,\ l > 0]} \prod_{l' \neq l} ε_{jll'}^{I[η_{ij} = l']} & \text{if } η_{ij} > 0. \end{cases}

(4 revisited)

For the MC-DINA model, the likelihood of examinee i is

L_{i} = \prod_{j = 1}^{J} L_{i j},

where $L_{i j}$ , the likelihood of item j of examinee i, given response $x_{i j}$ , is defined as

L_{ij} = \begin{cases} \left(\frac{H_{j} - H_{j}^{*}}{H_{j}}\right)^{I[x_{ij} = η_{ij}]} \left(\frac{1}{H_{j}}\right)^{I[x_{ij} \neq η_{ij}]} & \text{if } η_{ij} = 0, \\ \left(1 - \sum_{m \neq l} ε_{jml}\right)^{I[x_{ij} = η_{ij} = l]} \prod_{l' \neq l} ε_{jll'}^{I[x_{ij} = l,\ η_{ij} = l']} & \text{if } η_{ij} > 0. \end{cases}

To emphasize the connection between the likelihood in Equation 7 and the distance used by the MC-NPC method defined in Equation 5, four cases are distinguished. (Recall that $η_{i j} = η_{j}^{(m)}$ .)

Case 1: $η_{i j} = 0$ and $1 \leq H_{j}^{*} < H_{j} - 1$ .

For this case, the likelihood $L_{i j}$ reduces to

L_{i j} = L_{i j}^{(1)} = {(\frac{H_{j} - H_{j}^{*}}{H_{j}})}^{I [x_{i j} = η_{i j}]} {(\frac{1}{H_{j}})}^{I [x_{i j} \neq η_{i j}]} .

$H_{j} - H_{j}^{*} > 1$ implies $\frac{H_{j} - H_{j}^{*}}{H_{j}} > \frac{1}{H_{j}}$ . Hence, $L_{i j}^{(1)}$ is maximized if $X_{i j} = η_{i j} = 0$ . Also, if $η_{j}^{(m)} = 0$ , then, according to Equation 5

d_{p} (X_{i j}, η_{j}^{(m)}) = d_{p}^{(1)} = w_{j} I [η_{j}^{(m)} = 0, X_{i j} \neq η_{j}^{(m)}] .

Hence, $d_{p} (X_{i j}, η_{j}^{(m)})$ is minimized if $X_{i j} = η_{j}^{(m)} = 0$ —or, more succinctly, if $X_{i j} = 0$ , then $d_{p}^{(1)}$ is minimized and $L_{i j}^{(1)}$ is maximized.

Case 2: $η_{i j} = 0$ and $H_{j}^{*} = H_{j} - 1$ .

For this case, the likelihood $L_{i j}$ equals

L_{i j} = {(\frac{H_{j} - H_{j}^{*}}{H_{j}})}^{I [x_{i j} = η_{i j}]} {(\frac{1}{H_{j}})}^{I [x_{i j} \neq η_{i j}]} .

Because $H_{j}^{*} = H_{j} - 1$ , implying $H_{j} - H_{j}^{*} = 1$ , $L_{i j}$ reduces to

L_{i j} = L_{i j}^{(2)} = {(\frac{1}{H_{j}})}^{I [x_{i j} = η_{i j}]} {(\frac{1}{H_{j}})}^{I [x_{i j} \neq η_{i j}]} = \frac{1}{H_{j}} .

Now, based on Equation 6, the penalty equals $w_{j} = 1 - (H_{j} - 2)/H_{j} = 2/H_{j}$, which leads to

d_{p} (X_{i j}, η_{j}^{(m)}) = d_{p}^{(2)} = \frac{2}{H_{j}} I [η_{j}^{(m)} = 0, X_{i j} \neq η_{j}^{(m)}] .

Thus, if $H_{j}^{*} = H_{j} - 1$ , then $L_{i j}^{(2)}$ is a constant, but $d_{p}^{(2)}$ is minimized when $X_{i j} = η_{i j} = 0$ .

Case 3: $η_{i j} = 0$ and $H_{j}^{*} = H_{j}$ .

For this case, the likelihood $L_{i j}$ equals

L_{i j} = {(\frac{H_{j} - H_{j}^{*}}{H_{j}})}^{I [x_{i j} = η_{i j}]} {(\frac{1}{H_{j}})}^{I [x_{i j} \neq η_{i j}]} .

Because $X_{ij}$ cannot be 0 if $H_{j}^{*} = H_{j}$, necessarily $X_{ij} \neq η_{ij} = 0$, and thus $L_{ij} = \frac{1}{H_{j}}$, which equals $L_{ij}^{(2)}$. In addition, based on Equation 6, the penalty $w_{j}$ is 0, and thus $d_{p}(X_{ij}, η_{j}^{(m)}) = 0$.
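Cases 1 to 3 can be tabulated numerically for a four-option item. The following Python sketch (function names hypothetical) evaluates the item likelihood and the item distance when the ideal response is 0, for every admissible observed category and every value of $H_{j}^{*}$; it only illustrates the case distinction and is not part of the article's derivation.

```python
from fractions import Fraction


def lik_eta0(x, h, h_star):
    """Item likelihood L_ij when eta_ij = 0: (H_j - H_j*)/H_j if x = 0, else 1/H_j."""
    return Fraction(h - h_star, h) if x == 0 else Fraction(1, h)


def dist_eta0(x, h, h_star):
    """Item distance when eta_j^(m) = 0: w_j * I[x != 0], with w_j from Equation 6."""
    w = Fraction(0) if h_star == h else 1 - Fraction(h_star - 1, h)
    return w * (x != 0)


H = 4
for h_star in range(1, H + 1):
    first = 0 if h_star < H else 1  # x = 0 is impossible when all options are coded
    xs = range(first, h_star + 1)
    print(h_star,
          [str(lik_eta0(x, H, h_star)) for x in xs],
          [str(dist_eta0(x, H, h_star)) for x in xs])
```

For $H_{j}^{*} = 1, 2$ (Case 1) both criteria favor $x = 0$; for $H_{j}^{*} = 3$ (Case 2) the likelihood is flat at 1/4 while the distance still favors $x = 0$; for $H_{j}^{*} = 4$ (Case 3) the distance is 0 throughout.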

Case 4: $η_{i j} > 0$ .

In this case, the likelihood $L_{i j}$ is

L_{i j} = L_{i j}^{(3)} = {(1 - \sum_{l^{'} \neq l} ε_{j l^{'} l})}^{I [x_{i j} = η_{i j} = l > 0]} \prod_{l^{'} \neq l} ε_{j l l^{'}}^{I [x_{i j} = l, η_{i j} = l^{'} > 0]} .

Given the assumption that an examinee with $η_{ij} = l > 0$ is more likely to choose the coded option $X_{ij} = l$ than any other option, it must be true that $1 - \sum_{l' \neq l} ε_{jl'l} > ε_{jl'l}$ for every $l' \neq l$. Hence, $L_{ij}^{(3)}$ is maximized when $X_{ij} = η_{ij}$. In addition, based on Equations 5 and 6, it can be shown that

d_{p} (X_{i j}, η_{j}^{(m)}) = d_{p}^{(3)} = I [η_{j}^{(m)} > 0, X_{i j} \neq η_{j}^{(m)}],

which is minimized when $X_{ij} = η_{j}^{(m)}$. In conclusion, when $X_{ij} = η_{j}^{(m)}$, $d_{p}^{(3)}$ is minimized and $L_{ij}^{(3)}$ is maximized.

Based on the above cases, the likelihood of examinee i can be expressed as

L_{i} = \prod_{j = 1}^{J} L_{ij} = \prod_{j:\ 1 \leq H_{j}^{*} < H_{j} - 1} \left(L_{ij}^{(1)}\right)^{I[η_{ij} = 0]} \left(L_{ij}^{(3)}\right)^{I[η_{ij} \neq 0]} \prod_{j:\ H_{j} - 1 \leq H_{j}^{*} \leq H_{j}} \left(L_{ij}^{(2)}\right)^{I[η_{ij} = 0]} \left(L_{ij}^{(3)}\right)^{I[η_{ij} \neq 0]},

where $L_{i j}^{(2)}$ is a constant. In addition, it can be shown that

d_{p}(X_{i}, η^{(m)}) = \sum_{j = 1}^{J} d_{p}(X_{ij}, η_{j}^{(m)}) = \sum_{j:\ 1 \leq H_{j}^{*} < H_{j} - 1} \left(d_{p}^{(1)} + d_{p}^{(3)}\right) + \sum_{j:\ H_{j}^{*} = H_{j} - 1} \left(d_{p}^{(2)} + d_{p}^{(3)}\right) + \sum_{j:\ H_{j}^{*} = H_{j}} d_{p}^{(3)} .

Based on Equation 15, when $d_{p}(X_{i}, η^{(m)})$ is minimized, $d_{p}^{(1)}$, $d_{p}^{(2)}$, and $d_{p}^{(3)}$ are minimized. By the above case distinction, this implies that $L_{ij}^{(1)}$ and $L_{ij}^{(3)}$ are maximized, and because $L_{ij}^{(2)}$ is constant, so is $L_{i}$.

In conclusion, the above justification supports the legitimacy of using the MC-NPC method to analyze the MC-DINA data.

Simulation Studies

Two simulation studies implementing different experimental conditions were conducted to evaluate the performance of the proposed MC-NPC method. Key research questions were as follows:

(i) Can the purported advantage of the MC over the binary response format in improving the accuracy of examinee classification in CD be confirmed?

(ii) Do parametric estimation methods underperform in comparison with nonparametric methods when sample sizes are small?

The technical details of the two studies and their results are reported in the subsequent sections.

Study I: Replication of de la Torre’s (2009) Simulation

Study I is essentially a replication of de la Torre’s (2009) original simulation study, in which he compared the performance of the DINA model with that of his new MC-DINA model. In the present study, the MC-NPC and NPC methods were added for comparison, and the performance of all methods was also investigated for small sample sizes.

Design

De la Torre (2009) used only a sample size of $N = 1{,}000$, whereas here, the additional sample sizes $N = 20$, 30, 50, and 100 were included. The remaining experimental factors were the number of items, $J = 30$; the number of attributes, $K = 5$; and the number of response options, $H_{j} = 4$ for all j. The Q-matrix from de la Torre (2009, p. 13) was used, which imposes the constraint that, for each item, the q-vectors of the coded distractors must be nested within that of the key.

As a technical detail, note that the version of the MC-DINA model used in de la Torre (2009) was a simplified version of the more general one presented in the earlier section “The DINA and MC-DINA Models.” Specifically, in de la Torre’s (2009) version, for a given level $l' > 0$ of $η_{ij}$, the perturbations $ε_{jll'}$ are constrained to be equal across all response levels $l \neq l'$. Denoting this shared perturbation for level $l'$ of $η_{ij}$ by $ε_{jl'}$, Equation 4 reduces to

P(X_{ij} = l \mid α_{i}) = \begin{cases} \frac{1}{H_{j}} + \frac{H_{j} - H_{j}^{*} - 1}{H_{j}} I[l = 0] & \text{if } η_{ij} = 0, \\ \left(1 - (H_{j} - 1) ε_{jl'}\right)^{I[η_{ij} = l]} ε_{jl'}^{I[η_{ij} \neq l]} (H_{j} - H_{j}^{*})^{I[l = 0]} & \text{if } η_{ij} = l' > 0. \end{cases}

To reiterate, this simpler version from de la Torre’s (2009) paper was used in Study I.
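To make the reduced IRF concrete, the following Python sketch tabulates $P(X_{ij} = l \mid η_{ij} = l')$ over the coded categories $0, 1, \dots, H_{j}^{*}$, where category 0 pools the $H_{j} - H_{j}^{*}$ noncoded options. The function name and matrix layout are assumptions for illustration. With $H_{j} = 4$ and a shared perturbation $ε_{jl'} = 0.18/3 = 0.06$, the diagonal entries equal 0.82, the value used to generate the Study I data.

```python
import numpy as np


def simplified_mc_dina_probs(h, h_star, eps):
    """Hypothetical helper: P(X_ij = l | eta_ij = l') under the reduced MC-DINA IRF.

    Rows index l' = 0, ..., H_j*; columns index l = 0, ..., H_j*.
    eps[l' - 1] is the shared perturbation epsilon_{j l'} for coded level l' > 0.
    """
    p = np.zeros((h_star + 1, h_star + 1))
    p[0, :] = 1.0 / h                       # eta = 0: random guessing
    p[0, 0] += (h - h_star - 1) / h         # category 0 pools the noncoded options
    for lp in range(1, h_star + 1):
        e = eps[lp - 1]
        p[lp, :] = e                        # each other coded option
        p[lp, 0] = (h - h_star) * e         # each noncoded option attracts e
        p[lp, lp] = 1 - (h - 1) * e         # the option matching the ideal response
    return p


P = simplified_mc_dina_probs(4, 2, [0.06, 0.06])
print(P.round(2))
```

Every row sums to 1 by construction: the matching option receives $1 - (H_{j} - 1)ε_{jl'}$ and the remaining $H_{j} - 1$ options receive $ε_{jl'}$ each.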

As in de la Torre (2009), examinee attribute profiles were drawn from a discrete uniform distribution, so that all $2^{K}$ profiles were equally likely. In addition, for this study, examinee attribute profiles were generated based on the multivariate normal threshold model (Chiu et al., 2009), with variances equal to 1 and covariances sampled from Unif(0.3, 0.5). The probability of an examinee in group $η_{ij} = l'$ choosing response level $X_{ij} = l$ of item j was specified as

P(X_{ij} = l \mid η_{ij} = l') = \begin{cases} \frac{1}{H_{j}} + \frac{H_{j} - H_{j}^{*} - 1}{H_{j}} I[l = 0] & \text{if } l' = 0, \\ 0.82 & \text{if } l' > 0 \text{ and } l = l', \\ \frac{0.18}{H_{j} - 1} & \text{if } l' > 0 \text{ and } l \neq l'. \end{cases}

The implication of this particular setting is that examinees who do not master all the attributes required by at least one coded response option are not attracted to any particular response option; hence, these examinees are assumed to choose a response at random. In contrast, examinees who master all the attributes required by at least one coded response option choose the option corresponding to their ideal response with probability 0.82 and each of the remaining options with probability $0.18/(H_{j} - 1)$.
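For readers who want to reproduce the data-generating step, the following Python sketch draws attribute profiles from a multivariate normal threshold model and coded responses under the Study I probabilities. The cut point of 0 for dichotomizing the latent normals, the option layout (positions $0, \dots, H_{j}^{*} - 1$ carry coded labels $1, \dots, H_{j}^{*}$), and all function names are assumptions for illustration, not taken from the study's code.

```python
import numpy as np


def mvn_threshold_profiles(n, k, rng, cov_range=(0.3, 0.5), threshold=0.0):
    """Draw n binary attribute profiles from a multivariate normal threshold model.

    Unit variances and covariances from Unif(0.3, 0.5) follow the text; the cut
    point `threshold` (0.0 here) is an illustrative ASSUMPTION. For K <= 6 these
    covariances always give a positive definite matrix.
    """
    cov = np.eye(k)
    upper = np.triu_indices(k, 1)
    cov[upper] = rng.uniform(*cov_range, size=len(upper[0]))
    cov = np.triu(cov) + np.triu(cov, 1).T  # mirror the upper triangle
    z = rng.multivariate_normal(np.zeros(k), cov, size=n)
    return (z >= threshold).astype(int)


def sample_coded_response(eta, h_star, h, lam, rng):
    """Draw one coded response X_ij given the ideal response eta in 0, ..., h_star.

    ASSUMED layout: option positions 0..h_star-1 carry coded labels 1..h_star;
    the remaining h - h_star positions are noncoded (label 0).
    eta == 0: all h options equally likely (random guessing).
    eta > 0: the matching option with probability lam, every other option
    with probability (1 - lam) / (h - 1).
    """
    if eta == 0:
        probs = np.full(h, 1.0 / h)
    else:
        probs = np.full(h, (1.0 - lam) / (h - 1))
        probs[eta - 1] = lam  # position carrying coded label eta
    option = int(rng.choice(h, p=probs))
    return option + 1 if option < h_star else 0


rng = np.random.default_rng(2009)
alpha = mvn_threshold_profiles(30, 5, rng)  # N = 30 examinees, K = 5 attributes
x = [sample_coded_response(2, 3, 4, 0.82, rng) for _ in range(5)]
```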

The DINA and MC-DINA models were fitted using the functions GDINA and MCmodel, respectively, of the R package GDINA, which uses the EM algorithm for marginal maximum likelihood estimation of the model parameters. Henceforth, these two estimation methods are referred to simply as DINA and MC-DINA.

Evaluation criteria

The performance of the methods used in Study I was assessed in terms of the patternwise agreement rate (PAR), that is, the proportion of correctly classified examinees:

\text{PAR} = \frac{\sum_{i = 1}^{N} I[\hat{α}_{i} = α_{i}]}{N} .

The mean PAR of each set of 100 replications was computed. The MC-NPC, NPC, and DINA methods did not encounter any convergence issues; thus, their mean PAR are based on all 100 replications. However, for small samples, the MC-DINA algorithm in the GDINA package occasionally failed to converge. In these instances, the mean PAR was computed based only on those replications in which MC-DINA converged; the number of these replications is reported in parentheses in the column labeled “MC-DINA (C).” For a fair comparison, the mean PAR for the MC-NPC method was also computed over only the replications in which the MC-DINA algorithm converged. These additional mean PAR are identified by “MC-NPC*.”
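The two summary statistics can be sketched in a few lines of Python. The function names and the Boolean convergence mask are assumptions for illustration; `mean_par` with a mask corresponds to the “MC-NPC*” figure, and it returns NaN when no replication converged (the “NA” entries).

```python
import numpy as np


def par(alpha_hat, alpha_true):
    """Patternwise agreement rate: share of examinees whose whole profile is recovered."""
    alpha_hat, alpha_true = np.asarray(alpha_hat), np.asarray(alpha_true)
    return float(np.mean(np.all(alpha_hat == alpha_true, axis=1)))


def mean_par(pars, converged=None):
    """Mean PAR over replications, optionally only over those where MC-DINA converged."""
    pars = np.asarray(pars, dtype=float)
    if converged is None:
        return float(pars.mean())
    converged = np.asarray(converged, dtype=bool)
    if not converged.any():
        return float("nan")  # C = 0: no comparable replications
    return float(pars[converged].mean())


truth = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
est = [[1, 0, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0]]
print(par(est, truth))  # 0.75
```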

Results

The results are reported in Table 4.

Table 4.

Replication of de la Torre’s (2009) Simulation Study: Mean PAR Obtained for the MC-NPC, MC-DINA, NPC, and DINA Methods

		100 Replications			C Replications
$α$ Structure	N	MC-NPC	NPC	DINA	MC-NPC*	MC-DINA (C)
Unif	20	.869	.695	.618	.867	.457 (21)
	30	.895	.738	.658	.894	.789 (77)
	50	.891	.733	.663	.891	.879 (100)
	100	.884	.718	.683	.884	.868 (100)
	1,000	.884	.728	.729	.884	.907 (100)
MVN	20	.828	.647	.619	.840	.380 (15)
	30	.801	.607	.588	.803	.753 (88)
	50	.871	.702	.671	.872	.850 (99)
	100	.835	.666	.679	.835	.852 (100)
	1,000	.843	.671	.706	.843	.878 (100)

Note. “C” identifies the number of replications with successful convergence. The column MC-NPC* contains the mean PAR of the C replications only. PAR = patternwise agreement rate; MC = multiple-choice; DINA = Deterministic Inputs, Noisy “And” Gate; NPC = nonparametric classification.

The comparison of the MC-NPC method with the two methods for dichotomous responses, NPC and DINA, reveals that across all sample sizes, the MC-NPC method resulted in substantially higher mean PAR, at least for the Q-matrix with coded distractors used by de la Torre (2009) and here. Comparing the mean PAR of the MC-NPC method (100 replications) with those obtained for the MC-DINA method (C replications only) confirms that the former outperformed the latter when samples were small. This was also true when only the C replications were compared (i.e., the MC-NPC* vs. MC-DINA columns). However, as expected, the MC-NPC method performed slightly worse than the MC-DINA method if $N = 1{,}000$ and $3{,}000$.

Notice that for small samples (i.e., $N = 20$ and 30), the MC-DINA method failed to converge in 82% and 17% of the replications, respectively. As a remarkable aside, Table 4 suggests that the MC-DINA method might lose its advantage of finer-grained examinee classification if samples are small: For $N = 20$, the PAR of the MC-DINA method is lower than those obtained from the DINA and NPC methods for dichotomous data.

Study II: Assessment With Random Q-Matrices

Recall that in Study I, de la Torre’s (2009) Q-matrix was constrained such that the q-vectors of the coded response options formed a linear hierarchy. In contrast, Study II used randomly generated Q-matrices with the nestedness constraint removed.

Design

The experimental factors and evaluation criteria of Study I were also used in Study II. However, the design of Study II was supplemented by several features that considerably increased the complexity of the data-generating process. The number of response options per MC item, $H_{j}$, was set to 4 or 5. As already mentioned, the Q-matrices were constructed using a random multistep procedure. First, the q-vectors of the keys were randomly generated such that the Q-matrix formed by the q-vectors of the keys satisfied the completeness condition for the DINA model (Chiu et al., 2009). Next, for each item key, the number of coded distractors was determined by randomly choosing from the set $\{0, 1, \dots, H_{j}^{*} - 1\}$. The q-vector of each coded distractor was selected from among the $2^{K} - 1$ admissible attribute patterns subject to two constraints: (1) they had to allow for the unambiguous identification of an examinee’s ideal response, and (2) each distractor had to be nonredundant with regard to classifying the examinees. A redundant distractor is one that does not improve the classification of examinees beyond the response options already available for a given item. In a final step, the key and the coded distractors were randomly assigned to $H_{j}^{*}$ of the $H_{j}$ option positions.
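Two steps of this procedure lend themselves to a short Python sketch: drawing key q-vectors that satisfy completeness by construction (a Q-matrix that contains all K single-attribute q-vectors is complete for the DINA model) and placing the coded options at random option positions. The identifiability and nonredundancy constraints on the distractors are not implemented, and all function names are hypothetical; this is an illustration, not the generator used in the study.

```python
import numpy as np


def is_complete_dina(q_keys):
    """True if the key Q-matrix contains every single-attribute q-vector."""
    q_keys = np.asarray(q_keys)
    k = q_keys.shape[1]
    unit = np.eye(k, dtype=int)
    return all(np.any(np.all(q_keys == unit[a], axis=1)) for a in range(k))


def random_key_q(j, k, rng):
    """Random J x K key q-vectors that are complete by construction (requires J >= K)."""
    q = rng.integers(0, 2, size=(j, k))
    q[:k] = np.eye(k, dtype=int)          # guarantee all single-attribute keys
    zero = q.sum(axis=1) == 0
    while zero.any():                     # every key must require an attribute
        q[zero] = rng.integers(0, 2, size=(int(zero.sum()), k))
        zero = q.sum(axis=1) == 0
    return q[rng.permutation(j)]          # shuffle the item order


def assign_positions(n_coded, h, rng):
    """Randomly place the key and coded distractors among the H_j positions (1-based)."""
    return [int(p) for p in rng.choice(np.arange(1, h + 1), size=n_coded, replace=False)]


rng = np.random.default_rng(11)
q_keys = random_key_q(20, 3, rng)         # J = 20 items, K = 3 attributes
print(is_complete_dina(q_keys), assign_positions(2, 4, rng))
```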

Table 5 provides an example of a Q-matrix as it was used to generate data with $K = 3$ attributes and $J = 20$ items. First, notice that Items 1, 2, 3, 9, 12, 13, and 17 do not have coded distractors. For each item, the first row indicates the option ID and the q-vector of the key; the remaining rows of the item contain the q-vectors of the coded distractors. For example, the key of Item 6 was assigned to the position of Option 2; its q-vector is (1, 1, 0). A single coded distractor was generated and assigned to the position of Option 3; its q-vector is (1, 0, 1), which is not nested within that of the key.

Table 5.

A Sample Q-Matrix Used in the Simulated Data ( $K = 3$ and $J = 20$ )

		Attribute					Attribute
Item	Option	1	2	3	Item	Option	1	2	3
1	3	1	0	0	11	1	1	0	0
2	1	0	1	0	12	2	1	0	0
3	3	0	0	1	13	2	1	1	1
4	4	1	1	1	14	2	1	0	1
4	1	0	1	1	14	1	0	0	1
4	2	0	1	0	14	3	1	0	0
5	1	1	1	1	14	4	0	1	0
5	3	0	1	0	15	1	1	1	1
6	2	1	1	0	15	4	0	1	1
6	3	1	0	1	16	3	1	1	1
7	1	0	1	1	16	2	1	0	0
7	2	0	0	1	16	4	0	0	1
7	4	1	0	1	17	1	0	0	1
8	4	1	1	1	18	1	1	1	0
8	2	0	1	1	18	2	1	0	0
8	1	0	0	1	19	2	1	0	1
9	1	1	1	1	19	4	1	0	0
10	4	1	0	1	20	3	1	1	1
10	1	1	0	0	20	2	0	0	1
11	4	1	0	1

The probability that an examinee in group $η_{ij} = l'$ chose response level $X_{ij} = l$ of item j, $P(X_{ij} = l \mid η_{ij} = l')$, was controlled by first drawing a parameter $λ_{j}$ from Unif(0.74, 0.9) for each item j. The probability $P(X_{ij} = l \mid η_{ij} = l')$ was then determined as

P(X_{ij} = l \mid η_{ij} = l') = \begin{cases} \frac{1}{H_{j}} + \frac{H_{j} - H_{j}^{*} - 1}{H_{j}} I[l = 0] & \text{if } l' = 0, \\ λ_{j} & \text{if } l' > 0 \text{ and } l = l', \\ \frac{1 - λ_{j}}{H_{j} - 1} & \text{if } l' > 0 \text{ and } l \neq l'. \end{cases}

The uniform distribution Unif(0.74, 0.9) was chosen because its mean, 0.82, equals the value used by de la Torre (2009).
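A brief Python check of this choice: the midpoint of Unif(0.74, 0.9) is 0.82, and each drawn $λ_{j}$ leaves $(1 - λ_{j})/(H_{j} - 1)$ for every other option, so the probabilities for $l' > 0$ sum to 1. The seed and variable names are arbitrary assumptions.

```python
import numpy as np

LOW, HIGH = 0.74, 0.90
H = 4  # options per item in the H_j = 4 condition

rng = np.random.default_rng(7)
lam = rng.uniform(LOW, HIGH, size=30)  # one lambda_j per item (J = 30)
off_key = (1 - lam) / (H - 1)          # probability of each non-matching option

print((LOW + HIGH) / 2)                # theoretical mean of Unif(0.74, 0.9)
print(lam.mean())                      # sample mean of the drawn lambda_j
```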

Results

Table 6 reports the mean PAR from the four methods when each item has four options. Recall that when samples are small, the MC-DINA method may not converge in all 100 replicated data sets. Hence, as in Study I, two mean PAR were computed for the MC-NPC method: The first was computed across all 100 replications and used to compare the MC-NPC method with the NPC and DINA methods. The second was computed only across those replications in which MC-DINA converged (their number is marked by “C”) and used to compare the MC-NPC and MC-DINA methods.

Table 6.

Mean PARs From the MC-NPC, MC-DINA, NPC, and DINA Methods When $H_{j} = 4$

			100 Replications			C Replications
	$α$ Structure	N	MC-NPC	NPC	DINA	MC-NPC*	MC-DINA (C)
$K = 3$ $J = 20$	Unif	20	.862	.720	.703	.868	.632 (42)
		30	.934	.841	.798	.935	.868 (77)
		50	.872	.800	.772	.872	.845 (99)
		100	.915	.764	.767	.915	.915 (100)
	MVN	20	.821	.705	.704	.829	.632 (28)
		30	.910	.803	.779	.908	.829 (76)
		50	.916	.860	.863	.916	.895 (97)
		100	.879	.757	.753	.879	.887 (100)
$K = 5$ $J = 30$	Unif	20	.771	.488	.418	.650	.050 (1)
		30	.827	.554	.509	—	NA (0)
		50	.836	.612	.577	—	NA (0)
		100	.792	.517	.491	.800	.725 (2)
	MVN	20	.775	.547	.521	.775	.700 (2)
		30	.685	.494	.481	.703	.669 (12)
		50	.789	.600	.590	.763	.757 (7)
		100	.746	.604	.596	.746	.733 (100)

The comparison between the first three columns with 100 replications shows that the MC-NPC method outperforms the NPC and the DINA methods across all conditions, which suggests that careful coding of the distractors substantially improves examinee classification in comparison with dichotomous responses. Observe that the differences in PAR increase when moving from condition $K = 3$ and $J = 20$ to $K = 5$ and $J = 30$ .

The last two columns of Table 6 report the mean PAR computed for MC-NPC and MC-DINA across the C replications only. The results confirm that the MC-NPC method (labeled MC-NPC*) is more effective than MC-DINA when samples are small. As observed earlier, the MC-DINA method was plagued by convergence issues that considerably reduced its usefulness. The results further show that these convergence issues were amplified when moving from the $K = 3$ and $J = 20$ conditions to those with $K = 5$ and $J = 30$. Notice that if $K = 5$ and $J = 30$ with uniformly distributed attribute profiles, MC-DINA barely ever converged, whereas the MC-NPC method produced results in all 100 replications, with mean PARs of at least .75.

Table 7 presents the mean PAR obtained from the four methods when each item had five response options.

Table 7.

Mean PARs From the MC-NPC, MC-DINA, NPC, and DINA Methods When $H_{j} = 5$

			100 Replications			C Replications
	$α$ Structure	N	MC-NPC	NPC	DINA	MC-NPC*	MC-DINA (C)
$K = 3$ $J = 20$	Unif	20	.925	.802	.730	.932	.500 (14)
		30	.940	.858	.831	.936	.724 (39)
		50	.945	.840	.815	.945	.921 (90)
		100	.940	.867	.861	.940	.931 (100)
	MVN	20	.942	.803	.790	.959	.268 (17)
		30	.935	.827	.804	.932	.708 (29)
		50	.950	.878	.873	.950	.937 (87)
		100	.937	.845	.848	.937	.933 (100)
$K = 5$ $J = 30$	Unif	20	.876	.663	.611	—	NA (0)
		30	.901	.689	.685	.888	.854 (16)
		50	.853	.591	.523	.875	.810 (4)
		100	.910	.673	.661	.910	.896 (100)
	MVN	20	.744	.606	.604	—	NA (0)
		30	.850	.641	.639	.861	.672 (6)
		50	.845	.631	.613	.847	.800 (67)
		100	.881	.655	.661	.881	.864 (100)

Comparing the results in Tables 6 and 7 reveals two observations worth noting. First, in some conditions with small samples (e.g., some conditions with $N = 20$ and 30), fitting dichotomous responses with the NPC or DINA method led to higher PAR than fitting polytomous responses with the MC-DINA method. Second, Tables 6 and 7 suggest that a larger number of response options (as in Table 7) results in higher PAR. However, this finding should not be misinterpreted to mean that a larger $H_{j}$ automatically increases PAR; it simply means that more response options, if appropriately designed, may contribute to improved examinee classification. As a concluding remark, the standard deviations (not reported here) of the PAR obtained for MC-NPC and MC-DINA across the replications in which the latter converged were substantially smaller for MC-NPC than for MC-DINA, an indication that the former produces more consistent PAR.

Real Data Analysis

Data

Responses from 132 fourth graders in Taiwan to 30 MC items on fraction operations were analyzed. Each item had four response options. The items required at most five attributes; their description is provided in Table 8.

Table 8.

The Required Attributes of the Fraction Operations Assessment

Attribute	Description
1	Write, read, do, and compare fractions
2	Identify and convert between proper, improper, and mixed fractions
3	Add or subtract fractions with the same denominator
4	Multiply fractions
5	Translate a word problem into a mathematical equation

The test included 15 single-attribute items; hence, completeness of the Q-matrix was guaranteed. The Q-matrix is presented in Table 9. For each item, a bundle of rows contains the q-vectors of the coded response options: The first row is the key, and the remaining rows are the coded distractors.

Table 9.

Q-matrix of the Fraction Assessment

Item	Option	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	Item	Option	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$
1	1	1	0	0	0	0	18	2	0	0	1	0	0
2	3	1	0	0	0	0	19	1	1	1	0	0	0
3	4	0	1	0	0	0	20	2	0	0	0	1	1
4	2	0	1	0	0	0	21	1	0	1	1	0	1
5	1	0	1	0	0	0	21	4	0	0	1	0	0
6	4	0	1	0	0	0	22	4	0	1	0	1	1
7	1	1	0	0	0	0	22	2	0	1	0	0	0
8	2	1	0	0	0	0	23	3	0	0	1	0	1
9	4	0	0	1	0	0	23	1	0	0	1	0	0
10	1	0	0	0	1	0	23	4	0	0	0	0	1
10	3	0	0	1	0	0	24	1	0	0	0	1	1
11	2	0	0	0	0	1	24	2	0	0	0	0	1
12	3	0	0	0	0	1	25	1	0	1	0	1	1
13	3	0	0	0	0	1	25	4	0	0	0	1	1
14	1	0	0	0	0	1	26	1	1	1	1	0	1
15	2	1	1	0	0	0	27	3	1	1	1	0	1
16	4	1	1	0	0	0	27	1	0	0	1	0	0
16	1	1	0	0	0	0	28	4	0	0	0	0	1
17	1	0	1	1	0	1	29	4	1	1	0	0	1
17	3	0	0	1	0	0	30	1	1	1	1	0	1

Figure 1 displays an example item: Response option C is the key, requiring Attributes 3 and 5. Options A and D are two coded distractors; the former requires Attribute 3, the latter Attribute 5. The 132 examinees belonged to six classes with 27, 24, 23, 23, 21, and 14 students. Of the total of $132 \times 30 = 3{,}960$ responses, 31 were missing. Each missing response was imputed with an option chosen at random from among the four options of the respective item. (Notice that listwise deletion of all cases with missing data would have led to a substantial loss of data, which would have derailed the MC-DINA estimation process.)
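The random imputation described above can be sketched in Python as follows. The sketch assumes that responses are stored as an examinee-by-item array of option numbers 1 to 4 with missing entries coded as NaN; this storage layout and the function name are hypothetical.

```python
import numpy as np


def impute_random_option(responses, n_options, rng):
    """Replace missing responses (NaN) with an option drawn uniformly from 1..n_options."""
    x = np.array(responses, dtype=float)
    missing = np.isnan(x)
    x[missing] = rng.integers(1, n_options + 1, size=int(missing.sum()))
    return x.astype(int), int(missing.sum())


rng = np.random.default_rng(132)
raw = [[1, np.nan, 3], [4, 2, np.nan]]
filled, n_missing = impute_random_option(raw, 4, rng)
print(n_missing)  # 2
```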

Figure 1.

An item in the fraction assessment.

Evaluating a new method with real-world data is challenging insofar as, unlike with synthetic data, the true proficiency classes of examinees are never known. Any evaluation can only be based on relative standards. To develop such a relative standard, examinees’ attribute profiles were estimated twice. First, they were estimated at the class level using all four methods: MC-NPC, MC-DINA, NPC, and DINA. Second, the class-level data were merged into a single sample and reanalyzed with the MC-NPC and MC-DINA methods to assess their performance with a larger sample. The large-sample estimates, expected to yield more accurate examinee classification, were used as a benchmark for evaluating the classifications obtained with MC-NPC, MC-DINA, NPC, and DINA in the small samples typical of the classroom level.

As a technical note, the MC-DINA method implemented in the GDINA package can only handle items in which every response option has been chosen at least once, which required excluding several items from the analysis. For the merged data set, Items 1 and 20 had to be removed. The analysis of individual classes required the elimination of a larger number of items; these are listed in Table 10.
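The screening rule can be expressed as a short filter. This Python sketch assumes responses are stored as an examinee-by-item array of option numbers 1 to $H_{j}$ (a hypothetical layout). Applied within a single small class, the same rule tends to drop more items than in the merged sample, because fewer examinees are available to cover every option.

```python
import numpy as np


def items_usable_for_mc_dina(responses, n_options):
    """1-based indices of items in which every option 1..n_options was chosen at least once."""
    x = np.asarray(responses)
    needed = set(range(1, n_options + 1))
    return [j + 1 for j in range(x.shape[1]) if needed <= set(x[:, j].tolist())]


x = [[1, 2, 3],
     [2, 2, 4],
     [3, 1, 1],
     [4, 2, 2]]
print(items_usable_for_mc_dina(x, 4))  # [1, 3]
```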

Table 10.

Items Removed in Class-Level Analysis

	Items Removed
Class	MC-DINA	DINA
1	1, 3–11, 13–15, 17, 18–21, 23, 24, 26, 28, 30	1, 6–7, 10, 24
2	1, 3, 5, 8, 11, 17, 20, 23, 26, 29	—
3	1, 4–11, 13, 15, 17–21, 23–25	10, 20, 24
4	1, 3, 8–10, 17, 20, 23, 24	—
5	1, 3–11, 14, 15, 17–22, 23, 24, 26–28	19, 24
6	1–17, 19, 20, 22–24, 26, 28, 30	6, 9, 24

Note. MC = multiple-choice; DINA = Deterministic Inputs, Noisy “And” Gate.

Results

The PAR between the estimates of examinees’ attribute profiles obtained by analyzing the merged data set and individual classes is reported in Table 11, which warrants some explanation.

Table 11.

PAR Between the Total Examinees Analyzed by the Polytomous Methods and the Class Analyzed by Each Method

		Class1	Class2	Class3	Class4	Class5	Class6	Total
MC-NPC-ALL	MC-NPC	1.000	.917	.957	1.000	1.000	.929	.970
	MC-DINA	.444	.625	.261	.391	.333	.571	.432
	NPC	.963	.708	.870	.739	.857	.643	.811
	DINA	.296	.750	.478	.522	.524	.643	.523
MC-DINA-ALL	MC-NPC	.852	.625	.609	.652	.857	.500	.697
	MC-DINA	.444	.667	.304	.478	.333	.571	.462
	NPC	.852	.458	.696	.652	.857	.571	.689
	DINA	.259	.500	.522	.478	.571	.643	.477

Note. MC = multiple-choice; DINA = Deterministic Inputs, Noisy “And” Gate; NPC = nonparametric classification.

Table 11 is a three-way table reporting the results of a consistency analysis designed to evaluate the key hypothesis that nonparametric methods for analyzing MC items outperform parametric methods when samples are small. The approach presented here is tailored to the specific dilemma of any real-world data analysis: The true proficiency classes of examinees are unknown. Assuming that larger samples result in more accurate examinee classification, the responses collected in the six classes were merged into a sample of 132 students and analyzed with the nonparametric MC-NPC method and the parametric MC-DINA algorithm. These two methods form the two levels of the first dimension of Table 11: “MC-NPC-ALL” versus “MC-DINA-ALL.” Each of the two levels is completely crossed with the four methods MC-NPC, MC-DINA, NPC, and DINA that were used to classify examinees within the six classes, and the six classes form the third dimension of the table. Each PAR in the table compares the classification obtained from the merged sample with the classification obtained by one of the four methods within one class. The hypothesis was that the parametric methods, MC-DINA and DINA, in particular should produce classifications that vary with sample size, whereas the nonparametric methods, MC-NPC and NPC, should be unaffected by changes in sample size. Consider, for example, the first row of Table 11, whose PAR compare the MC-NPC classification of students based on the merged sample with the MC-NPC classifications obtained within the six classes. As expected, agreement was high throughout: The PAR indicate perfect agreement (i.e., 1.000) for Classes 1, 4, and 5 and are at least .917 for the remaining classes. In contrast, consider the sixth row, which contains the PAR comparing the MC-DINA classification of the merged sample with the MC-DINA classifications within the six classes: Here, moving from the large sample to the small class samples severely affected the agreement between the two classifications.

The major findings can be summarized as follows. First, sample size did not seem to affect the performance of the MC-NPC method substantially: It performed consistently well when compared to the nonparametric as well as the parametric methods (recall, e.g., the PAR of MC-NPC-ALL vs. MC-NPC in the six classes: All PAR were either 1 or above 0.9). Second, in contrast, MC-DINA displayed much greater sensitivity to changes in sample size (see the earlier explanation concerning the sixth row of Table 11). Third, surprisingly, the two methods for binary responses, NPC and DINA, produced higher PAR than the MC-DINA method for almost all of the six classes. Finally, a brief comment concerning the last column of Table 11: It contains the PAR comparing the classification of the merged sample with the stacked classifications obtained separately for each of the six classes. Hence, the PAR reported in the last column can be viewed as the mean PAR of the six classes, weighted by class size.
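As an arithmetic check of this weighted-mean interpretation, the following Python snippet recomputes the Total column for the first two rows of Table 11 from the class-level PAR. It assumes that the class sizes reported for the real data (27, 24, 23, 23, 21, and 14 students) belong to Classes 1 to 6 in that order; under this assumption the recomputed totals match the reported .970 and .432.

```python
import numpy as np

# Class sizes from the data description, assumed to be Classes 1-6 in order.
sizes = np.array([27, 24, 23, 23, 21, 14])
# First two rows of Table 11 (MC-NPC-ALL vs. class-level MC-NPC and MC-DINA).
par_mc_npc = np.array([1.000, .917, .957, 1.000, 1.000, .929])
par_mc_dina = np.array([.444, .625, .261, .391, .333, .571])

for name, row in [("MC-NPC", par_mc_npc), ("MC-DINA", par_mc_dina)]:
    print(name, round(float(np.average(row, weights=sizes)), 3))
# MC-NPC 0.97
# MC-DINA 0.432
```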

Consistency of classification was also evaluated from a different point of view: Does an examinee’s performance on the items measuring individual attributes agree with her estimated attribute profile, that is, her proficiency class? Specifically, for each student, attribute subscores were computed and compared with her estimated attribute profile. The subscore for a given attribute is the proportion of the items requiring that attribute that the examinee answered correctly. In light of the conjunctive framework used by all four methods investigated here, extreme attribute subscores, say above 0.8 or below 0.1, are more informative than midrange scores, because the latter are typically distorted by the conjunctive framework: An examinee may have mastered a particular attribute but, lacking another attribute required by an item, answered that item incorrectly; hence, her mastery of that attribute is not reflected in the count. High and low attribute subscores are then compared with an examinee’s estimated attribute profile: High and low subscores should correspond to 1 and 0 entries, respectively, in the estimated attribute profile. Discrepancies in either direction, a 1 where there should be a 0 or vice versa, indicate an inconsistency between actual test performance and estimated proficiency. The students in Class 5 are used as examples here; their attribute profiles were estimated by the MC-NPC and MC-DINA methods. Table 12 presents a few striking examples of grave discrepancies between attribute subscores and estimated proficiency class.

Table 12.

Distinct Estimated Attribute Profiles of Class Five With the MC-NPC Method and the MC-DINA Algorithm

	MC-NPC		MC-DINA		Subscores
ID	ALL	CLASS	ALL	CLASS	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$
99	11111	11111	11111	01011	.833	.867	1.000	1.000	.813
100	11011	11011	11011	11100	.750	.600	.500	.800	.500
101	11111	11111	11111	00111	.750	.733	.875	.600	.750
103	11111	11111	11111	11111	.833	.733	1.000	1.000	.938
105	00010	00010	00010	11000	.417	.200	.375	.400	.313
106	11111	11111	11111	10001	.833	.667	.625	.800	.688
108	11111	11111	11111	00100	.917	1.000	1.000	1.000	.938
109	11111	11111	00000	11110	.500	.600	.750	.600	.625
111	11111	11111	11111	10001	.833	.733	.875	.600	.750
112	11010	11010	11110	01000	.667	.467	.375	.600	.313
113	11111	11111	11111	00011	.833	.867	1.000	.600	.875
116	11111	11111	11111	00011	.833	.733	.875	.600	.750
117	11111	11111	11111	01010	.833	.533	.625	.800	.563
118	11111	11111	11111	00111	.833	.867	.875	.800	.750

Note. MC = multiple-choice; DINA = Deterministic Inputs, Noisy “And” Gate; NPC = nonparametric classification.

Consider, for example, Examinee 108. Her proportions of correct responses to the items associated with each of the five attributes were 100% or at least above 90%. Her attribute profile estimated by the MC-NPC method, (11111), was consistent with these subscores; however, the attribute profile estimated by the MC-DINA algorithm, (00100), was severely off. Two other extreme examples are Examinees 100 and 99. The former answered only 50% of the items requiring Attribute 3 correctly but was identified by the MC-DINA algorithm as a master of Attribute 3. In contrast, Examinee 99 answered 100% of these items correctly but was identified as a nonmaster of Attribute 3. The estimates of the attribute profiles of these two examinees obtained with the MC-NPC method did not suffer from these grave inconsistencies. The MC-NPC method, however, is not without fault: For example, Examinee 106 answered only 62.5% of the items requiring Attribute 3 correctly but was identified as a master of this attribute. By contrast, the class-level MC-DINA analysis classified Examinee 113, who answered 100% of these items correctly, as a nonmaster of Attribute 3. Still, close inspection of all examinees in the fifth class revealed that, overall, MC-DINA resulted in considerably more inconsistencies than the MC-NPC method. A plausible explanation of this difference is that many items had to be deleted to meet the MC-DINA requirement that each response option be chosen at least once.
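The subscore-based consistency check described above can be sketched in Python. The sketch assumes that subscores are computed from the key q-vectors and 0/1 item scores, and it uses the illustrative cut points 0.8 and 0.1 from the text; function names are hypothetical. Applied to the subscores of Examinee 108 in Table 12, it flags no attribute for the MC-NPC profile and four attributes for the class-level MC-DINA profile.

```python
import numpy as np


def attribute_subscores(correct, q_keys):
    """Proportion of the items requiring attribute k that were answered correctly.

    correct: 0/1 vector over items; q_keys: J x K Q-matrix of the keys.
    Every attribute is assumed to be required by at least one item.
    """
    q = np.asarray(q_keys)
    return (np.asarray(correct) @ q) / q.sum(axis=0)


def flag_discrepancies(subscores, profile, high=0.8, low=0.1):
    """1-based attributes whose extreme subscore contradicts the estimated profile."""
    s, a = np.asarray(subscores), np.asarray(profile)
    return [k + 1 for k in range(len(s))
            if (s[k] > high and a[k] == 0) or (s[k] < low and a[k] == 1)]


# Examinee 108 (Table 12): subscores vs. the two class-level estimates.
s108 = [.917, 1.000, 1.000, 1.000, .938]
print(flag_discrepancies(s108, [1, 1, 1, 1, 1]))  # MC-NPC: []
print(flag_discrepancies(s108, [0, 0, 1, 0, 0]))  # MC-DINA: [1, 2, 4, 5]
```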

Discussion

An MC item consists of a list of response options comprising the key (the correct answer) and several distractors (i.e., incorrect responses). If the distractors are designed carefully, additional diagnostic information that improves the classification of examinees can be gleaned from the responses to them (de la Torre, 2009). According to de la Torre’s (2009) MC-DINA model, the design of distractors is guided by the assumption that distractors lack one or more of the attributes required for a correct response to the key. In other words, the q-vectors of the distractors are restricted to be nested within that of the key. The implementation of de la Torre’s (2009) MC-DINA model in the GDINA package in R, however, does not impose this restriction, as coded distractors are allowed to require attributes not required by the key.
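The nesting restriction can be made concrete with a minimal Python sketch: a q-vector is nested within another if every attribute it requires is also required by the other. The helper names `is_nested` and `mc_dina_nesting_holds` are hypothetical, and the check follows the wording of the abstract (distractors nested within the key and within each other) rather than any package's code. As input, it uses the coded options of Item 1 from the two-item example discussed later in this section, treating (110) as the key and (100) and (010) as coded distractors.

```python
from typing import Sequence

def is_nested(q_inner: Sequence[int], q_outer: Sequence[int]) -> bool:
    """True if every attribute required by q_inner is also required by q_outer."""
    return all(a <= b for a, b in zip(q_inner, q_outer))

def mc_dina_nesting_holds(key: Sequence[int],
                          distractors: Sequence[Sequence[int]]) -> bool:
    """Hypothetical check: each coded distractor is nested within the key,
    and every pair of coded distractors is nested one within the other."""
    if not all(is_nested(d, key) for d in distractors):
        return False
    for i, a in enumerate(distractors):
        for b in distractors[i + 1:]:
            if not (is_nested(a, b) or is_nested(b, a)):
                return False
    return True

key = (1, 1, 0)
distractors = [(1, 0, 0), (0, 1, 0)]
print([is_nested(d, key) for d in distractors])  # [True, True]
print(mc_dina_nesting_holds(key, distractors))   # False
```

Both distractors are nested within the key, but (100) and (010) are not nested within each other, which is the kind of configuration that a method without the nesting constraint can accommodate.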

In this article, the MC-NPC method is proposed as a nonparametric alternative to de la Torre’s (2009) MC-DINA model for use with small samples. Like its companion, the NPC method for dichotomous item responses, the MC-NPC method uses a loss function based on penalized Hamming distances (Chiu & Douglas, 2013). However, the loss function had to be redesigned to accommodate the MC format of the observed and ideal item responses. Three major findings from the simulation studies deserve mention. First, the MC-NPC method can improve the examinee classification rate in comparison with the DINA model and the NPC method applied to dichotomized data. Second, the results suggest that MC-NPC outperforms MC-DINA when samples are small. Third, MC-DINA is susceptible to convergence failure when samples are small. In contrast, the MC-NPC method does not rely on fitting a parametric model and is thus immune to convergence issues, which makes it the method of choice for monitoring teaching and learning in small educational programs. Indeed, the real-world application concerned the CD of assessment data collected in six classes of an elementary school. The key findings were that (1) sample size did not seem to affect the performance of the MC-NPC method significantly, (2) MC-DINA appeared to be more sensitive to changes in sample size, and (3) for almost all of the six classes, NPC and DINA, the two methods for binary responses, produced higher PAR than the MC-DINA method. The MC-NPC method, however, should be used with caution for large samples; as the results of Simulation Study I suggest, it may be less effective than the parametric MC-DINA method in that setting. As another limitation, the MC-NPC method does not provide item parameter estimates because it does not fit a parameterized statistical model to the data.
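To illustrate the principle the MC-NPC method builds on, the following Python sketch implements NPC for binary responses with an unweighted Hamming distance, in the spirit of Chiu and Douglas (2013). It is not the redesigned MC-NPC loss function, which is not reproduced here. The function names, the small Q-matrix, and the response vector are hypothetical, and ties between profiles are resolved by enumeration order purely for simplicity.

```python
from itertools import product
from typing import List, Sequence, Tuple

def dina_ideal_response(alpha: Sequence[int], Q: Sequence[Sequence[int]]) -> List[int]:
    """DINA ideal response: 1 for item j iff alpha masters every attribute in q_j."""
    return [int(all(a >= q for a, q in zip(alpha, q_j))) for q_j in Q]

def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions in which two response vectors differ."""
    return sum(xi != yi for xi, yi in zip(x, y))

def npc_classify(y: Sequence[int], Q: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """Return the attribute profile whose ideal response is closest to y.
    Ties keep the profile enumerated first (an assumption of this sketch)."""
    K = len(Q[0])
    best_alpha, best_dist = None, None
    for alpha in product((0, 1), repeat=K):
        d = hamming(y, dina_ideal_response(alpha, Q))
        if best_dist is None or d < best_dist:
            best_alpha, best_dist = alpha, d
    return best_alpha

# Hypothetical Q-matrix: three single-attribute items plus two two-attribute items
Q = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1)]
y = [1, 1, 0, 1, 0]  # matches the ideal response of profile (1, 1, 0) exactly
print(npc_classify(y, Q))  # (1, 1, 0)
```

Because the exhaustive search over all $2^{K}$ profiles only compares distances, no parameters are estimated, which is why the approach cannot fail to converge.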

As a general take-home message from the simulation studies, using a valid Q-matrix is mandatory. Great effort should be made to secure the completeness of the Q-matrix to guarantee identifiability of the model (e.g., Gu & Xu, 2018) before any data are collected or analyzed. Thus, the development of a solid validation method for the q-vectors of the keys and the distractors should be prioritized. In this context, recall that for the simulation studies, a Q-matrix of the keys was generated that satisfied Lemma 1 in Chiu et al. (2009). This lemma states that a Q-matrix is complete for the DINA model if and only if it contains a $K \times K$ identity submatrix formed by the q-vectors of the K distinct single-attribute items. The completeness condition described in Lemma 1 is thus both necessary and sufficient for the NPC and the DINA methods. For MC-NPC and MC-DINA, however, it is only sufficient. In other words, the Q-matrix of an assessment using MC items within the conjunctive DINA framework may be complete even if some single-attribute items are missing from the q-vectors of the keys. As an example, consider a Q-matrix containing only two items with four response options each, $H = 4$, and $K = 3$ attributes. Let $q_{1}^{3} = (110)$, $q_{1}^{2} = (100)$, and $q_{1}^{1} = (010)$ be the q-vectors of the three coded options of Item 1. Item 2 has only a single coded response option, the key, with q-vector $q_{2}^{1} = (001)$. Table 13 presents the ideal response patterns for each of the $2^{K}$ distinct attribute profiles.
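The condition of Lemma 1 can be checked mechanically by testing whether each of the K unit vectors occurs as a row of the Q-matrix. The Python sketch below (the function name `has_identity_submatrix` is hypothetical) applies the check to the keys of the two-item example, taking (110) as the key of Item 1, and to the keys together with the coded distractors. It tests only this sufficient condition and does not by itself decide completeness for MC items.

```python
from typing import Sequence

def has_identity_submatrix(Q: Sequence[Sequence[int]]) -> bool:
    """True if, for every attribute k, some row of Q requires attribute k only."""
    K = len(Q[0])
    rows = {tuple(q) for q in Q}
    unit_vectors = [tuple(int(i == k) for i in range(K)) for k in range(K)]
    return all(e in rows for e in unit_vectors)

keys_only = [(1, 1, 0), (0, 0, 1)]                            # keys of Items 1 and 2
coded_options = [(1, 1, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]  # keys plus coded distractors
print(has_identity_submatrix(keys_only))      # False
print(has_identity_submatrix(coded_options))  # True
```

The keys alone fail the Lemma 1 condition, so completeness of this test cannot be established from the keys' Q-matrix under the dichotomous DINA framework.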

Table 13.

Ideal Response Patterns With $2^{K}$ Attribute Profiles

Attribute Pattern	Item 1	Item 2
(000)	0	0
(100)	2	0
(010)	1	0
(001)	0	1
(110)	3	0
(101)	2	1
(011)	1	1
(111)	3	1

As Table 13 shows, every attribute profile has its own distinct ideal response pattern. Therefore, the two items suffice to identify all possible attribute profiles, and the Q-matrix formed by the q-vectors of the coded response options of Items 1 and 2 is complete. In this example, the q-vectors of the keys and coded distractors together still include all $K = 3$ single-attribute q-vectors, but two of them, (100) and (010), occur only among the coded distractors of Item 1, not among the keys. Because the coded distractors provide additional diagnostic information, the Q-matrix can be complete even if the test does not include all single-attribute items.
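The ideal response patterns in Table 13 can be reproduced with a short Python sketch. It assumes a rule inferred from the table rather than quoted from the method's formal definition: the ideal response to an item is the coded option, among those whose required attributes are all mastered, that requires the most attributes, and 0 if no coded option applies. Names such as `ideal_option` are hypothetical. The final line confirms that the $2^{K} = 8$ profiles yield 8 distinct patterns.

```python
from itertools import product

# Coded options per item (option index -> q-vector), taken from the example
ITEMS = [
    {3: (1, 1, 0), 2: (1, 0, 0), 1: (0, 1, 0)},  # Item 1: key and two coded distractors
    {1: (0, 0, 1)},                               # Item 2: key only
]

def masters_all(alpha, q):
    """True if profile alpha possesses every attribute required by q."""
    return all(a >= r for a, r in zip(alpha, q))

def ideal_option(alpha, coded):
    """Assumed rule: the applicable coded option requiring the most attributes
    (ties, which do not arise here, go to the larger option index); 0 if none."""
    candidates = [(sum(q), h) for h, q in coded.items() if masters_all(alpha, q)]
    return max(candidates)[1] if candidates else 0

patterns = {alpha: tuple(ideal_option(alpha, coded) for coded in ITEMS)
            for alpha in product((0, 1), repeat=3)}

for alpha, pattern in patterns.items():
    print("".join(map(str, alpha)), pattern)
print("all patterns distinct:", len(set(patterns.values())) == len(patterns))
```

Distinctness of the ideal response patterns is exactly what makes the profiles identifiable in this example, even though the keys alone do not contain single-attribute q-vectors for Attributes 1 and 2.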

Finally, and closely related, additional (though not necessarily comprehensive) simulation studies, not reported here because of space limitations, suggest that MC-NPC is surprisingly “insensitive” to misspecified Q-matrix entries compared with MC-DINA. This remarkable finding certainly deserves further systematic exploration.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Chang, Y.-P., Chiu, C.-Y., & Tsai, R.-C. (2019). Nonparametric CAT for CD in educational settings with small samples. Applied Psychological Measurement, 43, 543–561.

Chiu, C.-Y., & Chang, Y.-P. (2021). Advances in CD-CAT: The general nonparametric item selection method. Psychometrika, 86, 1039–1057.

Chiu, C.-Y., & Douglas, J. A. (2013). A nonparametric approach to cognitive diagnosis by proximity to ideal response profiles. Journal of Classification, 30, 225–250.

Chiu, C.-Y., & Douglas, J. A. (2009). Cluster analysis for cognitive diagnosis: Theory and applications. Psychometrika, 74, 633–665.

Chiu, C.-Y., Sun, & Bian (2018). Cognitive diagnosis for small educational programs: The general nonparametric classification method. Psychometrika, 83, 355–375.

de la Torre (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33, 163–183.

DiBello, L. V., Henson, R. A., & Stout, W. F. (2015). A family of generalized diagnostic classification models for multiple choice option-based scoring. Applied Psychological Measurement, 39, 62–79.

DiBello, L. V., Roussos, L. A., & Stout, W. F. (2007). Review of cognitively diagnostic assessment and summary of psychometric models. In C. R. Rao & Sinharay (Eds.), Handbook of statistics: Vol. 26, Psychometrics (pp. 979–1030). Elsevier.

Gu & Xu (2018). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika, 84, 468–483.

Haberman, S. J., & von Davier (2007). Some notes on models for cognitively based skill diagnosis. In C. R. Rao & Sinharay (Eds.), Handbook of statistics: Vol. 26, Psychometrics (pp. 1031–1038). Elsevier.

Haertel, E. H. (1989). Using restricted latent class models to map the skill structure of achievement items. Journal of Educational Measurement, 26, 333–352.

Haertel, E. H., & Wiley, D. E. (1993). Representations of ability structures: Implications for testing. In Frederiksen, R. J. Mislevy, & Bejar (Eds.), Test theory for a new generation of tests (pp. 359–384). Erlbaum.

Junker, B. W., & Sijtsma (2001). Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Applied Psychological Measurement, 25, 258–272.

Köhn, H.-F., & Chiu, C.-Y. (2016). Conditions of completeness of the Q-matrix of tests for cognitive diagnosis. In L. A. van der Ark, D. M. Bolt, W. C. Wang, J. A. Douglas, & Wiberg (Eds.), Quantitative Psychology Research: The 80th Annual Meeting of the Psychometric Society (pp. 255–264). Springer.

Köhn, H.-F., & Chiu, C.-Y. (2017). A procedure for assessing completeness of the Q-matrices of cognitively diagnostic tests. Psychometrika, 82, 112–132.

Köhn, H.-F., & Chiu, C.-Y. (2019). Attribute hierarchy models in cognitive diagnosis: Identifiability of the latent attribute space and conditions for completeness of the Q-matrix. Journal of Classification, 36, 541–565.

Köhn, H.-F., & Chiu, C.-Y. (2021). A unified theory of the completeness of Q-matrices for the DINA model. Journal of Classification, online first.

Leighton & Gierl (Eds.). (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge University Press.

Leighton, J. P., Gierl, M. J., & Hunka (2004). The attribute hierarchy method for cognitive assessment: A variation on Tatsuoka’s rule-space approach. Journal of Educational Measurement, 41, 205–236.

Macready, G. B., & Dayton, C. M. (1977). The use of probabilistic models in the assessment of mastery. Journal of Educational Statistics, 33, 379–416.

Nichols, P. D., Chipman, S. F., & Brennan, R. L. (Eds.). (1995). Cognitively diagnostic assessment. Lawrence Erlbaum.

Nitko, A. J. (2001). Educational assessment of students (3rd ed.). Merrill Prentice Hall.

Ozaki (2015). DINA models for multiple-choice items with few parameters: Considering incorrect answers. Applied Psychological Measurement, 39, 431–447.

Paulsen (2019). Examining cognitive diagnostic modeling in small sample contexts (Publication No. 22583956) [Doctoral dissertation, University of Indiana]. ProQuest Dissertations & Theses Global.

Rupp, A. A., Templin, J. L., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.

Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35, 265–296.

Sessoms & Henson, R. A. (2018). Applications of diagnostic classification models: A literature review and critical commentary. Measurement: Interdisciplinary Research and Perspectives, 16(1), 1–17. https://doi.org/10.1080/15366367.2018.1435104

Tatsuoka (2002). Data-analytic methods for latent partially ordered classification models. Journal of the Royal Statistical Society Series C (Applied Statistics), 51, 337–350.

Tatsuoka, K. K. (1985). A probabilistic model for diagnosing misconception in the pattern classification approach. Journal of Educational and Behavioral Statistics, 12, 55–73.

Tatsuoka, K. K. (1990). Toward an integration of item-response theory and cognitive error diagnosis. In Frederiksen, Glaser, Lesgold, & Safto (Eds.), Monitoring skills and knowledge acquisition (pp. 453–488). Erlbaum.

Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule space method. Routledge/Taylor & Francis Group.

Wang & Douglas (2015). Consistency of nonparametric classification in cognitive diagnosis. Psychometrika, 80, 85–100.

Wood, E. J. (2003). What are extended matching sets questions? Bioscience Education, 1(1), 1–8.

Item	Option	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$	Item	Option	$α_{1}$	$α_{2}$	$α_{3}$	$α_{4}$	$α_{5}$
1	1	1	0	0	0	0	18	2	0	0	1	0	0
2	3	1	0	0	0	0	19	1	1	1	0	0	0
3	4	0	1	0	0	0	20	2	0	0	0	1	1
4	2	0	1	0	0	0	21	1	0	1	1	0	1
5	1	0	1	0	0	0	21	4	0	0	1	0	0
6	4	0	1	0	0	0	22	4	0	1	0	1	1
7	1	1	0	0	0	0	22	2	0	1	0	0	0
8	2	1	0	0	0	0	23	3	0	0	1	0	1
9	4	0	0	1	0	0	23	1	0	0	1	0	0
10	1	0	0	0	1	0	23	4	0	0	0	0	1
10	3	0	0	1	0	0	24	1	0	0	0	1	1
11	2	0	0	0	0	1	24	2	0	0	0	0	1
12	3	0	0	0	0	1	25	1	0	1	0	1	1
13	3	0	0	0	0	1	25	4	0	0	0	1	1
14	1	0	0	0	0	1	26	1	1	1	1	0	1
15	2	1	1	0	0	0	27	3	1	1	1	0	1
16	4	1	1	0	0	0	27	1	0	0	1	0	0
16	1	1	0	0	0	0	28	4	0	0	0	0	1
17	1	0	1	1	0	1	29	4	1	1	0	0	1
17	3	0	0	1	0	0	30	1	1	1	1	0	1
