Abstract
This article proposes a generalized distance discriminating method for tests with polytomous responses (GDD-P). The new method is the polytomous extension of an item response theory (IRT)-based cognitive diagnostic method, which identifies examinees’ ideal response patterns (IRPs) based on a generalized distance index. The similarity between observed response patterns and IRPs in the polytomous response situation is measured by the GDD-P index, and the attribute patterns are recognized via the relationship between attribute patterns and IRPs. Feasible designs for the polytomous Q-matrix and for scoring polytomous items are also discussed. In a simulation, the classification accuracy of the GDD-P method for tests with polytomous responses was investigated, and the results indicated that the proposed method had promising performance in recognizing examinees’ attribute patterns.
Keywords
Cognitive diagnosis assessment aims at providing valuable information about examinees’ cognitive strengths and weaknesses at the skill, attribute, or competence level. The introduction of the Q-matrix (Tatsuoka, 1983) to a test has made it possible to establish the relationship between test items and attributes. An examinee’s latent information reflected by an attribute pattern can be inferred through a pattern recognition procedure based on the Q-matrix. Although dozens of cognitive diagnostic models (CDMs) or approaches have been proposed during the last 30 years, most of them are developed for dichotomous response. However, in practical applications, tests often include polytomous items (i.e., items that are scored polytomously). If researchers use existent CDMs or methods for dichotomous response to analyze such test data, information might be lost due to the process of reducing polytomous to dichotomous scores. To address this particular testing need, it is necessary for researchers to develop and use cognitive diagnostic methods for polytomous response.
Although many CDMs or approaches have been proposed and some of them have been summarized from different aspects (e.g., de la Torre, 2011; Embretson & Reise, 2000; Fu & Li, 2007; Haertel, 1989; Maris, 1999; Nichols, Chipman, & Brennan, 1995), it is difficult to determine which method provides the most reliable diagnostic information for examinees. A cognitive diagnostic method for practical application should diagnose an examinee’s attribute pattern with relatively high precision. The accuracy of examinees’ pattern recognition can therefore be considered a criterion for measuring the precision of a CDM, especially for a newly proposed approach. In this regard, the generalized distance discriminating (GDD) method for dichotomous response (Sun, Zhang, Xin, & Bao, 2011) has been shown to perform well on simulated data. The method constructs a discriminating index based on item response theory (IRT) to recognize examinees’ attribute patterns from their dichotomous responses. Specifically, the GDD method as an IRT-based method offers a new way of doing cognitive diagnosis. For CDMs, which are widely used for cognitive diagnosis, attribute patterns can be estimated based on a specific item response function that reflects a direct relationship between the attribute patterns and the item responses. For IRT-based procedures such as the Rule Space Method (RSM; Tatsuoka, 1983, 1995), Attribute Hierarchy Method (AHM; Leighton, Gierl, & Hunka, 2004) and GDD, attribute patterns are not estimated directly. The ideal response patterns (IRPs) are identified from the observed response patterns (ORPs) using an IRT-based discriminating index; thereafter, the attribute patterns can be determined by examining the corresponding relationship between the IRPs and the attribute patterns as indicated by the Q-matrix.
Given the relatively high classification accuracy of the GDD method, this study extends the method to make it appropriate for polytomous response. The development of the new method denoted as GDD-P, where P stands for polytomous response, is given in the next section. The design of the Q-matrix with polytomous elements and an item scoring method for polytomous response are discussed in the section titled “Q-matrix Design and Scoring Method.” Although Fu (2005) proposed a method for scoring polytomous items in the context of the polytomous extension of the fusion model, this is by no means the only possible approach to scoring polytomous items. In this work, the authors define an item scoring method for describing the correspondence relationship between the attribute patterns and IRPs for both the dichotomous Q-matrix and polytomous Q-matrix. Following the specific design of Q-matrix with polytomous elements and the scoring method, the submatrix of the polytomous Q-matrix for guaranteeing a one-to-one mapping between the attribute patterns and the IRPs is also discussed. Based on the theoretical part of the GDD-P method and the designs related with Q-matrix and item scoring, three simulation studies are conducted in the section titled “Simulation.” In the “A Real Data Analysis” section, the authors provide a real data analysis by implementing the GDD-P method using dichotomous Q-matrix and polytomous Q-matrix. They conclude the article in the last section by discussing the practical implications of the proposed method.
GDD Method for Polytomous Response
GDD Method
Sun et al. (2011) proposed the GDD method to identify each examinee’s IRP for a cognitive diagnostic test with dichotomous items. Attribute patterns of these examinees can be recognized based on the relationship between IRPs and attribute patterns, which is determined by the Q-matrix (Tatsuoka, 1983) of the test with an assumption that the underlying process is conjunctive (e.g., Fu & Li, 2007). The core idea of the method is to use a generalized distance index to identify each examinee’s IRP. The index is based on the item response function of an IRT model. The distance definition generalizes the Hamming distance by adding the response probabilities of examinees as adjustment factors. Below is an overview of the generalized distance index for dichotomous response.
Assume N examinees respond to a cognitive diagnostic test, and the total number of IRPs inferred from the Q-matrix is T. Let vector
where
Case 1. If
Case 2. If
Case 3. If
The generalized distance between the vectors
It represents the sum of the generalized distances across the J items. Therefore, each examinee’s latent IRP can be identified by calculating all the T generalized distance indices and following the rule
In Sun et al.’s (2011) work, a Monte Carlo simulation study was conducted to compare the examinee classification accuracy of three cognitive diagnosis methods: GDD, RSM, and AHM. The latter two methods are chosen for the comparison because they are also both IRT-based methods. The simulation data were generated from the DINA (deterministic inputs, noisy “and” gate) model (Junker & Sijtsma, 2001), which was viewed as the baseline to check the performance among GDD, RSM, and AHM. Sixteen conditions representing the combinations of four kinds of item slippage probabilities and four types of attribute hierarchies (Leighton et al., 2004) were considered. The item parameters of both the DINA and unidimensional IRT models were estimated using the expectation-maximization (EM) algorithm (e.g., de la Torre, 2009; Woodruff & Hanson, 1996). The results showed that GDD and DINA models performed almost equally with respect to classification accuracy, and both performed better than the RSM and AHM.
Upon closer examination, it is found that the GDD method and DINA model share a similarity in how examinee classification is carried out. As shown in Equation 1, if
Definition of Generalized Distance Index for Polytomous Response
In the polytomous context, the symbols
where
For the polytomous response situation, whenever Yij≠Ij(t), Equation 3 represents the distance between the ideal and observed responses weighted by the “unlikeliness” of the ideal response given the examinee’s ability.
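The weighting idea described above can be sketched in code. This is a minimal illustration, not the article's Equation 3, which is not reproduced in this text. It assumes that the per-item distance is the absolute score difference multiplied by one minus the probability of the ideal score given the examinee's ability, that the probabilities come from Samejima's graded response model, and that the IRP with the smallest summed distance is selected. The function names and the item parameters are hypothetical.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """P(Y = 0..m | theta) under a graded response model.

    `thresholds` must be increasing; `a` is the item discrimination.
    """
    # Cumulative probabilities P(Y >= k), bracketed by P(Y >= 0) = 1
    # and P(Y >= m + 1) = 0.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def generalized_distance(observed, ideal, theta, items):
    """Summed item distances between an ORP and one IRP.

    ASSUMPTION: each mismatching item contributes
    |observed - ideal| * (1 - P(ideal score | theta)).
    """
    total = 0.0
    for y, i_t, (a, thresholds) in zip(observed, ideal, items):
        if y != i_t:
            p_ideal = grm_category_probs(theta, a, thresholds)[i_t]
            total += abs(y - i_t) * (1.0 - p_ideal)
    return total

def identify_irp(observed, irps, theta, items):
    """Index of the IRP with the smallest distance (assumed decision rule)."""
    distances = [generalized_distance(observed, irp, theta, items) for irp in irps]
    return min(range(len(irps)), key=distances.__getitem__)

# Hypothetical item parameters: (discrimination, increasing thresholds).
items = [(1.2, [-1.0, 0.5]), (0.8, [0.0]), (1.5, [-0.5, 0.3, 1.2])]
irps = [[0, 0, 0], [2, 1, 3], [1, 0, 2]]
observed = [2, 1, 3]
theta = 0.4

print(identify_irp(observed, irps, theta, items))
```

In this toy run the observed pattern coincides with the second IRP, so its distance is zero and it is selected. In practice the ability value and item parameters would be estimated from the data rather than fixed by hand.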
In comparing the IRT-based methods for cognitive diagnosis (i.e., RSM, AHM, GDD, and GDD-P), it can be found that these methods are all based on the match or lack thereof between the ORPs and IRPs. Attribute patterns can be recognized by these methods in two steps. First, an examinee’s IRP is identified by applying an IRT-based discriminating index. Second, the attribute pattern behind the IRP is determined based on the information contained in the Q-matrix. Note that the form of Q-matrix determines the form of attribute patterns and the item scoring method determines how the attribute patterns correspond to the IRPs. In other words, if one would like to use an IRT-based method like GDD-P, the specific attribute patterns and how they map to the IRPs should be known. This implies that the design of the Q-matrix and the item scoring method play a critical role in the pattern recognition procedure of an IRT-based method. For this reason, some feasible Q-matrix designs and scoring item methods are discussed in the next section.
Q-matrix Design and Scoring Method
Design of Q-Matrix
A polytomous Q-matrix is defined as a Q-matrix that can assume nonnegative integers as its elements. To differentiate Tatsuoka’s (1983) Q-matrix, which can only contain dichotomous values, from the polytomous Q-matrix, the authors refer to the former as the dichotomous Q-matrix in this article. Note that the dichotomous Q-matrix cannot sufficiently reflect the different difficulty levels of an attribute in different items, whereas the polytomous Q-matrix can. A test for probing a comprehensive problem-solving process often requires that the grain size of an attribute should not be too small. Therefore, it is necessary to design the polytomous Q-matrix to characterize the large grain sizes of the attributes. Table 1 is an example of the polytomous Q-matrix. The test contains 10 items and three independent attributes, A1, A2, and A3. The attributes A2 and A3 are probed to different extents. Because the maximum elements corresponding to A1, A2, and A3 in the Q-matrix are 1, 2, and 3, respectively, there are 24 attribute patterns in total (i.e., 2 × 3 × 4 = 24; see Table 2). This number reflects that some examinees could possess the same attributes but to different extents.
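The count of 24 attribute patterns can be checked by enumeration. The following is a minimal sketch that assumes independent attributes, so that every combination of levels is permissible; the function name is hypothetical, and the enumeration order is not necessarily the order used in Table 2.

```python
from itertools import product

def attribute_patterns(max_levels):
    """All attribute patterns when attribute k can take levels 0..max_levels[k].

    ASSUMPTION: attributes are independent, so no pattern is excluded.
    """
    return list(product(*(range(m + 1) for m in max_levels)))

# Maximum Q-matrix elements for A1, A2, and A3 as stated in the text.
patterns = attribute_patterns([1, 2, 3])
print(len(patterns))   # 2 * 3 * 4 = 24
print(patterns[:4])
```

With a prerequisite structure among the attributes, some of these combinations would be excluded and the count would be smaller.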
Polytomous Q-Matrix for the Test With Three Attributes.
Attribute Patterns for the Test With Three Attributes.
Item Scoring Method
A feasible item scoring method is introduced here to show how attribute patterns are mapped to the IRPs for a polytomous response test. The scoring method can work with both the dichotomous and the polytomous Q-matrices. Let vector
where I represents an indicator function.
When a dichotomous Q-matrix is used,
Ideal Response Patterns for the Test With Three Attributes.
Note: The order of 24 ideal response patterns listed here is consistent with that of 24 attribute patterns in Table 2.
Sufficiency of Polytomous Q-Matrix
A critical issue about the polytomous Q-matrix and item scoring method introduced is ensuring under which conditions the attribute patterns under those designs are identifiable. The identifiability of attribute patterns for the dichotomous Q-matrix and conjunctive item scoring method has been explored. Chiu, Douglas, and Li (2009) proved that if the attributes are all independent for dichotomous Q-matrix, the IRPs for different attribute patterns are different if all the single-attribute items are included in the Q-matrix. Ding, Yang, and Wang (2010) pointed out that for dichotomous Q-matrix with a strong attribute prerequisite relationship (Leighton et al., 2004), one-to-one mapping exists between attribute patterns and IRPs if and only if the dichotomous Q-matrix contains the reachability matrix (e.g., Tatsuoka, 1995) as its submatrix. Based on these results, for the polytomous Q-matrix with the item scoring method introduced above, a matrix denoted by
Matrix R for Polytomous Q-Matrix
If an attribute hierarchy of a test is defined by domain experts and the test is designed to probe different levels of the attributes, the matrix
Matrix GR for Polytomous Q-Matrix
The method for generating
Matrix
Simulation
In this section, three studies were conducted to investigate the classification performance of the GDD-P method for polytomous items under the scoring method introduced in the previous section. Studies 1 and 2 focused on the condition that strong prerequisite structures existed among the attributes. Study 3 focused on the condition that the attribute dependence and distribution of attribute patterns followed the rule suggested by de la Torre and Douglas (2004). The correct pattern classification rate (CPCR) and average attribute match rate (AAMR) were used as the evaluation criteria for the GDD-P method for these studies. These statistics are defined as
and
where N is the sample size, K is number of attributes,
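As a minimal sketch, the two criteria can be computed as follows. It assumes the usual definitions: CPCR is the proportion of examinees whose entire estimated attribute pattern equals the true pattern, and AAMR is the proportion of matching entries among all N × K attribute entries. The function names and the toy patterns are hypothetical.

```python
def cpcr(true_patterns, est_patterns):
    """Proportion of examinees whose whole pattern is recovered exactly."""
    n = len(true_patterns)
    hits = sum(tuple(t) == tuple(e) for t, e in zip(true_patterns, est_patterns))
    return hits / n

def aamr(true_patterns, est_patterns):
    """Proportion of matching attribute entries over all N * K entries."""
    n = len(true_patterns)
    k = len(true_patterns[0])
    matches = sum(
        t_k == e_k
        for t, e in zip(true_patterns, est_patterns)
        for t_k, e_k in zip(t, e)
    )
    return matches / (n * k)

# Toy example with N = 4 examinees and K = 3 attributes.
true_p = [(1, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 0)]
est_p = [(1, 0, 1), (0, 1, 0), (1, 1, 0), (1, 0, 0)]

print(cpcr(true_p, est_p))  # 2 of 4 patterns match exactly
print(aamr(true_p, est_p))  # 10 of 12 entries match
```

Because a pattern counts for CPCR only when every attribute matches, CPCR can never exceed AAMR on the same data.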
Study 1
Condition 1: Data From Dichotomous Q-Matrix and Four Attribute Structures
In this study, K = 7 attributes were assumed and the attributes followed four structures. As shown in Figure 1, the attributes under the first three structures (linear, convergent, and divergent; Leighton et al., 2004) were dependent; the attributes under the last structure were independent. For the first three structures, the same assumption as in the examples in the “Item Scoring Method” section was made: An attribute could be mastered only if all of its prerequisite attributes were also mastered. The Q-matrix for each structure was designed to be composed of J = 35 items. Under each structure, the reachability matrix (see Table 5) was used as a submatrix in a dichotomous Q-matrix and the remaining 28 columns were randomly sampled from all possible attribute patterns except the all-zero pattern.
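A reachability matrix can be derived from the direct prerequisite relations among the attributes by taking a transitive closure. The sketch below uses Warshall's algorithm for a linear hierarchy of K = 7 attributes. The convention that `adjacency[i][j] = 1` means attribute i is a direct prerequisite of attribute j, and the function name, are assumptions made for illustration; the result is not claimed to reproduce Table 5.

```python
def reachability_matrix(adjacency):
    """Transitive closure of the prerequisite relation, with a unit diagonal.

    ASSUMPTION: adjacency[i][j] == 1 means attribute i is a direct
    prerequisite of attribute j.
    """
    k = len(adjacency)
    reach = [[bool(adjacency[i][j]) or i == j for j in range(k)] for i in range(k)]
    # Warshall's algorithm: allow paths through intermediate attribute m.
    for m in range(k):
        for i in range(k):
            for j in range(k):
                reach[i][j] = reach[i][j] or (reach[i][m] and reach[m][j])
    return [[int(v) for v in row] for row in reach]

# Linear hierarchy: A1 -> A2 -> ... -> A7.
K = 7
linear = [[1 if j == i + 1 else 0 for j in range(K)] for i in range(K)]
R = reachability_matrix(linear)

for row in R:
    print(row)
```

For the linear hierarchy the result is upper triangular: column j lists attribute j together with all of its direct and indirect prerequisites, so each column can serve as the attribute vector of one item.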

Attribute structures for Study 1.
Reachability Matrix for Study 1.
To generate the true attribute patterns of N = 1,000 examinees, the number of examinees possessing each attribute pattern can be determined first. As different attribute patterns correspond to different IRPs, the total scores of IRPs were used to design the frequencies of those examinees’ true attribute patterns. Specifically, by anchoring zero and the full score to the ability values of −3 and 3, respectively, the total score of IRPs were calculated, sorted, and transformed to some ability values. If these ability values are assumed to approximately follow a standard normal distribution, the number of examinees possessing every attribute pattern can be obtained. These true attribute patterns were later used to calculate the CPCR and AAMR of the GDD-P method.
Next, the rule illustrated in the Appendix was used to generate the data (i.e., ORPs) for polytomous response. The main principle of the rule is that a small difference between an alternative score and an ideal score leads to a high likelihood of observing this alternative score. Then, a random number sampled from a uniform distribution U(0, 1) was generated to simulate the examinees’ ORPs. Under each attribute hierarchy, the slippage probability was set to 10%, 20%, 30%, or 40%. For each of the 16 combinations of four slippage probabilities and four attribute structures, 30 data sets were simulated and analyzed. The GDD-P method was applied by fitting the response data with a polytomous IRT model, the graded response model (GRM), using the EM algorithm and the expected a posteriori (EAP) method to estimate the item and ability parameters, respectively.
Results
The classification accuracy results were reported in Table 6. As Table 6 shows, the CPCR values of the GDD-P method under each hierarchy for the 10% slippage probability were all above 0.99; for the 20% slippage probability, the values were all higher than 0.97; for the 30% slippage probability, the values decreased to about 0.89; and for the 40% slippage probability, the lowest CPCR value was higher than 0.68. The AAMR values of the GDD-P method for the slippage probability of 10%, 20%, 30%, and 40% under four hierarchies were all above 0.90.
Mean CPCR and AAMR for the GDD-P method in Study 1.
Note: CPCR = correct pattern classification rate; AAMR = average attribute match rate; GDD-P=GDD method for polytomous response.
In comparing the CPCR results under different attribute hierarchies given in Table 6, it can be found that under each slippage probability, the values for linear hierarchy were almost always higher than those for the other three structures. In addition, the CPCR values under the 10% and 20% slippage probabilities were slightly higher for independent structures than those for divergent and convergent hierarchy; for the 30% and 40% slippage probabilities, the CPCR values were highest for the divergent structure, followed by the convergent and then the independent structures. The AAMR values given in Table 6 showed similar trends as the CPCR values.
Study 2
Condition 2: Data From Polytomous Q-Matrix and Four Attribute Structures
This study differed from Study 1 in that the Q-matrix had polytomous elements. Although involving only K = 3 attributes, the four attribute structures in Figure 2 were designed to be the same as those in Study 1. The element of the Q-matrix was either 0 or 1 for A1; 0, 1, or 2 for A2; and 0, 1, 2, 3, or 4 for A3. For each attribute structure, the Q-matrix had J = 35 items and contained the matrix
Matrix

Attribute structures for Study 2.
Results
The classification accuracy results of Study 2 are reported in Table 8. In this study, the CPCR values of the GDD-P method under each hierarchy for the 10% slippage probability were all above 0.98; for the 20% slippage probability, the values were all higher than 0.94; for the 30% slippage probability, the values decreased to no lower than 0.82; for the 40% slippage probability, the lowest CPCR value was around 0.60. The AAMR values of the GDD-P method for the 10%, 20%, 30%, and 40% slippage probabilities under the four hierarchies were all above 0.87. Due to the more stringent definition of CPCR, the CPCR values shown in Table 8 were consistently lower than the corresponding AAMR values.
Mean CPCR and AAMR for the GDD-P Method in Study 2.
Note: CPCR = correct pattern classification rate; AAMR = average attribute match rate; GDD-P=GDD method for polytomous response.
In comparing the CPCR results under different attribute hierarchies, it was found that for each slippage probability, the values for linear and convergent hierarchies were all higher than those for divergent and independent structures. This trend became clearer as the slippage probability increased from 10% to 40%. The AAMR values had a similar trend as the CPCR values. Note that other conditions that represented different numbers of K and J have been considered in simulation but the results were generally similar to the results included in the preceding studies.
By summarizing the results for the two studies, it should be noted that the polytomous Q-matrix design is helpful but not necessary to any polytomous-score test. The choice between dichotomous Q-matrix and polytomous Q-matrix could closely depend on the number of attributes in a test. When the polytomous Q-matrix was designed to analyze the polytomous scores, the attribute number and the levels of the attributes should be designed to be relatively small. Otherwise, the
If the attribute patterns are only partially ordered, which might make the sum score a poor statistic for modeling the relationship between the ability and an attribute pattern, the distribution of the attribute patterns in the first two studies may not be practical. With this limitation in mind, the third study was designed.
Study 3
Condition 3: Data From Dichotomous Q-Matrix and Higher Order Latent Trait Models
Here, the stringent assumption about the attribute structures in the first two studies was relaxed so that the attribute pattern number could be 2^K − 1 even if some attributes are correlated. In addition, different from the attribute pattern generation method in the first two studies, the idea of the higher order latent trait models for cognitive diagnosis (de la Torre & Douglas, 2004) was implemented to generate the true attribute patterns of N = 5,000 examinees. In this study, the attribute number was fixed to K = 5, item number J = 31, and the dichotomous Q-matrix composed of all the attribute patterns except for the all-zero pattern was used. The unidimensional trait θ was assumed to follow the standard normal distribution. The probability of an attribute pattern
Results
As shown in Table 9, the CPCR values of the GDD-P method for the 10% and 20% slippage probabilities were all above 0.99. The CPCR values of the method for the 30% and 40% slippage probabilities were higher than 0.96 and equal to 0.84, respectively. AAMR values of the method for the 10%, 20%, 30%, and 40% slippage probabilities were all higher than 0.95.
Mean CPCR and AAMR for the GDD-P Method in Study 3.
Note: CPCR = correct pattern classification rate; AAMR = average attribute match rate; GDD-P=GDD method for polytomous response.
A Real Data Analysis
Data Description
In this section, real data were chosen as an example to illustrate the use of the GDD-P method for diagnosing the attribute patterns of the examinees. The test was a science booklet within PISA (The Program for International Student Assessment) administered in 2006 and contained 22 items. The data were composed of the responses of 1,779 students from mainland China, Macao, Hong Kong, and Taipei. According to the PISA 2006 science assessment framework, three competencies (i.e., attributes), namely, identifying scientific issues, explaining phenomena scientifically, and using scientific evidence, were probed in the test. Although the focus of PISA is not on individual but on aggregated scores, cognitive diagnosis analysis of a low-stakes test was used here as an example of how additional individual-level information can be obtained when some items in the test represent questions from the same stems.
According to the design of PISA 2006 Science, one or more questions were asked in every item, and each question probed only one attribute. The polytomous and dichotomous Q-matrices for the test were identified based on this design (see Table 10). First, the frequency with which every attribute was probed in every item was identified and used as the corresponding element of the polytomous Q-matrix. For instance, there was only one question asked for Item 7 and it probed A3, so it produced the corresponding vector (0, 0, 1) for the polytomous Q-matrix. For Item 4, three questions were asked: One probed A1 and the other two probed A2, so the corresponding vector was (1, 2, 0) for the polytomous Q-matrix. Following this design, the number of attribute patterns for the polytomous Q-matrix should be 4 × 4 × 3 = 48. Second, the dichotomous Q-matrix can be defined in the traditional way by reducing the positive values of the polytomous Q-matrix to 1. The number of attribute patterns for the dichotomous Q-matrix was 2^3 = 8.
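The reduction from a polytomous to a dichotomous Q-matrix, and the two pattern counts, can be illustrated with a short sketch. Only the vectors for Items 7 and 4 quoted in the text are used; the function names are hypothetical, and the maximum levels (3, 3, 2) are taken from the highest-mastery pattern reported for this test.

```python
from math import prod

def dichotomize(q_rows):
    """Reduce every positive Q-matrix element to 1 (one row per item)."""
    return [[1 if q > 0 else 0 for q in row] for row in q_rows]

def n_attribute_patterns(max_levels):
    """Number of patterns when attribute k takes levels 0..max_levels[k]."""
    return prod(m + 1 for m in max_levels)

# Item vectors quoted in the text: Item 7 and Item 4.
poly_rows = [[0, 0, 1], [1, 2, 0]]
print(dichotomize(poly_rows))             # [[0, 0, 1], [1, 1, 0]]

print(n_attribute_patterns([3, 3, 2]))    # 4 * 4 * 3 = 48
print(n_attribute_patterns([1, 1, 1]))    # 2 ** 3 = 8
```

The dichotomized row for Item 4 no longer records that A2 was probed twice, which is the kind of information the polytomous Q-matrix retains.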
Two Types of Q-Matrix for the PISA Test.
Note: PISA = The Program for International Student Assessment; A1 = identifying scientific issues; A2 = explaining phenomena scientifically; A3 = using scientific evidence.
Analysis and Results
For the polytomous Q-matrix, the mapping from the attribute patterns to the IRPs follows the item scoring rule of Equation 4. As discussed in the “Q-Matrix Design and Scoring Method” section, a submatrix which contains the patterns corresponding to Items 2, 3, 7, 11, 15, 16, 17, and 22 can be found in the Q-matrix to make sure the mapping is one to one. Due to space limitation, the corresponding relationship was not shown. Similarly, the mapping between the attribute patterns and the IRPs for the dichotomous Q-matrix was also one to one. This is because the reachability matrix was also contained in the Q-matrix. The IRPs for the dichotomous Q-matrix are shown in Table 11.
Ideal Response Patterns for the Dichotomous Q-Matrix of the PISA Test.
Note: PISA = The Program for International Student Assessment; A1 = identifying scientific issues; A2 = explaining phenomena scientifically; A3 = using scientific evidence.
The GDD-P method, in which the generalized distance index was constructed by the GRM, was used to analyze the data with the two types of Q-matrices given in Table 10, respectively. Based on the classification results, the means of the students’ attribute patterns (i.e., the extent to which each attribute is mastered) from four different geographic areas were calculated (see Table 12). For the polytomous Q-matrix in Table 10, the attribute pattern that corresponds to the highest level of attribute mastery is (3, 3, 2). The highest and lowest levels of attribute masteries were bolded and italicized, respectively, for each attribute. With the polytomous Q-matrix, students from Hong Kong had the highest mastery of A1, whereas students from Mainland China had the highest masteries of A2 and A3; students from Macao had the lowest mastery levels for the three attributes. However, when the dichotomous Q-matrix was used to identify the students’ attribute patterns, different trends emerge: Students from Hong Kong did not have the highest mastery of A1; students from Mainland China had the lowest mastery of A2; although students from Macao still had the lowest mastery of A1 and A3, it was surprising that they had the highest mastery of A2. This phenomenon could be an indication of the information loss in converting the polytomous Q-matrix to a dichotomous Q-matrix.
Mean of Students’ Mastery Number Across Three Attributes Representing Science Competency.
Note: A1 = identifying scientific issues; A2 = explaining phenomena scientifically; A3 = using scientific evidence.
Summary and Discussion
In this article, the authors develop an IRT-based method, the GDD-P, to carry out cognitive diagnosis for polytomous items. The core idea of the method is to identify IRPs by the generalized distance index for polytomous response, which can not only work with the scoring method used in the simulation studies but also with any other item scoring method for polytomous response. As an extension of Tatsuoka’s dichotomous Q-matrix, the Q-matrix with polytomous elements was also introduced in this article to accommodate attributes that have more than two levels. In addition, the use of the
Although the proposed procedure is promising, the practicality of applying the method to real testing situations requires a thoughtful consideration of a few important issues. First, in practical implementations of the GDD-P method, the number of attribute patterns should be carefully considered because too many attribute patterns will either affect the classification accuracy or require longer tests. As simulation studies indicate, if the number of attributes is not small and almost none of the attributes have extreme prerequisite relationships, there will be a large number of attribute patterns. If the attributes are probed at different levels and some elements of polytomous Q-matrix are not very small, the total number of attribute patterns may be too large to be practicable. Therefore, the attribute number, prerequisite relationship of attributes, and the polytomous Q-matrix design, all of which are related with the total number of attribute patterns and the correct classification rate, need to be considered before using the GDD-P method.
Second, the simulation studies and real data analysis showed that the GDD-P method can be implemented under both the dichotomous and polytomous Q-matrices. The choice between the two types of Q-matrices should be made according to the purpose of the test and the constraints under which the test will be administered. A trade-off exists in choosing between the two Q-matrices. Compared with the dichotomous Q-matrix, the polytomous Q-matrix is capable of probing an attribute at different levels, which makes the diagnostic results more informative. However, the design of the polytomous Q-matrix would result in a larger number of attribute patterns, which can affect the precision of the classification of the attribute patterns for a fixed test length. A basic factor that can be considered in deciding on the more appropriate Q-matrix is the attribute number. If this number is small, the polytomous Q-matrix can be considered because a relatively simple Q-matrix design can be used to probe the levels of attributes. Otherwise, the dichotomous Q-matrix may be a better choice for analyzing the test with polytomous scores.
Third, it is important for IRT-based methods such as the GDD-P to choose the proper item response functions to construct the discriminating indices. Although the IRT model used by the GDD-P method in this work was mainly chosen from the unidimensional models, one may also consider the appropriateness of multidimensional IRT models for constructing the discriminating index. Furthermore, to obtain more accurate cognitive diagnosis in practice, a model selection procedure can be used to choose the IRT model that best fits the real data.
Finally, as shown by several studies on the dichotomous Q-matrix (e.g., de la Torre, 2008; Rupp & Templin, 2008), Q-matrix misspecification can affect the accuracy of cognitive diagnostic methods for examinee classification. Because a similar problem can occur with the polytomous Q-matrix, domain experts should exert the utmost care to ensure that attribute levels of the polytomous Q-matrix are correctly specified. In addition, techniques for validating the specifications of the polytomous Q-matrix based on empirical data should also be given proper attention in the future.
Footnotes
Appendix
Acknowledgements
The authors would like to thank the editor and the anonymous reviewers for insightful comments and valuable suggestions. The first author would also like to thank the China Scholarship Council for her visiting scholarship in the USA.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Natural Science Foundation of China (11171029).
