Abstract

Cognitive diagnostic models (CDMs) have been a focus of research in recent years as a way to provide diagnostic information instead of a single test score on educational and psychological tests. CDM-based tests are designed to provide attribute profiles, or classifications of examinees that usually involve mastery or nonmastery, on each of the attributes measured by the test (Rupp, Templin, & Henson, 2010). As our knowledge of CDMs expands, there is likely to be a need for a way to link attribute profiles and observed test scores. The purpose of this brief report is to introduce and demonstrate a method for estimating an observed score distribution from an attribute profile.
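Although any CDM can be used with the proposed method, the authors chose the Log-Linear Cognitive Diagnostic Model (LCDM; Henson, Templin, & Willse, 2009) because many popular CDMs can be derived as special cases. The probability of a correct response to item $i$ for an individual with an attribute profile $\boldsymbol{\alpha}_c$ is

$$P(X_{ic} = 1 \mid \boldsymbol{\alpha}_c) = \frac{\exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_i^{T}\mathbf{h}(\boldsymbol{\alpha}_c, \mathbf{q}_i)\right)}{1 + \exp\left(\lambda_{i,0} + \boldsymbol{\lambda}_i^{T}\mathbf{h}(\boldsymbol{\alpha}_c, \mathbf{q}_i)\right)},$$

where the $q_i$'s are the entries of the Q-matrix, and $\lambda_{i,0}$ and $\boldsymbol{\lambda}_i$ are the intercept and the vector of main effect and interaction parameters for item $i$, with $\mathbf{h}(\boldsymbol{\alpha}_c, \mathbf{q}_i)$ collecting the corresponding products of attribute and Q-matrix entries.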
Method
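Observed score distributions for each attribute profile were estimated by adapting a recursive formula for a compound binomial, introduced by Lord and Wingersky (1984) and elaborated upon by Kolen and Brennan (2004), that was originally used for item response theory observed score equating. For a test composed of two items, the proportion of examinees with a score of 0 is

$$f_2(0 \mid \boldsymbol{\alpha}_c) = (1 - p_1)(1 - p_2),$$

where $p_i$ is the probability of a correct response to item $i$ for an examinee with attribute profile $\boldsymbol{\alpha}_c$. The proportions with scores of 1 and 2 are, correspondingly, $p_1(1 - p_2) + (1 - p_1)p_2$ and $p_1 p_2$. More generally, the distribution after $r$ items follows from the distribution after $r - 1$ items:

$$f_r(x \mid \boldsymbol{\alpha}_c) = f_{r-1}(x \mid \boldsymbol{\alpha}_c)(1 - p_r) + f_{r-1}(x - 1 \mid \boldsymbol{\alpha}_c)\,p_r,$$

with $f_{r-1}(x \mid \boldsymbol{\alpha}_c) = 0$ for $x < 0$ or $x > r - 1$.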
Results
We demonstrate the estimation procedure using two sets of items taken from the Trends in International Mathematics and Science Study (TIMSS) Assessment 2003 for Grade 8 Mathematics. These items and data were used in a previous study (Skaggs, Hein, & Wilkins, in press) in which Q-matrices with varying numbers of attributes were fit to the data. Here, a Q-matrix for four attributes is used: whole numbers, geometry, fractions, and algebra. For the first set of items, estimates from 20 dichotomously scored items that appeared across six TIMSS booklets were used. Ten items measured one attribute, while the other 10 items measured two attributes. The items were selected because their parameters were precisely estimated and the item response data fit the CDM. In addition, coverage of the four attributes was fairly equal, thus simulating a situation in which a test is generated from an item bank by selecting high-quality items. Because of the TIMSS booklet design, no examinees responded to all 20 items, and so observed score distributions could not be calculated directly.
In addition, a second set of 30 dichotomously scored items from TIMSS Booklet 1 was used to compare observed versus estimated raw score distributions. In this booklet, coverage of the four attributes ranged from six to 13 items per attribute. Nineteen items measured one attribute, while 11 items measured two attributes. This comparison is not ideal because not all items fit the CDM equally well and coverage of the attributes was unequal, but it does serve as a rough validation of the procedure. The mdltm software program (von Davier, 2005) was used to estimate item parameters and examinee profiles. The estimated distributions were easily calculated in an Excel worksheet.
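The same recursive calculation can be sketched in a few lines of Python. This is only an illustrative sketch of the Lord and Wingersky recursion, not the authors' worksheet: the function name `score_distribution` and the two item probabilities are hypothetical and are not TIMSS estimates.

```python
from typing import List, Sequence


def score_distribution(p_correct: Sequence[float]) -> List[float]:
    """Number-correct score distribution for independent dichotomous items.

    p_correct[i] is the probability of a correct response to item i for one
    attribute profile. Items are added one at a time (Lord-Wingersky recursion).
    """
    dist = [1.0]  # with zero items, a score of 0 has probability 1
    for p in p_correct:
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)  # item answered incorrectly: score stays x
            new[x + 1] += prob * p      # item answered correctly: score becomes x + 1
        dist = new
    return dist


# Hypothetical two-item test (assumed probabilities, for illustration only).
two_item = score_distribution([0.8, 0.6])
print(two_item)
```

For the two-item case, the first entry equals (1 - 0.8)(1 - 0.6), the product of the two incorrect-response probabilities. Running the function once per attribute profile, with that profile's item probabilities, gives one estimated distribution per profile.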
Table 1 shows the estimated observed score distributions for five attribute profiles in which different numbers of attributes were mastered. As expected, attribute profiles in which more attributes have been mastered resulted in higher estimated observed scores. It should be noted here that the location and shape of the estimated distributions depend on the item parameter estimates. As model intercepts decrease, the estimated distribution shifts lower, and vice versa. As attribute slopes increase, the distributions for attribute profiles become more peaked and therefore distinct from each other.
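The dependence on intercepts and slopes can be illustrated with a small sketch. It assumes a main-effects-only LCDM (a compensatory form), a hypothetical eight-item test with two items per attribute, and made-up parameter values; none of the names or numbers below come from the TIMSS analysis.

```python
import math


def lcdm_main_effects_prob(intercept, slopes, q_row, alpha):
    """P(correct) under a main-effects-only LCDM (assumed compensatory form)."""
    logit = intercept + sum(s * q * a for s, q, a in zip(slopes, q_row, alpha))
    return 1.0 / (1.0 + math.exp(-logit))


def score_distribution(probs):
    """Lord-Wingersky recursion for the number-correct score distribution."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)
            new[x + 1] += prob * p
        dist = new
    return dist


# Hypothetical Q-matrix: 8 items, four attributes, each item measures one attribute.
Q = [[1 if a == i % 4 else 0 for a in range(4)] for i in range(8)]


def profile_mean(intercept, slope, alpha):
    """Mean of the estimated score distribution for one attribute profile."""
    probs = [lcdm_main_effects_prob(intercept, [slope] * 4, q_row, alpha) for q_row in Q]
    dist = score_distribution(probs)
    return sum(x * p for x, p in enumerate(dist))


masters, nonmasters = (1, 1, 1, 1), (0, 0, 0, 0)

# Lowering the intercept shifts the distribution toward lower scores.
base_mean = profile_mean(-1.0, 2.0, masters)
low_intercept_mean = profile_mean(-2.0, 2.0, masters)

# Larger slopes pull the profiles' distributions further apart.
gap_small_slope = profile_mean(-1.0, 1.0, masters) - profile_mean(-1.0, 1.0, nonmasters)
gap_large_slope = profile_mean(-1.0, 3.0, masters) - profile_mean(-1.0, 3.0, nonmasters)
print(base_mean, low_intercept_mean, gap_small_slope, gap_large_slope)
```

In this toy setup, the mean score for the all-mastered profile drops when the intercept is lowered, and the gap between the all-mastered and none-mastered profiles widens as the slope grows.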
Table 1. Estimated Observed Score Distributions for Five Attribute Profiles.
If the intercepts and slopes differ systematically between attributes, it would be possible for attribute profiles with fewer attributes mastered to have higher estimated score distributions than profiles with more attributes mastered. This inversion of attribute profile distributions, however, is not the most likely scenario. Figure 1 plots the estimated distributions for all 16 attribute profiles. These form five distinct groups of attribute profiles, each based on the number of attributes mastered. Thus, for example, the six attribute profiles in which two attributes were mastered all had very similar distributions. Distributions for attribute profiles consisting of three to 10 attributes were also estimated using TIMSS item parameter estimates for the compensatory reparameterized unified model (CRUM). In all cases, estimated distributions for profiles with the same number of attributes mastered were very similar. This means that at extreme score ranges, the number of attributes mastered can be predicted quite well. For example, examinees who score at the extremes (say, greater than 17 or less than 6) are likely to have mastered or not mastered most of the attributes, respectively. For these examinees, further diagnostic testing might not be necessary. It is important to note that this relationship may not generalize to noncompensatory CDMs because, in such models, mastery of all attributes measured by an item is necessary to increase the probability of a correct response.
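One way to see why extreme scores are informative is to invert the profile-specific distributions with Bayes' rule. The sketch below is an assumption-laden illustration, not part of the reported analysis: it uses a hypothetical 20-item Q-matrix (10 single-attribute and 10 two-attribute items), a single made-up intercept and slope for all items, a main-effects-only model, and a uniform prior over the 16 profiles.

```python
import math
from itertools import combinations, product

N_ATTR = 4
# Hypothetical Q-matrix: each item is listed by the attributes it measures.
singles = [(i % N_ATTR,) for i in range(10)]
pairs = list(combinations(range(N_ATTR), 2))
doubles = [pairs[i % len(pairs)] for i in range(10)]
ITEMS = singles + doubles
INTERCEPT, SLOPE = -1.5, 2.0  # made-up values, not TIMSS estimates


def p_correct(item, alpha):
    """Main-effects-only probability of a correct response (assumed model)."""
    logit = INTERCEPT + SLOPE * sum(alpha[a] for a in item)
    return 1.0 / (1.0 + math.exp(-logit))


def score_distribution(probs):
    """Lord-Wingersky recursion for the number-correct score distribution."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            new[x] += prob * (1.0 - p)
            new[x + 1] += prob * p
        dist = new
    return dist


def posterior_mastered_count(score):
    """P(number of attributes mastered | raw score), uniform prior over profiles."""
    weights = [0.0] * (N_ATTR + 1)
    for alpha in product((0, 1), repeat=N_ATTR):
        dist = score_distribution([p_correct(item, alpha) for item in ITEMS])
        weights[sum(alpha)] += dist[score]
    total = sum(weights)
    return [w / total for w in weights]


high = posterior_mastered_count(20)  # perfect raw score
low = posterior_mastered_count(0)    # zero raw score
print(high, low)
```

Under these assumptions, a perfect score points most strongly to mastery of all four attributes and a zero score to mastery of none. With a different prior or different item parameters, the posterior would change, so the cut points mentioned above should not be read off this toy example.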

Figure 1. Estimated attribute profile distributions.
To help validate this procedure, the TIMSS items and examinee responses from Booklet 1 were used. There were 757 examinees, 309 of whom were estimated to have mastered all four attributes. The raw score distribution of these examinees was compared with the distribution estimated using the above procedure. This comparison is shown in Figure 2. The observed distribution is not smooth, due to the relatively small sample size, but it does roughly correspond to the expected distribution. This finding, however, must be viewed as tentative. Four items had parameter estimates with relatively large standard errors and fit statistics, and this booklet did not cover all four attributes equally well. Item parameter estimation error undoubtedly affects the precision of the estimated score distributions. In addition, comparisons involving other profiles may show different results. Still, these results do show some promise for the proposed estimation method.

Figure 2. Expected versus observed score distribution for profile 1111.
This method of score distribution estimation has been used in at least two studies so far: by Skaggs, Hein, and Wilkins (in press) for standard setting and by Xin and Zhang (2015) for test equating. However, research on CDMs is relatively new, and there are as yet few actual applications of CDMs in large-scale testing. Nevertheless, one foreseeable use of the proposed method is as a screening device. Depending on the number of attributes, a test that accurately estimates attribute profiles could be quite lengthy or require a computer adaptive administration. One likely purpose of a CDM-based test will be to determine whether examinees are in need of an educational or psychological intervention. Based on the relationship shown in Figure 1, a shorter version of such a test could serve as a screening device to quickly classify the most extreme cases and thus save significant testing time and resources. In this vein, another potential application would be the equating of the shorter screening form with its full-length counterpart.
It is important to point out several limitations and directions for future research. This method of estimating score distributions treats the item parameters as known, whereas in practice they are estimated from data. The accuracy of these estimates depends in turn on correct model and Q-matrix specification and on sample size. The effect of these factors should be investigated. In addition, the results in this brief report are based on a compensatory model. The method should be replicated for noncompensatory CDMs.
Footnotes
Acknowledgements
The authors would like to thank the editor and two anonymous reviewers for their helpful comments on earlier versions of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Science Foundation REESE program (DRL-1109429).
