Abstract
Advances in educational technology provide teachers and schools with a wealth of information about student performance. A critical direction for educational research is to harvest the available longitudinal data to provide teachers with real-time diagnoses about students’ skill mastery. Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric framework for designing instructionally relevant assessments and diagnoses about students’ skill profiles. In this article, the authors contribute to the literature on the development of longitudinal CDMs by proposing a multivariate latent growth curve model to describe student learning trajectories over time. The model offers several advantages. First, the learning trajectory space is high-dimensional and previously developed models may not be applicable to educational studies that have a modest sample size. In contrast, the method offers a lower-dimensional approximation and is more applicable for typical educational studies. Second, practitioners and researchers are interested in identifying factors that cause or relate to student skill acquisition. The framework can easily incorporate covariates to assess theoretical questions about factors that promote learning. The authors demonstrate the utility of their approach with an application to a pre-test/post-test educational intervention study and show how the longitudinal CDM framework can provide fine-grained assessment of experimental effects.
Introduction
Cognitive diagnosis models (CDMs) offer educational researchers, policy makers, and practitioners a psychometric framework for designing instructionally relevant assessments (Huff & Goodman, 2007; Leighton & Gierl, 2007). Rather than relying upon broad measures of achievement, CDMs relate a collection of fine-grained skills to performance on educational tasks. A direct benefit of the CDM framework is that assessment results can be used to diagnose students’ skill mastery and provide teachers with information about students’ strengths and to illuminate skill deficits that must be addressed with educational interventions.
Prior research offered methodological developments and presented applications of CDMs. For example, early applications diagnosed student mastery of performance on fraction-subtraction questions (Mislevy & Wilson, 1996; Tatsuoka, 1984). The results of CDM analyses are becoming more relevant today in the modern, computerized classroom. Namely, the line between assessment and instruction is blurring with the development and dissemination of online learning tools (e.g., see a review by Ye et al., 2016) that allow students to practice content under the supervision of teachers. Educational technology provides teachers and schools with access to a wealth of information about student mastery and learning. Students are able to complete computerized assessments, and the responses can be analyzed in real time to support teachers’ instructional decisions.
CDMs have a clear role in supporting formative classroom assessments. Yet, one limitation is that the originally proposed CDM framework is static: it was created for cross-sectional designs to diagnose student skills at a single point in time rather than to model the trajectory students follow during the learning process. The modern classroom provides a wealth of longitudinal information that can be harvested to improve diagnoses and uncover common learning trajectories that students tend to follow toward skill mastery.
Modeling learning trajectories (i.e., the changes in skill profiles over time) is a familiar problem to the educational data mining community, in which Bayesian Knowledge Tracing (BKT) has become a dominant methodology (Corbett & Anderson, 1994). Recently, longitudinal CDMs emerged naturally from the extension of static CDMs and can be seen as new approaches to BKT. Both longitudinal CDMs and BKT are concerned with modeling the presence or absence of skills or attributes and how this changes over time. Both also require a measurement model, such as the deterministic inputs, noisy “and” gate (DINA; Haertel, 1989; Junker & Sijtsma, 2001) model that is used here, which allows for slipping and guessing, familiar notions in BKT models. A transition model between the presence and absence of a skill is common to both disciplines, and a challenging aspect is how to deal with multiple skills. For instance, existing longitudinal CDMs employ first-order (Chen et al., 2018; Kaya & Leite, 2017; Li et al., 2016) and higher order (Wang et al., 2017) hidden Markov models (HMMs). Y. Xu and Mostow (2011, 2012) address transition models in BKT with a logistic regression approach. Another common concern is how covariates may be employed to modify transition probabilities. See, for example, González-Brenes et al. (2014) from the BKT literature, in which Feature Aware Student Knowledge Tracing (FAST) is introduced, or Wang et al. (2017), in which a longitudinal CDM is introduced that adjusts for practice and intervention effects and includes a continuous random effect for learning ability to help capture dependence. Similarly, latent factor knowledge tracing (LFKT) uses both continuous and binary attributes by allowing the slipping and guessing parameters of a BKT model to depend on an individual’s latent continuous ability, as discussed in Klingler et al. (2015) and Khajah et al. (2014).
Although BKT and longitudinal CDMs have grown out of slightly different disciplines, the basic structure is identical, and particular methods only differ in how they address the measurement model, joint distribution of attributes, and the longitudinal learning transition aspect of the model.
In this article, the original contribution is to introduce another strategy to model the learning trajectory space that builds upon prior research on cross-sectional CDMs that approximated the latent class structure,
The remainder of this article includes four sections. The first section reviews the static and longitudinal CDM frameworks. The second section introduces the proposed learning trajectory model based on a multivariate probit model. Note that simulation results on model parameter recovery are presented in Online Appendix A. The authors consider an application in the third section with a data set involving a pre-test/post-test experimental design to evaluate the impact of different kinds of feedback on mathematical skill development. One important finding from this application is that the method provides improved fit to the data in comparison with a longitudinal item response theory (IRT) approach. The third section also shows how to incorporate covariates in the method to evaluate educational interventions. Finally, the authors discuss the implications of the study and offer future research directions.
Cognitive Diagnosis Modeling
The authors first provide an overview of the cross-sectional CDM framework and then outline a more recently developed longitudinal CDM strategy. Note that they provide a brief review of previous research due to space considerations and direct readers to the original papers for specific details about equations and estimation algorithms.
The Cross-Sectional CDM Framework
Under the CDM framework, student skill mastery is represented as a binary latent variable
The psychometric literature on CDMs includes numerous item response functions (IRFs) to relate
In this article, the authors present an application using the DINA model. The DINA is a conjunctive model with ideal responses defined as
Readers are directed to de la Torre (2009) and Culpepper (2015) for additional details about DINA model parameter estimation.
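As a minimal sketch of the conjunctive DINA response function described above, the Python snippet below computes the ideal response and the resulting probability of a correct answer. The function and variable names (`ideal_response`, `dina_prob`, `alpha`, `q`, `slip`, `guess`) are illustrative assumptions rather than the authors' notation, and the snippet follows the standard DINA definition (Junker & Sijtsma, 2001), not the authors' estimation code.

```python
def ideal_response(alpha, q):
    """Return 1 if every skill required by the item (q[k] == 1) is mastered, else 0."""
    return int(all(a >= r for a, r in zip(alpha, q)))


def dina_prob(alpha, q, slip, guess):
    """Probability of a correct response under the DINA model.

    Students with an ideal response of 1 answer correctly unless they slip;
    students with an ideal response of 0 answer correctly only by guessing.
    """
    eta = ideal_response(alpha, q)
    return (1.0 - slip) if eta == 1 else guess


# Hypothetical three-skill example: the item requires Skills 1 and 3.
q_item = [1, 0, 1]
master = [1, 0, 1]      # has both required skills
non_master = [1, 0, 0]  # lacks Skill 3

p_master = dina_prob(master, q_item, slip=0.1, guess=0.2)
p_non_master = dina_prob(non_master, q_item, slip=0.1, guess=0.2)
```

Because the model is conjunctive, lacking any single required skill drops the student to the guessing probability, no matter how many other required skills are mastered.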
An important feature of CDMs is the latent class structure, which is denoted by the
Note that the number of elements in
The Longitudinal CDM Framework
The previous section outlined the classical CDM framework where students’ skill profiles are fixed. The authors next discuss an extension to longitudinal cases, which are becoming more popular (Chen et al., 2018; Kaya & Leite, 2017; Li et al., 2016; Madison & Bradshaw, 2017; Wang et al., 2017; Ye et al., 2016) given the utility such models offer for classroom assessments where students learn.
We let
As discussed in the previous section, the IRF specifies the relationship between
The high-dimensional nature of the learning trajectory space is apparent in the data application below. In the application,
The authors next outline a multivariate probit model for learning trajectories and then report results from an application that estimated just 68 parameters. Although the application only concerns two time points, the model may be extended to any number of time points and provides a flexible framework for incorporating covariates and modeling the dependence of multiple attributes.
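The contrast in dimensionality can be illustrated with a short calculation. The sketch below counts the possible learning trajectories for the eight skills listed in the table notes and two time points, and then shows one decomposition that reproduces the reported 68 parameters. That decomposition (five regression coefficients per skill plus the distinct correlations among eight skills) is an assumption inferred from the description of the application, not a breakdown stated by the authors.

```python
def n_trajectories(n_skills, n_times):
    """Number of distinct binary skill-profile trajectories: 2^(K * T)."""
    return 2 ** (n_skills * n_times)


def n_probit_params(n_skills, n_coef_per_skill):
    """Regression coefficients plus distinct off-diagonal correlations."""
    n_regression = n_skills * n_coef_per_skill
    n_correlations = n_skills * (n_skills - 1) // 2
    return n_regression + n_correlations


K = 8  # skills listed in the table notes
T = 2  # pre-test and post-test

trajectory_classes = n_trajectories(K, T)  # 65,536 possible trajectories

# Assumed breakdown: per skill, a pre-test intercept, a post-test intercept,
# and three condition effects (5 coefficients), plus 28 correlations.
probit_total = n_probit_params(K, n_coef_per_skill=5)  # 40 + 28 = 68
```

An unstructured model would need a probability for each of the 65,536 trajectories, whereas the probit parameterization grows linearly in the number of covariates and quadratically in the number of skills.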
Multivariate Probit Model for Learning Trajectories
In this section, the authors introduce a model for describing student changes in skill profiles. In particular, the model for
where, for brevity, we let
There are many options for the density for
Bayesian Model Formulation
The authors next describe the Bayesian formulation for learning trajectories using the multivariate probit model, which builds upon the work of Lawrence et al. (2008). Parameter estimation for the multivariate probit model is notoriously challenging and one strategy is to augment the prior in Equation 2 by introducing additional latent variables. Specifically, the model for
where
We obtain the prior in Equation 2 from Equation 5 by integrating out
The multivariate probit model is not identified with unrestricted
where
It is important to note that the proposed Bayesian formulation offers a tractable Gibbs sampling algorithm for approximating the parameter posterior distribution. Readers are directed to Online Appendix A for additional technical details regarding the full conditional distributions for
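To make the augmented-data idea concrete, the following sketch simulates binary attributes by thresholding correlated latent normal variables at zero, which is the usual multivariate probit construction. It is not the authors' Gibbs sampler. The names (`cholesky`, `draw_attributes`, `mean`, `corr`) and the numerical values are hypothetical, and the latent covariance is taken to be a correlation matrix, in line with the identification restriction discussed above.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def draw_attributes(mean, lower, rng):
    """Draw latent normals with the given mean and factor, then threshold at zero."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    latent = [
        m + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i, m in enumerate(mean)
    ]
    return [int(value > 0.0) for value in latent]


rng = random.Random(1)
mean = [0.5, -0.5]                # hypothetical linear predictors for two skills
corr = [[1.0, 0.3], [0.3, 1.0]]   # hypothetical latent correlation matrix
lower = cholesky(corr)

draws = [draw_attributes(mean, lower, rng) for _ in range(5000)]
rates = [sum(d[k] for d in draws) / len(draws) for k in range(2)]
# With unit latent variances, each rate should be close to the standard
# normal CDF evaluated at the corresponding mean.
```

Under this construction the marginal mastery probability of each skill depends only on its linear predictor, while the correlation matrix governs how mastery of different skills co-occurs.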
Application to an Educational Intervention Study
In this section, the authors apply the multivariate probit model to data from the Adaptive Content With Evidence-Based Diagnosis (ACED) evaluation study (Shute et al., 2008). They first provide an overview of the ACED data and then describe the implementation of the modeling approach. The standard approach for modeling longitudinal data is to model changes in a continuous latent variable over time. They compare the longitudinal CDM framework with a longitudinal, two-parameter IRT model to evaluate the relative fit of the two modeling frameworks. They conclude this section with a discussion of results.
Overview of ACED Data
In the ACED evaluation, Shute et al. (2008) used a pre-test–treatment–post-test design to assess the effect of different training strategies. Students first completed a pre-test consisting of 25 test items and were then randomly assigned to one of four conditions to receive a 1-hr practice intervention. The control group (Condition 4, N4 = 55) received content irrelevant to math, while the other three intervention groups received practice related to test items. Conditions 1 (N1 = 71) and 2 (N2 = 75) were adaptive conditions in which practice tasks were presented to students based on their solution history, while the linear condition (Condition 3, N3 = 67) presented tasks in a predetermined order. The feedback on practice tasks in Conditions 1 and 3 consisted of verification of correctness plus an explanation, while respondents randomly assigned to Condition 2 only received verification as to the correctness of solutions and were not provided additional explanation. After the 1-hr period, students took a post-test of 25 items whose required skills, difficulty, and format were matched with the pre-test.
The data set contains responses to
Q-Matrix for ACED Pre-Test/Post-Test and the Estimated Item Parameters (EST) and Standard Errors (SE) for Forms A and B.
Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns. ACED = Adaptive Content With Evidence-Based Diagnosis; EST = Estimated Item Parameters.
Multivariate Probit Implementation
The authors implement the multivariate probit learning trajectory model for the ACED data by introducing three dummy variables to distinguish the four experimental conditions. Let
The design matrix for subject
where
Finally, the authors fixed the underlying correlation matrix over time so that
The simulation study in Online Appendix B mimics the setup of the real data, and the results suggest the algorithm converges after 2,000 iterations; therefore, the authors estimated the model parameters using MCMC with a chain of length 10,000 and a burn-in of 5,000. They implemented the MCMC algorithm in C++ and R (Eddelbuettel et al., 2011). They estimated model parameters (i.e.,
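The dummy coding described in this subsection might be laid out as in the sketch below. The layout is an assumption: it supposes that each skill has one intercept at pre-test, one intercept at post-test, and three condition effects that enter only at post-test, with the control group (Condition 4) as the reference category. The names `design_rows` and `linear_predictor` and the coefficient values are hypothetical.

```python
def design_rows(condition):
    """Return (pre-test row, post-test row) for a student in the given condition.

    Columns (assumed): pre-test intercept, post-test intercept,
    Condition 1 dummy, Condition 2 dummy, Condition 3 dummy.
    Condition 4 (control) is the assumed reference group.
    """
    dummies = [int(condition == c) for c in (1, 2, 3)]
    pre = [1, 0, 0, 0, 0]
    post = [0, 1] + dummies
    return pre, post


def linear_predictor(row, beta):
    """Inner product of a design row with one skill's coefficient vector."""
    return sum(x * b for x, b in zip(row, beta))


# Hypothetical coefficients for one skill (not estimates from the article).
beta_skill = [-0.2, 0.1, 0.6, 0.3, 0.4]

pre_row, post_row = design_rows(condition=2)
pre_mean = linear_predictor(pre_row, beta_skill)    # pre-test intercept only
post_mean = linear_predictor(post_row, beta_skill)  # post intercept + Condition 2 effect
```

Under this coding, a condition coefficient is read as the shift in the post-test latent mean for that condition relative to the control group.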
Results
Model fit and comparison with two-parameter, longitudinal IRT
Longitudinal models with continuous variables are widely used in education and it is necessary to evaluate the extent to which the proposed method improves upon existing strategies. The authors compare their method with a traditional two-parameter, longitudinal IRT model (e.g., see Albert, 1992) and assess relative model fit using the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002). Note that the Bayesian formulation for the longitudinal IRT model is summarized in Online Appendix C. They compute the marginal DIC for their multivariate probit model and compare it with the marginal DIC for the longitudinal IRT model. Their method has a DIC value of 14,995.5, which is smaller than the DIC value of 16,624.5 for the longitudinal IRT model. Accordingly, they next discuss the parameter estimates for their model given the relative improvement in fit over the IRT model.
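For readers unfamiliar with the criterion, the sketch below shows the generic DIC computation of Spiegelhalter et al. (2002): the posterior mean deviance plus an effective number of parameters. The function name `dic` and the toy deviance values are illustrative assumptions. How the deviance itself is evaluated (here, marginally over the latent variables) is model specific and is not shown.

```python
from statistics import fmean


def dic(deviance_draws, deviance_at_posterior_mean):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at the posterior mean."""
    mean_deviance = fmean(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d


# Toy example with made-up deviance values.
toy_dic = dic([100.0, 104.0, 102.0], deviance_at_posterior_mean=98.0)

# Difference between the two DIC values reported in the text.
reported_gap = 16624.5 - 14995.5
```

Smaller DIC values indicate better expected fit after penalizing complexity, so the reported gap of 1,629.0 favors the multivariate probit model.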
Multivariate probit model parameter estimates
The last four columns in Table 1 report the estimated item parameters for the DINA model. Each point estimate is the sample posterior mean, and each standard error is the sample posterior standard deviation. The estimates show a consistent pattern in item parameters for Forms A and B. For example, Test Items 5, 9, 13, 17, 18, 21, 22, and 25 in both forms have estimated slipping parameters higher than 0.5 and small guessing parameters, indicating these might be difficult questions; Test Items 11 and 20 in both forms have estimated guessing parameters greater than 0.4. Also, to address the comparability of forms, the authors performed signed-rank tests on the slipping and guessing parameters to assess whether the parameters differed between the two forms. The p values for the slipping and guessing parameters were .56 and .89, respectively, which provides no evidence that the paired estimates differ between the two forms.
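The posterior summaries described here can be sketched as follows: discard the burn-in draws, then take the mean and standard deviation of what remains. The function name `summarize_chain` and the simulated chain are hypothetical. Only the chain length of 10,000 and the burn-in of 5,000 come from the text.

```python
import random
from statistics import fmean, stdev


def summarize_chain(draws, burn_in):
    """Return (posterior mean, posterior SD) after discarding the burn-in draws."""
    kept = draws[burn_in:]
    return fmean(kept), stdev(kept)


# Simulated stand-in for the MCMC draws of one slipping parameter.
rng = random.Random(7)
chain = [rng.gauss(0.3, 0.05) for _ in range(10_000)]

estimate, standard_error = summarize_chain(chain, burn_in=5_000)
```

The same summary applies to any scalar parameter in the chain, such as a guessing parameter or a regression coefficient.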
Table 2 reports the estimated parameters of
Estimated
Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns. ACED = Adaptive Content With Evidence-Based Diagnosis; Int. = intercept; Cond. = condition.
Several additional observations are available from the results in Table 2. For Skills 1, 2, 3, and 8 whose
Table 3 reports the estimated correlation matrix,
Estimated Correlation Matrix
Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns.
The authors next summarize posterior skill mastery rates at pre- and post-test by experimental condition to provide insight into the magnitude of change. Table 4 shows the change in the estimated mastery rate of each skill for the four conditions. Overall, the estimated mastery rates are consistent with the estimated parameters in Table 2 and with prior investigations of the ACED data. Namely, Shute et al. (2008) concluded that Group 1 performed best on the post-test, followed by Groups 2, 3, and 4. The method provides additional fine-grained insight about skill acquisition. For example, Group 3 increased the most on mastery of Skill 4 (extend) and only Group 1 increased on mastery of Skill 8 (visual). For Skill 3, the results suggest that students may forget what they have mastered when no training is involved, which is consistent with Shute et al.’s (2008) result that the test scores slightly decreased in Groups 3 and 4.
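One way such mastery rates could be computed from MCMC output is sketched below: average the sampled mastery indicators over draws and over the students in each condition. The data layout and the name `mastery_rates` are assumptions for illustration, and the tiny input is made up. The authors' exact computation is not described in this section.

```python
from collections import defaultdict


def mastery_rates(draws, conditions):
    """Average sampled 0/1 mastery indicators by condition.

    draws[m][i] is the sampled indicator for student i in MCMC draw m;
    conditions[i] is that student's experimental condition.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for draw in draws:
        for indicator, condition in zip(draw, conditions):
            totals[condition] += indicator
            counts[condition] += 1
    return {condition: totals[condition] / counts[condition] for condition in counts}


# Made-up example: two draws, three students (two in Condition 1, one in Condition 4).
toy_draws = [[1, 0, 1], [1, 1, 0]]
toy_conditions = [1, 1, 4]
rates_by_condition = mastery_rates(toy_draws, toy_conditions)
```

Applying the function separately to pre-test and post-test indicators for each skill would give a table of the kind summarized above.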
Estimated Mastery Rate for Each Skill by Experimental Condition.
Note. The skills are defined as follows: (1) identify geometric sequence, (2) find the common ratio of sequence, (3) generate algebraic expressions for the nth term, (4) extend the geometric sequence with starting terms and common ratio, (5) generate a geometric sequence for the given situation, (6) interpret the table representing a geometric sequence, (7) generate a verbal rule for geometric sequence, and (8) use geometric sequence to model visual patterns.
Discussion
The authors proposed a new framework for modeling the high-dimensional learning trajectory space. In particular, they extended the multivariate probit model to longitudinal contexts and provided several examples of how the model can be applied to describe changes in skill acquisition. In this final section, they highlight important implications from the study, offer future research directions, and provide concluding remarks.
Implications for Research
Several studies proposed longitudinal CDM approaches, and the method improves upon prior research in several ways. First, the method does not explicitly assume that learning trajectories are non-decreasing (e.g., see Chen et al., 2018; Wang et al., 2017). It may be the case that students move between the mastery and non-mastery states during the learning process and the method provides a framework for tracking such changes. However, there are instances where it may be reasonable to impose the condition that skills cannot be unlearned. In such cases, the authors can incorporate the non-decreasing assumption in their framework by restricting the support for the attributes and augmented data. That is, they can easily enforce conditions such as
The utility of the framework, and longitudinal CDMs more generally, will be determined by its ability to support practitioners and researchers in addressing fundamental theoretical questions about factors that promote learning. The authors provide some evidence of the value of their framework through their application. Specifically, the results of the ACED data application provided evidence about which feedback interventions promoted skill acquisition under the pre-test/post-test design. In particular, the authors found evidence that Condition 1 was beneficial across skills, whereas Conditions 2 and 3 were more beneficial for specific skills. Their application results could be disseminated to practitioners to provide specific recommendations as to which types of feedback promote learning of which skills in an effort to create student-tailored instructional interventions.
The application also provided an example of the relative merits of longitudinal CDMs in comparison with IRT models that use continuous latent variables. The authors found evidence that their model improved upon the fit of a pre-test/post-test two-parameter IRT model. The difference in model performance may be expected a priori given that the ACED data were originally designed within a cognitive diagnosis assessment framework, so an application of IRT to the ACED data is retrofitting (Haberman & von Davier, 2006). Furthermore, the relative fit of a CDM versus an IRT model may be related to the attribute structure. The authors found evidence that the attributes were mostly uncorrelated in the ACED data. In contrast, an IRT model may be better suited for circumstances with correlated attributes following a higher order factor structure.
Limitations and Future Directions
There are several directions for future research. First, the authors employed the DINA measurement model with the multivariate probit model for learning trajectories. The DINA model is a relatively parsimonious CDM that may not apply in all applications. Future research should consider applications of the learning model with more general CDMs, such as the generalized DINA model, the log-linear cognitive diagnosis model, or the general diagnostic model (de la Torre, 2011; Henson et al., 2009; von Davier, 2008). Second, computerized assessments provide opportunities to collect additional information about students that may improve the accuracy of classifications or offer insight about the process of student learning. Future research should incorporate other ancillary information, such as response times, to characterize student performance. Third, additional research is needed to assess minimum sample size requirements for longitudinal CDM studies. Fourth, the authors fixed the attribute tetrachoric correlation matrix over time in their application to reduce the number of estimated parameters. It is possible that the attribute relationships change over time, and future applications should explicitly assess such structural changes.
Fifth, the learning model in their application included an intercept for the pre- and post-test rather than an intercept and linear term as used in latent growth curve models. The authors recommend that future applications of their model with more than two time points consider parameterizations that use unstructured growth curves or polynomials of time, as applied in existing growth curve models. Furthermore, in cases with at least three time points, an extension of the growth curve framework is to model individual differences in the growth curve regression coefficients to account for additional individual differences in how skills evolve. Specifically, students may differ in terms of intercepts and linear rates of growth. The proposed framework can be easily extended by assuming conditional independence of attributes over time given the growth curve random effects.
Sixth, the authors reported changes in the probability of possessing skills for each attribute (e.g.,
Finally, the previously developed HMMs account for dependence over time by relating skills at time
Concluding Remarks
The authors presented a new framework for modeling learning trajectories within the CDM framework that builds upon multivariate growth curve models. This study addresses the opportunities available in the modern classroom where advances in educational technologies provide practitioners and researchers with a wealth of information about student performance. Longitudinal CDMs are designed to provide fine-grained classifications over time to support instructional decisions, and future research must continue to advance methods that support data-driven policy recommendations to improve the condition of education.
Supplemental Material
Online_Appendix – Supplemental material for A Multivariate Probit Model for Learning Trajectories: A Fine-Grained Evaluation of an Educational Intervention
Supplemental material, Online_Appendix for A Multivariate Probit Model for Learning Trajectories: A Fine-Grained Evaluation of an Educational Intervention by Yinghan Chen and Steven Andrew Culpepper in Applied Psychological Measurement
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by National Science Foundation Methodology, Measurement, and Statistics Program Grant #1632023 and Spencer Foundation Grant #201700062.
References
