A deeper knowledge tracking model integrating cognitive theory and learning behavior

Abstract

Knowledge tracing (KT), which aims to trace the human knowledge learning process with machines, has been widely applied in online learning systems. It dynamically models students’ knowledge states in relation to different learning factors through their learning interactions. Recently, KT has attracted considerable research attention because of the strong performance achieved with deep learning. Although most KT models have shown outstanding results, they have limitations: they either ignore human cognitive laws and learning behavior, or lack the ability to model the knowledge state in greater depth. In this paper, we propose a deeper knowledge tracking model integrating cognitive theory and learning behavior (CLDKT). It unites the advantages of the memory network and the recurrent neural network used in existing deep learning KT models for modeling student learning. To better implement CLDKT, we add a residual network (ResNet) to realize deep modeling of learning behaviors. We conduct extensive experiments on three open benchmark datasets to evaluate our model. The experimental results demonstrate that (I) CLDKT outperforms the state-of-the-art KT models on student performance prediction; (II) CLDKT can model the knowledge state in greater depth owing to the introduction of ResNet; and (III) CLDKT has better interpretability and predictability, which supports the effectiveness of a knowledge tracing model that integrates cognitive law and learning behavior.

Keywords

Knowledge tracing, cognitive law, learning behavior, ResNet, deep learning

1 Introduction

One of the major characteristics of human intelligence is the ability to know one’s true knowledge, which means that humans can track the state of their own knowledge of specific skills or concepts [3]. This enables humans to identify gaps in their knowledge states and personalize their learning experience before going on to work in different industries. With the development of artificial intelligence (AI) in modeling various areas of human cognition [8, 9], researchers want to use AI to trace human knowledge states in online education. This has stimulated research on knowledge tracing (KT), which models a learner’s knowledge mastery state based on the learner’s performance on learning tasks, traces the learner’s mastery level over time, and predicts the probability that the learner will answer the next question correctly.

With the development of online education, online learning platforms such as massive open online courses (MOOCs) [19], intelligent tutoring systems, educational games, and learning management systems are expanding rapidly. These platforms have generated a vast amount of learning process data, which knowledge tracing can use to model students’ learning processes, providing personalized learning for individuals, teaching guidance for educators, and an evaluation basis for administrators. Nonetheless, employing artificial intelligence methods to model the human learning process for knowledge tracing is highly challenging, as human learning is influenced not only by cognitive factors [2] (e.g. memorizing, comprehension, attention, forgetting, and guessing) but also by individual learning ability (i.e. learning efficiency and interest).

Current mainstream knowledge tracing models can be categorized into two groups: traditional machine learning KT models and deep learning KT models. The Bayesian knowledge tracing (BKT) [1] model is the representative traditional machine learning KT model. It uses a Hidden Markov Model (HMM) to model each knowledge skill separately, predicting learners’ mastery of specific knowledge points without considering correlations between them. While these models have made some progress, their oversimplification of the human learning process limits their application in real-world scenarios. Inspired by the impressive performance of deep learning, several deep learning KT models have been developed in recent years. Deep knowledge tracing (DKT) [16], a pioneering model first proposed by Piech et al. in 2015, uses Recurrent Neural Networks (RNN) to model student knowledge states as a sequence of hidden states implicit in the historical learning record. Compared with BKT, DKT greatly improves performance, but it represents a learner’s knowledge state over all knowledge concepts in a single hidden state, which makes it difficult to trace the learner’s level of mastery of a particular concept. To deal with this issue, Dynamic Key-Value Memory Networks (DKVMN) [23] was proposed, which uses dynamic key-value matrices to store knowledge skills and the knowledge mastery state.

However, DKVMN does not take into account the impact of factors such as cognitive laws and learning behaviors on the predicted results.

In this paper, we propose a deeper knowledge tracking model integrating cognitive theory and learning behavior (CLDKT). The main innovations and contributions of this paper are:

First, integrating cognitive theory and learning behavior, the CLDKT model accounts for both learning and forgetting behaviors in the process of knowledge acquisition. The model considers three factors that affect knowledge tracing results: the interval between learners’ repeated exposure to knowledge skills, the interval between sequential learning sessions, and the number of times a learner repeats studying a particular knowledge skill.

Second, CLDKT is a knowledge tracing model built on LSTM and DKVMN. The model includes an attention layer, a forgetting layer, a learning layer, a prediction layer and an output layer. The attention layer adopts the key-value memory of the DKVMN model to trace learners’ knowledge states, the forgetting layer uses ResNet to build a deeper neural network for feature extraction, and the learning layer uses LSTM to mitigate the poor medium- and long-term dependence of DKVMN.

Third, experiments conducted on three publicly available real-world datasets demonstrate that CLDKT is capable of effectively modeling learners’ learning and forgetting behavior, tracking their knowledge mastery level in real-time, and exhibiting superior interpretability and accuracy performance compared to existing models.

The remainder of this paper is organized as follows. Related work is presented in Section 2. Section 3 details our proposed KT model, CLDKT. Section 4 discusses the experimental design and results. We conclude the paper in Section 5.

2 Related work

2.1 Knowledge tracing

Current mainstream knowledge tracing models can be categorized into two groups: traditional machine learning KT models and deep learning KT models [10]. The Bayesian knowledge tracing (BKT) model is the representative traditional machine learning KT model. It uses a Hidden Markov Model (HMM) to model each knowledge skill separately, predicting learners’ mastery of specific knowledge points without considering correlations between them. While these models have made some progress, their oversimplification of the human learning process limits their application in real-world scenarios. Inspired by the impressive performance of deep learning, several deep learning KT models have been developed in recent years. Deep knowledge tracing (DKT), a pioneering model first proposed by Piech et al. in 2015, uses Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) units to model student knowledge states as a sequence of hidden states implicit in the historical learning record. Compared with knowledge tracing models based on traditional machine learning, the deep knowledge tracing model greatly improves performance. In recent years, researchers have continued to innovate in the field of deep knowledge tracing. Zhang et al. proposed DKVMN, which uses dynamic key-value matrices to store knowledge skills and the knowledge mastery state. Building on the DKVMN model, Yeung et al. proposed Deep-IRT (deep learning based knowledge tracing made explainable using item response theory), a deep knowledge tracing model that incorporates item response theory and enhances the interpretability of deep knowledge tracing [21]. Nakagawa et al. applied graph neural networks (GNN) to knowledge tracing for the first time and proposed Graph-based Knowledge Tracing (GKT) [7]. This model moves beyond the traditional linear relationships among knowledge skills and brings them closer to the knowledge skill relationships found in real teaching environments. It converts these complex relationships into a graph structure and combines the knowledge tracing task with the GNN, improving the interpretability of the model through this knowledge skill modeling. Subsequently, Exercise-aware Knowledge Tracing for Student Performance Prediction (EKT) [12] addressed the loss of semantic information about the exercises during training. The model uses a BiLSTM network to mine text-level information and integrates it into the modeling of changes in students’ knowledge level, further improving the interpretability of the model. However, EKT feeds the text information directly into the feature extractor without taking into account the potential hierarchical nature of the exercises, so it introduces additional noise into the text representation. To solve this problem, Tong et al. used a hierarchical graph neural network to infer and aggregate the exercise text [18], which characterizes the exercise more completely within the hierarchical structure and effectively addresses the loss of exercise semantics.

Although many models achieve reasonable accuracy and interpretability, most of them fail to consider the influence of cognitive theory and learning behavior on the learning process. According to the forgetting curve proposed by the German psychologist Ebbinghaus [6] and the later analysis of its form by Averell and Heathcote [11], the degree of knowledge mastery and forgetting behavior in the learning process are closely related. Over time, the degree of knowledge mastery decreases, and repeated learning of the same knowledge skills can reduce the degree of forgetting. Qiu et al. considered forgetting behavior in the knowledge tracing model and added a date label to the BKT model to represent forgetting on a given day after the exercise interaction [17], but the model cannot capture forgetting on a smaller time scale. Subsequently, Khajah et al. further extended the BKT model, using the exercise interaction sequence to estimate the probability of forgetting, but failed to take into account the time elapsed after answering the question. Nagatani et al. were the first to extend the DKT model with three forgetting factors [15], but the model did not establish a deep forgetting model, and its modeling of forgetting behavior was not sufficiently thorough.

2.2 LSTM network

LSTM was proposed to solve the problems of exploding and vanishing gradients in recurrent neural networks (RNN) [22], and is generally used to process sequential data. Fig. 1 shows the internal structure of a single cell in the LSTM network. Compared with a general RNN unit, the LSTM cell adds the state value C_t and is implemented with three gate mechanisms: the forget gate f_t determines which parts of the cell state are retained; the memory (input) gate i_t determines the new information stored in C_t; and the output gate o_t determines the output value of the cell based on C_t.

Fig. 1

LSTM network.
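To make the gate mechanism concrete, the following is a minimal sketch of one LSTM step for a cell with a single input and a single hidden unit. It uses the standard LSTM update; the function name `lstm_step` and the parameter layout `p[gate] = (w_x, w_h, b)` are illustrative assumptions, not the implementation used in this paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (input size 1, hidden size 1).

    p maps each gate name ("f", "i", "g", "o") to a tuple (w_x, w_h, b).
    This parameter layout is a hypothetical choice for illustration.
    """
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))      # forget gate f_t: share of the old cell state kept
    i = sigmoid(pre("i"))      # memory (input) gate i_t: share of new content stored
    g = math.tanh(pre("g"))    # candidate content to write into the cell state
    o = sigmoid(pre("o"))      # output gate o_t
    c = f * c_prev + i * g     # new cell state C_t
    h = o * math.tanh(c)       # cell output based on C_t
    return h, c
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate content to 0, so the step simply halves the previous cell state, which is a convenient sanity check.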

2.3 ResNet

In a Convolutional Neural Network (CNN), network depth has a great impact on network performance. When the number of layers of a CNN reaches a certain depth, network degradation occurs. Studies indicate that this type of degradation is not attributable to overfitting, but rather to difficulties in network optimization. He et al. designed ResNet and introduced a residual learning framework to solve this problem [5].

Let the target mapping of a CNN be H (x), and let the residual mapping learned by ResNet be F (x) = H (x) - x. As shown in Fig. 2, relu is the activation function and identity is the skip (shortcut) connection. Through this skip connection, ResNet transforms the problem of finding the target mapping H (x) into finding the residual mapping F (x) of the network. Therefore, when degradation occurs during deep network training, updating only the weights of F (x) is sufficient to learn better features.

Fig. 2

A residual learning framework.
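The residual idea can be sketched in a few lines. The example below is a simplified illustration that uses two fully connected layers for F (x) instead of convolutions; the helper names `linear` and `residual_block` are assumptions made for this sketch and are not taken from the paper.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(W, b, v):
    # W is a list of rows; returns W v + b
    return [sum(w * a for w, a in zip(row, v)) + b_i for row, b_i in zip(W, b)]


def residual_block(x, W1, b1, W2, b2):
    """Compute relu(F(x) + x) with F(x) = W2 relu(W1 x + b1) + b2.

    Fully connected layers stand in for the convolutions of a real
    ResNet block (a simplifying assumption of this sketch).
    """
    fx = linear(W2, b2, relu(linear(W1, b1, x)))
    # The identity shortcut adds the input back before the final activation.
    return relu([f + a for f, a in zip(fx, x)])
```

When all weights of F are zero, the block reduces to relu(x): the shortcut passes the input through unchanged, which is why adding such blocks should not make a deep network harder to optimize than its shallower counterpart.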

2.4 Cognitive theory and learning behavior

Cognitive theory is a learning theory that explores the rules of learning by studying people’s cognitive processes. Its main perspectives are as follows: humans are the subject of learning and learn actively; the process by which humans acquire information is an exchange involving perception, attention, memory, comprehension and problem-solving; people’s perception of, attention to and understanding of external information are selective; and learning quality hinges on outcomes. The learning curve theory and the forgetting curve theory are two important theories in education studies, which provide the basic ideas for modeling students’ knowledge mastery. The learning curve theory argues that students gain knowledge through repeated trials or exercises [13], while the forgetting curve theory suggests that students’ memory of what they have learned decays, so that their knowledge proficiency follows a declining curve. These theories show that knowledge proficiency is closely related to students’ individual learning behavior (the repeated time interval, the number of repeated learning times, the time interval between any two learning moments, and so on).

3 The CLDKT model

3.1 Problem formulation

In an intelligent tutoring system, suppose the set of students is S = {s₁, s₂, ⋯ s_i ⋯ s_t} and a student’s exercise interaction sequence is χ = {(q_i, a_i) ∣ i = 1, 2, ⋯ , t - 1}, where the tuple (q_i, a_i) is an exercise interaction cell, q_i is the exercise, and a_i is a binary variable: a_i = 1 means the exercise was answered correctly and a_i = 0 means it was answered incorrectly. Suppose the set of exercises is Q, which covers N knowledge concepts, and the knowledge concepts are stored in the memory matrix M^k (N × d_k), where d_k is the embedding size of each slot of M^k. The degree of mastery of these knowledge concepts is stored in the memory matrix $M_{t}^{v} (N \times d_{v})$ , where d_v is the embedding size of each slot of $M_{t}^{v}$ .

3.2 CLDKT framework

In this section, the CLDKT model is detailed. As shown in Fig. 3, the CLDKT model includes five layers: the attention layer, deep forgetting layer, LSTM learning layer, prediction layer and output layer. In the CLDKT model, we use cognitive theory and learning behavior to model the forgetting layer, introduce the ResNet framework to extract deeper features of the forgetting factors, and use LSTM to train the model and update the learner’s knowledge state.

Fig. 3

CLDKT model framework.

3.3 Attention layer

At any timestamp t, we define the embedding matrix A (Q × d_k). A and q_t are multiplied to obtain a d_k-dimensional embedding vector k_t. The inner product of k_t with each memory slot M^k (i) of the knowledge skill storage matrix is then computed, and the attention weight ω_t is calculated through the softmax activation function of the fully connected layer, as shown in Equation (1). $ω_{t} (i) = softmax (k_{t}^{T} M^{k} (i))$ (1) where $softmax (z_{i}) = e^{z_{i}} / \sum_{j} e^{z_{j}}$ , $\sum_{i = 1}^{N} ω_{t} (i) = 1$ , ω_t (i) is the i-th entry of the weight vector ω_t, and M^k (i) is the i-th slot vector of M^k.
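As an illustration of Equation (1), the sketch below computes the attention weights for a given exercise embedding and key memory using plain Python lists. The function names and the toy values in the usage note are assumptions for illustration only.

```python
import math


def softmax(z):
    m = max(z)                              # subtract the maximum for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def attention_weights(k_t, M_k):
    """Return w_t, where w_t(i) = softmax(k_t^T M^k(i)) over the N memory slots.

    k_t: exercise embedding of length d_k.
    M_k: key memory as a list of N slot vectors, each of length d_k.
    """
    scores = [sum(a * b for a, b in zip(k_t, slot)) for slot in M_k]
    return softmax(scores)
```

For example, `attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` gives scores 1 and 0, so the first slot receives the larger weight and the two weights sum to 1.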

3.3.1 Deep forgetting layer

Studies have shown that learning and forgetting behaviors in the learning process depend on the number of times the learner has studied and the time elapsed since the last study. Therefore, three factors that affect knowledge tracing results are selected in this article, namely:

RT (Repeated time interval): for the same knowledge skill, the time interval between the current study and the previous study of that skill.

ST (Sequence time interval): the time interval between two adjacent learning moments, regardless of which knowledge skills are studied.

LT (Repeated learn times): the number of times the learner has answered questions on the same knowledge skill in the historical learning record.

Combining the three scalars RT, ST and LT gives the forgetting vector F_t (i) = [RT_t (i) , ST_t (i) , LT_t (i)]. The vectors F_t (i) for all knowledge skills form the matrix F_t (d_f × N).

As a branch of the CNN family, the ResNet model can build a deeper network and perform multi-level abstraction, which is advantageous when learning complex input-output relationships. To improve the modeling of forgetting behavior, we select the ResNet-12 network to further process the forgetting vector: the forgetting vector is fed into the ResNet-12 network for training and feature extraction. The network structure of ResNet-12 is shown in Fig. 4.
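The three factors can be derived from a time-stamped interaction log. The sketch below is one possible reading of the definitions above: it returns (RT, ST, LT) for the skill practiced at each step, and it assumes that RT and ST are 0 when there is no earlier interaction to compare against. The function name and the log format are hypothetical.

```python
def forgetting_features(interactions):
    """Compute (RT, ST, LT) for each interaction.

    interactions: list of (skill_id, timestamp) pairs sorted by timestamp.
    RT: time since the same skill was last practiced (0 on first exposure, by assumption).
    ST: time since the previous interaction of any skill (0 for the first one, by assumption).
    LT: number of earlier interactions on the same skill.
    """
    last_seen = {}      # skill_id -> timestamp of its latest interaction
    counts = {}         # skill_id -> number of interactions so far
    prev_time = None
    features = []
    for skill, t in interactions:
        rt = t - last_seen[skill] if skill in last_seen else 0
        st = t - prev_time if prev_time is not None else 0
        lt = counts.get(skill, 0)
        features.append((rt, st, lt))
        last_seen[skill] = t
        counts[skill] = lt + 1
        prev_time = t
    return features
```

For the log `[("a", 0), ("b", 10), ("a", 25), ("a", 30)]`, the third interaction yields RT = 25 (skill "a" was last seen at time 0), ST = 15 (the previous interaction was at time 10) and LT = 1.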

Fig. 4

The ResNet-12 network.

After training the ResNet-12 network, we obtain the extracted forgetting feature vector ${\tilde{F}}_{t}$ : ${\tilde{F}}_{t} = ResNet (F_{t})$ (2)

Figure 5 shows the deep forgetting layer. We concatenate ${\tilde{F}}_{t}$ with the answer embedding vector v_t to obtain $v_{t}^{F} = [v_{t}, {\tilde{F}}_{t}]$ .

Fig. 5

The deep forgetting layer.

The answer embedding vector v_t is computed as the product of the answer interaction (q_t, a_t) and the knowledge skill embedding matrix B (2Q × d_v). $v_{t}^{F}$ is then used to calculate the memory erasure vector e_t and the memory update vector a_t. $e_{t} = Sigmoid (W_{e}^{T} v_{t}^{F} + b_{e})$ (3) $a_{t} = \tanh (W_{a}^{T} v_{t}^{F} + b_{a})$ (4) where $W_{e}^{T}$ and $W_{a}^{T}$ are the weight matrices, and b_e and b_a are the bias vectors.

The memory erasure vector e_t and the memory update vector a_t are then used to update the current knowledge mastery state matrix $M_{t}^{v} (i)$ ; after forgetting processing, the intermediate knowledge state matrix ${\tilde{M}}_{t + 1}^{v} (i)$ is obtained. ${\tilde{M}}_{t + 1}^{v} (i) = M_{t}^{v} (i) [1 - ω_{t} (i) e_{t}] + ω_{t} (i) a_{t}$ (5)
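Equation (5) is an erase-then-add update applied to each memory slot, with the products taken element-wise. A minimal sketch with plain Python lists follows; the function name `update_memory` and the toy values in the note are illustrative assumptions.

```python
def update_memory(M_v, w, e, a):
    """Apply Equation (5) to every slot i:

        M~(i) = M(i) * (1 - w(i) * e) + w(i) * a   (element-wise)

    M_v: value memory, a list of N slot vectors of length d_v.
    w:   attention weights, length N.
    e:   erase vector with entries in [0, 1], length d_v.
    a:   update (add) vector, length d_v.
    """
    return [
        [m * (1.0 - w_i * e_j) + w_i * a_j for m, e_j, a_j in zip(slot, e, a)]
        for slot, w_i in zip(M_v, w)
    ]
```

A slot with attention weight 0 is left unchanged, while a slot with weight 1 has each dimension scaled by (1 - e) before the update vector is added. With `M_v = [[1.0, 1.0], [2.0, 2.0]]`, `w = [1.0, 0.0]`, `e = [1.0, 0.0]` and `a = [0.5, 0.5]`, the result is `[[0.5, 1.5], [2.0, 2.0]]`.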

3.3.2 LSTM learning layer

Since the knowledge mastery state in the DKVMN model is obtained from recent exercise interactions, its long-term dependence is poor. To deal with this problem, LSTM is introduced to improve the original model. The intermediate state ${\tilde{M}}_{t + 1}^{v} (i)$ is used as the hidden state, the answer embedding vector v_t as the input value, and the attention weight ω_t as the cell state value. The LSTM network outputs the knowledge mastery state after learning. $M_{t + 1}^{v} (i) = LSTM (v_{t}, ω_{t} (i) {\tilde{M}}_{t + 1}^{v} (i); β)$ (6)

3.3.3 Prediction layer

The reading correlation vector r_t is calculated from the attention weight ω_t and the current knowledge mastery state matrix $M_{t}^{v}$ , and can be regarded as the current knowledge mastery level. r_t and the embedding vector k_t are then concatenated, and the summary vector f_t is calculated through a fully connected layer with the tanh activation function; f_t summarizes the current mastery level and the difficulty of the previous exercises. Finally, f_t is used to calculate the probability p_t that the student will correctly answer the question q_t. The calculation process is as follows. $r_{t} = \sum_{i = 1}^{N} ω_{t} (i) M_{t}^{v} (i)$ (7) $f_{t} = \tanh (W_{1}^{T} [r_{t}, k_{t}] + b_{1})$ (8) $p_{t} = Sigmoid (W_{2}^{T} f_{t} + b_{2})$ (9) where $W_{1}^{T}$ and $W_{2}^{T}$ are the weight matrices, and b₁ and b₂ are the bias vectors.
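The prediction step in Equations (7) to (9) can be sketched as a weighted read followed by two small fully connected layers. In the sketch below the weight matrices are stored row-wise (that is, already transposed), the bias is added after the matrix product, and all names are illustrative assumptions rather than the authors’ code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def read_vector(w, M_v):
    """Equation (7): r_t = sum_i w(i) * M_v(i)."""
    d_v = len(M_v[0])
    return [sum(w_i * slot[j] for w_i, slot in zip(w, M_v)) for j in range(d_v)]


def dense(W, b, v, act):
    # One fully connected layer; each row of W yields one output unit.
    return [act(sum(x * y for x, y in zip(row, v)) + b_i) for row, b_i in zip(W, b)]


def predict(w, M_v, k_t, W1, b1, W2, b2):
    """Return p_t, the predicted probability of a correct answer."""
    r_t = read_vector(w, M_v)
    f_t = dense(W1, b1, r_t + k_t, math.tanh)   # Equation (8) on the concatenation [r_t, k_t]
    return dense(W2, b2, f_t, sigmoid)[0]       # Equation (9), scalar output
```

With all weights and biases set to zero the prediction is 0.5, the uninformed value, which is a quick check that the layers are wired as intended.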

3.3.4 Output layer

The output layer aims to compute the students’ knowledge mastery state. After the LSTM learning layer outputs $M_{t}^{v}$ , to estimate the mastery of the i-th knowledge skill we construct a weight vector γ_i = (0, ⋯ , 1, ⋯ , 0), in which the value of the i-th dimension is equal to 1. $M_{t}^{v} (i) = γ_{i} M_{t}^{v}$ (10) $y_{t} (i) = \tanh (W_{1}^{T} [M_{t}^{v} (i), 0] + b_{1})$ (11) ${value}_{t} (i) = Sigmoid (W_{2}^{T} y_{t} (i) + b_{2})$ (12) where the vector 0 = (0, 0, ⋯ , 0) has the same dimension as the exercise embedding k_t and is used to pad the input to the required dimension. The parameters $W_{1}^{T}$ , $W_{2}^{T}$ , b₁ and b₂ are identical to those in Equations (8) and (9). The mastery degree of each knowledge skill in the knowledge space is calculated in turn, and the vector of student knowledge mastery degrees value_t is obtained.

3.3.5 Objective function

To learn all parameters in CLDKT, we choose the cross-entropy loss between the prediction p_t and the actual answer r_t as the objective function: $L = - \sum_{t} (r_{t} \log p_{t} + (1 - r_{t}) \log (1 - p_{t}))$ (13)
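The per-skill mastery readout can be sketched as follows. The sketch assumes that the slot selected by γ_i is concatenated with a zero vector of the same length as k_t before the layers of Equations (8) and (9) are reused, as the description of the vector 0 suggests; the function name and the row-wise weight layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mastery_levels(M_v, d_k, W1, b1, W2, b2):
    """Return value_t, one mastery estimate per knowledge skill.

    M_v: value memory with N slots of length d_v.
    d_k: length of the exercise embedding, used for zero padding.
    W1:  rows of length d_v + d_k; W2: a single row matching len(W1).
    """
    levels = []
    for slot in M_v:                          # gamma_i M_v selects slot i
        padded = list(slot) + [0.0] * d_k     # [M_v(i), 0]
        y = [math.tanh(sum(w * x for w, x in zip(row, padded)) + b)
             for row, b in zip(W1, b1)]
        z = sum(w * x for w, x in zip(W2[0], y)) + b2[0]
        levels.append(sigmoid(z))
    return levels
```

Because the exercise part of the input is zeroed out, each output depends only on the stored state of the corresponding skill, which is what allows it to be read as a mastery level rather than as an answer prediction for a specific exercise.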
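For reference, the cross-entropy objective in Equation (13) can be computed as in the sketch below. The clipping constant `eps` is an addition of this sketch to avoid taking log(0) and is not part of the formula; the function name is illustrative.

```python
import math


def cross_entropy(targets, preds, eps=1e-12):
    """Sum over steps of -(y log p + (1 - y) log(1 - p)).

    targets: actual answers (0 or 1); preds: predicted probabilities p_t.
    eps clips predictions away from 0 and 1 (an assumption of this sketch).
    """
    total = 0.0
    for y, p in zip(targets, preds):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

An uninformed prediction of 0.5 at every step costs log 2 per interaction, and the loss decreases as the predicted probabilities move toward the actual answers.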

4 Experiment

4.1 Experiment setting

4.1.1 Benchmark datasets

The experiments were conducted on three public real-world datasets: ASSIST2009, ASSIST2015 and Statics2011. Table 1 lists, for each dataset, the number of users, the number of knowledge concepts (KC), the number of interaction records and the proportion of correct answers. The ASSIST2009 dataset was collected from the ASSISTments online tutoring system, created in 2004, in which users need to master problem sets through similar exercises. The ASSIST2015 dataset was likewise collected from the ASSISTments online tutoring system. The Statics2011 dataset comes from a university engineering statics course and combines exercise names and step names as knowledge concepts.

Table 1
Experimental datasets statistics

Statistical items	ASSIST2009	ASSIST2015	Statics2011
Users	4151	19840	333
Number of KC	110	100	156
Number of interaction records	325637	683801	189297
Correct answers	68.0%	73.2%	77.7%

The experiment uses 5-fold cross-validation to evaluate the training of the knowledge tracing model. In each dataset, 80% of the data is used as the training and validation set, and the other 20% is used for testing. The maximum length of an input sequence is set to 200 items. Users with fewer than 5 interaction records are removed from each dataset.

For ease of calculation, the state matrix dimensions d_k and d_v are set to the same value. In the model, $M_{t}^{v}$ is randomly initialized. The learning rate is set to 0.001, the batch size is set to 32, and the number of training epochs is set to 200. Input sequences have a fixed length of 200; sequences shorter than 200 steps are padded with 0.
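The preprocessing described above can be sketched as follows. The sketch assumes that sequences longer than the limit are truncated to their first 200 items (the text does not say whether longer sequences are truncated or split) and that real interaction ids start at 1 so that 0 is free to serve as padding. The function name and the input format are hypothetical.

```python
def prepare_sequences(user_logs, max_len=200, min_records=5, pad=0):
    """Filter, truncate and pad per-user interaction sequences.

    user_logs: dict mapping a user id to a list of interaction ids.
    Users with fewer than min_records interactions are dropped.
    Longer sequences are cut to max_len (an assumption of this sketch);
    shorter ones are right-padded with `pad`.
    """
    prepared = {}
    for user, seq in user_logs.items():
        if len(seq) < min_records:
            continue
        seq = seq[:max_len]
        prepared[user] = seq + [pad] * (max_len - len(seq))
    return prepared
```

With `max_len=6`, a user with eight interactions keeps the first six, a user with five interactions gets one padding value appended, and a user with two interactions is removed.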

4.1.2 Evaluation criteria

The experiments use the area under the ROC curve (AUC) as the evaluation metric. The higher the AUC value, the higher the prediction accuracy of the model. Five benchmark models were selected for comparative experiments.

DKT model: trains a recurrent neural network on the exercise interaction sequence, introducing deep learning into knowledge tracing for the first time.

DKVMN model: uses a key-value memory network to store knowledge concepts and the knowledge mastery state, and then predicts the learner’s answer based on that mastery state.

DKT-forgetful model: based on the DKT model, embeds three forgetting factors in the input data and uses hidden states to represent the learner’s knowledge mastery state.

AKT model: a context-aware knowledge tracing model that represents learners’ performance during the learning process through a monotonic attention mechanism and can capture individual differences between learners [4].

GIKT model: a graph-based interaction knowledge tracing model that uses embedding propagation to represent the correlation between knowledge points and exercises [20].
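The AUC metric used to compare these models can be computed directly from its pairwise definition: the probability that a randomly chosen correct answer receives a higher predicted score than a randomly chosen incorrect one, with ties counted as one half. The sketch below is a simple quadratic-time illustration, not the evaluation code used in the experiments.

```python
def auc(labels, scores):
    """Pairwise AUC for binary labels (1 = correct, 0 = incorrect)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.3, 0.1]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.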

4.2 Experimental results and analysis

4.2.1 Experimental results

The experimental results are shown in Table 2, which reports the AUC obtained by each model trained on each dataset. The best results are highlighted in bold. The following can be observed from Table 2:

Table 2
Prediction results of different models on knowledge tracing

Model	ASSIST2009	ASSIST2015	Statics2011
DKT	0.772	0.710	0.775
DKT-forgetful	0.779	0.719	0.794
DKVMN	0.794	0.712	0.789
AKT	0.795	0.726	0.783
GIKT	0.789	0.732	0.795
CLDKT(ours)	0.831	0.743	0.836

(1) On the three datasets used in the experiments, the knowledge tracing model with deep forgetting modeling outperforms the other comparative models. Compared with the DKVMN model, the AUC of the CLDKT model on the ASSIST2009, ASSIST2015 and Statics2011 datasets improves by 0.037, 0.031 and 0.047 (3.7, 3.1 and 4.7 percentage points), respectively, indicating the effectiveness of deep forgetting modeling and of introducing LSTM to improve the network model.

(2) Among the comparative models, the AUC of the DKT-forgetful model is higher than that of the DKT model on all three datasets, indicating that modeling forgetting behavior in deep knowledge tracing improves the accuracy of the model.

(3) On the three datasets, the AUC of the CLDKT model is higher than that of DKT-forgetful, whereas the AUC of the DKVMN model is higher than that of DKT-forgetful only on the ASSIST2009 dataset, again indicating that modeling forgetting behavior improves the accuracy of the model.

4.2.2 Comparison of training details

In order to further verify the performance of the CLDKT model, the DKT model was selected as the comparison model, and the AUC change values of the CLDKT model and the DKT model were visualized during the training process on the experimental datasets ASSIST2009, ASSIST2015, and Statics2011 as shown in Fig. 6.

Fig. 6

The training AUC of DKT and CLDKT on all datasets.

On the ASSIST2009 dataset, the DKT model reached its optimal value at 803 iterations, and the CLDKT model reached its optimal value at 330 iterations. On the ASSIST2015 dataset, the DKT model reached its optimal value at 204 iterations, and the CLDKT model at 54 iterations. On the Statics2011 dataset, the DKT model reached its optimal value at 405 iterations, and the CLDKT model at 230 iterations. Compared with the DKT model, the CLDKT model needs fewer iterations to reach its optimal value on the same dataset and achieves a higher AUC, indicating that it trains faster than the DKT model.

4.2.3 Ablation experiment

To further analyze the effectiveness of the model and its components, this paper conducted an ablation experiment comparing the model with its variants. The variant models are as follows.

CLDKT-F model: to verify the effectiveness of the deep forgetting modeling, the ResNet feature extraction of the forgetting factors is removed, and the forgetting vector F_t is concatenated directly with the answer embedding vector v_t, which degenerates the model into a knowledge tracing model with ordinary forgetting modeling.

CLDKT-L model: to verify the effectiveness of forgetting modeling, the forgetting modeling part is removed, and the model degenerates into the DKVMN model improved with the LSTM network.

The experimental results are shown in Table 3.

The following conclusions can be obtained from the analysis of Table 3.

Table 3
Results of ablation experiment

Model	ASSIST2009	ASSIST2015	Statics2011
CLDKT-F	0.772	0.710	0.775
CLDKT-L	0.779	0.719	0.794
DKVMN	0.794	0.712	0.789
CLDKT	0.831	0.743	0.836

(1) Compared with the DKVMN model, the training accuracy of CLDKT-L has improved on all three data sets, proving the effectiveness of adding LSTM to improve the model.

(2) Compared with the CLDKT-L model, the CLDKT-F model has improved its training accuracy on all three data sets, indicating the effectiveness of forgetting behavior modeling to improve the accuracy of the knowledge tracing model.

(3) Compared with the CLDKT-L model, the CLDKT model has a higher accuracy rate on the three data sets, which verifies the effectiveness of deep forgetting modeling.

4.3 Knowledge tracing ability analysis

To further verify the performance of the model, 21 answer records covering 4 knowledge points of one learner in the ASSIST2009 dataset were selected. Fig. 7 shows how the student’s knowledge mastery state changes under the DKT model and the CLDKT model. The abscissa is (q_t, a_t), which represents the sequence of answers, and the ordinate is the index of the knowledge point.

Fig. 7

An example of a student’s knowledge level output on 5 concepts using DKT and CLDKT in ASSISTments2009.

Taking knowledge point 1 in Fig. 7 as an example, in the DKT model the knowledge state changes only at the interactions related to that knowledge point and hardly changes at other timestamps, which is clearly inconsistent with the laws of memory and forgetting in reality and indicates that the DKT model cannot model forgetting. In the CLDKT model, the knowledge mastery state decreases over time during the learning process, which is more in line with real forgetting behavior and demonstrates the effectiveness of the CLDKT model in modeling forgetting behavior.

5 Conclusion

In this paper, we explored a new model for knowledge tracing named the deeper knowledge tracking model integrating cognitive theory and learning behavior (CLDKT). Specifically, we first integrated cognitive theory and learning behavior to model a forgetting layer that considers three factors in the process of knowledge acquisition. Then, based on LSTM and DKVMN, we designed a five-layer deep learning model comprising an attention layer, a forgetting layer, a learning layer, a prediction layer and an output layer. Experiments on three public datasets showed that the CLDKT model, with its ResNet-based deep forgetting modeling, performs well in knowledge tracing. The analysis of the knowledge states produced by CLDKT shows that our model is capable of effectively modeling learners’ learning and forgetting behavior, tracking their knowledge mastery level in real time, and exhibiting superior interpretability and accuracy compared to existing models.

The CLDKT model improves the performance of knowledge tracing. However, the relationship between different forgetting factors and the weight of each type of forgetting factor are not yet considered. In the future, we will investigate the characteristics of forgetting behavior data, study the relationship between different forgetting behavior data and forgetting laws, and develop a knowledge tracing model that includes multiple kinds of forgetting data and incorporates exercise information into the modeling of students’ learning ability and learning behavior to further improve the representations, so that knowledge tracing can accommodate the complex learning situations found in reality.

Acknowledgment

This work is supported by the Gansu Youth Science and Technology Fund Program under Grant Nos. 21JR11RA217 and 22JR11RA208, the Outstanding Youth Fund Project of Gansu Academy of Sciences under Grant No. 2023 YQ-03, and the Innovation Group Project of Basic Research in Gansu under Grant No. 23JRRA1348.

References

1. Albert Corbett and John Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction 4 (1994), 253–278.

2. Hao Cen, Koedinger, Junker, Learning Factors Analysis – A General Method for Cognitive Model Evaluation and Improvement, Proceedings of the 8th International Conference on Intelligent Tutoring Systems, 2006, 164–175.

3. Ghodai Abdelrahman, Qing Wang, Bernardo Nunes, Knowledge Tracing: A Survey, ACM Computing Surveys 55(11) (2023), 1–37.

4. Ghosh, Heffernan, Lan A.S., Context-aware attentive knowledge tracing, In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, 2330–2339.

5. He Kaiming, et al., Deep Residual Learning for Image Recognition, Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016, 770–778.

6. Hermann Ebbinghaus, Memory: A contribution to experimental psychology, Ann Neurosci 20(4) (2003), 155–156.

7. Nakagawa, Iwasawa, Matsuo, Graph-based Knowledge Tracing: Modeling Student Proficiency Using Graph Neural Network, 2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2019, 156–163.

8. Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-Term Recurrent Convolutional Networks for Visual Recognition and Description, In the 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015, 2625–2634.

9. Jun Liu, Amir Shahroudy, Dong Xu, Gang Wang, Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition, In 14th European Conference on Computer Vision, 2016, 816–833.

10. LeCun, Bengio and Hinton, Deep learning, Nature 521 (2015), 436–444.

11. Lee Averell, Andrew Heathcote, The form of the forgetting curve and the fate of memories, J Math Psychol 55(1) (2011), 25–35.

12. Qi Liu, et al., EKT: Exercise-aware Knowledge Tracing for Student Performance Prediction, IEEE Transactions on Knowledge and Data Engineering 33(1) (2019), 100–115.

13. Michel Jose Anzanello, Flavio Sanson Fogliatto, Learning curve models and applications: Literature review and research directions, International Journal of Industrial Ergonomics 41(5) (2011), 573–583.

14. Mohammad Khajah, Robert Lindsey, Michael Mozer, How Deep is Knowledge Tracing? In Proceedings of the 9th International Conference on Educational Data Mining, 2016, 94–101.

15. Nagatani, Zhang, Sato, et al., Augmenting knowledge tracing by considering forgetting behavior, The World Wide Web Conference, 2019, 3101–3107.

16. Piech, Bassen, Huang, Ganguli, Sahami, Guibas, Sohl-Dickstein, Deep Knowledge Tracing, In Proceedings of the 28th International Conference on Neural Information Processing Systems, 2015, 505–513.

17. Qiu, Qi, Lu, et al., Does Time Matter? Modeling the Effect of Time with Bayesian Knowledge Tracing, Proceedings of the 4th International Conference on Educational Data Mining, 2011, 139–148.

18. Tong, Wang, Zhou, et al., HGKT: Introducing hierarchical exercise graph for knowledge tracing, arXiv e-print arXiv:2006.16915, (2020).

19. Wayne Xin Zhao, Wenhui Zhang, Yulan He, Xing Xie, Ji-Rong Wen, Automatically learning topics and difficulty levels of problems in online judge systems, ACM Transactions on Information Systems 36(1) (2018), 1–33.

20. Yang, Shen, Qu, et al., GIKT: a graph-based interaction model for knowledge tracing, arXiv e-print arXiv:2009.05991, (2020).

21. Yeung C.K., Deep-IRT: Make Deep Learning Based Knowledge Tracing Explainable Using Item Response Theory, arXiv e-print arXiv:1904.11738, (2019).

22. Zachary Lipton, John Berkowitz, Charles Elkan, A critical review of recurrent neural networks for sequence learning, Computer Science, arXiv e-print arXiv:1506.00019, (2015).

23. Zhang, Shi, King, Yeung, Dynamic key-value memory networks for knowledge tracing, In Proceedings of the 26th International Conference on World Wide Web, 2017, 765–774.