Abstract
Modeling the complex interrelationships among students, learning resources, and knowledge points remains a significant challenge in intelligent educational systems. To address this issue, this study proposes an improved Heterogeneous Graph Neural Network (HetGNN) framework for comprehensive learning behavior modeling and personalized educational recommendation. We define three distinct node types (students, resources, and knowledge points) and construct a multi-relational heterogeneous graph by incorporating diverse edge types derived from behavioral interactions (e.g., clicks, completions), knowledge dependencies (prerequisite relationships), and content associations (label-based matching). A student–resource–knowledge point meta-path is designed to capture composite relational patterns, and a Relational Graph Convolutional Network (R-GCN) is employed to aggregate neighborhood information while preserving semantic distinctions across relation types. To model temporal learning sequences, a Transformer encoder is integrated to generate dynamic attention weights that reflect evolving student engagement. A gated fusion mechanism combines dynamic sequential features with static structural representations, ensuring feature complementarity and minimizing interference. The model further incorporates two jointly optimized output branches, learning status prediction (including dropout risk and performance forecasting) and personalized resource recommendation, through shared parameter learning. Experimental results on real-world educational datasets demonstrate the superiority of the proposed approach: the F1-score for dropout risk prediction reaches 0.91, grade prediction achieves a stable RMSE of approximately 0.32, and resource recommendation attains an NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) of 0.93 in standard scenarios and 0.88 in knowledge gap scenarios.
Moreover, the coverage of long-tail resources improves to 0.56, with a reduced recommendation bias coefficient of 0.29. The results validate the model’s effectiveness in capturing intricate student–resource–knowledge dynamics, offering a robust solution for learning analytics and adaptive educational systems.
Keywords
Introduction
With the arrival of the Education Informatization 2.0 era, the global educational technology market has continued to grow, yet significant bottlenecks remain in the efficiency of educational resource allocation and the level of personalized services. In the smart education system, the ternary relationship network composed of students, learning resources, and knowledge points presents complex dynamic characteristics. Students’ cognitive level changes nonlinearly with the learning process,1,2 educational resources (videos, exercises, courseware) have multi-dimensional attribute characteristics,3 and there are prerequisite dependencies and interdisciplinary associations between knowledge points.4,5 This complex relationship poses three core challenges to traditional recommendation systems. First, insufficient high-order interaction modeling leads to a lack of semantic relevance in resource recommendations.6,7 Second, the hierarchical structure and evolutionary characteristics unique to educational scenarios have not been fully modeled, so learning situation prediction lacks temporal dynamics.8–10 Third, multi-source heterogeneous data cannot be effectively integrated,11 so the model represents the learner’s cognitive state only partially.12,13 These problems have seriously restricted the digital transformation of education,14,15 and breakthroughs through innovative graph neural network methods are urgently needed.
In recent years, research on graph neural networks in education has grown explosively. Although the dual-graph structure proposed by Cui C et al. 16 can improve relationship modeling, its computational complexity is as high as O(n²) (n is the number of nodes), which makes it difficult to apply to large-scale educational graphs. To systematically summarize the key factors of MOOC (Massive Open Online Course) dropout prediction, Chen J et al. 17 sorted out different dropout definitions, proposed an overall framework (including influencing factors, feature extraction, machine learning methods, and evaluation indicators), and analyzed key challenges such as interpretability, imbalanced data, and semantic trajectory modeling. Their work addressed the lack of a unified framework and of in-depth discussion of challenges in existing research, but it does not model dynamic learning trajectories quantitatively, and its time sensitivity indicators deviate greatly from real educational scenarios. Although the bipartite graph model proposed by Suo W et al. 18 can mine student–resource interactions, it does not consider the prerequisite dependencies between knowledge points. By combining graph neural networks with bidirectional long short-term memory networks, Liu H et al. 19 addressed the shortcomings of traditional methods in suppressing spatial background interference, utilizing contextual information, and modeling global temporal associations, but did not explicitly consider the modeling of complex heterogeneous relationships in educational scenarios. More importantly, existing methods have not yet established a clear technical path for integrating heterogeneous graph modeling, time series dynamic fusion, and multi-source data into a single framework.
In response to the above challenges, the academic community has conducted multi-dimensional innovative explorations. In terms of dynamic modeling, Sun Q et al. 20 addressed the insufficient personalization of traditional education models by using deep learning technologies such as recurrent neural networks, Transformer, reinforcement learning, and multimodal learning analysis. However, their method remains limited by insufficient modeling of the evolving hierarchical structure unique to educational scenarios. To improve the accuracy of academic performance prediction in online learning scenarios, Huang Q et al. 21 used a dual-graph neural network (interaction graph + attribute graph) to capture the local structural characteristics of learners’ interactive behaviors and the global correlation characteristics of students’ attributes, respectively. However, their method did not address the multimodal fusion of knowledge dependency and content association. Razgallah H et al. 22 addressed the insufficient recommendation accuracy caused by data sparsity by using entity resolution principles and graph neural network (GNN) modeling. However, their static graph structure cannot cope with the nonlinear evolution of students’ cognitive level. To achieve high-precision model training while preserving data security and privacy, Liu Y et al. 23 proposed a privacy-preserving vertical federated GNN training framework based on split learning. They implemented end-cloud collaborative training through split learning and combined it with function-hiding multi-input encryption to protect uploaded parameters. However, the end-cloud collaborative mechanism did not address the knowledge gap detection requirements unique to educational scenarios.
To address these limitations, this study innovatively integrates the temporal attention mechanism with static graph features, and realizes the adaptive association between student behavior sequences and knowledge networks through dynamic weight adjustment. The design of the time decay factor directly responds to the problem of the lack of temporal dynamics in educational scenarios, while the multi-order neighborhood aggregation of R-GCN effectively reduces the computational complexity of traditional GNN through meta-path design.
This paper proposes an enhanced Heterogeneous Graph Neural Network (HetGNN) model to address the challenge of collaborative optimization between learning situation prediction and personalized resource recommendation. The model constructs a multi-relational heterogeneous graph by defining three node types—students, resources, and knowledge points—and establishing three types of edges: behavioral interactions (e.g., clicks and completions), knowledge dependencies (prerequisite relationships), and content associations (semantic label matching). A Relational Graph Convolutional Network (R-GCN) is employed to aggregate neighborhood information, effectively capturing the distinct semantic associations across heterogeneous relations. To emphasize meaningful semantic pathways, a meta-path-aware hierarchical attention mechanism is designed to dynamically assign weights during feature aggregation, enhancing the representation of critical relational patterns such as student–resource–knowledge point. Furthermore, a temporal attention module is integrated to model the sequential dynamics of student learning behaviors, which are then fused with static structural representations through a learnable gating mechanism, enabling balanced integration of dynamic and static features. Finally, a dual-task joint optimization framework is developed, where the underlying representations are shared between the learning status prediction (e.g., performance and at-risk detection) and resource recommendation tasks, promoting deep inter-task collaboration. This approach overcomes key limitations of traditional models, including inadequate modeling of heterogeneous interactions, neglect of temporal dynamics, and suboptimal fusion of multi-source educational data. 
By synergizing dynamic behavioral patterns with static knowledge structures through parameter sharing, the proposed framework not only improves predictive performance but also provides interpretable insights for personalized learning path design, offering a robust and explainable solution for intelligent educational systems.
Multi-task modeling and verification of educational heterogeneous graphs and temporal attention
Construction of educational heterogeneous graphs
In the process of constructing the educational heterogeneous graph, this study takes students, educational resources, and knowledge points as core entity nodes, realizes multimodal association mapping through triple interaction relationship modeling, preprocesses student behavior logs,24,25 and defines the behavior interaction edge weight function:
In formula (1),
The construction of resource content association adopts a multi-label matching mechanism. For each resource r, the TF-IDF (Term Frequency-Inverse Document Frequency) feature vectors of its title, description and related topics are extracted, and the Jaccard similarity with the label dictionary of knowledge point c is calculated. When the value exceeds the threshold, an edge (r, c) is established. In order to distinguish the interaction intensity in different dimensions, an independent relationship type identifier is assigned to each type of edge in the heterogeneous graph: behavioral interaction (click, complete), knowledge dependency (prerequisite, semantic), and content association (tag_match), ultimately forming a multi-relationship adjacency matrix containing three node types and seven relationship types. The structure is optimized for storage through sparse tensors and adopts a joint query mechanism of adjacency list and edge type index to ensure efficient neighbor sampling in subsequent R-GCN calculations.
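The tag-matching rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each resource is assumed to have already been reduced to a set of salient terms (for example, its top TF-IDF terms), and the function names, the sample data, and the 0.3 threshold are hypothetical, since the paper does not state the threshold value.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_tag_match_edges(
    resource_terms: Dict[str, Set[str]],
    kp_labels: Dict[str, Set[str]],
    threshold: float = 0.3,  # hypothetical value, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """Create a (resource, knowledge point, similarity) edge when similarity exceeds the threshold."""
    edges = []
    for r, terms in resource_terms.items():
        for c, labels in kp_labels.items():
            sim = jaccard(terms, labels)
            if sim > threshold:
                edges.append((r, c, sim))
    return edges


# Hypothetical sample data.
resources = {"video_01": {"matrix", "determinant", "linear"}}
knowledge_points = {
    "kp_linear_algebra": {"matrix", "linear", "vector"},
    "kp_probability": {"random", "variable"},
}
tag_edges = build_tag_match_edges(resources, knowledge_points)
```

In this toy example only the pair (video_01, kp_linear_algebra) shares enough labels to receive a `tag_match` edge.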
The spatiotemporal consistency of the graph structure is maintained through a dynamic update strategy: when a new student is registered, the connection strength with the knowledge points is initialized according to his/her initial test score. When resources are updated, local recalculation of related nodes is triggered, and a sliding window mechanism is used to retain recent behavior data. To solve the cold start problem, a default connection rule based on course meta-information is designed for new nodes: freshmen generate initial edges with basic knowledge points according to their majors, and new resources are automatically associated with the knowledge point set of the corresponding chapter according to the uploaded classification. The cold start problem is solved through the professional label mapping mechanism:
In formula (2),
Figure 1 shows the architecture, which constructs a multimodal association network from three types of core nodes (students, resources, and knowledge points) and seven relationship types grouped into three categories (behavioral interaction, content association, and knowledge dependency). In the figure, student nodes and resource nodes are connected through weighted click behaviors and completion status, resource nodes and knowledge point nodes are associated through label matching, and the knowledge point nodes form a knowledge network through prerequisite relationships and semantic similarity, while the dynamic update mechanism keeps the graph current. This design addresses the insufficient modeling of heterogeneous entity interactions in traditional education recommendation systems. Its underlying mechanism aggregates multi-order neighborhood information through R-GCN, and combines dynamic weight adjustment (such as the normalized calculation of behavior frequency × time) and semantic perception (such as the similarity threshold of BERT embedding) to express complex relationships quantitatively.
Figure 1. Educational heterogeneous graph construction architecture.
Meta-path design and semantic aggregation
In the process of node feature extraction enhanced by meta-path, this study designs a composite semantic path based on heterogeneous graph structure and realizes multi-order neighborhood information aggregation through R-GCN,27 in which the update mechanism of node v in the k-th layer is:
In formula (3), R represents the relationship set, and
The query vector q in formula (4) is learnable and dynamically captures the importance weights of different meta-paths. Meta-path generation follows a three-stage rule. First, candidate paths are generated by combining the three types of heterogeneous edges: behavioral interaction, knowledge dependency, and content association. Second, invalid combinations are screened out through the prerequisite relationship constraints of the course knowledge graph (such as excluding the closed-loop path of “student→knowledge point→student”). Finally, the effectiveness of the core paths is verified against education experts’ analysis of typical learning patterns.
In view of the complex interactions among the three types of nodes, namely, students, resources, and knowledge points, this paper first defines the basic meta-path model. The student node establishes an indirect connection with the knowledge point through its associated resources, forming a semantic path in which the student connects to the knowledge point via the resource and then returns to the resource. 28 The path contribution visualization module is used to analyze the differences in the impact of different meta-paths on node representation. This module maps the attention weights of each layer to the meta-path level, generates a path importance distribution map, and reveals the feature attenuation phenomenon of the student–resource–knowledge point path during deep aggregation. The semantic relationship matrix corresponding to each meta-path generates a candidate neighbor set through breadth-first traversal of the adjacency list, and a composite meta-path is generated based on the combination of three types of heterogeneous edges: behavioral interaction, knowledge dependency, and content association. A dynamic threshold mechanism is introduced in the path combination process. For knowledge-dependent edges, the path generation threshold is dynamically adjusted according to the semantic similarity calculated by BERT embedding.
In the specific implementation of R-GCN, an independent learnable parameter matrix is maintained for each relation type, where the parameter dimension is determined by the embedding space. For the target node, its k-th hidden state is obtained by normalizing the neighborhood information aggregation. The low-level representation of the neighboring nodes is linearly transformed according to the relationship type. Then, an entity type-sensitive normalization strategy is adopted to weighted sum the aggregation results of different relationships. The weight coefficient is controlled by the relation importance parameter, which is automatically adjusted during model training to ensure the dominance of key semantic paths in feature fusion. It is achieved through a hierarchical aggregation mechanism. The first layer focuses on local feature extraction of single-type relations, the second layer establishes semantic associations through cross-relationship parameter sharing, and the third layer introduces global attention to regulate the contribution ratio of different paths. In the process of multi-layer propagation, the gated recurrent unit is introduced to maintain the stability of node representation and prevent the over-smoothing of features caused by increased depth.
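To make the relation-specific aggregation concrete, the sketch below implements one layer of the standard R-GCN update: a self-connection term plus, for each relation type, the mean of linearly transformed neighbor states. It is an illustration only. The 1/|N_r(v)| normalization is the common R-GCN default and is assumed here in place of the entity type-sensitive normalization described above, and the relation importance parameters, gated recurrent unit, and all names and toy values are not taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[str, str, str]  # (source node, relation type, destination node)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def leaky_relu(x: Vector, slope: float = 0.01) -> Vector:
    return [v if v > 0 else slope * v for v in x]


def rgcn_layer(
    h: Dict[str, Vector],
    edges: List[Edge],
    rel_weights: Dict[str, Matrix],  # one weight matrix per relation type
    self_weight: Matrix,
) -> Dict[str, Vector]:
    """One R-GCN layer; messages flow from source to destination."""
    new_h = {}
    for v, h_v in h.items():
        total = matvec(self_weight, h_v)  # self-connection term
        for rel, w_r in rel_weights.items():
            neighbors = [src for src, r, dst in edges if r == rel and dst == v]
            if not neighbors:
                continue
            norm = 1.0 / len(neighbors)  # assumed normalization constant
            for u in neighbors:
                message = matvec(w_r, h[u])
                total = [t + norm * m for t, m in zip(total, message)]
        new_h[v] = leaky_relu(total)
    return new_h


# Toy graph: one student, one resource, one knowledge point.
identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"s1": [1.0, 0.0], "r1": [0.0, 1.0], "k1": [1.0, 1.0]}
toy_edges = [("r1", "click", "s1"), ("k1", "tag_match", "r1")]
toy_rel_weights = {"click": [[2.0, 0.0], [0.0, 2.0]], "tag_match": identity}
h1 = rgcn_layer(h0, toy_edges, toy_rel_weights, identity)
```

Keeping a separate matrix per relation is what lets a click edge and a tag_match edge contribute differently to the same node, which a homogeneous GCN with a single shared matrix cannot do.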
In order to enhance the expressive power of semantic associations, this study proposes a hierarchical aggregation strategy. The first layer of R-GCN focuses on direct interaction relationships and only aggregates first-order neighbor information. The second layer extends to the second-order neighborhood and captures the potential connections between nodes of different types through meta-path combination. The third layer uses the attention mechanism to calculate the contribution of different semantic paths. Specifically, the attention weight is obtained through the dot product of the learnable query vector and each path representation to achieve dynamic feature fusion. The final node representation is obtained by converting the concatenated vector of the three-layer output through a nonlinear activation function, where the activation function uses LeakyReLU (Leaky Rectified Linear Unit) to alleviate the gradient vanishing problem. Through the path contribution visualization module, the attention weights of each layer are mapped to the meta-path level to generate a path importance distribution map, which is used to explain the differences in the contribution of different semantic paths to node representation.
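The path-level attention step can be illustrated with a short sketch: each meta-path representation is scored by its dot product with the query vector, the scores are normalized with Softmax, and the fused representation is the weighted sum. The function names and the two-dimensional toy vectors are assumptions made for illustration, and the query vector would be learned in the actual model rather than fixed.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def fuse_meta_paths(
    query: List[float], path_reprs: Dict[str, List[float]]
) -> Tuple[Dict[str, float], List[float]]:
    """Return per-path attention weights and the attention-weighted fused vector."""
    names = list(path_reprs)
    scores = [sum(q * x for q, x in zip(query, path_reprs[n])) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * path_reprs[n][i] for w, n in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


# Hypothetical path representations for a single node.
paths = {
    "student-resource-kp": [1.0, 0.0],
    "resource-student-resource": [0.0, 1.0],
}
path_weights, fused_repr = fuse_meta_paths([1.0, 0.0], paths)
```

The returned weight dictionary is the kind of quantity a path importance distribution map would plot.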
Figure 2 shows the changes in node representation discrimination of the three core meta-paths at different aggregation orders. The discrimination of the student–resource–knowledge point path is 0.32 at the first order, jumps to 0.67 (reaching the peak) at the second order, and drops back to 0.58 at the third order. The resource-student-resource path and the knowledge point-resource-student path both show a steady upward trend. The significant differences among the three curves indicate that different semantic paths differ fundamentally in their sensitivity to aggregation depth, with the resource-student-resource path performing best in deep aggregation.
Figure 2. Meta-path aggregation effect comparison.
The trends in Figure 2 reveal key characteristics of the model’s underlying mechanism. The student–resource–knowledge point path peaks (0.67) at second-order aggregation but drops to 0.58 at the third order, indicating a two-hop separation effect in the relationship between students and knowledge: aggregating beyond the second order introduces noise (such as interference from irrelevant knowledge points). The resource-student-resource path continues to climb to 0.82 at the third order, reflecting that resource synergy relationships require deep propagation to be fully explored (such as discovering potential resource associations through student behavior chains). The steady growth of the knowledge propagation path verifies the structural stability of the knowledge graph, and its semantic association increases linearly with the aggregation depth. These phenomena follow from the hierarchical design of R-GCN, where low-order aggregation captures direct interactions and high-order aggregation mines cross-entity semantics, but each meta-path has its own optimal aggregation depth.
Under the topological constraints of heterogeneous graphs, a relationship-aware neighbor sampling algorithm is designed to optimize computational efficiency. For large-scale graph structures, a fixed number of neighbor nodes are sampled for each relationship of each node. If the actual number of neighbors is insufficient, a circular filling strategy is adopted. A heat weighting mechanism is introduced during the sampling process to adjust the neighbor selection probability according to the edge weight, ensuring that nodes with high interaction frequency have higher weights in feature aggregation. This strategy effectively alleviates the problem of semantic information dilution caused by traditional uniform sampling, while maintaining the model’s adaptability to changes in graph structure by dynamically adjusting the sampling window. The above method solves the problems of traditional homogeneous graph neural networks such as loss of cross-type node interactions and confusion of multiple semantic relationships by explicitly modeling multi-order heterogeneous relationships, and provides a node embedding representation rich in structural information for subsequent temporal behavior modeling and dual-task joint optimization.
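The sampling rule for a single (node, relation) pair can be sketched as below. This is one possible reading of the description, not the authors' algorithm: when a node has no more than k neighbors the list is repeated cyclically, and otherwise neighbors are drawn without replacement with probability increasing in edge weight, here via the Efraimidis-Spirakis random-key method, which is a substitute technique chosen for brevity. The function name, the fixed seed, and the sample weights are hypothetical, and edge weights are assumed to be strictly positive.

```python
import random
from itertools import cycle, islice
from typing import List, Tuple


def sample_neighbors(
    neighbors: List[Tuple[str, float]],  # (neighbor id, positive edge weight)
    k: int,
    rng: random.Random,
) -> List[str]:
    """Return exactly k neighbor ids for one relation of one node."""
    if not neighbors:
        return []
    if len(neighbors) <= k:
        # Circular filling: repeat the available neighbors until k slots are filled.
        return list(islice(cycle(n for n, _ in neighbors), k))
    # Weighted sampling without replacement: larger weights tend to get larger keys.
    keyed = sorted(
        neighbors,
        key=lambda nw: rng.random() ** (1.0 / nw[1]),
        reverse=True,
    )
    return [n for n, _ in keyed[:k]]


rng = random.Random(0)
few = sample_neighbors([("r1", 1.0), ("r2", 3.0)], k=5, rng=rng)
many = sample_neighbors([(f"r{i}", float(i + 1)) for i in range(10)], k=3, rng=rng)
```

Calling the function once per relation type keeps the neighbor budget independent for clicks, completions, and the other edge types, so a dense relation cannot crowd out a sparse one.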
Spatiotemporal dynamic fusion framework
In the design of the temporal attention mechanism, this study uses the Transformer encoder to model the student behavior sequence and dynamically fuses it with the static graph structure features. 29 For students’ heterogeneous behavior data, interaction events (resource clicks, knowledge point mastery) and status indicators (learning time, accuracy) are jointly encoded into a four-tuple (timestamp, resource identifier, operation type, status value), and mapped into a uniform dimensional vector through an embedding matrix. The input features are composed of low-level embedding of resource nodes, operation type encoding, and linear transformation of standardized state values to form a temporal input matrix. The position encoding uses the alternating form of sine and cosine to inject temporal information. The multi-head self-attention mechanism calculates the global dependency through the interaction of query vector, key vector, and value vector. Its weight matrix is normalized by Softmax to weighted aggregate context features. In order to enhance the model’s ability to perceive the temporal characteristics of educational scenarios, a time decay factor is introduced to quantify the impact of historical behaviors, and the priority effect of recent behaviors is explicitly modeled in the self-attention calculation.
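The alternating sine and cosine position encoding mentioned above is sketched below in the form used by the original Transformer. The base constant 10000 comes from that formulation and is an assumption here, since the paper does not state its value, and the function name is hypothetical.

```python
import math
from typing import List


def positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Sinusoidal encoding: even dimensions use sine, odd dimensions use cosine."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))  # assumed base of 10000
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = positional_encoding(seq_len=3, d_model=4)
```

Each row of `pe` would be added to the embedded four-tuple at the same sequence position before the self-attention layers.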
The bimodal feature interaction module uses element-wise multiplication to fuse the sequence final state features output by Transformer and the student node representation generated by R-GCN to capture the interactive effect of dynamic behavior and static structure. The weight coefficient is generated through the gating mechanism: a learnable parameter matrix is applied to the product result and the original graph feature, respectively, and the weighted ratio is output through the Sigmoid activation function to dynamically adjust the feature fusion strength. The fusion vector further generates dynamic attention weights, reversely regulates the aggregation ratio of neighbor features in R-GCN, and constructs a closed-loop optimized information enhancement path. In this process, the gating mechanism not only realizes dynamic weight allocation, but also effectively alleviates the potential conflict between the static graph structure and the dynamic behavior sequence, ensuring the co-evolution of the two branches in the shared parameter space.
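A minimal sketch of the gating step follows. It assumes the gate is computed as Sigmoid(W1 (h_seq ⊙ h_graph) + W2 h_graph + b) and that the fused vector is a gate-weighted mix of the interaction term and the original graph feature. The final mixing rule, the bias term, and all names are assumptions, because the text describes only how the gate itself is produced.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    h_seq: Vector,    # final sequence state from the Transformer encoder
    h_graph: Vector,  # student node representation from R-GCN
    w_inter: Matrix,  # learnable matrix applied to the element-wise product
    w_graph: Matrix,  # learnable matrix applied to the graph feature
    bias: Vector,
) -> Tuple[Vector, Vector]:
    """Return (gate, fused) for one student."""
    interaction = [s * g for s, g in zip(h_seq, h_graph)]
    pre = [
        a + b + c
        for a, b, c in zip(matvec(w_inter, interaction), matvec(w_graph, h_graph), bias)
    ]
    gate = [sigmoid(p) for p in pre]
    # Assumed mixing rule: the gate balances the interaction term against the graph feature.
    fused = [g * i + (1.0 - g) * hg for g, i, hg in zip(gate, interaction, h_graph)]
    return gate, fused


zeros = [[0.0, 0.0], [0.0, 0.0]]
gate, fused = gated_fusion([2.0, 0.0], [1.0, 4.0], zeros, zeros, [0.0, 0.0])
```

With zero weights the gate sits at 0.5 in every dimension, so the output is the plain average of the two inputs; training moves the gate toward whichever source is more informative per dimension.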
To quantify the impact of historical behavior, the time decay factor is introduced into the self-attention calculation, yielding a time-aware attention weight matrix:
In formula (5), Δt is the time interval between behaviors, and
In formula (6),
Figure 3 shows that when the current moment is taken as the benchmark, the attention weight for the current moment itself is the highest (0.40), and the attention weight for moment t-1 is the second highest (0.26). Further back in time, the weight gradually decays to 0.08 at moment t-4. The diagonal elements always show peak values. For example, when t-1 is taken as the benchmark, the weight of the t-1 moment reaches 0.43, indicating that the model focuses strongly on the behavior at each benchmark moment itself. At the same time, there are asymmetric fluctuations. For example, when t-2 is used as the benchmark, the weight of t-3 (0.19) is higher than the weight of t-4 (0.14), which reflects the randomness of learning behavior. The visualization results reveal the key mechanism of the model in detecting knowledge gaps. When students exhibit leapfrogging learning behaviors (such as skipping basic knowledge points and directly entering advanced content), the attention weight distribution deviates significantly from the decay trend, providing a basis for subsequent intervention strategies.
Figure 3. Temporal attention weight distribution.
The heat map distribution reveals the core principle of the temporal attention mechanism. The time decay function causes recent behaviors to obtain higher weights. For example, the weight of t-1 at the current moment (0.26) is 3.25 times that of t-4 (0.08), which is in line with the educational cognitive law that recent learning content is more relevant. The systematically high values on the diagonal come from the Transformer’s self-attention mechanism, which ensures that the model captures key events. Non-uniform fluctuations (e.g., the weight of t-2 in the baseline row with t-3 being 0.18 is higher than that of the adjacent t-1 at 0.15) simulate unconventional behaviors in actual learning. This dynamic adaptability is the core advantage of the HetGNN model that combines static knowledge graphs with dynamic behavior sequences.
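The effect of a time decay term on self-attention can be illustrated with the sketch below. It assumes an exponential decay implemented as a penalty of decay × Δt subtracted from the scaled dot-product score before Softmax, which is equivalent to multiplying the unnormalized attention by exp(−decay × Δt). The exact functional form used in the paper is not recoverable from the text, so this form, the decay rate, and the function names are assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def time_decay_attention(
    queries: List[List[float]],
    keys: List[List[float]],
    timestamps: List[float],
    decay: float = 0.1,  # hypothetical decay rate per time unit
) -> List[List[float]]:
    """Row i holds the attention weights of event i over all events."""
    d = len(queries[0])
    rows = []
    for q_i, t_i in zip(queries, timestamps):
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) - decay * abs(t_i - t_j)
            for k_j, t_j in zip(keys, timestamps)
        ]
        rows.append(softmax(scores))
    return rows


# Five events with identical content, so only the time gap differentiates them.
same = [[1.0, 0.0] for _ in range(5)]
attn = time_decay_attention(same, same, [0.0, 1.0, 2.0, 3.0, 4.0], decay=0.5)
```

Because the content scores are identical in this toy input, the last row decreases monotonically with the time gap and peaks on the diagonal, which matches the qualitative pattern described for the heat map.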
In the self-attention calculation, the time decay factor
Formula (7) shows that the larger the time interval
Dual-task joint optimization
In the dual-task prediction and recommendation generation module, this study constructs a learning situation prediction branch and a resource recommendation branch, and achieves joint optimization by sharing the underlying parameters of the graph neural network. 30 The learning situation prediction branch includes two subtasks: grade prediction and dropout risk assessment. The former uses a multi-layer perceptron to perform regression on the fused feature vector and outputs a continuous predicted grade. The latter maps the features to a binary classification space through a fully connected layer and generates the risk probability distribution through the Softmax activation function. The resource recommendation branch builds a ranking model based on node representations: it takes the inner product of the student feature vector and each candidate resource feature vector as the matching score, and then applies Top-K truncation to generate the recommendation list. To adapt the model to educational scenarios, a dynamic weight mechanism is introduced in the joint optimization process, so that the loss function automatically adjusts task priority according to validation-set performance. The learnable scalar coefficients are updated on the validation set through a two-step optimization algorithm, so that the gradients of learning situation prediction and resource recommendation remain balanced in the shared parameter space.
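The scoring and truncation step of the recommendation branch can be sketched in a few lines. The function name, the tie-breaking rule by resource identifier, and the sample vectors are hypothetical.

```python
from typing import Dict, List, Tuple


def recommend_top_k(
    student_vec: List[float],
    resource_vecs: Dict[str, List[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score each resource by inner product with the student vector and keep the top k."""
    scores = {
        rid: sum(s * r for s, r in zip(student_vec, vec))
        for rid, vec in resource_vecs.items()
    }
    # Sort by descending score; ties are broken by resource id for determinism.
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    return [(rid, scores[rid]) for rid in ranked[:k]]


student = [0.5, 1.0]
candidates = {"video_a": [1.0, 0.0], "quiz_b": [0.0, 2.0], "slides_c": [1.0, 1.0]}
top2 = recommend_top_k(student, candidates, k=2)
```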
Key parameters for dual-task joint optimization.
The parameter sharing mechanism realizes cross-task information interaction through heterogeneous graph embedding propagation. The student node representation generated by R-GCN serves as the shared feature basis and is fed into two independent adaptation networks. The adaptation network of the learning situation prediction branch consists of two fully connected layers: the first applies Batch Normalization for feature normalization, and the second uses Dropout to prevent overfitting. The adaptation network of the resource recommendation branch adopts a bilinear form that transforms node features asymmetrically to alleviate the popularity bias problem in collaborative filtering. The bilinear transformation is:
In formula (8),
The loss function adopts a multi-task joint optimization strategy. In the learning situation prediction branch, the grade regression task uses mean squared error loss and the dropout risk classification task uses cross-entropy loss. The resource recommendation branch uses a pairwise ranking loss, which achieves personalized ranking by maximizing the margin between the scores of positive and negative samples. The loss weight of each task is adjusted automatically through learnable parameters, so that task priority is balanced dynamically during training. A loss weight convergence curve analysis module monitors the trend of the learnable coefficients in real time to identify task conflict or dominance, and the optimizer parameters are fine-tuned accordingly. This dynamic weighting mechanism can effectively improve the performance of the model on high-priority tasks such as dropout risk prediction.
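The three loss terms and their weighted combination can be sketched as follows. Several choices here are assumptions rather than details from the paper: the pairwise term is written as a hinge loss with margin 1.0, the task weights are obtained by applying Softmax to unconstrained coefficients, and the validation-driven update of those coefficients is not reproduced, so they appear as plain inputs.

```python
import math
from typing import List


def mse_loss(pred: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob: List[float], label: List[int], eps: float = 1e-12) -> float:
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(prob)


def pairwise_margin_loss(pos: List[float], neg: List[float], margin: float = 1.0) -> float:
    """Penalize pairs where the positive score does not exceed the negative score by the margin."""
    return sum(max(0.0, margin - (p - n)) for p, n in zip(pos, neg)) / len(pos)


def joint_loss(task_losses: List[float], coefficients: List[float]) -> float:
    """Weighted sum of task losses; weights are the Softmax of the coefficients (assumed form)."""
    m = max(coefficients)
    exps = [math.exp(c - m) for c in coefficients]
    z = sum(exps)
    return sum((e / z) * loss for e, loss in zip(exps, task_losses))


losses = [
    mse_loss([0.8, 0.6], [1.0, 0.5]),              # grade regression
    binary_cross_entropy([0.9, 0.2], [1, 0]),      # dropout risk classification
    pairwise_margin_loss([2.0, 0.4], [0.5, 0.3]),  # resource ranking
]
total = joint_loss(losses, coefficients=[0.0, 0.0, 0.0])
```

One caveat about this assumed form: if the coefficients were trained by directly minimizing the same weighted sum, the weights would drift toward the task with the smallest loss, which is one reason to update them on held-out validation performance as the text describes.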
In order to enhance the interpretability of recommendation results, a feature tracing module based on attention mechanism is designed. In the process of generating Top-K resource ranking, the cross-layer attention weights between student nodes and resource nodes are recorded. The weight matrix reflects the influence of different learning stages on recommendation decisions. By aggregating multi-head attention weights to generate a feature contribution vector, combined with the prior relationship links of the knowledge graph, the key knowledge point paths that affect the recommendation results are traced back. The explanatory information output by this module can assist educators in analyzing the recommendation logic and improve the transparency and credibility of the system. The above framework is trained in an end-to-end manner, and each iteration synchronously updates the graph neural network parameters, adaptation network weights and loss coefficients to ensure the deep coupling of structural features and dynamic behaviors.
Before model deployment, this study further introduced a task dynamic balance monitoring mechanism. This module also supports a teacher-side intervention interface, allowing education experts to manually adjust task priorities according to actual teaching needs, such as increasing the weight of the grade prediction branch in the final sprint stage, and focusing on dropout risk detection at the beginning of the school year. This flexible task scheduling mechanism enhances the adaptability of the model in complex educational environments and provides a technical foundation for subsequent model deployment and educational adaptation chapters.
Experiment and test
Experimental design
This paper uses a large-scale real-world data set from the smart education platforms of multiple universities, covering student behavior logs (clicks/completions), resource metadata, and the topological relationships of knowledge points across multiple semesters. The evaluation covers several dimensions. The learning situation prediction task uses F1-score (dropout risk and knowledge point mastery) and RMSE (grade error) to measure prediction accuracy. The resource recommendation task is evaluated with NDCG@10, Hit Ratio@10, and long-tail resource coverage, and the robustness of the system is analyzed together with the recommendation bias coefficient as a fairness measure. In data processing, the training, validation, and test sets are split by timestamp, and the validation set is used for task priority adjustment in the two-step optimization. 5-fold cross-validation is used to ensure stability. In addition, a new-resource subset is constructed for the cold start scenario, and a knowledge gap subset is selected according to a threshold on prior knowledge mastery.
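For reference, the ranking and regression metrics named above can be computed as in the sketch below. It uses the common binary-relevance definition of NDCG@k with a log2 discount. Whether the paper uses binary or graded relevance is not stated, so that choice and the function names are assumptions.

```python
import math
from typing import List, Set


def hit_ratio_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """1.0 if at least one relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k with a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(pred: List[float], target: List[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


perfect = ndcg_at_k(["a", "b"], {"a"})
shifted = ndcg_at_k(["b", "a"], {"a"})
```

Per-student values would then be averaged over the test set to obtain the reported NDCG@10 and Hit Ratio@10.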
Multi-model comparison experiment of learning situation prediction task
To verify the advantage of the improved HetGNN in dynamic modeling of the learning situation, this experiment compares it with three representative baselines: BKT (Bayesian Knowledge Tracing), GCN (Graph Convolutional Network), and DynaGNN. The comparison focuses on the dropout risk prediction and knowledge point mastery classification tasks, with F1-score as the core indicator. The improved HetGNN shows clear advantages in both high-stakes tasks and mitigates the feature confusion that traditional models incur by oversimplifying interaction relationships.
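Since F1-score is the core indicator here, a short worked example may help. The sketch computes it from binary labels using the standard definition (harmonic mean of precision and recall); the labels are made up for illustration and are not drawn from the dataset.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical dropout labels (1 = at risk) for eight students.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
```

In this example precision and recall are both 0.75, so the F1-score is also 0.75.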
As shown in Figure 4, the improved HetGNN performs best in dropout risk prediction, with an F1-score of 0.91 against 0.85 for DynaGNN. In knowledge point mastery classification its F1-score is 0.88, at least 0.07 above the other models. Its F1-score for comprehensive learning situation prediction is 0.90, again ahead of all baselines. Across the three tasks, BKT performs worst and GCN and DynaGNN improve progressively, but the improved HetGNN keeps a clear margin, especially in high-stakes tasks such as dropout risk prediction, where it shows stronger decision reliability.
Figure 4. Performance comparison of learning prediction tasks.
The advantage of the improved HetGNN comes from capturing complex associations through the heterogeneous graph structure (three node types: students, resources, and knowledge points) and dynamic meta-path weights (for example, on student–resource–knowledge point paths). By contrast, BKT relies only on traditional knowledge tracing, and GCN does not model dynamic temporal behavior. DynaGNN introduces dynamic graphs (dropout risk F1 = 0.85 in Figure 4), but the improved HetGNN additionally integrates Transformer temporal attention, which gives recent learning behaviors a higher weight and thereby improves prediction accuracy. In addition, the multi-relation aggregation of R-GCN and the dual-task joint optimization together support its overall lead across the three tasks.
Convergence performance analysis of performance prediction
Cognitive diagnosis scenarios require temporal stability, so this experiment examines the training dynamics of HetGNN, DKVMN (Dynamic Key-Value Memory Network), KTM (Knowledge Tracing Machine), and GKT (Graph-based Knowledge Tracing) on the performance prediction task. Tracking RMSE over the training epochs reveals how robustly each model represents the dynamic learning state. HetGNN reaches its error plateau earlier and fluctuates less than the baselines, because the meta-path attention mechanism focuses quickly on key semantic relationships and the temporal decay factor filters out historical noise.
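The two quantities examined in this analysis, the epoch at which the error plateaus and the fluctuation afterwards, can be made concrete with a small sketch. The plateau rule (a sliding window whose range stays within a tolerance) is an assumption introduced here for illustration, and the RMSE curve is made up rather than taken from Figure 5.

```python
import math
from typing import List, Optional, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between true and predicted grades."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def plateau_epoch(curve: List[float], window: int = 3, tol: float = 0.02) -> Optional[int]:
    """First index whose window of consecutive values spans at most tol."""
    for start in range(len(curve) - window + 1):
        segment = curve[start:start + window]
        if max(segment) - min(segment) <= tol:
            return start
    return None  # the curve never settles within the tolerance


def fluctuation(curve: List[float], start: int) -> float:
    """Peak-to-peak amplitude of the curve from `start` onwards."""
    tail = curve[start:]
    return max(tail) - min(tail)


# Illustrative RMSE values per checkpoint (not measurements from the paper).
curve = [1.00, 0.70, 0.50, 0.33, 0.32, 0.32, 0.32]
start = plateau_epoch(curve)
amplitude = fluctuation(curve, start)
error = rmse([3.0, 2.0, 4.0], [2.5, 2.0, 4.5])
```

A smaller `start` corresponds to faster convergence and a smaller `amplitude` to a more stable plateau, which are the two properties compared across models.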
Figure 5 shows the convergence behavior of the improved HetGNN and the three baseline models. DKVMN starts with an error of about 1.08 and converges slowly. GKT and KTM start from about 0.96 and about 0.88, respectively, and still fluctuate around an error of about 0.4 in the middle and late stages of training. The improved HetGNN drops rapidly to around 0.32 by Epoch 40 and holds this error level over the following 60 epochs. Its advantage therefore lies in both convergence speed and fluctuation range.
Figure 5. Comparative analysis of RMSE convergence performance of the improved HetGNN model.
The superior performance of the improved HetGNN comes from the hierarchical relational attention mechanism used when R-GCN aggregates student–resource–knowledge point meta-path features, which lets the key paths complete semantic information extraction before Epoch 20. The time decay factor injected by the temporal attention module suppresses historical noise, so the RMSE stays close to 0.32 after Epoch 40. The dual-task joint optimization mechanism shares the underlying parameters of the graph neural network and dynamically balances the loss function, which yields the stable, low-fluctuation curve shown in the figure and clearly outperforms the traditional models.
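The effect of a time decay factor on attention can be sketched as follows. The exact form of the decay is not given in this section, so the example assumes an exponential decay implemented as a linear penalty on the attention logits; the `decay` rate and the sample values are hypothetical.

```python
import math
from typing import List, Sequence


def decayed_attention(scores: Sequence[float],
                      ages: Sequence[float],
                      decay: float = 0.1) -> List[float]:
    """Softmax over attention scores penalised by the age of each event.

    ages[i] is the time elapsed since event i; subtracting decay * age from
    the logit multiplies the unnormalised weight by exp(-decay * age).
    """
    logits = [s - decay * a for s, a in zip(scores, ages)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three learning events with equal raw scores but different ages (in days).
weights = decayed_attention(scores=[1.0, 1.0, 1.0], ages=[10.0, 5.0, 0.0])
```

Because the raw scores are equal, the resulting weights differ only through the decay term, and the most recent event receives the largest share.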
Quality evaluation of multi-scenario resource recommendation
To test how well the recommendation system adapts to complex educational scenarios, this experiment constructs four differentiated environments: standard learning, sparse interaction, knowledge gap, and dynamic update. As shown in Figure 6, the improved HetGNN leads the advanced recommendation baselines NGCF (Neural Graph Collaborative Filtering), R-GAT (Relational Graph Attention Networks), and MultiEduGraph on NDCG@10. Its dual-task joint optimization passes knowledge state information from the learning situation prediction branch to the recommendation branch, which significantly alleviates semantic mismatch in the sparse interaction and knowledge gap scenarios, while the dynamic graph update strategy keeps new resources recommendable in real time.
Figure 6. Comparative analysis of NDCG@10 ranking quality in multiple educational scenarios. (a) Standard learning environment, (b) sparse interaction scenario, (c) knowledge gap scenario, (d) dynamic update scenario.
Figure 6 shows that the improved HetGNN is significantly ahead in all scenarios, with the NDCG@10 in the standard learning scenario reaching 0.93, far exceeding NGCF’s 0.83. The sparse interaction scenario reaches 0.87, 0.08 higher than the second-best model MultiEduGraph’s 0.79. The knowledge gap scenario reaches 0.88, which is significantly better than the baseline NGCF’s 0.69. The dynamic update scenario also maintains the highest value at 0.89. The error bars show that it has the lowest volatility.
The strong results of the improved HetGNN in the sparse interaction and knowledge gap scenarios in Figure 6 highlight the robustness of its design. The dual-task joint optimization uses information from the learning situation prediction task to compensate for sparse interaction data, which is why it clearly outperforms MultiEduGraph (0.79) in that scenario. The temporal decay factor gives higher weight to recent learning behaviors and, combined with the dynamic graph update mechanism, keeps recommendations timely in the dynamic update scenario. Even in the knowledge gap scenario, where uncertainty is high, its smaller standard deviation confirms that multi-level feature fusion represents complex learning states stably, which supports its leading NDCG@10 in all four scenarios.
Comprehensive test of coverage and robustness
To address the challenges of long-tail effect and popularity bias in educational resource recommendation, this experiment uses a three-dimensional evaluation system: Hit Ratio@10, long-tail coverage, and bias coefficient, and compares the improved HetGNN model with PopRec, FairGNN, and DebiasedGNN models. The improved HetGNN explicitly decouples student preferences and resource popularity through a bilinear transformation matrix, and combines the reinforcement mechanism of temporal attention on recent learning behaviors to achieve coordinated optimization of long-tail resource coverage and fairness while maintaining a high hit rate.
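As a rough illustration of the bilinear form mentioned above, a student embedding u and a resource embedding v can be scored as u^T W v with a learnable matrix W. How the model uses this score to separate preference from popularity is not detailed in this section, so the sketch shows only the scoring function; the dimensions and values are hypothetical.

```python
from typing import List


def bilinear_score(student: List[float],
                   W: List[List[float]],
                   resource: List[float]) -> float:
    """Compute student^T W resource for a (len(student) x len(resource)) matrix W."""
    projected = [sum(W[i][j] * resource[j] for j in range(len(resource)))
                 for i in range(len(student))]
    return sum(s * p for s, p in zip(student, projected))


# A non-symmetric W makes the score depend on which vector is on which side.
W = [[0.0, 1.0],
     [0.0, 0.0]]
u = [1.0, 0.0]  # hypothetical student embedding
v = [0.0, 1.0]  # hypothetical resource embedding

forward = bilinear_score(u, W, v)   # picks out W[0][1]
backward = bilinear_score(v, W, u)  # picks out W[1][0]
```

Unlike a plain dot product, the two directions give different scores here, which is the kind of asymmetry a bilinear matrix makes available to the model.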
Figure 7 shows that the improved HetGNN reaches a Hit Ratio@10 of 0.89, well above PopRec’s 0.62. Its long-tail coverage is 0.56, far exceeding PopRec’s 0.18, and its bias coefficient is only 0.29, compared with 0.83 for PopRec. HetGNN thus combines “high hit ratio,” “high long-tail coverage,” and “low bias,” while PopRec is concentrated in the blue low-performance area and FairGNN and DebiasedGNN fall in the transitional blue-yellow interval. HetGNN is therefore better than the baseline models on all three indicators.
Figure 7. Comparison of coverage and robustness.
The performance advantage of HetGNN stems from how closely its underlying mechanisms are adapted to educational scenarios, which lets the model identify latent matches between long-tail resources and unpopular knowledge points and thereby raises long-tail coverage. The bilinear transformation matrix introduced in the dual-task joint optimization suppresses the popularity bias of resource recommendation through asymmetric parameter adjustment, reducing the bias coefficient to 0.29. The time decay factor injected by the temporal attention mechanism strengthens the influence of recent learning behavior on the recommendation results, which matches the dynamic evolution of knowledge mastery that characterizes educational scenarios. Together these mechanisms improve robustness while maintaining a high Hit Ratio@10 (0.89).
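Long-tail coverage can be computed in several ways, and this section does not fix a definition. The sketch below assumes one common choice: resources outside the most popular fraction form the long tail, and coverage is the share of those resources that appear in at least one student's top-K list. The `head_fraction` threshold and all data are hypothetical.

```python
from typing import Dict, List, Set


def long_tail_items(interaction_counts: Dict[str, int],
                    head_fraction: float = 0.2) -> Set[str]:
    """Resources outside the most popular `head_fraction` of the catalogue."""
    ranked = sorted(interaction_counts, key=interaction_counts.get, reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    return set(ranked[head_size:])


def long_tail_coverage(recommendations: Dict[str, List[str]],
                       tail: Set[str],
                       k: int = 10) -> float:
    """Share of long-tail resources recommended to at least one student."""
    if not tail:
        return 0.0
    recommended = {item for items in recommendations.values() for item in items[:k]}
    return len(recommended & tail) / len(tail)


# Hypothetical interaction counts and top-K lists for two students.
counts = {"a": 100, "b": 50, "c": 5, "d": 3, "e": 1}
tail = long_tail_items(counts)          # everything except the head item "a"
recs = {"s1": ["a", "c"], "s2": ["a", "d"]}
coverage = long_tail_coverage(recs, tail)
```

Here two of the four long-tail resources are recommended, giving a coverage of 0.5; a popularity-only recommender that always returns "a" would score 0.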
Ablation experiment and interpretability analysis
Table 2. Ablation experiment and interpretability analysis results.
Table 2 shows the impact of missing different modules on the performance of the HetGNN model in the ablation experiment. The AE (Attention Entropy) of the complete model is 0.45, the PIS (Path Importance Score) is 2.18, and the path coverage accuracy is 0.82, indicating that the overall model performs well in heterogeneous graph semantic path representation, dynamic behavior stability, and key path identification. When the meta-path attention mechanism is removed, AE significantly increases to 0.72, PIS drops to 1.35, and path coverage accuracy drops to 0.61, indicating that meta-path attention is crucial for dynamically capturing key semantic paths and improving path coverage capabilities. If the time decay factor is removed, AE further increases to 0.89, and PIS and path coverage accuracy drop to 1.02 and 0.53, respectively, indicating that the temporal attention mechanism plays a core role in suppressing historical noise and maintaining the stability of behavior sequences. After removing the dynamic weight adjustment, the path coverage accuracy is only 0.48, which shows the supporting role of the task priority adaptation mechanism for multi-task collaborative optimization. When only the static graph structure is retained, the indicators are the worst (AE = 1.32, PIS = 0.64, path coverage accuracy = 0.39), highlighting the global improvement effect of the dynamic fusion module on model performance.
Table 2 indicates that the meta-path attention mechanism allocates path weights dynamically, so the model can focus on key semantic associations when aggregating multi-order neighborhood information. Without it, path coverage accuracy drops to 0.61 and features become confused. The time decay factor keeps AE at 0.45 by explicitly prioritizing recent behaviors; removing it raises AE to 0.89, which shows that temporal dynamics are essential to the stability of behavioral sequences. Dynamic weight adjustment balances task priorities through two-step optimization; without it, path coverage accuracy falls sharply to 0.48, which shows that multi-task joint optimization alleviates feature conflicts in the shared parameter space. When only the static graph structure is used, path coverage accuracy is just 0.39, which confirms that dynamic fusion (the gating mechanism between the Transformer and R-GCN) is necessary to follow the dynamic evolution of the heterogeneous graph. Under the dual constraints of the time decay factor and meta-path attention, the model achieves deep complementarity between the static knowledge graph and dynamic learning behaviors.
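The AE indicator is not defined in this section. If it is read as the Shannon entropy of the attention distribution over candidate paths, which is an assumption made here, lower values mean that attention is concentrated on a few key paths. The sketch below computes that quantity in nats; the weight vectors are illustrative.

```python
import math
from typing import Sequence


def attention_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of attention weights after normalisation."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]  # zero weights contribute nothing
    return -sum(p * math.log(p) for p in probs)


# Attention spread evenly over four paths versus focused on one path.
diffuse = attention_entropy([0.25, 0.25, 0.25, 0.25])
focused = attention_entropy([0.85, 0.05, 0.05, 0.05])
single = attention_entropy([1.0, 0.0, 0.0, 0.0])
```

Under this reading, removing a module that sharpens attention would push the value toward the uniform case, which is the direction of change reported in the ablation.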
Educational scenario sensitivity test
To evaluate the adaptability and robustness of the improved HetGNN model across educational scenarios, this paper designs a scenario sensitivity test. The experiment covers four typical scenarios: standard courses (mathematics/physics), humanities courses (history/literature), skill courses (programming/experiments), and interdisciplinary integration. Model performance is quantified along three dimensions: stability of learning situation prediction, volatility of long-tail resource coverage, and dynamic update response delay. Comparing these indicators across scenarios shows how well the model adapts to the clarity of the knowledge system, the complexity of resource interaction, and multidisciplinary correlation.
Results of educational scenario sensitivity test.
Conclusion
This study proposes an improved HetGNN, a heterogeneous graph neural network framework designed to address the challenge of modeling complex, multimodal relationships in educational environments. The model features a three-level architectural innovation. First, it constructs a two-layer heterogeneous graph that integrates both entity attributes and relational interactions, enabling fine-grained representation of students, learning resources, and knowledge points. Second, it designs a meta-path-aware Relational Graph Convolutional Network (R-GCN) with hierarchical attention mechanisms to dynamically capture and emphasize semantically meaningful paths, such as student–resource–knowledge point, thereby improving the accuracy of relational reasoning. Third, an adaptive fusion module integrates temporal dynamics, modeled by a Transformer-based sequence encoder, with static structural features from the graph, using a learnable gating mechanism to balance evolving behaviors and stable knowledge dependencies. On top of this architecture, a dual-task joint optimization framework simultaneously performs learning situation prediction (e.g., dropout risk and academic performance) and personalized resource recommendation, with shared latent representations enabling mutual enhancement. Extensive experiments show that the proposed model outperforms the compared baselines across multiple tasks: it achieves an F1-score of 0.91 in dropout risk prediction, an RMSE of approximately 0.32 in grade prediction, and an NDCG@10 of 0.93 in standard learning scenarios. The model is also robust in high-stakes decision-making and substantially improves long-tail resource coverage, with a coverage ratio of 0.56 and a reduced recommendation bias coefficient of 0.29. These results highlight its effectiveness in capturing intricate educational dynamics and supporting equitable, personalized learning.
As future work, we plan to investigate online generation of dynamic meta-paths to accommodate evolving interdisciplinary knowledge structures and incorporate a federated learning framework to enable privacy-preserving, distributed model training across decentralized educational data sources.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
