Abstract
Because medical knowledge is updated rapidly, existing online medical education systems suffer from delayed tracking of learners’ knowledge states and poor matching between recommended content and users. To improve learners’ medical knowledge, this study develops an intelligent online medical education system. Deep knowledge tracing is combined with reinforcement learning to dynamically track each user’s knowledge state and optimize the exercise recommendation policy. On this basis, an improved transfer learning algorithm is introduced to further increase the model’s recommendation accuracy. The results indicated that, compared with other models, the optimized model achieved a maximum learning efficiency of 398.91 × 10⁻³ and the smallest mean square error, at 2.11%. On the same dataset, the learning efficiency of the optimized model was 0.16 higher than that of the model using deep knowledge tracing alone, and its initial reward value was 2.4 higher than that of the reinforcement learning model. The proposed optimization model was effective in raising users’ knowledge levels and thereby improving their learning status. The system performs well under high concurrency, consumes few system resources, and delivers excellent performance. These features help improve the efficiency of medical education and meet users’ personalized needs. The work also promotes the development of intelligent medical education and provides strong technical support for the field.
Introduction
With the intelligent development of science and technology, medical education has undergone tremendous innovation. However, the traditional offline medical education model still faces challenges such as low efficiency and time constraints. For example, when medical professionals teach only in the classroom, learners cannot access personalized, real-time learning anytime and anywhere. 1 To expand and popularize medical education, delivering educational content online has become a relatively novel approach. In this context, an Online Medical Education (OME) system can broaden the coverage of education and provide a better platform for doctors and other medical users to keep learning and improving their professional skills. OME systems are highly convenient: they are not limited by time or place and can save time and money for medical teachers and students. 2 For example, Lucas et al. used large language models and a systematic search of multiple databases to analyze medical knowledge content and better understand medical students’ learning experiences; the results showed good reproducibility. 3 Groh et al. studied the impact of learning systems on the accuracy of skin disease diagnosis by simulating diagnoses in large-scale digital experiments and reported a diagnostic accuracy error of 0.19. 4 To improve the accuracy of early breast cancer diagnosis, Naz et al. used a convolutional neural network and Internet of Things technology, together with hyperparameter tuning, to distinguish tumor patients from non-tumor patients, achieving a classification accuracy of 95%. 5
In recent years, advances in deep learning have provided new ideas for optimizing online education. Deep Knowledge Tracing (DKT), for instance, captures more complex representations of student knowledge and improves predictive performance; unlike traditional models, which can only perform static diagnosis, it enables real-time monitoring. Meanwhile, transfer learning (TL) applies the knowledge of existing models to new tasks, improving learning efficiency and effectiveness. In addition, reinforcement learning (RL) learns optimal strategies through interaction between an agent and its environment and is well suited to strategy optimization. 6 Hooshyar et al. addressed the low efficiency of artificial neural networks in education by introducing neural-symbolic artificial intelligence methods that integrate educational knowledge with symbolic reasoning. These methods improved models of learners’ computational thinking and effectively extracted educational knowledge, and the results indicated that the approach was interpretable and had low bias. 7 Lăzăroiu used deep learning and machine learning, integrating clinical data, symptoms, and biomarkers, to predict infection trends and resource requirements and to analyze patient survival rates and treatment outcomes; the results showed the effectiveness of this technology. 8 Mosqueira-Rey et al. addressed the data bottleneck that deep learning models face in pancreatic cancer treatment by using a generative adversarial network (GAN) to generate synthetic data and expand the sample size. They combined this with active learning, in which experts label suspicious or new data, and the results showed that the method was practical. 9
In summary, OME systems and deep learning technologies are widely applied and developed in education. However, traditional DKT can model knowledge states but cannot actively optimize learning paths. RL is difficult to explore directly in medical scenarios because of safety constraints. Although medical knowledge is highly systematic, GAN lacks a structured knowledge transfer mechanism for transfer across disciplines. Moreover, although existing OME systems provide massive resources, their static recommendation logic causes most users to quit midway because exercise difficulty is mismatched to their level. To address these problems, this study develops an intelligent OME system that combines DKT with RL to build a recommendation model. The model tracks the user’s knowledge state in real time and improves its convergence speed and recommendation accuracy by transferring recommendation sequences across recommendation agents through an improved TL algorithm. This study is the first to integrate medical knowledge tracing, RL recommendation, and TL in three stages, addressing the inability of traditional systems to balance dynamic adaptability with cross-specialty transfer. The research objective is to provide medical learners with an efficient and convenient online learning tool and to promote the intelligent and personalized development of OME.
Methods and materials
First, the study designs an OME system. DKT is then used to estimate users’ knowledge levels on the exercises in the OME system and is combined with RL to build a DKT-RL recommendation model. To further improve model accuracy, the study proposes an optimization model that combines an improved TL algorithm with the DKT-RL recommendation model.
OME system design and DKT-RL recommendation model construction
At present, some OME systems suffer from uneven course content quality and inconsistent personalized teaching. This study therefore designs an OME system. The system uses the general-purpose programming language Python for data processing. The front end uses the Vue.js framework with the HTML5/CSS3 standards to implement a responsive interface, and the back end uses the lightweight Flask framework with a modular design to provide the system’s functions. 10
To ensure a consistent and scalable deployment environment, the system uses Docker containerization. The OME system architecture is shown in Figure 1.
Figure 1. Schematic diagram of OME system architecture.
In Figure 1, the system includes a view layer, a network layer, a model layer, a data layer, and an implementation layer. The view layer implements a user-friendly interface. The network (routing) layer handles path selection and manages data transmission. The MySQL database in the data layer stores user data and training data. The functional modules include a user management module, a system management module, and a personal center module. To ensure stability and performance, the system undergoes unit testing, integration testing, and system testing. Meanwhile, real-time monitoring of the system’s operational status allows issues to be detected and resolved promptly. 11
To improve the user experience of exercise recommendation in medical education, the functional modules adopt a new recommendation algorithm. The study uses DKT to model the knowledge states of users in the OME system. DKT dynamically models and tracks each user’s knowledge state so that their knowledge level is reflected accurately and promptly in RL recommendation and related processes. The DKT model structure is shown in Figure 2.
Figure 2. Schematic diagram of DKT model structure.
In Figure 2, the input data is the user’s historical answer sequence
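A minimal sketch of the DKT data flow around Figure 2, assuming the standard DKT input encoding: each interaction (question q, correctness a) becomes a one-hot vector of length 2M, and the network outputs one predicted mastery probability per question. A vanilla tanh RNN with random, untrained weights and no bias terms stands in for the system’s actual recurrent network, and the helper names (`one_hot_interaction`, `dkt_forward`) are hypothetical. The hidden size is kept small for readability.

```python
import math
import random


def one_hot_interaction(question_id, correct, num_questions):
    """Encode one interaction (q_t, a_t) as a one-hot vector of length 2M."""
    x = [0.0] * (2 * num_questions)
    # Indices 0..M-1 mark incorrect answers, M..2M-1 mark correct answers.
    x[question_id + num_questions * int(correct)] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def dkt_forward(history, num_questions, hidden_size=8, seed=0):
    """Return one mastery-probability vector (length M) per step of the answer history.

    Weights are random and untrained: this only illustrates the data flow.
    """
    rng = random.Random(seed)
    w_xh = random_matrix(rng, hidden_size, 2 * num_questions)
    w_hh = random_matrix(rng, hidden_size, hidden_size)
    w_hy = random_matrix(rng, num_questions, hidden_size)
    h = [0.0] * hidden_size
    outputs = []
    for question_id, correct in history:
        x = one_hot_interaction(question_id, correct, num_questions)
        # h_t = tanh(W_xh x_t + W_hh h_{t-1})
        h = [math.tanh(a + b) for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
        # y_t = sigmoid(W_hy h_t): predicted P(correct) for every question
        outputs.append([sigmoid(z) for z in matvec(w_hy, h)])
    return outputs


probs = dkt_forward([(0, True), (2, False), (1, True)], num_questions=3)
print([round(p, 3) for p in probs[-1]])
```

In a pipeline like the one described, the last vector in `probs` plays the role of the user’s current knowledge state passed to the RL recommender; a trained model would learn the three weight matrices from answer logs instead of sampling them.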
The dataset is preprocessed through sentence segmentation and medical knowledge embedding. Sentence segmentation divides text data into smaller sentences or paragraphs for processing; for medical Q&A, the contextual connection between questions and answers is crucial. Medical knowledge embedding maps medical terms into a medical embedding space, using the pre-trained PubMedBERT model to obtain richer representations of medical knowledge. To align the tasks with the DKT-RL model, a unified medical ontology framework is first constructed to map the domain knowledge of medical problems onto a standardized medical ontology. The PubMedBERT model is then applied to the questions and answers to generate embedding vectors for the medical concepts in the questions. Different types of medical questions are mapped into the same vector space, and the cosine similarity between each question’s embedding vector and the corresponding answer’s embedding vector is calculated. The overall structure of the DKT-RL model is shown in Figure 3.
Figure 3. Schematic diagram of the overall structure of DKT-RL model.
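The question-answer scoring step in the preprocessing reduces to a cosine similarity between embedding vectors. The sketch below assumes the PubMedBERT embeddings have already been computed (for example, one pooled vector per question and per answer) and uses short toy vectors in their place; `cosine_similarity` and `score_pairs` are illustrative helpers, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treated here as "no similarity"
    return dot / (norm_u * norm_v)


def score_pairs(question_vecs, answer_vecs):
    """Score each question embedding against its paired answer embedding."""
    return [cosine_similarity(q, a) for q, a in zip(question_vecs, answer_vecs)]


# Toy 3-dimensional vectors standing in for real PubMedBERT embeddings.
questions = [[0.2, 0.7, 0.1], [0.9, 0.0, 0.4]]
answers = [[0.25, 0.6, 0.05], [0.0, 1.0, 0.0]]
print(score_pairs(questions, answers))
```

Because all questions share one vector space, the same function can compare questions of different types with each other, not only with their answers.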
In Figure 3, the current knowledge state of the user is represented by the mastery probability vector output by the DKT model.
In equation (1),
In equation (2),
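The exact state transition and reward of the DKT-RL model are defined by equations (1) and (2). As a hedged illustration only, the sketch below treats the DKT mastery vector as the RL state and an exercise index as the action, and assumes a hypothetical reward equal to the change in average predicted mastery; action selection uses epsilon-greedy, one common way to trade exploration against exploitation. None of these names come from the paper.

```python
import random


def mean(values):
    return sum(values) / len(values)


def mastery_gain_reward(prev_state, next_state):
    """Hypothetical reward: change in average predicted mastery after one exercise."""
    return mean(next_state) - mean(prev_state)


def epsilon_greedy(q_values, epsilon, rng):
    """Choose an exercise index: random with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


rng = random.Random(42)
state = [0.3, 0.5, 0.4]            # DKT mastery vector before the exercise
action = epsilon_greedy([0.1, 0.8, 0.2], epsilon=0.1, rng=rng)
next_state = [0.3, 0.6, 0.45]      # mastery vector after the user answers
print(action, mastery_gain_reward(state, next_state))
```

With epsilon = 0 the agent always exploits its current value estimates; raising epsilon increases exploration, which is the trial-and-error cost that the improved TL algorithm aims to reduce.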
Improved TL algorithm for optimizing DKT-RL model
In the DKT-RL model, the agent must balance exploring new strategies against exploiting known ones, which can lead to prolonged trial and error and low efficiency. At the same time, general TL algorithms are unstable when handling large amounts of data in medical education and cannot exploit fine-grained in-domain information, which lowers recommendation accuracy. In response, this study optimizes DKT-RL with an improved TL algorithm that increases the speed and efficiency of the agent’s exploration. The algorithm adapts to new tasks by selecting and adjusting existing instances. It assumes that the source and target domains share many features and that their support sets are the same or similar. The basic idea is that although the auxiliary training data may differ somewhat from the source training data, part of the auxiliary training data is still suitable for training an effective classification model that fits the test data. The goal is therefore to identify the instances in the auxiliary training data that suit the test data and transfer them into learning alongside the source training data.
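The instance-selection idea above (keep auxiliary instances that help on test-like data, down-weight the rest) is the core of the TrAdaBoost weight update. The text does not state that the improved algorithm is TrAdaBoost, so this is only an illustration of that family: one boosting round in which misclassified auxiliary instances lose weight by a fixed factor, while misclassified instances from the test-like data gain weight, as in AdaBoost. The 0/1 error indicators are assumed to come from some base classifier trained on the current weights, and `tradaboost_reweight` is a hypothetical name.

```python
import math


def tradaboost_reweight(w_aux, w_main, miss_aux, miss_main, n_rounds):
    """One TrAdaBoost-style weight update.

    w_aux, miss_aux:   weights and 0/1 errors on auxiliary (different-distribution) data.
    w_main, miss_main: weights and 0/1 errors on data matching the test distribution.
    Returns (new_w_aux, new_w_main, eps_t); normalization is left to the caller.
    """
    n = len(w_aux)
    # Fixed down-weighting factor for auxiliary instances (depends only on n and n_rounds).
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / n_rounds))
    # Weighted error measured only on the test-like data.
    eps_t = sum(w * m for w, m in zip(w_main, miss_main)) / sum(w_main)
    eps_t = min(eps_t, 0.499)  # keep beta_t < 1 so the update stays well defined
    beta_t = eps_t / (1.0 - eps_t)
    # Misclassified auxiliary instances lose weight: they look unlike the test data.
    new_aux = [w * beta ** m for w, m in zip(w_aux, miss_aux)]
    # Misclassified test-like instances gain weight, as in AdaBoost.
    new_main = [w * beta_t ** (-m) for w, m in zip(w_main, miss_main)]
    return new_aux, new_main, eps_t


aux, main, eps = tradaboost_reweight(
    [0.1, 0.1], [0.1, 0.1, 0.1, 0.1], [1, 0], [1, 0, 0, 0], n_rounds=10
)
print(aux, main, eps)
```

Over many rounds, auxiliary instances that keep disagreeing with the test-like data fade out, which is one concrete way to "identify instances suitable for testing" before they are transferred.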
To further optimize recommendation performance, the improved TL algorithm transfers recommendation sequences between different recommendation agents. The system generates a series of recommendations for one user, which are then reused for other users: useful information is extracted from the original task and applied to the new one. This allows a new recommendation agent to learn to recommend to new users faster by drawing on existing experience data. The optimization process first calculates the similarity between the GG values of the source task
In equation (4),
In equation (5),
In equation (6),
In equation (7),
In equation (8),
Based on the above, reusable trajectory
In equation (10),
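The trajectory-reuse step can be pictured as follows, under stated assumptions: each finished source task is summarized by a feature vector (a stand-in for the similarity quantity the method actually uses), similarity is cosine similarity, and trajectories from sufficiently similar tasks are handed to the new agent, most similar first. `SourceTask`, `select_reusable_trajectories`, and the 0.8 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

# A trajectory is a list of (state, action, reward) steps; states are mastery vectors.
Step = Tuple[List[float], int, float]


@dataclass
class SourceTask:
    """A completed recommendation task (hypothetical structure)."""
    name: str
    features: List[float]  # task descriptor, e.g. a learner's mastery profile
    trajectories: List[List[Step]] = field(default_factory=list)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_reusable_trajectories(target_features, sources, threshold=0.8):
    """Collect trajectories from source tasks whose similarity to the target meets the threshold.

    Trajectories from the most similar tasks come first, so they can seed the
    new agent's experience buffer before it starts exploring on its own.
    """
    scored = [(cosine(target_features, s.features), s) for s in sources]
    kept = sorted((pair for pair in scored if pair[0] >= threshold),
                  key=lambda pair: pair[0], reverse=True)
    reused = []
    for _, task in kept:
        reused.extend(task.trajectories)
    return reused


task_a = SourceTask("task_a", [1.0, 0.1], [[([0.2, 0.3], 0, 0.1)]])
task_b = SourceTask("task_b", [0.0, 1.0], [[([0.5, 0.5], 1, -0.2)]])
print(select_reusable_trajectories([1.0, 0.0], [task_a, task_b]))
```

The threshold trades coverage against negative transfer: a lower value reuses more experience but risks importing trajectories from dissimilar tasks.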
Results and discussion
The study set up a testing environment, prepared configurations, and compared different models to analyze the stability and effectiveness of the proposed optimization model. The functional modules and performance of the OME system were also tested to verify its effectiveness and to assess its compatibility and feasibility.
Performance analysis of optimized models
The OME system ran on an Intel® Core™ i5-8300H central processing unit (CPU) with a base frequency of 2.3 GHz, 128 GB + 1 TB of storage, and 16 GB of RAM. The front-end development environment was configured with the JavaScript programming language and the Vue.js framework, and the back end with Python 3.8 and the Flask web framework. The experimental parameters included a DKT hidden layer size of 200, a dropout rate of 0.5, a learning rate of 0.001, a batch size of 64, and the Adam optimizer. The parameter is set
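The reported DKT training settings can be collected in a small configuration object so that experiments are easy to reproduce. This is a sketch: the field names are hypothetical, the DKT setting of 200 is interpreted as the number of hidden units, and `batches_per_epoch` simply shows how a batch size of 64 partitions a dataset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DKTTrainingConfig:
    """DKT training settings with the values reported in the text."""
    hidden_size: int = 200       # interpreted as 200 hidden units (assumption)
    dropout: float = 0.5
    learning_rate: float = 1e-3
    batch_size: int = 64
    optimizer: str = "adam"

    def __post_init__(self):
        # Basic sanity checks so a mistyped config fails early.
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        if self.learning_rate <= 0 or self.batch_size <= 0 or self.hidden_size <= 0:
            raise ValueError("sizes and learning rate must be positive")

    def batches_per_epoch(self, num_samples: int) -> int:
        """Mini-batches needed to cover num_samples once (the last batch may be partial)."""
        return math.ceil(num_samples / self.batch_size)


cfg = DKTTrainingConfig()
print(cfg, cfg.batches_per_epoch(10_000))
```

Freezing the dataclass keeps a run’s settings from being mutated mid-experiment; a variant run is created with a new instance instead.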
To analyze the effectiveness of the optimization model, the learning processes of DKT, DKT-RL, and the optimization model were compared. The learning efficiency
Figure 4. Changes in E values of three models on two datasets. (a) MEDIQA RQE, (b) MedQA-BBY.
In Figure 4(a), the highest learning efficiency value of the optimized model and DKT-RL model on the MEDIQA RQE dataset was about 0.40. At this point, the learning efficiency
To further verify the stability of the DKT-RL model, four models were selected for comparative analysis: Multi-agent Federated Reinforcement Learning Policy (MFRLP), 23 Hybrid Recommendation (HR), 24 Adaptive Deep Q-learning (ADQ), 25 and Gated Recurrent Unit Analysis of Variation (GRU-A). 26
The learning efficiency values of the different models on the two datasets were calculated separately, and the MAE, MSE, and RMSE of the different models on the same dataset were also calculated. The performance comparison results of the DKT-RL model are shown in Figure 5.
Figure 5. Performance comparison results of DKT-RL model. (a) E value of different models, (b) MAE, MSE, and RMSE results in MEDIQA RQE.
In Figure 5(a), compared with the other models, the DKT-RL model achieved learning efficiency values of 398.91 × 10⁻³ and 389.27 × 10⁻³ on the two datasets, respectively, indicating that DKT-RL was effective in raising users’ knowledge levels and thereby improving their learning status. In Figure 5(b), the DKT-RL model had the smallest MAE, MSE, and RMSE of the five models, at 7.04%, 2.11%, and 13.23%, respectively. Apart from DKT-RL, ADQ had the smallest MSE (2.12%) and MFRLP the largest (3.91%), differences of 0.01 and 1.80 percentage points from DKT-RL, respectively. This was because the RL reward mechanism in the DKT-RL model effectively guided users toward higher-order knowledge states, while the dynamic modeling ability of DKT addressed the insufficient characterization of medical knowledge coherence in traditional recommendation systems such as MFRLP and HR. Furthermore, given diminishing marginal returns in model performance, reducing the MSE from 2.11% to 2.0% required an additional 30% of training time while improving accuracy by only 0.11%, which was not cost-effective.
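For reference, the three error metrics compared in Figure 5(b) have standard definitions. The sketch below computes them for one set of predictions; `regression_errors` is an illustrative helper, and the toy values are unrelated to the reported results.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    return mae, mse, math.sqrt(mse)                   # RMSE = sqrt(MSE)


# Toy example: observed correctness vs. predicted probability of a correct answer.
print(regression_errors([1, 0, 1, 1], [0.9, 0.2, 0.6, 1.0]))
```

For a single evaluation, RMSE is exactly the square root of MSE; when metrics are averaged over several runs or folds, the averaged RMSE generally differs from the square root of the averaged MSE.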
To verify the classification performance of the improved TL algorithm, three algorithms, TL, Deep Transfer Learning (DTL), 27 and Classes Weighting and Transfer Learning (CWTL), 28–31 were selected for comparison. The study selected different
Figure 6. Changes in average classification error rate on different datasets. (a) MEDIQA RQE, (b) MedQA-BBY.
In Figure 6(a), as the ratio of
To analyze the effectiveness of the optimization model, the RL, DTL, and CWTL models were selected for comparison. Each model was trained for 5000 rounds with its respective algorithm in the same training environment. The changes in the total reward value of the four models over training are shown in Figure 7.
Figure 7. Changes in the total reward value of the four models.
In Figure 7, the total reward curve of the RL-only model converged most slowly, starting to converge after approximately 1500 training rounds, which indicates relatively inefficient task learning. The DTL and CWTL models began to converge after 400 and 800 rounds, respectively. The optimized model converged fastest, beginning to converge after about 100 training rounds, at which point its total reward value was −8.3. It could quickly learn useful information from the data and reach a stable state. In the initial stage, the optimization model transferred the diagnostic logic pattern from cardiology to pulmonology, which made its initial reward value 2.4 times higher than that of RL. Compared with the other models, the optimized model obtained more positive feedback and performed better during learning.
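The convergence points quoted for Figure 7 (about 100, 400, 800, and 1500 rounds) depend on how "begins to converge" is defined, which the text does not specify. One hypothetical criterion is sketched below: smooth the per-round total reward with a trailing moving average and report the first window start after which the average stays within a tolerance band around its final value. The window size and tolerance are assumptions and would change the reported round.

```python
def convergence_round(rewards, window=50, tol=0.5):
    """First window start after which the moving-average reward stays within tol of its final value.

    Returns None if there are fewer rewards than one window.
    """
    if len(rewards) < window:
        return None
    # Trailing moving average; ma[i] covers rounds i .. i + window - 1.
    running = sum(rewards[:window])
    ma = [running / window]
    for i in range(window, len(rewards)):
        running += rewards[i] - rewards[i - window]
        ma.append(running / window)
    final = ma[-1]
    # Scan backwards for the last window whose average falls outside the band.
    for i in range(len(ma) - 1, -1, -1):
        if abs(ma[i] - final) > tol:
            return i + 1
    return 0


# Toy curve: low reward for 100 rounds, then a plateau.
curve = [-50] * 100 + [-8] * 200
print(convergence_round(curve, window=10, tol=0.5))
```

Scanning backwards ensures the reported round is one after which the curve never leaves the band again, rather than the first time it merely touches it.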
OME system functionality and performance test results
Table 1. System functional test results.
In Table 1, all functional tests of the course center module and the exam center module passed. The key functions of course recommendation, course search, and score query allow users to search for course information and learn from recommended content. The exam center module supported user login, online practice, and exam participation. No interruptions occurred during exams, and the testing module functioned normally.
To verify the feasibility and applicability of the OME system, the JMeter testing tool was used to simulate 150 concurrent users, and the system’s CPU and memory usage during the test were tracked with a dedicated monitoring tool. The compatibility and consumption changes of the OME system are shown in Figure 8.
Figure 8. OME system compatibility and consumption changes. (a) Number of transactions and throughput, (b) CPU consumption and memory consumption.
In Figure 8(a), at a granularity of 40 s, the transaction times of the OME system ranged between 2 and 6 seconds. This met medical education interaction standards and ensured no perceptible delay in scenarios such as consultations, simulations, and image annotation. The system could handle concurrent operations within the given time frame. Its throughput of 4.6 B/s could support real-time grading of more than 200 clinical multiple-choice questions. In Figure 8(b), memory consumption remained stable at 49% throughout 5000 rounds of testing, with no significant increase. CPU consumption stayed at around 58%, leaving 42% of computing capacity in reserve for sudden emergency teaching and diagnostic requests. Although consumption fluctuated slightly between rounds 3500 and 4500, it remained below 60% overall.
Table 2. Performance test results of OME system.
In Table 2, the average response times of the four items in the OME system performance test were within 1000–3000 ms. The overall error rate of the four items was low: 0.00% for user login and user query, and 0.13% for course recommendation. The average throughput of the four items was 4.66 B/s. Overall, the system handled these four items with good stability.
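Figures like those in Table 2 are the kind of summary a load-testing tool such as JMeter reports. The sketch below computes average response time, error rate, and throughput from raw samples, assuming a hypothetical per-request record (start time, elapsed time, success flag); this is not JMeter’s file format or API. Throughput here follows JMeter’s definition of requests per unit time, measured from the start of the first sample to the end of the last.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One request from a load test (hypothetical record format)."""
    start_ms: int      # request start time, milliseconds
    elapsed_ms: int    # response time, milliseconds
    success: bool


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Average response time (ms), error rate (%), and throughput (requests/s)."""
    if not samples:
        raise ValueError("no samples")
    n = len(samples)
    avg_ms = sum(s.elapsed_ms for s in samples) / n
    error_pct = 100.0 * sum(1 for s in samples if not s.success) / n
    # Throughput window: first request start to last request end.
    first_start = min(s.start_ms for s in samples)
    last_end = max(s.start_ms + s.elapsed_ms for s in samples)
    duration_s = (last_end - first_start) / 1000.0
    throughput = n / duration_s if duration_s > 0 else float("inf")
    return {"avg_ms": avg_ms, "error_pct": error_pct, "throughput_per_s": throughput}


log = [Sample(0, 1200, True), Sample(300, 2500, True), Sample(900, 1800, False)]
print(summarize(log))
```

A bytes-per-second figure would instead divide the total response size by the same time window; the record above would need a byte-count field for that.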
Conclusion
With the deepening of educational informatization, online education systems are increasingly designed for intelligence and personalization. To enhance educational effectiveness for medical users, the study first designed an OME system and developed a DKT-RL model for recommending medical exercises. Next, an improved TL algorithm was studied, which transfers recommendation sequences between different recommendation agents to further improve accuracy. The results showed that after 100 training rounds, the total reward value of the optimized model reached −8.3. When the ratio of
Footnotes
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by “The 14th Five-Year” plan project of Hebei Higher Education Association (GJXH2024-124).
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
