Combining transfer learning and reinforcement learning for optimization of online medical education

Abstract

Because medical knowledge is updated rapidly, existing online medical education systems suffer from delayed tracking of users’ knowledge states and poor matching between recommended content and users. To improve the medical knowledge level of learners, this study develops an intelligent online medical education system. Deep knowledge tracing is combined with reinforcement learning to dynamically track the user’s knowledge state and optimize the exercise recommendation strategy. On top of this model, an improved transfer learning algorithm is applied to optimize recommendation accuracy. The results indicated that, compared with other models, the optimized model achieved the highest learning efficiency value, 398.91 × 10⁻³, and the smallest mean square error, 2.11%. On the same dataset, the learning efficiency of the optimized model was 0.16 higher than that of the model using only deep knowledge tracing, and the initial reward value of the optimized model was 2.4 higher than that of the reinforcement learning model. The proposed optimization model was effective in improving users’ knowledge level and enhancing their learning status. The system performed well under high concurrency while consuming few system resources. These features help improve the efficiency of medical education, meet users’ personalized needs, promote the development of intelligent medical education, and provide technical support for the field.

Keywords

medicine, online education, transfer learning, reinforcement learning, DKT, intelligent system

Introduction

With the intelligent development of science and technology, the field of medical education has seen tremendous innovation. However, the traditional offline medical education model still faces challenges such as low efficiency and time constraints. For example, medical professionals teach only in the classroom, which cannot meet the demand for personalized, real-time learning that is accessible anytime and anywhere.¹ To expand and popularize medical education, delivering educational content online has become a relatively novel approach. In this context, an Online Medical Education (OME) system can expand the coverage of education and provide a better platform for doctors and other medical users to continuously learn and improve their professional skills. The OME system is highly convenient. It is not limited by time and place and can save time and financial costs for medical teachers and students.² For example, Lucas et al. used a large language model system to search multiple databases and analyze medical knowledge content to better understand medical students’ learning experiences. The results showed that the system had good reproducibility.³ Groh et al. studied the impact of learning systems on the accuracy of skin disease diagnosis and simulated the diagnosis of skin diseases through large-scale digital experiments. The results showed that the diagnostic accuracy error was 0.19.⁴ To improve the accuracy of early breast cancer diagnosis, Naz et al. used a convolutional neural network and Internet of Things technology to distinguish tumor patients from non-tumor patients through hyperparameter tuning. The results showed that the classification accuracy of this method reached 95%.⁵

In recent years, the development of deep learning technology has provided new ideas for optimizing online education. Among them, Deep Knowledge Tracing (DKT) captures more complex representations of student knowledge and improves predictive performance. Unlike traditional models, which can only perform static diagnosis, it supports real-time monitoring. Meanwhile, transfer learning (TL) can apply the knowledge of existing models to new tasks, improving learning efficiency and effectiveness. In addition, reinforcement learning (RL) learns optimal strategies through the interaction between an agent and its environment, and has clear advantages in strategy optimization.⁶ Hooshyar’s team addressed the low efficiency of artificial neural networks in education by introducing neural-symbolic artificial intelligence methods that integrated educational knowledge and symbolic reasoning. These methods improved learners’ computational thinking models and effectively extracted educational knowledge. The results indicated that the method was interpretable and had low bias.⁷ Lăzăroiu used deep learning and machine learning to predict infection trends and resource requirements in order to analyze patient survival rates and treatment outcomes, integrating clinical data, symptoms, and biomarkers. The results showed the effectiveness of this technology.⁸ Mosqueira-Rey et al. aimed to address the data bottleneck that deep learning models face in pancreatic cancer applications. They used a generative adversarial network (GAN) to generate synthetic data and expand the sample size. They also combined this method with active learning, in which experts label suspicious or new data. The results showed that this method is practical.⁹

In summary, OME systems and deep learning technologies are widely applied and still developing in the field of education. However, traditional DKT can model knowledge states but cannot actively optimize learning paths. RL is also difficult to apply to direct exploration in medical scenarios because of safety constraints. Although medical knowledge is highly systematic, GAN lacks a structured knowledge transfer mechanism when transferring across disciplines. Although existing OME systems provide massive resources, their static recommendation logic causes most users to quit midway because the difficulty of the exercises does not match their level. For this purpose, an intelligent OME system is developed. It combines DKT technology with RL to build a recommendation model that tracks the user’s knowledge status in real time. An improved TL algorithm then transfers recommendation sequences across recommendation agents to improve the model’s convergence speed and recommendation accuracy. This study is the first to integrate medical knowledge tracing, RL recommendation, and TL in three stages, addressing the inability of traditional systems to balance dynamic adaptability and cross-specialty transfer. The research objective is to provide medical learners with an efficient and convenient online learning tool, promoting the intelligent and personalized development of OME.

Methods and materials

First, the study designs an OME system. Then, it uses DKT technology to estimate users’ knowledge levels on the exercises in the OME system. DKT is combined with RL to build a DKT-RL recommendation model. To improve model accuracy, the study further proposes an optimization model that combines an improved TL algorithm with DKT-RL recommendation.

OME system design and DKT-RL recommendation model construction

At present, some OME systems have problems such as uneven course content quality and differences in personalized teaching. Therefore, this study designs an OME system. The system uses the general-purpose programming language Python for data processing. The front end uses the Vue.js framework combined with the HTML5/CSS3 standard to implement a responsive interface design. The back end uses the lightweight Flask framework with a modular design to achieve multiple functions.¹⁰ To ensure consistency and scalability of the deployment environment, the system uses Docker containerization technology. The schematic diagram of OME system architecture is shown in Figure 1.

Figure 1.

Schematic diagram of OME system architecture.

In Figure 1, the system includes a view layer, a network layer, a model layer, a data layer, and an implementation layer. The view layer implements a user-friendly interface. The routing layer is used for path selection and management of data transmission. The MySQL database in the data layer stores user data and training data. The system functional modules include a user management module, a system management module, and a personal center module. To ensure system stability and performance, the system undergoes unit testing, integration testing, and system testing. Meanwhile, real-time monitoring of the system’s operational status enables timely detection and resolution of issues.¹¹ To improve the user experience with exercise recommendations for medical education, the functional module adopts a new recommendation algorithm. The study chooses DKT to dynamically model and track the knowledge states of users in the OME system, so that their knowledge level is reflected accurately and promptly in RL recommendation and other related processes. The schematic diagram of DKT structure is shown in Figure 2.

Figure 2.

Schematic diagram of DKT model structure.

In Figure 2, the input data is the user’s historical answer sequence $\{x_{1}, x_{2}, \ldots, x_{t}\}$, which is encoded as the input vector. The corresponding results $\{y_{1}, y_{2}, \ldots, y_{t}\}$ are used together with these inputs to train the DKT model. $h_{t}$ is the user’s knowledge state at time $t$. The probability of the user answering question $x_{t}$ correctly at this time is $y_{t}$, $y_{t} \in R^{N \times i}$. The model outputs the mastery probability $p_{t}$ of each question. The overall exercise set is $A$, which contains various medical education related content. The individual exercises are $q$, with $\{q_{1}, q_{2}, \ldots, q_{n}\} \in A$. The next question and answer pair is generated based on the level of mastery. Once the training is complete, a personalized knowledge model is generated for each user. After obtaining the user’s knowledge status, RL methods are used to recommend exercises to optimize learning outcomes. The implicit representation of the knowledge state at time $t$ is extracted from the model and converted into a state through a discretization function. This helps dynamically track and model users’ knowledge status, thereby improving the effectiveness of personalized exercise recommendation.
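
The recurrent update described above can be illustrated with a minimal sketch. It assumes a plain tanh recurrent cell with a sigmoid output layer and a one-hot encoding of each (question, correctness) pair; the function names, the hidden size of 8, and the random untrained weights are hypothetical choices for illustration and are not taken from the study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode_interaction(question_id, correct, num_questions):
    """One-hot encode a (question, correctness) pair as a vector of length 2N (assumed encoding)."""
    x = [0.0] * (2 * num_questions)
    x[question_id + (num_questions if correct else 0)] = 1.0
    return x


def dkt_step(x_t, h_prev, params):
    """One recurrent update: returns the new hidden state h_t and mastery probabilities p_t."""
    W_hx, W_hh, b_h, W_yh, b_y = params
    pre = [a + b + c for a, b, c in zip(matvec(W_hx, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(z) for z in pre]
    p_t = [sigmoid(z + b) for z, b in zip(matvec(W_yh, h_t), b_y)]
    return h_t, p_t


def init_params(num_questions, hidden, seed=0):
    """Random, untrained weights; a real model would learn these from answer logs."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return (rand(hidden, 2 * num_questions), rand(hidden, hidden), [0.0] * hidden,
            rand(num_questions, hidden), [0.0] * num_questions)


def track(history, num_questions, hidden=8):
    """Run the cell over an answer history and return the final state and mastery vector."""
    params = init_params(num_questions, hidden)
    h = [0.0] * hidden
    p = [0.5] * num_questions
    for question_id, correct in history:
        h, p = dkt_step(encode_interaction(question_id, correct, num_questions), h, params)
    return h, p
```

Calling `track([(0, True), (2, False)], num_questions=3)` returns a hidden state and one mastery probability per question. With untrained weights the probabilities stay close to 0.5, so the sketch only shows the data flow, not a fitted model.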

The dataset is preprocessed through sentence segmentation and medical knowledge embedding. Sentence segmentation divides text data into smaller sentences or paragraphs for processing, which matters because the connection between the context of questions and answers is crucial in medical Q&A. Medical knowledge embedding maps medical terms into a medical embedding space. This process uses PubMedBERT pre-trained models to obtain richer representations of medical knowledge. To align tasks with the DKT-RL model, a unified medical ontology framework is first constructed to map the domain knowledge of medical problems onto a standardized medical ontology. The PubMedBERT model is then used to encode the questions and answers, and embedding vectors of the medical concepts in the questions are generated. Different types of medical questions are mapped into the same vector space, and the cosine similarity between each question’s embedding vector and the corresponding answer’s embedding vector is calculated. The overall structure of the user’s DKT-RL model is shown in Figure 3.
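
As a small illustration of the last step, the sketch below computes the cosine similarity between two embedding vectors. The short vectors in the usage note are toy stand-ins for PubMedBERT embeddings, and the function name is a hypothetical choice.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

For example, `cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` is 1 up to floating-point rounding because the vectors point in the same direction, while orthogonal vectors give 0.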

Figure 3.

Schematic diagram of the overall structure of DKT-RL model.

In Figure 3, the current knowledge state of the user is represented by the mastery probability vector output by the DKT model. The action $a_{t}$ at time $t$ is a recommended exercise. Specifically, a user knowledge model is trained by combining DKT technology with user answer records, and the RL state $s_{t}$ at time $t$ is abstracted from the user knowledge state $h_{t}$. When an agent explores the learning environment, rewards guide it toward an optimal state, providing an optimal solution to the recommendation problem. The discretization function of $h_{t}$ is $f (h_{t})$, the reward at time $t$ is $r_{t}$, and their expressions are shown in equation (1)¹²

f (h_{t}) = s_{t}, \quad r_{t} = \frac{1}{N} \sum_{i = 1}^{N} f (q_{i}), \quad f (q_{i}) = \mathrm{probability} (y (q_{i})) = y (q_{i})

(1)

In equation (1), $N$ represents the number of exercises in $A$. $q_{i}$ represents the $i$-th exercise. $f (q_{i})$ is the probability of the user answering $q_{i}$ correctly. The output layer result $y (q_{i})$ of $q_{i}$ and the knowledge state matrix $h_{t}$ are expressed in equation (2)¹³

y (q_{i}) = σ (W_{y h} h_{t} + b_{y}), \quad h_{t} = \tanh (W_{h_{q}} q_{t} + W_{h_{h}} h_{t - 1} + b_{h})

(2)

In equation (2), $σ$ and $\tanh$ are the sigmoid and hyperbolic tangent activation functions, respectively. $b_{y}$ and $b_{h}$ are the biases of the output layer and the hidden layer, respectively. $W_{y h}$ is the weight matrix of the output layer. $W_{h_{q}}$ is the weight matrix of the user’s mastery status of the learned knowledge points. $W_{h_{h}}$ is the recurrent weight matrix. $q_{t}$ is the exercise input at time $t$, and $h_{t - 1}$ is the knowledge state matrix at time $t - 1$. Meanwhile, the cumulative return of agent exploration is expressed as the sum of discounted rewards $r_{t}$, as shown in equation (3)¹⁴

G_{t} = r_{t} + \sum_{n = 1}^{N - t} γ^{n} r_{t + n}

(3)

In equation (3), $γ$ is the discount factor, $γ \in [0, 1]$. The study determines the action $a_{t}$ with the $ε$-greedy strategy, which selects a random action with probability $ε$ and otherwise selects the action with the maximum $Q$ value. Based on the selected actions, exercises are recommended, student feedback is obtained, and rewards are calculated. States, actions, rewards, and new states are stored in the experience replay pool. Small batches of data are sampled from the experience pool. Then, losses are calculated and the Q-network parameters are updated.
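
The selection and storage steps of this loop can be sketched as follows. This is a minimal illustration rather than the study’s implementation: the Q-network update is omitted, the discounted return uses the standard textbook definition, and all function and class names are hypothetical.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random exercise with probability epsilon, otherwise the highest-Q exercise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def mean_mastery_reward(probabilities):
    """Reward as the average predicted probability of a correct answer, as in equation (1)."""
    return sum(probabilities) / len(probabilities)


def discounted_return(rewards, gamma):
    """Standard discounted return r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


class ReplayBuffer:
    """Fixed-capacity experience pool of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw a mini-batch without replacement (smaller if the pool is not yet full)."""
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)
```

With `epsilon` set to 0 the policy is purely greedy, so `epsilon_greedy([0.1, 0.9, 0.3], 0.0)` always returns index 1. In training, `epsilon` would typically start higher and decay, although the study does not state its schedule.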

Improved TL algorithm for optimizing DKT-RL model

In the DKT-RL model, agents must balance exploring new strategies with exploiting known ones. This can result in prolonged trial and error and low efficiency. At the same time, general TL algorithms are unstable when dealing with large amounts of data in the field of medical education. They are unable to utilize the fine-grained information within the domain, resulting in low recommendation accuracy. In response, this study optimizes DKT-RL with an improved TL algorithm to increase the speed and efficiency with which agents explore. This algorithm adapts to new tasks by selecting and adjusting existing instances. It assumes that the source domain and the target domain have many overlapping features, and that their support sets are the same or similar. The basic idea is that although the auxiliary training data may differ from the source training data, a portion of the auxiliary training data is still suitable for training an effective classification model and adapting to the test data. Therefore, the goal is to identify the instances in the auxiliary training data that suit the test data and transfer them into learning alongside the source training data.

To further optimize the recommendation performance, the improved TL algorithm transfers recommendation sequences between different recommendation agents. The system generates a series of recommendations for one user, which are then reused for other users. Useful information is extracted from the original task and applied to the new one. This enables new recommendation agents to learn how to recommend to new users faster by utilizing existing experience data. The optimization process first calculates the similarity between the $Q$ values of the source task $T_{o}$ and the target task $T_{t}$. The formula for calculating the $Q$ value of taking action $a$ in state $s$ is shown in equation (4)¹⁵

Q^{π} (s, a) = E_{s^{'} \sim P_{s a} (\cdot)} [R (s, a) + γ V^{π} (s^{'})]

(4)

In equation (4), $V^{π} (s) = E (G_{s})$ is a state value function determined by $π$ . A trajectory $τ$ with a maximum length of $g$ is set, and the estimated expected return on $τ$ is $Q (s, a)$ , as expressed in equation (5)¹⁶

Q (s, a) = Q^{p} (s, a) + \sum_{s^{'}, a^{'}} μ_{s, a \to s^{'}} Q^{p} (s^{'}, a^{'})

(5)

In equation (5), $Q^{p} (s, a)$ is the estimated local $Q$ value for a trajectory segment of custom length $p$, $p < g$. $μ_{s, a \to s^{'}}$ is the overall probability of transitioning to state $s^{'}$ within $g$. Essentially, $Q (s, a)$ is the cumulative sum of the local values $Q^{p} (s, a)$, but it is difficult for current RL algorithms to calculate the overall $Q$ value directly. For this purpose, a Monte Carlo extension is applied to action $a_{t}$ and state $s_{t}$. The calculation formula for this process is shown in equation (6)¹⁷

\begin{cases} Q (s_{t}, a_{t}) = γ Q (s_{t}, a_{t}) + R_{t + 1}, \\ \hat{Q} (s_{t}, a_{t}) = \mathrm{average} (r (s_{t}, a_{t})), \\ Q (s_{t - p}, a_{t - p}) = Q (s_{t - p}, a_{t - p}) + γ \hat{Q} (s_{t}, a_{t}) \end{cases}

(6)

In equation (6), $\hat{Q} (s_{t}, a_{t})$ is the local $Q$ value, and $Q (s_{t - p}, a_{t - p})$ is the global $Q$ value at position $t - p$. In the source task $T_{o}$, an adjusted algorithm is called to decompose the trajectory into multiple local $Q$ values, and the optimal strategy $π_{T_{s}}^{*}$ is learned in this way. The local data of $T_{o}$ is used to calculate the similarity between the local trajectory $τ_{t}$ sampled from $T_{t}$ and the local trajectory $τ_{s}$ of $T_{o}$. The expression for the transfer confidence $κ_{s, a}$ is shown in equation (7)¹⁸

\begin{cases} κ_{s, a} = 1, & \hat{ε} \geq ε \ (ε < ξ) \\ κ_{s, a} = e^{\frac{2 n \hat{ε}^{2}}{B^{2}}}, & \hat{ε} < ε \ (ε \geq ξ) \end{cases}

(7)

In equation (7), $ξ$ is the set minimum threshold, $ξ = 0.05$. The positive quantities $ε$ and $\hat{ε}$ used in the greedy strategy are defined in equation (8)

\hat{ε} = \hat{Q} (s, a^{*}) - \hat{Q} (s, a), ε = \sqrt{\frac{B^{2} \ln \frac{1}{δ}}{2 n}}

(8)

In equation (8), $\hat{Q} (s, a^{*})$ and $\hat{Q} (s, a)$ are the estimated $Q$ values of the best and second-best actions, respectively. $δ$ is the set confidence threshold, $δ = \frac{1}{3}$. $n$ is the number of trajectories $τ$ sampled from $s$ and $a$ on $T_{t}$. The transferability and reusability $ω (τ)$ of a trajectory in $T_{o}$ is calculated from the confidence level obtained by sampling on $T_{t}$, as expressed in equation (9)¹⁹

ω (τ) = \frac{\sum_{s, a \in τ} κ_{s, a}}{| τ |}

(9)
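
The unambiguous parts of equations (8) and (9) can be sketched as below. The confidence values $κ_{s, a}$ are taken as given inputs rather than recomputed, and the cut-off `min_weight` used to keep a trajectory is a hypothetical selection rule added only for illustration; all function names are likewise hypothetical.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Threshold from equation (8): sqrt(B^2 * ln(1 / delta) / (2 * n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2 * n))


def q_gap(q_values):
    """Gap between the largest and second-largest estimated Q values in one state."""
    top, second = sorted(q_values, reverse=True)[:2]
    return top - second


def trajectory_weight(kappas):
    """Equation (9): mean confidence over the (s, a) pairs of a trajectory."""
    return sum(kappas) / len(kappas)


def select_reusable(trajectories, min_weight):
    """Keep source trajectories whose weight reaches min_weight (hypothetical cut-off)."""
    return [name for name, kappas in trajectories.items()
            if trajectory_weight(kappas) >= min_weight]
```

For instance, with $B = 1$, $δ = 1/3$, and $n = 50$, `hoeffding_epsilon(1.0, 1/3, 50)` evaluates $\sqrt{\ln 3 / 100}$, which is roughly 0.105. The threshold shrinks as more trajectories are sampled.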

Based on the above, reusable trajectories $τ^{*}$ are selected from the source task trajectories $ψ_{T_{s}}^{s, a}$ according to the custom number of sampling sequences. The filtered $τ^{*}$ is stored in the transfer sequence buffer $T_{B}$. $τ^{*}$ is added to the complete training sample data of $T_{t}$, and the DKT-RL algorithm is trained on $ψ_{T_{s}}^{s, a} \cup T_{B}$. To measure how well users learn the medical education content, the study chose learning efficiency as the evaluation criterion. Its expression is shown in equation (10)²⁰

E = \frac{s_{\mathrm{final}} - s_{\mathrm{initial}}}{s_{\mathrm{total}} - s_{\mathrm{final}}}

(10)

In equation (10), $s_{\mathrm{initial}}$ is the user’s initial score before using the OME system, $s_{\mathrm{final}}$ is the user’s score at the end of the learning phase at time $t$, and $s_{\mathrm{total}}$ is the user’s total score at the end of the medical subject learning at time $t$.
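
As a worked instance of equation (10), the sketch below evaluates the learning efficiency for one set of scores. The scores 40, 60, and 100 are made-up illustrative values, not data from the study, and the function name is hypothetical.

```python
def learning_efficiency(s_initial, s_final, s_total):
    """Equation (10): (s_final - s_initial) / (s_total - s_final)."""
    denominator = s_total - s_final
    if denominator == 0:
        raise ValueError("s_total must differ from s_final")
    return (s_final - s_initial) / denominator


# Illustrative scores only: initial 40, final 60, total 100.
example_efficiency = learning_efficiency(40, 60, 100)
```

Here the numerator is 20 and the denominator is 40, so `example_efficiency` is 0.5.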

Results and discussion

The study set up a testing environment, prepared configurations, and compared different models to analyze the stability and effectiveness of the proposed optimization model. In addition, the functional modules and performance of the OME system were tested to verify its effectiveness, compatibility, and feasibility.

Performance analysis of optimized models

The hardware environment of the OME system was an Intel® Core™ i5-8300H Central Processing Unit (CPU) with a main frequency of 2.3 GHz, 128 GB + 1 TB of disk storage, and 16 GB of running memory. The JavaScript programming language and the Vue.js framework were selected for the front-end development environment. The back-end configuration included the Python 3.8 programming language and the Flask web framework. The parameters set for the experiment included 200 hidden layers in DKT, a dropout of 0.5, a learning rate of 0.001, a batch size of 64, and the Adam optimizer. For TL, the parameters were $γ = 0.9$, a learning rate of 0.5, $ρ = 0.9$, $p = 10$, and $g = 50$. The number of training epochs was 5000, and the maximum transfer sequence was 200. The normal reward was −1, the intermediate state reward was −0.2, and the end position reward was 0.5. The study selected the publicly available datasets MEDIQA Recognizing Question Entailment (RQE) and MedQA-But-Better-Yield (BBY). MEDIQA RQE is a dataset focused on medical question inference, which includes pairs of questions in the medical field annotated with the inference relationship between each pair.²¹ The MedQA-BBY dataset was collected from professional medical board exams and simulates multiple-choice questions from the US Medical Licensing Examination.²² The evaluation indicators included the learning efficiency value $E$, Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
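
For reference, the three error indicators can be computed as in the following sketch, which uses their standard definitions. The function names are hypothetical, and the inputs are assumed to be equal-length lists of observed outcomes and predicted probabilities.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))
```

When all three are computed on the same predictions, RMSE is by definition the square root of MSE and is never smaller than MAE, which gives a quick sanity check on reported values.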

To analyze the effectiveness of the optimization model, the study compared the learning processes of DKT, DKT-RL, and the optimization model. The learning efficiency $E$ values of the three models on the two datasets are shown in Figure 4.

Figure 4.

Changes in E values of three models on two datasets. (a) MEDIQA RQE, (b) MedQA-BBY.

In Figure 4(a), the highest learning efficiency value of the optimized model and the DKT-RL model on the MEDIQA RQE dataset was about 0.40. At this point, the learning efficiency $E$ value of the DKT model was only 0.24, so the $E$ value of the optimized model was 0.16 higher than that of the DKT model. The reason was that DKT in the optimization model made changes in the user’s knowledge state more visible, which enhanced accuracy. Transferring common features across areas of medical knowledge enabled the model to capture changes in user knowledge status more quickly. This improved the adaptability of exercise recommendations. In Figure 4(b), as the number of learning epochs gradually increased, the learning efficiency $E$ of the three models on the MedQA-BBY dataset also improved. The learning efficiency value of the optimized model was always higher than that of the other two models. The learning efficiency $E$ value of the DKT-RL model increased from 0.17 to 0.39 in 60 learning epochs. Over the same number of learning epochs, the $E$ value of the DKT model only increased from 0.08 to 0.16. The optimized model performed better on MedQA-BBY, indicating that TL played a key role in generalizing across complex medical knowledge systems.

To further verify the stability of the DKT-RL model, four models were selected for comparative analysis. The four models were Multi-agent Federated Reinforcement Learning Policy (MFRLP),²³ Hybrid Recommendation (HR),²⁴ Adaptive Deep Q-learning (ADQ),²⁵ and Gated Recurrent Unit Analysis of Variation (GRU-A).²⁶ The learning efficiency values of different models on two datasets were calculated separately. Meanwhile, the study calculated the MAE, MSE, and RMSE results of different models in the same dataset. The performance comparison results of the DKT-RL model are shown in Figure 5.

Figure 5.

Performance comparison results of DKT-RL model. (a) E value of different models, (b) MAE, MSE, and RMSE results in MEDIQA RQE.

In Figure 5(a), compared with other models, the learning efficiency values of the DKT-RL model in the two datasets were 398.91 × 10⁻³ and 389.27 × 10⁻³, respectively. DKT-RL was effective and had good advantages in improving users’ knowledge level, which helped to enhance their learning status. In Figure 5(b), compared to the other four models, the DKT-RL model had the smallest MAE, MSE, and RMSE values. The results were 7.04%, 2.11%, and 13.23%, respectively. Except for the DKT-RL model, the MSE of ADQ was the smallest at 2.12%, and the MSE of MFRLP was the largest at 3.91%. The DKT-RL model differed from the two by 0.01% and 1.80%, respectively. This was because the RL reward mechanism in the DKT-RL model effectively guided users to higher-order knowledge states. Additionally, the DKT’s dynamic modeling ability solved the problem of insufficient characterization of medical knowledge coherence in traditional recommendation systems, such as MFRLP and HR. Furthermore, given the diminishing marginal benefit of model performance, optimizing the model’s mean square error from 2.11% to 2.0% required an additional 30% of training time. However, this only improved the accuracy by 0.11%, which was not cost-effective enough.

To verify the classification performance of the improved TL algorithm, three algorithms, TL, Deep Transfer Learning (DTL),²⁷ and Classes Weighting and Transfer Learning (CWTL),²⁸⁻³¹ were selected for comparison. The study selected different ratios of $T_{t}$ to $T_{s}$ and performed 10 random learning classifications on the different datasets with each algorithm. The average classification error rate obtained is shown in Figure 6.

Figure 6.

Changes in average classification error rate on different datasets. (a) MEDIQA RQE, (b) MedQA-BBY.

In Figure 6(a), as the ratio of $T_{t}$ to $T_{s}$ gradually increased, the average classification error rates of the four algorithms on the MEDIQA RQE dataset decreased. The average classification error rate of the improved TL algorithm was consistently lower than that of the other algorithms. When the ratio of $T_{t}$ to $T_{s}$ was 0.01, the error rate of the improved TL algorithm was 0.275%, while that of TL was 0.347%, a reduction of 0.072 percentage points. In Figure 6(b), when the ratio of $T_{t}$ to $T_{s}$ was 0.01, the error rate of the improved TL algorithm on the MedQA-BBY dataset was 0.277%, while that of TL was 0.369%, a reduction of 0.092 percentage points.

To analyze the optimization model’s effectiveness, the RL, DTL, and CWTL models were selected for comparison. Each model was trained for 5000 rounds in the same training environment. The changes in the total reward values of the four models over the training rounds are shown in Figure 7.

Figure 7.

Changes in the total reward value of the four models.

In Figure 7, the total reward curve of the RL-only model had the slowest convergence speed, starting to converge after approximately 1500 training rounds. The RL model was relatively inefficient in task learning. The DTL and CWTL models began to converge after 400 and 800 rounds, respectively. The total reward curve of the optimized model converged relatively quickly. It began to converge after about 100 training rounds, at which point the total reward value was −8.3. It could quickly learn useful information from data and reach a stable state. In the initial stage, the optimization model transferred the diagnostic logic pattern from cardiology to pulmonology. This transfer resulted in an initial reward value that was 2.4 times higher than that of RL. Compared to other models, the optimized model could obtain more positive feedback and perform better during the learning process.

OME system functionality and performance test results

The verification of OME system functionality can confirm the effectiveness of the research design. In order to analyze the practicality of the system functionality, the course center module and exam center module were tested. The test project descriptions, expected results, and actual results regarding the design of two modules are shown in Table 1.

Table 1.

System functional test results.

	Case name	Course recommendation	Course search	Score inquiry
Course center testing	Use case description	Select recommended courses after completing the course learning	Enter keywords in the search box	Students complete course exams and check their scores
	Expected results	Students can choose recommended courses from the course completion page	Students can search for course information	Students can check the grades and rankings of a medical course
	Conclusion	Pass	Pass	Pass
Exam center testing	Case name	Support for preview after editing the exercises	Support for mobile terminal online testing	Support for test scores and ranking viewing
	Use case description	Edit the exercises after a successful system login	Complete the course study, and log in on the mobile terminal to complete the online exam	Complete each medical course examination
	Expected results	Support for preview after editing the exercises	Support online testing	After the completion of the exam, users can view the results and the answers
	Conclusion	Pass	Pass	Pass

In Table 1, all functional tests for both the course center module and the exam center module passed. The key functions of course recommendation, course search, and score inquiry supported users in searching for course information and learning based on recommended content. The exam center module supported user login, online practice, and participation in exams. There were no interruptions during the exam process, and the testing module functioned normally.

To verify the feasibility and applicability of the OME system, the JMeter testing tool was used. A total of 150 users were set to use the system at the same time, and the system’s CPU and memory usage were monitored during the test with a dedicated CPU monitoring tool. The compatibility and consumption changes of the OME system are shown in Figure 8.

Figure 8.

OME system compatibility and consumption changes. (a) Number of transactions and throughput, (b) CPU consumption and memory consumption.

In Figure 8(a), at a runtime of 40 granularity/s, the transaction frequency range of the OME system was between 2 and 6 seconds. This met medical education interaction standards and ensured that there was no perceptible delay in scenarios such as consultations, simulations, and image annotations. The system could handle concurrent operations within a given time frame. The system had a throughput of 4.6 B/s and could support real-time correction of over 200 clinical multiple-choice questions. In Figure 8(b), the system maintained a memory consumption rate of 49% throughout 5000 rounds of testing, with stable memory usage and no significant increase. CPU consumption remained at around 58%, leaving 42% of computing power in reserve for sudden emergency teaching and diagnostic requests. Although consumption fluctuated slightly between rounds 3500 and 4500, overall consumption remained below 60%.

To analyze the performance of the OME system, performance testing was also required. Corresponding test cases were designed based on system functions, including user login, user query, exam answering, course recommendation, etc. Performance testing results are shown in Table 2.

Table 2.

Performance test results of OME system.

Project	User login	User query	Exam answering	Course recommendation
Sample size	4500	4500	4500	4500
Average response time (ms)	1240	2530	2280	2490
Error rate (%)	0.00	0.00	0.02	0.13
Throughput (B/s)	4.69	4.63	4.65	4.68

In Table 2, the average response times of the four test items were all within 1000–3000 ms. Error rates were low across the four items: 0.00% for user login and user query, 0.02% for exam answering, and 0.13% for course recommendation. The average throughput of the four items was 4.66 B/s. Overall, the system was stable in handling these four items.
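The summary figures quoted for Table 2 can be checked with a short calculation. The sketch below transcribes the table values and recomputes the response-time range, the mean throughput, and the highest error rate. The dictionary layout and the function name `summarize` are illustrative choices, not part of the OME system.

```python
from statistics import mean

# Values transcribed from Table 2 (response time in ms, error rate in %,
# throughput in the unit reported by the study).
TABLE_2 = {
    "User login": {"avg_ms": 1240, "error_pct": 0.00, "throughput": 4.69},
    "User query": {"avg_ms": 2530, "error_pct": 0.00, "throughput": 4.63},
    "Exam answering": {"avg_ms": 2280, "error_pct": 0.02, "throughput": 4.65},
    "Course recommendation": {"avg_ms": 2490, "error_pct": 0.13, "throughput": 4.68},
}


def summarize(table: dict) -> dict:
    """Recompute the aggregate figures discussed in the text."""
    rows = list(table.values())
    return {
        "min_ms": min(r["avg_ms"] for r in rows),
        "max_ms": max(r["avg_ms"] for r in rows),
        "mean_throughput": round(mean(r["throughput"] for r in rows), 2),
        "max_error_pct": max(r["error_pct"] for r in rows),
    }


if __name__ == "__main__":
    print(summarize(TABLE_2))
```

The mean of the four throughput values is 4.6625, which rounds to the reported 4.66, and all four average response times fall inside the stated 1000–3000 ms range.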

Conclusion

With the deepening of educational informatization, the design and development of online education systems increasingly emphasize intelligence and personalization. To improve educational effectiveness for medical users, the study first designed an OME system and developed a DKT-RL model for recommending medical exercises. An improved TL algorithm was then studied, which transfers recommendation sequences between different recommendation agents to further improve accuracy. The results showed that after 100 iterations of the optimized model, the total reward value reached −8.3. When the ratio of $T_{t}$ to $T_{t}$ was 0.01, the average classification error rate of the improved TL algorithm was consistently lower than that of the other algorithms, at only 0.275%. Under the same condition, the error rate of the TL algorithm was 0.347%, so the improved TL algorithm reduced it by 0.072 percentage points. During testing, the CPU consumption of the OME system remained at around 58% and its memory consumption at 49%. The system operated efficiently while consuming relatively few computing resources, which demonstrates its good performance. However, the research does not address data privacy issues in the system, which poses a risk to user security. Future work will introduce Advanced Encryption Standard 256 (AES-256) and Paillier hybrid encryption so that learning behavior data cannot be traced back to users during transmission and recommendation result generation.

Footnotes

ORCID iD

Yiqing Yang

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research is supported by “The 14th Five-Year” plan project of Hebei Higher Education Association (GJXH2024-124).

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
