Abstract
Infectious diseases have long been a focus of global public health. However, current methods for analyzing and predicting infectious disease transmission are not comprehensive, which leads to significant discrepancies between analysis results. To address this issue, a Susceptible-Exposed-Infectious-Recovered-Susceptible (SEIRS) differential equation model with a latent period is proposed. On this basis, a staged model combined with the logistic model is established. Finally, to handle missing infectious disease data in certain regions, long short-term memory (LSTM) networks and autoregressive integrated moving average (ARIMA) models are used for prediction to support the differential equation model in transmission analysis. Empirical analysis shows that the fitting degree between the results of the proposed differential equation model and the actual data reaches 0.95, with a root mean square error of 0.03 and an average relative error of 3.5%. The proposed prediction model achieves an accuracy of 92.72%, and its training accuracy and training loss converge after only 23 and 22 iterations, respectively. The proposed differential equation model can predict the spread of infectious diseases more accurately and provides a scientific basis for public health decision-making.
Introduction
Infectious diseases are diseases caused by pathogens that damage human health. They are usually highly contagious and can be transmitted through various routes, such as air, contact, blood, sexual contact, and mother-to-child transmission. 1 A large-scale outbreak directly threatens public health and safety and has a significant impact on the social economy. 2 The Susceptible-Exposed-Infectious-Recovered-Susceptible (SEIRS) model is a critical tool for studying the dynamics of infectious disease transmission. By simulating the transmission process in a population, it helps to clarify the transmission mechanism of a disease and the factors that influence it. 3 The SEIRS model divides the population into four states: susceptible, exposed, infectious, and recovered. By analyzing the transitions between these four states, it is possible to predict the spread of a disease, evaluate the effectiveness of public health interventions, and provide a scientific basis for formulating prevention and control strategies. 4
Many scholars have worked on improving the SEIRS model. Huang G et al. developed an enhanced SEIRS infectious disease dynamics algorithm that incorporated environmental pollution to mitigate the influence of volatile organic compounds on the atmospheric environment. Taking Xi’an, China as an example, the algorithm was applied to the emission reduction calculation of meteorological monitoring stations. The experimental results showed that the solution accuracy of this algorithm was significantly higher than that of other algorithms. 5 Boulaaras et al. proposed an SEIRS model that considers interpersonal communication to explore mathematical strategies for COVID-19 under the SEIRS framework. The model used a prevalence rate model based on real-world information and analyzed the prevalence of COVID-19 through numerical simulation. The results showed that the model could predict the spread of the epidemic. 6 Emmanuel et al. studied the transmission dynamics of dengue fever in subtropical Malaysia by collecting and analyzing relevant data with the SEIRS model. The results indicated that different public health interventions directly affect the transmission and control of dengue fever. 7 Ochieng et al. proposed an improved SEIRS model to assess the effect of awareness-based control measures on the dynamics of malaria transmission. The model equations were solved numerically with the fourth-order and fifth-order Runge-Kutta methods, and the least squares curve fitting method was used to fit the data. The results indicated that social media-based promotional activities and optimized control measures were the most effective in controlling malaria. 8
Moreover, the ongoing development of computer technology has made deep learning a novel approach for analyzing the dynamics of infectious disease transmission. 9 To investigate the effect of the public health and social measures prompted by COVID-19 on the spread of influenza virus, Ali S T et al. used influenza virus activity monitoring data from different locations and countries over the past 5 years together with convolutional neural network modeling. The transmission rate of influenza was predicted with the resulting model. The results showed that after these measures were implemented, the global influenza transmission rate decreased by more than 15%. 10 Kumar et al. found that the existing SEIR model of COVID-19 obtained more accurate data on infection cases and death cases. They therefore proposed an extended model that introduces a hierarchical network structure to simulate the dynamic spread of viruses in different populations more accurately. The results showed that the disease prediction accuracy of the model exceeded 90%. 11 Sardar et al. proposed a disease prediction model based on the Autoregressive Integrated Moving Average (ARIMA) model and the extreme gradient boosting algorithm to predict the spread of the 2019 coronavirus. This model further improved prediction accuracy by integrating the prediction results of multiple machine learning methods. The results showed that the prediction accuracy of this method reached 92.45%. 12
The work reviewed above shows that the traditional SEIR model has deficiencies in aspects such as immune loss and latent infectivity, and that it struggles to predict accurately when data are missing. In addition, constructing and analyzing an SEIRS model requires multiple factors to be taken into account, such as the dynamic changes of the population, the incubation period of the disease, the length of the immune period, and the contact patterns among different populations. All of these factors affect the accuracy of the model and its predictions. 13 Therefore, an SEIRS infectious disease differential equation model with a latent period is proposed, and its effectiveness is verified through empirical analysis. At the same time, to address the insufficient data available in the early stage of some diseases, deep learning methods are combined with the model to predict the spread of diseases more accurately, providing a reference for early warning, prevention, and control. The novelty of the research lies in considering the latent period and asymptomatic infection, and in further optimizing the SEIRS model through a staged approach so that the model reflects the actual situation more accurately. The research aims to analyze the transmission trends of infectious diseases, discuss the existence and stability of the endemic equilibrium point and the disease-free equilibrium point, and provide theoretical support for public health decision-making.
The research gap and novelty lie in the meticulous consideration of the incubation period and asymptomatic cases. By integrating deep learning to optimize the SEIRS model, the prediction accuracy has been enhanced, filling the prediction blind spot of traditional models when early data is insufficient, and providing a more solid theoretical basis for prevention and control strategies.
Methods and materials
To conduct in-depth analysis of the transmission mechanism of infectious diseases, an SEIRS model including incubation period is constructed. This model adds latency states to the traditional SEIRS model to more accurately analyze disease transmission trends. At the same time, the research combines deep learning methods to establish disease prediction models, further improving prediction accuracy.
Construction of SEIRS differential equation model with latent period
During the transmission of infectious diseases, symptoms do not necessarily appear immediately after infection. The existence of an incubation period makes it difficult to detect diseases in the early stages. The length of the incubation period also varies by disease: for some diseases it is only a few days, while for others it may last several months or even years. During this period, infected individuals may appear completely healthy, yet they are already contagious and able to spread the pathogen to others. 14–16 Asymptomatic carriers may show no clinical symptoms after the incubation period and may recover on their own once their immune system eliminates the virus. Therefore, the study considers both latency and asymptomatic infection and establishes an improved SEIRS differential equation. The improved model satisfies equation (1).
In equation (1),
Figure 1. The infection mechanism of the improved SEIRS model.
In Figure 1, susceptible individuals enter the incubation period after being infected with the virus, during which the virus replicates in the body but has not yet produced obvious clinical symptoms. Subsequently, some patients begin to show symptoms, while others do not but still have a certain degree of infectivity. Finally, after treatment and recovery, patients may become susceptible again. The differential equation of susceptible individuals is shown in equation (2).
In equation (2),
In equation (3),
In equation (4),
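The compartment flow described for Figure 1 can be illustrated with a small numerical sketch. The equations below are an assumed structure, not the paper's equations (1) to (4): a closed population, a force of infection proportional to S(I + κA), and a share ρ of latent individuals who become symptomatic. The relative infectivity `kappa` is a hypothetical value, and reading the reported 80% "overt" rate as the share ρ is also an assumption; the remaining parameter values are those quoted in the simulation settings of the Results.

```python
# Hypothetical five-compartment SEIRS sketch: S, E (latent), I (symptomatic),
# A (asymptomatic), R. The equation structure is an assumption for illustration.

PARAMS = {
    "beta": 0.0002,    # infection rate (0.02%)
    "sigma": 0.30,     # rate at which latent individuals become infectious
    "rho": 0.80,       # assumed reading: share of infections that become overt
    "gamma_i": 0.125,  # recovery rate of symptomatic cases
    "gamma_a": 0.05,   # recovery rate of asymptomatic cases
    "omega": 0.08,     # immunity loss rate
    "kappa": 0.5,      # hypothetical relative infectivity of asymptomatic carriers
}


def seirs_rhs(state, p):
    """Time derivatives of (S, E, I, A, R) for the assumed model."""
    s, e, i, a, r = state
    new_infections = p["beta"] * s * (i + p["kappa"] * a)
    d_s = -new_infections + p["omega"] * r
    d_e = new_infections - p["sigma"] * e
    d_i = p["rho"] * p["sigma"] * e - p["gamma_i"] * i
    d_a = (1.0 - p["rho"]) * p["sigma"] * e - p["gamma_a"] * a
    d_r = p["gamma_i"] * i + p["gamma_a"] * a - p["omega"] * r
    return (d_s, d_e, d_i, d_a, d_r)


def rk4_step(state, p, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * kx for x, kx in zip(state, k))

    k1 = seirs_rhs(state, p)
    k2 = seirs_rhs(shifted(k1, dt / 2.0), p)
    k3 = seirs_rhs(shifted(k2, dt / 2.0), p)
    k4 = seirs_rhs(shifted(k3, dt), p)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, p, days, dt=0.1):
    """Integrate the model for `days` time units and return the final state."""
    for _ in range(int(round(days / dt))):
        state = rk4_step(state, p, dt)
    return state


# Initial value 1 from the Results: S=485, E=11, I=2, A=2, R=0 (total 500).
final_state = simulate((485.0, 11.0, 2.0, 2.0, 0.0), PARAMS, days=1000)
```

Because every outflow of one compartment is an inflow of another, the total population stays at 500 throughout the run. Under these assumed equations and parameter values the infected compartments decay and the susceptible count returns towards the total population.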
The study substituted equation (1) into the constructed SEIRS differential equation model to calculate the disease-free equilibrium (DFE) point. During the calculation, the differential equations other than that of the recovered patients were set to zero.
Figure 2. Specific characterization results of the DFE point.
In Figure 2, the study set the number of explicit infections to exceed the number of recovered patients, and used the above four equations to find the number of populations in four different states. Thus, the equilibrium point 2 of endemic diseases is obtained. The calculation method of
In equation (6),
In equation (7),
In equation (8),
Figure 3. Principles of the logistic model.
In Figure 3, the logistic model describes the process of disease transmission through an S-shaped curve. In the early stages of transmission, the susceptible population is large, so the disease spreads quickly and the slope of the curve is steep. As the number of infected individuals increases, the susceptible population gradually declines and the transmission rate falls, which reduces the slope of the curve. Finally, when the number of infected individuals reaches a certain proportion, resource limitations, the formation of immune populations, and other factors reduce the transmission rate further until it approaches zero. At this point, the curve flattens and reaches a stable state, known as the “carrying capacity.” The change in the size of a single population in the logistic model is shown in equation (9).
In equation (9),
In equation (10),
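The S-shaped behaviour described for the logistic model can be checked numerically. The sketch below uses the textbook logistic growth law dN/dt = rN(1 − N/K) and its closed-form solution; this is assumed to correspond to the single-population form referred to above, and the values of `r`, `K`, and the initial size are hypothetical.

```python
import math


def logistic_size(t, n0, r, k):
    """Closed-form solution N(t) of dN/dt = r * N * (1 - N / K) with N(0) = n0."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))


def logistic_growth_rate(n, r, k):
    """Instantaneous growth rate dN/dt at population size n."""
    return r * n * (1.0 - n / k)


# Hypothetical values: 10 initial cases, growth rate 0.3 per day, carrying capacity 500.
N0, R, K = 10.0, 0.3, 500.0
curve = [logistic_size(day, N0, R, K) for day in range(0, 61, 5)]
```

The growth rate is largest at N = K/2, which is the inflection point of the S-shaped curve, and falls to zero as N approaches the carrying capacity K.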
Infectious disease prediction based on SEIRS model and ARIMA model
After the SEIRS differential equation model was improved by introducing latency and staging, the study found that the data statistics of certain regions or infectious disease events were incomplete, which introduced uncertainty into the estimation of model parameters. Therefore, based on the proposed staged approach, the ARIMA model is introduced to predict the number of cured and deceased individuals. The ARIMA model is a statistical tool frequently used in time series analysis; it predicts future data points by combining three components: autoregression, differencing, and moving average. The building process of the ARIMA model is shown in Figure 4.
Figure 4. The building process of the ARIMA model.
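Two ingredients of ARIMA modeling, differencing and the sample autocorrelation function commonly used for order selection, can be sketched in a few lines. This is not a full ARIMA fit (no parameter estimation or forecasting), and the case counts below are hypothetical values chosen only for illustration.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom


# Hypothetical cumulative case counts with an upward trend (illustrative only).
cumulative_cases = [3, 7, 12, 20, 31, 45, 62, 80, 101, 125]

daily_new = difference(cumulative_cases)              # first difference: daily increments
second_diff = difference(cumulative_cases, order=2)   # difference of the daily increments
acf_values = [autocorrelation(daily_new, lag) for lag in range(4)]
```

In practice the differencing order is increased until the series looks stationary, and the decay pattern of the autocorrelation values guides the choice of the autoregressive and moving average orders.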
In Figure 4, the first step in constructing the ARIMA model is to check whether the time series is stationary. If it is not, differencing is applied. The orders of the ARIMA model are then determined from the autocorrelation function and partial autocorrelation function plots of the processed sequence. Once the orders are fixed, methods such as maximum likelihood estimation are used to estimate the model parameters, and diagnostic tests are performed to ensure the applicability and accuracy of the model. Finally, the effectiveness of the model is verified through prediction and backtesting, and the model is adjusted and optimized according to actual needs.
The long short-term memory (LSTM) network is a specialized form of recurrent neural network (RNN) that is capable of learning long-term dependency information. Because the LSTM model performs well in time series analysis, the study introduces it into infectious disease prediction. The LSTM input gating unit consists of two network layers, and the updated results of the two network layers are shown in equation (11).
In equation (11),
In equation (12),
Figure 5. Training process of the LSTM model.
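For orientation, one forward step of an LSTM cell can be written out explicitly. The sketch below follows the standard LSTM formulation (sigmoid input, forget, and output gates plus a tanh candidate layer) and is assumed, not confirmed, to match the gating equations referred to above. It uses scalar inputs and hypothetical weight names to stay readable; a practical network uses weight matrices.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used by the gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One forward step of a scalar LSTM cell in the standard formulation.

    `w` maps hypothetical names to scalar weights: for each of the input gate
    (i), candidate layer (g), forget gate (f) and output gate (o) there is an
    input weight `w_*`, a recurrent weight `u_*` and a bias `b_*`.
    """
    # Input gating unit: a sigmoid gate and a tanh candidate layer.
    i_gate = sigmoid(w["w_i"] * x + w["u_i"] * h_prev + w["b_i"])
    candidate = math.tanh(w["w_g"] * x + w["u_g"] * h_prev + w["b_g"])
    # Forget gate decides how much of the previous cell state is kept.
    f_gate = sigmoid(w["w_f"] * x + w["u_f"] * h_prev + w["b_f"])
    # Output gate controls how much of the cell state is exposed.
    o_gate = sigmoid(w["w_o"] * x + w["u_o"] * h_prev + w["b_o"])

    c_new = f_gate * c_prev + i_gate * candidate
    h_new = o_gate * math.tanh(c_new)
    return h_new, c_new


# Hypothetical weights: everything zero, so each gate outputs exactly 0.5.
ZERO_WEIGHTS = {
    f"{kind}_{gate}": 0.0 for kind in ("w", "u", "b") for gate in ("i", "g", "f", "o")
}
h1, c1 = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=ZERO_WEIGHTS)
```

With all weights at zero the candidate is 0 and every gate is 0.5, so the cell state is simply halved; this makes the role of the forget gate easy to see before any training takes place.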
In Figure 5, the study first calculates the error term transferred backward over time, and then adjusts the weight and bias in the network to minimize the loss function. This process is implemented through gradient descent algorithm to ensure that the model can learn complex patterns and long-term dependencies in the data. The calculation method for the error term is shown in equation (13).
In equation (13),
Figure 6. The specific flow of the combined model.
In Figure 6, the study used the advantage matrix method to determine the weights of the parallel combination model. The weight of each model was adjusted dynamically according to its performance in different situations, so that the model that performs better under a given condition contributes more to the final prediction. In summary, the study first used the logistic model and the SEIRS model to analyze the trend of infectious disease incidence and then, in response to data gaps in different regions, used the ARIMA model and LSTM to predict the incidence of infectious diseases, filling the data gap in the differential equation model. As a result, a multi-level and multi-angle infectious disease prediction model was constructed. During the spread of infectious diseases, factors such as population mobility and the implementation of public health policies affect the transmission trend, which causes the prediction performance to fluctuate. To make the prediction more robust, the research incorporates spatio-temporal features and government intervention data into the model and further improves prediction accuracy by dynamically adjusting the model parameters.
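The idea that the better-performing model should contribute more to the combined forecast can be illustrated with a simple weighting rule. The sketch below uses inverse mean-squared-error weights on recent validation errors; this is a stand-in chosen for brevity, not the advantage matrix method used in the study, and all numbers are hypothetical.

```python
def inverse_mse_weights(validation_errors):
    """Weight each model by the inverse of its mean squared validation error."""
    mse = {
        name: sum(e * e for e in errors) / len(errors)
        for name, errors in validation_errors.items()
    }
    # The small constant avoids division by zero for a perfect model.
    inverse = {name: 1.0 / (value + 1e-12) for name, value in mse.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine_forecasts(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(weights[name] * forecasts[name] for name in forecasts)


# Hypothetical recent errors (actual minus predicted) for the two models.
recent_errors = {"arima": [2.0, -2.0], "lstm": [1.0, -1.0]}
weights = inverse_mse_weights(recent_errors)

# Hypothetical next-day case forecasts from each model.
combined = combine_forecasts({"arima": 100.0, "lstm": 110.0}, weights)
```

Recomputing the weights on a rolling window of recent errors gives the dynamic adjustment described above: whichever model has tracked the data better lately receives the larger share.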
Based on the proposed model, the research provides a scientific basis for public health decision-making and formulates prevention and control strategies. In high-risk areas, isolation measures should be strengthened and the allocation of medical resources optimized. In medium-risk areas, vaccination should be promoted and public awareness of protection enhanced. In low-risk areas, regular monitoring should be maintained, information kept transparent, and emergencies handled promptly. Such hierarchical management and precise prevention and control are intended to curb the spread of the epidemic effectively. The research also suggests establishing a dynamic data sharing platform that updates epidemic information in real time and facilitates collaboration among departments. Given the different seasonal characteristics of diseases, prevention and control strategies should be deployed in advance so that the measures remain scientific and forward-looking. Multi-dimensional data analysis can be used to identify epidemic hotspots accurately, respond quickly, and minimize the impact of the epidemic on the social economy.
Results
To test the effectiveness of the SEIRS differential equation model with latent period proposed in the study in describing the transmission patterns of infectious diseases, a series of comparative experiments and empirical analysis were designed.
Performance analysis of infectious disease prediction models
To compensate for the bias in the analysis results of the SEIRS differential equation model with latent period caused by missing incidence and cure data in certain regions, an infectious disease prediction model combining the ARIMA model and LSTM was proposed. To assess the effectiveness of the proposed model, its training behavior was analyzed and compared with that of a standalone ARIMA model and a standalone LSTM model. The comparison results are shown in Figure 7.
Figure 7. Three predictive models of infectious diseases.
In Figure 7(a), the hybrid model combining ARIMA and LSTM had significantly better prediction accuracy than using ARIMA or LSTM models alone. Its convergence speed during training was faster and its convergence accuracy was higher compared to the other two models. At the 23rd iteration, the training accuracy reached 95.23%. In Figure 7(b), the convergence speed of the training loss of the hybrid model was still faster, and it tended to stabilize after only 22 training iterations.
To further test the predictive performance of the proposed infectious disease prediction model (Model 1), the study compared it with the infectious disease prediction models in reference 18 (Model 2) and reference 19 (Model 3). The prediction accuracies of the three models with increasing data size and the changes in root mean square error (RMSE) values were compared. The specific results are shown in Figure 8.
Figure 8. Comparison of prediction effect of infectious disease data of three models.
In Figure 8(a), with the increase of sample size, the prediction accuracy of the three models gradually decreased. Among them, Model 2 had the largest decrease and the lowest average prediction accuracy, only 82.45%. Model 1 had the smallest decrease, only 4.11%, and its average prediction accuracy was also the highest, at 92.72%. In Figure 8(b), the RMSE values of the three models showed a gradually increasing trend, with the RMSE curve of Model 1 consistently below the other two models.
Stability analysis of equilibrium points for SEIRS infectious disease differential equations with latent periods
To explore the stability of the two equilibrium points, the study analyzed them through simulation. In the experiment, different parameters such as the infection rate and recovery rate were set, and the number of people in the simulated space was adjusted to obtain different initial values. In initial value 1, the total number of people in the space was 500, with 0 recovered individuals, 485 susceptible individuals, 2 asymptomatic infected individuals, 11 latent patients, and 2 infected individuals. In initial value 2, the total number of people in the space was 500, with 0 recovered individuals, 369 susceptible individuals, 38 asymptomatic infected individuals, 80 latent patients, and 13 infected individuals. In initial value 3, the total number of people in the space was 500, with 0 recovered individuals, 210 susceptible individuals, 83 asymptomatic infected individuals, 168 latent patients, and 39 infected individuals. The parameter values set for the study were: infection rate of 0.02%, probability of latent period patients becoming infected of 30.00%, recovery rate of 12.50%, and immune loss rate of 8.00%. The recovery rate of asymptomatic infected individuals was 5.00%, while the rate of overt infected individuals was 80.00%. Under these settings, the basic reproduction number (BRN) was less than 1. The changes in the number of people in the five states under the three initial values are shown in Figure 9.
Figure 9. The change in the number of people in five states under three initial values.
In Figure 9(a)–(e), under different initial values, the number of susceptible individuals eventually tended toward the total number of individuals in the space, while the number of individuals in the other four states tended toward zero. This indicates that, for a fixed BRN, the number of people in each state during the early stages of the disease does not affect the final direction of the disease. This further shows that when the BRN is less than 1, the disease will eventually tend to disappear.
The study then reset the parameter values to explore disease progression when the BRN is greater than 1. The infection rate was set at 0.3%, the recovery rate at 34.5%, and the immune loss rate at 1.0%. The probability of latent period patients becoming infected was 30.0%, the recovery rate of asymptomatic infected individuals was 5.00%, and the rate of overt infected individuals was 80.00%. Under these settings, the BRN was greater than 1. The changes in the number of people in the various states under initial value 1 are shown in Figure 10.
Figure 10. Changes in population numbers of different states when the BRN is greater than 1.
In Figure 10, when the BRN exceeded unity, the populations of the five states fluctuated rapidly. The number of latent and recovered patients increased rapidly in the initial stage, reflecting active disease transmission. As time went by, the number of susceptible and infected individuals began to decrease. However, because the BRN was greater than 1, the disease did not disappear completely but entered a dynamic equilibrium state, in which the number of latent patients, susceptible individuals, and asymptomatic and symptomatic infected individuals fluctuated within a certain range. A comparison of Figures 9 and 10 shows that the BRN affects both disease transmission and the changes in population status.
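The threshold behaviour contrasted in Figures 9 and 10 can be made concrete with a closed-form reproduction number. The sketch below assumes a simple five-compartment structure that is not taken from the paper: a closed population of size N, a force of infection βS(I + κA), and a share ρ of latent individuals who become symptomatic. For that assumed model the next-generation result is βN(ρ/γ_I + κ(1 − ρ)/γ_A). The value κ = 0.5 is hypothetical, and treating the reported 80% "overt" rate as ρ is an assumed reading.

```python
def basic_reproduction_number(beta, n_total, rho, gamma_i, gamma_a, kappa):
    """Reproduction number of the assumed model at the disease-free equilibrium.

    Each new infection becomes symptomatic with probability `rho` (infectious
    for 1 / gamma_i on average) or asymptomatic otherwise (infectious for
    1 / gamma_a on average, with relative infectivity `kappa`).
    """
    symptomatic_part = rho / gamma_i
    asymptomatic_part = kappa * (1.0 - rho) / gamma_a
    return beta * n_total * (symptomatic_part + asymptomatic_part)


KAPPA = 0.5  # hypothetical relative infectivity of asymptomatic carriers

# First parameter set from the simulation settings (infection rate 0.02%).
brn_low = basic_reproduction_number(
    beta=0.0002, n_total=500, rho=0.80, gamma_i=0.125, gamma_a=0.05, kappa=KAPPA
)

# Second parameter set (infection rate 0.3%, recovery rate 34.5%).
brn_high = basic_reproduction_number(
    beta=0.003, n_total=500, rho=0.80, gamma_i=0.345, gamma_a=0.05, kappa=KAPPA
)
```

Under these assumptions the first parameter set gives a value of about 0.84 and the second about 6.48, which falls on the same sides of the threshold of 1 as the two simulated scenarios. The exact values depend on the assumed structure and on κ, so they should not be read as the BRN values of the study.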
Empirical analysis of SEIRS differential equation model with latency period
To test the proposed differential equation model empirically, it was applied to COVID-19 data from a city in northern China. The conventional SEIRS model, the SEIRS model with latency period, the staged model, and the staged model incorporating deep learning were compared in terms of their fit to the actual data. To enrich the data sources, the study supplemented COVID-19 epidemic data from other countries and cities, covering cases under different climates, population densities, and prevention and control measures. Data from places such as New York in the United States, London in the United Kingdom, and Mumbai in India were included to further verify the universality and accuracy of the model. The results are shown in Figure 11.
Figure 11. Comparison of fitting of several SEIRS differential equation models.
In Figure 11(a), the fitting degree between the conventional SEIRS model and the actual data was only 0.85. In Figure 11(b), after the latency theory was introduced, the fitting degree of the SEIRS model reached 0.89. In Figure 11(c), after the logistic model was introduced into the SEIRS model with latent period for staged analysis, the fitting degree reached 0.91. In Figure 11(d), the SEIRS model that combines deep learning and the staged concept had the highest fitting degree, reaching 0.95.
In Table 1, the RMSE of Model A was 0.03, the MAPE was 3.5%, and the fitting degree was 0.95. Compared to other models, its accuracy in predicting the transmission trend of infectious diseases was significantly higher. The results indicate that Model A has high accuracy and reliability in analyzing the trends of infectious disease transmission.
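The three quantities reported here can be computed as follows. The sketch uses the usual definitions of RMSE and MAPE and takes the "fitting degree" to be the coefficient of determination (R squared), which is an assumption about the paper's metric; the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and fitted case counts (illustrative only).
observed = [100.0, 200.0]
fitted = [110.0, 190.0]
scores = (rmse(observed, fitted), mape(observed, fitted), r_squared(observed, fitted))
```

RMSE depends on the scale of the data, so a value such as 0.03 is only meaningful relative to the units or normalization of the series being fitted, whereas MAPE and R squared are scale-free.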
Discussion and conclusion
Existing differential equation models for infectious diseases give insufficient consideration to virus latency and to data gaps in the analysis. In response, a differential equation model for infectious diseases with a latent period was proposed, and the ARIMA model and LSTM model were combined to predict infectious disease data in areas with incomplete data. The prediction results were further analyzed to determine the transmission trend of infectious diseases. The experimental results showed that the prediction accuracy of the proposed prediction model reached 92.72%, and that its training accuracy and training loss converged after 23 and 22 iterations, respectively. To analyze the role of the BRN in infectious disease transmission, the study simulated two situations: a BRN greater than 1 and a BRN less than 1. The results confirmed that when the BRN was greater than 1, the infectious disease did not eventually reach the DFE point. When the BRN was less than 1, however, the number of susceptible individuals gradually approached the total number of people in the space, and the disease disappeared. In the empirical analysis, the results of the proposed differential equation model had a fitting degree of 0.95 with the actual data, with an RMSE of only 0.03 and an MAPE of 3.5%. Compared with other differential equation models, it had higher accuracy and reliability. These findings are of significant importance for the prevention and control of infectious diseases. However, the study only explored regional differences in infectious diseases.
In future research, factors such as the economic, cultural, and medical resources of different regions will be considered further, with the aim of constructing a more comprehensive and detailed infectious disease prediction model that provides more accurate decision support for global public health policies.
In line with the Sustainable Development Goals, this research contributes to achieving good health and well-being. By accurately predicting the spread trends of infectious diseases, it reduces the impact of disease outbreaks on the public health system, enhances the health level of the entire population, promotes social stability and economic development, and is in line with the core concept of ensuring health and well-being in the Sustainable Development Goals. In addition, the application of the model can also help reduce inequality, ensure that different regions can obtain timely and effective epidemic prevention measures, narrow the health gap, and promote global health equity. Meanwhile, the research also responds to the call for quality education. Through the popularization and application of scientific models, it enhances the public’s awareness of infectious disease prevention and control, strengthens the teaching ability of the education system in the field of public health, and lays the foundation for cultivating future talents with scientific literacy.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
