Evaluating ARIMA and LSTM Models for Forecasting COVID-19 Case Trends: A Comparative Study

Abstract

The COVID-19 pandemic has presented unprecedented challenges to global healthcare systems, underscoring the critical need for accurate prediction of infection cases to facilitate effective resource allocation and decision-making. This study evaluates the performance of two widely used time-series forecasting models, ARIMA and LSTM, in predicting COVID-19 infection trends. Using a dataset of daily infection cases spanning January 2020 to June 2020, both models were trained and evaluated. The results demonstrate that the LSTM model achieves superior performance compared to the ARIMA model, as evidenced by lower Mean Absolute Error (MAE) and Mean Squared Error (MSE). The LSTM model’s ability to capture complex patterns and non-linear relationships in the data contributes significantly to its enhanced predictive accuracy. These findings highlight the potential of LSTM models to deliver more reliable forecasts of COVID-19 infection cases, providing healthcare authorities with valuable insights to inform strategic planning and preparedness for future outbreaks.

Keywords

COVID-19 forecasting; ARIMA; LSTM; time series forecasting; MAE and MSE

1. Introduction

The COVID-19 pandemic has been one of the most disruptive global health crises in modern history, with over 700 million confirmed cases and 6.9 million deaths reported by the World Health Organization (WHO) as of May 2024 (Benvenuto et al., 2020; World Health Organization (WHO), 2020). Governments and health organizations worldwide have struggled to predict infection waves, allocate medical resources, and implement timely interventions due to the virus’s unpredictable transmission dynamics. Traditional epidemiological models, such as compartmental SIR (Susceptible-Infectious-Recovered) models, rely on assumptions about disease parameters (e.g., the reproduction number R0) that often fail to capture real-world complexities like new variants, vaccination rates, and human mobility patterns (Borges and Nascimento, 2022; Centers for Disease Control and Prevention (CDC), 2020). Such models also falter when faced with abrupt policy changes, super-spreader events, and evolving virus variants. In contrast, time-series forecasting techniques offer a data-driven alternative, utilizing historical case data to predict future trends without relying on explicit biological or social assumptions. This adaptability makes them particularly suited to dynamic and rapidly changing environments like the COVID-19 pandemic. The unpredictability of COVID-19 transmission dynamics, driven by factors such as viral mutations, vaccination rates, and non-pharmaceutical interventions (NPIs), has complicated traditional epidemiological modeling (Borghi et al., 2021). Early-stage forecasts relied heavily on compartmental models (e.g., SIR, SEIR), which require assumptions about parameters like transmission rates and recovery periods (Box et al., 2015). However, these models struggle with real-world noise and abrupt shifts in trends (e.g., lockdowns, new variants).
Time-series approaches, in contrast, leverage historical data to infer future trends without explicit mechanistic assumptions, making them adaptable to rapid policy changes. ARIMA, a classical linear model, has been extensively used in epidemiology due to its interpretability and robustness for stationary data (CDC, 2020). However, its performance degrades when handling non-linear patterns or long-term dependencies (Chaurasia and Pal, 2022). Conversely, LSTM, which is a type of recurrent neural network (RNN), excels at capturing complex, non-linear relationships in sequential data, as demonstrated in recent studies on influenza and dengue fever forecasting (Ge et al., 2022).

This research focuses on analyzing the temporal patterns of COVID-19 in India by evaluating and comparing the performance of two prominent time-series forecasting models: ARIMA and LSTM. The primary aim is to assess their predictive accuracy for COVID-19 case trends to support effective public health interventions and resource allocation, especially at the state level. By examining model performance across diverse regions, this study contributes to a nuanced understanding of how the pandemic has impacted various Indian populations. Unlike prior works that often rely on either statistical models or deep learning approaches in isolation, this study provides a rigorous, side-by-side comparison of ARIMA and LSTM on identical datasets and evaluation metrics. This comparative analysis seeks to identify the strengths, limitations, and applicability of each model in handling the unique challenges posed by COVID-19 data, such as non-stationarity, reporting noise, and long-term dependencies. The findings aim to guide model selection for epidemic forecasting and broader time-series analysis in data-scarce or dynamic environments. In light of these challenges and research gaps, this study makes three main contributions to the evaluation of ARIMA and LSTM models for COVID-19 forecasting. First, it performs a systematic multi-horizon evaluation (14-, 30-, and 60-day projections) using the same datasets, pre-processing methods, and evaluation criteria, allowing for a fair assessment of error accumulation over time. Second, by focusing on India, a nation with significant demographic variability, sudden policy interventions, and repeated structural discontinuities, the research demonstrates how conventional and deep learning models react differently to extremely non-stationary epidemiological data. Third, the work expands on traditional error measurements with formal statistical significance testing and benchmark comparisons, providing greater empirical support for model selection in applied public health forecasting.

2. Literature Review

Hanif et al. (2023) propose a simple econometric model, the Auto Regressive Integrated Moving Average (ARIMA), to predict the spread of COVID-19. It uses Johns Hopkins epidemiological data to predict prevalence and incidence trends. The model is subject to potential bias, and real-time case definition and data collection are necessary for future comparisons. Hochreiter and Schmidhuber (1997) assess models such as ARIMA, SVR, LSTM, and Bi-LSTM for time series prediction of confirmed cases, deaths, and recoveries in ten major affected countries. The Bi-LSTM model outperforms the others on the reported indices, with the lowest MAE and RMSE values for deaths in China. The best $R^{2}$ score is 0.9997, for recovered cases in China. Bi-LSTM’s robustness and enhanced prediction accuracy make it suitable for pandemic prediction, improving planning and management. Kumar et al. (2020) compare five deep learning methods for forecasting COVID-19 cases: Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Units (GRUs), and Variational Auto-encoder (VAE). Based on daily confirmed and recovered cases from six countries, the results show the promise of deep learning models and the superior performance of VAE compared to the other algorithms. Mahmud et al. (2025) applied a two-stage method using LSTM and Prophet models on data from the municipality of São José dos Campos. Although the models were not verified using benchmark data sets, they presented a Prophet-LSTM ensemble approach that performed better than benchmarks. An MAE of 0.99 was the stated performance metric. Mostafiz et al. (2022) analyzed 6406 images from many sources to diagnose COVID-19 and chest infections using a region-based convolutional neural network (R-CNN) and a generative adversarial network (GAN) for data creation. With the use of GAN-generated data, R-CNN performed better than conventional techniques. Neither benchmark data sets nor hybrid models were investigated in that study. AUC $= 98.36$ and accuracy $= 99.16$ were among the reported metrics. Ospina et al. (2023) investigated time series forecasting of student enrollment on an enrollment data set using the Fuzzy Interval Partitioning Method and Moving Average (MA). Although the authors pointed out that further research is needed to determine how interval length affects forecast accuracy, the fuzzy interval method performed better than the other examined approaches. RMSE $= 213.75$, MSE $= 45689.062$, and mean forecast error $= 1.22$ were among the reported values. Barman (2020) compared ARIMA and multiple LSTM models for short-term forecasting of country-wise COVID-19 cases. Using novel k-period error metrics, the study found that LSTM slightly outperformed ARIMA, though both models showed comparable accuracy for multi-day forecasts. Mahmud et al. (2025) evaluated ARIMA, LSTM, and hybrid ARIMA–LSTM models using COVID-19 data from Malaysia. The results demonstrated that the hybrid model achieved the lowest prediction errors, confirming the effectiveness of combining statistical and deep learning approaches. Rguibi et al. (2022) applied ARIMA and LSTM models to forecast COVID-19 trends in Morocco and analyzed the effective reproduction number. The findings indicated continued growth in cases, though transmission levels remained manageable, suggesting the impact of government control measures. Table 1 summarizes existing research on time series analysis.

Table 1.
Summary of Forecasting Studies Including Datasets, Methods, Findings, and Limitations.

Study	Dataset	Findings	Limitations/Future Work
Shahid et al. (2020)	COVID-19 confirmed and death cases of India and the US from various government health agencies	Both countries projected to see an increase in confirmed and death cases over the next month	Forecast future incidents; aerosol transfer assessment
Shastri et al. (2020)	COVID-19 Data Repository of Johns Hopkins University	Hybrid ARIMA–NAR model showed best performance and assessed national readiness	Accuracy declines for long-term forecasts; requires larger datasets
Sherstinsky (2020)	Verified and fatal cases from ourworldindata.org (Feb 22–Apr 23, 2020)	ARIMA retraining was most effective across countries; LSTM failed to follow actual trend	Short data period may not reflect later behavioral changes
Swaraj et al. (2021)	WHO dashboard data (Jan–May 2020): confirmed, fatal, and recovered cases	Naive method performed best for death prediction; ARIMA projected an increase in deaths	Time-series methods less effective than machine learning approaches
Wathor et al. (2023)	WHO data till May 11, 2020	Best results achieved with two hidden layers (9 and 4 neurons); sigmoid tangent and linear activations	–
WHO (2020)	Daily COVID-19 cases in Sweden, India, and the US	Multivariate LSTM outperformed univariate models; effective forecasts up to six days	Recommended pretrained networks, data augmentation, GANs, and transfer learning

Despite extensive research on COVID-19 forecasting using statistical and deep learning models, notable gaps persist in existing literature. Many studies apply either ARIMA or deep learning approaches independently, while hybrid or comparative analyses often lack methodological consistency, such as uniform preprocessing, identical datasets, and standardized evaluation metrics. Several works focus primarily on short-term forecasting, offering limited insight into model performance over extended horizons where prediction uncertainty increases. Moreover, prior studies frequently emphasize accuracy metrics without examining how model effectiveness degrades over time or how well models capture non-linear dynamics and abrupt changes inherent in pandemic data. Additionally, comparisons are often conducted across different regions, datasets, and time spans, making direct performance evaluation challenging. Few studies provide a systematic, multi-horizon comparison of classical statistical models and deep learning techniques under controlled experimental conditions. These limitations highlight the need for a comprehensive and consistent comparative framework to evaluate ARIMA and LSTM models across short-, medium-, and long-term forecasting horizons using robust error metrics.

3. Background

3.1. ARIMA Model

One of the most popular methods for analyzing time series is the Auto Regressive Integrated Moving Average (ARIMA) technique. The evolving variable is regressed on its own prior values, according to the AR portion of the ARIMA model. The variance of a stationary time series is represented by $σ^{2}$ , and the average of the error term is zero. A time series can be expressed as a $p$ -order auto regressive process (AR( $p$ )) using equation (1), where $Y_{t}$ represents the value of the time series at time $t$ :

Y_{t} = δ + ϕ_{1} Y_{t - 1} + ϕ_{2} Y_{t - 2} + \dots + ϕ_{p} Y_{t - p} + ε_{t}

(1)

Here, $δ$ is a constant term and $ε_{t}$ is the error term. A time series can also be modeled as a moving average process of order $q$ , denoted MA( $q$ ), as shown in equation (2):

Y_{t} = μ + ε_{t} + θ_{1} ε_{t - 1} + θ_{2} ε_{t - 2} + \dots + θ_{q} ε_{t - q}

(2)

By combining the AR( $p$ ) and MA( $q$ ) processes, the ARMA( $p, q$ ) model can be defined:

Y_{t} = δ + ϕ_{1} Y_{t - 1} + \dots + ϕ_{p} Y_{t - p} + ε_{t} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q}

(3)

If the time series is non-stationary, differencing can be applied $d$ times to achieve stationarity. With $L$ denoting the lag operator, so that $L Y_{t} = Y_{t - 1}$, the first difference is expressed as:

Δ Y_{t} = Y_{t} - Y_{t - 1} = Y_{t} - L Y_{t} = Y_{t}^{'}

(4)
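As a small illustration of equation (4), the sketch below applies differencing $d$ times to a short list of values. The function name `difference` and the sample numbers are illustrative assumptions; they are not taken from the study’s data or code.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: Y'_t = Y_t - Y_{t-1}."""
    values = list(series)
    for _ in range(d):
        # Each pass pairs every value with its predecessor and keeps the change.
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


# Hypothetical daily counts with an accelerating upward trend.
cases = [10, 12, 15, 19, 24]
first_diff = difference(cases, d=1)   # day-to-day increments
second_diff = difference(cases, d=2)  # increments of the increments
```

Each pass shortens the series by one observation, so a series differenced $d$ times has $d$ fewer points than the original.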

The general ARIMA( $p, d, q$ ) model is then given by equation (5):

(1 - ϕ_{1} L - ϕ_{2} L^{2} - \dots - ϕ_{p} L^{p}) Δ^{d} Y_{t} = δ + ε_{t} + θ_{1} ε_{t - 1} + \dots + θ_{q} ε_{t - q}

(5)

The value of $q$ (MA order) can be estimated using the autocorrelation function (ACF), while the value of $p$ (AR order) can be identified using the partial autocorrelation function (PACF). The Akaike Information Criterion (AIC) is commonly used to evaluate model performance, and is computed as:

AIC = - 2 \log (L) + 2 (p + q + k)

(6)

where $L$ is the likelihood of the data under the fitted model, $p$ is the order of the auto-regressive component, $q$ is the order of the moving average component, and $k$ is the number of additional model parameters, here the ARIMA model’s intercept. Based on this criterion, the model with the lowest AIC is regarded as more successful than the others.
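To make the use of equation (6) concrete, the following sketch computes the AIC for two candidate orders and picks the lower one. The log-likelihood values are hypothetical placeholders chosen for illustration, not values reported in this study, and the helper name `aic` is an assumption.

```python
def aic(log_likelihood, p, q, k):
    """Equation (6): AIC = -2 log(L) + 2 (p + q + k)."""
    return -2.0 * log_likelihood + 2 * (p + q + k)


# Hypothetical log-likelihoods for two candidate (p, d, q) orders.
candidates = {
    (2, 1, 1): aic(log_likelihood=-206.0, p=2, q=1, k=1),
    (1, 1, 1): aic(log_likelihood=-210.0, p=1, q=1, k=1),
}

# The preferred specification is the one with the smallest AIC.
best_order = min(candidates, key=candidates.get)
```

The penalty term $2 (p + q + k)$ means that an extra AR or MA term is only worthwhile if it raises the log-likelihood by more than one unit.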

3.2. LSTM Model

Long short-term memory (LSTM) is a machine learning approach built on a recurrent neural network architecture. It retains information acquired over short time spans and carries it across longer spans during training. The hidden layer of an LSTM is made up of units called “memory blocks,” which take the place of the hidden units of a traditional recurrent neural network. Each memory block includes one or more memory cells, together with input and output gates that regulate the information flow. The input gate regulates the flow of input activations into the memory cell, whereas the output gate regulates the flow of output activations from it. The memory blocks were later modified to include a “forget gate,” which scales the cell’s internal state before it is fed back to the cell through its self-recurrent connection, thereby resetting the cell’s memory when needed. Examining the model’s steps is necessary to gain a better understanding of the LSTM model. Let $x_{t}$ denote the input at time $t$ and $h_{t}$ the model output. The network first decides how much of the previous cell state to keep, based on the previous output $h_{t - 1}$ and the current input $x_{t}$. The LSTM forget gate is computed as:

f_{t} = σ (W_{f} [h_{t - 1}, x_{t}])

(7)

The information to be stored in the cell is determined in two steps. First, the input gate decides which values to update, and then a candidate vector ${\tilde{C}}_{t}$ is created using the hyperbolic tangent function:

\begin{aligned} i_{t} & = σ (W_{i} [h_{t - 1}, x_{t}]) \end{aligned}

(8)

\begin{aligned} {\tilde{C}}_{t} & = \tanh (W_{c} [h_{t - 1}, x_{t}]) \end{aligned}

(9)

The new cell state is then computed as:

C_{t} = f_{t} C_{t - 1} + i_{t} {\tilde{C}}_{t}

(10)

Finally, the output gate and the new hidden state are obtained from:

\begin{aligned} o_{t} & = σ (W_{o} [h_{t - 1}, x_{t}]) \end{aligned}

(11)

\begin{aligned} h_{t} & = o_{t} \tanh (C_{t}) \end{aligned}

(12)

Here, $σ (\cdot)$ denotes the sigmoid activation function, and $\tanh (\cdot)$ scales values into the range $[- 1, 1]$ . $W_{f}$ , $W_{i}$ , $W_{c}$ , and $W_{o}$ represent the corresponding weight matrices, $f_{t}$ is the forget gate, $i_{t}$ the input gate, ${\tilde{C}}_{t}$ the candidate vector, $o_{t}$ the output gate, and $C_{t}$ the cell state. The model’s output $h_{t}$ is filtered based on the state of the LSTM cells.
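Equations (7) to (12) can be traced with a minimal sketch of a single LSTM time step. It follows the equations as written above (no bias terms, element-wise products in equations (10) and (12)) and uses only the Python standard library. The helper names and the weight values are illustrative assumptions and do not correspond to the trained model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o):
    """One LSTM time step following equations (7)-(12), without bias terms."""
    z = h_prev + x_t                                     # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in matvec(W_f, z)]           # forget gate, eq. (7)
    i_t = [sigmoid(a) for a in matvec(W_i, z)]           # input gate, eq. (8)
    c_hat = [math.tanh(a) for a in matvec(W_c, z)]       # candidate vector, eq. (9)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]   # cell state, eq. (10)
    o_t = [sigmoid(a) for a in matvec(W_o, z)]           # output gate, eq. (11)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]   # hidden state, eq. (12)
    return h_t, c_t


# Hypothetical weights: 2 hidden units and 1 input feature, so each matrix is 2 x 3.
# Reusing one matrix for all four gates keeps the example short; a real model
# learns four separate matrices.
W = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.6]]
h, c = [0.0, 0.0], [0.0, 0.0]
for x in [0.2, 0.5, 0.9]:        # a short, already-normalized input sequence
    h, c = lstm_step([x], h, c, W, W, W, W)
```

Because $h_{t}$ is a sigmoid-gated $\tanh$ of the cell state, every component of the hidden state stays strictly inside $(- 1, 1)$.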

4. Methodology

This section provides an in-depth overview of the dataset utilized for the analysis, outlining its source, structure, and any relevant characteristics.

4.1. Dataset and Data Pre-processing

4.1.1. Dataset

The dataset used in this study comprises daily confirmed COVID-19 case data in India, sourced from the COVID-19 India API. The time series spans from 1st March 2020 to 31st December 2021, covering a total of 670 days, and includes two key features: daily confirmed cases and cumulative confirmed cases. Daily confirmed cases reflect the number of new COVID-19 cases reported each day, which are crucial for capturing temporal fluctuations, trend shifts, and epidemic wave patterns. Cumulative confirmed cases represent the total number of reported infections up to each date, useful for validating overall trends and assessing long-term progression. A COVID-19 API is a web-based tool that gives users extensive access to information on the COVID-19 pandemic. This service is an essential resource for developers, academics, healthcare professionals, and the general public who want to understand the pandemic’s changing course. A COVID-19 API can provide a range of data needed to monitor the virus’s progress and effect. Key data elements often include total cases, vaccination rates, state and country-specific breakdowns, historical trends, and daily updates.

The dataset spans from 1 March 2020 to 31 December 2021, comprising 670 daily observations. During this period, daily confirmed cases increased from 1 to 16,764, while cumulative cases rose to approximately 34.8 million.

4.1.2. Data Pre-processing

Before model deployment, a sequence of pre-processing operations was performed to guarantee the reliability, consistency, and appropriateness of the dataset for time-series forecasting.

Missing Values Handling: The completeness check on the data set identified five missing daily observations, representing approximately 0.7% of the total records. To ensure temporal continuity and prevent distortions in the learning process, these missing values were filled by linear interpolation. This method was adopted due to its simplicity and efficiency in dealing with small gaps in sequential data.
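The gap-filling step can be sketched as follows. This is a standard-library illustration of linear interpolation on hypothetical values, with missing days marked as `None`. The study does not state which tool was used for this step, so the function `interpolate_linear` is an assumption rather than the authors’ code.

```python
def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Leading or trailing gaps are left as None, since they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for j in range(left + 1, right):
            weight = (j - left) / gap
            filled[j] = filled[left] + weight * (filled[right] - filled[left])
    return filled


daily_cases = [100, None, 140, 150, None, None, 180]   # hypothetical values
filled = interpolate_linear(daily_cases)
```

A single missing day is replaced by the midpoint of its neighbours, and a run of missing days is spread evenly between the two surrounding observations.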

Stationarity Assessment: Stationarity is one of the key assumptions in ARIMA modeling. To assess it, the Augmented Dickey-Fuller (ADF) test was run on the raw time series of daily confirmed cases. The resulting $p$ -value was 0.82, which indicates non-stationarity. This was addressed by applying first-order differencing, which gave a new $p$ -value of 0.001, thereby establishing stationarity in the transformed series. This transformation ensures that the statistical characteristics of the series, i.e., mean and variance, are unchanged over time, a requirement for the ARIMA model.

Train-Test Splitting: To assess model performance on previously unseen data, the dataset was split into a training subset and a testing subset according to an 80:20 temporal ratio. The training period ran from 1st March 2020 to 30th September 2021 (536 days) and the testing period from 1st October 2021 to 31st December 2021 (134 days). The temporal split keeps the data in chronological order, which is imperative for time-series modeling, and permits rigorous evaluation of forecast accuracy.
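A chronological 80:20 split of 670 observations can be sketched as below. The placeholder list stands in for the real series, and the function name `chronological_split` is an illustrative assumption.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into train and test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


observations = list(range(670))          # placeholder for 670 daily values
train, test = chronological_split(observations)
```

With 670 observations this yields 536 training points and 134 test points, and every training point precedes every test point.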

Normalization for LSTM Modeling: Recurrent neural networks, and specifically LSTM models, are sensitive to the scale of the input features. To facilitate better model convergence and predictive performance, the values of daily confirmed cases were normalized by applying Min-Max scaling to map the input feature into the range [0, 1]. The normalization was applied solely to the LSTM model, using the MinMaxScaler tool in the scikit-learn library. The scaling parameters were computed using only the training data to prevent information leakage into the test set.
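The leakage-free scaling described above can be illustrated without external libraries. The sketch below mirrors what Min-Max scaling does, but it is a hand-written stand-in for scikit-learn’s MinMaxScaler, and the numbers are hypothetical.

```python
def fit_min_max(train_values):
    """Learn the scaling parameters from the training data only."""
    return min(train_values), max(train_values)


def transform(values, lo, hi):
    """Map values with x' = (x - lo) / (hi - lo)."""
    span = hi - lo
    return [(v - lo) / span for v in values]


def inverse_transform(scaled, lo, hi):
    """Return scaled values to the original units, e.g. for reporting forecasts."""
    return [s * (hi - lo) + lo for s in scaled]


train = [50, 100, 150, 250]      # hypothetical training values
test = [200, 300]                # hypothetical test values
lo, hi = fit_min_max(train)      # parameters come from the training set alone
train_scaled = transform(train, lo, hi)
test_scaled = transform(test, lo, hi)
```

Because the minimum and maximum are taken from the training set, test values above the training maximum map to numbers greater than 1. That is expected behaviour and not an error.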

4.2. Model Building

4.2.1. ARIMA Model

The first step in building the ARIMA model was choosing its parameters. The partial autocorrelation function (PACF) plot revealed a cutoff at lag 2, indicating the need for two auto-regressive (AR) terms, so $p = 2$ was chosen. The autocorrelation function (ACF) plot displayed tailing behavior, indicating the need for a moving average (MA) component; accordingly, $q = 1$ was used. The Augmented Dickey-Fuller (ADF) test showed that the series was non-stationary and needed differencing, which led to $d = 1$. The model was therefore specified as ARIMA(2,1,1), representing two auto-regressive terms, one order of differencing, and one moving average term. It was fitted to the training data by maximum likelihood estimation (MLE) using the statsmodels ARIMA class, and the Akaike Information Criterion (AIC) was computed to assess its goodness of fit. The ARIMA(2,1,1) model gave an AIC of 420.3, lower than the AIC of 425.6 for the ARIMA(1,1,1) model, indicating that the ARIMA(2,1,1) specification was better suited to the dataset. Finally, the model was used to generate 14-day-ahead forecasts on the test dataset to obtain short-term forecasts for the time series.

4.2.2. LSTM

The architecture of the LSTM model was tuned to process time series data with a look back period of 14 days. The input shape of the model was set to (look back $=$ 14, features $=$ 1), where the single feature maps to the new cases variable for every time step. The two-layered structure of the LSTM model consisted of the first layer with 64 units and was specified to return sequences, allowing it to efficiently handle sequence data as input to the next LSTM layer. A dropout layer with a rate of 0.2 was added after the first LSTM layer to reduce overfitting by randomly nullifying 20 percent of the input units during training. The next LSTM layer with 32 units further reduced the complexity of the model, followed by a dense layer with a single unit to produce the final prediction for the next time step. The LSTM model was trained for 100 epochs to properly extract the temporal dependencies from the data using a batch size of 32 for maximum memory efficiency and rate of convergence. The Adam optimizer, which is known to be adaptive to the learning rate, was used with a learning rate of 0.001. The training procedure involved the optimization of the model using the mean squared error (MSE) loss function, which is commonly used in regression problems such as time series forecasting. The performance of the model was carefully monitored while training to avoid overfitting and to facilitate the learning of important patterns from the training set. After training was complete, the model was used to generate projections for the next time periods using data from the previous 14 days.
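The 14-day look-back input described above implies a sliding-window arrangement of the series. The sketch below shows one way to build such windows, each with shape (look back, features) $=$ (14, 1). The function name `make_windows` and the placeholder series are assumptions, not the study’s code.

```python
def make_windows(series, look_back=14):
    """Build (input window, next value) pairs for one-step-ahead training."""
    inputs, targets = [], []
    for start in range(len(series) - look_back):
        window = series[start:start + look_back]
        inputs.append([[value] for value in window])   # shape (look_back, 1)
        targets.append(series[start + look_back])      # the value to predict
    return inputs, targets


scaled_series = [i / 100 for i in range(20)]   # placeholder for normalized cases
X, y = make_windows(scaled_series, look_back=14)
```

A series of $n$ values produces $n - 14$ training samples, since the first 14 observations serve only as inputs.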

4.3. Performance Evaluation

The models are evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Square Error (MSE), and $R$ -squared as the assessment metrics. They are defined as follows:

Mean Absolute Error (MAE): MAE measures the average magnitude of the errors between the predicted and actual values without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation. It is defined mathematically as:

MAE = \frac{1}{N} \sum_{i = 1}^{N} | {\hat{y}}_{i} - y_{i} |

(13)

Root Mean Squared Error (RMSE): RMSE is the square root of the Mean Square Error (MSE), the average of the squared differences between predicted and actual values. It quantifies the magnitude of the errors in the same units as the original data.

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{y}}_{i} - y_{i})}^{2}}

(14)

$R$ -squared ( $R^{2}$ ): $R^{2}$ shows how much of the variance in the dependent variable can be predicted from the independent variables. It is defined mathematically as:

R^{2} = 1 - \frac{{SS}_{res}}{{SS}_{tot}}

(15)
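Equations (13) to (15) translate directly into code. The sketch below uses only the standard library and hypothetical numbers. Here ${SS}_{res}$ is taken as the sum of squared residuals and ${SS}_{tot}$ as the sum of squared deviations of the actual values from their mean, which is the usual convention and is assumed in this example.

```python
import math


def mae(actual, predicted):
    """Equation (13): mean of absolute errors."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean of squared errors."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Equation (14): square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """Equation (15): 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [100, 200, 300, 400]       # hypothetical observed values
predicted = [110, 190, 310, 390]    # hypothetical forecasts
scores = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

MAE and RMSE are in the same units as the case counts, while $R^{2}$ is unitless and can become negative when a forecast is worse than predicting the mean.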

5. Results and Discussion

5.1. Results

In the actual vs. predicted case graphs presented in Figure 1(a) to (c), the LSTM prediction (red dashed line) tracks the actual values (black line) closely across the 60-day timeline, accurately identifying small fluctuations and the large peak on day 48. The ARIMA prediction, however, shows wider oscillations and larger deviations, frequently under- or over-estimating highs and lows, and therefore tracks actual case behavior considerably less well.

Figure 1.

Actual and predicted values using LSTM and ARIMA models across different forecasting horizons. (a) 14-day forecast: Actual vs. LSTM vs. ARIMA, (b) 30-day forecast: Actual vs. predicted values and (c) 60-day forecast: Actual vs. predicted values.

Forecasting Accuracy Comparison. The comparison of LSTM and ARIMA performance across the 14-day, 30-day, and 60-day forecasting horizons reveals clear differences in their predictive capabilities, as summarized in Table 2. The LSTM model consistently achieves lower RMSE and MAE values, indicating a stronger ability to capture complex and non-linear relationships in the data. Although ARIMA performs reasonably well for short-term forecasts, its effectiveness declines as the forecasting horizon increases. This degradation is particularly evident in the 60-day forecast, where the $R^{2}$ value drops substantially, indicating a poorer model fit and reduced explanatory power compared to LSTM.

Table 2.

Forecasting Accuracy Comparison of LSTM and ARIMA Models.

Forecast Period	Model	RMSE	MAE	$R^{2}$
14-Day	LSTM	4294.77	3629.00	0.912
14-Day	ARIMA	4844.40	3811.41	0.836
30-Day	LSTM	3925.87	3419.84	0.848
30-Day	ARIMA	4776.67	3953.20	0.776
60-Day	LSTM	3165.03	2571.70	0.818
60-Day	ARIMA	5038.37	4168.41	0.638

In contrast, the LSTM model demonstrates a strong ability to learn long-term dependencies and non-linear relationships in the data. This advantage is particularly evident in the 30-day and 60-day forecasting horizons, where LSTM consistently outperforms ARIMA. By learning complex temporal patterns, LSTM maintains higher $R^{2}$ values, resulting in predictions that more accurately track actual case trends. These results are consistent with reports that LSTM-based models surpass conventional statistical approaches in many time-series forecasting applications.

Table 3 provides a comparative assessment of several forecasting models across distinct prediction horizons. Our research demonstrates that the LSTM model consistently surpasses the conventional ARIMA model in 14-, 30-, and 60-day predictions, attaining lower RMSE and MAE values and higher $R^{2}$ scores, which indicates better predictive accuracy and stability. Compared with our findings, recent research using CNN and hybrid CNN–RNN models reports substantially lower error rates, especially in short-term forecasting; however, these improvements often depend on the forecasting horizon or the dataset used. The LSTM-based methodology, by comparison, exhibits strong and reliable performance over all forecasting intervals. Conventional regression models provide intermediate performance, whereas probabilistic methods like Naive Bayes perform worse, with substantially larger errors. The results of this study align with previous literature reporting the superiority of deep learning models over traditional statistical methods for time series forecasting.

Table 3.

Comparative Forecasting Accuracy of Different Models.

Forecast	Model	RMSE	MAE	$R^{2}$
14-Day	LSTM	4294.77	3628.99	0.912
14-Day	ARIMA	4844.39	3811.40	0.836
14-Day	CNN	200.62	97.58	0.950
30-Day	LSTM	3925.86	3419.84	0.848
30-Day	ARIMA	4776.66	3953.19	0.776
30-Day	CNN	4.41	2.75	0.900
60-Day	LSTM	3165.02	2571.69	0.818
60-Day	ARIMA	5038.36	4168.40	0.638
60-Day	CNN	426.29	213.26	0.670
–	MLR	5.10	4.49	1.00
–	LR	173.34	114.90	0.88
10-Day	Naive Bayes	68,029.64	39,781.21	0.710
14-Day	CNN–RNN	2104.10	931.92	0.968

5.2. Discussion

The analysis of the forecasting performance of LSTM and ARIMA across varying time windows (14-day, 30-day, and 60-day) provides a useful benchmark for evaluating model effectiveness and their respective strengths and weaknesses. The results indicate that the LSTM model consistently outperforms the ARIMA model, particularly as the forecasting horizon increases. This outcome highlights LSTM’s ability to capture non-linear patterns and long-range dependencies in the data, making it more suitable for complex forecasting tasks. In contrast, the performance of the ARIMA model deteriorates over longer forecasting horizons, as evidenced by a pronounced decline in its $R^{2}$ values, indicating a reduced ability to explain variance in the observed data. The shortfall in performance arises because ARIMA is based on linear assumptions and data stationary sensitivities. Model performance trade-offs are of particular importance in selecting the most appropriate forecasting tool. Precision-interpret ability balance represents an important trade-off. LSTM, because of its deep architecture, offers greater predictive accuracy but at the cost of reduced interpret ability. The black box character of the system can complicate decision-making strategies, especially when transparency is necessary. ARIMA, though less accurate in long-term predictions, offers greater interpret ability because of its simpler statistical model. ARIMA is very beneficial in situations where understanding the interconnections between variables is critical, like policy-making or communication with non-technical stakeholders. Short- and long-term forecasting trade-offs become more obvious. LSTM is very effective at capturing short-term patterns and complex variability, making it more appropriate for situations where quick decisions are needed, like projecting demand for ICU beds or triggering warning systems for public health outbreaks. 
ARIMA, however, with its assumptions of linearity and stationarity, is better suited for longer-term projections where patterns are observable and stable, such as planning vaccine distribution. The performance differences between ARIMA and LSTM are rooted in the models’ underlying methodologies. LSTM, a neural network approach, is designed to learn complex, non-linear relationships across a time series. ARIMA, a statistical time-series model, takes a strict parametric approach and assumes that future values are a linear combination of past values. This conceptual divide explains why LSTM is better suited to detecting sudden shifts or non-linear patterns in data, while ARIMA falls short in handling such intricacies. The practical implications of these findings for public health agencies include resource deployment strategies and the use of alert systems. Where immediate accuracy is critical, for example in forecasting ICU bed needs during a healthcare surge, LSTM would be preferred because it can track rapid changes and provide actionable insights. For longer-term tasks such as planning vaccine distribution, where trends are more stable and linear, ARIMA might be preferred because it is simpler and easier to interpret. LSTM may also be applied in alert systems to issue advance warnings when estimated cases exceed set thresholds, for example, a 10.
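The alert rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in this study: the function name `exceedance_days`, the forecast values, and the threshold of 1,000 daily cases are all hypothetical.

```python
from typing import List, Sequence


def exceedance_days(forecast: Sequence[float], threshold: float) -> List[int]:
    """Return the 0-based indices of forecast days whose predicted
    case count is strictly greater than the threshold."""
    return [day for day, cases in enumerate(forecast) if cases > threshold]


# Hypothetical 7-day forecast of daily cases (illustrative values only,
# not output from the models evaluated in this study).
forecast = [820.0, 910.0, 1005.0, 1120.0, 980.0, 1250.0, 1300.0]

# Hypothetical alert threshold of 1,000 daily cases.
alert_days = exceedance_days(forecast, threshold=1000.0)
print(alert_days)  # [2, 3, 5, 6]
```

In practice the threshold would be set by the health authority, and the forecast would come from whichever model is in operational use.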

6. Conclusion

The results of this study clearly favor LSTM over longer forecasting horizons, though it is important to acknowledge the limitations of both models. The primary limitation concerns data. The success of both ARIMA and LSTM depends heavily on data quality and quantity. LSTM requires ample data to capture long-term dependencies and subtle patterns effectively. Insufficient data can lead to overfitting, compromising the effectiveness of the model. ARIMA’s reliance on stationary data makes it less suitable for datasets with non-stationary trends, where additional pre-processing steps are required to obtain reliable predictions. Future research should explore hybrid models that draw on the strengths of both LSTM and ARIMA. Using ARIMA to forecast the linear trend in the data while allowing LSTM to capture the non-linear patterns may improve forecasting accuracy. Further studies can also consider the use of LSTM in settings with limited data, using approaches such as transfer learning or few-shot learning to offset data insufficiency. Another promising direction is the use of ensemble techniques, where the predictions of different models are combined to reduce overall error. Future studies might also focus on improving the interpretability of deep learning models such as LSTM. While LSTM yields high accuracy, its lack of transparency might hinder its application in fields that require it, such as healthcare decision-making. Methods that clarify the workings of LSTM models, such as attention mechanisms or explainable AI frameworks, might help bridge the gap between accuracy and interpretability. Taken together, approaches that combine statistical and machine learning models can lead to more robust forecasting systems capable of handling a wider range of real-world conditions.
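As an illustration of the ensemble idea, the sketch below combines two forecasts using weights inversely proportional to each model’s validation MAE. The weighting scheme, the function names, and all numbers are assumptions made for this example; this study does not implement or evaluate an ensemble.

```python
from typing import List, Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def inverse_error_weights(errors: Sequence[float]) -> List[float]:
    """Weights proportional to 1/error, normalized to sum to 1.
    Assumes every error is strictly positive."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts: Sequence[Sequence[float]],
            weights: Sequence[float]) -> List[float]:
    """Weighted average of several equal-length forecasts, step by step."""
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(horizon)]


# Hypothetical validation data (illustrative values only).
actual_val = [100.0, 110.0, 120.0]
arima_val = [90.0, 100.0, 110.0]    # validation MAE = 10
lstm_val = [105.0, 115.0, 125.0]    # validation MAE = 5

weights = inverse_error_weights([mae(actual_val, arima_val),
                                 mae(actual_val, lstm_val)])

# Hypothetical out-of-sample forecasts from each model.
arima_future = [130.0, 140.0]
lstm_future = [133.0, 146.0]
ensemble = combine([arima_future, lstm_future], weights)
print([round(x, 1) for x in ensemble])  # [132.0, 144.0]
```

The model with the lower validation error receives the larger weight (here roughly 2/3 for LSTM and 1/3 for ARIMA). Other combination rules, such as stacking, are equally plausible.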

Footnotes

ORCID iD

Bandu Uppalaiah

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Barman (2020). Time series analysis and forecasting of COVID-19 cases using LSTM and ARIMA models. arXiv preprint, arXiv:2006.13852

Benvenuto, Giovanetti, Vassallo, Angeletti, & Ciccozzi (2020). Application of the ARIMA model on the COVID-2019 epidemic dataset. Data in Brief, 29, 105340. https://doi.org/10.1016/j.dib.2020.105340

Borges, & Nascimento, M. C. V. (2022). COVID-19 ICU demand forecasting: A two-stage Prophet-LSTM approach. Applied Soft Computing, 125, 109181. https://doi.org/10.1016/j.asoc.2022.109181

Borghi, P. H., Zakordonets, & Teixeira, J. P. (2021). A COVID-19 time series forecasting model based on MLP ANN. Procedia Computer Science, 181, 940–947. https://doi.org/10.1016/j.procs.2021.01.250

Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time series analysis: Forecasting and control. Wiley.

Centers for Disease Control and Prevention (CDC). (2020). Forecasting COVID-19 trends. https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting.html

Chaurasia, & Pal (2022). Application of machine learning time series analysis for prediction COVID-19 pandemic. Research on Biomedical Engineering, 38(1), 35–47. https://doi.org/10.1007/s42600-020-00105-4

Zhang, W.-B., Ruktanonchai, C. W., Liu, Wang, Song, Liu, Yan, Yang, Cleary, Qader, Atuhaire, Ruktanonchai, N. W., Tatem, & Lai (2022). Untangling the changing impact of non-pharmaceutical interventions and vaccination on European COVID-19 trajectories. Nature Communications, 13(1), 3106. https://doi.org/10.1038/s41467-022-30897-1

Hanif, Mustafa, Iqbal, & Piracha (2023). A study of time series forecasting enrollments using fuzzy interval partitioning method. Journal of Computer and Cognitive Engineering, 2(2), 143–149. https://doi.org/10.47852/bonviewJCCE2202159

Hochreiter, & Schmidhuber (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

Kumar, Gupta, Kumar, & Sachdeva (2020). Spreading of COVID-19 in India, Italy, Japan, Spain, UK, US: A prediction using ARIMA and LSTM model. Digital Government: Research and Practice, 1(4), 1–9.

Mahmud, Syed Hatim Noor, S. H. N., Musa, K. I., Mohamad Hamzah, Mat Yudin, Kamaruddin, Madawana, A. M., & Awang Nawi, M. A. (2025). Hybrid ARIMA-LSTM for COVID-19 forecasting: A comparative AI modeling study. PeerJ Computer Science, 11, e3195. https://doi.org/10.7717/peerj-cs.3195

Mostafiz, Uddin, M. S., Uddin, K. M. M., & Rahman, M. M. (2022). COVID-19 along with other chest infection diagnoses using Faster R-CNN and GAN. ACM Transactions on Spatial Algorithms and Systems, 8(3), 1–21. https://doi.org/10.1145/3520125

Ospina, Gondim, J. A. M., Leiva, & Castro (2023). An overview of forecast analysis with ARIMA models during the COVID-19 pandemic: Methodology and case study in Brazil. Mathematics, 11(14), 1–18. https://doi.org/10.3390/math11143069

Rguibi, M. A., Moussa, Madani, Aaroud, & Zine-Dine (2022). Forecasting COVID-19 transmission with ARIMA and LSTM techniques in Morocco. SN Computer Science, 3(2), 133. https://doi.org/10.1007/s42979-022-01019-x

Shahid, Zameer, & Muneeb (2020). Predictions for COVID-19 with deep learning models of LSTM, GRU, and Bi-LSTM. Chaos, Solitons & Fractals, 140, 110212. https://doi.org/10.1016/j.chaos.2020.110212

Shastri, Singh, Kumar, Kour, & Mansotra (2020). Time series forecasting of COVID-19 using deep learning models: India-USA comparative case study. Chaos, Solitons & Fractals, 140, 110227. https://doi.org/10.1016/j.chaos.2020.110227

Sherstinsky (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. https://doi.org/10.1016/j.physd.2019.132306

Swaraj, Verma, Kaur, Singh, Kumar, & Melo de Sales (2021). Implementation of stacking-based ARIMA model for prediction of COVID-19 cases in India. Journal of Biomedical Informatics, 121, 103887. https://doi.org/10.1016/j.jbi.2021.103887

Wathore, Rawlekar, Anjum, Gupta, Bherwani, Labhasetwar, & Kumar (2023). Improving performance of deep learning predictive models for COVID-19 by incorporating environmental parameters. Gondwana Research, 114, 69–77. https://doi.org/10.1016/j.gr.2022.03.014

World Health Organization (WHO). (2020). Coronavirus disease (COVID-19) pandemic. https://www.who.int/emergencies/diseases/novel-coronavirus-2019