Abstract
Air pollution is an alarming problem in many cities and countries around the globe. The ability to forecast air pollutant levels plays a crucial role in implementing preventive measures before pollution episodes occur. Many statistical, machine learning, and deep learning models are available to predict air pollutant values, but only a limited number of them take into account the spatio-temporal factors that influence pollution. In this study, a novel deep learning model augmented with Spatio-Temporal Co-Occurrence Patterns (STEEP) is proposed. The model uses the Closed Spatio-Temporal Co-Occurrence Pattern mining (C-STCOP) algorithm to extract non-redundant (closed) patterns and the Diffusion Convolution Recurrent Neural Network (DCRNN) for time series prediction. By constructing a graph from the co-occurrence patterns obtained from C-STCOP, the proposed model addresses the spatio-temporal association among monitoring stations. Furthermore, the sequence-to-sequence encoder-decoder architecture captures the temporal dependencies within the time series data. The STEEP model is evaluated on the Delhi air pollutants dataset and shows an average improvement of 8%–13% in the RMSE, MAE, and MAPE metrics compared to the baseline models.
Introduction
Rapid economic development, urbanization, and the increase in vehicle usage and energy consumption have led to a drastic rise in air pollution in India’s national capital, Delhi. Air pollution in Delhi is therefore a serious threat to the environment and human health. Air quality monitoring stations in and around Delhi report the concentrations of pollutants on an hourly basis. Air pollution is attributed to pollutants such as PM2.5, PM10, O3, CO, SO2, NO2, and NH3. Particulate matter (PM2.5 and PM10) has become a severe issue in the Delhi megacity, requiring immediate mitigation efforts. Numerous epidemiological studies show that exposure to PM2.5 causes serious health issues [1, 2, 3]. Short-term or acute exposure (24-hour concentration) to particulate matter causes breathing difficulty, eye irritation, etc. Long-term or chronic exposure (annual concentration) causes respiratory issues, reduced immunity levels, cardiovascular risk, etc. The derivatives of particulate matter also cause harmful effects such as poor visibility, climate change, economic drawbacks, health hazards, reduced lifespan, indoor pollution, etc. The World Health Organisation (WHO) report in 2006 states that long-term exposures to PM2.5 should be less than 10
Ambient air quality standard.
National Air Quality Index (AQI) categorization.
Numerous studies have used the observations collected from monitoring stations to predict air quality. These studies are generally framed as time series forecasting. Auto-regressive moving average [6], Kalman filtering [7], regression methods [8], and artificial neural networks (ANN) [9] are some of the most widely used techniques for time series prediction. However, only a limited number of existing studies address the spatial association among monitoring stations when predicting air pollution levels, and in those studies the spatial association is primarily attributed to the proximity or distance between the monitoring stations. No existing work addresses the spatio-temporal association among the monitoring stations. Spatio-temporal association refers to the relationship or correlation between spatial and temporal factors within a time series dataset. It focuses on identifying patterns, co-occurrences, or associations between different locations or monitoring stations and on analysing the simultaneous occurrence or behaviour of pollutants across space and time. Hence, a Spatio-Temporal Co-Occurrence Pattern assisted Deep Learning Model (STEEP) is proposed in this research. The deep learning model used here is the Diffusion Convolution Recurrent Neural Network, which takes the closed spatio-temporal patterns as input. The spatio-temporal association among the stations is obtained by translating the non-redundant (closed) spatio-temporal co-occurrence patterns mined using the C-STCOP approach. These patterns are converted to a matrix, i.e., constructed as a graph, and fed as knowledge to the time series prediction. The sequence-to-sequence encoder-decoder architecture of the Recurrent Neural Network (RNN) is used to capture the temporal dependency among time series data. Temporal dependency in time series prediction refers to the inherent relationship and influence of past observations on future values within a sequential dataset.
The time series air pollutant and meteorological data, along with the C-STCOP patterns, are given as input to the STEEP model. The diffusion convolution process captures the spatio-temporal association among stations from the graph. The STEEP model then forecasts the air pollutant values for the next few hours (1, 6, 12, 18, and 24 hours) at a location. Delhi air quality data is used to forecast PM2.5 values and evaluate the model. The experimental results show that the proposed method gives better results than the baseline approaches.
The rest of the paper is organized as follows. Section 2 overviews the existing techniques available for time series prediction. Section 3 gives an insight into the data, study area, and data pre-processing steps. Section 4 explains the methodology of the proposed STEEP model. The experimental setup, results, and discussion are presented in Sections 5 and 6, followed by the conclusion in Section 7.
Numerous research works exist on real-time air quality forecasting (RT-AQF) for short and long-term prediction [10, 11]. Typically, short-term predictions (1–5 days) are employed to notify the citizens about potentially hazardous pollution levels so that they can take preventative countermeasures. Long-term (
Artificial Neural Networks (ANN) are widely used for air quality predictions as they can accommodate the nonlinear temporal dependence among data. In [18], a multi-layer perceptron (MLP) artificial neural network model was applied to estimate daily maximum and mean ozone and particulate matter (PM2.5 and PM10). It demonstrated that the MLP surpassed conventional multiple linear regression (MLR). In [9], the authors recommended constructing independent ANN models for every monitoring centre to predict the 24-hour moving average maximum value of PM2.5. The research outcome demonstrated that the multi-layer neural network outperformed linear regression and the persistence baseline (i.e., assigning the hourly values from the current day to the next day). The approaches mentioned above demonstrate a comparatively superior performance of ANN for air quality prediction.
Hybrid methods for handling linear and nonlinear trends in air quality time series analysis have also been developed. In [19], the authors introduced a hybrid model that combines ARIMA and ANN to enhance the precision of forecasts for a region with inadequate meteorological and air quality inputs. The results indicate that the hybrid model outperforms the ARIMA and ANN models used separately. Likewise, in [13], the authors suggested a hybrid method composed of an ARIMA and a nonlinear model. However, these hybrid techniques do not include the spatial association with air quality data from other monitoring locations. To address this, a hybrid method was developed to estimate air quality over the next 48 hours at every monitoring centre [16]. This hybrid model accounts for temporal and spatial dependence in separate models (i.e., a temporal predictor and a spatial predictor) and integrates the two predictors using a regression tree.
Deep learning techniques offer considerable potential for time series prediction [20, 21, 22]. The deep recurrent neural network (RNN), which can model nonlinear temporal dependency, has produced impressive outcomes in sequence modelling as well as time series modelling [23], such as speech recognition [24] and traffic prediction [25]. In [26], a more sophisticated variant of RNN called the Diffusion Convolutional Recurrent Neural Network (DCRNN) is introduced. This model combines diffusion convolution and RNN to handle the spatial association among monitoring stations and the temporal dependency within time series. An extension of it is the Geo-context-based diffusion convolutional recurrent neural network (GC-DCRNN); this model uses the similarity of the environment between monitoring stations to construct a graph for addressing the spatial association among them [27]. Later, a model incorporating the Graph convolution neural network and Long short-term memory (GC-LSTM) was proposed to perform time series prediction [28]. This model uses the geographic distance between the monitoring stations to construct a graph for accommodating the spatial association among the stations.
Existing air quality prediction models do not jointly consider the spatial and temporal aspects of the data. Yet the spatio-temporal factor gives additional information about the monitoring location and improves the prediction accuracy. A model that integrates the spatio-temporal characteristics along with the time series data therefore fills this research gap. The proposed STEEP model uses the spatio-temporal co-occurrence patterns obtained from C-STCOP to construct a graph from which the spatio-temporal association among the monitoring stations can be extracted. This graph is fed as knowledge to the DCRNN model for performing time series forecasting. In addition, the proposed model can forecast the PM2.5 value for the next 24 hours across all stations simultaneously.
Data
This section describes the study area and the data collection and cleaning process. The result is a pre-processed dataset ready to be used by the STEEP model for forecasting PM2.5 values.
Study area
The study area is Delhi (India), the National Capital Region (NCR) of India (28.7041° N, 77.1025° E). The total land area of Delhi is 1,483 sq. km. Delhi has been experiencing some of its worst pollution levels for the past few years. Though Delhi is an urban city with significant economic achievements, it continues to be one of the worst-polluted metropolitan cities in the world [1, 29, 30]. Despite the relocation of many industries, 30% of the capital’s population still suffers from respiratory diseases. The most influential factors on air quality are wind speed and direction. That is the reason behind the comparatively lower air pollution in Chennai than in Delhi, even though Chennai has more industries. Delhi’s unfavourable geography and meteorology make things worse: the city can be pictured as an inverted bowl that collects air pollution with only a narrow outlet for it to escape. Various air pollutants such as CO, NH3, PM2.5, PM10, etc., contribute to air pollution. Among these, PM2.5 is particularly hazardous, as exposure to this pollutant led to the death of 54,000 people in Delhi during 2020. Although Delhi experienced a temporary respite from air pollution due to the COVID-19 lockdown, air pollution seriously threatens public health and the economy [31]. In this work, short-term forecasting of PM2.5 is performed using the proposed STEEP model, which takes the spatio-temporal dependency among the stations into account when predicting the PM2.5 concentration.
There are a total of 36 monitoring stations in Delhi. Of these, 27 air quality monitoring stations in and around the NCR were considered for this analysis, chosen based on data availability. The distribution of monitoring stations is visualized in Figure 1.

Delhi monitoring stations.
The air pollutants, namely PM10, PM2.5, NO2, CO, Ozone, SO2, and NH3, and the weather parameters, namely atmospheric temperature (AT), relative humidity (RH), wind direction (WD), and wind speed (WS), are monitored on an hourly and daily basis and are available from the Central Pollution Control Board. The PM2.5, AT, RH, and WS data from January 2019 to December 2020 are taken into consideration for this study. Table 3 gives the statistics of the data used for the analysis. From Table 3, it can also be observed that the standard deviation of PM2.5, RH, and WS for 2020 is lower than for 2019 due to the COVID-19 lockdown. The nationwide lockdown temporarily impacted air pollution, but the effect wore off as the unlock phases began.
Statistics of the dataset used.
The raw data obtained from the Central Pollution Control Board (CPCB) [32] must be cleansed and pre-processed for analysis. The pre-processing steps required are given below.
Missing values: The missing values are filled using the linear interpolation method. Various interpolation methods are available to fill missing values, such as nearest neighbour interpolation, regression-based interpolation, and cubic spline interpolation. Linear interpolation was used because of its simplicity and success in the past [33, 34].
Remove duplicates: The duplicate entries are removed by setting the timestamp attribute as the primary index. No duplicate values were observed in the dataset.
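The two cleaning steps above can be sketched with pandas as follows. This is a minimal illustration rather than the authors' pipeline: the column names `timestamp` and `pm25`, the helper `clean_station_series`, and the sample values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_station_series(df: pd.DataFrame) -> pd.DataFrame:
    """Clean one station's hourly readings.

    Assumes hypothetical columns 'timestamp' and 'pm25'.
    """
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["timestamp"])
    # Use the timestamp as the index and keep the first row of any repeated timestamp.
    out = out.set_index("timestamp").sort_index()
    out = out[~out.index.duplicated(keep="first")]
    # Re-index onto a complete hourly grid so that missing hours become NaN rows.
    full_index = pd.date_range(out.index.min(), out.index.max(),
                               freq=pd.Timedelta(hours=1))
    out = out.reindex(full_index)
    out.index.name = "timestamp"
    # Fill the gaps by linear interpolation between neighbouring observations.
    out["pm25"] = out["pm25"].interpolate(method="linear")
    return out


# Illustrative values only: one duplicated hour and two missing hours.
raw = pd.DataFrame({
    "timestamp": ["2019-01-01 00:00", "2019-01-01 01:00",
                  "2019-01-01 01:00", "2019-01-01 04:00"],
    "pm25": [100.0, 110.0, 110.0, 140.0],
})
cleaned = clean_station_series(raw)
print(cleaned)
```

Re-indexing onto a full hourly grid before interpolating makes hours that are absent from the raw file explicit, so they are filled in the same way as hours recorded with an empty value.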
The cleansed hourly data for each pollutant will be given as the time series data input for STEEP. The pre-processed data will have the following attributes: timestamp, station name, and PM2.5 concentration value. A sample of the cleaned data for five stations is given in Table 4.
Cleansed PM2.5 data.
The daily time series data are modelled into events for finding C-STCOP patterns. The values are categorized as safe or unsafe based on the categories in Table 2. If the recorded PM2.5 falls in the severe category, it is marked as unsafe; for all other categories, it is marked as safe. Each station experiencing an unsafe level on a given day was modelled as an event.
The PM2.5 data after event modelling has the following attributes: timestamp, Event (station), Latitude, and Longitude. This data is used to find the Spatio-temporal co-occurrence pattern by the C-STCOP algorithm. Table 5 shows a sample of the pollution data modelled into events.
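The event modelling step can be sketched as a simple threshold filter followed by a join with station coordinates. The sketch below is illustrative only: the cut-off `SEVERE_PM25`, the column names, the station labels `S1` and `S2`, and the coordinates are assumptions, and the authoritative category boundaries are those of Table 2.

```python
import pandas as pd

# Assumed lower bound of the "severe" PM2.5 band (ug/m3). The numeric value is an
# illustrative assumption; the paper takes its categories from Table 2.
SEVERE_PM25 = 250.0


def model_events(daily: pd.DataFrame, stations: pd.DataFrame,
                 threshold: float = SEVERE_PM25) -> pd.DataFrame:
    """Turn daily PM2.5 readings into 'unsafe' events with station coordinates."""
    # Keep only the station-days whose reading falls in the severe band.
    unsafe = daily[daily["pm25"] > threshold]
    # Attach latitude and longitude of the station that produced each event.
    events = unsafe.merge(stations, on="station", how="left")
    events = events.rename(columns={"station": "event"})
    return events[["timestamp", "event", "latitude", "longitude"]].reset_index(drop=True)


# Hypothetical stations and readings, for illustration only.
stations = pd.DataFrame({
    "station": ["S1", "S2"],
    "latitude": [28.70, 28.65],
    "longitude": [77.10, 77.20],
})
daily = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-02"]),
    "station": ["S1", "S2", "S1"],
    "pm25": [300.0, 120.0, 260.0],
})
events = model_events(daily, stations)
print(events)
```

Only the unsafe station-days survive the filter, so the resulting table has the four attributes listed above and can be handed to a pattern mining step.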
Event modelling.
The meteorological data, namely wind speed (WS), wind direction (WD), relative humidity (RH), atmospheric temperature (AT), solar radiation (SR), and barometric pressure (BP), are monitored at each station. The weather parameters AT, RH, and WS were considered for this study. The data pre-processing steps explained in Section 3.2.1 are also applied to the weather data. The weather data, PM2.5 values, and spatio-temporal patterns are then given as input to the STEEP model.
Methodology
The goal is to do a short-term forecast of PM2.5 values, given previously recorded data from
This model aims to learn a function
Figure 2 represents the architecture of the proposed STEEP model. The PM2.5 observations, meteorological values, and Spatio-temporal patterns are pre-processed and given as input to the model. The patterns are represented as an undirected graph (i.e., matrix M) and given as input, along with the pre-processed time series data in the input layer. The spatio-temporal dependencies in the data are captured in the spatio-temporal processing layer using Diffusion Convolution (DC) and Gated Recurrent Unit (GRU). The predicted PM2.5 value is given in the output layer.

STEEP architecture.
The C-STCOP algorithm in Table 6 uses an Apriori-based candidate generation strategy to produce unique, closed patterns. The significance of closed and lossless ST patterns is quantified using the coupling co-efficient measure
C-STCOP algorithm.
The prevalence of an individual event in the entire dataset is determined initially. If an event is not frequent with respect to the prevalence value, then patterns are not generated for those events. The candidate patterns are produced using an apriori approach if and only if
The spatiotemporal patterns obtained from C-STCOP are represented as an undirected graph
Where
Graph construction algorithm.

Illustration of graph construction from sample C-STCOP.
An algorithm for Graph Construction is given in Table 7. The algorithm begins by creating a zero matrix of size
An illustration of graph construction for three C-STCOP patterns is given in Figure 3. Stations are denoted by their station IDs in the patterns. The first pattern is 5-9-15-16-23, with a prevalence value of 0.08. This value is assigned as the edge weight between every pair of vertices involved in the pattern. The second pattern is 2-5-7-8-13, with a prevalence value of 0.07, which is likewise assigned as the edge weight between all of its vertices. The third pattern is 15-23-25; because the edges 15-23 and 23-15 already carry a weight from the first pattern, its prevalence value is added to the existing value. The resultant graph constructed from using all the C-STCOP patterns obtained for the set parameters

Spatio-temporal graph constructed using the C-STCOP patterns.
Graph statistics.
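The graph construction illustrated with the three sample patterns can be sketched as follows. The first two patterns and their prevalence values are taken from the illustration; the prevalence of the third pattern (0.05) is a made-up value, and the helper name `build_pattern_graph` and the 1-based station IDs mapped to matrix indices are assumptions of this sketch.

```python
from itertools import combinations

import numpy as np


def build_pattern_graph(patterns, n_stations):
    """Accumulate pattern prevalence into a symmetric weight matrix.

    patterns: iterable of (station_ids, prevalence); IDs are assumed to be 1-based.
    """
    m = np.zeros((n_stations, n_stations))
    for station_ids, prevalence in patterns:
        # Every pair of stations that co-occurs in a pattern gets an undirected edge.
        for a, b in combinations(station_ids, 2):
            # Prevalence is added to any weight already present on the edge.
            m[a - 1, b - 1] += prevalence
            m[b - 1, a - 1] += prevalence
    return m


patterns = [
    ((5, 9, 15, 16, 23), 0.08),
    ((2, 5, 7, 8, 13), 0.07),
    ((15, 23, 25), 0.05),  # hypothetical prevalence, chosen for illustration
]
M = build_pattern_graph(patterns, n_stations=27)
print(M[14, 22])  # weight of edge 15-23, which appears in two patterns
```

Stations that never appear in a pattern keep an all-zero row and column, which corresponds to nodes without edges in the resulting graph.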
The spatiotemporal dynamics of air pollution are modelled using STEEP. The spatiotemporal association considers the interconnectedness of pollution levels across different monitoring stations over time. It recognizes that pollution levels at one location can be influenced by the pollution levels at nearby or geographically distant locations, as well as by the temporal dynamics and trends in those locations. Therefore, by incorporating spatiotemporal association, time series prediction models can capture the spatial interactions and dependencies among monitoring stations, enabling more accurate and comprehensive predictions of air pollution levels. The proposed model recognizes that local factors and spatiotemporal relationships with neighbouring locations influence pollution levels. Consequently, the model captures the spatio-temporal association between the monitoring stations using the diffusion convolution process on the constructed graph. The temporal dependencies among time series are captured by the sequence-to-sequence architecture of the encoder-decoder.
Diffusion convolution
The diffusion convolution process is used on the graph that is constructed in Section 4.2. The standard convolutional neural network (CNN) usually handles grid-structured data in image processing. The convolution process uses a filter to examine the image and extract necessary features. For example, if a 3
Where
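As a rough illustration of diffusion convolution on a weighted graph, the sketch below follows the general K-step random-walk form used by the DCRNN model [26], in which a graph signal is repeatedly propagated through the transition matrix and the steps are combined with filter weights. The function names, the scalar filter weights, and the toy graph are assumptions made for the example and are not the notation of this paper.

```python
import numpy as np


def random_walk_matrix(w):
    """Row-normalise a weight matrix (D^-1 W); isolated nodes keep an all-zero row."""
    w = np.asarray(w, dtype=float)
    deg = w.sum(axis=1)
    inv_deg = np.zeros_like(deg)
    nonzero = deg > 0
    inv_deg[nonzero] = 1.0 / deg[nonzero]
    return inv_deg[:, None] * w


def diffusion_convolution(x, w, theta_fwd, theta_bwd):
    """Sum over k of theta_fwd[k] * P_fwd^k x + theta_bwd[k] * P_bwd^k x.

    x: signal with one value per node; w: (n, n) weight matrix;
    theta_fwd, theta_bwd: K filter weights for the two diffusion directions.
    """
    w = np.asarray(w, dtype=float)
    p_fwd = random_walk_matrix(w)
    p_bwd = random_walk_matrix(w.T)
    x_fwd = np.asarray(x, dtype=float)
    x_bwd = x_fwd.copy()
    out = np.zeros_like(x_fwd)
    for k in range(len(theta_fwd)):
        # The k = 0 term is the signal itself; each later term diffuses one more step.
        out += theta_fwd[k] * x_fwd + theta_bwd[k] * x_bwd
        x_fwd = p_fwd @ x_fwd
        x_bwd = p_bwd @ x_bwd
    return out


# Toy path graph 0 - 1 - 2 with unit weights and a signal concentrated on node 0.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
signal = np.array([1.0, 0.0, 0.0])
result = diffusion_convolution(signal, W, theta_fwd=[1.0, 1.0], theta_bwd=[0.0, 0.0])
print(result)
```

For an undirected graph the weight matrix is symmetric, so the forward and backward transition matrices coincide and the two sets of filter weights could be merged into one.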
The temporal dependency is addressed using a recurrent neural network (RNN). The Gated Recurrent Unit (GRU) is an improved version of the conventional RNN [4]. The GRU uses an update gate and a reset gate to handle the vanishing gradient problem. The matrix multiplication in GRU is replaced by the diffusion convolution operation
Here
A sequence-to-sequence architecture is employed for multi-step-ahead forecasting. For example, if a PM2.5 vector covering 6 hours is fed to the encoder, the decoder takes the encoder's last state as input and produces the subsequent sequence. Throughout the training phase, the encoder is provided with historical time series data, and its final state is used to initialize the decoder. During the testing phase, the forecasted results are compared with the ground truth to evaluate the model.
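To make the encoder and decoder sequences concrete, the sketch below slices a multi-station series into (input window, target window) pairs of the kind a sequence-to-sequence model consumes. It covers only the data preparation, not the network itself; the helper name `make_windows` and the window lengths are assumptions for illustration.

```python
import numpy as np


def make_windows(series, input_len, horizon):
    """Slice a (time, stations) array into encoder inputs and decoder targets.

    Returns X with shape (samples, input_len, stations)
    and Y with shape (samples, horizon, stations).
    """
    series = np.asarray(series, dtype=float)
    n_samples = series.shape[0] - input_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window lengths")
    # Each encoder window is immediately followed in time by its decoder target window.
    x = np.stack([series[i:i + input_len] for i in range(n_samples)])
    y = np.stack([series[i + input_len:i + input_len + horizon]
                  for i in range(n_samples)])
    return x, y


# Toy series: 10 hourly steps for 2 stations; 6 hours in, 3 hours out.
toy = np.arange(20).reshape(10, 2)
X, Y = make_windows(toy, input_len=6, horizon=3)
print(X.shape, Y.shape)
```

With a 6-hour input and a 3-hour horizon, the first target window starts at the seventh time step, directly after the first encoder window ends.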
The air quality data, weather data, and C-STCOP patterns were used for building the STEEP model. Delhi, India, was used as the study area, as explained in Section 3. The spatio-temporal data was stored in a PostgreSQL database using the PostGIS extension. The experiments were conducted on an Intel Core i5 desktop with 16 GB RAM and a 500 GB SSD.
In this research work, the PM2.5, AT, RH, and WS observations from 2019 to 2020 were used for training and testing the model. The spatio-temporal patterns obtained from C-STCOP were used to construct the adjacency matrix of the undirected graph. The weight between each pair of nodes is calculated from the pattern prevalence measure as per Eq. (2). The matrix was normalized so that the edge weights lie in the range 0–1. This graph and the time series data are used for evaluating the model. K-fold cross-validation was employed with a value of 10 for k. 70% of the data was used for training, 20% for testing, and 10% for validation. For the
Where
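For reference, the three error measures reported in the evaluation can be computed as in the sketch below, which uses the standard textbook definitions of MAE, RMSE, and MAPE. The function names and sample values are illustrative, and MAPE is expressed in percent under the assumption that no observed value is zero.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; assumes y_true has no zeros."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Illustrative values only.
observed = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

RMSE penalises large errors more heavily than MAE, while MAPE expresses the error relative to the observed concentration, which is why the three measures are usually reported together.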
Experimental settings.
In this section, the results obtained by the baseline methods and the proposed STEEP model are analysed.
Baseline methods
The baseline methods used for evaluating the proposed STEEP are explained in this section.
Linear Regression (LR): The linear relationship between input values and predicted values is analyzed by the LR. An autocorrelation exists among the predictions, i.e., the value of
Gradient Boosting Machines (GBM): This model can find nonlinear patterns that LR cannot discover. The algorithm aims to reduce errors by adding weak learners to a tree-based ensemble through a gradient descent-like procedure. At each step, a new weak learner is added to minimize the error, leaving the old ones unchanged.
LSTM: Long short-term memory networks are a special form of RNN designed to handle the long-term dependency problem. LSTMs can retain information over longer periods of time and are thus suitable for sequence data and time series prediction.
DCRNN: This model uses the geographic distance between the stations to model a graph for spatial dependency. The distance between monitoring stations is recorded as edge weight between the nodes. The diffusion convolution recurrent neural network uses a sequence-to-sequence encoder-decoder architecture to predict time series [26].
GC-DCRNN: This model is an extension of the DCRNN. Here the similarity between the stations is computed based on the geographic factors surrounding the monitoring station to represent the spatial dependency [35].
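For comparison with the pattern-based graph, the distance-based graph of the DCRNN baseline can be sketched as a thresholded pairwise distance matrix, here with the 10 km neighbour threshold mentioned in the results discussion. The haversine distance, the helper names, and the station coordinates are assumptions chosen for illustration; the baseline's exact weighting scheme may differ.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def distance_graph(coords, threshold_km=10.0):
    """Weight matrix keeping the distance for station pairs within the threshold."""
    coords = np.asarray(coords, dtype=float)
    lat = coords[:, 0][:, None]
    lon = coords[:, 1][:, None]
    # Broadcasting column against row vectors gives all pairwise distances.
    dist = haversine_km(lat, lon, lat.T, lon.T)
    # Pairs farther apart than the threshold are not connected.
    adj = np.where(dist <= threshold_km, dist, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj


# Hypothetical (latitude, longitude) pairs; not actual station locations.
coords = [(28.70, 77.10), (28.70, 77.15), (29.00, 77.10)]
A = distance_graph(coords, threshold_km=10.0)
print(np.round(A, 2))
```

Because edges depend on distance alone, any station with a neighbour inside the threshold is connected, regardless of whether its pollution episodes co-occur with that neighbour.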
The comparative results of different models for predicting PM2.5 values.
Comparison of running time for each model.
Table 11 gives the running time of each model in minutes. The proposed STEEP model takes more time, as it involves a pattern mining procedure followed by time series prediction. This additional cost is offset by the better results it gives in terms of MAE, RMSE, and MAPE, as seen in Table 10.

MAE, RMSE, MAPE graph visualization.

Graph constructed for the DCRNN model.
This section evaluates the results obtained by the proposed STEEP for forecasting PM2.5 values in Delhi. The PM2.5, AT, RH, and WS data for 2019–2020 were used along with the C-STCOP patterns. The patterns were modelled into a graph, as explained in Section 4.2. Figure 4 shows the undirected graph constructed using the ST patterns obtained from the C-STCOP method.
The stations involved in the graph and the number of edges from each node (station) are shown in Figure 4. A few stations have no edges, irrespective of their nearness to neighbouring stations. DwarkaSector8, Mundka, Narela, Patparganj, SoniaVihar, and VivekVihar have no association with other stations according to the spatio-temporal patterns generated by the C-STCOP algorithm. This implies that the PM2.5 values recorded at these stations do not co-occur with those of other stations in terms of spatial and temporal closeness. These stations behave differently in that they do not exhibit the spatial and temporal dependency property. Generally, the ST patterns show the association between stations that frequently exhibit spatial and temporal dependency. The main reason these stations show such behaviour is that they are located on the outskirts of Delhi. Other reasons could be their geographic properties, population, flora, and fauna. Expert studies are required to further analyse the reasons behind this characteristic. By contrast, the same stations do have edges in the graph generated for the DCRNN model, because that graph is built from a thresholded geographic distance matrix and uses spatial distance only. Figure 6 shows the graph generated using 10 km as the neighbour threshold for the DCRNN model. In that graph, all the stations are present as nodes, as each has some stations as neighbours. The C-STCOP patterns capture the spatio-temporal association between neighbours according to the recorded PM2.5 values; using these closed ST patterns as knowledge for time series forecasting therefore gives better results, as supported by Table 10 and Figure 5.
Comparison of the baseline models and the proposed STEEP with respect to forecasted results is shown in Table 10. The best results are highlighted in bold, and it can be observed that the proposed STEEP gives the best result compared to the baseline approaches. The proposed model generates low RMSE, MAE, and MAPE values compared to the baseline approaches as shown in Figure 5. This emphasizes the importance of using the ST patterns as knowledge in time series forecasting. The usage of ST patterns has led to a low error rate and accurate results compared to the existing models that use geographic context information like GC-DCRNN and DCRNN. Table 10 shows that the STEEP model performed better than the baseline models across all the horizons, with an improvement of 8%–13%. This clearly shows the proposed model’s ability to utilize the Spatio-temporal patterns in predicting PM2.5 values across all the horizons.
Conclusion
The surge in the availability of historical air pollution data and sophisticated computing resources has led to the development of various air quality prediction models. This work proposed a deep learning model for forecasting PM2.5. The proposed STEEP model incorporates the spatio-temporal association among the monitoring stations and integrates diffusion convolution with a gated recurrent unit (GRU) model. The spatio-temporal association is captured by constructing a graph based on the patterns obtained from the C-STCOP algorithm. The STEEP model was first trained and validated to set the network parameters and was then evaluated on the test dataset. The experimental results demonstrated that the proposed model surpasses the baseline models in forecasting PM2.5 values. The performance of the proposed model is better across all horizons with respect to MAE, RMSE, and MAPE. STEEP achieves a consistent improvement of 8%–13% over the baseline approaches. The proposed model can be extended to use other types of ST patterns as knowledge in future studies.
Acknowledgments
This work was supported by the Council of Scientific & Industrial Research, India, under the Direct-SRF scheme, grant File No: 09/559(0141)/19-EMR-I.
