Abstract
Particulate matter (PM) has been one of the most significant air pollutants in recent decades, with severe negative effects on ambient air quality and public health. Accurate PM forecasting makes it possible to establish an early warning system. In this paper, a deep feature learning architecture, i.e., an autoencoder-based deep belief regression network (AE-based DBRN), is introduced and used to forecast daily PM concentrations (PM2.5 and PM10). Before the model is established, Pearson correlation analysis is applied to identify a suitable input-output mapping, where the candidate input variables comprise seven meteorological parameters and the PM concentrations of the previous day, and the output variables are the local PM forecasts. The model was evaluated on a dataset covering the period from 28/10/2013 to 31/8/2016 in Chongqing municipality, China. Two shallow models, a feed forward neural network and least squares support vector regression, were employed for comparison. The results indicate that the AE-based DBRN model performs remarkably better than the comparison models in terms of mean absolute percentage error (PM2.5 21.092%, PM10 19.474%), root mean square error (PM2.5 8.600 μg/m3, PM10 11.239 μg/m3) and correlation coefficient (PM2.5 0.840, PM10 0.826).
Introduction
Particulate matter (PM), one of the major air pollutants, has been of particular concern because of its negative effects on ambient air quality and public health. Moreover, the literature [1] points out that the degree of harm to human health usually depends on the duration of exposure as well as the category and concentration of the PM [2, 3]. To protect public health and the environment, various organizations and agencies have established a large number of monitoring stations and different emission standards to regulate the permissible concentrations of PM in the atmosphere. However, the monitoring data only represent the current air quality and do not reflect future changes. Thus, an accurate forecasting model for the PM concentration is needed to provide early warning and to guide air pollution control and public health protection.
According to the modeling theory, PM concentration forecasting methods can be divided into two categories: physics-based models and data-driven models [4]. The former [5, 6] aim to build and simulate the relationships between the PM concentration and the atmospheric physical and chemical components in the emission, transport and transformation processes. However, the successful application of these knowledge-driven models requires a strong background in atmospheric science. In addition, there are still many uncertainties in the evolution mechanism and in the simulation models of PM concentrations. Therefore, compared with the physics-based models, data-driven models, which rely on the information in the dataset without considering the physical and chemical processes, have attracted much more attention. Many approaches have been developed and improved to forecast PM concentrations [7–13].
Among these data-driven models, the artificial neural network (ANN), which has good nonlinear mapping ability, self-adaptation, and robustness, has been regarded as an outstanding model and has been applied successfully to PM concentration forecasting with various structures, e.g., the feed forward neural network [14], the radial basis function neural network [15], and the wavelet neural network [16]. These studies indicated that the forecasting performance of the ANN depends closely on the features extracted from the input variables (meteorology and pollutants). In other words, a good feature representation helps ensure forecasting accuracy. However, under the complicated meteorological conditions encountered in practice, PM concentrations are nonlinear, time-varying, and uncertain. This makes data-based feature learning difficult and increases the computational complexity of the network structures [17]. Therefore, it is imperative to develop a new method of feature learning in the ANN for PM concentration forecasting.
Deep learning [18], which uses many hidden layers, has a strong capability for representing complex features of the inputs and thereby achieves better generalization. Through hierarchical representations from the lower layers to the higher layers, the “deeper” features hidden in the input variables of the PM can be extracted sufficiently and transferred into the regression network for PM concentration forecasting. In this regard, Hinton et al. [18] proposed a deep learning architecture (stacked network), i.e., the deep belief network (DBN), which has been applied successfully in feature extraction [19], classification [20], and regression [21, 22]. Recently, deep learning has also attracted attention in air pollutant forecasting [23, 24]. All of these studies have shown that deep learning is a promising technique for learning sophisticated features compared with traditional “shallow” learning approaches (networks with one hidden layer or none).
Owing to this superiority in feature learning and representation, a deep regression network, the AE-based DBRN, which combines an autoencoder-based DBN for deep feature learning with an NN for feature regression, is proposed in this paper for PM concentration (PM2.5 and PM10) forecasting. In addition, the meteorological conditions are taken into account in the deep regression modeling. To assess the forecasting performance, a comparison study is designed that includes two “shallow” learning approaches, i.e., the feed forward neural network (FFNN) and least squares support vector regression (LSSVR).
The rest of this paper is outlined as follows. Methodologies are introduced in Section 2. A case study is presented in Section 3. Section 4 illustrates the results and discussion. Conclusions are given in Section 5.
Methodologies
AE for feature extraction
An autoencoder aims to reconstruct its input vectors

An AE network structure.
For an AE, there exist two processes: one is the feature extraction of the encoder $f_e(\cdot)$ (Equation (2)), and the other is the feature reconstruction of the decoder $f_d(\cdot)$ (Equation (3)).
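A minimal sketch of these two processes is given below. It assumes sigmoid activations for both the encoder and the decoder and a squared reconstruction error; these choices, and the names `encode`, `decode`, `W_e`, `b_e`, `W_d`, and `b_d`, are illustrative assumptions rather than the exact formulation of Equations (2) and (3).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W_e, b_e):
    """Encoder f_e: map inputs (n_samples, n_in) to hidden features (n_samples, n_hidden)."""
    return sigmoid(x @ W_e + b_e)

def decode(h, W_d, b_d):
    """Decoder f_d: map hidden features back to the input space."""
    return sigmoid(h @ W_d + b_d)

def reconstruction_error(x, x_hat):
    """Mean squared reconstruction error per sample (assumed loss)."""
    return float(np.mean(np.sum((x - x_hat) ** 2, axis=1)))

# Toy example: 4 samples with 5 inputs scaled to [0, 1], compressed to 3 features.
rng = np.random.default_rng(0)
x = rng.random((4, 5))
W_e, b_e = rng.normal(scale=0.1, size=(5, 3)), np.zeros(3)
W_d, b_d = rng.normal(scale=0.1, size=(3, 5)), np.zeros(5)

h = encode(x, W_e, b_e)        # extracted features
x_hat = decode(h, W_d, b_d)    # reconstructed inputs
err = reconstruction_error(x, x_hat)
```

Training the AE then amounts to adjusting the encoder and decoder parameters so that `err` becomes small.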
After achieving the reconstruction values
To obtain optimal parameter sets of ξ = (
The DBN shown in Fig. 2, a classical and widely-used deep structure, is a stack of simple (AE in this paper) and unsupervised networks [26].

AE-based DBRN model structures.
As the graphical representation in Fig. 2 shows, the AE-based DBN for feature learning is built from AEs of the kind shown in Fig. 1, which extract and transfer the information learned layer by layer through l hidden layers. Specifically, each AE is trained individually and only its encoder is kept; several AEs are then stacked so that the output of one AE serves as the input of the next layer. This is an unsupervised learning process. In addition, in the top layer (
The AE-based DBRN modeling is summarized in Fig. 2. Its procedure is as follows: (1) Perform generative unsupervised learning layer-wise on the AEs from the lower to the higher layers. (2) Fine-tune the entire DBN using the back propagation algorithm (supervised learning) to adjust the weights from the top to the lower layers. (3) Activate the regression model using the results of the top layer
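The layer-wise procedure can be illustrated with a small NumPy sketch. It is a simplification under stated assumptions, not the implementation used in this paper: each AE has a sigmoid encoder and a linear decoder trained by batch gradient descent, the regression on the top features is a linear least-squares layer, and the supervised fine-tuning of step (2) is omitted for brevity. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train one AE on X by gradient descent; return only the encoder parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(scale=0.1, size=(d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(scale=0.1, size=(n_hidden, d)), np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # encoder
        X_hat = H @ W2 + b2                 # linear decoder
        G = (X_hat - X) / n                 # gradient of 0.5 * mean squared error
        dH = (G @ W2.T) * H * (1.0 - H)     # back-propagate through the sigmoid
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def pretrain_stack(X, hidden_sizes):
    """Step (1): train AEs one by one; each AE's features feed the next AE."""
    encoders, features = [], X
    for i, n_hidden in enumerate(hidden_sizes):
        W, b = train_autoencoder(features, n_hidden, seed=i)
        encoders.append((W, b))
        features = sigmoid(features @ W + b)
    return encoders, features

def add_bias(F):
    return np.hstack([F, np.ones((F.shape[0], 1))])

def fit_regression(top_features, y):
    """Step (3), simplified: least-squares output layer on the top-layer features."""
    coef, *_ = np.linalg.lstsq(add_bias(top_features), y, rcond=None)
    return coef

def predict(X, encoders, coef):
    features = X
    for W, b in encoders:
        features = sigmoid(features @ W + b)
    return add_bias(features) @ coef

# Synthetic data only: 60 samples, 5 inputs in [0, 1].
rng = np.random.default_rng(42)
X = rng.random((60, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.05 * rng.normal(size=60)

encoders, top = pretrain_stack(X, [8, 8, 8])
coef = fit_regression(top, y)
y_hat = predict(X, encoders, coef)
```

Only the encoders are kept after pretraining, which mirrors the description above; a full model would additionally back-propagate the regression error through all layers.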
Overview of the AE-based DBRN model
Given the composed techniques, the AE-based DBRN modeling procedure, which is summarized in Fig. 3, can be implemented as follows:

The framework of AE-based DBRN for forecasting the PM concentrations.
Data set
In this paper, two kinds of PM are investigated, i.e., PM2.5 and PM10 (μg/m3). The daily data (24-h mean values) from 28/10/2013 to 31/08/2016 in Chongqing city were collected from the data center of the Ministry of Environmental Protection of China. Moreover, seven types of daily meteorological data, i.e., maximum temperature (°C), minimum temperature (°C), mean atmospheric pressure (kPa), mean relative humidity (%), precipitation (mm), mean wind speed (km/h), and visibility (km), were collected for the corresponding period (http://www.tianqihoubao.com). The detailed data (1040×9 samples in total) are shown in Fig. 4.

Air particulate matter concentrations and meteorological conditions during the period from 28/10/2013 to 31/8/2016.
According to the “Ambient Air Quality Standards (GB 3095-2012)” published by the Ministry of Environmental Protection and the General Administration of Quality Supervision, Inspection and Quarantine of China, the daily concentration limits of the PM are PM2.5 ≤ 75 μg/m3 and PM10 ≤ 150 μg/m3, which are marked by the horizontal dashed lines in Fig. 4. Compared with this standard, the PM concentrations were often above the permissible limits, and the number of days with exceedances (218 days for PM2.5 and 110 days for PM10; maximum exceedance over 1.81 times for PM2.5 and 0.95 times for PM10) is particularly striking. Considering the meteorological conditions shown in Fig. 4, the PM concentrations that exceed the standard tend to coincide with lower temperature, higher atmospheric pressure, and less precipitation. Qualitatively, there is little apparent relationship between the PM concentrations and the remaining meteorological variables.
As mentioned in Section 2, the proposed AE-based DBRN model has three key structure variables, i.e., the number of the input nodes, the number of the hidden layers, and the number of the hidden nodes.
As analyzed above (Fig. 4), the PM and the meteorological conditions are related in a complex way, which makes modeling the PM more difficult. To capture these relationships, a multiple-input, single-output structure is adopted in this study: the meteorological parameters and the PM concentrations of the previous day serve as the input variables, and the PM forecast for the next day is the output. To improve the fit and reduce the computational burden, Pearson correlation analysis [28] is used to analyze the relationships between the inputs and the outputs, thereby determining the number of input nodes.
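The input-output structure and the correlation measure can be sketched as follows. The helper names `make_lagged_pairs` and `pearson_r` are hypothetical, and the array is random synthetic data standing in for the daily records.

```python
import numpy as np

def make_lagged_pairs(data, target_col):
    """data: (n_days, n_vars) array ordered by date.

    Returns X (all variables on day t-1) and y (the target variable on day t).
    """
    X = data[:-1, :]
    y = data[1:, target_col]
    return X, y

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c @ b_c) / np.sqrt((a_c @ a_c) * (b_c @ b_c)))

# Synthetic stand-in: 10 days, 3 variables, the last column playing the role of the PM series.
rng = np.random.default_rng(0)
data = rng.random((10, 3))
X, y = make_lagged_pairs(data, target_col=2)

# Correlation of each previous-day variable with the next-day target.
r_values = [pearson_r(X[:, j], y) for j in range(X.shape[1])]
```

Each entry of `r_values` plays the role of one cell of the correlation table used later to choose the input nodes.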
For the hidden layers and hidden nodes in the DBN, there is no theoretical method for determining their numbers. Therefore, an experimental (trial-and-error) approach is applied. The experimental design is as follows: four levels of hidden layers, from 1 to 4, are considered, and ten levels of hidden nodes in each hidden layer, from 5 to 50 in steps of 5, are investigated.
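One way to organize such a trial-and-error design is a greedy search that fixes the best node count of each layer before adding the next one. The sketch below assumes that a single scalar error (for example, the testing MAPE) is used to rank structures, which is a simplification; `evaluate` is a hypothetical callback that would train a network and return its error, and `toy_evaluate` is an artificial stand-in with no relation to the real results.

```python
def greedy_structure_search(evaluate, max_layers=4, node_levels=range(5, 55, 5)):
    """Grow the network one hidden layer at a time.

    evaluate(hidden_sizes) -> error (lower is better); hypothetical callback.
    """
    best_structure, best_error = [], float("inf")
    for _ in range(max_layers):
        # Try every node level for the new layer, keeping earlier layers fixed.
        candidates = [(evaluate(best_structure + [n]), n) for n in node_levels]
        error, n = min(candidates)
        if error >= best_error:
            break  # an extra layer no longer helps, so stop growing
        best_structure, best_error = best_structure + [n], error
    return best_structure, best_error

def toy_evaluate(hidden_sizes):
    """Artificial error surface with its minimum at three layers of 45 nodes."""
    return abs(len(hidden_sizes) - 3) + sum(abs(n - 45) for n in hidden_sizes) / 100

structure, error = greedy_structure_search(toy_evaluate)
```

With ten node levels and at most four layers, this search evaluates at most 40 structures instead of all 10^4 combinations of a full grid.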
In addition, the other computation parameters of the proposed model are set as follows: learning rate 1, training batch size 100, and maximum number of training iterations 100. The software is Matlab 2014, running on an Intel Core i5-2450M CPU @ 2.50 GHz with 4.00 GB of memory. In the following study, the original time series shown in Fig. 4 are divided into a training set (800×9 samples, 28/10/2013–04/01/2016) and a testing set (240×9 samples, 05/01/2016–31/08/2016) for building and validating the AE-based DBRN.
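The chronological split and the batch size can be illustrated with a short sketch. The array of zeros is only a placeholder with the stated dimensions, and `chronological_split` is a hypothetical helper.

```python
import numpy as np

def chronological_split(data, n_train):
    """Split a date-ordered array into training and testing parts without shuffling."""
    return data[:n_train], data[n_train:]

data = np.zeros((1040, 9))   # placeholder with the dimensions of the full dataset
train, test = chronological_split(data, n_train=800)

# Mini-batches of 100 training samples, matching the stated batch size.
batch_size = 100
batches = [train[i:i + batch_size] for i in range(0, len(train), batch_size)]
```

Keeping the order of the days intact ensures that the model is validated only on data later than everything it was trained on.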
Performance criteria
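The three criteria used throughout this paper are the mean absolute percentage error (MAPE), the root mean square error (RMSE), and the correlation coefficient (CC). A minimal sketch is given below, assuming their standard definitions; `c_o` and `c_p` denote the recorded and the forecasted values, and the numbers in the example are made up for illustration.

```python
import numpy as np

def mape(c_o, c_p):
    """Mean absolute percentage error, in percent (recorded values must be nonzero)."""
    c_o, c_p = np.asarray(c_o, dtype=float), np.asarray(c_p, dtype=float)
    return float(100.0 * np.mean(np.abs(c_o - c_p) / c_o))

def rmse(c_o, c_p):
    """Root mean square error, in the units of the data."""
    c_o, c_p = np.asarray(c_o, dtype=float), np.asarray(c_p, dtype=float)
    return float(np.sqrt(np.mean((c_o - c_p) ** 2)))

def cc(c_o, c_p):
    """Pearson correlation coefficient between recorded and forecasted values."""
    return float(np.corrcoef(c_o, c_p)[0, 1])

# Made-up values for illustration only.
c_o = [50.0, 100.0, 80.0]
c_p = [55.0, 90.0, 80.0]
mape_value, rmse_value, cc_value = mape(c_o, c_p), rmse(c_o, c_p), cc(c_o, c_p)
```

Lower MAPE and RMSE and a CC closer to 1 indicate a better forecast.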
Results and discussion
Pearson correlation analysis for input nodes specification
The relationships between the current PM and the meteorological factors of the previous day are quantified by the Pearson correlation coefficient, as shown in Table 2. The label * in Table 2 denotes significance at the 0.01 level. Table 2 shows that Tmax and Tmin have the highest Pearson correlation coefficient (0.938), and AP and Tmin have the second-highest mutual correlation (–0.846). Moreover, Tmin is more strongly correlated with the PMC than either Tmax or AP is. Therefore, to reduce the modeling burden caused by highly inter-correlated variables, Tmax and AP can be removed. In addition, only variables with a correlation coefficient greater than 0.1 are considered as input nodes. Thus four meteorological variables (Tmin, P, WS and V) and the PMC of the previous day are chosen as the input nodes for the PM2.5 modeling, and five meteorological variables (Tmin, RH, P, WS and V) and the PMC of the previous day are selected as the inputs for the PM10 modeling.
Description of the evaluation criteria
Note: N denotes the length of the investigation period, $C_o$ and $C_p$ represent the recorded and the forecasted values respectively, and
Results of the Pearson correlation analysis
Note: Tmax denotes maximum temperature (°C), Tmin denotes minimum temperature (°C), AP denotes mean atmospheric pressure (kPa), RH denotes mean relative humidity (%), P denotes precipitation (mm), WS denotes mean wind speed (km/h), V denotes visibility (km), and PMC denotes the PM2.5 and PM10 concentrations (μg/m3) respectively. The Pearson correlation coefficients between the meteorological variables and the PMC are separated by a slash (/).
According to the analysis of Table 2, five inputs are used for the PM2.5 forecasting and six inputs for the PM10 forecasting. The remaining parameters, i.e., the numbers of hidden layers and hidden nodes, are then determined by experiment. PM2.5 is taken as a representative case to illustrate the experimental process described in Section 3. Note that all the experimental results are the mean values obtained at the maximum number of iterations.
First, an AE-based DBRN is initialized with five input nodes, one hidden layer tested at the ten levels of hidden nodes, and one output node. Table 3 shows the results of the AE-based DBRN with one hidden layer on the testing dataset. The results reveal that the best performance appears at 45 hidden nodes (MAPE = 21.771%, RMSE = 9.023 μg/m3, and CC = 0.825).
Results of an AE-based DBRN by one hidden layer with different hidden nodes for PM2.5 forecasting
After the number of hidden nodes in the first hidden layer is fixed (45 hidden nodes), the second AE-based DBRN is initialized with five input nodes, two hidden layers with the ten levels of hidden nodes tested in the second hidden layer, and one output node. The forecasting performance on the testing dataset is shown in Table 4. The results indicate that the best set of criteria, i.e., MAPE (21.329%), RMSE (8.460 μg/m3) and CC (0.840), also appears at 45 hidden nodes. In addition, the performance with two hidden layers is better than that with a single hidden layer, with improvements of 0.448% in MAPE, 0.563 μg/m3 in RMSE, and 0.015 in CC. Therefore, stacked networks can be regarded as an effective framework for enhancing the feature representation.
Results of an AE-based DBRN by two hidden layers with different hidden nodes for PM2.5 forecasting
The selection experiment for the hidden layers is then continued in the same way. As shown in Table 5, the structure with 45 hidden nodes in the third hidden layer has the best regression performance. However, the gains are limited: the MAPE improves by only 0.237%, while the RMSE worsens slightly, by 0.015 μg/m3.
Results of an AE-based DBRN by three hidden layers with different hidden nodes for PM2.5 forecasting
The experiment is repeated for the AE-based DBRN with four hidden layers. The forecasting performance on the testing dataset is listed in Table 6. It shows that the evaluation values with four hidden layers are, for the first time, worse than those with three hidden layers. For this reason, a three-hidden-layer AE-based DBRN is selected as the optimal architecture for the PM2.5 concentration forecasting, i.e., 5-45-45-45-1.
Results of an AE-based DBRN by four hidden layers with different hidden nodes for PM2.5 forecasting
Following the experimental procedure described above, AE-based DBRN structures for the PM10 forecasting are trained with six input nodes, and the performance on the testing data is summarized in Table 7. The results show that the AE-based DBRN with 45 hidden nodes in the first hidden layer has the best single-layer performance, with MAPE = 21.056%, RMSE = 11.837 μg/m3, and CC = 0.800. The model with 30 hidden nodes in the second hidden layer achieves the best performance according to the three criteria, and the model with 25 hidden nodes in the third hidden layer obtains the best outcomes. Therefore, a three-hidden-layer AE-based DBRN is chosen as the optimal architecture for the PM10 concentration forecasting, i.e., 6-45-30-25-1. This experiment also indicates that multiple hidden layers learn better than a single hidden layer. Nevertheless, the performance does not keep improving as hidden layers are added. In other words, a deep architecture with too many hidden layers may lead to over-fitting.
Results of an AE-based DBRN with different hidden structures for PM10 forecasting
Figure 5 plots the forecasting results obtained with the optimal network structures. Figure 5(a) shows the PM2.5 concentration forecasts with the 5-45-45-45-1 structure, and Fig. 5(b) displays the PM10 concentration forecasts with the 6-45-30-25-1 structure. To show the agreement between the recorded and the forecasted values, scatter plots are also given in Fig. 5.

Forecasting results and scatter plots with AE-based DBRN. (a) PM2.5, (b) PM10.
From Fig. 5, it can be seen that the forecasts track the fluctuations successfully, except for the peak values during late spring and early summer. In this period of seasonal transition, atmospheric conditions change frequently and the forecasting performance is unsatisfactory. In addition, the scatter plots show only a small angle between the ideal fit (solid line) and the actual fit (dashed line), which illustrates a certain degree of agreement between the forecasted and the recorded values. Using the criteria in Table 1, three statistics are computed: MAPE = 21.092% (PM2.5), 19.474% (PM10), RMSE = 8.475 μg/m3 (PM2.5), 11.239 μg/m3 (PM10), and CC = 0.840 (PM2.5), 0.826 (PM10), respectively. The qualitative and quantitative results show that the proposed AE-based DBRN model can be applied successfully to PM forecasting.
To evaluate the superiority of the proposed approach, two “shallow learning” approaches, i.e., the FFNN and the LSSVR, are employed to forecast the daily PM concentrations using the same data. The input-output structure of the comparison models is the same as that of the AE-based DBRN.
For the FFNN, the optimal network structure is found to be 5-10-1 (5 input nodes, 10 hidden nodes, and 1 output node) for the PM2.5 forecasting, and 6-10-1 (6 input nodes, 10 hidden nodes, and 1 output node) for the PM10 forecasting. For the LSSVR [29], the parameters are selected by ten-fold cross validation. The forecasting results and the scatter plots for the testing data obtained with the trained FFNN and LSSVR models are shown in Fig. 6.

Forecasting results and scatter plots using different models. (a) FFNN for PM2.5, (b) FFNN for PM10, (c) LSSVR for PM2.5, and (d) LSSVR for PM10.
As shown in Fig. 6, both the FFNN and LSSVR models give acceptable forecasting results. However, the same phenomenon, i.e., significant deviations during late spring and early summer, also appears. Compared with the AE-based DBRN model, these deviations are greater, and the dispersion between the forecasted and the recorded values is larger. The qualitative results suggest that the methods based on shallow learning cannot extract features sufficiently, leading to poorer forecasting results.
A quantitative comparison is also listed in Table 8. The three criteria likewise indicate that the proposed model has the best forecasting performance among the compared approaches.
Comparison of the forecasting performances using different models
In addition, the individual air quality index (IAQI), a number that defines the pollution level, is used to evaluate how accurately the models forecast the pollution rank. The larger the IAQI, the more severe the PM pollution [30]. Based on the air quality standards of China, the IAQI of the PM is classified into six levels, i.e., I 0–50 (excellent), II 51–100 (good), III 101–150 (slight pollution), IV 151–200 (moderate pollution), V 201–300 (heavy pollution), and VI more than 300 (hazardous). The IAQI value of each day is calculated using Equation (5).
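The IAQI calculation can be sketched as a piecewise linear interpolation between concentration breakpoints. The breakpoints below are the 24-h values of the Chinese technical regulation HJ 633-2012 and are an assumption of this sketch, as are the function names and the use of unrounded index values; the six levels follow the classification given above.

```python
import numpy as np

# IAQI values at the breakpoints, and the assumed 24-h concentration breakpoints (μg/m3).
IAQI_POINTS = [0, 50, 100, 150, 200, 300, 400, 500]
BREAKPOINTS = {
    "PM2.5": [0, 35, 75, 115, 150, 250, 350, 500],
    "PM10": [0, 50, 150, 250, 350, 420, 500, 600],
}

def iaqi(concentration, pollutant):
    """Linear interpolation between breakpoints; values above the last breakpoint are capped at 500."""
    return float(np.interp(concentration, BREAKPOINTS[pollutant], IAQI_POINTS))

def iaqi_level(value):
    """Map an IAQI value to levels I..VI, returned as the integers 1..6."""
    upper_bounds = [50, 100, 150, 200, 300]
    return 1 + sum(value > ub for ub in upper_bounds)

pm25_index = iaqi(55.0, "PM2.5")    # between the breakpoints 35 and 75
pm10_index = iaqi(200.0, "PM10")    # between the breakpoints 150 and 250
levels = (iaqi_level(pm25_index), iaqi_level(pm10_index))
```

Under these breakpoints, the daily limits of 75 μg/m3 (PM2.5) and 150 μg/m3 (PM10) both map to an IAQI of 100, the upper bound of level II.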
After the IAQI values of the testing period are obtained for both the records and the forecasts, the days are matched by level to estimate the rank forecasting accuracy. Table 9 gives the confusion matrices of the IAQI levels forecasted with the AE-based DBRN (Table 9a), FFNN (Table 9b), and LSSVR (Table 9c) models, respectively. In each confusion matrix, each row shows the number of days at a given recorded rank, and each column shows the number of days at a given forecasted rank. The results shown in Table 9 for the PM2.5 and the PM10 are separated by a slash (/).
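A confusion matrix of this kind, and the overall accuracy derived from it, can be computed as in the following sketch. The level sequences are made-up examples, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(recorded, forecasted, n_levels=6):
    """Rows: recorded level; columns: forecasted level (levels numbered 1..n_levels)."""
    m = np.zeros((n_levels, n_levels), dtype=int)
    for r, f in zip(recorded, forecasted):
        m[r - 1, f - 1] += 1
    return m

def overall_accuracy(m):
    """Share of days whose forecasted level equals the recorded level (the diagonal)."""
    return float(np.trace(m) / m.sum())

# Made-up levels for five days, for illustration only.
recorded = [1, 2, 2, 3, 2]
forecasted = [1, 2, 3, 3, 2]
m = confusion_matrix(recorded, forecasted)
accuracy = overall_accuracy(m)
```

Off-diagonal entries show how far a wrong forecast lies from the recorded level, which the overall accuracy alone does not reveal.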
Confusion matrix of the IAQI evaluation using the AE-based DBRN model
Confusion matrix of the IAQI evaluation using the FFNN model
Confusion matrix of the IAQI evaluation using the LSSVR model
As shown in Table 9, the overall accuracy of the IAQI level forecasts ranks as follows: AE-based DBRN (90.83% for PM2.5, 90.42% for PM10) > FFNN (88.75% for PM2.5, 86.67% for PM10) > LSSVR (84.58% for PM2.5, 86.25% for PM10). The results also indicate that the AE-based DBRN model has the best forecasting ability, owing to its deep feature representation.
To sum up, the forecasting performance on the testing data is satisfactory when the deep learning framework is applied, because deep learning, with its good feature representation capability, can capture the characteristics of the time series through “layer-wise” learning across multiple hidden layers.
Conclusions
In this study, an autoencoder-based deep belief regression network (AE-based DBRN) is built for PM forecasting. The proposed model is implemented in three steps: (a) Perform generative unsupervised learning layer-wise on the AEs from the lower to the higher layers. (b) Fine-tune the entire DBN using a supervised learning method to adjust the weights from the top to the lower layers. (c) Activate the regression model using the results of the top layer of the AE-based DBN. The proposed approach is trained and tested using real PM2.5 and PM10 data from Chongqing, China, together with the meteorological data of the corresponding period. To investigate the forecasting performance, the AE-based DBRN is compared with two classical “shallow learning” models. As shown in the experiments and the comparison, the proposed model exhibits a good feature capture capacity and outperforms the competitors according to the three statistical criteria and the forecasting accuracy of the IAQI level. In addition, the AE-based DBRN model shows comparable forecasting ability for PM2.5 and PM10, which suggests that it generalizes well in air pollution forecasting.
Acknowledgments
This work is supported in part by the National Natural Science Foundation of China (51775112), National Key Research & Development Program of China (2016YFE0132200), the Postdoctoral Science Foundation of China (2016M602459), and the Research Program of Higher Education of Guangdong (2016KQNCX165). We also thank the anonymous referees for their constructive comments that have helped improve this paper.
