Abstract
Understanding and forecasting the air quality index (AQI) plays a vital role in guiding the reduction of air pollution and supporting sustainable social development. By combining fuzzy logic with decomposition techniques, ANFIS has become an important means of analyzing data resources, uncertainty and fuzziness. However, few studies have paid attention to the noise in decomposed subseries. Therefore, this paper presents a novel decomposition-denoising ANFIS model named SSADD-DE-ANFIS (Singular Spectrum Analysis Decomposition and Denoising-Differential Evolution-Adaptive Neuro-Fuzzy Inference System). The method applies SSA twice, first to decompose the AQI series and then to denoise the resulting subseries. The denoised subseries are fed into the constructed ANFIS for training and prediction, and the parameters of ANFIS are optimized using DE. To investigate the prediction performance of the proposed model, twelve models are included in the comparisons. The experimental results for the four seasons show that the RMSE of the proposed SSADD-DE-ANFIS model is 1.400628, 0.63844, 0.901987 and 0.634114, respectively, which is 19.38%, 21.27%, 20.43% and 21.27% lower than that of the model using SSA decomposition only, and 87.36%, 88.12%, 88.97% and 88.71% lower than that of the model using SSA denoising only. The Diebold-Mariano test is performed on all the prediction results, and the test results show that the proposed model has the best prediction performance.
Keywords
Introduction
In recent years, with the development of China’s industrialization, most cities have been subjected to increasingly serious air pollution [1]. Previous studies have shown that air pollution can cause many diseases [2]. For example, PM2.5 pollution has a direct impact on the respiratory and cardiovascular systems and is closely related to the incidence and mortality of lung cancer [3]; a high level of PM10 increases the risk of high blood pressure [4]; and patients with hypertension have an increased risk of hospital admission when exposed to air pollution [5]. In addition, air pollution may lead to abnormal rainfall and aggravate the greenhouse effect [6].
The air quality index (AQI) is calculated with reference to the new ambient air quality standards (GB3095-2012), which cover six pollutants. It describes the degree of air cleanliness or pollution. At the current stage of China’s economic development, air pollution is inevitable, but that does not mean that it cannot be controlled. Since severe air pollution directly affects environmental quality and human health, accurate prediction of AQI is one of the main goals of air quality research. For example, a high predicted AQI indicates that air pollution is serious. The government can then immediately notify local factories to limit production, ask people to reduce vehicle use, and remind the public to reduce outdoor activities to protect their physical and mental health. Therefore, effective air quality forecasting is indispensable for reducing the adverse effects of air pollution [7, 8].
To achieve accurate forecasting of air pollution, many prediction models have been proposed by scholars. These models include statistical models, machine learning models, and hybrid models. Statistical models commonly include autoregressive integrated moving average (ARIMA) and multiple linear regression (MLR). [9] applied an ARIMA model to predict PM2.5 concentrations, and the analysis showed that PM2.5 concentrations were significantly and positively correlated with PM10, SO2, and NO2 concentrations. [10] proposed a real-time air quality prediction model based on MLR. The results showed that the model has high accuracy and computational efficiency, and can meet the requirements of air quality forecasting. However, traditional forecasting methods increasingly fail to meet the needs of current forecasting targets [11], because they cannot capture nonlinear features well enough to make long-term predictions [12].
Machine learning models can overcome the shortcomings of statistical models in predicting nonlinear data and have therefore been widely used in air pollution forecasting. Classical nonlinear prediction models include artificial neural network (ANN) [13, 14], support vector regression (SVR) [15, 16], random forest regression [17, 18], long short-term memory (LSTM) [19, 20] and gated recurrent unit (GRU) [21]. Besides, the Adaptive Network Based Fuzzy Inference System (ANFIS) is a method combining ANN and a fuzzy inference system [22], which is both capable of learning and interpretable [23]. Using the neural network learning mechanism, rules are automatically extracted from input and output sample data to form an adaptive neuro-fuzzy controller. Fuzzy inference and self-regulation of the control rules then drive the system in an adaptive, self-organizing and self-learning direction [24]. Based on the above advantages, ANFIS has been applied to predict the concentration of air pollutants. For example, [25] used collinearity tests and forward selection techniques to reduce the number of input variables of ANFIS, and predicted current and next-day SO2, NO2, CO, O3 and PM10 in Howrah city. Their results showed that fewer input features not only reduce the computational cost and time, but also improve the prediction accuracy of the ANFIS model. [26] established ANFIS models for the time series data of CO, SO2, O3 and NO2 in Tehran. They concluded that ANFIS is more accurate in predicting time series data than regression models. It is well known that different places differ in their temporal and spatial conditions [27, 28]. Hence, it is difficult for a single prediction model to fully adapt to all conditions of time series prediction [29]. To solve this problem, scholars have proposed many methods, including optimization algorithms and decomposition algorithms.
Swarm intelligence optimization algorithms can continuously approach the optimal parameters through the cooperation and competition among individuals in the group, so as to achieve the optimal performance of the model [30]. [31] combined particle swarm optimization (PSO), genetic algorithm (GA) and differential evolution (DE) with ANFIS respectively, and established an ANFIS-PSO model, an ANFIS-GA model, an ANFIS-DE model and a standalone ANFIS model to predict monthly solar radiation using the meteorological data of the Malaysian Meteorological Department. The results showed that the prediction accuracy of the hybrid ANFIS models is higher than that of the standalone ANFIS model, and that the ANFIS-PSO model achieved the best results among the models in both the training and testing phases. [32] used a slime mould algorithm (SMA) improved by the particle swarm optimizer (PSO) to optimize the parameters of the ANFIS model, and then predicted CO2, PM2.5, SO2 and NO2 in Wuhan, China. They compared the proposed PSOSMA with other metaheuristic (MH) algorithms. The results showed that the ANFIS improved by the PSOSMA algorithm has better performance.
Data preprocessing is an effective method to sort out the complex relationships in the original data and make it more stationary [33], and it is becoming an increasingly critical technique for improving the prediction performance of hybrid models [34]. For example, [35] used two hybrid models based on empirical mode decomposition (EMD), EMD-SVR-Hybrid and EMD-IMFs-Hybrid, to predict the AQI of Xingtai, China. The AQI forecasting results showed that the two proposed hybrid models are superior to ARIMA, SVR, GRNN, EMD-GRNN, Wavelet-GRNN and Wavelet-SVR. [36] established an EMD-GRU model to predict the PM2.5 series of Beijing airport. The case prediction results showed that, compared with the single GRU model, the EMD-GRU model has higher prediction accuracy. [37] used the wavelet packet decomposition (WPD) algorithm to decompose raw PM2.5 data. The decomposition method effectively increased the stability of the prediction model. [38] applied wavelet decomposition (WD), sample entropy (SE), and variational modal decomposition (VMD) to preprocess AQI series. Through secondary decomposition, the nonlinearity of the original series was significantly reduced and the analytical ability of LSTM was improved.
Based on the above literature analysis, it can be observed that: (a) Optimization algorithms can determine the best parameters of the model and improve its prediction performance. (b) Decomposition methods can reduce the volatility of the original data and enhance the prediction accuracy of the predictor. Although many decomposition methods have been used for air quality prediction, most of them directly combine the decomposed subseries with the predictor or perform secondary decomposition on the decomposed high-frequency subseries, and few studies pay attention to the noise in the subseries. Noise in the subseries affects the convergence speed and prediction accuracy of the model. Therefore, noise removal from subseries is essential for building an effective AQI prediction model.
To solve the above problem, a decomposition-denoising ANFIS model based on SSA and the DE algorithm is proposed for seasonal AQI forecasting. The novelty of the research is as follows:
(1) To enhance the accuracy and robustness of the AQI prediction model, a novel decomposition-denoising algorithm based on applying SSA twice is presented. Compared with traditional single decomposition algorithms, the proposed algorithm can further process the unsteady components in the original data. The decomposition-denoising process is as follows: (a) The SSA algorithm is used to decompose the raw AQI data. (b) The obtained subseries are decomposed by SSA again, with the number of components chosen so that the contribution rate (CR) of the last secondary subseries is less than 1%. (c) The secondary subseries with the lowest CR is treated as the noise series, and the remaining secondary subseries are reconstructed into the trend series.
(2) ANFIS is the main prediction method for seasonal AQI forecasting. ANFIS integrates the learning ability of neural networks and the cognitive ability of fuzzy logic to solve many nonlinear and complex real-world problems accurately. Compared with other neural network models, ANFIS has no complex hyperparameters or network structure to adjust. It can adjust its parameters adaptively according to the characteristics of the input and output data, so as to achieve better prediction results.
(3) The DE algorithm, a swarm intelligence optimization algorithm with the advantages of simple structure, fast convergence, ease of use and strong robustness, is used to optimize the parameters of ANFIS. It can continuously approach the optimal parameters through the cooperation and competition among individuals in the group, so as to improve the prediction performance of ANFIS.
(4) The SSADD-DE-ANFIS model proposed in the paper is a novel hybrid model. It combines decomposition and denoising, the DE algorithm, and ANFIS. Compared with single models, it has better nonlinear predictive modeling ability, so it is of great significance to study the application of this method in AQI forecasting.
The rest of the paper is arranged as follows. Section 2 introduces the methods and the proposed hybrid model applied in this study. Section 3 describes the processing of the AQI data and the parameter settings of this paper. Section 4 presents case studies for the four seasons, in which the forecasting results of the proposed model and the other models involved are evaluated. Section 5 summarizes the research work of this paper and discusses directions for future research.
Methods
This section presents the theoretical design of the proposed hybrid SSADD-DE-ANFIS model for seasonal AQI forecasting. Sections 2.1–2.3 describe the relevant technologies and methods used in the hybrid model, and Section 2.4 provides an overview of the proposed approach.
Singular spectrum analysis decomposition and denoising
In recent years, singular spectrum analysis (SSA) of time series has received growing attention as a non-parametric time series modeling technique [39]. SSA can effectively and efficiently realize the decomposition and reconstruction of a signal [40]. [41] used SSA to decompose the stock price time series, and then established the PSO-SVR model. The experimental results showed that the presented SSA-PSO-SVR is superior to the conventional WT-FFNN, ARMA, polynomial regression, and naïve models. [42] used SSA to decompose wind power series into trend, harmonic and noise series, and established a Laguerre neural network model to predict future wind power values. The results showed that the proposed hybrid model has the best prediction performance among the comparison models. [43] built wind speed prediction models by combining EMD-based techniques with SSA, in which SSA was applied to extract the trend components of the highest-frequency sub-layer; according to their experimental results, the models obtained highly accurate forecasts. Due to its excellent performance, SSA has been applied in many prediction domains, but there are still few applications to AQI forecasting at present.
Singular spectrum analysis decomposition
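In order to reduce the volatility and complexity of the original AQI series and make it more stable, SSA is used to decompose the original AQI series into several subseries. Assuming that the one-dimensional AQI time series is $X = (x_1, x_2, \ldots, x_N)$, the steps of SSA decomposition are as follows:
Step 1. Embedding. Convert $X$ into the vectors $Z = (Z_1, Z_2, \ldots, Z_K)$, where $Z_i = (x_i, x_{i+1}, \ldots, x_{i+L-1})^T \in R^L$, $K = N - L + 1$ and $L \in [2, N]$. The number $L$ is the window length. The matrix $Z$ is the trajectory matrix, which can be denoted as
$$Z = (Z_1, Z_2, \ldots, Z_K) = \begin{pmatrix} x_1 & x_2 & \cdots & x_K \\ x_2 & x_3 & \cdots & x_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix} \quad (1)$$
Step 2. Singular Value Decomposition. This step decomposes the trajectory matrix $Z$. Calculate $ZZ^T$ and obtain its $L$ eigenvalues $\lambda_1 \geqslant \lambda_2 \geqslant \cdots \geqslant \lambda_L \geqslant 0$; $U_1, U_2, \ldots, U_L$ are the corresponding orthonormal eigenvectors. Here, $d$ is the largest index of a positive eigenvalue and is also the rank of the matrix $Z$, which can be described as
$$d = \operatorname{rank}(Z) = \max\{i : \lambda_i > 0\}$$
For real series, usually $d = L^* = \min\{L, K\}$. The matrix $Z$ can be written as
$$Z = Z_1 + Z_2 + \cdots + Z_d$$
where $Z_i$ denotes the $i$-th elementary matrix, and $U_i$ and $V_i$ denote the left and right eigenvectors, respectively:
$$Z_i = \sqrt{\lambda_i}\, U_i V_i^T, \qquad V_i = Z^T U_i / \sqrt{\lambda_i}$$
Step 3. Restructure. This step uses the diagonal averaging method to convert each $Z_i$ ($i = 1, 2, \ldots, d$) into a series of length $N$. Suppose that $Y$ is a $d \times K$ matrix with elements $y_{ij}$, $1 \leqslant i \leqslant d$, $1 \leqslant j \leqslant K$, $d^* = \min(d, K)$, and $K^* = \max(d, K)$. If $d < K$, then $y^*_{ij} = y_{ij}$; otherwise $y^*_{ij} = y_{ji}$. The elements of the reconstructed series are
$$y_k = \begin{cases} \dfrac{1}{k} \sum_{m=1}^{k} y^*_{m,\,k-m+1}, & 1 \leqslant k < d^* \\ \dfrac{1}{d^*} \sum_{m=1}^{d^*} y^*_{m,\,k-m+1}, & d^* \leqslant k \leqslant K^* \\ \dfrac{1}{N-k+1} \sum_{m=k-K^*+1}^{N-K^*+1} y^*_{m,\,k-m+1}, & K^* < k \leqslant N \end{cases} \quad (7)$$
Then, $Z_1, Z_2, \ldots, Z_d$ are transformed into the corresponding subseries $X_1, X_2, \ldots, X_d$, satisfying
$$X = \sum_{i=1}^{d} X_i \quad (8)$$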
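The three steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `ssa_decompose`, the synthetic test series, and the choice to report the contribution rate as each eigenvalue's share of the eigenvalue sum are assumptions made here for illustration.

```python
import numpy as np

def ssa_decompose(x, L):
    """Basic SSA: split a 1-D series into min(L, K) additive subseries."""
    x = np.asarray(x, dtype=float)
    N = x.size
    K = N - L + 1
    # Step 1 (embedding): L x K trajectory matrix, column i is (x_i, ..., x_{i+L-1}).
    Z = np.column_stack([x[i:i + L] for i in range(K)])
    # Step 2 (SVD): s**2 are the eigenvalues of Z Z^T.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    subseries = []
    for i in range(len(s)):
        Zi = s[i] * np.outer(U[:, i], Vt[i])  # i-th elementary matrix
        # Step 3 (diagonal averaging): mean over each anti-diagonal of Zi.
        xi = [np.mean(Zi[::-1].diagonal(k)) for k in range(-L + 1, K)]
        subseries.append(xi)
    # Assumed definition of the contribution rate: eigenvalue share.
    contribution = s**2 / np.sum(s**2)
    return np.array(subseries), contribution

# Synthetic hourly-like series (hypothetical data, not the Beijing AQI set).
t = np.arange(200)
series = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(0).normal(0, 1, 200)
subs, cr = ssa_decompose(series, L=11)
```

Because diagonal averaging is linear and the elementary matrices sum to the trajectory matrix, the rows of `subs` add up to the input series, which mirrors Eq. (8).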
In this paper, SSA is used again to remove the noise from the decomposed subseries to make them more suitable for prediction. Each subseries is decomposed by SSA again, and the resulting secondary subseries are reconstructed. The first m secondary subseries with high CR are reconstructed into the trend series, and the remaining secondary subseries with low CR are reconstructed into the noise series, which is directly removed. The steps are described as follows:
Step 1. Embedding. Construct the trajectory matrix $S_i$ according to Eq. (1) for each $X_i$ ($i = 1, 2, \ldots, d$) obtained in Eq. (8).
Step 2. Singular Value Decomposition. Each $S_i$ is processed according to Step 2 of the SSA decomposition. Then,
$$S_i = \sum_{j} S_{i,j} \quad (11)$$
where $S_{i,j}$ is the $j$-th elementary matrix of $S_i$.
Step 3. Grouping. Divide the elementary matrices in Eq. (11) into two groups and add the matrices contained in each group; one group gives the trend series and the other gives the noise series. Then,
$$T_{i,1} = \sum_{j \leqslant m} S_{i,j}, \qquad T_{i,2} = \sum_{j > m} S_{i,j}$$
Step 4. Restructure. $T_{i,1}$ and $T_{i,2}$ are reconstructed according to Eq. (7).
After the above four steps, the trend series and noise series of each subseries are obtained.
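A minimal sketch of this second SSA pass is given below. Two things are assumptions rather than details from the text: the contribution rate is taken as each eigenvalue's share of the eigenvalue sum, and the number of secondary subseries is found by growing the window length from 2 until the last component's CR falls below the threshold. The helper names `ssa_components` and `ssa_denoise` are hypothetical.

```python
import numpy as np

def ssa_components(x, L):
    """Return the SSA subseries of x and their (assumed) contribution rates."""
    x = np.asarray(x, dtype=float)
    K = x.size - L + 1
    Z = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    comps = []
    for i in range(len(s)):
        Zi = s[i] * np.outer(U[:, i], Vt[i])
        comps.append([np.mean(Zi[::-1].diagonal(k)) for k in range(-L + 1, K)])
    return np.array(comps), s**2 / np.sum(s**2)

def ssa_denoise(x, max_L=30, cr_threshold=0.01):
    """Grow the window until the last component's CR < threshold, then drop it."""
    for L in range(2, max_L + 1):
        comps, cr = ssa_components(x, L)
        if cr[-1] < cr_threshold:
            break  # falls through with L == max_L if the threshold is never met
    trend = comps[:-1].sum(axis=0)  # first m = L - 1 components
    noise = comps[-1]               # lowest-CR component, treated as noise
    return trend, noise, L

# Hypothetical subseries standing in for one X_i.
t = np.arange(200)
series = 50 + 10 * np.sin(2 * np.pi * t / 24) + np.random.default_rng(0).normal(0, 1, 200)
trend, noise, L_used = ssa_denoise(series)
```

The trend and noise series add back to the input subseries, so discarding `noise` is the only lossy step in this sketch.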

Fig. 2. Typical ANFIS system structure.
ANFIS is considered to be the combination of ANN and fuzzy inference, which not only gives full play to their advantages but also makes up for their shortcomings. Using the neural network learning mechanism, it automatically extracts rules from input and output sample data, and thus constitutes a self-adaptive neuro-fuzzy controller [44]. In addition, the ANFIS model applies the first-order Takagi-Sugeno-Kang (TSK) model [45], which is simple to compute, is conducive to mathematical analysis, and is easy to combine with optimization and adaptive methods, yielding a fuzzy modeling tool with optimization and adaptive ability. Compared with traditional machine learning algorithms, ANFIS can approximate nonlinear functions with arbitrary accuracy. ANFIS is divided into five layers: fuzzification layer, rule fitness layer, normalized fitness layer, defuzzification layer and output layer.
It is assumed that the fuzzy inference system has two inputs x and y, which are two variables used to predict AQI, and one output z, which is the AQI. The rule base with two fuzzy “IF-THEN” rules is as follows:
Rule 1: if x is A1 and y is B1 then z = p1x + q1y + r1.
Rule 2: if x is A2 and y is B2 then z = p2x + q2y + r2.
The typical ANFIS system structure is illustrated in Fig. 2.

Fig. 1. The whole process of the SSADD-DE-ANFIS model.
The network is divided into five layers. The first three layers form the rule antecedents, and the last two layers form the rule consequents. Square nodes represent adaptive nodes whose parameters can be adjusted. Circle nodes represent fixed nodes with no parameters or with parameters that cannot be adjusted. Adjustable parameters are mainly concentrated in layer 1 and layer 4. The functions of each layer are described below.
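The first layer: Fuzzification. The input variables x and y are converted into the membership degrees of each fuzzy set. The nodes in this layer are adaptive nodes defined by node functions (membership functions):
$$\mu_{A_i}(x), \qquad \mu_{B_i}(y), \qquad i = 1, 2$$
where $\mu_{A_i}$ and $\mu_{B_i}$ are the membership functions of the fuzzy sets $A_i$ and $B_i$, whose parameters are the adjustable antecedent parameters.
The second layer: Rule applicability. The output of each node is the product of its input signals, that is, the membership degrees of the fuzzy sets corresponding to each variable are multiplied to obtain the trigger strength of each rule:
$$w_i = \mu_{A_i}(x)\, \mu_{B_i}(y), \qquad i = 1, 2$$
The third layer: Normalization. Each node computes the ratio of the trigger strength of its rule to the sum of the trigger strengths of all rules:
$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \qquad i = 1, 2$$
The fourth layer: Defuzzification. The nodes in this layer are adaptive nodes, and each fuzzy rule corresponds to an output, which is obtained as a linear combination of the input features:
$$\bar{w}_i z_i = \bar{w}_i (p_i x + q_i y + r_i), \qquad i = 1, 2$$
where $\{p_i, q_i, r_i\}$ is the parameter set of the consequent of fuzzy rule $i$, whose elements are adjustable parameters.
The fifth layer: Output layer. The outputs of all fuzzy rules are summed to obtain the total output:
$$z = \sum_{i} \bar{w}_i z_i = \frac{\sum_i w_i z_i}{\sum_i w_i}$$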
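The layer-by-layer computation can be illustrated with a small forward pass for the two-rule, two-input first-order TSK system described above. This is a sketch under stated assumptions: the Gaussian membership form exp(-(v - c)^2 / (2 s^2)), the function names, and the parameter values in the example are chosen here for illustration and are not taken from the paper.

```python
import math

def gaussian(v, center, width):
    """Gaussian membership degree (assumed form of the membership function)."""
    return math.exp(-((v - center) ** 2) / (2.0 * width ** 2))

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order TSK ANFIS.

    premise:    one ((center_A, width_A), (center_B, width_B)) pair per rule
    consequent: one (p, q, r) tuple per rule
    """
    # Layers 1-2: membership degrees and their product (trigger strength).
    w = [gaussian(x, cA, sA) * gaussian(y, cB, sB)
         for (cA, sA), (cB, sB) in premise]
    # Layer 3: normalised trigger strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layers 4-5: weighted linear consequents, then their sum.
    return sum(wb * (p * x + q * y + r)
               for wb, (p, q, r) in zip(w_bar, consequent))

# Hypothetical parameters: "low" centred at 0, "high" centred at 10.
premise = [((0.0, 3.0), (0.0, 3.0)), ((10.0, 3.0), (10.0, 3.0))]
consequent = [(1.0, 0.0, 0.0), (0.0, 1.0, 10.0)]
output = anfis_forward(5.0, 5.0, premise, consequent)
```

At the midpoint x = y = 5 both rules fire equally, so the output is the plain average of the two rule outputs (5 and 15). In the DE-ANFIS setting, the entries of `premise` and `consequent` would be the quantities packed into each DE parameter vector.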
The Differential Evolution (DE) algorithm is a kind of evolutionary algorithm, which was first proposed by Storn and Price in 1995. The evolution process of the DE algorithm is very similar to that of the genetic algorithm (GA), including mutation, crossover and selection operations, but the specific definitions of these operations differ from those of GA. Compared with simple GA, the DE algorithm obtains a more accurate optimum in fewer iterations [46]. Compared with other evolutionary algorithms, the DE algorithm has the advantages of simple structure, fast convergence, ease of use and strong robustness [47]. Therefore, the DE algorithm has been widely used. The specific procedure is as follows:
Step 1. Initialization. Assuming that the number of ANFIS parameters to be optimized is $D$, initialize $NP$ real parameter vectors of dimension $D$ as the initial population. Generally, the initial population follows a uniform probability distribution, and each dimension is initialized within its optimization range $[x_j^{\min}, x_j^{\max}]$:
$$x_{ij,0} = x_j^{\min} + \mathrm{rand} \cdot (x_j^{\max} - x_j^{\min}), \qquad j = 1, 2, \ldots, D$$
where $i = 1, 2, \ldots, NP$, and rand is a uniformly distributed random number between 0 and 1. The population size $NP$ is related to the dimension $D$ and is generally set within $[5D, 10D]$.
Step 2. Mutation. In the $g$-th iteration, three individuals are randomly selected for the mutation of individual $x_{i,g}$ to produce the mutation vector:
$$v_{i,g} = x_{r_1,g} + F \cdot (x_{r_2,g} - x_{r_3,g})$$
where $r_1, r_2, r_3 \in \{1, 2, \ldots, NP\}$ are mutually different integers that also differ from the current target vector index $i$, so the population size must satisfy $NP \geqslant 4$. The scaling factor $F$ is a constant in (0, 1) that controls the size of the difference vector. If $F$ is too small, the convergence rate is reduced, and if $F$ is too large, the population will not converge.
Step 3. Crossover. After the mutation operation, the target vector $x_{i,g}$ and the mutation vector $v_{i,g}$ are binomially crossed to generate the trial vector $u_{i,g} = [u_{i1,g}, u_{i2,g}, \ldots, u_{iD,g}]$. The formula is as follows:
$$u_{ij,g} = \begin{cases} v_{ij,g}, & \mathrm{rand}_j \leqslant cr \ \text{or} \ j = j_{rand} \\ x_{ij,g}, & \text{otherwise} \end{cases}$$
where $\mathrm{rand}_j$ is a uniformly distributed random number in [0, 1] drawn for each dimension, and $j_{rand}$ is an integer randomly selected from the set $\{1, 2, \ldots, D\}$ to ensure that at least one dimension of the mutation vector is preserved. The crossover probability $cr$ is a constant within the interval (0, 1).
Since this paper optimizes the parameters required by ANFIS within a given range in each dimension, some dimensions may fall out of bounds after mutation. Therefore, it is necessary to traverse every dimension and use Eq. (21) to reassign any dimension that is out of bounds:
$$u_{ij,g} = x_j^{\min} + \mathrm{rand} \cdot (x_j^{\max} - x_j^{\min}), \qquad \text{if } u_{ij,g} < x_j^{\min} \ \text{or} \ u_{ij,g} > x_j^{\max} \quad (21)$$
Step 4. Selection. This step obtains the next generation of the population through competition between parent and child: the objective function values of the trial vector $u_{i,g}$ and the target vector $x_{i,g}$ are compared, and the vector with the smaller fitness function value is retained:
$$x_{i,g+1} = \begin{cases} u_{i,g}, & f(u_{i,g}) \leqslant f(x_{i,g}) \\ x_{i,g}, & \text{otherwise} \end{cases}$$
where $f$ is the fitness function.
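The four steps correspond to the classic DE/rand/1/bin scheme, which can be sketched in plain Python as follows. The function name, default settings (NP, F, cr, number of generations) and the sphere test function are illustrative assumptions; in the paper's setting, `f` would be the training-set RMSE of an ANFIS built from the candidate parameter vector.

```python
import random

def differential_evolution(f, bounds, NP=20, F=0.5, cr=0.9, generations=200, seed=0):
    """Minimise f over box bounds with DE/rand/1/bin (illustrative defaults)."""
    rng = random.Random(seed)
    D = len(bounds)

    def sample(j):
        low, high = bounds[j]
        return low + rng.random() * (high - low)

    # Step 1: uniform initialisation inside the optimisation range.
    pop = [[sample(j) for j in range(D)] for _ in range(NP)]
    cost = [f(ind) for ind in pop]
    for _ in range(generations):
        new_pop, new_cost = list(pop), list(cost)
        for i in range(NP):
            # Step 2: mutation with three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # Step 3: binomial crossover; j_rand keeps at least one mutated dimension.
            j_rand = rng.randrange(D)
            u = [v[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                 for j in range(D)]
            # Out-of-bounds dimensions are re-drawn uniformly inside the range.
            u = [uj if bounds[j][0] <= uj <= bounds[j][1] else sample(j)
                 for j, uj in enumerate(u)]
            # Step 4: greedy selection between trial and target vectors.
            cu = f(u)
            if cu <= cost[i]:
                new_pop[i], new_cost[i] = u, cu
        pop, cost = new_pop, new_cost
    best = min(range(NP), key=cost.__getitem__)
    return pop[best], cost[best]

# Toy objective standing in for the ANFIS training RMSE.
def sphere(p):
    return sum(c * c for c in p)

best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The new population is built from the previous generation only, matching the generation index g used in the steps above; an in-place update would also work but is a slightly different variant.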
Since the original AQI data has obvious nonlinear characteristics and large fluctuations, if it is directly predicted, the accuracy may be limited. In order to eliminate the influence of non-stationary factors on the prediction effect, SSA is used to decompose and denoise the original AQI series.
The whole process of the SSADD-DE-ANFIS model is shown in Fig. 1. The steps of the hybrid forecasting model are described as follows:
Step 1: SSA decomposition. Before using SSA to decompose the original AQI series, it is necessary to select an appropriate window length, that is, the number of decompositions. This paper takes the decomposition result of EMD as a reference, because EMD is an adaptive data processing method that decomposes a signal according to the characteristics of the input. According to the decomposition results of EMD, SSA is used to decompose the original AQI into L subseries.
Step 2: SSA denoising. In order to further improve the predictability of the subseries, SSA is used to extract the trend of each subseries, remove noise, and make them smoother. Taking a single subseries as an example, it is decomposed into a number of secondary subseries such that the contribution rate of the last secondary subseries is less than 1%. The secondary subseries with the lowest CR is removed, and the other secondary subseries are reconstructed into the trend series.
Step 3: Divide the dataset. The training set and test set are divided according to seasons, and different ANFIS models are established according to seasons, so as to realize seasonal AQI forecasting.
Step 4: Determine the optimization interval. In this paper, the AQI time series is transformed into the form of supervised learning, so the domain of discourse for each feature is set to be the same. Taking a single training set as an example, the discrete subseries is transformed into a continuous universe U, and two fuzzy sets, “low” and “high”, are constructed for each feature. That is, two Gaussian membership functions are selected for the fuzzification of each feature, corresponding to two centers and two widths. The definition of U and the centers and widths of the membership functions are shown in Eqs. (28)–(30).
where $d_1$ and $d_2$ are small real numbers that adjust U to an integer interval. The purpose of optimizing $a_i$ is to ensure that the membership degree of a value inside the subinterval is not too small and the membership degree of a value outside the subinterval is not too large.
Step 5: DE initialization. Initialize the parameters of DE, such as the population size and the search dimension. Then initialize the rule antecedent parameters according to the optimization range determined in Step 4. The rule consequent parameters are initialized within [–5, 5], but their optimization range is not constrained, to ensure the global search ability. The initialization rule is shown in Eq. (21).
Step 6: DE-ANFIS model. After the population is initialized, the root mean square error (RMSE) on the training set is used as the fitness function to evaluate each individual. The individual with the smallest fitness is selected as the initial optimal solution, and the global optimal solution is then updated iteratively. When the stopping condition is met, that is, the RMSE is less than the threshold or the maximum number of iterations is reached, the parameters of ANFIS are output. The RMSE of the training set is defined as Eq. (31):
$$RMSE = \sqrt{\frac{1}{N_a} \sum_{t=1}^{N_a} (g_t - d_t)^2} \quad (31)$$
where $g_t$ is the target output, $d_t$ is the predicted output of ANFIS, and $N_a$ is the size of the training set.
Step 7: AQI forecasting. Use the ANFIS model to predict the test set, add the prediction results of all subseries to get the final prediction results, and calculate several evaluation indexes.
Many previous studies have predicted the concentrations of individual pollutants (PM10, PM2.5, CO, O3, NO2, SO2), but from the perspective of the public, it is difficult to judge whether the air quality is good or bad from a single indicator. The AQI is a dimensionless index obtained by synthesizing these six pollutants. According to the value of the AQI, the public can directly judge the current level of air pollution and take countermeasures. Therefore, this paper chooses AQI as the research object.
Study area and data description
Study area
In order to verify the prediction performance of the hybrid forecasting model proposed in this paper, the AQI series of Beijing is taken as an example. Beijing is the capital and political center of China. It is located in the northern part of China, and its center lies at 116°20′E and 39°56′N. Beijing has a warm temperate semi-humid and semi-arid monsoon climate. It is hot and rainy in summer and cold and dry in winter, and spring and autumn are short. There are two main reasons for choosing Beijing as the study object: (1) Beijing is not only the capital of China, but also a first-tier city in China, and there are many studies on air pollution in this area. (2) The economic development level of the Beijing-Tianjin-Hebei region represented by Beijing is gradually improving, but air pollution is becoming increasingly serious at the same time, so forecasting the AQI in Beijing is representative and important for controlling air pollution.
Data description
The data used in this study are the hourly AQI data of Beijing from March 1, 2019 to February 29, 2020, obtained from http://www.cnopendata.com/. This period is selected because, in China, spring generally refers to March to May, summer to June to August, autumn to September to November, and winter to December to February of the following year. For each season, the data of the first two months are used as the training set, and the data of the last month are used as the test set to verify the forecasting accuracy of the hybrid model.
In addition, AQI data of each month are missing to varying degrees due to some other reasons, but there is no lack of data on the same day. Because there are enough sample data in this paper, and the missing data has little impact on the whole, the missing data is deleted directly, and a total of 8452 hours of data remain after deletion. The data are divided into four training sets and four test sets according to seasons. Table 1 shows the statistical information of AQI in the four seasons. Fig. 3 shows the detailed fluctuation characteristics of the original AQI data. From Fig. 3 and Table 1, it can be seen that the fluctuation of AQI differs between seasons. The fluctuation of AQI in the spring training set is larger and higher than that of the overall spring sample, the fluctuation is relatively gentle in summer and moderate in autumn, and the fluctuation of AQI in winter is the largest of the four seasons, with the test set fluctuating more than the training set.
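The seasonal split described above can be expressed as a small lookup. The sketch below only encodes the month-to-season rule and the "first two months train, last month test" convention stated in the text; the names `SEASONS` and `season_and_role` are hypothetical.

```python
# Months of each season in chronological order, as described in the text.
SEASONS = {
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "winter": (12, 1, 2),
}

def season_and_role(month):
    """Map a calendar month to (season, 'train' or 'test')."""
    for season, months in SEASONS.items():
        if month in months:
            # The last month of each season is held out as the test set.
            role = "test" if month == months[-1] else "train"
            return season, role
    raise ValueError(f"invalid month: {month}")
```

For example, an hourly record stamped in February 2020 would fall into the winter test set, while one from December 2019 would fall into the winter training set.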

Fig. 3. Original AQI data.
Table 1. Statistical indicators of AQI data in four seasons
Decomposition of original AQI series
Since the original AQI series has a large fluctuation range and is irregular, SSA is first used to decompose the original AQI series in order to deal with its non-stationarity. The key to the decomposition lies in the selection of the window length L. However, there is no standard for selecting L, which can only be chosen by experience. EMD decomposes signals according to the time scale characteristics of the data itself, which is adaptive. It can decompose complex signals into a limited number of intrinsic mode functions (IMFs). Therefore, this paper takes the decomposition number of EMD as a reference. EMD is used to decompose the original AQI series, and 10 IMFs and a residual are obtained. Therefore, the decomposition quantity is set to 11, and SSA is used to decompose the original AQI series into 11 subseries.
Denoising of the subseries
In order to further improve the predictability of the subseries and the prediction accuracy, SSA is used again to remove the noise from each subseries. Each subseries is decomposed into a number of secondary subseries such that the CR of the secondary subseries with the smallest contribution is less than 1%. This secondary subseries is directly deleted as the noise series, and the rest are reconstructed into the trend series of the subseries. Here, the sample entropy (SE) is used to measure the complexity of the noise series. Table 2 shows the decomposition number of each subseries, together with the CR and SE of its noise series. Fig. 4 shows the final decomposition and denoising result.

Fig. 4. Decomposition-denoising results for AQI series.
Table 2. Relevant information of subseries decomposition
In terms of model evaluation, the evaluation indicators selected for all prediction models in this paper are RMSE, mean absolute error (MAE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), and R-square (R2). RMSE measures the deviation between the predicted values and the real values. It is very sensitive to unusually large or small errors within a set of measurements, so RMSE reflects the precision of forecasting well. MAE reflects the forecasting errors well because it uses the absolute value of the deviation, which prevents positive and negative errors from offsetting each other. MAPE and SMAPE examine the ratio of the prediction errors to the true values, that is, the relative deviation between the forecast values and the true values. The difference is that MAPE is asymmetric; it imposes a greater penalty on negative errors than on positive errors. The smaller the values of RMSE, MAE, MAPE and SMAPE, the smaller the prediction deviation and the better the prediction effect of the model. R-square is the goodness of fit of the regression forecasting model. The closer the value of R-square is to one, the better the fitting effect of the forecasting model. These metrics are defined in Eqs. (32)–(36).
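As a concrete reference, the five metrics can be computed as below. The formulas are the commonly used definitions and are an assumption here, since several variants exist (SMAPE in particular; this sketch uses the form with |actual| + |predicted| in the denominator and a factor of 2). The function name `evaluate` and the sample values are hypothetical.

```python
import math

def evaluate(actual, predicted):
    """Return RMSE, MAE, MAPE (%), SMAPE (%) and R2 under common definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        # MAPE assumes no actual value is zero.
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
        "SMAPE": 100.0 / n * sum(2.0 * abs(e) / (abs(a) + abs(p))
                                 for e, a, p in zip(errors, actual, predicted)),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical values for illustration only.
metrics = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

In this toy example the over-prediction of 10 on a true value of 100 contributes twice as much to MAPE as the under-prediction of 10 on a true value of 200, which illustrates how percentage errors weight low-AQI hours more heavily.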
This paper uses four improvement metrics to show the degree of progress of the proposed hybrid forecasting model relative to other models: the improvement percentage of root mean square error (PRMSE), the improvement percentage of mean absolute error (PMAE), the improvement percentage of mean absolute percentage error (PMAPE) and the improvement percentage of symmetric mean absolute percentage error (PSMAPE). They are defined in Eqs. (37)–(40).
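An improvement percentage of this kind is usually the relative reduction of an error metric with respect to the comparison model. The helper below assumes that form; the exact expressions of Eqs. (37)–(40) are not reproduced in the text, so the function name and formula should be read as an illustration.

```python
def improvement_percentage(baseline_error, proposed_error):
    """Relative reduction (%) of an error metric versus a comparison model.

    Assumed form: (baseline - proposed) / baseline * 100.
    """
    return (baseline_error - proposed_error) / baseline_error * 100.0

# Hypothetical errors: a comparison model at 2.0 and the proposed model at 1.5.
example = improvement_percentage(2.0, 1.5)
```

The same function would serve for PRMSE, PMAE, PMAPE and PSMAPE by passing the corresponding pair of metric values.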
In Section 3.2, the original AQI has been processed. Each subseries is divided into four training sets and test sets, and the input-output structure is formed by predicting the next value from the previous five values. The forecasting idea of this paper is shown in Fig. 5, and the parameter settings of the DE algorithm are shown in Table 3.
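The input-output structure described here is a sliding window over each subseries. A minimal sketch follows; the function name `make_supervised` and the toy series are hypothetical, and the window width of five follows the text.

```python
def make_supervised(series, n_lags=5):
    """Turn a series into (inputs, targets): n_lags past values -> next value."""
    inputs = [list(series[i:i + n_lags]) for i in range(len(series) - n_lags)]
    targets = [series[i + n_lags] for i in range(len(series) - n_lags)]
    return inputs, targets

# Toy series for illustration.
X, y = make_supervised([1, 2, 3, 4, 5, 6, 7, 8])
```

A series of length n yields n - 5 samples, so each seasonal training and test set loses its first five hours as targets.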

The forecasting structure.
Experimental parameters of DE algorithm
Experiments
After decomposition and denoising using SSA, a DE-ANFIS model is developed to predict each subseries. The final AQI forecast is then obtained by summing the predicted values of all subseries.
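The aggregation step can be sketched as an element-wise sum over the subseries forecasts. This is an illustration under the assumption that every subseries model produces a forecast of the same length and aligned in time; the function name `aggregate_forecasts` is hypothetical.

```python
import numpy as np


def aggregate_forecasts(subseries_predictions):
    """Sum the per-subseries forecasts time step by time step.

    subseries_predictions: sequence of equal-length forecast arrays,
    one per decomposed (and denoised) subseries.
    """
    stacked = np.asarray(subseries_predictions, dtype=float)
    if stacked.ndim != 2:
        raise ValueError("expected one equal-length forecast per subseries")
    return stacked.sum(axis=0)


# Illustrative values only: three subseries, two forecast steps.
final_forecast = aggregate_forecasts([[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]])
```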
In this study, to investigate the prediction performance of the SSADD-DE-ANFIS model, eleven prediction models are used as comparison models: the SVR model, back propagation (BP) neural network model, LSTM model, GRU model, Extreme Learning Machine (ELM) model, ANFIS model, DE-ANFIS model, SSAd-DE-ANFIS model, EMD-DE-ANFIS model, VMD-DE-ANFIS model and SSAD-DE-ANFIS model. Among them, SSAD indicates that SSA is used only for decomposition, and SSAd indicates that SSA is used only for denoising; the uppercase and lowercase letters distinguish the two.
All of the involved models can be implemented in different languages, such as Python, Java, R and MATLAB. In this paper, SSA, the DE-ANFIS model, the LSTM model and the GRU model are implemented in Python; EMD and VMD are implemented through the corresponding R packages; the SVR, BP and ELM models are implemented through the e1071, neuralnet and elmNNRcpp packages of R, respectively; and ANFIS is implemented with the anfisedit tool of MATLAB R2021a.
The detailed parameters of the competitive models are presented in Table 4, and other parameters are left at their default settings. The parameter optimization methods of the DE-ANFIS, SSAd-DE-ANFIS, EMD-DE-ANFIS, VMD-DE-ANFIS and SSAD-DE-ANFIS models are the same as those of the SSADD-DE-ANFIS model.
Parameter settings of the competitive models
To analyze the experimental results of the prediction models, we use the five metrics which are defined in Section 3.3. The RMSE, MAE, MAPE, SMAPE and R2 results of the twelve prediction models for the four seasons are given in Tables 5–8.
Evaluation metrics of the twelve models in Spring
Evaluation metrics of the twelve models in Summer
Evaluation metrics of the twelve models in Autumn
Evaluation metrics of the twelve models in Winter
According to Tables 5–8, the following can be observed:
The SSADD-DE-ANFIS model achieves accurate AQI predictions in all four seasons, which indicates that the proposed model has strong generalization ability.
Compared to the SVR, BP, LSTM, GRU and ELM models, the DE-ANFIS model is more stable in AQI predictions for the four seasons and has higher prediction accuracy. In the autumn forecast, the RMSE and R2 of the DE-ANFIS model are 8.903004 and 0.964384, respectively, whereas the RMSE of the other single models are all greater than 10 and their R2 values are less than 0.95. This indicates that the DE-ANFIS model has better feature extraction and prediction ability for AQI time series.
Compared to the ANFIS model, the forecasting errors of the DE-ANFIS model are smaller, and the predicted values are highly consistent with the real values. This shows that using the DE algorithm to optimize the parameters of ANFIS is more effective than traditional ANFIS and gives full play to the advantages of the ANFIS model.
Compared to the DE-ANFIS model, the forecasting errors of the SSAd-DE-ANFIS model decrease to a certain extent in all four seasons. For example, in the spring forecast, RMSE decreased from 12.015013 to 11.080307; MAE decreased from 5.827328 to 4.907401; MAPE decreased from 8.114386% to 6.728947%; SMAPE decreased from 7.7396786 to 840890; and R2 increased from 0.924831 to 0.936072.
Compared to the SSAd-DE-ANFIS model, the prediction accuracy of the SSAD-DE-ANFIS model is higher, which indicates that using only data decomposition does more to reduce the complexity of the AQI series and enhance the accuracy of the prediction model than using only denoising.
Compared to the EMD-DE-ANFIS and VMD-DE-ANFIS models, the SSAD-DE-ANFIS model has better prediction accuracy for the AQI of all four seasons, which indicates that using SSA to decompose the original AQI series is more effective than using EMD or VMD.
Compared to the SSAD-DE-ANFIS model, the prediction errors of the SSADD-DE-ANFIS model are smaller, indicating that using SSA a second time to denoise the subseries further reduces their complexity and improves their predictability.
The proposed SSADD-DE-ANFIS model exhibited the best prediction accuracy in the comparison of accuracy metrics. In order to further analyze the prediction errors of each model, the fitting curves and forecasting errors of the twelve models for four seasons are shown in Figs. 6 and 7, and the AQI forecast scatterplots for the various models versus the actual AQI data are presented in Figs. 8–11.

Comparison of forecast results of 1 step ahead of twelve models in four seasons.

Boxplots of forecasting errors of twelve models in four seasons.

Comparison of predicted values and real values of twelve models in Spring.

Comparison of predicted values and real values of twelve models in Summer.

Comparison of predicted values and real values of twelve models in Autumn.

Comparison of predicted values and real values of twelve models in Winter.
Based on Figs. 6–11, the following can be observed:
The single models can roughly fit the trend of AQI when the AQI series is flat, but when AQI changes sharply they struggle to capture its pattern, resulting in large prediction errors. Taking the spring forecast as an example, the locally enlarged plot and the error boxplot clearly show that the maximum positive and negative errors of the single models exceed 100, which lowers the accuracy of the overall forecast.
EMD-DE-ANFIS has poor fitting ability in predicting AQI in spring. When EMD is used to process highly complex AQI data, it is still difficult to achieve the ideal prediction performance. The error boxplot in Fig. 7 shows that its prediction error is significantly greater than that of the DE-ANFIS model. For the AQI forecasting of the other three seasons, however, the fit improves. The reason is that the spring AQI contains several very large values; when EMD decomposes the original AQI, the first IMF is highly complex and is predicted poorly, which lowers the overall prediction accuracy.
The fitting performance of the VMD-DE-ANFIS model is better than that of the EMD-DE-ANFIS model, but it is clearly inferior to that of the SSAD-DE-ANFIS model. The VMD-DE-ANFIS model underestimates AQI at its peaks, because VMD separates out a highly complex noise series when decomposing the original AQI data. Although the prediction accuracy is improved, the information retained after decomposition is less than that in the original data, so the overall prediction accuracy is limited.
According to the prediction results in Fig. 6, the SSAD-DE-ANFIS model and the SSADD-DE-ANFIS model fit the four seasons well, and the predicted values of the two models are highly consistent with the real values. However, Fig. 7 shows that the error distribution of the SSADD-DE-ANFIS model is more concentrated and its errors are smaller.
According to Figs. 8–11, a single model has difficulty capturing the variation characteristics of AQI fluctuations in different seasons, and its fit is clearly worse than that of the hybrid models. In the forecasts of the four seasons, the fit of the SSAD-DE-ANFIS model is better than that of the EMD-DE-ANFIS and VMD-DE-ANFIS models, indicating that SSA is more effective for seasonal AQI forecasting. The SSADD-DE-ANFIS model has the best fit in all four seasons. Compared with the SSAD-DE-ANFIS model, its scattered points are more concentrated on the straight line, indicating that it is reasonable to use SSA again to remove the noise of the subseries.
Analyzing the prediction results of each model, it is not difficult to see that the greater the fluctuation of AQI, the lower the prediction accuracy. The reasons for the large differences in AQI in different seasons can be analyzed from the geographical location and environment of Beijing.
Beijing is located inland. In spring and autumn, the atmospheric environment is generally dry, and the water vapor in the air is insufficient. In addition, Beijing is close to the desert in the northwest, so the dry spring and autumn bring windy days and sometimes even sandstorms. As a result, the AQI in Beijing fluctuates strongly in spring and autumn, which makes it difficult for a single model to capture its pattern and leads to large prediction errors.
According to the weather conditions in China, haze is very likely to occur in autumn and winter every year. In summer, most of eastern China is controlled by the summer monsoon, which blows from the sea to the land and is warm and humid. At the same time, the horizontal and vertical exchange of the atmosphere is strong, which is conducive to the diffusion of pollutants and the removal of moisture, so haze is unlikely to occur. The summer AQI prediction results also show that every model performs clearly better than in the other seasons. In the winter half of the year, most parts of China are controlled by the winter monsoon, which comes from northern Siberia and other regions and is dry and cold. The transition between the winter and summer monsoons generally begins in October, so Beijing generally has a smog-prone season from October to March of the following year. In addition, the demand for heating gradually increases in winter. Low air pressure, strong temperature inversion, weak wind and high humidity significantly reduce the ability of pollutants to diffuse, so pollutants continue to accumulate and pollution concentrations rise. In such weather, people should go out as little as possible because the polluted air covers a large area. Those who must go out should take appropriate precautions in advance, such as wearing a mask, walking slowly and breathing calmly.
Although the AQI fluctuates differently in different seasons, the SSADD-DE-ANFIS model proposed in this paper can well adapt to the AQI changes in different seasons and greatly improve the prediction accuracy of AQI. It can provide reference and basis for forecasting, controlling and mitigating air pollution, and also provide constructive opinions and suggestions for policymakers to take more economical and efficient measures to improve air quality in the future.
Improvements of the proposed model
To further compare the prediction accuracy of the SSADD-DE-ANFIS model with that of the eleven comparison models, the PRMSE, PMAE, PMAPE, and PSMAPE are used for analysis. Tables 9–12 show the improvement percentages of the proposed model over the comparison models for each season, respectively.
Improvement percentages of the comparison models by the SSADD-DE-ANFIS model for Spring
Improvement percentages of the comparison models by the SSADD-DE-ANFIS model for Summer
Improvement percentages of the comparison models by the SSADD-DE-ANFIS model for Autumn
Improvement percentages of the comparison models by the SSADD-DE-ANFIS model for Winter
Taking winter, the season with the best prediction performance, as an example, Table 12 shows that the SSADD-DE-ANFIS model outperforms every comparison model:
Relative to the SVR model, the RMSE, MAE, MAPE and SMAPE are reduced by 93.00%, 92.19%, 91.13% and 90.94%, respectively.
Relative to the BP model, the RMSE, MAE, MAPE and SMAPE are reduced by 91.32%, 91.03%, 89.33% and 89.26%, respectively.
Relative to the LSTM model, the RMSE, MAE, MAPE and SMAPE are reduced by 89.60%, 89.74%, 89.24% and 89.09%, respectively.
Relative to the GRU model, the RMSE, MAE, MAPE and SMAPE are reduced by 89.65%, 89.63%, 88.78% and 88.75%, respectively.
Relative to the ELM model, the RMSE, MAE, MAPE and SMAPE are reduced by 89.43%, 89.33%, 89.01% and 88.91%, respectively.
Relative to the ANFIS model, the RMSE, MAE, MAPE and SMAPE are reduced by 89.50%, 89.33%, 88.78% and 86.69%, respectively.
Relative to the DE-ANFIS model, the RMSE, MAE, MAPE and SMAPE are reduced by 89.54%, 89.19%, 88.32% and 88.21%, respectively.
Relative to the SSAd-DE-ANFIS model, the RMSE, MAE, MAPE and SMAPE are reduced by 88.71%, 88.99%, 88.46% and 88.28%, respectively.
Relative to the EMD-DE-ANFIS model, the RMSE, MAE, MAPE and SMAPE are reduced by 81.86%, 82.31%, 82.69% and 82.67%, respectively.
Relative to the VMD-DE-ANFIS model, the RMSE, MAE, MAPE and SMAPE are reduced by 80.83%, 82.68%, 84.86% and 84.43%, respectively.
Relative to the SSAD-DE-ANFIS model, the improvement is smaller: the RMSE, MAE, MAPE and SMAPE are reduced by 21.27%, 20.85%, 20.40% and 20.40%, respectively.
Hypothesis testing is a widely used method in mathematical statistics for inferring the properties of a population from a sample under certain assumptions [48].
In this paper, we use the Diebold-Mariano (DM) test to compare the prediction results of two time series prediction models to determine which model has the better prediction results. Assume that model A and model B make predictions over a time span of length $T$, and the error series of model A and model B are $E_a = [a_1, a_2, \ldots, a_T]$ and $E_b = [b_1, b_2, \ldots, b_T]$, respectively. The difference sequence $D = [d_1, d_2, \ldots, d_T]$ can be obtained, where $d_i = a_i - b_i$, $i = 1, 2, \ldots, T$. $s^2$ is the estimate of the variance of $d_i$. Then the DM statistic can be defined as:
The hypothesis test can be expressed as:
$H_0: E(d_i) = 0$: model A and model B have the same prediction effects.
$H_1: E(d_i) \neq 0$: model A and model B have different prediction effects.
P-value: when the P-value is greater than 0.05, the null hypothesis is not rejected, which means that the prediction effects of the two models are regarded as the same; when the P-value is less than 0.05, the null hypothesis is rejected, which means that the prediction effects of the two models are different.
DM value (this indicator is meaningful only when P < 0.05): when DM > 0, model B is better than model A; when DM < 0, model A is better than model B.
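The test and its decision rule can be sketched in Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the function name `dm_test` is hypothetical; the statistic is taken in the common one-step-ahead form, the mean of $d_i$ divided by $\sqrt{s^2 / T}$; the P-value comes from a two-sided standard normal approximation; and each error series is converted to an absolute-error loss before differencing, so that DM > 0 corresponds to model A having the larger loss.

```python
import math

import numpy as np


def dm_test(errors_a, errors_b, loss=np.abs):
    """Simplified Diebold-Mariano test for one-step-ahead forecasts.

    Returns (dm, p_value). Assumptions: the loss differential is
    d_i = loss(a_i) - loss(b_i), s^2 is the sample variance of d_i
    (no autocovariance correction), and the statistic is compared with
    a standard normal distribution.
    """
    loss_a = loss(np.asarray(errors_a, dtype=float))
    loss_b = loss(np.asarray(errors_b, dtype=float))
    d = loss_a - loss_b
    T = d.size
    s2 = d.var(ddof=1)  # sample variance of the differences
    if s2 == 0:
        raise ValueError("differences have zero variance; DM is undefined")
    dm = d.mean() / math.sqrt(s2 / T)
    # Two-sided P-value: 2 * (1 - Phi(|dm|)) = erfc(|dm| / sqrt(2)).
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Illustrative errors only: model A is clearly worse than model B.
errors_a = [3, -4, 5, -6, 4, -5, 3, -4]
errors_b = [1, -1, 1, -2, 1, -1, 2, -1]
dm, p = dm_test(errors_a, errors_b)
if p < 0.05:
    better = "B" if dm > 0 else "A"
```

For multi-step forecasts, the variance estimate would normally include autocovariance terms, which this sketch deliberately omits.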
To further test if the prediction accuracy of the proposed model is significantly different from that of the comparison models, the DM test is used for analysis. Table 13 describes the DM test on the four seasons, where the SSADD-DE-ANFIS model is the benchmark.
Diebold-Mariano test: SSADD-DE-ANFIS model as benchmark
Based on Table 13, it can be found that:
In the AQI prediction for the four seasons, the DM values of all comparison models are greater than 0 and all P-values are less than 0.05. Moreover, except for the LSTM, GRU and ANFIS models in autumn, all P-values are less than 0.01. This shows that the prediction errors of the SSADD-DE-ANFIS model are significantly different from those of the other competitive models.
Based on Tables 9–13 and the analysis above, the prediction accuracy of the proposed model is significantly better than that of the comparison models in the AQI forecasting of the four seasons.
To further verify the prediction accuracy of the proposed model, we compared its prediction results with those of four other hybrid models. Two of them are the VMD-SE-LSTM model [49] and the EMD-GRU model [36]; the other two, SSADD-GRU and SSADD-SE-LSTM, combine the decomposition method proposed in this paper with GRU and SE-LSTM. Here, we use the summer data set, which has small fluctuations, as an example to carry out the experiment. In this comparative experiment, the decomposition number of the original AQI is 11. We reconstruct the subseries with similar SE after the decomposition of VMD and SSADD according to the method of Wu and Lin. The input features of the four comparative models are consistent with those of the proposed SSADD-DE-ANFIS model, and the parameters of the four models are adjusted according to the different subseries. We used the RMSE, MAE, MAPE, SMAPE and R2 defined in Section 3.3 to evaluate the prediction results of the five hybrid models.
Table 14 compares the prediction errors and R2 of the EMD-GRU, SSADD-GRU, VMD-SE-LSTM, SSADD-SE-LSTM and SSADD-DE-ANFIS models. In terms of prediction errors, the proposed SSADD-DE-ANFIS model has the smallest RMSE, MAE, MAPE and SMAPE; in terms of fit, it has the largest R2. This indicates that the proposed hybrid model predicts better and yields higher consistency between the predicted values and the real values.
The comparison analysis of prediction errors and R2 of the five hybrid models
Comparing the EMD-GRU model with the SSADD-GRU model, and the VMD-SE-LSTM model with the SSADD-SE-LSTM model, it can be seen that the SSADD method proposed in this paper has more advantages than the EMD and VMD methods in processing random and fluctuating AQI series. In addition, this paper uses the DE algorithm to optimize the parameters of ANFIS, which yields higher prediction accuracy. GRU and LSTM are deep learning methods whose hyperparameters and network structure need to be tuned through repeated comparison and exploration.
Air pollution has become a major global problem, posing a serious threat to economy, environment and human health. Therefore, establishing an accurate and effective AQI forecast model is crucial for guiding the reduction of air pollution and helping the sustainable development of society.
In this paper, a novel decomposition-denoising ANFIS is developed by combining SSA and DE. In the architecture of the proposed SSADD-DE-ANFIS model, SSA is first used to decompose the original AQI series and reduce the non-stationarity and complexity of AQI; SSA is then used a second time to extract the trend information of the subseries and remove noise; ANFIS is designed to predict the subseries after decomposition and denoising; and DE is adopted to optimize the parameters of the ANFIS model. To investigate the prediction performance of the SSADD-DE-ANFIS model, eleven models are selected as comparison models: the SVR model, BP model, LSTM model, GRU model, ELM model, ANFIS model, DE-ANFIS model, SSAd-DE-ANFIS model, EMD-DE-ANFIS model, VMD-DE-ANFIS model and SSAD-DE-ANFIS model. Based on the experimental results of the four seasons, the following conclusions are reached:
The SSADD-DE-ANFIS model can achieve satisfactory predictions in seasonal AQI forecasting.
In the SSADD-DE-ANFIS model, SSA is a good feature extractor, which is more effective than EMD and VMD in seasonal AQI forecasting.
Using SSA again to remove the noise of the subseries can further improve their predictability, and using the DE algorithm to optimize the parameters of the ANFIS model can further improve its prediction accuracy.
The proposed hybrid model can comprehensively capture the characteristics of the original AQI series, has high consistency in the forecasting of AQI series, and has more advantages than the other comparison models. It can be used as a simple and effective tool for air pollution forecasting and early warning.
It can be seen that the decomposition-denoising ANFIS model used in this paper is feasible for seasonal AQI forecasting and achieves high prediction accuracy. However, the study has two limitations. First, when predicting the future AQI, this paper manually selects the data of the previous five hours as input, without considering the impact of the number of input features on the prediction accuracy. Second, in SSA denoising, the CR of the noise series to the original series is less than 1%, and we do not consider the impact of increasing or reducing the CR of the noise series on the prediction accuracy. Therefore, our future work will focus on the following aspects. We will consider using a feature selection method to select the optimal features as the input of the model to further improve the prediction accuracy. In addition, we will design experiments that adjust the CR of the noise series to study whether different CR values can achieve more satisfactory prediction results.
Acknowledgments
The research was supported by the National Natural Science Foundation of China (Grant No. 71971105), the National Statistical Science Research Project (Grant No. 2020LZ03) and the “Thousand talents plan” for high level talents in Jiangxi Province (Grant No. jxsq2019201064).
