A hybrid model for multi-step wind speed forecasting based on secondary decomposition, deep learning, and error correction algorithms

Abstract

Accurate and stable wind speed forecasting is an essential means to ensure the safe and stable operation of wind power integration. Therefore, a new hybrid model was proposed to improve wind speed forecasting performance, consisting of data pre-processing, model forecasting, and error correction (EC). The specific modeling process is as follows: (a) A wind speed series was decomposed into a series of subseries with different frequencies utilizing the ensemble empirical mode decomposition (EEMD) method. Afterward, various subseries were divided into high-frequency components, intermediate-frequency components, and low-frequency components based on their sample entropies (SE). (b) Three frequency components were forecast by separately employing the hybrid model of convolutional neural network and long short-term memory network (CNN-LSTM), long short-term memory network (LSTM), and Elman neural network. (c) Subsequently, an error sequence was further forecast using CNN-LSTM. (d) Finally, three actual datasets were used to forecast the multi-step wind speed, and the forecasting performance of the proposed model was verified. The test results show that the forecasting performance of the proposed model is better than the other 13 models in three actual datasets.

Keywords

Ensemble empirical mode decomposition; long short-term memory network; Elman neural network; error correction

1 Introduction

With the increasingly severe global energy crisis, the development and utilization of renewable energy are essential to the development of society [1]. As one of the most abundant renewable energy sources, wind energy shows broad application potential. However, as an intermittent energy source, wind power is random and uncontrollable [2]. Therefore, large-scale wind power integration will affect the power system’s stability, adequacy, and economy [3]. Wind speed forecasting not only helps dispatchers make plans and ensure power quality, but also helps them arrange reasonable reserve capacity and reduce the operating cost of the power system [4]. Therefore, it is regarded as a practical approach to alleviating the adverse effects of wind power integration.

In recent years, scholars have proposed many methods for wind speed forecasting. These methods are mainly divided into four types [5]: physical, statistical, artificial intelligence, and hybrid models. The physical model usually simulates wind speeds utilizing various physical parameters such as meteorological and geographical parameters. The physical model is adequate for medium and long-term wind speed forecasting [6]. However, the physical model is realized by many complex physical equations, which require heavy computation in modeling [7]. The statistical models include the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model [8]. These models are simple and fast; however, they are suitable for forecasting linear series, not non-linear series [9]. Artificial intelligence models are non-linear and can capture the non-linear relationship between input and output data. These models include support vector machine (SVM) [10], artificial neural network [11], backpropagation (BP) neural network [12], generalized regression neural network (GRNN) [13], and extreme learning machine [14], which have been widely used in wind speed forecasting. These traditional non-linear models can extract shallow features and exhibit strong data adaptation ability, but their forecasting performance still needs to be improved.

With the development of computer networks, a new branch of machine learning, the deep learning model, was proposed. Compared with traditional non-linear models, the deep learning model can automatically extract the internal abstract features and hidden invariant data structures from the lowest level to the highest level of data [15]. Due to these advantages, many scholars continue to study how to apply deep learning to wind speed forecasting. For example, Wang et al. [16] proposed a method for deterministic and probabilistic wind speed forecasting based on a deep belief network. The test results showed that the method achieves good forecasting performance. Furthermore, Wang et al. [17] extracted non-linear features of wind power at different frequencies using a convolutional neural network (CNN) to improve forecasting performance. Moreno et al. [18] used a long short-term memory network (LSTM) to forecast multi-step wind speed and achieved good results. However, although these researchers used deep learning models to forecast the wind speed, most of them used the same model to forecast all components of the wind speed without considering the different characteristics of the different components.

Given the shortcomings of the single forecasting model, scholars have proposed hybrid forecasting models that integrate the forecasting abilities of different models so that their complementary advantages yield better forecasting performance [19]. For example, Liu et al. [20] used CNN and CNN-LSTM to forecast the wind speed sublayers with different frequencies. They compared the hybrid model with other reference models and verified its superiority. Liu et al. [21] used two kinds of recurrent neural networks, the Elman neural network and LSTM, to forecast wind speed, which verified the good forecasting performance of LSTM in high-precision wind speed forecasting. Zhang et al. [22] combined CNN with LSTM to forecast wind speed considering both linear and non-linear trends in wind speed series. The test results revealed that, compared with a single model, the forecasting performance and reliability were improved. Based on the excellent forecasting performance of Elman, LSTM, and CNN-LSTM in wind speed forecasting, this paper takes these three models as the basic models of wind speed forecasting.

Besides the forecasting model, data pre-processing is another essential factor affecting wind speed forecasting performance [23]. It decomposes the original wind speed series into more regular and stable subsequences. Then, an appropriate forecasting model is established for each subsequence, and the sum of the forecasting results of all subsequences is taken as the final forecasting result. For example, Tascikaraoglu et al. [24] used wavelet transform and spatio-temporal correlation to forecast wind speeds. He et al. [25] proposed a forecasting model by combining ensemble empirical mode decomposition (EEMD) with a wavelet neural network and achieved excellent forecasting performance. Yu et al. [26] proposed a wind speed forecasting model based on wavelet transform and singular spectrum analysis (SSA). In their research, SSA extracted the trend components in the subseries with the highest frequency. Although wavelet transform can decompose the original series into more regular components, the wavelet basis function and the decomposition level must be determined manually, which cannot guarantee the optimal decomposition of the signal. Based on the above analysis, EEMD and SSA are used to pre-process the wind speed series in this study.

Additionally, error correction (EC) is also an effective method for enhancing forecasting performance [27]. For example, Jiang et al. [28] combined the least-squares SVM with the generalized autoregressive conditional heteroscedasticity to correct the error components in the forecasting and verified the effectiveness of the method through experiments. Yu et al. [29] proposed an error predictive method based on data transformational GM(1,1) and further improved the forecasting performance. Tkachenko et al. [30] used an additional successive geometric transformations model to forecast the constant displacement and linear component of the error for improving the forecasting performance. The above research shows that the error correction can further improve the forecasting accuracy of the model.

Based on the above research, a hybrid forecasting model combining EEMD, SSA, Elman, LSTM, CNN-LSTM, and EC was proposed to realize multi-step forecasting of short-term wind speeds. The whole process of the proposed wind speed forecasting method is shown in Fig. 1. There are four stages: data pre-processing, model forecast, error correction, and comparison and analysis.

Fig. 1

The whole process of the proposed wind speed forecasting method.

(1) Data pre-processing

An original wind speed series is decomposed into intrinsic mode functions (IMFs) with different frequencies and a residual component based on EEMD. Next, the sample entropy (SE) of each component is calculated. The components with similar SEs are placed in the same group, which yields three groups: high-frequency, intermediate-frequency, and low-frequency. Finally, all components in each group are summed into a single component.

(2) Model forecasting

The major steps of model forecasting include: firstly, the trend component of high-frequency components is extracted using SSA; next, high-frequency components are denoised for forecasting; afterward, Elman, LSTM, and CNN-LSTM are used to forecast the low-frequency, intermediate-frequency, and high-frequency components, respectively; finally, the sum of the forecasting results of the three components is taken as the initial forecasting result of wind speeds.

(3) Error correction

The main steps of error correction include: firstly, the error is obtained by subtracting the initial predicted value from the actual value of wind speeds; then, the trend component of the error sequence is obtained using SSA to denoise the error sequence; finally, CNN-LSTM is used to forecast the error, and the sum of the forecasting error and the initial forecasting value is taken as the final forecasting result.

(4) Comparison and analysis

The main step of the comparative analysis is to use E-S-CLE-S-E and 13 other forecasting models to forecast the multi-step wind speed of three different wind speed datasets and compare the forecasting performance of these models.

From what is mentioned above, the main contributions of this study can be summarized as follows:

To improve the accuracy and robustness of wind speed forecasting, the EEMD and SE methods are used to divide the wind speed into multi-level frequency components (high-frequency, intermediate-frequency, and low-frequency components). In view of the volatility and nonlinearity of the high-frequency components, the SSA method is used to denoise them in this paper.

Considering that different neural network models have different forecasting advantages, Elman, LSTM, and CNN-LSTM are used to forecast the low-frequency, intermediate-frequency, and high-frequency components, respectively, which can give full play to the complementary advantages of each neural network model.

The hybrid model proposed in this paper considers the error factors, realizes the error correction, and further improves the forecasting performance.

2 Methods

2.1 EEMD method

EEMD is an improvement based on empirical mode decomposition (EMD). By adding white noise with uniform spectrum distribution to the signal to be analyzed, the signals of different time scales can be automatically separated into corresponding reference scales, thus overcoming the shortcomings of mode mixing in the EMD method [25]. The specific decomposition steps based on EEMD are shown as follows:

Step 1: the stochastic Gaussian white noise series n_i (t) is added to the original series x (t), thus obtaining $x_{i} (t) = x (t) + n_{i} (t)$ (1)

Step 2: the series x_i (t) is decomposed into multiple IMFs and a residual component (R) utilizing EMD, that is, $x_{i} (t) = \sum_{j = 1}^{n} {IMF}_{i, j} (t) + R_{i} (t)$ (2) where IMF_i,j (t), R_i (t), and n denote the j^th IMF component in the i^th test, the residual component in the i^th test, and the total number of IMFs after decomposition, respectively.

Step 3: different noise series are added each time to repeat the aforementioned Steps 1 and 2. In this way, M groups of different IMF components and a residual component R are attained.

Step 4: the mean of M groups of IMF components and the residual component R is taken as the final result of EEMD, that is, ${IMF}_{j} (t) = \frac{1}{M} \sum_{i = 1}^{M} {IMF}_{i, j} (t), R = \frac{1}{M} \sum_{i = 1}^{M} R_{i} (t)$ (3)
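The ensemble averaging in Steps 1 to 4 can be sketched as follows. This is only an illustration under stated assumptions: `decompose` is a hypothetical stand-in for an EMD routine (EMD itself is not implemented here), the noise amplitude rule `noise_std * x.std()` and the number of trials are assumed values, and `toy_split` is a placeholder decomposer used only to make the sketch runnable. A real EMD routine may return a different number of IMFs in each trial, which this sketch does not handle.

```python
import numpy as np


def eemd_average(x, decompose, trials=100, noise_std=0.2, seed=0):
    """Ensemble-average the components of noisy copies of x (Steps 1 to 4).

    `decompose` is a hypothetical stand-in for an EMD routine. It must map a
    1-D series to an array of shape (n_components, len(x)) and return the
    same number of rows on every call.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    scale = noise_std * x.std()            # assumed noise amplitude rule
    total = None
    for _ in range(trials):
        noisy = x + rng.normal(0.0, scale, size=x.size)    # Eq. (1)
        parts = np.asarray(decompose(noisy), dtype=float)  # Eq. (2)
        total = parts if total is None else total + parts
    return total / trials                                  # Eq. (3)


def toy_split(series, width=5):
    """Placeholder decomposer (NOT EMD): fast part plus moving-average part."""
    smooth = np.convolve(series, np.ones(width) / width, mode="same")
    return np.vstack([series - smooth, smooth])


t = np.arange(256)
x = np.sin(2 * np.pi * t / 64) + 0.5 * np.sin(2 * np.pi * t / 8)
components = eemd_average(x, toy_split, trials=400)
```

Because the added noise has zero mean, the averaged components sum back to approximately the original series as the number of trials grows.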

2.2 SE method

The SE method is commonly used to measure the complexity of a time series. The greater the SE, the more complex the time series; the lower the SE, the higher the autocorrelation of the series [31]. The SE of each component after decomposition is calculated. Series with similar SE (i.e., similar complexity) are grouped to reduce the number of forecasting models and improve forecasting performance. The specific algorithm of SE is as follows:

Step 1: it is supposed that there is a time series {x (t) , (t = 1, 2, ⋯ N) }. N and SD refer to the length and standard deviation of the series x (t), respectively.

Step 2: Related parameters of the algorithm are defined, such as the embedding dimension m and the similarity tolerance r. Generally, m and r are separately set as 2 and 0.1 SD to 0.25 SD.

Step 3: the m-dimensional vectors X (1) , X (2) , ⋯ , X (N - m + 1) are reconstructed, in which $X (i) = [x (i), x (i + 1), \dots, x (i + m - 1)]$ (4)

Step 4: D [X (i) , X (j)] is defined as the distance between the vectors X (i) and X (j), which is the maximum absolute difference between corresponding elements, that is, $D [X (i), X (j)] = \max_{k = 0, \dots, m - 1} | x (i + k) - x (j + k) |, i \neq j$ (5)

Step 5: $B_{i}^{m} (r)$ is defined as the ratio of the number of vectors X (j) satisfying D [X (i) , X (j)] < r to N - m, that is, $B_{i}^{m} (r) = \frac{1}{N - m} \operatorname{num} \{ D [X (i), X (j)] < r \}$ (6)

Step 6: the mean of $B_{i}^{m} (r)$ is calculated and recorded as B^m (r), that is, $B^{m} (r) = \frac{1}{N - m + 1} \sum_{i = 1}^{N - m + 1} B_{i}^{m} (r)$ (7)

Step 7: assuming k = m + 1, Steps 3 to 6 are repeated to determine the values of $A_{i}^{k} (r)$ and A^k (r). $A^{k} (r) = \frac{1}{N - k + 1} \sum_{i = 1}^{N - k + 1} A_{i}^{k} (r)$ (8)

Step 8: the SE is calculated according to the following equation: $SE (m, r) = \lim_{N \to \infty} \{ - \ln [\frac{A^{k} (r)}{B^{m} (r)}] \}$ (9)

Because N is finite in practical applications, the SE is estimated as follows: $SE (m, r, N) = - \ln [A^{k} (r) / B^{m} (r)]$ (10)
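Steps 1 to 8 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `sample_entropy`, the default r = 0.2 SD, and the use of the same N - m templates for both vector lengths (a common convention that lets the normalizing constants cancel, and that differs slightly from the N - m + 1 vectors of Step 3) are assumptions.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Estimate SE(m, r, N) with r = r_factor * SD, in the spirit of Eq. (10)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * x.std()

    def count_matches(dim):
        # Templates of length `dim`; the same n - m start points for both dims.
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        # Maximum absolute difference between every pair of templates, Eq. (5).
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count pairs closer than r, excluding self-matches (i == j).
        return np.sum(dist < r) - len(templates)

    b = count_matches(m)        # matches of length m
    a = count_matches(m + 1)    # matches of length k = m + 1
    return -np.log(a / b)


rng = np.random.default_rng(0)
se_noise = sample_entropy(rng.standard_normal(500))
se_sine = sample_entropy(np.sin(2 * np.pi * np.arange(500) / 50))
```

In this toy comparison the regular sine wave should receive a lower SE than white noise, which matches the interpretation that a greater SE indicates a more complex series.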

2.3 SSA method

SSA proposed by Broomhead and King [32] is a proper time series analysis method, which can extract the trend, oscillation, periodic, quasi-periodic, and noise components in original data. SSA projects the data space into sub-spaces with different features and characterizes the properties thereof using singular values, and reconstructs the time series by truncating the singular values [33]. The method mainly involves data decomposition and data reconstruction, in which the former includes embedding and singular value decomposition (SVD), and the latter involves grouping and diagonal averaging.

(a) Decomposition

Step 1: embedding; the original time series X = (X₁, ⋯ , X_N) is transformed into an L × K trajectory matrix Y = (Y₁, ⋯ , Y_K), in which Y_i = (X_i, ⋯ , X_i+L-1)^T, L ∈ [2, N], and K = N - L + 1. The matrix Y is expressed as follows: $Y = \begin{bmatrix} X_{1} & X_{2} & \dots & X_{K} \\ X_{2} & X_{3} & \dots & X_{K + 1} \\ \vdots & \vdots & \ddots & \vdots \\ X_{L} & X_{L + 1} & \dots & X_{N} \end{bmatrix}$ (11)

Step 2: SVD; the eigentriples (λ_i, U_i, V_i) of Y, sorted in descending order of λ_i, can be obtained through SVD, where λ_i is the i^th eigenvalue of the matrix YY^T (so that √λ_i is the corresponding singular value of Y), and U_i and V_i are the left and right singular vectors, respectively. Thus, the matrix Y can be re-written as: $Y = Y_{1} + Y_{2} + \dots + Y_{d}$ (12) $Y_{i} = \sqrt{λ_{i}} U_{i} V_{i}^{T}$ (13)

where d = rank (Y).

(b) Reconstruction

Step 1: grouping; m components are selected from the d eigentriples, with indices I ={ I₁, I₂, ⋯ I_m }. The selected part of Y can therefore be expressed through m matrices, that is, Y_I ={ Y_I1, Y_I2, ⋯ , Y_Im }. Y_I stands for the trend component of the original time series, and the other (d - m) components are regarded as noise.

Step 2: diagonal averaging; the matrices {Y_I1, Y_I2, ⋯ , Y_Im } are transformed into the series {X_I1, X_I2, ⋯ , X_Im } through the Hankelisation procedure H. The original time series can be further described as follows: $\begin{aligned} X &= H (Y_{I 1}) + \dots + H (Y_{Im}) + H (ε) \\ &= X_{I 1} + \dots + X_{Im} + H (ε) \\ &= X_{trend} + X_{noise} \end{aligned}$ (14)

where, X_trend = X_I1 + ⋯ + X_Im and X_noise = H (ε), which are separately called trend and noise components.
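The decomposition and reconstruction stages can be sketched with NumPy as follows. This is a minimal sketch, not the authors' code: the function name `ssa_trend`, the window length L = 40, and the choice to keep the two leading components are illustrative assumptions, since the source does not state the values it used.

```python
import numpy as np


def ssa_trend(x, window, n_components):
    """Return the SSA trend built from the leading `n_components` eigentriples."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # Embedding, Eq. (11): column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # SVD, Eqs. (12)-(13): s[i] plays the role of sqrt(lambda_i).
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: keep the leading components as the trend part.
    kept = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging: average all entries with equal row + column index.
    total = np.zeros(n)
    count = np.zeros(n)
    for i in range(window):
        total[i:i + k] += kept[i]
        count[i:i + k] += 1
    return total / count


rng = np.random.default_rng(1)
t = np.arange(400)
clean = np.sin(2 * np.pi * t / 40)
noisy = clean + 0.3 * rng.standard_normal(400)
trend = ssa_trend(noisy, window=40, n_components=2)   # X_trend
noise = noisy - trend                                  # X_noise
```

Keeping every component reproduces the input series exactly, so the denoising effect comes entirely from discarding the trailing eigentriples in the grouping step.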

2.4 Elman neural network

Elman neural network is a typical dynamic recurrent neural network [34]. It realizes the memory function by adding a connected layer as a one-step delay operator in the hidden layer of the basic structure of the BP network. In this way, the system can adapt to the time-varying characteristics and enhance the global stability of the network. Furthermore, it has stronger computing power than a feed-forward neural network, so it is very suitable for time series forecasting [21].

The structure of the Elman neural network is mainly composed of the input layer, hidden layer, connection layer, and output layer, as shown in Fig. 2. The unit of the input layer only plays the role of signal transmission, while the unit of the output layer plays the role of weighting.

Fig. 2

The structure of the Elman neural network.

The hidden layer unit has two types of excitation functions, linear and nonlinear, and the commonly used excitation function is the nonlinear sigmoid function. The connection layer has a delay unit, which can be used to memorize the output value of the last time of the hidden layer unit. The next time, the output value and the input of the network are used as the input of the hidden layer so that the network has the dynamic memory function. The addition of an internal feedback network improves the network’s ability to process dynamic information to achieve dynamic modeling.
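The role of the connection layer can be illustrated with a short forward pass. The sketch below assumes a sigmoid hidden layer and a linear output layer; the layer sizes, the random weights, and the name `elman_forward` are illustrative and not taken from the source.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elman_forward(inputs, w_in, w_ctx, w_out, b_hidden, b_out):
    """Run an Elman network over a sequence of input vectors."""
    context = np.zeros(w_ctx.shape[0])       # connection layer starts at zero
    outputs = []
    for x_t in inputs:
        # Hidden layer sees the current input and the delayed hidden output.
        hidden = sigmoid(w_in @ x_t + w_ctx @ context + b_hidden)
        outputs.append(w_out @ hidden + b_out)   # linear output layer
        context = hidden                          # one-step delay
    return np.array(outputs)


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 5, 8, 1
weights = dict(
    w_in=rng.normal(size=(n_hidden, n_in)),
    w_ctx=rng.normal(size=(n_hidden, n_hidden)),
    w_out=rng.normal(size=(n_out, n_hidden)),
    b_hidden=np.zeros(n_hidden),
    b_out=np.zeros(n_out),
)
# Feed the same input vector three times in a row.
same_input = np.tile(rng.normal(size=n_in), (3, 1))
outputs = elman_forward(same_input, **weights)
```

Although the input is identical at every step, the outputs differ because the connection layer carries the previous hidden state forward, which is the dynamic memory described above.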

2.5 LSTM method

LSTM consists of an input layer, a cyclic hidden layer, and an output layer. Its hidden layer is a memory unit, not a neuron node [35]. The memory cell c_t is the key to LSTM, on which information is transmitted. There are three gates in the memory unit: the input gate, output gate, and forget gate, which are used to protect and control the cell state. A memory unit of LSTM is shown in Fig. 3. The input gate i_t determines how many of the inputs x_t of the network are stored in memory cell c_t at the current time. The forget gate f_t determines how many of the previous moment memory cells c_t-1 are retained in the current moment c_t. The output gate o_t controls how many memory cells c_t are exported to the current output value h_t of the LSTM [36].

Fig. 3

The structure of the LSTM.

The model’s input is defined as x = (x₁, x₂, . . . , x_T), the output is defined as y = (y₁, y₂, . . . , y_T), and the vector sequence of the hidden layer is h = (h₁, h₂, . . . , h_T), where T is the prediction period. The working principle of the memory unit is as follows: $i_{t} = σ (W_{xi} x_{t} + W_{hi} h_{t - 1} + W_{ci} c_{t - 1} + b_{i})$ (15) $f_{t} = σ (W_{xf} x_{t} + W_{hf} h_{t - 1} + W_{cf} c_{t - 1} + b_{f})$ (16) $c_{t} = f_{t} * c_{t - 1} + i_{t} * g (W_{xc} x_{t} + W_{hc} h_{t - 1} + b_{c})$ (17) $o_{t} = σ (W_{xo} x_{t} + W_{ho} h_{t - 1} + W_{co} c_{t} + b_{o})$ (18) $h_{t} = o_{t} * h (c_{t})$ (19) where c_t is the state of the memory cell at time t; h_t is the output of the LSTM unit at time t; W and b represent the corresponding weight matrices and bias vectors, respectively; * represents the element-wise product of two vectors; σ is the standard logistic sigmoid function; and g (x) and h (x) are rescaled logistic sigmoid functions with ranges of [-2, 2] and [-1, 1], respectively.
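A single update of the memory unit, following Eqs. (15) to (19), can be written out directly. The sketch below is illustrative only: the concrete forms g(x) = 4σ(x) - 2 and h(x) = 2σ(x) - 1 are assumptions chosen to match the stated ranges, and the helper `init_params`, the hidden size, and the sample input values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def g(z):
    return 4.0 * sigmoid(z) - 2.0    # assumed form, range (-2, 2)


def h(z):
    return 2.0 * sigmoid(z) - 1.0    # assumed form, range (-1, 1)


def lstm_step(x_t, h_prev, c_prev, p):
    """One memory-unit update; `*` below is the element-wise product."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["W_ci"] @ c_prev + p["b_i"])  # Eq. (15)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["W_cf"] @ c_prev + p["b_f"])  # Eq. (16)
    c_t = f_t * c_prev + i_t * g(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])        # Eq. (17)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["W_co"] @ c_t + p["b_o"])     # Eq. (18)
    h_t = o_t * h(c_t)                                                                    # Eq. (19)
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Hypothetical helper: small random weights and zero biases."""
    p = {}
    for gate in "ifco":
        p[f"W_x{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"W_h{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{gate}"] = np.zeros(n_hidden)
    for gate in "ifo":   # cell-to-gate weights W_ci, W_cf, W_co
        p[f"W_c{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
    return p


rng = np.random.default_rng(0)
params = init_params(n_in=1, n_hidden=4, rng=rng)
h_t, c_t = np.zeros(4), np.zeros(4)
for value in [7.9, 8.1, 8.4, 8.0, 7.6]:   # made-up wind speeds
    h_t, c_t = lstm_step(np.array([value]), h_t, c_t, params)
```

Note that the output gate in Eq. (18) uses the updated cell state c_t, whereas the input and forget gates use c_{t-1}; the sketch keeps that ordering.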

2.6 CNN-LSTM method

As one of the dominant models for deep learning, CNN is a feed-forward neural network, including convolution calculation and having a deep structure, which can extract the inherent abstract features and hidden high-level invariant structures from the data. Therefore, CNN is suitable for processing the wind speed series with non-linear and non-stationary characteristics [37]. The typical network structure for CNN includes many different layers, as shown in Fig. 4. It is mainly composed of an input layer, convolutional layers, pooling layers (also called sub-sampling layers), and a fully connected layer.

Fig. 4

The structure of the CNN.

The hybrid CNN-LSTM network combines CNN with LSTM and shares the advantages of the two networks. The CNN can extract the deep features of data and shows great potential in processing time series; the LSTM can memorize historical information and allow the persistence of information, making it suitable for processing problems associated with time series data [18]. Thus, the CNN-LSTM is used to forecast the high-frequency sub-layers of wind speed series (Fig. 5). At first, high-dimensional features of the high-frequency components of wind speeds are extracted using the CNN; after that, the time series is forecast using the LSTM based on the extracted features. The structures of the CNN and LSTM are the same as those described in the previous sections.

Fig. 5

The structure of the CNN-LSTM.

2.7 The proposed method for wind speed forecasting

Taking Data 1 as an example, the framework of the proposed model, E-S-CLE-S-E, is illustrated in Fig. 6. The main steps of this hybrid model can be summarized as follows.

Fig. 6

The framework of the proposed hybrid model (taking Data 1 as a case).

The original wind speed series is decomposed into multiple IMFs and a residual component (R) using EEMD.

Each component after EEMD decomposition is calculated based on SE. Then, the series with similar SE are grouped into three groups: high-frequency components, intermediate-frequency components, and low-frequency components.

High-frequency components denoised by SSA, intermediate-frequency components, and low-frequency components are forecast by CNN-LSTM, LSTM, and Elman, respectively. The forecasting results of the three groups are then summed to give the initial forecasting result.

The error sequence, after being denoised by SSA, is forecast by CNN-LSTM.

The final wind speed forecasting result is determined by combining the error forecasting result with the initial forecasting result.
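The way the five steps above fit together can be sketched as plain function composition. The sketch is deliberately schematic: the forecasters and the denoiser are passed in as callables, and the persistence forecaster and identity denoiser used in the demonstration are placeholders, not the CNN-LSTM, LSTM, Elman, or SSA components of the proposed model. All numbers are made up.

```python
import numpy as np


def initial_forecast(sub_high, sub_mid, sub_low, denoise, f_high, f_mid, f_low):
    """Forecast each frequency group and sum the three results."""
    return f_high(denoise(sub_high)) + f_mid(sub_mid) + f_low(sub_low)


def corrected_forecast(initial, past_actual, past_initial, denoise, f_error):
    """Forecast the denoised error sequence and add it to the initial result."""
    errors = np.asarray(past_actual, dtype=float) - np.asarray(past_initial, dtype=float)
    return initial + f_error(denoise(errors))


def persistence(series):
    """Placeholder forecaster: repeat the last observed value."""
    return float(np.asarray(series, dtype=float)[-1])


def identity(series):
    """Placeholder denoiser: return the series unchanged."""
    return series


initial = initial_forecast(
    sub_high=[0.2, -0.1, 0.3], sub_mid=[1.0, 1.2, 1.1], sub_low=[6.0, 6.1, 6.2],
    denoise=identity, f_high=persistence, f_mid=persistence, f_low=persistence,
)
final = corrected_forecast(
    initial, past_actual=[7.0, 7.4], past_initial=[7.2, 7.3],
    denoise=identity, f_error=persistence,
)
```

With these placeholder values the initial forecast is 0.3 + 1.1 + 6.2 = 7.6 and the last observed error is 0.1, so the corrected forecast is 7.7.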

3 Data pre-processing

3.1 Data source

Liaoning Province, located in north-eastern China, is rich in wind resources. The data (wind speed) used in the study are SCADA data from a wind farm in Liaoning Province, covering January 2018 to December 2018 with a time resolution of 15 min. Missing data were supplemented according to the following two cases: (a) if only a small amount of data is missing, the missing values are filled with the average of the values at the adjacent moments before and after; (b) if a large amount of data is missing, the similar day principle is used to supplement the missing data. Besides, if the value at a given moment is unreasonable, such as a zero or negative value, it is replaced by the value at the previous moment.
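Case (a) and the replacement of unreasonable values can be sketched as follows. This is an assumption-laden illustration: missing readings are assumed to be marked as NaN, only isolated gaps with valid neighbours are filled, the order of the two passes is a choice made here, and case (b), the similar day principle, is not covered. The function name `clean_wind_speed` is hypothetical.

```python
import numpy as np


def clean_wind_speed(values):
    """Fill isolated gaps and replace non-positive readings."""
    v = np.asarray(values, dtype=float).copy()
    # Unreasonable readings (zero or negative): use the previous value.
    for t in range(1, v.size):
        if not np.isnan(v[t]) and v[t] <= 0:
            v[t] = v[t - 1]
    # Isolated missing readings: average of the two neighbouring values.
    for t in range(1, v.size - 1):
        if np.isnan(v[t]) and not np.isnan(v[t - 1]) and not np.isnan(v[t + 1]):
            v[t] = 0.5 * (v[t - 1] + v[t + 1])
    return v


raw = [5.0, np.nan, 7.0, 0.0, 6.0, -1.0]   # made-up readings
cleaned = clean_wind_speed(raw)
```

For the made-up readings above, the gap is filled with 6.0, the zero is replaced by 7.0, and the negative value by 6.0.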

Ten days of wind speed data from each of January, April, and July 2018 were selected to build three datasets. Figure 7 shows the three datasets of the wind speed series. Each dataset contains 960 samples, of which samples 1 to 840 are used as the training set and samples 841 to 960 as the test set. Table 1 shows the statistical information about the three datasets of wind speed series data.

Fig. 7

Three sets of wind speed time series at 15 min intervals.

Table 1

Statistical information of wind speed series data

Data	Min	Max	Mean	Standard Deviation
Data1	0.2	17.73	7.93	3.36
Data2	0.24	23.79	8.49	4.11
Data3	0.06	15.59	5.16	2.95

3.2 Decomposition of a wind speed series based on the EEMD method

The wind speed in Data 1 is decomposed based on the EEMD method to attain eight IMF components (IMF 1 to IMF 8) and a residual component R, as shown in Fig. 8. It can be seen from Fig. 8 that each component series has different frequencies. In this way, the decomposed components can highlight the local features of the original wind speed series so that the periodic term, stochastic term, and trend term of the original series can be observed more clearly.

Fig. 8

Decomposition results of wind speed series based on EEMD method.

3.3 Grouping of decomposed components based on SE method

According to the SE method, the component series decomposed by the EEMD method are divided into high-frequency, intermediate-frequency, and low-frequency components to improve the forecasting efficiency and accuracy. Figure 9 shows the SE of each component.

Fig. 9

SE of each component.

As shown in Fig. 9, the SEs of IMF1 and IMF2 are similarly large, so IMF1 and IMF2 belong to the high-frequency components group; IMF3, IMF4, and IMF5 exhibit approximately equal SEs and are assigned to the intermediate-frequency components group; IMF6, IMF7, IMF8, and R show approximately similar SEs and form the low-frequency components group. Similar SEs indicate that these component series have a similar complexity and trend, and thus they can be grouped and forecast using the same model. Therefore, the nine components decomposed through the EEMD are divided into three groups: Sub1 (IMF1, IMF2), Sub2 (IMF3, IMF4, IMF5), and Sub3 (IMF6, IMF7, IMF8, R). By calculating the SE values of the nine components and combining them into three groups, the computational effort of the prediction model can be reduced, and the forecasting performance can be improved.

4 Case study

4.1 Evaluation indices of forecasting performance

To verify the forecasting accuracy and stability of the proposed model E-S-CLE-S-E, six evaluation indices (i.e., mean absolute error (MAE), mean absolute percentage error (MAPE), root-mean-square error (RMSE), promoting percentages of the MAE (P_MAE), promoting percentages of the MAPE (P_MAPE), and promoting percentages of the RMSE (P_RMSE)) are used to evaluate the forecasting performance. The six evaluation indices are defined as follows: $MAE = \frac{1}{n} \sum_{t = 1}^{n} | x (t) - \hat{x} (t) |$ (20) $MAPE = \frac{1}{n} \sum_{t = 1}^{n} | \frac{x (t) - \hat{x} (t)}{x (t)} | \times 100$ (21) $RMSE = {[\frac{1}{n} \sum_{t = 1}^{n} (x (t) - \hat{x} (t))^{2}]}^{\frac{1}{2}}$ (22) $P_{MAE} = | ({MAE}_{1} - {MAE}_{2}) / {MAE}_{1} |$ (23) $P_{MAPE} = | ({MAPE}_{1} - {MAPE}_{2}) / {MAPE}_{1} |$ (24) $P_{RMSE} = | ({RMSE}_{1} - {RMSE}_{2}) / {RMSE}_{1} |$ (25) where x (t) and $\hat{x} (t)$ refer to the actual value and forecast value of a wind speed series, and n is the number of forecast samples.
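The indices in Eqs. (20) to (25) translate directly into NumPy. The sketch below is illustrative: the function names are chosen here, the promoting percentage is returned as a fraction exactly as Eqs. (23) to (25) are written (without a factor of 100), and the two-point series is made up for the demonstration.

```python
import numpy as np


def mae(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))                      # Eq. (20)


def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)   # Eq. (21)


def rmse(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))              # Eq. (22)


def promoting(index_1, index_2):
    """Relative change between two values of the same index, Eqs. (23)-(25)."""
    return abs((index_1 - index_2) / index_1)


actual = [2.0, 4.0]      # made-up values
forecast = [1.0, 5.0]
scores = (mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
# Example with two 1-step MAE values, 0.61 m/s and 0.42 m/s.
p_mae = promoting(0.61, 0.42)
```

For the made-up series the three errors are 1.0, 37.5, and 1.0, and the relative change between MAE values of 0.61 and 0.42 is about 0.31.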

4.2 Forecasting

This section describes how the hybrid model E-S-CLE-S-E was developed. At first, ten models (BP, GRNN, Elman, LSTM, CNN-LSTM, EEMD-BP, EEMD-GRNN, EEMD-Elman, EEMD-LSTM, and EEMD-CNN-LSTM) were used to perform one-step, two-step, and three-step forecasting on the wind speed series in Data 1 in order to select the best single forecasting model among BP, GRNN, Elman, LSTM, and CNN-LSTM. Among the EEMD-related methods, EEMD-BP means that the high-frequency component sub1, the intermediate-frequency component sub2, and the low-frequency component sub3 were first forecast separately using BP; after that, the three forecasting results were added to obtain the initial forecasting result. The other EEMD-related methods are similar to EEMD-BP. For all models, partial autocorrelation function (PACF) analysis is applied to determine the most relevant input variables. The lags of the wind speed data whose PACF values exceed the 95% confidence level are chosen as the input vector for forecasting. Through PACF analysis, the model’s input is determined to be the wind speeds at the five time steps preceding the one to be forecast.
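Turning a series into input and target pairs with five lagged values can be sketched as follows. The source does not state whether multi-step forecasts are produced directly or recursively, so the direct strategy used here (one target located `horizon` steps after the last input) is an assumption, as is the function name `make_samples`. An index series stands in for real wind speeds.

```python
import numpy as np


def make_samples(series, lags=5, horizon=1):
    """Build (inputs, targets): `lags` past values predict the value
    `horizon` steps after the last input (direct strategy, assumed)."""
    series = np.asarray(series, dtype=float)
    n = series.size - lags - horizon + 1
    inputs = np.stack([series[i:i + lags] for i in range(n)])
    start = lags + horizon - 1
    targets = series[start:start + n]
    return inputs, targets


series = np.arange(960)   # stand-in for one 960-sample dataset
x1, y1 = make_samples(series, lags=5, horizon=1)
x3, y3 = make_samples(series, lags=5, horizon=3)
```

With 960 samples and five lags, one-step forecasting yields 955 input and target pairs, and three-step forecasting yields 953.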

In this paper, LSTM and CNN-LSTM are built in Python 3.6 with the Keras deep learning package. The constructed CNN consists of two convolution layers and one pooling layer, and the two convolution layers have 64 and 128 convolution kernels, respectively. Depending on the convolution kernel size, the CNN can extract many useful features from the input wind speed series; that is, a smaller or larger kernel size lets the CNN extract features reflecting a narrower or broader period of multivariate time series. The constructed LSTM uses the ReLU function as the activation function of the hidden layer. Dropout (random deactivation) is used in each layer of the LSTM to prevent over-fitting of the proposed model. During training, Adam is selected as the optimization algorithm, and the mean squared error (MSE) is used as the loss function. Furthermore, each method is run ten times and the results are averaged to reduce the influence of random factors.

Table 2 lists the forecasting results of the ten models. The two deep learning models, LSTM and CNN-LSTM, both deliver better forecasting performance than the shallow learning models (BP, GRNN, and Elman) for the wind speed series with and without EEMD decomposition. After decomposing the wind speed series using EEMD, the forecasting accuracy of every model except GRNN is improved. The forecasting performance of EEMD-GRNN is inferior to that of GRNN because EEMD-GRNN divides the wind speed series into three different frequency components, sub1, sub2, and sub3, and produces a large error when forecasting the high-frequency component sub1: the noise in the high-frequency component affects the forecasting results. SSA can be applied to denoise the high-frequency component. Figure 10 compares the sub1 component in Data1 before and after denoising with the SSA method. As seen in Fig. 10, SSA can extract the trend component from the fluctuating data and achieve a favorable denoising effect.

Table 2
Analysis of the prediction results for the experimental Data1

Model	1-step			2-step			3-step
	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE
BP	0.95	7.55	1.21	1.05	8.25	1.35	1.15	9.12	1.47
GRNN	1.16	8.67	1.46	1.19	9.07	1.52	1.39	10.62	1.76
Elman	0.88	6.86	1.14	0.99	7.57	1.24	1.09	8.32	1.38
LSTM	0.84	6.65	1.12	0.93	7.30	1.19	1.01	7.61	1.30
CNN-LSTM	0.81	6.39	1.09	0.94	7.26	1.20	0.98	7.55	1.22
E-BP	0.78	6.03	0.97	0.80	6.07	0.98	0.80	6.24	0.94
E-GRNN	1.41	10.64	1.59	1.68	12.56	1.89	1.73	12.87	1.96
E-Elman	0.71	5.55	0.89	0.75	5.72	0.92	0.76	5.95	0.97
E-LSTM	0.67	5.31	0.85	0.70	5.45	0.86	0.72	5.69	0.89
E-CNN-LSTM	0.67	5.24	0.84	0.69	5.43	0.85	0.72	5.62	0.91

Fig. 10

The comparison of the sub1 component without and with being de-noised using SSA.

Besides, it can be found that different forecasting models are suited to forecasting different frequency components. For example, the low-frequency components have strong regularity and slight fluctuation and are easy to forecast. Therefore, it is unnecessary to select a complex deep model for them, and Elman can meet the demand. The LSTM and CNN-LSTM models exhibit the best forecasting performance when forecasting the intermediate-frequency and high-frequency components, respectively. Based on these two points, the E-CLE model (after decomposition by EEMD, CNN-LSTM, LSTM, and Elman are used to forecast sub1, sub2, and sub3, respectively) and the E-S-CLE model (sub1 is forecast using CNN-LSTM after noise reduction by SSA; sub2 and sub3 are forecast using LSTM and Elman, respectively) were further proposed to forecast the wind speed series in Data1. The forecasting results are listed in Table 3. As shown in Tables 2 and 3, the forecasting accuracy of the E-CLE and E-S-CLE models is improved relative to the models listed in Table 2. Moreover, the forecasting performance of E-S-CLE is superior to that of E-CLE.

Table 3

Analysis of the prediction results for the experimental Data1

Model	1-step			2-step			3-step
	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE
E-CLE	0.61	4.78	0.78	0.62	4.87	0.77	0.70	5.46	0.88
E-S-CLE	0.42	3.30	0.54	0.54	4.17	0.68	0.55	4.31	0.71

For 1-step forecasting, the MAE values of E-S-CLE are 0.29, 0.25, 0.25, and 0.19 m/s lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 2.25, 2.01, 1.94, and 1.48 percentage points lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.35, 0.31, 0.30, and 0.24 lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively.

For 2-step forecasting, the MAE values of E-S-CLE are 0.21, 0.16, 0.15, and 0.08 m/s lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 1.55, 1.28, 1.26, and 0.70 percentage points lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.24, 0.18, 0.17, and 0.09 lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively.

For 3-step forecasting, the MAE values of E-S-CLE are 0.21, 0.17, 0.17, and 0.15 m/s lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 1.64, 1.38, 1.31, and 1.15 percentage points lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.26, 0.18, 0.20, and 0.17 lower than those of E-Elman, E-LSTM, E-CNN-LSTM, and E-CLE, respectively.

Although the above research has established a suitable forecasting model for wind speed series, the related error sequence still contains valuable information that can be used to further improve the forecasting performance [38]. The error sequence $\{\varepsilon(t) : t = 1, 2, \cdots, n\}$ is the difference between the actual value $x(t)$ and the forecast value $\hat{x}(t)$ of wind speeds, that is,

$\varepsilon(t) = x(t) - \hat{x}(t)$ (26)

To correct the initial forecasting value, an EC process was introduced. Generally, the error sequence $\varepsilon(t)$ is regarded as stochastic, and the error $\{\hat{\varepsilon}(t) : t = 2, 3, \cdots, n + 1\}$ is forecast from $\{\varepsilon(t) : t = 1, 2, \cdots, n\}$, that is, the error at a given time is used to forecast the error at the next time-step. Because the error sequence is non-stationary, high-frequency, and chaotic, it is challenging to forecast. Hence, it is beneficial to reduce the noise of the error sequence using SSA before forecasting.
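The EC idea can be sketched as follows. This is only an illustration under stated assumptions: the study forecasts the error sequence with CNN-LSTM, whereas the sketch swaps in a plain least-squares autoregression as a lightweight stand-in, and it takes the corrected forecast to be the initial forecast plus the forecast error, as implied by Eq. (26). The function names and the `lags` parameter are hypothetical.

```python
import numpy as np


def fit_error_model(errors, lags):
    """Fit eps(t) ~ w . [eps(t-lags), ..., eps(t-1)] + b by least squares.

    Stand-in for the CNN-LSTM error forecaster described in the text.
    """
    e = np.asarray(errors, dtype=float)
    rows = np.array([e[i:i + lags] for i in range(e.size - lags)])
    design = np.column_stack([rows, np.ones(len(rows))])  # last column: intercept
    coef, *_ = np.linalg.lstsq(design, e[lags:], rcond=None)
    return coef


def predict_next_error(errors, coef):
    """One-step-ahead forecast of the error from the most recent lags."""
    lags = coef.size - 1
    recent = np.asarray(errors, dtype=float)[-lags:]
    return float(recent @ coef[:-1] + coef[-1])


def corrected_forecast(actual, initial_forecast, next_initial_forecast, lags=3):
    """Add the forecast error to the next initial forecast."""
    # Eq. (26): eps(t) = x(t) - x_hat(t)
    errors = np.asarray(actual, dtype=float) - np.asarray(initial_forecast, dtype=float)
    coef = fit_error_model(errors, lags)
    return next_initial_forecast + predict_next_error(errors, coef)
```

In the E-S-CLE-S-E variant, the `errors` array would first be de-noised with SSA before the error model is fitted.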

The E-S-CLE-E and E-S-CLE-S-E models were further proposed based on EC. The difference between the two models is whether the error sequence is processed by SSA before it is forecast (E-S-CLE-S-E) or not (E-S-CLE-E). Table 4 lists the forecasting results of the two models for Data1. As shown in Tables 3 and 4, during 1-step forecasting, the MAE values of E-S-CLE-S-E are 0.38, 0.19, and 0.17 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 3.05, 1.57, and 1.44 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.48, 0.24, and 0.20 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.

Table 4

Analysis of the forecasting results for the experimental Data1

Model	1-step			2-step			3-step
	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE
E-S-CLE-E	0.40	3.17	0.50	0.68	5.26	0.82	0.75	5.80	0.96
E-S-CLE-S-E	0.23	1.73	0.30	0.42	3.26	0.57	0.51	3.91	0.66

During 2-step forecasting, the MAE values of E-S-CLE-S-E are 0.20, 0.12, and 0.26 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 1.61, 0.91, and 2.00 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.20, 0.11, and 0.25 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.

During 3-step forecasting, the MAE values of E-S-CLE-S-E are 0.19, 0.04, and 0.24 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 1.55, 0.40, and 1.89 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.22, 0.05, and 0.30 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.

To further illustrate the forecasting performance of different models, Figs. 11 to 13 compare the actual wind speeds with the values forecast by the various models for Data 1 during 1-step, 2-step, and 3-step forecasting, respectively. As shown in Figs. 11 to 13, the forecasting value determined using the E-S-CLE-S-E model is closest to the actual value and has the lowest MAE, MAPE, and RMSE. This result reveals that the model delivers a better forecasting performance than the other models. On the other hand, it can be seen from Figs. 12 and 13 that the forecasting performance of E-S-CLE-E is inferior to that of E-S-CLE during 2-step and 3-step forecasting.

Fig. 11

The comparison between the actual value of wind speeds and the forecast values in 1-step forecasting.

Fig. 12

The comparison between the actual value of wind speeds and the forecast values in 2-step forecasting.

Fig. 13

The comparison between the actual value of wind speeds and the forecast values in 3-step forecasting.

The weaker performance of E-S-CLE-E at 2 and 3 steps indicates that the error sequence becomes more unstable, and therefore harder to forecast, as the number of forecasting steps increases. It also validates the necessity of performing noise reduction on error sequences using SSA.

4.3 Further study

To verify the universality and effectiveness of the proposed model, the wind speeds in Data 2 and Data 3 are also subjected to 1-step, 2-step, and 3-step forecasting using the aforementioned 14 forecasting models. Tables 5 and 6 show the error results of wind speeds in the two datasets through multi-step forecasting with different forecasting models.

Table 5

The error evaluation results of different models for Data 2

Model	1-step			2-step			3-step
	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE
BP	0.86	17.16	1.10	0.93	18.30	1.18	0.97	19.86	1.24
GRNN	0.91	17.64	1.15	0.99	19.44	1.25	1.02	20.49	1.25
Elman	0.80	16.54	1.00	0.88	17.01	1.10	0.89	18.21	1.15
LSTM	0.77	14.88	0.97	0.84	16.59	1.09	0.87	17.65	1.11
CNN-LSTM	0.76	14.71	0.96	0.79	15.97	1.05	0.86	17.16	1.10
E-BP	0.65	12.39	0.85	0.69	13.33	0.87	0.78	14.64	0.96
E-GRNN	0.65	12.57	0.81	0.66	13.53	0.88	0.83	15.96	1.03
E-Elman	0.65	11.87	0.79	0.66	12.14	0.85	0.72	13.66	0.92
E-LSTM	0.63	11.59	0.78	0.65	11.83	0.84	0.67	12.62	0.84
E-CNN-LSTM	0.62	11.61	0.77	0.65	11.89	0.80	0.65	12.35	0.81
E-CLE	0.59	11.21	0.75	0.65	11.72	0.79	0.64	12.15	0.77
E-S-CLE	0.45	8.29	0.57	0.53	9.42	0.66	0.55	9.65	0.70
E-S-CLE-E	0.41	7.85	0.53	0.76	13.37	1.01	0.71	13.20	0.96
E-S-CLE-S-E	0.31	5.76	0.40	0.37	6.77	0.49	0.43	7.96	0.56

Table 6

The error evaluation results of different models for Data 3

Model	1-step			2-step			3-step
	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE	MAE (m/s)	MAPE (%)	RMSE
BP	0.67	17.90	0.87	0.79	19.58	1.02	0.93	20.84	1.23
GRNN	0.80	19.45	1.04	0.89	22.20	1.15	1.09	24.36	1.48
Elman	0.62	16.41	0.78	0.73	18.57	0.93	0.89	19.69	1.07
LSTM	0.60	15.74	0.77	0.71	18.06	0.91	0.82	19.43	1.03
CNN-LSTM	0.59	15.75	0.75	0.70	17.29	0.91	0.81	18.66	1.04
E-BP	0.53	14.09	0.66	0.56	14.99	0.69	0.67	15.24	0.84
E-GRNN	0.68	16.42	0.86	0.89	19.53	1.11	0.74	20.48	0.88
E-Elman	0.44	11.69	0.59	0.52	14.15	0.64	0.62	14.40	0.77
E-LSTM	0.42	11.21	0.56	0.53	12.91	0.69	0.54	13.49	0.67
E-CNN-LSTM	0.46	11.52	0.59	0.47	12.37	0.62	0.55	13.08	0.70
E-CLE	0.42	11.15	0.55	0.45	11.35	0.59	0.52	12.43	0.69
E-S-CLE	0.30	8.09	0.39	0.41	10.99	0.53	0.40	11.44	0.52
E-S-CLE-E	0.29	7.86	0.37	0.57	14.48	0.74	0.52	13.15	0.69
E-S-CLE-S-E	0.24	5.41	0.33	0.32	8.42	0.40	0.36	8.73	0.49

According to Tables 5 and 6, it can be found that:

The models exhibit broadly the same ranking in forecasting accuracy across the different datasets.

The error generally increases with the number of forecasting steps.

A deep learning model generally shows a better forecasting performance than a traditional non-linear model when separately used to forecast wind speeds.

The forecasting results based on Elman, LSTM, and CNN-LSTM show that forecasting the wind speed series after EEMD decomposition performs better than applying a single model to the original series.

The forecasting error of E-S-CLE is lower than that of E-CLE. This result reveals that the forecasting accuracy can be improved by conducting noise reduction on high-frequency components using SSA.

During 2-step and 3-step forecasting, the forecasting error of E-S-CLE-E is larger than that of E-S-CLE, whereas the forecasting error of E-S-CLE-S-E is smaller than that of E-S-CLE. This implies that noise reduction on the error sequence using SSA is necessary.

The proposed model E-S-CLE-S-E exhibits the best forecasting performance, and the forecasting value attained using the model is closest to the actual value. Moreover, the MAE, MAPE, and RMSE of the model are all the lowest.

To further compare forecasting performances of the proposed model with those of the other 13 models, a comparative analysis was conducted on these models based on P_MAE, P_MAPE, and P_RMSE. Tables 7 and 8 separately show the improvement percentages of the E-S-CLE-S-E model compared with the other 13 models for Data 2 and Data 3. It can be seen that:

The E-S-CLE-S-E model shows good generalization and exhibits high forecasting accuracy and stability in the three datasets.

The forecasting performance of the E-S-CLE-S-E model is significantly superior to those of traditional non-linear models. For example, for Data 2, compared to the BP model, the MAE of the E-S-CLE-S-E model decreases by 63.95%, 60.22%, and 55.67%, respectively; the MAPE of the E-S-CLE-S-E model decreases by 66.43%, 63.01%, and 59.92%, respectively; and the RMSE of the E-S-CLE-S-E model decreases by 63.64%, 58.47%, and 54.84%, respectively.

The forecasting performance of the E-S-CLE-S-E model is remarkably better than that of a single deep learning model. For example, in Data 2, compared to the LSTM model, the MAE of the E-S-CLE-S-E model decreases by 59.74%, 55.95%, and 50.57%, respectively; the MAPE of the E-S-CLE-S-E model decreases by 61.29%, 59.19%, and 54.90%, respectively; and the RMSE of the E-S-CLE-S-E model decreases by 58.76%, 55.05%, and 49.55%, respectively.

The E-S-CLE-S-E model is superior to other models treated by EEMD alone; for example, in Data 2, compared to the E-CLE model, the MAE of the E-S-CLE-S-E model decreases by 47.46%, 43.08%, and 32.81%, respectively; the MAPE of the E-S-CLE-S-E model decreases by 48.62%, 42.24%, and 34.49%, respectively; and the RMSE of the E-S-CLE-S-E model decreases by 46.67%, 37.97%, and 27.27%, respectively.

The E-S-CLE-S-E model returns significantly higher forecasting performance than that of the E-S-CLE model. For example, in Data 2, compared to the E-S-CLE model, the MAE of the E-S-CLE-S-E model decreases by 31.11%, 30.19%, and 21.82%, respectively; the MAPE of the E-S-CLE-S-E model decreases by 30.52%, 28.13%, and 17.51%, respectively; and the RMSE of the E-S-CLE-S-E model decreases by 29.82%, 25.76%, and 20.00%, respectively.

The forecasting performance of the E-S-CLE-S-E model is much better than that of the E-S-CLE-E model. For example, in Data 2, compared to the E-S-CLE-E model, the MAE of the E-S-CLE-S-E model decreases by 24.39%, 51.32%, and 39.44%, respectively; the MAPE of the E-S-CLE-S-E model decreases by 26.62%, 49.36%, and 39.70%, respectively; and the RMSE of the E-S-CLE-S-E model decreases by 24.53%, 51.49%, and 41.67%, respectively.
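The improvement percentages quoted above can be reproduced from the error values in Table 5. The short sketch below assumes that each percentage is the relative error reduction, (error of the comparison model minus error of the proposed model) divided by the error of the comparison model, times 100; this assumed form matches the BP row of Table 7 for 1-step forecasting. The function name is hypothetical.

```python
def improvement_percent(comparison_error, proposed_error):
    """Relative reduction of an error metric, in percent (assumed form)."""
    return (comparison_error - proposed_error) / comparison_error * 100.0


# 1-step errors for Data 2, copied from Table 5.
bp = {"MAE": 0.86, "MAPE": 17.16, "RMSE": 1.10}
proposed = {"MAE": 0.31, "MAPE": 5.76, "RMSE": 0.40}  # E-S-CLE-S-E

gains = {name: round(improvement_percent(bp[name], proposed[name]), 2) for name in bp}
print(gains)  # MAE 63.95, MAPE 66.43, RMSE 63.64, as in the BP row of Table 7
```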

Table 7

Improvement when using the E-S-CLE-S-E model compared with other models for Data 2

Comparison Model	P_MAE(%)			P_MAPE(%)			P_RMSE(%)
	1-step	2-step	3-step	1-step	2-step	3-step	1-step	2-step	3-step
BP	63.95	60.22	55.67	66.43	63.01	59.92	63.64	58.47	54.84
GRNN	65.94	62.63	57.84	67.35	65.17	61.15	65.22	60.80	55.20
Elman	61.25	57.95	51.69	65.18	60.20	56.29	60.00	55.45	51.30
LSTM	59.74	55.95	50.57	61.29	59.19	54.90	58.76	55.05	49.55
CNN-LSTM	59.21	53.16	50.00	60.84	57.61	53.61	58.33	53.33	49.09
E-BP	52.31	46.38	44.87	53.51	49.21	45.63	52.94	43.68	41.67
E-GRNN	52.31	43.94	48.19	54.18	49.96	50.13	50.62	44.32	45.63
E-Elman	52.31	43.94	40.28	51.47	44.23	41.73	49.37	42.35	39.13
E-LSTM	50.79	43.08	35.82	50.30	42.77	36.93	48.72	41.67	33.33
E-CNN-LSTM	50.00	43.08	33.85	50.39	43.06	35.55	48.05	38.75	30.86
E-CLE	47.46	43.08	32.81	48.62	42.24	34.49	46.67	37.97	27.27
E-S-CLE	31.11	30.19	21.82	30.52	28.13	17.51	29.82	25.76	20.00
E-S-CLE-E	24.39	51.32	39.44	26.62	49.36	39.70	24.53	51.49	41.67

Table 8

Improvement when using the E-S-CLE-S-E model compared with other models for Data 3

Model	P_MAE(%)			P_MAPE(%)			P_RMSE(%)
	1-step	2-step	3-step	1-step	2-step	3-step	1-step	2-step	3-step
BP	64.18	59.49	61.29	69.78	57.00	58.11	62.07	60.78	60.16
GRNN	70.00	64.04	66.97	72.19	62.07	64.16	68.27	65.22	66.89
Elman	61.29	56.16	59.55	67.03	54.66	55.66	57.69	56.99	54.21
LSTM	60.00	54.93	56.10	65.63	53.38	55.07	57.14	56.04	52.43
CNN-LSTM	59.32	54.29	55.56	65.65	51.30	53.22	56.00	56.04	52.88
E-BP	54.72	42.86	46.27	61.60	43.83	42.72	50.00	42.03	41.67
E-GRNN	64.71	64.04	51.35	67.05	56.89	57.37	61.63	63.96	44.32
E-Elman	45.45	38.46	41.94	53.72	40.49	39.38	44.07	37.50	36.36
E-LSTM	42.86	39.62	33.33	51.74	34.78	35.29	41.07	42.03	26.87
E-CNN-LSTM	47.83	31.91	34.55	53.04	31.93	33.26	44.07	35.48	30.00
E-CLE	42.86	28.89	30.77	51.48	25.81	29.77	40.00	32.20	28.99
E-S-CLE	20.00	21.95	10.00	33.13	23.38	23.69	15.38	24.53	5.77
E-S-CLE-E	17.24	43.86	30.77	31.17	41.85	33.61	10.81	45.95	28.99

5 Test of forecasting models

Although MAE, MAPE, and RMSE are often used to evaluate the forecasting performance of models, statistical tests are still needed; otherwise, the conclusions may be subjective or due to chance. Wind speed forecasting models are usually trained on historical data, and probability and statistics theory provides a basis for judging their advantages and disadvantages [39, 40]. Hence, in addition to the traditional error evaluation methods, the Diebold-Mariano (DM) statistical test [40] and a novel metric named variance ratio (VR) [41] are used to verify the accuracy and stability of the hybrid model proposed in this study.

5.1 Statistical significance

The DM statistical test is a classical method for testing whether the forecasting accuracy of a proposed model differs significantly from that of a comparison model. In this section, the DM test is used to validate the performance of the proposed hybrid model. Table 9 shows the DM test results for eight comparison models. It can be seen from Table 9 that the average DM values of the E-S-CLE and E-S-CLE-E models exceed the critical values at the 10% and 5% significance levels, respectively, and the average DM values of the other models exceed the critical value at the 1% significance level. Since the absolute value of DM exceeds the corresponding critical value of the standard normal distribution, the null hypothesis is rejected, and the difference between the proposed model and the comparison model is significant at that level. Therefore, it can be concluded that the proposed hybrid model is significantly better than the comparison forecasting models.
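For readers who wish to reproduce this kind of comparison, a minimal sketch of the DM statistic is given below. It rests on assumptions that this section does not state: a squared-error loss, and a long-run variance built from autocovariances up to lag h - 1 for an h-step forecast. The function names are hypothetical, and the sketch is not the code used to produce Table 9.

```python
import math

import numpy as np


def dm_statistic(actual, forecast_a, forecast_b, horizon=1):
    """DM statistic; positive values mean forecast_a has the larger loss."""
    y = np.asarray(actual, dtype=float)
    loss_a = (y - np.asarray(forecast_a, dtype=float)) ** 2  # assumed loss
    loss_b = (y - np.asarray(forecast_b, dtype=float)) ** 2
    d = loss_a - loss_b  # loss differential series
    n = d.size
    d_bar = d.mean()
    centered = d - d_bar
    # Long-run variance: lag-0 autocovariance plus twice lags 1..horizon-1.
    long_run_var = centered @ centered / n
    for lag in range(1, horizon):
        long_run_var += 2.0 * (centered[lag:] @ centered[:-lag]) / n
    return d_bar / math.sqrt(long_run_var / n)


def two_sided_p_value(dm):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(dm) / math.sqrt(2.0))
```

With the comparison model passed as `forecast_a` and the proposed model as `forecast_b`, a statistic above 1.96 would correspond to significance at the 5% level under these assumptions.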

Table 9

Results of DM statistical test

Model	Data1	Data2	Data3	Average
E-BP	4.53^a	3.45^a	3.61^a	3.86^a
E-GRNN	7.81^a	3.94^a	4.45^a	5.40^a
E-Elman	4.37^a	3.90^a	2.77^a	3.68^a
E-LSTM	4.11^a	3.82^a	2.25^a	3.39^a
E-CNN-LSTM	3.95^a	3.84^a	2.93^a	3.57^a
E-CLE	3.81^a	3.45^a	2.86^a	3.37^a
E-S-CLE	2.04^b	1.75^c	1.24^f	1.68^c
E-S-CLE-E	2.64^a	2.21^b	1.35^e	2.07^b

^a 1% significance level, $Z_{0.01/2} = 2.58$. ^b 5% significance level, $Z_{0.05/2} = 1.96$. ^c 10% significance level, $Z_{0.10/2} = 1.64$. ^d 15% significance level, $Z_{0.15/2} = 1.44$. ^e 20% significance level, $Z_{0.20/2} = 1.28$. ^f 25% significance level, $Z_{0.25/2} = 1.15$.

5.2 Stability testing

Generally, the performance variance can be used to evaluate the stability of a prediction model. However, performance variance alone is not sufficient for this purpose. VR combines the forecast and actual values and can therefore better test the stability of a forecasting model [41]. Table 10 shows the VR values of nine models. The proposed hybrid model attains the maximum value in all four columns (Data1, Data2, Data3, and the average), which indicates that it is more stable than the other models.

Table 10

Results of VR stability test

Model	Data1	Data2	Data3	Average
E-BP	0.8122	0.7926	0.8134	0.8060
E-GRNN	0.7489	0.8233	0.8096	0.7939
E-Elman	0.8193	0.8627	0.8457	0.8426
E-LSTM	0.8337	0.8865	0.8728	0.8643
E-CNN-LSTM	0.8490	0.9034	0.8926	0.8817
E-CLE	0.8705	0.8976	0.8837	0.8839
E-S-CLE	0.9246	0.9346	0.9177	0.9256
E-S-CLE-E	0.9294	0.9025	0.8946	0.9088
E-S-CLE-S-E	0.9390	0.9726	0.9634	0.9583

6 Conclusion

Due to the intermittency and non-controllability of wind speed series, wind speed forecasting is very challenging. Hence, a hybrid model E-S-CLE-S-E for multi-step wind speed forecasting based on EEMD, SSA, Elman, LSTM, CNN-LSTM, and EC was proposed to improve the forecasting performance of wind speeds. By comparing the proposed hybrid model with the other 13 models, the following conclusions were drawn:

The forecasting performance of a single forecasting model on wind speed series is unsatisfactory. The forecasting performance of deep learning models (LSTM, CNN-LSTM) on forecasting wind speed series is superior to those of traditional non-linear models (BP, GRNN, Elman).

EEMD and SSA methods can strengthen the accuracy and stability of wind speed forecasting. Combining them with the advantages of each single forecasting model (BP, GRNN, Elman, LSTM, CNN-LSTM) can improve the forecasting performance of wind speeds. Besides, error correction can improve the accuracy of the forecasting model, but the high-frequency and unstable error sequence needs to be denoised.

Compared with the other models, the proposed hybrid model (E-S-CLE-S-E) exhibits the lowest MAE, MAPE, and RMSE, and its forecasting accuracy and stability are much better than those of the other models.

It should be emphasized that, because wind speed data have different characteristics in different regions, the network parameters of Elman, LSTM, and CNN-LSTM used here may not be applicable elsewhere. In other words, the network parameters may need to be adjusted when the proposed method is applied in other regions. Besides, more advanced CNN or LSTM network structures may achieve better forecasting performance; the structures used in this study were chosen because they are the most commonly used and already deliver high forecasting performance.

When the uncertainty level of wind power series increases, the traditional point forecasting results may not be reliable and accurate enough for decision makers to make plans, so accurate wind speed interval forecasting will be the key research direction in the future.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant nos. 61533007 and 61873053) and the National Key Research and Development Program of China (grant no. 2019YFE0105000).

References

1. Liu, Wu H.P. and Li, Smart wind speed forecasting using EWT decomposition, GWO evolutionary optimization, RELM learning and IEWT reconstruction[J], Energy Conversion Management 161 (2018), 266–283.

2. Moradi, Shahinzadeh, Khandan, et al., A profitability investigation into the collaborative operation of wind and underwater compressed air energy storage units in the spot market[J], Energy 141 (2017), 1779–1794.

3. Moazzami, Moradi, Shahinzadeh, et al., Optimal economic operation of microgrids integrating wind farms and advanced rail energy storage system[J], International Journal of Renewable Energy Research 8(2) (2018), 1155–1164.

4. Tascikaraoglu and Uzunoglu, A review of combined approaches for prediction of short-term wind speed and power[J], Renewable Sustainable Energy Reviews 34(6) (2014), 243–254.

5. Liu, Duan, Han F.Z., et al., Big multi-step wind speed forecasting model based on secondary decomposition, ensemble method and error correction algorithm[J], Energy Conversion Management 156 (2018), 525–541.

6. X.W., Liu and Li Y.F., Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine[J], Energy Conversion Management 180(1) (2019), 196–205.

7. Zhang, Li Y.T. and Zhang G.Y., Short-term wind power forecasting approach based on Seq2Seq model using NWP data[J], Energy 213 (2020), 118371.

8. Lydia, Kumar S.S., Selvakumar A.I., et al., Linear and non-linear autoregressive models for short-term wind speed forecasting[J], Energy Conversion Management 112 (2016), 115–124.

9. Memarzadeh and Keynia, A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets[J], Energy Conversion Management 213 (2020), 112824.

10. Yuan X.H., Chen, Yuan Y.B., et al., Short-term wind power prediction based on LSSVM–GSA model[J], Energy Conversion Management 101 (2015), 393–401.

11. and Shi, On comparing three artificial neural networks for wind speed forecasting[J], Applied Energy 87(7) (2010), 2313–2320.

12. Ren, An, Wang J.Z., et al., Optimal parameters selection for BP neural network based on particle swarm optimization: A case study of wind speed forecasting[J], Knowledge-Based Systems 56 (2014), 226–239.

13. Jiang and Li, Research and application of an innovative combined model based on a modified optimization algorithm for wind speed forecasting[J], Measurement 124 (2018), 395–412.

14. C.Y., Wang J.Z., Chen X.J., et al., A novel hybrid system based on multi-objective optimization for wind speed forecasting[J], Renewable Energy 146 (2020), 149–165.

15. Zhao, Chen W.H., Wu X.M., et al., LSTM network: a deep learning approach for short-term traffic forecast[J], IET Intelligent Transport Systems 11(2) (2017), 68–75.

16. Wang H.Z., Wang G.B., Li G.Q., et al., Deep belief network based deterministic and probabilistic wind speed forecasting approach[J], Applied Energy 182 (2016), 80–93.

17. Wang H.Z., Li G.Q., Wang G.B., et al., Deep learning based ensemble approach for probabilistic wind power forecasting[J], Applied Energy 188 (2017), 56–70.

18. Moreno S.R., Silva, Mariani V.C., et al., Multi-step wind speed forecasting based on hybrid multi-stage decomposition model and long short-term memory neural network[J], Energy Conversion Management 213 (2020), 112869.

19. Cadenas and Rivera, Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA–ANN model[J], Renewable Energy 35(12) (2010), 2732–2738.

20. Liu, Mi X.W. and Li Y.F., Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network[J], Energy Conversion Management 166 (2018), 120–131.

21. Liu, Mi X.W. and Li Y.F., Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network[J], Energy Conversion Management 156 (2018), 498–514.

22. Zhang, Chen, Xiao J.H., et al., Hybrid wind speed forecasting model based on multivariate data secondary decomposition approach and deep learning algorithm with attention mechanism[J], Renewable Energy 174 (2021), 688–704.

23. Liu, Mi X.W. and Li Y.F., Comparison of two new intelligent wind speed forecasting approaches based on wavelet packet decomposition, complete ensemble empirical mode decomposition with adaptive noise and artificial neural networks[J], Energy Conversion Management 155 (2018), 188–200.

24. Tascikaraoglu, Sanandaji B.M., Poolla, et al., Exploiting sparsity of interconnections in spatio-temporal wind speed forecasting using Wavelet Transform[J], Applied Energy 165 (2016), 735–747.

25. Q.Q., Wang J.Z. and Lu H.Y., A hybrid system for short-term wind speed forecasting[J], Applied Energy 226 (2018), 756–771.

26. C.J., Li Y.L. and Zhang M.J., An improved wavelet transform using singular spectrum analysis for wind speed forecasting based on elman neural network[J], Energy Conversion Management 148 (2017), 895–904.

27. Izonin, Tkachenko, Verhun, et al., An approach towards missing data management using improved GRNN-SGTM ensemble method[J], an International Journal 156 (2020), 498–514.

28. Jiang and Huang G.Q., Short-term wind speed prediction: Hybrid of ensemble empirical mode decomposition, feature selection and error correction[J], Energy Conversion Management 144 (2017), 340–350.

29. Z.J., Yang C.H., Zhang, et al., Error correction method based on data transformational GM (1, 1) and application on tax forecasting[J], Applied Soft Computing 37 (2015), 554–560.

30. Tkachenko, Izonin, Dronyuk, et al., Recover Missing Sensor Data with GRNN-based Cascade Scheme[J], International Journal of Sensors Wireless Communications and Control 11(5) (2021), 531–541.

31. Sun, Zhou J.Z., Chen, et al., An adaptive dynamic short-term wind speed forecasting model using secondary decomposition and an improved regularized extreme learning machine[J], Energy 165 (2018), 939–957.

32. C.J., Li Y.L. and Zhang M.J., Comparative study on three new hybrid models using Elman Neural Network and Empirical Mode Decomposition based technologies improved by Singular Spectrum Analysis for hour-ahead wind speed forecasting[J], Energy Conversion Management 147 (2017), 75–85.

33. Broomhead D.S. and King G.P., Extracting qualitative dynamics from experimental data[J], Physica D: Nonlinear Phenomena 20(2) (1986), 217–236.

34. Wang J.J., Zhang W.Y., Li Y.N., et al., Forecasting wind speed using empirical mode decomposition and Elman neural network[J], Applied Soft Computing 23 (2014), 452–459.

35. Gers F.A., Schmidhuber and Cummins, Learning to Forget: Continual Prediction with LSTM[J], Neural Computation 12(10) (2000), 2451–2471.

36. Wang, Xuan Z.M., Zhen, et al., A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework[J], Energy Conversion Management 212 (2020), 112766.

37. Livieris I.E., Pintelas and Pintelas, A CNN–LSTM model for gold price time series forecasting[J], Neural Computing and Applications 32 (2020), 17351–17360.

38. Tkachenko, Izonin, Kryvinska, et al., An Approach towards Increasing Prediction Accuracy for the Recovery of Missing IoT Data Based on the GRNN-SGTM Ensemble[J], Sensors 20(9) (2020), 2625.

39. M.W., Wang Y.T., Cheng, et al., Chaos cloud quantum bat hybrid optimization algorithm[J], Nonlinear Dynamics 103 (2021), 1167–1193.

40. Diebold F.X. and Mariano R.S., Comparing predictive accuracy[J], Journal of Business and Economic Statistics 13 (1995), 253–265.

41. Wang J.Z., Yang W.D., et al., A novel hybrid model for short-term wind power forecasting[J], Applied Soft Computing 80 (2019), 93–106.