Abstract
Accurate and stable wind speed forecasting is an essential means to ensure the safe and stable operation of wind power integration. Therefore, a new hybrid model was proposed to improve wind speed forecasting performance, consisting of data pre-processing, model forecasting, and error correction (EC). The specific modeling process is as follows: (a) A wind speed series was decomposed into a series of subseries with different frequencies utilizing the ensemble empirical mode decomposition (EEMD) method. Afterward, various subseries were divided into high-frequency components, intermediate-frequency components, and low-frequency components based on their sample entropies (SE). (b) Three frequency components were forecast by separately employing the hybrid model of convolutional neural network and long short-term memory network (CNN-LSTM), long short-term memory network (LSTM), and Elman neural network. (c) Subsequently, an error sequence was further forecast using CNN-LSTM. (d) Finally, three actual datasets were used to forecast the multi-step wind speed, and the forecasting performance of the proposed model was verified. The test results show that the forecasting performance of the proposed model is better than the other 13 models in three actual datasets.
Introduction
With the increasingly severe global energy crisis, the development and utilization of renewable energy are essential to the development of society [1]. As one of the most abundant renewable energy sources, wind energy shows broad application potential. However, as an intermittent energy source, wind power is random and uncontrollable [2]. Therefore, large-scale wind power integration affects the stability, adequacy, and economy of the power system [3]. Wind speed forecasting helps dispatchers not only to make plans and ensure power quality, but also to arrange reasonable reserve capacity and reduce the operating cost of the power system [4]. Therefore, it is regarded as a practical approach to alleviating the adverse effects of wind power integration.
In recent years, scholars have proposed many methods for wind speed forecasting. These methods are mainly divided into four types [5]: physical, statistical, artificial intelligence, and hybrid models. The physical model usually simulates wind speeds utilizing various physical parameters such as meteorological and geographical parameters. The physical model is adequate for medium and long-term wind speed forecasting [6]. However, it is realized by many complex physical equations, which makes modeling computationally expensive [7]. The statistical models include the autoregressive moving average (ARMA) model and the autoregressive integrated moving average (ARIMA) model [8]. These models are simple and fast; however, they are suitable for forecasting linear series, not non-linear series [9]. Artificial intelligence models are non-linear and can capture the non-linear relationship between input and output data. These models include the support vector machine (SVM) [10], artificial neural network [11], backpropagation (BP) neural network [12], generalized regression neural network (GRNN) [13], and extreme learning machine [14], which have been widely used in wind speed forecasting. These traditional non-linear models can extract superficial features and exhibit strong data adaptation ability, but their forecasting accuracy still needs to be improved.
With the development of computer networks, a new branch, the deep learning model, was proposed based on the machine learning model. Compared with traditional non-linear models, the deep learning model can automatically extract the internal abstract features and hidden invariant structures of data from the lowest level to the highest level [15]. Owing to these advantages, many scholars have studied how to apply deep learning to wind speed forecasting. For example, Wang et al. [16] proposed a deterministic and probabilistic wind speed forecasting method based on a deep belief network, and the test results showed good forecasting performance. Furthermore, Wang et al. [17] extracted non-linear features of wind power at different frequencies using a convolutional neural network (CNN) to improve forecasting performance. Moreno et al. [18] used a long short-term memory network (LSTM) to forecast multi-step wind speed and achieved beneficial results. However, although these researchers used deep learning models to forecast wind speed, most of them applied the same model to all components of the wind speed without considering the different characteristics of the different components.
Given the shortcomings of the single forecasting model, scholars have proposed hybrid forecasting models that integrate the forecasting abilities of different models and exploit their complementary advantages to achieve better forecasting performance [19]. For example, Liu et al. [20] used CNN and CNN-LSTM to forecast wind speed sublayers of different frequencies. They compared the hybrid model with other reference models and verified its superiority. Liu et al. [21] used two kinds of recurrent neural networks, the Elman neural network and LSTM, to forecast wind speed, which verified the good forecasting performance of LSTM in high-precision wind speed forecasting. Zhang et al. [22] combined CNN with LSTM to forecast wind speed considering both linear and non-linear trends in wind speed series. The test results revealed that, compared with a single model, the forecasting performance and reliability were improved. Based on the excellent forecasting performance of Elman, LSTM, and CNN-LSTM in wind speed forecasting, this paper takes these three models as the basic models of wind speed forecasting.
Besides the forecasting model, data pre-processing is another essential factor affecting wind speed forecasting performance [23]. It decomposes the original wind speed series into more regular and stable subsequences. Then, an appropriate forecasting model is established for each subsequence, and the sum of the forecasting results of all subsequences is taken as the final forecasting result. For example, Tascikaraoglu et al. [24] used wavelet transform and spatio-temporal correlation to forecast wind speeds. He et al. [25] proposed a forecasting model combining ensemble empirical mode decomposition (EEMD) with a wavelet neural network and achieved excellent forecasting performance. Yu et al. [26] proposed a wind speed forecasting model based on wavelet transform and singular spectrum analysis (SSA). In their research, SSA extracted the trend components in the subseries with the highest frequency. Although wavelet transform can decompose the original series into more regular components, the wavelet basis function and the number of decomposition levels must be chosen manually, which cannot guarantee the optimal decomposition of the signal. Based on the above analysis, EEMD and SSA are used to pre-process the wind speed series in this study.
Additionally, error correction (EC) is also an effective method for enhancing forecasting performance [27]. For example, Jiang et al. [28] combined the least-squares SVM with the generalized autoregressive conditional heteroscedasticity to correct the error components in the forecasting and verified the effectiveness of the method through experiments. Yu et al. [29] proposed an error predictive method based on data transformational GM(1,1) and further improved the forecasting performance. Tkachenko et al. [30] used an additional successive geometric transformations model to forecast the constant displacement and linear component of the error for improving the forecasting performance. The above research shows that the error correction can further improve the forecasting accuracy of the model.
Based on the above research, a hybrid forecasting model combining EEMD, SSA, Elman, LSTM, CNN-LSTM, and EC was proposed to realize multi-step forecasting of short-term wind speeds. The whole process of the proposed wind speed forecasting method is shown in Fig. 1. There are four stages: data pre-processing, model forecast, error correction, and comparison and analysis.

Fig. 1. The whole process of the proposed wind speed forecasting method.
(1) Data pre-processing
An original wind speed series is decomposed into intrinsic mode functions (IMFs) with different frequencies and a residual component using EEMD. Next, the sample entropy (SE) of each component is calculated. Components with similar SE values are placed in the same group, yielding three groups: high-frequency, intermediate-frequency, and low-frequency. All components in each group are then summed into a single component.
(2) Model forecasting
The major steps of model forecasting include: firstly, the trend component of high-frequency components is extracted using SSA; next, high-frequency components are denoised for forecasting; afterward, Elman, LSTM, and CNN-LSTM are used to forecast the low-frequency, intermediate-frequency, and high-frequency components, respectively; finally, the sum of the forecasting results of the three components is taken as the initial forecasting result of wind speeds.
(3) Error correction
The main steps of error correction include: firstly, the error is obtained by subtracting the initial predicted value from the actual value of wind speeds; then, the trend component of the error sequence is obtained using SSA to denoise the error sequence; finally, CNN-LSTM is used to forecast the error, and the sum of the forecasting error and the initial forecasting value is taken as the final forecasting result.
(4) Comparison and analysis
The comparative analysis uses the proposed model, denoted E-S-CLE-S-E, and 13 other forecasting models to forecast the multi-step wind speed of three different wind speed datasets, and compares the forecasting performance of these models.
From what is mentioned above, the main contributions of this study can be summarized as follows:
(a) To improve the accuracy and robustness of wind speed forecasting, the EEMD and SE methods are used to divide the wind speed into multi-level frequency components (high-frequency, intermediate-frequency, and low-frequency components).
(b) In view of the volatility and nonlinearity of the high-frequency components, the SSA method is used to denoise them.
(c) Considering that different neural network models have different forecasting advantages, Elman, LSTM, and CNN-LSTM are used to forecast the low-frequency, intermediate-frequency, and high-frequency components, respectively, which gives full play to the complementary advantages of each neural network model.
(d) The proposed hybrid model considers the error factors, realizes error correction, and further improves the forecasting performance.
EEMD method
EEMD is an improvement based on empirical mode decomposition (EMD). By adding white noise with uniform spectrum distribution to the signal to be analyzed, the signals of different time scales can be automatically separated into corresponding reference scales, thus overcoming the shortcomings of mode mixing in the EMD method [25]. The specific decomposition steps based on EEMD are shown as follows:
Step 1: a stochastic Gaussian white noise series $n_m(t)$ is added to the original series $x(t)$, thus obtaining
$$x_m(t) = x(t) + n_m(t), \quad m = 1, 2, \cdots, M.$$
Step 2: the series $x_m(t)$ is decomposed into multiple IMFs and a residual component (R) utilizing EMD, that is,
$$x_m(t) = \sum_{i} \mathrm{IMF}_{i,m}(t) + R_m(t).$$
Step 3: a different noise series is added each time to repeat the aforementioned Steps 1 and 2. In this way, M groups of IMF components and residual components are attained.
Step 4: the mean of the M groups of IMF components and of the residual components is taken as the final result of EEMD, that is,
$$\mathrm{IMF}_i(t) = \frac{1}{M}\sum_{m=1}^{M} \mathrm{IMF}_{i,m}(t), \qquad R(t) = \frac{1}{M}\sum_{m=1}^{M} R_m(t).$$
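The ensemble part of Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `noise_ratio` parameter, and the number of trials are hypothetical, and `toy_decompose` is a deliberately simple stand-in that is not a real EMD sifting procedure. An actual EMD routine returning a fixed number of components per trial would be passed as `emd`.

```python
import numpy as np


def eemd(x, emd, n_trials=100, noise_ratio=0.2, seed=0):
    """Average the components returned by `emd` over noise-perturbed copies of x.

    `emd` is any callable mapping a 1-D series to a 2-D array whose rows are
    the IMFs followed by the residual (same number of rows on every call).
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    noise_std = noise_ratio * x.std()
    total = None
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)   # Step 1
        components = np.asarray(emd(noisy), dtype=float)       # Step 2
        total = components if total is None else total + components  # Step 3
    return total / n_trials                                    # Step 4


def toy_decompose(x, window=5):
    """Stand-in for EMD (assumption): a fast part plus a moving-average remainder."""
    kernel = np.ones(window) / window
    smooth = np.convolve(x, kernel, mode="same")
    return np.vstack([x - smooth, smooth])


series = np.sin(np.linspace(0.0, 20.0, 200))
components = eemd(series, toy_decompose, n_trials=50)
print(components.shape)  # one row per component
```

Passing the decomposition as a callable keeps the ensemble logic separate from the sifting procedure, so the same wrapper works with any EMD implementation.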
The SE method is commonly used to measure the complexity of a time series. The greater the SE, the more complex the time series; the lower the SE, the higher the autocorrelation of the series [31]. The SE of each component obtained from the decomposition is calculated, and the series with similar SE (i.e., similar complexity) are grouped to reduce the number of forecasting models and improve forecasting performance. The specific algorithm of SE is as follows:
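Step 1: it is supposed that there is a time series $\{x(t),\ t = 1, 2, \cdots, N\}$. $N$ and SD refer to the length and standard deviation of the series $x(t)$, respectively.
Step 2: the related parameters of the algorithm are defined, namely the embedding dimension $m$ and the similarity tolerance $r$. Generally, $m$ is set to 2 and $r$ to between 0.1 SD and 0.25 SD.
Step 3: the $m$-dimensional vectors $X(1), X(2), \cdots, X(N-m+1)$ are reconstructed, in which
$$X(i) = [x(i), x(i+1), \cdots, x(i+m-1)], \quad 1 \le i \le N-m+1.$$
Step 4: $D[X(i), X(j)]$ is defined as the distance between the vectors $X(i)$ and $X(j)$, which is the maximum difference between corresponding elements, that is,
$$D[X(i), X(j)] = \max_{0 \le l \le m-1} \left| x(i+l) - x(j+l) \right|.$$
Step 5: for each $i$, the number of vectors $X(j)$ ($j \ne i$) satisfying $D[X(i), X(j)] < r$ is counted and denoted $B_i$; its ratio to the total number of distances is
$$B_i^m(r) = \frac{B_i}{N-m}.$$
Step 6: the mean of $B_i^m(r)$ over all $i$ is
$$B^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} B_i^m(r).$$
Step 7: assuming $k = m + 1$, Steps 3 to 6 are repeated to determine the values of $B_i^{m+1}(r)$ and $B^{m+1}(r)$.
Step 8: the SE is calculated according to the following equation:
$$SE(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{B^{m+1}(r)}{B^m(r)} \right\}.$$
Because $N$ is a limited value in practical application, the SE is estimated as follows:
$$SE(m, r, N) = -\ln \frac{B^{m+1}(r)}{B^m(r)}.$$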
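A minimal Python sketch of the finite-length SE estimate is given below. It follows the widely used sample entropy convention (the same number of template vectors for dimensions m and m + 1, matches counted with a less-than-or-equal test), which may differ in small details from the steps above; the function name and the `r_factor` parameter are illustrative assumptions.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Estimate the sample entropy of a 1-D series with tolerance r = r_factor * SD."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * x.std()

    def count_matches(dim):
        # Use n - m templates for both dimensions so the two counts are comparable.
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Maximum element-wise difference to every later template (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # too few matches to form the ratio
    return float(-np.log(a / b))


t = np.arange(400)
regular = np.sin(2 * np.pi * t / 40)                 # highly regular series
irregular = np.random.default_rng(0).normal(size=400)  # white noise
print(sample_entropy(regular), sample_entropy(irregular))
```

The regular series yields a much lower value than the noise series, which is the property used to sort the decomposed components into frequency groups.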
SSA, proposed by Broomhead and King [32], is a time series analysis method that can extract the trend, oscillation, periodic, quasi-periodic, and noise components of the original data. SSA projects the data space into sub-spaces with different features, characterizes their properties using singular values, and reconstructs the time series by truncating the singular values [33]. The method mainly involves data decomposition and data reconstruction, in which the former includes embedding and singular value decomposition (SVD), and the latter involves grouping and diagonal averaging.
(a) Decomposition
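Step 1: embedding; the original time series $X = (X_1, \cdots, X_N)$ is transformed into an $L \times K$ trajectory matrix $Y = (Y_1, \cdots, Y_K)$, in which $Y_i = (X_i, \cdots, X_{i+L-1})^{T}$, $L \in [2, N]$, and $K = N - L + 1$. The matrix $Y$ is expressed as follows:
$$Y = \begin{pmatrix} X_1 & X_2 & \cdots & X_K \\ X_2 & X_3 & \cdots & X_{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ X_L & X_{L+1} & \cdots & X_N \end{pmatrix}$$
Step 2: SVD; the eigentriples $(\lambda_i, U_i, V_i)$, sorted in descending order of $\lambda_i$, are attained through the SVD of $Y$. Here, $\lambda_i$ is the $i$-th eigenvalue of the matrix $YY^{T}$ (so that $\sqrt{\lambda_i}$ is the corresponding singular value of $Y$), and $U_i$ and $V_i$ are the left and right singular vectors, respectively. Thus, the matrix $Y$ can be re-written as:
$$Y = \sum_{i=1}^{d} \sqrt{\lambda_i}\, U_i V_i^{T},$$
where $d = \mathrm{rank}(Y)$.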
(b) Reconstruction
Step 1: grouping; $m$ components are selected from the $d$ eigentriples, with indices $I = \{I_1, I_2, \cdots, I_m\}$. The corresponding elementary matrices form the set $Y_I = \{Y_{I_1}, Y_{I_2}, \cdots, Y_{I_m}\}$, where $Y_{I_k}$ denotes the elementary matrix associated with the $I_k$-th eigentriple. $Y_I$ stands for the trend component of the original time series, and the other $(d - m)$ components are regarded as sources of noise.
Step 2: diagonal averaging; each matrix in $\{Y_{I_1}, Y_{I_2}, \cdots, Y_{I_m}\}$ is transformed into a series in $\{X_{I_1}, X_{I_2}, \cdots, X_{I_m}\}$ through the Hankelisation procedure $H$. The original time series can be further described as follows:
$$X = X_{I_1} + \cdots + X_{I_m} + H(\varepsilon) = X_{trend} + X_{noise},$$
where $X_{trend} = X_{I_1} + \cdots + X_{I_m}$ and $X_{noise} = H(\varepsilon)$ are called the trend and noise components, respectively, and $\varepsilon$ denotes the sum of the remaining $(d - m)$ elementary matrices.
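The decomposition and reconstruction stages can be sketched with NumPy as follows. This is a minimal illustration, assuming the trend is formed from the leading `n_components` eigentriples; the function name, the window length, and the number of retained components are illustrative choices, not values taken from this study.

```python
import numpy as np


def ssa_trend(x, window, n_components):
    """Return the series reconstructed from the leading SSA components."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = n - window + 1
    # Embedding: column i of the L x K trajectory matrix is (x_i, ..., x_{i+L-1}).
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    # SVD: singular values are returned in descending order.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    # Grouping: keep the leading components as the trend part.
    kept = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging (Hankelisation): entry (row, col) belongs to time row + col.
    total = np.zeros(n)
    counts = np.zeros(n)
    for row in range(window):
        total[row:row + k] += kept[row]
        counts[row:row + k] += 1
    return total / counts


t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + np.random.default_rng(1).normal(0.0, 0.3, size=200)
denoised = ssa_trend(noisy, window=40, n_components=2)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

Keeping all components reproduces the input series exactly, so the amount of smoothing is controlled entirely by how many leading components are retained.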
Elman neural network is a typical dynamic recurrent neural network [34]. It realizes the memory function by adding a connected layer as a one-step delay operator in the hidden layer of the basic structure of the BP network. In this way, the system can adapt to the time-varying characteristics and enhance the global stability of the network. Furthermore, it has stronger computing power than a feed-forward neural network, so it is very suitable for time series forecasting [21].
The structure of the Elman neural network is mainly composed of the input layer, hidden layer, connection layer, and output layer, as shown in Fig. 2. The unit of the input layer only plays the role of signal transmission, while the unit of the output layer plays the role of weighting.

Fig. 2. The structure of the Elman neural network.
The hidden layer unit has two types of excitation functions, linear and nonlinear, and the commonly used excitation function is the nonlinear sigmoid function. The connection layer has a delay unit, which can be used to memorize the output value of the last time of the hidden layer unit. The next time, the output value and the input of the network are used as the input of the hidden layer so that the network has the dynamic memory function. The addition of an internal feedback network improves the network’s ability to process dynamic information to achieve dynamic modeling.
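The role of the connection layer can be illustrated with a forward pass only. The sketch below is an assumption-laden toy: the class name, layer sizes, random weights, and the absence of bias terms and training are all illustrative, and the inputs are assumed to be scaled to a small range.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ElmanNetwork:
    """Forward pass of an Elman network; weights are random placeholders."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.5, (n_hidden, n_in))       # input -> hidden
        self.w_ctx = rng.normal(0.0, 0.5, (n_hidden, n_hidden))  # connection -> hidden
        self.w_out = rng.normal(0.0, 0.5, (n_out, n_hidden))     # hidden -> output
        self.context = np.zeros(n_hidden)  # delayed copy of the hidden output

    def step(self, x):
        x = np.asarray(x, dtype=float)
        # The hidden layer sees the current input and the previous hidden output.
        hidden = sigmoid(self.w_in @ x + self.w_ctx @ self.context)
        self.context = hidden.copy()  # memorize for the next time step
        return self.w_out @ hidden    # linear weighting in the output layer


net = ElmanNetwork(n_in=5, n_hidden=8, n_out=1)
window = np.array([0.51, 0.53, 0.50, 0.48, 0.49])  # five scaled lagged wind speeds
first = net.step(window)
second = net.step(window)
print(first, second)
```

Feeding the same input twice gives two different outputs because the connection layer has changed in between, which is the dynamic memory described above.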
LSTM consists of an input layer, a recurrent hidden layer, and an output layer. Its hidden layer is made of memory units rather than ordinary neuron nodes [35]. The memory cell $c_t$ is the key to LSTM, along which information is transmitted. There are three gates in the memory unit: the input gate, output gate, and forget gate, which are used to protect and control the cell state. A memory unit of LSTM is shown in Fig. 3. The input gate $i_t$ determines how much of the network input $x_t$ is stored in the memory cell $c_t$ at the current time. The forget gate $f_t$ determines how much of the memory cell $c_{t-1}$ of the previous moment is retained in the current memory cell $c_t$. The output gate $o_t$ controls how much of the memory cell $c_t$ is exported to the current output value $h_t$ of the LSTM [36].

Fig. 3. The structure of the LSTM.
The model's input is defined as $x = (x_1, x_2, \ldots, x_T)$, the output is defined as $y = (y_1, y_2, \ldots, y_T)$, and the vector sequence of the hidden layer is $h = (h_1, h_2, \ldots, h_T)$, where $T$ is the prediction period. The working principle of the memory unit is as follows:
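For reference, a commonly used formulation of the LSTM memory unit, without peephole connections, is sketched below. The weight matrices $W$, the bias vectors $b$, the logistic function $\sigma$, and the element-wise product $\odot$ are notational assumptions introduced here; the symbols $i_t$, $f_t$, $o_t$, $c_t$, $h_t$, and $x_t$ follow the text, and the exact variant used in this study may differ.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

In this form, $f_t$ scales what is kept from $c_{t-1}$, $i_t$ scales what is written from the current input, and $o_t$ scales what is exposed as $h_t$, matching the roles of the three gates described above.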
As one of the dominant models for deep learning, CNN is a feed-forward neural network, including convolution calculation and having a deep structure, which can extract the inherent abstract features and hidden high-level invariant structures from the data. Therefore, CNN is suitable for processing the wind speed series with non-linear and non-stationary characteristics [37]. The typical network structure for CNN includes many different layers, as shown in Fig. 4. It is mainly composed of an input layer, convolutional layers, pooling layers (also called sub-sampling layers), and a fully connected layer.

Fig. 4. The structure of the CNN.
The hybrid CNN-LSTM network combines CNN with LSTM and shares the advantages of the two networks. The CNN can extract the deep features of data and shows great potential in processing time series; the LSTM can memorize historical information and allow the persistence of information, making it suitable for problems associated with time series data [18]. Thus, the CNN-LSTM is used to forecast the high-frequency sub-layers of wind speed series (Fig. 5). At first, high-dimensional features of the high-frequency components of wind speeds are extracted using the CNN; after that, the time series is forecasted using the LSTM based on the extracted features. The structures of the CNN and LSTM are the same as those described above.

Fig. 5. The structure of the CNN-LSTM.
Taking Data 1 as an example, the framework of the proposed model, E-S-CLE-S-E, is illustrated in Fig. 6. The main steps of this hybrid model can be summarized as follows.

Fig. 6. The framework of the proposed hybrid model (taking Data 1 as a case).
Step 1: the original wind speed series is decomposed into multiple IMFs and a residual component (R) using EEMD.
Step 2: the SE of each component obtained from EEMD is calculated, and the series with similar SE are grouped into three groups: high-frequency components, intermediate-frequency components, and low-frequency components.
Step 3: the high-frequency components denoised by SSA, the intermediate-frequency components, and the low-frequency components are forecasted by CNN-LSTM, LSTM, and Elman, respectively, and the forecasting results of the three groups are combined as the initial forecasting result.
Step 4: the error sequence, after being denoised by SSA, is forecasted by CNN-LSTM.
Step 5: the final wind speed forecasting result is determined by combining the error forecasting result with the initial forecasting result.
Data source
Liaoning Province, located in north-eastern China, is rich in wind resources. The wind speed data used in this study are SCADA data from a wind farm in Liaoning Province, covering January 2018 to December 2018 at a time resolution of 15 min. Missing data were supplemented according to the following two cases: (a) if the gap is relatively short, the missing values are filled with the average of the data at the adjacent moments before and after; (b) if the gap is relatively long, the similar-day principle is used to supplement the missing data. Besides, if the value at a moment is unreasonable, such as a zero or negative value, it is replaced by the value at the previous moment.
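The two simplest cleaning rules can be sketched as follows. This is a hypothetical helper, not the authors' code: it assumes missing values are encoded as NaN, treats only single-sample gaps as "short", and does not implement the similar-day principle for long gaps.

```python
import numpy as np


def clean_wind_speed(values):
    """Fill isolated gaps by neighbour averaging, then replace non-positive readings."""
    v = np.asarray(values, dtype=float).copy()
    # Case (a): an isolated missing value becomes the mean of its two neighbours.
    for t in range(1, len(v) - 1):
        if np.isnan(v[t]) and not np.isnan(v[t - 1]) and not np.isnan(v[t + 1]):
            v[t] = 0.5 * (v[t - 1] + v[t + 1])
    # Unreasonable (zero or negative) readings take the value at the previous moment.
    for t in range(1, len(v)):
        if not np.isnan(v[t]) and v[t] <= 0:
            v[t] = v[t - 1]
    return v


print(clean_wind_speed([3.0, float("nan"), 5.0, -1.0, 0.0, 2.0]))
```

Longer gaps are left as NaN by this sketch so that a separate similar-day rule can handle them.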
Ten days of wind speed data from each of January, April, and July 2018 were selected to build three datasets. Figure 7 shows the three datasets of the wind speed series. Each dataset contains 960 samples, of which samples 1 to 840 are used as the training set and samples 841 to 960 as the test set. Table 1 shows the statistical information about the three datasets of wind speed series data.

Fig. 7. Three sets of wind speed time series at 15 min intervals.
Table 1. Statistical information of wind speed series data
The wind speed in Data 1 is decomposed based on the EEMD method to attain eight IMF components (IMF 1 to IMF 8) and a residual component R, as shown in Fig. 8. It can be seen from Fig. 8 that each component series has different frequencies. In this way, the decomposed components can highlight the local features of the original wind speed series so that the periodic term, stochastic term, and trend term of the original series can be observed more clearly.

Fig. 8. Decomposition results of wind speed series based on EEMD method.
According to the SE method, the component series decomposed by EEMD are divided into high-frequency, intermediate-frequency, and low-frequency components to improve the forecasting efficiency and accuracy. Fig. 9 shows the SE of each component.

Fig. 9. SE of each component.
As shown in Fig. 9, the SEs of IMF1 and IMF2 are similarly large, so IMF1 and IMF2 belong to the high-frequency group; IMF3, IMF4, and IMF5 exhibit approximately equal SEs and form the intermediate-frequency group; IMF6, IMF7, IMF8, and R show similar SEs and form the low-frequency group. Similar SEs indicate that the component series have similar complexity and trend, so they can be grouped and forecasted using the same model. Therefore, the nine components obtained from EEMD are divided into three groups: Sub1 (IMF1, IMF2), Sub2 (IMF3, IMF4, IMF5), and Sub3 (IMF6, IMF7, IMF8, R). Grouping the nine components by SE into three groups reduces the computational effort of the prediction model and can improve the forecasting performance.
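The grouping for Data 1 amounts to summing rows of the component matrix. The sketch below assumes the components are stored as rows in the order IMF1 to IMF8 followed by R; the function name and the dictionary layout are illustrative.

```python
import numpy as np

# Row indices of the nine EEMD components (IMF1..IMF8, then R) in each group.
DATA1_GROUPS = {
    "Sub1": [0, 1],        # IMF1, IMF2: high frequency
    "Sub2": [2, 3, 4],     # IMF3, IMF4, IMF5: intermediate frequency
    "Sub3": [5, 6, 7, 8],  # IMF6, IMF7, IMF8, R: low frequency
}


def group_components(components, groups):
    """Sum the component rows of each group into one subseries per group."""
    components = np.asarray(components, dtype=float)
    return {name: components[rows].sum(axis=0) for name, rows in groups.items()}


# Placeholder components: 9 rows of 960 samples (random numbers, not real IMFs).
demo = np.random.default_rng(0).normal(size=(9, 960))
subseries = group_components(demo, DATA1_GROUPS)
print({name: series.shape for name, series in subseries.items()})
```

Because every component appears in exactly one group, the three subseries still add up to the original series, so summing their forecasts gives a forecast of the wind speed itself.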
Evaluation indices of forecasting performance
To verify the forecasting accuracy and stability of the proposed model E-S-CLE-S-E, six evaluation indices are used to evaluate the forecasting performance: mean absolute error (MAE), mean absolute percentage error (MAPE), root-mean-square error (RMSE), and the promoting percentages of the MAE ($P_{MAE}$), of the MAPE ($P_{MAPE}$), and of the RMSE ($P_{RMSE}$). The six evaluation indices are separately defined as follows:
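The indices can be computed with the short functions below, which use the standard definitions of MAE, MAPE (in percent), and RMSE. The promoting percentage is written here as the relative reduction of an index when moving from a reference model to the proposed model; this form, like the function names, is an assumption about the intended definition.

```python
import numpy as np


def mae(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))


def mape(actual, forecast):
    """Mean absolute percentage error in percent (actual values must be non-zero)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100.0)


def rmse(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def promoting_percentage(index_reference, index_proposed):
    """Relative reduction (in percent) of an error index versus a reference model."""
    return (index_reference - index_proposed) / index_reference * 100.0


actual = [2.0, 4.0]
forecast = [1.0, 5.0]
print(mae(actual, forecast), mape(actual, forecast), rmse(actual, forecast))
```

The same `promoting_percentage` function serves for $P_{MAE}$, $P_{MAPE}$, and $P_{RMSE}$, depending on which index is passed in.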
This section describes how the hybrid model E-S-CLE-S-E was developed. At first, ten models (BP, GRNN, Elman, LSTM, CNN-LSTM, EEMD-BP, EEMD-GRNN, EEMD-Elman, EEMD-LSTM, and EEMD-CNN-LSTM) were used to perform one-step, two-step, and three-step forecasting on the wind speed series in Data 1 in order to select the optimal single forecasting model from BP, GRNN, Elman, LSTM, and CNN-LSTM. Among the EEMD-related methods, EEMD-BP means that the high-frequency components sub1, intermediate-frequency components sub2, and low-frequency components sub3 were first forecasted separately using BP, and the three forecasting results were then added to attain the initial forecasting result. The other EEMD-related methods are constructed in the same way. For all models, partial autocorrelation function (PACF) analysis is applied to determine the most relevant input variables. The lags of the wind speed data whose PACF values exceed the 95% confidence level are chosen as the input vector for forecasting. Through PACF analysis, the model input is determined to be the five wind speed values immediately preceding the forecast time.
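Turning a series into input/target pairs with five lagged values can be sketched as below. The function name and the `horizon` parameter are illustrative, and the sketch assumes a direct multi-step strategy in which the target lies `horizon` steps after the last input value; the multi-step strategy actually used is not specified here.

```python
import numpy as np


def make_supervised(series, n_lags=5, horizon=1):
    """Build (inputs, targets): n_lags past values predict the value horizon steps ahead."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_lags - horizon + 1
    inputs = np.array([series[i:i + n_lags] for i in range(n_samples)])
    targets = np.array([series[i + n_lags + horizon - 1] for i in range(n_samples)])
    return inputs, targets


demo = np.arange(10.0)  # placeholder series 0, 1, ..., 9
inputs, targets = make_supervised(demo, n_lags=5, horizon=2)
print(inputs[0], targets[0])
```

For the placeholder series, the first input window holds the values 0 to 4 and its two-step-ahead target is 6.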
In this paper, LSTM and CNN-LSTM are built in Python 3.6 with the help of the Keras deep learning package. The constructed CNN consists of two convolution layers and one pooling layer, and the two convolution layers use 64 and 128 convolution kernels, respectively. Depending on the convolution kernel size, the CNN can extract many useful features from the input wind speed series; that is, the CNN can extract features reflecting narrower or broader periods of multivariate time series by setting the convolution kernel size smaller or larger. The constructed LSTM uses the ReLU function as the activation function of the hidden layer. Dropout is used in each layer of the LSTM to prevent over-fitting of the proposed model. During training, Adam is selected as the optimization algorithm, and the mean squared error (MSE) is used as the loss function. Furthermore, each method is run ten times, and the averaged values are reported to reduce the influence of random factors.
Table 2 lists the forecasting results of the ten models. The two deep learning models, LSTM and CNN-LSTM, deliver better forecasting performance than the shallow learning models (BP, GRNN, and Elman), both with and without EEMD decomposition. After the wind speed series is decomposed using EEMD, the forecasting accuracy of every model except GRNN improves; the forecasting performance of EEMD-GRNN is inferior to that of GRNN. The reason is that EEMD-GRNN forecasts the three frequency components sub1, sub2, and sub3 separately and produces a large error on the high-frequency components sub1, where noise degrades the forecasting results. SSA can be applied to denoise the high-frequency components. Figure 10 compares the sub1 component of Data 1 before and after de-noising with the SSA method. As seen in Fig. 10, SSA can extract the trend component from the fluctuating data and achieves a favorable de-noising effect.
Table 2. Analysis of the prediction results for the experimental Data1

Fig. 10. The comparison of the sub1 component without and with being de-noised using SSA.
Besides, it can be found that different forecasting models suit different frequency components. For example, the low-frequency components are strongly regular, fluctuate only slightly, and are easy to forecast. Therefore, it is unnecessary to select a complex deep model for them, and Elman can meet the demand. The LSTM and CNN-LSTM models exhibit the best forecasting performance on the intermediate-frequency and high-frequency components, respectively. Based on these two points, the E-CLE model (after decomposition by EEMD, CNN-LSTM, LSTM, and Elman are used to forecast sub1, sub2, and sub3, respectively) and the E-S-CLE model (sub1 is forecasted using CNN-LSTM after noise reduction by SSA; sub2 and sub3 are forecasted using LSTM and Elman, respectively) were further proposed to forecast the wind speed series in Data 1. The forecasting results are listed in Table 3. As shown in Tables 2 and 3, the forecasting accuracy of the E-CLE and E-S-CLE models is improved relative to the models listed in Table 2. Moreover, the forecasting performance of E-S-CLE is superior to that of E-CLE.
Table 3. Analysis of the prediction results for the experimental Data1
1-step wind speed forecasting reveals that MAE values of E-S-CLE are 0.29, 0.25, 0.25, and 0.19 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 2.25, 2.01, 1.94, and 1.48 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.35, 0.31, 0.30, and 0.24 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively.
2-step wind speed forecasting reveals that the MAE values of E-S-CLE are 0.21, 0.16, 0.15, and 0.08 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 1.55, 1.28, 1.26, and 0.70 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.24, 0.18, 0.17, and 0.09 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively.
3-step wind speed forecasting reveals that the MAE values of E-S-CLE are 0.21, 0.17, 0.17, and 0.15 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the MAPEs of E-S-CLE are 1.64, 1.38, 1.31, and 1.15 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively; the RMSE values of E-S-CLE are 0.26, 0.18, 0.20, and 0.17 lower than those of EEMD-Elman, EEMD-LSTM, EEMD-CNN-LSTM, and E-CLE, respectively.
Although the above research has established a suitable forecasting model for wind speed series, the related error sequence still contains valuable information that can be used to further improve forecasting performance [38]. The error sequence $\{\varepsilon(t): t = 1, 2, \cdots, n\}$ is the difference between the actual value and the forecast value of wind speeds, that is,
$$\varepsilon(t) = x(t) - \hat{x}(t),$$
where $x(t)$ and $\hat{x}(t)$ denote the actual and the initially forecast wind speed at time $t$, respectively.
To correct the initial forecasting value, an EC process was introduced. Generally, the error sequence $\varepsilon(t)$ is regarded as stochastic, and the error $\{\hat{\varepsilon}(t): t = 2, 3, \cdots, n+1\}$ is forecasted from $\{\varepsilon(t): t = 1, 2, \cdots, n\}$, that is, the error at a given time is used to forecast the error at the next time step. Because the error sequence is non-stationary, high-frequency, and chaotic, it is challenging to forecast. Hence, it is beneficial to reduce the noise of the error sequence using SSA before forecasting.
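The EC idea can be sketched for one-step forecasting as follows. This is a simplified illustration under explicit assumptions: a linear autoregressive model fitted by least squares stands in for the CNN-LSTM error forecaster, the SSA denoising of the error sequence is omitted, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_ar(errors, n_lags=5):
    """Least-squares fit of errors[t] from the previous n_lags errors plus an intercept."""
    errors = np.asarray(errors, dtype=float)
    rows = [np.append(errors[i:i + n_lags], 1.0) for i in range(len(errors) - n_lags)]
    coef, *_ = np.linalg.lstsq(np.array(rows), errors[n_lags:], rcond=None)
    return coef


def error_correction(actual, initial, n_train, n_lags=5):
    """Add a one-step-ahead error forecast to the initial forecast after n_train."""
    actual = np.asarray(actual, dtype=float)
    initial = np.asarray(initial, dtype=float)
    errors = actual - initial                      # error = actual - initial forecast
    coef = fit_linear_ar(errors[:n_train], n_lags)  # stand-in for the error model
    corrected = initial.copy()
    for t in range(n_train, len(actual)):
        lags = np.append(errors[t - n_lags:t], 1.0)  # errors known up to t - 1
        corrected[t] = initial[t] + lags @ coef      # final = initial + forecast error
    return corrected


# Synthetic demonstration: the initial forecast carries a slowly varying bias.
t = np.arange(300)
actual = 8.0 + 2.0 * np.sin(t / 15.0)
initial = actual - 0.5 * np.sin(t / 7.0)
corrected = error_correction(actual, initial, n_train=200)
print(np.abs(actual[200:] - initial[200:]).mean(),
      np.abs(actual[200:] - corrected[200:]).mean())
```

The correction helps only to the extent that the error sequence is predictable, which is why a noisy error sequence benefits from denoising before it is forecasted.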
The E-S-CLE-E and E-S-CLE-S-E models were further proposed based on EC. The two models differ in whether the error sequence is processed by SSA before forecasting (E-S-CLE-S-E) or not (E-S-CLE-E). Table 4 lists the forecasting results of the two models for Data 1. As shown in Tables 3 and 4, during 1-step forecasting, the MAE values of E-S-CLE-S-E are 0.38, 0.19, and 0.17 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 3.05, 1.57, and 1.44 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.48, 0.24, and 0.20 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.
Table 4. Analysis of the forecasting results for the experimental Data1
During 2-step forecasting, the MAE values of E-S-CLE-S-E are 0.20, 0.12, and 0.26 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 1.61, 0.91, and 2.00 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.20, 0.11, and 0.25 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.
During 3-step forecasting, the MAE values of E-S-CLE-S-E are 0.19, 0.04, and 0.24 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the MAPEs of E-S-CLE-S-E are 1.55, 0.40, and 1.89 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively; the RMSE values of E-S-CLE-S-E are 0.22, 0.05, and 0.30 lower than those of E-CLE, E-S-CLE, and E-S-CLE-E, respectively.
To further illustrate the forecasting performance of different models, Figs. 11 to 13 display the comparison between the actual and forecast wind speeds determined using various models in Data 1 during 1-step, 2-step, and 3-step forecasting, respectively. As shown in Figs. 11 to 13, the forecast values determined using E-S-CLE-S-E have the lowest MAE, MAPE, and RMSE. This result reveals that the model delivers a better forecasting performance than the other models. On the other hand, it can be seen from Figs. 12 and 13 that the forecasting performance of E-S-CLE-E is inferior to that of E-S-CLE during 2-step and 3-step forecasting.

The comparison between the actual value of wind speeds and the forecast values in 1-step forecasting.

The comparison between the actual value of wind speeds and the forecast values in 2-step forecasting.

The comparison between the actual value of wind speeds and the forecast values in 3-step forecasting.
The inferior multi-step performance of E-S-CLE-E indicates that the more forecasting steps there are, the more unstable the error sequence becomes and the harder it is to forecast. It also confirms the need to perform noise reduction on error sequences using SSA.
To verify the universality and effectiveness of the proposed model, the wind speeds in Data 2 and Data 3 are also subjected to 1-step, 2-step, and 3-step forecasting using the aforementioned 14 forecasting models. Tables 5 and 6 show the error results of wind speeds in the two datasets through multi-step forecasting with different forecasting models.
The error evaluation results of different models for Data 2
The error evaluation results of different models for Data 3
According to Tables 5 and 6, it can be found that: (a) The models exhibit the same ranking in forecasting accuracy across the different datasets. (b) The error increases with the number of forecasting steps. (c) A deep learning model generally shows a better forecasting performance than a traditional non-linear model when each is used on its own to forecast wind speeds. (d) The forecasting results based on Elman, LSTM, and CNN-LSTM show that forecasting on the wind speed series decomposed by EEMD performs better than forecasting with a single model. (e) The forecasting error of E-S-CLE is lower than that of E-CLE, which reveals that the forecasting accuracy can be improved by conducting noise reduction on high-frequency components using SSA. (f) During 2-step and 3-step forecasting, the forecasting error of E-S-CLE-E is larger than that of E-S-CLE, whereas the forecasting error of E-S-CLE is larger than that of E-S-CLE-S-E. This implies the necessity of performing noise reduction on the error sequence using SSA. (g) The proposed model E-S-CLE-S-E exhibits the best forecasting performance: its forecast values are closest to the actual values, and its MAE, MAPE, and RMSE are all the lowest.
To further compare the forecasting performance of the proposed model with that of the other 13 models, a comparative analysis was conducted based on the improvement percentages P_MAE, P_MAPE, and P_RMSE. Tables 7 and 8 show the improvement percentages of the E-S-CLE-S-E model compared with the other 13 models for Data 2 and Data 3, respectively. In the examples below, the three values in each group correspond to 1-step, 2-step, and 3-step forecasting, respectively. It can be seen that: (a) The E-S-CLE-S-E model shows good generalization and exhibits high forecasting accuracy and stability in the three datasets. (b) The forecasting performance of the E-S-CLE-S-E model is significantly superior to that of traditional non-linear models. For example, for Data 2, compared to the BP model, the MAE of the E-S-CLE-S-E model decreases by 63.95%, 60.22%, and 55.67%; the MAPE decreases by 66.43%, 63.01%, and 59.92%; and the RMSE decreases by 63.64%, 58.47%, and 54.84%. (c) The forecasting performance of the E-S-CLE-S-E model is remarkably better than that of a single deep learning model. For example, for Data 2, compared to the LSTM model, the MAE of the E-S-CLE-S-E model decreases by 59.74%, 55.95%, and 50.57%; the MAPE decreases by 61.29%, 59.19%, and 54.90%; and the RMSE decreases by 58.76%, 55.05%, and 49.55%. (d) The E-S-CLE-S-E model is superior to models treated by EEMD alone. For example, for Data 2, compared to the E-CLE model, the MAE of the E-S-CLE-S-E model decreases by 47.46%, 43.08%, and 32.81%; the MAPE decreases by 48.62%, 42.24%, and 34.49%; and the RMSE decreases by 46.67%, 37.97%, and 27.27%. (e) The E-S-CLE-S-E model returns significantly higher forecasting performance than the E-S-CLE model. For example, for Data 2, compared to the E-S-CLE model, the MAE of the E-S-CLE-S-E model decreases by 31.11%, 30.19%, and 21.82%; the MAPE decreases by 30.52%, 28.13%, and 17.51%; and the RMSE decreases by 29.82%, 25.76%, and 20.00%. (f) The forecasting performance of the E-S-CLE-S-E model is much better than that of the E-S-CLE-E model. For example, for Data 2, compared to the E-S-CLE-E model, the MAE of the E-S-CLE-S-E model decreases by 24.39%, 51.32%, and 39.44%; the MAPE decreases by 26.62%, 49.36%, and 39.70%; and the RMSE decreases by 24.53%, 51.49%, and 41.67%.
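The percentage decreases quoted above can be reproduced from the error tables with a one-line calculation. The sketch below assumes the usual definition of an improvement percentage, that is, the reduction in an error metric relative to the comparison model, expressed in percent; the function name and the numbers are hypothetical and do not come from Tables 5 to 8.

```python
def improvement_percentage(baseline_error, proposed_error):
    """Relative reduction (in percent) of an error metric versus a comparison model."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Hypothetical MAE values: comparison model 0.80, proposed model 0.60.
p_mae = improvement_percentage(0.80, 0.60)  # about 25 (percent)
```

The same function applies unchanged to MAPE and RMSE; a negative result would mean the proposed model is worse than the comparison model on that metric.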
Improvement when using the E-S-CLE-S-E model compared with other models for Data 2
Improvement when using the E-S-CLE-S-E model compared with other models for Data 3
Although MAE, MAPE, and RMSE are often used to evaluate the forecasting performance of models, statistical tests are still needed; otherwise, the conclusions may be subjective or due to chance. Wind speed forecasting models are usually trained on historical data, and probability and statistics theory provides a basis for judging the advantages and disadvantages of forecasting models [39, 40]. Hence, in addition to the traditional error evaluation methods, the Diebold-Mariano (DM) statistical test [40] and a novel metric named variance ratio (VR) [41] are used to verify the accuracy and stability of the hybrid model proposed in this study.
Statistical significance
The DM statistical test is a classical statistical method for testing whether the forecasting accuracy of a proposed model differs significantly from that of a comparison model. In this section, the DM statistical test is used to validate the performance of the proposed hybrid model. Table 9 shows the DM statistical test results of eight models. It can be seen from Table 9 that the average DM values of the E-S-CLE and E-S-CLE-E models are larger than the upper limits at the 10% and 5% significance levels, respectively, and the DM values of the other models are larger than the upper limit at the 1% significance level. Since the absolute value of DM is larger than the critical value of the standard normal distribution, the null hypothesis is rejected, and the difference between the proposed model and the comparison model is significant. Therefore, it can be concluded that the proposed hybrid model is significantly better than the comparison forecasting models.
Results of DM statistical test
a is the 1% significance level, Z_{0.01/2} = 2.58. b is the 5% significance level, Z_{0.05/2} = 1.96. c is the 10% significance level, Z_{0.10/2} = 1.64. d is the 15% significance level, Z_{0.15/2} = 1.44. e is the 20% significance level, Z_{0.20/2} = 1.28. f is the 25% significance level, Z_{0.25/2} = 1.15.
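The decision rule described for the DM test can be sketched as follows. This is a simplified illustration, not the exact procedure behind Table 9: it assumes a power loss function (squared error by default), treats the loss differentials as serially uncorrelated (so no autocovariance correction is applied for multi-step horizons), and uses only the first three critical values from the table notes. The function names and the synthetic data are hypothetical.

```python
import math
import numpy as np

# Two-sided critical values of the standard normal distribution,
# ordered from the strictest to the loosest level.
CRITICAL_VALUES = {"1%": 2.58, "5%": 1.96, "10%": 1.64}

def dm_statistic(actual, forecast_ref, forecast_new, power=2):
    """Simplified DM statistic; positive values favor forecast_new."""
    y = np.asarray(actual, dtype=float)
    err_ref = y - np.asarray(forecast_ref, dtype=float)
    err_new = y - np.asarray(forecast_new, dtype=float)
    # Loss differential d_t = L(err_ref_t) - L(err_new_t).
    d = np.abs(err_ref) ** power - np.abs(err_new) ** power
    return float(np.mean(d) / math.sqrt(np.var(d, ddof=1) / d.size))

def strictest_level(dm):
    """Strictest significance level at which |DM| exceeds the critical value."""
    for level, z in CRITICAL_VALUES.items():
        if abs(dm) > z:
            return level
    return None  # the null hypothesis of equal accuracy is not rejected

# Synthetic example: the "new" forecast has much smaller errors.
rng = np.random.default_rng(1)
truth = 8.0 + np.sin(np.arange(200) / 10.0)
coarse = truth + 0.5 * rng.standard_normal(200)
fine = truth + 0.1 * rng.standard_normal(200)
dm = dm_statistic(truth, coarse, fine)
level = strictest_level(dm)
```

With this sign convention, a large positive DM value means the comparison model has the larger loss, which matches the reading of Table 9 in which larger values indicate a more significant advantage for the proposed model.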
Generally, the performance variance can be used to evaluate the stability of a prediction model. However, performance variance alone is not a sufficient basis for evaluating the stability of a forecasting model. VR combines the forecast and actual values and can therefore better test the stability of the forecasting model [41]. Table 10 shows the VR values of nine models, from which it can be seen that, compared with the other models, the proposed hybrid model reaches the maximum value in four cases. This indicates that the proposed hybrid model is more stable than the other models.
Results of VR stability test
Due to the intermittency and non-controllability of wind speed series, wind speed forecasting is very challenging. Hence, a hybrid model, E-S-CLE-S-E, for multi-step wind speed forecasting based on EEMD, SSA, Elman, LSTM, CNN-LSTM, and EC was proposed to improve the forecasting performance of wind speeds. By comparing the proposed hybrid model with the other 13 models, the following conclusions were drawn: (a) The forecasting performance of a single forecasting model on wind speed series is unsatisfactory. The forecasting performance of deep learning models (LSTM, CNN-LSTM) on wind speed series is superior to that of traditional non-linear models (BP, GRNN, Elman). (b) EEMD and SSA methods can strengthen the accuracy and stability of wind speed forecasting. Combining them with the advantages of each single forecasting model (BP, GRNN, Elman, LSTM, CNN-LSTM) can improve the forecasting performance of wind speeds. (c) Error correction can improve the accuracy of the forecasting model, but the high-frequency and unstable error sequence needs to be denoised first. (d) Compared with the other models, the proposed hybrid model (E-S-CLE-S-E) exhibits the lowest MAE, MAPE, and RMSE, and its forecasting accuracy and stability are much better than those of the other models.
It should be emphasized that, because wind speed data have different characteristics in different regions, the network parameters of Elman, LSTM, and CNN-LSTM may not be applicable to other regions. In other words, the network parameters may need to be adjusted when the proposed method is applied in other regions. Besides, more advanced CNN or LSTM network structures may achieve better forecasting performance, but the network structures used in this study are the most commonly used and deliver high forecasting performance.
When the uncertainty level of wind power series increases, traditional point forecasting results may not be reliable and accurate enough for decision makers to make plans, so accurate wind speed interval forecasting will be a key research direction in the future.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (grant nos. 61533007 and 61873053) and the National Key Research and Development Program of China (grant no. 2019YFE0105000).
