A hybrid short-term load forecasting method using CEEMDAN-RCMSE and improved BiLSTM error correction

Abstract

Accurate load forecasting is essential for the safe and economic operation of power systems. However, load data are often strongly non-stationary, nonlinear and random, which makes load forecasting difficult. To improve prediction accuracy, this paper proposes a hybrid short-term load forecasting method that combines load feature extraction based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and refined composite multi-scale entropy (RCMSE) with an improved bidirectional long short-term memory (BiLSTM) error correction model. Firstly, CEEMDAN is used to separate the detail and trend information of the original load series, RCMSE is used to reconstruct the resulting components, and the Spearman correlation coefficient is used to screen the input features. Secondly, an improved butterfly optimization algorithm (IBOA) is proposed to optimize BiLSTM, and each reconstructed component is predicted separately. Finally, an error correction model is constructed to mine the information hidden in the error sequence. The experimental results show that the MAE, MAPE and RMSE of the proposed method are 645 kW, 0.96% and 827.3 kW, respectively, and that its MAPE is improved by about 10% compared with the other hybrid models. The proposed method therefore mitigates the inaccuracies caused by the characteristics of the load data and by the inherent limitations of individual models, and improves prediction accuracy.

Keywords

Short-term load forecasting; complete ensemble empirical mode decomposition with adaptive noise; refined composite multi-scale entropy; improved butterfly optimization algorithm; bidirectional long short-term memory neural network

1 Introduction

Short-term load forecasting is very important for the stable operation of power systems. Accurate load forecasting not only helps balance supply and demand, but also helps avoid abnormal voltage and frequency fluctuations and reduces the risk of failure [1]. With the reform of the electricity market and the development of technologies such as demand response and load aggregation, power load exhibits more complex and variable features and forms [2], which places higher demands on the accuracy and reliability of load forecasting. To provide reliable data for the optimal dispatching of the power grid and to ensure the stable operation of the power system, it has become an urgent problem to improve load decomposition, improve traditional load forecasting methods and make full use of error information to further improve forecasting performance [3].

Many scholars have studied short-term load forecasting. Forecasting methods can be roughly divided into two categories: mathematical statistical methods and machine learning methods. Commonly used statistical methods include linear regression [4], the Kalman filter [5] and exponential smoothing [6]. Although these methods are simple and fast to compute, they lack robustness and are easily disturbed by random factors, which leads to low forecasting reliability in complex environments. Machine learning methods include support vector machines (SVM) [7], extreme learning machines (ELM) [8] and random forests (RF) [9]. Reference [7] uses SVM and an adaptive neuro-fuzzy inference system (ANFIS) to make predictions; the results show that SVM has a smaller prediction error in the training stage, while ANFIS is more accurate in the testing stage. Reference [8] uses ELM to modify the structure of the group method of data handling and obtains accurate prediction results. These methods handle nonlinear and multidimensional data well, but they are sensitive to outliers and their performance tends to degrade when the data change abruptly.

In recent years, deep learning has attracted wide attention in load forecasting because of its strong capacity for representing data. Long short-term memory (LSTM) [10, 11], an improved recurrent neural network (RNN) structure, is well suited to processing time series and has achieved good results in load forecasting. Reference [10] proposes a hybrid method based on support vector regression (SVR) and LSTM that handles high-dimensional time series data well and is more accurate than a single LSTM. Reference [11] proposes a multi-task learning model based on a ResNet-LSTM network, which mines the spatial coupling features between multiple energy types and uses an attention mechanism to select shared features differentially. Although LSTM alleviates the long-term dependence problem to a certain extent, information is still lost when long time series are processed. The bidirectional structure of bidirectional long short-term memory (BiLSTM) enables it to learn both the forward and the backward temporal relationships of a sequence, which addresses the tendency of LSTM to under-use the global information of historical data and to ignore the correlation between forward and backward time steps. Reference [12] compares four load forecasting models and demonstrates the advantage of the bidirectional structure: BiLSTM outperforms LSTM and the gated recurrent unit (GRU). Reference [13] uses Conv1D to extract deep features and BiLSTM with an attention mechanism to learn load data features, achieving high prediction accuracy and robustness to interference.
In [14], considering the internal relationship between multiple variables and the time series, important features are extracted from the row vectors of the BiLSTM hidden state matrix by temporal pattern attention (TPA), and the hyperparameters of BiLSTM are optimized by a chaotic sparrow search algorithm (CSSA), which further improves prediction accuracy. Existing research shows that, compared with single models such as GRU and LSTM, hybrid models can exploit the advantages of each component model, overcome the limitations of a single model and achieve better prediction performance.

However, load series are easily influenced by various factors and are therefore non-stationary and nonlinear, which makes load forecasting difficult. Scholars therefore often use decomposition methods to obtain smoother and more stable components, predict each component separately and thus improve the overall prediction accuracy. Reference [15] uses a wavelet transform package (WTP) to decompose the original signal, which improves the prediction accuracy of local features, but a suitable wavelet function must be chosen. Reference [16] uses complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to preliminarily decompose the sequence and extract its characteristic frequencies; compared with traditional empirical mode decomposition (EMD) and the wavelet transform (WT), CEEMDAN is more sensitive and adaptive. Reference [17] suppresses the interference of load fluctuations with CEEMDAN and combines GRU with an improved grey wolf optimizer (IGWO) to predict power load, which effectively improves prediction accuracy. Reference [18] uses CEEMDAN for decomposition and permutation entropy (PE) to identify noisy components, decomposes these components further through wavelet packet decomposition (WPD) and then makes predictions, which effectively improves prediction accuracy. However, the two-stage decomposition produces more components, resulting in a large amount of computation and low training efficiency.

To address the mode mixing and the high entropy of high-frequency components that load decomposition may cause, as well as the loss of long-sequence information in traditional deep learning models and the lack of correction of prediction errors, a short-term load correction forecasting framework combining CEEMDAN, refined composite multi-scale entropy (RCMSE), BiLSTM and an improved butterfly optimization algorithm (IBOA) is proposed. Firstly, the preprocessed original load data are decomposed by CEEMDAN to reduce information loss and improve forecasting performance; RCMSE is used to evaluate the entropy of the decomposed components, components with similar entropy values are reconstructed into new components, and the Spearman coefficient is used to screen the input features of each component. Secondly, IBOA is used to search for the optimal hyperparameter combination of BiLSTM, the reconstructed components are predicted, and the predictions of all components are added to obtain the preliminary load forecast. Finally, an error correction model based on the improved BiLSTM is constructed to further improve prediction accuracy.

The innovations and contributions of the proposed method are as follows:

(1)
Building on the advantages of CEEMDAN, RCMSE is introduced to measure the complexity of the decomposed components so that components with similar complexity can be merged. With this CEEMDAN-RCMSE framework, the original load is decomposed and reconstructed adaptively, which effectively extracts time series characteristics such as periodicity and trend from the load data and reduces its non-stationarity.
(2)
The IBOA with strong optimization ability and fast convergence speed can search the optimal hyperparameter combination of BiLSTM, which enables the BiLSTM to fully explore the past and future deep time series characteristics of load data and give full play to the advantages of model itself.
(3)
The error correction model can correct the accumulated error in the prediction process and the error caused by residual noise in CEEMDAN. A practical case shows that the hybrid forecasting method proposed in this paper can effectively improve the load forecasting accuracy.

The remainder of this paper is organized as follows. The decomposition and reconstruction method is introduced in Section 2. The improved BiLSTM error correction model is introduced in Section 3. The total procedure of hybrid forecasting method is explained in Section 4. The forecasting model performance is evaluated in Section 5. Conclusions are drawn in the last part.
2 Decomposition and reconstruction method with data characteristic

2.1 Complete ensemble empirical mode decomposition with adaptive noise

The short-term load forecasting framework is shown in Fig. 1. Short-term load forecasting uses historical data together with meteorological and other external factors, establishes mathematical relations that capture the complex relationships among them, and forecasts the future load. Because the original load series is strongly non-stationary and random, CEEMDAN is used to decompose it into multiple components, which reduces the difficulty of forecasting.

Fig. 1

Short-term load forecasting framework.

CEEMDAN is an adaptive data preprocessing algorithm. Building on EMD, CEEMDAN adds adaptive Gaussian white noise at each decomposition stage and decomposes the load data into a series of intrinsic mode functions (IMFs) and a residual component (RES), ordered from high to low frequency. CEEMDAN effectively alleviates the mode mixing problem of EMD, which improves the quality of the decomposition. Each mode component reflects the characteristics of the original load data on a different time scale, which effectively weakens the non-stationary and nonlinear characteristics of the load series and provides a basis for a detailed analysis of the load data.

The calculation steps of CEEMDAN are as follows: Gaussian white noise ω_i(t) with a standard normal distribution is added to the original signal X(t), EMD is performed on each noisy copy, and the resulting modes E_k(·) are averaged over all noise realizations to give the final component. The (k + 1)-th mode component IMF_{k+1} is

IMF_{k+1}(t) = \frac{1}{R} \sum_{i=1}^{R} E_{1}\left(r_{k}(t) + \varepsilon_{k} E_{k}(\omega_{i}(t))\right)

(1)

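where R is the number of noise realizations (ensemble trials), t is the time index of the load data (sampling interval 10 min), ε_k is the noise standard deviation (amplitude coefficient) at stage k, ω_i(t) is the i-th white noise signal, E_k(·) is the operator that returns the k-th IMF produced by EMD (with E_0(ω_i(t)) = ω_i(t)), and r_k(t) = r_{k-1}(t) - IMF_k(t) is the k-th residual, with r_0(t) = X(t). The decomposition stops when the residual can no longer be decomposed, i.e. when it has too few extrema for EMD to extract another IMF.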
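The recursion in Eq. (1) can be illustrated with a short NumPy sketch. It is not the authors' implementation: envelopes are built with linear interpolation (`np.interp`) instead of the usual cubic splines, each IMF uses a fixed number of sifting passes instead of a stopping criterion, and the noise amplitude ε_k = eps · std(r_k) is an assumption. The helper names `_local_mean`, `_sift_first_imf`, `_emd_modes` and `ceemdan` are hypothetical. Because every IMF is subtracted from the running residual, the components and the final residual always sum back to the input.

```python
import numpy as np

def _local_mean(x):
    """Mean of the upper and lower envelopes of x, or None if x has too few
    extrema to sift. Linear envelopes are a simplification of cubic splines."""
    n = len(x)
    d = np.diff(x)
    left = np.r_[0.0, d]    # left[i]  = x[i] - x[i-1]
    right = np.r_[d, 0.0]   # right[i] = x[i+1] - x[i]
    maxima = np.flatnonzero((left > 0) & (right < 0))
    minima = np.flatnonzero((left < 0) & (right > 0))
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(n)
    # envelopes are held flat from the outermost extremum to each end of the series
    upper = np.interp(t, np.r_[0, maxima, n - 1], np.r_[x[maxima[0]], x[maxima], x[maxima[-1]]])
    lower = np.interp(t, np.r_[0, minima, n - 1], np.r_[x[minima[0]], x[minima], x[minima[-1]]])
    return 0.5 * (upper + lower)

def _sift_first_imf(x, n_sift=10):
    """E_1(.): first EMD mode via a fixed number of sifting passes (zeros if x cannot be sifted)."""
    h = np.asarray(x, float).copy()
    if _local_mean(h) is None:
        return np.zeros_like(h)
    for _ in range(n_sift):
        m = _local_mean(h)
        if m is None:
            break
        h = h - m
    return h

def _emd_modes(x, max_imfs, n_sift=10):
    """Plain EMD modes of x, used to obtain E_k(omega_i)."""
    modes, r = [], np.asarray(x, float).copy()
    for _ in range(max_imfs):
        if _local_mean(r) is None:
            break
        imf = _sift_first_imf(r, n_sift)
        modes.append(imf)
        r = r - imf
    return modes

def ceemdan(x, R=50, eps=0.2, max_imfs=12, seed=0):
    """Eq. (1): IMF_{k+1} = mean_i E_1(r_k + eps_k * E_k(omega_i)), r_0 = x, E_0(omega) = omega.
    Returns (imfs, residual) with x == imfs.sum(axis=0) + residual."""
    x = np.asarray(x, float)
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((R, len(x)))          # omega_i(t), i = 1..R
    noise_modes = [_emd_modes(w, max_imfs) for w in noise]

    def E(k, i):
        if k == 0:
            return noise[i]
        modes = noise_modes[i]
        return modes[k - 1] if k <= len(modes) else np.zeros(len(x))

    imfs, r = [], x.copy()
    for k in range(max_imfs):
        if _local_mean(r) is None:                    # residual can no longer be decomposed
            break
        eps_k = eps * np.std(r)                       # assumed noise amplitude
        imf = np.mean([_sift_first_imf(r + eps_k * E(k, i)) for i in range(R)], axis=0)
        imfs.append(imf)
        r = r - imf
    return np.array(imfs), r
```

For real work, a maintained library such as PyEMD provides a CEEMDAN implementation with cubic-spline envelopes and proper sifting stop rules; the sketch above only makes the structure of Eq. (1) explicit.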

2.2 Components reconstruction method based on RCMSE

A short-term load forecasting model combined with CEEMDAN can improve the evaluation metrics of the forecast. However, as the number of components grows, the computational complexity and time cost of the prediction model increase sharply, because a separately optimized model has to be trained for every component, which hinders the subsequent use of optimization algorithms. Therefore, to balance prediction performance and time cost, the mode components obtained by CEEMDAN are aggregated. Since the regularity and complexity of the mode components differ considerably, RCMSE is used to evaluate their complexity.

RCMSE is an improved algorithm based on multi-scale entropy (MSE) and composite multi-scale entropy (CMSE). It reduces the inaccurate or undefined entropy estimates that MSE and CMSE can produce, overcomes the limitation of analysing a single scale, and yields more stable entropy values than other multi-scale entropy algorithms [19]. For a load time series, the RCMSE value is defined as

RCMSE(x, τ, m, r) = -\ln\left(\frac{\sum_{k=1}^{τ} n_{k,τ}^{m+1}}{\sum_{k=1}^{τ} n_{k,τ}^{m}}\right)

(2)

where m is the embedding dimension, r is the conditional threshold (similarity tolerance), τ is the scale factor, and n_{k,τ}^{m+1} and n_{k,τ}^{m} are the numbers of matched vector pairs of dimension m + 1 and m, respectively, in the k-th coarse-grained series at scale τ.
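A direct NumPy transcription of Eq. (2) follows, assuming the usual RCMSE procedure: for scale τ, build τ coarse-grained series whose averaging windows start at offsets 0, ..., τ - 1 (k = 1, ..., τ in Eq. (2)), count the matched template pairs of length m and m + 1 in each (Chebyshev distance ≤ r, self-matches excluded), sum the counts over k and take the negative log ratio. The tolerance r is fixed from the standard deviation of the series being analysed, matching the 0.2 std(IMF_i) setting used in Section 5.2. The function names are illustrative, and the pairwise-distance matrix costs O(N²) memory, which is acceptable for a few thousand points.

```python
import numpy as np

def _pair_counts(y, m, r):
    """Numbers of template pairs (i < j) matching within tolerance r (Chebyshev
    distance) for lengths m and m + 1; the same len(y) - m templates are used for both."""
    n_t = len(y) - m
    if n_t < 2:
        return 0, 0
    Xm1 = np.array([y[i:i + m + 1] for i in range(n_t)])   # length m + 1 templates
    counts = []
    for X in (Xm1[:, :m], Xm1):                           # length m, then m + 1
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        counts.append((int(np.sum(d <= r)) - n_t) // 2)   # drop self-matches, count unordered pairs
    return counts[0], counts[1]

def rcmse(x, tau, m=2, r_factor=0.2):
    """Eq. (2): -ln( sum_k n^{m+1}_{k,tau} / sum_k n^{m}_{k,tau} )."""
    x = np.asarray(x, float)
    r = r_factor * np.std(x)          # tolerance fixed from the original series
    n_m = n_m1 = 0
    for k in range(tau):              # tau coarse-grained series, start shifted by k
        L = (len(x) - k) // tau
        y = x[k:k + L * tau].reshape(L, tau).mean(axis=1)
        a, b = _pair_counts(y, m, r)
        n_m += a
        n_m1 += b
    if n_m == 0 or n_m1 == 0:
        return float("inf")           # entropy undefined: no matches found
    return float(-np.log(n_m1 / n_m))
```

Summing the match counts over all τ shifted series before taking the logarithm is what distinguishes RCMSE from CMSE, which averages per-series entropies; this is why RCMSE remains defined when some individual coarse-grained series have no matches.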

3 The improved BiLSTM error correction model

3.1 Bidirectional long short time memory network

BiLSTM combines a forward LSTM and a backward LSTM and is suitable for processing time series data. Because BiLSTM uses both past and future sequence information to train the model, it mitigates the long-term dependence problem and improves its ability to extract data features. At the same time, the temporal correlation between features can be fully exploited, which helps further improve the accuracy of load forecasting. The structure of BiLSTM is shown in Fig. 2.

Fig. 2

The structure of BiLSTM.

At time t, the output of BiLSTM combines the hidden states of the forward layer and the backward layer, which are computed in opposite directions along the sequence. The calculations are as follows:

\vec{h}_{t} = f\left(\vec{\varepsilon} x_{t} + \vec{\varepsilon}' \vec{h}_{t-1}\right)

(3)

\overleftarrow{h}_{t} = f\left(\overleftarrow{\varepsilon} x_{t} + \overleftarrow{\varepsilon}' \overleftarrow{h}_{t+1}\right)

(4)

O_{t} = g\left(\varepsilon \vec{h}_{t} + \varepsilon' \overleftarrow{h}_{t}\right)

(5)

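where \vec{h}_{t} and \overleftarrow{h}_{t} are the outputs (hidden states) of the forward and backward LSTM layers at time t, \vec{\varepsilon} and \overleftarrow{\varepsilon} are the input weights of the forward and backward LSTM layers, \vec{\varepsilon}' and \overleftarrow{\varepsilon}' are their recurrent weights, \varepsilon and \varepsilon' are the output weights applied to the forward and backward states, f(·) and g(·) are activation functions, and O_t is the output of the BiLSTM network.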
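To make Eqs. (3)-(5) concrete, the sketch below runs a standard LSTM cell forward over the sequence (using the previous state, as in Eq. (3)) and a second cell backward (using the next state, as in Eq. (4)), then combines the two hidden states with output weights as in Eq. (5). It is a NumPy illustration of the data flow only, with no training; the gate ordering, the random initialization and the identity output activation g are assumptions, and all names are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_states(X, W, U, b, reverse=False):
    """Hidden states of one LSTM layer over X (T x d_in).
    W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,); gate blocks ordered [input, forget, cell, output]."""
    T, d_h = X.shape[0], U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    H = np.zeros((T, d_h))
    for t in (range(T - 1, -1, -1) if reverse else range(T)):
        # forward pass: h holds h_{t-1}; backward pass: h holds h_{t+1}
        i, f, g, o = np.split(W @ X[t] + U @ h + b, 4)
        c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
        h = _sigmoid(o) * np.tanh(c)
        H[t] = h
    return H

def bilstm(X, fwd, bwd, V_f, V_b, b_out):
    """Eq. (5) with g = identity: O_t = V_f h_fwd_t + V_b h_bwd_t + b_out."""
    H_f = lstm_states(X, *fwd)
    H_b = lstm_states(X, *bwd, reverse=True)
    return H_f @ V_f.T + H_b @ V_b.T + b_out

def init_layer(d_in, d_h, rng, scale=0.3):
    """Random (W, U, b) for one LSTM direction."""
    return (rng.normal(0.0, scale, (4 * d_h, d_in)),
            rng.normal(0.0, scale, (4 * d_h, d_h)),
            np.zeros(4 * d_h))
```

Changing only the last input leaves the forward state at t = 0 untouched but changes O_0 through the backward layer; that dependence on future inputs is the extra context BiLSTM has over a unidirectional LSTM.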

3.2 The optimization of BiLSTM network

3.2.1 Improved butterfly optimization algorithm

The butterfly optimization algorithm (BOA) is a nature-inspired heuristic optimization algorithm that imitates the foraging and mating behaviour of butterflies to find the optimal value in the search space. Although the traditional BOA performs better than some other optimization algorithms when applied to load forecasting, it tends to fall into local optima, has limited convergence accuracy and loses population diversity as the search proceeds. To overcome these limitations, this paper introduces a dynamic conversion probability strategy, an optimal neighborhood disturbance strategy and a random inertia weight strategy, which are effective for complex problems with constraints and unknown search spaces. These strategies improve the global and local search stages and address the tendency of the algorithm to fall into local optima, its low convergence accuracy and the loss of population diversity [20].

(1) Dynamic conversion probability strategy

The conversion probability p determines whether the butterflies enter the global search stage or the local search stage. In order to improve the convergence speed of the algorithm in the later stage, the conversion probability p is adjusted to a dynamically changing value. The formula is

p = 0.6 - 0.1 \cdot (\frac{T_{max} - t}{T_{max}})

(6)

where T_max and t are the maximum and current numbers of iterations, respectively. According to Equation (6), p increases linearly from 0.5 at t = 0 to 0.6 at t = T_max. In each iteration a random number r is drawn uniformly from [0, 1]: when p > r, the butterfly enters the global search stage; when p ≤ r, it enters the local search stage.

(2) Optimal neighborhood disturbance strategy

In the process of iteration, the butterflies in BOA will move randomly or move to the optimal butterfly. When the butterflies move randomly, the BOA enters the global search stage. The expression is as follows:

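X_{i}^{G+1} = X_{i}^{G} + r^{2}\left(X_{j}^{G} - X_{k}^{G}\right) F_{i}

(7)

where X_{i}^{G+1} and X_{i}^{G} are the positions of the i-th butterfly in iterations G + 1 and G, respectively; r is a random number in [0, 1]; X_{j}^{G} and X_{k}^{G} are two different butterflies randomly selected in iteration G; and F_{i} is the fragrance of the i-th butterfly (in the standard BOA, F_{i} = c I_{i}^{a}, where c is the sensory modality, I_{i} is the stimulus intensity determined by the fitness and a is the power exponent).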

To prevent the loss of population diversity, the neighborhood of the current optimal position is searched in the global search stage by disturbing that position, which helps the algorithm locate the true optimum and escape from local optima. The optimal neighborhood disturbance strategy is

\tilde{X}(t) = \begin{cases} X^{*}(t) + 0.5 \cdot rand_{1} \cdot X^{*}(t), & rand_{2} < 0.5 \\ X^{*}(t), & rand_{2} \geq 0.5 \end{cases}

(8)

where \tilde{X}(t) and X^{*}(t) are the new position after disturbance and the optimal position before disturbance, respectively; rand₁ and rand₂ are random numbers in [0, 1].

(3) Random inertia weight strategy

When the butterflies move to the optimal butterfly, the BOA enters the local search stage. The expression is as follows:

X_{i}^{G + 1} = X_{i}^{G} + r^{2} (X_{best}^{G} - X_{i}^{G}) F_{i}

(9)

where X_{best}^{G} is the optimal solution in the current iteration.

Because the traditional BOA easily falls into local optima in the later stage, a random inertia weight is used in the local search stage to change the influence of the previous position on the updated position. The random inertia weight is calculated as

ω = μ_{min} + (μ_{max} - μ_{min}) \cdot rand() + \sin\left(\frac{π t}{2 T_{max}} + π\right)

(10)

where μ_max and μ_min are the upper and lower limits of the inertia weight, respectively, and rand() is a random number in [0, 1]. The butterfly position update in the local search stage is modified to

X_{i}^{G+1} = ω \cdot X_{i}^{G} + \left(r^{2} \cdot X_{j}^{G} - X_{k}^{G}\right) \cdot F_{i}

(11)

Equation (11) shows that ω controls how strongly the previous position of a butterfly influences its updated position: when ω is large, the updated position is strongly affected by the previous position; when ω is small, it is less affected.
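The three strategies can be combined into one IBOA position update, sketched below under stated assumptions: the fragrance is F_i = c·I_i^a with the fitness (MSE) used as the stimulus intensity and the common BOA defaults c = 0.01, a = 0.1; μ_min = 0.4 and μ_max = 0.9 are placeholder limits; and positions are clipped to the search bounds. The global stage follows Eq. (7), the local stage follows Eq. (11), Eq. (10) is implemented exactly as printed, and the disturbed best position from Eq. (8) is returned separately so the caller can keep it only if it improves the fitness. Function names are hypothetical, and this is one reading of the text rather than the authors' code.

```python
import numpy as np

def switch_prob(t, t_max):
    # Eq. (6): p = 0.6 - 0.1 * (T_max - t) / T_max, rising from 0.5 to 0.6
    return 0.6 - 0.1 * (t_max - t) / t_max

def inertia_weight(t, t_max, mu_min, mu_max, rng):
    # Eq. (10) as printed; the sine term falls from 0 (t = 0) to -1 (t = T_max)
    return (mu_min + (mu_max - mu_min) * rng.random()
            + np.sin(np.pi * t / (2 * t_max) + np.pi))

def disturb_best(x_best, rng):
    # Eq. (8): if rand_2 < 0.5, move the best position to (1 + 0.5 * rand_1) times itself
    if rng.random() < 0.5:
        return x_best + 0.5 * rng.random() * x_best
    return x_best.copy()

def iboa_step(X, fit, t, t_max, rng, lb, ub, c=0.01, a=0.1, mu_min=0.4, mu_max=0.9):
    """Propose new positions for population X (N x D) given fitness values fit (N,),
    lower is better. Returns (proposals, disturbed_best), both clipped to [lb, ub]."""
    N = X.shape[0]
    F = c * np.abs(fit) ** a                          # fragrance F_i = c * I_i^a (assumed)
    p = switch_prob(t, t_max)
    X_new = np.empty_like(X)
    for i in range(N):
        j, k = rng.choice(N, size=2, replace=False)   # two distinct random butterflies
        r = rng.random()
        if p > rng.random():                          # global stage, Eq. (7)
            X_new[i] = X[i] + r ** 2 * (X[j] - X[k]) * F[i]
        else:                                         # local stage, Eq. (11)
            w = inertia_weight(t, t_max, mu_min, mu_max, rng)
            X_new[i] = w * X[i] + (r ** 2 * X[j] - X[k]) * F[i]
    x_tilde = disturb_best(X[np.argmin(fit)], rng)    # optimal neighborhood disturbance
    return np.clip(X_new, lb, ub), np.clip(x_tilde, lb, ub)
```

Returning proposals rather than overwriting the population keeps the greedy acceptance of Step 3 (replace a butterfly only if its fitness improves) separate from the update rules themselves.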

3.2.2 The process of IBOA optimizing BiLSTM

When BiLSTM is used for short-term load forecasting, its accuracy depends not only on how comprehensively features are extracted, but also on the choice of model hyperparameters. In this paper, IBOA is used to search for the optimal hyperparameter combination of BiLSTM, namely the number of training epochs λ, the learning rate μ and the number of hidden-layer neurons γ, which further improves the accuracy of the BiLSTM model in short-term load forecasting. The process of IBOA optimizing BiLSTM is shown in Fig. 3.

Fig. 3

The flow chart of IBOA optimizing BiLSTM.

Step 1: Set the butterfly population size N, the dimension of the search space D, the maximum number of iterations T_max, and the upper limit μ_max and lower limit μ_min of the inertia weight. Initialize the butterfly positions and determine the value ranges of the parameter combination S(λ, μ, γ) to be optimized.

Step 2: Assign the parameters of BiLSTM according to S(λ, μ, γ). BiLSTM is trained on the training set, and its fitness is measured by the mean square error (MSE). After the set number of training epochs, the model outputs for the training samples are obtained.

Step 3: Calculate the fitness value of each butterfly, and take the current optimal position of the butterfly as the historical optimal position. At the same time, the random number r is generated in the range of [0,1], and the dynamic conversion probability p is calculated. When p > r, it enters the global search stage and updates the position according to the optimal neighborhood disturbance strategy. When p ≤ r, it enters the local search stage and updates the position according to the random inertia weight strategy. If the updated butterfly fitness is better than the previous butterfly fitness, the current optimal position and the global optimal position are updated.

Step 4: When the maximum number of iterations T_max is reached, the optimal hyperparameter combination of BiLSTM is obtained.
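Steps 1-4 require mapping each butterfly position to a hyperparameter combination S(λ, μ, γ), evaluating it by training BiLSTM, and keeping only improving moves. The sketch below assumes positions are encoded in [0, 1]³ and scaled to hypothetical search ranges (50-500 epochs, learning rate 10⁻⁴ to 10⁻², 16-128 hidden neurons); none of these ranges comes from the paper. `toy_train_and_validate` is a made-up surrogate standing in for training and validating BiLSTM, so its values say nothing about load forecasting. The proposals passed to `greedy_accept` would come from the IBOA update rules in Eqs. (7)-(11).

```python
import numpy as np

# Hypothetical search ranges for S(lambda, mu, gamma); not taken from the paper.
LOWER = np.array([50.0, 1e-4, 16.0])    # epochs, learning rate, hidden neurons
UPPER = np.array([500.0, 1e-2, 128.0])

def decode(position):
    """Map a butterfly position in [0, 1]^3 to (lambda, mu, gamma)."""
    v = LOWER + np.clip(position, 0.0, 1.0) * (UPPER - LOWER)
    return int(round(v[0])), float(v[1]), int(round(v[2]))

def make_fitness(train_and_validate):
    """Step 2: fitness = MSE returned by a user-supplied training routine."""
    def fitness(position):
        epochs, lr, hidden = decode(position)
        return float(train_and_validate(epochs=epochs, lr=lr, hidden=hidden))
    return fitness

def greedy_accept(X, fit, X_prop, fitness):
    """Step 3: keep each proposed position only if its fitness (MSE) is lower;
    return the updated population, its fitness values and the global best."""
    fit_prop = np.array([fitness(x) for x in X_prop])
    better = fit_prop < fit
    X = np.where(better[:, None], X_prop, X)
    fit = np.where(better, fit_prop, fit)
    best = int(np.argmin(fit))
    return X, fit, X[best].copy(), float(fit[best])

def toy_train_and_validate(epochs, lr, hidden):
    # Hypothetical stand-in for BiLSTM training: a smooth bowl, NOT a real model.
    return (epochs - 300) ** 2 / 1e4 + (np.log10(lr) + 2.5) ** 2 + (hidden - 64) ** 2 / 1e3
```

Rounding λ and γ inside `decode` lets the optimizer work in a continuous space while BiLSTM still receives integer epoch and neuron counts.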

3.3 Error correction model

The white noise added during CEEMDAN decomposition leaves residual noise in the components, which introduces errors into the reconstructed components and reduces prediction accuracy. Therefore, an error correction model based on IBOA-BiLSTM is established to further compensate the prediction results and improve prediction performance.

Firstly, IBOA-BiLSTM is used to make a preliminary load forecast, and the preliminary forecast sequence is subtracted from the real load data to obtain the error sequence. Then, IBOA-BiLSTM is used to learn the trend of the error and to predict the error sequence. Finally, the predicted error sequence is added to the preliminary load forecast to obtain the final, error-corrected load prediction. The final predicted value is

Q^{Final} = \sum_{j = 1}^{s} f (X_{j}) + e

(12)

where s is the number of component sequences after aggregation and reconstruction, f(X_j) is the IBOA-BiLSTM prediction of the j-th reconstructed component X_j, and e is the predicted error sequence.
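The sign convention of Eq. (12) can be checked with a few lines of NumPy. Here the error is defined as actual minus preliminary forecast, so the predicted error is added back. The AR(1) fit is only a hypothetical, deliberately simple stand-in for the IBOA-BiLSTM error model, used to show where an error forecast ê could come from (one step ahead, ê_{t+1} = φ·e_t); all function names are illustrative.

```python
import numpy as np

def preliminary_forecast(component_forecasts):
    # sum_j f(X_j): add the forecasts of the s reconstructed components (rows)
    return np.sum(component_forecasts, axis=0)

def error_sequence(actual, preliminary):
    # e = actual - preliminary, the sign convention under which Eq. (12) adds e
    return np.asarray(actual, float) - np.asarray(preliminary, float)

def fit_ar1(e):
    # Stand-in error model: least-squares AR(1) coefficient (the paper uses IBOA-BiLSTM)
    e = np.asarray(e, float)
    return float(np.dot(e[:-1], e[1:]) / np.dot(e[:-1], e[:-1]))

def corrected_forecast(preliminary, e_hat):
    # Eq. (12): Q_final = sum_j f(X_j) + e_hat
    return np.asarray(preliminary, float) + np.asarray(e_hat, float)
```

If the error forecast were perfect (ê = e), the corrected forecast would reproduce the actual load exactly; in practice the gain depends on how much structure the error model can extract from the residuals.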

4 The total procedure of hybrid forecasting method

Load data is easily influenced by multiple factors, so it shows strong non-stationarity and nonlinear characteristics. Due to the advantages of CEEMDAN in non-stationary sequence processing and the remarkable effect of BiLSTM in short-term load forecasting, the CEEMDAN-RCMSE and improved BiLSTM error correction model is proposed. The specific process is shown in Fig. 4.

Fig. 4

The flow chart of CEEMDAN-RCMSE-IBOA-BiLSTM-EC.

The total procedure mainly includes four parts:

Step 1: The original load sequence is decomposed into multiple components through CEEMDAN, and IMF₁, IMF₂, ... , IMF_g are obtained, which effectively reduces the fluctuation and nonlinearity of the original load sequence. The RCMSE values of each component are obtained by coarse-graining procedure and vector-matching procedure, and the components with similar entropy are reconstructed to form new component sequences F₁, F₂, ... , F_s, which reduces the model complexity and calculation scale and improves the prediction efficiency.

Step 2: The components after reconstruction, temperature, general diffusion flow and humidity are normalized as input sequences. The IBOA is adopted to search the optimal hyperparameter combination of BiLSTM from both global and local perspectives. The neural network model based on improved BiLSTM is established for each component, and the prediction results of components are obtained. Sum the predicted value of each component to obtain a preliminary forecasting sequence.

Step 3: The model error arises from the white noise added during CEEMDAN decomposition, which leaves residual noise in the mode components, and from the inherent limitations of the model and of the load data. The load sequence predicted by the CEEMDAN-RCMSE-IBOA-BiLSTM model is therefore subtracted from the original load sequence to obtain the error sequence. The error sequence, temperature, general diffusion flow and humidity are used as inputs of the error correction model, and the error sequence is predicted by the improved BiLSTM model. The predicted error sequence is added to the predicted load sequence to obtain the final load forecasting result.

Step 4: The final load forecasts are compared with the actual values using four evaluation metrics. In addition, the errors at the 144 test-set points, their Gaussian mixture distribution and a Taylor diagram are plotted to verify the stability and reliability of the proposed model.

5 Case study

5.1 Data description and evaluation metrics

5.1.1 Data description

To verify the performance and effectiveness of the proposed forecasting framework, short-term load forecasting is carried out on 8640 samples with a granularity of 10 minutes, collected from November 1, 2017 to December 30, 2017 in an area of northern Morocco. The data set includes load, temperature, humidity, general diffusion flow, wind speed and diffusion flow. The data from November 1, 2017 to December 29, 2017 (8496 data points) form the training set, and the data of December 30, 2017 (144 data points) form the test set.

To improve the training speed and prediction accuracy of the model and to avoid saturation during training, the data must be preprocessed. Preprocessing mainly includes outlier processing, missing value filling and normalization. The quartile (interquartile range) method is used to detect outliers in the data set, and outliers and missing values are replaced by the average value to preserve the completeness of the data. The characteristics of the load data after outlier processing and missing value filling are shown in Table 1.
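The outlier and missing-value treatment can be sketched as follows. The text does not give the fence multiplier or say which average replaces a flagged value, so this assumes Tukey's usual 1.5 × IQR fences and the mean of the remaining valid values; the function name `clean_iqr` is hypothetical.

```python
import numpy as np

def clean_iqr(x, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as outliers, treat NaNs as missing,
    and replace both with the mean of the remaining valid values.
    Returns (cleaned series, number of outliers)."""
    x = np.asarray(x, float).copy()
    valid = ~np.isnan(x)
    q1, q3 = np.percentile(x[valid], [25, 75])
    iqr = q3 - q1
    outlier = valid & ((x < q1 - k * iqr) | (x > q3 + k * iqr))
    bad = outlier | ~valid
    x[bad] = x[~bad].mean()
    return x, int(outlier.sum())
```

For a load series with daily cycles, replacing a flagged point by the mean of its neighbours or of the same time on adjacent days is a common alternative; the global mean is used here only because the text says "average value".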

Table 1
Division and characteristics of load data set after preprocessing

Data set	Complete data	Training set	Test set
Sample size	8,640	8,496	144
Maximum	99,297	99,297	87,862
Minimum	36,785	36,785	43,541
Average	64,399	64,397	60,427
Outlier size	64	62	–

When outliers account for only 0.5%-1% of a data set, the data set is considered usable. Outliers make up 0.74% (64/8640) of the selected data set, so it can be used for short-term load forecasting. Normalization is then applied to eliminate the influence of the different dimensions (units) of the factors. The normalization formula can be expressed as

x^{*} = \frac{x - x_{min}}{x_{max} - x_{min}}

(13)

where x is the input feature vector data, x_max is the maximum value of input feature vector data and x_min is the minimum value of input feature vector data.
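A minimal sketch of Eq. (13) with its inverse, which is needed to convert normalized forecasts back to kW before the metrics are computed. Fitting x_min and x_max on the training set only is an assumption (it avoids using test-set information), and the class name is hypothetical.

```python
import numpy as np

class MinMaxNormalizer:
    """Eq. (13), x* = (x - x_min) / (x_max - x_min), applied per feature column.
    x_min and x_max are fitted on the training data only (an assumption), and
    inverse_transform maps normalized forecasts back to the original unit."""

    def fit(self, x):
        x = np.asarray(x, float)
        self.x_min = x.min(axis=0)
        self.x_max = x.max(axis=0)
        return self

    def transform(self, x):
        return (np.asarray(x, float) - self.x_min) / (self.x_max - self.x_min)

    def inverse_transform(self, z):
        return np.asarray(z, float) * (self.x_max - self.x_min) + self.x_min
```

With the Table 1 training range (36,785 to 99,297), the test-set extremes 43,541 and 87,862 map inside [0, 1]; test values outside the training range would map outside it.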

In the absence of feature selection for prediction, the future load can only rely on historical data. However, the power load is subject to the influence of multiple factors in the power system, and weather features exhibit complexity, variability, and high volatility. Selection of pertinent features can enhance the stability of the model. Randomly selecting features as inputs to the model may ignore significant contributors like temperature. Conversely, employing all features instead of omitting any may introduce noise and compromise accuracy due to the dataset’s extensive size and inclusion of redundant elements, such as diffusion flow. Hence, feature selection assumes a pivotal role, as a judicious selection of input features can substantially improve the computational efficiency and prediction accuracy of the model. Spearman correlation analysis is used to analyze the correlation between load data and other factors. Assuming that two variables α = (α₁, α₂, …, α_n) and β = (β₁, β₂, …, β_n) are given, the Spearman correlation coefficient can be expressed as

Spearman = \frac{\sum_{i} (α_{i} - \bar{α})(β_{i} - \bar{β})}{\sqrt{\sum_{i} (α_{i} - \bar{α})^{2} \sum_{i} (β_{i} - \bar{β})^{2}}}

(18)

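where α_i and β_i are the ranks of the i-th observations of the two variables (the Spearman coefficient is the Pearson correlation computed on ranks), and \bar{α} and \bar{β} are the average ranks of α and β, respectively.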
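Spearman's coefficient is the formula in Eq. (18) applied to ranks, so it can be computed by ranking each variable (giving tied values the average of their ranks) and correlating the rank vectors; `scipy.stats.spearmanr` uses the same tie handling. The sketch also shows a threshold-based feature filter. The paper selects features from Table 2 without stating a threshold, so the 0.1 cut-off and all names here are assumptions; with the Table 2 coefficients, this filter keeps temperature, humidity and general diffusion flow.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..n; tied values share the average of their ranks."""
    v = np.asarray(v, float)
    ranks = np.empty(len(v))
    ranks[np.argsort(v, kind="mergesort")] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tied = v == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(a, b):
    """Eq. (18) on ranks: Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    da, db = ra - ra.mean(), rb - rb.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2)))

def select_features(rho, threshold=0.1):
    """Keep features whose |rho| reaches a (hypothetical) threshold."""
    return [name for name, r in rho.items() if abs(r) >= threshold]
```

Because only ranks enter the formula, any strictly monotone relation gives a coefficient of +1 or -1, which is why Spearman rather than Pearson correlation suits the nonlinear load-weather relationships.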

As Table 2 shows, power load is most strongly correlated with temperature and humidity, with correlation coefficients of 0.3482 and -0.3127, respectively, followed by general diffusion flow with a coefficient of 0.2002. Its correlations with wind speed (-0.07429) and diffusion flow (0.06688) are weak. Therefore, temperature, humidity and general diffusion flow are taken as input features.

Table 2

The correlation coefficient of Spearman

Temperature (°C)	Humidity (RH)	General diffusion flow (m²/s)	Wind speed (m/s)	Diffusion flow (m²/s)
0.3482	–0.3127	0.2002	–0.07429	0.06688

5.1.2 Evaluation metrics

To verify the forecasting performance of the proposed model and to compare different modeling methods, this paper adopts the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE) and coefficient of determination R² as the evaluation metrics of load forecasting accuracy.

MAE = \frac{1}{m} \sum_{i = 1}^{m} | {\hat{y}}_{i} - y_{i} |

(14)

MAPE = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{\hat{y}_{i} - y_{i}}{y_{i}} \right| \times 100

(15)

RMSE = \sqrt{\frac{\sum_{i = 1}^{m} {({\hat{y}}_{i} - y_{i})}^{2}}{m}}

(16)

R^{2} = 1 - \frac{\sum_{i = 1}^{m} {({\hat{y}}_{i} - y_{i})}^{2}}{\sum_{i = 1}^{m} {(y_{i} - \frac{1}{m} \sum_{i = 1}^{m} y_{i})}^{2}}

(17)

where \hat{y}_{i} is the forecast load, y_i is the true load, and m is the number of predicted points.
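Eqs. (14)-(17) translate directly into NumPy. As a worked example, for y = (100, 200) and ŷ = (110, 190): MAE = 10, MAPE = (0.10 + 0.05)/2 × 100 = 7.5%, RMSE = 10, and R² = 1 - 200/5000 = 0.96. The function name is illustrative.

```python
import numpy as np

def metrics(y_true, y_pred):
    """MAE, MAPE (%), RMSE and R^2 as defined in Eqs. (14)-(17)."""
    y = np.asarray(y_true, float)
    y_hat = np.asarray(y_pred, float)
    err = y_hat - y
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y)) * 100.0      # requires every y_i != 0
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}
```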

5.2 Data decomposition and reconstruction

In this paper, CEEMDAN is used to decompose the complex original load series, which yields 12 mode components and 1 residual component at different frequency scales. Because modeling and predicting every component directly would require too much computation, RCMSE is used to evaluate the complexity of each component, and the component sequences are aggregated and reconstructed according to their entropy values to improve prediction efficiency and the quality of the input features. The embedding dimension, conditional threshold and time delay of RCMSE are set to 2, 0.2 std(IMF_i) and 1, respectively. The reconstruction results based on the RCMSE values are shown in Fig. 5.

Fig. 5

Components reconstruction based on RCMSE value.

As Fig. 5 shows, the RCMSE values of the component sequences cluster around seven values: 4.7, 5.5, 2.6, 3.5, 0.8, 0.5 and 0.1. The component sequences are therefore aggregated and reconstructed as follows: F₁ = IMF₁, F₂ = IMF₂ + IMF₃, F₃ = IMF₄ + IMF₇, F₄ = IMF₅, F₅ = IMF₆ + IMF₈, F₆ = IMF₉ + IMF₁₀, F₇ = IMF₁₁ + IMF₁₂ + IMF₁₃ (IMF₁₃ denotes the residual component). Figure 6 shows the process of component reconstruction based on the RCMSE values. Reconstruction reduces the number of components from 13 to 7, which reduces the number of models to be trained and improves computational efficiency.
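The reconstruction is a grouped sum of the CEEMDAN components. The sketch encodes the grouping reported in Section 5.2 (treating IMF₁₃ as the residual, as the 12 + 1 components imply) and checks that every component is used exactly once, so the seven reconstructed series still add up to the full decomposition. The names `GROUPS` and `reconstruct` are illustrative.

```python
import numpy as np

# Grouping reported in Section 5.2 (1-based indices; IMF13 is the residual component).
GROUPS = {"F1": [1], "F2": [2, 3], "F3": [4, 7], "F4": [5],
          "F5": [6, 8], "F6": [9, 10], "F7": [11, 12, 13]}

def reconstruct(components, groups=GROUPS):
    """Sum the CEEMDAN components (rows of `components`) within each group."""
    components = np.asarray(components, float)
    used = sorted(i for idx in groups.values() for i in idx)
    if used != list(range(1, len(components) + 1)):
        raise ValueError("every component must be assigned to exactly one group")
    return {name: components[np.array(idx) - 1].sum(axis=0) for name, idx in groups.items()}
```

Because the grouping is a partition, forecasting the seven F_j series and summing them targets the same total load as forecasting all 13 components, with roughly half as many models.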

Fig. 6

The process of components reconstruction based on RCMSE.

5.3 Prediction performance comparison

5.3.1 Comparison of the single model prediction results

To compare the performance of single forecasting models, six models, namely RF, SVM [21], ELM-NN, a feedforward back-propagation (FFBP) network, LSTM and BiLSTM, are built with the load series, temperature, humidity and general diffusion flow as inputs. All models use 500 training epochs and a learning rate of 0.01. The other parameter settings are listed in Table 3.

Table 3
Hyperparameter setting

Model	Parameter	Parameter setting
RF	Estimators number	n = 800
	Min sample split	min = 2
	Max depth	max = 4
SVM	Kernel type	RBF kernel function
	Penalty factor	C = 32
	Kernel function parameter	γ= 4
FFBP	Layer number	2
	Hidden neurons number	(50, 50)
ELM-NN	Hidden neurons number	20
LSTM	Layer number	2
	Hidden neurons number	(50, 50)
	Activation function	ReLU
	Optimization method	Adam
BiLSTM	Forward neurons number	50
	Backward neurons number	50
	Activation function	ReLU
	Optimization method	Adam

Table 4 compares their prediction performance. Among the six single models, RF, SVM and FFBP have the largest errors, which indicates that they are unstable and prone to falling into local optima.

Table 4

Prediction performance comparison of the single models

Model	RMSE (kW)	MAPE (%)
RF	13,164	12.19
SVM	23,243	11.47
FFBP	12,276	9.83
ELM-NN	10,282	12.05
LSTM	6,304	7.33
BiLSTM	2,477	2.80

The RMSE of ELM-NN is lower than that of RF, SVM and FFBP, indicating a relatively stable forecast, but its error is still high (its MAPE of 12.05% is higher than that of SVM and FFBP). The error of LSTM is markedly lower, which shows that its predictions fluctuate less and are more accurate. BiLSTM achieves the highest accuracy, and its predictions are more stable than those of the other neural networks. In summary, the recurrent networks (LSTM and BiLSTM) handle multi-feature input and long, large-scale load data better than the other models, and because BiLSTM learns the dependencies in the load data in both directions, it is the most stable and accurate of the six. BiLSTM is therefore adopted as the base prediction model in this paper.

5.3.2 Comparison of the hybrid model prediction results

Although BiLSTM outperforms the other neural networks, setting its hyperparameters from experience is arbitrary and lacks a theoretical basis, so the hyperparameters of BiLSTM need to be optimized. The population size of IBOA is set to 20 and the maximum number of iterations to 200.

As Table 5 shows, the improved BiLSTM (IBOA-BiLSTM) is clearly more accurate than the single BiLSTM, which indicates that IBOA can mine the potential information in discontinuous data more fully and effectively improve both prediction accuracy and goodness of fit. However, because the original load series is nonlinear, random and fluctuating, improving the BiLSTM alone cannot substantially improve load forecasting performance. A mode decomposition algorithm is therefore needed to decompose the original data, reducing the prediction difficulty and the prediction error. Compared with IBOA-BiLSTM, CEEMDAN-RCMSE-IBOA-BiLSTM reduces MAE, MAPE and RMSE by 14.59%, 19.35% and 13.28%, respectively, and increases R² by 0.21%, which verifies that feature extraction from the original load series effectively reduces the influence of its nonlinearity and randomness.
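The percentages quoted in this paragraph are relative changes computed from the IBOA-BiLSTM and CEEMDAN-RCMSE-IBOA-BiLSTM rows of Table 5. A short sketch that reproduces them (the helper name `percent_change` is illustrative):

```python
# Table 5 values for IBOA-BiLSTM and CEEMDAN-RCMSE-IBOA-BiLSTM
iboa_bilstm = {"MAE": 932, "MAPE": 1.55, "RMSE": 1272.4, "R2": 0.9914}
ceemdan_rcmse = {"MAE": 796, "MAPE": 1.25, "RMSE": 1103.4, "R2": 0.9935}

def percent_change(before, after):
    """Relative change in percent; positive means the value went down."""
    return 100.0 * (before - after) / before

reductions = {k: round(percent_change(iboa_bilstm[k], ceemdan_rcmse[k]), 2)
              for k in ("MAE", "MAPE", "RMSE")}
r2_increase = round(-percent_change(iboa_bilstm["R2"], ceemdan_rcmse["R2"]), 2)
print(reductions, r2_increase)
# {'MAE': 14.59, 'MAPE': 19.35, 'RMSE': 13.28} 0.21
```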

Table 5
Prediction performance comparison of the hybrid models

Model	MAE (kW)	MAPE (%)	RMSE (kW)	R²
CEEMDAN-RCMSE-IBOA-BiLSTM	796	1.25	1,103.4	0.9935
the method in [17]	782	1.26	1,151.2	0.9929
the method in [22]	838	1.39	1,241.3	0.9925
IBOA-BiLSTM	932	1.55	1,272.4	0.9914
BiLSTM	1,927	2.80	2,477.9	0.9771

The hybrid models in [17] and [22] are also applied to the data set of this paper. Table 5 and Fig. 7 show that all the hybrid methods perform well, but CEEMDAN-RCMSE-IBOA-BiLSTM follows the actual load most closely: it has the lowest MAPE (1.25%) and RMSE and the highest R², although its MAE (796 kW) is slightly higher than that of the method in [17] (782 kW). In the peak period, the error of CEEMDAN-RCMSE-IBOA-BiLSTM is the smallest and its forecast remains the closest to the actual value, indicating that it predicts accurately and handles uncertainty well. In summary, compared with the other hybrid methods, the hybrid method based on CEEMDAN-RCMSE and improved BiLSTM effectively reduces the influence of the nonlinearity and randomness of the load series and thereby improves forecasting performance.

Fig. 7

The prediction results of the hybrid model.

5.3.3 Comparison of error correction prediction model results

A hybrid forecasting model based on decomposition is limited by the errors accumulated during forecasting, so its accuracy cannot be improved further on its own. The proposed method therefore adds an error correction model to correct both the error introduced by the decomposition algorithm and the error accumulated in the prediction process. Fig. 8 and Table 6 show that the proposed model is clearly more accurate than the model without error correction. The MAPE of the proposed method is 0.96%, which is 22.96% lower than that of the model without error correction, and the R² of the corrected model is 0.9964, the closest to 1, meaning that its forecast is the closest to the actual load. In the peak period, the proposed method stays closer to the actual value and handles uncertainty better. In summary, the error correction model mines the hidden information in the error sequence, reduces the errors arising from subsequence reconstruction and from the limitations of the model itself, handles the uncertainty of the load sequence better and effectively improves prediction accuracy.
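The correction step can be illustrated as: final forecast = hybrid forecast + predicted error of the hybrid forecast. The sketch below is not the paper's error model; it swaps in a least-squares autoregressive model as a lightweight stand-in for the BiLSTM, and it assumes one-step-ahead operation in which the actual load, and therefore the error, becomes known after each step. The names `fit_ar_error_model`, `corrected_forecast` and the lag order `p` are hypothetical.

```python
import numpy as np

def fit_ar_error_model(errors, p=3):
    """Least-squares AR(p) model with intercept for e_t = y_t - yhat_t.

    A lightweight stand-in for the BiLSTM error model; not the paper's method.
    """
    errors = np.asarray(errors, dtype=float)
    # Row r holds e_r, ..., e_{r+p-1}; the target is e_{r+p}.
    lags = np.column_stack([errors[i:len(errors) - p + i] for i in range(p)])
    design = np.column_stack([lags, np.ones(len(lags))])
    coef, *_ = np.linalg.lstsq(design, errors[p:], rcond=None)
    return coef

def corrected_forecast(actual, base, n_train, p=3):
    """Final forecast = base (hybrid) forecast + predicted error, one step ahead.

    Assumes the actual value, and hence the error, is known after each step.
    """
    actual = np.asarray(actual, dtype=float)
    base = np.asarray(base, dtype=float)
    errors = actual - base
    coef = fit_ar_error_model(errors[:n_train], p)
    corrected = base[n_train:].copy()
    for j, t in enumerate(range(n_train, len(actual))):
        features = np.append(errors[t - p:t], 1.0)   # last p known errors + intercept
        corrected[j] += features @ coef
    return corrected
```

The design point carried over from the text is that the error sequence is treated as a series in its own right: any structure left in it (here, autocorrelation) is predictable and can be added back to the base forecast.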

Fig. 8

The prediction results of error correction model.

Table 6

Prediction performance comparison of error correction model

Model	MAE (kW)	MAPE (%)	RMSE (kW)	R²
the proposed method	645	0.96	827.3	0.9964
CEEMDAN-RCMSE-IBOA-BiLSTM	796	1.25	1,103.4	0.9935

5.4 Sensitivity analysis

To examine how the forecasting ability of the proposed hybrid model responds to parameter changes, a sensitivity analysis is conducted, with MAE used to measure how strongly each parameter affects the performance of the proposed method. For IBOA, the parameters considered are the number of butterflies and the number of iterations. Because the BiLSTM hyperparameter combination is obtained by IBOA, only the number of BiLSTM hidden layers is varied, with the other hyperparameters at their optimized values. The number of butterflies is set to 10, 20, 30 and 50, the number of iterations to 100, 150, 200, 250 and 300, and the BiLSTM to a single or a double hidden layer.

Fig. 9 shows that, for a fixed number of BiLSTM hidden layers, MAE changes with the IBOA parameters, and for fixed IBOA parameters, MAE changes with the number of BiLSTM layers. The number of butterflies, the number of iterations and the number of BiLSTM layers all influence prediction accuracy, so the parameters of the proposed method must be chosen with care. With a single-layer BiLSTM, 20 butterflies and 200 iterations give an MAE of 645 kW; with a double-layer BiLSTM, 30 butterflies and 150 iterations give an MAE of 682 kW. Selecting appropriate parameters is therefore important in load forecasting, because they strongly affect prediction accuracy.
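The sensitivity study is a full grid over the parameter levels listed above (4 × 5 × 2 = 40 runs). A minimal sketch of that loop, assuming a hypothetical callable `evaluate_mae(butterflies, iterations, layers)` that trains the IBOA-BiLSTM model with those settings and returns the test MAE; the training itself is not shown.

```python
from itertools import product

# Parameter levels examined in Section 5.4
BUTTERFLIES = [10, 20, 30, 50]
ITERATIONS = [100, 150, 200, 250, 300]
LAYERS = [1, 2]

def sensitivity_grid(evaluate_mae):
    """Evaluate every (butterflies, iterations, layers) combination.

    evaluate_mae is a hypothetical callable returning the test MAE
    of the model trained with the given settings.
    """
    results = {cfg: evaluate_mae(*cfg)
               for cfg in product(BUTTERFLIES, ITERATIONS, LAYERS)}
    # Lowest-MAE configuration for each BiLSTM depth
    best_per_depth = {depth: min((cfg for cfg in results if cfg[2] == depth),
                                 key=results.get)
                      for depth in LAYERS}
    return results, best_per_depth
```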

Fig. 9

The sensitivity analysis for each parameter involved in the proposed method.

5.5 Error analysis of prediction results

To analyze the prediction errors of each model further, the errors at the 144 test points are plotted, and a Gaussian mixture distribution is used to test and statistically analyze the errors of each hybrid model. Fig. 10 shows that the proposed method has the smallest mean RMSE and MAPE and remains stable throughout. In the peak and valley periods, the proposed model keeps a low prediction error and adapts to the uncertainty of the power system. The histogram in Fig. 11(a) and the fitted curves in Fig. 11(b), which describe the error distribution of each model, show that the errors of the proposed method are concentrated near zero, with the most concentrated distribution and the narrowest spread of all the models compared. Statistically, the proposed method is therefore less likely to produce large errors, which also verifies its stability and reliability.
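A Gaussian mixture fit of an error sample can be sketched with a few lines of expectation-maximization (EM). This is a generic 1-D EM, not the authors' code; the number of components `k`, the quantile-based initialization and the fixed iteration count are assumptions, since the section does not state them.

```python
import numpy as np

def fit_gmm_1d(x, k=2, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture to an error sample by EM."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: spread the initial means over the sample quantiles
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, np.var(x))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each error value
        dens = weights * np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) \
            / np.sqrt(2 * np.pi * variances)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate weights, means and variances
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-12
    return weights, means, variances
```

For a forecast error sample, a mixture whose dominant component has a mean near zero and a small variance corresponds to the "concentrated near zero, narrow spread" description above.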

Fig. 10

The error of 144 data points.

Fig. 11

Error distribution diagram.

Figure 12 compares the proposed method with the other hybrid models in three dimensions: standard deviation, correlation coefficient and RMSE. In the Taylor diagram, the proposed method has the smallest standard deviation and RMSE, and its prediction is the closest to the actual value among the hybrid models.
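A standard Taylor diagram places each model by its standard deviation and its correlation coefficient R with the observations; the distance to the reference point is the centred RMS difference E′, and the three are linked by E′² = σ_p² + σ_o² − 2σ_pσ_oR. A minimal sketch of the statistics behind one point on the diagram (the function name is illustrative; population standard deviations are used so the identity holds exactly):

```python
import numpy as np

def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centred RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    sd_obs, sd_pred = obs.std(), pred.std()          # population std (ddof=0)
    corr = np.corrcoef(obs, pred)[0, 1]
    # RMS difference after removing each series' own mean
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_pred, corr, crmsd
```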

Fig. 12

Taylor diagram.

5.6 Applicability verification

To further verify the stability and reliability of the proposed method, an applicability case is added. Short-term load forecasting is carried out on 8,832 load records for Singapore at 30-minute resolution, from July 1, 2019 to December 31, 2019. The load data from July 1 to December 29, 2019 form the training set, and the data from December 30 to December 31, 2019 form the test set. The decomposition and reconstruction parameters are the same as above, and the parameters of IBOA-BiLSTM are obtained by the same method as above. The final prediction results are shown in Fig. 13.
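The record count and the chronological split can be checked with the standard library. This sketch assumes the series starts at 00:00 on July 1, 2019 and ends at 23:30 on December 31, 2019; the helper name is illustrative.

```python
from datetime import datetime, timedelta

def half_hourly_stamps(start, end_exclusive):
    """All 30-minute timestamps in [start, end_exclusive)."""
    stamps, t = [], start
    while t < end_exclusive:
        stamps.append(t)
        t += timedelta(minutes=30)
    return stamps

stamps = half_hourly_stamps(datetime(2019, 7, 1), datetime(2020, 1, 1))
test_start = datetime(2019, 12, 30)
train = [t for t in stamps if t < test_start]
test = [t for t in stamps if t >= test_start]
print(len(stamps), len(train), len(test))  # 8832 8736 96
```

The 184 days at 48 points per day give the 8,832 records quoted above, of which the last two days (96 points) are held out for testing.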

Fig. 13

Prediction results of hybrid model in applicability verification.

Figure 13 and Table 7 show the prediction results of the hybrid methods from different references. Besides the hybrid methods in [17] and [22], the proposed method is also compared with the recent prediction methods in [23–25]. The proposed method has the smallest error of all the compared models, with an MAE, MAPE and RMSE of 25.77 MW, 0.44% and 32.89 MW, respectively. In the peak and valley periods, it is the closest to the actual value and handles uncertainty best. These results show that the proposed method not only outperforms the other recent hybrid methods but also forecasts accurately on a different data set.

In practical applications, noisy data are inevitable and increase the uncertainty of load forecasting. To verify the reliability of the proposed method fully, the electricity price series, which is highly random and contains many outliers, is added to the input variables as noise. R² is used to evaluate the ability of each model to handle noisy data. Table 8 shows that every hybrid model is more accurate without the noise input than with it. When the noise sequence is added, the R² of the proposed method decreases from 0.9971 to 0.9950. The proposed method retains the highest accuracy and shows the smallest decrease in R², which shows that it handles noisy data and uncertainty best.
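The "smallest decrease" claim can be read straight off Table 8 by differencing the two R² columns. A short sketch (values copied from Table 8; the dictionary keys are illustrative labels):

```python
# Table 8: (R^2 without noise, R^2 with noise)
r2_table = {
    "proposed": (0.9971, 0.9950),
    "[17]": (0.9928, 0.9902),
    "[22]": (0.9927, 0.9903),
    "[23]": (0.9914, 0.9885),
    "[24]": (0.9901, 0.9871),
    "[25]": (0.9885, 0.9829),
}
drops = {name: round(clean - noisy, 4) for name, (clean, noisy) in r2_table.items()}
print(min(drops, key=drops.get))  # proposed
```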

Table 7

Prediction results of hybrid model in applicability verification

Model	MAE (MW)	MAPE (%)	RMSE (MW)
The proposed method	25.77	0.44	32.89
The method in [17]	36.69	0.62	51.87
The method in [22]	46.65	0.72	58.36
The method in [23]	48.94	0.85	62.59
The method in [24]	54.65	0.89	64.26
The method in [25]	57.31	0.96	68.58

Table 8

The R² values of the hybrid models without and with noise

Model	Without noise	With noise
The proposed method	0.9971	0.9950
The method in [17]	0.9928	0.9902
The method in [22]	0.9927	0.9903
The method in [23]	0.9914	0.9885
The method in [24]	0.9901	0.9871
The method in [25]	0.9885	0.9829

The receiver operating characteristic-area under the curve (ROC-AUC) is commonly used to evaluate classification performance. In binary classification in particular, a classifier is evaluated by its true positive rate (TPR) and false positive rate (FPR) over a range of decision thresholds. The Kappa coefficient is a statistical index for evaluating classification models that is especially suitable for multi-class problems: it takes the classification accuracy of the model into account and corrects for agreement that would occur by chance. Kappa is usually used to measure agreement between observers, but it can also evaluate the agreement of a classification model with the true labels. When the results of the proposed method are evaluated as a classification problem, it achieves the highest ROC-AUC (0.968) and Kappa (0.938) of the compared methods, as shown in Table 9.
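Both indices have short closed forms: Cohen's kappa is κ = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance, and ROC-AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann-Whitney statistic). The section does not say how the load forecasts are turned into class labels or scores, so this sketch works on generic labels and scores; any thresholding of the forecasts would be an additional assumption.

```python
import numpy as np

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) from the confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    labels = np.unique(np.concatenate([y_true, y_pred]))
    pos = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)))
    for t, p in zip(y_true, y_pred):
        cm[pos[t], pos[p]] += 1
    n = cm.sum()
    p_o = np.trace(cm) / n                             # observed agreement
    p_e = cm.sum(axis=1) @ cm.sum(axis=0) / n ** 2     # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

def roc_auc(y_true, scores):
    """ROC-AUC via the Mann-Whitney statistic (ties count one half)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))
```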

Table 9

The ROC-AUC and Kappa coefficient of each hybrid model

Model	ROC-AUC	Kappa
The proposed method	0.968	0.938
The method in [17]	0.927	0.897
The method in [22]	0.935	0.869
The method in [23]	0.874	0.833
The method in [24]	0.714	0.879
The method in [25]	0.707	0.539

In summary, the combination of CEEMDAN-RCMSE, which extracts and processes differentiated and refined features of the load series, with the IBOA-BiLSTM error correction forecasting model handles uncertainty well and performs well in terms of stability and reliability.

6 Conclusion

Short-term load forecasting plays an important role in the economic dispatch of power systems, in the effective utilization of renewable energy and optimal scheduling, and in the safe and stable operation of power systems, and it is the subject of this paper. To address the high-frequency components produced by load decomposition, the loss of long-sequence information in traditional deep learning models, and the lack of correction of prediction errors, a hybrid short-term load forecasting method using load feature extraction based on CEEMDAN and RCMSE and an improved BiLSTM error correction model is proposed. Analysis and verification on case studies lead to the following conclusions:

(1) To address the non-stationary and nonlinear characteristics of load series, CEEMDAN is introduced to decompose the preprocessed original load series, which effectively reduces its fluctuation and nonlinearity. The complexity of each mode component obtained by CEEMDAN is calculated with RCMSE, and the subsequences are aggregated according to their RCMSE values, which reduces model complexity and computational scale and improves prediction efficiency.

(2) A single prediction model can hardly achieve accurate prediction, so IBOA is used to search for the optimal hyperparameter combination of BiLSTM from both global and local perspectives, which effectively avoids premature convergence to local optima, low convergence accuracy and loss of population diversity. On this basis, a short-term load forecasting model based on CEEMDAN-RCMSE and improved BiLSTM is constructed. The MAPE of this hybrid model is 1.25%, up to about 10% lower than that of the hybrid models from other references. This shows that the proposed hybrid model improves the accuracy and generalization of the forecasting model and extracts the deep time-series characteristics of the data.

(3) The load forecasting model with error correction overcomes, to some extent, the inaccurate forecasting caused by the added white noise, data defects and model limitations. The MAPE of the proposed model is 0.96%, which is 23.30% lower than that of the hybrid model without error correction and about 28% lower than that of the other hybrid models. This indicates that the proposed model can fully mine the hidden information in the error sequence and further improve forecasting accuracy.

The research in this paper not only provides a reference and a wider range of choices for short-term load forecasting methods, but is also a useful reference for other forecasting problems in the power system field.

Acknowledgments

This work was supported by Innovation Fund for Production, Education and Research in Chinese Universities (2022IT039) and Basic Research Project of Education Department of Liaoning Province (LJKQ20222276).

References

1. Jin, Guo J.T., Mohamed M.A., Wang M.Q., A novel model predictive control via optimized vector selection method for common-mode voltage reduction of three-phase inverters, IEEE Access 7 (2019), 95351–95363.

2. Han F.J., Wang X.H., Qiao, Shi M.J., T.J., Review on artificial intelligence based load forecasting research for the new-type power system, Proceedings of the CSEE 43(22) (2023), 8569–8592.

3. Zhao, Wang H.M., Kang, Zhang Z.Y., Temporal convolution network-based short-term electrical load forecasting, Transactions of China Electrotechnical Society 37(5) (2022), 1242–1251.

4. Bhatti, Anuradha, Short-term load forecasting with using multiple linear regression, International Journal of Electrical and Computer Engineering 10(4) (2020), 3911–3917.

5. Chen P.G., Fang Y.J., Short-term load forecasting of power system for holiday point-by-point growth rate based on Kalman filtering, Engineering Journal of Wuhan University 53(2) (2020), 139–144.

6. Yang G.H., Zheng H.F., Zhang H.H., Jia, Short-term load forecasting based on Holt-Winters exponential smoothing and temporal convolutional network, Automation of Electric Power Systems 46(6) (2022), 73–82.

7. Najafzadeh, Etemad-Shahidi, Lim S.Y., Scour prediction in long contractions using ANFIS and SVM, Ocean Engineering 111 (2016), 128–135.

8. Saberi-Movahed, Najafzadeh, Mehrpooya, Receiving more accurate predictions for longitudinal dispersion coefficients in water pipelines: Training group method of data handling using extreme learning machine conceptions, Water Resources Management 34 (2020), 529–561.

9. Saberi-Movahed, Mohammadifard, Mehrpooya, et al., Decoding clinical biomarker space of COVID-19: Exploring matrix factorization-based feature selection methods, Computers in Biology and Medicine 146 (2022), 105426.

10. Moradzadeh, Zakeri, Shoaran, Mohammadi-Ivatloo, Mohammadi, Short-term load forecasting of microgrid via hybrid support vector regression and long short-term memory algorithms, Sustainability 12(17) (2020), 7076.

11. Wang, Zheng, Dai Z.M., Zhang K.F., Multi-energy load forecasting in integrated energy system based on ResNet-LSTM network and attention mechanism, Transactions of China Electrotechnical Society 37(7) (2022), 1789–1799.

12. R.X., Yuan Z.Y., Lei, Zheng J.C., Luo X.J., Building thermal load prediction using deep learning method considering time-shifting correlation in feature variables, Journal of Building Engineering 61 (2022), 105316.

13. Xiao Z.C., L.J., Zhang H.J., Zhang X.T., Y.X., HVAC load forecasting based on the CEEMDAN-Conv1D-BiLSTM-AM model, Mathematics 11(22) (2023), 4630.

14. Wen J.H., Wang Z.J., Short-term power load forecasting with hybrid TPA-BiLSTM prediction model based on CSSA, Computer Modeling in Engineering & Sciences 136(1) (2023), 749–765.

15. Meng F.B., Zou Q.Q., Zhang Z.Y., Wang H.R., Abdullah H.M., Almalaq, Mohamed M.A., An intelligent hybrid wavelet-adversarial deep model for accurate prediction of solar power generation, Energy Reports 7 (2021), 2155–2164.

16. Jin, Zhuo, Mohamed M.A., A novel approach based on CEEMDAN to select the faulty feeder in neutral resonant grounded distribution systems, IEEE Transactions on Instrumentation and Measurement 69(7) (2020), 4712–4721.

17. Chen Z.X., Jin, Zheng X.D., Liu Y.L., Zhuang Z.Y., Mohamed M.A., An innovative method-based CEEMDAN-IGWO-GRU hybrid algorithm for short-term load forecasting, Electrical Engineering 104 (2022), 3137–3156.

18. Chang Y.F., Yang Z.X., Pan, Tang, Huang W.C., Ultra-short-term wind power prediction based on CEEMDAN-PE-WPD and multi-objective optimization, Power System Technology 47(12) (2023), 5015–5025.

19. Tian, Che D.T., Ding, Automatic sleep scoring based on refined composite multi-scale entropy and support vector machine, Journal of Shanghai Jiao Tong University 53(3) (2019), 321–326.

20. Meng G.S., Gao D.D., Improved butterfly optimization algorithm and its application in solving inverse kinematics problem of redundant manipulators, Manufacturing Technology & Machine Tool 722(8) (2022), 91–96.

21. Najafzadeh, Oliveto, Riprap incipient motion for overtopping flows with machine learning models, Journal of Hydroinformatics 22(4) (2020), 749–767.

22. Yao H.R., C.X., Zheng X.J., Yang, Short-term load combination forecasting model integrating ACMD and BiLSTM, Power System Protection and Control 50(19) (2022), 58–66.

23. Zhao X.Y., Q.J., Zhu, Short-term power load forecasting based on CEEMDAN and TCN-LSTM model, Science Technology and Engineering 23(4) (2023), 1557–1564.

24. Wang, Ding Z.T., Zheng J.Y., Zhang K.F., A Transformer-based method of multienergy load forecasting in integrated energy system, IEEE Transactions on Smart Grid 13(4) (2022), 2703–2714.

25. Deng D.Y., Zhang Z.Y., Teng Y.F., Huang, Short-term electric load forecasting based on EEMD-GRU-MLR, Power System Technology 44(2) (2020), 593–602.


Table 1
Division and characteristics of load data set after preprocessing

Data set	Complete data	Training set	Test set
Sample size	8,640	8,496	144
Maximum	99,297	99,297	87,862
Minimum	36,785	36,785	43,541
Average	64,399	64,397	60,427
Outlier size	64	62	–