GRU combined model based on multi-objective optimization for short-term residential load forecasting

Abstract

Reliable and accurate short-term forecasting of residential load plays an important role in DSM. However, the high uncertainty inherent in single-user loads makes them difficult to forecast accurately. Various traditional methods have been applied to residential load forecasting, but a single forecasting model cannot comprehensively learn the data characteristics of residential loads, and RNNs suffer from vanishing or exploding gradients during backpropagation when learning long-term dependencies. Therefore, this paper proposes a GRU combined model based on multi-objective optimization (GRUCC-MOP) to improve the accuracy of short-term residential load forecasting. To demonstrate its effectiveness, GRUCC-MOP is first compared experimentally with the unimproved model to verify its performance and forecasting effectiveness. Second, it is compared experimentally with other strong forecasting methods, namely DBN, LSTM, GRU, EMD-DBN and EMD-MODBN. The simulation experiments show that the proposed GRU combined model achieves better MAPE on the January, April, July and November load data, indicating that it outperforms the other methods in short-term residential load forecasting.

Keywords

Short-term residential load forecasting; Gated recurrent unit; Multi-objective optimization algorithm; Deep learning

Nomenclature

Variables
z_t	GRU update gate
r_t	GRU reset gate
h_t	GRU output
x_t	GRU input
Ω	Optimization algorithm decision space
F(x)	Optimization algorithm objective space
Z^*	MOEA/D reference point
λ	Optimization algorithm weight vector
v_i	Normalizing factor
P	MOEA/D population size
T	MOEA/D neighborhood size
S	MOEA/D maximum number of iterations
G	GRU network forecasts
Ave	Mean of GRU network forecasts
Rel	Actual values of load data
y_max	The maximum value in the adopted dataset
y_min	The minimum value in the adopted dataset
W	Parameter matrix of the GRU cell
w	Sub-model weight vector
iter	MOEA/D maximum number of iterations
b	Sub-model bias vector
f(x)	Objective function value
y	Actual value of data sample
y′	Forecasts of forecasting models

Abbreviations
GRU	Gated Recurrent Unit
MOEA/D	Multi-objective Evolutionary Algorithm Based on Decomposition
DBN	Deep Belief Network
RNN	Recurrent Neural Network
LSTM	Long Short-Term Memory
AMI	Advanced Metering Infrastructure
DSM	Demand-Side Management
ES	Energy Storage
DR	Demand Response
ToU	Time-of-Use
SVR	Support Vector Regression
FIR	Fuzzy Inductive Reasoning
ANN	Artificial Neural Networks
DNN	Deep Neural Networks
EMD	Empirical Mode Decomposition
EMD-DBN	Empirical Mode Decomposition-Deep Belief Network
EMD-MODBN	Empirical Mode Decomposition-Multi-objective Deep Belief Network
MISO	Multi-inputs Single-output
SGSC	Smart Grid Smart City
PF	Pareto Frontier
MOP	Multi-objective Optimization Problem
VMD	Variational Mode Decomposition
RBM	Restricted Boltzmann Machine
EMD-LSTM	Empirical Mode Decomposition-Long Short-Term Memory
MSSA-GRU	Multivariate Singular Spectrum Analysis-Gated Recurrent Unit
Seq2Seq	Sequence to Sequence
BPNN	Back Propagation Neural Network
MAPE	Mean Absolute Percentage Error
RMSE	Root Mean Squared Error
GRUCC-MOP	GRU combined model based on multi-objective optimization

1 Introduction

As AMI and smart meters become widespread in residential networks, they provide an opportunity to improve grid energy management through residential load data [1]. Short-term residential load forecasting plays an important role in DSM [2]. For utilities, peak load shaving can be achieved by combining DSM with ES systems and intelligent DR technology, enabling efficient and economical use of electric power. Residential consumers, in turn, can play a significant role in smart grid operation by shifting energy consumption from on-peak to off-peak periods under ToU pricing, thereby avoiding high electricity costs [3]. Accurate residential load forecasting has therefore become a research hotspot.

At present, load forecasting research mainly targets the total load of the regional grid, and few studies address individual residents. The main load forecasting methods include SVR [4], FIR [5], ANN [6, 7] and DNN [8–10], which have achieved high accuracy for grid-level load. In references [11, 12], DBN was used to forecast hourly grid-level load with small errors. In references [13, 14], EMD was used to decompose the load data into multiple sub-series, each sub-series was forecast separately, and the final forecast was obtained by superimposing the forecasts of all sub-series. In reference [33], VMD was used to decompose residential data and extract sequence information for forecasting, but the effectiveness of this method on different datasets was not confirmed. These decomposition methods reduce the complexity of the data, but decomposition destroys the correlation within the original data.

Compared with grid-level load, residential load is more complex. Because resident behavior is unstable and random, residential load fluctuates far more than grid-level load. Different residents have different living habits, so the characteristics of their loads differ, and even the consumption habits of the same resident may vary over time. A residential load forecasting model must therefore learn the hidden knowledge in the load data fully while also generalizing well [15]. A single forecasting model can learn only some of the data characteristics rather than all of them, which makes effective residential load forecasting difficult [16]. Ensemble learning is therefore introduced into the load forecasting model: multiple sub-models first learn the characteristics of the residential load data, and these sub-models are then integrated into a combined model. The combined model can learn more characteristics of the residential load data, improving both accuracy and generalization capability. Reference [17] uses multiple DBNs instead of a single DBN, and the experimental results demonstrate that the combined model achieves higher forecast accuracy. In recent years, RNNs have been applied to load forecasting, providing more sub-model options for combined models. Reference [34] used an RNN model to forecast electricity load, and the experiments show that the RNN forecasts well; however, they also expose a drawback of the RNN model: it is applicable only to datasets with a small amount of data. Reference [18] uses LSTM for short-term load forecasting of a single residential electricity customer, and the experimental results show that the LSTM network has significant advantages over other deep learning methods for residential load forecasting.
References [35–37] all used LSTM models for regional short-term load forecasting, and the models were experimentally verified to be highly accurate. However, in references [35, 36] the model hyperparameters are specified manually without experimental verification, which is highly subjective, while reference [37] considers only the influencing factors discussed in that work and ignores others. Like LSTM, GRU effectively mitigates the gradient problems of long-term memory and backpropagation, while being computationally cheaper. Reference [39] used a GRU model to forecast wind power, and experiments verified that the model achieves higher accuracy and forecast correlation. However, owing to differences in wind power forecasting scenarios, the model cannot forecast continuously as wind data are updated and input, so only small-scale wind power forecasting can be considered. Reference [38] used a GRU model to forecast electricity loads, but it exhibits the same problem of subjectively set hyperparameters. In reference [19], GRU is used to forecast residential load; the experimental results show that GRU achieves forecasting accuracy similar to LSTM with a shorter simulation time. Reference [40] uses a Seq2Seq model to forecast the power generation of wind farms, and forecasts for two different wind farms verify that it achieves better results. The model is also highly accurate in forecasting the maximum, minimum and average values of the power load. Moreover, it requires only its own lagged values, which improves efficiency and reduces computational cost.

To improve accuracy and generalization capability, this paper proposes a GRU combined model based on multi-objective optimization (GRUCC-MOP). First, a multi-objective optimization algorithm optimizes the hyperparameters of the GRU networks, which are used to construct multiple high-quality sub-models. Second, the forecasts of these sub-models are combined to obtain the final forecast. Specifically, an improved MOEA/D [20] optimizes the GRU networks for accuracy and diversity, and the final population is then refined in a second stage based on non-dominated sorting [21], which removes poor individuals. The multi-objective optimization not only improves the accuracy of the sub-models but also keeps them different from one another; this matters because a combined model built from multiple similar sub-models differs little from a single model in forecast accuracy [17]. A DBN combination strategy is used to integrate the sub-model forecasts: owing to its strong nonlinear mapping capability, the DBN models the relationship between the sub-model outputs and the final forecast. Integrating several high-quality sub-models in this way improves both accuracy and generalization. Data from the SGSC program in Australia are used to demonstrate the superiority of the method, which is compared with DBN [11], LSTM [18], GRU [19], EMD-DBN [13] and EMD-MODBN [17]. The experimental results show that GRUCC-MOP not only achieves high accuracy but also generalizes well.

The main contributions of this paper are as follows:

A short-term residential load forecasting network based on multilayer MISO GRUs is constructed, and the improved MOEA/D algorithm is used to optimize its hyperparameters and decompose the load data into different sub-models, improving the accuracy and diversity of the residential load forecasting network.

An update method based on the maximum fitness difference is proposed to improve the MOEA/D algorithm and enhance the forecasting performance of the sub-models it produces. A two-stage optimization method is used to further screen the sub-models and enhance their forecasting performance.

A DBN-based integration strategy combines all of the optimized and screened high-quality short-term residential load forecasting sub-models, improving the accuracy and generalization ability of the proposed residential load forecasting method based on the GRU combined model.

The proposed method is compared experimentally with the unimproved model to verify its effectiveness and analyze its forecasting accuracy, and with other strong forecasting methods to highlight its superiority.

The rest of the paper is organized as follows. Section 2 describes the GRU short-term load forecasting network. Section 3 presents an improved MOEA/D algorithm for optimizing the GRU network, together with the combination strategy used to improve the accuracy of short-term residential load forecasts. Section 4 explains the settings of the simulation experiments and examines GRUCC-MOP to demonstrate its effectiveness. Section 5 compares GRUCC-MOP with other strong short-term load forecasting methods on several datasets to highlight its superiority. Finally, Section 6 concludes the paper.

2 GRU short-term load forecasting network

Residential load fluctuations are difficult to forecast because of the uncertainty and randomness of resident behavior. Different resident habits and different household loads make residential load characteristics more complex, and the habits of the same residents change across seasons, which further increases the difficulty of residential load forecasting. Because consumption habits are uncertain and residential loads are volatile, even subtle factors can cause drastic fluctuations. To address this, a GRU short-term forecasting method based on a combined model with multi-objective optimization is proposed to improve the accuracy of short-term residential load forecasting.

Because residential load data form a time series, RNNs are commonly used in residential load forecasting: they capture the temporal information in the load sequence and learn through backpropagation. However, because of the recurrent loop, when an RNN processes long sequences the gradients vanish or explode once the sequence reaches a certain length. Unlike the traditional RNN, LSTM changes how historical information is transmitted within its structural units, which mitigates the vanishing and exploding gradient problems and allows it to learn long sequences. GRU simplifies the LSTM unit by reducing the gate structure to an update gate and a reset gate, which mitigates the gradient problems while reducing the training time of the network, thereby improving the efficiency of short-term residential load forecasting.
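The gradient argument above can be made concrete with a deliberately simplified model. The sketch below assumes a scalar tanh RNN, h_t = tanh(w·h_{t-1} + u·x_t), rather than the networks used in this paper; the function name `rnn_gradient` and all values are illustrative. Backpropagating from h_T to h_0 multiplies T factors w·(1 - h_t²), so the gradient shrinks geometrically when every factor is smaller than 1 in magnitude and grows geometrically when the factors exceed 1.

```python
import math
from typing import Sequence


def rnn_gradient(w: float, u: float, xs: Sequence[float], h0: float = 0.0) -> float:
    """Return dh_T/dh_0 for the scalar RNN h_t = tanh(w * h_{t-1} + u * x_t).

    By the chain rule, dh_t/dh_{t-1} = w * (1 - h_t**2), so the end-to-end
    gradient is the product of these per-step factors.
    """
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # multiply in this step's Jacobian factor
    return grad


if __name__ == "__main__":
    inputs = [0.5] * 48  # one day of half-hourly inputs (illustrative values)
    for steps in (1, 12, 48):
        print(steps, rnn_gradient(0.5, 1.0, inputs[:steps]))  # vanishing gradient
    print(rnn_gradient(1.5, 1.0, [0.0] * 20))                  # exploding: equals 1.5**20
```

With w = 0.5 each factor is at most 0.5, so over 48 half-hourly steps the gradient falls below 0.5^48 ≈ 3.6e-15; with w = 1.5 and zero inputs the state stays at 0 and the gradient is exactly 1.5^T.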

The framework of the GRU-based short-term residential load forecasting network used in this paper is shown in Fig. 1. It consists of an input layer, a GRU layer and an output layer. Residential load data from different time steps are passed from the input layer to the GRU layer. The GRU layer has a multi-layer MISO structure, and its output is fed to a feed-forward neural network in the output layer to obtain the final forecast value. Because the MISO GRU layer can learn the effect of historical information on the output, it is particularly suitable for processing time series signals [28]. The structure of a single GRU layer with MISO is shown in Fig. 2.

Fig. 1

Framework of GRU short-term load forecasting network.

Fig. 2

Structure of single GRU layer with MISO.

The GRU network forecasts the load at time step t + 1 by learning from the load data of the past t time steps. Each GRU cell passes the historical information, together with the information learned at the current time step, to the next cell. At the last GRU cell, the transmission stops and the information of that cell is output; a feed-forward neural network turns this output into the load forecast. During training, the loss between the forecast and the actual load value is computed and used for backpropagation. During forecasting, the forecast value is combined with the inputs of the most recent t - 1 time steps to form a new input, from which the load at time step t + 2 is forecast.
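The rolling forecast procedure above can be sketched as follows. `model` is a hypothetical stand-in for the trained GRU network: any callable that maps the most recent t values to the next one. The toy mean forecaster in the usage lines is for illustration only.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    model: Callable[[Sequence[float]], float],
    history: Sequence[float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps ahead, feeding each forecast back as input."""
    if len(history) < window:
        raise ValueError("history must contain at least `window` values")
    buffer = list(history[-window:])  # the past t time steps
    forecasts = []
    for _ in range(horizon):
        y_next = model(buffer)        # forecast for the next time step
        forecasts.append(y_next)
        # drop the oldest value: keep the past t - 1 values plus the new forecast
        buffer = buffer[1:] + [y_next]
    return forecasts


if __name__ == "__main__":
    mean_model = lambda w: sum(w) / len(w)  # toy stand-in for the GRU network
    print(recursive_forecast(mean_model, [1.0, 2.0, 3.0, 4.0], window=2, horizon=3))
    # [3.5, 3.75, 3.625]
```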

The GRU network is a variant of the RNN [22]. It retains the advantages of the RNN in time series processing while overcoming the problem of vanishing or exploding gradients [23, 24]. A GRU cell for a single time step is shown in Fig. 3.

A GRU cell contains two gate structures: the update gate z_t and the reset gate r_t. The output h_t at the t-th time step is obtained by selecting and combining the input x_t and the previous output h_{t-1} through these two gates. The new output h_t thus retains relevant information from all previous inputs, which gives GRU a better fit than traditional neural networks in time series processing. The mathematical model of the GRU cell is as follows:

Fig. 3

GRU cell with a single time step.

$z_{t} = \sigma(W_{z} [h_{t-1}, x_{t}] + b_{z})$ (1)
$r_{t} = \sigma(W_{r} [h_{t-1}, x_{t}] + b_{r})$ (2)
$\tilde{h}_{t} = \tanh(W_{h} [r_{t} \odot h_{t-1}, x_{t}] + b_{h})$ (3)
$h_{t} = (1 - z_{t}) \odot h_{t-1} + z_{t} \odot \tilde{h}_{t}$ (4)
where W_z, W_r, W_h and b_z, b_r, b_h denote the parameter matrices and bias vectors, σ is the sigmoid function, and ⊙ denotes element-wise multiplication. The same parameter matrices are shared by the cells at all time steps. The output h_t is mapped to an effective output through the regression layer. During training, this output is compared with the actual value to obtain the loss, which is used to update the parameter matrices. The GRU network serves as the base network of the short-term load forecasting combined model, and its strong time series processing capability is exploited to improve the accuracy of short-term load forecasting.
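The cell equations translate directly into NumPy. The sketch below is a forward pass only, assuming a single GRU layer (the paper stacks several) and a linear output layer standing in for the feed-forward output network; training by backpropagation is omitted, and the function names and random parameters are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_cell(x_t, h_prev, W_z, W_r, W_h, b_z, b_r, b_h):
    """One GRU time step implementing Eqs. (1)-(4).

    Each W_* has shape (hidden, hidden + n_in) and multiplies the
    concatenation [h_{t-1}, x_t]; each b_* has shape (hidden,).
    """
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx + b_z)                                       # Eq. (1)
    r_t = sigmoid(W_r @ hx + b_r)                                       # Eq. (2)
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)  # Eq. (3)
    return (1.0 - z_t) * h_prev + z_t * h_tilde                         # Eq. (4)


def gru_forecast(xs, params, w_out, b_out):
    """Run one GRU layer over the input window (the same parameters are used
    in every cell), then map the last hidden state to a scalar forecast."""
    hidden = params[0].shape[0]
    h = np.zeros(hidden)
    for x_t in xs:
        h = gru_cell(x_t, h, *params)
    return float(w_out @ h + b_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, n_in = 8, 1
    shape = (hidden, hidden + n_in)
    params = (rng.normal(0, 0.1, shape), rng.normal(0, 0.1, shape),
              rng.normal(0, 0.1, shape), np.zeros(hidden), np.zeros(hidden), np.zeros(hidden))
    window = rng.random((48, n_in))  # 48 half-hourly values (random placeholders)
    print(gru_forecast(window, params, rng.normal(0, 0.1, hidden), 0.0))
```

Setting b_z very negative drives z_t towards 0, so by Eq. (4) the cell simply copies h_{t-1}; this is the mechanism that lets information pass through many steps unchanged.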

3 Improved MOEA/D optimization algorithm and combined strategy

In this paper, the MOEA/D algorithm is used in GRU short-term residential load forecasting to decompose the forecasting task on the residential load time series into different sub-models, each of which is optimized to obtain its corresponding solution; the set of these solutions approximates the PF. The improved update rule replaces only one individual (the worst relative to the offspring) at a time, which lets relatively good individuals survive longer and prevents the population from falling into a local optimum during optimization. This improves the forecasts of the sub-models obtained by MOEA/D and hence the accuracy of GRU short-term residential load forecasting.

The flowchart of the improved MOEA/D algorithm and combination strategy for GRU short-term residential load forecasting is shown in Fig. 4. The procedure consists of two parts: optimization and combination.

In the optimization part, the improved MOEA/D algorithm optimizes the GRU networks. The objective space of the algorithm is constructed from the accuracy and diversity of the GRU networks, and the decision space from their hyperparameters: the learning rate, the initialization weight of each gate, and the number of hidden nodes. The final population is then refined by non-dominated sorting in the second stage. After optimization, several sub-models that are highly accurate and differ from one another are obtained.

In the combination part, a DBN combines the forecasts of the high-quality sub-models to obtain the final forecast. Integrating several sub-models also helps prevent overfitting.

3.1 MOEA/D sub-model optimization algorithm

Fig. 4

Flowchart of improved MOEA/D algorithm and combined strategy for GRU short-term residential load forecasting.

Optimization problems with several conflicting objectives are referred to as MOPs [25]. A MOP can be defined as:
$\begin{aligned} \text{maximize } & F(x) = (f_{1}(x), \dots, f_{m}(x)) \\ \text{subject to } & x \in \Omega \end{aligned}$ (5)
where Ω is the decision space and F(x) is the objective vector obtained by mapping the decision variable x through the m objective functions into the objective space. A point x^* ∈ Ω is Pareto optimal if no other x ∈ Ω dominates it, that is, no other x is at least as good in every objective and strictly better in at least one; F(x^*) is then called a Pareto optimal vector. The set of all Pareto optimal vectors is called the PF.
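Pareto dominance, on which the PF definition rests, is easy to state in code. The sketch below assumes minimization, which is how the objectives f₁ and f₂ are treated later in this section; for the maximization form of Eq. (5) the inequalities flip. The helper names are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if objective vector `a` Pareto-dominates `b` under minimization:
    `a` is no worse in every objective and strictly better in at least one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates (the non-dominated set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    pts = [(1, 4), (2, 2), (3, 1), (3, 3), (4, 4)]
    print(pareto_front(pts))  # [(1, 4), (2, 2), (3, 1)]
```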

MOEA/D is currently considered one of the most effective methods for obtaining the PF [26]. Its core idea is to decompose the optimization problem into several scalar sub-problems using weight vectors and to solve each of them to approximate the PF. At initialization, a set of weight vectors is generated in the objective space, each corresponding to one individual. During iteration, offspring are generated by differential evolution and Gaussian mutation, and the fitness of each offspring is computed with an aggregation function. The population is updated by comparing the fitness values of individuals. The uniform distribution of the weight vectors gives MOEA/D good population diversity, and for low-dimensional problems the update based on fitness comparison keeps the time complexity of the algorithm low. In this paper, an improved MOEA/D algorithm is used to optimize the GRU network.
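The offspring-generation step can be sketched as below. The paper names differential evolution and Gaussian mutation but not the variants or rates, so this sketch assumes DE/rand/1 with binomial crossover against the first parent, a per-gene Gaussian mutation probability of 1/dim, clipping to the search bounds, and illustrative defaults F = 0.5 and CR = 0.9. The bounds in the usage lines are hypothetical hyper-parameter ranges.

```python
import numpy as np


def make_offspring(parents, lower, upper, rng, F=0.5, CR=0.9, pm=None, sigma_frac=0.1):
    """Create one offspring from three parents (rows of `parents`).

    Differential mutation (DE/rand/1) and binomial crossover produce a trial
    vector, Gaussian mutation perturbs a few genes, and the result is clipped
    to [lower, upper]. All rates here are illustrative assumptions.
    """
    x1, x2, x3 = parents
    dim = x1.size
    pm = 1.0 / dim if pm is None else pm
    mutant = x1 + F * (x2 - x3)                  # differential mutation
    cross = rng.random(dim) < CR
    cross[rng.integers(dim)] = True              # take at least one gene from the mutant
    child = np.where(cross, mutant, x1)          # binomial crossover
    mutate = rng.random(dim) < pm
    noise = rng.normal(0.0, sigma_frac * (upper - lower), dim)
    child = child + mutate * noise               # Gaussian mutation on selected genes
    return np.clip(child, lower, upper)          # stay inside the decision space


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # hypothetical ranges: learning rate, gate initialization weight, hidden nodes
    lower = np.array([1e-4, 0.01, 8.0])
    upper = np.array([1e-2, 1.0, 128.0])
    parents = rng.uniform(lower, upper, size=(3, 3))
    child = make_offspring(parents, lower, upper, rng)
    print(child, int(round(child[2])))  # hidden nodes decoded as an integer
```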

3.1.1 Objective space and decision space

The decision space of MOEA/D is composed of the hyper-parameters, including the learning rate, the initialization weight of each gate, and the number of hidden nodes:
$X = \{x_{1}, x_{2}, x_{3}\}$ (6)

Ensemble learning requires the sub-models to differ from one another, because a combined model formed from multiple similar sub-models is essentially no more accurate than a single model. In this paper, the diversity and accuracy of the GRU network are therefore defined as the objective space of the MOEA/D algorithm:
$F(X) = \{f_{1}(X), f_{2}(X)\}$ (7)
$f_{1}(X) = 1 \Big/ \left| \sum_{i=1}^{N} (G_{m}^{i} - \mathrm{Ave}^{i}) \sum_{j=1, j \neq m}^{M} (G_{j}^{i} - \mathrm{Ave}^{i}) \right|$ (8)
$f_{2}(X) = \frac{1}{N} \sum_{i=1}^{N} (G_{m}^{i} - \mathrm{Rel}^{i})^{2}$ (9)

Suppose that in one optimization process the training set contains N samples and the population contains M GRU networks. In the decision space, x₁, x₂ and x₃ are optimized within given ranges. In the objective space, f₁(X) represents the diversity of the network [17] and f₂(X) its accuracy [29]. In Eqs. (8) and (9), $G_{m}^{i}$ is the forecast of the m-th network on the i-th sample, Aveⁱ is the average forecast of all networks on the i-th sample, and Relⁱ is the actual value of the i-th sample. When the optimization algorithm reduces f₂(X), the forecast of each network approaches the actual value, so the forecasts of different networks become more similar, their deviations from Aveⁱ shrink, and f₁(X) increases. Since f₁(X) and f₂(X) are minimized simultaneously, the two objectives conflict.
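Both objectives can be computed from a matrix of network forecasts, as in the sketch below; the function name and the small `eps` guard against division by zero are our assumptions. One consequence is worth noting: since the deviations G_j^i - Ave^i sum to zero over all M networks, the inner sum in Eq. (8) equals -(G_m^i - Ave^i), so f₁(X) reduces to the reciprocal of Σ_i (G_m^i - Ave^i)², the squared distance of network m from the ensemble mean.

```python
import numpy as np


def objectives(G, rel, m, eps=1e-12):
    """Return (f1, f2) of Eqs. (8) and (9) for network m.

    G   : array of shape (M, N); G[j, i] is network j's forecast on sample i
    rel : array of shape (N,); the actual values Rel^i
    eps : small constant against division by zero (an added assumption)
    """
    ave = G.mean(axis=0)                    # Ave^i over the M networks
    dev = G - ave                           # G_j^i - Ave^i
    others = dev.sum(axis=0) - dev[m]       # sum over j != m, per sample i
    corr = float(np.sum(dev[m] * others))   # outer sum over samples i
    f1 = 1.0 / (abs(corr) + eps)            # diversity objective, Eq. (8)
    f2 = float(np.mean((G[m] - rel) ** 2))  # accuracy objective (MSE), Eq. (9)
    return f1, f2


if __name__ == "__main__":
    G = np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [3.0, 5.0, 1.0]])
    rel = np.array([2.0, 3.0, 2.0])
    print(objectives(G, rel, m=0))
```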

3.1.2 Tchebycheff update method

The decomposition method used in this paper is the Tchebycheff approach [20], which is considered to give superior convergence and population diversity for low-dimensional optimization problems. Since the problem considered here is two-dimensional, it is adopted as the decomposition method of the MOEA/D algorithm. The scalar optimization problem of the traditional Tchebycheff approach is given in Eq. (10):
$\min_{X \in \Omega} g^{tch}(X \mid \lambda, Z^{*}) = \max_{1 \leq i \leq m} \{\lambda_{i} [f_{i}(X) - z_{i}^{*}]\}$ (10)
where $Z^{*} = (z_{1}^{*}, \dots, z_{m}^{*})$ is the reference point and λ is the weight vector. During initialization and each iteration, the best value of each objective function in the current population is used to form the reference point for the next iteration. For the weight vector λ = {λ₁, λ₂}, a set of evenly distributed weight vectors satisfying Eq. (11) is generated:
$\lambda_{1} + \lambda_{2} = 1$ (11)
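The decomposition machinery can be sketched in a few NumPy functions. The sketch assumes evenly spaced two-dimensional weights (which, as in the paper, include the boundary vectors (0, 1) and (1, 0)), Euclidean neighborhoods that contain each vector itself, and the Tchebycheff aggregation of Eq. (10); the function names are illustrative.

```python
import numpy as np


def uniform_weights(P: int) -> np.ndarray:
    """P evenly spaced 2-D weight vectors with lambda1 + lambda2 = 1 (Eq. 11)."""
    l1 = np.linspace(0.0, 1.0, P)
    return np.column_stack([l1, 1.0 - l1])


def neighborhoods(weights: np.ndarray, T: int) -> np.ndarray:
    """Indices of the T closest weight vectors (Euclidean), including itself."""
    d = np.linalg.norm(weights[:, None, :] - weights[None, :, :], axis=-1)
    return np.argsort(d, axis=1, kind="stable")[:, :T]


def tchebycheff(f: np.ndarray, lam: np.ndarray, z_star: np.ndarray) -> float:
    """Eq. (10) for one sub-problem: max_i lambda_i * (f_i - z_i*)."""
    return float(np.max(lam * (f - z_star)))


if __name__ == "__main__":
    W = uniform_weights(5)
    print(W)
    print(neighborhoods(W, 3))
    print(tchebycheff(np.array([0.6, 0.2]), W[2], np.zeros(2)))
```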

To make the MOEA/D algorithm more suitable for network optimization, this paper improves the update method of the Tchebycheff approach and proposes an update method based on the maximum fitness difference. The traditional method and the proposed method are compared in Fig. 5.

Fig. 5

Comparison between the traditional update method and the update method based on maximum fitness difference. (a) Fitness difference (b) Initial population (c) Traditional update method (d) Update method based on maximum fitness difference.

In Fig. 5(b), the offspring F(X′) dominates the parents F(X₁), F(X₂) and F(X₃). The traditional method replaces all of these parents with the offspring, as shown in Fig. 5(c), which dramatically reduces population diversity. In subsequent iterations, the population may become trapped in a local optimum because less parental genetic information is available. Moreover, the population used in this paper is small, which makes falling into a local optimum under the traditional update method even more likely. To alleviate this, an update method based on the maximum fitness difference is proposed, with the fitness difference defined in Eq. (12).

$g_{j} = g^{tch}(X_{j} \mid \lambda_{j}, Z^{*}) - g^{tch}(X_{i}' \mid \lambda_{j}, Z^{*}), \quad (j = 1, \dots, T)$ (12)
where j indexes the T weight vectors in the neighborhood of λ_i. When updating, if any g_j > 0, the offspring replaces only the parent with the maximum g_j, as shown in Fig. 5(d). Because only one individual is replaced at a time, the population still improves while relatively good individuals survive longer, which effectively prevents the population from falling into a local optimum.
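The difference between the two update rules can be sketched as follows. Both functions modify the population arrays in place; `update_traditional` is our simplified rendering of the classic rule (replace every neighbor the offspring improves), and all names and the toy values in the usage lines are illustrative.

```python
import numpy as np


def tchebycheff(f, lam, z_star):
    """Eq. (10) for one sub-problem."""
    return float(np.max(lam * (f - z_star)))


def update_max_difference(pop_F, pop_X, weights, neighbors, child_x, child_f, z_star):
    """Eq. (12): replace at most one neighbor, the one with the largest g_j > 0.
    Returns the replaced index, or None if the child improves no neighbor."""
    g = np.array([tchebycheff(pop_F[j], weights[j], z_star)
                  - tchebycheff(child_f, weights[j], z_star) for j in neighbors])
    k = int(np.argmax(g))
    if g[k] <= 0:
        return None
    j = int(neighbors[k])
    pop_X[j], pop_F[j] = child_x, child_f
    return j


def update_traditional(pop_F, pop_X, weights, neighbors, child_x, child_f, z_star):
    """Classic rule: replace every neighbor whose aggregated fitness the child improves."""
    replaced = []
    for j in neighbors:
        if tchebycheff(child_f, weights[j], z_star) < tchebycheff(pop_F[j], weights[j], z_star):
            pop_X[j], pop_F[j] = child_x, child_f
            replaced.append(int(j))
    return replaced


if __name__ == "__main__":
    weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    pop_F = np.array([[0.9, 0.9], [0.8, 0.8], [0.7, 0.95]])
    pop_X = np.zeros((3, 3))
    print(update_max_difference(pop_F, pop_X, weights, np.arange(3),
                                np.ones(3), np.array([0.5, 0.5]), np.zeros(2)))  # 2
```

On the same toy population the traditional rule would overwrite all three neighbors, which is exactly the loss of diversity illustrated in Fig. 5(c).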

Fig. 6

DBN structure for short-term load forecasting.

3.1.3 Normalization of objective function

In practice, the objective values computed by Eqs. (8) and (9) can differ greatly in magnitude, which makes the optimization algorithm deviate from the intended search direction when updating. The objective values are therefore normalized by Eq. (13) before each population update:
$f_{i}(X) \leftarrow \frac{f_{i}(X)}{v_{i}}, \quad (i = 1, 2)$ (13)
where f_i(X) is the objective function value and v_i is the normalization factor. During iteration, v_i is set to the maximum of the i-th objective values in the current population.
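Eq. (13) amounts to a column-wise division, sketched below under the assumption that the objective values of the current population are stored in an (n, 2) array of non-negative numbers (true here, since f₁ is a reciprocal and f₂ an MSE). The guard against a zero factor is our addition. As in steps 8 and 11 of Algorithm 1, the same factors v would also be applied to the offspring's values.

```python
import numpy as np


def normalize_objectives(F: np.ndarray):
    """Eq. (13): divide each objective column by its current maximum v_i.

    F has shape (n_individuals, 2); returns (normalized F, v).
    """
    v = F.max(axis=0)
    v = np.where(v > 0, v, 1.0)  # guard against a zero factor (added assumption)
    return F / v, v


if __name__ == "__main__":
    F = np.array([[2.0, 0.01], [4.0, 0.02]])
    Fn, v = normalize_objectives(F)
    print(Fn, v)
```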

The sub-network optimization strategy based on the improved MOEA/D is shown in Algorithm 1.

Algorithm 1

Sub-networks optimization strategy based on improved MOEA/D
Input:
· Training data set;
· P: population size;
· T: neighborhood size;
· S: maximum number of iterations;
· a uniform spread of P weight vectors;
· its = 1;
Output: final population;
/Initialization/
1: randomly generate P individuals $X_{i} = \{x_{i}^{1}, x_{i}^{2}, x_{i}^{3}\}$, (i = 1, ⋯, P);
2: calculate the objective function values F(X_i) = {f₁(X_i), f₂(X_i)} of each individual through Equations (8) and (9);
3: compute the Euclidean distances between each pair of weight vectors and take the T closest weight vectors to form each neighborhood;
4: each weight vector corresponds to one individual;
5: initialize the reference point $Z^{*} = (z_{1}^{*}, z_{2}^{*})$;
/Update/
6: while its < S do
7: for i = 1 to P do
8: get the normalization factor V = (v₁, v₂);
9: generate offspring individual $X_{i}^{'}$ by differential evolution algorithm and Gauss mutation algorithm;
10: calculate the objective function value of offspring individual $F (X_{i}^{'})$ ;
11: normalize the objective function value;
12: for j = 1 to T do
13: $g_{j} = g^{tch}(X_{j} \mid \lambda_{j}, Z^{*}) - g^{tch}(X_{i}' \mid \lambda_{j}, Z^{*})$;
14: end for
15: if max(g_j) > 0, (j = 1, ⋯, T), replace the neighbor X_j with the largest g_j by $X_{i}^{'}$;
16: end for
17: its = its + 1;
18: end while;
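Algorithm 1 can be sketched end to end as below. Because training a GRU for every evaluation is expensive, `evaluate` is a hypothetical callable returning the two objective values; the usage lines plug in a toy convex problem instead. Beyond the steps listed in Algorithm 1, the sketch assumes DE/rand/1 with scale 0.5 and Gaussian mutation for step 9, and it refreshes the reference point after each offspring evaluation, following the description of Eq. (10).

```python
import numpy as np


def tchebycheff(f, lam, z):
    """Eq. (10) for one sub-problem: max_i lam_i * (f_i - z_i)."""
    return float(np.max(lam * (f - z)))


def moead_max_diff(evaluate, lower, upper, P=30, T=5, S=50, seed=0):
    """Sketch of Algorithm 1 with the maximum-fitness-difference update.

    `evaluate(x)` must return two non-negative objective values of a decision
    vector x (in the paper: train a GRU with hyper-parameters x, Eqs. 8-9).
    """
    if T < 3:
        raise ValueError("DE/rand/1 needs at least 3 neighbours")
    rng = np.random.default_rng(seed)
    dim = lower.size
    l1 = np.linspace(0.0, 1.0, P)
    W = np.column_stack([l1, 1.0 - l1])                        # weights, Eq. (11)
    dist = np.linalg.norm(W[:, None, :] - W[None, :, :], axis=-1)
    B = np.argsort(dist, axis=1, kind="stable")[:, :T]         # step 3
    X = rng.uniform(lower, upper, size=(P, dim))               # step 1
    F = np.array([evaluate(x) for x in X], dtype=float)        # step 2
    z = F.min(axis=0)                                          # step 5
    for _ in range(1, S):                                      # its = 1 .. S-1 (while its < S)
        for i in range(P):
            v = F.max(axis=0)                                  # step 8
            r1, r2, r3 = rng.choice(B[i], size=3, replace=False)
            child = X[r1] + 0.5 * (X[r2] - X[r3])              # step 9: DE (assumed scale 0.5)
            mask = rng.random(dim) < 1.0 / dim
            child = child + mask * rng.normal(0.0, 0.1 * (upper - lower), dim)
            child = np.clip(child, lower, upper)
            fc = np.asarray(evaluate(child), dtype=float)      # step 10
            z = np.minimum(z, fc)                              # reference-point refresh (assumed)
            g = np.array([tchebycheff(F[j] / v, W[j], z / v)   # steps 11-14, Eq. (12)
                          - tchebycheff(fc / v, W[j], z / v) for j in B[i]])
            if g.max() > 0:                                    # step 15
                j = B[i][int(np.argmax(g))]
                X[j], F[j] = child, fc
    return X, F


if __name__ == "__main__":
    def toy(x):  # stand-in for GRU training: two conflicting convex objectives
        return np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])

    X, F = moead_max_diff(toy, np.zeros(3), np.ones(3), P=10, T=4, S=20)
    print(F.round(3))
```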

3.2 Second stage optimization

When the iteration of the optimization algorithm ends, a final population of P (P = 30) individuals is obtained. In practice, the generated weight vectors include the two boundary vectors λ = (1, 0) and λ = (0, 1). Optimizing the sub-problem of a boundary weight vector is equivalent to considering only one of the two objectives and ignoring the other entirely. The individuals corresponding to these two sub-problems therefore tend to fall into local optima, which adversely affects the combined output. To eliminate the individuals corresponding to the boundary vectors and further optimize the final population, a second stage of optimization is applied to the final population.

The second stage consists of two steps. First, H_i = f₁(X_i) + f₂(X_i), (i = 1, ⋯, P) is computed for all individuals, and the two individuals with the largest H_i are eliminated. Second, the remaining P - 2 individuals are sorted: individuals not dominated by any other individual form the first non-dominance level, and the same procedure is repeated on the remaining individuals until every individual is assigned a level. After sorting, only P/2 individuals are selected for subsequent operations, giving priority to individuals in better non-dominance levels (the first level being best); within the same level, individuals with smaller values of the objective function f₂(X) are selected.
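The two steps of the second stage can be sketched as follows, assuming the objective values are stored in a (P, 2) NumPy array with both objectives minimized. The quadratic-time sort below is a simple illustrative version of non-dominated sorting, not necessarily the procedure of [21], and the toy objective values in the usage lines are made up.

```python
import numpy as np


def nondominated_levels(F: np.ndarray) -> np.ndarray:
    """Level 1 = individuals dominated by nobody; then repeat on the rest."""
    n = len(F)
    level = np.zeros(n, dtype=int)
    remaining = set(range(n))
    current = 1
    while remaining:
        front = [i for i in remaining
                 if not any(np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                            for j in remaining if j != i)]
        for i in front:
            level[i] = current
        remaining -= set(front)
        current += 1
    return level


def second_stage(F: np.ndarray) -> np.ndarray:
    """Return indices of the P/2 individuals kept after the second stage."""
    P = len(F)
    H = F.sum(axis=1)                                # H_i = f1 + f2
    keep = np.argsort(H, kind="stable")[:-2]         # drop the two largest H_i
    levels = nondominated_levels(F[keep])
    order = np.lexsort((F[keep, 1], levels))         # by level, then by smaller f2
    return keep[order[: P // 2]]


if __name__ == "__main__":
    F = np.array([[0.1, 0.9], [0.5, 0.5], [0.9, 0.1],
                  [0.6, 0.6], [1.0, 1.0], [0.95, 0.95]])
    print(second_stage(F))  # [2 1 0]
```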

3.3 DBN combined strategy

Short-term residential load data are decomposed and optimized into different sub-models by the improved MOEA/D, and the final short-term residential load forecast is combined from the outputs of these sub-models. Common combination strategies are weighted integration and stacking integration. Weighted integration assigns each sub-model a weight according to its contribution to the final model; since these contributions are usually set manually, weighted integration is rather subjective. In stacking integration, the sub-model outputs are used as inputs to train the DBN combiner, which learns the link between the sub-model forecasts and the final forecast and thereby improves the accuracy of short-term residential load forecasting.

A DBN is built from multiple RBMs [27] and consists of an input layer, hidden layers and an output layer, as shown in Fig. 6. At the top of the DBN, a BPNN outputs the final result. The activation of each hidden unit is given by Eq. (14):
$h_{j} = \sigma\left(\sum_{i=1}^{n} w_{ij} v_{i} + b_{j}\right), \quad (i = 1, \dots, n;\ j = 1, \dots, m)$ (14)
where v_i is the i-th visible (input) unit, and w and b denote the weights and biases, respectively. During training, the RBMs are first trained layer by layer without supervision; the BPNN in the last layer then propagates the error from top to bottom through each RBM layer, fine-tuning the DBN. In this paper, the final forecasts are obtained by integrating the sub-models through the DBN, which improves forecast accuracy while preserving the generalization ability of the GRU combined model for short-term residential load forecasting.

After the second stage of optimization, P/2 individuals are obtained, each corresponding to a high-quality sub-network.
During training, the training outputs of the sub-networks serve as the training inputs of the DBN, and the actual values of the training samples serve as its targets. The final forecast of GRUCC-MOP is obtained by feeding the sub-network forecasts into the trained DBN. The short-term load forecasting process based on the DBN combination strategy is shown in Fig. 7.
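The stacking layout can be sketched as follows. To keep the example short, the DBN is replaced by an ordinary least-squares combiner with an intercept (`np.linalg.lstsq`); this substitution is ours, not the paper's, and a trained DBN would take the place of `fit_combiner` and `combine`. `rbm_hidden` evaluates Eq. (14) for one layer. All names and the synthetic data are illustrative.

```python
import numpy as np


def rbm_hidden(v, W, b):
    """Eq. (14): h_j = sigmoid(sum_i w_ij v_i + b_j); W has shape (n, m)."""
    return 1.0 / (1.0 + np.exp(-(v @ W + b)))


def stack_features(sub_forecasts):
    """Column k holds sub-network k's forecasts: shape (samples, P/2)."""
    return np.column_stack(sub_forecasts)


def fit_combiner(Z, y):
    """Least-squares stand-in for the DBN combiner (with an intercept term)."""
    A = np.column_stack([Z, np.ones(len(Z))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def combine(Z, coef):
    """Map sub-network forecasts to the final forecast with the fitted combiner."""
    return np.column_stack([Z, np.ones(len(Z))]) @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z1, z2 = rng.random(50), rng.random(50)   # synthetic sub-network training outputs
    y_train = 0.3 * z1 + 0.7 * z2             # synthetic training targets
    coef = fit_combiner(stack_features([z1, z2]), y_train)
    print(coef.round(6))
```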

Fig. 7

Short-term load forecasting process based on DBN combined strategy.

4 Methodology and case study

4.1 Introduction of the dataset

Dataset: The experimental data are residential load data from New South Wales for 2010 to 2014, provided by the Australian Smart Grid, Smart City (SGSC) project [30]. The dataset contains the load data of about 10,000 residents together with detailed information on retail and distributor product offers. Because the dataset is too large for the load data of all residents to be studied, a subset of the SGSC dataset is used: 25 residents with Internet access and a gas-fired hot water system are selected. These criteria are intended to guarantee human activity in the selected dwellings and hence the validity of the load data. To reflect seasonal effects, each resident's load data for January, April, July, and October 2013 are used, corresponding to the Australian summer, autumn, winter, and spring, respectively.

Data processing: The data are recorded every half hour (48 samples per day), so the time step is set to 48. Only load data are used as training input. Equation (15) normalizes the data to facilitate network learning: it maps values spread over a wide range into [0, 1], removes the units, and converts the data into dimensionless values.

$\tilde{y} = \frac{y - y_{min}}{y_{max} - y_{min}}$ (15)

where $\tilde{y}$ is the normalized load, $y_{max}$ is the maximum value in the adopted dataset, and $y_{min}$ is the minimum value in the adopted dataset. The load data of the last 7 days of each month (336 samples) are used as the test set, and the remaining load data of that month are used as the training set.
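The preprocessing above can be sketched as follows. This is a hedged illustration: the text fixes the time step (48), Eq. (15), and the 336-sample test split, but the one-step-ahead framing of the input windows and all function names are assumptions.

```python
import numpy as np

STEPS_PER_DAY = 48   # half-hourly readings
TEST_DAYS = 7        # last 7 days of the month -> 336 test samples


def min_max_normalize(y):
    """Eq. (15): map loads to [0, 1] using the min and max of the given series."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    return (y - y_min) / (y_max - y_min), y_min, y_max


def denormalize(y_tilde, y_min, y_max):
    """Invert Eq. (15) to return forecasts to the original load scale."""
    return y_tilde * (y_max - y_min) + y_min


def split_month(y):
    """Last 7 days form the test set; the rest of the month is the training set."""
    n_test = STEPS_PER_DAY * TEST_DAYS
    return y[:-n_test], y[-n_test:]


def make_windows(y, time_step=STEPS_PER_DAY):
    """Hypothetical framing: input = previous `time_step` readings, target = next reading."""
    X = np.stack([y[i:i + time_step] for i in range(len(y) - time_step)])
    return X, y[time_step:]


# Usage on a synthetic 31-day month (31 * 48 = 1488 half-hourly readings).
month = np.random.default_rng(0).random(31 * STEPS_PER_DAY) * 3.0
month_norm, y_min, y_max = min_max_normalize(month)
train, test = split_month(month_norm)
X_train, t_train = make_windows(train)
```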

4.2 Evaluation criteria

In this experiment, MAPE and RMSE are used as evaluation criteria:

$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i}^{'} - y_{i}}{y_{i}} \right| \times 100\%$ (16)

$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i}^{'} - y_{i})^{2}}$ (17)

where n is the number of samples, $y_i$ is the actual value of the i-th sample, and $y_{i}^{'}$ is the forecast value of the load forecasting model for the i-th sample.
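Equations (16) and (17) translate directly into code. This sketch assumes MAPE is reported in percent, which matches the magnitudes in the tables. MAPE is undefined when an actual value is zero. After the min-max normalization of Eq. (15) the minimum sample becomes exactly 0, and the text does not say whether errors are computed on normalized or original-scale loads, so the function below raises an error in that case rather than guessing.

```python
import numpy as np


def mape(y_true, y_pred):
    """Eq. (16), in percent. Undefined if any actual value y_i is zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))


def rmse(y_true, y_pred):
    """Eq. (17): square root of the mean squared forecast error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

For example, actual values [1, 2, 4] and forecasts [1.1, 1.8, 4.4] each deviate by 10% of the actual value, so MAPE is 10%.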

4.3 Parameter setting

Some model parameters are set using empirical rules: the GRU network is trained for 150 epochs, the time step is set to 48, and the number of network layers is set to 2 [19]. The learning rate, the initialization weight of each gate, and the number of hidden nodes form the MOEA/D decision space and are optimized within given ranges. The initialization weights are searched in [–0.005, 0.005] and the number of hidden nodes in [25, 128]. Because these two ranges are already wide, only the learning-rate range is varied to check that the chosen ranges are appropriate. Three learning-rate ranges, [0.001, 0.01], [0.01, 0.1], and [0.1, 1], are compared on the January and July load data of residents 10006414, 10006572, and 10017466. The results are shown in Table 1: the range [0.001, 0.01] gives the lowest MAPE and RMSE in every case.

Table 1
Comparative tests with different learning rate ranges

Resident	Month	Metric	[0.001, 0.01]	[0.01, 0.1]	[0.1, 1]
10006414	Jan	MAPE	42.0952	77.7674	46.5351
		RMSE	0.0947	0.1855	0.2747
	Jul	MAPE	46.1289	74.5761	64.3599
		RMSE	0.0987	0.1827	0.2797
10006572	Jan	MAPE	48.3064	55.1466	65.2448
		RMSE	0.0784	0.2276	0.3308
	Jul	MAPE	31.1527	52.1762	53.3262
		RMSE	0.0819	0.2192	0.3215
10017466	Jan	MAPE	34.9996	91.7556	81.5734
		RMSE	0.0745	0.1807	0.2574
	Jul	MAPE	38.1613	88.7058	49.4527
		RMSE	0.0787	0.1767	0.3496
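The decision space described before Table 1 can be encoded as in the following sketch, which samples an initial MOEA/D population. Treating "the initialization weight of each gate" as one scalar per gate (update, reset, candidate) is an assumption made for illustration, as are the function and key names. Only the ranges come from the text.

```python
import numpy as np

LR_RANGE = (0.001, 0.01)         # learning-rate range used in this paper
INIT_W_RANGE = (-0.005, 0.005)   # gate initialization-weight range
HIDDEN_RANGE = (25, 128)         # number of hidden nodes (inclusive)
N_GATE_PARAMS = 3                # assumption: one scalar per GRU gate


def sample_individual(rng):
    """Draw one decision vector uniformly from the stated ranges."""
    return {
        "learning_rate": rng.uniform(*LR_RANGE),
        "gate_init": rng.uniform(*INIT_W_RANGE, size=N_GATE_PARAMS),
        "hidden_nodes": int(rng.integers(HIDDEN_RANGE[0], HIDDEN_RANGE[1] + 1)),
    }


def init_population(P, seed=0):
    """Initial population of P individuals (P = 30 in Section 4.3)."""
    rng = np.random.default_rng(seed)
    return [sample_individual(rng) for _ in range(P)]


population = init_population(30)
```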

The population size of the MOEA/D optimization algorithm is set to 30, the number of iterations to 25, and the neighborhood size to 8. The structural parameters of the DBN depend on the scale of the input data [29].
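With P = 30 and T = 8, each sub-problem has a weight vector and a neighborhood. The sketch below follows the original MOEA/D [20]: evenly spaced weight vectors for two objectives, and each neighborhood formed by the T weight vectors closest in Euclidean distance. This section does not say how the paper generates its weights, so treat this as an assumption.

```python
import numpy as np


def uniform_weights_2obj(P):
    """P evenly spaced weight vectors (lambda_1, lambda_2) with lambda_1 + lambda_2 = 1."""
    l1 = np.linspace(0.0, 1.0, P)
    return np.column_stack([l1, 1.0 - l1])


def neighborhoods(weights, T):
    """Indices of the T weight vectors closest to each weight vector (itself included)."""
    d = np.linalg.norm(weights[:, None, :] - weights[None, :, :], axis=-1)
    return np.argsort(d, axis=1, kind="stable")[:, :T]


WEIGHTS = uniform_weights_2obj(30)   # population size P = 30
B = neighborhoods(WEIGHTS, 8)        # neighborhood size T = 8
```

Offspring for sub-problem i are then bred from, and compared against, the individuals indexed by `B[i]`.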

To choose the number of DBN hidden layers, the forecasting errors on the January and July load data of residents 10006414, 10006572, and 10017466 are compared for DBNs with 1 to 5 hidden layers. The results are shown in Table 2.

Table 2

Errors of DBN with different numbers of hidden layers

Resident	Month	Metric	1	2	3	4	5
10006414	Jan	MAPE	42.8141	43.2341	43.9876	44.7662	45.1483
		RMSE	0.0892	0.0899	0.0909	0.0926	0.0934
	Jul	MAPE	53.5627	53.9024	54.4646	55.0798	55.4356
		RMSE	0.1559	0.1569	0.1583	0.1601	0.1613
10006572	Jan	MAPE	38.0616	38.7350	39.1957	40.5350	41.0557
		RMSE	0.0278	0.0283	0.0285	0.0296	0.0304
	Jul	MAPE	46.1224	46.9216	47.3453	47.6852	48.2245
		RMSE	0.1003	0.1020	0.1029	0.1033	0.1059
10017466	Jan	MAPE	50.9685	51.2803	51.7465	52.6429	52.6826
		RMSE	0.1585	0.1593	0.1610	0.1641	0.1639
	Jul	MAPE	64.5890	64.8449	65.0080	65.5403	66.4849
		RMSE	0.2152	0.2159	0.2166	0.2185	0.2219

A DBN with one hidden layer achieves the lowest MAPE and RMSE for all six resident-month combinations. The number of DBN hidden layers is therefore set to 1.

4.4 Case study

4.4.1 Convergence of optimization algorithm

Figure 8 shows how the two objective functions evolve over the first 20 generations of a single run. For readability, the value plotted for each generation is the average of the optimal objective function values of all individuals in that generation. Both objective functions converge effectively.

Fig. 8

Change trend of objective function value.

4.4.2 Normalization of the objective function

To illustrate why the objective functions should be normalized before each population update, Table 3 lists the unnormalized objective function values of five sub-problems in the same generation. As Table 3 shows, f₁(x) is about three orders of magnitude larger than f₂(x).

Table 3
Optimal objective function value of five sub-problems

Subproblems	1	2	3	4	5
f₁(x)	8.1805	14.6527	13.6326	16.9828	10.7490
f₂(x)	0.00512	0.00481	0.00472	0.00473	0.00523

Figure 9 shows the population distributions obtained with and without normalization (individuals corresponding to the boundary weight vectors have been removed). Fig. 9(a) shows the MOEA/D optimization result without normalization of the objective functions. Because f₁(x) is much larger than f₂(x), f₁(x) dominates the scalarized objective, and the population falls into a local optimum with respect to f₂(x). Fig. 9(b) shows the MOEA/D optimization result with normalized objective functions: compared with Fig. 9(a), the population converges better on f₂(x) and is more uniformly distributed.
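The effect of the scale mismatch can be reproduced with the Table 3 values. The sketch uses the standard Tchebycheff scalarization, assumes both objectives are minimized, and normalizes each objective by its range across the population. The paper's own normalizing factor $v_i$ and reference point $Z^*$ are defined in Section 3 and may differ from this choice.

```python
import numpy as np

# Unnormalized objective values (f1, f2) of the five sub-problems in Table 3.
F = np.array([
    [8.1805, 0.00512],
    [14.6527, 0.00481],
    [13.6326, 0.00472],
    [16.9828, 0.00473],
    [10.7490, 0.00523],
])


def tchebycheff(f, weight, z_star, scale=None):
    """max_i weight_i * |f_i - z*_i| / scale_i; scale = 1 means no normalization."""
    scale = np.ones_like(f) if scale is None else scale
    return np.max(weight * np.abs(f - z_star) / scale, axis=-1)


z_star = F.min(axis=0)        # ideal-point estimate from the current population
scale = F.max(axis=0) - z_star  # assumed normalizing factor: objective range
w = np.array([0.5, 0.5])      # an interior weight vector

raw = tchebycheff(F, w, z_star)
norm = tchebycheff(F, w, z_star, scale)
```

Without normalization the f₂ term never exceeds about 0.0003, so `raw` ranks the individuals exactly by f₁ alone. With normalization both objectives contribute, and the preferred individual changes.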

Fig. 9

Population distribution. (a) Objective function normalization is not used (b) Objective function normalization is used.

4.4.3 Update method

To illustrate the advantage of the update method based on the maximum fitness difference, Fig. 10 shows the population distribution obtained with the traditional update method. The population obtained with the traditional update method has very poor diversity: most individuals overlap, and the population has not fully converged. In contrast, the population obtained with the proposed update method, shown in Fig. 9(b), has better diversity and convergence.
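To see why the traditional update loses diversity, consider the standard MOEA/D replacement rule [20]. A new child replaces every neighbor whose sub-problem it improves, with no cap on the number of replacements. The sketch below shows this standard rule only. The paper's maximum-fitness-difference rule is defined in Section 3 and is not reproduced here, and the toy population is hypothetical.

```python
import numpy as np


def tchebycheff(f, weight, z_star):
    """Tchebycheff value of objective vector f for one weight vector."""
    return np.max(weight * np.abs(f - z_star))


def traditional_update(pop_f, weights, neighbors_i, child_f, z_star):
    """Standard MOEA/D replacement: the child overwrites every neighbor j in B(i)
    whose scalarized value it matches or improves (no limit on replacements)."""
    replaced = []
    for j in neighbors_i:
        if tchebycheff(child_f, weights[j], z_star) <= tchebycheff(pop_f[j], weights[j], z_star):
            pop_f[j] = child_f
            replaced.append(j)
    return replaced


# Toy neighborhood of 5 sub-problems and one strong child.
l1 = np.linspace(0.0, 1.0, 5)
weights = np.column_stack([l1, 1.0 - l1])
pop_f = np.array([[0.5, 0.6], [0.4, 0.7], [0.6, 0.4], [0.7, 0.5], [0.9, 0.3]])
child_f = np.array([0.1, 0.1])
replaced = traditional_update(pop_f, weights, range(5), child_f, np.zeros(2))
```

After one update, all five sub-problems hold copies of the same child. This is the kind of overlap visible in Fig. 10.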

Fig. 10

Population distribution of traditional update method.

4.4.4 Second stage optimization

To illustrate the role of the second-stage optimization, comparative experiments are carried out on the load data of residents 10006414, 10006572, and 10017466 in January and July.

Table 4 compares the load forecasting errors obtained with and without the second-stage optimization. In every case both MAPE and RMSE are lower with the second-stage optimization, which shows that it improves the accuracy of short-term residential load forecasting.

Table 4
Final error with and without second stage selection

Resident	Month	Metric	With second-stage optimization	Without second-stage optimization
10006414	Jan	MAPE	42.8141	47.1974
		RMSE	0.0892	0.0976
	Jul	MAPE	53.5627	54.8261
		RMSE	0.1559	0.1597
10006572	Jan	MAPE	38.0616	42.1143
		RMSE	0.0278	0.0307
	Jul	MAPE	46.1224	51.6256
		RMSE	0.1003	0.1134
10017466	Jan	MAPE	50.9685	51.9749
		RMSE	0.1585	0.1615
	Jul	MAPE	64.5890	67.6187
		RMSE	0.2152	0.2253

5 Performance comparison of forecasting methods

GRUCC-MOP is compared with DBN [10], LSTM [17], GRU [19], EMD-DBN [12], and EMD-MODBN [15] in terms of forecasting accuracy. Table 5 lists the average MAPE of the six methods over the load data of the 25 selected residents in January, April, July, and October 2013. GRUCC-MOP achieves the lowest average MAPE in all four months, which indicates both high accuracy and strong generalization ability.

Table 5
Overall errors of various forecasting methods

Month	DBN	LSTM	GRU	EMD-DBN	EMD-MODBN	GRUCC-MOP
Jan	121.253	59.614	51.897	92.492	87.205	49.526
Apr	130.613	67.043	69.083	108.349	103.104	61.097
Jul	114.629	54.758	53.390	86.936	82.171	48.583
Oct	153.224	75.847	79.701	113.970	115.184	62.036
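The margin of GRUCC-MOP over the strongest competing method in each month can be computed from Table 5 with a few lines of arithmetic. The reductions below are relative (percent of the best baseline's MAPE), not percentage points. The dictionary and function names are only for illustration.

```python
# Average MAPE values copied from Table 5.
TABLE5 = {
    "Jan": {"DBN": 121.253, "LSTM": 59.614, "GRU": 51.897,
            "EMD-DBN": 92.492, "EMD-MODBN": 87.205, "GRUCC-MOP": 49.526},
    "Apr": {"DBN": 130.613, "LSTM": 67.043, "GRU": 69.083,
            "EMD-DBN": 108.349, "EMD-MODBN": 103.104, "GRUCC-MOP": 61.097},
    "Jul": {"DBN": 114.629, "LSTM": 54.758, "GRU": 53.390,
            "EMD-DBN": 86.936, "EMD-MODBN": 82.171, "GRUCC-MOP": 48.583},
    "Oct": {"DBN": 153.224, "LSTM": 75.847, "GRU": 79.701,
            "EMD-DBN": 113.970, "EMD-MODBN": 115.184, "GRUCC-MOP": 62.036},
}


def margin_over_best_baseline(row, proposed="GRUCC-MOP"):
    """Return the best competing method and the relative MAPE reduction (%) against it."""
    baselines = {name: v for name, v in row.items() if name != proposed}
    best = min(baselines, key=baselines.get)
    reduction = (baselines[best] - row[proposed]) / baselines[best] * 100.0
    return best, reduction


results = {month: margin_over_best_baseline(row) for month, row in TABLE5.items()}
for month, (best, reduction) in results.items():
    print(f"{month}: best baseline {best}, relative MAPE reduction {reduction:.2f}%")
```

The strongest baseline alternates between GRU and LSTM, and the margin is largest in October.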

To further illustrate the advantages of the GRU combined model, GRUCC-MOP is compared with four other forecasting methods (LSTM, EMD-LSTM, GRU, and MSSA-GRU) on the load data of resident 10006414. Figure 11 shows the fitted load curves at 48 time steps, and the corresponding MAPE values are listed in Table 6. The same comparison at 96 time steps is reported in Table 7, with the load curves shown in Fig. 12. In both settings, GRUCC-MOP achieves the lowest MAPE in every month.

Fig. 11

Comparison of the fitted curves of residential short-term load forecasts using the proposed method with four forecasting models at 48 time steps. (a) January (b) April (c) July (d) October.

Table 6

MAPE comparison of 4 forecasting methods at 48 time steps

Month	LSTM	EMD-LSTM	GRU	MSSA-GRU	GRUCC-MOP
Jan	46.6541	44.5784	43.9835	44.5948	42.8141
Apr	61.2278	60.0892	62.9208	65.9702	57.1494
Jul	70.6571	69.3855	68.5239	61.3779	53.5627
Oct	68.4765	69.3133	73.3694	72.0884	54.3063

Fig. 12

Comparison of the fitted curves of residential short-term load forecasts using the proposed method with four forecasting models at 96 time steps. (a) January (b) April (c) July (d) October.

Table 7

MAPE comparison of 4 forecasting methods at 96 time steps

Month	LSTM	EMD-LSTM	GRU	MSSA-GRU	GRUCC-MOP
Jan	41.9125	44.2252	45.8579	41.4048	35.1329
Apr	67.2234	72.3577	61.3294	55.6982	47.1176
Jul	77.6318	67.4832	74.7291	66.2455	62.2447
Oct	63.1041	60.6816	52.4276	52.0454	40.3311

In Figs. 11 and 12, the horizontal axis represents time points spaced 30 minutes apart, so a 24-hour day contains 48 time points. The vertical axis represents the electrical load over the preceding 30 minutes. The curves are relatively smooth at night and fluctuate more strongly during the day, consistent with residents being active by day and resting at night. The peaks occur mainly at noon and in the evening and are related to residents' regular activities, such as cooking and cleaning. Some peaks also occur between 9 p.m. and midnight, possibly due to non-routine activities such as occasional parties. Comparing the fitted curves shows that the proposed GRU combined model tracks load peaks better than GRU and the other forecasting methods. This suggests that the GRU combined model learns more features of the load data and copes more effectively with sudden changes in the load. In summary, the GRU combined model achieves higher accuracy than the other models in short-term residential load forecasting.

6 Discussions and conclusion

6.1 Discussions

Sections 4 and 5 verified the superiority of the proposed combined model experimentally. This section discusses the following points arising from the experimental results:

In Table 5, every forecasting method has larger errors for spring and autumn (October and April) than for summer and winter (January and July). Loads are generally considered harder to forecast in summer and winter because load values are larger in these seasons. However, what actually makes load data difficult to forecast is not the magnitude of the load but the degree of its fluctuation. In spring and autumn, temperatures fluctuate more dramatically than in summer and winter. This temperature instability increases load fluctuations, which in turn makes the load data harder to forecast in spring and autumn. References [31, 32] also report larger short-term residential load forecasting errors in spring and autumn than in summer and winter.

In Table 5, the errors of both DBN and EMD-MODBN are larger than those of GRU. This suggests that the forecasting accuracy of the sub-models is one of the main factors determining the accuracy of a combined model. EMD-MODBN integrates several DBN models following the idea of ensemble learning, yet it remains less accurate than GRU, because the DBN sub-models are much less accurate than GRU models. The selection of sub-models will therefore be the main research direction for further improving the accuracy of the combined model.

The proposed GRU combined model tracks load peaks better than GRU and the other load forecasting models. This indicates that the combined model learns more data features and reacts more effectively to sudden changes in the load data, which is consistent with the findings in reference [16].

The time complexity of the GRU combined model is analyzed as follows. Suppose that training one GRU sub-model costs L and training the DBN combined strategy costs I. The time complexity of building the GRU combined model is then O(SLP) + O(I), where S is the number of iterations and P is the population size of the optimization algorithm. The remaining steps of the optimization algorithm, including the initialization of the weight vectors, the population update in each iteration, and the second-stage optimization, take much less time than training a deep learning network. Only network training is therefore counted. In summary, the runtime is determined mainly by the number of iterations S and the population size P.
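As a rough worked example of this cost model, the settings of Section 4.3 (S = 25 iterations, P = 30 individuals) can be substituted. Like the analysis above, this illustrative estimate ignores constant factors and the optimization overhead:

$$ T_{\text{build}} \approx S \cdot P \cdot L + I = 25 \times 30 \times L + I = 750\,L + I $$

Building one combined model therefore costs on the order of 750 GRU training runs plus one DBN training run, which is why S and P dominate the runtime.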

6.2 Conclusion

A GRU combined model based on multi-objective optimization is proposed to address the difficulty of forecasting residential load accurately. A multi-objective optimization algorithm optimizes the accuracy and diversity of the GRU sub-networks, yielding multiple accurate and mutually different sub-networks. The DBN combined strategy then combines these sub-models to obtain the final load forecast. The performance of the proposed short-term residential load forecasting method is analyzed on multiple data sets, leading to the following conclusions:

The experiments in Section 4.4.3 demonstrate that the update method based on the maximum fitness difference maintains population diversity and avoids the local optima into which the traditional update method easily falls.

The experiments in Section 4.4.4 demonstrate that the second stage optimization can improve the forecast accuracy of the short-term residential load forecasting model.

Tables 5, 6, and 7 and Figs. 11 and 12 show that the GRU combined model not only improves forecast accuracy but also improves generalization ability.

Compared with several strong forecasting methods, GRUCC-MOP achieves higher accuracy on multiple data sets.

The proposed GRUCC-MOP model terminates the optimization algorithm after a fixed number of iterations. When learning load data, the algorithm may therefore stop before the population reaches the optimum, which can weaken the generalization ability of the combined model. This may explain why the proposed method does not show a large advantage on some datasets, and a more flexible termination condition could address this problem. In addition, only short-term residential electric load data are used as model input, and no other influencing factors are considered. Many factors cause fluctuations in residential-level load data, with weather factors dominating. In future work, weather factors in the dataset, such as temperature and humidity, could be preprocessed and fed into the forecasting framework to improve the accuracy of short-term residential load forecasting under the influence of multiple factors.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61572416) and the Hunan Province Natural Science Zhuzhou United Foundation (2022JJ50132).

References

1. Sun, Liu, Zhang, Cao and Zhao, An efficient edge sparse coding approach to ultra-short-term household electricity demand estimation[J], IEEJ Transactions on Electrical and Electronic Engineering 13(11) (2018), 1586–1594.

2. Kong, Dong Z.Y., Hill D.J., Luo and Xu, Short-term residential load forecasting based on resident behaviour learning[J], IEEE Transactions on Power Systems 33(1) (2018), 1087–1088.

3. Kim S.H., Lee and Shin Y.J., Economical energy storage systems scheduling based on load forecasting using deep learning[C], IEEE International Conference on Big Data and Smart Computing 1–7 (2019:27 February–2 March), Kyoto, Japan.

4. Eskandarpour and Khodaei, Leveraging accuracy-uncertainty tradeoff in SVM to achieve highly accurate outage predictions[J], IEEE Transactions on Power Systems 33(1) (2018), 1139–1141.

5. Hinojosa V.H. and Hoese, Short-term load forecasting using fuzzy inductive reasoning and evolutionary algorithms[J], IEEE Transactions on Power Systems 25(1) (2010), 565–574.

6. Liu D.N., Zeng, Li C.B., Ma K.L., Chen Y.J. and Cao Y.J., A distributed short-term load forecasting method based on local weather information[J], IEEE Systems Journal 12(1) (2018), 208–215.

7. … B.W., Zhang, He and Wang, Short-term load-forecasting method based on wavelet decomposition with second-order gray neural network model combined with ADF test[J], IEEE Access 5 (2017), 16324–16331.

8. Chen, Chen, Wang, He, Hu and He, Short-term load forecasting with deep residual networks[J], IEEE Transactions on Smart Grid 10(4) (2019), 3943–3952.

9. Kong, Li, Zheng and Wang, Improved Deep Belief Network for Short-Term Load Forecasting Considering Demand-Side Management[J], IEEE Transactions on Power Systems 35(2) (2020), 1531–1538.

10. Zhang, Chan K.W., Li, Wang, Qiu and Wang, Deep-learning-based probabilistic forecasting of electric vehicle charging load with a novel queuing model[J], IEEE Transactions on Cybernetics 51(6) (2021), 3157–3170.

11. Wang K.J., Qi X.X., Li H.D. and Song J.K., Deep belief network based k-means cluster approach for short-term wind power forecasting[J], Energy 165 (2018), 840–852.

12. Dedinec, Filiposka, Dedinec and Kocarev, Deep belief network based Residential load forecasting: An analysis of Macedonian case[J], Energy 115 (2018), 1688–1700.

13. Kong et al., Short-term load forecasting method based on empirical mode decomposition and feature correlation analysis[J], Dianli Xitong Zidonghua/Automation of Electric Power Systems 43(5) (2019), 46–52.

14. Zheng, Xiong and Wei, Short-term Load Forecasting of BP Network Based on EMD[C], IEEE 8th Joint International Information Technology and Artificial Intelligence Conference (2019:24–26 May), Chongqing, China.

15. …, Fujita and Hayashi, Deep reservoir architecture for short-term residential load forecasting: An online learning scheme for edge computing[J], Applied Energy 298(4) (2021), 117176.

16. Zhang, Zhang and Yu, A similar day based short term load forecasting method using wavelet transform and LSTM[J], IEEJ Transactions on Electrical and Electronic Engineering 17 (2022), 506–513.

17. Fan, Ding, Zheng et al., Empirical Mode Decomposition Based Multi-objective Deep Belief Network for Short-term Power Load Forecasting[J], Neurocomputing 388 (2020), 110–123.

18. Sun, Jiang, Wang and Yang, Short-term building load forecast based on a data-mining feature selection and LSTM-RNN method[J], IEEJ Transactions on Electrical and Electronic Engineering 15(7) (2020), 1002–1010.

19. Zheng, Chen and Yu, Short-term Power Load Forecasting of Residential Community Based on GRU Neural Network[C], International Conference on Power System Technology 4862–4868 (2018:6–8 November), Guangzhou, China.

20. Zhang and Li, MOEA/D: A Multiobjective evolutionary algorithm based on decomposition[J], IEEE Transactions on Evolutionary Computation 11(6) (2007), 712–731.

21. Abouei Ardakan A.M. and Taghi Rezvan B.M., Multi-objective optimization of reliability–redundancy allocation problem with cold-standby strategy using NSGA-II[J], Reliability Engineering & System Safety 172 (2018), 225–238.

22. Zhang, Xue, Lan, Zeng, Gao and Zheng, EleAtt-RNN: Adding attentiveness to neurons in recurrent neural networks[J], IEEE Transactions on Image Processing 29 (2019), 1061–1073.

23. Kolen J. and Kremer S., Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies[J], A Field Guide to Dynamical Recurrent Networks, IEEE (2001), 237–243.

24. Chen, Review on Supervised and Unsupervised Learning Techniques for Electrical Power Systems: Algorithms and Applications[J], IEEJ Transactions on Electrical and Electronic Engineering 16 (2021), 1487–1499.

25. Li J., Li J., Pardalos P. et al., DMaOEA-εC: Decomposition-based many-objective evolutionary algorithm with the ε-Constraint Framework[J], Information Sciences 537 (2020), 203–226.

26. …, Ma, Liu, Jiao, Sun and Wu, MOEA/D with Adaptive Weight Adjustment[J], Evolutionary Computation 22(2) (2014), 231–264.

27. Zhao, Jiao, Zhao et al., Discriminant deep belief network for high-resolution SAR image classification[J], Pattern Recognition 61 (2016), 686–701.

28. Wang, He and Li, Optimization Algorithms for MISO Polygonal Fuzzy Neural Networks[J], Science China Press 45(5) (2015), 650–667.

29. Fan, Ding, Xiao, Cheng and Ai, Deep belief ensemble network based on MOEA/D for short-term load forecasting[J], Nonlinear Dynamics 105 (2021), 2405–2430.

30. Fan, Li, Yi, Xiao, Qu and Ai, Multi-objective LSTM ensemble model for household short-term load forecasting[J], Memetic Computing 14 (2022), 115–132.

31. Sun, Jiang, Wang and Yang, Short-term building load forecast based on a data-mining feature selection and LSTM-RNN method[J], IEEJ Transactions on Electrical and Electronic Engineering 15 (2020), 1002–1010.

32. Islam, Baharudin and Nallagownden, Modified meta heuristics and improved backpropagation neural network-based electrical load demand prediction technique for smart grid[J], IEEJ Transactions on Electrical and Electronic Engineering 12 (2017), S20–S32.

33. Ahajjam M.A., Bonilla Licea D., Ghogho M. and Kobbane A., Experimental investigation of variational mode decomposition and deep learning for short-term multi-horizon residential electric load forecasting[J], Applied Energy 326 (2022), 119963.

34. Smyl S., Dudek G. and Pełka P., Contextually enhanced ES-dRNN with dynamic attention for short-term load forecasting[J], Neural Networks 169 (2024), 660–672.

35. Sheng, An, Wang, Chen and Tian, Residual LSTM based short-term load forecasting[J], Applied Soft Computing 144 (2023), 110461.

36. Lin, Ma, Zhu and Cui, Short-term load forecasting based on LSTM networks considering attention mechanism[J], International Journal of Electrical Power & Energy Systems 137 (2022), 107818.

37. Fazlipour Z., Mashhour E. and Joorabian M., A deep model for short-term load forecasting applying a stacked autoencoder based on LSTM supported by a multi-stage attention mechanism[J], Applied Energy 327 (2022), 120063.

38. Hua, Liu, Li, Deng and Wang, An ensemble framework for short-term load forecasting based on parallel CNN and GRU with improved ResNet[J], Electric Power Systems Research 216 (2023), 109057.

39. Zhao, Yun, Jia, Guo, Meng, He, Li, Shi and Yang, Hybrid VMD-CNN-GRU-based model for short-term forecasting of wind power considering spatio-temporal features[J], Engineering Applications of Artificial Intelligence 121 (2023), 105982.

40. Akbal Y. and Ünlü K.D., A univariate time series methodology based on sequence-to-sequence learning for short to midterm wind power production[J], Renewable Energy 200 (2022), 832–844.
