Abstract
Reliable and accurate short-term forecasting of residential load plays an important role in DSM. However, the high uncertainty inherent in single-user loads makes them difficult to forecast accurately. Various traditional methods have been applied to residential load forecasting, but a single forecast model cannot comprehensively learn the characteristics of residential load data, and RNNs suffer from vanishing or exploding gradients during backpropagation when learning long-term dependencies. Therefore, a GRU combined model based on multi-objective optimization, GRUCC-MOP, is proposed in this paper to improve the accuracy of short-term residential load forecasting. To demonstrate its effectiveness, GRUCC-MOP is first compared experimentally with the unimproved model to verify model performance and forecasting effectiveness. Second, the method is compared experimentally with other well-performing forecasting methods: DBN, LSTM, GRU, EMD-DBN, and EMD-MODBN. The simulation experiments show that the proposed GRU combined model achieves lower MAPE on the January, April, July, and October load data, and therefore performs better than the other methods in short-term residential load forecasting.
Keywords
Nomenclature
Introduction
With the spread of AMI, smart meters have become widespread in residential networks and provide an opportunity to improve grid energy management through residential load data [1]. Short-term residential load forecasting plays an important role in DSM [2]. For utilities, peak load shaving can be achieved by establishing DSM in cooperation with ES systems and intelligent DR technology, enabling efficient and economical use of electric power. For residents, consumers can play a significant role in the operation of the smart grid by shifting energy consumption from on-peak to off-peak periods under ToU pricing, thereby avoiding expensive electricity costs [3]. Accurate forecasting of residential load has therefore become a research hotspot.
At present, load forecasting research focuses mainly on the total load of the regional grid, and there are few studies for individual residents. The main methods of load forecasting include SVR [4], FIR [5], ANN [6, 7], and DNN [8–10]. These methods have achieved high accuracy on grid-level load. In references [11, 12], DBN was used to forecast hourly grid-level load with small errors. In references [13, 14], EMD was used to decompose the load data into multiple pieces of sub-data, each of which was forecast separately; the final forecast was obtained by superimposing the forecasts of all sub-data. In reference [33], VMD was used to decompose residential data and extract sequence information for forecasting, but the effectiveness of this method on different datasets was not confirmed. These methods reduce the complexity of the data through decomposition, but they destroy the correlation within the original data.
Compared with grid-level load, residential load is more complex. Because resident behavior is unstable and random, residential load fluctuates much more than grid-level load. Because living habits differ between residents, the characteristics of their loads differ; even the electricity consumption habits of the same resident may vary. A residential load forecast model must therefore not only learn the hidden knowledge in the load data fully, but also generalize well [15]. A single residential load forecast model can hardly achieve this, because it learns only some of the data characteristics rather than a comprehensive set [16]. Ensemble learning is therefore introduced into the load forecast model. First, multiple sub-models are used to learn characteristics from the residential load data; these sub-models are then integrated into a combined model. The combined model can learn more characteristics from the residential load data, which improves accuracy and generalization capability. In reference [17], multiple DBNs are used instead of a single DBN, and the experimental results demonstrate that the combined model has higher forecast accuracy. In recent years, RNNs have been applied to load forecasting, which provides more sub-model options for the combined model. Reference [34] used an RNN model to forecast electricity load; the experiments show that the RNN model forecasts well, but they also expose its drawback, namely that it is only applicable to datasets with a small amount of data. Reference [18] uses LSTM for short-term residential load forecasting for a single electricity customer. The experimental results show that the LSTM network has significant advantages over other deep learning methods for residential load forecasting.
References [35–37] all used LSTM models for regional short-term load forecasting, and the models were experimentally verified to be highly accurate. However, in references [35, 36] the hyperparameters of the model are specified manually without experimental verification, which is highly subjective, while reference [37] only considers the influencing factors discussed in its text and ignores others. Like LSTM, GRU can effectively mitigate the gradient problem in long-term memory and back-propagation, and it is computationally cheaper. Reference [39] used the GRU model to forecast wind power data, and it was experimentally verified that the model obtains higher accuracy and forecast correlation. However, because wind power forecasting scenarios differ, the model cannot forecast continuously as wind data are updated and input, so only small-scale wind power forecasting can be considered. Reference [38] used the GRU model to forecast electricity loads, but it exposes the same problem of subjectively set hyperparameters. In reference [19], GRU is used to forecast the residential load; the experimental results show that the forecasting accuracy of GRU is similar to that of LSTM, while the simulation time of GRU is shorter. Reference [40] uses the Seq2Seq model to forecast the power generation of wind farms. The model achieves better forecasting results, as verified on two different wind farms. The model also has high accuracy in forecasting the maximum, minimum, and average values of power load. Moreover, the model only needs its own lagged values, which improves efficiency and reduces computational cost.
To improve accuracy and generalization capability, a GRU combined model based on multi-objective optimization is proposed in this paper. First, a multi-objective optimization algorithm is used to optimize the hyper-parameters of GRU networks, so that multiple high-quality GRU sub-models are constructed. Second, the forecast results of these sub-models are combined to obtain the final forecast. For the multi-objective optimization, an improved MOEA/D [20] is used to optimize the GRU network for accuracy and diversity. The final population is then optimized in a second stage based on non-dominated sorting [21], which further improves the population while removing poor individuals. The multi-objective optimization algorithm not only improves the accuracy of the sub-models but also ensures that they differ from each other, because a combined model formed by integrating multiple similar sub-models does not differ significantly from a single model in forecast accuracy [17]. A DBN combined strategy is used to integrate the forecast results of the sub-models: because of its strong nonlinear mapping capability, a DBN is used to construct the relationship between the sub-model outputs and the final forecast. Integrating several high-quality sub-models in this way improves both accuracy and generalization. The data provided by the SGSC program in Australia are used to demonstrate the superiority of the method, which is compared with DBN [11], LSTM [18], GRU [19], EMD-DBN [13], and EMD-MODBN [17]. The experimental results show that GRUCC-MOP has both high accuracy and good generalization ability.
The main contributions of the paper are as follows:
(1) A short-term residential load forecasting network based on multi-layer multiple-input single-output GRUs is built. The hyperparameters of the GRU network are optimized with the improved MOEA/D algorithm, and the load data are decomposed into different sub-models in order to improve the accuracy and diversity of the residential load forecasting network.
(2) An update method based on maximum fitness difference is proposed to improve the MOEA/D algorithm and enhance the forecasting performance of the sub-models it produces.
(3) A two-stage optimization method is used to further screen the sub-models and enhance their forecasting performance.
(4) A DBN-based integration strategy combines all optimized and screened high-quality sub-models, improving the accuracy and generalization ability of the GRU combined model for residential load forecasting.
(5) The proposed method is compared experimentally with the unimproved model to verify its effectiveness and analyze its forecasting accuracy.
(6) The proposed method is compared experimentally with other well-performing forecasting methods to highlight its superiority.
The rest of the paper is organized as follows: Section 2 describes the GRU short-term load forecasting network. Section 3 proposes an improved MOEA/D algorithm for optimizing the GRU network, together with the combination strategy employed to improve the accuracy of short-term residential load forecasts. Section 4 explains the settings of the simulation experiments and examines GRUCC-MOP to demonstrate its effectiveness. Section 5 compares GRUCC-MOP with other well-performing short-term load forecasting methods on several datasets to highlight the superiority of the method. Finally, Section 6 concludes the paper.
GRU short-term load forecasting network
Residential load fluctuations are difficult to forecast because resident behavior is uncertain and random. Different habits of residents and different loads in the household make the residential load characteristics more complex, and the habits of the same residents differ between seasons, which further increases the difficulty of residential load forecasting. Because of the uncertainty of electricity consumption habits and the volatility of residential loads, changes in residential load are dramatic, and even subtle factors can cause drastic fluctuations. To address this, a GRU short-term forecasting method based on a combined model with multi-objective optimization is proposed to improve the accuracy of short-term residential load forecasting.
Residential load data is a time series, and RNNs have advantages in processing time series: they can capture the temporal information in the load sequence and learn through back-propagation, so RNNs are commonly used in residential load forecasting. However, because of its internal recurrent unit, an RNN suffers from vanishing or exploding gradients once the sequence being processed reaches a certain length. Unlike the traditional RNN, LSTM alters how historical information is transmitted within its structural units, which mitigates the vanishing and exploding gradient problems and allows long sequences to be learned. GRU simplifies the LSTM unit structure, reducing the gate structure to an update gate and a reset gate; the gradient problems are still mitigated, and the learning time of the network is reduced, which improves the efficiency of short-term residential load forecasting.
The framework of GRU-based short-term residential load forecasting in this paper is shown in Fig. 1. It consists of an input layer, a GRU layer, and an output layer. The residential load data from different time steps are fed through the input layer to the GRU layer. The GRU layer is built from multiple layers with a multi-input single-output (MISO) structure, and its output is passed to the feed-forward neural network in the output layer to obtain the final forecast value. This structure can learn the effect of historical information on the output, so it is particularly suitable for processing time series signals [28]. The structure of a single GRU layer with MISO is displayed in Fig. 2.

Framework of GRU short-term load forecasting network.

Structure of single GRU layer with MISO.
The GRU network forecasts the load at time step t + 1 by learning the load data of the past t time steps. Each GRU cell transmits the historical information and the information learned in the current time step to the next cell. In the last GRU cell, the transmission of information stops, and the information of the current cell is output. This output is passed through a feed-forward neural network to form the load forecast value. During network training, the loss is computed from the forecast load value and the actual load value and is used for back-propagation. During forecasting, the forecast value is combined with the input values of the past t - 1 time steps to form a new input, which is used to forecast the load at time step t + 2.
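The feedback of each forecast into the next input window can be sketched as follows. This is a minimal illustration, not the paper's implementation: `recursive_forecast` and `mean_predictor` are hypothetical names, and the averaging predictor is only a placeholder standing in for the trained GRU network.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    window: int,
    horizon: int,
) -> List[float]:
    """Forecast `horizon` steps, feeding each forecast back into the input window."""
    if len(history) < window:
        raise ValueError("history must contain at least `window` samples")
    buffer = list(history[-window:])      # the past t load values
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_next(buffer)      # forecast for the next time step
        forecasts.append(y_hat)
        buffer = buffer[1:] + [y_hat]     # keep the last t - 1 inputs, append the forecast
    return forecasts


def mean_predictor(window_values: Sequence[float]) -> float:
    # Placeholder for the trained GRU network (assumption, illustration only).
    return sum(window_values) / len(window_values)


demo = recursive_forecast([1.0, 2.0, 3.0], mean_predictor, window=3, horizon=2)
```

In the experiments described later the window would be 48 half-hourly samples; the short window here only keeps the example easy to follow.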
The GRU network is a variant of RNN [22]. It maintains the advantages of RNN in time series processing and overcomes the problem of vanishing or exploding gradients [23, 24]. A GRU cell with a single time step is shown in Fig. 3.
The GRU cell contains two gate structures: the update gate z_t and the reset gate r_t. The output h_t of the t-th time step is obtained by selecting and combining the input x_t and the intermediate output h_{t-1} through these two gates. The new output h_t can retain the relevant information of each previous input signal. This characteristic gives GRU a better fitting effect than traditional neural networks in time series processing. The mathematical model of GRU cell can be described as follows:
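For reference, one common textbook formulation of the GRU cell is sketched below. The weight matrices W, U and biases b are assumed symbols that may not match the paper's own notation; σ denotes the logistic sigmoid and ⊙ element-wise multiplication. Some references swap the roles of z_t and 1 - z_t in the last line.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```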

GRU cell with a single time step.
In this paper, the core objective of the MOEA/D algorithm in GRU short-term residential load forecasting is to decompose the residential load data into different sub-models on a time-series basis and to optimize each sub-model to obtain its corresponding solution; the set of these solutions forms the PF. The improved algorithm updates only the worst individual each time, which increases the residence time of relatively good individuals and prevents the population from falling into a local optimum during optimization. This improves the forecasts of the sub-models produced by the MOEA/D algorithm and increases the accuracy of GRU short-term residential load forecasting.
The flow chart of the improved MOEA/D algorithm and combined strategy for GRU short-term residential load forecasting is shown in Fig. 4. The specific steps can be divided into two parts: optimization and combination.
In the optimization part, the improved MOEA/D algorithm is used to optimize the GRU network. The objective space of the optimization algorithm is constructed from the accuracy and diversity of the GRU network. The decision space is constructed from the hyper-parameters of the GRU network, which include the learning rate, the initialization weight of each gate, and the number of hidden nodes. The final population is then optimized in a second stage based on non-dominated sorting. After optimization, several sub-models with high accuracy that differ from each other are obtained.
In the combination part, a DBN network is used to combine the forecast values of several high-quality sub-models to obtain the final forecast value. Integrating several sub-models also helps to prevent overfitting.
MOEA/D sub-model optimization algorithm

Flowchart of improved MOEA/D algorithm and combined strategy for GRU short-term residential load forecasting.
Optimization problems that are composed of several conflicting objectives, and that can be decomposed into multiple sub-problems, are collectively referred to as MOPs [25]. A MOP can be defined as:
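In the general form commonly used in the MOEA/D literature, and assuming minimization, a MOP with m objectives can be written as below. The symbols Ω (decision space) and m are assumed notation; in this paper m = 2.

```latex
\min_{x \in \Omega} \; F(x) = \bigl(f_1(x),\, f_2(x),\, \dots,\, f_m(x)\bigr)^{T}
```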
Currently, MOEA/D is considered one of the best methods for obtaining the PF [26]. Its core idea is to decompose the optimization problem into several sub-problems by using weight vectors and to solve each of them in order to obtain the PF. During initialization, a set of weight vectors is generated in the objective space, each vector corresponding to an individual. In the iterative process, offspring are generated by the differential evolution algorithm and Gaussian mutation, and the fitness value of each offspring is obtained by the aggregation function. The population is updated by comparing the fitness values between individuals. The uniform distribution of weight vectors enables the MOEA/D optimization algorithm to obtain good population diversity. When dealing with low-dimensional problems, the update method based on the comparison of fitness values can reduce the time complexity of the algorithm. In this paper, the improved MOEA/D optimization algorithm is used to optimize the GRU network.
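The initialization step can be illustrated with a short sketch that generates evenly spaced weight vectors for two objectives and assigns each vector a neighborhood of its closest vectors. The function names are hypothetical; the values 30 and 8 are the population and neighborhood sizes reported in the experimental settings, and Euclidean distance between weight vectors is the usual MOEA/D choice, assumed here.

```python
import math
from typing import List, Tuple


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """Evenly spaced two-objective weight vectors from (0, 1) to (1, 0)."""
    return [(i / (n - 1), 1.0 - i / (n - 1)) for i in range(n)]


def neighborhoods(weights: List[Tuple[float, float]], size: int) -> List[List[int]]:
    """For each weight vector, the indices of its `size` closest vectors (itself included)."""
    result = []
    for w in weights:
        order = sorted(range(len(weights)), key=lambda j: math.dist(w, weights[j]))
        result.append(order[:size])
    return result


weights = uniform_weights(30)           # one weight vector per individual
neighbors = neighborhoods(weights, 8)   # each sub-problem exchanges information with 8 neighbors
```

Note that the first and last vectors produced here are the boundary vectors (0, 1) and (1, 0).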
The decision space of MOEA/D is composed of hyper-parameters, including learning rate, initialization weight of each gate, and number of hidden nodes:
Ensemble learning requires the sub-models to differ from each other, because a combined model formed from multiple similar sub-models is essentially the same as a single model in terms of forecasting accuracy. In this paper, therefore, the diversity and accuracy of the GRU network define the objective space of the MOEA/D optimization algorithm.
Suppose that, in an optimization process, the training set contains N samples and the population contains M GRU networks. In the decision space, x1, x2, x3 are optimized within a given range. In the objective space, f1 (X) represents the diversity of the network [17], and f2 (X) represents the accuracy of the network [29]. In Equation (8),
The decomposition method used in this paper is Tchebycheff approach [20]. It is considered to have superior convergence and diversity in population distribution when dealing with low-dimensional optimization problems. Since the optimization problem discussed is 2-dimensional, this method is used as the decomposition method of MOEA/D optimization algorithm. The scalar optimization problem of the traditional Tchebycheff approach is shown in Equation (10).
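In its usual form, the Tchebycheff approach scalarizes a sub-problem as g(x | λ, z*) = max_i λ_i |f_i(x) - z_i*|, where z* is the ideal point (the best value found so far for each objective). A minimal Python sketch of this aggregation function follows; the function name and the numbers in the example are illustrative only.

```python
from typing import Sequence


def tchebycheff(
    objectives: Sequence[float],
    weights: Sequence[float],
    ideal: Sequence[float],
) -> float:
    """Scalarized fitness max_i w_i * |f_i(x) - z_i*|; smaller is better."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


# Two objectives, equal weights: the larger weighted gap to the ideal point decides.
g = tchebycheff(objectives=[0.6, 0.2], weights=[0.5, 0.5], ideal=[0.1, 0.1])
```

With a boundary weight vector such as (1, 0), the second objective drops out of the maximum entirely, which is why the individuals attached to boundary vectors are treated separately later.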
To make the MOEA/D optimization algorithm more suitable for network optimization, this paper improves the update method of the Tchebycheff approach and proposes an update method based on maximum fitness difference. The comparison between the traditional method and the update method based on maximum fitness difference is shown in Fig. 5.

Comparison between the traditional method and the proposed update method. (a) Fitness difference (b) Initial population (c) Traditional update method (d) Update method based on maximum fitness difference.
In Fig. 5(b), the offspring individual F (X′) dominates the parents F (X1), F (X2), and F (X3). The traditional update method replaces all of these parents with F (X′); the result is shown in Fig. 5(c). This update dramatically reduces the diversity of the population. In subsequent iterations, the population may be trapped in a local optimum because less parental genetic information is available. Note that the population size used in this paper is small, so the traditional update method falls into a local optimum more easily. To alleviate this phenomenon, an update method based on the maximum fitness difference is proposed, with the fitness difference defined as shown in Equation (12).
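One possible reading of this rule is sketched below. It is an assumption rather than the paper's exact definition: the fitness difference is taken as the parent's Tchebycheff fitness minus the offspring's fitness under the same weight vector, and only the single neighbor with the largest positive difference is replaced. All names are hypothetical.

```python
from typing import List, Optional, Sequence


def tchebycheff(objs: Sequence[float], weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Scalarized fitness; smaller is better."""
    return max(w * abs(f - z) for f, w, z in zip(objs, weights, ideal))


def update_max_difference(
    population_objs: List[List[float]],
    weights: List[Sequence[float]],
    neighbor_idx: Sequence[int],
    child_objs: Sequence[float],
    ideal: Sequence[float],
) -> Optional[int]:
    """Replace at most one neighbor: the one the child improves on the most.

    Returns the replaced index, or None if the child improves on no neighbor.
    """
    best_j, best_gain = None, 0.0
    for j in neighbor_idx:
        # Assumed fitness difference: parent fitness minus child fitness on sub-problem j.
        gain = (tchebycheff(population_objs[j], weights[j], ideal)
                - tchebycheff(child_objs, weights[j], ideal))
        if gain > best_gain:
            best_j, best_gain = j, gain
    if best_j is not None:
        population_objs[best_j] = list(child_objs)
    return best_j


pop = [[0.9, 0.2], [0.5, 0.5], [0.2, 0.8]]
lams = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
replaced = update_max_difference(pop, lams, [0, 1, 2], [0.3, 0.3], ideal=[0.0, 0.0])
```

In this example the child improves on all three parents, so a traditional update would overwrite all of them; under the sketched rule only one parent is replaced and the other two keep their genetic information.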

DBN structure for short-term load forecasting.
In practice, there is a large gap between the objective function values calculated by Equation (9) and Equation (10). This gap makes the optimization algorithm deviate from the optimization direction when updating; therefore, the objective function values are normalized by Equation (13) before each population update.
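The effect of such a normalization can be illustrated with a min-max rescaling of each objective across the current population. Min-max scaling is an assumption chosen for illustration; the paper's Equation (13) may use a different form, and the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def normalize_objectives(objs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each objective to [0, 1] across the population (assumed form)."""
    n_obj = len(objs[0])
    lows = [min(row[k] for row in objs) for k in range(n_obj)]
    highs = [max(row[k] for row in objs) for k in range(n_obj)]
    scaled = []
    for row in objs:
        scaled.append([
            (row[k] - lows[k]) / (highs[k] - lows[k]) if highs[k] > lows[k] else 0.0
            for k in range(n_obj)
        ])
    return scaled


# The first objective is orders of magnitude larger than the second before scaling.
raw = [[120.0, 0.10], [80.0, 0.30], [100.0, 0.20]]
scaled = normalize_objectives(raw)
```

After scaling, both objectives span the same range, so neither dominates the maximum inside the Tchebycheff aggregation.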
The sub-network optimization strategy based on the improved MOEA/D is shown in Algorithm 1.
When the optimization algorithm finishes iterating, a final population of P (P = 30) individuals is obtained. In practice, when the weight vectors are generated, two boundary weight vectors λ = (1, 0) and λ = (0, 1) are produced. Optimizing the sub-problem corresponding to a boundary weight vector is equivalent to considering only one of the two objective functions and completely ignoring the other. The individuals corresponding to these two sub-problems therefore fall into local optima, which adversely affects the combined output. To eliminate the individuals corresponding to the boundary vectors and further optimize the final population, a second stage of optimization is carried out on the final population.
The second stage of optimization includes two steps. The first step is to calculate H_i = f1 (X_i) + f2 (X_i), (i = 1, ⋯, P) for all individuals and eliminate the two individuals with the largest values of H_i. The second step is to sort the remaining P - 2 individuals: the individuals that are not dominated by any other individual form the first non-dominance level, and the same procedure is repeated on the remaining individuals until every individual is assigned to a non-dominance level. After sorting, only P/2 individuals are selected for subsequent operations. Priority is given to individuals with a higher non-dominance level. If individuals belong to the same non-dominance level, those with smaller values of the objective function f2 (X) are selected.
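The two steps of the second stage can be sketched as follows. The sketch assumes that both objectives are minimized and that a "higher" non-dominance level means an earlier (better) front; the function names and the six-individual example are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_levels(objs: List[Sequence[float]], indices: List[int]) -> List[List[int]]:
    """Peel off successive non-dominated fronts; the first list is the best front."""
    remaining, levels = list(indices), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        levels.append(front)
        remaining = [i for i in remaining if i not in front]
    return levels


def second_stage(objs: List[Sequence[float]]) -> List[int]:
    """Return the indices of the P/2 individuals kept after the second stage."""
    p = len(objs)
    # Step 1: drop the two individuals with the largest H_i = f1 + f2.
    by_h = sorted(range(p), key=lambda i: objs[i][0] + objs[i][1])
    kept = by_h[:-2]
    # Step 2: non-dominated sorting, then fill P/2 slots front by front,
    # preferring the smaller f2 inside a front.
    selected: List[int] = []
    for front in non_dominated_levels(objs, kept):
        for i in sorted(front, key=lambda i: objs[i][1]):
            if len(selected) < p // 2:
                selected.append(i)
    return selected


demo_objs = [(0.0, 1.2), (1.2, 0.0), (0.2, 0.5), (0.5, 0.2), (0.4, 0.6), (0.3, 0.3)]
survivors = second_stage(demo_objs)
```

In the example, individuals 0 and 1 play the role of the boundary solutions and are removed in step 1; individual 4 is dominated by individual 5 and therefore falls into the second front, which is not needed to fill the three remaining slots.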
DBN combined strategy
The short-term residential load data are decomposed and optimized into different sub-models by the improved MOEA/D, and the final short-term residential load forecast is combined from the outputs of these sub-models. Common combination strategies are weight integration and stacking integration. Weight integration is based on the contribution of each sub-model to the final model; because the contribution is usually set manually, weight integration is rather subjective. Stacking integration feeds the outputs of the sub-models into the DBN combined strategy for training, integrates the sub-models of the GRU combined model according to what the DBN learns, and thereby constructs the link between the forecast values of the sub-models and the final forecast value, which can improve the accuracy of short-term residential load forecasting.
DBN is built up from multiple RBMs [27]; it consists of an input layer, a hidden layer, and an output layer, see Fig. 6. At the end of the DBN network structure, the final result is output by a BPNN. The internal mathematical expression of each hidden-layer neural unit is shown in Equation (14).

Short-term load forecasting process based on DBN combined strategy.
Introduction of the dataset
Dataset: The data used in the experiment are residential load data of New South Wales from 2010 to 2014 provided by the Australian SGSC project [30]. This dataset includes residential load data of about 10,000 different residents as well as detailed information on retail and distributor product offers. Since the dataset is too large to study and analyze the load data of all residents, a subset of the SGSC dataset was selected for the experiment. Twenty-five residents with Internet and a gas-fired hot water system are selected as the experimental data. These selection criteria are intended to guarantee the presence of human activity in the selected dwellings and thus the validity of the load data. To reflect the influence of seasons, the load data of each resident in January, April, July, and October 2013 are selected for the experiments, corresponding to summer, autumn, winter, and spring in Australia respectively.
Data processing: Since the data are recorded every half hour (48 samples per day), the time step is set to 48. In the experiment, only load data are used as training input. Equation (15) is used to standardize the data to facilitate network learning: it rescales data that would otherwise span a large range into a smaller one, removes units from the data, and converts the data to dimensionless values. For each month, the load data of the last 7 days (336 samples) are used as the test set, and the rest of that month's load data are used as the training set.
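The data preparation can be sketched in a few lines. The split (last 7 days, 336 samples) and the window length of 48 follow the text; the z-score form of the standardization is an assumption, since other forms such as min-max scaling would also fit the description, and the synthetic month of data and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SAMPLES_PER_DAY = 48   # half-hourly readings
TEST_DAYS = 7          # the last 7 days of the month form the test set


def split_month(load: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split one month of load data into training and test parts."""
    n_test = SAMPLES_PER_DAY * TEST_DAYS   # 336 samples
    return list(load[:-n_test]), list(load[-n_test:])


def zscore_params(train: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation estimated on the training set only (assumed form)."""
    mean = sum(train) / len(train)
    std = math.sqrt(sum((v - mean) ** 2 for v in train) / len(train))
    return mean, (std if std > 0 else 1.0)


def make_windows(series: Sequence[float], steps: int = SAMPLES_PER_DAY):
    """Pair each block of `steps` past values with the value that follows it."""
    return [(list(series[i:i + steps]), series[i + steps])
            for i in range(len(series) - steps)]


# Synthetic 31-day month, used only to exercise the functions.
month = [float(i % 48) for i in range(31 * SAMPLES_PER_DAY)]
train, test = split_month(month)
mean, std = zscore_params(train)
train_std = [(v - mean) / std for v in train]
windows = make_windows(train_std)
```

Estimating the mean and standard deviation on the training part only, and reusing them for the test part, avoids leaking test-set information into training.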
In this experiment, MAPE and RMSE are used as evaluation criteria. The expressions are as follows:
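In their usual form, MAPE = (100%/n) Σ |(y_i - ŷ_i) / y_i| and RMSE = sqrt((1/n) Σ (y_i - ŷ_i)^2), where y_i is the actual load and ŷ_i the forecast. A minimal Python sketch, assuming these standard definitions match the ones used in the paper:

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 / len(actual) * sum(abs((a - f) / a) for a, f in zip(actual, forecast))


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the load."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual = [2.0, 4.0]
forecast = [1.0, 5.0]
demo_mape = mape(actual, forecast)
demo_rmse = rmse(actual, forecast)
```

Because MAPE divides by the actual value, intervals with near-zero household load inflate it strongly, which is one reason RMSE is reported alongside it.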
In this paper, some empirical rules are used to set the parameters of the model. The number of epochs of the GRU network is set to 150, the time step to 48, and the number of network layers to 2 [19]. The learning rate, the initialization weight of each gate, and the number of hidden nodes constitute the MOEA/D decision space and are optimized within given ranges. To show that the hyperparameter ranges chosen in this paper are advantageous, two other sets of hyperparameter ranges are tested. Since the ranges of the initialization weights and of the number of hidden nodes are already set wide, only the optimization range of the learning rate is changed in this experiment, with the range of initialization weights fixed at [–0.005, 0.005] and the range of the number of hidden nodes at [25, 128]. The experiments are carried out on the January and July load data of residents 10006414, 10006572, and 10017466, and the results are shown in Table 1.
Comparative tests with different learning rate ranges
The population size of the MOEA/D optimization algorithm is set to 30, the number of iterations to 25, and the neighborhood size to 8. For DBN, the structural parameters depend on the scale of the input data [29].
In order to obtain the most appropriate number of DBN hidden layers, the forecasting error of the load data of residents 10006414, 10006572, and 10017466 in January and July is compared when the number of DBN hidden layers is 1 to 5. The results are shown in Table 2.
Error of DBN with different layers
The results show that the model achieves the minimum error on multiple groups of data when the number of DBN hidden layers is 1. Therefore, the number of hidden layers of DBN is set to 1.
Convergence of optimization algorithm
Figure 8 records the trend of the objective functions over the first 20 generations of one run. To make the results more intuitive, the optimal objective function value shown for each generation is the average of the optimal objective function values of all individuals in that generation. It can be seen from the figure that both objective functions converge effectively.

Change trend of objective function value.
To illustrate the importance of normalizing the objective function before each population update, the objective function values (not normalized) of five sub-problems in the same generation are shown in Table 3. As can be seen from Table 3, the value of f1 (x) is much greater than that of f2 (x).
Optimal objective function value of five sub-problems
Figure 9 shows the population distribution of the two methods (the individuals corresponding to the boundary weight vector have been eliminated). In Fig. 9(a), the optimization results for MOEA/D without normalization of the objective function are recorded. Because the optimal objective function value of f1 (x) is much larger than f2 (x), the population falls into the local optimum in the objective function f2 (x). In Fig. 9(b), the optimization results for MOEA/D with normalizing the objective function are recorded. Compared with Fig. 9(a), the population in Fig. 9(b) converges better on the objective function f2 (x), and the population distribution is more uniform.

Population distribution. (a) Objective function normalization is not used (b) Objective function normalization is used.
To illustrate the superiority of the update method based on the maximum fitness difference, the population distribution obtained with the traditional update method is shown in Fig. 10. The diversity of the population obtained with the traditional update method is very poor: most of the individuals overlap and do not fully converge. In contrast, the population obtained with the proposed update method in Fig. 9(b) is better in both diversity and convergence.

Population distribution of traditional update method.
To illustrate the role of the second-stage optimization, comparative experiments are carried out with the load data of residents 10006414, 10006572, and 10017466 in January and July.
The load forecast errors before and after the second-stage optimization are compared, and the results are recorded in Table 4. The results show that the second-stage optimization improves the accuracy of short-term residential load forecasting.
Final error with and without second stage selection
GRUCC-MOP is compared with DBN [11], LSTM [18], GRU [19], EMD-DBN [13], and EMD-MODBN [17] in terms of forecasting accuracy. Table 5 shows the average MAPE values of the six methods on the load data of the 25 selected residents in January, April, July, and October 2013. The experimental results show that GRUCC-MOP achieves the minimum error on all datasets. It has not only high accuracy but also strong generalization ability.
Overall errors of various forecasting methods
To further illustrate the advantages of the GRU combined model, GRUCC-MOP is compared with four other well-performing forecasting methods, including GRU, on the load data of resident 10006414. Figure 11 shows the forecast load profiles at 48 time steps, and the corresponding results are recorded in Table 6. The same comparison is also carried out at 96 time steps: the results are recorded in Table 7, and Fig. 12 shows the load profiles. In this case as well, GRUCC-MOP achieves the smaller error.

Comparison of the fitted curves of residential short-term load forecasts using the proposed method with four forecasting models at 48 time steps. (a) January (b) April (c) July (d) October.
MAPE comparison of 4 forecasting methods at 48 time steps

Comparison of the fitted curves of residential short-term load forecasts using the proposed method with four forecasting models at 96 time steps. (a) January (b) April (c) July (d) October.
MAPE comparison of 4 forecasting methods at 96 time steps
In Figs. 11 and 12, the horizontal axis represents time points, with 30 minutes between adjacent points, so a 24-hour day contains 48 time points. The vertical axis represents the electrical load over the preceding 30 minutes. As can be seen from Figs. 11 and 12, the curves are relatively smooth at night and fluctuate more strongly during the day, which is consistent with the daily routine of being active by day and resting at night. The peaks of the curves occur mainly at noon and in the evening and are related to the regular activities of the residents, such as cooking and cleaning. Some peaks also occur between 9 p.m. and midnight, which may be due to non-routine activities of residents, such as occasional parties. Comparing the curve fits shows that, relative to GRU and the other well-performing forecasting methods, the method proposed in this paper tracks the peaks better. This suggests that the GRU combined model learns more features of the load data and copes with sudden changes in the load more effectively. In summary, the GRU combined model has higher accuracy than the other models in short-term residential load forecasting.
Discussions
In Sections 4 and 5, the superiority of the combined model proposed in this paper has been experimentally verified. In this section, the following points are discussed with respect to the experimental results:
(1) In Table 5, the various forecasting methods show larger errors for spring and autumn than for summer and winter. Loads are generally considered harder to forecast in summer and winter because load values are larger in these seasons. However, the factor that actually makes load data difficult to forecast is not the size of the load values but the degree of load fluctuation. In spring and autumn, temperatures fluctuate more dramatically than in summer and winter, and this instability leads to larger load fluctuations, which in turn make the load harder to forecast. References [31, 32] also report larger short-term residential load forecasting errors in spring and autumn than in summer and winter.
(2) In Table 5, the errors of both DBN and EMD-MODBN are larger than those of GRU. This indicates that the forecasting accuracy of the sub-models is one of the main factors affecting the forecasting accuracy of a combined model. EMD-MODBN integrates several DBN models, yet its accuracy remains lower than that of GRU even though it uses ensemble learning, because the forecast accuracy of the DBN models is much lower than that of the GRU models. The selection of sub-models will therefore be the main research direction for further improving the forecast accuracy of the combined model.
(3) The GRU combined model proposed in this paper tracks peaks better than GRU and the other load forecasting models. This indicates that the combined model learns more data features and reacts more effectively to sudden changes in the load data, which is consistent with the findings in reference [16].
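The distinction drawn above between load magnitude and load fluctuation can be illustrated with a small sketch. The mean absolute change between consecutive readings is used here as one possible fluctuation measure. Both this measure and the two series are assumptions chosen for illustration and are not taken from the paper.

```python
def mean_level(load):
    """Average load value: a simple measure of load magnitude."""
    return sum(load) / len(load)


def mean_abs_change(load):
    """Average absolute difference between consecutive readings:
    a simple (assumed) measure of how strongly the load fluctuates."""
    if len(load) < 2:
        raise ValueError("need at least two readings")
    return sum(abs(b - a) for a, b in zip(load, load[1:])) / (len(load) - 1)


# Made-up series: one with large but steady values, one with small but jagged values.
high_steady = [2.0, 2.1, 2.0, 2.1, 2.0, 2.1]
low_jagged = [0.2, 0.9, 0.1, 0.8, 0.2, 0.9]

for name, series in (("high_steady", high_steady), ("low_jagged", low_jagged)):
    print(name, round(mean_level(series), 3), round(mean_abs_change(series), 3))
```

In this toy case the series with the larger average load has the smaller step-to-step change, so a ranking by magnitude and a ranking by fluctuation disagree, which is the situation described for summer and winter versus spring and autumn.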
The time complexity of the GRU combined model is analyzed as follows. Suppose that training one GRU sub-model takes L computations and training the DBN combined strategy takes I computations. The time complexity of building the GRU combined model is then O(SLP) + O(I), where S denotes the number of iterations of the optimization algorithm and P denotes its population size. The other steps of the optimization algorithm, including the initialization of the weight vector, the population update in each iteration, and the second-stage optimization, take far less time than training a deep learning network, so only the network training time is considered in this analysis. In summary, the main parameters that determine the runtime of the algorithm are the number of iterations S and the population size P.
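The estimate O(SLP) + O(I) can be made concrete with a rough cost model. The sketch below simply counts S × P sub-model trainings of cost L each, plus one training of the combined strategy of cost I. All numeric values are hypothetical and chosen only to show how the total scales.

```python
def training_cost(S, P, L, I):
    """Approximate number of training computations: S*P*L + I.

    S: iterations of the optimization algorithm
    P: population size
    L: computations to train one GRU sub-model
    I: computations to train the DBN combined strategy
    """
    return S * P * L + I


# Hypothetical values, for illustration only.
base = training_cost(S=50, P=20, L=1000, I=5000)         # 1_005_000
more_iters = training_cost(S=100, P=20, L=1000, I=5000)  # 2_005_000
bigger_pop = training_cost(S=50, P=40, L=1000, I=5000)   # 2_005_000

print(base, more_iters, bigger_pop)
```

Doubling either S or P roughly doubles the total, while the I term stays fixed, which matches the statement that S and P are the main parameters determining the runtime.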
Conclusion
A GRU combined model based on multi-objective optimization is presented to address the difficulty of forecasting residential load effectively. The accuracy and diversity of the GRU sub-networks are optimized by a multi-objective optimization algorithm, yielding multiple sub-networks that are both accurate and diverse. The sub-models are then combined through the DBN combined strategy to obtain the final load forecast. The performance of the short-term residential load forecasting method is analyzed on multiple data sets, and the following conclusions are obtained:
(1) The experiments in Section 4.4.3 demonstrate that the updating method based on the maximum fitness difference preserves population diversity and effectively addresses the tendency of traditional updating methods to fall into local optima.
(2) The experiments in Section 4.4.4 demonstrate that the second-stage optimization improves the forecast accuracy of the short-term residential load forecasting model.
(3) Tables 5, 6, and 7 and Figs. 11 and 12 show that the GRU combined model improves not only forecast accuracy but also generalization ability.
(4) Compared with several other forecasting methods, GRUCC-MOP achieves higher accuracy on multiple data sets.
The GRUCC-MOP short-term residential load forecasting model proposed in this paper terminates the optimization algorithm after a preset number of iterations. When learning load data, the algorithm may therefore stop before the population reaches the optimum, which adversely affects the generalization ability of the combined model. This may explain why the proposed method does not show a large advantage on certain data sets. A more flexible termination condition could be considered to solve this problem.
In this paper, only short-term residential electric load data is used as model input, and no other influencing factors are considered. Many factors produce fluctuations in residential-level load data, with weather factors dominating. In future work, weather factors available in the data set, such as temperature and humidity, can be preprocessed and fed into the forecasting framework to improve the accuracy of short-term residential load forecasting under the influence of multiple factors.
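One way the more flexible termination condition mentioned above might look is a stall-based rule: keep iterating while the best error still improves, and stop once it has not improved for a given number of iterations. The sketch below is an assumption for illustration, not the method used in this paper. The callback `step`, the parameters `patience` and `tol`, and the toy optimizer are all hypothetical.

```python
def run_with_stall_stop(step, max_iters, patience, tol=1e-4):
    """Iterate until the best error stalls for `patience` rounds or max_iters is hit.

    `step(i)` is a hypothetical callback that runs iteration i of the
    optimization algorithm and returns the best (lowest) error found so far.
    Returns (best_error, iterations_used).
    """
    best = float("inf")
    stalled = 0
    for i in range(max_iters):
        err = step(i)
        if best - err > tol:      # meaningful improvement: reset the stall counter
            best = err
            stalled = 0
        else:                     # no meaningful improvement this round
            stalled += 1
            if stalled >= patience:
                return best, i + 1
    return best, max_iters


# Toy stand-in for the optimizer: the error improves for a few rounds, then plateaus at 0.5.
def toy_step(i):
    return max(1.0 - 0.1 * i, 0.5)


best_err, iters_used = run_with_stall_stop(toy_step, max_iters=100, patience=3)
print(best_err, iters_used)
```

With such a rule, `max_iters` can be set generously as a safety cap, so the run length adapts to how quickly the population converges on each data set instead of being fixed in advance.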
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61572416) and the Hunan Province Natural Science Zhuzhou United Foundation (2022JJ50132).
