Abstract
Electrical load prediction plays an important role in power system management and economic development. However, because electrical load has non-linear relationships with many factors, such as the political environment, economic policy, human activities and irregular behaviors, it is quite difficult to predict power load accurately. To further improve electrical load forecasting performance, this paper proposes a hybrid model that combines Stacked AutoEncoders (SAE) and extreme learning machines (ELMs) to learn the characteristics of electrical load time series data. To exploit the characteristics of the electrical load at different depths, the outputs of each layer of the SAE are taken as the inputs of a separate ELM. The results of these ELMs are then integrated by linear regression, trained with the least squares method, to obtain the final output. The hybrid model is applied to two real-world electrical load time series, and detailed comparisons with the SAE, ELM, back propagation neural network (BPNN), multiple linear regression (MLR) and support vector regression (SVR) are made to show the advantages of the proposed forecasting model. Experimental and comparison results demonstrate that the proposed hybrid model achieves much better performance than the comparative methods in electrical load forecasting.
Introduction
With the rapid development of the economy, the power grid faces an increasingly complex operating environment. To keep pace with economic growth and make the best use of electric energy, improving the quality and safety of the electricity supply has become a crucial issue. There are many ways to improve electrical performance, and electrical load forecasting is one of the most effective. Power load forecasting constructs models that systematically process past and future loads while reflecting important factors, such as capacity expansion decisions, system operating characteristics, natural conditions and social impact, and predicts the load value at a specific future time with a required level of accuracy [1]. Reliable electrical load forecasting can decrease energy consumption and maintain the safe and stable operation of the power grid.
Because electrical load prediction involves nonlinearity and high levels of uncertainty, many scholars have studied this topic comprehensively and have proposed different kinds of methods [2–4]. One typical class of methods is based on artificial intelligence [5–8]. In [9], support vector regression (SVR) was used for annual load forecasting. In [10], the random forest model was successfully applied to predict short-term electrical load. In [11], kernel-based support vector quantile regression and Copula theory were used for short-term power load probability density prediction. In [12, 13], the least squares support vector machine (LSSVM) was applied to mid- and long-term electrical load forecasting. Artificial neural networks (ANNs) have advantages in time series modeling and nonlinear fitting, and different kinds of ANNs have been applied to electrical load prediction. For example, in [14], a similar day-based wavelet neural network was used to forecast the next day's electrical load, and the method was successfully extended to holiday load forecasting. In [15], a back propagation-based wavelet neural network was designed for very short-term load prediction. ANNs were also used to improve the accuracy of electrical load forecasting in [16–18]. On the other hand, hybrid models combining different methods can achieve better performance than single models in time series prediction. In [19], a hybrid model combining fuzzy neural networks with a wavelet fuzzy neural network was presented to estimate the electrical load; in this model, fuzzified wavelet features were used as the inputs and the Choquet integral was employed to produce the outputs. In [20], SVR and Bayesian methods were integrated to improve the performance of electrical load prediction.
In [21–23], the feasibility and applicability of hybrid models in this application were also confirmed. However, because electrical load has non-linear relationships with many factors, such as the political environment, economic policy, human activities and irregular behaviors, predicting the power load accurately remains a difficult task.
Deep learning developed from artificial neural networks. In deep learning models, increasingly representative features are extracted from the bottom to the top of the network. Deep learning is more feasible and effective than conventional ANNs in dealing with high-dimensional data and complex functions [24, 25], and it has therefore been widely used in various fields. In [26–28], the generative adversarial network was applied to parallel prediction of building energy consumption, and computational experiment methods were also given. In [29], feed-forward denoising convolutional neural networks (DnCNNs) were used for image denoising to speed up training and improve denoising performance. In [30–32], deep learning techniques were designed for image classification. In [33, 34], deep learning algorithms were applied to speech recognition, and sentiment analysis was discussed in [35, 36]. In [37], a prediction method combining a deep belief network (DBN), a multi-layer perceptron (MLP) and an autoregressive integrated moving average (ARIMA) model was proposed to analyze time series data. As a specific deep learning method, the stacked autoencoder (SAE) uses the outputs of the previous layer as the inputs of the current layer, thus forming a deep network that better represents the characteristics of the input data. The SAE has been applied in many fields, especially building energy consumption prediction [38]. Existing results have shown that the SAE performs better than traditional methods.
In this paper, to further improve electrical load forecasting performance, an SAE-based hybrid model is proposed. The proposed hybrid model combines the SAE with the extreme learning machine (ELM) to extract the characteristics of electrical load time series data. The method takes the outputs of each layer of the SAE as the inputs of different ELMs and then integrates the outputs of the ELMs by linear regression to generate the final output. To verify the effectiveness of the proposed hybrid model, it is applied to two real-world electrical load time series. Further, detailed comparisons with the SAE, ELM, BPNN, MLR and SVR are made to show the advantages of the proposed forecasting model.
The remainder of this paper is organized as follows. Section 2 introduces the SAE and the ELM. Section 3 presents the proposed hybrid model in detail. Section 4 reports two electrical load prediction experiments together with the experimental and comparison results. Finally, Section 5 concludes the paper.
Methodologies
In this section, the SAE and ELM will be introduced briefly. These two methods will be used as the main components of the proposed hybrid model.
AutoEncoder
An AutoEncoder (AE) is a shallow neural network consisting of an input layer, a hidden layer and an output layer [39, 40]. The structure of a typical AE is shown in Fig. 1. It is an unsupervised neural network whose weights are trained by the backpropagation algorithm. For each input vector, the expected output is the input vector itself, so the network learns to approximate its input by minimizing the reconstruction error between the two. The aim of an AE is to learn a compressed or sparse representation of a set of data.
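To make the reconstruction objective concrete, here is a minimal NumPy sketch of a one-hidden-layer AE trained by backpropagation. The sigmoid encoder, linear decoder, mean squared reconstruction error and full-batch gradient descent are illustrative assumptions, not settings taken from the paper; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer AE whose target output is its own input X.

    Assumed choices (not from the paper): sigmoid encoder, linear decoder,
    mean squared reconstruction error, full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encoding: input layer -> hidden layer
        X_hat = H @ W2 + b2               # decoding: hidden layer -> reconstruction
        err = X_hat - X
        losses.append(np.mean(err ** 2))  # reconstruction error before this update
        # Backpropagate the mean squared reconstruction error.
        g_out = 2 * err / err.size
        gW2 = H.T @ g_out; gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * H * (1 - H)
        gW1 = X.T @ g_h; gb1 = g_h.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def encode(X, params):
    """Hidden-layer representation learned by the AE."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)
```

After training, only the encoder is kept; its hidden activations are the learned representation that a stacked model passes to the next layer.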
The AE includes two processes: the encoding and the decoding. The encoding process from input layer to hidden layer for primary data

The structure of one typical AE
An SAE is a neural network composed of two or more stacked autoencoders and is trained in an unsupervised way. The structure of the SAE is shown in Fig. 2. Its main idea is to capture high-order features from data. The SAE uses a greedy layer-wise method to pretrain each layer of the network in turn and then fine-tunes the whole deep neural network to better learn the data characteristics [44]. Each hidden layer is trained separately, and the output of each hidden layer is used as the input of the next layer. Take the training of a stacked autoencoder with two hidden layers as an example. First, an AE with one hidden layer is created and trained, and its hidden-layer activation after training gives the first feature g(1). Then, g(1) is fed into the second AE, whose hidden layer yields the second feature g(2) as a new representation of the data.
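The greedy layer-wise procedure can be sketched as follows, assuming the same simple AE design as a plain NumPy illustration (sigmoid encoder, linear decoder, mean squared error, gradient descent). The fine-tuning stage is omitted, and the function names are hypothetical. The sketch returns every layer's hidden output g(1), …, g(k), which is what the hybrid model later feeds to its ELMs.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _fit_ae(X, n_hidden, lr, epochs, rng):
    """Fit one AE (sigmoid encoder, linear decoder, MSE) and return its encoder."""
    n_in = X.shape[1]
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = _sigmoid(X @ W1 + b1)
        g_out = 2 * (H @ W2 + b2 - X) / X.size   # gradient of the MSE w.r.t. output
        g_h = (g_out @ W2.T) * H * (1 - H)       # uses W2 before it is updated
        W2 -= lr * H.T @ g_out; b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * X.T @ g_h;   b1 -= lr * g_h.sum(axis=0)
    return W1, b1

def greedy_sae_features(X, layer_sizes, lr=0.1, epochs=300, seed=0):
    """Greedy layer-wise training: AE i is trained on the output of AE i-1.

    Returns [g1, ..., gk], the hidden representation produced by each layer.
    """
    rng = np.random.default_rng(seed)
    features, inputs = [], X
    for n_hidden in layer_sizes:
        W, b = _fit_ae(inputs, n_hidden, lr, epochs, rng)
        inputs = _sigmoid(inputs @ W + b)   # g(i) becomes the input of AE i+1
        features.append(inputs)
    return features
```

For example, `greedy_sae_features(X, [8, 6])` trains two AEs in sequence and returns a list with the 8-unit and 6-unit representations.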

The structure of the SAE
The extreme learning machine (ELM), proposed by Huang et al. [45], is a learning method for single-hidden-layer feedforward neural networks. It offers fast learning and satisfactory approximation performance. In this method, the number of hidden nodes is set before training, and the input weights and the parameters of the hidden nodes are assigned randomly and kept fixed. The weights between the hidden layer and the output layer are then determined by the least squares method. The whole process requires no iteration and yields a unique optimal solution [46, 47].
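The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a sigmoid activation and input weights and biases drawn uniformly from [-1, 1]; the class name and these choices are assumptions. The output weights are the minimum-norm least-squares solution obtained with the Moore-Penrose pseudoinverse, so no iterative training is involved.

```python
import numpy as np

class ELM:
    """Basic ELM regressor: random hidden layer, least-squares output weights."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H (sigmoid activation is an assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_in = X.shape[1]
        # Input weights and biases are drawn at random and never updated.
        self.W = self.rng.uniform(-1, 1, (n_in, self.n_hidden))
        self.b = self.rng.uniform(-1, 1, self.n_hidden)
        H = self._hidden(X)
        # Output weights: beta = pinv(H) @ y, a single non-iterative solve.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Usage: `ELM(n_hidden=50).fit(X_train, y_train).predict(X_test)`. Only `beta` is learned, which is why training reduces to one pseudoinverse computation.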
Suppose that the training data set is
Then, the prediction result for the input
ELM tries to approximate the training samples with no error, which can be mathematically expressed as
Then, Equation (5) can be rewritten as
Finally, the weight vector
In this section, the proposed hybrid prediction model will be given. Firstly, we will present the framework of the proposed hybrid prediction model. Then, we will discuss how to determine the input variables for this hybrid model, and generate the original training data set based on the observed electrical load time series. Further, the design processes of this model will be discussed step by step.
Framework of the proposed hybrid model
The structure of the proposed hybrid model is shown in Fig. 3. The outputs of each layer of the SAE are computed first and used as the inputs of the ELMs. Then, the outputs of the different ELMs are combined by linear regression to generate the final output of the hybrid model. The procedure can be summarized in the following four steps.

The structure of the proposed hybrid model
Suppose that the electrical load time series is p(1), p(2), p(3), …, p(M). The prediction model uses the n most recent values up to time t to predict the value at time t + 1, which can be expressed as
$$\hat{p}(t+1) = f\big(p(t), p(t-1), \dots, p(t-n+1)\big),$$
where n is the time delay, i.e. the number of input variables.
In this study, the partial autocorrelation function (PACF) is adopted to determine the time delay n. The PACF is one of the main tools for analyzing the correlation structure of a time series: it reflects the internal relations and interdependencies among the variables of the series and plays an important role in time series analysis.
We adopt the steps in [48, 49] to calculate the delay n. Such steps are summarized as follows:
Calculate the autocovariance of the electrical load time series as
Calculate the sample autocorrelation function (ACF) as
Solve the following Yule-Walker equation to get the PACF value ψ (j) = ψ jj as
When n satisfies the following conditions
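The steps above can be sketched in NumPy as follows. The autocovariance uses the common biased 1/M normalization, and each ψ(j) = ψ_jj is taken as the last coefficient of the order-j Yule-Walker solution. The cutoff rule in `choose_delay` (largest lag with |ψ(j)| > 1.96/√M) is a hypothetical stand-in, because the exact conditions used in the paper are not reproduced here.

```python
import numpy as np

def sample_acf(p, max_lag):
    """Sample ACF rho(0..max_lag) from the biased (1/M) autocovariance."""
    p = np.asarray(p, dtype=float)
    M, d = len(p), p - p.mean()
    gamma = np.array([np.dot(d[:M - j], d[j:]) / M for j in range(max_lag + 1)])
    return gamma / gamma[0]

def pacf_yule_walker(p, max_lag):
    """psi(j) = psi_jj: last coefficient of the order-j Yule-Walker system."""
    rho = sample_acf(p, max_lag)
    psi = np.empty(max_lag)
    for j in range(1, max_lag + 1):
        # Toeplitz matrix R[a, b] = rho(|a - b|), right-hand side rho(1..j).
        R = np.array([[rho[abs(a - b)] for b in range(j)] for a in range(j)])
        phi = np.linalg.solve(R, rho[1:j + 1])
        psi[j - 1] = phi[-1]
    return psi

def choose_delay(p, max_lag=48):
    """Hypothetical rule: the largest lag whose |PACF| exceeds 1.96/sqrt(M)."""
    psi = pacf_yule_walker(p, max_lag)
    bound = 1.96 / np.sqrt(len(p))
    significant = np.nonzero(np.abs(psi) > bound)[0]
    return int(significant[-1]) + 1 if significant.size else 1
```

For an AR(2) series, ψ(2) approximates the second autoregressive coefficient and ψ(j) is close to zero for j > 2. statsmodels also provides a ready-made `pacf` function with Yule-Walker options if a library implementation is preferred.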
Once the time delay is determined, the original training data set can be obtained as
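A minimal sketch of building the original training pairs from the series, assuming each input row holds the n most recent values (ordered oldest to newest here, which is an arbitrary choice) and the target is the next value. The function name is hypothetical.

```python
import numpy as np

def make_lagged_dataset(p, n):
    """Rows [p(t-n+1), ..., p(t)] paired with targets p(t+1)."""
    p = np.asarray(p, dtype=float)
    X = np.array([p[i:i + n] for i in range(len(p) - n)])
    y = p[n:]
    return X, y
```

A series of length M yields M − n input/target pairs; with n = 22, as in the first experiment, each row has 22 input variables.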
Based on this original data set, we train the SAE to generate intermediate data sets from its hidden layers.
Firstly, train the first layer of the SAE, then we can obtain the first new data set
Then, based on this newly generated data set
Continuing this process, by training the ith layer of the SAE, we obtain the ith new data set
Assume that we have k hidden layers in the SAE. Then, using the above new data set generation process, we can obtain k + 1 data sets for training the prediction model. The data sets include the original training data set and the newly generated ones which are listed below
For each one of the k + 1 data sets, we construct one ELM. In other words, using the data set
Following the ELM method in Subsection 2.3, the constructed jth ELM model has the following input-output mapping
From the above discussion, we can obtain the original prediction result
In other words, the final prediction output of the hybrid model can be obtained as
For the original training data set, we expect that
This formula can be rewritten in the matrix and vector form as
Then, the parameters $c_0, c_1, \dots, c_{k+2}$ of the linear regression part of the hybrid model can be obtained as
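The least-squares combination step can be sketched as below, assuming the k + 1 ELM predictions on the training set are already available as the columns of an array. This sketch uses one weight per ELM plus an intercept (k + 2 coefficients), which is an assumption and may differ from the exact parameterization above, where the coefficients run up to c_{k+2}. The function names are hypothetical.

```python
import numpy as np

def fit_linear_combiner(elm_preds, y):
    """Least-squares c for y ~ c[0] + sum_j c[j+1] * elm_preds[:, j].

    elm_preds: (N, k+1) array; column j holds the j-th ELM's prediction.
    """
    A = np.column_stack([np.ones(len(y)), elm_preds])  # design matrix [1, y_0, ..., y_k]
    c, *_ = np.linalg.lstsq(A, y, rcond=None)         # minimizes ||A c - y||^2
    return c

def combine(elm_preds, c):
    """Final hybrid-model output from the ELM predictions."""
    return c[0] + elm_preds @ c[1:]
```

`np.linalg.lstsq` is used instead of explicitly forming (AᵀA)⁻¹Aᵀy, since it is numerically more stable when the ELM predictions are strongly correlated, as they usually are.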
In order to show the effectiveness of the proposed hybrid prediction model, this section will apply it to two real-world electrical load prediction applications. Furthermore, to demonstrate the advantages of the proposed method, it will be compared with several popular artificial intelligence methods. In this section, we will firstly introduce the comparative methods and the evaluation indices, and then give the experiments and comparisons in detail.
Comparative methods
In this study, the proposed hybrid method is compared with the pure SAE and ELM, which are also the components of the hybrid prediction model. In addition, it is compared with several popular methods, namely the Back Propagation Neural Network (BPNN) [50, 51], Support Vector Regression (SVR) [52] and Multiple Linear Regression (MLR) [53, 54].
BPNN is a multilayer feedforward neural network based on the error back propagation algorithm. It has at least three layers of neurons and can realize arbitrary continuous mapping [50]. It has been widely used in various fields, such as the building energy consumption prediction [15], the network traffic forecasting [51], and so on.
SVR is a variant of the Support Vector Machine (SVM). It uses kernel functions to handle complex, high-dimensional modeling and prediction problems, and it offers simplicity and strong generalization ability [52].
MLR is a regression-based prediction method. It determines the mathematical relationship between model variables by analyzing the observed sample data and then uses this relationship for prediction. It has been widely used in river discharge forecasting [53], solar energy prediction [54] and other industrial fields.
Evaluation indices
In this paper, we adopt four indices, namely the mean absolute error (MAE), the root mean square error (RMSE), the mean relative error (MRE) and the mean absolute error percentage (MSPE), to evaluate the performance of the proposed hybrid model and the comparative methods. These indices can be computed as
MAE is the average of the absolute deviations between the predicted values and the observed values, and it directly reflects the actual prediction error. RMSE is the square root of the mean of the squared deviations between the predicted and observed values; because the deviations are squared, RMSE is very sensitive to outliers. MRE and MSPE are also often used to measure the quality of a model's predictions. Larger values of these four indices indicate greater differences between the predicted and original values and hence worse prediction performance; in other words, smaller values imply better performance.
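The four indices can be sketched as follows. MAE and RMSE follow their standard definitions. The MRE and MSPE formulas here are assumptions (mean of |e|/|y| and mean of (e/y)², respectively), since the exact expressions used in the paper are not reproduced in this text; adjust them if the paper's definitions differ.

```python
import numpy as np

def evaluation_indices(y_true, y_pred):
    """MAE, RMSE, MRE and MSPE of a forecast (MRE and MSPE definitions assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    e = y_pred - y_true                                # prediction errors
    return {
        "MAE": np.mean(np.abs(e)),
        "RMSE": np.sqrt(np.mean(e ** 2)),              # squaring makes it outlier-sensitive
        "MRE": np.mean(np.abs(e) / np.abs(y_true)),    # assumed: mean relative error
        "MSPE": np.mean((e / y_true) ** 2),            # assumed: mean squared relative error
    }
```

For observations [100, 200] and predictions [110, 190], MAE and RMSE are both 10, while the relative indices weight the error on the smaller load more heavily.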
Applied data set
The electrical load data set in the first experiment was downloaded from the public website https://openei.org/. We use the electrical load data of the Oak Ridge National Laboratory for the first experiment. The Oak Ridge National Laboratory is an Integration Center of the Campbell Creek Research House, and it is a research center for building technologies. In this experiment, the sampling period is fifteen minutes, giving 35040 samples from October 1, 2013 to October 1, 2014. The data are divided into a training set and a test set: the samples from October 1, 2013 to September 1, 2014 are used for training, and the remaining samples are used for testing.
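The chronological split can be checked with simple date arithmetic. This sketch assumes that sampling starts at 00:00 on October 1, 2013, that September 1, 2014 at 00:00 is the first test timestamp, and that October 1, 2014 at 00:00 is an exclusive end point.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)          # sampling period of the first data set
start = datetime(2013, 10, 1)         # first sample (assumed 00:00)
split = datetime(2014, 9, 1)          # first test sample (assumed boundary)
end = datetime(2014, 10, 1)           # exclusive end (assumed)

n_total = (end - start) // STEP       # timedelta // timedelta gives an int
n_train = (split - start) // STEP
n_test = n_total - n_train

# Chronological split without shuffling: series[:n_train], series[n_train:]
```

Under these assumptions, the 35040 samples split into 32160 training samples (335 days) and 2880 test samples (30 days of September 2014).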
Experimental setting
Following the PACF method presented in Subsection 3.2, the delay n is determined to be 22, i.e. there are 22 input variables in the prediction models.
In addition, the number of hidden layers and the number of hidden units in each hidden layer are significant to the performance of the SAE model. To determine the optimal structure of the SAE model in this experiment, these two quantities were taken as the key research factors. The number of hidden layers was tested from 2 to 5, while the number of hidden units in each hidden layer was chosen from 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800. This gives 4 × 14 = 56 trials.
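The structure search is a full grid over the two factors. The sketch below enumerates that grid and picks the trial with the smallest score. The `score` callback is a hypothetical placeholder for training an SAE with `layers` hidden layers of `units` neurons each and returning one of the error indices.

```python
from itertools import product

hidden_layers = range(2, 6)               # 2, 3, 4, 5 hidden layers
hidden_units = range(150, 801, 50)        # 150, 200, ..., 800 units per layer
trials = list(product(hidden_layers, hidden_units))   # 4 x 14 = 56 trials

def select_best(trials, score):
    """score(layers, units) -> error index (hypothetical callback); smaller is better."""
    return min(trials, key=lambda t: score(*t))
```

The second experiment uses the same pattern with `range(50, 501, 50)`, which gives 4 × 10 = 40 trials.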
The values of the MAE, MRE, RMSE and MSPE of the SAE model in the 56 trials of this experiment are shown in Table 1. Table 1 shows that when the SAE has 4 hidden layers with 800 hidden units each, all four indices attain their smallest values among the 56 trials; in other words, the SAE model achieves its best performance with this structure. For the ELM in the hybrid model, the number of hidden neurons is set to 110.
The values of the four indices of the SAE model in the 56 trials in the first experiment.
For the BPNN, the number of hidden nodes and the number of iterations were set to 200 and 20000, respectively, and the sigmoid activation function and a gradient descent-based algorithm were used in training. For the SVR, the radial basis function was chosen as the kernel function, the penalty coefficient was set to 100, and the parameter γ was set to 0.001.
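The baseline settings above can be expressed, for example, with scikit-learn estimators. This is an assumed translation: the paper does not state which software was used, and scikit-learn's `MLPRegressor` stops early when the training loss stops improving (`tol`, `n_iter_no_change`), so it may run fewer than the configured 20000 iterations.

```python
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Baseline models with the first-experiment settings (assumed library mapping).
baselines = {
    # One hidden layer of 200 sigmoid ("logistic") units trained by gradient descent.
    "BPNN": MLPRegressor(hidden_layer_sizes=(200,), activation="logistic",
                         solver="sgd", max_iter=20000),
    # RBF-kernel SVR with penalty coefficient C = 100 and gamma = 0.001.
    "SVR": SVR(kernel="rbf", C=100, gamma=0.001),
    # Ordinary least-squares multiple linear regression.
    "MLR": LinearRegression(),
}
```

For the second experiment, the corresponding changes would be `max_iter=15000` for the BPNN and `C=120` for the SVR.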
The prediction results of the hybrid model in the first experiment are shown in Fig. 4. To show the details more clearly, we plotted the results of five days of electrical load forecasting. Fig. 4 shows that the proposed hybrid model has strong prediction ability and that the predicted results are satisfactory.

The prediction results of five days from the hybrid model in the first experiment.
In order to examine the superiority of the hybrid model, the values of the four indices in the six prediction models in this experiment are listed in Table 2. Further, the box-plots of the absolute prediction errors in this experiment are demonstrated in Fig. 5.
The comparison values of the six models in the first experiment.

Box-plots of the absolute prediction errors in the first experiment, (a) Hybrid model, (b) SAE, (c) ELM, (d) BPNN, (e) MLR, (f) SVR.
Applied data set
The electrical load data set in the second experiment was downloaded from the website https://www.aeso.ca. This historical electrical load data set was collected by the Alberta Electric System Operator (AESO) and provided for market participants. The sampling period is one hour, giving 8760 samples from January 1, 2016 to December 31, 2016. The data are divided into a training set and a testing set: the samples from January 1, 2016 to November 30, 2016 are used for training, and the samples in December 2016 are used for testing.
Experimental setting
In this experiment, the PACF was also adopted to determine the input variables for electrical load prediction and 23 input variables were selected.
In this case, to determine the optimal or suboptimal structure of the SAE, the number of hidden layers was again tested from 2 to 5, while the number of hidden units in each hidden layer was chosen from 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, giving 4 × 10 = 40 trials.
Similarly, the values of the MAE, MRE, RMSE and MSPE of the SAE model in the 40 trials in this experiment are computed and listed in Table 3. From Table 3, we can observe that when the number of hidden layers is 3 and the number of hidden units in each hidden layer is 300, the SAE model can achieve the best performance. For the ELM in the hybrid model, the number of neuron nodes was set to be 100.
The values of the four indices of the SAE model in the 40 trials in the second experiment.
For the BPNN, the number of hidden nodes and the number of iterations were set to 200 and 15000, respectively. As in the first experiment, the sigmoid activation function and a gradient descent-based algorithm were used in training. For the SVR, the radial basis function was chosen as the kernel function and the penalty coefficient was set to 120; again, the parameter γ was set to 0.001.
The prediction results of the hybrid model are shown in Fig. 6, and the comparison values of the six prediction models are listed in Table 4. This table shows that the hybrid model has the smallest values of all four indices, which implies that its accuracy and fitting ability are better than those of the other five models. The box-plots of the absolute prediction errors in this experiment are plotted in Fig. 7; this figure also shows that the proposed hybrid model has better forecasting performance than the other predictors.

The prediction results of the hybrid model in the second experiment
The comparison values of the six models in the second experiment.
Box-plots of the absolute prediction errors in the second experiment: (a) Hybrid model, (b) SAE, (c) ELM, (d) BPNN, (e) MLR, (f) SVR.
From the above figures and tables, we can draw the following observations and conclusions. Fig. 4 and Fig. 6 show parts of the prediction results of the proposed hybrid model. Both figures show that the proposed hybrid model gives satisfactory prediction results, which verifies its effectiveness. In this study, we compared the proposed hybrid model with the SAE, ELM, BPNN, MLR and SVR. The values of the four comparison indices for these six models in the first and second experiments are listed in Table 2 and Table 4, respectively. Both tables show that the proposed hybrid model performed best. Taking the MSPE as an example, the proposed hybrid model improves the prediction performance by at least 10% compared with the other methods.
Fig. 5 and Fig. 7 depict the box-plots of the absolute prediction errors of the six prediction models. In descriptive statistics, a box-plot depicts groups of numerical data graphically through their quartiles. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the '+' symbol. Thus, the box-plots in Fig. 5 and Fig. 7 reflect the distributions of the absolute prediction errors in detail. These box-plots show that the proposed hybrid model has the smallest median prediction error and the smallest distance between the 25th and 75th percentiles, which again verifies the advantages of the proposed method.
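The box-plot quantities described above can be computed directly from a vector of absolute prediction errors. This sketch assumes the common 1.5 × IQR rule for deciding which points count as outliers. Note that NumPy's default linear-interpolation percentiles can differ slightly from those of plotting tools that use other percentile conventions.

```python
import numpy as np

def boxplot_stats(abs_errors, whisker=1.5):
    """Median, quartiles, whisker ends and outliers of absolute prediction errors.

    Assumes points beyond q1 - whisker*IQR or q3 + whisker*IQR are outliers.
    """
    x = np.asarray(abs_errors, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1                                   # height of the box
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "median": med, "q1": q1, "q3": q3, "box_height": iqr,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < lo_fence) | (x > hi_fence)],   # drawn as '+' markers
    }
```

Comparing `median` and `box_height` across models gives a numerical version of the visual comparison made with Fig. 5 and Fig. 7.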

Electrical load prediction plays a guiding role in power grid operation. To further improve the precision of electrical load prediction, a hybrid model was proposed in this paper. The proposed hybrid model combines the SAE and ELMs to learn the characteristics of the electrical load time series data at different layers. The proposed hybrid model was applied to two real-world electrical load time series and, to demonstrate its superiority, was compared with five popular methods: the SAE, ELM, BPNN, MLR and SVR. Experimental and comparison results demonstrated that the proposed hybrid model achieves much better performance than the comparative methods in this electrical load forecasting application.
In this study, we used only the electrical load time series for prediction. However, as is well known, electrical load may be affected by other important factors, such as weather changes and human activities. Incorporating such factors into the proposed hybrid model may therefore further improve its prediction performance, which will be addressed in future work. In addition, the proposed hybrid model can be applied to similar time series prediction problems, e.g. traffic flow prediction, wind speed estimation and stock market forecasting, which will also be one of our research directions.
Acknowledgment
This work was financially supported by the National Natural Science Foundation of China (61473176, 61573225), the Taishan Scholar Project of Shandong Province (TSQN201812092), and the Innovation Team of the Co-Innovation Center for Green Building of Shandong Province in Shandong Jianzhu University.
