Abstract
The energy load data in the micro-energy network are a time series with sequential and nonlinear characteristics. This paper proposes a model based on the encoder-decoder architecture and ConvLSTM for multi-scale prediction of multi-energy loads in the micro-energy network. We combine ConvLSTM, LSTM, the attention mechanism and multi-task learning to construct a model designed specifically for energy load forecasting in the micro-energy network. ConvLSTM is used to encode the input time series. The attention mechanism assigns different weights to the encoded features, which are subsequently decoded by the decoder LSTM layer. Finally, a fully connected layer interprets the output. The model is applied to forecast the multi-energy load data of the micro-energy network in a certain area of Northwest China. The test results show that our model converges and that its evaluation index values are better than those of the multi-task FC-LSTM and the single-task FC-LSTM. In particular, applying the attention mechanism makes the model converge faster and reach higher precision.
Keywords
Introduction
The multi-energy system not only improves the utilization rate of renewable energy and reduces environmental pollution but also improves the flexibility of the energy system. The micro-energy network is one form of multi-energy system application and an effective way to use renewable energy. It integrates multiple energy inputs, multiple product outputs (including electricity, gas, etc.) and multiple energy conversion units, and thus the uncertainty and time variation of energy in the micro-energy network are notably strong. As the coupling between forms of energy such as electricity and gas deepens and large-scale renewable energy is connected, the requirements on the accuracy and real-time performance of energy load forecasting become higher. As the proportion of renewable energy in the energy system increases, coordinated control of the energy supply and load within the micro-energy network is expected to become an important research field.
In recent years, deep learning has been widely applied in computer vision (CV) [21] and natural language processing (NLP) [6, 7], e.g., object detection [16], image segmentation [12], speech recognition [23], text classification [25], and machine translation [1, 31]. Energy load data are time series with nonlinear characteristics, and two main types of energy load forecasting methods exist. One type is time series analysis methods, such as multiple linear regression models and autoregressive integrated moving average (ARIMA) models. The basic idea of these models is to extrapolate future data from historical and current time series data. Their common advantage is that they consider the temporal relationship of the data, and the models are simple and widely used. The disadvantage of multiple linear regression is an insufficient ability to predict data with nonlinear relationships. ARIMA is inherently unable to capture nonlinear relationships and requires a stationary time series as input. The other type is shallow learning methods based on machine learning, such as the support vector machine (SVM) and shallow neural networks. The main advantage of shallow machine learning models is their strong nonlinear fitting ability, and the neural network method also has self-learning capability. A common shortcoming of shallow learning methods is that they do not consider the temporal correlation of time series data.
Long short-term memory (LSTM) [14] is a recurrent neural network (RNN) structure with memory characteristics that addresses the long-term dependence problem caused by vanishing gradients. Because the LSTM network can effectively model the time dependence of temporal data [6], it is widely used in NLP [7] and time series prediction [9]. Dr. Shi Xingjian applied the convolution operation to the LSTM network and proposed the ConvLSTM structure [19]. This structure addresses a weakness of the fully connected LSTM (FC-LSTM), which carries a large amount of redundant spatial data and does not consider spatial correlation when handling spatiotemporal sequence problems.
Based on analysis and understanding of the existing results, this paper proposes an encoder-decoder architecture model based on ConvLSTM, LSTM, the attention mechanism and multi-task learning. The experimental results show that learning the tasks jointly benefits each individual energy load forecasting task in the micro-energy network, and that the use of the attention mechanism significantly improves the prediction accuracy and reduces the training time.
Traditional time series analysis methods require stationary time series and have an insufficient ability to capture nonlinear relationships. Shallow machine learning methods, such as SVM regression, do not take the temporal correlation of time series into account.
In previous papers [39, 40], the author used the encoder-decoder structure of CNN-LSTM to encode and decode the input time series, but CNN as an encoder does not have the ability to model time series. This paper uses ConvLSTM as an encoder, which has the ability to describe local features, similar to CNN, and also has the ability to model time series in a manner similar to LSTM. ConvLSTM is a neuron with time and space characteristics. In another paper [41], the author used multi-task regression to solve the problem of power demand forecasting. Although the author considered the power demand problem under the influence of multiple factors, the linear regression model was essentially insufficient in its ability to fit nonlinear problems. The idea proposed in this paper not only considers the energy demand problem under the influence of various factors but also applies the encoder-decoder architecture with ConvLSTM as the encoder. The use of ConvLSTM not only considers the time dependence of the sequence but also considers the interrelationships between the input data and fully explores the potential relationships between the data. Another highlight of this paper is that an attention mechanism is applied between the encoder and decoder. The introduction of the attention mechanism allows the model to give higher weight to the corresponding features and allows the model to learn from the data to a deeper level.
The structure of this paper is described as follows. First, we introduce the basic principles of the LSTM, ConvLSTM, attention mechanism, encoder-decoder architecture and multi-task learning. Second, we establish a model based on the encoder-decoder architecture and give a standard for evaluating the accuracy of the model followed by analysis of the basic model structure. Finally, the feasibility of our model is verified by comparing the multi-tasking model and the single task prediction model using LSTM and CNN as encoders.
Overall, the main contributions of this paper are listed as follows:
A hybrid model based on ConvLSTM is proposed for multi-time-scale prediction of multi-energy loads in micro-energy networks. The experimental results show that, in the multi-energy load prediction experiment on the micro-energy network in Northwest China, the model designed in this paper performs better than the models using CNN and LSTM as the encoder.
Data standardization
The data used in this paper have a total of ten dimensions: time information, historical data of the two variables that must be forecasted, and environmental data. The environmental data include temperature, air pressure, wind speed, wind direction, humidity, cumulative hourly snow volume, and cumulative hourly rainfall volume. In the dataset, missing values and noise do not exceed 5% of the data, and outliers do not exceed 1%. Missing values are filled with the mean of the corresponding feature. The wind direction feature is not numerical, and thus it is encoded as integer values from 0 to 7.
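The two cleaning steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the compass labels and their mapping to the codes 0 to 7 are hypothetical, since the text states only the code range.

```python
from statistics import mean

# Hypothetical 8-point compass labels. The text states only that wind
# direction is coded as integers 0 to 7, not which label gets which code.
WIND_CODES = {"N": 0, "NE": 1, "E": 2, "SE": 3, "S": 4, "SW": 5, "W": 6, "NW": 7}


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    col_mean = mean(observed)
    return [col_mean if v is None else v for v in column]


def encode_wind_direction(labels):
    """Map each categorical wind direction label to its integer code."""
    return [WIND_CODES[label] for label in labels]


temperature = [10.0, None, 14.0, 12.0]
filled_temperature = fill_missing_with_mean(temperature)
wind_codes = encode_wind_direction(["N", "SW", "E"])
```

Integer codes impose an ordering on the directions; a one-hot or sine/cosine encoding would be an alternative, but the integer form matches what the text describes.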
Finally, to eliminate the influence of the feature data dimension and accelerate the search for the optimal solution, it is necessary to standardize the data. This paper uses min-max scaling to map the data between 0 and 1. The calculation formula is written as follows:
The model prediction results are inversely transformed to the normal range, and the transformation is shown below:
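As a sketch of both transformations, assuming the standard min-max form x' = (x - x_min) / (x_max - x_min) and its inverse x = x' (x_max - x_min) + x_min, the scaling and the back-transformation of predictions could look like this. The function names are illustrative, and the minimum and maximum are assumed to be taken from the training data.

```python
def fit_min_max(column):
    """Return the (min, max) pair used for scaling a feature column."""
    return min(column), max(column)


def scale(column, lo, hi):
    """Map values into [0, 1] with min-max scaling."""
    span = hi - lo
    if span == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / span for v in column]


def inverse_scale(scaled, lo, hi):
    """Map scaled values (e.g. model predictions) back to the original range."""
    return [v * (hi - lo) + lo for v in scaled]


load = [200.0, 250.0, 400.0]
lo, hi = fit_min_max(load)
scaled_load = scale(load, lo, hi)
restored_load = inverse_scale(scaled_load, lo, hi)
```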
Constructed 3D input data.
We define the micro-energy network multi-energy load time series data in the form of
We treat each
Input feature map.
LSTM
In deep RNNs, because of the vanishing gradient problem, establishing long-term dependencies between states is possible only in theory [29]. The LSTM [14] is an improved RNN structure that introduces a gating mechanism to control the information transfer path. This mechanism can effectively alleviate the vanishing gradient problem of RNNs. The three gates of the LSTM are the input gate, forget gate and output gate. Multiple LSTM neurons can be stacked and connected in time to form a more complex structure. The LSTM structure is shown in Fig. 3.
LSTM structure.
Where,
Where
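The gating mechanism described in this section can be sketched for a single LSTM unit with scalar input and state. This follows the standard LSTM formulation without peephole connections. The parameter names (`w_xi`, `w_hi`, `b_i`, and so on) and the weight values are illustrative assumptions, not the notation or settings of this paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state)."""
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_xg"] * x + p["w_hg"] * h_prev + p["b_g"])  # candidate state
    c = f * c_prev + i * g   # keep part of the old memory, add new content
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c


# Arbitrary illustrative weights; a real model learns these from data.
params = {
    "w_xi": 0.5, "w_hi": 0.1, "b_i": 0.0,
    "w_xf": 0.4, "w_hf": 0.2, "b_f": 1.0,
    "w_xo": 0.3, "w_ho": 0.1, "b_o": 0.0,
    "w_xg": 0.8, "w_hg": 0.2, "b_g": 0.0,
}

h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.3]:
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is updated additively through the forget gate, gradients can flow across many time steps, which is the property the section attributes to the gating mechanism.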
Dr. Shi Xingjian proposed the use of a convolution operation inside the LSTM, thereby creating the ConvLSTM structure. ConvLSTM not only has the temporal modeling capability of LSTM but also captures local features in a manner similar to CNNs [36]; it is a neuron with both temporal and spatial characteristics [19] (Fig. 4).
Inner structure of convolutional LSTM [19].
The main calculation formula of ConvLSTM is given as follows, where
The convolution product is used in ConvLSTM instead of matrix multiplication. This operation determines the future state of a cell through the input and past states of the surrounding cells. This process can be easily accomplished using convolution operators in state-to-state and input-to-state transitions. In the information construction process, all elements in ConvLSTM are three-dimensional, and all components of the network in LSTM are only one-dimensional arrays. The input to ConvLSTM is a three-dimensional tensor. The first dimension in the tensor is taken as the number of channels in the picture, and the last two dimensions in the tensor represent the time and space information [20].
These properties allow ConvLSTM to extract temporal and spatial features jointly and thus to make effective use of temporal and spatial associations. The input time series are correlated in time and also correlated with one another, and the relationship among the input quantities can be treated as a spatial relationship. Therefore, ConvLSTM can be used to extract the spatiotemporal characteristics of the input.
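For reference, the ConvLSTM formulation in [19] is commonly written as below, where $*$ denotes the convolution operator and $\circ$ the Hadamard product. The symbol names follow that common convention and are an assumption with respect to the notation of this paper.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing each convolution with a matrix multiplication, and each tensor with a vector, recovers the fully connected LSTM with peephole connections.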
Deep neural networks (DNNs) cannot map input sequences to output sequences of a different length because DNNs require both the input and the output to have fixed dimensions. Cho et al. [1] proposed the RNN encoder-decoder model. This model contains two sub-models, an encoder model and a decoder model. The role of the encoder is to encode the input data into a vector representation in a continuous feature space. The decoder is used to decode the encoded input sequence and output the target sequence. These two models are jointly trained to maximize the conditional probability of the target sequence.
In brief, the encoder-decoder architecture encodes a variable-length sequence into a fixed-length vector representation and subsequently decodes the fixed-length vector into a variable-length sequence. From the perspective of probability theory, the encoder-decoder model is a general method for learning a conditional distribution over variable-length sequences conditioned on another variable-length sequence, such as
If the encoder is an RNN, the encoder reads the input sequence
and
where
The decoder is another RNN that is trained to generate an output sequence by predicting the next symbol
Where
The architecture of the model is shown in Fig. 5.
RNN encoder-decoder architecture.
In 2014, Ilya Sutskever proposed replacing the RNNs in the encoder-decoder model [2] with LSTMs: one LSTM reads the sentence of the input language to obtain a large fixed-dimensional vector representation, and a second LSTM extracts the sentence of the output language from that vector. This paper uses the model structure proposed by Ilya Sutskever to achieve time series prediction.
The attention mechanism is used primarily for information screening, and the relevant information is selected from the input information. The calculation of the attention mechanism can be divided into two steps: one step calculates the attention distribution on all input information, and the other step calculates the weighted average of the input information according to the attention distribution [37].
The attention distribution
Where,
Soft attention mechanism.
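The two steps of soft attention described above (computing the attention distribution, then taking the weighted average of the inputs) can be sketched as follows. The scores are passed in directly here as an assumption; the scoring function that produces them is defined separately in the model.

```python
import math


def softmax(scores):
    """Turn raw scores into an attention distribution that sums to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(inputs, scores):
    """Return the attention weights and the weighted average of the inputs.

    inputs: list of N feature vectors, all of the same length.
    scores: list of N raw relevance scores, one per input vector.
    """
    weights = softmax(scores)
    dim = len(inputs[0])
    context = [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(dim)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention(features, scores=[2.0, 1.0, 0.1])
```

The input with the highest score receives the largest weight, so it contributes most to the context vector passed on to the next stage.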
This paper must forecast multiple energy loads simultaneously, and thus we use multi-task learning. Multi-task learning [13] is used in many fields [15] and is considered to learn a more complete representation than single-task learning. At the same time, this method can capture the intrinsic association features between tasks. By sharing parameters across multiple loss functions, the feature representation capability of the neural network can be improved. Because the tasks are related, the sharing of parameters helps to improve the generalization ability of the network [37].
The reasons why multi-task learning can improve generalization ability are as follows:
The multi-task training set is larger than that of single-task learning, and the tasks are related, which amounts to implicit data augmentation and thus improves generalization. Multi-task learning must fit all tasks at once, which acts as regularization and thus avoids overfitting to a single task.
The mathematical expression of the multitasking model is described as follows. Assume that the model has
where
Assuming that the
Where
This section describes our model in as much detail as possible such that the reader can better understand the model structure. In our model, we draw on the use of attention mechanisms in target recognition [32], semantic segmentation [33], and machine translation [31]. We use different attention mechanisms in our models to maintain task specificity and establish long-term dependencies for each task.
In essence, micro-energy network multi-energy load forecasting is time series prediction [30]. The encoder-decoder structure is a specific architecture with which RNNs solve sequence-to-sequence problems. Instead of directly outputting the prediction vector, the encoder reads and encodes the input sequence and passes the encoding to the decoder for decoding. Although both the encoder and the decoder produce sequence outputs, the main difference is that the decoder has access to both the predicted value at the previous moment in the sequence and the internal state accumulated over the output sequence. In this paper, the encoded features are fed to the decoder repeatedly to accumulate internal state in the decoder and to achieve multi-step prediction.
When training the network, we combine two loss functions, one for each task. After applying the attention mechanism, the network of each task can produce a better fit to the data of the respective task in less time, which means that deeper relationships between data can be learned in less time and with fewer model parameters.
The structure of the encoder-decoder model is shown in Fig. 7, and the model parameters are listed in Table 1 below.
Model configuration table
Model structure diagram.
Specifically, in the encoder-decoder architecture, the encoder reads the input time series
In this paper, the attention mechanism is applied between the encoder and the decoder, and our encoder uses the ConvLSTM network. Therefore, we calculate the vector differently from the RNN. The formula for the hidden state of the encoder in this paper is given in Eq. (3.2).
The scoring function
The formula for calculating the weight
We parameterize the linear model
The attention mechanism multiplies
The specific operation process of the attention mechanism is shown in Fig. 8.
A regression model can use a variety of loss functions, such as the mean absolute error (MAE) and the mean square error (MSE). In our model, MAE is used as the loss function for each task, because our data are likely to contain outliers and MSE is highly sensitive to outliers. The loss function for each task in our model is given as follows:
Where
Attention mechanism calculation process.
After obtaining the loss function for each task, we need to minimize the loss function for each task. We combine the linear weighted summation of the individual task loss functions as follows:
where,
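The per-task MAE loss and its linear weighted combination can be sketched as follows. The task weights used here (0.5 each) are hypothetical placeholders, not the values used in the experiments, and the numbers are toy data.

```python
def mae(y_true, y_pred):
    """Mean absolute error over one task's predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def multitask_loss(task_losses, task_weights):
    """Linear weighted sum of the individual task losses."""
    return sum(w * loss for w, loss in zip(task_weights, task_losses))


# Toy scaled targets and predictions for the two tasks.
electric_loss = mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
gas_loss = mae([0.2, 0.4], [0.1, 0.6])

# Hypothetical equal weighting of the electrical and gas tasks.
total_loss = multitask_loss([electric_loss, gas_loss], [0.5, 0.5])
```

Because MAE grows linearly with the error, a single outlier shifts the loss far less than it would under MSE, which is the motivation given above for choosing it.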
In this section, we introduce the data sources, evaluation criteria and experimental results and finally present the performance of our network under the various evaluation criteria. We also present a horizontal comparison with other networks. To make the experimental results more convincing, we compare our network against the best-performing configuration of each of the other networks.
The data in this paper are the operation data from the micro-energy network of the China Electric Power Research Institute in a certain location in northwestern China. The data are desensitized by professionals. The experimental data are ten-dimensional data including two historical energy data and environmental data. Two types of energy must be forecast in the experiment, namely, electrical load and gas load.
The experimental results show that our model has the advantage of higher prediction accuracy than other excellent models. Specifically, we use the
In total, we use 43,000 data sets, and the training set size is 35,000 groups, of which approximately 10% of the data is used as a validation set. Ten feature quantities are present in the input data, and thus the input feature map is a
Experimental details:
To assess the accuracy of the model, we used RMSE (root mean squared error), MAPE (Mean Absolute Percentage Error), and
We mainly use RMSE to evaluate the model.
Since the MAPE index shows the error rate for each information point, the sum of the error rates of these information points can be used for horizontal comparison. We therefore also use MAPE to compare the models.
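A minimal sketch of the two metrics named above, using their standard definitions, is given here. MAPE is expressed in percent and assumes that no true value is zero. The numbers are toy values, not results from the experiments.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [100.0, 200.0]
predicted = [110.0, 180.0]
rmse_value = rmse(actual, predicted)
mape_value = mape(actual, predicted)
```

RMSE is in the units of the load itself, whereas MAPE is scale-free, which is why it suits comparison across loads of different magnitude.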
The
Multitasking input and output feature map.
In this experiment, we use the previous 20 time steps to predict the sequence of the next 5 time steps. The input and output feature map is shown in Fig. 9. The two time series of the electrical load and the gas load are used as the training labels for the respective tasks.
The input sequence is repeatedly input to the decoder 5 times after passing through the ConvLSTM encoding. Each unit of the decoder outputs a time series of length 5, which is the number of steps we need to predict, and finally outputs a time series of length 5 after interpretation by the fully connected layer.
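The windowing (20 steps in, 5 steps out) and the repetition of the encoded vector can be sketched as follows. The stride of one step, the column layout of the toy data, and the function names are assumptions for illustration. The actual encoder output is a learned feature vector rather than the fixed list used here.

```python
def make_windows(rows, target_cols, n_in=20, n_out=5):
    """Slice a multivariate series into (input window, future targets) pairs.

    rows: list of per-time-step feature lists.
    target_cols: column indices of the loads to forecast (one per task).
    Returns inputs (each n_in rows long) and targets, where targets[i][k]
    is the length-n_out future sequence of task k for sample i.
    """
    inputs, targets = [], []
    last_start = len(rows) - n_in - n_out
    for start in range(last_start + 1):
        future = rows[start + n_in:start + n_in + n_out]
        inputs.append(rows[start:start + n_in])
        targets.append([[row[c] for row in future] for c in target_cols])
    return inputs, targets


def repeat_vector(encoded, n_out=5):
    """Repeat the encoded feature vector once per decoder output step."""
    return [list(encoded) for _ in range(n_out)]


# Toy series: 30 time steps, 3 features; columns 0 and 1 play the two loads.
rows = [[float(t), 2.0 * t, 3.0 * t] for t in range(30)]
X, Y = make_windows(rows, target_cols=[0, 1])
decoder_inputs = repeat_vector([0.2, 0.7, 0.1])
```

With 30 time steps the sketch yields 6 samples, and each sample pairs a 20-step input window with one 5-step target sequence per task.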
Model comparison and data analysis
In this section, we will compare the models from the three aspects of RMSE, MAPE,
Since the encoder-decoder architecture was proposed, many networks have been used as encoders. For multi-scale forecasting of the multi-energy load in micro-energy networks, however, we consider the ConvLSTM network best suited to the task. The model designed in this paper is compared on three evaluation indicators with other multi-task models, single-task models, and models that do not use the attention mechanism. The maximum number of iterations for all methods is 50. The description of each comparison model is given as follows:
In Table 2, this paper compares the differences between different tasks of several neural networks. As shown from the comparison in Table 2, the ConvLSTM-based model has significant advantages over the LSTM and CNN-based network. Especially for the gas load forecasting, the accuracy of the ConvLSTM model prediction is much higher than that of LSTM and CNN.
From Table 2, we observe that on the six evaluation indicators of the two tasks, the hybrid model designed in this paper outperforms the other models on every indicator except the RMSE index of the gas energy load.
RMSE, MAPE and
In Fig. 10 below, we compare the RMSE, MAPE,
Comparison of evaluation criteria for each step of each model.
Comparison of overall evaluation criteria.
Various time step prediction renderings of electrical load.
Various time step prediction renderings of gas load.
Comparison of different training set ratios at each time step.
Overall comparison of different training set splits.
Comparison of forecast evaluation criteria of oxynitride and sulfur dioxide.
C+L+At+Mul: CNN+LSTM+Att+Multi; ConL+At+Mul: ConvLSTM+Att+Multi; ConL+Mul: ConvLSTM+Multi; ConL_Sing_Ele: ConvLSTM_single_Ele; L+At+Mul: LSTM+Att+Multi; ConvL_sing_Gas: ConvLSTM_single_Gas; L_Sing_Gas: LSTM_Single_Gas; L_Sing_Ele: LSTM_Single_Ele; L+Mul: LSTM+Multi.
To make the experimental comparison more convincing, the maximum number of iterations for each model is 50, and the mini-batch size is 64. All models receive the same processed input data. The parameter settings of the neural networks are kept as consistent as possible.
From the results in Figs 10 and 11, the evaluation indicators of the model proposed in this paper are superior to those of the other models. Moreover, as the prediction time scale increases, the prediction accuracy decreases, but the accuracy of our model degrades more slowly than that of the other models, especially compared with the networks that use LSTM as the encoder. The long-term prediction performance of our model is superior to that of the other models. It can also be observed that the models with the attention mechanism perform better than the models without it.
It can be observed from Figs 10 and 11 that a large difference exists in the
In other words, in prediction of the gas energy load, LSTM does not work well as an encoder, and the models using ConvLSTM and CNN as the encoder have better prediction results. Because LSTM has no feature extraction capability, all input features are encoded, so the model also learns from features that are irrelevant to the prediction. Models that use CNN and ConvLSTM as encoders extract features of the input data, which reduces the impact of uncorrelated features on the predicted output.
In addition, the prediction accuracy for the electrical energy load is much higher than that for the gas load. We believe that there are two main reasons for this result:
In the micro-energy network, the randomness of the gas energy load is large, which means that the model is not able to fit the corresponding gas energy load curve effectively. Another reason is that this method is limited by the data: the data set contains no feature related to the gas load. In other words, changes in the features that we use have little correlation with changes in the gas load.
Figures 12 and 13 below compare the predicted and actual values of the electrical energy load and the gas energy load. The figures show that the model fits the data well. From top to bottom, Figs 12 and 13 show the fits for the five prediction time steps. The fit at the first time step is clearly the best, and the accuracy decreases at later time steps.
Figs 14 and 15 below show the performance of the model under different splits of the data into training and test sets. The larger the training set, the better the performance on the three evaluation criteria.
We also obtained time series data of harmful gas emissions from petrochemical companies. The data include oxynitride, sulfur dioxide, reaction temperature, reactant concentration, etc. The data set contains 50,000 samples with 13 dimensions in total, including the two variables to be predicted. We use the prediction model in this paper to predict the concentrations of oxynitride and sulfur dioxide. The comparison with other models is given in Table 3 and Fig. 16. Overall, our model is superior to the other commonly used deep learning models.
Evaluation criteria of our multi-task learning model on another data set against other models
In this paper, we design a novel encoder-decoder architecture model to achieve multi-time-scale prediction of multi-energy loads in micro-energy networks. The ConvLSTM encoder encodes the input data into a fixed-length vector, and the LSTM decodes the fixed-length vector into the output. ConvLSTM extracts the characteristics of the input data, similar to CNN, and can effectively extract the global characteristics of the input data. The use of multi-task learning improves the generalization ability of the model, and the use of the attention mechanism accelerates learning and improves the robustness of the model.
In future work, we will explore deeper multitasking relationships and design multitasking models that predict more energy at the same time. Additionally, we must also consider the shortcomings of the model in the actual application process and make the corresponding improvements.
