Cloud-edge collaboration based transferring prediction of building energy consumption

Abstract

Building energy consumption (BEC) prediction often requires constructing a dedicated model for each building based on its historical data, and a model constructed for one building is difficult to reuse in other buildings. Recent approaches have shown that the cloud-edge collaboration architecture is promising for realizing model reuse. A key open issue is how to reuse cloud energy consumption prediction models at the edge while reducing the computational cost of model training. To handle these problems, a cloud-edge collaboration based transferring prediction method for BEC is proposed in this paper. Specifically, a model library that stores prediction models for different types of buildings is first constructed in the cloud from historical energy consumption data using the long short-term memory (LSTM) network. Then, similarity measurement strategies for time series with different granularities are given, and the model to be transferred is matched from the model library by analyzing the similarity between the observation data uploaded to the cloud and the historical data collected in the cloud. Finally, a fine-tuning strategy for the matched prediction model is given, and this model is fine-tuned at the edge to achieve its reuse in concrete application scenarios. Experiments on practical datasets reveal that, compared with the prediction model that does not utilize the transfer strategy, the proposed prediction model performs better in terms of MAE and RMSE. Experimental results also confirm that the proposed method effectively reduces the computational cost of network training at the edge.

Keywords

Cloud-edge collaboration; transfer learning; data driven; similarity analysis; energy consumption prediction

1 Introduction

Building energy consumption prediction is one of the key tasks of smart building data analysis, and it plays a considerable role in building energy conservation and refined energy management. Various methods for BEC prediction have been designed in recent years, among which data-driven methods are widely utilized. These data-driven methods mainly include statistical methods [1, 2], artificial neural networks [3], support vector machines [4, 5], random forests [6, 7] and deep learning [8–10]. Lately, non-iterative methods have provided a novel modeling approach for prediction tasks. A fast non-iterative information modeling scheme based on a new geometric transformation was designed in [11]. Izonin et al. [12] proposed a new solution for multiple linear regression problems with linear non-iterative calculations, which ensures the speed of identifying polynomials. Nevertheless, related research shows that, owing to their powerful feature extraction and approximation capabilities, deep learning methods perform best in building energy consumption prediction [13, 14]. However, training a deep prediction model requires abundant historical data, and the computational cost of model training is large. Moreover, the above-mentioned methods establish a new model for each building, and it is difficult to reuse a constructed model.

The emergence of edge computing makes it possible to meet the requirements of real-time data processing in some areas [15] and to reduce the computational cost of model training. Edge computing is composed of the central cloud, the edge cloud and terminals. The central cloud is responsible for analyzing large-scale data and optimizing model training, whereas the edge cloud is close to the terminals and is responsible for analyzing and computing part of the data or real-time data [16]. However, the computing power at the edge is susceptible to memory constraints. To conduct data analysis more rationally, the cloud-edge collaborative architecture came into being, and it has achieved some successful applications in the construction field. Shen et al. [17] proposed a cloud-edge collaborative architecture with a robust optimization model, which alleviates the cost of modeling and calculation for large-scale distribution systems. Jararweh et al. [18] introduced a smart city service scheme based on a cloud-edge collaborative architecture to improve the work efficiency of smart city applications. In addition, Gang et al. [19] devised a cloud-edge collaborative scheme for non-intrusive load monitoring (NILM); experimental results revealed that this strategy effectively improved the reliability of traditional NILM models. Accordingly, cloud-edge collaboration provides technical support for alleviating the computational cost of building data analysis, and it is promising for building energy management.

However, the problems of reusing the cloud prediction model at the edge and reducing the training cost of the edge model still need to be solved. Transfer learning provides an effective way to break this bottleneck: it migrates the knowledge gained in a related setting to another task. At present, transfer learning has been successfully applied in sentiment analysis, text classification, computer vision, medicine and other fields. Zhuang et al. [20] resorted to matrix factorization to solve the problem of cross-domain text classification. A safety guardrail detection model with a convolutional neural network and transfer learning was given by Kolar et al. [21]; experimental results demonstrated that the detection accuracy of the proposed approach reaches 96.5%. In order to predict people's emotions in various environments, Bawa et al. [22] designed a joint distribution program, which used an LSTM network to extract global and local features. Transfer learning has also been used to diagnose knee osteoarthritis [23]. Mao et al. [24] proposed an online strategy for early fault detection using transfer learning, which extracts deep features by fine-tuning the original support vector machine detection model. Moreover, transfer learning has begun to be applied in the intelligent buildings field. For instance, a building energy consumption prediction model that transfers both samples and models was provided in [25]; it achieved deep model training with fewer data, and statistical evaluation metrics verified that transfer learning can be used for BEC prediction. In [26], a transferring prediction approach based on the seasonality and trend of time series was proposed for cross-building energy consumption. In this method, the combined data of similar buildings were utilized to forecast the energy consumption with multiple characteristics. Besides, Fang et al. [27] presented a short-term BEC prediction framework using an LSTM network and a domain adaptive neural network; this scheme transferred the knowledge obtained from relevant building data to support target data prediction and thus handle the shortage of target data. However, in the field of building energy analysis, there is little research on the combination of transfer learning and the cloud-edge collaborative architecture. Combining them can effectively achieve the reuse of cloud models at the edge, decrease the reliance on large numbers of training samples and reduce the training cost at the edge.

In this paper, a cloud-edge collaboration based transferring prediction method for BEC is proposed. The main contributions of the paper are as follows: (1) in the cloud, energy consumption prediction models for different types of buildings are constructed based on the LSTM network; (2) a similarity measurement strategy for time series with different granularities is presented, which realizes the similarity analysis between the observation data uploaded from the edge and the historical data in the cloud and thereby supports the matching of the model to be transferred; (3) a fine-tuning strategy for the energy consumption prediction model that will be reused at the edge is given, and the strategy is verified in two application scenarios.

The rest of the paper is organized as follows. The theoretical description of transfer learning is provided in Section 2. Section 3 introduces the architecture of the cloud-edge collaboration based transferring prediction method in detail. In Section 4, we discuss the performance of the proposed model on two practical datasets and compare it with the LSTM model that does not utilize the transfer approach. Finally, we conclude in Section 5.

2 Theoretical background

Transfer learning, raised by Professor Yang Qiang in 2005, is an important branch of machine learning. It is a learning process that transfers knowledge learned in the source domain to complete the task in the target domain [28], as shown in Fig. 1. Transfer learning involves two definitions, domain and task, in which domain is divided into source domain and target domain. The source domain refers to the domain with enough samples, denoted as $D_{s} = \{y_{1}, y_{2}, \dots, y_{M}\}$, where y_i is the i-th sample. The target domain is the data to be studied, denoted as $D_{t} = \{t_{1}, t_{2}, \dots, t_{N}\}$, where t_i is the i-th sample in the target domain.

Fig. 1

Schematic of transfer learning.

Transfer learning mainly includes four learning schemes: sample transfer, feature transfer, model transfer and relationship transfer. Sample transfer sets weight generation rules for the available data in the source domain, which increases the amount of data in the target domain [29, 30], as shown in (a) in Fig. 2. The main purpose of feature transfer is to transform the data of the two domains into a unified feature space through feature transformation [31, 32]; the process is shown in (b) in Fig. 2. Model transfer is based on the premise that the parameters of the model can be shared by the two domains, as shown in (c) in Fig. 2. Relationship transfer aims to study the relationship among samples in different fields [33], and its principle is shown in (d) in Fig. 2.

Fig. 2

Transfer learning methods: (a) sample transfer, (b) feature transfer, (c) model transfer, (d) relationship transfer

Due to the outstanding performance of deep learning models in various fields, model transfer utilizing these networks has received wide attention [34, 35]. Fine-tuning is one kind of model transfer method. Its main purpose is to migrate part or all of the feature extraction layers of the trained network, while retraining the non-transferred part of the reused network. In this study, this method is applied to realize model reuse.

3 Cloud-edge cooperation based transferring prediction scheme

3.1 Architecture of the proposed model

The architecture of the cloud-edge collaboration based transferring prediction method is shown in Fig. 3. It mainly includes three parts: constructing a model library in the cloud, analyzing the similarity of sequences, and realizing model reuse at the edge, as detailed below.

Fig. 3

The architecture of cloud-edge collaboration based transferring prediction for building energy consumption.

(I) Energy consumption prediction models are constructed for different buildings using historical energy datasets in the cloud, and then the prediction model library is built.

(II) Target data collected at the edge are uploaded to the cloud, and the similarity between the cloud data and the target data is analyzed to match the model to be transferred.

(III) The prediction model corresponding to the matched source-domain data is downloaded to the edge, and the learned parameters of the LSTM layers are reused; in addition, the data at the edge are used to train a new fully connected layer (FCL) to ensure the performance of the transferred model.

In the following sections, the implementation process of the above steps is given in detail.

3.2 Constructing model library in the cloud

In general, we assume that y(t), the output value at time t, is determined by the preceding values of the sequence. In this study, the length of the input for energy consumption prediction is determined by the partial autocorrelation function (PACF) [36]; that is, the lag p for which y(t - p) has a large partial autocorrelation coefficient with y(t) is identified. Then, the p energy consumption values before time t, namely y(t - p), y(t - p + 1), ⋯, y(t - 1), are employed to forecast the value of y(t). Therefore, the energy consumption prediction model is defined as: $y (t) = f (y (t - p), y (t - p + 1), \dots, y (t - 1))$ (1) where f(·) denotes the LSTM network that implements the prediction process. The input variable of the prediction model is z = (z₁, z₂, ⋯, z_p)^T, where z₁ = y(t - p), z₂ = y(t - p + 1), ⋯, z_p = y(t - 1), and y(t) is the output variable.
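
As an illustration of how an input length p could be read off the PACF, the following sketch computes partial autocorrelations with the Durbin-Levinson recursion and keeps the largest lag whose coefficient exceeds a significance threshold. The threshold 1.96/sqrt(N), the function names and the synthetic series are assumptions made for this example; the paper does not state the exact selection rule it uses.

```python
import math
import random


def autocorr(y, k):
    """Sample autocorrelation of the series y at lag k."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y)
    cov = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    return cov / var


def pacf(y, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    r = [autocorr(y, k) for k in range(max_lag + 1)]
    values = [1.0]  # lag 0 is 1 by convention
    phi_prev = []   # AR coefficients of order k - 1
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        values.append(phi_kk)
        phi_prev = phi
    return values


def select_input_length(y, max_lag):
    """Largest lag whose PACF magnitude exceeds the assumed threshold."""
    threshold = 1.96 / math.sqrt(len(y))  # assumption, not taken from the paper
    coeffs = pacf(y, max_lag)
    significant = [k for k in range(1, max_lag + 1) if abs(coeffs[k]) > threshold]
    return max(significant) if significant else 1


# Synthetic AR(1) series standing in for an energy consumption sequence.
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
p = select_input_length(series, max_lag=10)
```

In practice a library routine for the PACF would normally be preferred; the recursion is written out here only to keep the example self-contained.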

To train and test the LSTM prediction model, the energy consumption sequence needs to be processed into input-output data pairs, as described by Eq. 2. $(z^{(n)}, y^{(n)}), \quad n = 1, 2, \dots, N - p$ (2) where z⁽ⁿ⁾ = [y(n), y(n + 1), ⋯, y(n + p - 1)], y⁽ⁿ⁾ = y(n + p), and N is the total length of the energy consumption sequence.
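
The construction of the data pairs in Eq. 2 can be sketched as a sliding window over the sequence. The function name `make_pairs` and the sample values are illustrative only; indices are 0-based in the code, whereas Eq. 2 counts from 1.

```python
def make_pairs(y, p):
    """Return the N - p pairs (z, target) built from the sequence y.

    Each input z holds p consecutive values and the target is the value
    that immediately follows them.
    """
    if not 0 < p < len(y):
        raise ValueError("p must satisfy 0 < p < len(y)")
    return [(y[n:n + p], y[n + p]) for n in range(len(y) - p)]


# Illustrative sequence of six readings with input length p = 3.
sequence = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
pairs = make_pairs(sequence, p=3)
```

With N = 6 and p = 3 this yields N - p = 3 pairs, each shifted by one step, which corresponds to a stride of 1.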

To construct the energy consumption prediction model, a network structure composed of three LSTM layers and an FCL is considered. The state unit and the three gate units are the essential components that realize the function of an LSTM layer [37]; the internal structure of LSTM is shown in Fig. 4. The above input-output data pairs are fed into the network. Then, the forget gate F_t selects which values associated with the input vector z⁽ⁿ⁾ are retained in the network, and the state unit is updated. The forget gate F_t is calculated as: $F_{t} = \sigma (v_{f} \cdot [y_{t - 1}, z_{t}] + b_{f})$ (3) where $y_{t-1}$ is the output at the previous time step, z_t is the input at time t, σ(·) is the sigmoid function, and v_f and b_f denote the weight matrix and bias matrix of the forget gate, respectively.

Fig. 4

The structure of LSTM

Subsequently, the input gate I_t given by Eq. 4 filters the information from z⁽ⁿ⁾ that is used for the update, and a tanh layer produces the candidate value. $I_{t} = \sigma (v_{i} \cdot [y_{t - 1}, z_{t}] + b_{i})$ (4) where v_i and b_i are the weight matrix and bias matrix of the input gate. In the above process, the current network state is recorded and saved by the state unit d_t given by Eq. 5. $d_{t} = F_{t} * d_{t - 1} + I_{t} * \tanh (v_{d} \cdot [y_{t - 1}, z_{t}] + b_{d})$ (5) where $d_{t-1}$ is the previous network state, F_t is the value of the forget gate, I_t is the value of the input gate, tanh(·) is the hyperbolic tangent function, and v_d and b_d are the weight matrix and bias matrix of the candidate value.

Afterwards, the output gate O_t selects the part of the current state unit that is used as the output, and O_t is computed as follows: $O_{t} = \sigma (v_{o} \cdot [y_{t - 1}, z_{t}] + b_{o})$ (6) where v_o and b_o are the weight matrix and bias matrix of the output gate. Eventually, the output of the LSTM layer is y_t, which is defined as: $y_{t} = O_{t} * \tanh (d_{t})$ (7) where O_t is the value of the output gate, and d_t is the unit state at time t.
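
To make Eqs. 3 to 7 concrete, the following sketch evaluates one LSTM time step for the simplest case of a scalar input and a single hidden unit, so that every weight matrix reduces to two numbers. The function name, the `params` dictionary layout and the sample values are assumptions made for this illustration, not the implementation used in the experiments.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(z_t, y_prev, d_prev, params):
    """One LSTM step with a scalar input and one hidden unit.

    params maps each of "f", "i", "o", "d" to a tuple (v_y, v_z, b), i.e.
    the weights applied to y_{t-1} and z_t and the bias of that gate.
    """
    def affine(name):
        v_y, v_z, b = params[name]
        return v_y * y_prev + v_z * z_t + b

    f_t = sigmoid(affine("f"))                           # forget gate, Eq. 3
    i_t = sigmoid(affine("i"))                           # input gate, Eq. 4
    d_t = f_t * d_prev + i_t * math.tanh(affine("d"))    # state unit, Eq. 5
    o_t = sigmoid(affine("o"))                           # output gate, Eq. 6
    y_t = o_t * math.tanh(d_t)                           # layer output, Eq. 7
    return y_t, d_t


# Illustrative parameters; a trained layer would supply learned values.
params = {
    "f": (0.1, 0.2, 0.0),
    "i": (0.3, -0.1, 0.0),
    "o": (0.2, 0.4, 0.0),
    "d": (0.5, 0.5, 0.0),
}
y, d = 0.0, 0.0
for z in [0.2, 0.4, 0.6]:
    y, d = lstm_step(z, y, d, params)
```

A full layer applies the same equations with vectors and matrices; the loop at the end shows how the output and the state are carried from one time step to the next.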

When training the LSTM neural network, the gate values, memory cells and hidden states are updated cyclically, and the weight matrix and bias matrix of each process are initialized randomly. In this paper, the mean squared error (MSE) is used as the loss function to evaluate network performance. It is computed as: $MSE = \frac{1}{M} \sum_{i = 1}^{M} (y_{i} - \hat{y_{i}})^{2}$ (8) where y_i is the i-th ground truth value, $\hat{y_{i}}$ is the i-th predicted value, and M denotes the amount of training data. The Adam algorithm [38] is adopted to optimize the training of the network, and ReLU acts as the activation function of the FCL. Moreover, the Dropout algorithm is utilized to prevent the network from over-fitting.

Furthermore, the LSTM network is pre-trained with the energy consumption data of different buildings to establish prediction models, and these models are saved in the cloud model library to increase the number of available models.

3.3 Matching model

Similarity analysis measures the degree of similarity and difference between domains. When the data to be compared have different lengths or are shifted along the time axis, commonly used measures such as Euclidean distance, cosine similarity and Manhattan distance cannot capture their similarity well. This problem also exists in BEC prediction. Considering that the acquisition frequency of the target data and that of the data in the cloud may differ, this paper adopts the dynamic time warping (DTW) method [39] to analyze data similarity.

Assume that the two energy consumption sequences to be matched are H = (h₁, ⋯ , h_m) and R = (r₁, ⋯ , r_n). The specific detection process is as below.

Initially, the distance between each pair of elements of the two sequences is calculated by the Euclidean distance, and these distances are used to construct an m × n distance matrix. The Euclidean distance is computed as: $d (i, j) = \sqrt{(h_{i} - r_{j})^{2}}$ (9) In this process, the elements of the two sequences may correspond one-to-many. Then, with the help of dynamic programming, a path through several points of the distance matrix is found, which is called a warping path. The points passed by the path are the alignment points of the two sequences. The warping path of sequences H and R is expressed as: $w_{HR} = ((h_{1}, r_{1}), \dots, (h_{i}, r_{j}), \dots, (h_{m}, r_{n}))$ (10) where 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Since the correspondence between the sequence elements is nonlinear, the number of candidate warping paths grows exponentially, so an optimal warping path has to be sought. The optimal path is found under the following rules. First, the warping path starts from the lower left corner of the matrix and ends at its upper right corner. Second, the stride is set to 1, which means that each point is aligned only with a point adjacent to it. Third, the warping path proceeds monotonically, so that each step moves to the right of or above the previous point. In addition, when moving to point (i, j) in the distance matrix, the cost of a horizontal or vertical step is d(i, j), and the cost of a diagonal step is 2d(i, j). These rules are illustrated in Fig. 5, which shows three hypothetical warping paths; the arrows in the figure represent the direction of the next step.

Fig. 5

Seeking the optimal warping path.

Eventually, the path with the smallest cumulative distance is selected as the optimal path, and the total cumulative distance is used to analyze the similarity of the two sequences: the smaller the cumulative distance, the higher the similarity. The cumulative distance is calculated recursively as: $s (i, j) = \min \begin{cases} s (i - 1, j) + d (i, j) \\ s (i - 1, j - 1) + 2 d (i, j) \\ s (i, j - 1) + d (i, j) \end{cases}$ (11) where s(i, j) denotes the cumulative distance of aligning the two sequences from the starting point up to the i-th point of sequence H and the j-th point of sequence R.
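
The recursion in Eq. 11 can be sketched as a small dynamic program. The boundary condition (a virtual starting cell with zero cost, so that the first aligned pair costs 2d(1, 1)) and the function name are assumptions of this example, since the text does not spell out the initialization.

```python
import math


def dtw_distance(h, r):
    """Cumulative DTW distance between sequences h and r following Eq. 11."""
    m, n = len(h), len(r)
    # s[i][j] is the cumulative distance up to h[i-1] and r[j-1];
    # row 0 and column 0 are padding for the boundary condition.
    s = [[math.inf] * (n + 1) for _ in range(m + 1)]
    s[0][0] = 0.0  # assumed starting point
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(h[i - 1] - r[j - 1])      # Eq. 9 for scalar samples
            s[i][j] = min(
                s[i - 1][j] + d,              # vertical step
                s[i - 1][j - 1] + 2.0 * d,    # diagonal step, weighted by 2
                s[i][j - 1] + d,              # horizontal step
            )
    return s[m][n]


# Two sequences sampled at different rates but with the same shape.
hourly = [1.0, 2.0, 3.0]
half_hourly = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
distance = dtw_distance(hourly, half_hourly)
```

Because the two sequences may have different lengths, the same routine can compare data collected at different acquisition frequencies, which is the situation this section addresses.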

Through the above steps, the model that is reused to predict target data at the edge is selected from the model library and its corresponding dataset is the optimum source data.

3.4 Model reusing at the edge

Since the target data and the source data are similar, the parameters extracted from the prediction model can be shared. Nevertheless, the two datasets are not exactly the same, so the matched model cannot be utilized directly. To ensure the prediction performance of the model, the matched model has to be fine-tuned. The fine-tuning strategy given in this study is displayed in Fig. 6. It consists of data processing, model transfer and fine-tuning.

Fig. 6

The scheme of model migration and fine-tuning.

We first utilize the Min-Max normalization method to normalize the data collected at the edge. Its principle is to convert each initial value in data Z into a value in the range of 0 to 1 by a linear transformation. It is calculated by Eq. 12. $z_{norm} = \frac{z - z_{\min}}{z_{\max} - z_{\min}}$ (12)
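
A minimal sketch of Eq. 12 follows; the function name and the sample readings are illustrative, and the explicit check for a constant sequence is an added safeguard that the text does not discuss.

```python
def min_max_normalize(z):
    """Scale the values of z linearly into the range [0, 1] (Eq. 12)."""
    z_min, z_max = min(z), max(z)
    if z_max == z_min:
        raise ValueError("cannot normalize a constant sequence")
    return [(v - z_min) / (z_max - z_min) for v in z]


readings = [2.0, 4.0, 6.0]
normalized = min_max_normalize(readings)
```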

In order to find the model to be matched, the DTW analysis method is applied to select the source dataset with the smallest distance from the target data. Then, the corresponding prediction model to be reused is selected and downloaded to the edge.

In addition, the model downloaded to the edge needs to be fine-tuned. In the fine-tuning strategy provided by this paper, the parameters of the LSTM layers keep the values learned in pre-training, and the LSTM layers are connected to a new FCL. Only this FCL is retrained with the training portion of the target data, which reduces the training burden at the edge.
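
The idea of keeping the transferred layers fixed and retraining only a new output layer can be sketched without a deep learning framework. In the example below, `frozen_features` is a hypothetical stand-in for the pre-trained LSTM layers, the new layer is a plain linear unit, and it is fitted by batch gradient descent on the MSE. These are simplifications chosen for illustration: the paper uses an FCL with ReLU activation trained with Adam.

```python
def frozen_features(window):
    """Hypothetical stand-in for the transferred, frozen LSTM layers."""
    return [sum(window) / len(window), window[-1]]


def fit_output_layer(features, targets, lr=0.1, epochs=2000):
    """Train only the new output layer y = w . h + b by gradient descent on MSE."""
    m, k = len(features), len(features[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for h, y in zip(features, targets):
            err = sum(wj * hj for wj, hj in zip(w, h)) + b - y
            for j in range(k):
                grad_w[j] += 2.0 * err * h[j] / m
            grad_b += 2.0 * err / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(window, w, b):
    """Prediction of the reused model: frozen features, retrained output layer."""
    h = frozen_features(window)
    return sum(wj * hj for wj, hj in zip(w, h)) + b


# Illustrative normalized target-domain windows and next-step targets.
windows = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5], [0.4, 0.5, 0.6]]
targets = [0.4, 0.5, 0.6, 0.7]
w, b = fit_output_layer([frozen_features(x) for x in windows], targets)
```

Only `w` and `b` are updated during training, so the cost per epoch depends on the size of the output layer rather than on the transferred layers, which is the reason this strategy lowers the training burden at the edge.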

4 Experiments

This section verifies the cloud-edge collaboration based transferring prediction method for building energy consumption on two practical application datasets, and compares it with an LSTM model trained without the transfer scheme.

We executed these experiments using Python 3.5 on an Intel(R) Core(TM) i7-7700 CPU at 3.60 GHz with 8 GB RAM. In addition, Keras 2.1 was selected as the deep learning framework for model training.

4.1 Performance metrics

In this work, root mean square error (RMSE), mean absolute error (MAE) and training time are selected as performance metrics to evaluate the performance of the devised model. The smaller their values are, the better the model prediction performance is. $RMSE = \sqrt{\frac{1}{M} \sum_{i = 1}^{M} {(y_{i} - \hat{y_{i}})}^{2}}$ (13) $MAE = \frac{1}{M} \sum_{i = 1}^{M} | y_{i} - \hat{y_{i}} |$ (14) where y_i denotes the i - th ground truth, $\hat{y_{i}}$ denotes the i - th predicted data, M represents the amount of test data.
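
Eqs. 13 and 14 translate directly into code. The function names and the sample values below are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Eq. 13."""
    m = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / m)


def mae(y_true, y_pred):
    """Mean absolute error, Eq. 14."""
    m = len(y_true)
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / m


truth = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
errors = (rmse(truth, predicted), mae(truth, predicted))
```

Because RMSE squares the errors before averaging, a single large deviation raises it more than it raises MAE, which is why the two metrics are reported together.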

To show that the proposed scheme can decrease the computational cost of model training at the edge, the training time of the LSTM model at the edge is considered. It includes the time of training the LSTM model without transfer and the training time of the model with fine-tuning.

4.2 Experiment 1

4.2.1 Dataset

The first dataset is downloaded from https://trynthink.github.io/buildingsdatasets/. It contains the energy consumption data of Oak Ridge National Laboratory in the United States from 2013 to 2014, and it is adopted as irregular data to test the performance of the proposed model. We utilize the energy consumption data of five buildings in this dataset to construct prediction models, and the cloud prediction model library is established from these models. These buildings are named AB, AD, AV, CR and X. Additionally, the operation data of another building named V, whose energy consumption was measured from 09/08/2014 to 01/10/2014, are selected as target data. The target data with 1500 data points are shown in Fig. 7.

Fig. 7

Target data in Experiment 1.

4.2.2 Transferring model

To select the prediction model to be reused, the target data are uploaded to the cloud and the DTW method is called to analyze the similarity between the data of the two domains. The similarity analysis results are listed in Table 1. From the DTW distances between the target data and the cloud data, it is concluded that the optimum source dataset is AB.

Table 1
The results of similarity analysis in Experiment 1

Dataset	AB&V	AD&V	AV&V	CR&V	X&V
DTW distance	66.5681	68.6238	66.6805	75.7482	70.0123
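
The selection rule applied to Table 1 (choose the source dataset with the smallest DTW distance to the target) can be sketched as follows. The dictionary name is illustrative; the distances are those reported in Table 1.

```python
# DTW distances between each cloud dataset and the target building V (Table 1).
dtw_to_target = {
    "AB": 66.5681,
    "AD": 68.6238,
    "AV": 66.6805,
    "CR": 75.7482,
    "X": 70.0123,
}

# The matched source dataset is the one with the smallest distance.
best_source = min(dtw_to_target, key=dtw_to_target.get)
```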

In the prediction model library, the model corresponding to the AB dataset has been pre-trained. When pre-training the network, we split this dataset into training, validation and test sets in chronological order with proportions of 60%, 20% and 20%. We utilize ablation experiments to find the optimal parameters of this model, which are as follows: the numbers of units in the three LSTM layers are 64, 10 and 5, the learning rate is $10^{-6}$, the dropout rate is 0.3 and the number of pre-training epochs is 10. Besides, PACF analysis shows that y(t - 5) has a large partial autocorrelation coefficient with y(t). Therefore, the input length is set to 5, and the stride is set to 1. This means that the input is $(z_{t-5}, \dots, z_{t-1})$. The prediction result of this model is shown in Fig. 8, where the blue line denotes the ground truth and the purple line is the predicted energy consumption. The RMSE and MAE of the pre-trained model are 4.9198 and 3.1378, respectively. It can be deduced that the LSTM prediction model constructed for building AB has been trained well.

Fig. 8

The prediction result of the matching model.

Next, the prediction model corresponding to the selected building AB is downloaded to the edge, and it is fine-tuned according to the steps described in Section 3.4. The FCL of the transferred LSTM model is retrained with two thirds of the target data, and the remaining data are used to evaluate the transferred model.

4.2.3 Experimental results

To show that the model reuse method decreases the computational cost of training the model at the edge and achieves accurate prediction with a small amount of data, a comparative experiment has been carried out.

First, the same amount of target data is utilized to train an LSTM model at the edge without the transfer strategy; the parameters of this model are the same as the optimal parameters of the model corresponding to the AB dataset. Second, different amounts of training data are employed to demonstrate the advantage of model reuse in training time. The prediction results of the comparative experiments are shown in Fig. 9. The RMSE, MAE and network training time of all experiments are listed in Table 2. Furthermore, these statistical metrics are visualized as histograms in Fig. 10.

Fig. 9

The prediction results in Experiment 1: (a) the amount of data is 5000, (b) the amount of data is 8000, (c) the amount of data is 10000, (d) the amount of data is 12000.

Table 2

Performance comparison of transfer and non-transfer models in Experiment 1

The number of data	Time(s)		RMSE		MAE
	Transfer	Non-transfer	Transfer	Non-transfer	Transfer	Non-transfer
5000	114	154	8.6536	9.378	2.2970	3.5638
8000	196	264	8.5696	9.2183	1.8214	2.3044
10000	247	380	8.4875	9.3497	2.3310	2.7364
12000	344	451	8.1976	8.9440	2.7926	2.9102
15000	426	576	8.1414	8.8153	1.4854	1.8348

As Table 2 indicates, in these comparative experiments the RMSE and MAE of the transferred model are lower than those of the model trained directly. Moreover, as the amount of training data increases, the training time of the transfer experiment grows less than that of the non-transfer experiment (from 114 s to 426 s versus from 154 s to 576 s). From Fig. 9 and Fig. 10, we can infer that the constructed model achieves accurate prediction with a smaller amount of data.

Fig. 10

The performance comparisons of Experiment 1: (a) Training time, (b) RMSE, (c) MAE.

4.3 Experiment 2

4.3.1 Dataset

In this experiment, the dataset is downloaded from https://openei.org/. It contains the energy consumption data of stores in California, USA, and is used as regular data to test the performance of the given model. The dataset contains the energy consumption data of five stores and one office building in 2010, and the data of the five stores, named E, F, G, H and I, are adopted in this experiment. We utilize the energy consumption data of store F from 03/06/2010 to 30/12/2010 as the target data, and the data of the remaining stores are used to construct prediction models with the LSTM network. The target data with 1500 data points are shown in Fig. 11.

Fig. 11

Target data in Experiment 2.

4.3.2 Model transferring

In this experiment, the energy consumption data of store F are uploaded to the cloud, and the DTW method is employed to select the prediction model to be reused from the cloud model library. The similarity analysis results are displayed in Table 3. From the DTW distances in this table, we can deduce that the optimum source dataset is H.

Table 3
The results of similarity analysis in Experiment 2

Dataset	E&F	G&F	H&F	I&F
DTW distance	94.4059	76.4065	67.9923	88.2805

The prediction model corresponding to store H has been pre-trained in the cloud. When pre-training the network, the division ratio of the dataset, the stride and the epoch of pre-training are set as in Experiment 1. Through ablation experiments, the optimal parameters of this model are as follows: the numbers of units in the three LSTM layers are 64, 128 and 10, the learning rate is $10^{-5}$, the number of pre-training epochs is 6 and the dropout rate is 0.2. Moreover, the stride is set to 1, and the input length is set to 10, which is obtained by PACF analysis. This means that the input is $(z_{t-10}, \dots, z_{t-1})$. With the above configuration, the RMSE and MAE of the prediction model are 21.7861 and 14.6331, respectively. Therefore, it can be concluded that the prediction model has a good capability for feature extraction.

4.3.3 Experiment results

To accomplish model reuse, the prediction model corresponding to store H is downloaded to the edge, and it is fine-tuned according to the steps described in Section 3.4. The energy consumption data of store F are used for training and testing the transferred model. Then, a comparative experiment is carried out with the same experimental strategy as in Experiment 1; the parameters of the comparison model are the same as the optimal parameters of the model corresponding to store H. The prediction results of the comparative experiments are shown in Fig. 12, and the MAE, RMSE and training time of all experiments are shown in Table 4. In addition, these metrics are visualized as histograms in Fig. 13.

Table 4
Performance comparison of transfer and non-transfer models in Experiment 2

The number of data	Time(s)		RMSE		MAE
	Transfer	Non-transfer	Transfer	Non-transfer	Transfer	Non-transfer
5000	234	284	21.2916	37.2795	11.1351	26.8192
8000	365	450	23.1082	35.7762	12.6912	19.1928
10000	451	583	20.6505	35.5352	9.4067	18.6440
12000	554	722	20.7064	36.2248	9.9005	22.3827
15000	714	928	20.6831	34.7586	9.1448	20.0502

As Fig. 12 demonstrates, for samples with large fluctuations, the model reuse method still accurately predicts the energy consumption with a small amount of data. From Table 4 and Fig. 13, we conclude that the RMSE and MAE of the transferred model are again lower than those of the model trained directly. Fig. 13 and Table 4 also reveal that, as the amount of data increases, the advantage of the fine-tuning experiment in training time becomes more and more obvious. Therefore, the superiority of the given prediction scheme in reducing the computational cost of network training and the reliance on the amount of data is once again confirmed.

Fig. 12

The prediction results in Experiment 2: (a) the amount of data is 5000, (b) the amount of data is 8000, (c) the amount of data is 10000, (d) the amount of data is 12000.

Fig. 13

The performance comparisons of Experiment 2: (a) Training Time, (b) RMSE, (c) MAE.

4.4 Analysis

Fig. 9 and Fig. 12 indicate that the model reuse scheme proposed in this paper effectively track the trend of BEC. In the two experiments, it is deduced that the proposed method can be applied to extract the characteristics of irregular and regular energy consumption data with a small number of samples.

As the prediction performance in Table 2 and Table 4 shows, the transferred scheme achieves the best prediction performance in both scenarios. In Experiment 1, relative to the non-transferred model, the RMSE of the model transferring method decreases by 7.72%, 7.03%, 9.22%, 8.35% and 7.64%, respectively, and the MAE drops by 35.55%, 20.96%, 14.82%, 4.04% and 19.04%, respectively. In Experiment 2, the RMSE of the model transferring method decreases by 42.89%, 35.41%, 41.89%, 42.84% and 40.49%, respectively, and the MAE declines by 58.48%, 33.87%, 49.55%, 55.46% and 54.39%, respectively. In Experiment 1, the non-transferring model trained with 10000 data performs worse than the one trained with 8000 data. This phenomenon may be caused by the scattered data characteristics and insufficient model training.

As Table 2, Table 4, Fig. 10 and Fig. 13 demonstrate, the proposed approach also has a clear advantage in network training time in both experiments, which effectively decreases the computational cost.
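The percentage reductions discussed in this section follow the usual relative-change formula, (non-transfer − transfer) / non-transfer. The Python sketch below applies it to the Experiment 2 values in Table 4. The function and variable names are illustrative assumptions, and since the rounding convention behind the quoted figures is not stated, recomputed values may differ from them in the last digits.

```python
def relative_reduction(non_transfer, transfer):
    """Percentage by which the transferred model lowers a metric."""
    return 100.0 * (non_transfer - transfer) / non_transfer


# Table 4 rows: number of data ->
# (time transfer, time non-transfer, RMSE transfer, RMSE non-transfer,
#  MAE transfer, MAE non-transfer)
TABLE_4 = {
    5000: (234, 284, 21.2916, 37.2795, 11.1351, 26.8192),
    8000: (365, 450, 23.1082, 35.7762, 12.6912, 19.1928),
    10000: (451, 583, 20.6505, 35.5352, 9.4067, 18.6440),
    12000: (554, 722, 20.7064, 36.2248, 9.9005, 22.3827),
    15000: (714, 928, 20.6831, 34.7586, 9.1448, 20.0502),
}


def reductions(row):
    """Relative reductions (in %) of time, RMSE and MAE for one table row."""
    time_t, time_n, rmse_t, rmse_n, mae_t, mae_n = row
    return {
        "time": relative_reduction(time_n, time_t),
        "rmse": relative_reduction(rmse_n, rmse_t),
        "mae": relative_reduction(mae_n, mae_t),
    }


for n_data, row in TABLE_4.items():
    red = reductions(row)
    print(f"{n_data:>6}: time {red['time']:.2f}%  "
          f"RMSE {red['rmse']:.2f}%  MAE {red['mae']:.2f}%")
```

The same helper can be applied to the Experiment 1 values in Table 2 by replacing the dictionary entries.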

Through the above analysis, it can be confirmed that the cloud-edge collaboration based transferring prediction method for building energy consumption not only realizes the reuse of existing models, but also reduces the reliance on the amount of data.

5 Conclusion

To decrease the computational cost of network training at the edge while extracting the energy consumption characteristics of buildings from a small number of samples, this paper combines cloud-edge collaboration with transfer learning to realize the reuse of cloud prediction models at the edge. The experimental results reveal that, compared with the non-transferred LSTM algorithm, the cloud-edge collaboration based transferring prediction method incurs a lower computational cost in both experiments. In the first experiment, the training time of the proposed method is reduced by 25.97%, 25.76%, 35.00%, 23.73% and 26.04%, respectively; in the second experiment, it is reduced by 17.61%, 18.89%, 22.64%, 23.27% and 23.06%, respectively. We also analyze the applicability of the designed model to different types of data. The experimental results indicate that the method realizes model reuse with a small number of samples for both irregular and regular data. In addition, compared with the non-transferred algorithm, the proposed model has better prediction performance on each type of data in experiments with different numbers of samples.

Although our method realizes the reuse of existing models and reduces the computational cost of model training, it still has certain limitations. First, the similarity between the data of the two domains has a significant impact on the predictive performance of the model: when this similarity is low, the method's predictive performance may be poor. In future research, we will consider how to transfer knowledge between energy consumption data with environmental factors. Second, in order to better analyze the transferability between domains and further optimize the performance of the prediction model, we will study strategies in which a similarity metric such as maximum mean discrepancy (MMD) is computed by the network itself during training.
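For reference, MMD is commonly defined as the distance between the mean embeddings of two distributions in a reproducing kernel Hilbert space. The expression below is the standard biased empirical estimate from the literature, given only as a sketch: the symbols $x_i^s$ (the $n$ source-domain samples), $x_j^t$ (the $m$ target-domain samples) and the kernel $k$ are notation introduced here, and the formula does not describe an implemented part of the proposed method.

```latex
\widehat{\mathrm{MMD}}^2(X_s, X_t)
  = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k\!\left(x_i^s, x_j^s\right)
  + \frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m} k\!\left(x_i^t, x_j^t\right)
  - \frac{2}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} k\!\left(x_i^s, x_j^t\right)
```

A smaller value indicates that the two domains are closer in the feature space induced by $k$, which is why such a quantity could serve as a training-time similarity term.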

Acknowledgment

This work is partly supported by the National Natural Science Foundation of China (62076150, 62003191, 61903226), the Taishan Scholar Project of Shandong Province (TSQN201812092), the Key Research and Development Program of Shandong Province (2019JZZY010115, 2019JZZY010120), the Youth Innovation Technology Project of Higher School in Shandong Province (2019KJN005) and the Natural Science Foundation of Shandong Province (ZR2020QF072).
