Abstract
Building energy consumption (BEC) prediction often requires constructing a dedicated model for each building based on historical data. However, a model constructed for one building is difficult to reuse in other buildings. Recent approaches have shown that a cloud-edge collaboration architecture is promising for realizing model reuse. How to reuse cloud energy consumption prediction models at the edge while reducing the computational cost of model training is one of the key issues to be solved. To address these problems, a cloud-edge collaboration based transferring prediction method for BEC is proposed in this paper. Specifically, a model library that stores prediction models for different types of buildings is first constructed in the cloud from historical energy consumption data using the long short-term memory (LSTM) network; then, a similarity measurement strategy for time series with different granularities is given, and the model to be transferred is matched from the model library by analyzing the similarity between the observation data uploaded to the cloud and the historical data collected in the cloud; finally, a fine-tuning strategy for the matched prediction model is given, and this model is fine-tuned at the edge to achieve its reuse in concrete application scenarios. Experiments on practical datasets show that, compared with the prediction model that does not use the transfer strategy, the proposed prediction model achieves lower MAE and RMSE. The experimental results also confirm that the proposed method effectively reduces the computational cost of network training at the edge.
Introduction
Building energy consumption prediction is one of the key tasks of smart building data analysis, and it plays a considerable role in building energy conservation and refined energy management. Various methods for BEC prediction have been designed in recent years, among which data-driven methods are widely used. These data-driven methods mainly include statistical methods [1, 2], artificial neural networks [3], support vector machines [4, 5], random forests [6, 7] and deep learning [8–10]. Lately, non-iterative methods have provided a novel way of modeling prediction tasks. A fast non-iterative information modeling scheme based on a new geometric transformation was designed in [11]. Izonin et al. [12] proposed a new solution for multiple linear regression problems with linear non-iterative calculations, which ensures the speed of identifying polynomials. Nevertheless, related research shows that, owing to their powerful feature extraction and approximation capabilities, deep learning methods perform best in building energy consumption prediction [13, 14]. However, training a deep prediction model requires abundant historical data, and the computational cost of model training is high. Moreover, the above-mentioned methods establish a new model for each building, and it is difficult to reuse a constructed model.
The emergence of edge computing makes it possible to meet the requirements of real-time data processing in some areas [15] and to reduce the computational cost of model training. Edge computing is composed of the central cloud, the edge cloud and terminals. The central cloud is responsible for analyzing large-scale data and optimizing model training, whereas the edge cloud, which is close to the terminals, is responsible for analyzing and processing part of the data or real-time data [16]. However, the computing power at the edge is limited by memory constraints. To conduct data analysis more rationally, the cloud-edge collaborative architecture came into being, and it has been applied successfully in the construction field. Shen et al. [17] proposed a cloud-edge collaborative architecture with a robust optimization model, which reduces the cost of modeling and computation for large-scale distribution systems. Jararweh et al. [18] introduced a smart city service scheme based on a cloud-edge collaborative architecture to improve the efficiency of smart city applications. In addition, Gang et al. [19] devised a cloud-edge collaborative scheme for non-intrusive load monitoring (NILM); experimental results revealed that this strategy effectively improved the reliability of traditional NILM models. Accordingly, cloud-edge collaboration technology provides technical support for reducing the computational cost of building data analysis, and it is promising for building energy management.
However, the problems of reusing cloud prediction models at the edge and reducing the training cost of the edge model still need to be solved. Transfer learning provides an effective way to break this bottleneck. It transfers the knowledge gained in one related task to another task. At present, transfer learning has been successfully applied in sentiment analysis, text classification, computer vision, medicine and other fields. Zhuang et al. [20] resorted to matrix factorization to solve the problem of cross-domain text classification. A safety guardrail detection model with a convolutional neural network and transfer learning was given by Kolar et al. [21]; experimental results demonstrated that the detection accuracy of the proposed approach reaches 96.5%. In order to predict people's emotions in various environments, Bawa et al. [22] designed a joint distribution program, which used an LSTM network to extract global and local features. Transfer learning has also been used to diagnose knee osteoarthritis [23]. Mao et al. [24] proposed an on-line strategy for early fault detection using transfer learning, which extracts deep features by fine-tuning the original support vector machine detection model. Moreover, transfer learning has begun to be applied in the intelligent building field. For instance, a building energy consumption prediction model that transfers both samples and models was provided; it achieved deep model training with less data, and statistical evaluation metrics verified that transfer learning can be used for BEC prediction [25]. In [26], a transferring prediction approach for cross-building energy consumption based on the seasonality and trend of time series was proposed. In this method, the combined data of similar buildings were used to forecast energy consumption with multiple characteristics. Besides, Fang et al. [27] presented a short-term BEC prediction framework using an LSTM network and a domain adaptive neural network; this scheme transfers the knowledge obtained from relevant building data to support target data prediction and thus handles the shortage of target data. However, in the field of building energy analysis, there is little research on the combination of transfer learning and the cloud-edge collaborative architecture. Combining them can effectively achieve the reuse of cloud models at the edge, decrease the reliance on a large number of training samples and reduce the training cost at the edge.
In this paper, a cloud-edge collaboration based transferring prediction method for BEC is proposed. The main contributions of the paper are as follows: (1) in the cloud, energy consumption prediction models for different types of buildings are constructed based on the LSTM network; (2) a similarity measurement strategy for time series with different granularities is presented, which realizes the similarity analysis between the observation data uploaded from the edge and the historical data in the cloud and thereby supports the matching of the model to be transferred; (3) a fine-tuning strategy for the energy consumption prediction model to be reused at the edge is given, and the strategy is verified in two application scenarios.
The rest of the paper is organized as follows. The theoretical background of transfer learning is provided in Section 2. Section 3 introduces the architecture of the cloud-edge collaboration based transferring prediction method in detail. In Section 4, we discuss the performance of the proposed model on two practical datasets and compare it with an LSTM model that does not use the transfer approach. Finally, we conclude in Section 5.
Theoretical background
Transfer learning, raised by Professor Yang Qiang in 2005, is an important branch of machine learning. It is a learning process that transfers knowledge learned in the source domain to complete the task in the target domain [28], as shown in Fig. 1. Transfer learning involves two concepts, domain and task, where a domain is either a source domain or a target domain. The source domain refers to the domain with enough samples, denoted as

Schematic of transfer learning.
Transfer learning mainly includes four learning schemes: sample transfer, feature transfer, model transfer and relationship transfer. Sample transfer sets weighting rules for the available data in the source domain, which increases the amount of data usable in the target domain [29, 30], as shown in Fig. 2(a). The main purpose of feature transfer is to map the data of the two domains into a unified feature space through feature transformation [31, 32]; the process is shown in Fig. 2(b). Model transfer relies on the assumption that the parameters of the model can be shared by the two domains, as shown in Fig. 2(c). Relationship transfer aims to study the relationship among samples in different domains [33], and its principle is shown in Fig. 2(d).

Transfer learning methods: (a) sample transfer, (b) feature transfer, (c) model transfer, (d) relationship transfer
Owing to the outstanding performance of deep learning models in various fields, model transfer based on these networks has attracted wide attention [34, 35]. Fine-tuning is one kind of model transfer method. Its main purpose is to transfer some or all of the feature extraction layers of the trained network while retraining the non-transferred part of the reused network. In this study, this method is applied to realize model reuse.
Architecture of the proposed model
The architecture of the cloud-edge collaboration based transferring prediction method is shown in Fig. 3. It mainly includes three parts: constructing a model library in the cloud, analyzing the similarity of sequences and realizing model reuse at the edge, as detailed below.

The architecture of cloud-edge collaboration based transferring prediction for building energy consumption.
(I) Energy consumption prediction models are constructed for different buildings using historical energy datasets in the cloud, and the prediction model library is then built from these models.
(II) Target data collected at the edge are uploaded to the cloud, and the similarity between the cloud data and the target data is analyzed to match the model to be transferred.
(III) The prediction model corresponding to the matched source-domain data is downloaded to the edge, and the learned parameters of its LSTM layers are reused; in addition, the data at the edge are used to train a new fully connected layer (FCL) to ensure the performance of the transferred model.
In the following sections, the implementation process of the above steps is given in detail.
In general, we assume that y(t), the output value at time t, is determined by the input values. In this study, the length of the input for energy consumption prediction is determined by the partial autocorrelation function (PACF) [36]. That is, the lag p for which y(t - p) has a large partial autocorrelation coefficient with y(t) is identified. Then, the m energy consumption values before time t, namely y(t - m), y(t - m + 1), ⋯, y(t - 1), are employed to forecast the value of y(t). Therefore, the form of the energy consumption prediction model is defined as:
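A form consistent with this description is sketched below. The symbol f, standing for the mapping learned by the network, and the hat marking the predicted value are notational assumptions and are not taken from the original equation.

```latex
\hat{y}(t) = f\bigl(y(t-m),\, y(t-m+1),\, \dots,\, y(t-1)\bigr)
```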
To train and test the LSTM prediction model, the energy consumption sequence needs to be converted into input-output data pairs; the data pair is described by Eq. 2.
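The conversion into input-output pairs can be illustrated with a short sliding-window sketch. The function name `make_pairs` and the list-based representation are assumptions made for illustration; the window length m and a stride of 1 follow the description in the text.

```python
def make_pairs(series, m):
    """Turn a sequence into (inputs, targets) with a sliding window of length m.

    Each input holds m consecutive values y(t-m), ..., y(t-1);
    the matching target is the next value y(t). The stride is 1.
    """
    inputs = [list(series[t - m:t]) for t in range(m, len(series))]
    targets = [series[t] for t in range(m, len(series))]
    return inputs, targets


if __name__ == "__main__":
    x, y = make_pairs([1, 2, 3, 4, 5], m=2)
    print(x)  # windows of two consecutive values
    print(y)  # the value that follows each window
```

A sequence of length N yields N - m pairs, so short target sequences at the edge produce correspondingly few training samples.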
To construct the energy consumption prediction model, a network structure composed of three LSTM layers and an FCL is considered. Generally, the state unit and the three gate units are the key components that realize the function of an LSTM layer [37], and the internal structure of LSTM is shown in Fig. 4. The above input-output data pairs are fed into the network. Then, the forget gate F_t selects the value in the input vector

The structure of LSTM
Subsequently,
Afterwards, the output gate O_t selects the value of the current state unit that is used as the output, and O_t is computed as follows:
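For reference, the gate computations of a standard LSTM cell are summarized below. This is the common textbook formulation and not necessarily the exact notation of the original equations: only F_t and O_t appear in the text, while the input gate I_t, the candidate state, the state unit C_t, the hidden state h_t, the input x_t, the weights W and the biases b are assumed symbols.

```latex
\begin{aligned}
F_t &= \sigma\bigl(W_F\,[h_{t-1}, x_t] + b_F\bigr) \\
I_t &= \sigma\bigl(W_I\,[h_{t-1}, x_t] + b_I\bigr) \\
\tilde{C}_t &= \tanh\bigl(W_C\,[h_{t-1}, x_t] + b_C\bigr) \\
C_t &= F_t \odot C_{t-1} + I_t \odot \tilde{C}_t \\
O_t &= \sigma\bigl(W_O\,[h_{t-1}, x_t] + b_O\bigr) \\
h_t &= O_t \odot \tanh(C_t)
\end{aligned}
```

Here σ is the logistic sigmoid and ⊙ denotes element-wise multiplication.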
When training the LSTM neural network, the gate values, memory cells and hidden states are updated cyclically, and the weight matrices and bias vectors of each process are initialized randomly. In this paper, the mean squared error (MSE) is used as the loss function to evaluate network performance, and it is computed as follows:
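The standard definition of MSE is given below. The symbols N (number of samples), y_i (measured value) and \hat{y}_i (predicted value) are assumed notation.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - \hat{y}_i\bigr)^2
```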
Furthermore, the LSTM network is pre-trained with the energy consumption data of different buildings to establish prediction models, and these models are stored in the cloud model library to increase the number of available models.
Similarity analysis quantifies the degree of similarity and difference between domains. When the data have different dimensions or shifted trends, commonly used measures such as Euclidean distance, cosine similarity and Manhattan distance cannot capture their similarity well. This problem also exists in BEC prediction. Considering that the acquisition frequency of the target data and that of the data in the cloud may differ, this paper adopts the dynamic time warping (DTW) method [39] to analyze data similarity.
Assume that the two energy consumption sequences to be matched are
Initially, the distance between each pair of elements of the two sequences is calculated using the Euclidean distance, and these distances are used to construct an m × n distance matrix. The Euclidean distance is computed as:
Since the correspondence between the sequence elements is nonlinear, the number of candidate warping paths grows exponentially, and an optimal warping path must be sought. The optimal path is found as follows. First, the warping path starts from the lower left corner of the matrix and accumulates toward its upper right corner. Second, the stride is set to 1, which means that each point is aligned only with points adjacent to it. Third, the warping path may only move forward from the previous point and never backward. Finally, when moving to point (i, j) in the distance matrix, for example from point (i - 1, j), a step in the horizontal or vertical direction adds a distance of d(i, j), while a step in the diagonal direction adds 2d(i, j). These steps are illustrated in Fig. 5, which shows three hypothetical warping paths; the arrows in the image represent the direction of the next step.

Seeking the optimal warping path.
Eventually, the path with the smallest cumulative distance is selected as the optimal path, and the total cumulative distance is used to analyze the similarity of the two sequences: the similarity is inversely proportional to the total cumulative distance. Therefore, the similarity of the two sequences is calculated as shown in the following equation:
Through the above steps, the model to be reused for predicting the target data at the edge is selected from the model library, and its corresponding dataset is the optimum source data.
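The matching procedure described in this section can be sketched as follows, under stated assumptions. For scalar samples the Euclidean distance reduces to the absolute difference, and the step weights (d for horizontal or vertical moves, 2d for diagonal moves) follow the text. The function names `dtw_distance` and `match_source` and the dictionary used as model library are hypothetical. The sketch selects the source with the smallest cumulative distance and does not reproduce the similarity equation itself.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW distance between two 1-D sequences of possibly different length."""
    m, n = len(a), len(b)
    # acc[i][j] is the smallest cumulative distance aligning a[:i] with b[:j]
    acc = [[math.inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(a[i - 1] - b[j - 1])  # Euclidean distance for scalar samples
            acc[i][j] = min(
                acc[i - 1][j] + d,          # vertical step: weight d
                acc[i][j - 1] + d,          # horizontal step: weight d
                acc[i - 1][j - 1] + 2 * d,  # diagonal step: weight 2d
            )
    return acc[m][n]


def match_source(target, library):
    """Return the key of the source sequence closest to the target under DTW."""
    return min(library, key=lambda name: dtw_distance(target, library[name]))


if __name__ == "__main__":
    library = {"A": [10, 10, 10], "B": [1, 2, 3, 4]}  # made-up toy sequences
    print(match_source([1, 2, 3], library))
```

Because the two sequences may have different lengths, the same routine applies when the target data and the cloud data are sampled at different granularities. The cost is proportional to the product of the two lengths.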
Since the target data and the source data are similar, the parameters learned by the prediction model can be shared. Nevertheless, because the two datasets are not exactly the same, the matched model cannot be used directly. To ensure the prediction performance of the model, the matched model must be fine-tuned. The fine-tuning strategy given in this study is displayed in Fig. 6. It consists of data processing, model transfer and fine-tuning.

The scheme of model migration and fine-tuning.
First, we use the Min-Max Normalization method to normalize the data collected at the edge. Its principle is to convert each initial value in data
In order to find the model to be matched, the DTW analysis method is applied to select the source dataset with the smallest distance from the target data. Then, the corresponding prediction model, which will be reused, is selected and downloaded to the edge.
In addition, the model downloaded to the edge needs to be fine-tuned. In the fine-tuning strategy provided in this paper, the parameters of the LSTM layers keep the values learned during pre-training, and the LSTM layers are connected to a new FCL, which is retrained with the training portion of the target data. Since only the FCL is retrained, the training burden on the edge is reduced.
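The data processing and fine-tuning steps can be illustrated with a dependency-free sketch. It is a simplified stand-in rather than the implementation used in this paper: the pre-trained LSTM layers are represented by an arbitrary callable `frozen_features` that is never updated, the new FCL is a single linear unit trained by batch gradient descent on the MSE, and the normalization is assumed to map onto [0, 1]. All function and parameter names are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1] using the sequence minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant sequence: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def retrain_head(frozen_features, inputs, targets, lr=0.01, epochs=200):
    """Fit a new linear output layer on top of frozen features using the MSE loss."""
    # The frozen part is evaluated once; its parameters are never changed.
    feats = [frozen_features(x) for x in inputs]
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(feats, targets):
            err = sum(wi * fi for wi, fi in zip(w, f)) + b - y
            for k in range(dim):
                grad_w[k] += 2.0 * err * f[k] / n
            grad_b += 2.0 * err / n
        # Only the new head (w, b) is updated.
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b

    def predict(x):
        f = frozen_features(x)
        return sum(wi * fi for wi, fi in zip(w, f)) + b

    return predict


if __name__ == "__main__":
    frozen = lambda window: [sum(window) / len(window)]  # toy stand-in for LSTM layers
    xs = [[0, 0], [1, 1], [2, 2], [3, 3]]
    ys = [1.0, 3.0, 5.0, 7.0]
    model = retrain_head(frozen, xs, ys, lr=0.05, epochs=2000)
    print(round(model([4, 4]), 3))
```

In a Keras model, the same idea would correspond to setting `trainable = False` on the transferred LSTM layers and training only a newly attached dense layer, so that far fewer parameters are updated at the edge.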
This section verifies the cloud-edge collaboration based transferring prediction method for building energy consumption on two practical application datasets and compares it with an LSTM model trained without the transfer scheme.
We executed these experiments using Python 3.5 on a system with an Intel(R) Core(TM) i7-7700 3.60 GHz CPU and 8 GB RAM. In addition, Keras 2.1 was selected as the deep learning framework for model training.
Performance metrics
In this work, root mean square error (RMSE), mean absolute error (MAE) and training time are selected as performance metrics to evaluate the performance of the devised model. The smaller their values are, the better the model prediction performance is.
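The two error metrics follow their standard definitions, sketched below with hypothetical helper names `rmse` and `mae`. The lists of measured and predicted values are assumed to have equal length.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0]   # made-up values for illustration
    predicted = [1.0, 2.0, 5.0]
    print(rmse(measured, predicted), mae(measured, predicted))
```

RMSE penalizes large individual errors more strongly than MAE, so reporting both gives a fuller picture of the error distribution.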
To show that the proposed scheme can decrease the computational cost of model training at the edge, the training time of the LSTM model at the edge is mainly considered, which includes the time for training the LSTM model without transferring and the training time of the model with fine-tuning.
Dataset
The first dataset, downloaded from https://trynthink.github.io/buildingsdatasets/, contains the energy consumption data of Oak Ridge National Laboratory in the United States from 2013 to 2014, and it is adopted as irregular data to test the performance of the proposed model. We use the energy consumption data of five buildings in this dataset, named AB, AD, AV, CR and X, to construct prediction models, and these models form the cloud prediction model library. Additionally, the operation data of another building, named V, whose energy consumption was measured from 09/08/2014 to 01/10/2014, are selected as the target data. The target data, comprising 1500 samples, are shown in Fig. 7.

Target data in Experiment 1.
To select the prediction model to be reused, the target data are uploaded to the cloud and the DTW method is used to analyze the similarity between the data of the two domains. The similarity analysis results are listed in Table 1. From the DTW distance between the target data and the cloud data, it is concluded that the optimum source dataset is AB.
The results of similarity analysis in Experiment 1
In the prediction model library, the model corresponding to the AB dataset has been pre-trained. When pre-training the network, we split this dataset in chronological order into training, validation and test sets with proportions of 60%, 20% and 20%. Ablation experiments are used to find the optimal parameters of this model, which are as follows: the numbers of LSTM units are 64, 10 and 5, the learning rate is 10^-6, the dropout rate is 0.3 and the number of pre-training epochs is 10. Besides, PACF analysis shows that y(t - 5) has a large partial autocorrelation coefficient with y(t). Therefore, the optimal input length is set to 5 and the stride is set to 1, which means that the input is (z_{t-5}, ⋯, z_{t-1}). The prediction result of this model is shown in Fig. 8, where the blue line denotes the ground truth and the purple line the predicted energy consumption. The RMSE and MAE of the pre-trained model are 4.9198 and 3.1378, respectively. It can be deduced that the LSTM prediction model constructed for building AB has been trained well.
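The chronological 60%/20%/20% division can be sketched as follows. The function name `chronological_split` and the percentage parameters are illustrative assumptions; integer arithmetic is used so that the split points do not depend on floating-point rounding.

```python
def chronological_split(series, train_pct=60, val_pct=20):
    """Split a sequence into training, validation and test parts without shuffling."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    # Earlier samples train the model, later samples validate and test it.
    return series[:train_end], series[train_end:val_end], series[val_end:]


if __name__ == "__main__":
    train, val, test = chronological_split(list(range(10)))
    print(train, val, test)
```

Keeping the time order prevents future observations from leaking into the training set, which matters for time series such as energy consumption data.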

The prediction result of the matching model.
Next, the prediction model corresponding to the selected building AB is downloaded to the edge, and it is fine-tuned according to the steps described in Section 3.4. The FCL of the transferred LSTM model is retrained with two thirds of the target data, and the remaining data are used to evaluate the transferred model.
In order to show that the model reuse method decreases the computational cost of model training at the edge and achieves accurate prediction with a small amount of data, a comparative experiment was conducted.
First of all, the same amount of target data is used to train, at the edge, an LSTM model without the transferring strategy; the parameters of this model are the same as the optimal parameters of the model corresponding to the AB dataset. Secondly, different amounts of training data are used to train the models in order to verify the advantage of model reuse in training time. The prediction results of the comparative experiments are shown in Fig. 9, and the RMSE, MAE and network training time of all experiments are listed in Table 2. Furthermore, these metrics are visualized as histograms in Fig. 10.

The prediction results in Experiment 1: (a) the amount of data is 5000, (b) the amount of data is 8000, (c) the amount of data is 10000, (d) the amount of data is 12000.
Performance comparison of transfer and non-transfer models in Experiment 1
As Table 2 indicates, in these comparative experiments the RMSE and MAE of the transferred model are lower than those of the matched model trained directly. Moreover, as the amount of training data increases, the training time of the transfer experiment does not change significantly, whereas that of the non-transfer experiment changes markedly. From Fig. 9 and Fig. 10, we can infer that the constructed model achieves accurate prediction with a smaller amount of data.

The performance comparisons of Experiment 1: (a) Training time, (b) RMSE, (c) MAE.
Dataset
In this experiment, the dataset, downloaded from https://openei.org/, contains the energy consumption data of stores in California, USA, and it is used as regular data to test the performance of the given model. The dataset contains energy consumption data of five stores and one office building in 2010, of which the data of the five stores, named E, F, G, H and I, are adopted in this experiment. We use the energy consumption data of store F from 03/06/2010 to 30/12/2010 as the target data, and the data of the remaining stores are used to construct prediction models with the LSTM network. The target data, comprising 1500 samples, are shown in Fig. 11.

Target data in Experiment 2.
In this work, the energy consumption data of store F are uploaded to the cloud, and the DTW method is employed to select the prediction model to be reused from the cloud model library. The similarity analysis results are displayed in Table 3. From the DTW distances in this table, we can deduce that the optimum source dataset is H.
The results of similarity analysis in Experiment 2
The prediction model corresponding to store H has been pre-trained in the cloud. When pre-training the network, the division ratio of the dataset and the stride are the same as in Experiment 1. Through ablation experiments, the optimal parameters of this model are found to be as follows: the numbers of LSTM units are 64, 128 and 10, the learning rate is 10^-5, the number of pre-training epochs is 6 and the dropout rate is 0.2. Moreover, the optimal input length, obtained by the PACF analysis, is 10, which means that the input is (z_{t-10}, ⋯, z_{t-1}). With this configuration, the RMSE and MAE of the prediction model are 21.7861 and 14.6331, respectively. Therefore, it can be concluded that the prediction model has good feature extraction capability.
To accomplish model reuse, the prediction model corresponding to store H is downloaded to the edge, and it is fine-tuned according to the steps described in Section 3.4. The energy consumption data of store F are used to train and test the transferred model. Then, a comparative experiment is carried out with the same experimental strategy as in Experiment 1; the parameters of the comparison model are the same as the optimal parameters of the model corresponding to store H. For the five comparative experiments, the prediction results are shown in Fig. 12, and the MAE, RMSE and training time of all experiments are shown in Table 4. In addition, these metrics are visualized as histograms in Fig. 13.
Performance comparison of transfer and non-transfer models in Experiment 2
As Fig. 12 demonstrates, even for samples with large fluctuations, the model reuse method still accurately predicts the energy consumption with a small amount of data. From Table 4 and Fig. 13, we conclude that the RMSE and MAE of the transferred model are again lower than those of the matched model trained directly. Fig. 13 and Table 4 also reveal that, as the amount of data increases, the advantage of the fine-tuning experiment in training time becomes more and more obvious. Therefore, the superiority of the given prediction scheme in reducing the computational cost of network training and the reliance on the amount of data is confirmed once again.

The prediction results in Experiment 2: (a) the amount of data is 5000, (b) the amount of data is 8000, (c) the amount of data is 10000, (d) the amount of data is 12000.

The performance comparisons of Experiment 2: (a) Training Time, (b) RMSE, (c) MAE.
Fig. 9 and Fig. 12 indicate that the model reuse scheme proposed in this paper effectively tracks the trend of BEC. From the two experiments, it is deduced that the proposed method can extract the characteristics of both irregular and regular energy consumption data with a small number of samples.
As shown by the prediction performance in Table 2 and Table 4, the transferred scheme achieves the best prediction performance in both scenarios. In Experiment 1, the RMSE of the model transferring method decreased by 7.72%, 7.03%, 9.22%, 8.35% and 7.64%, respectively, and the MAE dropped by 35.55%, 20.96%, 14.82%, 4.04% and 19.04%, respectively. Furthermore, in the second experiment the RMSE of the model transferring method decreased by 42.89%, 35.41%, 41.89%, 42.84% and 40.49%, respectively, and the MAE declined by 58.48%, 33.87%, 49.55%, 55.46% and 54.39%, respectively. When the amount of training data is 10000 in Experiment 1, the non-transferring model performs worse than in the experiment that uses 8000 data for training. This phenomenon may be caused by the scattered data characteristics and insufficient model training.
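The percentages above are presumably relative reductions with respect to the non-transferred model. Under that assumption, which is not stated explicitly in the text, each value would follow from the expression below, where E_non and E_tr are assumed symbols for the error (RMSE or MAE) of the non-transferred and the transferred model, respectively.

```latex
\text{reduction} = \frac{E_{\mathrm{non}} - E_{\mathrm{tr}}}{E_{\mathrm{non}}} \times 100\%
```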
As Table 2, Table 4, Fig. 10 and Fig. 13 demonstrate, in both experiments the proposed approach also has a remarkable advantage in network training time, which effectively decreases the computational cost.
Through the above analysis, it can be confirmed that the cloud-edge collaboration based transferring prediction method for building energy consumption not only realizes the reuse of existing models, but also reduces the reliance on the amount of data.
Conclusion
To decrease the computational cost of network training at the edge while extracting the energy consumption characteristics of buildings from a small number of samples, this paper combines cloud-edge collaboration with transfer learning to realize the reuse of cloud prediction models at the edge. The experimental results reveal that, compared with the non-transferred LSTM algorithm, the cloud-edge collaboration based transferring prediction method incurs a lower computational cost in the two experiments. In the first experiment, the training time of the proposed method is reduced by 25.97%, 25.76%, 35.00%, 23.73% and 26.04%, respectively. In the second experiment, the training time is reduced by 17.61%, 18.89%, 22.64%, 23.27% and 23.06%, respectively. We also analyze the applicability of the designed model to different data. The experimental results indicate that the method realizes model reuse with a smaller number of samples for both irregular and regular data. In addition, compared with the non-transferred algorithm, the proposed model has better prediction performance on each type of data in experiments with different numbers of samples.
Although our method realizes the reuse of existing models and reduces the computational cost of model training, it still has certain limitations. For instance, the similarity between the data of the two domains has a significant impact on the predictive performance of the model; when this similarity is low, the predictive performance of the method may be poor. In future research, we will first consider how to transfer knowledge between energy consumption data with environmental factors. Second, in order to better analyze the transferability between domains and further optimize the performance of the prediction model, a strategy in which a similarity metric such as maximum mean discrepancy (MMD) is computed during the training of the network itself will be studied.
Acknowledgment
This work is partly supported by the National Natural Science Foundation of China (62076150, 62003191, 61903226), the Taishan Scholar Project of Shandong Province (TSQN201812092), the Key Research and Development Program of Shandong Province (2019JZZY010115, 2019JZZY010120), the Youth Innovation Technology Project of Higher School in Shandong Province (2019KJN005) and the Natural Science Foundation of Shandong Province (ZR2020QF072).
