Abstract
Energy consumption prediction can provide reliable data support for energy scheduling and optimization in office buildings. When energy consumption patterns are complex and data sources are diverse, traditional prediction models struggle to achieve stable accuracy and robustness. To address this, this paper proposes an approach that combines a comprehensive similar day method with ensemble learning. First, the historical data are analyzed to obtain the similarity of meteorological features, the time factor and the precursor (previous-day) energy consumption. Next, the entropy weight method is used to combine these similarities into a comprehensive similar day, which is applied to model training. Then, an improved sine cosine optimization algorithm (SCA) is used to optimize the parameters of a single model. Finally, a model selection and integration approach based on dominance is proposed and compared with Support Vector Regression (SVR), Back Propagation Neural Network (BPNN) and Long Short-Term Memory (LSTM), using a large office building in Xi'an as a case study. The results show that, after the comprehensive similar day is used, the root mean square percentage error (RMSPE) of the ensemble learning model is about 0.15 lower than that of the BPNN model and about 0.05 and 0.06 lower than those of the SVR and LSTM models, respectively, and the mean absolute percentage error (MAPE) is reduced by 12.02%, 6.51% and 5.28%, respectively. Compared with several other integration methods, the dominance-based integration model reduces the absolute error at most times and achieves the lowest mean absolute error. Accordingly, the proposed approach effectively addresses the low accuracy and poor robustness of traditional models and predicts building energy consumption effectively.
Nomenclature
SVR: support vector regression
BPNN: back propagation neural network
LSTM: long short-term memory
MLR: multiple linear regression
ARIMA: autoregressive integrated moving average
PSO: particle swarm optimization
GA: genetic algorithm
GA-BPNN: GA combined with BPNN
SCA: sine cosine optimization algorithm
MAE: mean absolute error
RMSE: root mean square error
MAPE: mean absolute percentage error
CVRMSE: coefficient of variation of the RMSE
RMSPE: root mean square percentage error
SVR combined with
R2: coefficient of determination
SNR: signal-to-noise ratio
δ2: attenuation coefficient corresponding to the week
δ3: attenuation coefficient corresponding to the date type
$\varepsilon_i$: date type parameter
X: energy consumption sequence of the day before the forecast day
Y: energy consumption sequence of the day before a history day
time factor similarity
ϒ3: precursor factor similarity
R: evaluation matrix
$P_{ij}$: proportion of the ith evaluation object under the jth evaluation index
entropy of the jth index
$\omega_j$: entropy weight of the jth index
particle of the tth generation
particle with the largest fitness in the generation
μ: chaos parameter
logistic chaotic sequence
random parameters
$X_i$: meteorological feature vector of a history day
Y: meteorological feature vector of the forecast day
$\xi_i$: hourly grey correlation degree of Y and $X_i$
ρ: resolution coefficient of grey relational analysis
hourly similarity of meteorological features
δ1: attenuation coefficient corresponding to the date
number of iterations
$g_{best}$: optimal search particle of the current generation
Gaussian(μ, σ): Gaussian distribution
$\bar{D}$: average dominance matrix
$e_i$: absolute error of the ith model
$e_j$: absolute error of the jth model
$d_{ij}$: dominance degree
R: dominance degree of the models
weight of the ith prediction model
$f_i(t)$: dominance record of model i at time t
total advantage of model i
prediction value of the ensemble learning model
C: penalty coefficient of SVR
gamma: kernel parameter of SVR
$y_i$: test data
forecast value
Introduction
Nowadays, the increasing demand for building energy has attracted growing attention because of energy depletion and global warming caused by greenhouse gases. Buildings account for a large share of world energy use, about 40% of global energy consumption [1], and this proportion has been rising gradually in recent years. Given the increasing global energy demand, energy demand management has been investigated increasingly by researchers. Energy consumption prediction is one of the crucial components of energy management, supporting optimal planning and decision-making for energy companies. Accurate energy prediction can also give managers a means to conduct market research and management and to promote economic development [2]. Energy consumption prediction in large-scale buildings can be achieved by analyzing each influencing factor, which helps guide the reduction of building energy consumption and carbon dioxide emissions [3].
Literature review
At present, the commonly used modeling methods in building energy consumption prediction include statistical models, data-driven models and hybrid models [4]. Statistical models are a traditional and effective approach and usually include Multiple Linear Regression (MLR) and the Autoregressive Integrated Moving Average model (ARIMA). Unlike many machine learning algorithms, MLR gives an interpretable relationship between inputs and outputs. Amber [5] used MLR to obtain an equation for forecasting the electricity consumption of buildings at London South Bank University, and the MAPE of the two office buildings was below 10% in both cases. However, this accuracy is not sufficient: building energy consumption and its influencing factors are highly nonlinear, and MLR performs worse than data-driven models on nonlinear systems [6]. The ARIMA model has also been used to forecast India's energy demand [7]. With the rapid development of machine learning algorithms, data-driven technology has become an important means of managing and evaluating buildings, as well as a modeling tool for data analysis and energy consumption prediction. Such methods establish the nonlinear input-output relationship directly from historical data, without modeling the internal mapping [8]. Therefore, more researchers tend to use data-driven methods for forecasting. The most commonly used data-driven methods include support vector regression (SVR) [9], the BP neural network (BPNN) [10] and the long short-term memory neural network (LSTM) [11]. Compared with artificial neural networks (ANN), SVR more easily obtains the global optimal solution [12], and it has high accuracy and good performance on regression problems with small data sets [13]. Joaquim Massana proposed a simple and computationally light SVR model and applied it to non-residential buildings at the University of Girona [15].
However, SVR is highly sensitive to small changes in its parameters, and that study did not obtain the optimal SVR parameters, even though parameter determination has a decisive influence on the prediction accuracy of SVR [14]. BPNN is flexible and adaptable and can handle most prediction problems, but it overfits easily, and the risk of overfitting increases with model complexity. BPNN is also prone to other problems, such as falling into local optima and prediction accuracy that cannot meet engineering requirements. Hence, hybrid models have been proposed, in which an optimization algorithm such as Particle Swarm Optimization (PSO) [16] or the Genetic Algorithm (GA) [17] tunes the model parameters to improve prediction accuracy. Kangji Li [18] proposed an improved PSO to adjust the weights and thresholds of BPNN, and the optimized network was more effective than the traditional network and GA-BPNN. LSTM is a deep learning model suited to nonlinear regression and time series problems, and it has achieved good results in both text processing and speech recognition [19]. In energy consumption forecasting, data usually exhibit strong periodic characteristics, that is, fluctuations or oscillations of a time series around a long-term trend, which cannot be ignored. This is not a trend in a single direction but a pattern that fluctuates alternately over time [20]. Some researchers have also applied LSTM to building energy consumption prediction, combining it with the differential evolution algorithm to find suitable hyperparameters [21]. Data-driven methods each have merits and demerits. Traditional energy consumption prediction models can suffer from convergence problems and low accuracy when dealing with multiple data sources.
It is also difficult for traditional models to fully reflect and mine the information and relationships among the data, which easily limits their accuracy [22]. The accuracy of a single prediction model is unstable, sometimes high and sometimes low, owing to the different characteristics of individual methods and the randomness and uncertainty of some factors affecting the system [23]. However, most researchers focus on studying and improving a single prediction model, and how to choose an applicable machine learning model has not been settled; many researchers simply adopt the models used by their predecessors or try to improve a model's performance. Even a small change in the training data can produce a model with different generalization ability, so predictions on unknown data may have large errors, which leads to poor robustness and cannot guarantee the reliability of the model [24]. Because of these limitations, the problems of prediction accuracy and robustness have not been well resolved when data sources are diverse and energy consumption patterns are complex.
Based on an analysis of the development of machine learning technology and building energy consumption prediction, this paper first proposes a method called comprehensive similarity, which analyzes the similarity between the forecast day and history days, and then presents an integrated prediction approach based on model dominance, in which grid search and an improved SCA are used to optimize the single models. Finally, experiments compare the forecasting performance of several models and discuss their robustness. The main contributions of this research are as follows: 1) The comprehensive similar day, obtained by combining the similarity measures with the entropy weight method, picks out the historical data most similar to the prediction day for training the prediction model, which effectively increases prediction accuracy. 2) The improved SCA algorithm optimizes the parameters of LSTM, avoiding a complicated tuning process. Among the single models, SVR has high accuracy, but it cannot accurately capture the periodicity and time characteristics of energy consumption data, and LSTM can make up for this deficiency. 3) A model selection and integration method is proposed, which avoids blind model selection and exploits more connections among the data to achieve stable accuracy and robustness.
Selection of similar days based on entropy
Hourly similarity of meteorological features
Both history and forecast days have hourly meteorological data, including outdoor dry bulb temperature, outdoor air relative humidity, wind speed and solar radiation, and meteorological similarity strongly affects the similarity between a history day and the forecast day. Let $X_i$ ($i = 1, \dots, n$) be the meteorological feature vector of the ith history day, $X_i = (x_i(1), x_i(2), \dots, x_i(l))$, where $x_i(1), x_i(2), \dots, x_i(l)$ are the weather characteristics of that history day, and let $Y = (y(1), y(2), \dots, y(l))$ be the meteorological feature vector of the forecast day. The similarity between $X_i$ and $Y$ is calculated with the grey relational degree, defined as in Equation (1) [25].
where $\xi_i$ is the hourly grey correlation degree between $Y$ and $X_i$, namely the grey correlation degree of the ith meteorological feature, and ρ is the resolution coefficient with a value in [0, 1]; here ρ = 0.5. In this way, the grey correlation degree of each meteorological feature between the history day and the forecast day is obtained, and together these give the grey correlation degree of all meteorological features of the history day (Equation (2)).
The time factor reflects the "distance" between the history day and the forecast day in the time dimension, and it follows a "near-large, far-small" relationship: the shorter the interval between the history day and the forecast day, the more similar the energy consumption changes of the two days. Moreover, because energy consumption differs greatly between weekdays and weekends, the date type should be distinguished. The time factor similarity is computed using Equation (3) [26].
where δ1, δ2 and δ3 are the attenuation coefficients corresponding to the date, the week and the date type, respectively; their values generally lie in [0.9, 0.98], and δ1 = δ2 = δ3 = 0.94 in this paper. ti is the time interval (in days) between the forecast day and the ith history day. If the forecast day and the ith history day belong to the same date type (both weekdays or both weekend days), then ɛi = 0; otherwise ɛi = 1. int() and mod() denote the rounding (integer part) function and the remainder function, respectively.
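Equation (3) is not reproduced here either. The sketch below assumes a commonly used multiplicative form, $\delta_1^{\mathrm{mod}(t_i, 7)} \, \delta_2^{\mathrm{int}(t_i / 7)} \, \delta_3^{\varepsilon_i}$, which is consistent with the 7-day period of the time factor similarity reported in the case study; the exact expression in the paper may differ, and the function name is hypothetical.

```python
def time_factor_similarity(t_i, same_date_type, delta1=0.94, delta2=0.94, delta3=0.94):
    """Assumed multiplicative time-factor similarity for a history day that lies
    t_i days before the forecast day (hypothetical form, see lead-in)."""
    eps = 0 if same_date_type else 1   # date-type parameter epsilon_i
    return (delta1 ** (t_i % 7)        # decay with the day offset inside a week
            * delta2 ** (t_i // 7)     # decay with the number of whole weeks
            * delta3 ** eps)           # penalty for a different date type
```

Under this form, a history day exactly one week earlier scores 0.94, while one six days earlier scores about 0.69, which reproduces the weekly periodicity and the slow decay over several weeks.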
The energy consumption of the forecast day is unknown, so the similarity of the load trends of the history day and the forecast day cannot be calculated directly. However, the consumption on the days preceding both the forecast day and the history day is known, so the similarity between the energy consumption trends of the day before the forecast day and the day before the history day can be calculated instead. This similarity can be described by the Pearson correlation coefficient, defined as the covariance of two variables divided by the product of their standard deviations, as in Equation (4).
where X is the energy consumption sequence of the day before the forecast day and Y is the energy consumption sequence of the day before a history day. ϒ3 lies in [-1, 1], and the closer ϒ3 is to 1, the more similar the precursor energy consumption of the forecast day and the history day.
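The precursor similarity is the ordinary Pearson correlation, so it can be sketched without dependencies. `pearson` is an illustrative name, and the two inputs are assumed to be equal-length hourly consumption sequences of the day before the forecast day and the day before a history day; the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; the 1/n factors of the
    covariance and standard deviations cancel, so plain sums are used."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical hourly consumption on the day before the forecast day and
# on the day before a history day.
gamma3 = pearson([820, 960, 1010, 990, 870], [800, 940, 1030, 980, 850])
```

Because the coefficient depends only on the shape of the two curves, two days with similar load profiles but different absolute levels still score close to 1.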
Entropy is a thermodynamic concept that describes the disorder of a system. Information entropy borrows this concept to quantify information: the more probable an event, the less information it carries, and the less probable, the more information it carries. Probability can therefore be used to measure the amount of information contained in an event. Suppose there are m evaluation indexes and n evaluation objects, evaluated by the matrix $R = [r_{ij}]_{n \times m}$; the proportion of the ith evaluation object under the jth evaluation index is calculated as [27]:
A probability lies between 0 and 1, while the amount of information lies between 0 and infinity, so $-\ln P_{ij}$ can be used to express the relationship between probability and information. The entropy of the jth index is defined as:
When $P_{ij} = 0$, $P_{ij} \ln P_{ij}$ is taken as 0.
After normalization, the entropy weight of the jth index is:
$\omega_j$ represents the amount of information carried by the jth index, so information entropy can be used to evaluate information to a certain degree. This objective weighting method is easier to apply than subjective methods and avoids the susceptibility of empirical methods to subjective factors. The entropy weight method weights the three similarities, which are calculated from different perspectives, to obtain a more comprehensive similarity.
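A minimal NumPy sketch of the entropy weight method follows, assuming the standard formulation $P_{ij} = r_{ij} / \sum_i r_{ij}$, $E_j = -\frac{1}{\ln n} \sum_i P_{ij} \ln P_{ij}$ and $\omega_j = (1 - E_j) / \sum_k (1 - E_k)$. Each row of R is a history day and each column one of the three similarities. Two further assumptions: the similarities are non-negative (a negative Pearson value would need to be shifted or normalized first), and the comprehensive similarity is the weighted sum $R\omega$. All names and values are illustrative.

```python
import numpy as np


def entropy_weights(R):
    """Entropy weights of the m columns (indexes) of an n x m matrix R.
    Entries must be non-negative and no column may sum to zero."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    P = R / R.sum(axis=0)                            # P_ij: share of object i under index j
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # convention: 0 * ln 0 = 0
    E = -plogp.sum(axis=0) / np.log(n)               # entropy of each index, in [0, 1]
    d = 1.0 - E                                      # divergence: informative indexes score high
    return d / d.sum()                               # normalized entropy weights


# Rows: history days; columns: meteorological, time-factor and precursor similarity.
R = np.array([[0.9, 0.9, 0.8],
              [0.9, 0.5, 0.7],
              [0.9, 0.1, 0.2]])
w = entropy_weights(R)
comprehensive = R @ w                                # assumed weighted-sum combination
```

The first column is identical for every day, so it carries no information and receives a weight of essentially zero; the ranking of days is then driven by the two columns that actually discriminate between them.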
BPNN
The ANN is currently one of the most widely used data-driven modeling methods [28]. BPNN is a feed-forward ANN: signals propagate in the forward direction, and training uses the error back propagation algorithm. In building energy consumption prediction, the mapping between meteorological feature data and building energy consumption is complex and nonlinear, which makes an accurate mathematical model difficult to establish, so BPNN is selected as a single prediction model to describe this nonlinear relationship. Training a BPNN involves three main processes: forward transfer, error back propagation and network parameter update. This paper adopts BPNN as one of the base models of the ensemble model.
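The three training processes can be illustrated with a one-hidden-layer network in NumPy. This is a sketch, not the paper's MATLAB configuration: the tanh activation, layer size, learning rate, epoch count, full-batch gradient descent on the mean squared error and the toy data are all illustrative choices.

```python
import numpy as np


def train_bpnn(X, y, hidden=8, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer BP network trained by full-batch gradient descent on the
    mean squared error. Returns the parameters and the loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    y = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # 1) Forward transfer: input -> tanh hidden layer -> linear output.
        H = np.tanh(X @ W1 + b1)
        out = H @ W2 + b2
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # 2) Error back propagation: gradients of the MSE via the chain rule.
        g_out = 2.0 * err / n
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        g_hidden = (g_out @ W2.T) * (1.0 - H ** 2)   # tanh'(z) = 1 - tanh(z)^2
        gW1 = X.T @ g_hidden
        gb1 = g_hidden.sum(axis=0)
        # 3) Network parameter update (plain gradient descent).
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


def predict_bpnn(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


# Toy regression data standing in for normalized weather/energy features.
rng = np.random.default_rng(1)
X_toy = rng.uniform(-1.0, 1.0, (64, 2))
y_toy = np.sin(X_toy[:, 0]) + 0.5 * X_toy[:, 1]
params, losses = train_bpnn(X_toy, y_toy)
```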
Support vector regression
Support Vector Machine (SVM) is a reliable and effective method for nonlinear problems; its advantage is that accurate results can be obtained even when the amount of data is limited. SVM can solve not only classification problems but also regression problems, in which case it is called SVR. The kernel function of SVM maps the data into a higher-dimensional space. Although SVR performs well on regression problems, its hyperparameters are complicated to determine, and inaccurate parameters seriously affect prediction accuracy. Common ways to determine SVR parameters include enumeration, optimization algorithms and grid search. Enumeration is cumbersome and inaccurate. Common optimization algorithms can search for the global optimum or its vicinity, but they may still fall into a local optimum. Grid search divides the parameters to be solved into a grid within a certain range and finds the best parameters by traversing all combinations in the grid. Because SVR is fast and efficient to compute but sensitive to its parameters, this paper adopts grid search to determine its optimal hyperparameters.
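The grid search procedure can be sketched as an exhaustive loop over (C, gamma) pairs that keeps the pair with the lowest validation MAPE. To stay dependency-light, the sketch substitutes an RBF kernel ridge regressor (regularization 1/C) for SVR; this is a stand-in, not SVR itself, and only the search loop is the point. With scikit-learn, the same search would typically wrap `sklearn.svm.SVR(kernel="rbf")` in `sklearn.model_selection.GridSearchCV`. The function names, grids and synthetic data below are hypothetical.

```python
import itertools

import numpy as np


def rbf_kernel(A, B, gamma):
    """exp(-gamma * ||a - b||^2) for every pair of rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def fit_predict_stand_in(X_tr, y_tr, X_va, C, gamma):
    """Stand-in regressor (RBF kernel ridge with regularization 1/C), NOT SVR."""
    K = rbf_kernel(X_tr, X_tr, gamma)
    alpha = np.linalg.solve(K + np.eye(len(X_tr)) / C, y_tr)
    return rbf_kernel(X_va, X_tr, gamma) @ alpha


def mape(y, p):
    return float(np.mean(np.abs((y - p) / y)) * 100.0)


def grid_search(X_tr, y_tr, X_va, y_va, C_grid, gamma_grid):
    """Traverse every (C, gamma) pair; keep the one with the lowest validation MAPE."""
    best = (None, None, np.inf)
    for C, gamma in itertools.product(C_grid, gamma_grid):
        score = mape(y_va, fit_predict_stand_in(X_tr, y_tr, X_va, C, gamma))
        if score < best[2]:
            best = (C, gamma, score)
    return best


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (80, 1))
y = 10.0 + np.sin(6.0 * X[:, 0])          # strictly positive, so MAPE is defined
C_grid = [0.1, 1.0, 10.0, 100.0]
gamma_grid = [0.1, 1.0, 10.0, 30.0]
best_C, best_gamma, best_mape = grid_search(X[:60], y[:60], X[60:], y[60:], C_grid, gamma_grid)
```

The cost grows with the product of the grid sizes, which is acceptable here because each fit is cheap; this matches the text's reasoning for using grid search with the fast-to-train SVR.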
Long short-term memory network
In addition to the complex nonlinear relationship between consumption and various influencing factors, energy consumption is also interdependent over time. Its trend does not change in a single direction but has a strong periodicity, and LSTM can capture this relationship in the time dimension. LSTM is a deep learning method designed to solve the vanishing and exploding gradient problems of recurrent neural networks, and it is suitable for processing data with time series characteristics. It maps the energy consumption time series into a series of hidden states to learn the internal temporal dynamics. LSTM can be regarded as a chain of copies of the same network at different time steps. The network contains three gating units, the input gate, the forget gate and the output gate, which learn when to forget previous information and how to update it with new data [29]. The gates in the hidden layer retain relevant information and forget irrelevant information, thereby maintaining a constant error flow. LSTM has strong generalization ability, learns well from both large and small data volumes, and is advantageous for nonlinear problems with time characteristics.
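The gating described above can be sketched as a single LSTM time step in NumPy. The stacked weight layout (input, forget, candidate, output), the function names and the example sizes are assumptions for illustration; in practice the weights are learned by back propagation through time, which is omitted here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W: (4H, H + D), b: (4H,); gate rows are stacked as
    [input, forget, candidate, output]."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[:H])              # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])         # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])     # candidate values for the cell state
    o = sigmoid(z[3 * H:])          # output gate: how much of the cell to expose
    c = f * c_prev + i * g          # additive update keeps a direct error path
    h = o * np.tanh(c)
    return h, c


def lstm_last_hidden(xs, W, b, hidden):
    """Run the cell over a sequence and return the final hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
    return h


# Example sizes: 6 input features per hour, 11 hidden units, 24 time steps.
rng = np.random.default_rng(0)
D, H = 6, 11
W = rng.normal(0.0, 0.1, (4 * H, H + D))
b = np.zeros(4 * H)
h_last = lstm_last_hidden(rng.normal(size=(24, D)), W, b, H)
```

The additive cell update `c = f * c_prev + i * g` is what lets gradients pass through many time steps without vanishing, which is the property the text refers to as a constant error flow.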
Heterogeneity and optimization of model
Sine cosine optimization algorithm and its improvement
The optimization of neural network weights and thresholds and the determination of LSTM hyperparameters greatly affect the prediction accuracy of the network. SCA [30] is an optimization algorithm proposed by the Australian scholar Mirjalili in 2016, whose core idea is to search for the global optimum by alternately using sine and cosine functions to update the particles. In this paper, it is used to optimize the learning rate of LSTM and the number of neurons in the hidden layer to obtain the parameters with the best fitness. The core particle update formula is as follows:
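In the formula, $X_i^t$ is the position of particle i in generation t, $P^t$ is the particle with the largest fitness in that generation, r1 is the control parameter discussed below, and r2, r3 and r4 are random parameters.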
In the standard SCA, the particles are initialized randomly, and an uneven particle distribution affects the optimization result. Chaos is a non-periodic motion phenomenon in nonlinear systems, and chaotic sequences have the characteristics of randomness, ergodicity and regularity [31], which make them suitable for algorithm initialization. Therefore, a logistic chaotic sequence is used to generate the initial particles, so that the ergodicity of chaos increases particle diversity without changing the random characteristics of initialization. The logistic map is expressed as follows:
where μ ∈ [0, 4] is the chaos parameter; the larger its value, the stronger the chaotic behavior, and μ = 4 in this study.
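The chaotic initialization can be sketched by mapping a logistic sequence with μ = 4 linearly onto the search bounds. The starting value x0 = 0.37, the function names and the example bounds (learning rate and hidden-neuron count, the latter to be rounded to an integer before use) are hypothetical choices; x0 must avoid points such as 0, 0.25, 0.5 and 0.75, whose orbits quickly become fixed.

```python
def logistic_sequence(x0, length, mu=4.0):
    """Iterate the logistic map x_{k+1} = mu * x_k * (1 - x_k) from x0 in (0, 1)."""
    seq = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        seq.append(x)
    return seq


def chaotic_init(n_particles, lb, ub, x0=0.37, mu=4.0):
    """Map consecutive chaotic values onto each dimension's [lb, ub] range."""
    dim = len(lb)
    chaos = logistic_sequence(x0, n_particles * dim, mu)
    return [[lb[d] + chaos[p * dim + d] * (ub[d] - lb[d]) for d in range(dim)]
            for p in range(n_particles)]


# Hypothetical bounds: learning rate in [0.001, 0.5], hidden neurons in [1, 50].
population = chaotic_init(10, [0.001, 1.0], [0.5, 50.0])
```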
The SCA algorithm is simple in principle and easy to implement, with fast convergence and high optimization accuracy. However, changes in its parameters strongly influence optimization accuracy. In SCA, the parameter r1 controls the balance between exploration and exploitation: when r1 > 1, the particles are driven away from the optimal solution to strengthen global search, and when r1 < 1, they are driven toward the optimal solution to strengthen local exploitation. In the standard SCA, r1 decreases linearly with the number of iterations, which does not balance global and local search well [32]. Most scholars improve the r1 curve with a simple concave or convex function, which only strengthens either the local or the global search ability. To better balance global and local search, this paper combines a concave function and a convex function, controlled by ω1 and ω2 respectively, so that r1 changes dynamically with the number of iterations, as in Equation (10):
In optimization problems, search particles often fall into local optima, especially in the standard algorithm, so that the final result is not the global optimum [33]. In this paper, a Gaussian disturbance is introduced into the optimal particle of each generation; if the perturbed particle has a better fitness value, it replaces the optimal particle, as in Equation (11):
where $g_{best}$ is the optimal search particle of the current generation and Gaussian(μ, σ) is a Gaussian distribution with μ = 0 and σ = 1.
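A compact sketch of the sine cosine search with a Gaussian disturbance of the best particle is shown below. It assumes the standard SCA update of Mirjalili (2016), $X^{t+1} = X^t + r_1 \sin(r_2) |r_3 P^t - X^t|$ if $r_4 < 0.5$ and the cosine counterpart otherwise, with r2 ∈ [0, 2π], r3 ∈ [0, 2] and r4 ∈ [0, 1]. Because Equations (10) and (11) are not reproduced here, it keeps the standard linear decrease of r1, uses plain random initialization, and assumes the disturbance $g' = g_{best}(1 + \mathrm{Gaussian}(0, 1))$. Lower objective values are treated as better fitness, as for a validation error; the names and the test function are illustrative.

```python
import math
import random


def sca_minimize(f, lb, ub, n_particles=20, t_max=200, a=2.0, seed=0):
    """Sine cosine search with a Gaussian disturbance of the best particle.
    Minimizes f over the box [lb, ub]; returns (best position, best value)."""
    rnd = random.Random(seed)
    dim = len(lb)

    def clip(v, d):
        return min(max(v, lb[d]), ub[d])   # keep coordinates inside the box

    X = [[rnd.uniform(lb[d], ub[d]) for d in range(dim)] for _ in range(n_particles)]
    best = list(min(X, key=f))
    for t in range(t_max):
        r1 = a - t * a / t_max             # standard SCA: linear decrease of r1
        for x in X:
            for d in range(dim):
                r2 = rnd.uniform(0.0, 2.0 * math.pi)
                r3 = rnd.uniform(0.0, 2.0)
                r4 = rnd.random()
                trig = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                x[d] = clip(x[d] + r1 * trig * abs(r3 * best[d] - x[d]), d)
            if f(x) < f(best):
                best = list(x)
        # Gaussian disturbance of this generation's best particle (assumed form).
        cand = [clip(v * (1.0 + rnd.gauss(0.0, 1.0)), d) for d, v in enumerate(best)]
        if f(cand) < f(best):              # keep it only if the fitness improves
            best = cand
    return best, f(best)


def sphere(v):
    return sum(c * c for c in v)


best, value = sca_minimize(sphere, [-5.0] * 3, [5.0] * 3)
```

To tune LSTM, `f` would train the network with the candidate learning rate and rounded neuron count and return a validation error; that training step is omitted here.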
Ensemble learning models are usually divided into two categories: homogeneous (isomorphic) and heterogeneous ensemble models. A homogeneous ensemble splits and recombines the original data into several subsets, which are used as inputs to the same learning model with different parameters, whereas a heterogeneous ensemble trains several different models on the same training data. By exploiting the complementarity of different models, a heterogeneous ensemble combines their merits to improve performance and robustness. In this paper, SVR, BPNN and LSTM are selected as the base models of the ensemble. Assume that n prediction models predict the building energy consumption at time t in the validation set as $p_1(t), p_2(t), p_3(t), \dots, p_n(t)$; the absolute error of each model is then $e(t) = |y(t) - p(t)|$. Let the absolute errors of the ith and jth models be $e_i(t)$ and $e_j(t)$. The dominance degree of model $M_i$ over model $M_j$ is defined as $d_{ij}(t) = e_j(t)/e_i(t)$, which gives the dominance matrix of the models in each hour, $D(t) = [d_{ij}(t)]_{n \times n}$ (Equation (12)).
The diagonal elements are $d_{ii} = 1$, since each compares a model with itself, and $d_{ij} \times d_{ji} = 1$. The average dominance matrix $\bar{D}$ over the whole prediction period is given by Equation (13), and the dominance of the ith model over the other models is $d_{i1}, d_{i2}, d_{i3}, \dots, d_{in}$.
The dominance matrix R (Equation (14)) represents the overall dominance of each model, and the model with the poorest dominance is removed to ensure the overall accuracy of the ensemble model. At each time t, the optimal model i is the one whose row contains the maximum element of $D(t)$; then $f_i(t) = 1$ in $f(t) = [f_1(t), f_2(t), \dots, f_i(t), \dots, f_n(t)]$, and the other elements are 0. The procedure is summarized as follows:
Step 1: Use n heterogeneous models to predict the building power consumption at time t in the validation set. The predicted values are $p_1(t), p_2(t), p_3(t), \dots, p_n(t)$, and the actual value is $y(t)$.
Step 2: Calculate the absolute error of model i, $e_i(t) = |y(t) - p_i(t)|$.
Step 3: Define the dominance between two heterogeneous models as $d_{ij}(t) = e_j(t)/e_i(t)$; the dominance matrix is then $D(t)$ in (12).
Step 4: Calculate the average dominance matrix $\bar{D}$ using (13) and the matrix R using (14).
Step 5: Remove the model with the minimum value in the matrix R.
Step 6: Search for the maximum element $d_{ij}$ in the matrix $D(t)$; the corresponding model i is the optimal model at time t.
Step 7: Record the optimal model at time t in $f(t) = [f_1(t), f_2(t), \dots, f_i(t), \dots, f_n(t)]$.
Step 8: The total number of each model selected as the optimal model is
Step 9: The prediction weight of model i is
Step 10: The prediction value of the heterogeneous ensemble model is
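The ten steps can be sketched in NumPy as follows. Because Equations (12) to (14) and the formulas of Steps 8 to 10 are not reproduced here, the sketch makes these assumptions: the average dominance matrix is the time mean of D(t); $R_i$ is the mean dominance of model i over the other models (the off-diagonal row mean of that average); the weights are $F_i / \sum_k F_k$ with $F_i = \sum_t f_i(t)$; and the ensemble prediction is the weighted sum of the retained models. A tiny epsilon guards against zero errors. The function name and the example numbers are hypothetical.

```python
import numpy as np


def dominance_ensemble(y_val, P_val, P_new, eps=1e-12):
    """Dominance-based model selection and weighting.

    y_val: (T,) actual values on the validation set.
    P_val: (T, n) validation predictions of the n base models.
    P_new: (T2, n) base-model predictions to be combined.
    Returns (ensemble prediction, {model index: weight}, index of removed model).
    """
    e = np.abs(y_val[:, None] - P_val) + eps         # Step 2: e_i(t); eps avoids 0 / 0
    D = e[:, None, :] / e[:, :, None]                # Step 3: D[t, i, j] = e_j(t) / e_i(t)
    D_bar = D.mean(axis=0)                           # Step 4: time mean (assumed Eq. 13)
    n = P_val.shape[1]
    R = (D_bar.sum(axis=1) - 1.0) / (n - 1)          # assumed R_i: mean off-diagonal dominance
    removed = int(np.argmin(R))                      # Step 5: drop the least dominant model
    keep = [i for i in range(n) if i != removed]
    D_keep = D[:, keep][:, :, keep]
    # Steps 6-7: the optimal model at time t owns the row with the largest d_ij(t).
    best_row = D_keep.max(axis=2).argmax(axis=1)
    F = np.bincount(best_row, minlength=len(keep))   # Step 8: times chosen as optimal
    w = F / F.sum()                                  # Step 9: assumed weights F_i / sum(F)
    y_hat = P_new[:, keep] @ w                       # Step 10: weighted combination
    return y_hat, dict(zip(keep, w.tolist())), removed


y_val = np.full(4, 10.0)
P_val = np.array([[9.9, 9.5, 8.0],     # columns: hypothetical models 0, 1 and 2
                  [9.9, 9.95, 8.0],
                  [9.8, 9.6, 7.0],
                  [9.8, 9.6, 7.0]])
y_hat, weights, removed = dominance_ensemble(y_val, P_val, np.array([[10.0, 12.0, 0.0]]))
```

In this example model 2 has by far the largest errors and is removed; model 0 is optimal at three of the four hours and model 1 at one, giving weights of 0.75 and 0.25.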
Project introduction
The case study is an office building in Xi'an, a city in western China. The building has 44 floors above ground and 3 floors underground, with a total floor area of 300,000 m² and a land area of 30,000 m². The walls are concrete shear walls, and the building shape coefficient is 0.8. The building is equipped with environmental monitoring sensors and an energy consumption acquisition system, which collect outdoor temperature, humidity and energy consumption data in real time.
Data sources
Energy consumption data were collected from June 1 to June 30, 2020, including hourly air conditioning, motive power, special power and lighting and socket power consumption (Fig. 2). The data show significant time series characteristics. Lighting and sockets account for the largest proportion of energy use in the office building, while the other categories account for small proportions.

Flow chart of the method.

Energy consumption acquisition diagram.
Meteorological data include outdoor dry bulb temperature, relative humidity, wind speed and solar radiation intensity, collected hourly from 8:00 to 22:00 every day. Figure 3 shows the outdoor weather data from June 2 to June 8: the outdoor temperature ranged from 20.8°C to 34.6°C and followed a relatively uniform summer pattern, whereas relative humidity, wind speed and solar radiation intensity are affected by weather conditions and other uncertain factors. For example, on June 5 solar radiation decreased and relative humidity increased because of cloud cover and rainfall. For office building energy consumption prediction, the outdoor air temperature, relative humidity, solar radiation intensity and wind speed at time T, together with the historical energy consumption at times T-1 and T-2, are usually selected as model inputs.
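The input layout described above can be sketched as a sliding construction of training samples. The function name, argument order and sample values are hypothetical, and the sketch treats the hourly records as one contiguous series; since data are recorded only from 8:00 to 22:00, lags that cross a day boundary would need separate handling in practice.

```python
def build_samples(temperature, humidity, solar, wind, energy):
    """Pair the weather at hour T and the energy at T-1 and T-2 with the
    target energy at T. All arguments are equal-length hourly sequences."""
    X, y = [], []
    for T in range(2, len(energy)):
        X.append([temperature[T], humidity[T], solar[T], wind[T],
                  energy[T - 1], energy[T - 2]])
        y.append(energy[T])
    return X, y


X, y = build_samples([30, 31, 32, 33], [50, 48, 47, 45], [300, 420, 510, 560],
                     [1.2, 1.5, 1.1, 0.9], [800, 950, 1010, 1100])
```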

Diagram of meteorological data box.
Selection of similar day
According to the similar day theory in the third section, the hourly meteorological similarity, time factor similarity and energy precursor similarity of the history days are calculated. The forecast day is July 1, 2020; from the history days in the preceding month, those in the top 50% of comprehensive similarity are selected to train the prediction model. Figure 4 displays the three similarities, and Table 1 lists the comprehensive similarity of the past 30 history days. The results show that similar days exhibit regularity and periodicity: the precursor similarity decreases rapidly when the time interval exceeds 25 days, and the time factor similarity varies with a period of 7 days and decreases slowly as the number of days increases. Overall, the shorter the interval between the history day and the forecast day, the larger the comprehensive similarity.
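Selecting the training days then reduces to ranking the history days by comprehensive similarity and keeping the top half. A minimal sketch; the dictionary layout and the dates and scores are illustrative, and rounding the half down for an odd number of days is an assumption.

```python
def select_top_half(similarity):
    """Return the history days in the top 50% of comprehensive similarity,
    most similar first; for an odd count the half is rounded down."""
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    return ranked[: len(ranked) // 2]


chosen = select_top_half({"2020-06-30": 0.91, "2020-06-29": 0.85,
                          "2020-06-02": 0.40, "2020-06-01": 0.35})
```

With the 30 history days of June, this keeps the 15 most similar days.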

The scatterplot of three similarity results of history day.
Comprehensive similarity
The data set is divided into training, validation and test data. The 30 days of hourly weather and energy consumption data in June are used as training and validation data, and the data from July 1 to July 4 are used to test and analyze the models. The models are implemented in MATLAB R2016a on 64-bit Windows 10. The base models are BPNN, SVR and LSTM. For BPNN, the main parameters affecting model performance are the weights w and biases b. In this research, SVR uses the radial basis kernel function, and the penalty coefficient C and the kernel parameter gamma have an important influence on its accuracy: the former sets the error constraint, and the latter determines the data distribution after mapping to the new feature space. Figure 5 illustrates the relationship between C, gamma and SVR accuracy. With MAPE as the accuracy index, the relationship among the three quantities has the shape of a paraboloid. When C and gamma increase in the same proportion, SVR accuracy remains relatively high and changes little, whereas when only one of them changes, accuracy decreases; this can guide empirical parameter tuning. The SVR hyperparameters are determined by grid search, giving C = 10.86 and gamma = 31.45, and the SCA parameters are set to μ = 4, α = 2, p = 2 and Tmax = 500. Figure 6 shows the LSTM parameter optimization: the green curve represents the particle fitness, which decreases with the number of iterations and converges at the 21st generation, while the blue and red curves represent the number of hidden-layer neurons and the learning rate, respectively, which fluctuate during optimization and finally converge.
After optimization, the LSTM model has a learning rate of 0.2045 and 11 hidden neurons.

SVR parameter selection diagram.

LSTM parameter optimization diagram.
Following the theoretical analysis above, the dominance of SVR, BPNN and LSTM is calculated first, and the hourly dominance matrices are constructed to obtain the average dominance matrix $\bar{D}$ and the dominance matrix R of the models:
According to the matrices, SVR has the greatest dominance, followed by LSTM, and BPNN has the least, so BPNN is removed from the ensemble learning model. After the models are integrated in several different ways, their prediction results are compared in terms of relative error in Fig. 7. The least squares integrated model had the minimum error at a few individual hours, but it had the largest variance and overall error. The other methods had roughly similar error profiles, and among them the dominance integrated model was superior most of the time. The MAE values of the reciprocal-error, equal-weight, least squares and dominance integrated models are 421.497, 410.689, 513.892 and 330.508, respectively, so the dominance integrated model outperforms the other combinations. In short, the ensemble methods improve prediction accuracy to some extent, and the dominance integration method performs best.

Comparison diagram of different integration methods.
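The three comparison schemes in Fig. 7 are not defined in this section. The sketch below uses common textbook forms, which are assumptions: equal weights, weights proportional to the reciprocal of each model's error (taken here as a validation MAE), and unconstrained least-squares weights fitted to validation data. It handles two base models, matching the SVR and LSTM pair kept after BPNN is removed; the dominance-based weighting itself is not reproduced, and the data are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def combine(preds_a, preds_b, w_a, w_b):
    """Weighted combination of two prediction series."""
    return [w_a * a + w_b * b for a, b in zip(preds_a, preds_b)]


def equal_weights():
    return 0.5, 0.5


def reciprocal_error_weights(err_a, err_b):
    """Weights proportional to 1/error, normalized to sum to 1."""
    inv_a, inv_b = 1.0 / err_a, 1.0 / err_b
    return inv_a / (inv_a + inv_b), inv_b / (inv_a + inv_b)


def least_squares_weights(preds_a, preds_b, y):
    """Solve the 2x2 normal equations for min ||y - w_a*a - w_b*b||^2 (no intercept)."""
    aa = sum(a * a for a in preds_a)
    bb = sum(b * b for b in preds_b)
    ab = sum(a * b for a, b in zip(preds_a, preds_b))
    ay = sum(a * t for a, t in zip(preds_a, y))
    by = sum(b * t for b, t in zip(preds_b, y))
    det = aa * bb - ab * ab
    if det == 0:
        raise ValueError("predictions are collinear; least-squares weights are not unique")
    return (ay * bb - by * ab) / det, (by * aa - ay * ab) / det


# Hypothetical validation data (not from the paper).
y_val = [1.4, 1.6, 3.4, 3.6]
svr_val = [1.0, 2.0, 3.0, 4.0]
lstm_val = [2.0, 1.0, 4.0, 3.0]

w_eq = equal_weights()
w_rec = reciprocal_error_weights(mae(y_val, svr_val), mae(y_val, lstm_val))
w_ls = least_squares_weights(svr_val, lstm_val, y_val)
print(w_eq, w_rec, w_ls)
```

Least-squares weights fit the validation data most closely but are unconstrained, which is one plausible reason for the larger variance reported for that method on unseen hours.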
In general, several common evaluation indicators can be selected to evaluate the model and ensure its reliability, including MAE, RMSE, MAPE, CVRMSE and RMSPE. MAE is based on the absolute error and directly represents the average difference between predicted and actual values. Unlike the mean error, MAE prevents positive and negative errors from cancelling each other, and thus accurately reflects the magnitude of the actual forecast error. RMSE is the square root of the mean of the squared deviations between the predicted and actual values over the n observations. RMSE is more sensitive to large errors and describes the dispersion of the predicted values; that is, a large maximum deviation inflates the RMSE. MAPE differs from MAE in that each absolute error is divided by the actual value $y_i$. A MAPE of 0% indicates a perfect model, while a MAPE above 100% indicates an inferior model. Moreover, compared with the other indicators, MAPE can reveal the effect of errors caused by abnormal outliers on model accuracy. CVRMSE and RMSPE are better suited to comparisons between different models. A CVRMSE below 10% indicates excellent prediction performance [34]. The indicators are calculated as follows:
where n is the number of predicted samples, $y_i$ is the actual value in the test data, and $\hat{y}_i$ is the corresponding predicted value.
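The indicator equations are not reproduced in this section. The sketch below computes the five indicators using their common textbook definitions, which are an assumption here: MAPE and CVRMSE are returned in percent, RMSPE as a fraction, and CVRMSE normalizes RMSE by the mean of the actual values. The paper's exact normalizations may differ, and the two-point input is hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Common definitions of the five indicators (assumed, not taken from the paper).

    Actual values must be nonzero because MAPE and RMSPE divide by them.
    """
    n = len(y_true)
    errors = [a - p for a, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, y_true)) / n  # percent
    cvrmse = 100.0 * rmse / (sum(y_true) / n)  # percent of the mean actual value
    rmspe = math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, y_true)) / n)  # fraction
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "CVRMSE": cvrmse, "RMSPE": rmspe}


metrics = evaluate([100.0, 200.0], [110.0, 180.0])
print(metrics)
```

In this example both points miss by 10% of the actual value, so MAPE is 10% and RMSPE is 0.1, while the larger absolute miss on the second point pushes RMSE (about 15.8) above MAE (15).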

SVR and integration model performance comparison.
Comparison of prediction performance of different models
Sensor errors and unavoidable noise interference during the collection of meteorological and energy consumption data may reduce prediction accuracy. To simulate and analyze the robustness of the ensemble learning model and the single models, Gaussian white noise of several different intensities is added to the test data to generate new data sets, and the coefficient of determination R² is used to evaluate model robustness. Fig. 9 shows that the robustness of the single models varies with the interference intensity, and that BPNN is the least robust. Among the single models, SVR is the most robust at signal-to-noise ratios (SNRs) of 50%, 35%, 15% and 10%, and LSTM at 45%, 40%, 30%, 25% and 20%. The ensemble learning model is more robust overall and resists interference better on test data with different SNRs. Although it does not reach the best single-model robustness at every noise level, it combines the robustness of the two models and achieves better overall stability.
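The section does not state how the SNR percentages map to a noise level. A minimal sketch, assuming the conventional SNR in decibels with signal power measured as the mean square of the series, is shown below together with the coefficient of determination R². The hourly load curve is synthetic.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return signal + zero-mean Gaussian white noise at the given SNR in dB.

    Signal power is measured as the mean square of the series.
    """
    rng = random.Random(seed)
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise_std = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, noise_std) for v in signal]


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic hourly load for one week (not the paper's data).
load = [100.0 + 50.0 * math.sin(2.0 * math.pi * h / 24.0) for h in range(24 * 7)]
for snr_db in (10, 20, 30, 40):
    noisy = add_white_noise(load, snr_db)
    print(snr_db, round(r_squared(load, noisy), 4))
```

In a full robustness test, the noisy series would be fed as test inputs to each trained model and R² computed between that model's predictions and the actual energy consumption; here R² is computed directly between the clean and noisy series only to show how the noise level scales.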

Robustness analysis.
The experimental results show that selecting comprehensive similar days improves the prediction accuracy of the single models to a certain extent. On this basis, the ensemble prediction model not only further improves prediction accuracy but also enhances model robustness. The reductions in RMSE and MAE indicate that the ensemble learning model performs significantly better: the single models produce some predictions with large absolute errors, and the integrated model greatly reduces these errors, so its prediction curve fluctuates less and stays closer to the actual values. This method can therefore address the unstable accuracy and poor robustness of traditional prediction models.
With numerous types of machine learning models available, choosing a prediction model based on experience cannot meet the forecasting demands brought about by the continuous growth of world energy consumption. Because traditional prediction models do not perform well in this setting, an effective energy consumption prediction model is urgently needed to guide energy use. Therefore, this paper proposed a model integration method based on the selection of comprehensive similar days. First, three similarities of the history days were calculated and combined into a comprehensive similarity by the entropy weight method; then the improved SCA and grid search were used to optimize the parameters of the single models. Finally, a heterogeneous ensemble learning model based on dominance was established, analyzed theoretically, and applied to the energy consumption prediction of office buildings.
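The entropy weight step can be sketched as follows. This is a minimal Python version of the standard entropy weight method, assuming that each history day has already been assigned meteorological, time-factor and precursor similarities that are positive and on a comparable scale, so the usual min-max standardization step is skipped. The similarity values are hypothetical. An index whose values barely differ across days carries little information, so its entropy approaches 1 and its weight approaches 0.

```python
import math


def entropy_weights(matrix):
    """Entropy weight method.

    matrix[i][j] is the value of evaluation index j for object i (here: the
    similarity of history day i under index j). Values are assumed positive and
    already on a comparable scale, so no extra standardization is applied.
    """
    m = len(matrix)
    divergences = []
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        total = sum(column)
        p = [v / total for v in column]
        entropy = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(m)
        divergences.append(1.0 - entropy)  # degree of diversification
    s = sum(divergences)
    return [d / s for d in divergences]


def comprehensive_similarity(matrix, weights):
    """Weighted sum of the individual similarities for each history day."""
    return [sum(w * v for w, v in zip(weights, row)) for row in matrix]


# Hypothetical similarities: columns = meteorological, time factor, precursor.
sims = [
    [0.9, 0.8, 0.7],
    [0.6, 0.8, 0.9],
    [0.3, 0.8, 0.5],
    [0.8, 0.8, 0.6],
]
w = entropy_weights(sims)
scores = comprehensive_similarity(sims, w)
best_day = scores.index(max(scores))
print(w, scores, best_day)
```

Here the time-factor column is identical for all four days, so it receives (numerically) zero weight, and the ranking of history days is driven by the meteorological and precursor similarities.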
After comprehensive similar-day selection, the MAPEs of SVR, BPNN and LSTM decreased from 12.69%, 18.2% and 11.46% to 9.44%, 13.39% and 10.22%, respectively. After integration, the MAPE of the model was reduced to 6.18%. This paper presented a new method for building energy consumption prediction that improves prediction accuracy and facilitates advance dispatch of the energy system; the method is also suitable for prediction in other fields and can be readily applied in engineering.
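The MAPE reductions of 12.02%, 6.51% and 5.28% reported in the abstract can be reproduced from these figures as percentage-point differences between each single model's MAPE before similar-day selection and the ensemble's 6.18%, matching the model order SVR, BPNN, LSTM. The short calculation below uses only the values stated in the text; the function name is illustrative.

```python
def mape_reductions(single, ensemble_mape):
    """Percentage-point MAPE reduction of the ensemble relative to each single model."""
    return {name: round(value - ensemble_mape, 2) for name, value in single.items()}


# MAPE (%) values stated in the text.
before_similar_day = {"SVR": 12.69, "BPNN": 18.2, "LSTM": 11.46}
after_similar_day = {"SVR": 9.44, "BPNN": 13.39, "LSTM": 10.22}
ENSEMBLE_MAPE = 6.18

print(mape_reductions(before_similar_day, ENSEMBLE_MAPE))  # {'SVR': 6.51, 'BPNN': 12.02, 'LSTM': 5.28}
print(mape_reductions(after_similar_day, ENSEMBLE_MAPE))   # {'SVR': 3.26, 'BPNN': 7.21, 'LSTM': 4.04}
```

Measured against the single models after similar-day selection, the gain attributable to integration alone is smaller, between about 3 and 7 percentage points.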
