Abstract
Building energy consumption (BEC) prediction is essential for energy management and conservation. This paper presents a short-term energy consumption prediction method that integrates Fuzzy Rough Set (FRS) theory with the Long Short-Term Memory (LSTM) model, and is thus named FRS-LSTM. The method identifies the most directly related factors among the complex and diverse factors that influence energy consumption, which improves prediction accuracy and efficiency. First, FRS attribute reduction is applied to the factors affecting energy consumption to remove redundant input features; this also avoids the information loss caused by data discretization in a classical rough set. Then, the reduced attribute set is taken as the input of the LSTM network to obtain the final prediction results. To validate the proposed model, this study used actual data from a public building to predict the building's energy consumption, and compared the proposed model with the LSTM, Levenberg-Marquardt Back Propagation (LM-BP), and Support Vector Regression (SVR) models. The experimental results show that the FRS-LSTM model achieves higher prediction accuracy than the comparative models.
Introduction
In recent years, energy consumption has increased sharply with rapid global economic development, and building energy consumption (BEC) now accounts for approximately 36% of total energy consumption [1]. Public buildings consume electricity at high density and are therefore a primary target of building energy conservation. A complete understanding of the factors affecting electricity consumption, together with a corresponding prediction model, enables comprehensive analysis and evaluation of the current situation and helps establish a development direction for the electricity consumption of public buildings [2]. Additionally, accurate BEC prediction provides more effective information to decision-makers for energy planning and management.
The development of energy consumption prediction has gone through three main stages: artificial prediction, statistical analysis, and artificial intelligence prediction. As early as the 1970s, owing to limited technical expertise, professionals could forecast the energy consumption based only on their own experience. Hence, the prediction results exhibited large deviations [3]. Subsequently, the statistical analysis method was developed for energy consumption prediction. In the 1990s, the development of computers and other technologies was not sufficiently mature, and the prediction methods were mainly statistical methods, including Multiple Linear Regression [4–6], Nonlinear Regression [7], Least Square Regression [8, 9], Time Series Method [10], Kalman Filtering [11], Fuzzy Exponential Method [12], Exponential Smoothing [13], and so on.
Statistical analysis methods offer fast prediction and simple data modelling, but the resulting models are relatively simple. Their accuracy is not ideal when processing large amounts of nonlinear data, and they require a large amount of stationary data [14]. Therefore, researchers have increasingly turned to artificial intelligence methods for BEC prediction. These methods include Support Vector Regression (SVR) [15, 16], Genetic Algorithm (GA) [17], Artificial Neural Networks (ANN) [18], Deep Learning [19], Extreme Learning Machine (ELM) [20], Long Short-Term Memory (LSTM) [21–24], and so on. Among them, deep learning algorithms have strong learning ability and certain advantages in processing time series data. In particular, the LSTM can capture the temporal dependencies in a time series and can process both sequential and nonlinear data [25]. However, as the time span and data volume increase, the LSTM model becomes difficult to train. To address this problem when selecting input variables, most researchers either consider only the influence of temperature, based on experience, or apply analysis methods with manually set thresholds [26]. However, such methods may cause information loss or redundancy. If all influencing factors are used as input variables of the network model, the burden of network training inevitably increases and the accuracy of the prediction model decreases. The Rough Set (RS) can simplify data while retaining information, and is a research hotspot in feature variable reduction. In [27], an RS-theory-based hybrid neural network was constructed, and the structure of the model was simplified by reducing the redundancy of the input data. In [28, 29], a short-term power load prediction method integrating the RS and the Support Vector Machine (SVM) was proposed to improve the prediction accuracy.
However, because most factors related to the prediction of energy consumption are continuous fuzzy quantities, the attributes must be discretized in the RS reduction. During processing, some attribute information is lost and the reliability of the prediction cannot be ensured.
With consideration to the above-mentioned problems, this paper proposes a hybrid short-term energy consumption prediction method combining the Fuzzy Rough Set (FRS) and the LSTM model. First, the FRS is employed to reduce the attributes of the factors affecting the BEC prediction. The main objective is to remove factors that are unrelated to the energy consumption or have low membership. Then, the reduced variables are taken as the input of the LSTM prediction model for training and testing. The experimental results reveal that the prediction precision of the presented model is significantly improved. The innovations and contributions of this study are summarized below.
From the viewpoint of data processing for energy consumption prediction, the employment of the FRS avoids the information loss caused by the discretization step of classical rough set reduction, and achieves the objective of reasonably simplifying the network structure.
The data mining function of the FRS reduces the number of continuous fuzzy variables to be input into the neural network, and hence reduces the training time and improves the efficiency of the prediction system.
The LSTM network is capable of handling both temporal sequences and nonlinear relationships. Without an analytical expression relating the energy consumption to the various factors, the neural network can be trained using samples, and the trained network can be directly used to achieve higher accuracy in the energy consumption prediction.
Experiments were conducted on actual data sets, and the effectiveness and superiority of the proposed method were validated by comparison with other commonly used algorithms.
The rest of this paper is structured as follows. In Section 2, the fuzzy rough set and its attribute reduction, and the principle of LSTM networks, are introduced. In Section 3, the algorithm structure and implementation steps of the proposed FRS-LSTM are described. In Section 4, the simulation experiment and the analysis of the results are discussed. In Section 5, the findings of this study are summarized.
Methods and principles
Fuzzy rough set and its attribute reduction
The RS can effectively analyse various incomplete information to discover hidden knowledge behind a large amount of data, reveal the significance of the condition attributes to the decision attributes, and thus remove redundant or irrelevant attributes. The RS describes the uncertainty and fuzziness of information by introducing precise sets, such as the upper and lower approximation sets, but classical RS cannot express the degree of elements belonging to a set, which results in the loss of information. To solve this problem in the process of RS discretization, the fuzzy membership function is used to express the difference in the middle transition of data, and can better complete the feature selection and more completely retain raw data information [30].
The FRS is obtained by replacing the exact set of classical RS with the fuzzy set. Moreover, the exact relationship in the classical RS is replaced by the fuzzy relationship, and then the FRS [31, 32] is generalized. The fuzzy upper and lower approximation can be defined as follows:
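As a reference point, the fuzzy lower and upper approximations are usually written in the form below. This is the standard formulation from the fuzzy-rough feature selection literature and is assumed here to correspond to Equation (1). The symbols $F_i$ (a fuzzy equivalence class of $U/P$), $X$ (the fuzzy set being approximated), and $\lambda$ (membership degree) follow the notation of the worked example in the attribute reduction section.

```latex
\begin{aligned}
\lambda_{\underline{P}X}(F_i) &= \inf_{x \in U} \max\{\,1 - \lambda_{F_i}(x),\ \lambda_X(x)\,\}, \\
\lambda_{\overline{P}X}(F_i) &= \sup_{x \in U} \min\{\,\lambda_{F_i}(x),\ \lambda_X(x)\,\}.
\end{aligned}
```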
The reduction process of the FRS was initially based on the fuzzy upper and lower approximation sets. When the attribute value is discrete, it is consistent with the traditional reduction method. To obtain the fuzzy equivalence class $F_i$ in the domain U, the union operation in the discrete case is replaced by the upper bound operation in the fuzzy system. The corresponding fuzzy positive region and fuzzy dependence function are defined as follows:
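In the standard fuzzy-rough formulation, which is assumed here to match Equations (2) and (3), the membership of an object $x$ in the fuzzy positive region and the resulting dependency of the decision attributes $D$ on an attribute subset $P$ take the form below. Here $\lambda_{\underline{P}X}(F)$ denotes the fuzzy lower approximation of a decision class $X$ evaluated on the fuzzy equivalence class $F$, and the first line, the degree to which $x$ itself belongs to that lower approximation, is included for completeness.

```latex
\begin{aligned}
\lambda_{\underline{P}X}(x) &= \sup_{F \in U/P} \min\{\,\lambda_F(x),\ \lambda_{\underline{P}X}(F)\,\}, \\
\lambda_{POS_P(D)}(x) &= \sup_{X \in U/D} \lambda_{\underline{P}X}(x), \\
\gamma_P(D) &= \frac{\sum_{x \in U} \lambda_{POS_P(D)}(x)}{|U|}.
\end{aligned}
```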
In this study, Equations (2) and (3) were realized by the QuickReduct algorithm, a classical algorithm for attribute reduction. The algorithm adds attributes one at a time and decides whether each should be kept according to the resulting change in dependency [33]. For example, let C and D be the sets of condition attributes and decision attributes, respectively. First, an empty set R is taken as the initial set. Subsequently, attributes are added to R in turn to assess how the dependency changes. If the dependency increases, the classification ability increases after the addition of this attribute, that is, the attribute is important; otherwise, the attribute is not retained. The procedure continues until $\gamma_R(D)$ reaches its maximum value, at which point R and C have the same classification capability, and R is the attribute reduction result of C.
In FRS attribute reduction, fuzzy set theory is applied so that each interval produced by RS discretization corresponds to a fuzzy equivalence class defined on a single attribute. Accordingly, the discretization procedure is replaced by attribute fuzzification, which avoids discretization and reduces the loss of information.
The LSTM is a specific form of recurrent neural network in which the weight of the self-loop is made adaptive by adding an input gate, a forget gate, and an output gate. Consequently, with fixed model parameters, the time scale of integration can change dynamically, which alleviates the vanishing and exploding gradient problems [34–38]. The basic unit structure of the LSTM is shown in Fig. 1.

Fig. 1. The basic unit structure of the LSTM.
According to this figure, the calculation formula of each LSTM cell is expressed as follows:
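The standard LSTM cell equations are reproduced here as a reference. They are assumed to correspond to the formulas indicated by Fig. 1, although the paper's own symbols may differ. Here $x_t$ is the input, $h_t$ the hidden state, $C_t$ the cell state, $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and $W$, $b$ the weights and biases of each gate. The experiments described later use RELU as the activation function in place of $\tanh$.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```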
In this section, the scheme of the FRS-LSTM model is first introduced, and the main stages in the learning framework are then discussed in detail.
The scheme of the proposed model
This paper proposes a short-term energy consumption prediction method that combines fuzzy rough set (FRS) theory and long short-term memory (LSTM), and is thus called FRS-LSTM. The flow chart of the proposed FRS-LSTM method is shown in Fig. 2. First, the original variables are processed by FRS attribute reduction to reduce the input features, yielding a final attribute set that has the same classification ability as the original attribute set. Then, the final attribute set is used as the input of the LSTM model for training and prediction to output the final forecasting results. The specific processes are listed below:

Fig. 2. Flow chart of the FRS-LSTM method.
Input layer: the first step is data pre-processing, which includes handling abnormal data, filling missing data, and normalization. After processing, the data are split into training samples and testing samples. Then, the initial decision table is formed for FRS attribute reduction. The minimal set of condition attributes relative to the decision attributes is obtained and used as the input of the LSTM model.
Hidden layer: a single-layer LSTM model is adopted in the prediction experiment. The number of hidden units is 50; the learning rate is 0.001; the number of epochs is 50; RELU is used as the activation function; the Adam optimizer is used to update the weights and biases.
Network training layer: training and prediction are carried out according to the parameter settings of the experiment. The mean absolute error is used as the loss function.
Output layer: after several iterations, once the requirements are satisfied, the prediction results corresponding to the test data are output.
Properties of public buildings
Public buildings mainly include office buildings, service buildings, commercial buildings, and so on. Generally, the electricity consumption of public buildings is generated by lighting sockets, air conditioning, power, and special purposes. However, different types of public buildings have different power consumption proportions [39]. Various factors, such as social and natural factors, influence the power consumption of public buildings. Among them, social factors mainly include the national economy, industrial structure, local consumption level, electricity price policy, and so on. Natural factors mainly include the geographical location of the building, architectural design, human factors, external climate factors, and random factors [40, 41]. The change of energy consumption is restricted by many factors. Because the influence of various factors on the change law of energy consumption is different, the load change fluctuates. Moreover, the energy consumption regularly fluctuates under the influence of external factors.
Compared with medium-term and long-term BEC forecasting, social factors only have a minor influence on short-term prediction. Generally, the main factors influencing the short-term energy consumption prediction results are as follows:
Architectural design: mainly includes the category of buildings, shape factor, proportion of windows and walls, orientation, and so on.
Human factors: mainly include personnel density, indoor temperature and humidity, equipment power, and so on.
External factors: mainly include the outdoor temperature, humidity, wind speed, weather conditions, and so on.
Random factors: sudden accidents, electrical maintenance, and so on. These factors can also cause energy consumption fluctuations, which greatly affect the prediction. Data processing methods corresponding to the data of related factors must be adopted.
In summary, because the energy consumption depends on various factors, the influence of these factors must be comprehensively considered to accurately predict the energy consumption. Then, scientific and reasonable forecasting methods must be adopted. However, the factors affecting load forecasting are too complicated and data collection is difficult. According to the data collected thus far, only the historical energy consumption, indoor and outdoor temperature and humidity, indoor wall temperature, and indoor carbon dioxide concentration are included. The experiment conducted in this study was based on all of these collected data types, and its objective was to verify the feasibility of the proposed model.
Attribute reduction
As mentioned above, there are many factors that affect the energy consumption prediction, including the indoor environment, weather, and so on. Furthermore, the degree of influence of each factor is also different. However, if all influencing variables are considered as the input of the forecasting algorithm, this will affect the prediction speed and reduce the generalization capability. Therefore, this study adopted the FRS to simplify the influencing factors, optimize the model’s input variables, and avoid the phenomenon of information redundancy.
To introduce the reduction algorithm in detail, a simple data set is considered as an example. The specific steps are as follows:
Table 1 is the original decision table containing six samples, wherein A, B, and C are conditional attributes, and D is the decision attribute. It is assumed that the fuzzy membership function adopted for each variable is as shown in Fig. 3. According to the fuzzy membership function, the fuzzy initial decision table presented in Table 2 is obtained by calculating the corresponding membership values of each attribute in the initial decision table. Based on the fuzzy initial decision table, the decision equivalence classes are divided by attributes, which are considered as fuzzy sets. Then, the QuickReduct attribute reduction is performed. Considering A as an example, its fuzzy equivalence class is expressed according to Equation (1), as follows:
The detailed calculation process is as follows. When $x = 1$: $\lambda_{L_A}(1) = 0$; $\lambda_{\{1,4,5\}}(1) = 1$; $\max\{1 - \lambda_{L_A}(1),\ \lambda_{\{1,4,5\}}(1)\} = 1$. In the same manner, when $x = 2, 3, \ldots, 6$, the corresponding values can be obtained as follows:
Table 1. Initial decision table.
Fig. 3. Fuzzy membership function.
Table 2. Fuzzy initial decision table.

By comparing the dependence, it can be seen that the dependence of attribute B is significantly higher than that of attribute C; therefore, attribute C can be removed. In the same manner, the following calculations are performed:
As can be seen, the FRS-based attribute reduction has the advantages of simple operation and high credibility, and is suitable for application in this experiment to delete variables with less correlation between the conditional attributes and the decision attributes. The final attribute set is obtained after the reduction.
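The dependency comparison described above can be sketched in code. The block below is a minimal illustration, not the paper's implementation. It assumes the standard fuzzy-rough definitions (lower approximation as the infimum of max(1 - λ_F(x), λ_X(x)), positive region as the supremum over decision classes, dependency as the mean positive-region membership), handles one fuzzified attribute at a time, and uses hypothetical toy data rather than the values of Table 1. The function names `lower_approx` and `dependency` are illustrative.

```python
def lower_approx(fuzzy_class, target):
    """Degree to which a fuzzy equivalence class lies inside the target set."""
    return min(max(1.0 - f, t) for f, t in zip(fuzzy_class, target))


def dependency(fuzzy_classes, decision_classes):
    """Fuzzy-rough dependency of the decision on one fuzzified attribute.

    fuzzy_classes: list of membership vectors, one per fuzzy set of the attribute.
    decision_classes: list of membership vectors, one per decision class.
    """
    n = len(decision_classes[0])
    positive = [0.0] * n
    for target in decision_classes:
        # Lower approximation of this decision class on each fuzzy equivalence class.
        lowers = [lower_approx(f, target) for f in fuzzy_classes]
        for x in range(n):
            # Membership of object x in the lower approximation of the class.
            mu = max(min(f[x], low) for f, low in zip(fuzzy_classes, lowers))
            # The positive region keeps the best value over all decision classes.
            positive[x] = max(positive[x], mu)
    return sum(positive) / n


# Hypothetical toy data: four objects, decision classes {0, 1} and {2, 3}.
decision = [[1, 1, 0, 0], [0, 0, 1, 1]]
attr_a = [[1.0, 0.8, 0.2, 0.0], [0.0, 0.2, 0.8, 1.0]]  # "low" / "high" memberships
attr_b = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]  # uninformative attribute

gamma_a = dependency(attr_a, decision)
gamma_b = dependency(attr_b, decision)
```

On this toy data the informative attribute reaches a dependency of about 0.8 and the uninformative one 0.5, so a dependency-driven reduction would keep the former. For subsets of several attributes, the fuzzy equivalence classes would be formed by combining the per-attribute fuzzy sets, which this sketch does not cover.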
The data selected from the training set according to the final attribute set are taken as the LSTM input. The main processes are as follows:
Determination of LSTM hyper-parameters. The optimal hyper-parameters are determined by repeated tests, and the time step is based on the data of the previous seven moments. In other words, the input layer dimension of the energy consumption and its influencing factors is 7. The number of hidden-layer neurons is tested within a certain range. The dimension of the output variable is 1. The experimental program was implemented using the Keras deep learning library, which is built on top of the TensorFlow framework. Activation functions are very important for enabling an ANN model to learn complex and nonlinear functions. Compared with the Sigmoid and Tanh functions, RELU converges more quickly and alleviates the vanishing gradient problem. Moreover, RELU is the most commonly used activation function in the development of ANN models. Therefore, RELU was selected as the activation function of the LSTM. The dropout rate was set to 0.5 to prevent over-fitting during training. For neural networks, dropout means temporarily ignoring randomly chosen nodes in the network during training. As error evaluation indices, the mean absolute percentage error (MAPE), root-mean-square error (RMSE), mean absolute error (MAE), and R2 score (R2) were adopted, and are expressed in Equations (12), (13), (14), and (15), respectively. Additionally, the MAE was used as the loss function to measure the degree of fit.
Optimizer selection. The optimizer is one of the two arguments required to compile a Keras model, and is used to update the weights and biases. Commonly used optimizers include SGD, RMSprop, and Adam. In many cases, the effects of these three optimizers are similar. Adam adds bias correction and momentum to RMSprop, and performs better with sparse gradients. Overall, Adam is the best option. In this experiment, the learning rate was set to 0.001 and the floating point was set to 0.999.
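To make the description of Adam concrete, the following is a minimal sketch of a single Adam update for one scalar parameter, written in plain Python rather than with Keras. The learning rate 0.001 comes from the text. The values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are assumptions here; whether the 0.999 quoted above refers to `beta2` is not confirmed by the text. The function name `adam_step` is illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m, v are the running first and second moment estimates; t is the 1-based step.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # momentum (first moment)
    v = beta2 * v + (1.0 - beta2) * grad * grad   # RMSprop-style second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias correction of both moments
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(w) = w^2 starting from w = 1.0; the gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
for step in range(1, 4):
    w, m, v = adam_step(w, 2.0 * w, m, v, step)
```

Because of the bias correction, each of the first few updates moves the parameter by roughly the learning rate, regardless of the gradient's magnitude.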
Experiment
This section first introduces the data pre-processing method and evaluation index. The actual case of public building energy consumption was experimentally investigated, and the results are presented and discussed in detail.
Data pre-processing and evaluation indices
Processing of the missing data
Historical data obtained at the same time but several days before and after were used to fill in missing data. The specific formula is expressed as follows:
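The filling rule can be sketched as follows. This is an illustration under stated assumptions, not the paper's formula: the neighbouring days are weighted equally, `period` (samples per day) and `max_days` (how many days to look in each direction) are hypothetical parameters, and missing readings are represented by `None`.

```python
def fill_missing(series, period, max_days=2):
    """Fill None entries with the mean of same-time values on nearby days.

    period: number of samples per day; max_days: days to look in each direction.
    """
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        neighbours = []
        for k in range(1, max_days + 1):
            for idx in (t - k * period, t + k * period):
                # Use only readings that exist and are themselves not missing.
                if 0 <= idx < len(series) and series[idx] is not None:
                    neighbours.append(series[idx])
        if neighbours:
            filled[t] = sum(neighbours) / len(neighbours)
    return filled


# Three "days" of four samples each; the second sample of day 2 is missing.
data = [10.0, 20.0, 30.0, 40.0,
        12.0, None, 32.0, 42.0,
        14.0, 24.0, 34.0, 44.0]
result = fill_missing(data, period=4, max_days=1)
```

In this example the gap is filled with 22.0, the mean of the readings taken at the same time on the previous day (20.0) and the following day (24.0).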
In data acquisition, abnormal communication transmission, sudden accidents, and other reasons inevitably cause abnormal changes in the system data, and greatly influence the accuracy of the BEC prediction. Therefore, it is necessary to deal with outliers. In this experiment, the abnormal data were directly removed and the processing method of the missing data was adopted to fill in the vacancy.
Normalized treatment
Because the dimensional units of different input variables are different, it is essential to normalize the original data to eliminate the influence of features with different dimensions on the prediction accuracy. After normalization, the data are scaled to values in the range [0, 1]. The normalization formula is expressed as follows:
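A scaling into [0, 1] of this kind is commonly done with min-max normalization, x' = (x - x_min) / (x_max - x_min). The sketch below assumes that this is the formula intended; the function names and the sample temperatures are illustrative.

```python
def min_max_normalize(values):
    """Scale a sequence linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Map normalized values (for example, predictions) back to original units."""
    return [s * (hi - lo) + lo for s in scaled]


temps = [18.0, 20.0, 22.0, 26.0]
scaled = min_max_normalize(temps)
restored = denormalize(scaled, min(temps), max(temps))
```

Each input variable would be normalized separately, and the minimum and maximum of the energy consumption series would be kept so that predictions can be converted back to physical units.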
To more intuitively evaluate the forecasting performance of different models, four measurements, namely, the MAE, RMSE, MAPE, and R2 were adopted. The formulas are expressed as follows:
The MAE, RMSE, MAPE, and R2 are commonly used as evaluation indices in machine learning algorithms. Among them, MAE is the average of the absolute error between the forecasting values and the real values, and MAPE and RMSE describe the forecasting accuracy by comparing the deviations between the actual and predicted values. For these three indicators, smaller values indicate higher accuracy of the prediction model. Finally, R2 assesses the degree of data fitting; for a reasonable model its value lies in [0, 1], and the fit improves as the value approaches 1.
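The four indices can be computed as in the sketch below, which uses their usual textbook definitions (assumed to match Equations (12) to (15)); MAPE is returned in percent. The function names and the sample values are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be non-zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = (mae(actual, predicted), rmse(actual, predicted),
          mape(actual, predicted), r2_score(actual, predicted))
```

For this small example the scores are MAE 0.25, RMSE 0.5, MAPE 6.25%, and R2 0.8.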
Applied dataset
As shown in Fig. 4, an office building located in Jinan, China, was considered as a case study. This building is a passive low-energy green building demonstration project. It adapts to climatic features and natural conditions by adopting a building envelope with high air-tightness and thermal insulation performance. An efficient heat-recovery fresh air system reduces the heating and cooling demand of the building and makes full use of renewable energy to provide a comfortable indoor environment and satisfy the basic requirements of green buildings. The building has two floors above the ground and one floor underground, with a total building area of approximately 2030 m². The thermal conductivity of the graphite polystyrene boards used for external insulation is 0.032 W/(m·K), and the heat transfer coefficient of the doors and windows is 0.74 W/(m²·K).

Fig. 4. Exterior view of an office building.
This experiment used monitoring data for the energy consumption of the office building. The data were collected from October 31, 2017 until March 22, 2018, and include the energy consumption ($F$), indoor temperature ($T_n$), indoor humidity ($H_n$), outdoor temperature ($T_w$), outdoor humidity ($H_w$), carbon dioxide concentration (CO2), wall surface temperature ($T_q$), and other factors. The integrated dataset includes a total of 3418 samples, which form a short-term energy consumption time series. The curves of the historical data are shown in Fig. 5. To visualize the trend of the historical data, part of the original data (500 samples) is plotted in Fig. 6. As can be clearly seen, the energy consumption values exhibit a regularity that conforms to the electricity consumption characteristics of public buildings.

Fig. 5. Historical data of the office building: Total historical data.

Fig. 6. Historical data of the office building: 500 sample data.
In the experiment, 2/3 of the raw data were selected as training samples and 1/3 of data were used for testing. The training dataset was used to construct the model and measure its accuracy, whereas the testing dataset was used to test its generalization capability.
According to the FRS attribute reduction in Section 3, the data of the previous seven moments were used as input after a large number of tests. An initial decision table was formed as presented in Table 3. A certain moment was selected as the standard and the data of the previous seven moments were substituted in.
Table 3. Initial decision table.
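The construction of rows from the previous seven moments can be illustrated with a sliding window. The sketch below is simplified to a single series (the energy consumption); in the decision table each row would also carry the other monitored variables. The function name `make_windows` and the sample readings are hypothetical.

```python
def make_windows(series, steps=7):
    """Pair each value with the `steps` values that precede it."""
    inputs, targets = [], []
    for t in range(steps, len(series)):
        inputs.append(series[t - steps:t])   # previous `steps` moments
        targets.append(series[t])            # value to be predicted at moment t
    return inputs, targets


energy = [float(i) for i in range(10)]       # hypothetical readings 0.0 .. 9.0
X, y = make_windows(energy, steps=7)
```

With ten readings and a window of seven, three rows are produced; the first pairs the readings 0.0 to 6.0 with the target 7.0.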
Then, according to the physical characteristics of the attributes, an appropriate fuzzy membership function was selected to fuzzily divide the attributes. The membership function can be the Gaussian function, normal distribution, trapezoidal distribution, triangular distribution, and so on. Practical experience has shown that the shape of the membership function only has a minor effect on the system control effect. Hence, a relatively simple fuzzy membership function was used in this experiment to simplify the calculation. The indoor temperature was considered as an example and was divided into the three states of low temperature, medium temperature, and high temperature; its membership function was determined as follows:
a. A left-shoulder trapezoidal distribution was adopted for low temperature

Fig. 7. Fuzzy membership function: (a) Indoor temperature; (b) Outdoor temperature; (c) Indoor humidity; (d) Outdoor humidity; (e) Indoor wall temperature; (f) CO2 concentration; (g) Total energy consumption of public buildings.
The corresponding membership value of each attribute was calculated according to each fuzzy membership function. The initial fuzzy decision table was obtained through attribute fuzzification, as presented in Table 4.
Table 4. The fuzzy initial decision table.
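The fuzzification of one attribute can be sketched as follows. The text states that a left-shoulder trapezoid is used for the low state; the triangular shape for the medium state, the right-shoulder shape for the high state, and the breakpoints of 18, 22, and 26 °C are assumptions made for this illustration only.

```python
def left_shoulder(x, a, b):
    """1 at or below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def right_shoulder(x, a, b):
    """0 at or below a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def triangle(x, a, b, c):
    """0 outside (a, c), peaking at 1 when x equals b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify_temperature(t, low=18.0, mid=22.0, high=26.0):
    """Membership of an indoor temperature in the low/medium/high states."""
    return {
        "low": left_shoulder(t, low, mid),
        "medium": triangle(t, low, mid, high),
        "high": right_shoulder(t, mid, high),
    }


row = fuzzify_temperature(20.0)
```

With these breakpoints, a reading of 20 °C belongs to the low and medium states with degree 0.5 each and to the high state with degree 0; applying such a function to every attribute produces the rows of a fuzzy decision table.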
Finally, the QuickReduct algorithm was used to implement the fuzzy rough set reduction process. The process started with an empty set, and attributes were added sequentially. If the dependency increased after an attribute was added, the classification ability also increased; otherwise, the attribute did not contribute. The process continued until the dependency reached the maximum possible value for the dataset, at which point the selected set had the same classification ability as the full set of condition attributes. A threshold on the dependency increment was set, and an attribute was retained after the reduction only when its increment exceeded the threshold. The main steps of the algorithm are as follows:
R ← { }
do
    T ← R
    ∀x ∈ (C − R)
        if γ_{R∪{x}}(D) > γ_T(D)
            T ← R ∪ {x}
    R ← T
until γ_R(D) = γ_C(D)
return R
where C is the set of all conditional attributes, and D is the set of decision attributes.
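The steps above can be sketched in Python. This is a minimal illustration rather than the implementation used in the experiment: the dependency function `gamma` is passed in as a callable, the `threshold` argument models the dependency-increment threshold mentioned in the text, and the dependency values in the example table are hypothetical.

```python
def quick_reduct(conditions, gamma, threshold=0.0):
    """Greedy forward selection driven by a dependency function.

    conditions: iterable of attribute names (the set C).
    gamma: callable mapping a frozenset of attributes to a dependency in [0, 1].
    threshold: minimum gain in dependency required to keep an attribute.
    """
    conditions = list(conditions)
    full = gamma(frozenset(conditions))
    reduct = frozenset()
    while gamma(reduct) < full:
        base = gamma(reduct)
        best, best_gamma = None, base
        for attr in conditions:
            if attr in reduct:
                continue
            candidate = gamma(reduct | {attr})
            # Keep the attribute giving the largest dependency above the threshold.
            if candidate - base > threshold and candidate > best_gamma:
                best, best_gamma = attr, candidate
        if best is None:          # no attribute improves the dependency enough
            break
        reduct = reduct | {best}
    return reduct


# Hypothetical dependency values for three condition attributes.
table = {
    frozenset(): 0.0,
    frozenset("A"): 0.6, frozenset("B"): 0.4, frozenset("C"): 0.1,
    frozenset("AB"): 1.0, frozenset("AC"): 0.6, frozenset("BC"): 0.4,
    frozenset("ABC"): 1.0,
}
reduct = quick_reduct(["A", "B", "C"], table.__getitem__)
```

In this example attribute A is selected first, B second, and C is never added because the dependency of {A, B} already equals that of the full set.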
Using this algorithm, the final attribute set with the same classification capacity as the original data set was obtained as presented in Table 5.
Table 5. Final attribute set after attribute reduction.
The experimental results reveal that the degree of membership of the building wall temperature was 0.567, which indicates that the position and orientation of the building, as reflected in the building design and wall insulation, can be considered as influencing factors in the short-term energy consumption prediction. The degree of membership of the indoor carbon dioxide concentration was 0.458, which indicates that the personnel density has a certain influence on the short-term energy consumption prediction. The historical energy consumption, indoor and outdoor temperature, and indoor humidity can also be considered as influencing factors in the short-term energy consumption prediction. The degree of membership of a conditional attribute relative to the decision attribute ranges from 0 to 1, and the closer the value is to 1, the greater the impact on the prediction result. From Table 5, it can be seen that, following the FRS reduction, only eight of the 55 conditional attributes remained. The current wall temperature, carbon dioxide concentration, and energy consumption data of the previous five moments had a greater impact on the prediction results. Because the membership degree of the outdoor humidity was lower than the threshold, it was not necessary to consider the influence of the outdoor humidity on the experimental results.
The final attribute set after the reduction was used as the LSTM input. In this experiment, the number of hidden-layer neurons was tested at 10 levels: 20, 40, 60, 80, 100, 120, 140, 160, 180, and 200. To reduce the influence of random initialization, the average MAE, RMSE, MAPE, and R2 values were obtained from 10 different experiments. The average performance indices of the proposed FRS-LSTM model are listed in Table 6. As can be seen, the third setting gave the best overall index values. In other words, when the number of neurons in the hidden layer was 60, the FRS-LSTM model achieved the best performance, with MAE, RMSE, MAPE, and R2 values of 0.512, 0.854, 2.905, and 0.993, respectively.
Table 6. The averaged performance of the FRS-LSTM model in 10 cases.
Fig. 8 describes the training loss and testing loss of the FRS-LSTM in the experiment. As can be seen, the algorithm learned over time, and the training stopped after approximately 10 iterations. Both the training loss and the testing loss decreased, and the two curves gradually converged as the number of iterations increased. This indicates that the historical data were evenly distributed and are suitable for training the energy consumption prediction model.

Fig. 8. Fitting curve of the LSTM model.
The results of the four prediction models obtained in this experiment are shown in Fig. 9. To present more details, two parts of the results were selected. As can be seen, the proposed FRS-LSTM model follows the real fluctuations more closely than the SVR, LM-BP, and LSTM prediction models.

Fig. 9. Experiment results of the four forecasting models.
The evaluation indicators of the four models are listed in Table 7. After a large number of experiments on each training model, the optimal MAE, RMSE, MAPE, and R2 values were obtained. The experimental results reveal that the proposed FRS-LSTM model had the smallest MAE, RMSE, and MAPE values and the largest R2 value, that is, 0.512, 0.854, 2.905, and 0.993, respectively, which indicates that the proposed model achieved the best forecasting performance.
Table 7. The performance indices of the four models.
To further analyse the performance of the proposed model, we plotted the error histograms of the four models as shown in Fig. 10. A taller and narrower histogram near the zero point indicates better prediction performance. As can be clearly seen, the error histogram of the FRS-LSTM is taller and narrower around the zero point than those of the SVR, LM-BP, and LSTM models. In other words, the FRS-LSTM prediction errors are relatively small and the overall performance is better than that of the other three models.

Fig. 10. The error histograms of the four forecasting models.
In this paper, a novel FRS-LSTM model is proposed with consideration to the various factors affecting the energy consumption prediction of public buildings. First, the abnormal values and missing values of the historical data are corrected and filled in. Second, all data, such as the energy consumption, temperature, and humidity, are normalized, the sample data are reconstructed, and the attributes of the input variables are then reduced by the FRS. The final attribute set is taken as the input for the training of the LSTM model, and the prediction results are finally obtained. To verify the superiority of the presented model, we applied it to an actual experiment and compared the results with those obtained by three other commonly used prediction models. The proposed FRS-LSTM model achieved higher prediction accuracy than the other three prediction models, and can satisfy the requirements of practical applications. Moreover, the proposed model can be used not only to forecast the energy consumption of public buildings, but also for other multivariate input time series.
In this article, the hybrid FRS-LSTM model was adopted to predict the short-term energy consumption. Although the accuracy of the forecasting results is higher than that of the comparative models, this study still has various limitations with regard to time, experimental conditions, and the number of personnel. This study only considered the influence of temperature, humidity, carbon dioxide concentration, wall temperature, and other conditions on energy consumption, and did not consider the date type, personnel, and other conditions. In future work, the number of influencing factor variables can be increased to improve the experimental accuracy. The proposed prediction model is based on short-term energy consumption prediction. Future work will investigate whether the proposed FRS-LSTM model is applicable to medium-term and long-term energy consumption prediction.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research has been supported by the National Key R & D Program of China (2017YFB1302400), the National Natural Science Foundation of China (61773242,61803227), the Science and Technology Program of the Ministry of Housing and Urban-Rural Development of China (2020-K-083), the Shandong Natural Science Foundation (ZR2019MF064), and the Key Research and Development Program of Shandong Province (2016GGX103031).
