HVAC energy consumption prediction based on RJITL deep neural network model

Abstract

During the operation of HVAC (Heating, Ventilation, and Air-Conditioning) systems, precise energy consumption prediction plays an important role in achieving energy savings and optimizing system performance. However, the HVAC system is a complex and dynamic system characterized by a large number of variables that exhibit significant changes over time. Therefore, a fixed offline model is inadequate: it cannot adapt to the dynamic changes in the system, and training it consumes tremendous computation time. To solve this problem, a deep neural network (DNN) model based on Just-in-Time learning with hyperparameter R (RJITL) is proposed in this paper to predict HVAC energy consumption. Firstly, relevant samples are selected using Euclidean distance weighted by Spearman coefficients. Subsequently, local models are constructed using deep neural networks supplemented with optimization techniques to enable real-time rolling energy consumption prediction. Then, the ensemble JITL model mitigates the influence of local features and improves prediction accuracy. Finally, by defining the update rule (hyperparameter R) for the JITL model, the local models are adaptively updated to reduce the training time of the overall model. Experimental results on energy consumption prediction for the HVAC system show that the proposed DNN-RJITL method achieves an average improvement of 5.17% in accuracy and 41.72% in speed compared to traditional methods.

Keywords

HVAC energy consumption; weighted similarity measure; deep neural network; Just-in-Time learning

1 Introduction

In recent years, commercial and industrial buildings have been utilizing 40% of the global primary energy and producing 30% of global greenhouse gas emissions [1]. The operating energy consumption of HVAC systems accounts for the largest proportion of the building’s energy consumption [2]. As economic development continues, the proportion of energy consumed by air conditioning will further increase. In the face of severe climate change and environmental problems caused by rapid growth in energy consumption, energy saving and consumption reduction of HVAC systems have attracted great attention [3]. HVAC system energy consumption prediction lays the foundation for its energy saving and consumption reduction. It enables precise control of the HVAC system, adjustment of operating parameters and efficient operation and maintenance. This prediction facilitates strategic energy planning, and enables the implementation of measures to enhance energy efficiency, lower operating costs, and achieve energy-saving objectives.

There has been extensive research on HVAC energy-related modeling techniques. Previous research falls mainly into two types: mechanism modeling based on operational principles [4–6], and black box modeling based on neural networks [7, 8]. Mechanism modeling is characterized by its clear internal mechanisms and good model interpretability. Yao et al. [4] proposed a state-space based dynamics model for HVAC systems and verified the transient response of the model for a given input. However, the intense interactions among multiple parts of the HVAC system were not considered. Turner et al. [5] proposed an HVAC system fault detection method that uses a recursive least squares modeling approach to systematically identify synthetic time series data from a residential building simulation program. The method does not require detailed physical formulas of the HVAC system and has the significant advantage of simplifying calculations. However, this method places strict restrictions on the running state of the HVAC system, and its anti-interference ability is not strong enough, which limits its implementation in practical scenarios.

With the growth of large databases of HVAC system operations, the black box modeling approach, which requires abundant data and is valued for its simplicity and flexibility, has been further developed [9–12]. Terzi et al. [9] employed a linear regression (LR) approach to predict the energy consumption of each HVAC subsystem independently. It was demonstrated that LR is highly applicable in cooling systems with different layouts and components. Chen et al. [10] used SVR to predict hourly electricity demand in hotels and shopping centers over a 24-hour period, which completed the prediction process within 20 seconds with an error of approximately 4.0% and 6.0% respectively. Sonta et al. [11] used multiple linear regression (MLR), support vector regression (SVR), random forests (RF), and artificial neural networks (ANN) to determine the most robust surrogate model for the building energy consumption. The above methods are all shallow neural network modeling methods, which have the advantage of fast operation speed and do not need to deeply study the model mechanism. The limitation is that their shallow structure constrains the ability to express very complex functions, and potentially leads to convergence challenges, such as being susceptible to local minima during processing.

Deep Neural Networks (DNNs) have the capability to achieve intricate nonlinear mappings in a flexible manner [13–15]. The HVAC system is characterized by complexity and various linear or nonlinear constraints between the control parameters. The emergence of various deep learning algorithms has made it possible to use deep learning for HVAC energy consumption prediction [16–18]. Chen et al. [19] proposed a gait pattern recognition method based on Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) for lower limb exoskeleton. They validated that the LSTM model is good at processing time-series data, and the CNN is expert in processing data with spatial structure characteristics. Xu et al. [20] used deep learning methods CNN-LSTM to forecast the number of COVID-19 cases and used MAE as an evaluation indicator. Xie et al. [21] presented a physics-constrained deep active learning (P-DAL) framework to model spatiotemporal cardiac electrodynamics, which combines both sensor measurements and physics prior knowledge into DNN and demonstrated good predictive capability in this field. Zhao et al. [22] used a sliding time window for dynamic input to the deep learning model, with which the prediction accuracy can be significantly improved.

From the above analysis, it can be concluded that DNNs are effective tools for decision making, management, and design optimization in HVAC systems. Due to the large datasets and time-varying characteristics of HVAC systems, existing research often adds a recurrent connection layer in the DNN to capture temporal features for precise predictions [17, 18]. However, this approach uses the entire historical dataset to learn the weights and parameters of the model, resulting in long training time and lack of flexibility. Study [22] utilized a sliding time window to construct local models for online prediction, which reduces the training time. However, the fixed length of the time window lacks adaptability to the input data. To address such obstacles, we propose a novel method in this paper. By combining the Just-in-Time Learning (JITL) algorithm with the DNN, we construct an online rolling model for HVAC system energy consumption prediction. The DNN-RJITL method is introduced to address the challenges posed by large data volume, complex data coupling and high time-variability of working conditions in HVAC systems. The JITL continuously adjusts the local model based on changing operating conditions, which enables better adaptation to the local dynamics of the operation process [23]. It makes accurate predictions with less training time. The main contributions of this paper are as follows:

The DNN is constructed as a local model within the JITL. It overcomes the problem of insufficient generalization ability of traditional models such as [9–11]. To speed up the network training and optimize the back propagation process, the DNN incorporates additional optimization strategies such as HuberLoss function, batch normalization strategy, and Adam optimization algorithm.

A JITL algorithm is employed to develop an online rolling prediction model. JITL exhibits a faster model update rate compared to existing methods [19] and [20]. We innovatively propose an adaptive selection of relevant samples based on Spearman-weighted Euclidean distance similarity, compared to methods [22] and [23], which proves more effective in handling dynamic data of the HVAC.

The prediction results of the local models are ensembled through an arithmetic averaging method to mitigate the risk of overfitting and build a model with excellent generalization performance.

To achieve adaptive update of local models, the JITL model defines the update hyperparameter R, which reduces the frequency of model updates and improves the efficiency of online modeling.

The paper is organized as follows. Section 2 describes the structure and data processing of the HVAC system. Section 3 introduces the local model DNN within JITL and presents its optimization method. Section 4 presents the energy consumption prediction model based on the DNN-RJITL. Section 5 shows the simulation experiments and corresponding discussion. Finally, Section 6 summarizes the whole paper.

2 HVAC system overview

2.1 HVAC system structure

The HVAC system plays a crucial role in maintaining a specific temperature and humidity range within a building. Its primary purpose is to maintain the physical health of individuals and to protect sensitive electronic equipment in the building from excessive heat, condensation and other environmental factors [24]. A typical HVAC system consists of multiple components, including cooling towers, water pump systems, and chiller units, all of which work together to regulate the temperature and humidity levels in a building [25]. Overall, a well-designed and well-maintained HVAC system is essential to the reliable operation of any modern construction.

As shown in Figure 1, a precision air conditioning system is made up of a cooling tower, primary refrigeration pumps, chiller units, and cooling pumps connected in series. For efficient cooling, the principal cooling units (1#, 2#, 3#, and 4#) are all operated simultaneously. The device parameters of the HVAC system are presented in Table 1. The energy consumed by the HVAC system includes that of the cooling towers, pump systems and chiller units. Therefore, this paper investigates the establishment of energy consumption prediction models for cooling towers, pump systems and chiller units, respectively.

Fig. 1

Schematic diagram of 1# unit in the HVAC.

Table 1

Device parameters of the HVAC system

Devices	Devices No.	Parameter
Cooling Tower	28A7-1# ∼4 #	Power 49.75 (kW), Air Volume 61200 (m³/h)
Primary Refrigeration Pump	1A7-1#∼4#	Frequency 12.15 (Hz)
Chiller Unit	1A7-1#∼4#	Power 286 (kW), Refrigerating capacity 2003 (kW)
Cooling Pump	1A7-1#∼4#	Frequency 12.63 (Hz)
Secondary Refrigeration Pump	5A7-1#∼5#	Frequency 9.61 (Hz)

2.2 Data processing

The HVAC system is a complex nonlinear system consisting of multiple interconnected devices. The input features required for modeling, obtained through theoretical research and expert experience [26, 27], are shown in Table 2.

Table 2
Input features of energy consumption prediction models

Systems	Input features	Range of value
Cooling tower	Frequency of cooling tower fan	7.55-27.3 (Hz)
	Inlet water temperature of cooling tower	23.96-30.83 (°C)
	Outlet water temperature of cooling tower	18.77-23.16 (°C)
	Outdoor temperature	15.4-34.66(°C)
	Outdoor humidity	41.72-88.29 (°C)
	Cooling pump frequency	8.15-17.76 (Hz)
Pump system	Primary refrigeration pump frequency	8.19-18.68 (Hz)
	Cooling pump frequency	5.87-22.63 (Hz)
	Secondary refrigeration pump frequency	6.88-13.48 (Hz)
Chiller unit	Outlet refrigeration water temperature	13.075-18.65 (°C)
	Inlet refrigeration water temperature	18.375-21.225 (°C)
	Chilled unit current percentage	15.27-60.70 (%)
	Primary refrigeration pump frequency	8.19-18.68 (Hz)
	Cooling pump frequency	5.87-22.63 (Hz)
	Temperature differences of cooling water	1.55-4.65 (°C)
	Frequency of cooling tower fan	7.55-27.3 (Hz)
	Secondary refrigeration pump frequency	6.88-13.48 (Hz)
	Outdoor temperature	15.41-34.66 (°C)
	Outdoor humidity	24.08-88.87 (g/m³)

The HVAC operational data were collected from the system for nearly a year, specifically from December 2021 to November 2022, which basically covers all working conditions. A representative set of 40,000 samples is selected as the historical database, containing data under different environmental variables. A further 200 samples are used as the validation set. Missing values are handled through linear interpolation, outliers are eliminated using the 3σ (Pauta) criterion, and a moving average filter is applied to reduce noise and enhance the overall quality of the data.
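The three cleaning steps can be sketched as follows. This is a minimal pure-Python illustration rather than the authors' pipeline: the function names, the window length of 3, the constant fill at the series edges, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev

def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known neighbours.
    Gaps at either end are filled with the nearest known value (an assumption)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series contains no known values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

def three_sigma_mask(values):
    """Return True for samples within 3 standard deviations of the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) <= 3 * sigma for v in values]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the series edges."""
    half = window // 2
    return [mean(values[max(0, i - half):i + half + 1]) for i in range(len(values))]
```

A typical order of use would be interpolation first, then dropping the samples flagged False by `three_sigma_mask`, then smoothing each remaining feature column.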

3 Local models building

In this Section, we will introduce the local model DNN and present its optimization methods.

3.1 Principles of deep neural network

The Deep Neural Network (DNN) [28] is a model of artificial neural network based on multiple layers of neurons, as shown in Figure 2, which consists of an input layer, hidden layers and an output layer. The input layer receives the input data, the hidden layer processes and transforms the data, and the output layer generates the final output result. The DNN’s ability to express complex functions increases with the number of layers in the network, and its compact nonlinear mapping relationship can handle a large set of functions, which is suitable for energy consumption prediction of HVAC systems [29]. The mathematical expression of DNN is as follows:

Fig. 2

The structure of DNN

$\left\{ \begin{matrix} σ_{ij} = φ_{j} (ω_{ij} σ_{j - 1} + b_{ij}) \\ σ_{j - 1} = \sum_{i = 1}^{N_{i}^{j - 1}} φ_{j - 1} (ω_{i, j - 1} σ_{i, j - 1} + b_{i, j - 1}) \end{matrix} \right.$ (1)

where φ, ω and b are the activation function, weight matrix and bias of the DNN, respectively. σ_j is the j-th layer of the neural network, σ_ij is the i-th neuron of the j-th layer, i = 1, 2, …, N_σ, j = 1, 2, …, N_L. N_σ is the number of neurons in each layer of the neural network, and N_L is the number of hidden layers.
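The layer-by-layer computation can be illustrated with a small forward pass. The sketch below uses the standard fully connected form, in which each neuron applies its activation to a weighted sum of the previous layer's outputs plus a bias; this is an assumed reading of Eq. (1), and the names `dense`, `forward` and `relu` are hypothetical.

```python
def relu(z):
    """Rectified linear activation (chosen here only for illustration)."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b), computed neuron by neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Apply a list of (weights, biases, activation) layers in order."""
    out = x
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out

# Toy network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 1.0], relu),
    ([[2.0, 1.0]], [0.5], lambda z: z),
]
```

For the input `[1.0, 2.0]` the hidden layer gives `[0.0, 2.5]` and the output neuron gives `2.0 * 0.0 + 1.0 * 2.5 + 0.5`.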

3.2 HuberLoss function

The HuberLoss function is adopted as the loss function for network training. By combining the absolute value term with the square term, HuberLoss achieves a smooth transition in loss calculation, which keeps the loss smooth as the prediction approaches the target value, avoids overly sharp changes in the loss function, and helps to improve the training stability of the model [30]. The HuberLoss function is expressed as follows: $L_{δ} (y, f (x)) = \left\{ \begin{matrix} \frac{1}{2} ϵ^{2}, & \text{if } | ϵ | \leq δ \\ δ | ϵ | - \frac{1}{2} δ^{2}, & \text{if } | ϵ | > δ \end{matrix} \right.$ (2) where y is the true value, f (x) is the predicted value, ϵ = y - f (x), and δ is the threshold parameter of HuberLoss. Compared with MSE Loss, HuberLoss introduces the absolute value term into the loss calculation, making it less sensitive to outliers. This enhances the model’s robustness in the presence of noise or outliers. However, the performance of HuberLoss largely depends on the choice of the threshold parameter δ. A δ that is too small or too large can affect the training and generalization performance of the model.

There are many outliers and noise in the data set of HVAC systems, which are caused by the complexity of the system and the external environment changes. In this scenario, HuberLoss is able to improve the robustness of the model while maintaining high accuracy.
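Eq. (2) translates directly into code. The sketch below is a per-sample illustration with hypothetical function names; it also includes the derivative with respect to the prediction, which shows why large errors contribute a bounded gradient.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Per-sample Huber loss of Eq. (2)."""
    err = abs(y_true - y_pred)
    if err <= delta:
        return 0.5 * err ** 2                 # quadratic zone, like MSE
    return delta * err - 0.5 * delta ** 2     # linear zone, like MAE

def huber_grad(y_true, y_pred, delta=1.0):
    """Derivative of the loss with respect to the prediction f(x)."""
    eps = y_true - y_pred
    if abs(eps) <= delta:
        return -eps                           # grows with the error
    return -delta if eps > 0 else delta       # capped at +/- delta for outliers
```

With δ = 1, an error of 0.5 falls in the quadratic zone and an error of 3 falls in the linear zone, where the gradient magnitude stays at δ regardless of how large the error is.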

Remark 1. Due to the parameter-sensitive nature of HuberLoss, choosing an appropriate δ is crucial for successful training. The limiting behavior with respect to δ is: $\left\{ \begin{matrix} δ \sim 0, & \text{HuberLoss tends to MAE} \\ δ \sim \infty, & \text{HuberLoss tends to MSE} \end{matrix} \right.$ (3) From Eq. (3), the HuberLoss function is close to the MAE when δ ∼ 0, which can reduce the influence of outliers and thus achieve a more robust training. Conversely, when δ ∼ ∞, the HuberLoss function is close to the MSE, which compensates for the slow decline of the loss. The selection of δ can be determined through data analysis and cross-validation.

1) Determine the range of the target δ based on the HVAC data set. For example, if there are large outliers in the data, select a larger δ.

2) Employ cross-validation to select the optimal value for δ. During the training process, experiment with values chosen according to 1), such as 1, 1.5, 2, and evaluate the performance on the validation set to determine the best δ.

In this research, δ takes the value of 1.
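As a check on Eq. (3), both limits can be read off from Eq. (2). The factor δ in the first case and the factor 1/2 in the second are made explicit here; they are scalings that the remark leaves implicit.

```latex
% delta -> 0: almost every residual satisfies |eps| > delta, so the linear branch applies
L_{\delta} = \delta |\epsilon| - \tfrac{1}{2}\delta^{2} \approx \delta |\epsilon|
\quad \text{(absolute error scaled by } \delta \text{, i.e. MAE-like)}

% delta -> infinity: every residual satisfies |eps| <= delta, so the quadratic branch applies
L_{\delta} = \tfrac{1}{2}\epsilon^{2}
\quad \text{(half the squared error, i.e. MSE-like)}

% With the chosen delta = 1:
% eps = 0.5 gives 0.5 * 0.25 = 0.125, and eps = 3 gives 1 * 3 - 0.5 = 2.5
```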

3.3 Batch normalization strategy

As the number of hidden layers in a neural network increases, the data distribution of the input to each layer may change due to the matrix operations and nonlinear processing. To solve this problem, batch normalization is used to adjust the variances and mean positions of the input values of each layer by scaling and shifting them to be closer to the standard normal distribution. The nonlinear expression capabilities of the network can be improved by adding batch normalization layers to the network architecture, which speeds up and improves the stability of model training [31]. The batch normalization formula is as follows: $y^{(k)} = γ^{(k)} \frac{x^{(k)} - μ^{(k)}}{\sqrt{{(σ^{(k)})}^{2} + ɛ}} + β^{(k)}$ (4) In the k-th dimension of data, x^(k) represents the input matrix. y^(k) represents the normalized output matrix. μ^(k) and σ^(k) respectively represent the mean and standard deviation of the input. β^(k) and γ^(k) are learnable translation and scaling parameters, respectively. ɛ is a small quantity to prevent the denominator from being 0.
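For a single feature dimension k, Eq. (4) amounts to a few lines. The sketch below is a training-mode illustration over one mini-batch column; the function name, the default ɛ of 1e-5, and the omission of running statistics for inference are assumptions of the example.

```python
from statistics import mean, pvariance

def batch_norm(column, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature column of a mini-batch, then scale and shift (Eq. 4)."""
    mu = mean(column)
    var = pvariance(column, mu)          # batch variance (sigma^(k))^2
    return [gamma * (x - mu) / (var + eps) ** 0.5 + beta for x in column]
```

With the default γ = 1 and β = 0 the output has approximately zero mean and unit variance; the learnable γ and β then let the network restore any scale and offset it needs.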

3.4 Adam optimization algorithm

Adaptive Moment Estimation (Adam) is an adaptive learning rate optimization algorithm, which efficiently updates the weights of the neural network during the training process. Existing studies have shown that Adam optimization is computationally efficient, easy to implement, suitable for large datasets and models with high-dimensional parameter spaces, and has been shown to converge faster than other optimization algorithms like Stochastic Gradient Descent (SGD) [32]. The main steps of Adam are as follows:

Step 1: Initialize weight vector θ, 1^st moment vector m_t, 2^nd moment vector u_t, and set time stamp t to 0.

Step 2: t = t + 1, then calculate the derivative of the loss function L with respect to θ.

$φ_{t} \leftarrow \nabla_{θ} L_{t} (θ_{t - 1})$ (5)

Step 3: Estimate the 1^st moment vector m_t and the 2^nd moment vector u_t using exponential moving averages. $m_{t} \leftarrow β_{1} * m_{t - 1} + (1 - β_{1}) * φ_{t}$ (6) $u_{t} \leftarrow β_{2} * u_{t - 1} + (1 - β_{2}) * φ_{t}^{2}$ (7) where β₁ and β₂ are the exponential decay rates of the 1^st and 2^nd moment vectors.

Step 4: Introduce bias correction to avoid moment estimates that are biased towards 0. ${\hat{m}}_{t} \leftarrow m_{t} / (1 - β_{1}^{t})$ (8) ${\hat{u}}_{t} \leftarrow u_{t} / (1 - β_{2}^{t})$ (9) where ${\hat{m}}_{t}$ and ${\hat{u}}_{t}$ are the bias-corrected estimates of the first moment (mean) and second raw moment (uncentered variance) of the gradient.

Step 5: Update the parameter vector θ. If the stopping condition is satisfied, output θ_t; otherwise return to Step 2. $θ_{t} \leftarrow θ_{t - 1} - α * {\hat{m}}_{t} / (\sqrt{{\hat{u}}_{t}} + ɛ)$ (10) where α is the learning rate and ɛ is a small constant with a default value of 10^-8.
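Steps 2 to 5 can be condensed into a single update function. The sketch below follows Eqs. (6) to (10) for a parameter vector stored as a plain list, with the second-moment correction using β₂ as in the standard Adam algorithm; the function name and the list-based representation are assumptions of the example.

```python
import math

def adam_step(theta, grad, m, u, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the 1-based step counter; m and u start as zero vectors."""
    # Eqs. (6)-(7): exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    u = [beta2 * ui + (1 - beta2) * g * g for ui, g in zip(u, grad)]
    # Eqs. (8)-(9): bias correction, since m and u are initialised at zero.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    u_hat = [ui / (1 - beta2 ** t) for ui in u]
    # Eq. (10): per-parameter step scaled by the root of the second moment.
    theta = [
        th - alpha * mh / (math.sqrt(uh) + eps)
        for th, mh, uh in zip(theta, m_hat, u_hat)
    ]
    return theta, m, u
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly α whatever the gradient magnitude. This is one way to see why the learning rate of 0.001 bounds the initial step size.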

In the process of building a deep neural network, the HuberLoss function is used to improve the speed and robustness of gradient descent, the batch normalization strategy is used to stabilize the input data distribution, and the Adam optimization algorithm provides an adaptive learning rate for each parameter by using the estimated mean and variance of the gradients. By adding optimization strategies to the deep neural network, the convergence speed and accuracy requirements of the RJITL model for nonlinear model prediction can be met.

4 Rolling online prediction with the DNN-RJITL

In the energy consumption prediction process of HVAC systems, the global deep neural network model ignores the dynamics of time-varying operating conditions and has high computational complexity. In contrast, the JITL method possesses adaptability to varying operating conditions and offers real-time capability, making it well-suited for the predictive needs of HVAC systems. Figure 3 shows the differences between the global modeling method and the JITL method. In the global model, the training set consists of the complete historical database, while in the JITL model, the training set is a local dataset selected using a similarity metric. The online learning nature of the JITL method allows real-time updates of the training set by replacing old samples with new ones. The main steps of the JITL method are as follows:

Fig. 3

The global modeling method versus the JITL modeling method

1) Upon the arrival of a test sample, relevant samples are queried from the historical database using the similarity metric.

2) A local model is constructed based on the relevant samples.

3) The output predicted value is estimated using the constructed local model. Once the estimation is complete, the local model is discarded.

4) When a new test sample arrives, the process is repeated, and a new local model is constructed based on the aforementioned steps.
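The four steps above can be sketched as a single query function. To keep the example self-contained, the local model is a hypothetical stand-in that predicts the mean output of the relevant samples (the paper uses a DNN here), the similarity is a plain unweighted exponential of the squared Euclidean distance, and the fixed `n_relevant` is an assumption.

```python
import math

def euclidean_similarity(x_t, x_i):
    """Similarity in (0, 1]: 1 for identical points, decaying with squared distance."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x_t, x_i)))

def fit_mean_model(samples):
    """Stand-in for the local DNN: predicts the mean output of the relevant samples."""
    y_bar = sum(y for _, y in samples) / len(samples)
    return lambda x: y_bar

def jitl_predict(x_t, database, n_relevant=3,
                 similarity=euclidean_similarity, fit_local=fit_mean_model):
    """database is a list of (input_vector, output) pairs."""
    # Step 1: query the most similar historical samples.
    ranked = sorted(database, key=lambda s: similarity(x_t, s[0]), reverse=True)
    relevant = ranked[:n_relevant]
    # Step 2: build a local model on the relevant samples only.
    local_model = fit_local(relevant)
    # Step 3: predict; the local model is discarded when the function returns.
    return local_model(x_t)
```

Step 4 corresponds to calling `jitl_predict` again for each new test sample, which rebuilds the local model from scratch every time.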

4.1 Similarity metrics

The essence of the JITL lies in assigning pseudo-labels to the historical sample database through the similarity measure, which evaluates the resemblance between the current workpoint and the historical workpoints. Traditional similarity measures in the JITL method typically rely on Euclidean distance or the vector angle [33]. Euclidean distance calculates the straight line distance between data points, which is suitable for continuous data, and can capture the linear relationship between features. Euclidean distances are still valid in high dimensional spaces, especially if the relationship between features is linear. However, it is limited in capturing non-linear relationships, and the computed results are susceptible to the scale of features and the presence of outlier data. Cosine similarity overcomes the influence of feature scale because it measures the angle between feature vectors rather than the distance. This makes it more robust for features with different scales. Cosine similarity captures directional information about features. This is useful for certain problems such as text similarity and recommender systems. However, it only focuses on the direction of the vectors and ignores their magnitude.

This paper presents a novel approach that utilizes a Spearman coefficient-weighted Euclidean distance similarity measure. The similarity between the test sample point x_t and any sample x_i in the historical database can be expressed as: $D (x_{t}, x_{i}) = \exp [- \sum_{d = 1}^{D} w_{d} {(x_{t, d} - x_{i, d})}^{2}]$ (11) where the upper limit D of the sum is the dimensionality of the input variable and w_d is the weight assigned to the d-th feature of the input variable by the Spearman coefficient. The Spearman correlation coefficient is a rank-based statistical measure, which is able to express the degree of nonlinear (monotonic) dependence between the input and output variables and is reliable in nonlinear problems [34, 35]. In this paper, the Spearman correlation is used to calculate the weights w_d in Eq. (11). The Spearman correlation coefficient between the d-th dimension of the input variable and the output variable is computed as: $η^{d} (X_{l}^{d}, Y) = 1 - \frac{6 \sum_{i = 1}^{n_{l}} {(rg (x_{i}^{d}) - rg (y_{i}))}^{2}}{n_{l} (n_{l}^{2} - 1)}$ (12) where $rg (x_{i}^{d})$ and rg (y_i) are the ranks of the d-th dimensional input data $X_{l}^{d}$ and the output data Y, respectively, and n_l is the number of samples. The Spearman correlation coefficient adds nonlinear factors into the traditional Euclidean distance and enhances the ability of this method to capture complex relationships in HVAC data. Additionally, this method addresses a limitation of cosine similarity, namely its inability to handle sparse vectors containing a large number of zero values. The feature scale of the HVAC system is not uniform, and the data fluctuates in actual operation. The Spearman coefficient-weighted Euclidean distance similarity method makes the results less susceptible to the influence of feature scales and the presence of outlier data. This characteristic contributes to the improved prediction accuracy and robustness of the model.
After the features are weighted, the trained model is more able to focus on those features that have a real impact on the output variables, thus improving the prediction accuracy of the model.
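Eqs. (11) and (12) can be sketched in pure Python as follows. Two points are assumptions of the example rather than statements from the text: ranks are computed without tie handling (Eq. (12) presumes distinct ranks), and the weights w_d are taken as the absolute Spearman coefficients normalized to sum to 1, since the mapping from η^d to w_d is not spelled out. All function names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; assumes no tied values, as Eq. (12) does."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman coefficient of Eq. (12)."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def spearman_weights(X, y):
    """One weight per feature: |eta^d| normalized to sum to 1 (an assumption)."""
    raw = [abs(spearman([row[d] for row in X], y)) for d in range(len(X[0]))]
    total = sum(raw)
    if total == 0:                       # no feature is rank-correlated with y
        return [1 / len(raw)] * len(raw)
    return [w / total for w in raw]

def weighted_similarity(x_t, x_i, w):
    """Similarity of Eq. (11)."""
    return math.exp(-sum(wd * (a - b) ** 2 for wd, a, b in zip(w, x_t, x_i)))
```

In the small case where the first feature is perfectly rank-correlated with the output and the second is not correlated at all, the weights come out as `[1.0, 0.0]`, so differences in the second feature no longer reduce the similarity.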

4.2 The similarity factor accumulation method

When building the local model, insufficient sample length can result in missing information and reduced prediction accuracy, while excessive length can lead to increased computational demands and longer processing times. To address this challenge, this paper proposes a similarity factor accumulation method, and the expression is as follows: $k = \frac{\sum_{i = 1}^{n} D (x_{t}, x_{i})}{\sum_{i = 1}^{N_{l}} D (x_{t}, x_{i})}$ (13) where the numerator is the sum of the similarities of the n most similar samples, and the denominator is the sum of the similarities between the test sample x_t and all samples in the historical database.

The inclusion of the similarity factor accumulation method in the DNN-JITL model helps to reduce the training size, and the method makes the selected samples more reasonable and improves the model adaptive capability and real-time performance.
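One way to apply Eq. (13) is to fix the threshold k and take the smallest n whose accumulated similarity share reaches it. The sketch below implements that reading, which is an interpretation of the method; the function name is hypothetical.

```python
def select_relevant(similarities, k):
    """Return indices of the most similar samples whose cumulative share reaches k.

    similarities[i] is D(x_t, x_i) for every sample in the historical database.
    """
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    total = sum(similarities)            # denominator of Eq. (13)
    chosen, running = [], 0.0
    for i in order:
        chosen.append(i)
        running += similarities[i]       # numerator of Eq. (13) for n = len(chosen)
        if running / total >= k:
            break
    return chosen
```

The number of selected samples therefore adapts to the query: when a few historical samples are very close to x_t, they reach the share k quickly and the local training set stays small.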

Remark 2. The selection of the value of k involves solving a discrete one-dimensional hyperparameter optimization problem. The objective is to achieve faster model training while maintaining sufficient accuracy. We employ the grid search method, a fundamental technique in hyperparameter tuning to determine the k value.

1) The k value ranges from (0, 1). Set the appropriate range and step size of the grid search within (0, 1).

2) Use "Accuracy" as an evaluation indicator.

3) Iterate over the k-values and compute the cross-validation score.

4) Set the number of folds for cross-validation, such as cv = 5, indicating that the model will be trained and evaluated 5 times to reduce randomness.

4.3 Ensemble model of JITL

Due to the inherent flexibility and adaptive characteristics of the JITL method, the prediction of a single JITL model may fluctuate. Therefore, this paper proposes an ensemble JITL model using an arithmetic averaging method, whose expression is shown below:

$H (x) = \frac{1}{T} \sum_{i = 1}^{T} h_{i} (x)$ (14) where h_i (x) represents the output of the i-th individual JITL model, T is the number of individual models, and H (x) is the output value of the ensemble. With deep neural networks as local models, a single JITL model is susceptible to overfitting due to local features in the training data. In contrast, ensemble learning models reduce the risk of overfitting by combining multiple regressors, thus reducing the variance of the model. Therefore, this paper uses the ensemble JITL model to improve the prediction accuracy.
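Eq. (14) is a plain arithmetic mean over the individual predictors. In the sketch below each model is any callable that maps an input to a prediction; the function name and the toy models in the usage note are hypothetical.

```python
def ensemble_predict(models, x):
    """Arithmetic mean of the individual model outputs, as in Eq. (14)."""
    return sum(h(x) for h in models) / len(models)
```

For example, three toy models returning `x + 1`, `x + 2` and `x + 3` average to `x + 2`, so individual deviations in opposite directions cancel in the ensemble output.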

4.4 The update rule for rolling prediction

In the JITL method, the local model is discarded immediately after outputting the predicted value, and a new local model is constructed by the similarity metric when a new test sample arrives. However, the operation state of the data usually does not fluctuate significantly under the continuous operation conditions of the HVAC system. It would lead to increased computational demand and low model utilization to reconstruct the prediction model for each sample. Therefore, this paper proposes a similarity update rule to dynamically update the local model. If the current production status is smooth and changes slowly, the previous prediction model is followed to predict the output.

The current test sample input is x_t, and the test sample input at the previous model update is x_t-r. The update hyperparameter R is defined with a range from 0 to 1. The similarity between x_t and x_t-r is calculated according to Eq. (11). When D (x_t, x_t-r) > R, the local model is not updated and the output is predicted with the previous prediction model; if D (x_t, x_t-r) < R, the local model is updated. This approach effectively reduces the reconstruction frequency of the online model, saves computational resources, and improves the real-time performance of the model.
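The update rule can be sketched as a small wrapper that remembers the sample at which the local model was last rebuilt. The class and argument names are hypothetical, the scalar similarity function is for illustration only, and the case D = R, which the rule does not specify, is treated here as a rebuild.

```python
import math

def gaussian_similarity(a, b):
    """Scalar version of the exponential similarity, for illustration only."""
    return math.exp(-(a - b) ** 2)

class RollingPredictor:
    """Reuse the local model while the new sample stays similar to the anchor."""

    def __init__(self, build_model, similarity, R):
        self.build_model = build_model   # callable: x_t -> fitted local model
        self.similarity = similarity
        self.R = R
        self.anchor = None               # x_{t-r}: sample at the last model update
        self.model = None
        self.rebuilds = 0

    def predict(self, x_t):
        reuse = (self.model is not None
                 and self.similarity(x_t, self.anchor) > self.R)
        if not reuse:                    # D <= R: operating point has drifted
            self.model = self.build_model(x_t)
            self.anchor = x_t
            self.rebuilds += 1
        return self.model(x_t)
```

A value of R close to 1 rebuilds the model for almost every sample, which approaches the plain JITL procedure, while a smaller R trades some adaptivity for fewer rebuilds.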

The schematic diagram of using the DNN-RJITL method to solve the HVAC system problem is shown in Figure 4. The proposed method addresses the inherent problems of large data volume, uneven feature distribution and complex working conditions in HVAC system energy consumption prediction, achieving high accuracy and providing adaptive online prediction.

Fig. 4

Problems solved by DNN-RJITL method.

5 Simulation experiment and discussion

In this section, the proposed DNN-RJITL model is applied to the prediction of a data center HVAC system. Simulation experiments are conducted in conjunction with the actual project, and the results of the simulations are presented and discussed.

5.1 Case study introduction

The energy consumption prediction equipment for the data center HVAC system is shown in Figure 5. Each part of the prediction device runs independently and transmits data through data interfaces. Data cleaning and feature selection follow Section 2.2. All experiments are conducted on a Dell Inspiron 7500 laptop equipped with an Intel Core i7 processor, running the Windows 11 operating system. The programming language is Python (version 3.7).

Fig. 5

HVAC predictive process equipment.

Figure 6 shows the flow of the prediction model. When the model starts to work and the sample x_t arrives, the similarity between the current working point x_t and the last model update working point x_t-r is calculated. If the similarity condition is met, the previous model is followed for prediction; otherwise, the relevant samples are reselected and a new local model is built. Then the arithmetic mean of the three homogeneous local models is taken as the prediction result. The prediction process can be started or stopped at any time, which enables online rolling predictions.

Fig. 6

DNN-RJITL prediction model flow chart.

5.2 DNN-RJITL hyperparameters setting

This subsection uses the cooling tower as an example to demonstrate the hyperparameter setting method. The nonlinear expression ability of DNNs increases with the number of hidden layers and the number of nodes. However, if the number of hidden layers and nodes becomes too large, it may lead to excessive computation and even overfitting. The number of hidden layers and the number of nodes are determined by trial and error. Figure 7 shows that when the number of hidden layers is 3 and the number of nodes is 256, the error of the model no longer decreases significantly, so the network structure is set to 6-256-256-256-1.

Fig. 7

Relationship between the number of hidden layers and nodes and the prediction error.
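
To give a sense of the size of the selected structure, the number of trainable parameters can be counted as below. The sketch assumes plain fully connected layers with one bias per node; any additional parameters (for example from normalization layers) are not counted, since the section does not specify them.

```python
from typing import Sequence


def count_dense_parameters(layer_sizes: Sequence[int]) -> int:
    """Count weights and biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output node
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 6 inputs, three hidden layers of 256 nodes, 1 output
structure = [6, 256, 256, 256, 1]
print(count_dense_parameters(structure))  # 133633
```

Under these assumptions each local model has roughly 1.3 × 10^5 parameters, nearly all of them in the two 256 × 256 hidden-to-hidden weight matrices.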

The batch-size of the DNN refers to the number of data samples fed to the neural network in one iteration. If the batch-size is too large, each weight update requires more computation and consumes more memory; if the batch-size is too small, the updates of the model parameters may become unstable. Therefore, the appropriate batch-size is determined by experiment. As shown in Figure 8, a batch-size of 256 is chosen for mini-batch gradient descent in order to balance computational speed and fitting quality. The remaining hyperparameters of the local model are determined empirically and from the literature, and their values are shown in Table 3.

Fig. 8

Relationship between the batch-size and the prediction error.

Table 3

Hyperparameters of the local model

Hyperparameters	Values
Layer structure	6-256-256-256-1
Batch-size	256
Iterations	50
Learning rate	0.001
β₁	0.9
β₂	0.999
ɛ (Minimum number)	10^-8
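
The values of β₁, β₂ and ɛ in Table 3 match the conventional defaults of the Adam optimizer. This section does not name the optimizer, so the following single update step is a sketch under the assumption that Adam is used; the function name `adam_step` and the list-based parameter layout are illustrative only.

```python
import math
from typing import List, Tuple

LEARNING_RATE = 0.001  # Table 3
BETA_1 = 0.9           # Table 3
BETA_2 = 0.999         # Table 3
EPSILON = 1e-8         # Table 3


def adam_step(
    params: List[float],
    grads: List[float],
    m: List[float],
    v: List[float],
    t: int,
) -> Tuple[List[float], List[float], List[float]]:
    """Apply one Adam update (assumed optimizer); t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = BETA_1 * m_i + (1 - BETA_1) * g      # first-moment estimate
        v_i = BETA_2 * v_i + (1 - BETA_2) * g * g  # second-moment estimate
        m_hat = m_i / (1 - BETA_1 ** t)            # bias-corrected moments
        v_hat = v_i / (1 - BETA_2 ** t)
        new_params.append(p - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPSILON))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the first step (t = 1, zero-initialized moments) every parameter moves by approximately the learning rate in the direction opposite to its gradient, regardless of the gradient magnitude.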

In the process of online rolling prediction, the number of training samples for local modeling needs to be adjusted adaptively. The value of k is related to the accuracy and rapidity of the online rolling prediction model. The selection of k is performed according to the method provided in Section 4.2.

1) The value of k lies in (0, 1). Online rolling prediction prioritizes fast model training and therefore favors smaller k values. Accordingly, k is searched over the range (0, 0.2) with a step size of 0.02.

2) Set "Error" as the evaluation indicator, set the number of folds for cross-validation to 5, and run the cross-validation.

3) Draw the cross-validation curve to observe the performance of the model for different values of k. Figure 9 plots the relationship between k and the prediction error. As k increases, i.e., as the sample size for local modeling grows, the error of the model decreases. The error almost stops decreasing at k = 0.16.

Fig. 9

Relationship between k and the prediction error.

Through experiments, the hyperparameter k value for the model is determined to be 0.16.
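
The three-step selection of k can be sketched as follows. The grid and the number of folds follow the text; everything else is an assumption made for illustration: the interleaved fold split, the hypothetical `evaluate` callback (which would train a local model with a given k on the training indices and return its error on the validation indices), and the `pick_k` rule with its `tolerance`, which replaces the visual inspection of Figure 9 with a simple threshold.

```python
from typing import Callable, Dict, List, Tuple


def k_grid(upper: float = 0.2, step: float = 0.02) -> List[float]:
    """Candidate k values 0.02, 0.04, ..., 0.20."""
    count = round(upper / step)
    return [round(step * i, 10) for i in range(1, count + 1)]


def fold_indices(n_samples: int, n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into (train, validation) pairs."""
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    return [
        ([i for j, fold in enumerate(folds) if j != f for i in fold], folds[f])
        for f in range(n_folds)
    ]


def cross_validated_error(
    k: float,
    n_samples: int,
    evaluate: Callable[[float, List[int], List[int]], float],
    n_folds: int = 5,
) -> float:
    """Average the validation error of `evaluate` over all folds."""
    errors = [evaluate(k, train, val) for train, val in fold_indices(n_samples, n_folds)]
    return sum(errors) / len(errors)


def pick_k(errors_by_k: Dict[float, float], tolerance: float) -> float:
    """Smallest k after which a larger k improves the error by less than `tolerance`."""
    ks = sorted(errors_by_k)
    for current, nxt in zip(ks, ks[1:]):
        if errors_by_k[current] - errors_by_k[nxt] < tolerance:
            return current
    return ks[-1]
```

A typical use would compute `{k: cross_validated_error(k, n, evaluate) for k in k_grid()}` and pass the result to `pick_k`. For strongly time-dependent data, a chronological split may be preferable to the interleaved folds used here.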

In Figure 9, the black curve represents the prediction error of the single JITL model, and the red curve represents the arithmetic mean of the three homogeneous JITL models. The relative error of the ensemble output is 0.25% lower than that of the single JITL model. The results show that the ensemble outperforms the single model, which reduces the risk of overfitting and improves the accuracy of the online model.

In order to reduce the number of updates and avoid rebuilding the local model every time a sample arrives, the model update hyperparameter R is introduced. Figure 10 shows how the error and the average modeling time vary with R. It can be seen that when R = 0.96, the absolute error almost stops decreasing and the modeling time is 1.585 seconds, which favors fast modeling.

Fig. 10

Relationship between the prediction error and R and the average time.

Based on experimental trials, the hyperparameters selected for the cooling tower, pump systems, and chiller units, respectively, are shown in Table 4.

Table 4

Hyperparameters of the DNN-RJITL models

Systems	k	R
Cooling tower	0.16	0.96
Pump system	0.20	0.98
Chiller unit	0.33	0.98

5.3 Results and discussion

In this section, the energy consumption prediction models for the cooling tower, the pump system and the chiller unit are established. 200 sets of validation data are used to verify the effectiveness of the models. As depicted in Figure 11, almost all prediction points lie within the ±3% error band, which indicates that the models are accurate and generalize well. As shown in Figure 12, when JITL training is performed without the R hyperparameter (or with R at its extreme value of 1), the model is rebuilt each time a sample arrives, and the prediction time is 6.378s, 5.141s and 5.755s for the three systems, respectively. When the DNN-RJITL model is used, the average prediction time is reduced to 1.585s, 2.548s, and 3.075s, which represents a reduction of 41.72%. The simulation results show that the DNN-RJITL model builds models quickly and can meet the needs of actual production.

Fig. 11

Predicted results.

Fig. 12

Comparison of the time spent on DNN-JITL and DNN-RJITL.

To support this conclusion, the mainstream models discussed in the introduction are compared with the DNN-RJITL model. A detailed comparison of the performance indicators is shown in Table 5. The SVR model [10] runs quickly, taking 2.614s, 2.365s and 2.030s in the three systems. However, its accuracy falls short, with MAPE values of 7.25%, 10.59% and 2.83%. This can be attributed to the large data volume and strong coupling of the HVAC system: the limited expressive capacity of the shallow SVR structure hinders its ability to model complex functions. The traditional DNN model [14] improves accuracy, with MAPE values of 4.35%, 2.48% and 1.01% in the three systems. However, it consumes substantial computing resources, with modeling times of 61.99s, 60.141s and 58.55s. Accordingly, it is not suitable for HVAC operation, which has high real-time requirements. The SWL model [22] predicts with a fixed window size and step size, which reduces the training time to 6.242s, 5.145s, and 8.065s. However, the fixed window size and step size limit its flexibility when dealing with complicated HVAC data, and its accuracy is insufficient. The DNN-RJITL model constructs local models through adaptive sample selection and updates them adaptively to improve the real-time performance and flexibility of the overall model. The prediction time is reduced to 1.585s, 2.548s and 3.075s, and the MAPE is reduced to 1.82%, 1.17% and 0.86%. The simulation results indicate that the DNN-RJITL method is both efficient and accurate. Compared with mainstream approaches, it has an average improvement of 5.17% in accuracy and 41.72% in speed.

Table 5

Comparison of prediction results of different models

Systems	Models	Time(s)	MAE	R² score	MAPE
Cooling tower	SVR [10]	2.614	2.218	0.945	7.25%
	DNN[14]	61.99	1.388	0.966	4.35%
	SWL[22]	6.242	1.249	0.970	4.33%
	DNN-RJITL	1.585	0.965	0.986	1.82%
Pump system	SVR	2.365	2.775	0.682	10.59%
	DNN	60.141	0.414	0.938	2.48%
	SWL	5.145	0.447	0.890	3.28%
	DNN-RJITL	2.548	0.321	0.959	1.17%
Chiller unit	SVR	2.030	20.78	0.965	2.83%
	DNN	58.55	7.42	0.989	1.01%
	SWL	8.065	8.065	0.992	1.09%
	DNN-RJITL	3.075	6.308	0.997	0.86%

Remark 3. In this paper, MAE, R² and MAPE are employed to assess the real-time performance, accuracy, and stability of the DNN-RJITL algorithm.

$MAE = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|$ (15)

$R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}$ (16)

$MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100\%$ (17)

where n represents the number of samples, y_i represents the true value of the i-th sample, $\hat{y}_{i}$ represents the predicted value of the model for the i-th sample, and $\bar{y}$ represents the mean of the true values.
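
For reference, Eqs. (15)-(17) translate directly into the following plain-Python functions. The function names are illustrative and not taken from the paper; MAPE is returned in percent and is undefined when any true value is zero.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, Eq. (15)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, Eq. (16)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, Eq. (17)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)
```

Because MAE is expressed in the units of the target variable, its values are not comparable across systems with different energy scales, whereas R² and MAPE are scale-free.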

6 Conclusion

This paper explores energy consumption prediction methods for HVAC systems. An innovative DNN-RJITL energy consumption rolling prediction model is presented, which purposefully addresses the challenges of large data volume, complex data coupling and high time-variability of working conditions in HVAC systems. The RJITL framework employs Spearman coefficient-weighted Euclidean distance to select relevant samples, and sets the model update parameter R for adaptive local model updates. The DNN is utilized as the local model to enhance prediction accuracy. In a real scenario of the HVAC system, the proposed method is compared with several classical methods such as SVR, DNN and SWL. The average accuracy of the proposed method for the three systems can reach about 98.18%, and the average time can be reduced to 2.403s. The experimental results demonstrate that the DNN-RJITL model is a powerful and accurate tool for predicting energy consumption in cooling towers, pump systems, and chiller units. This study establishes a foundation for configuring HVAC system parameters, optimizing energy consumption, and promoting energy conservation and utilization. In the future, we will further explore methods for real-time parameter adjustment to achieve optimal values, thereby working towards the overarching goal of efficient energy management in HVAC systems.

Acknowledgments

This study is supported by the National Natural Science Foundation of China (62373012), and the Beijing Natural Science Foundation (4212040).

References

1. Kim, Lee, Do et al., Energy Modeling and Model Predictive Control for HVAC in Buildings: A Review of Current Research Trends[J], Energies 15(19) (2022), 7231.

2. Kim D.B., Kim D.D. and Kim, Energy performance assessment of HVAC commissioning using long-term monitoring data: A case study of the newly built office building in South Korea[J], Energy and Buildings 204 (2019), 109465.

3. Jin, Xiao, Zhang et al., GEIN: An interpretable benchmarking framework towards all building types based on machine learning[J], Energy and Buildings 260 (2022), 111909.

4. Yao, Huang and Yang, Dynamic Modeling of HVAC System with State-Space Method[C], Proceedings of the 8th International Symposium on Heating, Ventilation and Air Conditioning, Springer, Berlin, Heidelberg (2014), 101–108.

5. Turner W.J.N., Staino and Basu, Residential HVAC fault detection using a system identification approach[J], Energy and Buildings 151 (2017), 1–17.

6. , Qiao, Li et al., Recent advances in dynamic modeling of HVAC equipment. Part 1: Equipment modeling[J], HVAC&R Research 20(1) (2014), 136–149.

7. Wang and Srinivasan R.S., A review of artificial intelligence based building energy use prediction: Contrasting the capabilities of single and ensemble prediction models[J], Renewable and Sustainable Energy Reviews 75 (2017), 796–808.

8. Chen, Guo, Chen et al., Physical energy and data-driven models in building energy prediction: A review[J], Energy Reports 8 (2022), 2656–2671.

9. Terzi, Fagiano, Farina et al., Structured modelling from data and optimal control of the cooling system of a large business center[J], Journal of Building Engineering 28 (2020), 101043.

10. Chen and Tan, Short-term prediction of electric demand in building sector via hybrid support vector regression[J], Applied Energy 204 (2017), 1363–1374.

11. Sonta, Dougherty T.R. and Jain R.K., Data-driven optimization of building layouts for energy efficiency[J], Energy and Buildings 238 (2021), 110815.

12. Ahmad and Chen, Nonlinear autoregressive and random forest approaches to forecasting electricity load for utility energy management systems[J], Sustainable Cities and Society 45 (2019), 460–473.

13. Olu-Ajayi, Alaka, Sulaimon et al., Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques[J], Journal of Building Engineering 45 (2022), 103406.

14. Tian, Zhang, Shu et al., A novel evaluation strategy to artificial neural network model based on bionics[J], Journal of Bionic Engineering (2022), 1–16.

15. Guan, Chen, Wei et al., Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN[J], Computers in Biology and Medicine 145 (2022), 105444.

16. Fan, Xiao and Zhao, A short-term building cooling load prediction method using deep learning algorithms[J], Applied Energy 195 (2017), 222–233.

17. Taheri, Ahmadi, Mohammadi-Ivatloo et al., Fault detection diagnostic for HVAC systems via deep learning algorithms[J], Energy and Buildings 250 (2021), 111275.

18. Mtibaa, Nguyen K.K., Azam et al., LSTM-based indoor air temperature prediction framework for HVAC systems in smart buildings[J], Neural Computing and Applications 32 (2020), 17569–17585.

19. Chen, Du, He et al., A novel gait pattern recognition method based on LSTM-CNN for lower limb exoskeleton[J], Journal of Bionic Engineering 18 (2021), 1059–1072.

20. , Magar and Farimani A.B., Forecasting COVID-19 new cases using deep learning methods[J], Computers in Biology and Medicine 144 (2022), 105342.

21. Xie and Yao, Physics-constrained deep active learning for spatiotemporal modeling of cardiac electrodynamics[J], Computers in Biology and Medicine 146 (2022), 105586.

22. Zhao X.N. et al., A Method for Predicting Carbon Emission of Railway Transportation System Based on an LSTM Network with Dynamic Input via Sliding Window[J], Journal of Transport Information and Safety 41 (2023), 169–178.

23. Chen, Gui, Dai et al., An ensemble just-in-time learning soft-sensor model for residual lithium concentration prediction of ternary cathode materials[J], Journal of Chemometrics 34(5) (2020), e3225.

24. Asim, Badiei, Mohammad et al., Sustainability of heating, ventilation and air-conditioning (HVAC) systems in buildings - An overview[J], International Journal of Environmental Research and Public Health 19(2) (2022), 1016.

25. Jia, Wei and Liu, A review of optimization approaches for controlling water-cooled central cooling systems[J], Building and Environment 203 (2021), 108100.

26. Stanford H.W. III, HVAC Water Chillers and Cooling Towers: Fundamentals, Application, and Operation[M], CRC Press, 2011.

27. Cutillas C. Garcia, Ramirez J. Ruiz and Miralles M. Lucas, Optimum design and operation of an HVAC cooling tower for energy and water conservation[J], Energies 10(3) (2017), 299.

28. LeCun Y., Bengio and Hinton, Deep learning[J], Nature 521 (2015), 436–444.

29. Guo, Xie and Huang, A deep learning just-in-time modeling approach for soft sensor based on variational autoencoder[J], Chemometrics and Intelligent Laboratory Systems 197 (2020), 103922.

30. Xie, Wu and Sheng, Uncertain regression model based on Huber loss function[J], Journal of Intelligent & Fuzzy Systems 45(1) (2023), 1169–1178.

31. Ioffe and Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift[C], International Conference on Machine Learning, PMLR (2015), 448–456.

32. Dogo E.M., Afolabi O.J., Nwulu N.I. et al., A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks[C], 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), IEEE (2018), 92–99.

33. Mohanta H.K. and Pani A.K., Adaptive non-linear soft sensor for quality monitoring in refineries using Just-in-Time Learning - Generalized regression neural network approach[J], Applied Soft Computing 119 (2022), 108546.

34. Winter J.C.F. De, Gosling S.D. and Potter, Comparing the Pearson and Spearman correlation coefficients across distributions and sample sizes: A tutorial using simulations and empirical data[J], Psychological Methods 21(3) (2016), 273.

35. van den Heuvel and Zhan, Myths about linear and monotonic associations: Pearson’s r, Spearman’s ρ, and Kendall’s τ[J], The American Statistician 76(1) (2022), 44–52.