Abstract
During the operation of HVAC (Heating, Ventilation, and Air-Conditioning) systems, precise energy consumption prediction plays an important role in achieving energy savings and optimizing system performance. However, the HVAC system is a complex and dynamic system characterized by a large number of variables that change significantly over time. A fixed offline model is therefore inadequate: it cannot adapt to the dynamic changes in the system, and training it consumes tremendous computation time. To solve this problem, a deep neural network (DNN) model based on Just-in-Time learning with hyperparameter R (RJITL) is proposed in this paper to predict HVAC energy consumption. Firstly, relevant samples are selected using Euclidean distance weighted by Spearman coefficients. Subsequently, local models are constructed using deep neural networks supplemented with optimization techniques to enable real-time rolling energy consumption prediction. Then, an ensemble of JITL models mitigates the influence of local features and improves prediction accuracy. Finally, by defining an update rule (hyperparameter R) for the JITL model, the local models are updated adaptively, which reduces the training time of the overall model. Experimental results on energy consumption prediction for the HVAC system show that the proposed DNN-RJITL method achieves an average improvement of 5.17% in accuracy and 41.72% in speed compared to traditional methods.
Introduction
In recent years, commercial and industrial buildings have been utilizing 40% of the global primary energy and producing 30% of global greenhouse gas emissions [1]. The operating energy consumption of HVAC systems accounts for the largest proportion of the building’s energy consumption [2]. As economic development continues, the proportion of energy consumed by air conditioning will further increase. In the face of severe climate change and environmental problems caused by rapid growth in energy consumption, energy saving and consumption reduction of HVAC systems have attracted great attention [3]. HVAC system energy consumption prediction lays the foundation for its energy saving and consumption reduction. It enables precise control of the HVAC system, adjustment of operating parameters and efficient operation and maintenance. This prediction facilitates strategic energy planning, and enables the implementation of measures to enhance energy efficiency, lower operating costs, and achieve energy-saving objectives.
There has been extensive research on HVAC energy-related modeling techniques. Previous research falls mainly into two types: mechanism modeling based on operational principles [4–6], and black box modeling based on neural networks [7, 8]. Mechanism modeling is characterized by its clear internal mechanisms and good model interpretability. Yao et al. [4] proposed a state-space based dynamic model for HVAC systems and verified the transient response of the model for a given input. However, the strong interactions among the multiple parts of the HVAC system were not considered. Turner et al. [5] proposed an HVAC system fault detection method that uses a recursive least squares modeling approach to systematically identify synthetic time series data from a residential building simulation program. The method does not require detailed physical formulas of the HVAC system and has the significant advantage of simplifying calculations. However, it places strict restrictions on the running state of the HVAC system, and its anti-interference ability is weak, which limits its implementation in practical scenarios.
With the growth of large databases of HVAC operating data, the black box modeling approach, which requires abundant data and offers simplicity and flexibility, has been further developed [9–12]. Terzi et al. [9] employed a linear regression (LR) approach to predict the energy consumption of each HVAC subsystem independently. It was demonstrated that LR is highly applicable to cooling systems with different layouts and components. Chen et al. [10] used support vector regression (SVR) to predict hourly electricity demand in hotels and shopping centers over a 24-hour period; the prediction process completed within 20 seconds, with errors of approximately 4.0% and 6.0%, respectively. Sonta et al. [11] used multiple linear regression (MLR), SVR, random forests (RF), and artificial neural networks (ANN) to determine the most robust surrogate model for building energy consumption. The above methods are all shallow modeling methods, which have the advantage of fast operation and do not require an in-depth study of the model mechanism. Their limitation is that the shallow structure constrains the ability to express very complex functions and can lead to convergence problems, such as susceptibility to local minima during training.
Deep Neural Networks (DNNs) have the capability to achieve intricate nonlinear mappings in a flexible manner [13–15]. The HVAC system is characterized by complexity and various linear or nonlinear constraints between the control parameters. The emergence of various deep learning algorithms has made it possible to use deep learning for HVAC energy consumption prediction [16–18]. Chen et al. [19] proposed a gait pattern recognition method based on Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) for lower limb exoskeletons. They validated that the LSTM model is good at processing time-series data, and that the CNN is well suited to processing data with spatial structure characteristics. Xu et al. [20] used a CNN-LSTM deep learning method to forecast the number of COVID-19 cases and used MAE as an evaluation indicator. Xie et al. [21] presented a physics-constrained deep active learning (P-DAL) framework to model spatiotemporal cardiac electrodynamics, which combines both sensor measurements and physics prior knowledge into a DNN and demonstrated good predictive capability in this field. Zhao et al. [22] used a sliding time window for dynamic input to the deep learning model, with which the prediction accuracy can be significantly improved.
From the above analysis, it can be concluded that DNNs are effective tools for decision making, management, and design optimization in HVAC systems. Due to the large datasets and time-varying characteristics of HVAC systems, existing research often adds a recurrent connection layer to the DNN to capture temporal features for precise predictions [17, 18]. However, this approach uses the entire historical dataset to learn the weights and parameters of the model, resulting in long training time and a lack of flexibility. Study [22] utilized a sliding time window to construct local models for online prediction, which reduces the training time. However, the fixed length of the time window lacks adaptability to the input data. To address these obstacles, we propose a novel method in this paper. By combining the Just-in-Time Learning (JITL) algorithm with the DNN, we construct an online rolling model for HVAC system energy consumption prediction. The DNN-RJITL method is introduced to address the challenges posed by large data volume, complex data coupling and high time-variability of working conditions in HVAC systems. The JITL continuously adjusts the local model based on changing operating conditions, which enables better adaptation to the local dynamics of the operation process [23]. It makes accurate predictions with less training time. The main contributions of this paper are as follows:
1) The DNN is constructed as a local model within the JITL. It overcomes the problem of insufficient generalization ability of traditional models such as [9–11]. To speed up the network training and optimize the back propagation process, the DNN incorporates additional optimization strategies such as the HuberLoss function, batch normalization, and the Adam optimization algorithm.
2) A JITL algorithm is employed to develop an online rolling prediction model. JITL exhibits a faster model update rate than the existing methods [19] and [20].
3) An adaptive selection of relevant samples based on Spearman-weighted Euclidean distance similarity is proposed, which proves more effective than the methods in [22] and [23] in handling the dynamic data of the HVAC system.
4) The prediction results of the local models are ensembled through arithmetic averaging to mitigate the risk of overfitting and build a model with excellent generalization performance.
5) To achieve adaptive updating of the local models, the JITL model defines the update hyperparameter R, which reduces the frequency of model updates and improves the efficiency of online modeling.
The paper is organized as follows. Section 2 describes the structure and data processing of the HVAC system. Section 3 introduces the local model DNN within JITL and presents its optimization method. Section 4 presents the energy consumption prediction model based on the DNN-RJITL. Section 5 shows the simulation experiments and corresponding discussion. Finally, Section 6 summarizes the whole paper.
HVAC system overview
HVAC system structure
The HVAC system plays a crucial role in maintaining a specific temperature and humidity range within a building. Its primary purpose is to maintain the physical health of individuals and to protect sensitive electronic equipment in the building from excessive heat, condensation and other environmental factors [24]. A typical HVAC system consists of multiple components, including cooling towers, water pump systems, and chiller units, all of which work together to regulate the temperature and humidity levels in a building [25]. Overall, a well-designed and well-maintained HVAC system is essential to the reliable operation of any modern construction.
As shown in Figure 1, a precision air conditioning system is made up of a cooling tower, primary refrigeration pumps, chiller units, and cooling pumps connected in series. For efficient cooling, the principal cooling units (1#, 2#, 3#, and 4#) are all operated simultaneously. The device parameters of the HVAC system are presented in Table 1. The energy consumed by the HVAC system comprises that of the cooling towers, pump systems and chiller units. Therefore, this paper establishes separate energy consumption prediction models for the cooling towers, pump systems and chiller units.

Figure 1. Schematic diagram of the 1# unit in the HVAC system.
Table 1. Device parameters of the HVAC system
The HVAC system is a complex nonlinear system consisting of multiple interconnected devices. The features required for modeling are obtained through theoretical research and expert experience [26, 27] and are shown in Table 2.
Table 2. Input features of the energy consumption prediction models
HVAC operational data were collected from the system for nearly a year, from December 2021 to November 2022, which covers essentially all working conditions. A representative set of 40,000 samples is selected as the historical database, containing data under different environmental variables, and 200 samples are used as the validation set. Missing values are handled through linear interpolation, outliers are eliminated using the 3σ (Pauta) criterion, and a moving average filter is applied to reduce noise and enhance the overall quality of the data.
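The three cleaning steps can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: the function names, the window length of 5, and the choice to apply the 3σ rule to a single variable at a time are assumptions made for illustration.

```python
import numpy as np


def fill_missing_linear(x):
    """Fill NaN entries by linear interpolation over the sample index."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(x.size)
    missing = np.isnan(x)
    # np.interp holds the edge values constant if NaNs sit at the ends.
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x


def three_sigma_mask(x):
    """Boolean mask: True where a sample lies within mean +/- 3 std."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) <= 3.0 * x.std()


def moving_average(x, window=5):
    """Moving average filter; `window` is a hypothetical choice.

    mode="same" zero-pads the ends, so the first and last few
    values are attenuated and may need separate handling.
    """
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")


def clean_series(x, window=5):
    """Interpolate gaps, drop 3-sigma outliers, then smooth."""
    x = fill_missing_linear(x)
    x = x[three_sigma_mask(x)]
    return moving_average(x, window)
```

In a multivariate setting the outlier mask would typically be computed per feature and the whole sample row dropped; that detail is left out here.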
This section introduces the local DNN model and presents its optimization methods.
Principles of deep neural network
The Deep Neural Network (DNN) [28] is an artificial neural network built from multiple layers of neurons. As shown in Figure 2, it consists of an input layer, hidden layers and an output layer. The input layer receives the input data, the hidden layers process and transform the data, and the output layer generates the final output. The DNN’s ability to express complex functions increases with the number of layers in the network, and its compact nonlinear mapping can represent a large set of functions, which makes it suitable for energy consumption prediction of HVAC systems [29]. The mathematical expression of the DNN is as follows:
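The expression itself does not survive in this text. As a reference point, a standard layer-wise form of a fully connected network is sketched below. The symbols $W^{(l)}$, $b^{(l)}$, $\sigma$ and $L$ are assumed notation and may differ from the original equation.

```latex
h^{(0)} = x, \qquad
h^{(l)} = \sigma\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, \dots, L-1, \qquad
\hat{y} = W^{(L)} h^{(L-1)} + b^{(L)}
```

Here $x$ is the input feature vector, $h^{(l)}$ is the output of the $l$-th layer, $W^{(l)}$ and $b^{(l)}$ are the weights and biases of that layer, $\sigma$ is a nonlinear activation function, and $\hat{y}$ is the predicted energy consumption.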
Figure 2. The structure of the DNN
The HuberLoss function is adopted as the loss function for network training. By combining an absolute-value term with a squared term, HuberLoss achieves a smooth transition in the loss calculation: the loss remains smooth as the prediction approaches the target value, abrupt changes in the loss function are avoided, and the training stability of the model improves [30]. The HuberLoss function is defined as follows:
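The expression is missing from this text. The standard definition of the Huber loss with threshold $\delta$ is given below for reference; the symbols $y$ (target) and $\hat{y}$ (prediction) are assumed notation.

```latex
L_{\delta}(y, \hat{y}) =
\begin{cases}
\dfrac{1}{2}\,(y - \hat{y})^{2}, & |y - \hat{y}| \le \delta, \\[6pt]
\delta \left( |y - \hat{y}| - \dfrac{1}{2}\,\delta \right), & |y - \hat{y}| > \delta.
\end{cases}
```

The two branches agree in value and slope at $|y - \hat{y}| = \delta$, which is the smooth transition described above.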
The HVAC data set contains many outliers and much noise, caused by the complexity of the system and changes in the external environment. In this scenario, HuberLoss improves the robustness of the model while maintaining high accuracy. The threshold δ of HuberLoss is selected as follows:
1) Determine the range of the target δ based on the HVAC data set. For example, if there are large outliers in the data, select a larger δ.
2) Employ cross-validation to select the optimal value for δ. During the training process, experiment with values chosen according to 1), such as 1, 1.5, 2, and evaluate the performance on the validation set to determine the best δ.
In this research, δ takes the value of 1.
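The effect of δ can be seen in a short NumPy sketch of the standard Huber loss. The function name and the toy residuals are illustrative assumptions; the candidate values 1, 1.5 and 2 are the ones mentioned above.

```python
import numpy as np


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_err = np.abs(err)
    quadratic = 0.5 * err ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return float(np.mean(np.where(abs_err <= delta, quadratic, linear)))


# Toy residuals with one large outlier (illustrative data only).
y_true = np.zeros(5)
y_pred = np.array([0.1, -0.2, 0.3, -0.1, 6.0])
for delta in (1.0, 1.5, 2.0):
    print(delta, huber_loss(y_true, y_pred, delta))
```

A larger δ lets the outlier contribute more to the loss, which is why the choice is validated on held-out data rather than fixed in advance.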
As the number of hidden layers in a neural network increases, the distribution of the inputs to each layer may change due to the matrix operations and nonlinear processing. To solve this problem, batch normalization adjusts the mean and variance of the input values of each layer, scaling and shifting them to be closer to the standard normal distribution. Adding batch normalization layers to the network architecture improves the nonlinear expression capability of the network and makes model training faster and more stable [31]. The batch normalization formula is as follows:
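Since the formula is not reproduced here, the following sketch shows the standard training-time batch normalization transform. The names `gamma`, `beta` and `eps` follow common usage and are assumptions, not the paper's notation.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x: array of shape (batch_size, n_features)
    gamma, beta: learnable scale and shift, shape (n_features,)
    """
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # approx. zero mean, unit variance
    return gamma * x_hat + beta
```

At inference time, frameworks usually replace the batch statistics with running averages collected during training; that part is omitted from this sketch.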
Adaptive Moment Estimation (Adam) is an adaptive learning rate optimization algorithm that efficiently updates the weights of the neural network during training. Existing studies have shown that Adam is computationally efficient, easy to implement, and suitable for large datasets and models with high-dimensional parameter spaces, and that it converges faster than other optimization algorithms such as Stochastic Gradient Descent (SGD) [32]. The main steps of Adam are as follows:
Step 1: Initialize the weight vector θ, the 1st moment vector m_t, and the 2nd moment vector u_t, and set the time step t to 0.
Step 2: Set t = t + 1, then calculate the derivative of the loss function L with respect to θ.
Step 3: Estimate the 1st moment vector m_t and the 2nd moment vector u_t using exponential moving averages.
Step 4: Introduce bias correction to avoid moment estimates that are biased towards 0.
Step 5: Update the parameter vector θ. If the stopping condition is satisfied, output θ_t; otherwise return to Step 2.
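The five steps map directly onto a few lines of NumPy. The sketch below uses the commonly cited default values for `beta1`, `beta2` and `eps`, and a gradient-norm stopping condition; both are assumptions, since the text does not state the settings or the stopping condition.

```python
import numpy as np


def adam_minimize(grad_fn, theta0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, max_steps=1000, tol=1e-6):
    """Minimize a function given its gradient, following Steps 1 to 5."""
    # Step 1: initialize parameters and both moment vectors.
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)
    u = np.zeros_like(theta)
    for t in range(1, max_steps + 1):        # Step 2: t = t + 1
        g = grad_fn(theta)                   # Step 2: gradient of the loss
        if np.max(np.abs(g)) < tol:          # assumed stopping condition
            break
        # Step 3: exponential moving averages of g and g^2.
        m = beta1 * m + (1.0 - beta1) * g
        u = beta2 * u + (1.0 - beta2) * g ** 2
        # Step 4: bias correction for the zero initialization.
        m_hat = m / (1.0 - beta1 ** t)
        u_hat = u / (1.0 - beta2 ** t)
        # Step 5: parameter update with a per-parameter step size.
        theta = theta - lr * m_hat / (np.sqrt(u_hat) + eps)
    return theta


# Usage: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta_opt = adam_minimize(lambda th: 2.0 * (th - 3.0), [0.0],
                          lr=0.1, max_steps=2000)
```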
In the process of building a deep neural network, the HuberLoss function is used to improve the speed and robustness of gradient descent, the batch normalization strategy is used to stabilize the input data distribution, and the Adam optimization algorithm provides an adaptive learning rate for each parameter by using the estimated mean and variance of the gradients. By adding optimization strategies to the deep neural network, the convergence speed and accuracy requirements of the RJITL model for nonlinear model prediction can be met.
In the energy consumption prediction process of HVAC systems, a global deep neural network model ignores the dynamics of time-varying operating conditions and has high computational complexity. In contrast, the JITL method adapts to varying operating conditions and offers real-time capability, making it well suited to the predictive needs of HVAC systems. Figure 3 shows the differences between the global modeling method and the JITL method. In the global model, the training set consists of the complete historical database, while in the JITL model, the training set is a local dataset selected using a similarity metric. The online learning nature of the JITL method enables real-time updates of the training set by replacing old samples with new ones. The main steps of the JITL method are as follows:

Figure 3. The global modeling method versus the JITL modeling method
1) Upon the arrival of a test sample, relevant samples are queried from the historical database using the similarity metric.
2) A local model is constructed based on the relevant samples.
3) The output predicted value is estimated using the constructed local model. Once the estimation is complete, the local model is discarded.
4) When a new test sample arrives, the process is repeated, and a new local model is constructed based on the aforementioned steps.
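The four steps above can be sketched as a single function. To keep the example short and self-contained, plain Euclidean distance stands in for the similarity metric and a least-squares linear model stands in for the local DNN; both substitutions, and the name `n_relevant`, are assumptions for illustration.

```python
import numpy as np


def jitl_predict(x_query, X_hist, y_hist, n_relevant=50):
    """One JITL cycle: query, build a local model, predict, discard."""
    # 1) Query the most relevant historical samples
    #    (stand-in metric: plain Euclidean distance).
    dist = np.linalg.norm(X_hist - x_query, axis=1)
    idx = np.argsort(dist)[:n_relevant]
    # 2) Build a local model on those samples
    #    (stand-in model: linear least squares with an intercept).
    A = np.hstack([X_hist[idx], np.ones((idx.size, 1))])
    coef, *_ = np.linalg.lstsq(A, y_hist[idx], rcond=None)
    # 3) Predict; the local model goes out of scope on return.
    return float(np.append(x_query, 1.0) @ coef)


def jitl_stream(X_test, X_hist, y_hist):
    """4) Repeat the cycle for every newly arriving test sample."""
    return [jitl_predict(x, X_hist, y_hist) for x in X_test]
```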
The essence of the JITL lies in assigning pseudo-labels to the historical sample database through the similarity measure, which evaluates the resemblance between the current working point and the historical working points. Traditional similarity measures in the JITL method typically rely on the Euclidean distance or the vector angle [33]. The Euclidean distance is the straight-line distance between data points; it is suitable for continuous data and can capture linear relationships between features. It remains valid in high-dimensional spaces, especially if the relationship between features is linear. However, it is limited in capturing nonlinear relationships, and the computed results are sensitive to the scale of the features and the presence of outliers. Cosine similarity, which is based on the vector angle, overcomes the influence of feature scale because it measures the angle between feature vectors rather than the distance, which makes it more robust for features with different scales. It captures directional information about features, which is useful for problems such as text similarity and recommender systems. However, it considers only the direction of the vectors and ignores their magnitudes.
This paper presents a novel approach that utilizes a Spearman coefficient-weighted Euclidean distance similarity measure. The similarity between the test sample point x_t and any sample x_i in the historical database can be expressed as:
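The expression is not reproduced in this text, so the sketch below shows only one plausible form. It assumes that each feature is weighted by the absolute Spearman coefficient between that feature and the output, that the weights are normalized to sum to 1, and that the weighted distance d is mapped to a similarity in (0, 1] by exp(-d). All function names are hypothetical.

```python
import numpy as np


def _ranks(v):
    """Ordinal ranks 0..n-1 (ties are broken by position, not averaged)."""
    order = np.argsort(v, kind="stable")
    ranks = np.empty(v.size, dtype=float)
    ranks[order] = np.arange(v.size)
    return ranks


def spearman_weights(X, y):
    """Assumed weighting: |Spearman rho| of each feature with y, normalized."""
    ry = _ranks(y)
    rho = np.array([np.corrcoef(_ranks(X[:, j]), ry)[0, 1]
                    for j in range(X.shape[1])])
    w = np.abs(rho)
    return w / w.sum()


def weighted_similarity(x_t, X_hist, w):
    """Similarity of x_t to every historical sample, in (0, 1]."""
    d = np.sqrt((w * (X_hist - x_t) ** 2).sum(axis=1))  # weighted Euclidean
    return np.exp(-d)                                   # assumed mapping
```

Features that are monotonically related to the energy consumption receive larger weights, so differences along those features reduce the similarity more strongly.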
When building the local model, insufficient sample length can result in missing information and reduced prediction accuracy, while excessive length can lead to increased computational demands and longer processing times. To address this challenge, this paper proposes a similarity factor accumulation method, and the expression is as follows:
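The expression for the accumulation rule is likewise missing here. One plausible reading, given strictly as an assumption, is to rank the historical samples by similarity and keep the most similar ones until their cumulative similarity reaches a fraction k of the total. The function name is hypothetical.

```python
import numpy as np


def select_by_accumulation(similarity, k):
    """Indices of the most similar samples whose cumulative similarity
    first reaches a fraction k (0 < k < 1) of the total similarity."""
    similarity = np.asarray(similarity, dtype=float)
    order = np.argsort(similarity)[::-1]               # most similar first
    cum = np.cumsum(similarity[order]) / similarity.sum()
    n_sel = int(np.searchsorted(cum, k)) + 1           # first index with cum >= k
    return order[:min(n_sel, similarity.size)]


# Usage with toy similarities: a larger k selects more samples.
sim = np.array([0.1, 0.4, 0.3, 0.2])
print(select_by_accumulation(sim, 0.5))
```

Under this reading the local training set grows with k, so k trades prediction accuracy against training time.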
The similarity factor accumulation method reduces the training set size of the DNN-JITL model, makes the sample selection more reasonable, and improves the adaptive capability and real-time performance of the model. The parameter k of this method is selected by grid search with cross-validation:
1) The value of k lies in (0, 1). Set an appropriate range and step size for the grid search within (0, 1).
2) Use "Accuracy" as an evaluation indicator.
3) Iterate over the k-values and compute the cross-validation score.
4) Set the number of folds for cross-validation, such as cv = 5, indicating that the model will be trained and evaluated 5 times to reduce randomness.
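The procedure above can be sketched as a small grid search with k-fold cross-validation. The callback `score_fn` is hypothetical: it is assumed to train and evaluate the JITL model for a given k on the supplied index split and return an accuracy-like score, where higher is better.

```python
import numpy as np


def grid_search_k(k_values, score_fn, n_samples, cv=5, seed=0):
    """Return the k with the best mean cross-validation score."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), cv)
    mean_scores = []
    for k in k_values:
        scores = []
        for i in range(cv):
            val_idx = folds[i]
            train_idx = np.concatenate(
                [folds[j] for j in range(cv) if j != i])
            scores.append(score_fn(k, train_idx, val_idx))
        mean_scores.append(float(np.mean(scores)))
    best = int(np.argmax(mean_scores))
    return k_values[best], mean_scores


# Usage with a toy score that peaks at k = 0.16 (illustrative only).
candidates = np.arange(0.02, 0.21, 0.02)
best_k, _ = grid_search_k(candidates,
                          lambda k, tr, va: -(k - 0.16) ** 2,
                          n_samples=100)
```

If an error metric is used instead of accuracy, `np.argmax` would be replaced by `np.argmin`.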
Due to the inherent flexibility and adaptive characteristics of the JITL method, the output of a single JITL model may fluctuate from one forecast to the next. Therefore, this paper proposes an ensemble JITL based on arithmetic averaging, whose expression is shown below:
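The expression is not reproduced in this text. Arithmetic averaging over $M$ local models takes the standard form below, where the symbols $M$ and $\hat{y}_t^{(m)}$ are assumed notation.

```latex
\hat{y}_t = \frac{1}{M} \sum_{m=1}^{M} \hat{y}_t^{(m)}
```

Here $\hat{y}_t^{(m)}$ is the prediction of the $m$-th local model for the test sample $x_t$. The experiments in this paper average three homogeneous local models, which corresponds to $M = 3$.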
In the JITL method, the local model is discarded immediately after outputting the predicted value, and a new local model is constructed using the similarity metric when a new test sample arrives. However, the operating state usually does not fluctuate significantly under the continuous operating conditions of the HVAC system, so reconstructing the prediction model for each sample would increase the computational demand and lower model utilization. Therefore, this paper proposes a similarity update rule to dynamically update the local model. If the current production status is smooth and changes slowly, the previous prediction model is reused to predict the output.
Let x_t denote the current query sample and x_{t-r} the query sample at the previous model update. The update hyperparameter R ranges from 0 to 1. The similarity D(x_t, x_{t-r}) is calculated according to Eq. (11). When D(x_t, x_{t-r}) > R, the local model is not updated and the output is predicted with the previous model; when D(x_t, x_{t-r}) < R, the local model is updated. This approach effectively reduces the reconstruction frequency of the online model, saves computational resources, and improves the real-time performance of the model.
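A minimal sketch of this update rule is given below. The class name, the `build_model` and `similarity` callbacks, and the toy exponential similarity are all hypothetical. The case D = R is not specified in the text; the sketch rebuilds the model in that case, so that R = 1 rebuilds on every sample.

```python
import math


class RollingLocalModel:
    """Reuse the previous local model while the new query stays similar."""

    def __init__(self, build_model, similarity, R=0.96):
        self.build_model = build_model  # hypothetical: x -> callable model
        self.similarity = similarity    # hypothetical: (x_t, x_ref) -> [0, 1]
        self.R = R
        self.model = None
        self.x_ref = None               # x_{t-r}: query at the last update
        self.n_updates = 0

    def predict(self, x_t):
        # Rebuild when no model exists or D(x_t, x_{t-r}) <= R.
        if self.model is None or self.similarity(x_t, self.x_ref) <= self.R:
            self.model = self.build_model(x_t)
            self.x_ref = x_t
            self.n_updates += 1
        return self.model(x_t)


def exp_similarity(a, b):
    """Toy scalar similarity in (0, 1], for illustration only."""
    return math.exp(-abs(a - b))


# Usage: slowly drifting queries reuse the model, a jump triggers a rebuild.
roller = RollingLocalModel(lambda x: (lambda q: 2.0 * q), exp_similarity)
predictions = [roller.predict(x) for x in [0.0, 0.01, 0.02, 5.0]]
```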
The schematic diagram of using the DNN-RJITL method to solve the HVAC system problem is shown in Figure 4. The proposed method addresses the inherent problems of large data volume, uneven feature distribution and complex working conditions in HVAC system energy consumption prediction, achieving high accuracy and providing adaptive online prediction.

Figure 4. Problems solved by the DNN-RJITL method.
In this section, the proposed DNN-RJITL model is applied to the prediction of a data center HVAC system. Simulation experiments are conducted in conjunction with the actual project, and the results of the simulations are presented and discussed.
Case study introduction
The energy consumption prediction equipment for the data center HVAC system is shown in Figure 5. Each part of the prediction device runs independently and transmits data through data interfaces. Data cleaning and feature selection follow Section 2.2. All experiments are conducted on a Dell Inspiron 7500 laptop equipped with an Intel Core i7 processor, running the Windows 11 operating system. The programming language is Python (version 3.7).

Figure 5. HVAC predictive process equipment.
Figure 6 shows the flow of the prediction model. When the model starts to work and the sample x_t arrives, the similarity between the current working point x_t and the working point at the last model update, x_{t-r}, is calculated. If the similarity condition is met, the previous model is reused for prediction; otherwise, the relevant samples are reselected and a new local model is built. The arithmetic mean of the three homogeneous local models is then taken as the prediction result. The prediction process can be started or stopped at any time, which enables online rolling prediction.

Figure 6. DNN-RJITL prediction model flow chart.
This subsection uses the cooling tower as an example to demonstrate how the hyperparameters are set. The nonlinear expression ability of a DNN increases with the number of hidden layers and the number of nodes. However, if the number of hidden layers and nodes becomes too large, it may lead to excessive computation and even overfitting. The number of hidden layers and the number of nodes are therefore determined by trial and error. Figure 7 shows that when the number of hidden layers is 3 and the number of nodes is 256, the error of the model no longer decreases significantly, so the network structure is set to 6-256-256-256-1.
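To make the 6-256-256-256-1 structure concrete, the sketch below builds a network of that shape in NumPy and counts its parameters. The ReLU activation and the He-style initialization are assumptions, since the activation function is not stated in this text, and batch normalization is omitted for brevity.

```python
import numpy as np

LAYER_SIZES = [6, 256, 256, 256, 1]  # input, three hidden layers, output


def init_params(sizes, seed=0):
    """Random weights (He-style scaling, assumed) and zero biases."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Forward pass; x has shape (batch_size, 6)."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)   # ReLU is an assumed activation
    W, b = params[-1]
    return h @ W + b                     # linear output for regression


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(W.size + b.size for W, b in params)


params = init_params(LAYER_SIZES)
print(count_params(params))
```

The parameter count grows roughly with the square of the hidden width, which is one reason a wider or deeper network raises the cost of every local model rebuild.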

Figure 7. Relationship between the number of hidden layers and nodes and the prediction error.
The batch-size of the DNN refers to the number of data samples input to the neural network in one iteration. If the batch-size is too large, each weight update requires more computation and consumes more memory; if the batch-size is too small, the updates of the model parameters may become unstable. Therefore, the appropriate batch-size is determined by experiment. As shown in Figure 8, a batch size of 256 is chosen for mini-batch gradient descent in order to balance the computational speed and the fitting effect. The remaining hyperparameters of the local model are determined through empirical methods and references, and their values are shown in Table 3.

Figure 8. Relationship between the batch-size and the prediction error.
Table 3. Hyperparameters of the local model
In the process of online rolling prediction, the number of training samples for local modeling needs to be adjusted adaptively. The value of k is related to the accuracy and rapidity of the online rolling prediction model. The selection of k is performed according to the method provided in Section 4.2.
1) The value of k lies in (0, 1). Online rolling prediction places more emphasis on fast model training, so smaller values of k are preferred. Accordingly, k is searched over (0, 0.2) with a step size of 0.02.
2) Set "Error" as an evaluation indicator. Set the number of folds for cross-validation to 5. Implement cross-validation.
3) Draw a cross-validation curve to observe the performance of the model under different values of k. Figure 9 plots the relationship between k and the prediction error. As k increases, which means that the sample size of the local model increases, the error of the model decreases. The error almost stops decreasing at k = 0.16.

Figure 9. Relationship between k and the prediction error.
Through experiments, the hyperparameter k value for the model is determined to be 0.16.
In Figure 9, the black curve represents the prediction error of the single JITL model, and the red curve represents the arithmetic average of three homogeneous JITL models. The relative error of the ensemble output is 0.25% lower than that of the single JITL model. The results show that ensemble learning outperforms the single model, reducing the risk of overfitting and improving the accuracy of the online model.
In order to reduce the number of updates and avoid updating the local model every time a sample is input, the model update hyperparameter R is introduced. Figure 10 shows how the error and the average modeling time vary with the value of R. It can be seen that when R = 0.96, the absolute error almost stops decreasing and the modeling time is 1.585 seconds, which keeps the modeling fast.

Figure 10. Relationship between the prediction error and R and the average time.
Based on experimental trials, the hyperparameters selected for the cooling tower, pump systems, and chiller units, respectively, are shown in Table 4.
Table 4. Hyperparameters of the DNN-RJITL models
In this section, the energy consumption prediction models for the cooling tower, the pump system and the chiller unit are established. 200 sets of validation data are used to verify the effectiveness of the models. As depicted in Figure 11, the prediction points lie almost entirely within the ±3% error threshold. This outcome highlights the models’ high precision and robust generalization capability. As shown in Figure 12, when JITL training is performed without setting the R hyperparameter (or R takes the extreme value 1), the model is reconstructed each time a sample arrives, and the prediction times are 6.378 s, 5.141 s and 5.755 s for the three systems. When the DNN-RJITL model is used, the average prediction times fall to 1.585 s, 2.548 s, and 3.075 s. Averaged over the three systems, the time drops from 5.758 s to 2.403 s, that is, to about 41.7% of the time required without R. The simulation results show that the DNN-RJITL model is fast to build and can meet the needs of actual production.
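The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch uses only the times stated in this section; the variable names are illustrative. It suggests that the 41.72% figure corresponds to the ratio of the average times with and without R.

```python
# Prediction times in seconds for the cooling tower, pump system, chiller unit.
t_without_r = [6.378, 5.141, 5.755]   # JITL rebuilt for every sample (R = 1)
t_with_r = [1.585, 2.548, 3.075]      # DNN-RJITL

mean_without = sum(t_without_r) / len(t_without_r)
mean_with = sum(t_with_r) / len(t_with_r)

ratio = mean_with / mean_without      # share of the original time still needed
reduction = 1.0 - ratio               # share of the original time saved
per_system_reduction = [1.0 - a / b for a, b in zip(t_with_r, t_without_r)]

print(f"mean without R: {mean_without:.3f} s")
print(f"mean with R:    {mean_with:.3f} s")
print(f"ratio:          {ratio:.2%}")
print(f"reduction:      {reduction:.2%}")
```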

Figure 11. Predicted results

Figure 12. Comparison of the time spent on DNN-JITL and DNN-RJITL
To further support this conclusion, we compare the mainstream models discussed in the introduction with the DNN-RJITL model. A detailed comparison of performance indicators is shown in Table 5. The SVR model [10] runs quickly, taking 2.614 s, 2.365 s and 2.030 s in the three systems. However, its accuracy falls short, with MAPEs of 7.25%, 10.59% and 2.83%. This discrepancy can be attributed to the HVAC system’s extensive data and high coupling: the limited expressive capacity of the shallow SVR structure hinders its ability to model complex functions effectively. The traditional DNN model [14] improves the accuracy, with MAPEs of 4.35%, 2.48% and 1.01% in the three systems. However, it consumes significant computing resources, with modeling times extending up to 61.99 s, 60.141 s and 58.55 s. Accordingly, it is not suitable for HVAC operating processes with high real-time requirements. The SWL model [22] predicts with a fixed window size and step size, which reduces the training time to 6.242 s, 5.145 s, and 8.065 s. However, the fixed window size and step size limit its flexibility when dealing with complicated HVAC data, and its accuracy is insufficient. The DNN-RJITL model constructs local models through adaptive sample selection and updates them adaptively to improve the real-time performance and flexibility of the overall model. The prediction time is reduced to 1.585 s, 2.548 s and 3.075 s, and the MAPE is reduced to 1.82%, 1.17% and 0.86%. The simulation results indicate that the DNN-RJITL method exhibits both high efficiency and precision. Compared with mainstream approaches, it has an average improvement of 5.17% in accuracy and 41.72% in speed.
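The accuracy figures above are reported as MAPE. For reference, the standard definition of the mean absolute percentage error is sketched below; the function name is illustrative, and it is an assumption that the paper uses exactly this form.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes no entry of y_true is zero, which holds for energy
    consumption readings of running equipment.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))
```

Because the error is normalized by the true value, MAPE allows the cooling tower, pump system and chiller unit to be compared even though their energy consumption differs in magnitude.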
Table 5. Comparison of prediction results of different models
This paper explores energy consumption prediction methods for HVAC systems. An innovative DNN-RJITL energy consumption rolling prediction model is presented, which purposefully addresses the challenges of large data volume, complex data coupling and high time-variability of working conditions in HVAC systems. The RJITL framework employs Spearman coefficient-weighted Euclidean distance to select relevant samples, and sets the model update parameter R for adaptive local model updates. The DNN is utilized as the local model to enhance prediction accuracy. In a real scenario of the HVAC system, the proposed method is compared with several classical methods such as SVR, DNN and SWL. The average accuracy of the proposed method for the three systems can reach about 98.18%, and the average time can be reduced to 2.403s. The experimental results demonstrate that the DNN-RJITL model is a powerful and accurate tool for predicting energy consumption in cooling towers, pump systems, and chiller units. This study establishes a foundation for configuring HVAC system parameters, optimizing energy consumption, and promoting energy conservation and utilization. In the future, we will further explore methods for real-time parameter adjustment to achieve optimal values, thereby working towards the overarching goal of efficient energy management in HVAC systems.
Acknowledgments
This study is supported by the National Natural Science Foundation of China (62373012), and the Beijing Natural Science Foundation (4212040).
