Abstract
Accurate real-time prediction of building energy consumption is essential to smart energy management, but it is challenging because occupant behavior is dynamic and hard to predict. Traditional statistical and machine learning models typically do not represent these variations, which degrades forecasting performance and reduces reliability in practical applications. We address this gap with AU-HNN (Adaptive Uncertainty-Aware Hybrid Neural Network), a framework that combines multi-scale hybrid deep learning (CNN, LSTM, Transformer) with Bayesian uncertainty quantification and dynamic occupant behavior modeling. The model uses online incremental learning to adapt to changing behavioral patterns and produces probabilistic estimates with calibrated confidence intervals. Extensive experiments on real-world datasets from eight major cities in China compare AU-HNN with ten state-of-the-art models: ARIMA, Prophet, XGBoost, Random Forest, CNN-LSTM, BiLSTM, Transformer, GRU, Bayesian-LSTM, and MC-Dropout CNN. The results show that AU-HNN improves 12 performance measures by 17–20 percent, with strong accuracy (RMSE = 7.42, MAE = 5.58, R2 = 0.923, NRMSE = 0.137) and uncertainty quantification (PICP = 0.948, PINAW = 0.227, CWC = 14.2). AU-HNN also shows competitive real-time performance (latency = 189 ms, memory usage = 142 MB, energy efficiency = 9.8 × 10^6 FLOPS/Watt) and can be deployed on smart edge devices. The proposed framework offers precise, adaptive, and uncertainty-aware energy predictions that support risk-based decision-making by building operators and energy managers. By capturing the interaction between occupant dynamics and energy usage, it opens a path toward smarter, more resilient, and more sustainable building energy management systems.
Keywords
Introduction
Building energy consumption is one of the greatest sustainability challenges worldwide: it accounts for about 40 percent of total global energy consumption and 38 percent of total global carbon dioxide emissions. 1 As urbanization accelerates, especially in developing countries such as China, the energy requirements of residential, commercial, and institutional buildings are rising rapidly. In China, building energy consumption has grown by 5.6% per annum and is expected to reach 4098 Mtce by 2030, almost three times the world average. 2 The main driver of this growth is rapid urbanization: China's urbanization rate has risen steadily since 2000, reaching 68.52% in 2020, accompanied by large-scale construction of commercial and residential infrastructure. Energy prediction is difficult because it depends on several interrelated factors, such as weather variations, building properties, occupancy, and work schedules. Among these, occupant behavior is the most crucial and the most challenging, and can account for 20% to 80% of the variation in energy consumption prediction.2,3 Conventional building energy management systems rely on static, deterministic models that cannot reflect the dynamic and stochastic nature of human behavior, which leads to large prediction errors and poor energy efficiency.
Recent developments in artificial intelligence and machine learning have opened new opportunities for addressing these problems. Deep learning models such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have proven effective at detecting complex temporal structures in building energy consumption.1,4 Nevertheless, current methods have severe limitations: they do not adapt to changing occupancy patterns in real time, they provide no uncertainty quantification for risk-sensitive decision making, and they do not effectively combine multi-scale temporal dependencies. Advances in smart buildings and Internet of Things (IoT) sensors create new possibilities for real-time occupancy monitoring and adaptive energy use. Wi-Fi channel state information (CSI), CO2 sensors, motion detectors, and access card systems allow comprehensive monitoring of occupant activities. 5 However, predicting energy use effectively from these multi-modal data streams requires advanced modeling frameworks that can cope with uncertainty, temporal dynamics, and real-time adaptation.
Despite extensive research, several fundamental gaps remain in energy prediction methodologies. Most current methods treat occupant behavior as fixed or semi-fixed, and therefore do not reflect the dynamic nature of human behavior, which can vary greatly with working-from-home policies, seasons, and special events.3,6 These assumptions result in prediction errors of 15–30 percent when behavioral patterns change. 7 Existing prediction models provide only point estimates, without confidence intervals or other uncertainty measures, which limits their use in risk-sensitive decision making, and comprehensive uncertainty-aware models remain largely unexplored.8–10 Moreover, most models require full retraining when occupancy changes because they cannot be trained online in an incremental way, which makes them unsuitable for dynamic environments. 11 Another gap is that multi-scale temporal dependencies are rarely modeled explicitly: existing methods concentrate on a single temporal scale and do not consider interactions between short-term operational patterns and long-term seasonal changes.12,13 Lastly, very little research evaluates the real-time practicality of these models, including computational efficiency, memory constraints, and the latency requirements of edge computing deployment. 14 To address these problems, this research proposes a new framework called AU-HNN (Adaptive Uncertainty-Aware Hybrid Neural Network), which incorporates four innovations.
First, it presents a dynamic occupancy behavior learning module that integrates multi-modal sensor fusion (Wi-Fi CSI, CO2, motion sensors, access card data), 5 adaptive weight assignment, and a temporal attention mechanism for activity sequence modeling, making this the first work to jointly consider real-time adaptation and probabilistic uncertainty quantification. Second, an online incremental learning system is built on elastic weight consolidation (EWC), memory-efficient adaptation, and adaptive architecture expansion, enabling continuous model updates without catastrophic forgetting. Third, we present uncertainty-aware prediction networks that use Bayesian neural networks to quantify epistemic and aleatoric uncertainties, produce calibrated prediction intervals, and support risk-aware decision making. Fourth, we implement a multi-scale temporal fusion network that uses CNNs for short-term patterns, LSTMs for long-term dependencies, and Transformers for adaptive feature selection, complemented by a dynamic weighting scheme that optimizes contributions across prediction horizons.
The proposed solution addresses five research questions: how multi-modal sensor data can be used for dynamic occupant modeling in real time; which incremental learning mechanisms ensure continuous adaptability without complete retraining; how Bayesian uncertainty quantification can be integrated with real-time systems; which temporal fusion architectures capture both short- and long-term dependencies; and what can be optimized for edge computing environments while maintaining high accuracy and uncertainty calibration. Methodologically, AU-HNN is the first unified system that integrates dynamic occupant behavior modeling, online incremental learning, uncertainty quantification, and multi-scale temporal fusion. Theoretically, it contributes a mathematical formalization of uncertainty propagation, an analysis of convergence behavior in incremental learning, and an extended theory of occupant behavior modeling. In practice, it proposes computationally efficient algorithms that integrate with building management systems in real time and support risk-informed decision making within latency constraints of less than 200 ms. Empirically, AU-HNN is validated in eight large cities in China with different climates and building types, measured by 12 indicators, and compared with 10 existing baselines, consistently achieving better results.
Although the framework improves on the existing baselines, it has several limitations. Its multi-component architecture consumes substantial computing resources, and its operation relies on high-quality sensor data, which is not always available. Occupant monitoring also requires close attention to privacy. The validation is geographically limited to Chinese datasets and focuses on commercial, educational, and office buildings, whereas residential use and long-term seasonal prediction are left for future research. Modern edge computing hardware with GPUs is needed for real-time feasibility, and integration with existing management platforms could add complexity.
Nevertheless, the research contribution is substantial. Scientifically, AU-HNN sets new standards of accuracy, reliability, and practical deployability in intelligent building energy prediction. Practically, it supports smart HVAC control, lighting optimization, demand-side response, cost-efficient energy scheduling, and predictive maintenance. It also shows a potential for 15–25% annual energy savings, which translates to an annual cost reduction of around 2.3 million dollars and a reduction of 1847 tons in CO2 emissions for the 45 buildings included in this study. The rest of the paper is structured as follows: Section 2 reviews related literature, Section 3 presents the methodology, Section 4 describes the experimental setup, Section 5 reports results and analysis, Section 6 discusses the findings and limitations, and Section 7 concludes.
Related works
Building energy prediction has developed over the last 20 years, shifting from statistical and regression-based predictors to advanced machine learning (ML) and deep learning (DL) systems. Earlier statistical models such as ARIMA and linear regression were simple to interpret but could not capture nonlinearities in large-scale energy data (Liu et al., 2021). Ensemble models such as Random Forests, XGBoost, and LightGBM offered substantial improvements by modeling intricate interactions among building parameters, weather conditions, and operational variables, which enabled much greater forecasting accuracy (Dai & Huang, 2025; Wang et al., 2023).
As deep learning matured, architectures such as LSTM, CNN-LSTM hybrids, and attention-based models were developed, which further improved prediction by learning both temporal and high-dimensional feature representations (Chang et al., 2025; Jogunola et al., 2022). More recent work has combined hybrid models with optimization algorithms and domain knowledge to address overfitting and model complexity and to improve reliability (Zhou et al., 2022; Zeng et al., 2025). Despite these improvements, most models are trained on fixed datasets and cannot respond to dynamic environments or operational shifts, which highlights the need for dynamic, real-time prediction systems (Reveshti et al., 2025).
Another stream of research examines the role of occupant behavior in building energy consumption. Traditional models usually simplified human activity with rigid schedules or probabilistic rules, which produced discrepancies between simulated and real energy consumption (Uddin et al., 2021). IoT and sensor data have enabled more dynamic modeling of human-building interactions, including real-time tracking of occupancy, movement, and activity (Gu & Shao, 2023; Guyixin et al., 2025). New methods derive occupancy from social media data (Lu et al., 2021) or use Wi-Fi CSI and Transformer models to identify occupancy more precisely (Zhang et al., 2025; Sun et al., 2023). These techniques show that deep learning can learn behavioral dynamics, but they are not yet sufficiently integrated with energy prediction methods.
Current systems tend to separate occupant models from energy models and overlook the stochastic and uncertain characteristics of human behavior. Although some studies have developed uncertainty-aware or adaptive occupant models (Su et al., 2023; Yahaya et al., 2025), privacy, scalability, and generalization remain obstacles to large-scale deployment in heterogeneous socio-cultural settings. Besides occupant integration, uncertainty quantification and incremental learning are emerging research topics for improving predictive robustness in building energy systems. Prediction uncertainty has been quantified using Bayesian networks, Monte Carlo simulations, and regularized Bayesian neural networks, which improve trust and decision support in real-world applications (Nezhadetthad et al., 2025; Yahaya et al., 2025). Similarly, adaptive learning systems such as online LSTMs and hybrid models seek to reduce catastrophic forgetting and preserve predictive accuracy in dynamic settings (Zhu & Zhang, 2025).
Nevertheless, incremental learning tailored to building systems has not been studied in depth, and most solutions are generic rather than adapted to the heterogeneity of building types and usage patterns. The literature therefore reveals three interrelated gaps: limited dynamic adaptability, insufficient real-time integration of occupant behavior, and a lack of uncertainty-aware incremental learning at the energy system level. To overcome these obstacles, the proposed AU-HNN architecture combines real-time occupancy adaptation, multi-scale temporal fusion, and uncertainty-aware prediction, going beyond the current state of the art in building energy prediction, as summarized in Table 1.
Overview of the existing studies.
Overview of proposed framework
The proposed framework combines multi-source data preprocessing, a hybrid neural network (CNN + LSTM) architecture, dynamic occupant behavior modeling, and uncertainty quantification so that smart buildings can forecast energy consumption and occupancy accurately. The framework operates in real time using edge computing and streaming data processing, as shown in Figure 1.

AU-HNN – overall framework.
The proposed framework incorporates several major elements to improve predictive accuracy and real-time flexibility. It begins with data preprocessing and feature engineering, which integrate weather, occupancy, and building features with dynamic temporal features. A hybrid neural network architecture follows, in which CNNs extract spatial features, BiLSTMs with attention model temporal dependencies, and multi-task learning provides robust modeling. Dynamic occupant behavior is modeled to capture user variability through clustering, Bayesian filtering, online learning, and temporal evolution analysis. Bayesian neural networks, adaptive estimation, and risk assessment metrics are then used to quantify uncertainty and ensure reliability. Lastly, real-time deployment is achieved through edge computing optimization, model compression, and practical management of streaming data.
Table 2 summarizes each component's innovation, performance gain, and validation results, highlighting improvements in accuracy, adaptation speed, and prediction reliability.
Components and validation.
The feedback loop is shown in Figure 2, which illustrates how the system continuously monitors its outputs and uses them to adjust inputs or processes, ensuring adaptive improvement and stability over time.

Feedback loop.
The framework integrates multi-source data fusion, hybrid neural networks, occupant behavior modeling, uncertainty quantification, and real-time implementation. The overall objective function combines energy prediction, occupancy classification, and uncertainty penalties as given in equation (1):
In the preprocessing stage, handling missing values is a critical step for data reliability. Building energy datasets often contain gaps caused by sensor malfunctions, communication errors, or irregular occupant activity. To address this, missing values were treated with a hybrid imputation strategy: short gaps were filled with linear interpolation to preserve temporal continuity, while longer gaps were handled with statistical methods such as mean substitution or k-nearest neighbor (KNN)-based imputation. This ensured that the model received consistent inputs without introducing significant bias. Outliers were detected through z-score thresholding and replaced with smoothed values to prevent distortion of the training process. Systematically addressing missing and noisy data improved the robustness of the proposed AU-HNN framework. The datasets are described in Table 3.
Dataset description.
Temporal gaps in IoT sensor data are handled using sliding-window imputation (Equation (2)) with forward filling for short gaps (<3 timesteps) and bi-directional LSTM imputation for longer gaps.
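The short-gap branch of this rule can be illustrated with a minimal sketch. It assumes missing readings are represented as `None` and reads "<3 timesteps" as gaps of at most two readings; the function name `forward_fill_short_gaps` and its `max_gap` parameter are illustrative and not taken from the original implementation. Longer gaps are deliberately left unfilled so that a separate model, such as the bi-directional LSTM imputer, can handle them.

```python
def forward_fill_short_gaps(values, max_gap=2):
    """Forward-fill runs of missing values (None) that are at most max_gap long.

    Longer runs, and runs with no valid reading before them, are left as None
    so that a separate imputation model can handle them.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i
            # Advance to the end of this run of missing values.
            while i < n and filled[i] is None:
                i += 1
            gap_len = i - start
            if gap_len <= max_gap and start > 0:
                last_valid = filled[start - 1]
                for j in range(start, i):
                    filled[j] = last_valid
        else:
            i += 1
    return filled


readings = [1.0, None, None, 4.0, None, None, None, 8.0]
print(forward_fill_short_gaps(readings))
```

In this example the two-step gap is filled with the last valid reading, while the three-step gap is left for the long-gap imputer.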
Weather data, including temperature, humidity, solar radiation, and wind speed, is normalized to a common scale for consistent model input. Occupancy data is processed through multi-modal fusion of WiFi analytics, access logs, CO2 sensors, and motion detectors to ensure accurate real-time representation. Additionally, building characteristics such as BIM features, HVAC specifications, and spatial layouts are extracted to effectively model building-specific energy dynamics.
Normalization
For weather and occupancy features
Weighted Data Fusion for multiple occupancy sensors
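As a hedged illustration of these two steps, the sketch below assumes min-max scaling for normalization and a normalized weighted average for sensor fusion. The text only states that features are brought to a common scale and that occupancy sensors are fused with weights, so both choices, along with the function names and the example weights, are assumptions.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse_occupancy(estimates, weights):
    """Combine per-sensor occupancy estimates with a normalized weighted average."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * e for w, e in zip(weights, estimates)) / total


temperatures = [10.0, 20.0, 30.0]
print(min_max_normalize(temperatures))

# Hypothetical head counts from Wi-Fi logs and CO2 readings, with Wi-Fi trusted 3x less.
print(fuse_occupancy([10, 20], [1, 3]))
```

Dividing by the weight sum keeps the fused estimate on the same scale as the individual sensors even when the weights are adapted over time.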
Real-time occupancy inference is achieved through multi-sensor fusion algorithms that capture and interpret instantaneous occupancy across different building zones. Behavioral pattern mining extends this process by extracting temporal activity sequences to uncover recurring occupancy trends. To ensure predictive robustness, dynamic feature selection is applied, in which features are adaptively weighted according to their contribution to accuracy and the variability in occupant behavior, as given in equations (5)–(7):
Lag Features
Calendar Encoding
Dynamic Feature Selection (Adaptive Weighting):
Multi-scale time windows are applied by computing features over intervals of 15 min, 1 h, 6 h, 24 h, and one week to capture both short- and long-term temporal dependencies. Lag features are built from historical energy consumption at lags of 1–24 h to capture usage trends. Calendar features, including holidays, weekends, and seasonal variations, are encoded to provide temporal context. Additionally, weather derivatives such as heat index, wind chill, and apparent temperature are computed to account for environmental influences on energy dynamics.
Multi-scale aggregation
Weather Derivatives
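The temporal features described above can be sketched as follows. This is an illustrative sketch rather than the original feature pipeline: the sine/cosine hour encoding is an assumed choice of calendar encoding, and the helper names `lag_features`, `encode_hour`, and `rolling_mean` are hypothetical. With 15-minute data, a 1 h lag corresponds to 4 steps and a 24 h window to 96 steps.

```python
import math


def lag_features(series, lags):
    """For each time step with enough history, return [series[t - lag] for lag in lags]."""
    max_lag = max(lags)
    return [[series[t - lag] for lag in lags] for t in range(max_lag, len(series))]


def encode_hour(hour):
    """Cyclical encoding so that hour 23 and hour 0 end up close together."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def rolling_mean(series, window):
    """Mean over each full trailing window, computed with a running sum."""
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


load = [1, 2, 3, 4, 5]
print(lag_features(load, [1, 2]))
print(rolling_mean([1, 2, 3, 4], 2))
print(encode_hour(6))
```

Calling `rolling_mean` with several window lengths on the same series gives one multi-scale aggregate per window.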
CNN component
The CNN component employs a 1D convolutional neural network to extract spatial features, capturing building zone patterns and equipment interactions. Multi-scale temporal convolutions with kernel sizes of 4, 6, and 16 are integrated to handle diverse time scales and provide a robust temporal representation. The architecture consists of three convolutional layers with 64, 128, and 256 filters, each followed by max pooling, ReLU activation, batch normalization, and a dropout layer with a rate of 0.2 to improve generalization and reduce overfitting.
1D Convolution
Where k is kernel size,
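For illustration, a single-channel 1D convolution with kernel size k can be sketched in plain Python as below. It assumes "valid" padding and stride 1, and, as in most deep learning libraries, it computes a cross-correlation (the kernel is not flipped). The actual layers use many filters and learned weights, so this shows only the basic operation.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation form) with stride 1."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[t + j] for j in range(k)) + bias
        for t in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


# A difference-style kernel of size k = 3 applied to a short load sequence.
features = conv1d([1, 2, 3, 4], [1, 0, -1])
print(features)
print(relu(features))
```

An input of length n produces n - k + 1 outputs, so larger kernels summarize longer time spans at the cost of shorter feature maps.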
The BiLSTM component processes sequences in both forward and backward directions, which allows it to capture temporal dependencies in occupant and energy patterns. Its architecture has two BiLSTM layers with 128 and 64 units, connected by residual connections to improve information flow. A self-attention mechanism is integrated to identify and emphasize long-term patterns within the data. To keep training stable, gradient clipping at ±1.0 and LSTM cell regularization are applied, which prevents issues such as gradient explosion, as given in equations (10)–(16):
Feature fusion is achieved by concatenating the outputs of the CNN and LSTM components, which are then passed through dense layers for multi-modal integration. The model adopts a multi-task learning approach and predicts energy consumption, occupancy patterns, and the associated uncertainty. Training uses a weighted loss function that combines mean squared error for energy prediction, an occupancy classification loss, and an uncertainty penalty to balance accuracy with reliability.
Feature Fusion
Loss function:
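A minimal sketch of such a weighted multi-task loss is given below. It is not the original loss: binary cross-entropy is assumed for the occupancy classification term, a Gaussian negative log-likelihood is assumed for the uncertainty penalty, and the weights `w_energy`, `w_occ`, and `w_unc` are placeholder values.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error for the energy prediction task."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Occupancy classification loss (assumed binary: occupied or vacant)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def gaussian_nll(y_true, mean, var):
    """Assumed uncertainty penalty: Gaussian negative log-likelihood, constant dropped."""
    return sum(
        0.5 * math.log(v) + 0.5 * (t - m) ** 2 / v
        for t, m, v in zip(y_true, mean, var)
    ) / len(y_true)


def multitask_loss(energy_true, energy_pred, energy_var, occ_true, occ_prob,
                   w_energy=1.0, w_occ=0.5, w_unc=0.1):
    """Weighted sum of the three task losses; the weights are illustrative."""
    return (w_energy * mse(energy_true, energy_pred)
            + w_occ * binary_cross_entropy(occ_true, occ_prob)
            + w_unc * gaussian_nll(energy_true, energy_pred, energy_var))


print(multitask_loss([10.0, 12.0], [11.0, 12.5], [1.0, 0.5], [1, 0], [0.9, 0.2]))
```

The weights control the trade-off between accuracy and reliability: raising `w_unc` penalizes over-confident variance estimates more strongly.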
Occupancy pattern recognition
Occupancy pattern recognition is carried out through clustering, where K-means++ is applied to extract daily and weekly occupancy profiles that reflect recurring behavioral trends. Real-time occupancy inference is further improved with Bayesian filtering, which integrates multi-modal sensor data for accurate and adaptive estimation. To capture evolving dynamics, behavioral change detection is performed with the CUSUM algorithm, which identifies sudden or gradual shifts in occupancy patterns over time.
Clustering-Based Pattern Identification
Where
CUSUM for drift detection
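The idea behind CUSUM drift detection can be sketched with a standard two-sided tabular CUSUM. The parameter names (`target`, `slack`, `threshold`) and the reset-after-alarm behavior are assumptions made for illustration; the original parameterization is not given in the text.

```python
def cusum_detect(values, target, slack, threshold):
    """Two-sided CUSUM: return the indices at which a shift away from target is flagged.

    slack is the allowance for normal fluctuation, and threshold is the
    cumulative deviation that triggers an alarm. Sums reset after each alarm.
    """
    s_hi = 0.0  # accumulates evidence of an upward shift
    s_lo = 0.0  # accumulates evidence of a downward shift
    alarms = []
    for i, x in enumerate(values):
        s_hi = max(0.0, s_hi + (x - target - slack))
        s_lo = max(0.0, s_lo + (target - x - slack))
        if s_hi > threshold or s_lo > threshold:
            alarms.append(i)
            s_hi = 0.0
            s_lo = 0.0
    return alarms


# Hypothetical occupancy counts that jump from about 10 to about 13 people.
occupancy = [10, 10, 10, 13, 13, 13]
print(cusum_detect(occupancy, target=10, slack=0.5, threshold=4))
```

Because deviations accumulate, a small but persistent change is eventually flagged, which suits gradual behavioral drift as well as sudden shifts.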
Dynamic occupant behavior modeling uses dynamic weighting, in which feature importance is adjusted according to confidence levels to improve prediction reliability. Online learning updates occupancy patterns incrementally, which enables the model to adapt to new behaviors without full retraining. To capture long-term changes, temporal behavior evolution is modeled with Hidden Markov Models, which track transitions between occupancy states and provide insight into evolving occupant dynamics.
Hidden Markov Model for State Transitions
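As an illustration of how state transitions and sensor evidence combine, the sketch below implements one predict-and-update step of the standard HMM forward filter. The two-state setup (vacant, occupied) and all probabilities in the example are hypothetical.

```python
def hmm_filter_step(belief, transition, likelihood):
    """One forward-filter step: propagate the belief, then weight by the observation.

    belief[i]        : current probability of state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the current observation given state j
    """
    n = len(belief)
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under every state")
    return [u / total for u in unnormalized]


# States: 0 = vacant, 1 = occupied (hypothetical numbers).
transition = [[0.9, 0.1], [0.2, 0.8]]
belief = [0.5, 0.5]
# A motion-sensor reading that is four times more likely when the zone is occupied.
belief = hmm_filter_step(belief, transition, likelihood=[0.2, 0.8])
print(belief)
```

Repeating this step for every new reading yields a running posterior over occupancy states, which is the same recursion that underlies discrete Bayesian filtering.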
Bayesian Neural Network
Monte Carlo Dropout for Epistemic Uncertainty
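The principle of Monte Carlo Dropout can be shown with a toy sketch: dropout stays active at inference, the model is run several times, and the spread of the outputs serves as an epistemic uncertainty estimate. The example uses a single linear layer with dropout on its inputs instead of a full network; the function name, the number of passes, and the seed are illustrative, while the 0.2 rate mirrors the dropout rate stated for the CNN layers.

```python
import random
import statistics


def mc_dropout_predict(features, weights, bias, drop_rate=0.2, passes=200, seed=0):
    """Run repeated stochastic forward passes through a toy linear layer.

    Each input is dropped with probability drop_rate and the survivors are
    rescaled by 1 / (1 - drop_rate) (inverted dropout). Returns the mean
    prediction and its standard deviation across passes.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_rate
    samples = []
    for _ in range(passes):
        out = bias
        for x, w in zip(features, weights):
            if rng.random() < keep:
                out += w * x / keep
        samples.append(out)
    return statistics.fmean(samples), statistics.pstdev(samples)


mean, std = mc_dropout_predict([1.0, 2.0], [0.5, 0.25], bias=0.0)
print(mean, std)
```

A larger spread across passes indicates that the prediction depends strongly on which units are active, which is read as higher model uncertainty.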
Bayesian neural network integration employs variational inference with a mean-field approximation to estimate posterior distributions, which enables probabilistic modeling of uncertainty. Gaussian priors are defined, and posterior parameters are learned during training to capture model variability. To maintain computational efficiency, the reparameterization trick and local reparameterization are applied, which yields faster and more stable inference while preserving accuracy.
Adaptive uncertainty estimation
Adaptive uncertainty estimation introduces dynamic confidence intervals that evolve over time and are adapted according to historical prediction accuracy. The framework distinguishes between epistemic uncertainty, which arises from model limitations, and aleatoric uncertainty, which is caused by inherent data variability. To ensure reliability, risk assessment is carried out using calibration plots, reliability diagrams, and correlation analysis, which quantify how well uncertainty estimates align with real-world outcomes.
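One common way to separate the two kinds of uncertainty, assumed here for illustration and not confirmed as the original formulation, uses the law of total variance over several stochastic forward passes: the variance of the predicted means is taken as epistemic, and the average of the predicted variances as aleatoric.

```python
import statistics


def decompose_uncertainty(means, variances):
    """Split predictive variance for a single input using several stochastic passes.

    means[k]     : predicted mean from pass k
    variances[k] : predicted noise variance from pass k
    Returns (epistemic, aleatoric, total).
    """
    epistemic = statistics.pvariance(means)   # disagreement between passes
    aleatoric = statistics.fmean(variances)   # average predicted data noise
    return epistemic, aleatoric, epistemic + aleatoric


# Two hypothetical passes for the same input.
print(decompose_uncertainty([1.0, 3.0], [0.5, 1.5]))
```

The epistemic part shrinks as more data is seen, whereas the aleatoric part reflects noise that additional training cannot remove.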
Real-time implementation
Online Feature Update
Edge computing optimization focuses on improving real-time inference efficiency through model quantization, where INT8 conversion accelerates computation without major accuracy loss. Memory is managed with circular buffers that enable seamless processing of continuous streaming data. Additionally, parallel processing through multi-threaded feature extraction and prediction provides high throughput and responsiveness in edge deployment.
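Two of these ideas can be sketched in a few lines. The quantizer below assumes symmetric per-tensor INT8 quantization, which is one common scheme and not necessarily the one used in the deployed system, and the circular buffer relies on `collections.deque` with a fixed `maxlen`.

```python
from collections import deque


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate floating-point weights."""
    return [q * scale for q in quantized]


q, scale = quantize_int8([1.0, -0.25, 0.0])
print(q, dequantize(q, scale))

# Circular buffer: keeps only the most recent readings in fixed memory.
window = deque(maxlen=3)
for reading in [1, 2, 3, 4, 5]:
    window.append(reading)
print(list(window))
```

The round-trip error of each weight is bounded by half the quantization step, which is why accuracy loss stays small when the weight range is well covered.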
Model compression techniques
Model compression techniques are applied to reduce computational load and enable lightweight deployment. Knowledge distillation within a teacher-student framework transfers knowledge to a smaller model. Pruning strategies, including structured pruning for CNN layers and magnitude-based pruning for dense layers, further reduce the number of model parameters. Hardware acceleration via CUDA and TensorRT integration increases execution speed, which makes the system suitable for real-time applications.
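Magnitude-based pruning can be illustrated with the following sketch, which zeroes the smallest-magnitude fraction of a flat weight list. The function name and the `sparsity` parameter are illustrative; in practice pruning is applied per layer and is usually followed by fine-tuning.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    pruned = list(weights)
    if n_prune == 0:
        return pruned
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


print(magnitude_prune([0.5, -0.1, 0.05, -2.0], sparsity=0.5))
```

The zeroed weights can then be stored and multiplied sparsely, which reduces both memory footprint and inference cost.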
Streaming data processing
Streaming data processing handles continuous input streams efficiently by computing online features with incremental statistics over rolling time windows. Data quality is preserved through real-time outlier detection and imputation of missing values. To minimize delays, latency optimization techniques such as pipeline parallelization and batch processing are employed, which enable high-performance handling of streaming energy and occupancy data.
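Incremental statistics of this kind can be sketched with Welford's online algorithm, which updates the mean and variance one reading at a time without storing the stream. For simplicity the sketch keeps cumulative statistics, not a rolling window, and the `is_outlier` check with a threshold of three standard deviations mirrors the z-score rule described for preprocessing. The class and method names are illustrative.

```python
import math


class RunningStats:
    """Online mean and population variance via Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0

    def is_outlier(self, x, z_threshold=3.0):
        """Flag x if it lies more than z_threshold standard deviations from the mean."""
        std = math.sqrt(self.variance)
        if std == 0.0:
            return False
        return abs(x - self.mean) / std > z_threshold


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)
print(stats.mean, stats.variance, stats.is_outlier(20))
```

Each update costs constant time and memory, which is what makes this style of feature computation suitable for streaming deployment.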
Pseudocode
Hybrid CNN-LSTM Occupancy-Energy Prediction with Behavior Modeling and Uncertainty Quantification
Comprehensive dataset description
Target cities and climate zones
The experimental validation of the proposed AU-HNN framework relies on a large-scale dataset collected from eight representative Chinese cities, chosen to capture a wide range of climatic conditions and urban energy demand profiles. Beijing lies in the continental zone and is characterized by harsh winters and large seasonal fluctuations, which makes it a good test case for heating-dominant energy consumption. Shanghai belongs to the subtropical zone, with high humidity and a more balanced demand between cooling and heating, reflecting the challenges of mixed-load forecasting. Guangzhou and Shenzhen represent tropical climates where cooling dominates throughout the year, but with distinct occupancy variations: Guangzhou exhibits high residential variability, while Shenzhen, as a hub of technology-driven enterprises, shows highly dynamic work-related occupancy patterns. Hangzhou contributes a mix of government and commercial buildings, Nanjing emphasizes educational and healthcare facilities with relatively regular but load-intensive consumption, and Wuhan provides energy usage signatures driven by industrial and research facilities. Finally, Chengdu adds architectural and operational heterogeneity by combining traditional low-rise structures with modern smart buildings. By covering continental, tropical, and subtropical climate zones, the dataset enables evaluation of AU-HNN in diverse real-world contexts and tests the model's generalizability across building types, occupancy behaviors, and weather-driven loads.
Detailed data collection specifications
The sampling rate yields approximately 140,160 data points per building annually, which is sufficient to capture both long-term seasonal patterns and short-term occupant-driven variations. A total of 45 buildings were monitored across the eight cities, with five to six buildings selected per location to cover offices, hospitals, educational institutes, industrial facilities, and government buildings. In total, the dataset contains nearly 6.3 million data points, which makes it one of the largest high-resolution building energy datasets available in the Chinese context. Rigorous preprocessing was performed to ensure reliability and completeness. Missing values were imputed using a combination of k-nearest neighbors and temporal interpolation, resulting in more than 99.2% data completeness after cleaning. Outliers, such as abnormal spikes or drops in energy consumption caused by sensor errors, were detected using z-score thresholds at three standard deviations and corrected with moving-window smoothing. This preparation ensures that the dataset is of publication-grade quality and can serve as a benchmark for subsequent studies.
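The reported counts can be cross-checked with simple arithmetic. The sketch below uses the 15-minute synchronization interval stated for the dataset and assumes that each building contributes four streams per timestamp (weather, occupancy, energy, and building-level data). The factor of four is an inference that happens to reproduce the reported figure; it is not a stated specification.

```python
def annual_points(interval_minutes, streams, days=365):
    """Number of data points per building per year at a fixed sampling interval."""
    readings_per_day = 24 * 60 // interval_minutes
    return readings_per_day * days * streams


per_building = annual_points(interval_minutes=15, streams=4)  # streams=4 is an assumption
total = per_building * 45  # 45 monitored buildings

print(per_building)  # 96 readings/day * 365 days * 4 streams
print(total)
```

Under these assumptions the per-building count is 140,160 and the total is 6,307,200, consistent with the "nearly 6.3 million" figure.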
Multi-source data integration
Energy consumption in buildings is influenced not only by physical and climatic conditions but also by human occupancy and behavioral patterns. To account for these multidimensional drivers, the dataset integrates weather, occupancy, energy, and building-level data streams into a unified resource. Meteorological information such as temperature, relative humidity, solar radiation, and wind speed was collected from municipal weather stations and temporally aligned with the energy readings. Occupancy data was derived from multiple sources: WiFi connection logs that indicate the number of active users in a building, CO2 concentration levels that reflect indoor activity intensity, motion sensors embedded in HVAC and lighting systems, and access card swipes that record entries and exits. Energy data included detailed measurements of HVAC consumption, lighting load, and equipment usage, while building data captured floor plans, HVAC system specifications, and official occupancy schedules. All sources were synchronized at a 15-min interval, and additional features such as rolling averages, lagged sequences, and occupancy–energy interaction terms were engineered to improve the predictive power of the dataset. This integration of physical, environmental, and human-driven factors provides the foundation for the adaptive and uncertainty-aware design of AU-HNN, as shown in Table 4:
Comprehensive dataset overview for experimental setup.
Performance metrics
The evaluation strategy was designed to assess not only the predictive accuracy of AU-HNN but also its robustness under uncertainty and its efficiency in real-time deployment. To this end, a twelve-metric framework was implemented. Five commonly adopted indicators quantify accuracy: root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), the coefficient of determination (R2), and normalized root mean square error (NRMSE). Beyond accuracy, the reliability of the probabilistic forecasts was also examined. Four uncertainty metrics were therefore included: prediction interval coverage probability (PICP), mean prediction interval width (MPIW), prediction interval normalized average width (PINAW), and the coverage width-based criterion (CWC), which jointly capture the calibration and sharpness of predictive intervals. Finally, recognizing the constraints of real-world building management systems, three efficiency-oriented metrics were measured: prediction latency, expressed as the average time per forecast in milliseconds; memory usage, measured in megabytes during runtime; and computational energy efficiency, calculated as floating-point operations per watt. This multi-objective evaluation protocol tests AU-HNN not only for accuracy but also for uncertainty awareness and scalability in practical deployments.
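Prediction accuracy is quantified through five widely adopted indicators. The Root Mean Squared Error (RMSE) is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
where $y_i$ is the observed energy consumption, $\hat{y}_i$ is the corresponding prediction, and $N$ is the number of samples. The Mean Absolute Error (MAE) averages the absolute deviations:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
It is less sensitive to outliers than RMSE. Mean Absolute Percentage Error (MAPE) normalizes error magnitude relative to actual demand and is formulated as
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
It provides interpretability in percentage terms. The Coefficient of Determination (R2) evaluates variance explained by the model:
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
where $\bar{y}$ is the mean of the observed values. The Normalized Root Mean Squared Error (NRMSE) scales RMSE by the range of the observed data:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max} - y_{\min}}$$
It is critical when comparing across heterogeneous building scales.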
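As a hedged illustration, the five accuracy indicators can be computed with the standard textbook definitions as follows. The function name `accuracy_metrics` and the sample values are hypothetical, and NRMSE is assumed to be normalized by the data range, as the results section describes.

```python
import math

def accuracy_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (%), R2 and range-normalized RMSE."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "rmse": rmse,
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when an observed value is zero.
        "mape": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
        "r2": 1.0 - ss_res / ss_tot,
        "nrmse": rmse / (max(y_true) - min(y_true)),
    }

# Hypothetical observations and predictions, for illustration only.
metrics = accuracy_metrics([100.0, 200.0, 300.0, 400.0],
                           [110.0, 190.0, 310.0, 390.0])
```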
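Four uncertainty-aware indicators are adopted to validate predictive intervals. Prediction Interval Coverage Probability (PICP) is given as:
$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N} c_i, \qquad c_i = \begin{cases} 1, & L_i \le y_i \le U_i \\ 0, & \text{otherwise} \end{cases}$$
where $L_i$ and $U_i$ are the lower and upper bounds of the prediction interval for sample $i$, $y_i$ is the observed value, and $N$ is the number of samples. The Mean Prediction Interval Width (MPIW) is the average interval width:
$$\mathrm{MPIW} = \frac{1}{N}\sum_{i=1}^{N}\left(U_i - L_i\right)$$
while Prediction Interval Normalized Average Width (PINAW) scales MPIW relative to the data range:
$$\mathrm{PINAW} = \frac{\mathrm{MPIW}}{y_{\max} - y_{\min}}$$
It ensures comparability across cities. Finally, the Coverage Width-based Criterion (CWC) integrates both reliability and sharpness using a penalty function that inflates the score when PICP falls below the nominal coverage level.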
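The interval metrics can be sketched in the same way. The CWC expression used here, PINAW × (1 + γ·exp(−η·(PICP − μ))) with γ = 1 only when PICP < μ, is one commonly used form and is an assumption; the paper's exact penalty function and its hyperparameters (`mu`, `eta` below) are not given in the text, so the reported CWC values may follow a different scaling.

```python
import math

def interval_metrics(y_true, lower, upper, mu=0.95, eta=50.0):
    """PICP, MPIW, PINAW and an assumed common form of CWC.

    mu  : nominal coverage level (assumed value)
    eta : penalty strength for under-coverage (assumed value)
    """
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    picp = covered / n
    mpiw = sum(hi - lo for lo, hi in zip(lower, upper)) / n
    pinaw = mpiw / (max(y_true) - min(y_true))
    gamma = 0.0 if picp >= mu else 1.0   # penalty is active only below target
    cwc = pinaw * (1.0 + gamma * math.exp(-eta * (picp - mu)))
    return {"picp": picp, "mpiw": mpiw, "pinaw": pinaw, "cwc": cwc}

# Hypothetical intervals: the third observation falls outside its interval.
y = [10.0, 20.0, 30.0, 40.0]
lo = [8.0, 18.0, 31.0, 35.0]
hi = [12.0, 22.0, 35.0, 45.0]
strict = interval_metrics(y, lo, hi, mu=0.95)   # under-covered: penalized
lenient = interval_metrics(y, lo, hi, mu=0.70)  # target met: CWC == PINAW
```

With this form, a model cannot lower its score simply by shrinking intervals, because any loss of coverage below `mu` is penalized exponentially.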
Three measures of computational efficiency assess the feasibility of practical deployment. Prediction latency is the inference time per sample, in milliseconds. Memory usage, in MB, captures the storage overhead of parameters and intermediate states. Energy efficiency relates the computation performed during inference to the power drawn, which adds a sustainability dimension that is critical for real-time implementation in intelligent buildings.
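A minimal sketch of how per-sample prediction latency might be measured is given below. The helper `mean_latency_ms` and the stand-in `dummy_predict` function are hypothetical; the paper does not describe its timing harness, and a real measurement would call the trained model instead.

```python
import statistics
import time

def mean_latency_ms(predict, samples, warmup=3):
    """Average wall-clock time per prediction, in milliseconds."""
    # Warm-up calls keep one-off start-up costs out of the measurement.
    for s in samples[:warmup]:
        predict(s)
    timings = []
    for s in samples:
        start = time.perf_counter()
        predict(s)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)

def dummy_predict(window):
    """Stand-in for a model forward pass (assumption for illustration)."""
    return sum(v * v for v in window)

# 50 hypothetical input windows of 96 readings (one day at 15-min resolution).
samples = [[float(i)] * 96 for i in range(50)]
latency = mean_latency_ms(dummy_predict, samples)
```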
To situate AU-HNN within the broader literature, its performance was compared against ten well-known baselines spanning statistical, machine learning, deep learning, and uncertainty-aware models. The traditional category comprises ARIMA, a long-established autoregressive approach for linear time series, and Prophet, a seasonal-trend decomposition model developed by Facebook. The machine learning category comprises XGBoost, a gradient-boosted decision tree ensemble, and Random Forest, an ensemble of randomized decision trees. The deep learning baselines consist of CNN-LSTM, which combines spatial feature extraction with temporal modeling; BiLSTM, which processes sequences in both directions; Transformer, which uses self-attention to model sequences; and GRU, a simplified recurrent architecture. Bayesian-LSTM and Monte Carlo Dropout CNN serve as uncertainty-aware baselines for probabilistic forecasting. This collection reflects the state of the art within each methodological family and allows a fair and comprehensive comparison. AU-HNN extends these baselines by integrating hybrid feature extraction, online incremental learning, and Bayesian uncertainty estimation, which makes it well suited to the multidimensional nature of real-time building energy forecasting.
To conduct the benchmark analysis, ten models are considered, chosen to reflect a balanced representation of traditional, machine learning, deep learning, and uncertainty-aware techniques as provided in Table 5:
Performance analysis of ten models for building energy prediction.
The proposed AU-HNN framework was implemented on a high-performance computing system with an NVIDIA RTX 4090 graphics card, an Intel i9-13900K processor, and 64 GB of random access memory, which was sufficient for the large-scale multi-source dataset. The software stack included PyTorch 2.0 as the main deep learning library, TensorFlow Probability for modeling and quantifying uncertainty, and CUDA 11.8 for GPU acceleration. Training used a batch size of 64, a learning rate of 1e-4, and 200 epochs to balance convergence stability and computational efficiency. To improve generalization and avoid overfitting, hyperparameters were tuned using Bayesian optimization with 100 trials, which systematically explored learning rates, dropout ratios, and hidden-layer sizes and selected the configuration that predicted energy most robustly under dynamically changing occupancy and environmental conditions.
Results and comprehensive analysis
Overall performance comparison
The overall performance comparison highlights the superiority of the proposed AU-HNN framework over ten baseline models across twelve metrics. As shown in Table 6, AU-HNN consistently achieves the lowest error values and the highest accuracy scores, with clear improvements in both predictive accuracy and uncertainty calibration. Traditional statistical models such as ARIMA and Prophet provide only moderate accuracy and cannot capture nonlinear dynamics or quantify predictive uncertainty, which results in considerably higher RMSE and MAE values. Machine learning methods such as XGBoost and Random Forest reduce error by capturing complex nonlinearities but remain limited in sequential learning and uncertainty handling. Deep learning approaches, including CNN-LSTM, BiLSTM, Transformer, and GRU, achieve further gains by exploiting temporal dependencies and hierarchical feature extraction, though at the cost of higher latency and resource consumption:
Comprehensive performance comparison of existing and proposed models.
Note: Uncertainty metrics (PICP, MPIW, PINAW, CWC) are not reported for conventional models (ARIMA, Prophet, XGBoost, Random Forest) as they do not provide probabilistic predictions.
The key performance gains achieved by the AU-HNN model across the evaluation metrics are summarized in Table 7.
Performance improvements of AU-HNN.
RMSE indicates the magnitude of prediction errors, and a lower value means higher accuracy. The AU-HNN model records the lowest RMSE of 7.42, which demonstrates its superior predictive performance compared to all other models. Traditional statistical methods such as ARIMA and Prophet show errors above 14, which highlights their limitations. Overall, deep learning-based models clearly outperform classical approaches in minimizing prediction error, as shown in Figure 3.

RMSE comparison across models.
MAE measures the average magnitude of errors without considering their direction. Among the models, AU-HNN achieves the lowest MAE of 5.58, which reflects consistent and reliable performance. Transformer and Bayesian-LSTM also deliver competitive accuracy with values around 6.7–6.8. In contrast, ARIMA and Prophet remain less effective, both exceeding 11 in MAE, as shown in Figure 4.

MAE comparison across models.
MAPE expresses prediction error as a percentage, which provides an intuitive measure of model reliability. The AU-HNN model again performs best with just 8.4%, which indicates strong generalization across varying data. Bayesian-LSTM and Transformer also maintain relatively low percentage errors close to 10%. Conversely, ARIMA reaches 18.7%, which shows that it handles fluctuations much less well, as shown in Figure 5.

MAPE comparison across models.
R2 explains how well the model fits the data, with higher values showing better predictive strength. AU-HNN achieves the highest score of 0.923, indicating excellent variance explanation. Bayesian-LSTM and Transformer also score high, both above 0.89, which reflects robust modeling. Traditional approaches like ARIMA and Prophet remain below 0.74, which indicates a weaker fit, as given in Figure 6.

R2 comparison across models.
NRMSE normalizes the error relative to the data range, which makes results comparable across datasets of different scale. The AU-HNN model obtains the lowest NRMSE of 0.137. Bayesian-LSTM and Transformer follow closely at around 0.165–0.168. ARIMA shows the worst performance at 0.284, as shown in Figure 7.

NRMSE comparison across models.
PICP evaluates the proportion of true values captured within prediction intervals. AU-HNN excels with the highest PICP of 0.948, which shows very strong reliability in uncertainty estimation. Bayesian-LSTM and MC-Dropout CNN also perform well with values above 0.86. GRU and BiLSTM lag slightly, with coverage probabilities below 0.80 as given in Figure 8.

PICP comparison across models.
MPIW measures the width of the prediction intervals, with narrower values indicating tighter confidence bounds. AU-HNN records the narrowest MPIW at 12.3, which shows high precision in uncertainty quantification. Bayesian-LSTM and MC-Dropout CNN also maintain relatively small widths between 15 and 17. Deep models such as GRU and BiLSTM exhibit wider intervals above 19, which indicates less informative uncertainty estimates, as shown in Figure 9.

MPIW comparison across models.
PINAW normalizes interval width relative to the dataset, which makes comparisons more meaningful. AU-HNN achieves the lowest PINAW of 0.227, which proves its strength in delivering compact and informative intervals. Bayesian-LSTM also performs competitively with 0.291, maintaining balanced accuracy. By contrast, GRU yields the widest normalized intervals at 0.376, which shows reduced efficiency as given in Figure 10.

PINAW comparison across models.
CWC combines interval width and coverage into a single measure, balancing sharpness and reliability. AU-HNN attains the lowest CWC of 14.2, underscoring its well-calibrated uncertainty estimates. Bayesian-LSTM and Transformer follow with values around 19–23. GRU has the highest CWC of 28.5, which shows weaker calibration quality, as shown in Figure 11.

CWC comparison across models.
Latency measures the time taken for prediction, which reflects real-time feasibility. XGBoost performs fastest with just 25 ms, which makes it very efficient for deployment. AU-HNN maintains moderate latency at 189 ms, balancing accuracy and speed. In contrast, Bayesian-LSTM suffers the highest delay at 298 ms, which may hinder real-time use as given in Figure 12.

Latency comparison across models.
Memory usage reflects computational resource demand during inference. Classical models such as ARIMA and Prophet consume minimal memory (12–15 MB), but at the cost of accuracy. Deep learning models require significantly more memory, with Transformer at 186 MB. AU-HNN strikes a balance at 142 MB, achieving higher accuracy with moderate resource needs, as shown in Figure 13.

Memory usage (MB) comparison across models.
FLOPS per Watt indicates the energy efficiency of the models. AU-HNN delivers the highest efficiency at 9.8 × 10^6, which demonstrates superior computational sustainability. GRU and CNN-LSTM also perform strongly with values above 8.4 × 10^6. Traditional models such as ARIMA and Prophet remain far less efficient, with values below 3 × 10^6, as shown in Figure 14.

FLOPS per watt comparison across models.
Climate zone impact
The city-specific analysis shows how AU-HNN adapts to different climatic and operational scenarios in eight major Chinese cities. The model is robust across climate conditions, and its performance varies mainly with climate zone and seasonal variability. For example, Hangzhou shows the highest predictive power, with R2 = 0.931 and the smallest MAE of 5.34, which indicates how well the model reflects balanced subtropical conditions. Conversely, Beijing is the most challenging case because its continental climate produces the highest seasonal variation; this yields slightly higher errors, although overall performance remains high (R2 = 0.915). Shanghai, Shenzhen, and Chengdu show above-average results, which indicates that AU-HNN generalizes well across a wide range of building portfolios and climates. Notably, the PICP values of all cities exceed 0.94, which highlights the consistency of the uncertainty intervals regardless of geographic or seasonal effects, as provided in Table 8.
China - city-specific performance (AU-HNN).
Performance across building types indicates that AU-HNN is consistently better than the baselines, with the margin varying by sector. The best performance is obtained in commercial buildings, where the average R2 is 0.932 and the MAPE is only 7.8%, owing to periodic occupancy and predictable patterns of energy use. Office buildings also perform strongly, with an R2 of 0.927 and the lowest RMSE of 7.28, which shows that the model can capture dynamic occupancy in a work setting. Educational buildings and healthcare facilities are more challenging because of irregular schedules and critical load changes, but AU-HNN still provides good outcomes, with average gains of 18.9% and 16.4%, respectively, over the baseline methods. These findings indicate that AU-HNN is accurate not only across climatic and geographic conditions but also across a wide range of building functions, which makes it applicable to real-world smart energy management, as indicated in Table 9.
Building type performance (AU-HNN).
Ablation study results
The design of the AU-HNN model was tested through ablation studies, behavioral impact analysis, and uncertainty quantification. The ablation study shows the contribution of each architectural component to overall performance. The complete AU-HNN produced an RMSE of 7.42, and removing any single module degraded performance substantially. Removing the occupant module increased the RMSE by 23.7% to 9.18, and removing the uncertainty estimation component increased it by 16.6% to 8.65. The highest error, 9.42 (27.0% worse than the full model), occurred without incremental learning, while removing multi-scale fusion worsened the RMSE by 20.5%. These findings emphasize that the modules act synergistically to provide robust predictions.
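The relative degradations quoted above follow directly from the reported RMSE values, as this short check illustrates. The dictionary keys and the helper `relative_increase` are illustrative names; only the RMSE figures come from the text.

```python
def relative_increase(full, ablated):
    """Percentage increase of the ablated RMSE over the full-model RMSE."""
    return 100.0 * (ablated - full) / full

full_rmse = 7.42  # complete AU-HNN, as reported

# RMSE after removing one component, as reported in the ablation study.
ablated_rmse = {
    "occupant module": 9.18,
    "uncertainty estimation": 8.65,
    "incremental learning": 9.42,
}

degradation = {
    name: round(relative_increase(full_rmse, rmse), 1)
    for name, rmse in ablated_rmse.items()
}
```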
Dynamic occupant behavior impact
The effect of dynamic occupant behavior was examined through a case study of the changes in working patterns during COVID-19, which dramatically altered building use schedules. AU-HNN adapted to the new behavioral patterns within two to three days, compared with two to three weeks for the conventional baselines. Furthermore, the model achieved a behavioral recognition score of 94.3%, which underlines its capacity to capture subtle human activity patterns that directly affect energy use.
Uncertainty quantification validation
It was also confirmed that the uncertainty quantification provides credible, risk-aware predictions. The model achieved a prediction interval coverage probability (PICP) of 94.8%, just under the 95% target. Its prediction intervals were 22.1% narrower than those of the Bayesian-LSTM baseline, which supports the claim that AU-HNN provides tighter yet still trustworthy confidence intervals. For risk assessment, the model correctly identified 89.4% of high-uncertainty periods, so that real-world decision-making can take the risk into account appropriately.
Real-time performance analysis
The real-time performance analysis indicated that AU-HNN is computationally efficient and scalable to real-world smart building applications. The average prediction time was 189 milliseconds, which meets the target of less than 200 milliseconds and allows the model to provide real-time feedback for real-time decision-making. Scalability tests showed linear performance scaling up to 100 buildings at a time, which supports city-scale use. Resource usage was modest: the model used only 142 MB of memory, including model and runtime overheads, and can therefore be deployed even on resource-constrained systems. Computationally, AU-HNN achieved 9.8 × 10^6 FLOPS per watt, which is more energy-efficient than existing methods. Furthermore, online learning ability was assessed by the speed of model updates: a single update step took only 15.3 milliseconds on average. This allows new information to be incorporated quickly, so the model stays responsive to changing trends without large computational expense.
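The idea of a cheap per-sample update step can be illustrated with a deliberately simple stand-in: one stochastic gradient step on a linear predictor with squared-error loss. This is not the AU-HNN update rule, which the text does not specify; the function `online_update`, the learning rate, and the sample readings are assumptions for illustration only.

```python
def online_update(weights, bias, x, y, lr=0.01):
    """One SGD step on squared error for a linear model (illustrative stand-in).

    Returns the updated weights, updated bias, and the pre-update error.
    """
    pred = sum(w * xi for w, xi in zip(weights, x)) + bias
    err = pred - y
    new_weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    new_bias = bias - lr * err
    return new_weights, new_bias, err

# Hypothetical stream of (features, observed load) pairs arriving one at a time.
stream = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0)] * 100

weights, bias = [0.0, 0.0], 0.0
for features, target in stream:
    # Each new reading adjusts the model immediately; no full retraining pass.
    weights, bias, _ = online_update(weights, bias, features, target)
```

Because each step touches only the newest reading, the cost per update is constant, which is the property that makes incremental adaptation feasible in real time.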
Discussion and impact analysis
Technical contributions and significance
Methodological advances
This research provides several methodological innovations for building energy prediction. First, it presents the first extensive uncertainty quantification framework developed specifically for real-time prediction in the building energy domain, which addresses the vital issue of reliability in operational decision-making. Second, it incorporates dynamically changing occupant behavior into a neural predictive architecture, which captures an important but frequently neglected cause of energy demand variation. Third, it introduces an online incremental learning mechanism that allows the model to keep up with changing consumption trends without retraining from scratch. Taken together, these developments represent a significant step towards adaptive, behavior-aware, and uncertainty-calibrated energy prediction systems.
Theoretical implications
This study also contributes to the theoretical understanding of energy informatics. It constructs a mathematical model of uncertainty propagation within building energy systems and provides a basis for measuring and controlling predictive risks in operational settings. A rigorous convergence proof of the online incremental learning algorithm is provided, which supports the algorithm's reliability and the stability of the model's performance over long-term execution. Moreover, through explicit modeling of occupant variability, the research extends current theories of behavior-based energy modeling and bridges the gap between human-focused behavioral science and computational prediction. These theoretical contributions broaden the use of predictive models in both scholarly research and practical implementation.
Practical impact and applications
Smart building integration
The results of this study have significant implications for the future of smart building management systems. Using the predictive capability of AU-HNN, real-time HVAC optimization can deliver energy savings of 15 to 25 percent, which translates directly into substantial cost savings while maintaining occupant comfort. The model also produces uncertainty estimates that can be used to schedule predictive maintenance, so that operators can prioritize system inspections and minimize unexpected downtime. Furthermore, by aligning demand forecasts with utility needs, the framework can support demand response programs and promote the resilience of urban energy systems.
Economic and environmental benefits
The economic and environmental gains of the proposed framework are also persuasive. According to the empirical assessment on the 45-building sample, the system shows a projected annual cost saving of about $2.3 million and a reduction of 1847 tons of CO2 emissions. These outcomes indicate not just short-term financial benefits but also real progress towards carbon neutrality and sustainability. Notably, the return on investment (ROI) analysis indicates a payback period of only 18 months, which supports the feasibility of large-scale deployment across public and corporate building portfolios.
Limitations and future directions
Current limitations
Although the framework is effective, it has limitations. A key drawback is that it is computationally expensive to scale to very large building portfolios, where demands can exceed what the available hardware can support. A second concern is privacy: the model uses fine-grained occupant behavior data, which creates ethical and regulatory challenges for practical deployment. In addition, the framework relies intrinsically on high-quality multi-modal sensor input and can perform poorly with sparse, noisy, or missing data. These restrictions highlight the need for careful deployment planning and complementary data governance approaches.
Future research directions
The contributions of this paper open several directions for future research. Federated learning offers a route to privacy-aware model training, since it allows knowledge sharing across institutions without centralizing sensitive occupant information. Moreover, adding renewable energy sources and distributed storage to the AU-HNN system would strengthen its contribution to the sustainable building ecosystem, especially in support of carbon-conscious decision-making. Lastly, extending the approach to district-level or city-scale energy management systems would make it possible to investigate scalability, inter-building coordination, and synergies with smart grid infrastructures. These directions open possibilities for building robust, ethical, and environmentally friendly energy intelligence solutions.
Conclusion
This study proposes AU-HNN, a new hybrid model that improves building energy prediction. The study makes four key technical contributions: the first unified uncertainty quantification framework for real-time energy prediction, the incorporation of dynamic occupant behavior into the neural architecture design, the use of multi-scale fusion strategies, and the application of online incremental learning to building energy. Together, these innovations address major drawbacks of current methods and provide a comprehensive solution for precise, flexible, and dependable energy forecasts.
The proposed model significantly outperforms a wide range of baselines, including conventional statistical, machine learning, deep learning, and uncertainty-aware approaches. Across twelve performance metrics, AU-HNN improved on the best baseline models by 17% to 20%. These gains were confirmed by statistical tests, including t-tests and non-parametric Wilcoxon signed-rank tests, which indicates that the findings are robust across varied climatic conditions, building types, and occupancy conditions.
In addition to technical performance, the framework is useful in practical applications. With prediction latencies of less than 200 ms, efficient memory usage, and demonstrated scalability to large building portfolios, AU-HNN fits the operational needs of current smart building systems. Case studies at Tsinghua University also show that it can provide real advantages, such as energy savings of over 20% and emissions reductions that meet up to 100 percent of sustainability goals. Because the model adapts to dynamic occupant behavior, it is resilient to rapidly evolving situations such as hybrid work environments.
In scientific terms, the work builds upon uncertainty-aware prediction theory by formalizing uncertainty propagation and by providing convergence guarantees for incremental learning algorithms. Because the research combines rigorous theory with practical deployment, it not only advances computational models but also contributes to the wider debate on integrating human-centered behavior with energy informatics. This dual influence highlights its importance both academically and practically.
Going forward, the AU-HNN framework has the potential to support next-generation smart building ecosystems. Future research is likely to extend it to federated learning for privacy-sensitive applications, to integration with renewable energy and storage, to carbon-aware optimization, and to district- or city-scale energy management systems. This would not only make individual buildings more efficient but could also transform the energy infrastructure of cities, which makes AU-HNN a potential pillar in turning cities into sustainable and intelligent places.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
