Abstract
Anomaly detection in modern power systems requires advanced techniques, because the heterogeneous data these systems produce is high-volume and spans sensor signals, textual logs, time series, and more. Conventional methods adapt poorly to dynamic data changes, struggle with unstructured inputs, and cannot meet real-time processing requirements, making them inefficient in complex grids. This paper proposes a framework built on Qwen2 that leverages multi-source data fusion and cross-modal attention to address these issues. The framework integrates sensor signals and text logs via cross-attention, thereby aligning context and detecting anomalies effectively. Additionally, it includes dynamic resolution processing to handle high-frequency sensor data and lightweight inference to support edge deployment. Experimental results on a real-world dataset show that the proposed approach delivers a 12% increase in F1-score relative to state-of-the-art models, such as Transformer-AD, and reduces false-positive rates by 50% compared with traditional methods. The framework also supports real-time operation with an 85 ms latency per batch, yielding a scalable smart grid monitoring solution. The paper further advances the practical application of large language models in critical infrastructure by addressing deficiencies in heterogeneous data fusion, interpretability, and efficiency under limited resources. Future work will focus on reducing energy use and extending the system to identify multiple types of defects, such as cyberattacks and equipment wear.
Keywords
Introduction
With the rapid advancement of technologies such as the Internet of Things (IoT), power grids, and digital finance, modern systems generate vast volumes of multi-source data in diverse formats, including sensor readings, textual logs, and high-frequency transaction records. 1 These heterogeneous datasets are often characterized by complex structures, conflicting data types, and real-time processing demands, which pose significant challenges for anomaly detection tasks. For instance, smart grid operations require simultaneous analysis of electrical load patterns, equipment sensor signals, and user behavior logs to identify potential failures or cyberattacks, while financial institutions must monitor transaction sequences alongside customer metadata to detect fraud. 2
Traditional anomaly detection methods, such as rule-based systems or shallow ML models, fail to handle complex multimodal data due to rigid feature representations and poor generalization. They struggle with dynamic data shifts and unstructured formats (e.g., logs, event sequences), while real-time demands amplify risks. Current deep learning approaches lack scalable, interpretable frameworks for cross-domain heterogeneity. Novel methodologies integrating large-scale models and robust fusion strategies are urgently needed. 3
Current power grid anomaly detection methods remain fragmented, lacking unified frameworks for effectively integrating large-scale models with multi-source data. Most approaches focus on isolated data types (text logs, sensor readings, or time series) without addressing real-world integration challenges—particularly under dynamic conditions such as seasonal shifts, system degradation, or adversarial attacks. While deep learning shows promise, existing solutions often lack scalability for high-velocity data streams and interpretability, which are critical for industrial monitoring and cybersecurity. 4
Additionally, computational demands hinder deployment in resource-constrained edge environments (e.g., smart grids and IoT), where efficiency is vital (Zhao et al., 2025; Sonthalia et al., 2025). A holistic approach is needed that balances accuracy, real-time processing, interpretability, and feasibility. Future solutions must optimize latency-sensitive scenarios to enable reliable, low-delay anomaly detection. Bridging the gaps between heterogeneous data fusion, scalable architectures, and edge-compatible optimization will advance the field toward resilient, practical systems capable of meeting modern anomaly detection demands. 5
This study proposes a Qwen2-powered anomaly-detection framework for power grids that integrates SCADA, equipment logs, and PMU data via Qwen2's 128K-token context window. Validated on real-world datasets, it outperforms traditional methods like Isolation Forest, especially in noisy environments with limited labeled data, demonstrating superior robustness for grid applications. The main contributions are outlined below:
Development of a novel Qwen2-based framework that enables effective fusion of multi-modal power system data for enhanced anomaly detection accuracy.
Demonstration of superior performance in both detection precision and real-time processing efficiency compared to existing approaches, validated on actual grid operation datasets.
Demonstration of Qwen2's adaptability and generalization capabilities across complex power system scenarios, offering a scalable solution for real-time grid monitoring and fault detection. These advancements provide valuable insights for implementing large language models in critical power infrastructure protection systems. 6,7
Literature review
Zhang et al. surveyed deep learning approaches for multi-source heterogeneous data fusion, highlighting feature-level fusion methods like bimodal autoencoders and Deep Boltzmann Machines. 8 Sun et al. proposed a Mixed Information Gain strategy for dynamic multi-source heterogeneous data fusion in IoT intelligent systems, leveraging edge-cloud collaboration. 9 Feng et al. proposed an ARIMA-BP neural network fusion method for assessing the credibility of sensor data based on spatiotemporal correlations. 10 Sun et al. proposed an MPSO-Gaussian fusion method for utility tunnel fire detection using image segmentation and multi-source heterogeneous data. 11 Ma et al. comprehensively reviewed multi-source navigation fusion methods, including optimal estimation, filtering algorithms, factor graph optimization, and AI approaches, highlighting their adaptive capabilities in complex environments. 12 Saadi et al. proposed a hierarchical model (HM) combining Hidden Markov Models (HMM) and Iterative Proportional Fitting (IPF) to fuse unlimited multi-source datasets, achieving quasi-perfect marginal distributions and accurate multivariate joint distributions while handling missing data. 13 Zhou et al. proposed a systematic categorization framework for multi-source material data fusion, emphasizing conflict-resolution methods such as CRH and integrating AI algorithms and big data technologies to enhance knowledge discovery. 14 Liu et al. proposed a four-layer RDF-based data fusion framework that converts structured/semi-structured/unstructured data into RDF triples, utilizes balanced binary tree indexing for efficient subgraph matching, and performs syntactic fusion with Word2Vec for conflict resolution, validated on academic data. 15 Di Curzio et al. proposed a classification framework for multi-source spatial-temporal data fusion in environmental studies, categorizing approaches as complementary, redundant, or cooperative, and using Dasarathy's input-output levels to enhance reliability across the hydrology, soil science, and precision agriculture domains. 16 Li and Gan proposed an ensemble learning-based multi-source data fusion model for tourism information processing, utilizing Ctrip datasets for training and validating prediction accuracy against Feizhu/Tuniu data, achieving 78% post-pandemic trend consistency. 17
Multi-source heterogeneous data fusion has become a key capability because complex systems in areas such as IoT, environmental monitoring, and industry increasingly need to combine varied data types (e.g., sensor signals, textual logs, and images). Early methods focused on feature-level integration, using techniques such as bimodal autoencoders or Deep Boltzmann Machines to capture cross-modal correlations, for example in work on sensor-data credibility assessment and spatio-temporal modelling. Decision-level fusion strategies emerged later; these model each data type independently (e.g., Isolation Forest on numeric data and an LSTM on text) and aggregate the results via voting or weighted averaging. However, these approaches could not preserve contextual relations or handle dynamic environments effectively. Three problems remain open: (1) how to handle high-velocity streams at scale; (2) how to resolve semantic and structural differences in real time; and (3) how to achieve efficiency in resource-constrained settings. Recent work has focused on adaptive frameworks that combine domain-specific feature extraction, attention-based alignment, and hybrid modeling to close the gap between theoretical robustness and practical application, paving the way for resilient anomaly detection platforms in adaptive, multi-source settings.
Pan et al. proposed a dynamic residual generator via robust optimization to detect stealthy multivariate data-injection attacks in power systems, capturing transient/steady-state behaviors and achieving Nash equilibrium, and validated it on an IEEE 39-bus system. 18 Cooper et al. surveyed traditional and modern anomaly detection methods for power system state estimation, covering chi-square testing, residual analysis, hypothesis testing, and newer approaches such as quickest-change detection and AI techniques to address cyber threats, including false data injection attacks. 19 Baker et al. proposed an integrated LSTM-MPC framework for real-time anomaly detection and classification (distinguishing internal inverter faults from cyber-attacks such as FDI) in power-electronics-dominated grids, validated on a 14-bus system. 20 Zhang et al. proposed a random matrix theory-based method using covariance matrix spectral analysis (Marchenko-Pastur law, MSR) for identifying abnormal electricity consumption in industrial power systems. 21 Gaggero et al. surveyed AI-physics fusion methods for intelligent grid anomaly detection, highlighting performance gaps and low real-world validation readiness (TRL) across distribution grids and DERs. 22 Pei et al. proposed a Spark-optimized SVM-RF model, coupled with IoT data pipelines (Jnetpcap/Flume/Kafka), for real-time anomaly detection in wireless smart grid data acquisition. 23 Jin et al. proposed an anomaly detection framework that leverages the innovation-reduction properties of the Iterated Extended Kalman Filter (IEKF) and normalized residuals from static state estimation to accurately detect and distinguish sudden load changes, insufficient data, and line outages in power systems. 24 Guato Burgos et al. presented a comprehensive review categorizing seven types of smart grid anomalies and analyzing prevalent artificial intelligence-based detection methods, including machine learning and deep learning frameworks, based on the 2011–2023 literature. 25 Passerini et al. proposed using power line modems as sensors to detect and localize grid anomalies by measuring input admittance and impedance, along with channel transfer functions, and using time-domain analysis algorithms. 26 Banik et al. proposed a multivariate analysis method for smart meter data to detect customer-level anomalies in smart grids, addressing gaps in existing research. 27
Modern grid operation has become highly complex and dynamic, which has driven the development of anomaly detection technology for power systems. Deviations in sensor data or operational logs have long been detected using traditional methodologies such as rule-based thresholding and shallow machine learning models (e.g., Isolation Forest, One-Class SVM). However, these methods can take significant time to train and are often insufficient for detecting deviations in new hardware or new behaviors of old hardware. They also perform poorly in changing grid environments, with unstructured log formats (e.g., textual logs, event sequences), and under real-time processing requirements. Recent developments have moved to deep learning architectures, including Autoencoders and GAN-AD, which can learn more complex patterns but suffer from limited interpretability and overfitting. Hybrid approaches that combine probabilistic models (e.g., Gaussian Mixture Models) with threshold-based models aim for greater robustness; however, they lack unified data integration mechanisms to accommodate heterogeneous data sources. Researchers have also noted the importance of scalable systems that balance precision, real-time processing, and efficiency. The remaining critical problems are (1) how to learn from the few samples available for rare anomaly scenarios; (2) how to handle multi-scale anomalies (e.g., transient voltage drops and long-term equipment degradation); and (3) how to achieve cross-domain transferability. These constraints demonstrate the need for adaptive, interpretable frameworks that can manage multi-source data and optimize performance in dynamic, resource-constrained environments, which this study addresses with a cross-modal fusion model based on Qwen2.
Methodology
Overall workflow
The proposed framework tackles multi-source anomaly detection through integrated preprocessing, feature fusion, and pattern analysis. It first standardizes heterogeneous data (sensor readings, logs, time-series), then employs cross-attention to model inter-domain relationships. Anomalies are identified via threshold-based or probabilistic (e.g., GMM) comparisons against standard patterns. The scalable design adapts to varying data structures, enabling robust real-world deployment where source quality fluctuates.
The proposed method follows a four-step pipeline: Data Preprocessing, Multi-Source Feature Extraction, Model Fusion with Qwen2, and Anomaly Detection.
Data Preprocessing: Normalize multi-source data (e.g., sensor signals, text logs) and handle missing values via interpolation.
Multi-Source Feature Extraction: Extract domain-specific features (e.g., spectral features from sensor data, text embeddings from logs) using Qwen2's modality-specific encoders.
Model Fusion: Combine heterogeneous features using cross-attention mechanisms to align contextual dependencies.
Anomaly Detection: Apply threshold-based or probabilistic models (e.g., Gaussian Mixture Models) to identify deviations from standard patterns.
The overall workflow is shown in Figure 1.

Overall framework.
Data preprocessing
Power grid multi-source datasets consist of heterogeneous data from diverse origins. These datasets often differ in format, temporal resolution, and semantic meaning, requiring careful preprocessing to ensure compatibility. Key challenges include handling missing values, aligning time-stamped data, and resolving format/semantic discrepancies.
The process starts by collecting raw data from IoT devices, databases, and APIs, including sensor signals, logs, images, and time-series records. Data arrives in varied formats (CSV, JSON, XML) and requires standardization. Temporal alignment handles missing values or syncs timestamps for time-series data. Semantic conflicts (e.g., speed units) are resolved to unify features. This preprocessing ensures coherent data integration for downstream tasks like model fusion and anomaly detection.
The data preprocessing flowchart is shown in Figure 2.

Data pre-processing framework.
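Power grid multi-source datasets are first standardized to ensure compatibility. Numerical data is normalized using min-max scaling:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
which maps values to the range [0,1]. For time-series data with missing entries, linear interpolation is applied:
$$x(t) = x(t_1) + \frac{t - t_1}{t_2 - t_1}\,\bigl(x(t_2) - x(t_1)\bigr)$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum observed values of the feature, and $t_1 < t < t_2$ are the nearest timestamps before and after $t$ that have observed values.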
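The two preprocessing operations described in this subsection, min-max scaling to [0,1] and linear interpolation of missing time-series entries, can be illustrated with a minimal NumPy sketch. The function names, the timestamps, and the voltage values are hypothetical and chosen only for illustration; they are not taken from the dataset used in this study.

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D array to [0, 1]; a constant series maps to all zeros."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)

def interpolate_missing(t, x):
    """Fill NaN entries of x by linear interpolation over timestamps t."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = x.copy()
    # np.interp holds the nearest observed value for gaps at either end.
    filled[missing] = np.interp(t[missing], t[~missing], x[~missing])
    return filled

# Hypothetical voltage readings (V) with one dropped sample at t = 2 s.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
voltage = np.array([220.0, 222.0, np.nan, 226.0, 230.0])

voltage_filled = interpolate_missing(t, voltage)   # gap filled with 224.0
voltage_scaled = min_max_scale(voltage_filled)     # values now span [0, 1]
print(voltage_filled)
print(voltage_scaled)
```

Interpolating before scaling keeps the missing entry from propagating as NaN into the normalized series.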
1) Multi-Source feature extraction
This step transforms preprocessed data into domain-specific features tailored for downstream tasks. For text logs, Qwen2's tokenizer converts raw text T into numerical embeddings
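For the sensor branch of this step, the workflow refers to spectral features and the validation stage refers to FFT frequency alignment. A minimal sketch of FFT-based feature extraction for a single sensor window is given below. The window length, the Hann window, the number of frequency bands, and the function name `spectral_features` are assumptions made for illustration and are not specified in this paper.

```python
import numpy as np

def spectral_features(signal, fs, n_bands=4):
    """Return (dominant frequency in Hz, relative power per band) for one window."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()               # remove DC offset
    windowed = signal * np.hanning(len(signal))   # Hann window reduces spectral leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    dominant = freqs[np.argmax(power)]
    # Split the spectrum into (nearly) equal-width bands and sum power in each.
    band_power = np.array([b.sum() for b in np.array_split(power, n_bands)])
    total = band_power.sum()
    rel = band_power / total if total > 0 else np.zeros(n_bands)
    return dominant, rel

# Hypothetical window: 200 samples at 100 Hz containing a 5 Hz oscillation.
fs = 100.0
t = np.arange(200) / fs
window = np.sin(2 * np.pi * 5.0 * t)

dominant_hz, band_power = spectral_features(window, fs)
print(dominant_hz, band_power)
```

The resulting vector (dominant frequency plus relative band powers) is one possible fixed-length representation of a sensor window that can be passed to the fusion stage alongside the text embeddings.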
2) Model fusion with Qwen2
Integrating heterogeneous features (e.g., text embeddings
Where
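To make the fusion step concrete, the following is a minimal NumPy sketch of standard scaled dot-product cross-attention between text embeddings and sensor features. It is not the exact formulation used inside the framework: the choice of text tokens as queries and sensor windows as keys and values, all dimensions, and the randomly initialized projection matrices `W_q`, `W_k`, and `W_v` (which would be learned in a trained model) are assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_emb, sensor_feat, W_q, W_k, W_v):
    """Text tokens (queries) attend over sensor windows (keys/values)."""
    Q = text_emb @ W_q        # (n_text, d_k)
    K = sensor_feat @ W_k     # (n_sensor, d_k)
    V = sensor_feat @ W_v     # (n_sensor, d_v)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

# Hypothetical sizes: 3 log tokens, 5 sensor windows.
rng = np.random.default_rng(0)
n_text, n_sensor, d_text, d_sensor, d_k = 3, 5, 8, 6, 4
text_emb = rng.normal(size=(n_text, d_text))
sensor_feat = rng.normal(size=(n_sensor, d_sensor))
W_q = rng.normal(size=(d_text, d_k))      # stand-ins for learned projections
W_k = rng.normal(size=(d_sensor, d_k))
W_v = rng.normal(size=(d_sensor, d_k))

fused, attn = cross_attention(text_emb, sensor_feat, W_q, W_k, W_v)
print(fused.shape, attn.shape)
```

The attention matrix `attn` is the quantity that a cross-attention weight analysis would inspect: row i shows how strongly log token i draws on each sensor window.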
3) Anomaly detection
Identifying deviations in the fused features F by combining threshold-based and probabilistic methods. First, deviation scores
Anomalies are detected when
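The combination of a threshold-based score and a probabilistic score on the fused features can be sketched as follows. This is an illustrative stand-in rather than the scoring rule of the framework: a single Gaussian replaces the Gaussian Mixture Model, the deviation score is taken to be the Euclidean distance from the mean of normal data, the thresholds are set at assumed quantiles of the normal data, and the two tests are combined with a logical OR. All data are synthetic.

```python
import numpy as np

def fit_reference(F_normal):
    """Estimate mean and (regularized) covariance of normal fused features."""
    mu = F_normal.mean(axis=0)
    cov = np.cov(F_normal, rowvar=False) + 1e-6 * np.eye(F_normal.shape[1])
    return mu, cov

def deviation_score(F, mu):
    """Threshold-style score: Euclidean distance from the normal mean."""
    return np.linalg.norm(F - mu, axis=1)

def gaussian_log_likelihood(F, mu, cov):
    """Probabilistic score: log-density under a single Gaussian (GMM stand-in)."""
    d = F.shape[1]
    diff = F - mu
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

rng = np.random.default_rng(42)
F_normal = rng.normal(0.0, 1.0, size=(500, 4))       # synthetic normal-operation features
F_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 4)),
                    np.full((1, 4), 6.0)])             # last row: injected outlier

mu, cov = fit_reference(F_normal)
# Thresholds calibrated on normal data; the quantile levels are assumptions.
tau_dev = np.quantile(deviation_score(F_normal, mu), 0.99)
tau_ll = np.quantile(gaussian_log_likelihood(F_normal, mu, cov), 0.01)

is_anomaly = (deviation_score(F_test, mu) > tau_dev) | (
    gaussian_log_likelihood(F_test, mu, cov) < tau_ll
)
print(is_anomaly)
```

Calibrating both thresholds on normal data only reflects the limited-label setting discussed in the introduction; with labeled anomalies available, the thresholds could instead be tuned on a validation set.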
4) Test and verification
The validation process ensures robustness by verifying preprocessing (normalization in [0,1], time-series interpolation accuracy), multi-source feature extraction (Qwen2 embeddings’ consistency, FFT frequency alignment), and model fusion (cross-attention weight analysis). Anomaly detection is evaluated using precision, recall, F1 score, AUC-ROC, and domain-specific metrics (e.g., false-positive rates). Cross-validation (5-fold) prevents overfitting and provides averaged metrics for stability. Benchmarking against standard methods confirms effectiveness, ensuring reproducibility and adaptability across scenarios.
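The threshold-dependent metrics named above and the 5-fold split can be sketched in plain Python as follows. The labels and predictions are made-up values for illustration, AUC-ROC is omitted because it requires continuous scores rather than binary predictions, and the contiguous, unshuffled folds are an assumption rather than the splitting scheme used in the experiments.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def detection_metrics(y_true, y_pred):
    """Precision, recall, F1, and false-positive rate; 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Made-up labels: 4 anomalies among 10 samples, one miss and one false alarm.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
metrics = detection_metrics(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), k=5))
print(metrics)
print(folds[0])
```

Averaging `detection_metrics` over the test indices of each fold gives the cross-validated figures described above.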
Experimental environment and configurations
The experimental study was conducted on a system equipped with an NVIDIA GeForce RTX 4090 graphics card featuring 16,384 CUDA cores, 24 GB of GDDR6X VRAM with a 384-bit memory interface, and 4th-generation Tensor Cores for AI acceleration. The hardware was supported by 64 GB of G.Skill DDR5-6000 RAM in a quad-channel configuration to maximize data throughput, paired with a 1200 W 80+ Gold-certified power supply. For software, the environment included Ubuntu 22.04 LTS, PyTorch 2.1, and the CUDA 12.1 and cuDNN 8.9.2 libraries, which support the RTX 4090's Ada Lovelace architecture. The dataset used for experiments consisted of multimodal logs from industrial IoT systems, including high-resolution sensor telemetry (100 Hz sampling rate) and textual maintenance reports, preprocessed to ensure temporal alignment and semantic consistency. All experiments were conducted under controlled thermal conditions (25 ± 2 °C ambient temperature) to ensure hardware stability.
The experiments were conducted on a power grid multi-source dataset containing sensor signals and textual logs collected from a real-world smart grid system. Sensor data included voltage, current, and frequency measurements sampled at 1 Hz over 7 days (10,080 data points). The power grid sensor dataset is shown in Figure 3.

Power grid sensor dataset.
Maintenance personnel generated text logs, recording events such as “voltage drop detected” or “transformer overload” with timestamps. The dataset was preprocessed to align time-series data and normalize numerical values. The power grid text logs dataset is shown in Figure 4.

Power grid text logs dataset.
The analysis of the results showed that the Qwen2 framework outperformed the baseline models across all metrics. It achieved a precision of 92%, a recall of 89%, and an F1-score of 90%, a 12% improvement over the best-performing baseline (Transformer-AD). Its AUC-ROC was 94%, whereas that of Transformer-AD was 91%. Notably, the Qwen2 framework had a false-positive rate of 5%, more than 50% lower than that of conventional models such as Isolation Forest (12%) and SVM (13%). This improvement demonstrates the effectiveness of the framework in noisy environments, an important issue for real-world grid applications. The cross-attention mechanism was crucial for aligning sensor signals and textual logs, resulting in an 8–15% increase in recall over single-modality approaches (Table 1).
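As a quick consistency check on the figures reported above, the F1-score can be recomputed from the stated precision and recall, and the relative false-positive reduction can be recomputed from the stated rates. This is only arithmetic on the reported numbers, not an additional result:

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.92 \times 0.89}{0.92 + 0.89}
    = \frac{1.6376}{1.81} \approx 0.905
\]
\[
\frac{12\% - 5\%}{12\%} \approx 58\%, \qquad
\frac{13\% - 5\%}{13\%} \approx 62\%
\]
```

The recomputed F1 of about 0.905 agrees with the reported 90%, and the relative reductions of roughly 58% (against Isolation Forest) and 62% (against SVM) indicate that the 50% figure is a conservative summary.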
The results of the comparison with state-of-the-art models.
The bar chart of the comparison with state-of-the-art models is shown in Figure 5.

The bar chart of the results of the comparison with the state-of-the-art models.
The importance of the major components was also confirmed through an ablation study. Removing the cross-attention block decreased the F1-score by 5%, whereas removing the text log embeddings (e.g., “voltage drop detected”) resulted in a 4% decrease in recall. The dynamic resolution processing module, designed to handle high-frequency sensor data, achieved a 23% improvement in AUC-ROC. These results highlight the need to integrate heterogeneous data and to adopt adaptive mechanisms that enable robust anomaly detection (Table 2).
Ablation Study.
The bar chart of the ablation study is shown in Figure 6.

The bar chart of the ablation study.
The proposed Qwen2 framework significantly outperformed existing methods in detecting anomalies across multi-source power grid data. It achieved a 90% F1-score—12% higher than Transformer-AD—and reduced false positives by 50% compared to traditional models like Isolation Forest. This improvement stems from two key innovations: the cross-attention mechanism, which aligns sensor signals with text logs (e.g., linking “transformer overload” logs to voltage fluctuations), and dynamic resolution processing for high-frequency sensor data. The ablation study confirmed these components’ critical roles; removing cross-attention lowered the F1-score by 5%, while excluding text embeddings reduced recall by 4%. These results highlight how fusing contextual data from diverse sources (e.g., SCADA metrics and maintenance logs) enables more robust detection in noisy grid environments.
Despite its strengths, the framework has notable constraints. First, computational demands limit deployment on resource-constrained edge devices (e.g., remote IoT sensors), as Qwen2's architecture requires substantial memory and processing power. Second, the dataset—though real-world—lacked diversity in anomaly types and durations. It focused primarily on short-term faults (e.g., voltage drops, breaker trips), omitting slower-developing issues like equipment degradation or coordinated cyberattacks. This narrow scope may hinder generalization to broader scenarios, such as grids experiencing seasonal load shifts or prolonged infrastructure decay.
To address these limitations, future efforts should prioritize three areas. First, the model should be optimized via quantization or distillation to reduce computational overhead, enabling real-time inference on edge devices. Second, the framework should be extended to detect both transient events (e.g., cyber intrusions) and slow-evolving anomalies (e.g., insulator wear) by integrating long-term trend analysis. Third, decision transparency should be enhanced by using attention heatmaps to visualize cross-modal links and by embedding domain rules to flag high-risk anomalies. Together, these efforts will bridge scalability gaps while improving adaptability across diverse grid conditions.
Conclusion
In this paper, the authors proposed a Qwen2-powered framework for anomaly detection in power grids that combines multi-source data fusion and cross-modal attention to overcome the challenges of heterogeneous data integration and real-time processing. Experimental findings indicated that the proposed framework outperformed state-of-the-art models, achieving a 12% increase in F1-score over the best baseline (Transformer-AD) and a 50% decrease in false-positive rates compared to conventional methods. The ablation study confirmed the importance of cross-attention, text log embeddings, and dynamic resolution processing for improving robustness and contextual alignment. For example, removing the cross-attention mechanism reduced the F1-score by 5%, while removing the text log embeddings led to a 4% decline in recall, indicating the significance of contextual information for detecting anomalies such as transformer overload. The framework also achieved low-latency operation (85 ms per batch), meeting the real-time monitoring needs of smart grids.
Nevertheless, this study has limitations. First, the model's computational cost can be a drawback on resource-limited edge devices, and the model should be further optimized to enable lightweight inference. Second, the existing dataset, although representative, lacks substantial variety in anomaly types and durations (e.g., transient versus long-term anomalies and faults), which may limit generalizability.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
