Secure sharing model of data throughout the entire process of power equipment based on machine learning

Abstract

Power equipment data are distributed and heterogeneous across the equipment lifecycle, and sharing them carries security risks; together these factors limit integrated, lifecycle-wide data sharing. This paper proposes a machine-learning-based secure data-sharing model for power equipment data throughout the lifecycle. The model first merges multi-source data from the operation, inspection, and maintenance processes into a unified data format through semantic mapping. On this unified structure, machine learning performs feature extraction and risk prediction to drive dynamic access control, while adaptive encryption and desensitization maintain the balance between risk mitigation and data sharing. Full-process monitoring with feedback detects anomalous behavior and optimizes the policies in real time. In the experiments, the data penetration rate reached 96.3% for the disconnector, and the leakage rate of sensitive information was reduced to 1.8% when the risk level rose to very high. The approach alleviates the conflict between data security and data sharing and contributes original research to intelligent information and database systems.

Keywords

power equipment; throughout the entire process; secure data sharing; machine learning; dynamic access control

1 Introduction

The intelligent transformation of the power system, a critical national infrastructure, relies on deploying data efficiently and managing it along the entire equipment life cycle. As smart grids are developed and put into operation, equipment such as substations, transmission and distribution networks, and other new power-sector assets produces substantial quantities of heterogeneous data during operation, testing, and maintenance. These data contain essential information about equipment state, operational behavior, and risk factors, and can help strengthen the reliability of the power system and support operation and maintenance optimization.^1–2 At present, however, the power system suffers from acute data silos and cross-system interoperability problems. Equipment operational data and maintenance records are held in the history, operations, and maintenance systems of disparate departments, without unified data standards or a shared semantic framework.^3–4 Incompatible data formats and inconsistent metadata definitions fragment information and greatly limit integrated analysis of equipment data across the lifecycle.^5–6 At the same time, power data sharing raises serious security issues. Sensitive information, such as equipment parameters and users' electricity consumption behavior, is at risk of leakage when transferred between departments. Traditional static security mechanisms cannot adequately defend against increasingly sophisticated network threats.^7–8 Current data-sharing methods often struggle to balance security strength with sharing efficiency. In general, tightening security increases data access latency or complicates sharing, whereas easing sharing weakens the necessary protection.^9–10 This tension between the sensitivity of power equipment data, which demands strong protection, and the need to share those data will only intensify. Novel solutions that both maintain data security and allow efficient connectivity are therefore critical to the digital transformation of power systems.^11–12

This article presents a secure data-sharing framework for the life cycle of power equipment. The framework uses an ontology-based method to integrate multi-source heterogeneous data into a common data model that describes equipment condition across operation, testing, and maintenance. The risk assessment module combines deep neural networks and graph convolutional networks to extract deep features from access logs and real-time behaviors, providing a dynamic risk indicator system that mitigates the limitations of conventional rule-driven risk assessment. The access control mechanism applies a reinforcement learning algorithm that takes the risk assessment as state input to optimize and refine permission allocation in real time. The data protection layer introduces a parameter-adaptation mechanism that adjusts encryption strength and desensitization schemes to the risk level, balancing data availability against risk. The anomaly detection module monitors potential threats based on behavioral patterns and feeds its findings back, closing the loop between risk assessment, policy enforcement, and feedback optimization. The framework thus systematically addresses the challenge of reconciling security and efficiency in power data sharing. Combining semantic unification with dynamic risk assessment is a step towards intelligent, secure data lifecycle management. The structured approach addresses significant data governance issues in the power industry and also provides a transferable technical route for secure data sharing across other critical infrastructure, supporting the use of data in the safe operation of the power system.

2 Related work

Research on data connectivity and sharing across the whole life cycle of power equipment focuses on combining multiple technologies to improve overall system efficiency. From the perspective of big data technology for renewable energy power development, Hong et al.¹³ analyzed the current state of data processing on the generation, grid, and user sides, built a cloud-platform energy storage system architecture, and explained the core value of data fusion for improving coordination efficiency and optimizing energy configuration. This work provides a practical foundation for integrated management of power equipment data. For the industrial Internet of Things, Huo et al.¹⁴ reviewed blockchain technology within a layered enabling framework and identified how it can be adapted to equipment security, trusted data sharing, and process automation in support of the trusted operation of intelligent manufacturing systems. Work on data transfer mechanisms has pushed the boundaries further through communication performance optimization.^15–16 In their study of synchronous wireless charging and data transmission systems, Yao et al.¹⁷ classified and compared architectural solutions, analyzed key aspects including crosstalk suppression, signal-to-noise ratio improvement, and modulation strategies, and clarified the inherent trade-offs among efficiency, speed, and reliability. However, the above research remains insufficient in terms of the deep integration of data throughout the life cycle of power equipment and the systematic construction of a secure sharing model.

Research on machine-learning-based secure data-sharing models in intelligent information systems continues to grow in technical depth and breadth of application. Gupta et al.¹⁸ used artificial intelligence technology to ensure the confidentiality, integrity, and authenticity of data in response to security threats in the dynamic topology of mobile networks. Machine learning and deep learning methods have shown potential for addressing unauthorized access and fraudulent links, providing theoretical support for a high-trust mobile data protection system. This line of research has moved towards practical application and improved the adaptability of network security detection techniques. Asif et al.¹⁹ built a machine-learning intrusion detection model on the MapReduce framework that predicts anomalies efficiently from multi-source, near-real-time network data. The model achieved detection accuracies of 97.7% in the training phase and 95.7% in the validation phase. Their results show the efficiency gained by combining distributed computing and intelligent analysis against large-scale security threats. The maturing of such detection mechanisms has improved the security design of information management systems.^20–21 Lv et al.²² optimized the optical gradient enhancement algorithm and, using an Android-based intelligent support information management system, rapidly classified and managed large-scale information files. The system achieved an accuracy of 89.24%, with a training time of about 100 s and a testing time of about 0.68 s, establishing a technical benchmark for timely information management. However, the adaptability of the above results to secure sharing mechanisms for data integration across the entire life cycle of power equipment has yet to be fully verified.

3 Power equipment data secure sharing method

3.1 Unified modeling of full life cycle data

Data across the whole lifecycle of power equipment arise from operating, inspecting, and maintaining the devices, and each of the three phases shows different heterogeneous characteristics. Operations data are collected from SCADA (Supervisory Control and Data Acquisition) and smart meter systems as high-frequency time series. Inspection data are collected from periodic status inspection systems as structured inspection reports. Maintenance data are collected from ERP (Enterprise Resource Planning) systems as unstructured maintenance records. These three types of data differ fundamentally in data structure, time granularity, and semantic expression.^23–24

In this study, an ontology for power equipment has been developed as a unified semantic framework and formally described in OWL (Web Ontology Language). The ontology is organized around core classes such as power equipment, operational state, detection indicators, and maintenance activities, which form a hierarchy of conceptual classes covering the domain.^25–26 Ontology construction draws on the IEC 61970/61968 CIM standards and on the data dictionaries of the actual systems, so that the unified semantic framework is both industry-compliant and business-relevant.

The process of semantic mapping uses multi-faceted similarity calculations to accurately relate the heterogeneous data to the unified ontology:

sim(c_{i}, c_{j}) = α^{'} \cdot sim_{name}(c_{i}, c_{j}) + β^{'} \cdot sim_{context}(c_{i}, c_{j}) + γ^{'} \cdot sim_{type}(c_{i}, c_{j})

(1)

$c_{i}$ represents the source system data field, $c_{j}$ represents the target ontology concept, $sim_{name}(c_{i}, c_{j})$ is the name similarity based on edit distance and word vectors, $sim_{context}(c_{i}, c_{j})$ is the contextual semantic similarity, $sim_{type}(c_{i}, c_{j})$ is the data type compatibility measure, and $α^{'}$, $β^{'}$, $γ^{'}$ are weight coefficients.
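The weighted combination in Eq. (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three component functions are simple stand-ins (a character-level ratio from `difflib` instead of edit distance plus word vectors, Jaccard overlap for context, exact match for type), and the weights, dictionary keys, and function names are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical weights standing in for alpha', beta', gamma' in Eq. (1).
ALPHA, BETA, GAMMA = 0.5, 0.3, 0.2


def sim_name(a: str, b: str) -> float:
    """Stand-in name similarity: character-level match ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sim_context(ctx_a: set, ctx_b: set) -> float:
    """Stand-in contextual similarity: Jaccard overlap of context terms."""
    if not ctx_a and not ctx_b:
        return 0.0
    return len(ctx_a & ctx_b) / len(ctx_a | ctx_b)


def sim_type(type_a: str, type_b: str) -> float:
    """Stand-in type compatibility: 1.0 only when declared types match."""
    return 1.0 if type_a == type_b else 0.0


def similarity(field: dict, concept: dict) -> float:
    """Weighted sum of the three component similarities, as in Eq. (1)."""
    return (ALPHA * sim_name(field["name"], concept["name"])
            + BETA * sim_context(field["context"], concept["context"])
            + GAMMA * sim_type(field["dtype"], concept["dtype"]))


def best_match(field: dict, concepts: list) -> dict:
    """Pick the ontology concept with the highest combined similarity."""
    return max(concepts, key=lambda c: similarity(field, c))
```

With weights that sum to 1 and components bounded in [0, 1], the combined score also stays in [0, 1], which makes a single acceptance threshold meaningful across field types.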

The architecture of the power equipment complete lifecycle data model is shown in Figure 1; it describes the transformation pathway from raw heterogeneous data to a unified semantic representation. The heterogeneous data flow through preprocessing, format standardization, and semantic mapping to produce a unified RDF (Resource Description Framework) representation.

Figure 1.

Power equipment full lifecycle data model architecture.

Table 1 illustrates the differences in representation and the unified mapping results for key data fields of power equipment across heterogeneous systems. It shows the different naming conventions that the operation and maintenance systems use for the same physical quantities, and the relationships that semantic mapping establishes between the fields. The unified representation consolidates these names and eliminates the semantic gaps between the systems, providing a standardized foundation for integrating data across the complete lifecycle. The mapping rules rely on semantic similarity calculations, together with industry standard constraints, to validate the accuracy and consistency of data conversion.

Table 1.

Mapping relationships between key data fields for power equipment.

Data field name	Operation phase representation	Maintenance phase representation	Unified representation
Device identification code	Dev_ID	Device_Code	Equipment_Identifier
Transformer temperature	Trans_Temp	Transformer_Temperature	Equipment_Temperature
Circuit breaker status	CB_Status	CircuitBreaker_State	Equipment_Status
Insulation resistance value	IR_Value	Insulation_Resistance	Insulation_Resistance
Load current	Load_Current	Current_Load	Load_Current

Data conversion uses a bidirectional traceability mechanism to build an accurate mapping chain between the source data fields and the target ontology concepts. The traceability index is organized as a B+ tree for fast reverse queries from either the source data or the target ontology. The data integrity constraint is:

\forall x \in X,\; \exists y \in Y \text{ such that } sim_{value}(x, y) \geq δ

(2)

$X$ is the original dataset, $Y$ is the converted dataset, $sim_{value}(x, y)$ is the data-value similarity function, and $δ$ is the integrity threshold, dynamically adjusted based on the data type. Anomalies during the conversion process are identified and handled using a predefined anomaly classification system, forming a closed-loop anomaly-handling mechanism.
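A check of the integrity constraint in Eq. (2) might look like the sketch below. The value similarity used here (exact match, or a relative-difference score for numbers) is a hypothetical choice made only for illustration; the source does not specify how the value similarity is computed, and the function names are assumptions.

```python
def sim_value(x, y) -> float:
    """Hypothetical value similarity: 1.0 for equal values, a
    relative-difference score for numbers, 0.0 otherwise."""
    if x == y:
        return 1.0
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        # x != y here, so at least one value is non-zero and scale > 0.
        scale = max(abs(x), abs(y))
        return max(0.0, 1.0 - abs(x - y) / scale)
    return 0.0


def integrity_violations(original, converted, delta: float) -> list:
    """Return every original value with no converted counterpart whose
    similarity reaches the threshold delta, i.e. the x that break Eq. (2)."""
    return [
        x for x in original
        if not any(sim_value(x, y) >= delta for y in converted)
    ]
```

An empty result means the constraint holds for the whole dataset; a non-empty result lists the values that would be routed to anomaly handling.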

3.2 Risk perception and feature extraction

The risk perception and feature extraction module works on the standardized data stream produced by the unified data model for the entire lifecycle of power equipment. The data preprocessing step removes outliers and imputes missing values. Missing values are imputed by interpolation using an algorithm based on similarity between equipment types and time series. Outliers are identified and corrected using the three-sigma rule.^27–28 Feature engineering builds a security feature set from access behavior, data sensitivity, and context. This set consists of 24 feature variables, organized as a time series using a 15-min sliding window.
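The two preprocessing steps can be illustrated with a short sketch. It flags outliers with the three-sigma rule and fills gaps by plain linear interpolation; the latter is a deliberately simpler substitute for the similarity-based imputation described above, and both function names are assumptions.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values: list) -> list:
    """Return indices of values more than three standard deviations
    from the mean (population standard deviation)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > 3 * sigma]


def interpolate_missing(values: list) -> list:
    """Fill None gaps linearly between the nearest observed neighbours;
    leading and trailing gaps copy the nearest observed value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no observed values to interpolate from")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out
```

Note that the three-sigma rule needs a reasonably long series: with very few samples no point can lie three population standard deviations from the mean, so nothing is flagged.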

The feature vector is fed into a hybrid architecture combining an LSTM (Long Short-Term Memory) and a GCN (Graph Convolutional Network). LSTM processes the temporal feature sequence.^29–30 Its unit structure is mathematically expressed as:

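\begin{matrix} i_{t} = σ (W_{xi} x_{t} + W_{hi} h_{t - 1} + b_{i}) \\ f_{t} = σ (W_{xf} x_{t} + W_{hf} h_{t - 1} + b_{f}) \\ o_{t} = σ (W_{xo} x_{t} + W_{ho} h_{t - 1} + b_{o}) \\ C_{t} = f_{t} ⊙ C_{t - 1} + i_{t} ⊙ tanh(W_{xC} x_{t} + W_{hC} h_{t - 1} + b_{C}) \\ h_{t} = o_{t} ⊙ tanh(C_{t}) \end{matrix}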

(3)

$x_{t}$ is the input feature vector at time $t$; $h_{t}$ and $C_{t}$ are the hidden state and cell state, respectively; $i_{t}$, $f_{t}$, and $o_{t}$ are the activation vectors for the input, forget, and output gates, respectively; $W$ and $b$ are learnable parameters; $σ$ is the sigmoid activation function; and $⊙$ represents element-wise multiplication.
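To make the gate arithmetic of Eq. (3) concrete, the sketch below runs one time step of a single-unit LSTM with scalar weights. It assumes the forget and output gates take the same sigmoid form as the input gate (the standard LSTM formulation); the parameter layout and the name `lstm_step` are illustrative only, and a real model would use vector states and a deep learning library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t: float, h_prev: float, c_prev: float, params: dict):
    """One step of a single-unit LSTM.

    params maps a gate name ("i", "f", "o", "c") to a tuple
    (w_x, w_h, b): input weight, recurrent weight, and bias.
    """
    def pre_activation(gate: str) -> float:
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    i_t = sigmoid(pre_activation("i"))   # input gate
    f_t = sigmoid(pre_activation("f"))   # forget gate
    o_t = sigmoid(pre_activation("o"))   # output gate
    c_t = f_t * c_prev + i_t * math.tanh(pre_activation("c"))  # cell state
    h_t = o_t * math.tanh(c_t)           # hidden state
    return h_t, c_t
```

With all parameters set to zero every gate evaluates to 0.5 and the candidate term vanishes, so the cell state is simply halved at each step; this makes a convenient sanity check.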

The LSTM output is passed to the GCN layer to construct the power data access graph. The adjacency matrix $A$ is defined as:

A_{ij} = \frac{1}{1 + exp (- α \cdot R_{ij})}

(4)

$A_{ij}$ represents the edge weight between nodes $i$ and $j$, $R_{ij}$ is the historical interaction risk score, and $α$ is the sensitivity parameter. The GCN layer performs the graph convolution operation:

H^{(l + 1)} = σ^{'} ({\tilde{D}}^{- \frac{1}{2}} \tilde{A} {\tilde{D}}^{- \frac{1}{2}} H^{(l)} W^{(l)})

(5)

$\tilde{A} = A + I_{N}$ is the adjacency matrix with self-loops, where $I_{N}$ is the identity matrix; $\tilde{D}$ is the degree matrix of $\tilde{A}$; $H^{(l)}$ is the node feature matrix of layer $l$; $W^{(l)}$ is the trainable weight matrix; and $σ^{'}$ is the activation function. The output of the GCN is concatenated with the temporal features captured by the LSTM to create a unified feature representation that is passed to a fully connected network to predict risk. The model is trained using a weighted cross-entropy loss and optimized with the Adam algorithm. The initial learning rate is set to 0.001, the batch size to 64, and the gradient clipping threshold to 5.0 to stabilize training.
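Eqs. (4) and (5) can be traced on a small graph with the pure-Python sketch below. It builds the sigmoid edge weights from a risk-score matrix, adds self-loops, applies the symmetric degree normalization, and propagates features through one layer. ReLU is assumed for the activation, which the text does not specify, and all function names are illustrative.

```python
import math


def risk_adjacency(R: list, alpha: float = 1.0) -> list:
    """Eq. (4): edge weight A_ij = 1 / (1 + exp(-alpha * R_ij))."""
    n = len(R)
    return [[1.0 / (1.0 + math.exp(-alpha * R[i][j])) for j in range(n)]
            for i in range(n)]


def normalized_adjacency(A: list) -> list:
    """D~^(-1/2) (A + I) D~^(-1/2), with D~ the degree matrix of A + I."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(X: list, Y: list) -> list:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def gcn_layer(A: list, H: list, W: list) -> list:
    """Eq. (5) for one layer, with ReLU assumed as the activation."""
    Z = matmul(matmul(normalized_adjacency(A), H), W)
    return [[max(0.0, z) for z in row] for row in Z]
```

Because the sigmoid in Eq. (4) maps a zero risk score to 0.5 rather than 0, the resulting graph is dense: every pair of nodes is connected, and the risk score only modulates how strongly.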

3.3 Dynamic access control policy generation

The dynamic access control policy generation module constructs an adaptive permission management protocol that is based on the outputs from the risk perception and feature extraction modules. The state space is expressed as:

S = S_{risk} \oplus S_{context} \oplus S_{sens}

(6)

$S_{risk} = [r_{1}, r_{2}, \dots, r_{n}]$ is the risk assessment vector, where $r_{i} \in [0,1]$ is the quantitative value of the $i$-th risk indicator; $S_{context} = [u_{1}, u_{2}, \dots, u_{m}]$ is the user context feature vector, which includes $m$ dimensions such as role, access time, and device type; $S_{sens} = [d_{1}, d_{2}, \dots, d_{k}]$ is the data sensitivity indicator vector, reflecting the data confidentiality level and integrity requirements; and $\oplus$ represents the vector concatenation operation.

The action space is defined as:

A^{'} = \{ a_{i} \mid R_{i} \subseteq \{1, 2, \dots, q\},\; O_{i} \subseteq \{read, write, execute\},\; T_{i} \in (0, T_{max}] \}

(7)

$R_{i}$ is the subset of the $q$ data fields allowed for access, $O_{i}$ is the set of executable operations, $T_{i}$ is the validity period of the permission, and $T_{max}$ is the system's preset maximum validity period.

The policy optimization process uses a weighted security utility function:

R(s,a) = β \cdot (1 - \frac{N_{vul} (s,a)}{N_{total} (s)}) - γ \cdot \frac{D_{restricted} (s,a)}{D_{total} (s)}

(8)

$N_{vul}(s, a)$ represents the number of security vulnerabilities that could be caused by executing action $a$ in state $s$, and $N_{total}(s)$ represents the total number of potential vulnerabilities in state $s$. $D_{restricted}(s, a)$ represents the amount of data restricted by executing action $a$, and $D_{total}(s)$ represents the total amount of data in state $s$. $β$ and $γ$ are weighting coefficients. This function quantifies the balance between the effectiveness of security protection and the efficiency of data sharing.
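The utility in Eq. (8) is easy to evaluate directly, as the following sketch shows. The default weights are hypothetical placeholders; the source does not state the values of the two weighting coefficients, and the function name is an assumption.

```python
def security_utility(n_vul: float, n_total: float,
                     d_restricted: float, d_total: float,
                     beta: float = 0.6, gamma: float = 0.4) -> float:
    """Eq. (8): reward security (few exposed vulnerabilities) and
    penalize restriction (large share of data withheld)."""
    if n_total <= 0 or d_total <= 0:
        raise ValueError("n_total and d_total must be positive")
    security_term = 1.0 - n_vul / n_total
    restriction_term = d_restricted / d_total
    return beta * security_term - gamma * restriction_term
```

The two extremes bracket the range: an action that exposes nothing and restricts nothing scores `beta`, while one that exposes every vulnerability and restricts all data scores `-gamma`.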

Policy parameter updates follow the temporal difference learning rule $θ \leftarrow θ + η \nabla_{θ} Q(s, π_{θ}(s))$, where $θ$ denotes the policy network parameters, $η$ is the learning rate, $Q(s, a)$ is the action value function, and $π_{θ}(s)$ is the policy function. The system maintains an experience replay buffer that stores transition samples and triggers a policy update when the buffer reaches a threshold.

Figure 2 shows the risk-aware access control decision-making process, which has three stages: state establishment, policy decision making, and permission output. Running on a dual-buffer system, the overall response time is at the millisecond level, and a closed-loop feedback mechanism optimizes the policies.

Figure 2.

Dynamic access control decisions.

Table 2 presents the quantitative relationship between risk scores and permission parameters. For each risk zone it specifies the accessible data types, the permitted operations, the session time limit, and the desensitization level, which together allow risk-adaptive, fine-grained permission management.

Table 2.

Risk-Permission mapping rules.

Risk score range	Accessible data types	Operation permission level	Session time limit (minutes)	Data desensitization level
[0.0, 0.3)	All data	Read, Write, and Execute	30	0
[0.3, 0.6)	Non-core data	Read-only	15	1
[0.6, 0.8)	Public data	Read-only	5	2
[0.8, 1.0]	Aggregate data	Read-only	2	3

The risk-permission mapping mechanism dynamically adjusts access rights based on the real-time assessment results, balancing the strength of security protection against the efficiency of data sharing. To enable real-time risk assessment, the system employs a two-level caching mechanism: the primary cache holds the currently valid policies, and the backup cache pre-computes state responses.
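The rules in Table 2 amount to a lookup from a risk score to a permission record, sketched below. The thresholds and permission values are taken from Table 2; the `Permission` class, its field names, and `permission_for` are illustrative names, and the caching layers are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    data_scope: str
    operations: frozenset
    session_minutes: int
    desensitization_level: int


# (exclusive upper bound of the risk zone, permission), in ascending order.
_RULES = [
    (0.3, Permission("All data", frozenset({"read", "write", "execute"}), 30, 0)),
    (0.6, Permission("Non-core data", frozenset({"read"}), 15, 1)),
    (0.8, Permission("Public data", frozenset({"read"}), 5, 2)),
]
# Top zone [0.8, 1.0] includes its upper bound.
_HIGHEST = Permission("Aggregate data", frozenset({"read"}), 2, 3)


def permission_for(risk: float) -> Permission:
    """Return the Table 2 permission record for a risk score in [0, 1]."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk score must lie in [0, 1]")
    for upper, permission in _RULES:
        if risk < upper:
            return permission
    return _HIGHEST
```

For example, a score of exactly 0.3 falls in the second zone, so the session is read-only on non-core data with a 15-minute limit.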

3.4 Adaptive encryption and desensitization mechanism

The adaptive encryption and desensitization mechanism builds a dynamic security protection system on the risk score output by the risk perception module and aims to balance data protection strength against business availability. The mechanism takes the risk score as its core input and applies a nonlinear mapping function to determine the encryption strength and desensitization level.

A hierarchical strategy is used to adjust the encryption parameters. When the risk score is lower than the threshold $R_{low}$, AES (Advanced Encryption Standard) basic mode is enabled; when it is between $R_{low}$ and $R_{high}$, the system switches to AES-192 mode; when it exceeds $R_{high}$, AES-256 strong encryption mode is activated, and ECC (Elliptic Curve Cryptography) is added for double protection. The encryption strength function is defined as:

E_{strength} = E_{min} + (E_{max} - E_{min}) \times \frac{R}{R + R_{0}}

(9)

$E_{strength}$ represents the normalized encryption strength value; $E_{min}$ represents the minimum encryption strength baseline value; $E_{max}$ represents the maximum encryption strength ceiling value; and $R$ represents the real-time risk score. $R_{0}$ serves as a risk sensitivity adjustment parameter.
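The saturating mapping of Eq. (9) and the three-tier selection can be sketched together. The threshold and parameter defaults below are hypothetical, since the source does not give numeric values for the two thresholds or the sensitivity parameter, and the tier labels are plain strings rather than calls into a cryptographic library.

```python
def encryption_strength(risk: float, e_min: float = 0.0,
                        e_max: float = 1.0, r0: float = 0.5) -> float:
    """Eq. (9): strength rises from e_min and saturates towards e_max."""
    if risk < 0 or r0 <= 0:
        raise ValueError("risk must be non-negative and r0 positive")
    return e_min + (e_max - e_min) * risk / (risk + r0)


def cipher_tier(risk: float, r_low: float = 0.3, r_high: float = 0.8) -> str:
    """Select the encryption tier described in the text."""
    if risk < r_low:
        return "AES (basic mode)"
    if risk <= r_high:
        return "AES-192"
    return "AES-256 + ECC"
```

One property worth noting is that the strength reaches the midpoint of its range exactly when the risk score equals the sensitivity parameter, so a smaller value of that parameter makes the mechanism react more sharply to low risk scores.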

The process of data desensitization applies multidimensional adaptive control that includes, but is not limited to, interval generalization for numerical data, hierarchical aggregation for categorical data, and key information masking for text data. Desensitization levels are computed by using a power law function:

D_{level} = D_{max} \times {R_{1}}^{\partial}

(10)

$D_{level}$ represents the discrete level of redaction, $D_{max}$ is the preset maximum redaction level, $R_{1}$ is the current risk score, and $\partial$ is the redaction sensitivity coefficient.

The security-availability trade-off model quantitatively evaluates redaction effectiveness using data retention rate:

R_{data} = 1 - \frac{{D_{level}}^{'}}{{D_{max}}^{'}} \times ρ

(11)

$R_{data}$ represents the data retention rate after desensitization, ${D_{level}}^{'}$ represents the current desensitization level, ${D_{max}}^{'}$ represents the maximum desensitization level, and $ρ$ represents the field sensitivity weighting coefficient.
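Eqs. (10) and (11) chain naturally: the risk score sets a desensitization level, which in turn determines how much of a field's information is retained. The sketch below keeps the level continuous because the source does not state how it is rounded to a discrete level; the default values and function names are assumptions.

```python
def desensitization_level(risk: float, d_max: float = 3.0,
                          exponent: float = 1.0) -> float:
    """Eq. (10): power-law mapping from risk score to level (unrounded)."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk score must lie in [0, 1]")
    return d_max * risk ** exponent


def retention_rate(d_level: float, d_max: float, rho: float) -> float:
    """Eq. (11): share of information retained after desensitization,
    where rho is the field sensitivity weight."""
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    return 1.0 - (d_level / d_max) * rho
```

An exponent above 1 keeps the level low for small risk scores and raises it steeply near the top of the range, whereas an exponent below 1 desensitizes aggressively even at modest risk.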

3.5 Abnormal behavior monitoring and feedback optimization

To create a closed-loop security system for the entire power data-sharing process, an abnormal behavior monitoring and feedback optimization mechanism is developed. A Gaussian Mixture Model (GMM) establishes the behavioral baseline over a sliding time window of 24 h, which corresponds to the business cycle. The behavioral feature vector consists of indicators in 12 dimensions. Model parameters are estimated iteratively with the EM (Expectation Maximization) algorithm. Anomaly detection uses a multidimensional composite score:

A (t) = ω_{1} B(t) + ω_{2} C(t) + ω_{3} E_{d} (t)

(12)

$A(t)$ is the comprehensive anomaly score at time $t$, $B(t)$ is the behavioral pattern deviation calculated using the Mahalanobis distance, and $C(t)$ is the contextual consistency, reflecting the degree of match between access behavior and user role, time, and device type. $E_{d}(t)$ is the encryption desensitization anomaly index, which monitors the mismatch between the strength of data protection and the risk level. The normalized weight coefficients $ω_{1}$, $ω_{2}$, $ω_{3}$ satisfy $ω_{1} + ω_{2} + ω_{3} = 1$. The anomaly threshold $τ$ is determined dynamically by maximizing the F1 score.
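The composite score of Eq. (12) and the maximum-F1 threshold selection can be sketched as follows. The default weights are hypothetical, the three component scores are assumed to be precomputed and oriented so that larger means more anomalous, and the threshold search simply tries each observed score as a candidate; none of these details are specified in the source.

```python
def anomaly_score(b: float, c: float, e_d: float,
                  weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Eq. (12): weighted sum of the three component scores."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * b + w2 * c + w3 * e_d


def max_f1_threshold(scores: list, labels: list):
    """Return (tau, f1) for the candidate threshold with the highest F1,
    predicting an anomaly whenever score > tau."""
    best_tau, best_f1 = None, -1.0
    for tau in sorted(set(scores)):
        pred = [s > tau for s in scores]
        tp = sum(1 for p, y in zip(pred, labels) if p and y)
        fp = sum(1 for p, y in zip(pred, labels) if p and not y)
        fn = sum(1 for p, y in zip(pred, labels) if not p and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_tau, best_f1 = tau, f1
    return best_tau, best_f1
```

Re-running the threshold search on each new labelled window is one simple way to make the threshold track the data, at a cost that is quadratic in the window size for this naive version.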

The feedback optimization process implements a balancing mechanism between parameter updates and system stability:

Δ θ = λ [\nabla R (θ) \cdot I(A(t) > τ) + μ \nabla L(θ)]

(13)

$Δ θ$ is the adjustment amount of the model parameter $θ$ , $λ$ is the feedback learning rate, $\nabla R (θ)$ is the risk assessment model gradient, $I(\cdot)$ is the indicator function, $μ$ is the regular learning rate, and $\nabla L(θ)$ is the loss function gradient. The system introduces parameter update stability constraints:

| θ (t) - θ (t - 1) | \leq ξ \cdot max(1, | θ (t - 1) |)

(14)

$θ(t)$ represents the model parameter at time $t$, and $ξ$ represents the maximum relative change rate threshold. An experience replay buffer dedicated to abnormal events stores 10,000 samples and uses prioritized experience replay to encourage learning of important abnormal behaviors. The system performs full fine-tuning of the risk assessment model and the access control policy network every 24 h; between these runs it applies streaming small-batch parameter updates in real time, keeping the latency between detecting an anomaly and optimizing the policy under 500 ms on average.
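The stability constraint of Eq. (14) can be enforced by clipping each proposed update, as in this sketch. It treats the constraint element-wise on scalar parameters, which is an assumption because the source does not say which norm is intended; the default threshold and the function name are also illustrative.

```python
def clip_update(theta_prev: float, theta_new: float, xi: float = 0.1) -> float:
    """Limit the change of one parameter so that
    |theta(t) - theta(t-1)| <= xi * max(1, |theta(t-1)|)."""
    limit = xi * max(1.0, abs(theta_prev))
    delta = theta_new - theta_prev
    delta = max(-limit, min(limit, delta))
    return theta_prev + delta
```

The `max(1, ...)` term gives small parameters an absolute step budget and large parameters a relative one, so a parameter near zero can still move while a large one cannot jump by more than the given fraction of its magnitude.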

4 Experimental evaluation setup

4.1 Experimental environment and dataset construction

The experimental platform is built on a high-performance computing cluster composed of multi-core CPUs (Central Processing Units), GPUs (Graphics Processing Units) and distributed storage connected using a high-speed network.

The experimental data are obtained from equipment operation, inspection, and maintenance records from a provincial power grid company from 2019 to 2022. Operational data come from SCADA systems, inspection data is obtained from status-monitoring terminals, and maintenance data originates from an enterprise resource management system. The raw data types are heterogeneous, ranging from time series data to structured reports and text records. In the preprocessing phase, all the data were consolidated and coded in the same format. Missing values were filled in using an interpolation algorithm, and outlier values were filtered out using a statistical threshold method to establish consistency and validity of data.

Table 3 presents summary statistics on the dataset characteristics including the number of data points or records, the number of fields, mean and variance for the numeric fields.

Table 3.
Statistical characteristics of the power equipment dataset.

Dataset	Number of data entries	Number of fields	Numeric field mean	Numeric field variance
Operational data	2,450,000	36	56.8	19.2
Inspection data	28,600	24	41.5	13.4
Maintenance data	14,200	18	32.1	10.2

The data in Table 3 illustrate the differences in scale and statistical properties among the operation, inspection, and maintenance data, and provide an aligned data basis for model training and evaluation in the subsequent experiments.

4.2 Evaluation metrics and benchmarking methodology

The evaluation framework of the current study includes an assessment of three areas: data integration efficiency, security protection capabilities, and operational performance of the system. Data integration efficiency is assessed based on the accuracy of field-mapping, semantic well-formedness, and cross-system query latency as indicators of how well unified modeling and semantic mapping can support data processing. Assessments of security protection capability consider rates of sensitive information leaks, abnormal access detection, and incident response latency. These measures describe the effectiveness of the system in identifying risks and responding to them. Assessment of performance largely focuses on CPU utilization, memory usage, throughput, and the latencies incurred to process individual data points, which reflect the resource consumption and scalability of the method in application scenarios with large amounts of data.

To keep the comparative experiments fair, several standard data-sharing solutions were included as benchmarks: fixed-rule access control, single-strength encryption, statistical feature anomaly detection, blockchain data sharing, and knowledge graph data sharing. All five methods represent standard implementations in power data-sharing applications and were run in the same environment and time frame as the proposed method so that the comparison is reproducible.

5 Results analysis

5.1 Data integration efficiency analysis

This section reports the data connectivity performance of the proposed full-process data security sharing model for each device type. Six typical power equipment types are considered: transformers, circuit breakers, cables, capacitors, disconnectors, and current transformers. The data penetration rate measures the completeness of cross-process data integration, while the format conversion success rate measures format compatibility between heterogeneous systems. The semantic consistency metric represents the consistency of data meaning across different business processes, and the cross-system query response time shows the real-time processing capability of the system. Figure 3 compares the performance of the ontology-standardized data model across the different power equipment types.

Figure 3.

Comparison of data transmission quality and system performance for power equipment throughout the entire process.

The disconnector had the highest data penetration rate, 96.3%, and a query response time of 85.6 ms (see Figure 3), because its data structure is relatively simple and has fewer business-related dimensions. Transformers are core equipment in the power grid; because of their complex data model and multidimensional business connections, the transformer query response time was 183.7 ms, while the penetration rate remained good at 94.2%. Cables reached a high semantic consistency of 91.8%, which reflects the standardized description of their physical properties. Current transformers reached only 86.3% semantic consistency and an 89.7% format conversion success rate, which indicates that the multi-source heterogeneous nature of measurement data increases the complexity of semantic mapping. Device type and data complexity thus directly influence data integration efficiency. The results support the adaptability of the ontology approach to heterogeneous power equipment data and indicate that the proposed model can integrate data effectively across device characteristics and configurations.

5.2 Security protection performance evaluation

To understand the dynamic protection capability of the full-process data security sharing model for power equipment under different threat environments, this study refined the risk zones of Table 2 into a five-level risk assessment framework. Five core indicators were examined: (1) abnormal access detection rate; (2) attack success rate; (3) sensitive information leakage rate; (4) false alarm rate; and (5) security incident response time. The five levels are classified as follows: Very Low (0.0–0.15); Low (0.15–0.3); Medium (0.3–0.6); High (0.6–0.8); and Very High (0.8–1.0). The abnormal access detection rate reflects how accurately the system detects illegal behavior; the attack success rate reflects the chance of an attacker breaching the defenses; the sensitive information leakage rate reflects the data protection capability; the false alarm rate reflects the accuracy of the security policy; and the security incident response time reflects how quickly the system handles security incidents. The variation of these indicators with risk level is shown in Figure 4.
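The five-level classification can be expressed as a small lookup. The boundaries are those listed above; treating each interval as closed on the left and open on the right, with 1.0 included in the top level, is an assumption that mirrors the bracket convention of Table 2, and the function name is illustrative.

```python
from bisect import bisect_right

# Lower bounds of the Low, Medium, High, and Very High levels.
_BOUNDS = [0.15, 0.3, 0.6, 0.8]
_LABELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def risk_level(score: float) -> str:
    """Map a risk score in [0, 1] to one of the five levels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must lie in [0, 1]")
    # bisect_right counts how many boundaries are <= score.
    return _LABELS[bisect_right(_BOUNDS, score)]
```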

Figure 4.

Protection performance evaluation of the power equipment full-process data security sharing model at different risk levels.

Figure 4 shows that as the risk level rises from very low to very high, the abnormal access detection rate rises from 96.2% to 98.9%, while the attack success rate decreases from 8.7% to 1.5%. This shows that the risk perception mechanism can dynamically increase protection strength to match the threat level. The sensitive information leakage rate fell to 1.8% at the very high risk level, confirming the effect of the adaptive encryption and desensitization mechanism. However, the increase in the false alarm rate from 3.8% to 6.1% illustrates the compromise commonly associated with tightened security policies. Security incident response time fell from 382 ms to 143 ms, indicating that the system prioritizes response time in high-risk situations. This dynamic balancing mechanism mitigates the tension between security and efficiency in power data sharing and provides reliable support for the design of grid data security services.
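To make the size of these shifts easier to compare, the endpoint values quoted above can be expressed as relative changes. The sketch below uses only the figures reported in the text; the helper `relative_change` and the dictionary `REPORTED` are illustrative names, and the leakage rate is omitted because only its final value (1.8%) is given.

```python
def relative_change(start: float, end: float) -> float:
    """Return the change from start to end as a fraction of start."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start


# Endpoint values reported for Figure 4 (very low risk, very high risk).
REPORTED = {
    "abnormal access detection rate (%)": (96.2, 98.9),
    "attack success rate (%)": (8.7, 1.5),
    "false alarm rate (%)": (3.8, 6.1),
    "security incident response time (ms)": (382.0, 143.0),
}

for name, (low, high) in REPORTED.items():
    change = relative_change(low, high)
    print(f"{name}: {low} -> {high} ({change:+.1%})")
```

On these numbers, the attack success rate falls by roughly 83% and the response time by roughly 63% in relative terms, while the false alarm rate grows by roughly 61%, which quantifies the compromise noted above.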

5.3 Dynamic policy adaptability analysis

To assess the dynamic adaptability of the full-process data security sharing model for power equipment across risk scenarios, the study investigated four indicators: risk prediction accuracy, policy adjustment frequency, policy adaptation delay, and scenario switching response time. Risk prediction accuracy indicates the system's ability to anticipate likely threat types; policy adjustment frequency indicates how responsively the access control policy follows changes in data provisioning; policy adaptation delay indicates how quickly policy parameter changes are generated and applied; and scenario switching response time describes the system's responsiveness to changes in business scenarios. Using the five-level risk assessment framework as the horizontal axis, Figure 5 shows how these indicators vary across risk levels.

Figure 5.

Dynamic policy adaptability evaluation of the power equipment full-process data security sharing model.

In Figure 5, the risk prediction accuracy rises steadily from 88.7% to 98.4% as the risk level escalates. The frequency of policy adjustment increases from 2.9 to 24.1 times per hour, revealing that the system intensifies its risk perception and policy updating in high-risk contexts. At the same time, the policy adaptation latency decreases from 291.5 ms to 102.6 ms and the scenario-switching response time decreases from 325.8 ms to 118.3 ms, which shows that the system prioritizes response time in high-risk contexts. This nonlinearity arises from the risk perception module's hybrid architecture, which combines deep neural networks with graph convolutional networks to support richer feature extraction. Moreover, the reinforcement-learning-based access-control policy network dynamically updates its parameter space according to the risk assessment results, so that it converges to the optimal policy faster as the risk level increases. Through real-time monitoring and parameter tuning, the closed-loop feedback mechanism balances the inherent conflict between protection strength and the efficiency of data sharing. This offers a practical engineering reference for designing dynamically adaptive policies for secure data sharing across the entire life cycle of power equipment.

5.4 System resource expenditure evaluation

The analysis of system resource consumption focuses on the computing efficiency and resource usage of the secure data-sharing model across the power equipment lifecycle under different risk conditions. Computing resource requirements are measured by CPU and memory utilization, the efficiency of data flows by network bandwidth utilization and latency, and economic viability by system throughput and the cost of processing individual records. The five risk levels are the key variable in this analysis and reveal the trade-off between security strength and system resource consumption (Figure 6).

Figure 6.

Resource overhead variation with risk level in the power equipment full-process data security sharing model.

In Figure 6, as the risk level increases, CPU utilization rises from 14.3% to 71.2%, memory utilization rises from 17.8% to 49.3%, and the processing cost per 1000 records rises from 0.072 yuan to 0.518 yuan. The rise in CPU utilization is associated with the strong encryption mode, in which AES-256 is supplemented with ECC to provide dual protection. Network bandwidth utilization first falls from 12.4 MB/s to 9.3 MB/s and then rebounds to 13.5 MB/s. This pattern reflects both the efficiency of data compression at the low and medium risk levels and the larger data size produced by the ECC signing protocol at the very high risk level. In addition, data transmission latency rises from 43 ms to 254 ms and system throughput declines from 4780 messages/s to 1260 messages/s, demonstrating a nonlinear trade-off between security strength and processing performance. These results suggest that the model can guarantee a high level of data security while keeping resource overhead within an acceptable range.
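The trade-off can be made concrete by converting the reported endpoints into ratios. The following sketch uses only the values quoted for Figure 6; the variable and function names are illustrative, and the per-message time is simply the inverse of the reported throughput, which is an assumption about how to read that figure rather than a measurement from the paper.

```python
# Endpoint values reported for Figure 6 (very low risk, very high risk).
throughput_msgs_per_s = (4780.0, 1260.0)
cost_yuan_per_1000_records = (0.072, 0.518)
latency_ms = (43.0, 254.0)


def ratio(pair: tuple[float, float]) -> float:
    """Return the high-risk value divided by the low-risk value."""
    low, high = pair
    return high / low


def mean_time_per_message_ms(msgs_per_s: float) -> float:
    """Inverse of throughput, expressed in milliseconds per message."""
    return 1000.0 / msgs_per_s


print(f"cost multiple:       {ratio(cost_yuan_per_1000_records):.2f}x")
print(f"latency multiple:    {ratio(latency_ms):.2f}x")
print(f"throughput retained: {ratio(throughput_msgs_per_s):.1%}")
for label, value in zip(("very low", "very high"), throughput_msgs_per_s):
    print(f"{label} risk: {mean_time_per_message_ms(value):.3f} ms per message")
```

Read this way, moving from the very low to the very high risk level multiplies the per-record cost by about 7.2 and the latency by about 5.9, while roughly a quarter of the original throughput is retained.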

5.5 Method comparison and advantage analysis

Secure data sharing across the entire process of power equipment requires a trade-off between functional completeness and implementation cost. This study conducted a multidimensional assessment of six common methods: the machine learning method proposed in this paper, static rule-based access control, single-strength encryption, statistical feature anomaly detection, blockchain (distributed ledger) data sharing, and knowledge graph sharing. The functional completeness dimension covers full lifecycle coverage, which assesses the ability to combine data from the equipment operation, testing, and maintenance stages; dynamic policy adaptability, which assesses how quickly the method responds to changes in risk; and multi-device compatibility, which assesses the ability to adapt to data shared across six categories of devices (e.g., transformers, circuit breakers). The implementation cost dimension includes (1) deployment complexity, (2) cost of routine maintenance, and (3) difficulty of system upgrades, which respectively reflect the difficulty of integrating systems, the operational resource overhead, and the flexibility of upgrading technologies. Figure 7 depicts the functional completeness and implementation cost of the various methods.

Figure 7.

Comparison of functional completeness and implementation cost of the full-process data security sharing method for power equipment.

As shown in Figure 7, the proposed approach performs strongly on functional completeness, scoring 92.5 for full lifecycle coverage, 95.3 for dynamic policy adaptability, and 93.7 for multi-device compatibility. This results from the risk-aware mechanism, which integrates an LSTM and a GCN to capture the temporal and correlation patterns of power data. Knowledge graph sharing scores 86.4 for multi-device compatibility, which is not far below the proposed method, because ontological modeling is well suited to multi-source heterogeneous data. Blockchain data sharing performed worst on implementation cost, with scores of 84.3 for deployment complexity, 78.2 for maintenance cost, and 82.6 for upgrade difficulty, which shows how difficult it is to adapt its distributed architecture to the power system environment. Static rule-based access control achieves the lowest implementation cost but lacks functional completeness, scoring only 68.3 for full lifecycle coverage, and is therefore a poor fit for secure data sharing in the power industry. Overall, the proposed approach offers the best balance between functional completeness and implementation cost among the compared methods and provides a practical way to share data securely throughout the lifecycle of power equipment.
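As a rough aid to reading these scores, the values quoted in the text can be aggregated per dimension. The sketch below is illustrative only: it uses just the scores stated above (the remaining values in Figure 7 are not reproduced), and the equal weighting of sub-indicators is an assumption, since the paper does not specify how the sub-scores are combined.

```python
from statistics import mean

# Only the scores quoted in the text; other values from Figure 7 are omitted.
proposed_functional = {
    "full lifecycle coverage": 92.5,
    "dynamic policy adaptability": 95.3,
    "multi-device compatibility": 93.7,
}
blockchain_cost = {
    "deployment complexity": 84.3,
    "maintenance cost": 78.2,
    "upgrade difficulty": 82.6,
}
knowledge_graph_multi_device = 86.4


def unweighted_mean(scores: dict[str, float]) -> float:
    """Average the sub-indicator scores with equal weights (an assumption)."""
    return mean(scores.values())


gap = proposed_functional["multi-device compatibility"] - knowledge_graph_multi_device

print(f"proposed method, mean functional score: {unweighted_mean(proposed_functional):.1f}")
print(f"blockchain sharing, mean cost score:    {unweighted_mean(blockchain_cost):.1f}")
print(f"multi-device gap to knowledge graph:    {gap:.1f}")
```

Under equal weighting, the proposed method averages about 93.8 on functional completeness, blockchain sharing averages about 81.7 on implementation cost, and the multi-device gap between the proposed method and knowledge graph sharing is about 7.3 points.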

6 Conclusion

This paper constructs a machine-learning-based secure data sharing model for the entire process of power equipment. The model uses a domain ontology to unify semantics and standardize the format of multi-source heterogeneous data across operation, inspection, and maintenance. It employs a hybrid LSTM and GCN architecture for risk feature extraction and prediction. A risk-driven dynamic access control mechanism is then established, along with encryption and desensitization strategies that adjust to risk scores, forming a closed-loop optimization system of “risk assessment – strategy execution – effect feedback.” The results show that the model improves data connectivity and security protection for power equipment, achieving a 96.3% data connectivity rate for disconnector data and reducing the sensitive information leakage rate to 1.8% under extremely high-risk conditions. The model keeps data transformations consistent through semantic mapping backed by bidirectional traceability, while risk-permission mapping supports trust management by providing an integrated view. Adaptive encryption and desensitization manage the trade-off between security and availability. Continuous management of the user-permission model and anomaly monitoring, informed by historical policy feedback, contribute to stable operation. The research outlines technical measures for balancing security and efficiency in the sharing of power system operational data and provides technical safeguards for the life-cycle management and intelligent transformation of power equipment.

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Hasan, Alkhalifah, Islam, et al. Blockchain technology on smart grid, energy trading, and big data: security issues, challenges, and recommendations. Wirel Commun Mob Comput 2022; 2022: 9065768–9065793.
2. Chehri, Fofana, Yang. Security risk modeling in smart grid critical infrastructures in the era of big data and artificial intelligence. Sustainability 2021; 13: 3196–3214.
3. Al-Jumaili AHA, Muniyandi, Hasan, et al. Big data analytics using cloud computing-based frameworks for power management systems: status, constraints, and future recommendations. Sensors 2023; 23: 2952–2988.
4. Wang, et al. A survey on the development status and application prospects of knowledge graph in smart grids. IET Gener Transm Distrib 2021; 15: 383–407.
5. Jeddoub, Nys, Hajji, et al. Data integration across urban digital twin lifecycle: a comprehensive review of current initiatives. Ann GIS 2025; 31: 367–386.
6. Liaw, Guo JGN, Ansari, et al. Quality assessment of real-world data repositories across the data life cycle: a literature review. J Am Med Inform Assoc 2021; 28: 1591–1599.
7. Hailu, Nyakoe, Muriithi. Techniques of power system static security assessment and improvement: a literature survey. Heliyon 2023; 9: e14524–e14541.
8. Zhang, Jia, Hou, et al. Data-driven security and stability rules for high-renewable-penetrated power system operation. Proc IEEE 2022; 111: 788–805.
9. Malik. Analyzing the trade-offs of data sharing in social networks and privacy concerns. Int J Electr Crime Investigation 2024; 8: 97–120.
10. Alzoubi. Investigating the synergy of blockchain and AI: enhancing security, efficiency, and transparency. J Cyber Security Technol 2025; 9: 227–255.
11. Manogaran, Alazab, Shakeel, et al. Blockchain-assisted secure data-sharing model for internet of things-based innovative industries. IEEE Trans Reliab 2021; 71: 348–358.
12. Zhang, Wang, et al. Data security sharing model based on privacy protection for blockchain-enabled industrial internet of things. Int J Intell Syst 2021; 36: 94–111.
13. Hong, Liang, Yang. Research progress, trends, and prospects of big data technology for new energy power and energy storage system. Energy Rev 2023; 2: 100036–100041.
14. Huo, Zeng, Wang, et al. A comprehensive survey on blockchain in industrial internet of things: motivations, research progresses, and future challenges. IEEE Commun Surv Tutorials 2022; 24: 88–122.
15. Tang, Xiang. Data transmission reliability analysis of wireless sensor networks for social network optimization. J Sensors 2022; 2022: 3842722–3842733.
16. Logeshwaran, Ramkumar, Kiruthiga, et al. The role of an integrated structured cabling system (ISCS) for reliable bandwidth optimization in a high-speed communication network. ICTACT J Commun Technol 2022; 13: 2635–2639.
17. Yao, Sun, Liu, et al. Simultaneous wireless power and data transfer: a comprehensive review. IEEE Trans Power Electron 2021; 37: 3650–3667.
18. Gupta, Johri, Srinivasan, et al. A systematic review on machine learning and deep learning models for electronic information security in mobile networks. Sensors 2022; 22: 2017–2050.
19. Asif, Abbas, Khan, et al. MapReduce-based intelligent model for intrusion detection using machine learning technique. J King Saud Univ-Compu Inform Sci 2022; 34: 9723–9731.
20. Shrivastava, Chaturvedi. A review on intrusion detection system for distributed network based on machine learning. J Integrated Sci Technol 2024; 12: 739–739.
21. Cui, Sun, Zhong, et al. Collaborative intrusion detection system for SDVN: a federated deep learning approach for fairness. IEEE Trans Parallel Distrib Syst 2023; 34: 2512–2528.
22. Lou, Feng, et al. Novel machine learning for big data analytics in intelligent support information management systems. ACM Trans Manag Inform Syst (TMIS) 2021; 13: 1–21.
23. Huang, Yin, et al. Multidimensional heterogeneous data clustering algorithm for power transmission and transformation equipment. J Intell Fuzzy Syst 2023; 44: 5871–5878.
24. Elahi, Afolaranmi, Martinez Lastra, et al. A comprehensive literature review of the applications of AI techniques through the lifecycle of industrial equipment. Discov Artif Intell 2023; 3: 43–120.
25. Lembo, Santarelli, Savo, et al. Graphol: a graphical language for ontology modeling equivalent to OWL 2. Future Internet 2022; 14: 78–106.
26. Palihakkara, Osorio-Sandoval, Tizani. A review of the application of ontologies and semantic web for building information modelling and digital twins based construction management. J Inform Technol Construction (ITcon) 2025; 30: 1208–1243.
27. Wang, Deng, et al. Influence of different data interpolation methods for sparse data on the construction accuracy of electric bus driving cycle. Electronics (Basel) 2023; 12: 1377–1399.
28. Zhang, Lim, Yang, et al. Using deep learning to interpolate the missing data in time series for credit risks along the supply chain. Ind Manag Data Syst 2023; 123: 1401–1417.
29. Mittal, Khurana Batra. Graph-ensemble fusion for enhanced IoT intrusion detection: leveraging GCN and deep learning. Cluster Comput 2024; 27: 10525–10552.
30. Yuan, Wan, et al. A novel encrypted traffic detection model based on detachable convolutional GCN-LSTM. Sci Rep 2025; 15: 27705–27722.


Table 3.
Statistical characteristics of the power equipment dataset.

| Dataset | Number of data entries | Number of fields | Numeric field mean | Numeric field variance |
| --- | --- | --- | --- | --- |
| Operational data | 2,450,000 | 36 | 56.8 | 19.2 |
| Detection data | 28,600 | 24 | 41.5 | 13.4 |
| Maintain data | 14,200 | 18 | 32.1 | 10.2 |