Knowledge fusion method of power grid model based on Seq2seq half pointer and half label method

Abstract

Existing methods rely on complex calculation processes, so the data fusion of the power grid model is inefficient. To improve the knowledge fusion effect of the power grid model, this paper studies a knowledge fusion method for the power grid model based on the Seq2seq half pointer and half label method. The TextRank algorithm is used to calculate the weight of each semantic node of the grid model and, combined with the topological potential method, the semantic information of the grid model is extracted according to the final weight value. The Seq2Seq half-pointer half-label model framework is then constructed. The data of the dispatching automation system (OMS) and the production management system (PMS) are used as input. The extracted candidate grid model semantics and the original grid model semantics are encoded by the Seq2Seq half-pointer half-label model, and the semantic data of the power grid model are fused and sent to the Seq2Seq encoder. After training is completed, effective information is extracted from the power grid model through the Seq2Seq model to complete the knowledge fusion of the power grid model. Experimental results show that, after knowledge fusion, this method eliminates the redundant part of the basic attributes of each data source in the substation grid model, and the description of each basic attribute is more standardized, unified and complete. Under different grid model data dimensions, the support of the proposed method is all above 98%. The model trained by the proposed method tends to be stable after 120 iterations, and the precision, recall and F1 on the test set are 0.98, 0.93 and 0.91, respectively. The method is also efficient in the knowledge fusion processing of the power grid model, with a data processing time of less than 160 s. The average integrity of the private data of the power grid model is 98.86%, indicating that the proposed method can better ensure the integrity of the data. Finally, compared with other methods under different data amounts, the mean square error obtained by the proposed method is the smallest, indicating that the proposed method effectively improves the fusion accuracy.

Keywords

Grid model; knowledge fusion method; half label method; LSTM neural network; Seq2seq half pointer; TPC TextRank algorithm

1 Introduction

In modern society, as an important infrastructure to support social and economic development, the stable and efficient operation of power grid is very important [1]. In order to better understand and manage power grids, deep research and analysis of power grid models are needed [2]. However, traditional power grid model research methods often fail to make full use of the semantic information of the model, which limits the understanding and application of the power grid model to a certain extent [3].

In recent years, with the rapid development of deep learning and natural language processing technology, text-based data processing and model construction methods have begun to attract more and more attention in the field of power grid [4]. However, existing research works have obvious shortcomings in knowledge fusion of power grid models. On the one hand, these methods often ignore the topological structure information of the power grid model, which makes the model unable to fully understand the operation mechanism of the power grid [5]. On the other hand, the existing methods often lack effective means to extract and utilize the semantic information of the power grid model, which limits the performance and application range of the model [6].

To solve these problems, many scholars have carried out research and achieved considerable results. WANG Xin et al. proposed a knowledge fusion algorithm for distribution network big data based on a self-organizing map neural network. Through data clustering between homogeneous data, ontology mapping between heterogeneous data and self-organizing iteration between multi-dimensional data, the associated knowledge is effectively fused and certain real-time requirements are guaranteed. This method promotes the fusion of multi-dimensional heterogeneous data, but the fusion accuracy still needs to be improved [7]. LIU Dong et al. proposed a knowledge graph construction method for intelligent retrieval of power grid regulation information. The semantic relationships in scheduling operation are identified based on deep learning, the rule transformation method is used to extract the power grid model information to establish the knowledge graph of regulation information, and an application scheme for intelligent retrieval of regulation information is proposed. This method has high recognition accuracy, but it is difficult to apply in different scenarios and therefore has certain limitations [8]. LI Xiaolu et al. proposed a knowledge graph verification method for the power grid model based on shape constraint language. CIM is used to construct the concept graph and entity graph of the power grid model, and CIM pattern consistency shapes as well as cross-class and cross-attribute consistency shapes are set for power grid model verification. This method improves the flexibility of the model, but it places higher requirements on the dynamic evolution of the calculation process, involves a large amount of calculation and is prone to errors [9].
In order to solve the above problems, this paper proposes a knowledge fusion method of power grid model based on Seq2seq semi-pointer and semi-label method. This method aims to improve the performance and application range of the power grid model by combining the topological structure information and text semantic information of the power grid model.

Specifically, firstly, the semantic information extraction method of power grid model based on topological TextRank algorithm is used to extract key semantic information from the text description of power grid model. Then, with the help of the knowledge fusion strategy of half pointer and half annotation, the extracted semantic information is fused with the topological structure information of the power grid model to obtain a more comprehensive and accurate representation of the power grid model. The method proposed in this paper realizes the effective fusion of semantic information and topology information of power grid model, and can effectively improve the performance of power grid model. It brings new research ideas and method improvements to the field of power grid modeling and knowledge fusion, and helps to promote the further development of this field.

2 Seq2seq power grid model knowledge fusion method

The TextRank algorithm is used to calculate the weight of each semantic node of the grid model [10], and the semantic information of the grid model is extracted according to the final weight value in combination with the topological potential. The extracted information provides additional input for the subsequent generative grid model. Then, a half pointer and half label model framework based on Seq2Seq (Sequence to Sequence) is constructed, and the extracted candidate grid model semantics and the original grid model semantics are encoded separately. Next, the grid model semantics are fused and sent to the Seq2Seq encoder. After training, the model is able to extract effective information from the grid model and complete the grid model knowledge fusion.

2.1 Semantic information extraction of power grid model based on topological TextRank algorithm

The implementation process of TextRank algorithm is as follows:

Let {U₁, U₂, ⋯, U_n} be the set of semantic nodes of the power grid model, which is used to build the network diagram of the power grid model. G = (U, F, W) represents this network diagram, where U is the node set, F is the edge set and W is the weight set. The TextRank algorithm is used to solve the weight of each semantic node of the power grid model [11].

The probability transfer matrix between the semantic nodes of the power grid model is expressed as follows: ${QM}_{n \times n} = \begin{bmatrix} ϑ_{11} & ϑ_{12} & \cdots & ϑ_{1n} \\ ϑ_{21} & ϑ_{22} & \cdots & ϑ_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ϑ_{n1} & ϑ_{n2} & \cdots & ϑ_{nn} \end{bmatrix}$ (1)

Through the probability matrix between the semantic nodes of the power grid model and the network diagram of the power grid model, the weight of each semantic node of the power grid model can be calculated iteratively. The formula is as follows:

$WQ (U_{i}) = α \times \sum_{U_{j} \in IN (U_{i})} \frac{ϑ_{ij}}{\sum_{U_{k} \in OUT (U_{j})} ϑ_{jk}} WQ (U_{j}) + (1 - α)$ (2) where: WQ (U_i) is the weight value of node U_i; α ∈ (0, 1) is the damping coefficient, which indicates the probability of each grid model semantic node jumping to any other grid model semantic node in the grid model network diagram. It is generally set to 0.85;

IN (U_i) represents the set of all semantic nodes of the power grid model that point to U_i; OUT (U_j) represents the set of semantic nodes of the power grid model pointed to by U_j; ϑ_ij, an element of the probability transfer matrix, represents the similarity between semantic node U_i and semantic node U_j of the power grid model; WQ (U_j) represents the weight value of U_j after the last iteration [12].

The initial weight of all semantic nodes of the power grid model is generally set to 1 [13], that is, A₀ = (1, 1, ⋯ , 1) ^T. After multiple iterations, the weight value of each semantic node of the power grid model tends to be stable and converges. The iteration formula is as follows: $A_{i} = A_{i - 1} \times {QM}_{n \times n}$ (3)

The above formula is used to calculate the result of each iteration. When the difference between two successive iteration results is close to 0, the calculation stops and the weight vector containing the weight of each semantic node of the grid model is obtained. The nodes are then sorted by weight value and the content is selected accordingly. The main steps of semantic information extraction of the power grid model are as follows:

Step 1: Preprocessing. Complete word segmentation and extraction, part-of-speech tagging and other operations on the semantic information of the power grid model [14]. Let Z = { Z₁, Z₂, ⋯ , Z_n } be the semantic information set of the power grid model.

Step 2: Calculate the similarity between the semantic information of the power grid model. The similarity calculation is based on the content overlap rate, that is, the number of identical terms contained in two pieces of semantic information of the power grid model [15]. The formula is as follows: $Similarity (Z_{i}, Z_{j}) = \frac{| \{ ϑ_{k} \mid ϑ_{k} \in Z_{i} \wedge ϑ_{k} \in Z_{j} \} |}{log (| Z_{i} |) + log (| Z_{j} |)}$ (4)

If there is a semantic correlation between two nodes, they are connected by an edge. The weight of the edge is: $ϑ_{ij} = Similarity (Z_{i}, Z_{j})$ (5)

Step 3: Use formula (2) to iteratively calculate the semantic information weight of the power grid model until convergence, and get the semantic information score of each power grid model.

Step 4: Sort and select. The scores in the third step are used for ranking, and the most important semantic information of the power grid model is selected as the candidate semantic information of the power grid model [16].

Step 5: Compose the semantic information of the power grid model. Select the final content from the fourth step according to the relevant requirements.
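The five steps above can be illustrated with a minimal Python sketch. It assumes each piece of semantic information has already been segmented into a non-empty list of terms (Step 1); the function names `similarity` and `textrank`, the tolerance `tol` and the iteration cap `max_iter` are hypothetical choices for illustration, not values given in this paper.

```python
import math


def similarity(z_i, z_j):
    """Formula (4): number of shared terms over log|Z_i| + log|Z_j|."""
    denom = math.log(len(z_i)) + math.log(len(z_j))
    if denom == 0:  # both items contain a single term
        return 0.0
    return len(set(z_i) & set(z_j)) / denom


def textrank(items, alpha=0.85, tol=1e-6, max_iter=200):
    """Iterate formula (2) until the scores stop changing."""
    n = len(items)
    # Formula (5): edge weight = similarity; self-loops are excluded.
    w = [[similarity(items[i], items[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n  # the initial weight of every node is 1
    for _ in range(max_iter):
        new = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new.append(alpha * rank + (1 - alpha))
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


items = [
    ["substation", "voltage", "level"],
    ["substation", "capacity", "voltage"],
    ["busbar", "wiring", "method"],
]
scores = textrank(items)
# Step 4: rank the items by score, highest first.
ranking = sorted(range(len(items)), key=lambda i: scores[i], reverse=True)
```

In this toy input the first two items share two terms and reinforce each other, while the third has no shared terms and keeps only the baseline score 1 − α.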

In its iterative process, the TextRank algorithm constructs a term graph from the co-occurrence of words of the grid model semantics within a window, and the weights of the edges are transferred uniformly, without considering the structure of the grid model semantic information. In this paper, the TPC TextRank (Topology Potential Combined TextRank) algorithm, which combines the topological potential with the TextRank algorithm, is proposed by comprehensively considering the local and global importance of words in the semantic information of the power grid model.

(1) Local importance. In the semantics of the power grid model, words are combined according to grammatical rules. If a sentence is abstracted as a network with words as its vertices, the vertices interact with each other through the syntactic structure [17]. The topological potential value represents the joint action exerted on a node by the other nodes within its influence range. With a modified calculation method, the influence between two individual nodes can be obtained and used as the transfer probability between words in the semantics of the power grid model. The calculation is as follows: $ϖ_{ij} = e^{- {(\frac{Γ_{ij}}{σ})}^{2}} \times b_{j}$ (6) where: b_j is the inherent attribute of the semantic node U_j of the power grid model, which is always 1; Γ_ij represents the shortest distance between two interconnected semantic nodes of the power grid model, measured in hops; σ is an influence factor, which controls the influence range of a semantic node of the power grid model. Its optimal value is usually determined according to the potential entropy, but since the TextRank algorithm determines the co-occurrence relationship from the window size ϒ, σ can be determined by the following formula: $σ = \max \{ σ \mid \lfloor 3 σ / \sqrt{2} \rfloor = ϒ \}$ (7)

Γ_ij is the shortest distance between two semantic nodes of the power grid model. Let Γ_ij = |Index (u_i) - Index (u_j) |. When |Index (u_i) - Index (u_j) | > ϒ, Γ_ij = 0; when |Index (u_i) - Index (u_j) | ≤ ϒ, Γ_ij = |Index (u_i) - Index (u_j) |. To improve the transfer probability matrix of the TextRank algorithm, the topological potential between words in the semantics of the power grid model is calculated and used as the weight of the connecting edge.

(2) Global importance. The algorithm adds semantic information such as word frequency and word distribution to eliminate the impact of high-frequency words in power grid models. The inherent attribute of a node consists of two parts: ψ (u_j) indicates how often the semantic word occurs in the power grid model; P (u_j) represents the distribution of the semantic word in the power grid model. It should be noted that the more widely a semantic word is distributed in the grid model, the more likely it is to be a key semantic word of the grid model [18]. ψ (u_j) is calculated by formula (8): $ψ (u_{j}) = \frac{\text{Number of occurrences of semantic word } u_{j} \text{ in the power grid model}}{\text{Total number of all semantic words}}$ (8)

λ_j is the inherent attribute of the semantic node of the power grid model, and its expression is as follows: $λ_{j} = ψ (u_{j}) \times P (u_{j})$ (9)

If formula (9) is substituted into formula (6) in place of b_j, the calculation formula of the transfer probability is: $ϖ_{ij} = e^{- {(\frac{Γ_{ij}}{σ})}^{2}} \times [ψ (u_{j}) \times P (u_{j})]$ (10)
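The transfer probability can be sketched in Python as follows. Several points are assumptions of this sketch rather than statements of the paper: the helper names `sigma_for_window` and `transfer_probabilities` are hypothetical; P (u_j) is passed in by the caller through the hypothetical dictionary `spread`; σ is taken just below the upper bound (ϒ + 1)√2 / 3, because the floor condition of formula (7) holds for every σ below that bound; for repeated words the closest pair of occurrences defines Γ_ij; and word pairs farther apart than the window are given no edge.

```python
import math
from collections import Counter


def sigma_for_window(window, eps=1e-9):
    """Formula (7): floor(3*sigma/sqrt(2)) == window holds for
    sigma < (window + 1) * sqrt(2) / 3, so take a value just below
    that bound (eps is an assumption of this sketch)."""
    return (window + 1) * math.sqrt(2) / 3 - eps


def transfer_probabilities(tokens, window, spread):
    """Formula (10) for every ordered pair of distinct words.

    tokens: word sequence; spread: hypothetical dict word -> P(u_j).
    Returns {(u_i, u_j): transfer weight}.
    """
    sigma = sigma_for_window(window)
    counts = Counter(tokens)
    total = len(tokens)
    positions = {}
    for idx, word in enumerate(tokens):
        positions.setdefault(word, []).append(idx)
    weights = {}
    for u_i, pos_i in positions.items():
        for u_j, pos_j in positions.items():
            if u_i == u_j:
                continue
            # Gamma_ij: distance in token hops between the closest occurrences.
            gamma = min(abs(a - b) for a in pos_i for b in pos_j)
            if gamma > window:
                continue  # assumption: no edge outside the window
            psi = counts[u_j] / total    # formula (8)
            lam = psi * spread[u_j]      # formula (9)
            weights[(u_i, u_j)] = math.exp(-(gamma / sigma) ** 2) * lam
    return weights


tokens = ["substation", "voltage", "level", "capacity"]
weights = transfer_probabilities(tokens, window=2,
                                 spread={t: 1.0 for t in tokens})
```

Adjacent words receive a larger weight than words two positions apart, and the first and last word of this toy sequence are not connected because their distance exceeds the window.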

The specific process of the TPC TextRank algorithm is shown in Fig. 1.

It can be seen from Fig. 1 that the specific process of the TPC TextRank algorithm is as follows:

Fig. 1

Specific process of the TPC TextRank algorithm.

Step 1: Input power grid model data;

Step 2: Preprocess the grid model data;

Step 3: Calculate the global importance of the semantic words of the power grid model according to formula (9);

Step 4: Set the sliding window size to ϒ. In the window, the semantic map of the power grid model is constructed through the co-occurrence relationship of the semantic words of the power grid model in the power grid model [19].

Step 5: Based on the global importance of the semantic words of the power grid model and the semantic map of the power grid model, use formula (10) to calculate the topological potential value of the semantic words of the power grid model, which serves as their transfer probability.

Step 6: Construct transfer probability matrix by combining the semantic map of power grid model and transfer probability.

Step 7: Iteratively calculate the weight of the semantic node of the power grid model according to formula (2) until convergence.

Step 8: The semantic words of the power grid model are sorted in descending order of importance and the keyword set of the power grid model is output.

2.2 Knowledge fusion of power grid model

After the semantic information of the power grid model has been extracted with the TPC TextRank algorithm, the power grid model knowledge fusion method based on the Seq2seq half pointer half label method is used to improve the information fusion capability for the power grid model. The structure diagram of the power grid model knowledge fusion model based on the Seq2seq half pointer half label method is shown in Fig. 2.

Fig. 2

Structure diagram of a knowledge fusion model for power grid models based on Seq2seq half pointer and half label method.

It can be seen from Fig. 2 that the knowledge fusion model of the power grid model based on the Seq2seq half pointer half label method uses a single-layer bidirectional LSTM (Bi-LSTM) as the encoder, which encodes the original semantics of the power grid model and the extracted semantics of the power grid model, and uses a unidirectional LSTM as the decoder [20]. At each step i of model training, the word embeddings of the original text of the power grid model and of the extracted candidate semantic keywords are $w_{i}^{s}$ and $w_{i}^{e}$ , respectively. They are sent to the encoder together and the corresponding hidden states are generated. At time t, the decoder receives the word embedding of step t - 1, which is taken from the previous semantic word of the power grid model during training or provided by the decoder itself during testing. The decoder then obtains the hidden state s_t and generates the semantic vocabulary distribution of the power grid model P (y_t).

Thus, the hidden state of the original semantic text of the power grid model ${\overleftrightarrow{g}}_{i}^{s}$ can be obtained. The calculation formulas are as follows: ${\overrightarrow{g}}_{i}^{s} = LSTM (w_{i}^{s}, {\overrightarrow{g}}_{i - 1}^{s})$ (11) ${\overleftarrow{g}}_{i}^{s} = LSTM (w_{i}^{s}, {\overleftarrow{g}}_{i + 1}^{s})$ (12) ${\overleftrightarrow{g}}_{i}^{s} = [{\overrightarrow{g}}_{i}^{s}; {\overleftarrow{g}}_{i}^{s}]$ (13)

At the same time, the hidden state of the extracted grid model semantics ${\overleftrightarrow{g}}_{i}^{e}$ can be obtained. The specific formulas are as follows: ${\overrightarrow{g}}_{i}^{e} = LSTM (w_{i}^{e}, {\overrightarrow{g}}_{i - 1}^{e})$ (14) ${\overleftarrow{g}}_{i}^{e} = LSTM (w_{i}^{e}, {\overleftarrow{g}}_{i + 1}^{e})$ (15) ${\overleftrightarrow{g}}_{i}^{e} = [{\overrightarrow{g}}_{i}^{e}; {\overleftarrow{g}}_{i}^{e}]$ (16)

Then, the output of the encoder hidden layer g_i is obtained by concatenating ${\overleftrightarrow{g}}_{i}^{s}$ and ${\overleftrightarrow{g}}_{i}^{e}$ , making full use of both the original power grid model and the extracted semantic information of the power grid model. The specific formula is as follows: $g_{i} = [{\overleftrightarrow{g}}_{i}^{s}; {\overleftrightarrow{g}}_{i}^{e}]$ (17)

Then, the attention mechanism [19] is introduced to calculate the correlation e_it between the decoder hidden state s_t and g_i. The calculation formula is as follows: $e_{it} = V^{T} tanh (κ_{g} g_{i} + κ_{s} s_{t} + b_{attn})$ (18)

Where: tanh() is the activation function; g_i and s_t represent the outputs of the encoder and decoder hidden layers, respectively; V, κ_g, κ_s and b_attn are parameters learned during model training. From the attention weights η_it, the context vector ɛ_t can be obtained. The formulas are as follows: $η_{it} = \frac{e_{it}}{\sum_{j = 1}^{n} e_{jt}}$ (19) $ɛ_{t} = \sum_{i = 1}^{n} η_{it} \times g_{i}$ (20) Where: n is the total number of semantic words of the power grid model; η_it is the attention distribution over the grid model semantic words input into the encoder, which can be regarded as a probability distribution over the original text of the grid model. It lets the decoder attend to important grid model words, which helps to extract more important grid model semantic information, obtain the text information representation of the grid model, and promote the generation of the grid model.
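One decoder step of this attention computation can be sketched in plain Python. The function names `matvec` and `attention_context` are hypothetical, and the parameters are passed as nested lists. As an assumption of this sketch, the scores are normalized with a softmax (exponentiated before normalizing), a common choice that keeps every weight positive, whereas formula (19) as printed normalizes the raw scores.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_context(enc_states, dec_state, k_g, k_s, b_attn, v):
    """Return (eta, context) for one decoder step."""
    ks = matvec(k_s, dec_state)
    scores = []
    for g in enc_states:
        kg = matvec(k_g, g)
        hidden = [math.tanh(a + c + d) for a, c, d in zip(kg, ks, b_attn)]
        scores.append(sum(x * y for x, y in zip(v, hidden)))  # formula (18)
    # Assumption: softmax normalization of the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    eta = [e / z for e in exps]
    dim = len(enc_states[0])
    # Formula (20): attention-weighted sum of the encoder states.
    context = [sum(eta[i] * enc_states[i][d] for i in range(len(enc_states)))
               for d in range(dim)]
    return eta, context


eye = [[1.0, 0.0], [0.0, 1.0]]
eta, context = attention_context(
    enc_states=[[1.0, 0.0], [0.0, 1.0]],
    dec_state=[0.0, 0.0],
    k_g=eye, k_s=eye, b_attn=[0.0, 0.0], v=[1.0, 0.0],
)
```

With these toy parameters the first encoder state scores higher than the second, so it receives the larger attention weight and dominates the context vector.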

In this model, a Seq2Seq model with an attention mechanism is constructed first, and the key semantic information of the power grid model is extracted by combining the topological potential with the TextRank algorithm. Then, in the encoding layer, the special symbol [SEP] is used to join the extracted candidate grid model semantics with the original grid model semantics for knowledge fusion. [SEP] is a special symbol that separates non-contiguous token sequences; here it distinguishes the original grid model semantics from the extracted grid model semantics. Next, encoders encode the original power grid model and the extracted semantics of the power grid model to obtain their respective hidden states, and the corresponding semantic vectors of the power grid model in the encoder hidden layer are concatenated to achieve information fusion. The fused semantic vectors of the power grid model are then sent into the model for training. After training, the model is able to learn that the content after the [SEP] mark is the extracted grid model text, and to extract the effective information that assists in completing the grid model data fusion task.

The encoder of the power grid model knowledge fusion model based on the Seq2seq half pointer half label method is a single-layer bidirectional LSTM. Long short-term memory (LSTM) is an improvement of the common recurrent neural network (RNN). By introducing a memory cell into each neuron of the hidden layer and using three gating units, namely the forget gate, input gate and output gate, to control the state of the memory cell, it solves the problem that an ordinary RNN cannot learn long-term dependencies in sequences because of vanishing gradients.

The memory cell memorizes the historical information of the sequence data together with the hidden state. The information in the memory cell is controlled by three gating units. The forget gate deletes information from the memory cell according to the hidden state of the previous time g_t-1 and the input of the current time x_t. The forget gate is calculated as follows: $f_{t} = δ (W_{f} [g_{t - 1}, x_{t}] + d_{f})$ (21) where: f_t is the output of the forget gate at time t; W_f is the weight of the forget gate; d_f is the bias of the forget gate; g_t-1 is the hidden state at the previous time; x_t is the input at the current time; δ is the Sigmoid activation function.

The input gate adds information to the memory cell according to the hidden state of the previous time g_t-1 and the input of the current time x_t. The calculation is shown in formula (22) and formula (23). $l_{t} = δ (W_{l} [g_{t - 1}, x_{t}] + d_{l})$ (22) ${\hat{R}}_{t} = tanh (W_{e} [g_{t - 1}, x_{t}] + d_{e})$ (23) where: l_t is the output of the input gate, which determines the information to be memorized; ${\hat{R}}_{t}$ is the candidate memory cell used for updating the memory cell; W_l and W_e are the weights of the input gate; d_l and d_e are the biases of the input gate.

After the calculation of the forget gate and the input gate is completed, the memory cell is updated with formula (24). $R_{t}^{'} = l_{t} \times {\hat{R}}_{t} + f_{t} \times R_{t - 1}$ (24) where: $R_{t}^{'}$ is the updated memory cell; R_t-1 is the information in the memory cell at the previous time.

The output gate determines the hidden state at the current time g_t according to the hidden state of the previous time g_t-1, the input of the current time x_t and the updated memory cell $R_{t}^{'}$ . The calculation is shown in formula (25) and formula (26). $V_{t} = δ (W_{V} [g_{t - 1}, x_{t}] + d_{V})$ (25) $g_{t} = V_{t} \times tanh (R_{t}^{'})$ (26) where: V_t is the output of the output gate; W_V is the weight of the output gate; d_V is the bias of the output gate.
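A single LSTM time step covering the forget gate, input gate, memory update and output gate can be sketched in plain Python as below. The function names `sigmoid`, `affine` and `lstm_step` and the layout of the `params` dictionary are hypothetical. As in the standard LSTM formulation, the sketch adds each bias inside the activation function.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    """Compute W * vec + bias for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, g_prev, r_prev, params):
    """One LSTM step in the spirit of formulas (21)-(26).

    params maps "f", "l", "e", "V" to (weight matrix, bias vector).
    """
    z = g_prev + x_t  # list concatenation [g_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]      # forget gate
    l_t = [sigmoid(a) for a in affine(*params["l"], z)]      # input gate
    r_hat = [math.tanh(a) for a in affine(*params["e"], z)]  # candidate cell
    # Formula (24): keep part of the old cell, add part of the candidate.
    r_t = [l * c + f * r for l, c, f, r in zip(l_t, r_hat, f_t, r_prev)]
    v_t = [sigmoid(a) for a in affine(*params["V"], z)]      # output gate
    g_t = [v * math.tanh(r) for v, r in zip(v_t, r_t)]       # hidden state
    return g_t, r_t


# Toy check with one hidden unit, one input feature and all-zero parameters:
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "l": zero, "e": zero, "V": zero}
g_t, r_t = lstm_step(x_t=[1.0], g_prev=[0.3], r_prev=[2.0], params=params)
```

With all parameters at zero every gate outputs 0.5 and the candidate cell is 0, so the memory cell is simply halved, which makes the gating arithmetic easy to verify by hand.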

Beam search is mainly used in the Seq2Seq decoding stage. Beam search is a heuristic graph search algorithm [20]. When the solution space is very large, it discards nodes of poor quality and retains a fixed number of high-quality nodes at each step of depth expansion, thus reducing the space occupied by the search and improving its time efficiency.

In the decoding process with beam search, at time step t - 1 the decoder tracks K hypotheses, where K is the preset beam size, and their corresponding scores are expressed as Z (y_1:t-1|x)= log p (y₁, y₂, ⋯ , y_t-1|x). When moving to time t, the decoder expands each hypothesis (expressed as $y_{1 : t - 1}^{k} = \{ y_{1}^{k}, y_{2}^{k}, \dots, y_{t - 1}^{k} \}, k \in [1, K]$ ) by selecting K candidate words, each of which can be expressed as $y_{t}^{k, k^{'}}, k^{'} \in [1, K]$ , so K × K hypotheses are obtained: $[y_{1 : t - 1}^{k}, y_{t}^{k, k^{'}}], k \in [1, K], k^{'} \in [1, K]$ (27)

The score of each hypothesis can be expressed as: $Z (y_{1 : t - 1}^{k}, y_{t}^{k, k^{'}} | x) = Z (y_{1 : t - 1}^{k} | x) + log p (y_{t}^{k, k^{'}} | x, y_{1 : t - 1}^{k})$ (28)

Then, the K hypotheses with the highest scores are chosen from the K × K possibilities and the others are discarded. This process is repeated until decoding is completed. In this way, beam search improves the data fusion rate of Seq2seq for the power grid model.
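The expand-and-prune loop described above can be sketched in Python. The decoder is abstracted as a hypothetical callable `next_log_probs(prefix)` that returns log-probabilities for the next token; `toy_model`, the end token `</s>` and the length limit `max_len` are likewise illustrative assumptions, not part of the model in this paper.

```python
import math


def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """next_log_probs(prefix) -> {token: log p(token | x, prefix)}."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypothesis
                continue
            dist = next_log_probs(prefix)
            best = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
            for token, logp in best[:beam_size]:  # K extensions per hypothesis
                # Formula (28): add the log-probability of the new token.
                candidates.append((prefix + (token,), score + logp))
        # Keep the K best of the (at most) K x K candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams


def toy_model(prefix):
    """Hypothetical next-token distribution used only for illustration."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


best_sequence, best_score = beam_search(toy_model, beam_size=2, max_len=3)[0]
```

In this toy distribution a greedy decoder (beam size 1) commits to the token "a" and ends with probability 0.3, whereas a beam of size 2 keeps "b" alive and finds the sequence starting with "b", "x", whose probability is 0.36.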

3 Experimental Analysis and Results

To meet the computational requirements and ensure that the experiments run smoothly, a computer equipped with an Intel Core i9-11900K processor, 32 GB of DDR4 RAM and a 1 TB SSD was used. To accelerate model training, an NVIDIA GeForce RTX 3080 graphics card with CUDA support was used, which significantly shortens the model training time. In terms of software, the Windows 10 operating system was used as the experimental platform, Python 3.8 was selected as the programming language, and the PyTorch 1.9.0 deep learning framework was used to build and train the Seq2seq model.

Taking the substation power grid model in a 220 kV power grid as an example, the specific process of data fusion of substation power grid model data from two different data sources, namely, dispatching automation system OMS and production management system (PMS), is selected to test the practicability of the knowledge fusion method proposed in this paper. The basic data provided by each data source is shown in Tables 1 and 2. In the model training process of this method, the encoder hidden layer uses a single-layer bidirectional LSTM network, the number of hidden units is 256, the word vector dimension is 128, and the decoder hidden layer uses a single-layer unidirectional LSTM network.

From the original data provided in Tables 1 and 2, for the same substation power grid model, the data information provided by different data sources has both mutually repetitive and complementary parts. Even if it is the same basic attribute, the representation of two different data sources are different, so it is necessary to conduct knowledge fusion processing on the original data in the table, so that the description of the basic attributes of the substation power grid model is standardized, the unit is unified, and the data precision is consistent.

Table 1

Basic information of substations in OMS

Basic attributes	Raw data
Power factor	0.96
Device Name	YY
District	YY Province YY City YY Bureau
Voltage level	380 kV
Transmission maximum active power	2080 kW
Capacity	400 MVA
Maximum load rate	0.86
Peak reactive power	210 Mvar

Table 2

Basic information of substations in PMS

Basic attributes	Raw data
Device Name	YY substation
Device location	YY City, YY Province
Operation time	2022-03-36
Capacity	80 MVA
Voltage level	380 kV
Reactive power reserve	120 Mvar
High voltage side wiring method	Double busbar
Number of used incoming lines on the high-voltage side	10
Planned number of incoming lines	7

The method in this paper is used to fuse the grid model data in Tables 1 and 2 to obtain the substation grid model data after data fusion, as shown in Table 3.

Comparing the fused data in Table 3 with the original data in Tables 1 and 2, it can be seen that the data of the substation power grid model after knowledge fusion using this method is more comprehensive, systematic and standardized. The fused basic attributes contain the basic attributes of all data sources, eliminate the redundant parts of the basic attributes of each data source, and the description of each basic attribute is more standardized, unified and perfect. It is proved that the method in this paper is more practical for power grid model data processing.

With the support of the grid model data fusion as the measurement index, the support of the method in this paper is tested when the grid model data dimensions are different. The test results are shown in Fig. 3.

Table 3

Fused power grid model data

Basic attributes	Detailed description
District	YY Province YY City YY Bureau
Device Name	YY substation
Power factor	0.96
Capacity/MVA	400
Voltage level/kV	380
Transmission maximum active power/kW	2080
Reactive power reserve/Mvar	210
Maximum load rate	0.86
Peak reactive power/Mvar	120
High voltage side wiring method	Double busbar
Number of used incoming lines on the high-voltage side	10
Planned number of incoming lines	7
Operation time	2022-03-36

It can be seen from the analysis of Fig. 3 that the support of the method in this paper when fusing the grid model data decreases as the grid model data dimension increases. Until the grid model data dimension reaches 2, the support is 100%. As the dimension increases further, the support decreases slightly, and when the grid model data dimension is 7, the support is about 98%. The results show that the support of the method in this paper is only slightly affected by the data dimension when fusing power grid model data. The method maintains a high fusion support on high-dimensional data, which further verifies its practicability and effectiveness and indicates that it can cope with power grid model data of various dimensions and achieve efficient knowledge fusion in practical power grid modeling applications.

Fig. 3

Data fusion test results of power grid model.

In the experimental data set, select 8000 manually labeled triple grid model data, use 80% data training, and 20% data testing. The accuracy rate, recall rate and F1 value are selected as the evaluation indicators of the fusion performance of the test method, and the method in this paper is used to conduct knowledge fusion on the data of the three tuple power grid model. The changes of the accuracy rate, recall rate and F1 value of the training set test set with the number of iterations during model training are obtained, as shown in Fig. 4.
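The three indicators can be computed from the sets of predicted and manually labeled triples. The following is a minimal sketch; the function name `precision_recall_f1` and the toy triples are hypothetical and are not taken from the experimental data set.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 over sets of (subject, relation, object)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # triples that are both predicted and labeled
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


gold = {
    ("YY substation", "capacity", "400 MVA"),
    ("YY substation", "power factor", "0.96"),
    ("YY substation", "wiring method", "double busbar"),
    ("YY substation", "maximum load rate", "0.86"),
}
predicted = {
    ("YY substation", "capacity", "400 MVA"),
    ("YY substation", "power factor", "0.96"),
    ("YY substation", "capacity", "80 MVA"),
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two values.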

Fig. 4 shows that, when the method in this paper is used to conduct knowledge fusion on the triple grid model data, model training stabilizes after 120 iterations. On the test set, the proposed method achieves a precision of 0.98, a recall of 0.93 and an F1 score of 0.91. The high precision indicates that most of the fused triples are correct, and the recall indicates that the method captures most of the key information in the power grid model. These indicators show that the proposed method is effective and stable in power grid model knowledge fusion, which supports its application in the field of power grid modeling.
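The evaluation protocol described above (an 80/20 split of 8000 labeled triples, scored by precision, recall and F1) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the random shuffling, the set-based matching of triples and the placeholder triples are all assumptions introduced here.

```python
import random


def split_triples(triples, train_fraction=0.8, seed=0):
    """Shuffle labeled triples and split them into training and test subsets."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(predicted, gold):
    """Score predicted triples against the manually labeled (gold) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical placeholder triples (subject, relation, object), not the paper's data.
triples = [(f"substation_{i}", "has_attribute", f"value_{i}") for i in range(8000)]
train, test = split_triples(triples)
print(len(train), len(test))

# Toy scoring example: three correct predictions, one wrong, five gold triples.
gold = test[:5]
predicted = test[:3] + [("substation_x", "has_attribute", "wrong_value")]
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this sketch a predicted triple counts as correct only if it matches a labeled triple exactly; a looser matching rule would change the scores.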

Fig. 4

Changes in metrics during training of the designed model.

When the amount of power grid model data is 10 GB–90 GB, the method in this paper is used for knowledge fusion of power grid model data, and the data processing speed of this method is obtained. The test results are shown in Fig. 5.

Fig. 5 shows that the data processing time of the method in this paper for knowledge fusion of power grid model data increases with the amount of power grid model data, but remains below 160 s over the tested range. This indicates that the method processes large-scale power grid model data efficiently.

Fig. 5

Data processing speed.

The higher the data integrity, the better the data fusion effect. To verify the fusion effect of the method proposed in this paper, private data of the power grid model are selected for fusion, and the integrity of the private data is analyzed. The results are shown in Fig. 6.

Fig. 6

Data integrity test results of the knowledge fusion method of power grid model based on Seq2seq half pointer and half label method.

As can be seen from Fig. 6, the data integrity of the method in this paper shows a downward trend as the amount of private data of the power grid model increases. Even so, the average integrity of the private data of the power grid model is 98.86%, which shows that the method preserves data integrity well when fusing private data. When handling a large amount of private data, the method can therefore achieve knowledge fusion while maintaining high data integrity, which is valuable for practical power grid modeling applications.

The comparison methods are the evidence data integration method of literature [8], the R multivariable data integration method of literature [9], the GIS data integration method of literature [10], the knowledge hypergraph data integration method of literature [11] and the deep learning multimodal data integration method of literature [12]. For grid model data volumes of 10 GB–90 GB, the method in this paper and the methods of literature [8]–[12] are used to fuse power grid model data of different data volumes. The mean squared errors of the six methods, obtained through variance analysis, are compared in Table 4.

Table 4

Comparison of the mean squared error of the six methods

Grid model data volume/GB	Mean squared error before fusion	Mean squared error after fusion
		Proposed method	Reference [8] method	Reference [9] method	Reference [10] method	Reference [11] method	Reference [12] method
10	4.69	2.06	2.95	3.24	3.94	3.97	4.00
20	5.03	3.64	4.53	4.82	4.56	4.59	4.62
30	5.73	3.26	4.15	4.44	5.14	5.17	5.20
40	5.73	3.27	4.16	4.45	5.15	5.18	5.21
50	4.36	2.37	3.26	3.55	4.25	4.28	4.31
60	4.35	1.52	2.41	2.70	3.40	3.43	3.46
70	3.99	1.76	2.65	2.94	3.64	3.67	3.70
80	4.06	1.23	2.12	2.41	3.11	3.14	3.17
90	5.26	3.07	3.96	4.25	4.95	4.98	5.01

According to the experimental results in Table 4, the mean squared error of the grid model data after fusion is lower than the mean squared error before fusion for all six methods. For the grid models with different data volumes, the mean squared error of the data fused with the method in this paper is lower than that of the other five methods, which shows that the fusion accuracy of this method is relatively high. This is mainly because the method uses the Seq2seq half pointer and half label model to fuse the data, which better retains the characteristics and information of the original data, reduces the information loss in the fusion process, and thereby improves the fusion accuracy.
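A minimal sketch for checking Table 4 directly: it compares each method's post-fusion mean squared error with the pre-fusion value, finds the method with the lowest error at each data volume, and averages each method's error over the nine data volumes. The numbers are transcribed from Table 4; the variable and function names are introduced here for illustration only.

```python
# Method order matches the "after fusion" columns of Table 4.
METHODS = ["Proposed", "Ref. [8]", "Ref. [9]", "Ref. [10]", "Ref. [11]", "Ref. [12]"]

# data volume (GB) -> (MSE before fusion, [MSE after fusion for each method])
TABLE_4 = {
    10: (4.69, [2.06, 2.95, 3.24, 3.94, 3.97, 4.00]),
    20: (5.03, [3.64, 4.53, 4.82, 4.56, 4.59, 4.62]),
    30: (5.73, [3.26, 4.15, 4.44, 5.14, 5.17, 5.20]),
    40: (5.73, [3.27, 4.16, 4.45, 5.15, 5.18, 5.21]),
    50: (4.36, [2.37, 3.26, 3.55, 4.25, 4.28, 4.31]),
    60: (4.35, [1.52, 2.41, 2.70, 3.40, 3.43, 3.46]),
    70: (3.99, [1.76, 2.65, 2.94, 3.64, 3.67, 3.70]),
    80: (4.06, [1.23, 2.12, 2.41, 3.11, 3.14, 3.17]),
    90: (5.26, [3.07, 3.96, 4.25, 4.95, 4.98, 5.01]),
}


def all_below_pre_fusion(table):
    """True if every post-fusion MSE is lower than the pre-fusion MSE of its row."""
    return all(after < before
               for before, afters in table.values()
               for after in afters)


def best_method_per_volume(table):
    """Map each data volume to the method with the lowest post-fusion MSE."""
    return {volume: METHODS[afters.index(min(afters))]
            for volume, (_, afters) in table.items()}


def mean_mse_per_method(table):
    """Average post-fusion MSE of each method over all data volumes."""
    rows = [afters for _, afters in table.values()]
    return {name: sum(row[i] for row in rows) / len(rows)
            for i, name in enumerate(METHODS)}


print(all_below_pre_fusion(TABLE_4))
print(best_method_per_volume(TABLE_4))
for name, value in mean_mse_per_method(TABLE_4).items():
    print(f"{name}: {value:.2f}")
```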

Combined with the previous experimental results, the proposed method shows superior performance in processing power grid model data fusion. However, just as any method has its limitations, the proposed method is no exception. The performance of the proposed method strongly depends on the quality of the input data. If the input power grid model data has noise, outliers or inconsistencies, it may affect the results of data fusion. Therefore, it is crucial to preprocess and clean the data before applying the proposed method.

4 Conclusion

In this paper, the problem of power grid model data fusion is studied, and a method based on Seq2seq half pointer and half label is proposed. The effectiveness and superiority of the method are verified by multiple experiments, which is important for improving the accuracy and reliability of the power grid model. The performance of different data fusion methods is compared through experiments, and the processing efficiency, fusion accuracy and private data integrity of the proposed method are analyzed in detail. The experimental results show that the proposed method maintains a fast processing speed when dealing with large-scale power grid model data, and also performs well in terms of data fusion accuracy and private data integrity. However, the proposed method still has some limitations, such as its dependence on the quality of the input data. In the future, the robustness of the method will be further improved to cope with the impact of noise and outliers on data fusion. With these improvements, the method is expected to achieve higher data fusion accuracy and efficiency in the field of power grid modeling and to contribute to the development of the smart grid.

Author contribution statement

Yuzhong Zhou initiated the study and Zhengping Lin wrote the review. Four people jointly carried out the experimental research, and Zhengrong Wu recorded and counted the experimental data. Yuzhong Zhou and Zhengrong Wu jointly completed the analysis of experimental data and the writing of experimental contents. Zhengping Lin provided the initial guidance for the paper, and Zifeng Zhang objectively reviewed the paper and contributed to the further improvement and revision of the paper.

References

1. Lamnatou, Chemisana and Cristofari, Smart grids and smart technologies in relation to photovoltaics, storage systems, buildings and the environment, Renewable Energy 185(2) (2022), 1376–1391.

2. Mina-Casaran J.D., Echeverry D.F. and Lozano C.A., Demand response integration in microgrid planning as a strategy for energy transition in power systems, IET Renewable Power Generation 15(4) (2021), 889–902.

3. Kirilenko, Gong and Chung C.Y., A framework for power system operational planning under uncertainty using coherent risk measures, IEEE Transactions on Power Systems 36(5) (2021), 4376–4386.

4. Shahidehpour, Li, Wang, Huang and Nengling, Two-stage full-data processing for microgrid planning with high penetrations of renewable energy sources, IEEE Transactions on Sustainable Energy 12(4) (2021), 2042–2052.

5. Rocco C.M., Barker and Ramirez-Marquez J.E., Multi-objective power grid interdiction model considering network synchronizability, International Journal of Performability Engineering 17(7) (2021), 609–618.

6. Wei X.M. and Peng C.H., Credibility model of Internet of Things security data fusion based on Prim, Computer Simulation 39(6) (2022), 421–424+443.

7. Wang Xin, Zhao Long, Zhang Shujuan, Wang Yu, Qin Dandan and Sun Wei, Self-organizing map based knowledge fusion algorithm for big data environment of power grid, Journal of Hefei University of Technology: Natural Science 45(5) (2022), 620–624+653.

8. Liu Dong, Zhang Yue, Pi Junbo, Shan Lianfei, Liu He, Song Pengcheng and Jiang Tao, Construction and application of knowledge graph for intelligent retrieval of power grid regulation information, Electric Power 56(7) (2023), 78–84.

9. Li Xiaolu, Zuo Xuan, Liu Riliang, Lu Yiming, Li Congli and Lin Shunfu, SHACL-based validation method of knowledge graph for power system model, Electric Power 55(1) (2022), 119–125+228.

10. Baral, Poumand, Adhikari, Abediniangerabi and Shahandashti, GIS-based data integration approach for rainfall-induced slope failure susceptibility mapping in clayey soils, Natural Hazards Review 22(3) (2021), 4021026.1–4021026.14.

11. Masmoudi, Abdallah S.B., Zghal H.B., Archimede and Karray M.H., Knowledge hypergraph-based approach for data integration and querying: application to earth observation, Future Generation Computer Systems 115(1) (2021), 720–740.

12. Race A.M., Sutton, Hamm, Maglennon and Bunch, Deep learning-based annotation transfer between molecular imaging modalities: an automated workflow for multimodal data integration, Analytical Chemistry 93(6) (2021), 3061–3071.

13. Lyu, Gao and Li, A partial charging curve-based data-fusion-model method for capacity estimation of Li-ion battery, Journal of Power Sources 483(Jan. 31) (2021), 229131.1–229131.13.

14. Y.X. A, Y.L. A, J.Y. A, S.G. A, Y.X. B and Z.L. A, History-based attention in seq2seq model for multi-label text classification, Knowledge-Based Systems 224(Jul. 19) (2021), 107094.1–107094.11.

15. Geng, Text segmentation for patent claim simplification via bidirectional long-short term memory and conditional random field, Computational Intelligence 38(1) (2022), 205–215.

16. Qiu and Zheng, Improving TextRank algorithm for automatic keyword extraction with tolerance rough set, International Journal of Fuzzy Systems 24(3) (2022), 1332–1342.

17. Brek and Boufaida, Enhancing information extraction process in job recommendation using semantic technology, International Journal of Performability Engineering 18(5) (2022), 369–379.

18. Richards R.J. and Paul, An attention-driven long short-term memory network for high throughput virtual screening of organic photovoltaic candidate molecules, Solar Energy 224(August 2021) (2021), 43–50.

19. Kwon and Shin, Optical proximity correction using bidirectional recurrent neural network with attention mechanism, IEEE Transactions on Semiconductor Manufacturing 34(2) (2021), 168–176.

20. Libralesso, Focke P.A., Secardin and Jost, Iterative beam search algorithms for the permutation flowshop, European Journal of Operational Research 301(1) (2022), 217–234.