Abstract
Existing multi-sensor information fusion algorithms for integrated power grid operation systems that are based on Bayesian networks are disturbed by high-volume data flows, which leads to a single level of data fusion and a large convergence error. To address this, an intelligent fusion algorithm for multi-sensor information in integrated power grid operation systems is proposed. First, an asynchronous distributed aggregation construction algorithm based on hierarchical clustering aggregates all nodes by distance into a collection tree, calculates the optimal number of groups, and groups the nodes accordingly. Following an asynchronous distributed strategy, the optimal aggregation nodes are then selected and the optimal transmission topology is constructed, so that the aggregation mode of grid sensor data with minimal overhead is found quickly and the data flow of the power grid is reduced. In this multi-sensor aggregation environment, and based on the principle of multi-sensor information fusion and detection in the integrated power grid operation system, the abstract information fusion model of the system is applied. Multi-sensor information fusion is divided into three levels: data level, feature level and decision level, and the functional structure of multi-sensor information fusion realizes the effective fusion of multi-sensor information. Experimental results show that the proposed algorithm achieves high accuracy and stability of information fusion and can reduce power grid losses.
Introduction
With the rapid development of the power grid, today's power grid system has become a high-order, nonlinear and highly complex system [1]. At present, intelligent information fusion in integrated power grid operation systems is relatively backward: information processing remains at the low-level stage of digital signal processing, information collection is highly repetitive, information is not optimized, and its application is too simple, which leaves the control system deficient in dealing with edge problems. Information fusion technology is an information processing process that uses computer technology to automatically analyze and integrate, under certain criteria, the observations of several sensors acquired in time sequence in order to complete the required decision and estimation tasks [2, 3]. The application of intelligent fusion technology for multi-sensor information in the integrated power grid operation system is an inevitable trend in the future development of the power grid system. Through the rational distribution and use of sensors and their source information, the redundant or complementary information of multiple sensors in space and time can be combined according to a certain criterion to obtain a consistent interpretation or description of the tested objects. More information is then derived from the combined data to achieve the best synergy [4]. Existing intelligent fusion algorithms for multi-sensor information in the integrated power grid operation system that are based on Bayesian networks are disturbed by high-volume data flows and suffer from a single level of data fusion and a large convergence error [5].
To solve this problem, an intelligent fusion algorithm for multi-sensor information in the integrated power grid operation system is proposed, which enhances the accuracy of information fusion and ensures stable operation of the power grid.
Intelligent fusion algorithm for multi-sensor information in integrated power grid operation system
Asynchronous distributed algorithm based on hierarchical clustering
A large number of wireless sensors are distributed throughout the integrated power grid to monitor the operating state of grid equipment and users. Collecting the raw monitoring data at the data processing center places enormous traffic pressure on the data acquisition communication network. Aggregating data during acquisition greatly reduces this traffic and the overhead of the communication network, so the choice of aggregation nodes and the construction of the aggregation topology become the key problems of smart grid data acquisition [6]. In this paper, an asynchronous distributed aggregation construction algorithm based on hierarchical clustering is proposed. In this algorithm, all nodes are first organized into a collection tree by hierarchical clustering on the distance between nodes. The optimal number of groups is then calculated, and the nodes are grouped accordingly. Following an asynchronous distributed strategy, the optimal aggregation nodes are selected and the optimal transmission topology is constructed, so that the aggregation mode of grid sensor data with minimal overhead is found quickly [7], the data flow of the power grid is reduced, and the efficiency of the integrated power grid operation system is improved.
Construction of collection tree
In this section, a hierarchical clustering method is used to construct a collection tree of integrated power grid data. The construction method is shown in Fig. 1. Each sensor in the grid is initially regarded as a class, and the N classes are aggregated hierarchically according to the nearest-distance principle [8]. Let two classes be $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, let the distance between $a_i$ and $b_j$ be $d_{ij}$, and let the distance between the classes be $d_{AB}$. The calculation formula is as follows:

Fig. 1. The structural flow chart of the tree.
In the construction of the binary tree, new classes are added on top of the original ones. Assuming that there are 7 classes initially and that the two nearest classes are 4 and 5, combining them generates the new class 8, as shown in Fig. 2(a); the new class 9 is then generated in the same way. The collection tree finally constructed in this way is shown in Fig. 2(c).

Fig. 2. Acquisition of tree structure.
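The merging procedure described above can be sketched as a small agglomerative clustering routine. This is only an illustration under stated assumptions: the class distance is taken as the minimum pairwise distance (single linkage), which is one reading of the nearest-distance principle, and the function name `build_collection_tree` and the one-dimensional sensor coordinates are hypothetical.

```python
import math


def build_collection_tree(points):
    """Merge the two nearest classes repeatedly until one class remains.

    Leaves are numbered 1..N. Each merge creates the next class number
    (N+1, N+2, ...). Returns {new_class: (child_a, child_b)}.
    Assumption: class distance = minimum pairwise distance (single linkage).
    """
    clusters = {i + 1: [p] for i, p in enumerate(points)}  # class id -> points
    tree = {}
    next_id = len(points) + 1
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None  # (distance, class_a, class_b)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = min(math.dist(p, q)
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        tree[next_id] = (a, b)
        next_id += 1
    return tree


# Hypothetical 1-D positions of sensors 1..7, chosen so that 4 and 5 merge first.
points = [(0.0,), (1.0,), (8.5,), (10.0,), (10.5,), (20.0,), (22.0,)]
tree = build_collection_tree(points)
for node in sorted(tree):
    print(node, tree[node])
```

With these coordinates, classes 4 and 5 are merged first into class 8, and the root of the tree is class 13, matching the numbering convention used in the text.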
The purpose of this section is to find the data aggregation layout along which data can be transferred at minimal cost. When the number of groups is large, the intra-group cost falls and the inter-group cost rises; when the number of groups is small, the intra-group cost rises and the inter-group cost falls. The main problem to be solved in this section is therefore how to determine the optimal number of groups, that is, the number that minimizes the sum of the intra-group and inter-group costs [9]. To solve this problem, two factors that affect the grouping effect are introduced: the intra-group separation degree CI and the inter-group separation degree CE. Specifically, suppose that there are N sensor nodes in the network, divided into n groups, and that the number of nodes in the r-th group is $m_r$. The n groups of sensors can then be represented as the set $B = \{C_1, C_2, \ldots, C_n\}$, where $C_n$ is the n-th sensor group, and $d_{ij}$ is the distance cost between any two nodes in the same group.
(1) Intra-group separation degree CI: the average cost between two nodes in group $C_r$ is given by formula (2), where $C(m_r, 2)$ is the number of node pairs among the $m_r$ nodes of the r-th group, that is, the total number of computations needed to obtain the cost between every two nodes.
The variance of cost between two nodes in a group is
(2) Inter-group separation degree CE: under the same premise, each group is regarded as a point whose coordinate is the average of the coordinates of all nodes in the group. The link cost between two groups is denoted by $D_{ij}$, and $C(n, 2)$ is the number of group pairs among the n groups, that is, the total number of computations needed to obtain the cost between every two groups. The average cost between two groups is:
The separation degree of inter-group can be calculated by
The above formulas show that the more groups there are, the smaller the intra-group separation degree CI is and the more closely the nodes within a group are distributed. However, the rate of change of CI decreases as the number of groups increases. The greater the inter-group separation degree CE is, the better the grouping effect is, but CE does not change monotonically with the number of groups; it reaches its maximum at some intermediate point [10]. The optimal number of groups should therefore be the point at which CE attains its maximum and beyond which CI changes very little when the number of groups is increased further.
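The two averages that underlie CI and CE can be illustrated as follows. This is a hedged sketch rather than the exact definitions of CI and CE: it only computes the mean intra-group pairwise cost over the $C(m_r, 2)$ node pairs and the mean inter-group cost over the $C(n, 2)$ pairs of group centroids, and it assumes that cost equals Euclidean distance. All function names are hypothetical.

```python
import math
from itertools import combinations


def mean_intra_group_cost(group):
    """Average distance over all C(m_r, 2) node pairs of one group."""
    pairs = list(combinations(group, 2))
    if not pairs:  # a single-node group has no internal pairs
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def centroid(group):
    """Average coordinate of all nodes in the group."""
    dims = len(group[0])
    return tuple(sum(p[k] for p in group) / len(group) for k in range(dims))


def mean_inter_group_cost(groups):
    """Average distance over all C(n, 2) pairs of group centroids."""
    pairs = list(combinations([centroid(g) for g in groups], 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical layout: two tight groups that lie far apart.
groups = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
print([mean_intra_group_cost(g) for g in groups])
print(mean_inter_group_cost(groups))
```

Evaluating these two quantities for each candidate number of groups is one way to look for the point where the inter-group measure peaks while the intra-group measure has stopped changing appreciably.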
In the previous two sections, the construction of the collection tree and the determination of the number of groups have been completed. Figure 2(c) is the constructed collection tree. Grouping is the inverse of the construction of the collection tree. In this section, the collection tree of Fig. 2(c) is used as an example to introduce the specific way of grouping. As Fig. 3 shows, classes that are closer together are merged first and therefore receive smaller class numbers, so when grouping, the node with the largest number is removed first [11]. As shown in Fig. 3(a), when node 13 is removed, the tree is transformed into a forest of two trees, the leaf nodes of each tree form a group, and the sensors are divided into the two groups {1, 2, 3, 4, 5} and {6, 7}. As shown in Fig. 3(b), when node 12 is further removed, the sensors are divided into the three groups {1, 2}, {3, 4, 5} and {6, 7}. As shown in Fig. 3(c), when node 11 is further removed, the sensors are divided into the four groups {1, 2}, {3, 4, 5}, {6} and {7}. In general, assume that the collection tree has n sensors, that is, n leaf nodes, and that they need to be divided into m groups (m < n). The best grouping is then obtained by removing the m - 1 nodes with the largest node numbers in the collection tree [12].

Fig. 3. Grouping process.
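The grouping rule above (remove the m - 1 internal nodes with the largest numbers and read off the leaves of each remaining subtree) can be sketched as follows. The tree below reproduces the seven-sensor example; the text fixes 8 = (4, 5) and the splits at nodes 11, 12 and 13, while the assignment of numbers 9 and 10 to the classes {1, 2} and {3, 4, 5} is an assumption made for illustration. The function names are hypothetical.

```python
def leaves(tree, node):
    """Return the sensor numbers (leaf nodes) below `node`."""
    if node not in tree:  # not an internal node, so it is a sensor
        return [node]
    left, right = tree[node]
    return leaves(tree, left) + leaves(tree, right)


def split_into_groups(tree, m):
    """Divide the sensors into m groups by removing the m - 1
    internal nodes with the largest numbers."""
    internal = sorted(tree, reverse=True)
    if not 1 <= m <= len(internal) + 1:
        raise ValueError("m must lie between 1 and the number of sensors")
    removed = set(internal[:m - 1])
    if not removed:
        roots = [internal[0]]  # nothing removed: the whole tree is one group
    else:
        # Children of removed nodes that survive become roots of the forest.
        roots = [c for n in removed for c in tree[n] if c not in removed]
    return sorted(sorted(leaves(tree, r)) for r in roots)


# Collection tree of the seven-sensor example: class -> (child, child).
tree = {8: (4, 5), 9: (1, 2), 10: (3, 8), 11: (6, 7), 12: (9, 10), 13: (12, 11)}
for m in (2, 3, 4):
    print(m, split_into_groups(tree, m))
```

For m = 2, 3 and 4 this yields the same groups as Fig. 3(a), (b) and (c) in the description above.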
For all nodes in a group, the aggregation node is selected and the aggregation topology is constructed. In this algorithm, the minimum cost required for data aggregation is calculated with each node in the group acting as the aggregation node in turn. The node with the lowest minimum cost is chosen as the aggregation node, and the intra-group data forwarding links generated when calculating that node's minimum cost are used as the best aggregation topology [13]. When the minimum cost for a candidate aggregation node is calculated, a time parameter is set for each non-aggregation node so that forwarding topologies with minimal link overhead are prioritized. Once a node's time parameter reaches zero, its topology is fixed and no further link information packets are sent to it, which reduces the link cost of the algorithm [14]. The algorithm flow is shown in Fig. 4.

Fig. 4. Asynchronous distributed aggregation policy flow chart.
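The selection step can be illustrated with a simplified, centralized sketch. It does not model the asynchronous timers or the packet exchange of the distributed strategy. Instead it assumes that the cost of using a node as the aggregation node is the sum of the cheapest path costs from every other node in the group to it, computed with Dijkstra's algorithm. The graph format and function names are hypothetical.

```python
import heapq


def shortest_path_costs(graph, source):
    """Dijkstra: cheapest path cost from `source` to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def choose_aggregation_node(graph):
    """Try every node as aggregation node and return (node, total_cost)
    for the cheapest one; ties go to the smallest node number."""
    best = None
    for candidate in sorted(graph):
        costs = shortest_path_costs(graph, candidate)
        if len(costs) < len(graph):
            continue  # candidate cannot reach every node of the group
        total = sum(costs.values())
        if best is None or total < best[1]:
            best = (candidate, total)
    return best


# Hypothetical group of three nodes with symmetric link costs.
graph = {
    1: {2: 1.0, 3: 4.0},
    2: {1: 1.0, 3: 1.0},
    3: {1: 4.0, 2: 1.0},
}
print(choose_aggregation_node(graph))
```

In this example node 2 is selected, because both other nodes reach it over a single cheap link. The shortest-path tree rooted at the chosen node plays the role of the intra-group forwarding topology.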
In the multi-sensor aggregation distribution environment of the previous section, and based on the principle of multi-sensor information fusion and detection in the integrated power grid operation system, the abstract information fusion model of the integrated power grid operation system is applied. Multi-sensor information fusion is divided into three levels: data level, feature level and decision level. The functional structure of multi-sensor information fusion realizes the effective fusion of multi-sensor information [15].
Information fusion level
From the point of view of levels, multi-sensor information can be divided into three types: ① data information, which is obtained directly from a sensor; ② feature information, which is extracted from the original data to represent environmental characteristics and lies between data information and decision information; ③ decision information, which is the concluding information about the environmental description, that is, symbolic information obtained by analyzing and processing data or feature information [16].
Figure 5 describes the integrated hierarchical structure of multi-sensor information in the integrated power grid operation system. The information of each level of the sensors is synthesized in each fusion node [17]. The fusion information and fusion results of all fusion nodes can also interact with each other through the database blackboard system and enter other fusion nodes, so as to participate in their fusion.

Fig. 5. Hierarchical structure of multi-sensor information fusion in integrated grid operation system.
With respect to information representation, information fusion is likewise divided into three categories: data layer fusion, feature layer fusion and decision layer fusion. As shown in Fig. 6, the hierarchical structure based on visual information fusion divides information fusion into a low level (image pixel level), an intermediate level (feature level) and a high level (decision level). Pixel level fusion fuses the original sensor information and the information produced at the various stages of preprocessing [18]. Feature level fusion is an intermediate-level process that uses the feature information extracted from the original information of each sensor for comprehensive analysis and processing [19]. Decision level fusion is a fusion process at the highest level of information representation. This multi-vision-sensor fusion scheme can be applied to an integrated power grid system.

Fig. 6. Hierarchical structure of information fusion based on multi vision sensors.
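The three fusion levels can be made concrete with a toy example. The particular operations chosen here (averaging time-aligned raw readings, using the mean and range as features, and majority voting over decisions) are illustrative assumptions only and are not taken from the fusion model described above. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def data_level_fusion(raw_series):
    """Data level: average time-aligned raw readings of like sensors."""
    return [mean(samples) for samples in zip(*raw_series)]


def feature_level_fusion(raw_series):
    """Feature level: extract (mean, range) per sensor and concatenate."""
    features = []
    for series in raw_series:
        features.extend([mean(series), max(series) - min(series)])
    return features


def decision_level_fusion(decisions):
    """Decision level: majority vote over the per-sensor decisions."""
    return Counter(decisions).most_common(1)[0][0]


# Hypothetical readings of two sensors over two sampling instants.
raw = [[1.0, 2.0], [3.0, 4.0]]
print(data_level_fusion(raw))
print(feature_level_fusion(raw))
print(decision_level_fusion(["normal", "fault", "normal"]))
```

The point of the sketch is the difference in what is combined at each level: raw samples, extracted features, or already formed decisions.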
The information fusion structure can be divided into different forms according to the information flow, the control relationship, the application and the scale of the fusion. According to information flow, it can be divided into: ① Parallel multi-sensor information fusion; ② Serial multi-sensor information fusion; ③ Hybrid multi-sensor information fusion; ④ Network multi-sensor information fusion.
Figure 7 presents the serial and parallel multi-sensor information fusion structures. In the design of the integrated grid operation system, multiple sensors can also be integrated in different forms, including functional integration and method integration [20].

Fig. 7. Serial/parallel multi-sensor information fusion.
Figure 8 shows the functional structure of multi-sensor information fusion. Before several sensors are integrated, the data of each sensor are modeled. The sensor model represents the uncertainty and error of each sensor's data, and it is usually assumed that this uncertainty follows a Gaussian distribution [16]. After the sensor data are modeled, they can be integrated through three different kinds of perceptual processing: fusion, single operation, and guidance or hinting.

Fig. 8. Functional structure diagram of multi-sensor integration and information fusion.
Test of internal noise and environmental interference in multi-sensor
The experiment analyzes the data provided by 600 sensors in a regional integrated power grid operation system. According to their statistical characteristics, the sensors are divided into seven classes with different mean square deviations, reflecting differences in internal noise and environmental interference. For each class of sensors with the same variance, 2000 random samples are taken. Batch estimation is considered in two forms: batching so that each batch contains the same number of sensors (Algorithm 2), and batching so that each batch contains sensors of the same variance (Algorithm 3). Adaptive weighting is likewise considered in two forms: weight estimation based on each sensor's own variance (Algorithm 4) and the proposed algorithm (Algorithm 5). These algorithms are used to fuse the sampled data, giving the simulation results shown in Fig. 9.

Fig. 9. Error distribution diagrams of various fusion algorithms.
From the error distributions of the fusion algorithms shown in Fig. 9, it is clear that the multi-sensor system performs much better than a single-sensor system and that all of these algorithms reduce the fusion error to varying degrees. Table 1 shows the distribution of the absolute error over the error intervals. It can be seen from Table 1 that the proposed algorithm has the best fusion effect, Algorithm 3 the second best, and simple average estimation the worst. The weighted estimation method based on each sensor's own variance (Algorithm 4) is slightly superior to the average value method but inferior to the proposed algorithm. The proposed algorithm can therefore effectively improve the accuracy of data fusion and reduce the fusion error in a multi-sensor data acquisition system.
Table 1. Distribution of error absolute values in each error interval
The following points are also obtained in the simulation of various sensor data with different numbers and different precision.
(1) The more sensors there are, the better the fusion effect. The improvement is obvious when the total number of sensors is small. Once the number of sensors reaches a certain level, the benefit of adding more sensors is no longer obvious, because the sample is already large enough to characterize the statistical properties.
(2) If the number of sensors of each precision type is equal, Algorithm 3 performs the same as the proposed algorithm. From the fusion formula, Algorithm 3 can also be seen as a weighted fusion algorithm. When sensors are batched by equal accuracy, the number of sensors does not affect the weights assigned to the different batches; the weights are determined by accuracy alone, and the final fusion formula coincides with that of the proposed algorithm. If the number and precision of the sensors differ between batches, subdividing within each batch in Algorithm 3 gradually approaches the results of the proposed algorithm.
(3) For batch estimation, when each batch contains enough sensors, the more batches there are, the more pronounced the advantage of fusion and the smaller the variance.
(4) When all sensors have the same precision, Algorithm 1 and Algorithm 4 give the same fusion result. Average estimation can be regarded as a special case of weighted fusion in which every sensor is assigned the same weight. When all accuracies are equal, the weights allocated according to the variances are equal as well, so the fusion results coincide.
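The last observation can be checked numerically. The sketch below assumes that weighting by a sensor's own variance means inverse-variance weighting, $w_i = (1/\sigma_i^2) / \sum_j (1/\sigma_j^2)$, which is a common choice but is not confirmed by the text. The function names and readings are hypothetical.

```python
import math


def average_fusion(readings):
    """Simple average estimation: every sensor gets the same weight."""
    return sum(readings) / len(readings)


def variance_weighted_fusion(readings, variances):
    """Weighted fusion with weights proportional to 1 / variance
    (assumed form of variance-based weighting)."""
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return sum(w * x for w, x in zip(inverse, readings)) / total


readings = [10.0, 12.0, 14.0]

# Equal precision: the weighted estimate collapses to the plain average.
same = variance_weighted_fusion(readings, [0.5, 0.5, 0.5])
print(math.isclose(same, average_fusion(readings)))

# Unequal precision: the noisy third sensor is down-weighted.
print(variance_weighted_fusion(readings, [1.0, 1.0, 4.0]))
```

With equal variances the two estimates agree. When the third sensor is noisier, the weighted estimate moves away from its reading and toward the more precise sensors.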
The performance of the algorithm is evaluated by the following four indicators:
(1) CPU running time (TIME): the total time spent by the algorithm while all sensors are moving. The shorter the running time, the better.
(2) Residual sum of squares (RSS): the sum, over all sampling points, of the squared difference between the theoretical position of the sensor and its estimated position, that is, $\mathrm{RSS} = \sum_k (x_k - \hat{x}_k)^2$, where $x_k$ and $\hat{x}_k$ denote the theoretical and estimated positions at sampling point $k$. The smaller the RSS, the more accurate the algorithm.
(3) Variance (P): the variance of the estimated value of the sensor, which reflects the performance of the filter in each algorithm.
(4) Correlation coefficient: the relationship between the estimated value and the theoretical value under each algorithm. It ranges from -1 to 1.
Discriminant function (CF): the discriminant function gives the utility estimate of each technique. According to the importance of each index, a weight $\omega$ between 0 and 1 is allocated to the CPU running time (TIME), the residual sum of squares (RSS) and the variance (P), and the three weights sum to 1. The value calculated in the experiment under each technique is denoted by $c_i$. The discriminant function (CF) of each technique is then obtained by the following formula:
Here, $\omega_1$, $\omega_2$ and $\omega_3$ are the weights of TIME, RSS and P, respectively. These weights can be adjusted according to the circumstances; here, $\omega_1 = 0.3$, $\omega_2 = 0.3$ and $\omega_3 = 0.4$. $c_1$, $c_2$ and $c_3$ are the calculated values of TIME, RSS and P obtained in each experiment, and $c_{1\max}$, $c_{2\max}$ and $c_{3\max}$ are the maximum calculated values of TIME, RSS and P over the experiments. The goal is to minimize the value of the discriminant function so as to produce accurate estimates with minimum variance in the shortest time.
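The two computed indicators can be sketched in a few lines. The RSS follows the verbal definition directly. For CF, the sketch assumes the weighted sum of max-normalized values, $\mathrm{CF} = \omega_1 c_1 / c_{1\max} + \omega_2 c_2 / c_{2\max} + \omega_3 c_3 / c_{3\max}$, which is consistent with the symbols defined above but is an assumption about the exact formula. The function names and the numbers are hypothetical.

```python
import math


def residual_sum_of_squares(theoretical, estimated):
    """Sum of squared differences between theoretical and estimated values."""
    return sum((t - e) ** 2 for t, e in zip(theoretical, estimated))


def discriminant_function(values, max_values, weights=(0.3, 0.3, 0.4)):
    """Weighted sum of TIME, RSS and P, each divided by its maximum
    (assumed form of CF). Smaller is better."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * c / c_max
               for w, c, c_max in zip(weights, values, max_values))


rss = residual_sum_of_squares([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(rss)

# Hypothetical (TIME, RSS, P) of one algorithm and the maxima over all runs.
cf = discriminant_function((2.0, rss, 0.5), (4.0, 5.0, 1.0))
print(cf)
```

Under this form, an algorithm that is worst on all three indexes scores 1, so lower scores indicate a better trade-off between running time, accuracy and variance.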
Errors may be generated by the noise sources of each sensor, and the measurement error is shown in Fig. 10. Figure 10 also shows the error produced by the different algorithms, where the estimation error is the difference between the theoretical value at a sampling time and the value estimated by each algorithm. It can be seen that the error of the proposed algorithm is the smallest.

Fig. 10. Measurement and estimation of error values.
Under the condition of 10 sensor nodes in the integrated grid operation system, the decision situation of different fusion algorithms is described in Table 2.
Table 2. A decision table for the fusion algorithm of ten nodes
Table 2 summarizes the average results of the three indexes described above over 500 runs, with the minimum values shown in bold. As Table 2 shows, MB has the largest RSS, although its running time is the shortest of all the algorithms. Compared with the other algorithms, the running time and residual sum of squares of the proposed algorithm for the two grid nodes are the smallest. The discriminant function of the proposed algorithm is also the smallest, which shows that the algorithm can produce accurate estimates with minimum variance in the shortest time.
Figure 11 shows the variance of the estimated value under each algorithm. The variances of the MBF estimate and of the proposed algorithm are significantly lower than those of MB and F-MB, which supports the validity of MBF and the proposed algorithm. More importantly, the estimated value of the proposed algorithm is almost proportional to the theoretical value, with a correlation coefficient of 0.99. The same experiment is repeated with three sensors, and the data from each sensor are integrated under both a centralized and a distributed architecture. In a centralized architecture, the data of all sensors are fused simultaneously, whereas in a distributed architecture the data are fused in a certain order. Distributed fusion is more robust to single-component failure and uses communication resources more efficiently than conventional solutions [21–26].

Fig. 11. Variance of the estimated value.
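The difference between fusing all sensor data simultaneously and fusing them in a certain order can be illustrated with a textbook rule. The sketch assumes independent Gaussian sensor errors and uses inverse-variance fusion, which is not necessarily the rule used by the algorithms compared above. The function names and the three readings are hypothetical.

```python
import math


def fuse_pair(x1, var1, x2, var2):
    """Fuse two independent Gaussian estimates by inverse-variance weighting."""
    fused_var = 1.0 / (1.0 / var1 + 1.0 / var2)
    fused_x = fused_var * (x1 / var1 + x2 / var2)
    return fused_x, fused_var


def centralized_fusion(readings, variances):
    """Fuse the data of all sensors in a single step."""
    fused_var = 1.0 / sum(1.0 / v for v in variances)
    fused_x = fused_var * sum(x / v for x, v in zip(readings, variances))
    return fused_x, fused_var


def sequential_fusion(readings, variances):
    """Fuse the sensors one after another, in the given order."""
    x, var = readings[0], variances[0]
    for xi, vi in zip(readings[1:], variances[1:]):
        x, var = fuse_pair(x, var, xi, vi)
    return x, var


# Hypothetical readings and variances of three sensors.
readings, variances = [10.0, 12.0, 14.0], [1.0, 2.0, 4.0]
xc, vc = centralized_fusion(readings, variances)
xs, vs = sequential_fusion(readings, variances)
print(math.isclose(xc, xs), math.isclose(vc, vs))
```

Under these assumptions both orders of computation give the same estimate and variance, so the practical differences between the two architectures lie in robustness to component failure and in communication cost, not in the fused value itself.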
Under the condition of 20 sensor nodes in the integrated power grid operation system, the decision situation of different fusion algorithms is described in Table 3.
Table 3. A decision table for the fusion algorithm of twenty nodes
Analysis of Table 3 shows that, for both centralized and distributed power grid operation systems, the indicators of the proposed algorithm are lower than those of the other algorithms, which illustrates its high robustness. In different environments, the efficiency and precision of multi-sensor information fusion in the power grid operation system are higher. Compared with the other three algorithms, the proposed algorithm also has the minimum CF, so it provides relatively reliable results and strong stability.
After the application of the proposed algorithm, the best analytical formula for the basic power outage loss of the integrated load is as follows:
The fitting error is s = 0.05632, and the related data are shown in Table 1.
Under different experimental statistics, the power outage loss of the traditional Bayesian network algorithm and the proposed algorithm are described in Tables 4 and 5.
Table 4. Power outage loss statistics of traditional algorithms under different conditions
Table 5. Power outage loss in this paper under different conditions
Comparison of Tables 4 and 5 shows that the proposed algorithm can use known statistical data from similar conditions together with a small amount of survey data to calculate the time-sharing interruption. To a certain extent, it solves the problem of calculating comprehensive power outage loss in a big data environment and avoids inaccurate calculation results.
In this paper, an intelligent fusion algorithm for multi-sensor information in the integrated power grid operation system is proposed. The asynchronous distributed aggregation algorithm based on hierarchical clustering is adopted to efficiently obtain the data aggregation mode of grid sensors with minimum cost and to reduce the data traffic of the power grid. In the multi-sensor aggregation distribution environment, the abstract information fusion model of the integrated power grid operation system is applied to divide multi-sensor information fusion into three levels: data level, feature level and decision level. The functional structure of multi-sensor information fusion realizes the effective fusion of multi-sensor information.
The algorithm improves the efficiency and accuracy of multi-sensor information fusion in the integrated power grid operation system. The proposed algorithm is also applicable to both centralized and distributed power grid operation systems and is highly robust, so it can reduce grid losses and improve the economic benefits of the power grid.
