Abstract
Transformers play a crucial role in ensuring the safety of power grids, and diagnosing faults from the large volume of power data generated by the grid is of great value. Latent internal faults in a transformer can be detected before they occur if its operating condition is monitored in a timely manner. To perform online fault diagnosis of grid current transformers, we combine the Transformer and BiGRU methods. The fault input sample sequences have a temporal component. The Transformer’s multi-headed attention mechanism extracts deep features from these sequences, so that the temporal association between latent variables can be fully exploited. BiGRU then takes the extracted features and generates the fault category coding as output. The experimental results indicate that the proposed algorithm achieves better results than a single model, which is useful for the study and application of fault diagnosis of current transformers in power grids.
Keywords
Introduction
In power systems, current transformers are used to measure current and provide critical information to protection devices. A current transformer is essentially a special type of transformer that unifies and standardizes various instruments and protection devices by transforming large currents on the primary side into small currents on the secondary side. It avoids the main disadvantage of the shunt, which has no electrical isolation, while achieving comparable output accuracy and voltage. Its non-contact detection function greatly enhances the safety of workers during construction or power grid maintenance, and it is convenient to use in practical projects. The safe operation of current transformers matters not only for the equipment itself but also directly affects the safe and stable operation of the power grid, so they hold an essential position in the electricity distribution system [1]. Voltage transformer fault diagnosis is currently based mainly on routine outage inspections of the power grid. Because the outage inspection cycle cannot detect latent defects in the equipment in time, equipment failures and power outages often result. For this reason, online monitoring technology has been developed for current transformers. Using fuzzy clustering state ranking and uniformity measures, Huang et al. [2] mined and analyzed real-time monitoring state vector data. The algorithm has low complexity and high operational efficiency. However, the number of state clusters must be selected manually, the results depend heavily on the selected initial state value, and the applicability of the algorithm is limited. Jia et al. [3] used an improved artificial fish swarm algorithm to optimize a wavelet neural network for diagnosing current transformer faults.
Based on the excellent window scale performance of wavelet function bases in both the time and frequency domains, Xing et al. [4] conducted fault diagnosis research using data obtained from current transformers. The approach has certain advantages in convergence accuracy, but the wavelet parameters are difficult to select. Lin et al. used thermal imaging deep convolution technology to perform semantic fault classification and detection on infrared images of current transformers and achieved good results. However, the problems of high fault noise and low superpixel detection efficiency in infrared images have not been solved well. Xing et al. combined infrared image deep convolution, CCD optical images, and other multi-source sensor data to detect defects in power equipment, using a ZigBee wireless sensor network for information transmission and multi-source Kalman filtering for data analysis to reduce errors, but the approach suffers from real-time asynchronous transmission delay. Liu et al. used a multi-scale wavelet deep neural network and multi-data fusion methods to analyze faults in power equipment signals, making full use of the high time resolution of the wavelet basis transform. This greatly improved the accuracy of fault classification, but the wavelet basis function parameters are again difficult to select. Our study explores the online monitoring and fault diagnosis of current transformers in a grid environment using a real-time prediction method that combines the Transformer with BiGRU. The method is based on the time scale, secondary output voltage amplitude, phase angle, frequency, temperature, humidity, and other types of data collected from current transformers in the equivalent test environment. The experimental results show that the proposed method brings some improvement in diagnostic accuracy and reliability and has practical value for online monitoring and metering diagnosis of transformers in power grids [5].
System overview
Composition of the overall system
The system consists of three components: a data acquisition unit, a data transmission unit, and a data analysis and monitoring unit. Distributed storage and big data computing technologies have been introduced to meet the needs of storing and optimizing massive power data. Through the online monitoring sensor head, the data acquisition unit obtains status communication information, including the insulation gas pressure, insulation temperature, and humidity. Together with the secondary voltage and secondary current digital signals, this information is formed into a data package, encoded, and then transmitted to the data transmission unit. The choice of data mainly follows the selection standards and requirements of the industry and the actual characteristics of grid sensor data. The data transmission unit has two main parts, namely the data routing gateway and the big data distributed storage. After combining and denoising the data collected by each sensor, the former determines a fixed packet length limit, taking into account the transmission period and the actual transmission rate. As a result, not all the information in a message is transmitted at once, but rather in a rotating cycle in the form of “state quantity
Diagram of the system architecture.
Diagram of the simple current transformer.
To realize online diagnosis of current transformer faults in power equipment and to acquire fault data, the established equivalent test environment platform stores and filters the collected data in all dimensions. The data are then divided into training and test sets for training and inference of the fusion algorithm, which combines the strong sequence data processing capabilities of Transformer and BiGRU.
Technical framework.
The Transformer is widely used in natural language processing, and thanks to its sequence feature modeling it has gradually spread to other application fields, such as deep feature extraction. Its breakthrough comes from the application of the parallel attention mechanism. Before the Transformer was introduced, sequence-to-sequence feature extraction networks generally used an Encoder-Decoder architecture with an RNN as the basic structure of both the encoder and the decoder. The encoder converts the input sequence into a low-dimensional Context Vector, which is then used as the input to the decoder to predict the output. RNN has achieved good results, but it has several obvious shortcomings.
The classical Transformer network structure originates from the 2017 paper Attention is All You Need [18] by Google researchers. To address the problems of sequence models, a structure based solely on attention was proposed. The Transformer consists of an encoder and a decoder: the encoder is a stack of six encoder layers, and the decoder is a stack of six decoder layers. The classical Transformer structure is illustrated in Fig. 4.
Transformer structure diagram.
The encoder consists of
Multi-headed attention contains n attention computation units. They are computed in parallel to capture the long-term and short-term dependencies of the input sequence. For dot-product attention, the n weight matrices are randomly initialized and scaled by linear matrix operations, and the results are then concatenated and output to the next layer of the Transformer. The multi-headed attention mechanism is illustrated in Fig. 5.
Multi-headed attention mechanism.
The computational procedure for the multi-headed attention mechanism is as follows:
Where LN(.) represents the layer normalization operator.
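For reference, the standard multi-head attention formulation from [18] can be written as below. This is a sketch of the usual definitions and may differ in notation from this paper: the symbols Q, K, V, the projection matrices W_i^Q, W_i^K, W_i^V and W^O, the key dimension d_k, and the sublayer input X are assumptions introduced here for illustration.

```latex
\begin{align}
\mathrm{Attention}(Q, K, V) &= \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \\
\mathrm{head}_i &= \mathrm{Attention}\big(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\big), \quad i = 1, \dots, n \\
\mathrm{MultiHead}(Q, K, V) &= \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_n)\, W^{O} \\
Z &= \mathrm{LN}\big(X + \mathrm{MultiHead}(X, X, X)\big)
\end{align}
```

The last line shows the residual connection followed by layer normalization, as used in the encoder of [18]. In self-attention the same input X supplies the queries, keys, and values.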
BiGRU is a bidirectional variant of GRU. Gated recurrent unit (GRU) networks were proposed to solve the problems of long-term memory and of gradients in back propagation. Compared with the long short-term memory (LSTM) model, GRU reduces the original three control gates to two, so its structure is relatively simple, with few parameters and high computational efficiency. When the cost in calculation error is acceptable, the GRU model is a good choice for improving the convergence speed of the model. BiGRU’s specific structure can be seen in Fig. 6.
BiGRU neural network structure.
The current hidden layer state
A one-way gated recurrent neural network is referred to as a GRU. A diagram of the GRU structure is shown in Fig. 7.
GRU neural network structure.
The computational procedure is as follows [23, 24, 25, 26]:
Where ⊙ represents the Hadamard product,
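As an illustration of the gating described in this section, the following is a minimal pure-Python sketch of one common GRU formulation and of a bidirectional pass built from it. It is not the model used in the experiments: the parameter names (Wz, Uz, bz, and so on), the helper functions, and the all-zero demonstration weights are hypothetical, and the gate convention may differ from the equations referenced above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def affine(W, x, U, h, b):
    """Compute W x + U h + b element by element."""
    return [wx + uh + bi for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]


def zero_params(input_size, hidden_size):
    """Hypothetical all-zero parameter set, for demonstration only."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        params["U" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        params["b" + gate] = [0.0] * hidden_size
    return params


def gru_step(x, h_prev, p):
    """One GRU time step: update gate z, reset gate r, candidate state."""
    z = [sigmoid(a) for a in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]
    r = [sigmoid(a) for a in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]
    # Hadamard product of the reset gate and the previous hidden state.
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Blend the previous state and the candidate state with the update gate.
    return [(1.0 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def bigru(sequence, p_fwd, p_bwd, hidden_size):
    """Run one GRU forward and one backward, then concatenate per time step."""
    h = [0.0] * hidden_size
    forward = []
    for x in sequence:
        h = gru_step(x, h, p_fwd)
        forward.append(h)
    h = [0.0] * hidden_size
    backward = []
    for x in reversed(sequence):
        h = gru_step(x, h, p_bwd)
        backward.append(h)
    backward.reverse()  # realign backward states with the original time order
    return [f + b for f, b in zip(forward, backward)]
```

Some references and libraries swap the roles of z and 1 - z in the final blend. The two conventions are equivalent up to relabeling the gate.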
Data acquisition
A test environment platform has been developed to monitor grid current transformers online, covering the four types of faults mentioned above: short circuits between turns or layers of the primary and secondary coils, short circuits in secondary circuits, overheating faults, and other faults. Five important parameters were selected and collected as samples: insulation gas pressure, insulation temperature, humidity, secondary voltage, and secondary current. A total of 3200 data sets were selected. Of these, 2000 data sets are used as training samples and 800 data sets are used as test samples. To standardize the data scale, the voltage, current, pressure, temperature, and humidity parameters were mapped to the interval [0,1]. The formula is as follows:
Where
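A mapping to [0,1] of the kind described above is commonly done with min-max scaling. The sketch below assumes that form. The function names and the choice to take the minimum and maximum from the training samples are illustrative assumptions, not details confirmed by the text.

```python
def fit_min_max(train_values):
    """Return the (minimum, maximum) of one parameter over the training samples."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Map values to [0, 1] using x' = (x - lo) / (hi - lo)."""
    if hi == lo:
        # A constant column carries no scale information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example with made-up temperature readings (not data from the paper).
lo, hi = fit_min_max([10.0, 20.0, 30.0])
scaled = apply_min_max([10.0, 20.0, 30.0], lo, hi)
```

Reusing the training minimum and maximum for the test samples keeps the two sets on the same scale, although test values outside the training range will then fall slightly outside [0,1].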
To collect the multi-dimensional sensor data required for the fault detection experiments and to address the scarcity of samples in traditional fault data, an equivalent test environment platform has been established. The platform is fast and pluggable, and can repeatedly produce a sufficiently large amount of data. The actual environment of the equivalent test environment platform and the important experimental sensor parameter models are as follows:
Important information of sensor
Important information of sensor
Experiment platform diagram.
While obtaining the test data, we also tuned the overall architecture parameters. After many experimental comparisons and grid parameter analysis, the overall structure of the Transformer and BiGRU model is as follows:
Schema parameter
The five states of the test circuit are binary coded as 0000, 0001, 0010, 0011, and 0100 for normal operation, short circuit between turns or layers of the primary or secondary coil, short circuit fault in the secondary circuit, overheating fault, and other faults, respectively. The deep learning framework is PyTorch, the GPU is an Nvidia 2060, the CUDA version is 11.3, and the Python version is 3.6. Training and inference are performed on the Windows 10 operating system. The learning rate follows a polynomial decay schedule with an initial learning rate of 0.1, 50 decay steps, a minimum learning rate of 0.005, and a decay factor of 0.5. In the experiments, the Transformer module has 3 layers, the BiGRU module has 6 layers in both directions, and there are 5 output neurons, which output the corresponding binary code, batchsize
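The label coding and learning-rate settings above can be sketched as follows. The fault-code table is taken directly from the text. The schedule is one plausible reading of "polynomial decay", in which the stated decay factor of 0.5 is assumed to be the exponent of the polynomial. The function names and that interpretation are assumptions, not the authors' confirmed configuration.

```python
# Binary codes for the five states, as listed in the text.
FAULT_CODES = {
    "0000": "normal operation",
    "0001": "short circuit between turns or layers of the primary or secondary coil",
    "0010": "short circuit fault in the secondary circuit",
    "0011": "overheating fault",
    "0100": "other faults",
}


def class_index(code):
    """Convert a binary fault code such as '0011' to an integer class index."""
    return int(code, 2)


def polynomial_decay(step, initial_lr=0.1, end_lr=0.005, decay_steps=50, power=0.5):
    """Polynomial learning-rate decay (assumed form), clamped at end_lr."""
    step = min(step, decay_steps)
    remaining = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr


# Learning rate at a few steps under the assumed schedule.
schedule = [polynomial_decay(s) for s in (0, 25, 50, 100)]
```

Under this reading the learning rate starts at 0.1, falls to the minimum of 0.005 at step 50, and stays there afterwards.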
Transformer 
From Fig. 9, we can see that the Transformer combined with BiGRU network converges after about 100 epochs. The convergence accuracy is high, about 0.01. The Transformer extracts the fault feature sequences in depth, including local near features and long-range correlation features, and BiGRU then produces the predictions from the bidirectional information. The figure shows that the proposed method is efficient in terms of both convergence speed and convergence accuracy.
To further investigate the performance difference between the proposed method and traditional methods, a comparison experiment with the GRU and Transformer algorithms is conducted. The 800 test samples are divided into 5 equal parts of 160 test samples each. The following Figs 9–11 and Table 1 show the actual grid current transformer diagram and the experimental results, respectively.
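The division of the test set into equal parts can be sketched as below. The helper names are hypothetical, and the text does not state whether the samples are shuffled before splitting. This sketch keeps them in their given order.

```python
def split_equal(samples, parts):
    """Split a list into `parts` consecutive chunks of equal size."""
    if len(samples) % parts != 0:
        raise ValueError("sample count must be divisible by the number of parts")
    size = len(samples) // parts
    return [samples[i * size:(i + 1) * size] for i in range(parts)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 800 placeholder sample indices split into 5 parts of 160 each.
parts = split_equal(list(range(800)), 5)
```

Evaluating each part separately gives five accuracy values per algorithm, which makes the comparison less sensitive to any single subset.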
Comparison of accuracy of algorithms
Comparison of accuracy of algorithms
Grid current transformer operating diagram.
Line chart comparing the effects of three methods.
Line chart comparing the effects of five methods.
As can be seen from Tables 3, 4 and Fig. 11, the GRU+Transformer combined model achieves higher accuracy in the four fault states and the normal state than the GRU or Transformer algorithm alone. In particular, for the 0001 fault and the 0011 fault, the accuracy of the GRU+Transformer combined model is about 5% higher than that of the Transformer. The average recall of the Transformer is 1.64% higher than that of GRU, and the average recall of GRU+Transformer is 4.25% higher than that of the Transformer. This shows that the GRU+Transformer combined model performs well in detecting the above five states. Compared with the SVM algorithm and a 6-layer convolutional neural network, the GRU+Transformer combined model also performs better: its average recall is higher by 8.75% and 6.75%, respectively, which shows that the combined model is superior to both traditional machine learning and a deep neural network.
In Fig. 12, the green dotted line represents the GRU+Transformer composite model, which converges after 100 epochs with a convergence accuracy of 0.01. The black solid line and the blue dotted line represent the GRU algorithm and the Transformer algorithm, respectively. The GRU algorithm converges after 80 epochs, which is faster, but its convergence accuracy is lower than that of the Transformer algorithm. The Transformer algorithm converges after 120 epochs, and its convergence accuracy is 0.015 higher than that of the GRU algorithm. The red dotted line and the yellow solid line represent the iterative behavior of the SVR algorithm and the 6-layer convolutional neural network algorithm, respectively. The final results of these two algorithms lie in the upper left corner of the decision plane, and both their iteration counts and their accuracy are poor. The line chart of the five algorithms shows that the GRU+Transformer combined model has better generalization and effectiveness.
In this paper, the four fault states and the normal state of current transformers in the power grid are studied using a combined model of BiGRU [31, 32, 33] and Transformer [34, 35], with experimental data of current transformers obtained off-line. The fault experiments show that the combined model of BiGRU and Transformer improves convergence accuracy and convergence speed compared with a single algorithm. However, the source of the experimental data used in this paper is narrow, and the comparative experiments are not yet complete or sufficient. Addressing these limitations is the next stage of the work.
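The per-state recall and average recall discussed above can be computed along the following lines. This is a sketch that assumes the average is an unweighted (macro) mean over the five states, which the text does not specify. The labels in the example are invented for illustration and are not results from the paper.

```python
def per_class_recall(y_true, y_pred, classes):
    """Recall for each class: correctly predicted samples / true samples."""
    recalls = {}
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls[c] = hits / total if total else 0.0
    return recalls


def average_recall(recalls):
    """Unweighted mean of the per-class recalls (assumed definition)."""
    return sum(recalls.values()) / len(recalls)


# Toy example with two of the five state codes.
y_true = ["0000", "0000", "0001", "0001"]
y_pred = ["0000", "0001", "0001", "0001"]
recalls = per_class_recall(y_true, y_pred, ["0000", "0001"])
mean_recall = average_recall(recalls)
```

Reporting recall per state alongside the average makes it visible when a model does well overall but misses one particular fault type.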
