Abstract
Differential protection of power transformers, as the fundamental protection, plays an important role in power system reliability and security. The main challenge in differential protection is discriminating between internal faults of power transformers and inrush current. The development of differential protection, especially the discrimination of internal faults from other disturbances, has been a popular subject in the power system protection field for decades. The traditional methods proposed so far have several shortcomings: i) high computational burden, ii) sensitivity to noise, iii) dependence on predefined threshold values, additional parameters, or different models under varying ambient conditions, and iv) dependence on handcrafted features or spectral analysis for feature extraction. Deep neural networks (DNN) are selected as a potential solution in this paper because they can capture the hierarchical features of a half-cycle of raw data. This paper proposes a convolutional neural network (CNN) in which batch normalization and the scaled exponential linear unit (SELU) are combined to enhance differential protection performance. In order to generalize the CNN-based differential protection, several external factors, i.e. the compensation error of current transformer (CT) saturation, a series compensated line, and a superconducting fault current limiter (SFCL), are considered to verify the reliability of the proposed method through different reliability metrics. The simulation and experimental results show the high reliability and speed of the proposed method.
Introduction
Power transformers are crucial components of power systems. Although power transformers are considered traditional equipment, their protection systems should be gradually upgraded as new technologies emerge in electrical networks [21]. It is crucial to protect power transformers against damage because of their important role, their high cost, and the hardly compensable consequences of their failure. Failures of power transformers are becoming increasingly important, as the power transformer market was projected to grow from 20.71 billion in 2015 to 29.91 billion by 2020 [53]. According to a CIGRE technical report, about 11.62% of power transformer failures are related to winding faults [1]. Differential protection is a basic and low-cost tool in power transformer protection, which is based on the difference between the currents on the primary and secondary sides of the transformer. The main challenge in differential protection is potential malfunction due to the inrush phenomenon [34]. The inrush phenomenon leads to inrush current, which arises from transformer energization and the core saturation curve. The magnetizing inrush current depends strongly on the transformer size, the core characteristics, the residual flux in the core, and the voltage magnitude at the energizing instant [40]. In general, the magnitude of the inrush current is large enough to cause false tripping of the differential relay.
Data-driven models are intelligent and powerful tools for complex systems and operate based on dataset analysis. The original idea of using learning-based identifiers to discriminate between transformer inrush and faults was published in 1994 [39]. No practical protective relay based on that idea has appeared since then because, besides their advantages, these machine learning methods have major disadvantages. Deep neural network (DNN) based techniques have recently emerged with promising and inherently adaptive structures [24] that resolve the shortcomings of data-driven models. DNNs are a branch of machine learning established by [20] in order to capture comprehensive and inclusive information and to enhance the confidence level in feature extraction [33]. In recent years, DNNs have exhibited remarkable capability in different areas, such as medical diagnosis and classification [42], text sentiment analysis [37], fault detection [13, 51], malware and network intrusion detection [17, 41], internet criminal activity detection [50], image processing [3], emotion recognition [16], speech recognition [12], and big data analysis [49]. In the classification process, DNN-based methods, especially convolutional neural networks (CNN) [14, 19] (originally introduced by [31]), are able to incorporate spatial and temporal information to diagnose specific anomalous signals [9].
On the other hand, external factors add major complexity to differential protection and can dramatically reduce the reliability of the protection system because of its inherent limitations. A comprehensive differential protection scheme should perform properly under these conditions. Therefore, the performance of the proposed method is evaluated in the presence of series capacitors, fault current limiters at the transformer neutral point, and current transformer (CT) saturation.
To prevent the mal-operation of differential protection, this study proposes the CNN method. As a powerful classification technique, CNNs could be a potential solution in diagnosing the internal fault of power transformers. In order to reduce the time and hardware cost of protecting large power transformers, the autonomous feature learning approach based on raw 1D signals using CNN is considered as the main goal of this paper.
This paper aims at encapsulating the discrimination of internal fault from inrush current into a single efficient block using advanced deep learning methods. The designed algorithm simultaneously speeds up and enhances the performance of differential protection of power transformers by integrating CNN with scaled exponential linear unit (SELU) activation function and batch normalization technique. The proposed method accelerates the early event detection since it uses only half-cycle data that is essential for online discrimination of inrush and fault currents. The contributions of this paper are as follows:
The need for a feature engineering process is eliminated because the proposed CNN can extract the effective features of the original signal automatically.
The proposed method performs robustly in the presence of external factors such as CT saturation, series capacitor compensation, or a superconducting fault current limiter (SFCL), which can make fault detection more difficult.
The proposed method is useful for handling large datasets since there is no need for human-expert knowledge for feature extraction or for modeling the corresponding data.
The proposed method has high accuracy and outperforms other key methods.
The rest of the paper is organized as follows: Section 2 briefly reviews the related techniques for discrimination between internal faults and inrush currents. In Section 3, the basic principle of differential protection of power transformers and the main challenge in differential protection are briefly introduced. Section 4 describes the framework of the presented CNN structure for power transformer protection. The results based on simulation and experimental data are analyzed and discussed in Section 5. Finally, Section 6 concludes the paper.
Related works
Discrimination between internal faults and inrush current has been widely investigated in the literature. In general, methods for discriminating between inrush current and internal faults of power transformers can be categorized into four typical methodologies, i.e. spectral analysis, wave shape recognition, transient signal estimation, and artificial intelligence (AI) methods.
In the first group, harmonic restraint and other spectral features are extracted and compared with constant criteria. The second harmonic restraint [46], which belongs to this group, is the most widely used method in industrial applications. However, this method does not perform properly in modern transformers because advanced core materials and power electronic infrastructure in power systems reduce the harmonic content [52]. To tackle these problems, spectral analysis techniques such as the wavelet transform (WT) have been presented [52]. Differential protection schemes based on spectral analysis are able to extract signal details over a wide frequency band. However, this group suffers from high computational burden and is extremely sensitive to noise.
The second group of differential protection methods is related to the analysis of the wave shape of measurable variables such as current [11], induced voltage [27], and instantaneous power [26]. The main advantage of this group of methods is easy implementation. However, they are highly influenced by a predefined threshold, which is obtained by trial and error.
The third group of differential protection methods for power transformers includes the algorithms based on transient signal estimation. In general, estimation-based fault detection algorithms rely on the consistency between the measured outputs of the industrial system and the estimated parameters or states [23]. In the estimation-based differential protection of power transformers, internal faults are characterized by deterministic models such as the least squares method [4, 5] and stochastic models such as the extended Kalman filter [36]. The estimation-based methods are fast and operate within less than a half-cycle window. However, their main drawback is the requirement for a priori known models of complex systems such as power transformers [10]. Moreover, the dependence on a predefined threshold still remains.
In the last group, several protection schemes have emerged from AI techniques in the last two decades. These methods do not require a predefined threshold or a model and are capable of learning complex non-linear relationships through non-statistical methods for classification. Quantitative approaches such as the artificial neural network (ANN) [8], support vector machine (SVM) [45], K-nearest neighbor (KNN) [48], and learning vector quantization (LVQ) [35] are presented in the literature. These methods are not sufficient for classification based on raw data because of their shallow structure. For this reason, hybrid techniques are more suitable for discriminating internal faults from inrush currents owing to their strong feature extraction. In general, hybrid techniques integrate spectral analysis techniques such as WT to increase the information available in the training and testing process. Although these methods give satisfactory results, their heavy reliance on hand-crafted feature extraction prevents them from offering a general solution for any power transformer or for other devices sensitive to abrupt changes. Moreover, feature extraction increases computational cost and requires additional hardware. The CNN proposed in this study, as a powerful DNN-based approach, inherently extracts features and can be implemented in the differential protection of power transformers.
Problem statement
Differential current is determined based on the difference between the currents flowing in the secondary side of CTs installed in both sides of power transformers. Differential protection is founded on the fact that the differential current is nearly zero in normal conditions. In practice, in normal conditions, a differential current might be non-zero due to the errors of measurement devices, transformer tap changers, and human error in matching the sides of transformers and CTs [6]. In order to take into account these situations, the differential current is compared with the restriction current as follows:
where I_P and I_S are the primary and secondary currents of the power transformer, which are measured through the CTs on both transformer windings. S_D is a positive constant related to the differential relay slope, whose value is usually in the range of 15% to 40% [44].
In order to generalize the discrimination approach, it is necessary to take into account the perturbations in differential protection caused by influential external factors. Series compensation in lines, resistance at the transformer neutral point, and distortion in measurement devices, especially in CTs, can be important external factors.
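As an illustration of how such a slope-based comparison behaves, the sketch below implements one common form of the percentage differential criterion. The exact restraint expression used in the paper is not reproduced here; the form |I_P - I_S| > S_D (|I_P| + |I_S|) / 2, the function name, and the default slope of 25% are assumptions chosen for illustration, with both currents assumed to be referred to the same base.

```python
def differential_trip(i_p: complex, i_s: complex, slope: float = 0.25) -> bool:
    """Return True when the differential current exceeds the restraint.

    Assumed criterion (illustrative, not taken from the paper):
        |I_P - I_S| > S_D * (|I_P| + |I_S|) / 2
    Both currents are assumed to be referred to the same base.
    """
    i_diff = abs(i_p - i_s)                      # differential current
    i_restraint = (abs(i_p) + abs(i_s)) / 2.0    # restraint current
    return i_diff > slope * i_restraint


# Small mismatch (e.g. CT error): no trip.
print(differential_trip(1.00, 0.98))
# Large mismatch (e.g. internal fault): trip.
print(differential_trip(5.00, 1.00))
```

A criterion of this kind cannot by itself separate inrush from internal faults, since both produce a large differential current, which is the gap the classifier is meant to fill.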
CTs are low-cost transformers that are widely used in industrial applications as auxiliary measurement devices. CTs might be saturated due to the nonlinear characteristic of their core [22]. The large DC component of inrush current and the reduced flux leakage due to short-circuit faults are considered the main reasons for CT saturation. Saturated inrush current, considered in this study, is a possible but rare condition. In this paper, the method presented in [18] is utilized as a supplementary method to compensate CT saturation, in order to reconstruct the measured signals distorted by saturated fault and inrush currents and to cover deep and steep saturation of the CT. However, the correction error can affect performance, which is considered and handled in the proposed method. Series capacitor compensation is another factor affecting differential protection, which is investigated in [4]. The high penetration of distributed generation (DG) in restructured power systems leads to an increase in the level of fault current. Fault current limiters (FCL) are emerging devices that can be used in the transformer neutral to limit the short-circuit fault current magnitude in these conditions. The impacts of the superconducting FCL (SFCL) on differential protection based on wave shape recognition methods are investigated in [43], which showed that an FCL can cause mal-operation of differential protection by making the magnitude of the internal fault current close to that of the inrush current. In order to address the capability and generalizability of CNN in differential protection, these external factors are considered in examining the performance of the proposed DNN-based method, which is described in detail in the next section.
Among the existing deep learning methods, CNN exhibits remarkable performance in classification. A CNN is a neural network composed of different layers, i.e. convolution, pooling, and dense layers. The main changes to the standard CNN model include the SELU activation function, batch normalization, and the dropout technique, which are detailed in the next sub-sections.
Input data preparation
In order to perform fault data detection, the CNN ought to be trained at first. The training data are obtained by simulation and measurement of differential current and output fault information. The input normalized data is
The network is trained by a set of R training data
Compared to other neural networks, the distinguishing part of a CNN is the convolution layer. The output feature map of this layer is obtained by convolving the previous feature maps and is characterized by the number and weights of the filters applied to the feature maps of the previous layer [30]. The m-th feature map at the L-th layer with activation function f from the previous (L - 1) layer, y(m, L), is given as:
where S, W, and x(n, L - 1) represent the filter size, the kernel with the learned weight matrix connecting the n-th input map to the m-th output map at the L-th layer, and the n-th input map from the previous (L - 1) layer, respectively. The subscripts wi and he denote the width and height. Note that ⊗ denotes the convolution operation.
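The convolution step described above can be sketched for one-dimensional signals as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `conv1d_valid` and `feature_map` are hypothetical, no padding is used, the sliding product is computed as a cross-correlation (the usual convention in CNN libraries), and the activation function f is left out so that only the weighted sum over input maps is shown.

```python
def conv1d_valid(signal, kernel, bias=0.0):
    """Slide `kernel` over `signal` without padding (cross-correlation)."""
    span = len(kernel)
    return [
        sum(kernel[k] * signal[i + k] for k in range(span)) + bias
        for i in range(len(signal) - span + 1)
    ]


def feature_map(input_maps, kernels, bias=0.0):
    """Sum the per-input-map convolutions to form one output feature map."""
    per_map = [conv1d_valid(x, w) for x, w in zip(input_maps, kernels)]
    return [sum(values) + bias for values in zip(*per_map)]


# A difference kernel responds with a constant value to a linear ramp.
print(conv1d_valid([0, 1, 2, 3, 4], [1, 0, -1]))
```

With S samples per kernel and an input of length N, each output map has N - S + 1 samples, which is why the window length shrinks from layer to layer when no padding is applied.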
The distribution of the activations may change during training as the CNN parameters change, which is known as internal covariate shift. Batch normalization was presented to resolve this problem [25]. It can be extended to CNN parameters and weight matrices. In this paper, batch normalization is applied to the weight matrices. The batch normalization β(L) at the L-th layer presented in [25] can be described as:
where s, μ_b, σ_b, ξ, and β(L) represent the scale parameter of the training, the minibatch mean and variance, the stability enhancement parameter, and the shifting parameter, respectively. The scale parameter s and the shifting parameter are used to restore the network capacity and to remove the bias matrices, which reduces computational complexity.
In general, activation functions such as sigmoid units, tanh units, and rectified linear units (ReLU) are utilized in DNN-based methods. In this paper, the SELU activation function presented in [29] is utilized to speed up the training process by decreasing the required memory space and to guarantee zero mean and unit variance. SELU can produce high levels of noise in the loss function, which is resolved by utilizing batch normalization. The SELU activation function is described as:
where λ and α represent control parameters, which are set to 1.04895 and 1.68001 in this paper. The control parameters adjust the slope for positive and negative inputs. In addition to controlling the mean and variance values, SELU provides a continuous curve to handle saturation in the training process.
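The normalization described above can be sketched for a single minibatch of scalar values as follows. This is a generic illustration of batch normalization in the sense of [25]; the function name and the default values of `scale`, `shift`, and `eps` (standing in for s, the shifting parameter, and ξ) are assumptions, and the way the paper applies it to weight matrices is not reproduced.

```python
import math


def batch_norm(values, scale=1.0, shift=0.0, eps=1e-5):
    """Normalize a minibatch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)                           # minibatch mean
    var = sum((v - mean) ** 2 for v in values) / len(values)   # minibatch variance
    return [scale * (v - mean) / math.sqrt(var + eps) + shift for v in values]


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

The small constant `eps` keeps the division well defined when a minibatch has almost no spread.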
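A minimal sketch of the SELU function follows, using the standard piecewise form λx for x > 0 and λα(exp(x) - 1) otherwise. The default constants are the values quoted in the text; the constants usually quoted for SELU are approximately λ = 1.0507 and α = 1.6733, so both are left as parameters here. The function name is an illustrative choice.

```python
import math

LAMBDA = 1.04895  # value quoted in the text
ALPHA = 1.68001   # value quoted in the text


def selu(x, lam=LAMBDA, alpha=ALPHA):
    """Scaled exponential linear unit: linear for x > 0, saturating for x <= 0."""
    return lam * x if x > 0 else lam * alpha * (math.exp(x) - 1.0)


for x in (-5.0, -1.0, 0.0, 1.0, 2.0):
    print(x, selu(x))
```

For large negative inputs the output approaches -λα rather than growing without bound, while positive inputs are scaled by λ.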
In order to pool feature maps from the previous layer, a pooling layer is applied as a subsampling layer to prevent overfitting and high computational cost by reducing information redundancy and highly overlapping convolution windows. The m-th output map at the L-th layer of the pooling layer, y(m, L), with SELU activation function and pooling function P(.), is given as follows:
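The subsampling effect of a pooling layer can be illustrated with the short sketch below. The text does not specify which pooling function P(.) is used, so max pooling with a non-overlapping window is assumed here purely for illustration, and the function name is hypothetical.

```python
def max_pool1d(feature_map, pool_size=2, stride=None):
    """Keep the largest value in each window; stride defaults to the window size."""
    stride = stride or pool_size
    return [
        max(feature_map[i:i + pool_size])
        for i in range(0, len(feature_map) - pool_size + 1, stride)
    ]


print(max_pool1d([1, 3, 2, 5, 4, 0]))
```

With a window of two samples the feature map length is halved, which reduces the number of values passed to the following layers.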
In the last layer of the proposed CNN classifier for differential protection, a dense layer is added to connect all hidden states and control the dimension of the CNN output. The m-th output map at the L-th layer passes through the dense layer to construct the final output map as:
where W_f represents the weight matrix of the dense layer.
DNN-based techniques for fault diagnosis are prone to overfitting due to the large number of parameters in the hidden layers. The implementation of a CNN with the SELU activation function also requires a high level of information in the training process. To address these issues, the dropout technique presented in [47] is applied in this paper.
Binary cross-entropy is considered as the loss function of a single sample, which is optimized with the ADAM algorithm [28]. The output of the pooling layer is flattened and, in the last layer, a sigmoid activation function is used to determine the fault condition and to decide whether or not to send a trip signal to the differential relay of the power transformer. The proposed structure, filter sizes, and parameters of the CNN are shown in Fig. 1.
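The idea behind dropout can be sketched as below. This shows the widely used "inverted" variant, in which the surviving activations are rescaled during training so that no change is needed at test time; the function name, the default rate of 0.5, and the use of Python's `random` module are assumptions for illustration, and the rate used in the paper is not stated in the text.

```python
import random


def dropout(activations, rate=0.5, training=True, rng=None):
    """Randomly zero a fraction `rate` of activations and rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)          # dropout is disabled at test time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0)))
```

Because each training pass sees a different random subset of units, the network cannot rely on any single hidden state, which is what limits overfitting.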
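The output stage described above can be sketched as follows: a sigmoid maps the final dense-layer output to a probability of internal fault, binary cross-entropy scores that probability against the label, and a threshold turns it into a trip decision. The function names and the 0.5 threshold are assumptions for illustration; the ADAM optimization step is not shown.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p, eps=1e-12):
    """Loss of a single sample; `p` is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def trip_signal(score, threshold=0.5):
    """Send a trip only when the predicted fault probability reaches the threshold."""
    return sigmoid(score) >= threshold


print(trip_signal(2.0), trip_signal(-2.0))
print(binary_cross_entropy(1, sigmoid(2.0)))
```

The loss is small when the predicted probability agrees with the label and grows quickly for confident wrong predictions, which penalizes both missed faults and false trips during training.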

The flowchart of the proposed CNN-based differential protection
In this section, five case studies are discussed to address the advantages of the proposed differential protection scheme. The case studies are: 1) a simple case study without external factors; 2) CT saturation caused by fault or inrush current; 3) a power transformer with series capacitor compensation; 4) a transformer with an SFCL at the neutral point; and 5) the experimental setup. The PSCAD/EMTDC software package is used to simulate a 230 kV network with four power transformers (Fig. 2). The generated data is further processed with the Keras package [2] on a computer with a 3.4 GHz processor, 32 GB memory, a 64-bit OS, and a GeForce GTX 1080 Ti graphics processing unit (GPU). In all cases, the samples are randomly partitioned into training and testing sets with a 70% to 30% ratio, respectively.
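The random 70/30 partition can be sketched as below. The function name, the fixed seed, and the rounding rule for the cut point are assumptions made for reproducibility of the illustration; the paper does not state how the partition was implemented.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and cut them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(4488))
print(len(train), len(test))
```

Shuffling before cutting matters because the simulated samples are generated scenario by scenario, so an unshuffled cut would leave whole fault types out of the training set.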

Single line diagram of a modified part of the 230 kV Iranian power network
For the first, third, and fourth cases, 4488 samples are generated for fault current, inrush current, and external faults, excluding any external factors. Fault currents are generated based on different fault types, winding connections, source impedances, fault inception angles, and fault locations. Inrush currents are generated considering various residual fluxes, winding connections, source impedances, switching instants, and loads. The external fault dataset contains 8 fault types, 12 fault inception angles, and 4 impedance values. In the second case, the Jiles-Atherton model for the CT [7] is used to generate data samples for Transformer 2 under short-circuit fault and inrush current conditions. 1200 fault current CT saturation samples are generated in different conditions covering 4 different CT burdens, 5 fault types, 12 fault inception angles, and 5 different fault resistances. 864 samples are generated for inrush current CT saturation involving 12 switching inception angles, 4 CT burdens, 3 different load levels, 2 winding types, and 3 different source impedance values. Moreover, in order to demonstrate the applicability of the proposed CNN, experimental test data is generated and examined. A series capacitor can be utilized to improve power transfer capacity, voltage regulation, stability, and loss reduction. Therefore, the differential protection scheme is examined in the presence of a series capacitor in the third case. Series capacitor compensation can increase the fault current and cause sudden changes in the angle and sequence of the three-phase currents. Fault current can be limited in less than a half cycle by a high impedance at the neutral point of a power transformer added by the SFCL, which is illustrated in Fig. 4. As a result, the differential relay cannot diagnose the fault to send the trip signal. In the fourth case study, a variable resistance is used to fix this problem. This problem can happen during an inrush current, too.
The SFCL model is added to Transformer 3 in Fig. 2. The parameters of the SFCL are set based on the procedure in [43]. Appendix A gives the data of the test system in Fig. 2. Sample waveforms of the studied cases are shown in Fig. 3.
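The sample counts quoted above follow from multiplying the number of values of each varied parameter. The short check below reproduces the two CT saturation totals; the dictionary layout and labels are illustrative, and the external-fault product is only what a full grid over the listed parameters would give, since the text does not state that total.

```python
from math import prod

scenario_grids = {
    "CT-saturated fault": {"CT burdens": 4, "fault types": 5,
                           "inception angles": 12, "fault resistances": 5},
    "CT-saturated inrush": {"switching angles": 12, "CT burdens": 4,
                            "load levels": 3, "winding types": 2,
                            "source impedances": 3},
    # Assumption: every combination of the listed external-fault parameters is used.
    "external fault (full grid)": {"fault types": 8, "inception angles": 12,
                                   "impedance values": 4},
}

counts = {name: prod(grid.values()) for name, grid in scenario_grids.items()}
for name, count in counts.items():
    print(f"{name}: {count} samples")
```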

Sample waveform of five cases a) simple case, b) deep saturation, c) steep saturation, d) series capacitor compensation, e) SFCL and f) experimental prototype

Experimental test system
The experimental data is gathered using a 1 kVA, 50 Hz, 380/380 V three-phase transformer shown in Fig. 4. Internal points in the middle of the windings are accessible through terminals to create internal faults. 0.6-3 kV, 2.5 VA CTs are used for current measurement. Measurements were recorded by a digital storage oscilloscope with a 7.8 kHz sampling rate. In this case, 5 fault types, 12 randomly generated fault inception angles, 4 fault resistances, 3 winding connections, and 5 different levels of resistive load are considered to generate 3600 samples for internal faults of the power transformer. Due to the limits we faced in selecting the devices, the rated values of the experimental devices are lower than those of transmission-scale devices. However, the proposed approach uses normalized data. Therefore, the simulated and experimental results can be compared.
To evaluate the performance of the CNN-based protection scheme, a half-cycle of each sample is extracted and normalized as follows:
where I_D-nom is the normalized differential current, which is considered as the input of the CNN method.
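The window extraction can be sketched as below. At the experimental sampling rate of 7.8 kHz and a 50 Hz system frequency, one cycle holds 156 samples and a half-cycle holds 78. The normalization rule itself is not reproduced in the text, so scaling by the peak absolute value of the window is used here as an assumption, and both function names are hypothetical.

```python
def half_cycle_length(sampling_rate_hz, system_frequency_hz):
    """Number of samples in half a power-frequency cycle."""
    return round(sampling_rate_hz / system_frequency_hz / 2)


def normalize_window(samples, start, length):
    """Extract a window and scale it by its peak absolute value (assumed rule)."""
    window = samples[start:start + length]
    peak = max(abs(v) for v in window)
    return [v / peak for v in window] if peak else list(window)


print(half_cycle_length(7800, 50))
print(normalize_window([0.0, 2.0, -4.0, 1.0, 3.0], start=1, length=3))
```

Whatever rule is used, normalization removes the dependence on rated current, which is what allows the small laboratory transformer and the 230 kV simulation to share one input scale.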
The comprehensive evaluation of each case is presented in this section. To this end, the overall results obtained from the simulation and experiments are presented in several metrics which are discussed individually in the following sub-sections.
The metric is defined based on the “Confusion Matrix”, which includes true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Accuracy is the ratio of the number of correctly diagnosed patterns to the total number of patterns, which is given as:
The accuracy metric (ACC) covers the main aim in classification, as it shows the correctness of the model. Five different fault types are tested to verify the performance of the proposed differential protection scheme, namely line-to-ground (LG), line-to-line (LL), line-to-line-to-line (LLL), line-to-line-to-ground (LLG), and primary-to-secondary (P-S), together with inrush current and external faults (Ex). Fig. 5 shows the ACC obtained from the proposed CNN-based scheme. All values are at least 98.1375%, the value obtained in the experimental case for the LLL fault. In the experimental results, noise can dramatically reduce the ACC value. This impact can be much higher in a small-scale transformer. Despite this, the results remain high. Except for one case, the accuracy values are higher than 98% in all cases. In three cases, ACC is 100%, and among the 28 different evaluated conditions, the average value of ACC is 99.0147%. As can be seen in Fig. 5, Case 2 is not included in the accuracy analysis. In Case 2, two different conditions are involved: saturated fault and saturated inrush currents. In both conditions, ACC equals 98.06%. The SFCL at the neutral point of the power transformer can be considered the most disruptive external factor, because the lowest values of ACC occur in this condition. For instance, ACC equals 100%, 98.91%, 98.55%, and 99.52% for the LG fault in Cases 1, 3, 4, and 5, respectively. Consequently, the results demonstrate the high accuracy of the CNN-based method. However, ACC cannot cover all the aspects of a differential protection scheme. In the next sub-sections, several further aspects of the CNN-based power transformer protection scheme are evaluated.

Accuracy of the proposed method in different cases and fault and inrush conditions
In a classification problem, ACC can appear to be a perfect metric. To illustrate the incompleteness of ACC, assume a confusion matrix with zero TP and all TN, or all TP and zero TN. In this condition, ACC shows 100% accuracy. Although this is a very rare condition, a different metric should be defined to support the evaluation of the classification method. Dependability (DEP) is one of these metrics, which is calculated as follows:
The DEP metric shows the ability of the proposed CNN-based differential protection method to detect and isolate the fault. In other words, the DEP metric is the ratio of the number of detected internal faults to the number of all internal faults used during the testing stage. Fig. 6 shows the DEP obtained from the test results of Cases 1, 3, 4, and 5. While the maximum value is 100% in Case 1 for the LG fault, inrush current, and external faults, the minimum DEP, 95.3846%, belongs to the detection of external faults in Case 5. In the practical case study, the dependability reduction is caused by waveform distortion in the differential current. However, the range of the DEP metric ([95.3846%, 100%]) can be considered acceptable performance.
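The relation between ACC and DEP can be made concrete with the short sketch below. ACC follows the standard confusion-matrix definition, and DEP is written as TP / (TP + FN), which is how the verbal description above reads; the counts in the example are invented solely to illustrate the point and are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all test patterns that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def dependability(tp, fn):
    """Share of actual internal faults that are detected."""
    return tp / (tp + fn)


# Hypothetical relay that never trips on a test set with few internal faults.
tp, tn, fp, fn = 0, 95, 0, 5
print(accuracy(tp, tn, fp, fn))   # high, although every fault is missed
print(dependability(tp, fn))      # exposes the missed faults
```

A classifier can therefore score well on ACC while failing the one task a relay must not fail, which is why DEP is reported alongside it.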

Dependability of the proposed method in different cases and fault and inrush conditions
In addition, DEP is 98.61% in Case 2, large enough to validate the dependability of the method in Case 2.
Safety (SAF) of the classification method is calculated to evaluate the capability of the CNN-based method to isolate internal faults from other disturbances. Therefore, SAF demonstrates the ability of a fault diagnosis method where inrush current and external fault are not mistakenly detected as internal fault. SAF metric is given as:
Fig. 7 shows the safety values of the method. As expected, the safety values are very high (above 98%). SAF is between 98.05% and 100%, which demonstrates the ability of the CNN-based method to distinguish non-internal faults from internal faults. In Case 2, SAF is 98.05%, the lowest value among all cases for the proposed method.

Safety of the proposed method in different cases and fault and inrush conditions
The security (SEC) metric shows how well the differential protection avoids false trips. SEC can be dramatically reduced by loss of information from a measurement device and by experimental noise [15]. SEC is determined as the ratio of the number of detected inrush currents and external faults to the actual number of inrush currents and external faults. SEC is given as:
The results in Fig. 8 show that the proposed CNN-based method is robust against practical noise. The lowest SEC is 97.29%, in Case 2. The average SEC is 99.0147% over all cases. The SEC values show that the proposed method is able to properly prevent the differential relay from sending false trips.
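Read against the confusion matrix, the verbal definition of SEC corresponds to TN / (TN + FP), where a "negative" is an inrush current or external fault. The sketch below uses that reading as an assumption; the function name and the counts in the example are invented for illustration only.

```python
def security(tn, fp):
    """Share of inrush and external-fault events that correctly cause no trip."""
    return tn / (tn + fp)


# Hypothetical test set: 200 non-internal events, 3 of which cause a false trip.
print(security(tn=197, fp=3))
```

Each false positive here corresponds to an unnecessary disconnection of a healthy transformer, so SEC is the metric most directly tied to false tripping.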

Security of the proposed method in different cases and fault and inrush conditions
For the sake of comparison, the proposed DNN-based method for distinguishing inrush current, external faults, and internal faults is compared with four typical shallow AI classifiers, which are briefly described as follows.
ANN performs with 86 neurons in the input layer, two parallel hidden layers with 25 and 13 neurons, and one neuron in the output layer. ANN is also integrated with the discrete wavelet transform (WT). KNN performs with one nearest neighbor based on Euclidean distance and Bayes' decision rule. LVQ performs based on Euclidean distance, with 6 input and 6 hidden layers with corresponding rate of 1 and 2 neurons in the output layer. SVM performs with a radial basis function (RBF) kernel and cross-validation.
Classification results using the aforementioned metrics, also known as reliability metrics, are shown in Fig. 9. The overall reliability offered by the CNN and the existing AI classifiers is extensively evaluated in this figure. Case 1 is a simple benchmark; therefore, its results are omitted from Fig. 9. The external factors reduce the reliability in Cases 2-5 compared with Case 1. The reliability metrics measure the ability of the classifier to distinguish an internal fault from non-fault disturbances within a half-cycle of raw data. The superiority of the proposed method over the existing shallow methods can be seen in Fig. 9. The poor performance of several AI classifiers used in this paper may be due to the fact that shallow classifiers are unable to perform properly with this information content and with the wide variation of inrush current and internal fault conditions. In contrast, CNN is capable of extracting features owing to its inherently adaptive structure and its independence from a separate signal processing or feature extraction algorithm. Case 5 illustrates the superiority of CNN. Compared with SVM, KNN, LVQ, ANN+WT, and ANN, CNN improves accuracy by 7.24%, 7.1%, 7.54%, 9.49%, and 16.05%, respectively. For the test set, the total average accuracies of CNN, SVM, KNN, LVQ, ANN+WT, and ANN are 98.51%, 92.27%, 91.52%, 91.27%, 89.02%, and 83.57%, respectively. Increasing the length of the data window and adding a feature extraction technique might enhance the reliability of the shallow methods, but this leads to high computation time and extra hardware. The results above and in Fig. 9 show that the classifier in the proposed method yields higher diagnosis accuracy than the existing shallow AI-based methods.

Reliability metrics comparison between the proposed CNN and shallow structure classifiers: (a) Case 2: CT saturation consideration, (b) Case 3: in presence of series capacitor, (c) Case 4: SFCL impact, (d) Case 5: experimental
In the implemented method, the training time of the proposed CNN-based system is around 8.12 minutes, whereas the training time of the standard CNN was 10.14 minutes. The total time needed to detect an internal fault from a single differential current sample is 9.48 ms for the proposed CNN and 13.56 ms for the standard CNN.
In this paper, a CNN-based digital protection scheme for power transformers is proposed. The robustness and effectiveness of the proposed method are verified using five different cases, i.e. a simple benchmark, CT saturation compensation error impacts (deep and steep saturation), the presence of a series capacitor compensated line, an SFCL located at the neutral point of power transformers, and an experimental setup. The SELU activation function and the batch normalization technique are integrated into the standard CNN to further improve the training and testing times by about 22.48% and 21.97%, respectively. The proposed method directly processes a half-cycle window of raw data and does not require any feature extraction technique, resulting in a more efficient differential relay in terms of both speed and hardware. The reliability of the proposed method is measured through accuracy, dependability, safety, and security, which are more than 98%, 95%, 97%, and 98%, respectively. Some well-known shallow classifiers, i.e. SVM, KNN, LVQ, ANN+WT, and ANN, were applied to the same simulation and experimental data to compare their reliability. CNN improved the total average accuracy of SVM, KNN, LVQ, ANN+WT, and ANN by 6.76%, 7.63%, 7.93%, 8.11%, and 17.87%, respectively. These improvements are due to the automatic extraction of features from raw data. The results demonstrate that the speed and reliability of the proposed CNN-based differential protection scheme for power transformers are very promising and can achieve significant improvements.
Appendix
Table 1 illustrates the parameters of the test system in Fig. 2.
