Abstract
In order to improve the accuracy and reliability of fault diagnosis for oil-immersed power transformers, a fault diagnosis method based on a Modified Artificial Gorilla Troops Optimizer (MGTO) and Stochastic Configuration Networks with Block Increments (BSCN) is proposed. First, the original Artificial Gorilla Troops Optimizer (GTO) is improved, which effectively increases its convergence speed and optimization accuracy. Second, the learning scheme of conventional Stochastic Configuration Networks (SCN) is modified when the fault diagnosis model is constructed: the original SCN adds hidden nodes one at a time (point increments), whereas BSCN adds them in blocks (block increments), which significantly accelerates training. The MGTO algorithm is then used to jointly optimize the regularization parameter and the scale factor of the BSCN model, yielding the fault diagnosis model with the highest accuracy. Experimental results show that MGTO-BSCN reaches a transformer fault diagnosis accuracy of 95.9%, which is 3.5, 9.9 and 11.7 percentage points higher than BSCN models optimized by GTO, the Grey Wolf Optimizer (GWO) and Particle Swarm Optimization (PSO), respectively, reflecting the superiority of the MGTO algorithm. A comparison with traditional models further shows that the proposed method has clear advantages in diagnostic performance.
Introduction
As the key hub equipment of the power system, the power transformer is mainly responsible for transforming, distributing and transmitting electric energy in the power grid, and it is essential for reliable power supply [1]. Accurately assessing the working condition of transformers is crucial, since it directly affects the grid's ability to supply electricity reliably [2]. Dissolved Gas Analysis (DGA) is a common and reliable technique for diagnosing transformer faults. When a transformer fault occurs, significant amounts of gas can be detected in the insulating oil, and the gas content exhibits a strong nonlinear correlation with the fault type [3–5].
Traditional DGA-based transformer fault diagnosis methods include the Duval triangle method [6], the improved three-ratio method [7] and others. However, most of these techniques are empirical and their coding boundaries are overly rigid, so the reliability of the diagnosis cannot be fully guaranteed. With the maturing of artificial intelligence, machine learning and data mining, various intelligent learning methods have been applied to fault diagnosis [8]. A support vector machine fault diagnosis model based on kernel principal component analysis and a hybrid improved seagull optimization algorithm was proposed in [9]. In that work, kernel principal component analysis was used to extract features from DGA data, and tent mapping, a nonlinear inertia weight and a random double-helix formula were used to improve the seagull optimization algorithm. The diagnostic accuracy of the model was greatly improved, but problems such as slow convergence and insufficient accuracy remained during training and diagnosis. In [10], a power transformer fault diagnosis method was proposed in which the Harris hawks optimization algorithm tunes a Kernel Extreme Learning Machine (KELM). In this method, non-coded gas ratios were used as the input feature vector of the KELM model, which improved its fault diagnosis performance. In [11], a transformer fault recognition method based on an adaptive extreme learning machine was proposed to cope with the growing scale and complexity of transformer condition data. The diversity regulation mechanism and storage mechanism of the immune algorithm were used to divide the particle population into superior and inferior particles, which were evolved in different ways. The immune-algorithm-improved PSO effectively overcomes the tendency of the population to converge prematurely and improves the global optimization ability.
However, both of the above methods search slowly during network training. In [12], a novel Double-Stacked Autoencoder (DSAE) was proposed for fast and accurate assessment of power transformer health conditions with imbalanced data. Three problems affecting diagnosis effectiveness were overcome by the DSAE framework, an aging-tolerance criterion and an advanced sparse deep clustering network, but the diagnostic results were sensitive to the model parameters. With the development of deep learning, neural networks have been widely used in fault diagnosis [13]. In [14], a new transformer fault diagnosis model was proposed by combining the advantages of a convolutional neural network and a Long Short-Term Memory network. The model identifies faults well, but it cannot perform online diagnosis and can only identify partial discharge faults. In [15], the bat algorithm was used to optimize the smoothing factor of a probabilistic neural network. This method overcame the tendency of the probabilistic neural network to fall into local extrema and achieved high fault diagnosis accuracy, but the data samples were unbalanced and minority classes were poorly recognized. To improve the overall accuracy of fusion diagnosis, a fusion diagnosis method based on an artificial neural network was proposed in [16]. The weights of the independent methods in the fusion process were assigned intelligently according to each method's detection accuracy for each fault type over the range of input gas concentrations. Good results were obtained, and the method performed well in online application. Elsisi et al. [17] combined an IoT architecture with deep learning to propose a novel one-dimensional convolutional neural network, which provides safe online monitoring of transformer condition with strong robustness and high fault diagnosis accuracy. However, these two methods require a large number of complete training samples, and their learning cost is high. Although the above diagnostic methods have achieved certain results, their overall diagnostic efficiency and accuracy are still insufficient.
To address the insufficient overall diagnostic efficiency and accuracy of existing transformer fault diagnosis methods, this paper proposes a transformer fault diagnosis model based on MGTO and BSCN, with which the transformer condition can be detected promptly and precisely. The main contributions of this paper are as follows. (1) The original GTO algorithm is improved, which effectively increases its convergence speed and optimization accuracy. (2) When building the model, the traditional point-increment learning of SCN is replaced with block-increment learning, which greatly improves training speed. (3) The MGTO algorithm is used to jointly optimize the regularization parameter and the scale factor of the BSCN model to construct the fault diagnosis model with the highest accuracy. (4) Comparisons with Extreme Learning Machine (ELM), Support Vector Machine (SVM) and BSCN models optimized by different algorithms verify the high accuracy and feasibility of the MGTO-BSCN fault diagnosis model.
MGTO
GTO
Abdollahzadeh et al. proposed GTO in 2021 [18] by studying the collective behavior of gorilla troops. The algorithm searches for the optimum by imitating the social life of gorillas, and it has strong optimization ability and fast convergence. The update equation of the GTO exploration phase is given in Equation (1).
In Equation (1), $GX(t+1)$ is the candidate position vector of a gorilla in iteration $t+1$; $r_1$, $r_2$, $r_3$ and rand are random values in the range [0, 1] that are updated in each iteration; and $p$ is a parameter in the range [0, 1] that must be set before the optimization and determines the probability of selecting the migration-to-an-unknown-location mechanism. $ub$ and $lb$ are the upper and lower bounds of the variables, respectively. $X_r$ is a member selected at random from the entire gorilla population, and $GX_r$ is a randomly selected candidate position taken from the positions updated in each phase. $C$, $L$ and $H$ are calculated using Equations (2), (3) and (4), respectively.
In Equation (2), $It$ denotes the current iteration number and $MaxIter$ the total number of iterations of the optimization. $L$ is calculated using Equation (3), where $l$ is a random value in the range [−1, 1]. $Z$, used in computing $H$, is a random vector with the dimension of the problem whose entries lie in [−C, C].
At the end of the exploration phase, the cost of every GX solution is calculated; if the cost of GX(t) is lower than that of X(t), GX(t) replaces X(t). The best solution generated in this phase is then taken as the silverback.
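The exploration update and the greedy replacement described above can be sketched in NumPy as follows. This is a minimal sketch that follows the formulation of the original GTO paper [18]; in particular, F = cos(2·r4) + 1 inside C, a fresh random draw for each case test, and computing C and L once per iteration are assumptions taken from that formulation rather than from Equations (1)–(4) as printed here. The function names are hypothetical.

```python
import numpy as np

def gto_coefficients(it, max_iter, rng):
    """C (Eq. 2) and L (Eq. 3), assuming the original GTO form C = F * (1 - It/MaxIter)."""
    F = np.cos(2 * rng.random()) + 1      # F = cos(2*r4) + 1 (assumed, from [18])
    C = F * (1 - it / max_iter)           # decreases towards 0 over the run
    L = C * rng.uniform(-1, 1)            # L = C * l, l ~ U(-1, 1)
    return C, L

def exploration_step(X, it, max_iter, lb, ub, p, rng):
    """Candidate positions GX(t+1) for a population X of shape (pop, dim)."""
    pop, dim = X.shape
    C, L = gto_coefficients(it, max_iter, rng)
    GX = X.copy()                                   # candidate positions of this phase
    for i in range(pop):
        if rng.random() < p:                        # migrate to an unknown location
            GX[i] = (ub - lb) * rng.random() + lb
        elif rng.random() >= 0.5:                   # move towards another member X_r
            Z = rng.uniform(-C, C, size=dim)        # Z in [-C, C]^dim
            H = Z * X[i]                            # Eq. (4): H = Z * X(t)
            X_r = X[rng.integers(pop)]
            GX[i] = (rng.random() - C) * X_r + L * H
        else:                                       # move towards a known location GX_r
            GX_r = GX[rng.integers(pop)]
            r3 = rng.random()
            GX[i] = X[i] - L * (L * (X[i] - GX_r) + r3 * (X[i] - GX_r))
    return np.clip(GX, lb, ub)                      # keep candidates inside [lb, ub]

def keep_better(X, GX, fitness):
    """Greedy replacement: GX(t) replaces X(t) when its cost is lower; also returns the silverback."""
    fx = np.array([fitness(x) for x in X])
    fg = np.array([fitness(g) for g in GX])
    better = fg < fx
    X_new = np.where(better[:, None], GX, X)
    f_new = np.where(better, fg, fx)
    return X_new, X_new[np.argmin(f_new)].copy()
```

In an optimizer loop, `exploration_step` would be followed by `keep_better`, whose second return value serves as the silverback for the exploitation phase.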
In the exploitation phase of GTO, two behaviors are applied: following the silverback and competition for adult females. W is a parameter set before the optimization: if C ⩾ W, the follow-the-silverback mechanism is selected; if C < W, the competition-for-adult-females mechanism is used.
The follow-the-silverback mechanism is expressed as
In Equation (5), $X_{silverback}$ is the position vector of the silverback gorilla (the best solution), and L is calculated by Equation (3). In Equation (6), N is the total number of gorillas.
The competition-for-adult-females mechanism can be expressed as
In Equation (7), Q simulates the impact force, and A is a coefficient vector, calculated with Equation (9), that determines the degree of violence in the conflict. β is a parameter that must be set before the optimization.
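Both exploitation mechanisms can be sketched as one function. As with the exploration sketch, the concrete forms are assumptions taken from the original GTO paper [18], since Equations (5)–(10) are not reproduced in the text: M is the generalized mean of the candidate positions with g = 2^L, Q = 2·r5 − 1, A = β·E, and E is either a vector or a single draw from the standard normal distribution. C, L, W and β are passed in by the caller; the function name is hypothetical.

```python
import numpy as np

def exploitation_step(X, GX, silverback, C, L, W, beta, rng):
    """One exploitation pass of GTO (form assumed from [18]) for X of shape (pop, dim)."""
    pop, dim = X.shape
    new = np.empty_like(X)
    for i in range(pop):
        if C >= W:
            # Follow the silverback, Eqs. (5)-(6): M is a generalized mean of the candidates.
            g = 2.0 ** L
            M = (np.abs(GX.mean(axis=0)) ** g) ** (1.0 / g)
            new[i] = L * M * (X[i] - silverback) + X[i]
        else:
            # Competition for adult females, Eqs. (7)-(10).
            Q = 2 * rng.random() - 1                  # impact force in [-1, 1]
            if rng.random() >= 0.5:
                E = rng.standard_normal(dim)          # vector of normal draws
            else:
                E = rng.standard_normal()             # single normal draw
            A = beta * E                              # degree of violence
            new[i] = silverback - (silverback * Q - X[i] * Q) * A
    return new
```

Two sanity properties follow directly from the formulas: with L = 0 the follow-the-silverback update leaves every position unchanged, and a gorilla already sitting on the silverback stays there under the competition update.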
Although GTO has strong global search ability and fast convergence, it easily falls into local optima in the later iterations of high-dimensional optimization problems. To improve its global optimization ability and avoid local optima, Chebyshev chaotic mapping, a Gaussian mutation strategy and a firefly disturbance strategy are introduced to obtain MGTO.
Chebyshev mapping
When a metaheuristic algorithm is used for parameter search, the initial positions of the population individuals strongly affect search performance [19]. Chaotic variables are random, ergodic and regular, so chaotic mapping can be used to initialize the population, distributing it evenly and improving the convergence speed and optimization accuracy of the algorithm [20]. This paper uses Chebyshev chaotic mapping, whose mathematical expression is simple, whose ergodic distribution is uniform, and which is only weakly affected by the initial value and is therefore robust. The Chebyshev map is expressed as follows:
In Equation (11), r denotes the chaotic coefficient, and the map generates values in (−1, 1). The distribution of the Chebyshev chaotic sequence for different values of r is shown in Fig. 1.

Chebyshev mapping.
As can be seen from Fig. 1(a), when r is 1 the iterated sequences are uncorrelated however close the chosen initial values are, which means that the values of x between −1 and 1 are chaotic and ergodic. The Chebyshev chaotic sequence with r = 1 is therefore selected for initialization, and the number of mapping iterations is set to 500. The distribution of the chaotic sequence is shown in Fig. 1(b).
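A sketch of the chaotic population initialization (Step 2 of the procedure below) follows. Equation (11) is not reproduced in the text, so this sketch assumes the common Chebyshev map x_{k+1} = cos(r · arccos x_k), iterated per dimension from a random seed value, with each value mapped linearly from [−1, 1] onto [lb, ub]; the seeding and the mapping are assumptions. In this particular form r = 1 leaves x unchanged (cos(arccos x) = x), so the sketch defaults to r = 4, an order for which the map is chaotic; the form of Equation (11) used with r = 1 may therefore differ from this one.

```python
import numpy as np

def chebyshev_init(pop, dim, lb, ub, r=4, seed=0):
    """Chaotic population initialization with the Chebyshev map x <- cos(r * arccos(x))."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=dim)          # one random seed value per dimension
    P = np.empty((pop, dim))
    for i in range(pop):
        x = np.cos(r * np.arccos(np.clip(x, -1.0, 1.0)))   # next chaotic value in [-1, 1]
        P[i] = x
    scaled = lb + (P + 1) / 2 * (ub - lb)     # map [-1, 1] linearly onto [lb, ub]
    return np.clip(scaled, lb, ub)            # guard against floating-point overshoot
```

For MGTO-BSCN, dim would be 2 (one coordinate for each of the two BSCN parameters) and lb, ub the search bounds of those parameters.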
After the exploration phase, GTO enters the exploitation phase. When C < W, the competition-for-adult-females mechanism is used: weaker males are likely to lose the competition with other adult males, so most female gorillas are distributed near the strong individuals in the population. This distribution resembles a Gaussian distribution, so Gaussian mutation is introduced as a perturbation. Figure 2 shows the Gaussian distribution with a mean of 0 and a standard deviation of 1. Under this distribution, most individuals search near their original positions while a few move far away, which increases population diversity. This helps explore promising regions, improves search speed and gives good robustness. The Gaussian mutation is given in Equation (12).

Gaussian distribution map.
In Equation (12), the mean u is set to 0 and the standard deviation σ is set to 1; $x_j$ and $x_i$ denote the positions of two randomly selected gorillas in the population.
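Equation (12) itself is not reproduced here, so the sketch below assumes one plausible reading of the description: each gorilla $x_i$ is paired with a different, randomly selected gorilla $x_j$, and the mutated position is $x_i + N(0, 1) \cdot (x_j - x_i)$ with an independent draw per dimension. The form and the function name are assumptions for illustration only.

```python
import numpy as np

def gaussian_mutation(X, lb, ub, rng, mean=0.0, std=1.0):
    """Hypothetical Gaussian perturbation: x_i + N(mean, std^2) * (x_j - x_i)."""
    pop, dim = X.shape
    out = X.copy()
    for i in range(pop):
        j = (i + rng.integers(1, pop)) % pop      # a random partner j != i (needs pop >= 2)
        step = rng.normal(mean, std, size=dim)    # mostly small steps, occasionally large ones
        out[i] = X[i] + step * (X[j] - X[i])
    return np.clip(out, lb, ub)                   # keep positions inside the search bounds
```

Because most draws are close to 0, most individuals stay near their original positions while a few jump far away, which is the diversity effect described above. The group operation of Step 5 would then keep a mutated position only if it improves fitness.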
In the exploitation phase, when C ⩾ W, the follow-the-silverback mechanism is chosen and the silverback makes all decisions. If the silverback is not the global best, the search can become trapped in a local optimum. In the firefly algorithm, each firefly carries its own luciferin and, regardless of sex, is attracted to and moves toward any firefly brighter than itself. The algorithm is simple to implement and has few parameters, so it is added as a disturbance to the follow-the-silverback mechanism. By borrowing the group communication of the firefly algorithm, the improved spatial search adds an information-exchange function to the population, which alleviates the tendency to fall into local optima and improves global search ability and optimization accuracy. The movement of a dimmer firefly toward a brighter one is given in Equation (13).
In Equation (13), ρ0 is the maximum attractiveness, λ is the light absorption coefficient, α is a step factor taking a random value in [0, 1], and $x_j$ is a firefly brighter than $x_i$.
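The sketch below uses the standard firefly movement $x_i \leftarrow x_i + \rho_0 e^{-\lambda r_{ij}^2}(x_j - x_i) + \alpha(\mathrm{rand} - 0.5)$, where $r_{ij}$ is the Euclidean distance and a lower cost means a brighter firefly. Since Equation (13) is not shown, the random term α(rand − 0.5), the sweep over all brighter fireflies, and the default parameter values are assumptions.

```python
import numpy as np

def firefly_disturb(X, fitness, rho0=1.0, lam=1.0, alpha=0.2, rng=None):
    """Move each firefly towards every brighter (lower-cost) one, standard FA style."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.array(X, dtype=float)                      # work on a float copy
    f = np.array([fitness(x) for x in X])
    n, dim = X.shape
    for i in range(n):
        for j in range(n):
            if f[j] < f[i]:                           # x_j is brighter than x_i
                r2 = np.sum((X[i] - X[j]) ** 2)       # squared distance r_ij^2
                attract = rho0 * np.exp(-lam * r2)    # attractiveness decays with distance
                X[i] = X[i] + attract * (X[j] - X[i]) + alpha * (rng.random(dim) - 0.5)
                f[i] = fitness(X[i])                  # brightness changes after moving
    return X
```

With α = 0 the update is deterministic: a firefly at distance 1 from a brighter one moves a fraction e^(−1) of the way towards it when ρ0 = λ = 1.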
The Griewank function and the Generalized penalized function are used in simulation experiments to verify the optimization performance of MGTO, which is compared with the standard GTO, PSO [21], GWO [22], the Firefly Algorithm (FA) [23] and the Sine Cosine Algorithm (SCA) [24]. For a fair comparison, all algorithms use the same population size and the same maximum number of iterations: the population size is 50 and the maximum number of iterations is 500.
The two-dimensional search surface of the Griewank benchmark function is shown in Fig. 3, and its expression is given in Equation (14).

Test function f1 (x).
In Equation (14), each $x_i$ ranges over [−600, 600], and the global minimum value is 0.
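For reference, the standard Griewank function, $f_1(x) = 1 + \frac{1}{4000}\sum_i x_i^2 - \prod_i \cos(x_i/\sqrt{i})$, can be written as below; we assume Equation (14) uses this usual form. Its global minimum value 0 is attained at x = 0.

```python
import numpy as np

def griewank(x):
    """Standard Griewank function; global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)                  # 1-based index inside the cosine term
    return 1.0 + np.sum(x ** 2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))
```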
The two-dimensional search surface of the Generalized penalized benchmark function is shown in Fig. 4, and its expression is given in Equation (15).

Test function f2 (x).
In Equation (15), $y_i = 1 + (x_i + 1)/4$, each $x_i$ ranges over [−50, 50], and the global minimum value is 0.
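A sketch of the Generalized penalized function follows, assuming Equation (15) is the usual benchmark form $f_2(x) = \frac{\pi}{n}\{10\sin^2(\pi y_1) + \sum_{i=1}^{n-1}(y_i-1)^2[1 + 10\sin^2(\pi y_{i+1})] + (y_n-1)^2\} + \sum_{i=1}^{n} u(x_i, 10, 100, 4)$, where the penalty $u$ is zero inside [−a, a] and $k(|x_i| - a)^m$ outside it. The global minimum 0 is attained at $x_i = -1$, where $y_i = 1$.

```python
import numpy as np

def penalty(x, a=10.0, k=100.0, m=4):
    """Boundary penalty u(x, a, k, m): zero inside [-a, a], k*(|x|-a)^m outside."""
    return np.where(x > a, k * (x - a) ** m, np.where(x < -a, k * (-x - a) ** m, 0.0))

def penalized(x):
    """Generalized penalized function (usual form assumed); minimum 0 at x_i = -1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    y = 1 + (x + 1) / 4
    core = (10 * np.sin(np.pi * y[0]) ** 2
            + np.sum((y[:-1] - 1) ** 2 * (1 + 10 * np.sin(np.pi * y[1:]) ** 2))
            + (y[-1] - 1) ** 2)
    return np.pi / n * core + np.sum(penalty(x))
```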
The Griewank and Generalized penalized functions are each tested 30 times independently, and the average convergence curves of the algorithms are shown in Fig. 5 and Fig. 6.

The average convergence curve of each algorithm to function f1 (x).

The average convergence curve of each algorithm to function f2 (x).
Figures 5 and 6 show that on the f1(x) benchmark function, which has a single global minimum, the convergence accuracy of GTO and MGTO is much higher than that of PSO, WDO, GWO and SCA, and both also converge faster. MGTO achieves higher accuracy and faster convergence than GTO.
The f2(x) benchmark function has several local minima and can therefore be used to test an algorithm's ability to escape local optima. All six algorithms converge, but PSO, WDO, GWO and SCA perform poorly, converging slowly and ending far from the global minimum. GTO and MGTO outperform the other algorithms in both convergence accuracy and convergence speed, and MGTO locates the optimum more accurately than GTO.
The tests on the f1(x) and f2(x) functions show that the modified artificial gorilla troops optimizer outperforms GTO, PSO, WDO, GWO and SCA in convergence speed, optimization accuracy and the ability to escape local optima. Since PSO and GWO generally performed better than WDO and SCA, only PSO and GWO are used in the subsequent comparison experiments.
Stochastic configuration networks
Stochastic Configuration Networks (SCN) were first proposed by Wang et al. in 2017 [25]. They require little human intervention, learn quickly and generalize well.
SCN follows the boosting idea and generates hidden nodes step by step. Starting from a small network, it randomly assigns input weights and biases under an inequality constraint and gradually increases the number of hidden-layer nodes. The output weights are calculated by the least squares method, and the process continues until the training accuracy of the network meets the termination condition. Because the random parameters are subject to an inequality constraint, SCN can also adaptively adjust the range from which they are drawn. The calculation process of SCN is as follows.
Given a target function $f: \mathbb{R}^d \to \mathbb{R}^m$, let L be the number of hidden-layer nodes, and use the sigmoid function as the activation function to calculate the output $g_i$ of the i-th hidden node. For N samples, the output of the SCN with L hidden nodes is:
In Equation (16), $w_i = [w_{i1}, \ldots, w_{id}]^T$ and $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ denote the input weights and output weights of the i-th hidden node, respectively, and $b_i$ is its bias. The network residual at this point is given by Equation (17).
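The network output and residual can be written in matrix form by stacking the outputs of all hidden nodes on all N samples into an N × L matrix H. In this sketch the layout (input weights $w_i$ as the columns of W, output weights $\beta_i$ as the rows of beta) is our own convention, and the residual is assumed to be the usual $e_L = f - f_L$ evaluated on the training targets T.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scn_output(X, W, b, beta):
    """Eq. (16) in matrix form.
    X: (N, d) inputs, W: (d, L) input weights w_i as columns,
    b: (L,) biases, beta: (L, m) output weights beta_i as rows."""
    H = sigmoid(X @ W + b)      # (N, L): output g_i of every hidden node on every sample
    return H @ beta, H

def residual(T, X, W, b, beta):
    """Eq. (17), assumed form: e_L = f - f_L on the N training samples (T is (N, m))."""
    F, _ = scn_output(X, W, b, beta)
    return T - F
```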
Whether a new hidden node must be added is decided by checking $\|e_{L-1}\|$. If it is within the preset error tolerance, the network construction is complete; otherwise, new nodes continue to be added. The input weight vector $w_L$ and bias $b_L$ are selected adaptively through the supervisory mechanism, and the generated $w_L$ and $b_L$ satisfy the inequality in Equation (18).
In Equation (18),
The output weight matrix of the hidden layer is calculated using the least squares method as shown in Equation (19).
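Equation (18) is not reproduced in the text. The sketch below assumes the inequality of the original SCN paper [25], $\xi_{L,q} = \langle e_{L-1,q}, g_L\rangle^2 / \langle g_L, g_L\rangle - (1 - r - \mu_L)\langle e_{L-1,q}, e_{L-1,q}\rangle \ge 0$ with $\mu_L = (1 - r)/(L + 1)$, where $r \in (0, 1)$ and the random weights and bias are drawn from [−λ, λ]. Among T_max random candidates it keeps the admissible node with the largest $\sum_q \xi_{L,q}$, and it solves the least-squares problem of Equation (19) with the Moore-Penrose pseudoinverse, one common way to do so. The function names and defaults are hypothetical.

```python
import numpy as np

def select_node(X, E, L, lam=1.0, r=0.99, T_max=50, rng=None):
    """Pick the L-th hidden node from T_max random candidates using the SCN inequality
    (form assumed from [25]). X: (N, d) inputs, E: (N, m) current residual e_{L-1}.
    Returns (w, b, g) for the best admissible candidate, or None if none qualifies."""
    rng = np.random.default_rng() if rng is None else rng
    d = X.shape[1]
    mu = (1 - r) / (L + 1)
    best, best_score = None, -np.inf
    for _ in range(T_max):
        w = rng.uniform(-lam, lam, size=d)            # input weights in [-lam, lam]
        b = rng.uniform(-lam, lam)                    # bias in [-lam, lam]
        g = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # (N,) sigmoid output of the candidate
        xi = (E.T @ g) ** 2 / (g @ g) - (1 - r - mu) * np.sum(E * E, axis=0)  # one xi per output q
        if np.all(xi >= 0) and xi.sum() > best_score:
            best, best_score = (w, b, g), xi.sum()
    return best

def output_weights(H, T):
    """Least-squares output weights (Eq. (19)) via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(H) @ T
```

A fuller implementation would retry with a larger λ or r when `select_node` returns None, which is how SCN adapts the range of its random parameters.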
A traditional stochastic configuration network can add only one node at a time during learning, and the model must be updated after every new node. When many nodes are required, network construction becomes complicated and time-consuming [26]. Because point increments largely limit feature acquisition, block increments are used to speed up model building. The block incremental learning approach is shown in Fig. 7, where the hidden-layer nodes added together are called a node block.

Block incremental learning approach.
For the input $x = \{x_1, x_2, \ldots, x_N\}$ with $x_i = \{x_{i,1}, x_{i,2}, \ldots, x_{i,d}\}$ and the corresponding output $T = \{t_1, t_2, \ldots, t_N\}$ with $t_i = \{t_{i,1}, t_{i,2}, \ldots, t_{i,m}\}$, the output residual for a block increment ΔB is given in Equation (20).
In Equation (20), $f_{L-\Delta B}$ is the output of the current network and $e_{L-\Delta B,q}(x) = [e_{L-\Delta B,q}(x_1), \ldots, e_{L-\Delta B,q}(x_N)]^T$. The output block of the hidden layer is given in Equation (21).
In Equation (21), $H_{\Delta B}(w_{\Delta B}, b_{\Delta B}, x_i)$ is calculated with the sigmoid activation function.
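Putting the block increment together, a compact BSCN training loop might look as follows. It is a sketch under stated assumptions rather than the exact Equations (20)–(21): the block inequality projects each residual column onto the span of the candidate block, $e_q^T H_{\Delta B}(H_{\Delta B}^T H_{\Delta B})^{+} H_{\Delta B}^T e_q - (1 - r - \mu_L)\, e_q^T e_q \ge 0$, which reduces to the point-increment SCN condition when the block holds a single node. This form, the parameter defaults, the stopping rule and the function name are assumptions.

```python
import numpy as np

def train_bscn(X, T, block=3, L_max=225, T_max=50, tol=1e-3, r=0.99, lam=1.0, seed=0):
    """Hypothetical BSCN trainer: adds `block` sigmoid nodes per step until the
    residual norm drops below `tol` or `L_max` hidden nodes are reached."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W, b, H = np.empty((d, 0)), np.empty(0), np.empty((N, 0))
    beta, E = np.zeros((0, T.shape[1])), T.astype(float).copy()
    while H.shape[1] + block <= L_max and np.linalg.norm(E) > tol:
        L = H.shape[1] + block
        mu = (1 - r) / (L + 1)
        best, best_score = None, -np.inf
        for _ in range(T_max):                               # random candidate blocks
            Wb = rng.uniform(-lam, lam, size=(d, block))     # scale factor lam bounds the weights
            bb = rng.uniform(-lam, lam, size=block)
            Hb = 1.0 / (1.0 + np.exp(-(X @ Wb + bb)))        # (N, block) output of the node block
            P = Hb @ np.linalg.pinv(Hb)                      # projector onto the span of Hb
            xi = np.sum(E * (P @ E), axis=0) - (1 - r - mu) * np.sum(E * E, axis=0)
            if np.all(xi >= 0) and xi.sum() > best_score:    # admissible and best so far
                best, best_score = (Wb, bb, Hb), xi.sum()
        if best is None:            # no admissible block; a fuller version would relax r or lam
            break
        W = np.hstack([W, best[0]])
        b = np.concatenate([b, best[1]])
        H = np.hstack([H, best[2]])
        beta = np.linalg.pinv(H) @ T                         # least-squares output weights
        E = T - H @ beta                                     # updated residual
    return W, b, beta, E
```

Setting `block=1` turns the same loop into point-incremental SCN, so the two schemes can be timed against each other on the same data in the spirit of Table 1.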
During model construction, a large block increment can reduce the residual faster, but it is detrimental to the compactness of the model. It may also increase network complexity and the number of unlearned features, which can affect the prediction results. Figure 8 shows the average training results for different block increments.

Average training results for different block increments.
Compared with SCN, BSCN reduces the training time t, but it requires more hidden-layer nodes L and its network complexity is significantly higher. Twenty independent experiments were conducted for each node-block size. The training time t (s) and the number of hidden nodes of each model are shown in Table 1.
Incremental experimental results of different blocks
As can be seen from Table 1, with point-incremental learning the training model takes about 7.6 s to build. The training time decreases as the node-block size increases; it drops markedly when the block size goes from 1 to 2, saving about 63.8% of the training time. However, the complexity of the network model rises with the node-block size.
As can be observed from Fig. 8, the number of iterations required decreases as the node-block size increases, but larger block increments are detrimental to model compactness. Point-incremental learning needs the most iterations and also takes the most time. The network models with node blocks of 3 and 5 both have 225 hidden nodes in total. Although the model with node block 3 takes longer to train, it performs far more iterations than the model with node block 5 (75 versus 45 block additions for 225 nodes), yielding a more compact training model that learns more features and improves fault diagnosis precision.
All things considered, block incremental learning with a node block of 3 is the best choice.
According to IEC 60599, the operating status of a transformer can be determined by analyzing the concentrations of H2, CH4, C2H6, C2H4 and C2H2 in the transformer oil. Because the concentrations of these five characteristic gases differ greatly at the time of failure, using them directly as model inputs would produce large errors [27]. They are therefore preprocessed with the normalization formula in Equation (22).
In Equation (22), $\eta_C$ is the concentration of total hydrocarbons and $\eta_H$ is the concentration of hydrogen.
The fault labels are one-hot encoded, a method that converts a categorical variable into a numerical vector: each category has its own position in the vector, which is set to 1 for that category and 0 elsewhere. One-hot encoding makes categorical attributes easy for the classifier to handle and, to a certain extent, also expands the features. In this paper, low- and medium-temperature overheating (T1), low-energy discharge (D1), high-energy discharge (D2), high-temperature overheating (T2), normal condition (N) and partial discharge (PD) correspond to the labels 100000, 010000, 001000, 000100, 000010 and 000001, respectively.
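The label scheme above can be reproduced with a few lines of NumPy. Decoding network outputs by taking the position of the largest of the six outputs (argmax) is our assumption about how predictions are turned back into fault classes; the helper names are hypothetical.

```python
import numpy as np

CLASSES = ["T1", "D1", "D2", "T2", "N", "PD"]   # label order used in the text

def one_hot(labels, classes=CLASSES):
    """Map class names to one-hot rows, e.g. "T1" -> [1, 0, 0, 0, 0, 0]."""
    idx = np.array([classes.index(c) for c in labels])
    out = np.zeros((len(labels), len(classes)), dtype=int)
    out[np.arange(len(labels)), idx] = 1
    return out

def decode(scores, classes=CLASSES):
    """Turn network outputs (one row per sample) back into class names via argmax."""
    return [classes[k] for k in np.argmax(scores, axis=1)]
```

For example, `one_hot(["T1", "N", "PD"])` produces the rows 100000, 000010 and 000001.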
To improve the model's recognition of minority fault classes, the root mean square error (RMSE) of the classification model is chosen as the fitness function to be minimized by the optimization algorithm. RMSE measures the difference between predicted and true values and is commonly used to evaluate the predictions of machine learning models. It is given in Equation (23).
In Equation (23), f (x) is the predicted outcome, y is the true outcome.
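A minimal RMSE helper is shown below. For one-hot targets it averages the squared error over all N × 6 output entries. That is one of two common conventions (the other sums the errors over the classes of each sample before averaging over samples), so treat the choice as an assumption about Equation (23).

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error between predicted outputs f(x) and true outputs y."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))
```

In MGTO-BSCN, the fitness of a gorilla at position (r, λ) would be this RMSE for a BSCN trained with those two parameters.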
The fault diagnosis flow chart of the MGTO-BSCN model for oil-immersed power transformers is shown in Fig. 9. The MGTO-BSCN model with the best fault diagnosis result is obtained by jointly optimizing two parameters, the regularization parameter r and the scale factor λ. The specific steps are as follows:

Flow chart of fault diagnosis of oil-immersed power transformer based on MGTO-BSCN.
Step 1: Set the initial parameters. For BSCN, set the maximum number of iterations (Tmax), the maximum number of hidden-layer nodes (Lmax), the error tolerance (ɛ) and the node-block size (NB). For the MGTO algorithm, set the population size (pop), the dimension (dim), the maximum number of iterations (Itermax), and the upper bound (ub) and lower bound (lb).
Step 2: Initialize the population with the Chebyshev chaotic map of Equation (11).
Step 3: For each initialized position, assign the two parameters r and λ to BSCN. Calculate the fitness of each gorilla in turn, rank the individuals, and record the position with the best fitness.
Step 4: Enter the exploration phase and calculate each gorilla's candidate position for the next iteration using Equations (1)–(4). At the end of the exploration phase, calculate the fitness of all new positions. If a candidate position has a lower fitness value than the gorilla's current position, it replaces the current position, and the best individual produced in this phase is regarded as the silverback.
Step 5: Enter the exploitation phase. When C < W, apply the competition-for-adult-females mechanism according to Equations (7)–(10) and perform Gaussian mutation on the individual positions according to Equation (12). When C ⩾ W, apply the follow-the-silverback mechanism according to Equations (5) and (6), and perturb the population positions with the firefly disturbance of Equation (13). At the end of the exploitation phase, perform a group operation: calculate the fitness of all individuals and move each individual to its new position only if that position is better; otherwise the original individual remains unchanged. The best solution in the whole population is regarded as the silverback.
Step 6: Check whether the maximum number of iterations has been reached. If so, assign the optimal parameters to BSCN; otherwise, return to Step 4 and continue iterating.
The data used in this paper come from three sources: (1) references [28] and [29]; (2) the IEC TC10 database; and (3) a power grid. After duplicate and abnormal samples were removed, a total of 827 samples were selected and divided into training and test sets at a ratio of 8:2. The transformer status is divided into six types: T1, D1, D2, T2, N and PD. The distribution of the sample data and the state codes of the fault types are shown in Table 2.
Distribution of sample data and fault type codes
For each of the ELM, SVM and BSCN fault diagnosis models, 20 experiments were conducted. The results of one randomly selected experiment are shown in Figs. 10, 11 and 12 for SVM, ELM and BSCN, respectively.

Diagnostic results of SVM.

Diagnostic results of ELM.

Diagnostic results of BSCN.
Over 20 separate experiments, the average accuracy is 0.7294 for SVM, 0.7411 for ELM and 0.7941 for BSCN. The average accuracy of BSCN is thus 6.5 and 5.3 percentage points higher than that of SVM and ELM, respectively.
This paper uses a confusion matrix to evaluate fault diagnosis performance. A confusion matrix compares the classification results with the actual classes and shows how accurate the classification is. Figure 13 shows the schematic confusion matrix for a binary classification problem. As noted in Section 3.2, block incremental learning with a node block of 3 is the best choice, so the BSCN network model with node block 3 is used in the experiments with the different optimization algorithms.
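For the six-class problem, the confusion matrix and the accuracies derived from it can be computed as below, with classes indexed 0 to 5 in the order T1, D1, D2, T2, N, PD. Reading the per-fault-type accuracy of Fig. 18 as the diagonal entry divided by its row total (the fraction of each actual class diagnosed correctly) is our assumption; the helper names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Fraction of each actual class diagnosed correctly (0 for classes with no samples)."""
    rows = cm.sum(axis=1)
    return np.divide(np.diag(cm), rows, out=np.zeros(len(rows)), where=rows > 0)

def overall_accuracy(cm):
    """Share of all samples on the diagonal, i.e. diagnosed correctly."""
    return np.trace(cm) / cm.sum()
```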

Confusion matrix diagram.
The fault diagnosis results of GWO-BSCN and PSO-BSCN transformers are shown in Fig. 14 and Fig. 15, respectively.

Fault diagnosis result chart of GWO-BSCN.

Fault diagnosis result chart of PSO-BSCN.
Figures 14 and 15 show that the fault diagnosis accuracies of the GWO-BSCN and PSO-BSCN transformer models are 86.0% and 84.2%, respectively. Both models perform poorly on PD faults, but GWO-BSCN performs better than PSO-BSCN, and misdiagnoses of the PD fault class as the normal class are relatively rare.
The fault diagnosis results of BSCN optimized by GTO and by MGTO are shown in Figs. 16 and 17, respectively.

Fault diagnosis result chart of GTO-BSCN.

Fault diagnosis result chart of MGTO-BSCN.
Figures 16 and 17 show that the MGTO-BSCN transformer fault diagnosis model is sensitive to the T1, D1 and D2 fault types. Although its diagnosis of T2 is less than ideal, MGTO-BSCN achieves the highest overall diagnostic accuracy, 95.9%, which is 3.5 percentage points higher than that of the GTO-BSCN model. This demonstrates that the improved GTO algorithm has higher optimization accuracy and can significantly increase fault diagnosis accuracy.
The accuracy variation curves of different models on each fault type are shown in Fig. 18.

Accuracy curve for each fault type.
Figure 18 illustrates that for every fault type the BSCN fault diagnosis model is more accurate than the ELM and SVM models, demonstrating the effectiveness of the BSCN diagnostic approach. The MGTO-BSCN model performs better than GWO-BSCN, PSO-BSCN and GTO-BSCN. These findings show that using the MGTO algorithm to optimize the BSCN network's parameters significantly increases the accuracy of transformer fault diagnosis, demonstrating the effectiveness and superiority of the proposed model.
This paper presents a new method of transformer fault diagnosis based on MGTO and BSCN, which uses DGA technique, improved GTO algorithm and SCN with block increments for fault diagnosis of oil-immersed power transformers. The following conclusions are drawn.
(1) The GTO algorithm is improved to increase its local search accuracy, enhance its global search ability and help it avoid local optima. Tests on different types of benchmark functions show that the proposed MGTO algorithm has strong convergence ability and high optimization accuracy.
(2) Block-increment learning is adopted for SCN, which greatly improves training speed compared with traditional point-increment learning. Choosing an appropriate node-block size improves speed while still learning enough features to keep the trained model compact and accurate.
(3) The average accuracy of the BSCN fault diagnosis model is 6.5 and 5.3 percentage points higher than that of the conventional SVM and ELM fault diagnosis models, respectively, indicating the reliability and superiority of the proposed model. In a series of comparative experiments, the transformer fault diagnosis method based on MGTO and BSCN proposed in this paper achieves an accuracy of 95.9%, which is 3.5, 9.9 and 11.7 percentage points higher than BSCN models optimized by GTO, GWO and PSO, respectively, demonstrating the superiority of the MGTO algorithm. The experimental analysis shows that this method has strong generalization ability and can effectively improve the accuracy of transformer fault diagnosis.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61572416), the Hunan Province Natural Science Zhuzhou United Foundation (2020JJ6009), the Key Laboratory Open Project Fund of State Heavy Duty AC Drive Electric Locomotive Systems Integration, the Key Laboratory Open Project Fund of Disaster Prevention and Mitigation for Power Grid Transmission and Transformation Equipment, and the Postgraduate Scientific Research Innovation Project of Hunan Province (QL20210153).
