An efficient CNN-transformer hybrid approach for water turbine unit failure prediction

Abstract

The water turbine unit is the core equipment of hydropower generation and one of the most important research objects in hydroelectric power. Analyzing the daily monitoring data of water turbines makes it possible to evaluate their running state and avoid losses caused by failure. In this paper, a hybrid neural network architecture that combines a CNN with a Transformer is proposed to evaluate the health state of water turbines from multivariate long time series data. It has two core components: (i) dilated convolution captures low-level, local semantic information, after which the output of the convolutional layers is divided by time into subseries-level patches that serve as the input tokens of the Transformer layers; (ii) the self-attention of the Transformer extracts high-level, global semantic information. The patching design has two natural benefits: (i) local semantic information is retained in the Transformer input tokens, and building high-level semantics from low-level features conforms to the common-sense way humans understand data; (ii) the Transformer input sequence is greatly shortened, so attention is more concentrated than in the point-wise form, and computation and memory usage are reduced at the same time. The experimental results indicate that the hybrid architecture achieves excellent performance in time series understanding.

Keywords

CNN and Transformer hybrid, health condition, water turbine, multivariate time series, time series analysis

1. Introduction

Electricity is an indispensable power resource for modern industry and daily life, and hydropower generation is an important branch of electricity production. Compared with other power generation methods, it has many advantages, such as being renewable and environmentally friendly. The hydro turbine is a kind of fluid machinery widely used in hydropower generation and is the core power generation equipment in hydropower stations. Its main function is to convert the kinetic energy of the water flow into mechanical energy and then drive the power generation equipment to convert the mechanical energy into electrical energy. Its failure causes huge economic losses and safety hazards. Traditionally, maintenance personnel analyze the operating status of the unit through daily monitoring data. However, because of differences in manufacturing technology, varied operating environments, and the inexperience of some maintenance personnel, it is difficult to detect abnormalities at an early stage, especially slight fluctuations in the monitoring data. Therefore, solving these problems with AI technology has become a general trend.

Monitoring data often span a long period of time, and long-term time series analysis requires sufficient long-term context information. Local information is also important, because the devil is in the details. The convolutional operator is good at extracting local features through its kernel design. Recently, the Transformer has shown great power in sequence modeling and global feature extraction with its self-attention module. Based on these motivations, a new architecture that combines a CNN and a Transformer for the time series classification task is proposed. It has three main components: (i) dilated convolution is used to focus on local details; (ii) a Transformer is used to model the long-term context; (iii) a module is designed to aggregate information from the different ranges. The task of assessing the health condition of a water turbine unit is adopted to verify the validity of our method. The model accepts multivariate long time series monitoring data and outputs the faulty component. Experiments show that our method is effective and efficient in this time series analysis application, and we believe this network will also be effective on other time series analysis tasks.

2. Review of the literature

In the early days, time series classification relied mainly on machine learning methods. Among the various approaches, the most impressive was the nearest neighbor classifier applied with a distance function [1]. In particular, the combination of dynamic time warping distance and nearest neighbor was a breakthrough at that time [2]. Later, researchers discovered that ensembling different distance functions and various discriminant models could achieve better results. However, the disadvantages of these methods are obvious. Machine learning approaches are designed for structured data and often rely on hand-designed features based on statistical theory. Time series data are usually unstructured, and the transformation from unstructured to structured data is accompanied by information loss; even worse, it fails to capture the fine details of the data. Thus, it is necessary to develop a simple method that directly predicts the category of the input.

Deep learning has seen many successful applications in different domains, so it is natural to bring deep learning methods to time series classification. Recurrent neural networks (RNNs) are specifically designed for modeling temporal sequences: each step takes the current input and the state from the previous step. When a standard RNN is exposed to long sequences, it tends to lose earlier information because it lacks the ability to store long-term context; this problem is commonly attributed to vanishing gradients. Many specialized variants, such as LSTM and GRU, were created to overcome this disadvantage. The deep convolutional neural network (CNN) is arguably the most successful architecture, especially in computer vision. It relies on kernels that slide over the input and extract features in space. Many approaches have been proposed for long-range time series analysis. Beijie Hong built a memory pool to store distant information [3], which is used to refer to the history when necessary, on the grounds that this avoids information loss. Cui used multi-scale convolutional layers to capture features at different scales [4], which exploits the knowledge of different receptive fields. Falze proposed a hybrid architecture of LSTM and convolutional network to enable spatiotemporal feature learning [5]. Although RNNs have been redesigned and optimized, they still cannot fully solve the problems of vanishing gradients and long-range dependencies. Convolutional layers are inspired by the theory of digital image processing, and the convolutional operator is designed to capture local information by sliding step by step. However, shallow CNNs are insufficient to model dependencies that extend beyond the receptive field. Simply extending the receptive field by stacking convolutional layers is not a perfect solution: more layers mean more parameters and a deeper network, and deeper networks mean more computation and harder training.

Over the last few years, the Transformer has been the most successful and fascinating design in the AI field, owing to its excellent capability for modeling long-range dependencies with self-attention. Motivated by these observations, we propose a hybrid network of CNN and Transformer. Our approach is influenced by the Vision Transformer (ViT), which uses Transformer blocks for image classification [6]. First, we exploit the strong inductive biases of CNNs (e.g., local connectivity and translation equivariance) to capture local, low-level features such as shape and amplitude. Second, we segment the time series into patches by time and capture the global, high-level semantic information with Transformer blocks. Since temporal information is generally redundant, we apply dilated convolution to obtain more diverse content. To adapt to the multivariate case, 1x1 convolution is used for aggregation before the Transformer blocks. To collect the multi-dimensional monitoring data required for the experiments and to address the scarcity of samples for different degrees of fault, an equivalent test environment platform was established. Our approach outperforms the previous methods, and we hope it will become an indispensable part of hydropower in the future.

3. Method

An overview of the hybrid architecture is depicted in Fig. 1. The model receives as input a 1D multi-channel time series $X\in R^{M\times L}$ consisting of $M$ variables of length $L$. To avoid unstable optimization caused by the different value ranges of the data, the network input is normalized by min-max scaling as follows.

$\displaystyle y=\frac{x-x_{\min}}{x_{\max}-x_{\min}}$ (1)
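Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that the minimum and maximum are taken separately for each of the $M$ channels, which the text does not state, and the function names and sample values are hypothetical.

```python
def min_max_normalize(series):
    """Scale one channel to [0, 1] following Eq. (1)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Constant channel: Eq. (1) would divide by zero.
        # Returning zeros here is an assumption of this sketch.
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def normalize_channels(X):
    """Apply Eq. (1) independently to each channel of X (M rows of length L)."""
    return [min_max_normalize(channel) for channel in X]


# Two hypothetical channels with very different value ranges.
X = [[20.0, 25.0, 30.0], [0.1, 0.3, 0.2]]
Y = normalize_channels(X)
```

After normalization both channels lie in [0, 1], so neither dominates the gradient simply because of its physical unit.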

3.1 Local features extraction models

The convolution operator applies a learned linear filter to local windows of the input tensor, which makes it a powerful tool for extracting local information. Following the success of CNNs in various tasks, researchers have adopted them for time series analysis [7].

In our opinion, the shape of a time series, as represented by its data values, is the most basic and important feature. Meanwhile, a lot of redundant and useless information is hidden in time series; for example, values separated by small intervals show high consistency. Inspired by this, channel-wise dilated convolutional layers are applied to learn the local representation. Given the multi-channel tensor $X=[x_{1},x_{2},\ldots,x_{M}]$, $x_{i}\in R^{L}$, the result of channel-wise dilated convolution is:

$\displaystyle Z^{c}=[f_{d}(x_{1}),f_{d}(x_{2}),\ldots,f_{d}(x_{M})],\quad Z^{c}\in R^{M\times L}$ (2)

$f_{d}$ represents the dilated convolutional operator, illustrated in Fig. 1 (left). It is similar to the traditional convolution computation:

$\displaystyle y_{i}=\sum_{k}x[i+r\times k]w[k]$ (3)

$r$ represents the dilation rate, which determines the spacing between elements in the same computation window; when $r$ equals 1, Eq. (3) reduces to the traditional convolutional operation.
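Eq. (3) translates directly into code. The sketch below assumes no padding (so the output is shorter than the input) and a single channel; the function name, kernel, and input values are illustrative only.

```python
def dilated_conv1d(x, w, r=1):
    """1D dilated convolution without padding: y[i] = sum_k x[i + r*k] * w[k]."""
    span = r * (len(w) - 1)  # distance between the first and last kernel tap
    return [
        sum(x[i + r * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]  # hypothetical difference kernel

y1 = dilated_conv1d(x, w, r=1)  # taps at i, i+1, i+2
y2 = dilated_conv1d(x, w, r=2)  # taps at i, i+2, i+4
print(y1)  # [-2, -2, -2, -2, -2]
print(y2)  # [-4, -4, -4]
```

With the same three weights, $r=2$ compares samples four steps apart instead of two, which is how dilation skips over highly similar neighboring values.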

The standard Transformer layer takes a 1D sequence of token embeddings as input; therefore, 1x1 convolution is used to reduce the channels of the previous layer's output and aggregate the information from different channels. In this paper, we stack three dilated convolutional layers with dilation rates of 1, 2, and 4, respectively. Batch normalization is applied in each layer, and the activation function is ReLU.
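To see what stacking dilation rates 1, 2, and 4 buys, one can compute the receptive field of the stack. The kernel size is not given in the text, so the value 3 below is purely an assumption, as is a stride of 1 in every layer.

```python
def receptive_field(kernel_size, dilation_rates):
    """Receptive field of stacked stride-1 dilated convolutions."""
    rf = 1
    for r in dilation_rates:
        # Each layer widens the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * r
    return rf


print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7
```

Under these assumptions, the dilated stack covers roughly twice as many time steps as three ordinary convolutions with the same number of parameters.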

3.2 Global features extraction models

Our global feature extraction model, consisting of 6 Transformer blocks, aims to output the global representation based on the local representation. We segment the long time series $S\in R^{L}$ by time into $n$ patches $S=[s_{1},s_{2},\ldots,s_{n}]$, $S\in R^{n\times l}$, where $l$ is the patch length, consistent with the Transformer input format.
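The patching step can be sketched as follows. The text does not say whether patches overlap or how a trailing remainder is handled, so this example assumes non-overlapping patches and drops incomplete ones; `patch_len` is a hypothetical parameter name.

```python
def split_into_patches(series, patch_len):
    """Split a length-L series into n = L // patch_len non-overlapping patches."""
    n = len(series) // patch_len
    # Trailing samples that do not fill a whole patch are dropped (an assumption).
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n)]


S = list(range(10))
patches = split_into_patches(S, patch_len=4)
print(patches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each patch then plays the role of one Transformer input token, so the sequence length seen by self-attention drops from $L$ to $n$.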

Figure 1.

Overview of our proposed architecture. It consists of two models: the convolution model extracts local, low-level semantic information, and the Transformer model captures global, high-level semantic information. Both are aggregated by cross-attention.

Position embedding

Temporal information is necessary for time series analysis. As in [8], the temporal position information of each patch is added to $s_{i}$:

$\displaystyle s_{i}=s_{i}+pe_{i}$ (4)

where $pe_{i}\in R^{l}$ represents a learnable positional embedding that is updated by gradient descent while the model is training. The local representation and the position embedding together constitute the input of the global information extractor.

Self-attention

When exploring information in daily life, human beings first focus on the more important parts of an object. Mimicking this way of understanding, researchers introduced attention mechanisms into deep learning to filter out redundant, irrelevant, and insignificant information. Various attention mechanism designs exist; among them, self-attention [8] has been the most influential. Queries, keys, and values are generated from the input tokens by linear transformation. The output is a weighted sum of the values, where the normalized weights are obtained by applying the softmax function to the scaled dot products between queries and keys.

To make the computational logic of self-attention easy to follow, we first describe the element-wise form. Given the input tokens $s_{i}\in R^{l},i=1,2,\ldots,n$, the query $q_{i}\in R^{d}$, key $k_{i}\in R^{d}$, and value $v_{i}\in R^{d}$ are linear transformations of $s_{i}$ through the learnable parameters $W^{q}$, $W^{k}$, and $W^{v}$, where $d$ is the dimension of the target linear space.

$\displaystyle q_{i}=s_{i}^{T}W^{q},\quad W^{q}\in R^{l\times d}$

$\displaystyle k_{i}=s_{i}^{T}W^{k},\quad W^{k}\in R^{l\times d}$ (5)

$\displaystyle v_{i}=s_{i}^{T}W^{v},\quad W^{v}\in R^{l\times d}$

The normalized weight is the softmax of the scaled dot product of query and key, and it is then used to compute the weighted sum of the values:

$\displaystyle a_{ij}=\textit{softmax}\left(\frac{q_{i}k_{j}^{T}}{\sqrt{d}}\right)$ (6)

where ${q_{i}k_{j}^{T}}$ measures the similarity or correlation between token $i$ and token $j$, the softmax is taken over $j$, and $a_{ij}$ is the weight in the sum. The final output of token $i$ through the self-attention module is then computed as follows:

$\displaystyle z_{i}=\sum_{j}a_{ij}v_{j}$ (7)
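The element-wise form of Eqs. (5) to (7) can be written out in plain Python. This is a didactic sketch rather than an efficient implementation; the helper names and the identity weight matrices used in the example are illustrative assumptions.

```python
import math


def matvec(s, W):
    """Row vector s (length l) times matrix W (l x d), giving a length-d vector."""
    return [sum(s[a] * W[a][b] for a in range(len(s))) for b in range(len(W[0]))]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, Wq, Wk, Wv):
    d = len(Wq[0])
    q = [matvec(s, Wq) for s in tokens]  # Eq. (5)
    k = [matvec(s, Wk) for s in tokens]
    v = [matvec(s, Wv) for s in tokens]
    outputs, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        a_i = softmax(scores)  # Eq. (6), normalized over j
        weights.append(a_i)
        # Eq. (7): weighted sum of the value vectors
        outputs.append([sum(a_i[j] * v[j][c] for j in range(len(v))) for c in range(d)])
    return outputs, weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical W^q = W^k = W^v
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Z, A = self_attention(tokens, identity, identity, identity)
```

Each row of `A` sums to 1, so every output `Z[i]` is a convex combination of the value vectors, weighted by how similar token $i$ is to each token $j$.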

In practice, every token computes the weighted sum above, so the element-wise form extends naturally to matrices. Let $S\in R^{n\times l}$ denote all input tokens of self-attention, stacked from the $s_{i}$. $Q$, $K$, and $V$ denote the query, key, and value matrices, respectively, and SA denotes the self-attention operator. The final output is computed analogously:

$\displaystyle A=\text{softmax}\left(\frac{QK^{T}}{\sqrt{d}}\right)$ (8)

$\displaystyle SA(Q,K,V)=AV$ (9)
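The efficiency argument for patching can be made concrete with Eq. (8). The following is a back-of-the-envelope sketch that assumes non-overlapping patches of length $l$, so that $n=L/l$; the values $L=1024$ and $l=16$ are illustrative and are not taken from the experiments.

$\displaystyle A\in R^{n\times n},\qquad \text{cost of } QK^{T}=O(n^{2}d)$

$\displaystyle \text{point-wise tokens: } n=L=1024\;\Rightarrow\; n^{2}=1{,}048{,}576$

$\displaystyle \text{patch tokens: } n=L/l=64\;\Rightarrow\; n^{2}=4096$

Under these assumptions the attention matrix shrinks by a factor of $l^{2}=256$, which is where the savings in computation and memory come from.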

As mentioned, the core idea of computing $Q$, $K$, and $V$ is linear transformation: the token space is mapped to another space to reveal information that was not visible before. Why not use multiple transformations to reveal more? That is multi-head self-attention. Figure 2 illustrates the multi-head self-attention structure.

$\displaystyle\textit{head}_{i}=SA(QW_{i}^{q},KW_{i}^{k},VW_{i}^{v})$ (10)

$\displaystyle\textit{MHSA}(Q,K,V)=\textit{concat}(\textit{head}_{1},\textit{head}_{2},\ldots,\textit{head}_{h})W^{o}$ (11)

where $W_{i}^{q}$, $W_{i}^{k}$, and $W_{i}^{v}$ represent the linear transformation matrices of the $i$th head, $\textit{head}_{i}$ represents the output of the $i$th head of multi-head self-attention, $h$ is the number of heads, $W^{o}$ represents the output projection matrix, and concat(.) represents the matrix concatenation operator.

Transformer block

Self-attention is the soul of the Transformer; layer normalization, residual connections, and a multi-layer perceptron are its other components. Figure 3 gives a brief overview of the Transformer block. Layer normalization and residual connections make training more stable and help it converge faster. Stacked Transformer blocks are capable of capturing global information, but our intuition is that a fixed-size representation is insufficient to express all critical features, both local and global: local information is lost during global information extraction. Hence, we propose a hybrid representation aggregation layer that integrates local and global information via cross-attention. Cross-attention is similar to self-attention; the distinction is that Q and K, V are generated from different inputs. In our aggregation model, Q is generated from the global representation (the output of the Transformer blocks), and K, V are generated from the local representation.

$\displaystyle Q_{c}=f_{g}W_{c}^{q}$

$\displaystyle K_{c}=f_{l}W_{c}^{k}$ (12)

$\displaystyle V_{c}=f_{l}W_{c}^{v}$

$\displaystyle\textit{Agg}(\textit{local},\textit{global})=SA(Q_{c},K_{c},V_{c})$

$f_{g}$ represents the global representation, $f_{l}$ represents the local representation, $W_{c}^{q}$, $W_{c}^{k}$, and $W_{c}^{v}$ represent the linear transformation matrices, and $\textit{Agg}(.)$ represents the aggregation model. As in the Transformer block design, a residual connection and layer normalization are also added. The output of the aggregation model is the logits of the time series classification task.
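The aggregation in Eq. (12) can be sketched as single-head cross-attention. The example below omits the residual connection and layer normalization mentioned above, and the token values and identity projection matrices are hypothetical; it is meant only to show which input feeds which projection.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(M):
    out = []
    for row in M:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(f_g, f_l, Wq, Wk, Wv):
    """Agg(local, global): queries from f_g, keys and values from f_l."""
    Q = matmul(f_g, Wq)
    K = matmul(f_l, Wk)
    V = matmul(f_l, Wv)
    d = len(Q[0])
    K_t = [list(col) for col in zip(*K)]
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(Q, K_t)]
    A = softmax_rows(scores)
    return matmul(A, V), A


identity = [[1.0, 0.0], [0.0, 1.0]]            # hypothetical projections
f_g = [[1.0, 0.0]]                              # one global token
f_l = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]      # three local tokens
out, A = cross_attention(f_g, f_l, identity, identity, identity)
```

The output has one row per global token, and each row is a weighted mix of local values, so the global representation pulls in the local details most similar to it.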

Figure 2.

A brief illustration of multi-head self-attention.

Figure 3.

Transformer block structure.

4. Experiments

Datasets

It is almost impossible to collect sufficient and varied fault data from a production environment. To verify the performance of our proposed method, a micro hydroelectric turbine unit and several sensors were purchased online, and an equivalent test environment was established. The hardware parameters of the test environment are listed in Table 1, and the test environment itself is shown in Fig. 5. We simulated faults via the control units. Five types of faults are studied: Genset Temperature Fault (GTF), Bearing Temperature Fault (BTF), Bearing Horizontal Vibration Amplitude Fault (BHVAF), Bearing Vertical Vibration Amplitude Fault (BVVAF), and Oil Pressure Fault (OPF). These states are closely related to the health of the turbine. For example, the vibration amplitude indicates whether the mechanical operation of the water wheel is normal, and an excessively high temperature means that the heat generated by mechanical operation exceeds the normal value; overheating causes mechanical damage. In the experimental setting, we measured the state of the turbine unit every second and saved it to storage devices. A total of 5200 records were collected, and all results in our experiments were generated through 5-fold cross validation. The records are continuous floating-point values, the sampling rate is 8k, and the value ranges differ between variables. The categorical distribution of the collected data is shown in Fig. 4.
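For readers unfamiliar with the protocol, 5-fold cross validation over the 5200 records can be sketched as follows. The text does not say whether the folds were shuffled or stratified by fault type, so this example simply assumes contiguous, equally sized folds; the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder.
        stop = start + fold_size if fold < k - 1 else n_samples
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx


folds = list(k_fold_indices(5200, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))  # 4160 1040 for every fold
```

Each record is used for testing exactly once, and the reported metrics are obtained by combining the results over the five folds.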

Implementation Details.

We evaluated the health state of water turbine units as a multi-class classification task. We implemented the submodels using PyTorch 1.8.0, and the models were jointly trained on our workstation (Ubuntu 18.04 LTS with four Intel Core i9 @3.60 GHz CPUs, four NVIDIA GeForce 3090 graphics cards, and 128 GB of RAM; CUDA 11.0 and Python 3.8). The parameters of the network are initialized with the Xavier initialization method. Training uses the AdamW optimizer with $\beta_{1}=$ 0.9, $\beta_{2}=$ 0.999, and weight decay $=$ 0.005. The learning rate starts from 0.01 and drops every 20 epochs following a cosine decay schedule, with the minimum learning rate limited to 0.005. The batch size is 128 and the number of epochs is 120. The number of dilated convolutional layers was determined experimentally to be 3, with dilation rates of 1, 2, and 4, respectively. Six Transformer blocks are used, the input dimension of the Transformer is 384, and only one cross-attention layer is used before the classification head. The loss function is the standard cross-entropy loss.
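One possible reading of the learning rate schedule is sketched below. The text does not spell out how the 20-epoch drops and the cosine decay interact, so this example assumes that the rate is held constant within each 20-epoch stage and follows a cosine curve from 0.01 down to 0.005 across stages; the function and parameter names are hypothetical.

```python
import math


def learning_rate(epoch, base_lr=0.01, min_lr=0.005, total_epochs=120, step=20):
    """Cosine decay from base_lr to min_lr, updated once every `step` epochs."""
    stage = epoch // step                               # 0, 1, ..., 5 for 120 epochs
    last_stage = max(1, (total_epochs - 1) // step)     # index of the final stage
    progress = min(1.0, stage / last_stage)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [learning_rate(e) for e in range(120)]
print(schedule[0], schedule[-1])
```

The schedule starts at the base rate, ends at the minimum rate, and never increases in between; other readings of the description (for example, a per-epoch cosine curve) would give a smoother but similar shape.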

Table 1
Important parameters of our experimental equipment

Equipment	Model	Important parameters
Micro turbine unit	HL220-WJ-60	2.36 m³/s, 750 r/min, 630 kW
Temperature sensor	GXC58-DIS	($-$40 °C to 90 °C), (0–100%)
Pressure sensor	32PCS	0–250 psi
Vibration sensor	SW-420-NC	3.5 V–5 V

Figure 4.

The categorical distribution of collected data.

Experimental results

To further study the difference in performance between the proposed method and traditional methods, we conducted a comparison experiment against 1D ResNet50, 1D ResNet50 $+$ LSTM, 1D Dilated ResNet50, 1D Dilated ResNet50 $+$ LSTM, Multi-Scale ResNet50, LSTM, and Transformer. Table 2 presents the results, which are visualized in Fig. 6. We employ three metrics frequently used in classification tasks: precision, recall, and F1. From Table 2, we can see that our proposed method achieves a significant improvement of about 5.5%–15% in F1-score compared with classical time series analysis methods. The comparison of ResNet50 with and without LSTM also shows that temporal information is very important to our task. The performance of LSTM is the worst of all methods, which again indicates that RNNs are weak at capturing long-distance correlations. Moreover, we found that a pure Transformer architecture is not as remarkable as we expected; we conjecture that the amount of data is not sufficient to meet the needs of Transformer training. The comparison between the 12-layer Transformer and our proposed method shows that using a CNN to extract features is an effective way to speed up convergence and improve the final performance while reducing the reliance on the amount of data. The training loss curves of our method and the Transformer are illustrated in Fig. 7.
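For reference, the three metrics are computed per fault type from true positives, false positives, and false negatives. The counts in the example below are made up for illustration and do not come from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one fault type.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.75 0.818
```

F1 is the harmonic mean of precision and recall, so it is pulled toward the lower of the two.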

Table 2

Comparison results of the algorithms

Fault Type	ResNet50 (1D)			ResNet50 (1D) $+$ LSTM			Dilated ResNet50 (1D)			Dilated ResNet50 (1D) $+$ LSTM
	Precision	Recall	F1	Precision	Recall	F1	Precision	Recall	F1	Precision	Recall	F1
GTF	0.872	0.793	0.830	0.912	0.891	0.901	0.881	0.892	0.886	0.887	0.902	0.894
BTF	0.813	0.826	0.819	0.924	0.903	0.913	0.914	0.874	0.894	0.920	0.883	0.901
BHVAF	0.834	0.821	0.827	0.901	0.898	0.899	0.911	0.904	0.905	0.907	0.914	0.910
BVVAF	0.792	0.813	0.802	0.897	0.912	0.904	0.924	0.908	0.916	0.919	0.921	0.920
OPF	0.783	0.842	0.811	0.933	0.902	0.917	0.946	0.931	0.938	0.937	0.942	0.939
Fault type	Multi-scale ResNet50			LSTM			Transformer (12-layers)			Our method
	Precision	Recall	F1	Precision	Recall	F1	Precision	Recall	F1	Precision	Recall	F1
GTF	0.883	0.852	0.867	0.851	0.802	0.826	0.881	0.892	0.886	0.933	0.927	0.930
BTF	0.847	0.832	0.839	0.804	0.799	0.801	0.914	0.874	0.894	0.952	0.937	0.944
BHVAF	0.857	0.839	0.848	0.815	0.82	0.817	0.911	0.904	0.905	0.903	0.946	0.924
BVVAF	0.825	0.824	0.824	0.811	0.808	0.809	0.924	0.908	0.916	0.928	0.916	0.922
OPF	0.817	0.843	0.83	0.797	0.837	0.817	0.946	0.931	0.938	0.961	0.938	0.949

Figure 5.

Experiment platform diagram.

Figure 6.

Visualization of comparison results.

Figure 7.

The training loss curve.

Ablation study

We first analyze the effect of the number of dilated convolutional layers and of the dilation rates; the results are reported in Table 3. It is observed that dilated convolution gives a notable boost relative to traditional convolution, i.e., dilation rates $=$ [1, 1, 1]. We also notice that too many convolutional layers have a negative effect on performance, as do dilation rates that are too large or too small.

Table 3

Ablative testing for an increasing number of layers (top) and different dilation rates (bottom) of our method

	Macro precision	Macro recall	Macro F1
Layers
1	0.901	0.904	0.903
2	0.911	0.937	0.924
3	0.935	0.933	0.934
4	0.928	0.930	0.929
Dilation rates
[1, 1, 1]	0.917	0.924	0.920
[2, 2, 2]	0.922	0.929	0.925
[1, 2, 4]	0.935	0.933	0.934
[3, 3, 3]	0.941	0.918	0.929
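The macro scores reported for the default configuration appear to be unweighted means of the per-class scores listed for "Our method" in Table 2. The short check below makes that relationship explicit; the averaging function is a generic sketch, not the authors' evaluation code.

```python
# Per-class (precision, recall, F1) for "Our method", copied from Table 2.
our_method = {
    "GTF": (0.933, 0.927, 0.930),
    "BTF": (0.952, 0.937, 0.944),
    "BHVAF": (0.903, 0.946, 0.924),
    "BVVAF": (0.928, 0.916, 0.922),
    "OPF": (0.961, 0.938, 0.949),
}


def macro_average(per_class):
    """Unweighted mean of each metric over the classes, rounded to 3 decimals."""
    n = len(per_class)
    sums = [sum(scores[i] for scores in per_class.values()) for i in range(3)]
    return tuple(round(s / n, 3) for s in sums)


macro = macro_average(our_method)
print(macro)  # (0.935, 0.933, 0.934)
```

The result matches the row for 3 layers with dilation rates [1, 2, 4], which is consistent with every fault type contributing equally to the macro scores regardless of how many records it has.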

Table 4 lists the results for different numbers of Transformer layers and different dimensions of the Transformer input. From Table 4, we can see that the Transformer can capture global information and model long-distance dependence through self-attention, which is its core component. However, this usually requires stacking multiple Transformer layers. At the same time, deeper networks require more data and more training skill to update the network parameters, which is why adding more Transformer layers beyond 6 is less effective. The same reasoning explains the effect of the hidden size of the Transformer input on performance.

Table 4

Comparison of the effect of the number of Transformer layers (top) and the vector size of the Transformer input (bottom)

	Macro precision	Macro recall	Macro F1
Transformer layers
2	0.868	0.874	0.871
4	0.895	0.903	0.899
6	0.935	0.933	0.934
12	0.919	0.92	0.919
Hidden size
128	0.912	0.909	0.910
256	0.928	0.931	0.929
384	0.935	0.933	0.934
768	0.941	0.905	0.922

The final ablation examines the importance of our proposed cross-attention-based aggregation layer for local and global representations. Table 5 shows that information aggregation is necessary: it improves model performance and validates our inference about information loss, namely that the details of local information are lost when global information is captured via self-attention. For better performance, the local features cannot be ignored. Cross-attention enhances the final representation by letting the global representation adaptively look for supportive information in the local representation. We found that increasing the number of cross-attention layers brings no significant further improvement.

Table 5

Results with and without the information aggregation layer

Aggregation layer	Macro precision	Macro recall	Macro F1
With	0.935	0.933	0.934
Without	0.923	0.919	0.921

5. Conclusion

In this paper, a CNN and Transformer hybrid deep neural network is proposed to assess the health state of water turbines based on multivariate long time series data. To verify the effectiveness of the method, an equivalent test environment was established. The fault experiments show that the combined CNN and Transformer model can effectively capture local and global information. However, local information is lost after the global module is applied, so we proposed a representation aggregation layer to solve this problem. The experimental results show that our proposed method outperforms classical time series analysis methods. The source of the data used in this paper is narrow, however, and we hope to verify the effectiveness of the method in a production environment. This will be the focus of our future work.

References

1. Lines, Bagnall. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery. 2015; 29: 565-592.
2. Bagnall, Lines, Bostrom, et al. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery. 2017; 31: 606-660.
3. Karim, Majumdar, Darabi, et al. LSTM fully convolutional networks for time series classification. IEEE Access. 2017; 6: 1662-1669.
4. Qian, Xiao, Zheng, et al. Dynamic multi-scale convolutional neural network for time series classification. IEEE Access. 2020; 8: 109732-109746.
5. Karim, Majumdar, Darabi, et al. Multivariate LSTM-FCNs for time series classification. Neural Networks. 2019; 116: 237-245.
6. Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, 2020.
7. Lara-Benítez, Carranza-García, Riquelme. An experimental review on deep learning architectures for time series forecasting. International Journal of Neural Systems. 2021; 31(03): 2130001.
8. Vaswani, Shazeer, Parmar, et al. Attention is all you need. Advances in neural information processing systems. 2017; 30.
9. Amari. Natural gradient works efficiently in learning. Neural Computation. 1998; 10(2): 251-276.
10. Xie. Analysis of the impact of the accuracy of current Transformers on the safety performance of grid equipment. Science and technology innovation and application. 2015(05): 110.
11. Ting, Marco, Roberto, Sergio. On-Line Fault Detection Technique for Voltage Transformers. Measurement. 2017; 108.
12. Bianchi, Livi, Mikalsen KØ, et al. Learning representations of multivariate time series with missing data. Pattern Recognition. 2019; 96: 106973.
13. Hochreiter. The vanishing gradient problem during learning recurrent neural nets and problem solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 1998; 6(02): 107-116.
14. Jin, Xuan, et al. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Advances in neural information processing systems. 2019; 32.
15. Lucas, Shifaz, Pelletier, et al. Proximity forest: An effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery. 2019; 33(3): 607-635.
16. Ismail Fawaz, Forestier, Weber, et al. Deep learning for time series classification: A review. Data Mining and Knowledge Discovery. 2019; 33(4): 917-963.
17. Taieb, Sorjamaa, Bontempi. Multiple-output modeling for multi-step-ahead time series forecasting. Neurocomputing. 2010; 73(10-12): 1950-1957.
18. Mehdiyev, Lahann, Emrich, et al. Time series classification using deep learning for process planning: A case from the process industry. Procedia Computer Science. 2017; 114: 242-249.
19. Petitjean, Forestier, Webb, et al. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm. Knowledge and Information Systems. 2016; 47: 1-26.
20. Dorozhko, Shaimordanova. Reduction of current transformers errors of rural electrical networks when operating at low currents. IOP Conference Series: Earth and Environmental Science. 2022; 996(1).
21. Xing, Jin, Lin, Liu. Current transformer saturation identification and effective data application strategy based on wavelet packet transform. Journal of Electrical Technology. 2019; 34(06): 1170-1179. doi: 10.19595/j.cnki.1000-06.
22. Jiao. A new method to improve fault location accuracy in transmission line based on fuzzy multi-sensor data fusion. IEEE Transactions on Smart Grid. 2018; 10(4): 4211-4220.