Forecasting top oil temperature for UHV reactor using Seq2Seq model with convolutional block attention mechanism

Abstract

The top oil temperature of ultra-high voltage (UHV) reactors has attracted considerable interest because of its wide use in fault diagnosis and insulation evaluation. In this work, a prediction method based on the Seq2Seq model with the convolutional block attention mechanism is proposed for the UHV reactor. To reduce the influence of volatility and improve computational efficiency, an encoder-decoder structure named Seq2Seq is used to reconstruct the complex raw data. The convolutional block attention mechanism (CBAM), composed of spatial attention and channel attention, is utilized to make full use of the information in the data. The resulting Seq2Seq-CBAM model is established to forecast the variation tendency of the oil temperature in the UHV reactor. The experimental results show that the proposed method achieves high prediction accuracy for the top oil temperature in both single-step and multi-step prediction.

Keywords

UHV reactor; top oil temperature; attention; convolutional block attention mechanism (CBAM); online detection scenario

1. Introduction

In the transportation of electric energy, ultra-high voltage (UHV) transmission is characterized by large capacity and low loss [1], and UHV transmission lines are being vigorously developed [2]. The reliability of transformers, reactors, and circuit breakers guarantees the operation of UHV transmission lines [3]. High-voltage shunt reactors play an important role in transmission networks. Their main functions are reducing the capacitive voltage rise of the UHV transmission line at no load or light load, suppressing over-voltage, and optimizing the voltage of UHV lines [4]. Because of its large capacity and high voltage, the UHV reactor heats up severely in operation, which can easily cause insulation damage. It is difficult to judge the internal insulation status of UHV shunt reactors directly, but the oil temperature can reflect it [5].

In recent years, with the development of smart monitoring of high voltage assets, the relevant datasets, such as oil temperature, load, etc., have been collected and have the advantages of high sensitivity, high precision, low cost, and high integration [6]. Catterson et al. [7] bridged between the smart grid and insulation communities and examined various technologies considered under the smart grid umbrella and studied how these affect electrical insulation. Based on the development of smart monitoring technology, there is a data basis for reactor oil temperature prediction. If oil temperature can be predicted in advance, the insulation status of the UHV shunt reactors can be evaluated in time to improve their operational reliability and safety.

Owing to their outstanding ability to predict complex data, machine learning models have been widely used in transformer and reactor state research. Many scholars used the support vector machine (SVM) and grey prediction to predict parameters such as oil temperature and dissolved gases in oil [8–10]. In some cases, machine learning is more precise than other statistical methods, but a single model still has inherent limitations in performance and robustness [11]. Fei et al. [12] proposed a hybrid model of the relevance vector machine (RVM) and particle swarm optimization (PSO) to predict dissolved gas content in transformer oil. PSO is applied to choose the appropriate embedding dimension, and RVM is used for single-step and multi-step prediction. Zeng et al. [13] proposed a model for predicting the dissolved gas concentration in the power transformer based on the modified grey wolf optimizer and the least squares support vector machine (MGWO-LSSVM). Hu et al. [14] developed an SVM optimized by PSO to establish diagnosis and prediction models and to select the parameters of the models.

Machine learning algorithms face the problems of over-fitting and local optimum when processing high-dimensional data [15]. Deep learning models, including LSTM, GRNN, etc., are introduced to predict parameters in the transformer. The parameters include oil temperature, winding temperature, dissolved gases in the oil, etc. The authors in [16–18] developed the model of LSTM to determine the correlations between historical status monitoring information and the gas content in the forecasting time. The result showed that the model achieves greater forecasting accuracy than the grey model, backpropagation network, and support vector machine model. Qi et al. [19] considered the time-series theory and oil chromatography data characteristics and proposed a deep recurrent belief network (DRBN) model for transformer state prediction. Liu et al. [20] presented a time series forecasting model based on the seasonal autoregressive integrated moving average (SARIMA) model.

As complex datasets are used for forecasting models, feature extraction methods are introduced to consider the relationship between features and reduce data redundancy. The authors in [21,22] proposed kernel principal component analysis (KPCA) to consider the nonlinear components in the original data and select the main features to avoid the influence of redundant features. Lin et al. [23] proposed conditional mutual information (CMI) to simultaneously analyze correlation and redundancy between variables. Using the CMI value as the index to screen out features can ensure low redundancy on the premise that the selected feature is strongly correlated with the predictive target.

Despite effective results, feature extraction can only remove redundant data in one dimension, which limits the improvement in the accuracy of the prediction model [24]. Because attention mechanisms can likewise remove redundant information but are not limited by the data dimension, many scholars have begun to study them [25]. Common attention mechanisms include self-attention [26], Bahdanau attention [27], and Luong attention [28]. Owing to their advantages in processing high-dimensional data, attention mechanisms are widely used in the electrical field. Zhang et al. [29] proposed a remaining useful life prediction model based on dual-aspect self-attention, which works in parallel to extract features of different sensors and time steps simultaneously. Qin et al. [30] employed the long short-term memory (LSTM) architecture in a load prediction model, supported by an attention mechanism to prevent performance deterioration. The above studies only analyzed the correlation between different features or between different time steps and could not combine the two to compute optimal attention values. The convolutional block attention module (CBAM) adopted in this work computes attention values by blending cross-channel and spatial information [31]. Wang et al. [32] proposed a novel convolutional neural network with a non-local convolutional block attention module (NCBAM) to automatically classify ECG heartbeats. The result showed that NCBAM achieves a noticeable improvement in classifying ECG heartbeats. Therefore, inspired by the work of Wang and colleagues, this work applies CBAM, originally developed for image recognition, to the prediction of the oil temperature of the reactor.

The UHV reactor datasets’ intense volatility affects the computational attention value of the attention mechanism. Many scholars began to study the algorithms of EMD and VMD, decomposing the non-stationary series data into stationary subsequences with different scales [33]. Wang et al. [34] overcame the problem that traditional prediction methods have less consideration of the interaction between different trend items in the oil temperature series and proposed the VMD (Variational Modal Decomposition) and GRU (Gated Recurrent Unit) neural network to predict top oil temperature. Ruan et al. [35] proposed a hybrid prediction model based on modal decomposition. The variational mode decomposition (VMD) technique was used to decompose the original sequence into more robust subsequences. The LSTM neural network model was used to predict each subsequence. The result showed that the proposed hybrid model improves the prediction accuracy and robustness. Establishing forecasting models for each subsequence takes time and computer resources. In this work, the Seq2Seq structure involving an encoder and decoder is proposed to overcome the above problems and normalize noise [36].

In this paper, a top oil temperature forecasting method based on sequence to sequence (Seq2Seq) structure using a convolutional block attention mechanism (CBAM) is proposed. The Seq2Seq framework is used to convert the original sequence to a new sequence by the encoder, enabling the model to quickly learn the timing relationship between the data. The predicted data is decoded through the decoder. Then, the CBAM is established to calculate attention scores using two dimensions of information to preserve data information and avoid losing critical information. The Seq2Seq-CBAM reduces the impact of large fluctuations in the original data and overcomes the problem of predictive models being affected by redundant information. The proposed prediction framework achieves high prediction accuracy for the top oil temperature in the UHV reactor and can be easily deployed in online monitoring scenarios.

2. Structure of reactor and operation parameter

High-voltage shunt reactors are essential equipment in high-voltage power grids. They compensate for the capacitance of a high-voltage transmission line, absorb its reactive power, and prevent the voltage rise caused by the line capacitance when the grid is lightly loaded. The data parameters collected by the online monitoring device on the reactor are shown in Table 1. The external structure of the reactor is shown in Fig. 1 and comprises seven parts: heat sink, bushing, oil-expansion chamber, gas-actuated relay, winding thermometer, thermometer, and control box. The oil-expansion chamber is one of the most critical parts of the shunt reactor; it is installed at the top of the reactor and connected to the reactor oil tank through a bend. Its function is to compensate for the thermal expansion and contraction of the oil volume and to avoid direct contact between the insulating oil and the atmosphere. The temperature of the oil in the oil-expansion chamber is an essential index for evaluating the state of the reactor [37] and is affected by various factors such as environment temperature, load, and the operational status of the internal components [38].

Table 1
Description of thermal parameters and related factors

Parameters	Notation	Specific meaning
Thermal parameters	OT	The daily maximum oil temperature (°C)
	WT	The daily maximum winding temperature next to the neutral bushing (°C)
	OTNP	Oil temperature of neutral point (°C)
Dissolved gases in oil	TH	Total hydrocarbon content of the main part (μL/L)
	H₂	Hydrogen content of the main part (μL/L)
	C₂H₂	Acetylene content of the main part (μL/L)
	CH₄	Methane content of the main part (μL/L)
	CO	Carbon monoxide content of the main part (μL/L)
	CO₂	Carbon dioxide content of the main part (μL/L)
	C₂H₆	Ethane content of the main part (μL/L)
	C₂H₄	Ethylene content of the main part (μL/L)
Environment parameters	ET	Temperature around ultra-high-voltage reactor (°C)
	RH	The moisture content around ultra-high-voltage reactor (%)
Other	load	The electrical component or portion of a circuit that consumes electric power (MW)

Fig. 1.

The external structure of reactor.

Winding temperature dramatically influences the transformer insulation and is closely related to the top oil temperature [39]. The same goes for reactors. The reactor insulation oil is affected by temperature and breaks down into low-molecular hydrocarbon gases during operation, in which H₂, C₂H₂, and total hydrocarbon hold the highest weights [40]. The load is always an important indicator that impacts the status of the reactor. Environmental factors such as temperature and relative humidity can improve the model’s accuracy. In this paper, temperature and relative humidity are used due to their great influence on the top oil temperature [41].

In conclusion, oil temperature (OT), winding temperature (WT), total hydrocarbons (TH), H₂, C₂H₂, environment temperature (ET), load, and relative humidity (RH) are chosen as features to predict top oil temperature.

Fig. 2.

Procedure of training, forecasting and deployment based on Seq2Seq-CBAM.

3. Proposed hybrid forecasting system

In this study, a novel hybrid system based on Seq2Seq and CBAM mechanism is proposed for oil temperature forecasting in the reactor, as shown in Fig. 2, comprised of three parts: data collection, model training and online forecasting. The specific training, forecasting and deployment process are described as follows.

Step 1:

The historical datasets are collected by online monitoring sensors from the UHV substation, including the oil temperature sensor, winding temperature sensor, current sensor, gas analyzer, and environmental sensor. The various data is transmitted to a remote database for real-time storage.

Step 2:

After the data are read from the remote database, they are pre-processed, divided into a training set and a test set, and then fed into the Seq2Seq-CBAM model for training.

Step 3:

The real-time data is directly input into the trained model, and the predicted value of oil temperature is output.

3.1. Data pre-processing method

(1) Data cleaning

The operating environments of reactors, transmission lines, and gas insulation systems are complex and diverse. Environmental interference and the limitations of measurement techniques cause the original monitoring data to contain abnormal data, including outliers and missing values. In this paper, the K-means algorithm is used for anomaly detection. For each variable, K-means divides the data into two clusters: one for abnormal data and one for normal data. All abnormal data are removed. The missing data are then filled with values computed by the multiple imputation by chained equations (MICE) algorithm.
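The cleaning step can be illustrated with a minimal sketch, assuming a one-dimensional K-means with two clusters per variable. The paper does not state how the abnormal cluster is identified, so treating the smaller cluster as abnormal is an assumption of this sketch, as are the function names and the sample values. The MICE imputation that follows in the paper is not shown; flagged values are simply marked as missing.

```python
from statistics import mean


def two_cluster_kmeans(values, max_iter=100):
    """1-D K-means with k=2; returns (labels, centers)."""
    # Assumption: initialise the two centers at the extremes of the data.
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def flag_abnormal(values):
    """Mark members of the smaller cluster as abnormal (assumption)."""
    labels, _ = two_cluster_kmeans(values)
    minority = 0 if labels.count(0) < labels.count(1) else 1
    return [lab == minority for lab in labels]


# Hypothetical daily oil temperatures (°C) with one obvious outlier.
oil_temp = [61.2, 60.8, 62.1, 61.5, 120.0, 60.9, 61.7]
abnormal = flag_abnormal(oil_temp)
# Removed values become missing (None) and would be imputed afterwards.
cleaned = [None if bad else v for v, bad in zip(oil_temp, abnormal)]
```

A two-cluster split always produces a minority cluster when the values differ, so in practice a check on the distance between the two centers would be needed before discarding data.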

(2) Data normalization

Different features of the input data usually have different scales and units, which affects the results of data analysis. To eliminate the impact of scale differences between features, the data must be normalized before they enter the forecasting model. Min-max normalization, the most common normalization method, is adopted here. The formula is as follows:

$$X^{\prime}=\frac{X-X_{\text{min}}}{X_{\text{max}}-X_{\text{min}}} \tag{1}$$

where $X$ is the raw value of a feature and $X_{\text{min}}$ and $X_{\text{max}}$ are the minimum and maximum values of that feature.

After normalization, the data lie in the interval [0, 1], and features on different scales can be compared directly. During training, normalized data also speed up the convergence of gradient descent toward the optimal solution and improve forecasting accuracy.
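As a small illustration of Eq. (1), the following sketch normalizes one feature column and maps values back to the original unit. The function names, the handling of a constant column, and the sample values are assumptions made for the example.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1] following Eq. (1)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no spread, so map it to 0.
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


def min_max_denormalize(scaled, x_min, x_max):
    """Map normalized values (e.g. predictions) back to the original unit."""
    return [s * (x_max - x_min) + x_min for s in scaled]


winding_temp = [50.0, 60.0, 75.0, 100.0]   # hypothetical values in °C
scaled = min_max_normalize(winding_temp)
restored = min_max_denormalize(scaled, min(winding_temp), max(winding_temp))
```

The inverse mapping matters because the errors reported later are expressed in °C, which requires predictions in the original unit.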

3.2. Seq2Seq model based on CBAM

The Seq2Seq model is an encoder-decoder structure. As shown in Fig. 3, it includes LSTM units, hidden states, the convolutional block attention mechanism, and the attention values calculated by CBAM.

Fig. 3.

The structure of Seq2Seq.

The basic idea is to use one RNN as the encoder and another RNN as the decoder. The encoder compresses the input sequence into a vector of a specified length, which can be thought of as the semantics of the sequence; this process is called encoding. The decoder generates the output sequence from the vector produced by the encoder. The traditional Seq2Seq model uses plain RNNs as the encoder and decoder layers, which cannot handle long-distance dependencies and suffer from vanishing gradients. In this paper, LSTM is adopted to deal with these problems. As shown in Fig. 4, the LSTM includes four parts: the forget gate, the input gate, the output gate, and the cell gate.

Fig. 4.

The structure of LSTM.

The internal formulas can be expressed as follows:

$$f_{t}=\sigma(W_{f}[h_{t-1},x_{t}]+b_{f}) \tag{2}$$

$$s_{t}=\sigma(W_{i}[h_{t-1},x_{t}]+b_{i}) \tag{3}$$

$$z_{t}=\tanh(W_{s}[h_{t-1},x_{t}]+b_{s}) \tag{4}$$

$$b_{t}=\sigma(W_{o}[h_{t-1},x_{t}]+b_{o}) \tag{5}$$

$$c_{t}=f_{t}\odot c_{t-1}+s_{t}\odot z_{t} \tag{6}$$

$$h_{t}=b_{t}\odot \tanh(c_{t}) \tag{7}$$

where $W_f$, $W_i$, $W_s$, and $W_o$ are the weight matrices of the corresponding gates, and $b_f$, $b_i$, $b_s$, and $b_o$ are the biases. Here $f_t$, $s_t$, and $b_t$ are the activations of the forget, input, and output gates, $z_t$ is the candidate cell state, $c_t$ and $h_t$ are the cell state and the hidden state, $x_t$ is the input at time step $t$, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.
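To make Eqs. (2)-(7) concrete, the following is a minimal sketch of a single LSTM time step in plain Python. The dictionary layout of the parameters, the helper names, and the all-zero weights in the usage example are assumptions chosen for illustration; a real model would use trained weights and a tensor library.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def affine(W, v, b):
    """Return W v + b for a list-of-rows matrix W."""
    return [sum(w * a for w, a in zip(row, v)) + bias
            for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (2)-(7)."""
    hx = h_prev + x_t                                         # [h_{t-1}, x_t]
    f_t = sigmoid(affine(params["W_f"], hx, params["b_f"]))   # Eq. (2), forget gate
    s_t = sigmoid(affine(params["W_i"], hx, params["b_i"]))   # Eq. (3), input gate
    z_t = tanh(affine(params["W_s"], hx, params["b_s"]))      # Eq. (4), candidate
    b_t = sigmoid(affine(params["W_o"], hx, params["b_o"]))   # Eq. (5), output gate
    c_t = [f * c + s * z
           for f, c, s, z in zip(f_t, c_prev, s_t, z_t)]      # Eq. (6)
    h_t = [o * math.tanh(c) for o, c in zip(b_t, c_t)]        # Eq. (7)
    return h_t, c_t


# Hypothetical sizes: 2 hidden units, 3 input features, all-zero parameters.
hidden, n_in = 2, 3
zeros = [[0.0] * (hidden + n_in) for _ in range(hidden)]
params = {f"W_{g}": zeros for g in "fiso"}
params.update({f"b_{g}": [0.0] * hidden for g in "fiso"})
h_t, c_t = lstm_step([0.1, 0.2, 0.3], [0.0, 0.0], [1.0, -1.0], params)
```

With zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved, which makes the role of the forget gate in Eq. (6) easy to see.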

The main function of Seq2Seq is to use the memory ability of LSTM for time series to extract the time series features in the original data and obtain the reconstructed data. Since the reconstructed data is composed of time series features in the original data, the data distribution is smoother. The essence of the forecasting model is to analyze the law of data changes. The less fluctuating and more stable data can be easily learned by the forecasting model.

The attention mechanism is widely used in various fields, including image processing, speech recognition, and natural language processing. This mechanism imitates human vision, which can quickly and effectively capture the vital part of the data [42]. Attention helps artificial neural networks make more accurate judgments by assigning different weights to their input features, highlighting the more critical influencing factors, and reducing the computation and storage required by models [43]. Traditional attention mechanisms are limited in representation power and cannot handle the high-dimensional and irregular data of UHV shunt reactors. In this paper, the convolutional block attention mechanism (CBAM) is introduced to calculate the relationships between different dimensions and to extract in-depth information between different points, which increases representation power and ultimately improves forecasting accuracy; CBAM has been shown to have a more robust representation ability [44]. The structure of CBAM is shown in Fig. 5. The module has two sequential sub-modules: channel attention and spatial attention. In computer vision, the channel module focuses on ‘what’ is meaningful; in a time series, it assigns a different weight to each feature. Consequently, the features that carry most of the information are assigned relatively large attention weights and the other features are assigned small ones, which is similar in function to feature extraction. The important parts of the data are magnified, while redundant information is down-weighted and used only to a limited extent.

Fig. 5.

The structure of convolutional block attention mechanism.

The channel module contains max-pooling and average-pooling, which generate two different channel context descriptors, $M_{\text{avg}}^{c}$ and $M_{\text{max}}^{c}$, representing the average-pooled and max-pooled features, respectively. The two descriptors are then sent to a shared multi-layer perceptron (MLP) with one hidden layer to produce a channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$. To reduce the parameter overhead, the hidden activation size is set to $\mathbb{R}^{C/r \times 1 \times 1}$, where $r$ is the reduction ratio. After the MLP, the two outputs are merged by element-wise summation. The channel attention module $M_c$ is given by

$$M_{c}=\sigma(W_{1}(W_{0}(M_{\text{avg}}^{c}))+W_{1}(W_{0}(M_{\text{max}}^{c}))) \tag{8}$$

where $\sigma$ represents the sigmoid function, $W_0 \in \mathbb{R}^{C/r \times C}$, and $W_1 \in \mathbb{R}^{C \times C/r}$. The MLP weights $W_0$ and $W_1$ are shared for both inputs.
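The channel attention of Eq. (8) can be sketched as follows for a small feature map with one row per feature (channel). The sketch follows the equation literally, so no hidden-layer nonlinearity is applied inside the MLP because Eq. (8) does not show one. The matrix values, the reduction ratio of 2, and the function names are hypothetical.

```python
import math


def matvec(W, v):
    """Multiply a list-of-rows matrix W by a vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def channel_attention(feature_map, W0, W1):
    """Channel attention weights following Eq. (8).

    feature_map: C rows (one per feature/channel), each a list of values.
    W0: (C/r) x C matrix, W1: C x (C/r) matrix, shared by both branches.
    """
    m_avg = [sum(ch) / len(ch) for ch in feature_map]   # average-pooled descriptor
    m_max = [max(ch) for ch in feature_map]             # max-pooled descriptor
    out_avg = matvec(W1, matvec(W0, m_avg))
    out_max = matvec(W1, matvec(W0, m_max))
    # Element-wise sum of the two branches, then sigmoid.
    return [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(out_avg, out_max)]


# Hypothetical example: C = 4 features, reduction ratio r = 2.
feature_map = [[0.1, 0.3], [0.8, 0.6], [0.5, 0.5], [0.0, 0.2]]
W0 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0]]
W1 = [[1.0, 0.0],
      [0.0, 1.0],
      [0.0, 0.0],
      [0.0, 0.0]]
weights = channel_attention(feature_map, W0, W1)
# Channel-refined features: every channel is scaled by its attention weight.
refined = [[w * x for x in ch] for w, ch in zip(weights, feature_map)]
```

In this toy setting the second feature receives the largest weight because both of its pooled descriptors are the largest, which mirrors how informative features are emphasized.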

The spatial module focuses on ‘where’ the informative part is in the realm of vision. In the time domain, however, it explores the relationships between data at different times, which complements channel attention [45]. The mechanism is that outlying data or extreme values are assigned less attention, and normal data receive greater attention. Through this module, abnormal data are ignored to some extent, so the subsequent model focuses more on normal data and is less affected by outliers. In spatial attention, average-pooling and max-pooling are applied to the channel-refined features $F$, obtained by multiplying $M_c$ with the input data $M$, and the results are concatenated to generate a feature descriptor. After the pooling step, a convolution layer is applied to generate the spatial attention map $M_s$. The spatial attention module is given by

$$M_{s}=\tanh(f^{3\times 3}([\text{AvgPool}(F);\text{MaxPool}(F)])) \tag{9}$$

where $\tanh$ denotes the hyperbolic tangent function and $f^{3\times 3}$ represents a convolution operation with a filter size of 3 × 3.
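A minimal sketch of Eq. (9) is given below. It pools across the channel axis, stacks the two pooled maps, applies one 3 × 3 filter per pooled map, and passes the sum through tanh. Zero padding, the absence of a bias term, and the kernel values are assumptions of this sketch, and the convolution is implemented as the cross-correlation commonly used in deep learning frameworks.

```python
import math


def spatial_attention(F, kernel):
    """Spatial attention map following Eq. (9).

    F: channel-refined features as C maps of size H x W (nested lists).
    kernel: 2 x 3 x 3 weights, one 3x3 filter per pooled map.
    """
    C, H, W = len(F), len(F[0]), len(F[0][0])
    # Pool across the channel axis to obtain two H x W descriptors.
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = [avg, mx]               # concatenation [AvgPool(F); MaxPool(F)]
    M_s = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k in range(2):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < H and 0 <= jj < W:   # zero padding (assumption)
                            acc += kernel[k][di + 1][dj + 1] * pooled[k][ii][jj]
            M_s[i][j] = math.tanh(acc)
    return M_s


# Hypothetical input: C = 2 channels of size 2 x 2.
F = [[[1.0, 0.0], [0.0, 0.0]],
     [[0.0, 0.0], [0.0, 2.0]]]
# Hypothetical kernel that passes the average-pooled map through unchanged.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
nothing = [[0.0] * 3 for _ in range(3)]
M_s = spatial_attention(F, [identity, nothing])
```

With this kernel the attention map equals the tanh of the channel average at each position, which isolates the pooling and activation steps from the learned filter.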

4. Experiments

4.1. Datasets

To demonstrate the performance of the proposed Seq2Seq-CBAM model, historical monitoring data of the UHV reactors on three ultra-high-voltage transmission lines are used: from province A to province B (line AB), from province A to province C (line AC), and from province B to province C (line BC). The UHV reactor is a single-phase oil-immersed reactor; the rated capacity of the high voltage side is 1000 MVA, the rated line voltage is 1050 kV, and the rated current is 1649.6 A. The data are sampled once per day. The datasets include oil temperature, winding temperature, environment temperature, load, humidity, and gas concentrations such as total hydrocarbons, H₂, and C₂H₂. The multi-dimensional data spanning three years are divided into three parts: training set, validation set, and test set. The whole dataset is split into training and test data at a ratio of 8.5:1.5, and the training data are further split into training and validation sets at a ratio of 1:1.
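The split ratios above can be sketched as follows. The paper does not say whether samples are shuffled, so keeping them in chronological order (the latest samples form the test set) is an assumption of this sketch, as are the function name and the sample count of 1000.

```python
def chronological_split(samples, test_ratio=0.15, val_ratio=0.5):
    """Split samples 8.5:1.5 into train+val and test, then 1:1 into train and val.

    Assumption: samples are ordered in time and are not shuffled.
    """
    n_test = round(len(samples) * test_ratio)
    train_val = samples[:len(samples) - n_test]
    test = samples[len(samples) - n_test:]
    n_val = round(len(train_val) * val_ratio)
    train = train_val[:len(train_val) - n_val]
    val = train_val[len(train_val) - n_val:]
    return train, val, test


# Hypothetical dataset of 1000 daily samples, represented by their indices.
train, val, test = chronological_split(list(range(1000)))
```

For 1000 samples this gives 425 training, 425 validation, and 150 test samples.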

In this study, the proposed hybrid forecasting module is compared with the Seq2Seq module based on Bahdanau attention [27], Seq2Seq, LSTM, bi-directional long short-term memory (BiLSTM), LSTM module based on Bahdanau attention, convolutional neural network (CNN), CNN based on Bahdanau attention, general regression neural network (GRNN), XGBoost, ensemble learning (EL), extreme learning machine (ELM) and gradient boosting decision tree (GBDT) to test the performance and computation efficiency of the proposed module in single and multiple step prediction.

All experiments are conducted on a PC equipped with an Intel® i7 CPU, 72 GB of memory, and an NVIDIA® GTX 1080 Ti GPU.

4.2. Hyperparameters

Hyperparameters directly affect the accuracy of the model. To improve forecasting accuracy, the control variable method is adopted: one hyperparameter is varied at a time while the others are held fixed, and the forecasting accuracy is compared for different learning rates, batch sizes, numbers of LSTM units, kernel sizes in CBAM, and orders of spatial attention and channel attention. This section uses the mean absolute percentage error (MAPE) to evaluate accuracy.
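For reference, MAPE in its standard form can be computed as in the sketch below. The function name and the sample values are illustrative assumptions; the metric is undefined when an actual value is zero, so it is assumed here that it is evaluated on temperatures in °C rather than on normalized data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical top oil temperatures (°C) and forecasts.
actual = [50.0, 60.0, 80.0]
predicted = [52.0, 57.0, 80.0]
error = mape(actual, predicted)   # relative errors 4%, 5%, 0% average to 3%
```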

The batch size, the number of LSTM units, and the kernel size are first set to 10, 32, and 3, respectively, and the learning rate is varied. Table 2 lists the proposed model’s forecasting accuracy with different learning rates. On lines AB and AC, the forecasting accuracy is highest when the learning rate is 0.005. The accuracy is lower when the learning rate is greater than 0.005, suggesting over-fitting. The accuracy decreases slightly when the learning rate drops to 0.001, which indicates weaker convergence. On line BC, the proposed model achieves the highest accuracy when the learning rate is 0.001, but its accuracy at 0.005 is also high, so the final learning rate is set to 0.005.

Table 2
Accuracy of models with different learning rates

Lr	Line AB	Line AC	Line BC
	MAPE (%)	MAPE (%)	MAPE (%)
0.05	5.008	4.898	4.501
0.01	4.971	4.827	4.418
0.005	4.796	4.796	4.456
0.001	4.824	4.923	4.334
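The choice of learning rate can be checked against the values in Table 2 with a short sketch. Averaging the MAPE over the three lines is an assumption made here for illustration; the paper does not state the exact selection rule beyond the discussion in the text.

```python
# MAPE (%) from Table 2, keyed by learning rate: (line AB, line AC, line BC).
table2 = {
    0.05: (5.008, 4.898, 4.501),
    0.01: (4.971, 4.827, 4.418),
    0.005: (4.796, 4.796, 4.456),
    0.001: (4.824, 4.923, 4.334),
}

# Best learning rate for each line taken separately.
best_per_line = [min(table2, key=lambda lr: table2[lr][i]) for i in range(3)]

# Assumption: use the mean MAPE over the three lines as a single criterion.
mean_mape = {lr: sum(v) / len(v) for lr, v in table2.items()}
best_lr = min(mean_mape, key=mean_mape.get)
```

Under this criterion 0.005 is selected, which agrees with the final learning rate reported in the text even though line BC alone favors 0.001.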

The batch size also directly affects the performance of the model. The learning rate, the number of LSTM units, and the kernel size are set to 0.005, 32, and 3, respectively, and the batch size is varied. Table 3 lists the forecasting accuracy of the proposed model with different batch sizes. The forecasting accuracy is highest on all three datasets when the batch size is 10. The accuracy is lowest when the batch size is set to 1, which is attributed to under-fitting. The accuracy is also lower when the batch size exceeds 10, because the model falls into a local optimum. The final batch size is set to 10.

Table 3

Accuracy of models with different batch sizes

Batch size	Line AB	Line AC	Line BC
	MAPE (%)	MAPE (%)	MAPE (%)
1	7.912	7.647	8.269
5	5.009	5.892	4.599
10	4.793	4.796	4.385
20	4.891	5.699	4.502

Table 4

Accuracy of models with different units

Units	Line AB	Line AC	Line BC
	MAPE (%)	MAPE (%)	MAPE (%)
16	4.964	5.479	4.861
32	4.791	4.801	4.710
64	5.093	6.426	5.368
128	5.213	5.699	5.502

LSTM is used as the encoder and decoder. The number of units in LSTM affects the effect of encoding and decoding. Set learning rate, batch size and kernel size to 0.005, 5 and 3, and test the optimal parameters of units in the LSTM. Table 4 lists the accuracy of models with different units. In all three lines, the forecasting accuracy is highest when the units are 32, indicating that the encoder and decoder process the dataset most accurately.

In the spatial attention model, the convolution layer is applied, and the kernel size setting in the convolutional layer affects the accuracy of the attention values. Set learning rate, batch size, units in LSTM to 0.005, 10 and 32, and experiment with the kernel size parameter. Table 5 lists the accuracy of models with different kernel sizes. When the kernel size is set to 3, the model is most accurate on the three lines, indicating that the convolutional layer calculates the accurate attention value.

Table 5

Accuracy of models with different kernel size

Kernel size	Line AB	Line AC	Line BC
	MAPE (%)	MAPE (%)	MAPE (%)
2	5.841	5.883	5.450
3	4.770	4.886	4.429
4	5.235	5.292	5.388
5	5.213	5.199	5.502

In CBAM, the order of spatial attention (SA) and channel attention (CA) also affects the model’s performance. Based on the above parameter experiments, the learning rate, batch size, number of LSTM units, and kernel size are set to 0.005, 10, 32, and 3, respectively, and the influence of the order of spatial attention and channel attention on prediction accuracy is tested. Table 6 lists the forecasting accuracy of the proposed model with different orders of SA and CA. The result shows that the proposed model achieves higher accuracy on all three lines when channel attention is applied before spatial attention.

Table 6

Accuracy of models with different order of SA and CA

	Line AB	Line AC	Line BC
	MAPE (%)	MAPE (%)	MAPE (%)
CA + SA	4.793	4.796	4.385
SA + CA	5.009	6.371	5.510

4.3. Validation of CBAM

In order to verify the advantages of the convolutional block attention module, the two parts of CBAM are used separately to test its good representation power. Representation power means utilizing more typical data to represent the original data. In typical data, the vital information in raw ones would be magnified, and redundant information would be ignored to the same extent. In this part of the experiments, the two components of CBAM are tested separately to verify spatial and channel attention functions.

To test the representation power, only the normalized dataset of line AB is fed to each module, which avoids the effect of different scales among the features. The input is a three-dimensional tensor $C \in \mathbb{R}^{b \times m \times n}$, in which b, m, and n represent the number of samples, the number of days in each sample, and the number of features in each day, respectively. In this experiment, b, m, and n are 1036, 1, and 8, respectively.

For spatial attention, the outputs are extracted to compute the variance of each feature, which is compared with that of the normalized data, as shown in Fig. 6. The variance of the data after the spatial attention mechanism is much smaller, indicating that spatial attention comprehensively compares the data across samples and assigns less attention to outlying samples and extreme values.

Fig. 6.

The variance comparison before and after spatial attention.

Fig. 7.

The database channel attention weight heatmap.

Similarly, for the channel attention experiment, because the full set of channel attention weights is too large to display, only the 365 daily channel attention weights from the year 2017 are shown in Fig. 7. The channel attention module mainly attends to winding temperature, environment temperature, and the concentrations of H₂ and C₂H₂. Other features, such as humidity and load, are assigned smaller values. The results show that winding temperature, environment temperature, and the concentrations of H₂ and C₂H₂ are essential to the data. This is reasonable because the temperature factors strongly affect the operating state of the reactor, and C₂H₂ and H₂ are generated by excessive oil temperature and electrolysis [4], so they also reflect the reactor operating state and are adopted to assess reactor condition [46]. Channel attention thus compares the importance of the features and assigns a different attention weight to each.

4.4. Performance in single and multi-step prediction

In the three UHV transmission lines, with the above-mentioned optimal parameters set, the proposed Seq2Seq-CBAM is trained on the total training set for one-step-ahead forecasting and is tested on the test set. The curves of the single-step prediction of the test set on three lines are drawn, as shown in Fig. 8. It can be seen that the Seq2Seq-CBAM model has a slight deviation from the actual values.

Fig. 8.

The curve of forecasting under the proposed model on the test set. (a) Forecast results of line AB. (b) Forecast results of line AC. (c) Forecast results of line BC.

To validate the proposed approach, comparisons are made with twelve other forecasting models: Seq2Seq-Bahdanau attention, Seq2Seq, LSTM, BiLSTM, LSTM-Bahdanau attention, CNN, CNN-Bahdanau attention, GRNN, XGBoost, EL, ELM, and GBDT, where the EL model combines LinearRegression, RandomForestRegressor, and LinearSVR. Table 7 lists the RMSE, MAE, and MAPE of all models on the test set. On line AB, the proposed model attains the lowest RMSE, MAE, and MAPE of the thirteen models, with a MAPE about 0.6 percentage points below that of the next-best model. On lines AC and BC, its errors remain among the lowest, although a few baselines (for example, BiLSTM on line AC and XGBoost on line BC) report a slightly lower MAPE. Overall, the results indicate that the forecasting precision and generalization performance of the proposed model compare favorably with those of the other models.
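The two absolute error metrics reported in Table 7, RMSE and MAE, can be computed in their standard form as in the sketch below. The function names and the sample values are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the unit of the data (here °C)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the unit of the data (here °C)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical top oil temperatures (°C) and forecasts.
actual = [60.0, 62.0, 65.0, 61.0]
predicted = [58.0, 62.0, 68.0, 62.0]
rmse_value = rmse(actual, predicted)   # sqrt((4 + 0 + 9 + 1) / 4)
mae_value = mae(actual, predicted)     # (2 + 0 + 3 + 1) / 4
```

Because RMSE squares each error before averaging, it penalizes large deviations more strongly than MAE and is never smaller than MAE on the same data.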

Table 7

Comparison of errors of different models on the test set

Line	Models	RMSE (°C)	MAE (°C)	MAPE (%)
Line AB	Seq2Seq-CBAM	3.455	2.621	4.080
	Seq2Seq-Bahdanau	3.958	3.149	5.781
	Seq2Seq	4.550	3.968	5.669
	LSTM	3.929	3.025	4.859
	BiLSTM	3.950	3.021	4.882
	LSTM-Bahdanau	3.814	2.869	4.808
	CNN	5.312	4.441	6.765
	CNN-Bahdanau	4.904	3.420	6.114
	GRNN	3.628	2.733	5.000
	XGBoost	5.658	3.716	5.944
	EL	4.514	4.854	4.665
	ELM	4.208	3.273	5.266
	GBDT	3.726	2.767	5.055
Line AC	Seq2Seq-CBAM	3.651	2.619	4.730
	Seq2Seq-Bahdanau	3.989	2.294	5.360
	Seq2Seq	4.315	4.508	7.262
	LSTM	3.871	3.013	4.690
	BiLSTM	3.692	2.830	4.456
	LSTM-Bahdanau	4.151	2.919	5.018
	CNN	4.431	3.588	5.512
	CNN-Bahdanau	3.887	2.979	5.005
	GRNN	4.680	3.587	5.755
	XGBoost	4.538	3.212	5.144
	EL	4.431	3.088	4.947
	ELM	9.995	8.360	7.969
	GBDT	5.027	3.887	6.247
Line BC	Seq2Seq-CBAM	3.864	2.945	4.470
	Seq2Seq-Bahdanau	4.367	3.298	5.275
	Seq2Seq	4.897	4.316	6.070
	LSTM	3.789	2.943	4.606
	BiLSTM	3.692	2.837	4.487
	LSTM-Bahdanau	3.959	3.019	4.876
	CNN	4.776	3.914	5.978
	CNN-Bahdanau	4.759	3.860	6.154
	GRNN	4.374	3.400	5.297
	XGBoost	3.738	2.756	4.286
	EL	4.684	4.749	5.291
	ELM	4.442	4.233	5.665
	GBDT	4.204	3.226	5.041
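
For reference, the three error indicators reported in Table 7 can be computed as in the sketch below, assuming their standard definitions. The sample values are invented for illustration and are not taken from the datasets.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the unit of the data (here °C)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the unit of the data (here °C)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical top oil temperatures (°C) and forecasts.
actual = [60.0, 62.0, 58.0, 65.0]
forecast = [57.0, 64.0, 59.0, 61.0]

print(f"RMSE = {rmse(actual, forecast):.3f} °C")
print(f"MAE  = {mae(actual, forecast):.3f} °C")
print(f"MAPE = {mape(actual, forecast):.3f} %")
```

Because RMSE squares the residuals, it can never be smaller than MAE on the same data, which gives a quick consistency check for reported values.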

To further assess the proposed method, its multi-step predictions are compared with those of the above models on the test set. As Table 8 shows, the prediction accuracy decreases gradually as the prediction horizon grows, because a longer horizon weakens the dependency on the observed time series. On all three transmission lines, the prediction errors of the twelve comparison models increase with the prediction step, and the errors of CNN, CNN-Bahdanau, GRNN, ELM, XGBoost, GBDT, and EL increase rapidly. Seq2Seq-Bahdanau, LSTM, BiLSTM, LSTM-Bahdanau, and Seq2Seq keep the error stable up to a certain horizon. The limited multi-step accuracy of the twelve comparison models indicates that they have difficulty exploiting long-term information. The Seq2Seq-CBAM model performs better than the other models overall and loses relatively little accuracy as the number of prediction steps increases, indicating that CBAM amplifies useful information in the data and enables the model to learn long-term trends.

Table 8

Performance comparison of multi-step prediction on the test set

Model	Steps	Line AB			Line AC			Line BC
		RMSE (°C)	MAE (°C)	MAPE (%)	RMSE (°C)	MAE (°C)	MAPE (%)	RMSE (°C)	MAE (°C)	MAPE (%)
Seq2Seq-CBAM	Two	3.878	2.991	5.604	4.214	3.040	5.780	4.395	3.248	5.264
	Three	3.873	2.986	5.540	4.1145	3.069	5.578	4.615	3.449	5.550
	Four	3.981	2.484	5.811	4.391	3.480	6.409	4.597	3.473	5.659
	Five	4.024	3.120	5.888	4.458	3.412	6.597	4.817	3.589	5.409
Seq2Seq-Bahdanau	Two	4.225	3.185	6.151	4.628	3.351	5.951	4.628	3.526	5.774
	Three	4.216	3.332	6.157	4.486	3.454	6.257	4.905	3.744	5.998
	Four	4.609	3.701	7.232	4.881	3.765	6.656	4.748	3.291	6.094
	Five	4.662	3.835	7.725	4.752	3.719	7.181	5.028	3.827	7.412
Seq2Seq	Two	5.451	4.851	6.145	4.714	3.541	7.315	8.481	6.854	9.481
	Three	5.414	4.951	6.451	4.641	3.815	6.784	8.541	7.151	10.481
	Four	5.341	4.841	6.214	4.841	3.715	6.512	8.915	7.481	11.511
	Five	5.662	4.711	6.814	4.812	3.654	6.615	8.156	7.051	10.984
LSTM	Two	4.665	3.753	6.194	4.308	3.441	5.655	4.370	3.414	5.420
	Three	4.778	3.852	6.370	4.548	3.686	6.065	4.494	3.569	5.693
	Four	4.843	3.843	6.365	4.477	3.593	5.916	4.595	3.602	5.817
	Five	4.847	3.940	6.772	4.726	3.909	6.625	4.988	3.984	6.476
BiLSTM	Two	4.602	3.680	6.06	4.345	3.419	5.649	4.192	3.251	5.163
	Three	4.910	3.902	6.482	4.461	3.578	5.898	4.396	3.459	5.496
	Four	4.741	3.844	6.371	5.148	4.155	5.891	4.748	3.420	5.475
	Five	4.840	3.962	6.845	6.156	4.478	6.185	4.673	3.681	5.977
LSTM-Bahdanau	Two	4.431	3.316	5.638	4.192	3.245	5.432	4.304	3.375	5.711
	Three	4.656	3.550	6.111	4.518	3.468	5.921	4.557	3.603	5.784
	Four	4.659	3.803	6.577	6.719	3.689	6.282	4.626	3.618	5.905
	Five	5.399	4.289	7.745	6.849	4.105	6.825	4.970	3.992	6.548
CNN	Two	4.430	3.453	5.799	4.781	3.748	6.212	4.249	3.353	5.323
	Three	4.463	3.648	6.219	5.818	4.518	7.238	4.565	3.603	5.793
	Four	5.060	3.969	6.931	7.384	5.152	7.912	4.771	3.859	6.265
	Five	5.755	4.383	8.343	8.504	6.655	8.627	5.342	4.384	7.368
CNN-Bahdanau	Two	4.497	3.499	5.910	4.311	3.377	5.623	4.801	3.371	5.341
	Three	4.813	3.713	6.429	4.769	3.366	5.718	4.912	4.002	6.334
	Four	4.860	3.817	6.468	4.676	3.719	6.444	4.788	3.875	6.303
	Five	5.869	4.443	8.519	5.125	4.231	7.185	5.058	4.087	6.874
GRNN	Two	6.364	5.082	8.432	5.918	5.267	5.061	7.837	5.380	6.057
	Three	6.687	5.214	8.860	6.777	6.518	5.578	8.290	6.604	9.388
	Four	7.039	5.398	9.380	8.623	6.805	5.535	8.597	6.473	10.768
	Five	8.633	6.219	12.084	14.483	8.454	6.628	9.493	7.317	13.018
XGBoost	Two	5.065	4.708	7.357	4.214	4.647	8.186	5.196	3.811	4.843
	Three	5.563	4.089	7.164	7.537	5.239	11.554	5.929	4.423	6.697
	Four	5.935	4.319	7.718	7.701	5.378	13.961	6.245	4.649	7.687
	Five	7.240	5.106	10.111	8.092	5.707	23.106	7.492	5.374	9.988
EL	Two	4.836	3.607	5.536	7.006	4.737	6.815	4.919	3.649	6.000
	Three	4.864	3.744	6.130	6.805	4.653	12.913	5.106	4.035	5.942
	Four	4.775	3.729	5.788	6.843	4.667	15.273	5.190	4.144	6.117
	Five	4.885	3.886	6.271	6.859	4.750	15.710	5.674	4.557	7.004
ELM	Two	8.717	6.769	11.097	6.943	4.628	9.284	5.119	3.248	5.603
	Three	5.389	4.191	6.455	6.860	4.731	12.709	5.061	4.032	6.010
	Four	5.607	4.392	6.850	7.039	4.843	13.559	5.429	4.269	6.967
	Five	8.388	6.117	11.216	6.659	4.551	8.786	5.542	4.535	6.993
GBDT	Two	4.822	3.531	5.581	6.660	4.499	6.378	5.176	3.878	5.699
	Three	3.873	2.987	6.346	6.514	4.524	7.268	5.536	4.036	6.837
	Four	5.659	4.114	7.148	6.593	4.456	7.817	6.023	4.852	7.253
	Five	7.641	5.086	10.275	6.356	4.364	9.642	7.875	5.297	10.070
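
A common way to set up and score multi-step forecasts is sketched below, as an assumption about the procedure rather than the exact implementation used in this work. The series is cut into sliding windows of `n_in` inputs and `n_out` targets, and the error is averaged separately for each step ahead. The synthetic series and the naive persistence forecast are placeholders for the real data and models.

```python
def make_windows(series, n_in, n_out):
    """Slice a 1-D series into (input, target) pairs for n_out-step-ahead forecasting."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def per_step_mae(targets, forecasts):
    """MAE computed separately for each forecast step (1 day ahead, 2 days ahead, ...)."""
    n_out = len(targets[0])
    return [
        sum(abs(t[k] - f[k]) for t, f in zip(targets, forecasts)) / len(targets)
        for k in range(n_out)
    ]


# Synthetic daily series (°C) that rises by 0.5 °C per day; not real data.
series = [50.0 + 0.5 * day for day in range(20)]
pairs = make_windows(series, n_in=7, n_out=5)

targets = [y for _, y in pairs]
# Placeholder model: repeat the last observed value for every future step.
forecasts = [[x[-1]] * 5 for x, _ in pairs]

errors = per_step_mae(targets, forecasts)
for step, err in enumerate(errors, start=1):
    print(f"step {step}: MAE = {err:.2f} °C")
```

With this linearly rising series, the persistence error grows by 0.5 °C per step, which mirrors the loss of accuracy at longer horizons discussed above.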

4.5. Comparisons of computation time and convergence

The proposed model is intended for the online detection scenario, so its computation time matters. A CPU time experiment is therefore performed on the proposed model and the comparison models.

The CPU time for forecasting is compared in Table 9. XGBoost, ELM, and GBDT are basic algorithms that are easy to implement and therefore run faster than the deep learning models. EL takes about 2.1, 1.2, and 1.1 seconds for the forecasting tasks on the three lines, respectively; because it combines three machine learning models, it generally takes more time than the other machine learning models. Seq2Seq-Bahdanau, Seq2Seq, LSTM, BiLSTM, LSTM-Bahdanau, CNN, CNN-Bahdanau, and GRNN spend more time on the forecasting tasks of the three lines. Seq2Seq-Bahdanau and Seq2Seq use LSTM as both the encoder and the decoder, so their structure is more complex and they tend to take more time than a single LSTM or BiLSTM. CNN relies on convolution operations, and its structure is shallower than that of LSTM. CNN spends 9.5, 10.1, and 10.3 seconds on the forecasting tasks of the three lines, and CNN-Bahdanau spends more time than CNN. For Seq2Seq-CBAM, the spatial and channel attention modules make the data easier to analyze; its CPU time on the three lines is 25.6, 24.5, and 24.8 seconds, respectively, which is comparable to that of the other deep learning models. In a real scenario, the top oil temperature is predicted once a day, so the time consumption of the proposed model is acceptable.

Table 9
Comparison of CPU time for forecasting

Models	Line AB (sec)	Line AC (sec)	Line BC (sec)
Seq2Seq-CBAM	25.6	24.5	24.8
Seq2Seq-Bahdanau	21.7	20.8	22.4
Seq2Seq	39.1	40.3	40.2
LSTM	19.5	19.7	19.5
BiLSTM	24.5	24.1	24.5
LSTM-Bahdanau	20.8	24.3	22.5
CNN	9.5	10.1	10.3
CNN-Bahdanau	11.3	11.5	11.1
GRNN	27.1	26.5	26.3
XGBoost	0.4	0.6	0.5
EL	2.1	1.2	1.1
ELM	1.2	1.5	1.2
GBDT	0.3	0.4	0.3
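
One possible way to obtain such CPU times in Python is shown in the sketch below, which uses `time.process_time` from the standard library. The helper `cpu_time_of` and the stand-in workload `dummy_forecast` are hypothetical; the measurement procedure actually used for Table 9 is not specified here.

```python
import time


def cpu_time_of(fn, *args, repeats=3):
    """Return the smallest CPU time (in seconds) over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.process_time()  # CPU time of this process; sleep is excluded
        fn(*args)
        best = min(best, time.process_time() - start)
    return best


def dummy_forecast(n):
    """Stand-in for a model's forecasting call (hypothetical workload)."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.001
    return total


elapsed = cpu_time_of(dummy_forecast, 200_000)
print(f"CPU time: {elapsed:.4f} s")
```

Taking the minimum over a few repeats reduces the influence of background load; reporting the mean instead would be an equally reasonable choice.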

To further verify the training efficiency, the loss curves of the models are drawn to compare their convergence. The proposed model and the comparison models are trained with their optimal parameters, with MSE as the loss function. The loss curves over about 100 epochs on the line AB training set and validation set are shown in Figs. 9 and 10, respectively. As Fig. 9 shows, the error of the proposed model stabilizes after about 40 training epochs, at the lowest value among the models. The other deep learning models, including Seq2Seq-Bahdanau, Seq2Seq, LSTM, BiLSTM, LSTM-Bahdanau, CNN, and CNN-Bahdanau, stabilize only after more epochs, which suggests that CBAM accelerates model convergence. The other machine learning models have relatively simple structures and converge quickly, but their errors are relatively large.

Fig. 9.

Loss curves on the line AB training set.

As Fig. 10 shows, the proposed model has the best convergence performance on the validation set. Its loss gradually becomes stable within 30 epochs, and the error oscillation during training is small. The losses of the other models fluctuate strongly on the validation set before 40 epochs. The LSTM and BiLSTM errors tend to stabilize after 60 epochs, and the loss of GBDT increases after 20 epochs, indicating that this model is over-fitting.

Fig. 10.

Loss curves on the line AB validation set.
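
The convergence statements above can be made operational with simple rules on the loss curves. The sketch below is one such formalization under assumed thresholds (`window`, `tol`, `patience`); the loss values are synthetic and the function names are hypothetical.

```python
def stable_epoch(losses, window=5, tol=0.01):
    """First epoch (1-based) that starts a run of `window` epochs whose
    losses differ by at most `tol`; None if no such run exists."""
    for i in range(len(losses) - window + 1):
        chunk = losses[i:i + window]
        if max(chunk) - min(chunk) <= tol:
            return i + 1
    return None


def overfit_epoch(val_losses, patience=3):
    """Epoch (1-based) of the last improvement before the validation loss
    rises for `patience` consecutive epochs; None if that never happens."""
    rising = 0
    for i in range(1, len(val_losses)):
        rising = rising + 1 if val_losses[i] > val_losses[i - 1] else 0
        if rising == patience:
            return i - patience + 1
    return None


# Synthetic loss curves, not measured values.
train_loss = [1.0, 0.6, 0.4, 0.3, 0.25, 0.205, 0.203, 0.202, 0.201, 0.200]
val_loss = [0.9, 0.5, 0.4, 0.35, 0.36, 0.38, 0.41, 0.45]

converged_at = stable_epoch(train_loss)
overfit_at = overfit_epoch(val_loss)
print("training loss stable from epoch", converged_at)
print("validation loss starts rising after epoch", overfit_at)
```

A rule like `overfit_epoch` is also the basis of early stopping, where training halts once the validation loss has not improved for a fixed number of epochs.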

5. Conclusion

In this paper, a Seq2Seq model based on the convolutional block attention mechanism is established to forecast the top oil temperature of the UHV reactor. The Seq2Seq framework converts the original sequence into a new sequence, which reduces the influence of volatility in the original data. The convolutional block attention mechanism assigns attention in both the spatial and channel dimensions. Instead of relying on traditional feature extraction and data cleaning, CBAM makes effective use of the information in the data and reduces the impact of abnormal data. The experimental results show that the proposed Seq2Seq-CBAM method can accurately forecast the variation trend of the top oil temperature over the next one to five days, and that its average RMSE, MAE, and MAPE are better than those of the other models. Moreover, the time spent on forecasting is compatible with online prediction scenarios, and the model converges quickly. Future work will focus on two directions. The first is to study new attention mechanisms with a more powerful representational ability. The second, since the proposed model has so far only been studied theoretically, is to deploy advanced models in the UHV reactor operation scenario.

References

1. Habib M.Z., Wang and Taylor, Phase shift compensation method for the line differential protection on UHV-AC transmission lines, Journal of Engineering 2018 (2018), 876–880.

2. and Shan, The whole field temperature rise calculation of oil-immersed power transformer based on thermal network method, International Journal of Applied Electromagnetics and Mechanics 70(1) (2022), 55–72.

3. Song, Dai, Luo, Sheng and Jiang, Power transformer operating state prediction method based on an LSTM network, Energies 11 (2018), 914.

4. Tan, Chen and He, Prediction of UHV shunt reactor top oil temperature based on optimal segmentation and improved semi-physical model, Dianwang Jishu/Power System Technology 45 (2021), 3314–3323.

5. Yuan, Tang, Wang, Yang, Qin, Optimization design of oil-immersed iron core reactor based on the particle swarm algorithm and thermal network model, Mathematical Problems in Engineering 2021 (2021), 6642620.

6. Zhao, Cheng, Gao, Zhang and Wang, A Wide Range Transient Current Sensor Based on GMR Effect for Smart Grid Applications, Chongqing University; Dielectrics and Electrical Insulation - Society of IEEE; Energy Internet Research Institute, Tsinghua University; Mississippi State University; Sichuan Energy Internet Research Institute, Tsinghua University; State Key Lab of Power Systems, Beijing, China, 2020.

7. Catterson V.M., Castellon, Pilgrim J.A., Saha T.K., Vakilian, The impact of smart grid technology on dielectrics and electrical insulation, IEEE Transactions on Dielectrics and Electrical Insulation 22(6) (2015), 3505–3512.

8. Zheng, Sun, Yang and Chen, Entropy-based bagging for fault prediction of transformers using oil-dissolved gas data, Energies 4(8) (2011), 1138–1147.

9. Lin C.-H., Chen J.-L. and Huang P.-Z., Dissolved gases forecast to enhance oil-immersed transformer fault diagnosis with grey prediction-clustering analysis, Expert Systems 28(2) (2011), 123–137.

10. Teng, Shan and Qi, Prediction of transformer full-domain temperature based on fluid network decoupling, International Journal of Applied Electromagnetics and Mechanics 68(4) (2022), 461–481.

11. Chamara Hewage and Niles Perera, Comparing statistical and machine learning methods for sales forecasting during the post-promotional period, in: 2021 IEEE International Conference on Industrial Engineering and Engineering Management, IEEM 2021, 2021, pp. 462–466.

12. Fei S.-w., X.-j. and Miao Y.-b., A hybrid model of RVM and PSO for dissolved gases content forecasting in transformer oil, Recent Patents on Electrical and Electronic Engineering 6(3) (2013), 183–189.

13. Zeng, Guo, Zhang, Zhu, Xiao, Huang, Prediction model for dissolved gas concentration in transformer oil based on modified grey wolf optimizer and LSSVM with grey relational analysis and empirical mode decomposition, Energies 13(2) (2020), 422.

14. and Jing, Diagnosis and prediction of transformer faults of support vector machine optimized by particle swarm algorithm, International Journal of Mechatronics and Applied Mechanics 2018(4) (2018), 50–55.

15. Ding, Zhou and Bi, Feature selection based on hybridization of genetic algorithm and competitive swarm optimizer, Soft Computing 24(15) (2020), 11663–11672.

16. Dai, Song, Sheng and Jiang, LSTM networks for the trend prediction of gases dissolved in power transformer insulation oil, in: Proceedings of the IEEE International Conference on Properties and Applications of Dielectric Materials, Vol. 2018, IEEE, 2018, pp. 666–669.

17. Dai, Song, Sheng, Jiang, Wang and Chen, Prediction method for power transformer running state based on LSTM network, Gaodianya Jishu/High Voltage Engineering 44(4) (2018), 1099–1106.

18. Song, Dai, Luo, Sheng and Jiang, Power transformer operating state prediction method based on an LSTM network, Energies 11(4) (2018), 914.

19. Wang, Zhang and Wang, A novel deep recurrent belief network model for trend prediction of transformer DGA data, IEEE Access 7 (2019), 80069–80078.

20. Liu, Zhao, Zhong, Zhao and Zhang, Prediction of the dissolved gas concentration in power transformer oil based on SARIMA model, Energy Reports 8 (2022), 1360–1367.

21. Lin, Sheng, Yan, Dai and Jiang, Prediction of dissolved gas concentrations in transformer oil based on the KPCA-FFOA-GRNN model, Energies 11(1) (2018).

22. Sun, Zhang, Zhong and Cheng, A power transformer fault diagnosis method-based hybrid improved seagull optimization algorithm and support vector machine, IEEE Access 10 (2022), 17268–17286.

23. Lin, Miao, Chen, Xiao and Jiang, Forecasting thermal parameters for ultra-high voltage transformers using long- and short-term time-series network with conditional mutual information, IET Electric Power Applications 16(5) (2022), 548–564.

24. Tian, Yang and Ju, Spatial correlation and temporal attention-based LSTM for remaining useful life prediction of turbofan engine, Measurement: Journal of the International Measurement Confederation 214 (2023), 112816.

25. Zang, Cheng, Ding, Liu, Wei, Residential load forecasting based on LSTM fusing self-attention mechanism with pooling, Energy 229 (2021), 120682.

26. Zhang, Liu, Chen and Dou, D-P-transformer: A distilling and probsparse self-attention rockburst prediction method, Energies 15(11) (2022), 3959.

27. Sahrial Alam M.D., Sayedur Rahman M.D., Ikbal Hosen M.D., Anam Mubin, Hossen and Mridha M.F., Bahdanau attention based Bengali image caption generation, in: 2022 International Conference on Decision Aid Sciences and Applications, DASA 2022, Chiangrai, Thailand, 2022, pp. 1073–1077.

28. Gan, Liu and He, Prediction of air pollutant concentration based on Luong attention mechanism Seq2Seq model, in: Proceedings - 2021 7th Annual International Conference on Network and Information Systems for Computers, ICNISC 2021, IEEE, 2021, pp. 321–325.

29. Zhang, Song and Li, Dual-aspect self-attention based on transformer for remaining useful life prediction, IEEE Transactions on Instrumentation and Measurement 71 (2022), 2505711.

30. Qin, Zhang, Fan, Huang, Multi-task short-term reactive and active load forecasting method based on attention-LSTM model, International Journal of Electrical Power and Energy Systems 135 (2022), 107517.

31. Woo, Park, Lee J.-Y. and Kweon I.S., CBAM: Convolutional block attention module, in: 15th European Conference on Computer Vision, CHAM, Vol. 11211, 2018, pp. 3–19.

32. Wang, Qiao, Liu, Wang, Liu, Yao, Automated ECG classification using a non-local convolutional block attention module, Computer Methods and Programs in Biomedicine 203 (2021), 106006.

33. Aggarwal, Chandrasekaran and Annamalai, A complete empirical ensemble mode decomposition and support vector machine-based approach to predict Bitcoin prices, Journal of Behavioral and Experimental Finance 27 (2020), 100335.

34. Wang, Zhang, Wang and Li, Prediction method of transformer top oil temperature based on VMD and GRU neural network, in: 7th IEEE International Conference on High Voltage Engineering and Application, ICHVE 2020 - Proceedings, IEEE, 2020.

35. Ruan, Wang, Meng and Qian, A hybrid model for power consumption forecasting using VMD-based the long short-term memory neural network, Frontiers in Energy Research 9 (2022), 772508.

36. Byambadorj, Nishimura, Ayush and Kitaoka, Normalization of transliterated Mongolian words using Seq2Seq model with limited data, ACM Transactions on Asian and Low-Resource Language Information Processing 20(6) (2021), 103.

37. Suechoey, Tadsuan, Thammarat and Leelajindakrairerk, An analysis of temperature and pressure on loading oil-immersed distribution transformer, in: 7th International Power Engineering Conference, IPEC2005, Vol. 2005, 2005.

38. Arifianto, Josue, Saers, Rosenlind, Hilber, Investigation of transformer top-oil temperature considering external factors, in: Proceedings of 2012 IEEE International Conference on Condition Monitoring and Diagnosis, CMD 2012, IEEE, 2012, pp. 198–201.

39. Chen W.-G., X.-P., Zhou, Pan and Xie, An improved dynamic model of transformer hot spot temperature based on top oil temperature, Chongqing Daxue Xuebao/Journal of Chongqing University 35 (2012), 69–75.

40. Kherif, Benmahamed, Teguar, Boubakeur and Ghoneim S.S.M., Accuracy improvement of power transformer faults diagnostic using KNN classifier with decision tree principle, IEEE Access 9 (2021), 81693–81701.

41. Tan, Chen and He, Oil temperature forecasting of UHV shunt reactor based on k-means clustering method and similar period, Dianli Zidonghua Shebei/Electric Power Automation Equipment 41 (2021), 213–219.

42. Xiaoqing and Yuxuan, Short-time multi-energy load forecasting method based on CNN-Seq2Seq model with attention mechanism, Machine Learning with Applications 5 (2021), 100438.

43. Peng, Wang and Yin, Short-term load forecasting model based on attention-LSTM in electricity market, Dianwang Jishu/Power System Technology 43 (2019), 1745–1751.

44. Chen, Zhang, Liu, Tan, Liu and Chen, Spatiotemporal convolutional neural network with convolutional block attention module for micro-expression recognition, Information (Switzerland) 11 (2020), 1–19.

45. Chen, Wang, Liu, Cheng and Qin, Spatial attention based convolutional transformer for remaining useful life prediction, Measurement Science and Technology 33 (2022), 114001.

46. Chen, Zhao, Peng, Liu and Zhou, Analysis of dissolved gas in transformer oil based on laser Raman spectroscopy, Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering 34 (2014), 2485–2492.
