Abstract
Aluminum electrolytic capacitors (AECs) are among the most critical components affecting the reliability of power electronic systems. Electrolyte evaporation and dielectric degradation are the two main causes of parametric degradation in AECs. Predicting the remaining useful life (RUL) of an AEC makes it possible to know its health state in advance and to plan reasonable maintenance before the system suffers a shutdown failure, which increases reliability and safety. In this paper, a hybrid machine learning (ML) model combining a gated recurrent unit (GRU) with support vector regression optimized by particle swarm optimization (PSO-SVR) is proposed for AEC RUL prediction. The GRU models the AEC degradation time series and performs recursive multi-step prediction, and SVR, with its hyper-parameters optimized by PSO, compensates for the errors introduced by the recursive GRU. The proposed model is validated on two accelerated degradation data sets. Compared with other methods, the proposed scheme achieves better RUL prediction performance at different prediction starting points, which can support health management technology for power electronic systems.
Introduction
To save fossil fuels and reduce greenhouse gas and harmful gas emissions in pursuit of a low-carbon, environmentally friendly society, new energy vehicles, photovoltaic power generation, wind power generation, microgrids and public rail transit have developed rapidly [1–3]. Power converters play an important role in these fields, and their health status directly affects the safety and reliability of the entire system [4]. To ensure long-term reliable and safe operation, preventive maintenance of power converters and their devices needs to be transformed into active health management. Fault prognostics and RUL prediction have been widely researched in mechanics and power electronics. For power converters, the cause of failure is closely related to the health status of each device in the circuit.
As a prerequisite for circuit-level research, fault prediction must first be performed at the device level. According to research on switching power supplies [5], AECs account for 60% of all failures. The failure distribution of each component is shown in Fig. 1: AECs have the highest failure rate, and their health status directly affects the circuit in which they are located [6].

Fig. 1. Failure rates of components in power electronic systems [5].
Previous literature on AEC failure mechanisms shows that AEC failure is related to various factors [7–12]. The National Aeronautics and Space Administration (NASA) Ames Research Center conducted accelerated degradation tests on AECs under electrical overstress and thermal stress, respectively [8]. Chen [9] and Bhargava [10] analyzed the failure mechanisms of capacitors. Combining the above research, the failure mechanisms of AEC are organized into the fishbone diagram in Fig. 2. Especially in actual working conditions, degradation may be caused by the combined influence of many factors: environmental, electrical, physical and chemical. AEC degradation inevitably leads to a decrease in capacitance C and an increase in equivalent series resistance (ESR), so changes in C and ESR directly reflect the health status of an AEC. The failure criteria of AEC are a decrease of C to 80% of its initial value and an increase of ESR to 2.8∼3 times its initial value at 25°C [11, 12].
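The two end-of-life thresholds can be expressed as a simple check. The sketch below is illustrative only: the function name is hypothetical, and treating the C and ESR thresholds as alternative (either-one) criteria is an assumption, since the text lists both without saying how they combine.

```python
def is_failed(c, c0, esr, esr0, c_ratio=0.8, esr_ratio=2.8):
    """Check the AEC end-of-life thresholds quoted in the text.

    c, esr     -- current capacitance and ESR (measured at 25 degC)
    c0, esr0   -- initial capacitance and ESR
    c_ratio    -- failure when C falls to this fraction of c0 (80%)
    esr_ratio  -- failure when ESR rises to this multiple of esr0; 2.8 is
                  the conservative end of the quoted 2.8-3x range
    Assumption: meeting either threshold counts as failure.
    """
    capacitance_failed = c <= c_ratio * c0
    esr_failed = esr >= esr_ratio * esr0
    return capacitance_failed or esr_failed


# A 2200 uF part that has lost 10% of C but whose ESR has tripled.
print(is_failed(c=1980.0, c0=2200.0, esr=0.30, esr0=0.10))  # True (ESR criterion)
```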

Fig. 2. Fishbone diagram of AEC failure mechanisms [9].
Evaporation of the electrolyte increases the ESR of the AEC and reduces the contact area between the electrolyte and the oxide layer; the higher ESR increases internal heating, which in turn accelerates evaporation, and this vicious circle accelerates AEC degradation. In electronic systems containing AECs, degradation of AEC performance beyond the normal operating range may subject other components to increasing electrical stress, accelerating their degradation as well. Device-level degradation in a power electronic system inevitably leads to circuit-level degradation, reducing the service life, safety and reliability of the entire system. It is therefore necessary to predict the RUL of AECs with effective and reasonable methods.
RUL prediction methods for AECs fall mainly into two categories: filter-based and data-driven [13–18]. Filter-based methods generally require an empirical model to be established first. During AEC degradation, the degradation characteristics are easily disturbed by temperature and voltage, which makes establishing an empirical model difficult. With the rapid development of artificial intelligence in recent years, data-driven methods have been widely used to build AEC RUL prediction models. Delanyo predicted the RUL of aluminum electrolytic capacitors using a Bi-LSTM [15]; compared with a traditional LSTM, this network better captures the AEC degradation trend and improves prediction accuracy. Mesquita studied three kinds of neural networks, MLP, RBF and ELM, to predict capacitance [16]. Wang used an ARIMA-Bi-LSTM hybrid model, combining ARIMA's ability to extract linear features with Bi-LSTM's ability to extract nonlinear features, to predict AEC degradation from an early stage [17]. Wang performed fault prediction for AECs in switch mode power supplies (SMPS) using Chained-SVR and 1D-CNN [18]. In addition, data-driven RUL prediction for AECs can be generalized to other components, ultimately enabling fault prediction for the entire power electronic converter circuit.
Data-driven models are also widely used for prediction in other fields. Wang [19] proposed a model based on a bidirectional recurrent neural network and multiscale convolutional units to predict residential building electricity consumption. In a study on MOSFET fault prediction, after the advantages of filtering-based and data-driven prediction were analyzed, an artificial neural network was used for prediction [20]. Zhou used a Transformer-based method for bearing RUL prediction and confirmed its advantages on the PHM2012 data set [21]. Guo used an improved deep convolutional neural network to predict the RUL of rolling bearings [22]. These studies confirm that, with the rapid development of artificial intelligence, machine learning algorithms make the established models more applicable and more accurate.
Unfortunately, when recursive algorithms are used for time series prediction, cumulative errors are inevitable. Numerous studies have shown that error compensation can mitigate this problem and improve prediction accuracy [23–26]. Among error compensation algorithms, SVR is used relatively frequently [24–26] because it is easy to implement and achieves high regression accuracy at a higher speed than some other ML methods.
From the above discussion, RUL prediction of AECs faces the following difficulties: (1) among data-driven methods, GRU and LSTM can deeply mine time series information to build prediction models, but recursive prediction generates cumulative error, which reduces prediction accuracy; (2) before time series prediction, the experimental data must be preprocessed with a sliding window, and a suitable window size can usually be found only through repeated trial and error; (3) sudden loss during degradation seriously affects the prediction results.
To address these problems, a hybrid of GRU and SVR is proposed to build RUL prediction models for two data sets. In the model, multi-step prediction of AEC degradation is performed by a recursive GRU. The unavoidable accumulated error of the recursive GRU is compensated by SVR with a radial basis function (RBF) kernel, and PSO is used to optimize the SVR hyper-parameters to obtain better prediction results. The contributions of this paper are as follows:
(1) A GRU-PSO-SVR prediction model for AEC RUL is proposed.
(2) The variance and Euclidean distance of the time series are used to determine the sliding window size, which overcomes the difficulty of choosing the window size in time series prediction.
(3) The error compensation set is reconstructed using the degradation time points (tp) and slopes, which improves the sensitivity of SVR to time series fitting and avoids the superposition of lag orders.
To realize RUL prediction of AECs, accelerated degradation experiments are required to obtain AEC degradation data [27]. The AEC degradation data in this paper come from two sources: the data set from the NASA test rig and a data set collected by our research team (referred to as Set B). The research objects in the two data sets are introduced first.
Data set from NASA
The NASA data set, taken from the NASA data repository, is shown in Fig. 3, where the abscissa is the degradation time and the ordinate is the capacitance decay rate. The data set contains the degradation processes of 6 groups of AECs with a nominal capacitance of 2200 μF under 10 V overstress. The data are preprocessed by linear interpolation at equal intervals of 12 hours.
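Linear interpolation at equal 12-hour intervals can be sketched as follows, assuming the raw measurement times are strictly increasing and given in hours. The function name is hypothetical; in MATLAB this step would normally use the built-in `interp1`, and this pure-Python version only shows the arithmetic.

```python
def resample_linear(times, values, step=12.0):
    """Linearly interpolate a series onto an equally spaced time grid.

    times  -- strictly increasing sample times in hours
    values -- decay rates measured at those times
    step   -- grid spacing in hours (12 h for the NASA data set)
    Returns (grid_times, grid_values) starting at times[0] and not
    extending past times[-1] (no extrapolation).
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (time, value) pairs")
    n_points = int((times[-1] - times[0]) // step) + 1
    grid_t, grid_v = [], []
    k = 0  # left index of the interval [times[k], times[k + 1]]
    for n in range(n_points):
        t = times[0] + n * step
        # advance to the interval that contains t
        while k < len(times) - 2 and t > times[k + 1]:
            k += 1
        t0, t1 = times[k], times[k + 1]
        v0, v1 = values[k], values[k + 1]
        grid_t.append(t)
        grid_v.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return grid_t, grid_v


# Irregular measurements at 0, 10 and 30 h resampled every 12 h.
print(resample_linear([0.0, 10.0, 30.0], [0.0, 1.0, 3.0]))
```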

Fig. 3. Degradation rate of C in the NASA data set [8].
Set B was obtained from a multi-stress accelerated degradation experiment on 10 groups of AECs rated 470 μF/50 V. Combined thermal and electrical stress was applied using a thermal chamber and a DC power supply: the AECs were mounted on a parallel fixture placed in a thermal chamber at 105°C, and the DC power supply output was 50 V.
The experimental procedure during degradation is as follows.
After the AECs were first mounted on the parallel fixture and placed in the thermal chamber, the fixture was taken out every 24 hours so that the AECs could be fully discharged and cooled at room temperature for 45–60 minutes before the capacitance data were collected.
Degradation data were collected and preprocessed in the following sequence. Set the LCR meter frequency to 10 kHz and perform short-circuit and open-circuit correction. Use the LCR meter to measure the Cs and ESRs of each AEC. After all AECs have been tested, reinstall them to continue the high-temperature DC-bias degradation experiment. Repeat these steps until the end of the experiment. Finally, organize the collected data, impute the small number of missing values by median interpolation, and calculate the degradation rate by formula (1):
In (1), DR_ij represents the decay rate of the ith group of AECs at the jth time point, C_ij the capacitance of the ith group at the jth time point, and C_i1 the initial capacitance of the ith group. Set B is shown in Fig. 4.
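A sketch of this preprocessing, assuming formula (1) is the usual relative capacitance loss DR_ij = (C_i1 − C_ij)/C_i1 × 100%, and interpreting "median interpolation" as replacing each missing reading with the median of nearby observed readings. Both the formula and the neighbourhood width (`half_width`) are assumptions, the function names are hypothetical, and missing readings are represented as `None`.

```python
import statistics


def impute_median(series, half_width=2):
    """Replace None entries with the median of nearby observed values.

    half_width is a hypothetical choice: observed values within
    +/- half_width positions of the gap are used.
    """
    filled = list(series)
    for j, value in enumerate(series):
        if value is None:
            lo, hi = max(0, j - half_width), min(len(series), j + half_width + 1)
            neighbours = [v for v in series[lo:hi] if v is not None]
            if not neighbours:
                raise ValueError(f"no observed neighbours around index {j}")
            filled[j] = statistics.median(neighbours)
    return filled


def decay_rate(capacitances):
    """Decay rate in percent relative to the first (initial) measurement.

    Assumes (1) is DR_ij = (C_i1 - C_ij) / C_i1 * 100%.
    """
    c_initial = capacitances[0]
    return [(c_initial - c) / c_initial * 100.0 for c in capacitances]


readings = impute_median([470.0, None, 466.0, 464.0])
print(readings, decay_rate(readings))
```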

Fig. 4. Degradation rate of C in Set B.
Generally, AEC degradation is relatively smooth and stable. However, because of manufacturing-process factors such as vaporization and heat treatment, and because of dielectric degradation [28], AECs can be prone to sudden loss. Sudden loss causes a rapid increase in the capacitance degradation rate, which shortens the life.
Determination of sliding window size
In data-driven methods, historical values are generally used to predict future values. Therefore, before a prediction model is built, the time series must be processed with a sliding window to form the training and prediction sets. The sliding window size strongly affects the prediction results: a window that is too long leads to heavy computation, long training time and a large lag order, whereas a window that is too short may lose information [29–33]. Variance reflects the fluctuation of data within a single window, and Euclidean distance reflects the degree of change between two windows. In this paper, the two are combined to calculate an appropriate window size by (2).
In (2), L_{p+1} represents the sliding window size, and L_min and L_max represent the minimum and maximum sizes, respectively. ∥ΔC_{p+1}∥ represents the Euclidean distance between C_p and C_{p+1}, and ΔC_0 represents the mean of ∥ΔC∥. |ΔR_{p+1}| represents the variance between R_p and R_{p+1}, and ΔR_0 represents the mean of the variance. After the sliding window is calculated, the AEC degradation time series can be expressed as:
In (3), the last column of each window is the value to be predicted from the historical values in that window; for the ith window, the predicted value is denoted P_i and the historical values X_t. In this paper, the window sizes of the prediction models for the NASA data set and Set B are 4 and 5, respectively.
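The window-splitting step described for (3) can be sketched as below; the function names are hypothetical. Because (2) itself is not reproduced in this text, `window_statistics` only computes the quantities the text says (2) combines (within-window variance, the Euclidean distance between consecutive windows, and their means); reading "variance between R_p and R_{p+1}" as the variance of each window is an assumption.

```python
import statistics


def make_windows(series, size):
    """Overlapping windows of `size` values, split into inputs and target.

    As described for (3), the first size-1 values of each window are the
    historical inputs X_t and the last value is the target P_i.
    """
    if not 2 <= size <= len(series):
        raise ValueError("window size must be between 2 and len(series)")
    inputs, targets = [], []
    for start in range(len(series) - size + 1):
        window = series[start:start + size]
        inputs.append(window[:-1])
        targets.append(window[-1])
    return inputs, targets


def window_statistics(series, size):
    """Per-window variance, distance between consecutive windows, and means.

    How (2) maps these to the window size L_{p+1} is not reproduced here.
    """
    if len(series) <= size:
        raise ValueError("need more samples than the window size")
    windows = [series[s:s + size] for s in range(len(series) - size + 1)]
    variances = [statistics.pvariance(w) for w in windows]
    distances = [
        sum((a - b) ** 2 for a, b in zip(w0, w1)) ** 0.5
        for w0, w1 in zip(windows, windows[1:])
    ]
    return variances, distances, statistics.mean(variances), statistics.mean(distances)


# Size-5 windows (the Set B setting): 4 inputs and 1 target per window.
X, P = make_windows([0.0, 0.4, 0.9, 1.5, 2.2, 3.0, 3.9], 5)
print(X[0], P[0])
```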
The structure of GRU is similar to that of LSTM [33–36]: it can fully mine the information contained in a time series and mitigates the vanishing-gradient problem in long-term memory. Compared with LSTM, GRU has only two gates, a reset gate r_t and an update gate z_t. When X_t is input to the GRU, the weight given to the current information is calculated by combining it with the output h_{t-1} of the previous moment:
Compared with deep learning, machine learning has no complex neural network structure and runs faster. SVR is often used in machine learning for small-sample time series prediction and nonlinear fitting [37]. It maps the sample set to a high-dimensional space through a nonlinear mapping and finds a hyperplane there from which the training samples deviate as little as possible. The basic principle of SVR is as follows [38–40].
The training data X_t = {x_t, p_t} are obtained by the sliding window, where t denotes the tth sample, x_t = {x_t1, x_t2, ..., x_tj} is the input vector in the low-dimensional space, j is the number of elements in the input vector, and p_t is the output, that is, the predicted value corresponding to the input vector. The SVR model can be expressed as:
$$f(x) = W^{T}\varphi(x) + b \qquad (9)$$
In Equation (9), W^T and b are the coefficient vector and the intercept of the estimation function, respectively, and φ(x) is a nonlinear mapping function. According to the principle of structural risk minimization, estimating f(x) is equivalent to solving the optimization problem in Equation (10), where L is the loss function and ε is the insensitive loss factor.
Related research has shown that, among the kernel functions used with SVR, the Gaussian kernel offers good generalization and high stability. Therefore, the RBF (Gaussian) kernel is used as the kernel function of SVR; its formula is:
In Equation (12), C and ϒ_s directly and strongly affect SVR fitting accuracy and training time. Reasonable selection of these parameters therefore improves SVR fitting accuracy, improving prediction accuracy and reducing prediction errors.
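The Gaussian (RBF) kernel is commonly written K(x, x') = exp(−γ‖x − x'‖²); some texts use exp(−‖x − x'‖²/(2σ²)) instead, so whether ϒ_s plays the role of γ or of σ here is an assumption. The sketch shows the kernel and the standard dual-form SVR prediction f(x) = Σ β_i K(x_i, x) + b, where β_i = α_i − α_i* are coefficients a trained SVR would supply (the values below are hypothetical). It illustrates why the kernel parameter matters: a large γ makes each support vector's influence very local.

```python
import math


def rbf_kernel(x, x_prime, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, betas, b, gamma):
    """Dual-form SVR output: sum_i beta_i * K(x_i, x) + b.

    betas -- alpha_i - alpha_i* from a trained SVR (hypothetical here)
    """
    return sum(beta * rbf_kernel(sv, x, gamma)
               for sv, beta in zip(support_vectors, betas)) + b


# The same query point seen through a wide and a narrow kernel.
sv, beta = [[0.0, 0.0]], [2.0]
for gamma in (0.1, 10.0):
    print(gamma, svr_predict([1.0, 0.0], sv, beta, b=0.5, gamma=gamma))
```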
In this paper, PSO is used to optimize these two parameters. PSO was proposed in 1995 and is widely used for hyper-parameter optimization in machine learning and deep learning because of its simple structure, easy implementation and good search performance. Two important properties in PSO are particle velocity and position. With the SVR fitting error (MSE) as the objective function, PSO seeks the values of C and ϒ_s that minimize the MSE. The value ranges of C and ϒ_s are set in advance; within these ranges, each particle searches its own region. By comparing the local optimal solutions P_l found by the particles, the global optimal solution P_g in the region is finally determined [39, 40]. The particle position and velocity are updated by:
where V_ij and X_ij represent the velocity and position of particle i in the jth dimension at the kth iteration, respectively; ω is the inertia weight factor, c1 and c2 are acceleration factors, and r1 and r2 are random numbers in (0, 1).
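A minimal PSO sketch using the common global-best update V ← ωV + c1·r1·(P_l − X) + c2·r2·(P_g − X), X ← X + V, with positions clipped to the preset search ranges. The defaults (ω = 0.7, c1 = c2 = 1.5, 20 particles, 200 iterations) are illustrative assumptions, not the paper's settings, and the function name is hypothetical. In the paper's use, the objective would be the SVR fitting MSE as a function of (C, ϒ_s); a simple quadratic stands in for it here.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box given as [(lo, hi), ...] per dimension."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                        # P_l for each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]       # P_g
    for _ in range(n_iter):
        for i in range(n_particles):
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # keep the particle inside the preset search range
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Stand-in objective with its minimum at (C, gamma) = (10, 0.5).
best, best_val = pso_minimize(lambda p: (p[0] - 10.0) ** 2 + (p[1] - 0.5) ** 2,
                              bounds=[(0.1, 100.0), (0.01, 5.0)])
print(best, best_val)
```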
The principles of the GRU and SVR models are described above. As data-driven methods commonly used in prediction models, GRU and SVR each have their own advantages and limitations. This paper therefore combines GRU and SVR to construct the AEC RUL prediction model, whose flow chart is shown in Fig. 5. The flow consists of three parts. The first is the accelerated degradation experiment and data acquisition: in addition to the NASA data set, an AEC accelerated degradation experiment at 105°C and 50 V was set up. The experimental process, operating details and data acquisition are described in Section 2.

Fig. 5. Flow chart of the RUL prediction model based on GRU-PSO-SVR.
The second part is the core procedure for AEC RUL estimation, which combines the ability of GRU to mine time series information deeply, the ability of SVR to fit small samples and the ability of PSO to optimize hyper-parameters. For error compensation, the 10 groups of data are first divided into a training set and an error compensation set according to a certain proportion. The error compensation set is then reconstructed from the prediction results of f_G and used to train SVR. The error compensation set of the tenth group serves as the test set for SVR error compensation.
The last part shows the results of prediction and the changes after the degradation.
The model is implemented as follows. The sliding window expands the one-dimensional data into windows, which are divided into a training set, an error compensation set and a test set. The data are normalized by (16):
GRU is trained on the data to obtain the model f_G. The error compensation set S_r is reconstructed from the results of GRU recursive multi-step prediction, the degradation time of each predicted value and its slope, where t_t represents the degradation time, P_G the value predicted by the recursive GRU, ΔP_G the error of the predicted value relative to the actual degradation value, and k_i the slope of the ith predicted value. PSO optimizes the hyper-parameters of SVR, and PSO-SVR is trained to obtain the prediction model P_i.
The 10th group of data is then input to obtain the predicted values for the 10th group of AECs, from which the prediction error and RUL are calculated.
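The implementation steps can be sketched as follows, under several assumptions: (16) is taken to be min-max scaling to [0, 1] with limits fitted on the training data; `one_step` is any callable standing in for the trained GRU model f_G; the slope k_i is taken as the backward difference of consecutive GRU predictions; and the SVR target ΔP_G is taken as the actual value minus the recursive GRU prediction. All function names are hypothetical.

```python
def minmax_fit(values):
    """Min and max of the training data (assumed form of (16))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max scaled")
    return lo, hi


def minmax_scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def minmax_unscale(scaled, lo, hi):
    return [s * (hi - lo) + lo for s in scaled]


def recursive_forecast(one_step, history, n_steps):
    """Recursive multi-step forecast: each prediction is fed back as input.

    one_step -- callable mapping the last L-1 values to the next value
                (stands in for the trained GRU model f_G)
    history  -- the last L-1 observed values before the starting point tp
    """
    window = list(history)
    preds = []
    for _ in range(n_steps):
        nxt = one_step(window)
        preds.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the prediction
    return preds


def build_compensation_set(times, preds, actual):
    """Rows [t_t, P_G, k_i] with SVR target dP_G = actual - P_G."""
    if len(preds) < 2:
        raise ValueError("need at least two predictions to form slopes")
    features, targets = [], []
    for i, (t, p, a) in enumerate(zip(times, preds, actual)):
        j = max(i, 1)  # the first point reuses the first available slope
        k = (preds[j] - preds[j - 1]) / (times[j] - times[j - 1])
        features.append([t, p, k])
        targets.append(a - p)
    return features, targets


def extrapolate(window):
    """Naive linear-extrapolation 'model' used only for illustration."""
    return window[-1] + (window[-1] - window[-2])


print(recursive_forecast(extrapolate, [1.0, 2.0], 3))  # [3.0, 4.0, 5.0]
```

Each recursive step consumes earlier outputs, so any one-step error propagates forward; the compensated prediction would then be P_G plus the SVR output for the row [t_t, P_G, k_i].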
Section 1 introduced the failure mechanism and characteristic parameters of AECs. Here, the capacitance C is used to build the data set, and the RUL of an AEC is the time from the prediction starting point to the point at which the capacitance degradation rate reaches the 20% threshold. The relative error (RE) between the predicted and true RUL is:
In (17), RARUL represents the RE of the RUL, and PRUL and ARUL represent the predicted and true RUL values, respectively.
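The RUL definition above and (17) can be sketched as follows. In line with the later description that the RUL is read from the intersection of the predicted curve with the threshold, the crossing time is found by linear interpolation between the two samples that bracket 20%. The form RARUL = |PRUL − ARUL| / ARUL × 100% is assumed for (17), and the function names and example numbers are hypothetical, not measured data.

```python
def rul_from_threshold(times, decay_rates, t_start, threshold=20.0):
    """Time from t_start until the decay rate (in %) first reaches threshold.

    The crossing is located by linear interpolation between the two
    samples that bracket the threshold; returns None if it is never reached.
    """
    if decay_rates[0] >= threshold:
        return times[0] - t_start
    for i in range(1, len(times)):
        d0, d1 = decay_rates[i - 1], decay_rates[i]
        if d0 < threshold <= d1:
            t0, t1 = times[i - 1], times[i]
            t_cross = t0 + (threshold - d0) * (t1 - t0) / (d1 - d0)
            return t_cross - t_start
    return None


def relative_error_percent(predicted_rul, actual_rul):
    """Assumed form of (17): |PRUL - ARUL| / ARUL * 100."""
    return abs(predicted_rul - actual_rul) / actual_rul * 100.0


# Illustrative values: the curve crosses 20% halfway between 408 h and 432 h.
rul = rul_from_threshold([384.0, 408.0, 432.0], [16.0, 18.0, 22.0], t_start=360.0)
print(rul)
```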
The programs were run on Windows 10 x64 with an Intel Core i5-10400F CPU @ 2.90 GHz, a GeForce GTX 1050 Ti GPU and MATLAB R2021a.
In addition, Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) are used to evaluate the prediction effect. The formulas are as follows:
The smaller the RMSE, MAE and MAPE of the prediction results, the higher the prediction accuracy. Before the experiment, the basic model parameters are defined: the GRU has 80 neurons, the number of iterations is 1000 and the learning rate is 0.005.
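The three metrics can be computed with the standard definitions below; since the formulas are not reproduced in this text, these definitions are assumed, and the function names are hypothetical. MAPE divides by the true value, so it is undefined when a true decay rate is exactly zero (as at the first time point); the sketch rejects such inputs rather than returning infinity.

```python
import math


def _paired(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return list(zip(y_true, y_pred))


def rmse(y_true, y_pred):
    pairs = _paired(y_true, y_pred)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def mae(y_true, y_pred):
    pairs = _paired(y_true, y_pred)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent."""
    pairs = _paired(y_true, y_pred)
    if any(a == 0 for a, _ in pairs):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((a - b) / a) for a, b in pairs) / len(pairs)


truth, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]
print(rmse(truth, pred), mae(truth, pred), mape(truth, pred))  # 1.0 0.5 12.5
```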
The prediction starting points tp are 96, 120, 144 and 168 h. BP and GRU are used as comparison algorithms, both performing direct multi-step prediction. The prediction results are shown in Fig. 6.

Fig. 6. Prediction results of GRU-PSO-SVR from different prediction starting times.
The farther a predicted point lies from the prediction starting point, the worse the recursive GRU prediction and the larger the accumulated error. In theory, for a single model, the closer the prediction starting point is to the end of life, the better the prediction, because a data-driven method then has more data for fitting. In the hybrid model, however, both GRU and SVR are data-driven: when the GRU training set is larger, the SVR training set is correspondingly smaller, which indirectly causes a slight decrease in prediction accuracy. In the hybrid of GRU and SVR, GRU deeply mines and analyzes the time series, and SVR effectively overcomes the cumulative error caused by recursive multi-step prediction. As Table 1 shows, the hybrid model outperforms the other models.
Table 1. Comparison of model performance from different prediction starting times
In data-driven methods, an early prediction starting point means less training data, which hinders model training. Therefore, the farther the prediction starting point is from the end of life, the worse the prediction. Bi-LSTM and GRU are two algorithms commonly used for time series prediction. However, when a single model performs direct multi-step prediction, the results are less stable and prone to underfitting, whereas using single-step recursion for multi-step prediction tends to accumulate large errors. SVR is therefore used to compensate the errors of the recursive GRU, which also offsets the uneven distribution of the training set.
The prediction errors of the specific experiment results are shown in Table 2.
Table 2. Comparison of evaluation indicators of different models
The true RUL of C10 in this data set is calculated to be 424.003 h. The prediction starting points tp are 120, 168, 216, 264, 312 and 360 h. The results are shown in Figs. 7(a)–(f) and 8.

Fig. 7. Prediction results of Set B from different prediction starting times.

Fig. 8. Comparison of evaluation indicators from different prediction starting times.
As shown in Fig. 7(f), curves of different colors represent the predictions of the model from different tp, and the ordinate is the degradation rate predicted by the proposed model. Because the RUL is obtained from the intersection of the last two predicted values with the threshold, panel (g) of Fig. 7 compares the predictions from different starting points more intuitively; its colors correspond to those in (f). The black curves show the true values and true RUL of C10, and the dotted lines indicate the predicted RUL results from different tp.
When the starting point of prediction is farther from the end of life, the disadvantage of direct multi-step prediction is more obvious and the error of the prediction result is larger. The GRU-PSO-SVR hybrid model achieves better prediction results and the experiment results show that its average RARUL is 1.450%.
In this paper, an accelerated degradation experiment on AECs under thermal and electrical stress is established. The sliding window size is determined by combining Euclidean distance and variance. A recursive multi-step prediction model based on GRU, with error compensation by SVR using an RBF kernel, is proposed, and PSO is introduced to optimize the SVR hyper-parameters. The model is validated on two different AEC degradation data sets and compared with traditional and existing methods using the error metrics RARUL, RMSE, MAPE and MAE. The results demonstrate that the GRU-PSO-SVR method outperforms several traditional data-driven methods for AEC RUL prediction. The model improves RUL prediction accuracy and the generalization ability of the prediction model, laying a foundation for circuit-level failure prediction and RUL prediction of power electronic converters.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (61901212), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (20KJA510007), the Opening Project of Advanced Industrial Technology Research Institute of Nanjing Institute of Technology (XJY202105), the Scientific Research Foundation for the High-Level Personnel of Nanjing Institute of Technology (TB202217001).
