A Bayesian deep learning framework for reliable fault diagnosis in wind turbine gearboxes under various operating conditions

Abstract

Vibration-based fault diagnostics combined with deep learning approaches have promising applications in detecting and diagnosing faults in wind turbine gearboxes. In particular, when time series vibration data are transformed into two-dimensional cyclic spectral coherence maps, the accuracy of deep neural networks in classifying faults increases. Nevertheless, standard deep learning techniques are vulnerable to inaccurate predictions when tested with new data originating from unseen faults or unusual operating conditions. To address some of these shortcomings in the context of wind turbine gearboxes, this paper investigates fault diagnostics using a Bayesian convolutional neural network, which provides accurate results with uncertainty bounds and thereby reduces wrong, overconfident classifications. The performance of Bayesian and standard neural networks is compared using a simulation-based dataset of acceleration signals generated from a multibody dynamic model of a 5 MW wind turbine. The framework proposed in this paper is also relevant to fault detection and diagnosis in other rotating machinery applications.

Keywords

Wind turbines, condition monitoring, machine learning, cyclostationary, Bayesian inference

Introduction

Among renewable energy sources, wind energy has grown rapidly in recent years, driven by increasing global energy demand and by the need to replace fossil fuels to mitigate the effects of climate change. In 2018, wind energy provided 15% of Europe’s electricity demand (Wind Europe, 2022), and it is expected to supply 20% of US electricity needs by 2030 (Lu et al., 2009). Wind turbines are typically located in remote regions and are subjected to severe weather conditions, with continually varying loads, wind speeds, and energy demands. As a result, they are prone to failures, which increase operation and maintenance costs. According to Verbruggen (2003), operation and maintenance costs account for 15% of the overall cost of energy, rising to 20%–35% for offshore wind farms. To lower these costs, research has focused on two fronts: improving turbine design, manufacturing and materials, and improving operational reliability through novel fault diagnostic techniques that identify faults early, before they worsen and cause substantial damage.

The majority of unexpected downtime and maintenance costs are due to gearbox and generator failures, which account for over 95% of all major replacements in wind turbines (Carroll et al., 2016). Planetary gearboxes are the most common type in wind turbines; compared with fixed-shaft gearboxes, they offer a high transmission ratio in a more compact package for applications requiring higher output. The trade-off is a higher tendency to fail, owing to their increased workload and a more demanding working environment with changing loads and speeds (Feng and Zuo, 2012). According to the National Renewable Energy Laboratory (NREL), bearings cause 76% of wind turbine gearbox failures, with gear faults the second most prevalent cause at 17% (Sheng, 2017). Causes of bearing and gear failures include manufacturing and installation issues, design and material defects, misalignment, torque loads, wear and fatigue.

Examples of gear damage include tooth cracks, abrasion, corrosion, fracture, pitting and scuffing (Qiao and Lu, 2015). According to Yang et al. (2012), if defects are not diagnosed and fixed early, further deterioration might necessitate replacing the entire gearbox, which could cost up to $628,000 for a 5 MW wind turbine, while replacing the generator could cost $314,000. Therefore, it is essential to perform online fault diagnostics in real time while wind turbines operate in the field, in order to detect emerging defects in gears and bearings in a timely manner and reduce unscheduled downtime.

Vibration-based fault diagnostics is the most extensively used approach for wind turbine gearboxes because of its sensitive and reliable damage detection. The gearbox consists of multiple rotating shafts, sun and planet gears, and bearings. Faults in the gears and bearings manifest themselves in the vibrations, measured by accelerometers, at specific frequencies related to the rotating and gear mesh frequencies (Qiao and Lu, 2015). Several signal processing techniques, such as frequency analysis, time synchronous averaging and envelope analysis, are used to de-noise the data and extract meaningful information from vibration signals (Sheng, 2012). Each of these methods has limitations; for example, some are restricted to cases in which the shaft rotational speed is constant. More advanced methods of analyzing vibration data, such as cyclostationary analysis and cyclic spectral coherence (Mauricio et al., 2019) or kurtograms (Antoni, 2007), have been used to identify unique fault signatures that had previously been difficult to diagnose. However, under realistic operating conditions with variable speeds, signal processing techniques alone are insufficient for reliable fault diagnostics. Therefore, deep learning-based fault diagnosis frameworks, such as convolutional neural networks (CNNs), are increasingly employed, because they can automatically and effectively extract the characteristic features related to faults and then use those features for detection and classification.

Based on multibody dynamic modeling and simulations, our previous work (Amin et al., 2022, 2023b) proposed a deep learning framework that successfully detects and identifies small-magnitude faults located on different gears of the gearbox. Simulations were performed under realistic wind conditions, including turbulent wind and variations in wind speed and loads. Figure 1 shows the flow chart of this simulation-based study, which uses a SIMPACK 5 MW wind turbine baseline model (Zhang et al., 2016). Time series accelerometer data from the simulations are preprocessed with synchronous sampling and represented as bivariate cyclic spectral coherence images. These images are used to train convolutional neural networks to diagnose gear faults.

Figure 1.

An overview of the fault identification framework: Wind turbine loads are applied on the nacelle/drivetrain model and time series accelerometer data is acquired. Synchronous sampling is conducted on acceleration signals before generating cyclostationary maps (images) which are then passed to a CNN for fault diagnosis.
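The text does not spell out how synchronous sampling is performed. A common way to do it is computed order tracking: integrate the shaft speed to obtain the shaft angle, then resample the vibration signal at uniform angle increments, so that components locked to shaft rotation stay at fixed orders even when the speed varies. The sketch below assumes that an instantaneous shaft-speed signal (in revolutions per second) is available at every sample and uses linear interpolation; the function name and parameters are illustrative, not the authors' implementation.

```python
import numpy as np


def angular_resample(t, x, shaft_speed_hz, samples_per_rev):
    """Resample x(t) at uniform increments of shaft angle (computed order tracking).

    t              : sample times in seconds (increasing)
    x              : vibration signal sampled at t
    shaft_speed_hz : instantaneous shaft speed in revolutions per second (> 0) at t
    samples_per_rev: number of output samples per shaft revolution
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    speed = np.asarray(shaft_speed_hz, dtype=float)

    # Cumulative shaft angle in revolutions, by trapezoidal integration of the speed.
    increments = 0.5 * (speed[1:] + speed[:-1]) * np.diff(t)
    revs = np.concatenate(([0.0], np.cumsum(increments)))

    # Uniform angle grid that stays inside the recorded angle range.
    n_out = int(np.floor(revs[-1] * samples_per_rev)) + 1
    angle = np.arange(n_out) / samples_per_rev

    # Linear interpolation of the signal as a function of shaft angle.
    x_angle = np.interp(angle, revs, x)
    return angle, x_angle
```

After resampling, a component at five cycles per shaft revolution appears at order 5 regardless of the speed profile, which keeps gear-related cyclic frequencies fixed across operating conditions with varying speed.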

While the results from the CNN models were highly accurate in the aforementioned case study, there are still challenges to be addressed in situations in which this CNN-based framework leads to incorrect classification. One of those challenges is that standard neural networks can be overconfident and produce unreliable diagnostic conclusions when presented with unknown data without offering any warnings. This is due to the frequentist approach that standard neural networks use during training, which limits the learnt model’s parameters or filters to point estimate values, resulting in deterministic outputs for any given input (Shridhar et al., 2019). To address this challenge, we use a Bayesian neural network in this study that considers not only the accuracy, but also the level of confidence or uncertainty in any diagnostic result. With this Bayesian framework, each learnt parameter or weight is represented by a probability distribution as opposed to a single value, which allows for the evaluation of certainty of each classification result.

Multi-body dynamic modeling and simulations

Flexible multibody dynamics is a field that deals with using computers to model and analyze the behavior of bodies that are both deformable and constrained, and which undergo significant displacement and rotation. In a flexible multibody system, there may be a combination of rigid and elastic components, connected by joints or force elements such as springs, dampers, and actuators. Because of the constraints imposed by the joints, the movements of the bodies within the system are not entirely independent. The nonlinear nature of the governing dynamic equations of motion and the high dimensionality of flexible multibody systems necessitate the use of computer software that employs numerical algorithms to solve the equations of motion and analyze the system’s behavior (Shabana, 1997; Simeon, 2013).

SIMPACK is a dynamic analysis tool that uses multibody simulation to examine the response of drive systems. It captures the behavior of rotating machine elements such as gears and bearings by modeling them with springs and dampers. For wind turbines, SIMPACK incorporates the structural flexibility of drivetrain components such as the bed-plate, gearbox housing, and shafts by means of Craig-Bampton reduced finite element models (Guo and Keller, 2018). Figure 2 shows the SIMPACK multibody model of an onshore 5 MW three-bladed turbine. The drivetrain components are housed inside the nacelle, which is mounted on top of the tower.
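SIMPACK's internal implementation is not described here, but the idea behind Craig-Bampton reduction can be sketched in a few lines. The degrees of freedom (DOFs) of a flexible body are split into boundary (interface) DOFs, which are kept, and interior DOFs, which are replaced by static constraint modes plus a small number of fixed-interface normal modes. The sketch below assumes plain NumPy mass and stiffness matrices and uses an illustrative function name; it is meant to show the technique, not to reproduce the tool.

```python
import numpy as np


def craig_bampton(M, K, boundary, n_modes):
    """Craig-Bampton reduction of an undamped linear structure with matrices M and K.

    boundary: indices of the interface DOFs that are retained physically.
    n_modes : number of fixed-interface normal modes kept for the interior DOFs.
    Returns the reduced mass and stiffness matrices in coordinates [u_b, q].
    """
    M = np.asarray(M, dtype=float)
    K = np.asarray(K, dtype=float)
    n = M.shape[0]
    b = np.asarray(boundary, dtype=int)
    i = np.setdiff1d(np.arange(n), b)            # interior DOFs
    nb = b.size

    Kii, Kib, Mii = K[np.ix_(i, i)], K[np.ix_(i, b)], M[np.ix_(i, i)]

    # Static constraint modes: interior response to unit boundary displacements.
    Psi = -np.linalg.solve(Kii, Kib)

    # Fixed-interface normal modes: Kii phi = lam Mii phi, solved via Cholesky of Mii.
    Linv = np.linalg.inv(np.linalg.cholesky(Mii))
    _, V = np.linalg.eigh(Linv @ Kii @ Linv.T)   # eigenvalues in ascending order
    Phi = Linv.T @ V[:, :n_modes]                # mass-normalized lowest modes

    # Transformation u = T [u_b; q]
    T = np.zeros((n, nb + n_modes))
    T[b, :nb] = np.eye(nb)
    T[i, :nb] = Psi
    T[i, nb:] = Phi
    return T.T @ M @ T, T.T @ K @ T
```

Keeping all interior modes reproduces the full model's natural frequencies exactly; keeping only a few gives a much smaller model whose lowest natural frequencies are upper-bound approximations of the exact ones.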

Figure 2.

Full onshore wind turbine model, hub and nacelle. Screenshot images from the SIMPACK model are taken by the author.

The drivetrain consists of an input main shaft, a three-stage planetary gearbox, a high-speed coupling connected to the generator, a main frame, a housing, and a yaw bearing. The high-speed shaft (HSS) drives the electrical generator, while the low-speed shaft (LSS) attaches the rotor hub to the gearbox. The gearbox model comprises a flexible main shaft and a flexible housing with three active dynamic modes. Eight accelerometers, labeled 1 through 8 in Figure 2, are embedded in the drivetrain model to measure accelerations at different bearing locations on the gearbox. To model the gear tooth contact more accurately and to account for the forces and moments generated in the gear mesh, the SIMPACK model applies a high-fidelity gear force element to all gears. Details of all gearbox elements and the calculated rotating and gear mesh frequencies are given in Amin et al. (2023a, 2023b); the overall gear ratio is 97.83.
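The rotating and gear mesh frequencies referred to above follow from tooth counts and shaft speeds. The actual tooth counts are given in Amin et al. (2023a, 2023b) and are not reproduced here, so the sketch below uses hypothetical tooth counts and the standard relations for a planetary stage with a fixed ring gear, the carrier as input and the sun as output.

```python
def planetary_stage_frequencies(f_carrier, z_sun, z_planet, z_ring=None):
    """Characteristic frequencies (Hz) of a planetary stage with a fixed ring gear."""
    if z_ring is None:
        # Coaxial constraint for standard gears without profile shift.
        z_ring = z_sun + 2 * z_planet
    ratio = 1.0 + z_ring / z_sun          # speed-up ratio, carrier -> sun
    f_sun = ratio * f_carrier             # absolute speed of the sun (output) shaft
    f_mesh = z_ring * f_carrier           # ring-planet and sun-planet mesh frequency
    # Planet spin relative to the carrier; a localized fault on one flank of a
    # planet tooth enters the mesh once per relative planet revolution.
    f_planet_rel = f_mesh / z_planet
    return {"z_ring": z_ring, "ratio": ratio, "f_sun": f_sun,
            "f_mesh": f_mesh, "f_planet_rel": f_planet_rel}
```

Chaining the ratios of successive stages in the same way gives the overall gearbox ratio, which is 97.83 for the model used here.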

The wind load simulation program AeroDyn, interfaced with SIMPACK, was used to represent the wind loads in full wind turbine simulations performed under nine different wind conditions and speeds, with the gearbox in healthy condition (Jonkman, 2009; Jonkman et al., 2015). The nine simulated scenarios comprise three cases of turbulent wind, two of laminar flow, one extreme operating gust, two extreme direction changes, and one wind speed run-up with noise. Figure 3 shows example wind speed profiles for the simulated wind scenarios. The wind loads, speeds, and torques were then extracted from each scenario and applied to the abstracted nacelle-only model, reproducing the loads exerted by the wind turbine blades as in an experimental configuration. The highly dynamic forces and moments created by the blades lie mostly in the low-frequency range, as detailed in our earlier study (Amin et al., 2022), which makes diagnosing faults on the low-speed stage (LSS) more difficult.

Figure 3.

Example of simulated wind conditions: (a) laminar wind, (b) turbulent wind, (c) extreme operating gust, (d) extreme direction change, (e) twice extreme direction changes, and (f) run-up 6–12 m/s with random noise.

To simulate a localized fault or crack on a gear, a tooth pitch error is introduced into the model. The first simulated fault is on one of the planet gears of the low-speed planetary stage (LSS). The second fault, shown in Figure 2, is on the driving gear of the high-speed shaft (HSS). These faults cause periodic impulses or modulation phenomena in the vibration signal, and the corresponding characteristic frequency is linked to the rotating frequency of the damaged gear. To reflect different severity levels, the fault magnitude is set to 100, 50, or 20 μm. In each case, the fault affects a single tooth on one side of the respective gear wheel. The two faults never occurred concurrently in this study: each faulty simulation contained only one of them. To detect and classify the LSS fault, vibration signals from sensors 1 to 4 are selected from the simulation data; the remaining four sensors, numbered 5–8, are used to classify the HSS fault.
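The effect described above, a localized tooth fault modulating the gear mesh vibration once per revolution of the damaged gear, can be illustrated with a toy signal. This is not the SIMPACK model: the sketch assumes a single gear whose mesh tone is amplitude-modulated by a short Gaussian pulse each time the faulty tooth enters the mesh, and all parameter values are hypothetical.

```python
import numpy as np


def gear_vibration(fs, duration, f_rot, n_teeth, fault_depth=0.0, pulse_width=0.01):
    """Toy gear vibration: mesh tone, amplitude-modulated by a localized tooth fault.

    fs          : sampling rate in Hz
    duration    : signal length in s
    f_rot       : rotating frequency of the (possibly damaged) gear in Hz
    n_teeth     : number of teeth, so the mesh frequency is n_teeth * f_rot
    fault_depth : modulation depth of the fault (0 gives a healthy signal)
    pulse_width : standard deviation in s of the Gaussian pulse per fault impact
    """
    t = np.arange(int(round(fs * duration))) / fs
    period = 1.0 / f_rot
    # Time offset to the nearest fault impact; impacts occur at t = 0, period, 2*period, ...
    offset = (t + 0.5 * period) % period - 0.5 * period
    pulse = np.exp(-0.5 * (offset / pulse_width) ** 2)
    x = (1.0 + fault_depth * pulse) * np.cos(2 * np.pi * n_teeth * f_rot * t)
    return t, x
```

Squaring the signal demodulates it, so the fault appears as spectral lines at the rotating frequency of the damaged gear and its harmonics, whereas the healthy signal shows no such lines.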

Cyclostationary analysis

A signal is cyclostationary of order $n$ when its $n$th-order statistics vary periodically with time. Many mechanical systems, such as gears, fans, and motors, involve periodic motion of their components; as a result, their vibration and acoustic signals fall into the cyclostationary class. The basic frequency of this periodicity is called the cyclic frequency and is denoted by $α$. A cyclostationary signal $x(t)$ is of order 1 if its mean is periodic, and of order 2 if its autocorrelation function $R_{x}(t, τ)$ is periodic in $t$ with period $T$ (Antoni et al., 2017):

R_{x}(t, τ) = R_{x}(t + T, τ)

(1)

Since the autocorrelation function $R_{x}(t, τ)$ is periodic in $t$, it can be expanded in a Fourier series:

R_{x}(t, τ) = \sum_{α} R_{x}^{α}(τ) e^{2 π i α t}

(2)

where the Fourier coefficients $R_{x}^{α}(τ)$, one for each cyclic frequency $α$, form the cyclic autocorrelation function. Assuming cyclo-ergodicity, they can be computed from a single record of the signal as:

R_{x}^{α}(τ) = \lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x(t + \frac{τ}{2}) \, x^{*}(t - \frac{τ}{2}) \, e^{-2 π i α t} \, dt

(3)
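For sampled data, equation (3) can be estimated by a time average over a finite record, which relies on the cyclo-ergodic assumption. The sketch below is one simple discrete estimator: it forms the asymmetric product $x(t + τ) x^{*}(t)$ for an integer lag and multiplies the average by $e^{-iπατ}$ to convert it to the symmetric form of equation (3). The amplitude-modulated noise used to exercise it is a hypothetical test signal, not gearbox data.

```python
import numpy as np


def cyclic_autocorrelation(x, fs, alpha, lag=0):
    """Estimate R_x^alpha(tau) at tau = lag / fs from a finite record (lag >= 0)."""
    x = np.asarray(x, dtype=complex)
    n = len(x) - lag
    t = np.arange(n) / fs
    tau = lag / fs
    # Time average of x(t + tau) x*(t) e^{-2 pi i alpha t} (asymmetric form).
    a = np.mean(x[lag:] * np.conj(x[:n]) * np.exp(-2j * np.pi * alpha * t))
    # Phase factor that converts the asymmetric estimate to the symmetric definition.
    return a * np.exp(-1j * np.pi * alpha * tau)
```

For amplitude-modulated white noise $x(t) = (1 + m \cos 2π f_{0} t) \, n(t)$ with unit-variance noise, $R_{x}^{α}(0)$ is close to $m$ at $α = f_{0}$ and close to zero at cyclic frequencies that are not multiples of $f_{0}$, which is the second-order cyclostationarity defined above.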

Using equation (3), we can derive the spectral correlation function $S_{x}^{α} (f)$ as follows:

S_{x}^{α}(f) = \int_{-\infty}^{\infty} R_{x}^{α}(τ) \, e^{-2 π i f τ} \, dτ

(4)

where the spectral (carrier) frequency $f$ is the dual of the time lag $τ$, and the cyclic (modulation) frequency $α$ is the dual of time $t$ (Antoni et al., 2017; Gardner et al., 2006). The spectral correlation $S_{x}^{α}(f)$ measures the correlation between the frequency components at $f - \frac{α}{2}$ and $f + \frac{α}{2}$. Normalizing it by the power spectral density $S_{x}^{0}$ evaluated at these two frequencies gives the cyclic spectral coherence $γ_{x}(α, f)$, whose magnitude lies in the range $[0, 1]$:

γ_{x}(α, f) = \frac{S_{x}^{α}(f)}{\sqrt{S_{x}^{0}(f - \frac{α}{2}) \, S_{x}^{0}(f + \frac{α}{2})}}

(5)

The resulting bivariate map of $S_{x}^{α}(f)$ or $|γ_{x}(α, f)|$ is plotted with the cyclic frequency $α$ on the x-axis and the spectral frequency $f$ on the y-axis. This frequency-frequency map provides an improved demodulated spectrum and helps reveal hidden periodicities.
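The cyclic spectral coherence of equation (5) is usually estimated from data rather than from the limits above. One standard option, sketched below under the assumption of a Welch-type averaged cyclic periodogram with a Hann window, shifts the signal by $\pm α/2$ in frequency, averages the cross- and auto-spectra of the shifted copies over overlapping segments, and normalizes. The function name and defaults are illustrative; faster estimators exist for computing a full $(α, f)$ map.

```python
import numpy as np


def cyclic_coherence(x, fs, alpha, nfft=256, hop=128):
    """Estimate |gamma_x(alpha, f)| with an averaged cyclic periodogram (Welch-type)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / fs        # global time keeps segment phases consistent
    # Frequency-shifted copies: their spectra are X(f + alpha/2) and X(f - alpha/2).
    u = x * np.exp(-1j * np.pi * alpha * t)
    v = x * np.exp(+1j * np.pi * alpha * t)
    win = np.hanning(nfft)
    Suv = Suu = Svv = 0.0
    for s in range(0, len(x) - nfft + 1, hop):
        U = np.fft.fft(win * u[s:s + nfft])
        V = np.fft.fft(win * v[s:s + nfft])
        Suv = Suv + U * np.conj(V)    # spectral correlation S_x^alpha(f)
        Suu = Suu + np.abs(U) ** 2    # PSD at f + alpha/2
        Svv = Svv + np.abs(V) ** 2    # PSD at f - alpha/2
    f = np.fft.fftfreq(nfft, 1.0 / fs)
    return f, np.abs(Suv) / np.sqrt(Suu * Svv)
```

Evaluating the function over a grid of cyclic frequencies and stacking the results row by row yields a bivariate map of the kind described above.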

In our gearbox dataset, the HSS fault signature occurs at 4.7 cycle/rad and can be clearly identified on cyclic spectral coherence (CSCoh) maps. As shown in Figure 4(a), the signature of the 100 μm HSS fault is strong and very clear at 4.7 cycle/rad. For smaller, early-stage faults the signature becomes weaker, as shown in the CSCoh map of the 20 μm HSS fault in Figure 4(b). In addition, the range of carrier frequencies (y-axis) over which the signature appears shrinks as the fault magnitude decreases: the larger the fault, the broader the band of frequencies it excites.
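One common way to read off such a signature, not necessarily the procedure used for Figure 4, is to average the coherence magnitude over a band of carrier frequencies, giving an enhanced envelope spectrum as a function of cyclic frequency alone, and then to locate its peak. The sketch assumes the map is stored as an array with one row per cyclic frequency; the synthetic map used to check it is hypothetical.

```python
import numpy as np


def enhanced_envelope_spectrum(coh_map, f, f_band):
    """Mean |gamma(alpha, f)| over carrier frequencies f_band[0] <= f <= f_band[1]."""
    coh_map = np.abs(np.asarray(coh_map))          # shape (n_alpha, n_f)
    f = np.asarray(f, dtype=float)
    in_band = (f >= f_band[0]) & (f <= f_band[1])
    return coh_map[:, in_band].mean(axis=1)


def dominant_cyclic_frequency(coh_map, alpha, f, f_band, alpha_min=0.0):
    """Cyclic frequency (above alpha_min) at which the band-averaged coherence peaks."""
    ees = enhanced_envelope_spectrum(coh_map, f, f_band)
    alpha = np.asarray(alpha, dtype=float)
    valid = alpha > alpha_min                      # skip alpha = 0, where coherence is 1
    return alpha[valid][np.argmax(ees[valid])]
```

Because smaller faults excite a narrower carrier band, restricting `f_band` to the excited band keeps a weak ridge from being diluted by the surrounding noise.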

Figure 4.

Comparison of diagnostic results between Standard and Bayesian Convolutional Neural Networks using cyclic spectral coherence maps of vibration signals. Cases (a–d): seen faults on HSS and LSS. Case (e): unseen fault on IMS. BCNN predictions are plotted with a 95% confidence interval and are accompanied by their corresponding true labels in blue, while incorrect labels are displayed in red. Training and testing images provided to the neural network were devoid of any labels, axis numbers, or colorbars.

However, the LSS fault signature is only discernible on CSCoh maps when the fault magnitude is large, as for the 100 μm LSS fault. As seen in Figure 4(c) and (d), as the fault size decreases, the signature at 0.44 cycle/rad becomes less distinct and harder to trace. The example plots in Figure 4(c) and (d) were generated using data from sensor 1, which is physically closest to the damage; sensors 3 and 4, which are further away, exhibit a weaker signature. Moreover, the carrier frequency range of the LSS fault is narrower than that of the HSS fault. This motivates the use of convolutional neural networks (CNNs) for precise defect identification, as detailed in Amin et al. (2023a, 2023b).

Bayesian convolutional neural network

Convolutional neural networks (CNNs) are widely used in image recognition, including damage classification (Jing et al., 2017). A CNN can identify spatially local correlation features in an image, and its performance relies on the convolution operation for feature extraction and pattern recognition. Typical CNN architectures include four types of layers: convolutional, pooling, rectified linear unit (ReLU), and fully connected layers (Guo et al., 2018). In Amin et al. (2023a), we employed a CNN trained on cyclic spectral coherence (CSCoh) images to diagnose the two faults introduced into our gearbox model. This conventional CNN determines the best model parameters by optimizing a loss function on known training data. In machine learning, the loss function measures the error or discrepancy between the predicted output and the actual target, quantifying how far the model’s prediction is from reality; training typically aims to minimize it. Here, Adam, an adaptive moment estimation optimizer, minimizes a cross-entropy loss, which is commonly used in classification tasks to quantify the dissimilarity between the predicted class probabilities and the true categorical labels (Zhang and Sabuncu, 2018). The model parameters (weights and biases) are thus determined by maximum likelihood estimation (MLE) and take single deterministic values (Blundell et al., 2015). For a set of training data $D = \{(x_{i}, y_{i})\}$, the MLE of the weights is:

w^{MLE} = \arg\max_{w} \log P(D | w) = \arg\max_{w} \sum_{i} \log P(y_{i} | x_{i}, w)

(6)
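The link between the cross-entropy loss and equation (6) can be made concrete: with softmax outputs, the mean cross-entropy over the training set equals $-\frac{1}{n}\sum_{i} \log P(y_{i} | x_{i}, w)$, so minimizing it (for example with Adam) yields the maximum likelihood weights. The NumPy sketch below computes this loss from raw network outputs (logits); it is a generic illustration, not the authors' training code.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-probabilities, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer class labels under softmax outputs."""
    logp = log_softmax(np.asarray(logits, dtype=float))
    labels = np.asarray(labels)
    return -np.mean(logp[np.arange(len(labels)), labels])
```

With seven classes and uninformative (all-zero) logits, the loss equals log 7 ≈ 1.95, roughly the value to expect from an untrained seven-class classifier.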

This approach is effective if the test data follow the same distribution as the training data, for example when the test images come only from the fault-free case and the two simulated fault cases in our gearbox model. Such data are referred to as the in-distribution dataset. However, a drawback of this CNN-based approach is its inability to express uncertainty in its output (Jospin et al., 2022; Shridhar et al., 2019). Another issue is that, in practice, test data are unlikely to have the same distribution as the training data, because it is extremely challenging to gather training data for every potential event or failure mode that may occur in a gearbox operating in the field. As a result, out-of-distribution test data, such as new faults that were unknown during training, lead to meaningless or wrong, overconfident predictions from this traditional CNN-based framework (Jospin et al., 2022). For instance, our CNN has been trained to diagnose faults on the LSS and HSS; any unseen fault, say on the intermediate speed shaft (IMS), will be blindly assigned to one of the existing LSS or HSS classes. In other words, the network still assigns a category to any new input image, even one unrelated to what it has been trained on. Such untrustworthy results fail to meet the reliability and safety requirements of accurate fault diagnosis.
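The overconfidence problem can be reproduced with a toy softmax classifier. The outputs always sum to one, so some class is always chosen, and for a model with fixed (deterministic) weights, an input far from anything seen in training can receive an almost certain probability. The weights and inputs below are hypothetical and chosen only to make the effect easy to see.

```python
import numpy as np


def softmax(z):
    """Softmax of a 1D array of logits (shifted by the maximum for stability)."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


# Hypothetical deterministic 3-class linear classifier on 2 features.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])

x_in = np.array([0.3, 0.1])   # an input similar to the training data
x_far = 100.0 * x_in          # same direction, far outside the training range

p_in = softmax(W @ x_in)      # moderate confidence, about 0.43 for the top class
p_far = softmax(W @ x_far)    # top-class probability essentially 1.0
```

Nothing in `p_far` signals that `x_far` is unlike the training data; a Bayesian network addresses this by also reporting how much its predictions vary.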

These challenges of conventional CNNs can be addressed with a Bayesian neural network (BNN), which learns a probability distribution over all model parameters, that is, weights and biases. This enables it to capture prediction uncertainty. In addition, BNNs are considered highly data-efficient, allowing them to learn effectively from small datasets while being less prone to overfitting (Depeweg et al., 2018). This property makes BNNs particularly well suited to simulation datasets such as ours, which may be limited in size or scope. By learning probabilistically, BNNs can better model the inherent uncertainty in small datasets, which results in more robust and accurate predictions. This advantage is especially important in scientific and engineering applications where data collection is difficult or time-consuming. BNNs can also be viewed as a special case of ensemble learning (Jospin et al., 2022): they represent a family of models, each a different plausible way of fitting the data, rather than relying on a single best-fit model. This resembles ensemble learning, in which multiple models are trained and their outputs combined to improve performance. However, unlike traditional ensemble methods, which train multiple models independently, BNNs obtain this distribution of models from a single training process.

As previously stated, BNNs learn a probability distribution over the model parameters that reflects what has been learned from the training data. This posterior distribution $p(w | D)$ follows from Bayes’ theorem, where $w$ represents the weight parameters and $D = (X, Y)$ the dataset; $X$ denotes the input data, here the 2D image representations of the sensor signals, and $Y$ the classes or image labels.

p (w | D) = \frac{p (D | w) p (w)}{p (D)} = \frac{p (D | w) p (w)}{\int_{w} p (D | w) p (w) d w}

(7)

where $p(D | w)$ is the likelihood of the data given the weights $w$, $p(w)$ is the prior distribution, which represents our prior belief about the weights, and $p(D)$ is the marginal likelihood. In this Bayesian context, observing and analyzing the data updates our belief about the weights from the prior to the posterior. Given a dataset of $n$ training points $D = \{(x_{i}, y_{i})\}_{i=1}^{n}$, assumed to be independent, the likelihood function is:

P(D | w) = \prod_{i=1}^{n} P(y_{i} | x_{i}, w)

(8)

where the function $P (D | w)$ is defined to describe the process by which an observation is generated, given a set of parameters and input variables.

$P(D)$, also known as the marginal likelihood or model evidence, is the probability of the observed data after marginalizing over the parameters: $p(D) = \int_{w} p(D | w) p(w) \, dw$. Computing this marginal likelihood, the denominator of equation (7), is usually intractable for a deep neural network, because the integral runs over all possible values of the high-dimensional parameter vector $w$. For such complex models no analytical solution exists, and numerical integration over so many dimensions is infeasible.
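For a single parameter, equation (7) can be evaluated directly on a grid, which shows what the intractable integral computes. The sketch below uses a hypothetical one-parameter model (a Bernoulli success probability $w$ with a uniform prior and $k$ successes in $n$ independent trials) so that the result can be checked against the known closed form. With $G$ grid points per parameter and $d$ parameters, the same approach needs $G^{d}$ likelihood evaluations, which is why it is hopeless for a neural network with thousands of weights.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def grid_posterior(k, n, grid_size=2001):
    """Posterior p(w | D) of a Bernoulli parameter w on a grid, via Bayes' theorem."""
    w = np.linspace(0.0, 1.0, grid_size)
    prior = np.ones_like(w)                      # uniform prior p(w) on [0, 1]
    likelihood = w ** k * (1.0 - w) ** (n - k)   # p(D | w) for i.i.d. trials, as in eq. (8)
    evidence = trapezoid(likelihood * prior, w)  # p(D), the denominator of eq. (7)
    posterior = likelihood * prior / evidence
    return w, posterior, evidence
```

For 7 successes in 10 trials, the evidence is 1/1320 and the posterior mean is 8/12 ≈ 0.667, matching the analytic Beta(8, 4) posterior.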

Variational inference

The difficulty of computing the posterior distribution is addressed with a variational inference technique known as Bayes by Backprop (Blundell et al., 2015; Shridhar et al., 2018), which approximates the true posterior with a computationally tractable distribution. This is done by choosing a known distribution, the variational posterior $q(w | θ)$, and fitting it to be as close as possible to the true posterior, where $θ$ represents the mean and variance of the distribution:

p (w | D) \approx q_{θ} (w | D)

(9)

In this expression, the weights $w$ are sampled during back-propagation from a Gaussian distribution parameterized by $θ$: $w \sim N(μ_{θ}, σ_{θ}^{2})$. The variance $σ^{2}$ of each Gaussian in the variational posterior provides an estimate of the uncertainty of the corresponding model parameter. The goal is therefore to make $q_{θ}(w | D)$ as close as possible to $p(w | D)$. Their dissimilarity is measured by the Kullback-Leibler (KL) divergence, which is minimized to obtain the approximation (Blundell et al., 2015; Zhou et al., 2022):

θ^{opt} = \arg\min_{θ} KL[q_{θ}(w | D) \, \| \, p(w | D)]

(10)

KL[q_{θ}(w | D) \, \| \, p(w | D)] = KL[q_{θ}(w | D) \, \| \, p(w)] - E_{q(w | θ)}[\log p(D | w)] + \log p(D)

(11)

where

KL[q_{θ}(w | D) \, \| \, p(w)] = \int q_{θ}(w | D) \log \frac{q_{θ}(w | D)}{p(w)} \, dw

(12)

where $KL[q(w | θ) \, \| \, p(w)]$ is the complexity cost, which measures how far $q(w | θ)$ departs from the prior $p(w)$. The second term, $-E_{q(w | θ)}[\log p(D | w)]$, known as the likelihood cost, is the expected negative log-likelihood; it evaluates how well the model fits the given data and is a crucial component in determining the overall quality of the model’s predictions. In essence, this term measures the deviation between the predicted outcomes and the actually observed values, providing a quantifiable assessment of the model’s performance (Maier et al., 2020). Because minimizing the KL divergence in equation (10) exactly is intractable, weights are instead sampled from $q_{θ}(w | D)$, and the network is optimized during training by gradient descent with respect to $θ$ on a tractable objective based on the evidence lower bound (ELBO), equation (13).

-ELBO(D, θ) \approx \frac{1}{n} \sum_{i=1}^{n} \left[ \log q_{θ}(w^{(i)} | D) - \log p(w^{(i)}) - \log p(D | w^{(i)}) \right]

(13)

where $n$ is the number of draws and each $w^{(i)}$ is sampled from the approximate distribution $q_{θ}(w | D)$, which in our model is a Gaussian distribution parameterized by its mean and standard deviation. Since $\log p(D)$ does not depend on $θ$, minimizing the KL divergence is equivalent to maximizing the evidence lower bound (Blei et al., 2017). The ELBO is optimized by stochastic gradient descent, with gradients computed by back-propagation, which allows the method to scale to the large datasets typical of neural networks (Shridhar et al., 2019).
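The objective in equation (13) can be estimated with a few lines of NumPy. The sketch below assumes a Bayesian logistic regression (a single-layer stand-in for the BCNN), a factorized Gaussian $q_{θ}$ whose standard deviation is parameterized as $σ = \log(1 + e^{ρ})$ as in Bayes by Backprop, a standard normal prior (an assumption made here for simplicity), and the reparameterization $w = μ + σ ε$ with $ε \sim N(0, 1)$, which is what lets gradients flow back to $μ$ and $ρ$. The gradient step itself, which a framework would perform with automatic differentiation, is omitted.

```python
import numpy as np


def softplus(r):
    return np.log1p(np.exp(r))


def gaussian_logpdf(w, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (w - mu) ** 2 / (2 * sigma ** 2)


def neg_elbo(mu, rho, X, y, prior_sigma=1.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of -ELBO (eq. 13) for Bayesian logistic regression."""
    rng = np.random.default_rng(seed)
    sigma = softplus(rho)                             # sigma > 0 via softplus(rho)
    eps = rng.standard_normal((n_samples, mu.size))
    W = mu + sigma * eps                              # reparameterized draws w^(i) ~ q_theta
    log_q = gaussian_logpdf(W, mu, sigma).sum(axis=1)
    log_prior = gaussian_logpdf(W, 0.0, prior_sigma).sum(axis=1)
    logits = W @ X.T                                  # shape (n_samples, n_data)
    log_lik = (y * logits - np.logaddexp(0.0, logits)).sum(axis=1)  # Bernoulli log p(D | w)
    return np.mean(log_q - log_prior - log_lik)


def kl_gaussians(mu, sigma, prior_sigma=1.0):
    """Closed-form KL[N(mu, sigma^2) || N(0, prior_sigma^2)], summed over weights."""
    return np.sum(np.log(prior_sigma / sigma)
                  + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5)
```

With no data, the objective reduces to the complexity cost $KL[q \, \| \, p]$, which has a closed form for Gaussians; this gives a simple check on the Monte Carlo estimate.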

Uncertainty in Bayesian neural network

A key benefit of Bayesian neural networks compared with standard frequentist neural networks is their ability to express uncertainty quantitatively. In classification tasks, we are interested in the predictive distribution $p_{D}(y^{t} | x^{t})$, where $x^{t}$ denotes a new target input, possibly drawn from a distribution different from that of the training data, and $y^{t}$ its predicted class or label. In a Bayesian neural network, the predictive distribution is given by:

p_{D}(y^{t} | x^{t}) = \int p_{w}(y^{t} | x^{t}) \, p(w | D) \, dw

(14)

As explained in the Variational inference subsection, Gaussian distributions $q_{θ}(w | D) = N(w | μ, σ^{2})$ with $θ = \{μ, σ\}$ are learnt as approximations during training on the data $D = \{(x_{i}, y_{i})\}_{i=1}^{n}$. For classification tasks, the network output is usually modeled with a categorical distribution, owing to its discrete and finite nature. We therefore obtain:

p_{D}(y^{t} | x^{t}) = \int Cat(y^{t} | f_{w}(x^{t})) \, N(w | μ, σ^{2}) \, dw = \int \prod_{c=1}^{C} f(x_{c}^{t} | w)^{y_{c}^{t}} \frac{1}{\sqrt{2 π σ^{2}}} e^{-\frac{(w - μ)^{2}}{2 σ^{2}}} \, dw

(15)

where $C$ is the total number of classes and $\sum_{c} f(x_{c}^{t} | w) = 1$.

However, the Gaussian variational distribution is not conjugate to the categorical likelihood produced by the network, so equation (15) has no closed-form solution. Nevertheless, by sampling from the variational distribution $q_{θ}(w | D)$, we can derive an estimator of the expectation:

E_{q}[p_{D}(y^{t} | x^{t})] = \int q_{θ}(w | D) \, p_{w}(y^{t} | x^{t}) \, dw \approx \frac{1}{K} \sum_{k=1}^{K} p_{w_{k}}(y^{t} | x^{t})

(16)

where $K$ is the predefined number of weight samples $w_{k}$ drawn from $q_{θ}(w | D)$, and $p_{w_{k}}(y^{t} | x^{t})$ is the class-probability vector that the network with weights $w_{k}$ predicts for the target input. The variance of this estimator captures the network’s level of confidence, or uncertainty, in its prediction:

Var_{q}(p(y^{t} | x^{t})) = E_{q}[y^{t} (y^{t})^{T}] - E_{q}[y^{t}] \, E_{q}[y^{t}]^{T}

(17)

This variance quantifies the amount of uncertainty in the model’s output, that is, how much the predicted class probabilities vary across samples of the weights. A low variance indicates that the network is highly certain of its prediction, while a high variance reflects a greater degree of uncertainty, so understanding this variance is essential for gauging the reliability of the network’s predictions. A confident prediction should have a high mean predictive probability and a low variance. The derivation of these equations is explained further in Shridhar et al. (2018, 2019).
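Equations (16) and (17) translate directly into array operations on the $K$ sampled class-probability vectors. In the sketch below, the samples are assumed to be already available as a $K \times C$ array, for instance from $K$ forward passes of a Bayesian network with freshly sampled weights; the diagonal of the returned matrix holds the per-class variances.

```python
import numpy as np


def predictive_moments(prob_samples):
    """Mean (eq. 16) and covariance (eq. 17) of K sampled class-probability vectors.

    prob_samples: array of shape (K, C); row k holds p_{w_k}(y^t | x^t).
    """
    p = np.asarray(prob_samples, dtype=float)
    k = p.shape[0]
    mean = p.mean(axis=0)                        # (1/K) sum_k p_{w_k}(y^t | x^t)
    second_moment = p.T @ p / k                  # E_q[y y^T]
    cov = second_moment - np.outer(mean, mean)   # E_q[y y^T] - E_q[y] E_q[y]^T
    return mean, cov
```

For example, the two samples [0.9, 0.1] and [0.5, 0.5] give the mean [0.7, 0.3] and a per-class variance of 0.04, whereas identical samples give zero variance.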

Fault diagnostics using deep learning

This section describes the implementation of the proposed BCNN and compares its results with those of a standard CNN on our simulation-based gearbox dataset. There are two types of faults, each at one of three severity levels (100, 50, or 20 μm), in addition to the healthy case, giving a total of seven classes. Figure 4 presents example CSCoh maps of some simulated faults, followed by visualizations of the predictions of the standard CNN and the BCNN. A total of 276 coherence intensity matrices, each of size $250 \times 20$, are passed to the neural network: 195 matrices (70%) are used for training and validation, and the remaining 81 (30%) for testing.

Standard convolutional neural network

The CNN architecture contains three 2D convolution layers, each followed by a 2D max pooling layer of size $2 \times 2$. The first, second, and third convolution layers have 20, 30, and 40 kernels of size $2 \times 2$, respectively. Symmetric padding of two pixels is added in each layer so that image borders are included in the processing and information is not lost too early in the network. The training batch size is 39, and the CNN is trained for 120 iterations (30 epochs) to ensure convergence and accuracy. The learning rate is 0.01, chosen to allow fine tuning for accuracy improvement. An early stopping criterion stops training automatically if the validation accuracy does not improve within 30 epochs. As discussed in Amin et al. (2023a), the accuracy of this CNN model is 88%, with all HSS faults correctly identified according to their fault magnitude; there were three missed detections but no false positives. Figure 4 demonstrates the success of the standard CNN in cases (a), (b), and (c), namely the HSS 100, HSS 20, and LSS 100 μm defects, to which it assigns the correct labels. In case (d), however, the CNN incorrectly labeled the LSS 20 μm image as healthy, with a predicted probability of more than 80%. Moreover, the standard CNN clearly gave a wrong, overconfident result for the unseen fault in case (e), an image of a 50 μm fault on the intermediate speed stage (IMS). Because its weights have deterministic values, the standard CNN produces meaningless results without any warning when tested with images that do not follow the distribution of the training data.
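As a consistency check on the architecture described above, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes single-channel $250 \times 20$ inputs, stride-1 convolutions with two pixels of padding on every side, $2 \times 2$ max pooling with stride 2, and a single fully connected layer to the seven classes; these are interpretations of the description, not settings confirmed by the original code.

```python
def conv_out(n, kernel, pad, stride=1):
    """Output length of a convolution with `pad` pixels added on each side."""
    return (n + 2 * pad - kernel) // stride + 1


def pool_out(n, size, stride):
    """Output length of a pooling window of `size` moved by `stride`."""
    return (n - size) // stride + 1


def cnn_summary(height=250, width=20, channels=1, filters=(20, 30, 40),
                kernel=2, pad=2, pool=2, n_classes=7):
    """Feature-map shapes after each conv + pool block, and trainable parameter counts."""
    h, w, c = height, width, channels
    shapes, conv_params = [], 0
    for nf in filters:
        h, w = conv_out(h, kernel, pad), conv_out(w, kernel, pad)
        conv_params += nf * kernel * kernel * c + nf   # weights + one bias per filter
        c = nf
        h, w = pool_out(h, pool, pool), pool_out(w, pool, pool)
        shapes.append((h, w, c))
    fc_params = h * w * c * n_classes + n_classes     # flatten -> fully connected
    return shapes, conv_params, fc_params
```

Under these assumptions the three blocks produce 126 × 11 × 20, 64 × 7 × 30 and 33 × 5 × 40 feature maps, with 7,370 convolutional and 46,207 fully connected parameters. A BCNN that learns a mean and a standard deviation for every weight and bias has twice as many trainable parameters.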

Bayesian convolutional neural network

The BCNN architecture is the same as that of the standard CNN, except that each parameter is represented by a normal distribution, as shown in Figure 5(a). The mean and standard deviation of each weight and bias are therefore learned during training. The learning rate of 0.01 used for the standard CNN is also used here. The BCNN achieved an accuracy of 90%. As explained in Section 4.2, the weights of a BCNN are not fixed but are represented by a learned probability distribution. To make a prediction with the BCNN, we therefore draw multiple samples, that is, run the network several times for each input image, as depicted in Figure 5(b). Because the model parameters are resampled each time, each prediction is likely to be slightly different. In our model, we used a predefined $K = 100$ samples in equation (16), resulting in 100 different predictions for each class. The final prediction is obtained by averaging these 100 sampled predictions. The variance of these predictions provides an estimate of uncertainty, and the 95% prediction interval for each class was obtained and visualized as bars in Figure 4. The height of each bar is the difference between the 2.5th and 97.5th percentiles, with taller bars indicating a higher level of uncertainty. The model was implemented in Python using the TensorFlow library, both widely used for training and testing deep learning models.
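
The sampling-based prediction can be illustrated with a minimal NumPy sketch. It is not the authors' TensorFlow implementation: a single mean-field Gaussian dense layer stands in for the full BCNN, its sizes and initial values are hypothetical, the standard deviation of each parameter is parametrized as $\sigma = \log(1 + e^{\rho})$ (the softplus choice used by Blundell et al., 2015), and a random vector stands in for the features of one CSCoh image.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_CLASSES, K = 16, 7, 100   # hypothetical sizes; K = 100 as in the text

# Hypothetical variational parameters of one Bayesian dense layer: every weight
# and bias has a mean (mu) and an unconstrained rho with sigma = softplus(rho).
w_mu = rng.normal(0.0, 0.5, size=(N_FEATURES, N_CLASSES))
w_rho = np.full((N_FEATURES, N_CLASSES), -3.0)
b_mu = np.zeros(N_CLASSES)
b_rho = np.full(N_CLASSES, -3.0)

def softplus(x):
    """log(1 + exp(x)); keeps every standard deviation positive."""
    return np.log1p(np.exp(x))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(x, k=K):
    """Run k stochastic forward passes, each with freshly sampled weights."""
    probs = np.empty((k, N_CLASSES))
    for i in range(k):
        w = w_mu + softplus(w_rho) * rng.standard_normal(w_mu.shape)
        b = b_mu + softplus(b_rho) * rng.standard_normal(b_mu.shape)
        probs[i] = softmax(x @ w + b)
    return probs

x = rng.normal(size=N_FEATURES)            # stand-in for the features of one image
probs = mc_predict(x)                      # shape (K, N_CLASSES)

mean_prob = probs.mean(axis=0)             # averaged prediction over the K samples
lo, hi = np.percentile(probs, [2.5, 97.5], axis=0)
interval_height = hi - lo                  # analogue of the bar heights in Figure 4

print("predicted class:", int(mean_prob.argmax()))
print("95% interval heights:", np.round(interval_height, 3))
```

Each call to `mc_predict` draws fresh weights, so repeated calls give slightly different averages, mirroring the behaviour described above.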

Figure 5.

(a) Workflow to implement, train, and evaluate the Bayesian convolutional neural network on raw vibration data represented by cyclic spectral coherence maps. (b) Uncertainty-aware prediction for a given input image, obtained by drawing $K$ samples from the posterior distribution over the network's weights. This allows the model to generate a probability distribution over possible outcomes rather than a single point estimate.

Examining the BCNN predictions in Figure 4, we see that the BCNN is just as accurate as the conventional CNN in cases (a) and (b). In case (c), LSS 100 μm, the fault signature is weaker than for the HSS faults, as explained earlier. This is reflected by a marginal increase in the height of the bar, indicating a higher level of uncertainty for this class prediction. In case (d), the BCNN still assigns a larger probability to the healthy class than to the LSS 20 μm class, but the level of uncertainty is substantial, as shown by the very tall bars. For the healthy class, the output was almost 0.01 in one of the 100 samples and nearly 0.98 in another. The diagnostic result is therefore untrustworthy and should be investigated further. In case (e), the BCNN indicates that it is being tested on something new: it spreads a wide probability distribution over four classes, each with a significant degree of uncertainty. Rather than committing to an incorrect classification, the BCNN communicates that it is unsure about an input from a category it has not been trained on.
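
The bars in Figure 4 summarize uncertainty through percentiles. A complementary, commonly used summary splits the predictive entropy of the $K$ samples into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term, in the spirit of Depeweg et al. (2018); disagreement between samples, as in cases (d) and (e), raises the epistemic term. The Python sketch below uses synthetic sample arrays, not the paper's outputs, and the review thresholds `max_epistemic=0.1` and `min_confidence=0.8` are hypothetical values chosen for illustration.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)

def entropy(p, axis=-1):
    """Shannon entropy in nats along the class axis."""
    return -np.sum(p * np.log(p + EPS), axis=axis)

def decompose_uncertainty(probs):
    """Split the uncertainty of K sampled predictions, probs with shape (K, C).

    total     = entropy of the averaged prediction
    aleatoric = average entropy of the individual samples
    epistemic = total - aleatoric (mutual information between label and weights)
    """
    total = entropy(probs.mean(axis=0))
    aleatoric = entropy(probs, axis=-1).mean()
    return total, aleatoric, total - aleatoric

def needs_review(probs, max_epistemic=0.1, min_confidence=0.8):
    """Hypothetical rule: flag an image for a human analyst."""
    _, _, epistemic = decompose_uncertainty(probs)
    return bool(epistemic > max_epistemic or probs.mean(axis=0).max() < min_confidence)

K, C = 100, 7
rng = np.random.default_rng(1)

# Synthetic case like (a)-(c): every sample puts 0.97 on the same class.
confident = np.tile(np.full(C, 0.03 / (C - 1)), (K, 1))
confident[:, 0] = 0.97

# Synthetic case like (e): each sample favours one of four classes at random.
picks = rng.integers(0, 4, size=K)
disagree = np.full((K, C), 0.03 / (C - 1))
disagree[np.arange(K), picks] = 0.97

for name, probs in [("confident", confident), ("disagree", disagree)]:
    total, aleatoric, epistemic = decompose_uncertainty(probs)
    print(name, round(total, 3), round(aleatoric, 3), round(epistemic, 3), needs_review(probs))
```

Both synthetic arrays have the same per-sample entropy; only the disagreement between samples separates them, which is the situation the text describes for unseen or weakly expressed faults.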

Figure 6 takes a closer look at the probability predictions for the first four images shown in Figure 4. The y-axis of each scatter plot shows the probability assigned to the correct class for the four cases HSS 100 μm, HSS 20 μm, LSS 100 μm, and LSS 20 μm, respectively. As noted above, the Bayesian neural network produces highly accurate and confident results for cases (a), (b), and (c). For case (d), however, the 100 samples from the trained network assign widely varying probabilities to the LSS 20 μm class, ranging from 0 to 1. This indicates a high level of uncertainty in the prediction, as also shown by the large height of the corresponding bar in Figure 4(d). The standard deviation of the sampled probabilities is zero for the first three cases and $0.425$ for the fourth. This uncertainty is understandable given the weak fault signature, particularly at the low amplitude of 20 μm, which the network can easily mistake for a healthy case.
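
Because each sampled probability lies in [0, 1], its standard deviation across samples cannot exceed 0.5 (Popoviciu's inequality on variances), a bound reached when half the samples are 0 and half are 1. The short NumPy check below uses synthetic samples, not the paper's data, to put the reported values in context: agreeing samples give a standard deviation of zero, while the 0.425 reported for case (d) lies close to the 0.5 ceiling.

```python
import numpy as np

K = 100
rng = np.random.default_rng(0)

agree = np.full(K, 0.99)                                # all samples agree, as in cases (a)-(c)
split = np.array([0.0] * (K // 2) + [1.0] * (K // 2))   # worst case: half at 0, half at 1
scattered = rng.uniform(0.0, 1.0, size=K)               # samples spread over the whole range

for name, samples in [("agree", agree), ("split", split), ("scattered", scattered)]:
    # np.std uses the population formula (ddof=0) by default.
    print(name, round(float(np.std(samples)), 3))
```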

Figure 6.

Bayesian neural network probabilistic results from drawing 100 samples for each test image of seen faults (a) HSS 100 μm, (b) HSS 20 μm, (c) LSS 100 μm, and (d) LSS 20 μm.

Conclusion

In this study, the performance of vibration-based fault diagnostic systems employing Bayesian and traditional neural networks is assessed and compared. Using a simulation-based dataset, we show that the Bayesian neural network can provide the level of uncertainty in its classification results. Because the level of uncertainty is minimal for HSS faults, the two networks, the BCNN and the standard CNN, perform very similarly on them. For the LSS fault, however, the impact signature is negligible at low fault magnitudes. The level of uncertainty is therefore higher in this case, and only the BCNN's results reveal it, as shown in Figure 4(c) and (d). When tested with an unknown fault type, such as an IMS fault, the BCNN clearly indicates its doubt about the new images by distributing its predictions equally over four different classes. This means that the network cannot identify the new image and that human intervention is required for further investigation. In contrast, when tested with this IMS fault type, the standard CNN gives a completely wrong, overconfident classification, claiming that the fault is LSS 50 μm, as shown in Figure 4(e). This research provides a more comprehensive framework for applying deep learning algorithms to ensure the safety and reliability of wind turbine fault diagnostics. The Bayesian neural network completes this strategy by considering both the accuracy and the trustworthiness, or level of confidence, of the diagnostic results.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Meghashyam Panyam

Phanindra Tallapragada

References

Amin, Bibo, Panyam, et al. (2022) Vibration-based condition monitoring in wind turbine gearbox using convolutional neural network. In: 2022 American control conference (ACC), pp.3777–3782. New York, NY: IEEE.

Amin, Bibo, Panyam, et al. (2023a) Vibration based fault diagnostics in a wind turbine planetary gearbox using machine learning. Wind Engineering 47(1): 175–189.

Amin, Bibo, Panyam, et al. (2023b) Wind turbine gearbox fault diagnosis using cyclostationary analysis and interpretable CNN. Journal of Vibration Engineering and Technologies. https://doi.org/10.1007/s42417-023-00937-1

Antoni (2007) Fast computation of the kurtogram for the detection of transient faults. Mechanical Systems and Signal Processing 21(1): 108–124.

Antoni, Xin, Hamzaoui (2017) Fast computation of the spectral correlation. Mechanical Systems and Signal Processing 92: 248–277.

Blei, Kucukelbir, McAuliffe (2017) Variational inference: A review for statisticians. Journal of the American Statistical Association 112(518): 859–877.

Blundell, Cornebise, Kavukcuoglu, et al. (2015) Weight uncertainty in neural network. In: International conference on machine learning, pp.1613–1622. PMLR.

Carroll, McDonald, McMillan (2016) Failure rate, repair time and unscheduled O&M cost analysis of offshore wind turbines. Wind Energy 19(6): 1107–1119.

Depeweg, Hernandez-Lobato, Doshi-Velez, et al. (2018) Decomposition of uncertainty in Bayesian deep learning for efficient and risk-sensitive learning. In: International Conference on Machine Learning, pp.1184–1193. PMLR.

Feng, Zuo (2012) Vibration signal models for fault diagnosis of planetary gearboxes. Journal of Sound and Vibration 331(22): 4919–4939.

Gardner, Napolitano, Paura (2006) Cyclostationarity: Half a century of research. Signal Processing 86(4): 639–697.

Guo, Yang, Gao, et al. (2018) A novel fault diagnosis method for rotating machinery based on a convolutional neural network. Sensors 18(5): 1429.

Guo, Keller (2018) Investigation of high-speed shaft bearing loads in wind turbine gearboxes through dynamometer testing. Wind Energy 21(2): 139–150.

Jing, Wang, Zhao, et al. (2017) An adaptive multi-sensor data fusion method based on deep convolutional neural networks for fault diagnosis of planetary gearbox. Sensors 17(2): 414.

Jonkman (2009) TurbSim user’s guide: Version 1.50. Technical report, National Renewable Energy Lab. (NREL), Golden, CO.

Jonkman, Hayman, Jonkman, et al. (2015) AeroDyn v15 user’s guide and theory manual. NREL Draft Report, p.46.

Jospin, Laga, Boussaid, et al. (2022) Hands-on Bayesian neural networks—a tutorial for deep learning users. IEEE Computational Intelligence Magazine 17(2): 29–48.

Lu, et al. (2009) A review of recent advances in wind turbine condition monitoring and fault diagnosis. In: 2009 IEEE power electronics and machines in wind applications, pp.1–7. New York, NY: IEEE.

Maier, Lorch, Riess (2020) Toward reliable models for authenticating multimedia content: Detecting resampling artifacts with Bayesian neural networks. In: 2020 IEEE International conference on image processing (ICIP), pp.1251–1255. New York, NY: IEEE.

Mauricio, Gryllias (2019) Vibration-based condition monitoring of wind turbine gearboxes based on cyclostationary analysis. Journal of Engineering for Gas Turbines and Power 141(3): 031026.

Qiao (2015) A survey on wind turbine condition monitoring and fault diagnosis—Part I: Components and subsystems. IEEE Transactions on Industrial Electronics 62(10): 6536–6545.

Shabana (1997) Flexible multibody dynamics: Review of past and recent developments. Multibody System Dynamics 1: 339–348.

Sheng (2012) Wind turbine gearbox condition monitoring round robin study-vibration analysis. Technical report, National Renewable Energy Lab. (NREL), Golden, CO.

Sheng (2017) Wind turbine gearbox reliability database, operation and maintenance research update. Technical report, National Renewable Energy Lab. (NREL), Golden, CO.

Shridhar, Laumann, Liwicki (2018) Uncertainty estimations by softplus normalization in Bayesian convolutional neural networks with variational inference. arXiv preprint arXiv:1806.05978.

Shridhar, Laumann, Liwicki (2019) A comprehensive guide to Bayesian convolutional neural network with variational inference. arXiv preprint arXiv:1901.02731.

Simeon (2013) Computational Flexible Multibody Dynamics. A Differential-Algebraic Approach. Berlin/Heidelberg: Springer.

Verbruggen (2003) Wind turbine operation & maintenance based on condition monitoring WT-Ω. Final Report, April.

Wind Europe (2022) Wind Energy in Europe: 2021 Statistics and the Outlook for 2022–2026. Brussels: Wind Europe.

Yang, Sheng, Court (2012) Operational-condition-independent criteria dedicated to monitoring wind turbine generators. International Journal of Prognostics and Health Management 3. https://doi.org/10.36001/ijphm.2013.v4i3.2133

Zhang, Wang, et al. (2016) Bearing fault diagnosis based on variational mode decomposition and total variation denoising. Measurement Science and Technology 27(7): 075101.

Zhang, Sabuncu (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. In: Bengio, Wallach, Larochelle, et al. (eds) Advances in Neural Information Processing Systems. Montreal, Canada: Curran Associates, pp.8778–8788.

Zhou, Han, Droguett (2022) Towards trustworthy machine fault diagnosis: A probabilistic Bayesian deep learning framework. Reliability Engineering & System Safety 224: 108525.