Abstract
This study conducts a reliability analysis of a wind turbine system, focusing on the shaft, gearbox, and generator. The goal is to evaluate their times between failures and to determine the most suitable statistical models to describe their failure behaviour. Four distributions (3-parameter Weibull, 2-parameter exponential, normal, and 3-parameter lognormal) were assessed using maximum likelihood and least squares estimation, with goodness of fit evaluated through Anderson-Darling coefficients (ADC) and Pearson correlation coefficients (PCC). The results show that the gearbox is best modelled by the 3-parameter lognormal distribution, while the generator fits the 3-parameter Weibull distribution. For the shaft, ADC and PCC yielded contradictory outcomes, which were resolved with the Akaike and Bayesian information criteria implemented in Python, confirming the normal distribution as optimal. Clear reliability differences emerged: the gearbox exhibited infant mortality and the highest failure rates, the shaft showed wear-out behaviour, and the generator demonstrated reliable and stable behaviour. These findings highlight the importance of component-specific modelling to optimize maintenance, reliability and system lifespan.
Keywords
Introduction
General interest in renewable energy has grown dramatically as concern about climate change has intensified, and work on many renewable alternatives to coal and oil is underway. In this context, wind energy has become a significant source of electric power, and the bankability of clean energy generation depends on confidence in the reliability, availability and robustness of wind turbines. Reliability assessment of wind turbines is a popular topic in the wind energy community, and one of its key elements is the reliability analysis of the main subsystems and components, such as blades, shafts, gearboxes, generators, bearings, and yaw drives; examining the reliability of these parts helps ensure that each of them operates stably throughout its lifespan. However, large-scale experimental tests are expensive and time-consuming. Therefore, researchers use various statistical methods to simulate experimental and life data for parameter estimation and comparison. Different statistical distributions are widely used in reliability engineering, in wind energy systems and in other fields, and the most important and popular ones are the Weibull, exponential, normal, and lognormal distributions. Each of these distributions has its own particular applications when it comes to modelling the lifetime data of a system or analysing the reliability of its components.
Studies of wind turbine reliability emerged alongside the development of wind power as a renewable energy source, and the topic has since been the subject of much research aimed at enhancing the reliability of wind turbine components and reducing the operation and maintenance costs that burden renewable energy generation through wind turbines. Tavner et al. (2007) analyse the historic reliability of modern wind turbines using WindStats survey data, applying reliability analysis methods to assess turbine components and factoring in variables such as design, configuration, weather, and maintenance, while Li et al. (2009) examine the Nordex N43/600 wind turbine power system, providing an overview of its design and functionality and stating that it is characterized by strong adaptability, high reliability, easy maintenance, and security. Lin et al. (2016) examine the increasing severity of wind turbine component failures, identifying key causes including a lack of core technologies, price-driven quality issues, design-climate mismatches, and insufficient quality certification. Chuong et al. (2019) discuss the importance of wind turbine generators in future power systems and their impact on increasing the reliability of distribution networks. Peng et al. (2024) introduce a hybrid fault detection model that combines interpretability and statistical rigour for multi-fault diagnosis of wind turbine systems. Lastly, Wang et al. (2025) propose a multi-reliability-index evaluation and maintenance period optimization method for wind turbines that accounts for component failure correlations.
Many studies have focused on the reliability of the shaft, gearbox, and generator as the main components of the wind turbine. Su and Hu (2018) use data mining techniques to analyse the reliability of wind turbines in a domestic wind farm, indicating that the main shaft, gearbox, and generator experience lower failure rates but longer maintenance times, and Zhu and Li (2018) analyse the failure modes, causes, and detection methods of key wind turbine components (e.g. shaft, gearbox, and generator), discussing the importance of condition monitoring systems and supervisory control for enhancing reliability and reducing operation and maintenance costs. Furthermore, Rafsanjani and Sørensen (2015) address fatigue failure in wind turbine components (shafts, bearings, gearboxes, and generators), proposing a generic stochastic model based on fatigue test data, the SN-curve approach, and Miner’s rule, with reliability analysed using the maximum likelihood estimation (MLE) method. Sa’ad et al. (2022) focus on minimizing downtime by performing maintenance on critical turbine components (bearing, main shaft, gearbox, and generator) when their reliability falls below a set margin, and present a preventive maintenance strategy for wind turbines that integrates wind power generation forecasts from artificial neural networks. Maatallah and Ouni (2022) propose a nonlinear vibration-based monitoring technique using kernel principal component analysis to detect high-speed shaft bearing faults in wind turbines, and Bußkamp et al. (2025) propose a multi-objective optimization method to reduce the risk of gearbox and high-speed shaft damage in wind turbines during grid faults, focusing on gearbox component damage that can be caused by electromagnetic generator torque excitations. Other researchers concentrate on the reliability of the gearbox as a critical part of the wind turbine system. Smolders et al. (2010) develop a reliability analysis model for wind turbine gearboxes using a modular structure and reliability block diagrams, emphasizing advanced prediction models based on failure modes and load capacity; Pérez et al. (2013) examine wind turbine configurations and their reliability, highlighting that blades and gearboxes cause the most downtime and emphasizing the importance of condition monitoring for enhancing reliability as turbine sizes increase; and Salem et al. (2017) present a condition monitoring technique for wind turbine gearboxes that tracks shaft torque and vibration signals and applies order analysis to compare signals from normal and abnormal operating conditions, in order to ensure gearbox reliability. In addition, Amin et al. (2023) explore vibration-based fault diagnostics for wind turbine gearboxes using deep learning with a Bayesian convolutional neural network, offering both high accuracy and uncertainty quantification and enhancing reliability under unseen fault conditions, and Liu et al. (2025) introduce a dynamic reliability framework for wind turbine gearboxes that accounts for the coupling between random shock loads, progressive degradation, and component failure correlations.
Some studies have used the 3-parameter Weibull distribution to estimate the reliability of wind turbines. Guo et al. (2009) address the reliability analysis of wind turbines, emphasizing the use of a 3-parameter Weibull function for improved accuracy, with parameters estimated using the MLE and least squares estimation (LSE) methods, while Han et al. (2023) evaluate wind turbine subassembly reliability using the 3-parameter Weibull model, with an improved artificial bee colony algorithm to estimate the parameters by MLE. Peng et al. (2023) discuss the rapid growth of wind power and the challenges of wind turbine reliability, highlighting the importance of early failure detection and analysing the main components, failure mechanisms, and associated costs of wind power projects. Safaeinejad et al. (2024) evaluate the dynamic performance of permanent magnet synchronous generator-based wind turbines, focusing on blade in-plane fatigue loads, and propose an active mitigation method that reduces blade oscillations and improves drivetrain reliability. The latest contributions in the field include El Kihel et al. (2024), who evaluate the wind power potential of Morocco’s Oriental region using Weibull-based algorithms, wind rose analysis and capacity factor assessment to identify optimal locations, and Zhang et al. (2025), who propose an improved method for calculating Weibull distribution parameters to enhance the reliability evaluation of wind turbine rotor systems. The field has also been the subject of many reviews. Igba et al. (2015) review the evolution of wind turbine gearbox performance research, focusing on reliability, availability and maintainability (RAM) analysis through in-service data and highlighting the limited literature on gearbox maintainability as a gap to be addressed. Pfaffel et al. (2017) review various initiatives that collect and analyse wind turbine performance and reliability data, revealing significant variation in onshore wind turbine metrics and showing improving performance and reliability trends for offshore turbines over time. Jiang et al. (2017) provide a comprehensive review of structural reliability analysis of wind turbines from the 1990s to 2017, covering first- and second-order reliability methods and simulation techniques, and examining structural reliability studies of various wind turbine components, including the shaft, gearbox, and generator. Some studies address the scarcity and inconsistency of wind turbine reliability data: Artigao et al. (2018) review wind turbine reliability studies, highlighting the lack of public reliability data and developing a common taxonomy for wind turbine assemblies, with normalized failure rates and downtimes that allow results to be compared across studies, and Leahy et al. (2019) address the challenges posed by inconsistent and unstandardized reliability data, often guarded by original equipment manufacturers, emphasizing the need for unified standards to improve reliability analysis. Others, such as Pandit et al. (2022), review recent progress in data-driven approaches using SCADA data to enhance wind turbine O&M, covering condition monitoring, fault detection, and decision support; Pulikollu et al. (2023) review the evolution of generator technology over 20 years and assess its reliability, examining the integration of wind turbine systems such as blades, pitch, main bearing, gearbox, and generator into a composite system and highlighting their interdependence for efficient operation; and Gbashi et al. (2024) review advancements in rolling element bearings for wind turbine main shafts, focusing on spherical and tapered roller bearings, which are essential for turbine rotor performance, and pointing to industrial and academic contributions in the field. Lastly, Meng et al. (2025) present a systematic review of reliability research on wind turbine gearboxes, with particular focus on offshore applications where environmental factors intensify operational challenges.
Proper selection of statistical distributions is a crucial step in reliability analysis because it forms the basis for parameter estimation: the reliability results for a given problem follow from the parameters estimated for the chosen distributions. The best-known distributions are the Weibull, exponential, normal, and lognormal, which are used in many studies related to reliability assessment. Barberá et al. (2012) present a RAM analysis of a copper smelting process, estimating the parameters of Weibull, exponential, normal, and lognormal distributions to identify critical subsystems. Tsarouhas (2018) conducts a RAM analysis of a wine packaging line using failure data and the same probability distributions, applying the MLE method with goodness of fit assessed by the Anderson-Darling test. Talkit et al. (2023) use failure data for the RAM assessment of a ghee production line, applying the same distributions to evaluate the system’s reliability and maintainability at different mission times. Kishorilal and Mukhopadhyay (2018) assess the reliability of diesel engine subsystems in mining dump trucks by analysing time between failures (TBF) data with the same four distributions, using MLE, LSE, and Bayesian methods to estimate reliability, and Guraja et al. (2020) evaluate the reliability of Microsoft SQL Server by analysing reported failure data and modelling it with the same distributions. Ghasemi et al. (2023) review the literature on estimating and modelling the failure rates of power system equipment, focusing on degradation, prognostics, and remaining useful life estimation with exponential, Weibull, normal, and other models. Recent works include Abid et al. (2024), who address the challenges of integrating renewable energy sources into power systems by proposing an Enhanced Kepler Optimization Algorithm for the stochastic optimal power flow problem, modelling wind and solar uncertainties with Weibull and lognormal distributions, and Zhu et al. (2025), who propose a probabilistic method to improve fatigue damage estimation in offshore wind turbines, adopting Weibull, normal, lognormal, and Gamma distributions to model the variables and addressing the limitations of traditional deterministic models that cannot fully capture real-world uncertainties. In other applications, Sembiring et al. (2017) perform a reliability analysis to identify the failure rates and availability of a machine in a sugar production company, relying on lognormal, exponential and Weibull distributions to model each component’s damage data, and Cu and Vintr (2019) discuss the analysis of accelerated reliability testing data, selecting the appropriate distribution among the same three and applying Arrhenius and inverse power law (IPL) acceleration models to extrapolate reliability metrics. Meanwhile, Taketomi et al. (2022) provide a comprehensive review of the parametric distributions used in survival and reliability analyses, including the exponential, Weibull, lognormal, and many others. Zuo et al. (2021) propose a method for calculating reliability indices of repairable systems with imperfect repairs, using generalized renewal processes (GRP) and adopting an exponential function to model the time-varying scale factor of the wear-out phase. Mozafari et al. (2023) investigate uncertainty in wind turbine fatigue damage estimation caused by limited simulation data, highlighting the lognormal distribution of damage in wind bins with fewer seeds. Finally, Akash et al. (2025) address the high operation and maintenance costs of wind turbines, particularly offshore, by introducing a data-driven prognostic method that forecasts the remaining useful life of gearboxes with an exponential degradation model.
Some researchers have based their work on the Weibull distribution alone: Kundu et al. (2017) analyse left-truncated and right-censored competing risks data under the latent failure times model, assuming Weibull-distributed lifetimes for the competing failure causes and discussing the MLE method; Osuchukwu et al. (2024) review the use of Weibull analysis for the reliability assessment of ceramics and related materials; Gaidai et al. (2024) present novel multivariate Gaidai and 2D modified Weibull reliability methods for assessing wind turbine components, such as the gearbox and drivetrain, under high environmental stresses; Douiri (2024) evaluates particle swarm optimization for estimating Weibull distribution parameters at Moroccan wind farms; and Jiang et al. (2025) propose a reliability prediction method for offshore wind turbines that uses a time-dimensional data migration model to identify potential failure characteristics and Weibull distributions to model component lifetimes. Others complement the Weibull distribution with additional distributions: Chu et al. (2010) extract time-varying failure rates (TFR) for power distribution equipment, considering failure modes and regional effects and classifying failures as random or ageing, using exponential and Weibull distributions; Hu et al. (2017) present a kernel density estimation (KDE) method for estimating the probability distribution of wind speed and compare it with parametric models based on Weibull and normal distributions; Tizgui et al. (2018) model wind speed distributions to optimize wind energy assessment and project feasibility, comparing four candidate distributions (Weibull, Rayleigh, Gamma, and lognormal) with goodness-of-fit tests; Wang et al. (2019) propose a framework for the load-sharing model that incorporates work history, using a general likelihood function to define the MLE and demonstrating its use with Weibull and lognormal distributions; and finally, Shui et al. (2025) develop a probabilistic framework for modelling extreme offshore environmental conditions for wind turbines from wind, wave height and wave period data, employing Weibull and lognormal distributions for parameter estimation.
The purpose of this paper is to assess the reliability of a wind turbine electricity generation system composed of the shaft, gearbox, and generator as main components installed in series, using a dataset containing the time between failures (TBF) of each part. To analyse the reliability of the whole wind turbine system, a detailed investigation of the reliability characteristics of each component is conducted. This includes identifying the best statistical distribution to model each part’s dataset, examining the probability density, survival and hazard functions, evaluating failure and survival probabilities, and comparing the three components in terms of their survival and failure features. Afterwards, reliability calculations are carried out for each item and for the whole wind turbine system, and suggestions are given to improve and maintain the reliability of each element and of the wind turbine system in general. Throughout this evaluation, particular attention is paid to selecting the best-fit statistical distribution among widely known ones (3-parameter Weibull, 2-parameter exponential, normal, and 3-parameter lognormal) to represent each part’s TBF data, based on the Anderson-Darling and Pearson correlation tests with the maximum likelihood estimation (MLE) and least squares estimation (LSE) methods. For the shaft, a contradiction appeared between the Anderson-Darling coefficients (ADCs) obtained with MLE and the Pearson correlation coefficients (PCCs) obtained with LSE regarding the best-fit distribution among the 3-parameter Weibull, normal, and 3-parameter lognormal. This problem was solved by developing an algorithm that computes the Akaike and Bayesian information criteria (AIC and BIC) to determine the best distribution for modelling the shaft’s TBF data.
This study contributes a novel, integrative framework for wind turbine reliability analysis by combining detailed failure mode characterization with rigorous statistical distribution modelling. Unlike conventional system-level approaches, it provides specific insights into each wind turbine component (shaft, gearbox, and generator), each of which requires a tailored maintenance strategy. Methodologically, the work advances reliability modelling by employing a multi-criteria evaluation (MLE, LSE, ADC, PCC, probability plots and the information-based metrics AIC/BIC) to find the best distribution fits, leading to a robust selection of the normal, 3-parameter lognormal, and 3-parameter Weibull models for the shaft, gearbox, and generator, respectively. Another key novelty of the work is the resolution of distributional contradictions for the shaft. While ADC and PCC gave conflicting indications, AIC and BIC provided a decisive outcome, identifying the normal distribution as the best fit. This methodological step demonstrates a systematic way to reconcile statistical conflicts, an aspect often overlooked in reliability studies that typically report only one test outcome. By explicitly linking statistical evidence with physical degradation mechanisms and developing a Python-based tool to compute AIC/BIC where these metrics were not otherwise available, the study ensures both rigour and reproducibility. These contributions offer both theoretical and practical value, advancing the state of the art in wind turbine reliability analysis. The paper is organized as follows: an introduction and the methodology of the study; a theoretical section covering the industrial and mathematical methods, including statistical distributions and goodness-of-fit approaches; a detailed case study section with identification of the components and reliability analysis of the wind turbine system; a section discussing the results and giving recommendations to improve wind turbine reliability; a section on the limitations and challenges of the study; and a final conclusion summarizing the main findings of the paper.
Methodology
At the start of the study, collecting and organizing the time between failures dataset for each component is crucial for designing the reliability evaluation. In addition, the wind turbine’s systems (shaft, gearbox, and generator) must be identified to understand the equipment’s working mechanism. Subsequently, a thorough analysis of each component’s reliability features is carried out: determining the most appropriate statistical distribution to represent the data of each part, analysing the probability density, survival and hazard functions, assessing failure and survival probabilities, and then comparing the three parts in terms of their survival and failure characteristics. Once the reliability computations for each component and for the entire wind turbine system are completed, recommendations are made to preserve and enhance the reliability of each item and of the system as a whole. Particular attention is given to selecting the most suitable statistical distribution (3-parameter Weibull, 2-parameter exponential, normal, or 3-parameter lognormal) to fit the TBF data of each part. A discrepancy was found between the Anderson-Darling and Pearson correlation coefficients, obtained with the MLE and LSE approaches respectively, regarding the best-fit distribution for the shaft. To overcome this issue, an algorithm was created to compute the Akaike and Bayesian information criteria (AIC and BIC) and identify the optimal distribution for representing the shaft’s TBF data. This work introduces a novel framework for wind turbine reliability analysis that integrates detailed component-level failure characterization with rigorous statistical distribution modelling. By applying a multi-criteria evaluation (MLE, LSE, ADC, PCC, AIC, BIC, and graphical representations), it resolves distributional contradictions, most notably for the shaft, ensuring robust model selection. The work further enhances reproducibility through a Python-based tool for AIC and BIC computation, advancing both the methodological rigour and the practical applicability of wind turbine reliability analysis.
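Because the shaft, gearbox, and generator are installed in series, the turbine keeps running only while all three components survive. Assuming independent component failures (an assumption of this sketch, not a statement about the study's data), the system reliability at a mission time t is the product of the component reliabilities. The minimal Python sketch below illustrates this; the helper `series_reliability` and the numerical values are hypothetical placeholders, not results of this study.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a series system: it survives only if every component
    survives, so (assuming independent failures) R_sys = R_1 * R_2 * ... * R_k."""
    return math.prod(component_reliabilities)


# Hypothetical reliabilities of the shaft, gearbox and generator at some mission
# time t (placeholder values, not results of this study).
r_shaft, r_gearbox, r_generator = 0.95, 0.80, 0.98
r_system = series_reliability([r_shaft, r_gearbox, r_generator])
print(f"R_system = {r_system:.4f}")  # 0.95 * 0.80 * 0.98 = 0.7448
```

Since every factor is at most 1, the system reliability can never exceed that of its weakest component, which is why the least reliable part dominates a series system.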
Theoretical and mathematical methods used in the study
Three-parameter Weibull distribution
The 3-parameter Weibull distribution is one of the most widely used probability distributions for modelling time-to-failure data and analysing failure patterns in reliability engineering and life testing, owing to its flexibility in representing different types of failure rate. It extends the basic 2-parameter Weibull distribution (Weibull, 1951) by introducing a threshold (location) parameter γ. Its probability density function (PDF) is given in equation (1):
$$ f(t) = \frac{\beta}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t-\gamma}{\eta}\right)^{\beta}\right], \quad t > \gamma \qquad (1) $$
Shape parameter (β): Describes the shape of the failure rate. β < 1: Decreasing failure rate (infant mortality). β = 1: Constant failure rate (random failures, similar to the exponential distribution). β > 1: Increasing failure rate (wear-out failures).
Scale parameter (η): Indicates the characteristic life; in the 3-parameter form, 63.2% of items are expected to have failed by t = γ + η.
Threshold parameter (γ): Shifts the distribution along the time axis, representing a minimum time to failure.
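As an illustration of equation (1) and the role of β, the following Python sketch (standard library only) evaluates the PDF, the survival function R(t) and the hazard rate h(t) = f(t)/R(t) of the 3-parameter Weibull. The function names and the parameter values η = 1000 h and γ = 200 h are hypothetical, chosen only to show the decreasing, constant, and increasing hazard obtained for β < 1, β = 1 and β > 1.

```python
import math


def weibull3_pdf(t, beta, eta, gamma):
    """Equation (1); the support is taken as t > gamma, so 0 is returned otherwise."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / eta
    return (beta / eta) * z ** (beta - 1) * math.exp(-z ** beta)


def weibull3_reliability(t, beta, eta, gamma):
    """Survival function R(t) = exp(-((t - gamma) / eta) ** beta); R = 1 up to gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-((t - gamma) / eta) ** beta)


def weibull3_hazard(t, beta, eta, gamma):
    """Hazard h(t) = f(t) / R(t) = (beta / eta) * ((t - gamma) / eta) ** (beta - 1)."""
    if t <= gamma:
        raise ValueError("the hazard is evaluated here only for t > gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


# Hypothetical parameters: eta = 1000 h, gamma = 200 h; compare three shapes.
eta, gamma = 1000.0, 200.0
for beta in (0.8, 1.0, 2.0):
    h_early = weibull3_hazard(300.0, beta, eta, gamma)
    h_late = weibull3_hazard(900.0, beta, eta, gamma)
    print(f"beta={beta}: h(300)={h_early:.2e}, h(900)={h_late:.2e}")

# Characteristic life: R(gamma + eta) = exp(-1), i.e. about 63.2% have failed.
print(round(1 - weibull3_reliability(gamma + eta, 2.0, eta, gamma), 3))  # 0.632
```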
Two-parameter exponential distribution
The exponential distribution is simpler but plays a major role in reliability analysis (Epstein and Sobel, 1953), as it represents constant failure (or repair) rates. The exponential distribution (Balakrishnan and Basu, 1995) is a special case of the Weibull distribution with shape parameter β = 1. The 2-parameter exponential distribution adds a threshold parameter to model the minimum time to failure. It is one of the most widely used distributions for estimating parametric performance and reliability functions, owing to its convenient mathematical form and properties, such as the memoryless property and the simple PDF shown in equation (2):
$$ f(t) = \lambda\exp\left[-\lambda\,(t-\gamma)\right], \quad t \ge \gamma \qquad (2) $$
Rate parameter (λ): Defines the constant rate of failure, with λ > 0.
Threshold parameter (γ): Represents a minimum failure time.
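The sketch below, which assumes hypothetical values λ = 0.002 failures per hour and γ = 50 h, evaluates equation (2) and checks two properties that make the exponential model attractive: its hazard f(t)/R(t) is constant and equal to λ, and it is memoryless, so the probability of surviving a further s hours does not depend on the time already survived. The function names are illustrative only.

```python
import math


def exp2_pdf(t, lam, gamma):
    """Equation (2): f(t) = lam * exp(-lam * (t - gamma)) for t >= gamma, 0 before."""
    return lam * math.exp(-lam * (t - gamma)) if t >= gamma else 0.0


def exp2_reliability(t, lam, gamma):
    """R(t) = exp(-lam * (t - gamma)) for t >= gamma; no failures occur before gamma."""
    return math.exp(-lam * (t - gamma)) if t >= gamma else 1.0


def exp2_mttf(lam, gamma):
    """Mean time to failure of the shifted exponential: gamma + 1 / lam."""
    return gamma + 1.0 / lam


# Hypothetical parameters: 0.002 failures per hour after a 50 h failure-free period.
lam, gamma = 0.002, 50.0
print(exp2_mttf(lam, gamma))  # 550.0
# Constant hazard: f(t) / R(t) equals lam at any t >= gamma.
print(exp2_pdf(400.0, lam, gamma) / exp2_reliability(400.0, lam, gamma))
# Memoryless property: P(T > t + s | T > t) = R(gamma + s), whatever t is.
t, s = 400.0, 100.0
print(exp2_reliability(t + s, lam, gamma) / exp2_reliability(t, lam, gamma))
print(exp2_reliability(gamma + s, lam, gamma))
```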
Normal distribution
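The normal distribution (De Moivre, 1738) is a symmetric, bell-shaped distribution that is widely used in statistics and reliability analysis. It is less often used to model time-to-failure data because it allows negative values, which are not realistic for failure times; however, it is used in tolerance analysis and degradation modelling, and it can describe wear-out failures when the mean is large relative to the standard deviation, so that the probability of negative times is negligible. The normal distribution is defined by two parameters, the mean and the standard deviation (Gauss, 1809), as shown in its probability density function, formula (3):
$$ f(t) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right], \quad -\infty < t < \infty \qquad (3) $$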
Mean (μ): The central value around which the data are symmetrically distributed. It represents the average time to failure or the central tendency of the data.
Standard deviation (σ): Measures the spread of the distribution, i.e. how much the data deviate from the mean.
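The following sketch uses Python's standard `statistics.NormalDist` class to evaluate the normal reliability and hazard functions for a hypothetical TBF model (μ = 1000 h, σ = 250 h, not the study's shaft estimates). It also quantifies the concern about negative values: the probability the model assigns to negative failure times is negligible here because μ lies four standard deviations above zero. The hazard increases with t, which is the wear-out pattern.

```python
from statistics import NormalDist


def normal_reliability(t, mu, sigma):
    """R(t) = 1 - F(t), with F the normal CDF corresponding to formula (3)."""
    return 1.0 - NormalDist(mu, sigma).cdf(t)


def normal_hazard(t, mu, sigma):
    """h(t) = f(t) / R(t); for the normal distribution it increases with t."""
    dist = NormalDist(mu, sigma)
    return dist.pdf(t) / (1.0 - dist.cdf(t))


# Hypothetical TBF model: mean 1000 h, standard deviation 250 h.
mu, sigma = 1000.0, 250.0
# Probability mass the model puts on impossible negative times: Phi(-4).
print(f"P(T < 0) = {NormalDist(mu, sigma).cdf(0.0):.1e}")  # 3.2e-05
for t in (600.0, 1000.0, 1400.0):
    print(t, round(normal_reliability(t, mu, sigma), 4), f"{normal_hazard(t, mu, sigma):.2e}")
```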
Three-parameter lognormal distribution
The lognormal distribution is often used to model situations where observed data result from multiplicative effects. It is derived from the normal distribution: a variable is lognormally distributed if its logarithm is normally distributed. Unlike its parent distribution, it is one-sided and takes only positive values (values above γ in the 3-parameter form). This makes it particularly suitable in reliability analysis for positive-valued quantities with large inherent variation. The 3-parameter lognormal distribution (Aitchison and Brown, 1957; Galton, 1879) extends the 2-parameter lognormal by introducing a threshold parameter γ, as given by its PDF in equation (4):
$$ f(t) = \frac{1}{(t-\gamma)\,\sigma\sqrt{2\pi}}\exp\left[-\frac{\left(\ln(t-\gamma)-\mu\right)^2}{2\sigma^2}\right], \quad t > \gamma \qquad (4) $$
Location parameter (μ): The logarithmic mean or location of the data.
Scale parameter (σ): Describes the spread of the distribution.
Threshold parameter (γ): Shifts the distribution along the time axis, indicating a minimum possible failure time.
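A minimal standard-library sketch of equation (4) follows, together with the corresponding CDF, F(t) = Φ((ln(t − γ) − μ)/σ), which holds because ln(T − γ) is normally distributed with mean μ and standard deviation σ. The function names and parameter values are hypothetical. In SciPy, the same distribution corresponds to `scipy.stats.lognorm(s=sigma, loc=gamma, scale=math.exp(mu))`; the sketch avoids that dependency.

```python
import math
from statistics import NormalDist


def lognorm3_pdf(t, mu, sigma, gamma):
    """Equation (4): PDF of the 3-parameter lognormal, zero for t <= gamma."""
    if t <= gamma:
        return 0.0
    x = t - gamma
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def lognorm3_cdf(t, mu, sigma, gamma):
    """F(t) = Phi((ln(t - gamma) - mu) / sigma), since ln(T - gamma) ~ Normal(mu, sigma)."""
    if t <= gamma:
        return 0.0
    return NormalDist().cdf((math.log(t - gamma) - mu) / sigma)


# Hypothetical parameters (log-hours): mu = 6.5, sigma = 0.9, threshold gamma = 40 h.
mu, sigma, gamma = 6.5, 0.9, 40.0
median = gamma + math.exp(mu)  # half of the units are expected to fail before this time
print(round(median, 1), lognorm3_cdf(median, mu, sigma, gamma))  # F(median) is 0.5 up to rounding
print(1.0 - lognorm3_cdf(500.0, mu, sigma, gamma))  # reliability at t = 500 h
```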
Anderson-Darling test
The Anderson-Darling coefficient ADC (Anderson and Darling, 1952) is a statistic used to assess the goodness-of-fit for a dataset to a specific theoretical distribution. In reliability analysis, it is often used to test whether failure times follow a particular probability distribution, such as the exponential, Weibull, normal or lognormal distribution. The Anderson-Darling test is an improvement on the Kolmogorov-Smirnov test because it gives more weight to deviations in the tails of the distribution. This is important in reliability because early failures (in the lower tail of the distribution) or late failures (in the upper tail) can be critical in assessing system performance. The ADC measures the distance between the empirical cumulative distribution of the observed data and the cumulative distribution function of the theoretical distribution to which the data are compared.
The Anderson-Darling test is based on two hypotheses: the null hypothesis (H0), which states that the data follow the chosen distribution, and the alternative hypothesis (H1), which states that they do not. The A² test statistic is given by formula (5):
$$ A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(X_i) + \ln\left(1 - F(X_{n+1-i})\right)\right] \qquad (5) $$
where n is the number of observations and $F(X_i)$ is the cumulative distribution function of the theoretical distribution evaluated at $X_i$, the i-th observation of the data sorted in ascending order.
A low value of A² means that the empirical distribution of the data is close to the theoretical distribution, so there is no evidence to reject the null hypothesis (H0). A high value of A² indicates that the data do not follow the theoretical distribution well, and the null hypothesis is rejected. In practice, the statistic is compared with critical values that depend on the chosen significance level (e.g. 5%): if the calculated statistic exceeds the critical value, the hypothesis that the data follow the theoretical distribution is rejected. When several candidate distributions are fitted to the same data, the one with the lowest A² is regarded as the best fit.
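Formula (5) translates directly into code. The sketch below is a hypothetical helper, `anderson_darling`, that takes the TBF data and an already fitted CDF; it does not compute p-values or critical values, which depend on the hypothesized distribution and on whether its parameters were estimated from the same data. The example sample is synthetic, placed on the quantiles of an exponential distribution with rate 0.01 per hour, so the correct model should yield the smaller A².

```python
import math


def anderson_darling(data, cdf):
    """A^2 of formula (5) for a fitted CDF; requires 0 < cdf(x) < 1 for every x."""
    x = sorted(data)
    n = len(x)
    F = [cdf(v) for v in x]
    total = 0.0
    for i in range(1, n + 1):
        # X_i is x[i - 1] and X_{n+1-i} is x[n - i] with 0-based indexing.
        total += (2 * i - 1) * (math.log(F[i - 1]) + math.log(1.0 - F[n - i]))
    return -n - total / n


# Synthetic TBF sample placed exactly on exponential quantiles (rate 0.01 per hour).
n = 20
tbf = [-math.log(1 - (i - 0.5) / n) / 0.01 for i in range(1, n + 1)]
a2_good = anderson_darling(tbf, lambda t: 1 - math.exp(-0.01 * t))
a2_poor = anderson_darling(tbf, lambda t: 1 - math.exp(-0.05 * t))
print(f"A^2 (rate 0.01) = {a2_good:.3f}, A^2 (rate 0.05) = {a2_poor:.3f}")
```

In practice the fitted CDF must satisfy 0 < F(x) < 1 at every observation; for instance, a 2-parameter exponential whose threshold equals the smallest observation gives F = 0 there, and the logarithm in formula (5) is then undefined.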
The Pearson correlation test
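The Pearson correlation coefficient PCC (Pearson, 1895) is a statistical measure of the strength and direction of the linear relationship between two continuous variables. In reliability analysis, the PCC can be used to examine the relationship between variables associated with the performance or failures of a system; in least squares distribution fitting, it measures how closely the points of a probability plot follow a straight line, so a value closer to 1 indicates a better fit. The PCC, often denoted r, is defined by formula (6):
$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (6) $$
where $\bar{x}$ and $\bar{y}$ are the sample means of X and Y.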
The PCC is interpreted as follows. If r = 1, there is a perfect positive linear relationship: as X increases, Y increases linearly. If r = −1, there is a perfect negative linear relationship: as X increases, Y decreases linearly. If r = 0, there is no linear relationship between X and Y. Values of r between −1 and 1 indicate the strength of the relationship: 0 < r < 0.3 indicates a weak positive relationship, 0.3 < r < 0.7 a moderate positive relationship, and r > 0.7 a strong positive relationship. The same holds for negative relationships with negative values of r.
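The sketch below implements formula (6) and applies it in the way PCC is typically used for distribution fitting: the sorted TBF values are correlated with the quantiles expected under a candidate distribution, here the normal, so that a straighter probability plot gives r closer to 1. Bernard's median-rank approximation (i − 0.3)/(n + 0.4) for the plotting positions, the helper names, and the data values are assumptions of this example; other plotting-position formulas exist.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Formula (6): sample Pearson correlation coefficient of paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_plot_pcc(tbf):
    """PCC of a normal probability plot: sorted TBF against standard-normal
    quantiles of Bernard's median ranks (i - 0.3) / (n + 0.4)."""
    t = sorted(tbf)
    n = len(t)
    z = [NormalDist().inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]
    return pearson_r(t, z)


tbf_hours = [812, 905, 960, 1010, 1043, 1100, 1150, 1230]  # hypothetical data
print(f"normal-plot PCC = {normal_plot_pcc(tbf_hours):.4f}")
```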
Maximum likelihood estimation MLE
The maximum likelihood estimation MLE (Fisher, 1922) is a statistical method used to estimate the parameters of a probability model from observed data. In reliability analysis, MLE is commonly used to estimate the parameters of lifetime distributions (such as the Weibull, exponential and lognormal), which model the failure times of systems or components. The principle is to find the parameter values that make the observed data most probable under the assumed model. Suppose that the failure times $(t_1, t_2, \ldots, t_n)$ follow a parametric distribution described by unknown parameters θ to be estimated. The likelihood function L(θ) represents the joint probability (density) of observing the data as a function of θ; for independent observations, it is the product of the probability densities evaluated at each observed value, as indicated in equation (7):
$$ L(\theta) = \prod_{i=1}^{n} f(t_i;\theta) \qquad (7) $$
where $f(t_i;\theta)$ is the density function of the distribution evaluated at failure time $t_i$.
The idea behind MLE is to maximize the likelihood function L(θ) with respect to θ; in other words, to find the parameter values under which the observed data are most likely to have been generated by the assumed distribution. In practice, the log-likelihood ln L(θ) = Σ ln f(t_i; θ) is maximized instead, since it has the same maximizer and is numerically better behaved. MLE is an essential tool in reliability analysis for estimating the parameters of failure time distributions. It is robust and flexible and can be applied to various types of data and models, but in some cases it requires numerical techniques to maximize the likelihood function.
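For some models the maximization of equation (7) has a closed form. For the 2-parameter exponential, ln L(λ, γ) = n ln λ − λ Σ(t_i − γ) for γ ≤ min t_i; it increases with γ, so γ̂ = min t_i, and setting the derivative with respect to λ to zero gives λ̂ = n / Σ(t_i − γ̂). The sketch below codes this case with hypothetical data and helper names. The 3-parameter Weibull and lognormal have no such closed form and require numerical optimization (for example SciPy's `scipy.stats.weibull_min.fit`), which is not shown here.

```python
import math


def exp2_loglik(tbf, lam, gamma):
    """Log of equation (7) for the 2-parameter exponential:
    ln L = n ln(lam) - lam * sum(t_i - gamma), valid only if gamma <= min(t_i)."""
    if lam <= 0 or gamma > min(tbf):
        return float("-inf")  # outside the support the likelihood is zero
    return len(tbf) * math.log(lam) - lam * sum(t - gamma for t in tbf)


def exp2_mle(tbf):
    """Closed-form MLE: ln L increases with gamma, so gamma_hat = min(t_i);
    d(ln L)/d(lam) = 0 then gives lam_hat = n / sum(t_i - gamma_hat)."""
    gamma_hat = min(tbf)
    lam_hat = len(tbf) / sum(t - gamma_hat for t in tbf)
    return lam_hat, gamma_hat


tbf_hours = [120.0, 340.0, 95.0, 610.0, 250.0, 180.0, 430.0]  # hypothetical data
lam_hat, gamma_hat = exp2_mle(tbf_hours)
print(f"lambda_hat = {lam_hat:.5f} per hour, gamma_hat = {gamma_hat:.0f} h")
print("max log-likelihood:", exp2_loglik(tbf_hours, lam_hat, gamma_hat))
```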
Least squares estimation method LSE
The least squares method LSE (Gauss, 1809; Legendre, 1805) is a statistical technique used to fit a model to a set of data by minimizing the sum of squares of the deviations between the observed values and the values predicted by the model. In reliability analyses, this method is mainly used to fit linear models, as in the case of fitting the linearized Weibull distribution to estimate reliability parameters. The principle is to find the model parameters that minimize the error between the observed values and the model's predicted values. This error is measured by the sum of squared deviations between the observed values $y_i$ and the predicted values $\hat{y}_i$:

$$S = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \qquad (8)$$

where $y_i$ is the observed value (e.g. failure time, TBF) and $\hat{y}_i$ is the value predicted by the model.
The least squares method is often used in the context of linear regression. In reliability, it allows one to fit a linear model between two variables, such as failure times and environmental factors, or to fit a linearized model, as in the case of the Weibull distribution. In a linear regression, the objective is to fit a model of the form of equation (9).

$$y = a\,x + b \qquad (9)$$

where a and b are the parameters to be estimated, x is the independent variable and y is the dependent variable.
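The following sketch makes the linearized Weibull case concrete by applying a straight-line fit y = a·x + b to a 2-parameter Weibull model. Taking logarithms twice of F(t) = 1 − exp(−(t/η)^β) gives ln(−ln(1 − F)) = β ln t − β ln η. This is a straight line with x = ln t, slope a = β and intercept b = −β ln η. The empirical F at each ordered failure time is estimated here with Bernard's median-rank approximation (i − 0.3)/(n + 0.4). That plotting position, and regressing y on x rather than x on y, are assumptions, since reliability packages differ on both. The function names are hypothetical.

```python
import math

def linear_lse(x, y):
    """Closed-form least-squares estimates of a and b in y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def weibull_lse(times):
    """Fit a 2-parameter Weibull (shape beta, scale eta) by least squares on
    the linearized CDF ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta).

    F at the i-th ordered failure is estimated with Bernard's median-rank
    approximation (i - 0.3) / (n + 0.4), an assumed plotting position.
    """
    t = sorted(times)
    n = len(t)
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    a, b = linear_lse(x, y)
    beta = a                    # slope of the line
    eta = math.exp(-b / beta)   # from the intercept b = -beta * ln(eta)
    return beta, eta

# Illustrative check: times placed exactly on a Weibull(beta=2, eta=1000)
# line are recovered (hypothetical data, not from the study).
n = 10
ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
demo = [1000 * (-math.log(1 - f)) ** (1 / 2.0) for f in ranks]
print(weibull_lse(demo))
```

Applied to the TBF data of one component, the same x and y values would also give the PCC of its Weibull probability plot.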
Least squares is a useful technique for estimating the parameters of linear or linearized models in reliability analyses. Although it is simple to use and effective in many cases, it has limitations when dealing with censored data or modelling nonlinear relationships.
Akaike and Bayesian information criteria AIC and BIC
The Akaike information criterion AIC and the Bayesian information criterion BIC are both statistical metrics used to compare and evaluate different models (or distributions) fitted to a dataset. They are particularly useful in model selection, as they balance model fit against model complexity and thus guide the choice of the best model.
The purpose of the Akaike information criterion AIC (Akaike, 1974) is to measure the relative quality of a statistical model for a given dataset, based on equation (10). It evaluates the trade-off between the goodness-of-fit of the model and its complexity. The model with the lowest AIC is generally preferred, because lower AIC values indicate a better trade-off between fit and complexity. AIC is typically used when several competing models are fitted to the same dataset and the aim is to find the best-fitting model without overfitting.
The Bayesian information criterion BIC (Schwarz, 1978) is also used to select among models by weighing goodness-of-fit against model complexity, following equation (11). However, it imposes a larger penalty on models with more parameters, which makes it more conservative. The BIC penalty per parameter is ln n rather than the constant 2 used by AIC. BIC therefore penalizes complexity more heavily whenever n ≥ 8 and increasingly favours simpler models as the sample size n grows. The model with the lowest BIC is considered the best, with lower BIC values indicating a better fit with fewer parameters. BIC is often preferred for larger sample sizes or when overfitting must be avoided even more strictly than AIC allows.
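$$\mathrm{AIC} = 2k - 2\ln L \qquad (10)$$

$$\mathrm{BIC} = k\ln n - 2\ln L \qquad (11)$$

where k is the number of estimated parameters in the model, n is the number of data points (sample size) and L is the maximized value of the likelihood function of the model (so ln L is the maximized log-likelihood).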
Both criteria are used for model comparison, but they differ in how they penalize the number of parameters. AIC focuses more on goodness-of-fit and tolerates more complex models by penalizing extra parameters less. BIC penalizes complexity more strongly and favours simpler models, particularly with larger datasets. AIC is therefore more suitable when prediction accuracy matters more than model simplicity. BIC is better when the main concern is overfitting and a more parsimonious model is wanted.
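The study reports computing AIC and BIC in Python, but its script is not reproduced here. The following is therefore only a minimal sketch of the calculation, assuming AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, with ln L the maximized log-likelihood. It compares the normal and 2-parameter exponential models, whose MLEs have closed forms, on a small set of hypothetical TBF values (not the study's data). The 3-parameter Weibull and lognormal models would require a numerical optimizer to obtain ln L and are omitted. All function names are illustrative.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik

def normal_max_loglik(data):
    """Maximized log-likelihood of a normal model (MLE mean, variance with divisor n)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((t - mu) ** 2 for t in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def exp2_max_loglik(data):
    """Maximized log-likelihood of a 2-parameter exponential model.

    f(t) = lam * exp(-lam * (t - gamma)) for t >= gamma, with the MLEs
    gamma = min(t) and lam = 1 / (mean(t) - gamma).
    """
    n = len(data)
    gamma = min(data)
    lam = 1.0 / (sum(data) / n - gamma)
    return n * math.log(lam) - lam * sum(t - gamma for t in data)

# Hypothetical TBF values in hours (illustrative only, NOT the study's data).
tbf = [1850.0, 2210.0, 2390.0, 2520.0, 2680.0, 2940.0, 3120.0]
n = len(tbf)
candidates = {  # name -> (maximized ln L, number of estimated parameters k)
    "normal": (normal_max_loglik(tbf), 2),
    "2-parameter exponential": (exp2_max_loglik(tbf), 2),
}
for name, (ll, k) in candidates.items():
    print(f"{name:24s} lnL = {ll:8.3f}  AIC = {aic(ll, k):8.3f}  BIC = {bic(ll, k, n):8.3f}")
best = min(candidates, key=lambda m: aic(*candidates[m]))
print("lowest AIC:", best)
```

The model with the lowest value of each criterion would be retained. With more candidates, the models would simply be added to the `candidates` dictionary with their own maximized ln L and k.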
The wind turbine case study
Definition of the studied wind turbine components
A wind turbine is a device that converts the kinetic energy of the wind into mechanical energy, which is then transformed into electricity. This transformation is made possible by mechanical and electrical components that work in synergy to capture the movement of the wind and convert it into usable energy. Wind turbines are mainly used in wind farms to produce electricity in a renewable way. Modern wind turbines are composed of several key components, each of which plays an important role in converting wind energy into usable power. As shown in Figure 1, these components typically include the rotor blades, shaft, gearbox, generator, electrical converter, yaw drive, nacelle, tower, and hub. This study focuses on the three critical components responsible for converting wind power into electric power: the shaft, the gearbox and the generator. Each of them plays a crucial role in the efficiency and overall performance of the wind turbine. The technical specifications of the studied wind turbine are listed in Table 1.
Figure 1. Wind turbine main components.
Table 1. The wind turbine technical sheet.
The shaft
The shaft of a wind turbine is the mechanical part that connects the hub of the blades to the gearbox. A wind turbine generally has two types of shaft: the low-speed shaft and the high-speed shaft. The low-speed shaft transmits the rotational energy of the slowly rotating blades to the gearbox. The high-speed shaft receives the high-speed rotation from the gearbox and transfers it to the electric generator. The shaft must be able to withstand significant mechanical loads and wind-induced torsional forces.
The gearbox
The gearbox is a mechanism composed of gears that increases the rotational speed transmitted by the low-speed shaft to the level the generator needs to produce electricity efficiently. It converts the slow rotation of the main shaft (driven by the movement of the blades) into a much faster rotation, since most generators require higher rotational speeds to operate properly. This allows a more efficient conversion of the wind's mechanical energy into electrical energy. The gearbox must be robust and able to withstand fluctuations in wind speed, and it often requires regular maintenance because of the mechanical stresses it undergoes.
The generator
The generator of a wind turbine is the component that transforms the mechanical energy of rotation into electrical energy. It converts the rapid rotation of the high-speed shaft (after the gearbox) into electric current via electromagnetic induction: magnets and coils of wire produce an electrical voltage when the internal components rotate relative to each other. Two main types of generators are used in wind turbines, the induction generator and the synchronous generator. Both must be sized to handle the expected power according to the size of the wind turbine. Some generators also include regulation systems to maintain the output frequency and voltage.
The case study
TBF historical data of the wind turbine components.
Best-fit statistical distribution identification
Goodness-of-fit of MLE and LSE methods for TBF-shaft, gearbox, and generator.
The first thing to notice in Table 3 is that the Anderson-Darling coefficient ADC differs between the MLE and LSE methods for all the statistical distributions of the three components. This is due to the fundamental difference in how the two estimation methods fit the distribution to the data. For each component (shaft, gearbox, and generator), the cells marked with “*” indicate the lowest ADC for the MLE and LSE methods and the highest LSE Pearson correlation coefficient PCC among the candidate distributions.
For the gearbox and the generator, choosing the best statistical distribution is straightforward. For each of these components, the ADC for both the MLE and LSE methods and the PCC for LSE all favour the same distribution: the 3-parameter lognormal for the gearbox and the 3-parameter Weibull for the generator. For the gearbox, the lowest ADC for the MLE (0.624) and LSE (0.561) methods, the highest PCC for LSE (0.995) and the good alignment of the points in the probability plot for TBF-Gearbox in Figure 2 all favour the 3-parameter lognormal distribution for modelling the TBF-Gearbox data. For the generator, the lowest ADC for the MLE (0.777) and LSE (0.699) approaches and the highest PCC for LSE (0.994), together with the probability plot for TBF-Generator in Figure 3, all point to the 3-parameter Weibull distribution for representing the TBF-Generator dataset. The 3-parameter Weibull model effectively assesses the lifetime distribution of key components such as the generator, with the location parameter playing a significant role (Han et al., 2023). For the shaft, however, the lowest ADC for the MLE (0.666) and LSE (0.646) methods points equally to the normal and the 3-parameter lognormal distributions, both of which fit the dataset well. The probability plot for TBF-Shaft in Figure 4 provides visual insight into the goodness-of-fit of these two distributions. On the other hand, the highest PCC for LSE (0.995) identifies the 3-parameter Weibull distribution as the best one to represent the TBF-Shaft data. Apart from these distributions, the 2-parameter exponential is clearly unsuitable, with extremely high ADC values from both the MLE and LSE methods. In short, for the shaft the ADC for the MLE and LSE methods indicates the normal and the 3-parameter lognormal distributions, whereas the PCC for LSE points to the 3-parameter Weibull distribution.
This makes selecting the most appropriate distribution challenging. However, the ADC from both the MLE and LSE methods consistently indicates the same distributions for the shaft (the normal and the 3-parameter lognormal). These distributions should therefore be given more weight, as they likely better represent the full dataset, including the tails, which are important in reliability analysis. The Anderson-Darling test is more sensitive to deviations in the tails of the distribution than other tests (Anderson and Darling, 1952; Stephens, 1974). This makes it particularly useful for reliability analysis, where extreme events (early or late failures) are critical. Finally, to resolve this contradiction and decide which of the 3-parameter Weibull, normal, and 3-parameter lognormal distributions best fits the TBF-Shaft data, the Akaike and Bayesian information criteria AIC and BIC were computed. They were computed for all four distributions in order to show that these metrics also flag the 2-parameter exponential distribution as unsuitable, with AIC and BIC values significantly higher than those of the other distributions.
Figure 2. Probability plot for TBF-gearbox (h).
Figure 3. Probability plot for TBF-generator (h).
Figure 4. Probability plot for TBF-shaft (h).
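For reference, the classical Anderson-Darling statistic for a fully specified CDF F and ordered sample x_(1) ≤ … ≤ x_(n) is A² = −n − (1/n) Σ (2i − 1)[ln F(x_(i)) + ln(1 − F(x_(n+1−i)))]. Misfit where F is close to 0 or 1 drives the logarithms strongly negative, which is what gives the test its tail sensitivity. The Python sketch below implements this formula. Reliability software may report an adjusted variant of the statistic as its ADC. In addition, the usual critical values no longer apply when the parameters are estimated from the same data. The code therefore illustrates the formula rather than reproducing Table 3. The function name and the clipping constant are assumptions.

```python
import math
from statistics import NormalDist

def anderson_darling(data, cdf, eps=1e-12):
    """Classical Anderson-Darling statistic A^2 of `data` against the CDF `cdf`."""
    x = sorted(data)
    n = len(x)
    total = 0.0
    for i in range(1, n + 1):
        # Clip CDF values away from 0 and 1 so the logarithms stay finite.
        lower = min(max(cdf(x[i - 1]), eps), 1 - eps)  # F(x_(i))
        upper = min(max(cdf(x[n - i]), eps), 1 - eps)  # F(x_(n+1-i)) in 1-based terms
        total += (2 * i - 1) * (math.log(lower) + math.log(1 - upper))
    return -n - total / n

# A sample placed exactly at standard-normal quantiles fits N(0, 1) far better
# than it fits a normal shifted by one standard deviation.
sample = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
a2_good = anderson_darling(sample, NormalDist().cdf)
a2_shifted = anderson_darling(sample, NormalDist(mu=1.0).cdf)
print(f"A^2 vs N(0,1): {a2_good:.4f}; A^2 vs N(1,1): {a2_shifted:.4f}")
```

Lower A² means a closer fit, which is why the lowest ADC values are highlighted in Table 3.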


AIC and BIC values of the four distributions for TBF-Shaft.
PDF, survival, and hazard functions analysis
After identifying the appropriate distributions from Tables 2 and 3 and Figures 2, 3 and 4, distribution overview plots are presented for the time between failures of the gearbox, the shaft, and the generator in Figures 5, 6 and 7, respectively. Each plot includes the probability density function, the survival function, and the hazard function.
The distribution overview plot for the time between failures TBF of the wind turbine shaft in Figure 6 uses a normal distribution fitted by maximum likelihood estimation MLE (mean = 2493.04 h, standard deviation = 534.068 h). The probability density function PDF shows the likelihood of different TBF values occurring. It is bell-shaped and peaks at the mean of 2493.04 hours, suggesting that failures are most likely to occur within the 2000 to 3000 h range. Early failures (before 1500 h) are less likely but still possible, while late failures are rarer but may have significant consequences. The survival function represents the probability that the shaft will operate beyond a given number of hours. The survival curve starts at 100% and steadily declines over time, indicating that the probability of surviving without failure decreases as operating hours increase. By approximately 3750 hours, the survival probability is close to 0, meaning that the shaft will almost certainly have failed by this point. This curve is useful for predicting the remaining life of the shaft. For example, at 2493.04 hours (the mean, which equals the median for a normal distribution), the survival probability is 50%, so there is a 50% chance that the shaft will have failed by then. The hazard function shows the instantaneous failure rate, that is, the risk of failure at a given time given that the shaft has survived up to that time. It starts low and increases with time, reflecting wear-out behaviour as the shaft approaches the end of its operational life. After around 2000 hours, the hazard function increases more rapidly, indicating that the shaft becomes more prone to failure with age.
Figure 5. Distribution overview plot for TBF-gearbox (h).
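The shaft statements above can be checked directly from the fitted parameters. The Python sketch below (standard library only) evaluates the survival function S(t) = 1 − F(t) and the hazard h(t) = f(t)/S(t) of the normal model with the reported MLE parameters. The function names are illustrative, and the grid of printed times is arbitrary.

```python
from statistics import NormalDist

# Normal model for TBF-shaft with the reported MLE parameters (hours).
SHAFT = NormalDist(mu=2493.04, sigma=534.068)

def shaft_survival(t):
    """S(t) = P(TBF > t) = 1 - F(t)."""
    return 1.0 - SHAFT.cdf(t)

def shaft_hazard(t):
    """Instantaneous failure rate h(t) = f(t) / S(t), per hour."""
    return SHAFT.pdf(t) / shaft_survival(t)

for t in (1500, 2000, 2493.04, 3000, 3750):
    print(f"t = {t:8.2f} h   S(t) = {shaft_survival(t):.4f}   h(t) = {shaft_hazard(t):.2e} per h")
```

The output gives S(2493.04) = 0.5 and S(3750) below 1%. The hazard rises monotonically, as the hazard of a normal distribution always does.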
The distribution overview plot for the time between failures TBF of the wind turbine gearbox in Figure 5 uses a 3-parameter lognormal distribution fitted by maximum likelihood estimation MLE (location = 5.14494, scale = 1.30666, threshold = 1015.76 h). The PDF curve is strongly skewed to the right, with a sharp peak shortly after the threshold and a long tail extending toward longer times. Failures are therefore most likely within a short time after the gearbox has started operating, with the peak of the density lying between the threshold (1015.76 h) and the median (1187.32 h). The long tail indicates a small chance that the gearbox may last much longer without failure. This type of distribution suggests a relatively high rate of early-life failures, also known as ‘infant mortality’ failures, where most failures occur shortly after installation. Once a gearbox survives this early critical period, it is less likely to fail in the short term. The survival function drops sharply during the first few hundred hours after the threshold, indicating a high probability that the gearbox fails early on. After around 1500 hours, the survival probability decreases more gradually. A gearbox that survives the early period is therefore likely to continue operating, since its failure probability per unit time is then much lower. The hazard function is high early on and drops after about 1100 hours. This is a classic pattern for early-life failures, where the hazard is high initially and decreases over time. The gearbox thus exhibits infant mortality behaviour, with the failure rate highest early in its life and decreasing as time goes on.
Figure 6. Distribution overview plot for TBF-shaft (h).
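The following is a minimal sketch of the fitted gearbox model. It assumes the usual 3-parameter lognormal parameterization, in which ln(t − threshold) is normally distributed with mean equal to the location and standard deviation equal to the scale. Under that assumption, the median is threshold + exp(location) ≈ 1187.32 h and the mode (the PDF peak) is threshold + exp(location − scale²). Strictly, a lognormal hazard is zero at the threshold, rises to a maximum shortly afterwards and then declines. The high early hazard seen in the plot therefore corresponds to a sharp rise and peak soon after the threshold. The function names are illustrative.

```python
import math
from statistics import NormalDist

# 3-parameter lognormal for TBF-gearbox, reported MLE parameters.
LOC, SCALE, THRESHOLD = 5.14494, 1.30666, 1015.76
STD_NORMAL = NormalDist()

def gearbox_survival(t):
    """S(t) = 1 - Phi((ln(t - threshold) - loc) / scale); equal to 1 before the threshold."""
    if t <= THRESHOLD:
        return 1.0
    return 1.0 - STD_NORMAL.cdf((math.log(t - THRESHOLD) - LOC) / SCALE)

def gearbox_pdf(t):
    """Density of the shifted lognormal; zero before the threshold."""
    if t <= THRESHOLD:
        return 0.0
    x = t - THRESHOLD
    return STD_NORMAL.pdf((math.log(x) - LOC) / SCALE) / (x * SCALE)

def gearbox_hazard(t):
    """h(t) = f(t) / S(t), per hour."""
    return gearbox_pdf(t) / gearbox_survival(t)

median = THRESHOLD + math.exp(LOC)              # S(median) = 0.5
mode = THRESHOLD + math.exp(LOC - SCALE ** 2)   # peak of the PDF
hazard_peak = max(range(1016, 2001), key=gearbox_hazard)  # coarse 1-hour grid search
print(f"mode = {mode:.1f} h, median = {median:.2f} h, hazard peak near {hazard_peak} h")
for t in (1100, 1200, 1500, 2000, 4000):
    print(f"t = {t} h   S(t) = {gearbox_survival(t):.4f}   h(t) = {gearbox_hazard(t):.2e} per h")
```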
The distribution overview plot for the time between failures TBF of the wind turbine generator in Figure 7 uses a 3-parameter Weibull distribution fitted by maximum likelihood estimation MLE (shape = 1.44427, scale = 1112.19 h, threshold = 2727.43 h). The probability density function has a clear peak around 3271 hours and is skewed to the right, which is characteristic of a Weibull distribution with a shape parameter greater than 1. Failures are most likely to occur between 2727 and 3271 hours. After this peak, the probability of failure decreases as the time between failures increases. The right skew indicates a small probability that the generator may have a much longer lifetime. The distribution suggests that most generator failures are concentrated in the mid-life period, with relatively few early or very late failures. This pattern is common in systems that undergo wear and tear, leading to increasing failure rates over time. The survival probability declines steadily, reaching about 50% around 3590 hours. By 5000 hours, the survival probability is very low, indicating that the generator is very likely to have failed by this time. The operational life of the generator is therefore centred around the median of about 3590 hours, with a high probability of failure by 5000 hours. The hazard function starts low and increases steadily as time progresses. The increase is slow and gradual, and the hazard remains at relatively low levels, so the generator becomes only slightly more prone to failure after around 3500 hours of operation. Because the rise is so mild, the failure behaviour is closer to a random pattern, in which malfunctions occur largely independently of the component's age or operating duration, than to the pronounced wear-out of the shaft.
Figure 7. Distribution overview plot for TBF-generator (h).
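The same check can be made for the generator, assuming the standard 3-parameter Weibull form S(t) = exp(−((t − threshold)/scale)^shape) for t > threshold. The hazard h(t) = (shape/scale)((t − threshold)/scale)^(shape − 1) increases for any shape greater than 1. With shape ≈ 1.44, however, it grows only roughly like (t − threshold)^0.44. The median, threshold + scale·(ln 2)^(1/shape), reproduces the value of about 3590 h quoted in the text. The function names are illustrative.

```python
import math

# 3-parameter Weibull for TBF-generator, reported MLE parameters.
SHAPE, SCALE, THRESHOLD = 1.44427, 1112.19, 2727.43

def generator_survival(t):
    """S(t) = exp(-((t - threshold) / scale) ** shape); equal to 1 before the threshold."""
    if t <= THRESHOLD:
        return 1.0
    return math.exp(-(((t - THRESHOLD) / SCALE) ** SHAPE))

def generator_hazard(t):
    """h(t) = (shape / scale) * ((t - threshold) / scale) ** (shape - 1); zero before the threshold."""
    if t <= THRESHOLD:
        return 0.0
    return (SHAPE / SCALE) * ((t - THRESHOLD) / SCALE) ** (SHAPE - 1)

median = THRESHOLD + SCALE * math.log(2) ** (1 / SHAPE)
print(f"median = {median:.1f} h")
for t in (3000, 3500, 4000, 5000):
    print(f"t = {t} h   S(t) = {generator_survival(t):.4f}   h(t) = {generator_hazard(t):.2e} per h")
```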
The joint survival and hazard functions assessment
To compare the reliability results of the three components more directly, the survival and hazard functions of the shaft, the gearbox and the generator are displayed together in Figures 8 and 9, respectively. This gives a clearer view of the differences in their survival and failure behaviours.
Figure 8. Survival plot for TBF-shaft, gearbox, and generator (h).
Figure 9. Hazard plot for TBF-shaft, gearbox, and generator (h).

The survival plot in Figure 8 shows the survival functions of the time between failures TBF, in hours, for the shaft, gearbox, and generator of the wind turbine. The Y-axis represents the survival percentage (the probability that a component has not yet failed at a given time) and the X-axis represents time (TBF). The shaft has a steep survival curve, indicating moderately lower reliability and a shorter time between failures. Its survival probability starts decreasing rapidly around 1750 hours and approaches 0% by approximately 3750 hours, so the shaft fails quickly once a certain operating time has been reached. The gearbox survival function starts declining earlier than that of the shaft (around 1000 hours), meaning that its failures occur even earlier. However, it declines more gradually and reaches 0% only at about 4600 hours. The gearbox therefore has the shortest typical lifetime of the three components but a more spread-out failure distribution than the shaft: its reliability is lower initially, but its failures are distributed over a longer period. The generator shows the slowest decline in survival probability, indicating higher reliability and a longer time between failures. Its survival curve begins to drop noticeably around 2750 hours, with survival probabilities extending up to 5000 to 5500 hours and reaching 0% at about 6750 hours. This suggests that the generator is the most reliable of the three components.
The hazard plot in Figure 9 shows the failure rates of the shaft, gearbox, and generator over time. The Y-axis represents the instantaneous failure rate (hazard rate) and the X-axis represents the time between failures TBF. Each curve corresponds to one of the three components and provides insight into its failure behaviour. The hazard rate of the shaft increases steadily over time: it starts at a low value and then grows consistently, particularly after approximately 2000 hours. This behaviour indicates a wear-out failure pattern, in which the failure rate increases as the shaft ages. The shaft thus becomes progressively less reliable due to fatigue, wear or gradual damage accumulation. For the gearbox, the hazard rate is high initially, peaking within the first 1100 hours, and then decreases gradually before stabilizing at a low rate. This pattern is typical of early failures (the infant mortality phase), where defects or improper installation cause failures early on. Once these early failures are resolved, the hazard rate stabilizes at a relatively low value, indicating improved reliability after the initial phase. The generator's hazard rate starts low and remains relatively stable throughout the observed time range. A slight increase is noticeable around 2700 hours, but the rate remains much lower than those of the shaft and gearbox. This behaviour suggests a largely random failure pattern, in which failures occur mostly independently of the component's age or time in operation. The generator thus demonstrates a high level of reliability compared with the other components, as its failure rate does not increase significantly over time. This is consistent with a highly reliable component whose failures are random and not strongly tied to operational age.
Failure and survival probabilities examination
Characteristics of distribution for TBF-shaft, gearbox, and generator.
Failure and survival probabilities for TBF-shaft, gearbox, and generator.
The interquartile range IQR is a statistical measure of the spread of the middle 50% of the data. It is used to understand the dispersion or variability of life data by focusing on the central part of the distribution. It is the range between the 25th percentile (first quartile Q1) and the 75th percentile (third quartile Q3) (Wilder, 1977). The IQR of the shaft shows that its survival probability drops from 75% to 25% within 720.447 h. This is approximately twice the IQR of the gearbox (343.099 h) but lower than the IQR of the generator (925.041 h), which is the highest of the three. A larger IQR indicates more variability in the failure times, meaning that the component's life distribution is more spread out (Peck and Short, 2019). This is the case for the generator first and the shaft second. A smaller IQR indicates that most failures happen within a relatively short timeframe, which is the case for the gearbox.
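Each fitted model has a closed-form quantile function, so the IQR values above can be recomputed from the reported MLE parameters. The Python sketch below assumes the standard parameterizations of the three models. For the lognormal, ln(t − threshold) is normal with mean = location and standard deviation = scale. For the Weibull, S(t) = exp(−((t − threshold)/scale)^shape). Q1 and Q3 are the times at which the failure probability reaches 25% and 75% (survival 75% and 25%). The threshold cancels in Q3 − Q1, so it only shifts the quartiles. The function names are illustrative.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal, used for the lognormal quantiles

def q_normal(p, mean, sd):
    return NormalDist(mean, sd).inv_cdf(p)

def q_lognormal3(p, loc, scale, threshold):
    return threshold + math.exp(loc + scale * Z.inv_cdf(p))

def q_weibull3(p, shape, scale, threshold):
    return threshold + scale * (-math.log(1 - p)) ** (1 / shape)

quantile_fns = {
    "shaft (normal)": lambda p: q_normal(p, 2493.04, 534.068),
    "gearbox (3-parameter lognormal)": lambda p: q_lognormal3(p, 5.14494, 1.30666, 1015.76),
    "generator (3-parameter Weibull)": lambda p: q_weibull3(p, 1.44427, 1112.19, 2727.43),
}

iqr = {}
for name, q in quantile_fns.items():
    q1, q3 = q(0.25), q(0.75)
    iqr[name] = q3 - q1
    print(f"{name:32s} Q1 = {q1:8.2f} h  Q3 = {q3:8.2f} h  IQR = {iqr[name]:7.2f} h")
```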
Reliability analysis of the wind turbine
The generation of electricity from the mechanical rotation of the blades depends on the proper functioning of the three main components of the wind turbine: the shaft, gearbox, and generator. The reliability of the system $R_{wt}(t)$ after an operating time t is a function of the reliabilities of its components ($R_s$, $R_{gr}$, and $R_{gn}$). The components of the wind turbine are arranged in series, as shown in Figure 10, so the system reliability function is given by equation (12).
Figure 10. Simplified functional block diagram of the wind turbine.

$$R_{wt}(t) = \prod_{i=1}^{n} R_i(t) \qquad (12)$$

where $R_i(t)$ is the reliability of the $i$th component as a function of time t.
Since the wind turbine system is composed of three critical parts (the shaft, gearbox, and generator) implemented in series, the system reliability $R_{wt}(t)$ can be calculated with formula (13).

$$R_{wt}(t) = R_s(t)\,R_{gr}(t)\,R_{gn}(t) \qquad (13)$$
Reliability calculations of each component and the whole system.

Figure 11. Reliability plot for the shaft, gearbox, generator, and the wind turbine.
The shaft maintains high reliability initially but gradually declines over time: its reliability decreases to 0.7084 at 2200 hours and drops to 0.0024 by 4000 hours. The shaft is thus more reliable than the gearbox but less reliable than the generator over the observed time. The gearbox experiences the fastest reliability decay: by 1200 hours its reliability drops below 0.5, and it reaches 0.0144 at 4000 hours. The gearbox is therefore the least reliable component and a critical contributor to the decline of the wind turbine's overall reliability. The generator maintains the highest reliability throughout. It only starts to decrease significantly after 3000 hours, and its reliability remains above 0.5 until 3500 hours. The generator is therefore the most reliable component and has the least impact on system failure in the early time intervals. Because of the series configuration, the overall wind turbine reliability is the product of the three component reliabilities and declines in a compounded manner, starting at 0.9974 at 1000 hours. It then drops drastically as the gearbox reliability decays. In Figure 11, the purple curve of the wind turbine reliability $R_{wt}$ follows the red curve of the gearbox reliability $R_{gr}$ almost perfectly. By 2000 hours the wind turbine reliability is just 0.0745, and it further declines to 0.0000102 by 4000 hours, almost entirely because of the gearbox. Assuming perfect gearbox reliability ($R_{gr} = 1$) isolates the combined impact of the shaft and generator. Under this assumption, the hypothetical reliability $R_s \cdot R_{gn}$ is significantly higher than the overall system reliability $R_{wt}$. At 2200 hours it is 0.7084, about 14 times the actual turbine reliability (0.04933). At 4000 hours it is 0.000709, almost 70 times the wind turbine reliability (0.0000102).
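The series formula R_wt(t) = R_s(t)·R_gr(t)·R_gn(t) can be evaluated directly from the three fitted models. The Python sketch below assumes the standard parameterizations of the normal, 3-parameter lognormal and 3-parameter Weibull distributions, using the reported MLE parameters. It computes R_wt(t) and the hypothetical R_s(t)·R_gn(t) at the times discussed above. Within rounding, it reproduces the values quoted in the text, for example 0.9974 at 1000 h and 0.0745 at 2000 h. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def r_shaft(t):
    """Normal model: mean 2493.04 h, standard deviation 534.068 h."""
    return 1.0 - NormalDist(2493.04, 534.068).cdf(t)

def r_gearbox(t):
    """3-parameter lognormal: location 5.14494, scale 1.30666, threshold 1015.76 h."""
    if t <= 1015.76:
        return 1.0
    return 1.0 - STD_NORMAL.cdf((math.log(t - 1015.76) - 5.14494) / 1.30666)

def r_generator(t):
    """3-parameter Weibull: shape 1.44427, scale 1112.19 h, threshold 2727.43 h."""
    if t <= 2727.43:
        return 1.0
    return math.exp(-(((t - 2727.43) / 1112.19) ** 1.44427))

def r_turbine(t):
    """Series system: product of the three component reliabilities."""
    return r_shaft(t) * r_gearbox(t) * r_generator(t)

for t in (1000, 2000, 2200, 4000):
    hypothetical = r_shaft(t) * r_generator(t)  # gearbox assumed perfect (R_gr = 1)
    actual = r_turbine(t)
    print(f"t = {t} h  R_wt = {actual:.3e}  R_s*R_gn = {hypothetical:.3e}  ratio = {hypothetical / actual:.1f}")
```

Because the system reliability is a plain product, the ratio printed in the last column is simply 1/R_gr(t). This makes the gearbox's role as the bottleneck explicit.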
Figure 11 shows that the orange curve of the shaft/generator reliability ($R_s \cdot R_{gn}$) follows the blue curve of the shaft reliability $R_s$ almost entirely. This hypothetical reliability is therefore governed by the shaft, while the generator reliability curve stays well above the other curves. In summary, the gearbox is the most critical reliability bottleneck in the wind turbine system, and its early and rapid decline has a cascading effect on the overall reliability. If its reliability were perfect ($R_{gr} = 1$), the system reliability would be governed mainly by the shaft and would be significantly higher. The generator, being the most reliable component, has the least impact on system reliability, even when paired only with the shaft. Finally, the series configuration amplifies the effect of the least reliable component, because the system reliability is the product of the component reliabilities. Any improvement in gearbox reliability therefore translates directly, in proportion, into a higher overall turbine reliability.
Results, discussion, and recommendations
The statistical distribution analysis reveals clear outcomes for the gearbox and generator but a more complex case for the shaft. For the gearbox, the ADC (from MLE and LSE) and the PCC (from LSE), supported by the probability plot, consistently confirm the 3-parameter lognormal distribution as the best fit. Similarly, for the generator, all statistical indicators (lowest ADC values, highest PCC and graphical evidence) unanimously support the 3-parameter Weibull distribution. This agrees with existing literature highlighting the Weibull model's strength in capturing generator lifetime characteristics. In contrast, the shaft presents conflicting results. The ADC from both the MLE and LSE methods suggests that the normal and 3-parameter lognormal distributions provide the best fit, but the PCC for LSE points instead to the 3-parameter Weibull distribution. This inconsistency underscores the difficulty of accurately modelling the shaft's TBF data. Importantly, the 2-parameter exponential distribution was consistently ruled out because of its poor performance across all metrics. Greater weight was placed on the normal and 3-parameter lognormal fits because the Anderson-Darling test is sensitive to tail deviations, which is critical in reliability analysis where extreme failures are significant. To resolve the contradiction, the Akaike and Bayesian information criteria were applied across the candidate distributions. The results decisively indicated that the normal distribution yields the lowest AIC and BIC values, confirming it as the most appropriate model for the shaft's TBF data. This multi-step approach, combining MLE, LSE, ADC, PCC, AIC, BIC, and graphical analysis, supports a robust selection of statistical distributions and strengthens the reliability analysis of the wind turbine components.
The parametric modelling and optimal distribution selection for the wind turbine components lead to the following outcomes. The shaft exhibits a classical wear-out failure pattern, evidenced by a steadily rising hazard rate over time. This progression highlights the dominance of degradation mechanisms such as fatigue accumulation, surface erosion, and crack propagation in driving its failures. The reliability analysis reveals a critical deterioration phase. The survival probability drops sharply after 1750 operating hours and falls to about 25% at roughly 2850 hours (the third quartile of the fitted normal model). Such behaviour underscores the importance of life-cycle management and targeted intervention. To counteract this decline, a two-pronged strategy is recommended. From a maintenance perspective, proactive inspections should begin around 2500 hours. They should employ advanced diagnostic techniques such as vibration monitoring, ultrasonic testing, and shaft alignment assessments to detect early-stage fatigue or imbalance. From a design perspective, shaft longevity may be extended by using more fatigue-resistant materials (e.g. alloy steels with surface treatments). Structural refinements such as optimized fillet radii or surface hardening can also reduce localized stresses and delay crack initiation. Given the shaft's predictable wear-out phase, scheduled replacement or overhaul around 2800 hours is a cost-effective alternative to unplanned corrective maintenance, mitigating the risk of catastrophic failure and excessive downtime. In contrast, the gearbox follows a markedly different reliability trajectory, characterized initially by infant mortality failures with an elevated hazard rate in the first 1100 hours. This pattern indicates susceptibility to early defects such as misalignments, assembly errors, or material flaws. However, once this burn-in phase has passed, the failure rate stabilizes, suggesting more consistent behaviour over time.
This reliability profile supports a burn-in and monitoring approach that emphasizes rigorous early-stage interventions. Recommended practices include factory-level burn-in testing to eliminate weak units before field deployment. This should be supplemented by non-destructive evaluation techniques (e.g. acoustic emission testing, oil debris monitoring) during commissioning. These measures matter because the failure of any key gearbox component can lead to high maintenance costs and large production losses and may take a long time to repair (Zhu and Li, 2018). During the first 1000 to 1100 hours, intensive condition monitoring leveraging SCADA-based analytics should be applied to capture abnormal vibrations, load fluctuations, or temperature spikes indicative of underlying defects. From a design standpoint, improvements may include optimized gear tooth profiles for better load distribution, advanced lubrication strategies to reduce early wear, and higher-capacity bearings such as tapered or spherical roller types, as supported in recent literature (Gbashi et al., 2024). Beyond the initial failure-prone phase, the gearbox shifts toward a more predictable linear degradation trend. This enables performance-based predictive maintenance rather than fixed, time-driven interventions. Finally, the generator emerges as the most reliable component, with a relatively flat hazard function and an extended mean time between failures MTBF. Its reliability pattern indicates that failures are largely stochastic in nature, with minimal dependence on cumulative operational stress. Despite this robustness, long-term reliability requires structured maintenance planning. Periodic inspections are recommended, supplemented by condition-based tools such as thermal imaging, voltage imbalance monitoring and load fluctuation tracking. These can detect subtle anomalies in stator windings, rotor balance or insulation degradation before they escalate.
Proactive interventions should be scheduled around 3000 hours, before the 3500-hour mark at which the failure probability approaches 50%, to avoid unexpected downtime. From a design perspective, advances in generator cooling systems, insulation technologies and magnetic material resilience can further fortify its reliability under highly variable wind and weather conditions. Taken together, these findings illustrate the necessity of differentiated reliability strategies tailored to component-specific failure modes. The shaft requires structured end-of-life planning and the gearbox demands early-phase defect management, while the generator benefits from relatively minimal but well-timed interventions. Embedding these insights into a reliability-centred maintenance framework not only minimizes unplanned downtime and operational costs but also maximizes system longevity. Moreover, the outcomes provide actionable guidance for future design optimization, emphasizing the integration of fatigue-resistant materials, stress-mitigating structural enhancements and advanced fault detection technologies to achieve a more resilient and efficient wind turbine system.
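Intervention times such as those proposed for the generator follow from inverting the CDF of the fitted model. For a 3-parameter Weibull distribution, F(t) = 1 - exp[-((t - γ)/η)^β] gives t_p = γ + η[-ln(1 - p)]^(1/β) in closed form. The sketch below applies this with hypothetical parameters and a hypothetical 40% planning threshold; neither is taken from this study.

```python
import math

def weibull3_cdf(t, beta, eta, threshold):
    """F(t) = 1 - exp(-((t - threshold) / eta) ** beta), zero before the threshold."""
    if t <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((t - threshold) / eta) ** beta))

def weibull3_time_at_failure_prob(p, beta, eta, threshold):
    """Invert the 3-parameter Weibull CDF: time at which F(t) = p, for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return threshold + eta * (-math.log(1.0 - p)) ** (1.0 / beta)

# Hypothetical generator parameters (hours); NOT the study's estimates.
BETA, ETA, GAMMA = 1.1, 4000.0, 100.0

median_life = weibull3_time_at_failure_prob(0.5, BETA, ETA, GAMMA)
# Hypothetical planning rule: intervene before the failure probability reaches 40%.
intervention = weibull3_time_at_failure_prob(0.40, BETA, ETA, GAMMA)
print(f"median life = {median_life:.0f} h, intervene by {intervention:.0f} h")
```

In practice, the acceptable failure probability would be set from cost and risk considerations rather than fixed in advance.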
The study demonstrates that the proposed framework provides a robust and systematic approach to wind turbine reliability analysis by addressing both statistical and physical aspects of component failure. The analysis confirmed that the gearbox and generator failure data are best represented by the 3-parameter lognormal and 3-parameter Weibull distributions, respectively, while the shaft required a more detailed investigation due to contradictory indications from ADC and PCC. By introducing AIC and BIC as complementary evaluation metrics, the normal distribution was ultimately identified as the most suitable model for the shaft, highlighting the value of multi-criteria assessment in resolving statistical inconsistencies. This outcome not only strengthens the reliability of distributional fitting but also links statistical evidence with degradation mechanisms, ensuring more accurate representation of component behaviour. Moreover, the implementation of a Python-based tool for AIC/BIC computation adds reproducibility and flexibility, allowing the framework to be extended to other components or datasets, thus reinforcing its practical significance for maintenance planning and system reliability improvement.
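As a hedged sketch of how such an AIC/BIC comparison can be implemented (not the authors' tool), the code below fits each candidate distribution by MLE with SciPy's `fit` method and computes AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the sample size and L the maximized likelihood; lower values indicate a better trade-off between goodness of fit and model complexity. The sample is synthetic and is not the shaft data.

```python
import numpy as np
from scipy import stats

# Candidate models and the number of parameters k estimated for each.
CANDIDATES = {
    "3-parameter Weibull": (stats.weibull_min, 3),   # shape, loc, scale
    "2-parameter exponential": (stats.expon, 2),     # loc, scale
    "normal": (stats.norm, 2),                       # loc, scale
    "3-parameter lognormal": (stats.lognorm, 3),     # shape, loc, scale
}

def information_criteria(data):
    """Fit each candidate by MLE and return {name: (AIC, BIC)}."""
    x = np.asarray(data, dtype=float)
    n = x.size
    results = {}
    for name, (dist, k) in CANDIDATES.items():
        params = dist.fit(x)                        # MLE of all free parameters
        log_lik = np.sum(dist.logpdf(x, *params))   # maximized log-likelihood
        aic = 2 * k - 2 * log_lik
        bic = k * np.log(n) - 2 * log_lik
        results[name] = (aic, bic)
    return results

# Synthetic TBF sample (hours) for illustration only; NOT the shaft data.
rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=3400.0, scale=900.0, size=25)

scores = information_criteria(sample)
for name, (aic, bic) in sorted(scores.items(), key=lambda kv: kv[1][0]):
    print(f"{name:26s} AIC = {aic:8.2f}  BIC = {bic:8.2f}")
best = min(scores, key=lambda name: scores[name][0])
print("lowest AIC:", best)
```

Unconstrained MLE of threshold (location) parameters can be unstable on small samples, so fitted parameters are worth checking against probability plots or LSE estimates before the criteria are compared.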
Limitations and challenges
Data quality and availability are major issues in wind turbine reliability analysis. The reliability analysis in this study depends heavily on the quality and quantity of the time between failures (TBF) data for the three components (shaft, gearbox, and generator). These data are limited to these three parts and are few in number (the sample size is 25 for the shaft, 30 for the gearbox and 21 for the generator). Such small samples may reduce the accuracy of best-fit distribution selection, and the scarcity of statistics for other components (blades, yaw drive, bearings, etc.), together with limitations in data quality and diversity, may affect the assessment of overall wind turbine reliability. Consequently, real-time data generated by sensors or a public database of wind turbine metrics would greatly benefit the field’s advancement in terms of reliability, availability and maintainability (RAM) analysis.
Choosing the most appropriate statistical distribution for each component’s TBF data (shaft, gearbox, and generator) is complex and depends on the characteristics of each dataset. The studied distributions (3-parameter Weibull, 2-parameter exponential, normal, and 3-parameter lognormal) are well known, but they are not the only ones that can model the TBF data of wind turbine parts. Investigating other statistical distributions could therefore improve TBF data modelling and the selection of a best-fit distribution that more closely reflects the actual failure behaviour of each component.
The study’s findings and distribution selection are based on specific data, tests and results, which may not generalize to all wind turbines or operating environments. Variability in wind speeds, loads and maintenance practices can limit the study’s applicability. Therefore, integrating external factors such as material fatigue, extreme weather or manufacturing defects into the research could substantially advance the reliability analysis of wind turbines.
Translating reliability results into actionable maintenance strategies is complex, because optimal preventive maintenance intervals depend not only on failure probabilities but also on cost considerations, which are outside the scope of this study. In addition, the system’s series configuration degrades its reliability, because the least reliable component, the gearbox, strongly constrains overall performance. A parallel configuration is not a viable option owing to the limited size of the nacelle, the position of the wind turbine and the high cost of the components; the gearbox in particular is one of the most expensive and critical components in wind turbines (Salem et al., 2017). Therefore, maintenance strategies, periodic replacements and quality control actions such as initial testing or burn-in periods are the most practical ways to enhance wind turbine reliability, and they should be implemented with careful regard for cost.
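The effect of the series configuration can be made concrete. Assuming independent component failures, the system reliability is the product R_sys(t) = R_shaft(t) × R_gearbox(t) × R_generator(t), so it can never exceed the reliability of the weakest component. The sketch below evaluates this product with hypothetical component models, not the fitted models of this study.

```python
import numpy as np
from scipy import stats

# Hypothetical component models (hours); NOT the fitted models of this study.
COMPONENTS = {
    "shaft": stats.norm(loc=3400.0, scale=900.0),
    "gearbox": stats.lognorm(s=1.2, loc=50.0, scale=np.exp(6.5)),
    "generator": stats.weibull_min(c=1.1, loc=100.0, scale=4000.0),
}

def series_reliability(t, components):
    """R_sys(t) = product of R_i(t): a series system survives only if every component survives.

    Assumes component failures are statistically independent.
    """
    r = np.ones_like(np.asarray(t, dtype=float))
    for dist in components.values():
        r = r * dist.sf(t)   # sf(t) = R(t) = 1 - F(t)
    return r

t = np.array([500.0, 1000.0, 2000.0, 3000.0])
r_sys = series_reliability(t, COMPONENTS)
for name, dist in COMPONENTS.items():
    print(f"{name:9s}", np.round(dist.sf(t), 3))
print("system   ", np.round(r_sys, 3))
```

Comparing the printed rows shows directly which component caps the system curve at each time, which is the quantitative basis for prioritizing the gearbox.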
Conclusion
This study provides a reliability analysis of a wind turbine system comprising the shaft, gearbox, and generator, with a focus on the time between failures (TBF) of each component. The analysis demonstrated that the reliability characteristics of the components differ markedly, with the gearbox exhibiting the lowest reliability and highest failure rates compared with the shaft and generator. The shaft shows a clear wear-out trend, with its failure rate rising gradually over time. The gearbox is dominated by infant mortality problems but stabilizes quickly, demonstrating increased reliability beyond the early phase. The generator is the most reliable part, exhibiting stable reliability and random failures, with noticeably longer TBF values and a slower decline in survival probability. This finding identifies the gearbox as the critical failure-prone component within the system, significantly influencing the overall reliability of the wind turbine: in a series configuration the system survives only if every component survives, so system reliability cannot exceed that of its weakest component. Consequently, improving the reliability of the gearbox would yield the greatest impact on extending the operational lifespan and reducing downtime of the wind turbine. The survival and hazard analysis showed how each component’s reliability degrades over time, providing key insights for designing proactive maintenance schedules and optimizing system performance.
To model the TBF data, this study evaluated four widely used statistical distributions (3-parameter Weibull, 2-parameter exponential, normal, and 3-parameter lognormal). The most appropriate distribution was selected through a comprehensive goodness-of-fit analysis based on the Anderson-Darling statistic and the Pearson correlation coefficient. Parameters were estimated using maximum likelihood estimation (MLE) and least squares estimation (LSE). MLE was particularly effective because it selects the parameter values that maximize the likelihood of the observed data, while LSE provided a comparative basis by minimizing the squared errors. Using both methods introduces challenges in balancing computational complexity with accuracy: MLE is sensitive to initial parameter estimates and can be computationally intensive, while LSE may not always provide the best fit for skewed distributions. The reliance on the Anderson-Darling and Pearson correlation tests also has limitations, as both are influenced by sample size and by the underlying data distribution. These tests may therefore produce conflicting results, as in the case of the shaft, requiring subjective judgement to reach a final selection. In this context, an algorithm computing the Akaike and Bayesian information criteria (AIC and BIC) was developed to support the final choice of the best-fit distribution for the shaft’s TBF data. Among the tested distributions, the 3-parameter Weibull, normal, and 3-parameter lognormal distributions consistently provided the best fit, reflecting their flexibility in accommodating the skewness and dispersion commonly observed in failure data.
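The LSE and goodness-of-fit steps can be sketched for the normal model. The code below plots the ordered times against standard normal quantiles of Bernard's median ranks, F_i = (i - 0.3)/(n + 0.4), fits the line t = μ + σz by least squares, and reports the Pearson correlation coefficient of that probability plot. It also computes the classical Anderson-Darling statistic A^2 for the MLE fit. The sample is synthetic, and statistical software may report an adjusted Anderson-Darling variant, so values need not match those of any particular package.

```python
import numpy as np
from scipy import stats

def median_ranks(n):
    """Bernard's approximation of median ranks: F_i = (i - 0.3) / (n + 0.4)."""
    i = np.arange(1, n + 1)
    return (i - 0.3) / (n + 0.4)

def normal_lse_and_pcc(data):
    """Least squares fit of a normal model on a probability plot, plus the PCC.

    Regresses the ordered times on the standard normal quantiles of the median
    ranks (t = mu + sigma * z), so the intercept estimates mu and the slope sigma.
    """
    x = np.sort(np.asarray(data, dtype=float))
    z = stats.norm.ppf(median_ranks(x.size))
    sigma, mu = np.polyfit(z, x, deg=1)   # returns [slope, intercept]
    pcc = np.corrcoef(z, x)[0, 1]         # Pearson correlation of the plot
    return mu, sigma, pcc

def anderson_darling(data, cdf):
    """Classical A^2 statistic for a fully specified CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    F = np.clip(cdf(x), 1e-12, 1 - 1e-12)   # guard the logarithms
    i = np.arange(1, n + 1)
    # A^2 = -n - (1/n) * sum((2i - 1) * [ln F(x_i) + ln(1 - F(x_{n+1-i}))])
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1 - F[::-1])))

rng = np.random.default_rng(seed=7)
sample = rng.normal(loc=3400.0, scale=900.0, size=25)   # synthetic, NOT the shaft data

mu_lse, sigma_lse, pcc = normal_lse_and_pcc(sample)
mu_mle, sigma_mle = stats.norm.fit(sample)              # MLE, for comparison
a2 = anderson_darling(sample, stats.norm(mu_mle, sigma_mle).cdf)
print(f"LSE: mu={mu_lse:.0f}, sigma={sigma_lse:.0f}, PCC={pcc:.4f}")
print(f"MLE: mu={mu_mle:.0f}, sigma={sigma_mle:.0f}, A^2={a2:.3f}")
```

A PCC close to 1 and a small A^2 both indicate a good fit, but they weight the data differently (A^2 emphasizes the tails), which is one reason the two criteria can disagree on small samples.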
This study underscores the critical role of statistical modelling in reliability analysis. By leveraging advanced statistical methods and goodness-of-fit evaluations, this research identified suitable models that can serve as predictive tools for maintenance decision-making and risk assessment. It presents a systematic framework for wind turbine reliability analysis that integrates both statistical rigour and physical failure understanding. The results demonstrated that the gearbox and generator are best modelled by the 3-parameter lognormal and 3-parameter Weibull distributions, respectively, while the shaft required a more nuanced assessment due to conflicting outcomes across different statistical tests. By introducing AIC and BIC as decisive criteria, the normal distribution was identified as the most appropriate fit for the shaft, thereby resolving the contradiction and enhancing confidence in the reliability modelling process. Importantly, this approach advances beyond conventional practices by systematically reconciling statistical inconsistencies, explicitly linking distributional evidence with degradation mechanisms, and enabling reproducibility through a Python-based tool. Collectively, these contributions not only strengthen theoretical reliability analysis but also provide practical guidance for targeted maintenance strategies, ultimately supporting improved performance, reduced downtime and extended lifespan of wind turbine systems. The findings emphasize the need to focus on the gearbox for reliability improvement, as its failure behaviour substantially impacts the overall system reliability. Future research should consider integrating other operational factors, such as environmental conditions, variable loading and wear patterns, into the reliability models.
Additionally, exploring cost-optimization frameworks for maintenance and replacement decisions, as well as incorporating modern machine learning techniques for reliability prediction, would further enhance the applicability and accuracy of such analyses. Ultimately, this study provides a robust framework for reliability assessment and statistical modelling that can be adapted to other complex mechanical systems beyond wind turbines.
Supplemental Material
Supplemental Material for Improving wind turbine reliability through the analysis of critical component failures, parametric modelling and maintenance optimization by Youssef Sadraoui, Mohamed Er-Ratby, Moulay Saddik Kadiri, Abdessamad Kobi in Wind Engineering.
Footnotes
Acknowledgments
We would like to thank the editors and reviewers for their valuable feedback to improve the quality of this paper.
Author contributions
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data will be made available upon reasonable request.
Appendix
References
