Abstract
The varied operating conditions of wind turbines pose great challenges for efficient and reliable fault detection. Careful analysis of wind turbine data is therefore essential for assessing the state of the turbines, because a traditional alarm threshold cannot provide a timely warning: by the time it is reached, the malfunction has already occurred. This paper presents a new method for analyzing actual turbine data, using an aggregated model consisting of the neighborhood comparison method, K-means clustering and a decision tree model to diagnose faults. The wind speeds of adjacent turbines are compared with each other, and then the other parameters of turbines with the same wind speed are compared. The rationale is that wind turbines that are similar in wind speed are similar in performance as well. This approach helps to discover abnormal turbine performance data within the normal operating range, because the abnormal performance of any turbine destroys the similarity relationship between its data and the data of the neighboring units. The main advantage of this approach is the possibility of detecting the beginning of abnormal performance in real time. A case study using real SCADA data validates the approach and demonstrates its effectiveness and advantages.
Keywords
Introduction
With the growth of industrial activity, the demand for energy has increased dramatically, which has led to a strong trend toward renewable energy sources. As of 2016, wind-powered systems provided more than 420 gigawatts, and this is expected to rise to more than 1000 gigawatts by 2030 (Global Wind Energy Council, 2011). In some countries, such as Denmark, wind turbines are a major part of the national energy network (Pérez et al., 2013).
As wind turbine production capacity increases, wind turbines have become a competitor to conventional energy in economic terms (IRENA, 2017). Maintenance and operation costs affect the economic competitiveness of wind turbines, which makes a strong predictive maintenance strategy indispensable.
Conventional maintenance strategies revolve around periodic or preventive maintenance, which does not take into account the real operational state of the turbines over time (Garcia et al., 2006). Condition-based maintenance strategies, which rely on early fault detection, therefore become the best option.
There are many ways of detecting faults; the most important are signal-based techniques (Liu, 2017), knowledge-based techniques, and model-based techniques. Signal-based fault detection techniques depend on analysis of the spectral components of measured signals (Kolios et al., 2014). The fault detection process analyzes measured signals such as vibration, noise, acoustic sound and pressure, or relies on analytical parameters, to generate fault symptoms (Chong, 2011). Model-based techniques need a precise model of the system to simulate real process behavior. Knowledge-based techniques use intelligent methods such as neural networks (Luengo and Kolios, 2015; Miljković, 2011). A knowledge-based system is a computer-based system that uses and generates knowledge from data, information and knowledge. Such systems are capable of understanding the information being processed and can make decisions based on the information and knowledge residing in the system, whereas traditional computer systems do not know or understand the data they process (Sajja and Akerkar, 2010).
With SCADA systems in use in most wind farms, knowledge-based fault detection techniques are among the most appropriate and cost-effective methods. SCADA systems receive a huge amount of operational data from sensors and measuring devices in the turbines, and this data can be analyzed to assess the state of each wind turbine (Liu et al., 2012).
The biggest challenge now is detecting faults early, before they reach critical levels, so researchers all over the world are trying to use SCADA data for early fault detection. Chang Sun and Peng Guo used the observation that “the operating status and measurement parameters of similar units with similar wind resource are similar” to identify obscure anomalous data hidden in the normal operation data, and used the LS-SVM method to establish a neighbor model that reflects this similarity relation (Sun and Guo, 2017).
Hong Wang and others used k-means clustering in a general multi-operation-condition partition scheme to segment the whole operation into multiple sub-operation conditions, and then built a normal turbine behavior model for each sub-operation condition. For normal behavior modeling, they proposed an optimized deep belief network (Wang et al., 2019).
Zhou et al. (2017) used the Adaptive Neuro-Fuzzy Inference System to establish a fault early warning and diagnosis model.
I. Abdallah et al. (2018) introduced an ensemble bagged decision tree classifier trained on a dataset from an offshore wind farm comprising 48 wind turbines, and used it to automatically extract paths linking excessive vibration faults to their possible root causes.
Cheng Qiang (2019) proposed a failure diagnosis algorithm based on the Gradient Boosting Decision Tree framework to recognize acoustic emission sources in the wind turbine drive train and to diagnose the operational condition of its components.
Hsu et al. (2020) applied statistical process control and machine learning techniques to diagnose wind turbine faults and predict maintenance needs by analyzing 2.8 million sensor data records collected from 31 wind turbines in Taiwan from 2015 to 2017. They clustered and classified the failure types of wind turbines using Pareto analysis, scatter plots, and the cause and effect diagram.
The main contributions of this paper are as follows:
(1) Defining an easy and clear reference to evaluate the various parameters of the wind turbine that can be compared at all times. This reference is characterized by:
(a) The use of the neighboring turbine parameters.
(b) Using the mode of actual data in real time, which shows the effect of time on turbine performance.
(2) Finding an easy method to monitor and detect failure indicators. This method is represented by:
(a) The use of the statistical mode to compare the different parameters of wind turbines.
(b) The use of an aggregated model of the K-means clustering algorithm and the decision tree with adjacent turbines to compare the turbine parameters.
(3) Detecting gearbox faults earlier than the SCADA monitoring system.
(4) Visualizing the abnormal behavior of the gearbox.
(5) Presenting a suitable model to be used in all wind turbine systems.
(6) Evaluating the performance of the proposed health monitoring approach on a real case, using SCADA data from multiple wind turbines.
Problem description
The sudden failure of wind turbine components is considered one of the biggest problems facing operators because of the resulting economic losses and production stoppages, so research has been directed at predicting failure and detecting faults early. Although the operating parameters of the turbines are recorded in the SCADA system, the benefit gained from this data has not yet reached the required level. From the analysis of this data and the practical follow-up of the turbines, it was found that most failures in any turbine component are preceded by indications of a gradual decrease in the efficiency of that component, meaning that in most cases failures do not occur suddenly. Although there are thresholds for the parameters related to turbine performance, these thresholds indicate that a failure has appeared; that is, reaching the threshold means that the failure state already exists.
The indicators of low efficiency and incipient failure are not used because there is no clear reference against which to compare the performance of the various turbine components. In wind farms, groups of identical turbines are installed, which means that when these turbines, called neighbor units, are exposed to the same operating conditions (wind speed, direction, ambient temperature, terrain, etc.), the performance of their components will be very similar.
Accordingly, comparing the performance of these components leads to the discovery of anomalies in the performance of any component, which helps to discover a failure at its onset. The data of 142 turbines from December 2018 to February 2019 was compared, with a focus on wind speed, produced energy and gearbox oil temperature, to detect failures.
Methodology
Figure 1 shows the methodology implemented in this paper, which is summarized as follows:
Collect SCADA data from wind turbines on a wind farm.
Use the wind speed as the clustering input.
Select K objects as the initial clustering centers from the data set.
Calculate the distance between the data objects and the clustering center, then classify them to the nearest cluster.
After the turbines are clustered by wind speed, calculate the mode for each parameter; if there is more than one mode, calculate the mean value of the modes.
Compare the parameters in each cluster with each other on the basis of the mode.
Make the comparison using the decision tree.
Compare different parameters according to their priority.
If the difference between a turbine's parameter and the mode exceeds the permissible limit, this parameter is classified as heading toward failure.
In this case, the maintenance department determines the required procedure.
Variation in wind speed between adjacent turbines in the same terrain is small. The aim of this comparison is to check the orientation and measurement systems.

Figure 1. Methodology for wind turbine early fault detection.
One important note is that the variation between widely spaced turbines is large (unlike that between adjacent ones), so every turbine is compared with its two adjacent turbines to detect a deviation in any turbine. The deviation is related to the power: if there is an error in the orientation, the generated power is reduced, which leads to financial loss. The next step is to cluster the turbines by wind speed using the k-means algorithm; the aim of this grouping step is to capture the similarities in the output characteristics of the wind turbines. As mentioned before, similarity in wind resources leads to similarity in wind turbine performance. The number of clusters is determined from the present wind speeds, on the condition that the difference between the highest and the lowest speed in a cluster does not exceed 2 m/s.
Then the mode of every parameter is calculated. The decision tree is used to specify the fault possibilities by analyzing the wind turbine parameters and comparing them with the mode; the comparison indicates a clear difference between faulty and healthy conditions. This study focuses on wind speed, generated power and gearbox oil temperature, but the method can be applied to other parameters that can be used for early fault detection, such as gearbox bearing temperature, generator coil temperature, generator bearing temperature and pitch angle.
The proposed method
This paper proposes that the performance of adjacent turbines be compared to discover anomalies in the performance of any turbine component, taking into account the following points:
Differences in terrain lead to large differences in wind speed measurements between adjacent turbines. This problem can be solved by finding the relationship between wind speeds at adjacent turbines over the past years and inferring the impact of the terrain, but that is outside the scope of this study.
The difference in wind speed between adjacent turbines is slight and mostly does not exceed 1 m/s.
Although the difference in wind speeds is small, its effect on the produced power is large, because the power is a function of the cube of the wind speed.
The wake effect on the turbines has been neglected.
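To make the cube relation concrete, the following minimal sketch computes the relative change in available power for a small change in wind speed. It assumes only that power is proportional to the cube of the wind speed (valid below rated power); the function name and the example speeds are illustrative and are not taken from the case study.

```python
def relative_power_change(v_ref: float, v_other: float) -> float:
    """Fractional change in available wind power when speed goes from v_ref to v_other.

    Assumes power is proportional to the cube of the wind speed, so air
    density, rotor area and power coefficient cancel out of the ratio.
    """
    if v_ref <= 0:
        raise ValueError("v_ref must be positive")
    return (v_other / v_ref) ** 3 - 1.0


# Illustrative speeds only: a 1 m/s gap between two turbines at 8 and 9 m/s.
change = relative_power_change(8.0, 9.0)
print(f"{change:.1%}")  # roughly 42% more available power
```

Under this assumption, a 1 m/s difference at moderate wind speeds corresponds to a power difference of tens of percent, which is why the wind speed comparison is made before the power comparison.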
The comparison starts with the wind speed: the wind speed at each turbine is compared with that at the turbines before and after it (Table 1). A deviation in this comparison indicates a malfunction in the wind speed measurement system or the direction system of the turbine, since the terrain at the studied site has almost no effect on wind speeds.
Table 1. Proposed method for wind speed comparison.
The comparison is made according to the following formula:
If
This indicates a malfunction of the wind speed measurement system or the direction system. Here v is the wind speed, v_r is the rated wind speed, and c is a constant determined experimentally.
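The neighbor comparison can be sketched in code as follows. The form of the test is an assumption made for illustration: the absolute difference between a turbine's wind speed and each neighbor's, normalized by the rated wind speed v_r, is compared against c. The function name, the values of `v_rated` and `c`, and the rule that both neighbors must disagree are hypothetical choices, not values or rules from this study.

```python
from typing import List


def flag_wind_speed_deviation(speeds: List[float], v_rated: float, c: float) -> List[bool]:
    """Flag turbine i when its wind speed deviates from all of its row neighbours.

    ASSUMED test (illustrative only): |v_i - v_neighbour| / v_rated > c.
    Requiring every available neighbour to disagree keeps the healthy
    turbines next to a faulty one from being flagged. A turbine at the end
    of a row has a single neighbour, so it is judged on that one alone.
    """
    n = len(speeds)
    flags = []
    for i, v in enumerate(speeds):
        neighbours = [speeds[j] for j in (i - 1, i + 1) if 0 <= j < n]
        deviates = bool(neighbours) and all(
            abs(v - vn) / v_rated > c for vn in neighbours
        )
        flags.append(deviates)
    return flags


# Hypothetical readings (m/s) for five turbines in a row; the third one is off.
row = [8.0, 8.2, 11.5, 8.1, 7.9]
print(flag_wind_speed_deviation(row, v_rated=12.0, c=0.1))
```

In practice c would be tuned on historical data from the site, as the text notes that it is determined experimentally.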
K-means clustering algorithm
In this paper the k-means clustering algorithm, an unsupervised learning method, is used. It is widely used because it can handle large data sets effectively. It aims to allocate all data samples into K clusters by minimizing the sum of squared errors over all K clusters, partitioning the data according to the Euclidean distance between each data sample and the K center points. It can be expressed as:
$$ J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^{2} $$
where x_i is a data sample, C_k is the k-th cluster and \mu_k is the center of that cluster.
Clustering procedures
Collect SCADA data from wind turbines on a wind farm.
Use the wind speed as the clustering input.
Select K objects as the initial clustering centers from the data set.
Calculate the distance between the data objects and the clustering centers, then assign each object to the nearest cluster.
Compute the mean value of each cluster, then update the old center.
Repeat the assignment and center calculations until the centers no longer change.
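The procedure above can be sketched for one-dimensional wind speed data as follows. This is a minimal illustration rather than the implementation used in the study: the deterministic choice of initial centers and the loop that raises K until every cluster spans at most 2 m/s (the condition stated in the Methodology) are assumptions, and the function names and sample speeds are hypothetical.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Plain 1-D k-means; returns (centers, labels)."""
    ordered = sorted(values)
    # Initial centers: k samples spread evenly over the sorted data (an assumption).
    if k == 1:
        centers = [ordered[len(ordered) // 2]]
    else:
        centers = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assign every sample to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # centers no longer change
            break
        centers = new_centers
    return centers, labels


def cluster_by_wind_speed(values: List[float], max_spread: float = 2.0) -> Tuple[List[float], List[int]]:
    """Use the smallest K for which every cluster spans at most max_spread (m/s)."""
    if not values:
        raise ValueError("no wind speed samples")
    for k in range(1, len(values) + 1):
        centers, labels = kmeans_1d(values, k)
        ok = True
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members and max(members) - min(members) > max_spread:
                ok = False
        if ok:
            break
    return centers, labels


# Hypothetical wind speeds (m/s) for nine turbines.
speeds = [5.1, 5.4, 5.9, 6.2, 8.8, 9.1, 9.5, 12.0, 12.3]
centers, labels = cluster_by_wind_speed(speeds)
print(labels)
```

For these sample speeds the loop stops at three clusters, since one or two clusters would each contain speeds more than 2 m/s apart.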
After the clustering process, the mode is calculated for each parameter; if there is more than one mode, their mean value is calculated. The mode is used rather than the mean because the mean is sensitive to anomalous values.
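A minimal sketch of the mode reference, including the rule for several modes, is given below. Rounding the readings before counting is an assumption added here so that continuous measurements can share a mode; the function names, the rounding precision, the permissible limit and the sample temperatures are all hypothetical.

```python
from collections import Counter
from typing import List


def mode_reference(values: List[float], decimals: int = 0) -> float:
    """Mode of the rounded readings; the mean of the modes if there is a tie."""
    rounded = [round(v, decimals) for v in values]
    counts = Counter(rounded)
    top = max(counts.values())
    modes = [val for val, n in counts.items() if n == top]
    return sum(modes) / len(modes)


def flag_deviations(values: List[float], limit: float, decimals: int = 0) -> List[bool]:
    """True where a reading differs from the cluster mode by more than limit."""
    ref = mode_reference(values, decimals)
    return [abs(v - ref) > limit for v in values]


# Hypothetical gearbox oil temperatures (deg C) for six turbines in one cluster.
temps = [56.2, 55.8, 56.4, 71.3, 55.9, 57.1]
print(mode_reference(temps))
print(flag_deviations(temps, limit=10.0))
```

In this example the single hot reading does not shift the reference, whereas the mean of the same readings would be pulled upward by it.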
Decision Tree
Decision trees (DT) are a classification and pattern recognition algorithm, characterized by the following:
The rules for creating DT structures are simple and easy to understand.
DT uses multiple steps to perform the classification process.
DT allows the addition of possible new scenarios.
DT contributes to identifying the worst and best expected values for different scenarios.
The decision tree performs the classification process in two stages: the learning stage, which is the rule-making stage, and the classification stage, where DT applies the rules. If high accuracy is obtained, these rules are adopted and used to classify new data. DT consists of nodes, branches and leaves. A node represents a test on an attribute, each branch represents an outcome of the test, and every leaf represents the decision made after evaluating all the attributes. The first node is called the root. The path from the root to a leaf expresses a possible scenario. Figure 2 illustrates the decision tree diagram used in this paper.

Figure 2. DT classifier for wind turbine fault detection.
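As an illustration of how a tree of this kind applies its rules, the sketch below hand-codes a small decision tree over a turbine's deviations from the reference. It is not the tree of Figure 2: the node order follows the priority wind speed, then temperature and power; the leaf labels paraphrase the kinds of findings reported in the case study; and every threshold except the 1 m/s wind speed limit is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass
class Deviation:
    wind_speed: float  # turbine minus neighbour reference, m/s
    power: float       # (turbine - reference) / reference, dimensionless
    oil_temp: float    # turbine minus reference, deg C


WIND_FAULT = "check wind speed measurement or direction system"
GEARBOX_FAULT = "gearbox losses suspected: inspect bearings and oil cooling"
OIL_OR_SENSOR = "check gearbox oil circuit or temperature sensor"
POWER_LOSS = "unexplained power loss: inspect further"
NORMAL = "normal"


def classify(dev: Deviation,
             wind_limit: float = 1.0,    # from the 1 m/s remark in the text
             power_limit: float = 0.10,  # hypothetical
             temp_limit: float = 8.0) -> str:  # hypothetical
    # Root node: wind speed must agree with the neighbours first.
    if abs(dev.wind_speed) > wind_limit:
        return WIND_FAULT
    power_low = dev.power < -power_limit
    temp_abnormal = abs(dev.oil_temp) > temp_limit
    # Second level: abnormal temperature, split by whether power is also lost.
    if temp_abnormal:
        return GEARBOX_FAULT if power_low else OIL_OR_SENSOR
    # Temperature is normal: only a power deficit remains to be explained.
    return POWER_LOSS if power_low else NORMAL


# Hypothetical deviations: hot oil and reduced power at matching wind speed.
print(classify(Deviation(wind_speed=0.2, power=-0.15, oil_temp=12.0)))
```

A learned tree would derive such thresholds from labeled data in the learning stage; the fixed numbers here only show how a path from the root to a leaf maps to a scenario.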
Case study
This method was implemented on a part of the Zafarana wind farm that contains 142 turbines of type G52; their specifications are shown in Table 2.
Table 2. Wind turbine characteristics.
Discussion and evaluation
A project consisting of 142 turbines at the Zafarana site was chosen to implement the proposed strategy, focusing on wind speed, produced power and gearbox oil temperature. The study was conducted from December 2018 to February 2019, and the results are presented in the following figures. The gearbox is the most important turbine component, the most expensive to replace, and the one that takes the longest to replace; the results make clear that its temperature was a detectable indicator of all of the gearbox failures. The neighboring turbines are compared according to temperature, wind speed and power. Wind speed is the main input: it results in power generation, and power losses appear as temperature rises in different components, such as the gearbox oil, the gearbox bearing, and the generator bearing and coils.
Losses in the gearbox components are mostly converted to a rise in temperature and cause a reduction in power; the results are shown in the figures.
In turbine 3 (Figures 3, 4, 5), the gearbox oil temperature is very high and the produced energy is lower, despite the clear convergence of the wind speeds, which suggests that there is a defect. The oil temperature did not reach the alarm threshold, so no alarm appeared even though the oil temperature was higher than normal. When this turbine was checked, wear was found in the bearing of the high-speed shaft, and there was a shock in the heat exchanger of the oil cooler, which reduced the cooling efficiency. This may have been the main cause of the bearing wear, which can reduce the lifespan of the gearbox.

Figure 3. Wind speed comparison, WTGs 2, 3, 4.

Figure 4. Power comparison, WTGs 2, 3, 4.

Figure 5. Gearbox oil temperature comparison, WTGs 2, 3, 4.
In turbine 43 (Figures 6, 7, 8), the figures show that the gearbox oil temperature is high, while the power and wind speed readings are very close to those of the two neighboring turbines. The turbine was checked, and a defect was found in the electric circuit of the gearbox oil pump, but it was not affecting the gearbox, which explains the similarity of the three turbines' readings.

Figure 6. Wind speed comparison, WTGs 42, 43, 44.

Figure 7. Power comparison, WTGs 42, 43, 44.

Figure 8. Gearbox oil temperature comparison, WTGs 42, 43, 44.
In turbine 134 (Figures 9, 10, 11), the figures show a great variation in the gearbox oil temperature relative to the neighboring turbines, despite the convergence of the produced power and the wind speed, which indicates a defect in the temperature sensor. The turbine was checked, and a defect was found in the electric circuit of the gearbox oil temperature sensor.

Figure 9. Wind speed comparison, WTGs 133, 134, 135.

Figure 10. Power comparison, WTGs 133, 134, 135.

Figure 11. Gearbox oil temperature comparison, WTGs 133, 134, 135.
Conclusions
This paper presented a new way to exploit SCADA data. A designed algorithm consisting of wind speed comparison, k-means clustering and a decision tree classifier is used to compare turbine performance. It takes the mode as the comparative reference for the parameters and continuously compares the performance with it. The approach relies mainly on the calculation of energy loss and abnormal temperature, with the aim of early fault detection.
In the case study, the method proved effective in detecting faults early through the above comparisons. It also reduced the effort needed to examine the turbine to find the cause of malfunctions. Although the paper focused mainly on the gearbox, the method is suitable for all other turbine components, given appropriate parameters. The method can also be used to determine the economic feasibility of continuing to operate a specific turbine component, using a neural network that predicts component performance and compares it to the reference; this is the next phase to be implemented.
Future work
In the future we will focus on comparing and measuring the remaining parameters, which will provide a stronger analysis and improve the accuracy and effectiveness of early fault detection. We will also design a neural network that predicts the performance of turbine components over a certain period, which can then be compared with the reference that was created to determine the remaining period in which it is economical to use the different components.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
