Feature ranking for multi-fault diagnosis of rotating machinery by using random forest and KNN

Abstract

Gearboxes and bearings play an important role in industrial machines for motion and torque transmission. Early diagnosis is therefore sought to avoid unplanned shutdowns, catastrophic damage to the machine, or human losses; in addition, an appropriate diagnosis helps to increase productivity and reduce maintenance costs. This paper presents a methodological framework for multi-fault diagnosis in rotating machinery based on feature ranking. Classification is performed with K-nearest neighbors (KNN) and random forest (RF), using information from the measured vibration signal. Thirty time-domain features are calculated from the vibration signal: twenty-four features commonly used in fault diagnosis of rotating machinery and six features taken from the field of electromyography. The feature ranking methods ReliefF, Chi-square, and information gain are used to select the ten most relevant features, which are then fed to the classifiers. Five databases were used to validate the proposed framework. The results show good classification accuracy for the five databases; furthermore, for every database, at least two nonconventional features appear among the top ten features ranked by each of the three ranking methods.

Keywords

Feature ranking; multi-fault diagnosis; rotating machinery; time features

1 Introduction

Gearboxes and bearings play an important role in industrial machines for motion and torque transmission. They are used in aircraft, automobiles, and wind turbines, among other applications. Breakdowns of rotating machinery are mostly caused by gear and bearing failures; hence, strategies are sought to avoid unscheduled stops or catastrophic damage, in order to reduce maintenance costs and increase reliability [1–3]. Due to the complicated configuration of a gearbox, it is challenging to recognize whether a fault exists and, if so, its failure pattern, because bearings and gears have different fault patterns and a fault can appear in both elements. Therefore, multi-fault diagnosis of rotating machinery has gained much attention as a means to guarantee safe industrial operation [4, 5].

Data-driven methodologies for condition monitoring use four basic stages to diagnose gearbox and bearing fault patterns: i) acquisition and conditioning of the signal, ii) feature extraction, iii) feature selection, iv) classification [6].

The signal can be acquired from various physical variables such as vibration [3, 8], infrared thermography [9], acoustic emission [10], and current [11], among others. Vibration signal analysis is the most commonly used technique for condition monitoring because vibration is easy to measure; furthermore, the machinery does not need to be stopped for diagnosis.

Features of the acquired signals can be extracted and analyzed in three domains: time, frequency, and time-frequency; the goal of feature extraction is to obtain the information relevant for understanding and interpreting the condition of the rotating machinery [12]. In the frequency domain, techniques such as the Fast Fourier Transform [13], cepstrum [14], and power spectral density [15], among others, are used. In the time-frequency domain, the Wavelet Transform (WT) is reported as a powerful signal analysis tool that has gained great attention in several fields, including rotating machinery fault detection and diagnosis [16, 17]; additionally, the Empirical Mode Decomposition (EMD) is one of the most powerful signal processing techniques and has been extensively studied and widely applied to fault detection and diagnosis of rotating machinery in this domain [18]. The frequency-domain and time-frequency-domain approaches employ complex algorithms to extract a representative set of features. Time-domain signals generally contain information on how the signal amplitude varies with time; their analysis can be computationally less expensive to implement, and the only preprocessing required is signal conditioning. The simplest form of this analysis consists of visually inspecting parts of the time waveform to detect abnormal behavior; however, the vibration signals produced by a machine contain many components that can be very difficult to distinguish in the time domain, so a fault is unlikely to be detected by simple visual inspection. Hence, instead of visually inspecting the signal, a set of statistical features, called condition indicators, can be extracted; these parameters can be compared with predefined thresholds to determine whether the machine is in a normal or faulty condition [19–21].

Feature selection is very important prior to the pattern recognition stage because it eliminates noisy, redundant, or irrelevant features, which considerably reduces the feature set and, in most cases, simplifies the classification task and improves the performance of the learning algorithms. Selecting an appropriate feature or set of features to reflect the condition of the equipment is a critical task in the diagnosis process. A good feature or set of features is expected to distinguish between normal and fault conditions; in addition, it allows a trend analysis to be established while avoiding the influence of other operating parameters of the equipment [22]. In most case studies, feature selection has been treated as a dimensionality reduction problem, for which techniques such as Principal Component Analysis, Multidimensional Scaling, Factor Analysis, Projection Pursuit, and Kernel Fisher Discriminant Analysis, among others, are used. However, these methods commonly generate synthetic features of lower dimension than the original set; consequently, the reduced features lack physical significance [23, 24]. Feature ranking methods such as Fisher score, ReliefF, Wilcoxon rank, gain ratio, memetic feature selection, Chi-square, and information gain have been used to select relevant features and improve the precision of failure diagnosis in rotating machinery [25], and particularly of bearing failures [26–29].

Different artificial intelligence techniques have been employed to distinguish the healthy condition from different failure patterns in rotating machinery. Artificial Neural Networks (ANN) have been used for fault detection in gearboxes since they can be trained for early detection of faults [30]. Recently, Random Forest (RF) has been used as a classification technique for diagnosis in several areas of engineering [31, 32]; RF is one of the most accurate machine learning techniques available, and it runs efficiently when the number of input features is large and the number of available samples is small [33, 34]. RF has been used in the diagnosis of failures in gearboxes [16] and bearings [29, 36]. The K-nearest neighbor (KNN) classifier has been studied extensively and used successfully in many pattern recognition tasks. However, KNN has serious problems when features of different classes overlap in some regions of the feature space, so it is a classifier that needs relevant features [3, 37]. Deep learning techniques have also been used for gearbox condition monitoring [5, 38]. Although the classification techniques reported in this section have achieved high accuracy, several parameters must be configured for each of them, especially the number of input features. Usually, the input features of the classifier must be extracted from two or three domains to obtain high classification accuracy, as reported in [7, 39–41], or be obtained from the time-frequency domain [16, 29]. However, feature extraction in two or three domains imposes a high computational load on signal processing; for this reason, the authors of this article propose using time-domain statistical features to classify failures in rotating machinery with less computational complexity in the signal processing.

In references [27, 42], features are extracted in the time domain considering statistical parameters and entropy as a measure of disorder; good results are reported in the classification of bearing failures. In reference [43], zero-crossing (ZC) features extracted from time-domain motor vibrations are shown to be useful for classifying bearing faults. ZC features have been used effectively in various signal processing and pattern recognition fields such as speech recognition, automobile classification, and biomedical applications. The simplicity of extracting such features makes them very attractive compared with features extracted from the frequency and time-frequency domains. Recently, the authors of [44] used six time-domain features that are normally used in the biomedical field together with six conventional features used in the diagnosis of rotating machinery; very promising results were obtained by including the nonconventional features in the classification of bearing faults. It is also emphasized there that, for bearings, the nonconventional features improve the classification accuracy compared with the conventional features.

In this paper, the authors present a methodological framework to select the most relevant feature set using the Chi-square, ReliefF, and information gain feature ranking methods; RF and KNN classifiers are used for the evaluation. This article uses thirty features, six of which (ZC, slope sign change, Wilson amplitude, log detector, wave length, and square root amplitude value) have not been reported in the literature for fault diagnosis in gearboxes. Five databases are used to evaluate the methodology and the relevance of the six new features.

The document is organized as follows. Section 2 presents the five experiments on rotating machinery used to evaluate the proposed methodological framework. Section 3 presents the theory used in this work. Section 4 introduces the proposed methodological framework for multi-fault diagnosis of rotating machinery using feature ranking methods and machine learning. Section 5 shows the diagnosis results obtained with time-domain feature ranking by following our framework, and finally, Section 6 presents some conclusions.

2 Data acquisition

Five vibration signal datasets were used to validate the proposed methodology. Four datasets correspond to experiments carried out on different configurations of a gearbox; the fifth dataset was obtained from the Case Western Reserve University (CWRU) Bearing Data Centre [45].

Four experiments were carried out on a gearbox fault diagnosis test bed. The gearbox can be configured with one stage (coupling Z1 and Z2, or Z3 and Z4) or two stages, as shown in Fig. 1. The gearbox to be diagnosed is driven by a motor through a coupling. The speed of the motor is controlled by a frequency inverter (DANFOSS VLT 1.5 kW). The output shaft of the gearbox is connected to an electromagnetic torque load (ROSATI 8.83 kW) through a belt transmission. The electromagnetic torque load is controlled by a torque controller (TDK-Lambda, GEN 100-15-IS510), which allows the torque of the load to be adjusted manually. The vibration signals of the gearbox are collected by an accelerometer, whose output is fed into a laptop through a data acquisition (DAQ) system.

Fig.1

Schematic of the experimental test bed for simulation of faults in rotating machinery.

Dataset 1 (DB1): in configuration #1, shown in Fig. 1, two spur gears (number of teeth Z₃ = 53 and Z₄ = 80, with module 2.25 and pressure angle 20°) were installed on the input and output shafts of the gearbox, respectively. Ten condition patterns were used: normal gears, 5 faulty input gears, 3 faulty output gears, and 1 with faults in both the input and output gears. The experiments were performed with three loads, three constant speeds, and three variable speeds, and five samples were collected for each combination; in this way, 900 vibration signals corresponding to the 10 condition patterns were obtained. More details of the experimentation can be found in references [24, 46].

Dataset 2 (DB2): in configuration #2, shown in Fig. 1, the input and output gears were helical gears (number of teeth Z₁ = 30 and Z₄ = 80, with module 2.25, pressure angle 20°, and helix angle 20°). An intermediate shaft with two helical gears (Z₂ = Z₃ = 45) was installed between the input and output shafts for the transmission. In this experiment, 11 condition patterns were used: normal gears, 6 faulty gears (1 input gear, 3 intermediate gears, and 2 output gears), 3 faulty bearings (with 1 inner race fault, 1 outer race fault, and 1 ball fault, respectively), and 1 eccentric bearing box. The experiments were performed with three constant loads and three speeds, and five samples were collected for each combination; the experimental data were acquired for each of the gearbox condition patterns, resulting in a dataset of 495 vibration signals. More details of the experimentation can be found in references [38, 46].

Dataset 3 (DB3): in configuration #3, shown in Fig. 1, two spur gears (number of teeth Z₁ = 27 and Z₂ = 53, with module 2.25 and pressure angle 20°) were installed on the input and output shafts of the gearbox, respectively. Seven condition patterns were used: normal gears, 2 faulty input gears, 3 faulty output gears, and 1 misalignment fault in the output gear. The experiments were performed with three loads and five constant speeds, and ten samples were collected for each combination; in this way, 1050 vibration signals corresponding to the 7 condition patterns were obtained. More details of the experimentation can be found in reference [34].

Dataset 4 (DB4): in configuration #4, shown in Fig. 1, the input gear was a helical gear (Z₁ = 30, with module 2.25, pressure angle 20°, and helix angle 20°) and the output gear was a helical gear with Z₂ = 45. In this experiment, 10 condition patterns were used to simulate the severity of tooth breakage in gear Z₁: normal gears and 9 severity levels of a broken tooth in Z₁, with Z₂ in healthy condition. The experiments were performed with 3 loads, 3 constant speeds, and 2 variable speeds, and 5 samples were collected for each combination; in this way, 750 vibration signals corresponding to the 10 condition patterns were obtained. More details of the experimentation can be found in references [47, 48].

Dataset 5 (DB5): obtained from the CWRU Bearing Data Centre [24] for the 6203-2RS JEM SKF deep groove ball bearing. The vibration signals were acquired with accelerometers placed at the 12 o'clock position on the bearing housing and sampled at 12 kHz. The signals were collected under 0 load at four rotation speeds: 1730, 1750, 1772, and 1797 rpm. Four condition patterns were simulated: i) inner race fault, ii) outer race fault, iii) ball fault, each with a fault diameter of 0.1778 mm, and iv) no fault. For each of the above operating conditions, 20 data acquisition experiments were performed; the data length of each signal was 2000 points.
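The signal counts reported for DB1 to DB4 can be cross-checked with a short calculation. The sketch below assumes that one signal is recorded per combination of condition pattern, load, speed, and repetition, and that constant and variable speeds simply add up; the helper name `n_signals` is introduced here only for illustration.

```python
def n_signals(conditions, loads, speeds, samples):
    # Assumption: one signal per (condition, load, speed, repetition) combination.
    return conditions * loads * speeds * samples

counts = {
    "DB1": n_signals(conditions=10, loads=3, speeds=3 + 3, samples=5),
    "DB2": n_signals(conditions=11, loads=3, speeds=3, samples=5),
    "DB3": n_signals(conditions=7, loads=3, speeds=5, samples=10),
    "DB4": n_signals(conditions=10, loads=3, speeds=3 + 2, samples=5),
}
print(counts)
```

Under these assumptions the products reproduce the 900, 495, 1050, and 750 signals stated for the four gearbox datasets.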

3 Background

3.1 Feature extraction

Several traditional techniques analyze the statistical properties of vibration signals to detect faults in rotating machinery [49]. Other fields, such as biomedical engineering, apply time-domain features to electromyography (EMG) signals [50]. These features could be combined with the traditional ones to improve the performance of fault classification systems.

For this purpose, a total of thirty features are obtained from each vibration signal, as shown in Table 1. Of these, twenty-four (T₁ to T₂₄) are features commonly used for the diagnosis of failures in rotating machinery [51–53], and the remaining six (T₂₅ to T₃₀) come from the analysis of EMG signals [50, 54].

Table 1
Time-domain features

Feature	Name	Formula	Feature	Name	Formula
T₁	Kurtosis	$\frac{N \sum_{i = 1}^{N} (x_{i} - \mu)^{4}}{[\sum_{i = 1}^{N} (x_{i} - \mu)^{2}]^{2}}$	T₁₆	Threshold entropy	$\begin{cases} 1, & \text{if } |x_{i}| > p \\ 0, & \text{elsewhere} \end{cases}$
T₂	Energy Operator	$\frac{N^{2} \sum_{i = 1}^{N} (\Delta y_{i} - \Delta \bar{y})^{4}}{[\sum_{i = 1}^{N} (\Delta y_{i} - \Delta \bar{y})^{2}]^{2}}$, where $\Delta y_{i} = x_{i + 1}^{2} - x_{i}^{2}$ and $\Delta \bar{y}$ is the mean of $\Delta y$	T₁₇	Sure entropy	$N - \#\{i \text{ such that } |x_{i}| \leq p\} + \sum_{i} \min(x_{i}^{2}, p^{2})$
T₃	Mean	$\frac{1}{N} \sum_{i = 1}^{N} x_{i}$	T₁₈	Shannon entropy	$- \sum_{i = 1}^{N} x_{i}^{2} \log(x_{i}^{2})$
T₄	Root mean square (RMS)	$\sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}$	T₁₉	CPT1	$\frac{\max_{i} |x_{i}|}{\text{SRAV}}$
T₅	Vmax	$\max(x_{i})$	T₂₀	CPT2	$\frac{\max_{i} |x_{i}|}{\text{RMS}}$
T₆	Clearance factor	$\frac{\text{Vmax}}{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}$	T₂₁	CPT3	$\frac{\max_{i} |x_{i}|}{\text{Mean\_abs}}$
T₇	Variance (Var)	$\frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \mu)^{2}$	T₂₂	CPT4	$\frac{\sum_{i = 1}^{N} \log(|x_{i}| + 1)}{N \cdot \log(\sigma + 1)}$
T₈	Standard deviation (STD)	$\sqrt{\frac{1}{N} \sum_{i = 1}^{N} (x_{i} - \mu)^{2}}$	T₂₃	CPT5	$\frac{\sum_{i = 1}^{N} \exp(x_{i})}{N \cdot \exp(\sigma)}$
T₉	Skewness	$\frac{N \sum_{i = 1}^{N} (x_{i} - \mu)^{3}}{\sigma^{3}}$	T₂₄	CPT6	$\frac{\sum_{i = 1}^{N} \sqrt{|x_{i}|}}{N \cdot \text{Var}}$
T₁₀	Shape factor	$\frac{\text{RMS}}{\frac{1}{N} \sum_{i = 1}^{N} |x_{i}|}$	T₂₅	Square root amplitude value (SRAV)	$\left(\frac{\sum_{i = 1}^{N} \sqrt{|x_{i}|}}{N}\right)^{2}$
T₁₁	Mean of absolute (Mean_abs)	$\frac{1}{N} \sum_{i = 1}^{N} |x_{i}|$	T₂₆	Slope sign change (SSC)	$\sum_{i = 2}^{N} \operatorname{step}[(x_{i} - x_{i - 1}) \cdot (x_{i} - x_{i + 1})]$
T₁₂	Impulse factor	$\frac{\text{Vmax}}{\frac{1}{N} \sum_{i = 1}^{N} |x_{i}|}$	T₂₇	Wilson amplitude	$\sum_{i = 1}^{N} \operatorname{step}(|x_{i} - x_{i + 1}| - T)$
T₁₃	Crest factor	$\frac{\text{Vmax}}{\text{RMS}}$	T₂₈	Log detector	$e^{\frac{1}{N} \sum_{i = 1}^{N} \log |x_{i}|}$
T₁₄	Norm entropy	$\sum_{i = 1}^{N} |x_{i}|^{p}$	T₂₉	Zero crossing	$\sum_{i = 1}^{N} \operatorname{step}[\operatorname{sign}(- x_{i} \cdot x_{i + 1})]$
T₁₅	Log energy entropy	$\sum_{i = 1}^{N} \log(x_{i}^{2})$, where $\log(0) = 0$	T₃₀	Wave length (WL)	$\sum_{i = 1}^{N} |x_{i + 1} - x_{i}|$

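where x_i is the i-th sample of the signal, for i = 1, 2, …, N, and N is the number of data points; μ and σ denote the mean (T₃) and the standard deviation (T₈) of the signal.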
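As an illustration of how some of the Table 1 features can be computed, the following minimal sketch evaluates a subset of them for a short signal. It is not the authors' implementation: the function names, the convention step(u) = 1 for u > 0 and 0 otherwise, and the truncation of sums at the last available pair of samples (where Table 1 refers to x_{i+1} or x_{i-1}) are assumptions made here.

```python
import math

def step(u):
    # Assumed convention: step(u) = 1 for u > 0, otherwise 0.
    return 1 if u > 0 else 0

def time_features(x, wamp_threshold=0.0):
    """Compute a subset of the Table 1 features for a list of samples x."""
    n = len(x)
    mean = sum(x) / n
    dev2 = sum((v - mean) ** 2 for v in x)
    dev4 = sum((v - mean) ** 4 for v in x)
    rms = math.sqrt(sum(v * v for v in x) / n)
    pairs = list(zip(x, x[1:]))  # consecutive samples (x_i, x_{i+1})
    return {
        "mean": mean,                                            # T3
        "rms": rms,                                              # T4
        "kurtosis": n * dev4 / dev2 ** 2,                        # T1 (non-constant signal)
        "crest_factor": max(x) / rms,                            # T13
        "srav": (sum(math.sqrt(abs(v)) for v in x) / n) ** 2,    # T25
        "slope_sign_change": sum(                                # T26, interior points only
            step((x[i] - x[i - 1]) * (x[i] - x[i + 1])) for i in range(1, n - 1)
        ),
        "wilson_amplitude": sum(                                 # T27
            step(abs(a - b) - wamp_threshold) for a, b in pairs
        ),
        "zero_crossing": sum(step(-a * b) for a, b in pairs),    # T29
        "wave_length": sum(abs(b - a) for a, b in pairs),        # T30
    }

features = time_features([1.0, -1.0, 1.0, -1.0], wamp_threshold=0.5)
print(features)
```

For the alternating test signal, every consecutive pair changes sign, so the zero-crossing count equals the number of pairs, and both interior samples are local extrema, which is what the slope sign change counts.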

3.2 Feature ranking

3.2.1 ReliefF algorithm

ReliefF is an extension of the Relief algorithm (which was designed for binary classification); it estimates the quality of features according to how well they discriminate between neighboring instances [55]. The algorithm starts by selecting a random instance and then searches for its k nearest instances of the same class and its k nearest instances of each different class. It then updates a weighting vector W, which gives more weight to the features that better differentiate between neighbors of different classes [28], and is defined by:

$W_{f} = P(\text{different value of } f \mid \text{nearest instances from different class}) - P(\text{different value of } f \mid \text{nearest instances from same class})$ (1) where W_f represents the weight of the feature f.
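The weight update behind Eq. (1) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: feature differences are normalized by the feature range, distances are the sum of those normalized differences, misses of each class are weighted by the class prior, and the names `relieff`, `n_iter`, and `seed` are hypothetical.

```python
import random
from collections import Counter

def relieff(X, y, k=3, n_iter=None, seed=0):
    """Return one ReliefF-style weight per feature (simplified sketch)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_iter = n_iter or n
    # Range of each feature, used to normalize differences to [0, 1].
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]
    prior = {c: cnt / n for c, cnt in Counter(y).items()}

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)  # randomly selected instance
        r, cr = X[i], y[i]
        for c in prior:
            # k nearest instances of class c, excluding the selected instance.
            near = sorted((j for j in range(n) if j != i and y[j] == c),
                          key=lambda j: dist(r, X[j]))[:k]
            if not near:
                continue
            # Hits (same class) decrease the weight; misses increase it.
            scale = -1.0 if c == cr else prior[c] / (1.0 - prior[cr])
            for f in range(d):
                total = sum(diff(f, r, X[j]) for j in near)
                w[f] += scale * total / (n_iter * len(near))
    return w

X = [[0.0, 0.0], [0.1, 0.5], [0.2, 1.0], [0.8, 0.0], [0.9, 0.5], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
weights = relieff(X, y, k=1)
print(weights)
```

In this toy example the first feature separates the two classes while the second takes the same values in both, so the first feature receives the larger weight.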

3.2.2 Chi-square

This test is used in statistics to test the independence of two events [56]. For feature ranking, it tests whether the occurrence of a specific feature and the occurrence of a specific class are independent. Thus, a feature that is independent of the class is discarded [29]. It can be computed by:

$\chi^{2} = \sum_{j = 1}^{k} \frac{(Y_{j} - u_{j})^{2}}{u_{j}},$ (2) where Y_j is the number of observations in class j, k is the number of classes, and u_j = N P_j is the expected value of Y_j, with N the total number of observations and P_j the probability of occurrence of class j.
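For a continuous feature, the Chi-square statistic is usually evaluated on a contingency table of discretized feature values against classes. The sketch below assumes the feature has already been binned (the binning scheme is not specified in this section) and takes the expected count of each cell as row total times column total divided by N; the name `chi2_score` is introduced here for illustration only.

```python
from collections import Counter

def chi2_score(feature_bins, labels):
    """Chi-square statistic between a discretized feature and the class labels."""
    n = len(labels)
    row = Counter(feature_bins)               # observations per feature bin
    col = Counter(labels)                     # observations per class
    cell = Counter(zip(feature_bins, labels)) # observed counts per (bin, class)
    score = 0.0
    for b in row:
        for c in col:
            expected = row[b] * col[c] / n    # count expected under independence
            score += (cell[(b, c)] - expected) ** 2 / expected
    return score

labels = [0, 0, 1, 1]
print(chi2_score(["lo", "lo", "hi", "hi"], labels))  # feature follows the class
print(chi2_score(["lo", "hi", "lo", "hi"], labels))  # feature independent of the class
```

A feature whose bins follow the class yields a large statistic, whereas a feature that is independent of the class yields zero and would be ranked last.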

3.2.3 Information gain

Entropy, denoted by H, is a metric that quantifies the amount of impurity in a data set. From it, an information gain (IG) measure can be defined that reflects the additional information about the class Y provided by the feature X, that is, the amount by which the entropy of Y decreases when X is known [57]. This measure is given by: $IG = H (Y) - H (Y \mid X) = H (X) - H (X \mid Y)$ (3)
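Eq. (3) can be illustrated with a short calculation on a discretized feature. The sketch assumes Shannon entropy with base-2 logarithms and a feature that has already been binned; the function names are chosen here for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature_bins, labels):
    """IG = H(Y) - H(Y | X) for a discretized feature X and class labels Y."""
    n = len(labels)
    conditional = 0.0
    for b, count in Counter(feature_bins).items():
        subset = [l for f, l in zip(feature_bins, labels) if f == b]
        conditional += count / n * entropy(subset)  # weighted entropy within bin b
    return entropy(labels) - conditional

labels = [0, 0, 1, 1]
print(information_gain(["lo", "lo", "hi", "hi"], labels))  # feature determines the class
print(information_gain(["lo", "hi", "lo", "hi"], labels))  # feature carries no class information
```

Swapping the two arguments gives the same value, which reflects the symmetry stated in Eq. (3).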

3.3 Classifiers

3.3.1 K-nearest neighbor (KNN)

It is a non-parametric classification model that uses a training dataset as the basis for classifying new samples from the test dataset by applying the nearest-neighbor criterion. This approach finds the k samples of the training set that are closest to the new test sample, which is then labeled according to the classes that predominate in that neighborhood [58].

Given a training set D of pairs (x, y), where x represents a sample and y its class, and a test sample z = (x′, y′), the algorithm calculates the distance between z and every training sample (x, y) ∈ D to obtain the list D_z ⊆ D of its k nearest neighbors. The class y′ assigned to the test sample x′ is based on a majority vote among the classes of its nearest neighbors: $y^{'} = \underset{v}{\operatorname{argmax}} \sum_{(x_{i}, y_{i}) \in D_{z}} I (v = y_{i})$ (4) where v is a class label, y_i is the class label of the i-th nearest neighbor, and I(·) is an indicator function that returns 1 if its argument is true and 0 otherwise.

One of the key points when using KNN is the choice of k: if it is too small, the classifier can be very sensitive to noise; if it is too large, the neighborhood can include many samples belonging to other classes. A more sophisticated approach to class labeling is therefore needed. It weights the contribution of each neighbor according to its distance to the sample to be classified, giving greater weight to the nearest neighbors. The weighting factor is the inverse square of the distance: $\omega_{i} = \frac{1}{d (x^{'}, x_{i})^{2}}$ (5)

In this way, the KNN algorithm is defined as follows: $y^{'} = \underset{v}{\operatorname{argmax}} \sum_{(x_{i}, y_{i}) \in D_{z}} \omega_{i} \times I (v = y_{i})$ (6) where the sum runs over the k nearest neighbors D_z of the test sample.
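The distance-weighted vote of Eqs. (5) and (6) can be sketched as follows. The example assumes the Euclidean distance for simplicity (the experiments in this paper use the cosine metric), gives a neighbor at zero distance infinite weight, and uses a hypothetical function name.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label); returns the label with the largest weighted vote."""
    # The k training samples closest to the query.
    neighbors = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        d = math.dist(x, query)
        # Inverse squared distance; an exact match dominates the vote.
        votes[label] += 1.0 / (d * d) if d > 0 else float("inf")
    return max(votes, key=votes.get)

train = [((0.0,), "A"), ((1.0,), "B"), ((1.2,), "B")]
print(weighted_knn_predict(train, (0.1,), k=3))
```

With k = 3 an unweighted majority vote would favor class B (two neighbors against one), whereas the weighting favors the single, much closer neighbor of class A.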

Another key point when using KNN is the selection of the distance metric. The Euclidean distance is the usual choice, but cosine similarity and other metrics such as Minkowski, correlation, and Chi-square, among others, can also be used [59].

3.3.2 Random forest

It is a powerful classifier based on an ensemble of decision trees (CART), each of which is built from a bootstrap sample of the input data and a random subset of the variables [60]. In this way, the ideas of bagging and random feature selection are combined, which helps to reduce variance and overfitting. Each tree gives a classification, and the random forest assigns each sample to the class that receives the majority of the votes from the trees.

The input data selected to build a tree are known as in-bag data, and the remaining data are the out-of-bag (OOB) observations [61]. For each tree t of the forest, OOB_t is its associated OOB sample. The classification error is defined by: $errForest = \frac{1}{n} \operatorname{Card} \{ i \in \{1, \dots, n\} \mid y_{i} \neq \hat{y}_{i} \}$ (7) where Card denotes the number of elements of a set, n is the number of samples, and $\hat{y}_{i}$ is the most common class predicted for sample i by the trees t for which i is in OOB_t.
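The aggregation in Eq. (7) can be sketched independently of how the trees are grown. The example below assumes that the in-bag indices and the per-tree predictions are already available (the toy values are made up for illustration), and it divides by the number of samples that are out-of-bag for at least one tree, which equals n whenever every sample is out-of-bag for some tree.

```python
from collections import Counter

def oob_error(y, in_bag, tree_predictions):
    """in_bag[t]: set of sample indices used to grow tree t;
    tree_predictions[t][i]: class predicted by tree t for sample i."""
    evaluated = 0
    errors = 0
    for i, true_label in enumerate(y):
        # Only trees that did not see sample i during training may vote.
        votes = Counter(
            preds[i] for bag, preds in zip(in_bag, tree_predictions) if i not in bag
        )
        if not votes:
            continue  # sample i is in-bag for every tree
        evaluated += 1
        y_hat = votes.most_common(1)[0][0]
        errors += int(y_hat != true_label)
    return errors / evaluated

y = [0, 1, 1]
in_bag = [{0, 1}, {1, 2}, {0, 2}]
tree_predictions = [[0, 1, 1], [1, 1, 1], [0, 1, 1]]
print(oob_error(y, in_bag, tree_predictions))
```

In this toy case each sample is out-of-bag for exactly one tree, and only the first sample is misclassified by its out-of-bag tree.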

4 Methodological framework

The vibration signals produced under healthy and faulty conditions of the machine are used to calculate 30 time-domain features. Finding the features that allow good diagnostic performance helps to optimize the learning process.

In order to analyze the contribution of the EMG attributes to fault diagnosis, a ranking stage is used; the best Z attributes are then evaluated with two classifiers, RF and KNN, and their performance is measured by the accuracy of each classifier.

The methodological process, shown in Fig. 2, consists of the following stages:

Fig.2

Methodological process.

Stage 1: acquisition and conditioning of the vibration signal of the rotating machinery.

Stage 2: calculation of 30 features for each signal as shown in Table 1.

Stage 3: form the feature vector W with the 30 features obtained in stage 2.

Stage 4: feature ranking is performed using three different methods: ReliefF, Chi-square, and information gain.

Stage 5: vector Z, corresponding to the first 10 features ranked by each of the ranking methods, is formed.

Stage 6: RF and KNN are used for the classification stage. Each feature group from the previous stage is evaluated. A 5-fold cross-validation scheme was implemented, with 4 folds for training and 1 for testing. RF was trained with 40 trees, the number of variables selected at random at each decision split was the square root of the total number of variables, and the out-of-bag error was used to evaluate training. KNN was configured with the cosine distance metric and 3 nearest neighbors.

Stage 7: the results delivered by each classifier are presented for each group of rankings method.

This methodology was evaluated for five different databases.
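Stages 5 and 6 can be sketched for the KNN branch as follows. This is a minimal illustration under stated assumptions rather than the code used in the experiments: the ranking is given as a list of feature indices ordered from most to least relevant, folds are assigned by sample index modulo the number of folds without shuffling, the vote is unweighted, and all function names are hypothetical.

```python
import math
from collections import Counter

def cosine_distance(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0 or nb == 0:
        return 1.0  # convention assumed here for zero vectors
    return 1.0 - dot / (na * nb)

def knn_predict(train_X, train_y, x, k=3):
    order = sorted(range(len(train_X)),
                   key=lambda i: cosine_distance(train_X[i], x))[:k]
    return Counter(train_y[i] for i in order).most_common(1)[0][0]

def cross_val_accuracy(X, y, ranking, z=10, folds=5, k=3):
    """Accuracy of KNN on the top-z ranked features with k-fold cross-validation."""
    top = ranking[:z]                                # Stage 5: form vector Z
    Xz = [[row[f] for f in top] for row in X]
    correct = 0
    for fold in range(folds):                        # Stage 6: one fold for testing
        test_idx = [i for i in range(len(Xz)) if i % folds == fold]
        train_idx = [i for i in range(len(Xz)) if i % folds != fold]
        train_X = [Xz[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, Xz[i], k) == y[i]
                       for i in test_idx)
    return correct / len(Xz)

# Toy data: two classes that point in different directions in the first two features.
X, y = [], []
for i in range(10):
    if i % 2 == 0:
        X.append([1.0 + 0.01 * i, 0.1, float(i % 3)])
        y.append("A")
    else:
        X.append([0.1, 1.0 + 0.01 * i, float(i % 3)])
        y.append("B")
print(cross_val_accuracy(X, y, ranking=[0, 1, 2], z=2))
```

The same loop could be repeated for each ranking method and each value of z to reproduce the kind of accuracy-per-ranking comparison described in Stage 7.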

5 Results and discussion

The objective of this study was to evaluate a methodological framework for ranking features in multi-fault diagnosis of rotating machinery using RF and KNN classifiers, and to assess the relevance of nonconventional features in fault diagnosis of rotating machinery. Five databases were used to evaluate the methodology and the relevance of the six nonconventional features.

For each database, thirty time-domain features were extracted. Six of them (ZC, slope sign change, Wilson amplitude, log detector, wave length, and square root amplitude value) have not been reported in the literature for fault diagnosis in gearboxes, and the remaining twenty-four are conventional features in fault diagnosis. The thirty features were ranked with the Chi-square, ReliefF, and information gain methods, and the top ten features from each method were then fed to the RF and KNN classifiers. Table 2 presents the results for each dataset: the top ten features per ranking method and the classification accuracy of RF and KNN. At least two nonconventional features appear in the top ten for every ranking method and every database. The maximum is five of the six, obtained for DB3 with the ReliefF algorithm. This shows that the nonconventional features are among the top ten for the different ranking methods. In particular, zero crossing and SSC are present in all five databases when ReliefF is used.

Table 2
Ranking and classifier accuracy results for each dataset

	ReliefF	Chi-Square	Information Gain
	Ranking	RF (%)	KNN (%)	Ranking	RF (%)	KNN (%)	Ranking	RF (%)	KNN (%)
DB1	1	Mean	44,8	19,8	1	Mean	44,8	19,8	1	Mean	44,6	19,8
	2	Zero crossing	69,4	43,6	2	CPT5	75,3	46,2	2	Zero crossing	67,7	45,9
	3	CPT4	89,0	78,9	3	Zero crossing	86,8	75,7	3	Skewness	86,1	76,4
	4	Skewness	93,8	94,0	4	Wave length	93,3	91,1	4	Vmax	93,0	92,2
	5	Shape factor	93,0	94,6	5	Shape factor	94,6	96,9	5	Threshold entropy	94,9	96,2
	6	Kurtosis	95,2	97,2	6	CPT4	96,3	98,2	6	Sure entropy	95,9	97,9
	7	Vmax	96,8	99,0	7	Skewness	97,2	98,9	7	Shape factor	97,4	98,8
	8	CPT3	97,4	98,9	8	Mean of absolute	96,9	99,2	8	Kurtosis	98,2	99,1
	9	SSC	97,9	99,2	9	CPT6	97,7	99,3	9	CPT4	98,6	99,1
	10	CPT1	97,7	99,1	10	Shannon entropy	97,8	99,3	10	Wilson Amplitude	97,8	99,3
DB2	1	Mean	45,5	10,1	1	Mean	45,5	10,1	1	Mean	45,9	10,1
	2	Skewness	57,8	39,4	2	SSC	64,8	40,2	2	CPT5	66,5	37,6
	3	SSC	84,4	69,9	3	Zero Crossing	87,8	76,8	3	Threshold Entropy	81,8	60,4
	4	Zero Crossing	92,1	91,9	4	Sure Entropy	92,1	89,7	4	Wilson Amplitude	86,3	74,5
	5	Shape Factor	90,5	97,1	5	Kurtosis	92,9	97,2	5	Sure Entropy	87,9	78,2
	6	CPT4	92,1	97,4	6	Threshold Entropy	95,6	98,4	6	SSC	92,5	86,9
	7	Kurtosis	96,6	97,9	7	CPT5	96,2	98,6	7	Kurtosis	94,1	93,9
	8	Crest Factor	96,4	96,9	8	Log Energy Entropy	96,4	98,6	8	Zero Crossing	97,6	98,4
	9	CPT2	96,8	95,2	9	Log Detector	97,8	98,4	9	Shape Factor	97,4	98,8
	10	CPT3	94,7	93,5	10	RMS	95,9	98,4	10	Skewness	97,9	98,4
DB3	1	SSC	33,8	14,3	1	SSC	32,9	14,3	1	SSC	32,9	14,3
	2	Zero Crossing	47,9	32,3	2	Norm Entropy	60,6	31,4	2	Zero Crossing	48,1	32,3
	3	Wave Length	76,8	62,1	3	SRAV	62,9	53,2	3	RMS	77,0	61,7
	4	Vmax	80,9	69,3	4	STD	69,2	62,0	4	Norm Entropy	78,5	64,5
	5	STD	82,4	71,4	5	Variance	70,1	64,0	5	STD	77,9	67,1
	6	RMS	82,2	72,3	6	Shannon Entropy	69,8	63,5	6	Mean of Absolute	81,5	70,5
	7	Norm Entropy	84,1	76,2	7	CPT6	70,2	65,2	7	Wave Length	80,4	72,2
	8	Mean of absolute	83,0	76,3	8	Mean of Absolute	69,4	65,0	8	SRAV	80,9	75,0
	9	SRAV	82,7	76,6	9	RMS	70,0	65,5	9	Log Detector	79,7	75,5
	10	Log Detector	82,6	76,6	10	Log Detector	69,6	67,1	10	Vmax	82,4	76,9
DB4	1	Skewness	18,5	10,0	1	Zero Crossing	21,2	10,0	1	Skewness	18,5	10,0
	2	Vmax	52,7	31,5	2	Skewness	54,7	27,5	2	Wave length	63,7	32,7
	3	Zero crossing	77,9	66,1	3	Wilson amplitude	81,9	64,4	3	Shape factor	84,9	70,9
	4	Impulse factor	86,3	84,1	4	Norm entropy	91,1	83,9	4	RMS	88,9	81,9
	5	Wave length	91,1	86,9	5	SRAV	88,9	85,3	5	STD	89,5	85,6
	6	Clearance factor	91,6	90,5	6	Kurtosis	94,1	91,5	6	CPT6	91,9	90,0
	7	Crest Factor	91,9	90,8	7	Vmax	94,4	93,2	7	Vmax	94,0	93,6
	8	CPT2	90,8	91,5	8	CPT5	94,8	93,3	8	Zero crossing	95,5	95,6
	9	Shape factor	93,2	93,5	9	EO	94,9	93,3	9	Variance	94,7	95,7
	10	SSC	95,6	93,5	10	Wave length	95,1	93,7	10	SRAV	95,6	95,6
DB5	1	Wilson amplitude	79,1	50,0	1	SSC	81,6	50,0	1	Wave length	79,4	50,0
	2	Shannon Entropy	85,9	77,5	2	Wilson amplitude	97,5	90,6	2	Shannon entropy	85,9	70,6
	3	Norm Entropy	82,8	79,7	3	Wave length	97,8	97,2	3	SRAV	85,0	81,3
	4	CPT6	82,8	86,1	4	Shannon entropy	97,8	97,5	4	Norm entropy	84,7	84,7
	5	SRAV	84,4	85,9	5	Sure entropy	97,8	97,5	5	Mean of absolute	84,7	84,1
	6	Mean of absolute	84,7	85,3	6	Norm entropy	97,5	97,5	6	CPT5	92,5	96,6
	7	Wave length	84,7	84,1	7	SRAV	97,8	97,5	7	SSC	97,5	96,9
	8	CPT5	89,7	89,4	8	Mean of absolute	97,5	97,5	8	Wilson amplitude	97,5	97,2
	9	SSC	97,5	97,8	9	RMS	97,8	97,5	9	CPT6	97,5	97,8
	10	Zero crossing	97,5	97,2	10	CPT6	97,5	96,9	10	RMS	97,5	97,5

Table 3 was built from Table 2 and gives the total number of times each feature appears among the first ten ranked features, over all databases and ranking methods. Zero crossing, SSC, and Wave length occupy the first positions, with 12, 11, and 9 repetitions respectively, which suggests that these features are the most influential for fault diagnosis in rotating machinery.

Table 3

Repetitions for each feature

Features	Repetitions	Features	Repetitions
Zero crossing	12	Shannon Entropy	5
SSC	11	CPT4	4
Wave length	9	Log Detector	4
Skewness	8	STD	4
SRAV	8	Sure Entropy	4
Mean of absolute	7	Threshold Entropy	3
Norm Entropy	7	CPT2	2
RMS	7	CPT3	2
Shape factor	7	Crest Factor	2
Vmax	7	Variance	2
CPT5	6	Clearance factor	1
CPT6	6	CPT1	1
Kurtosis	6	EO	1
Mean	6	Impulse factor	1
Wilson amplitude	6	Log Energy Entropy	1
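The tally behind Table 3 can be illustrated with a short Python sketch. It is not the authors' code: the helper name `count_repetitions` and the case-insensitive matching of feature names are assumptions made here, and only the three DB5 rankings from Table 2 are used as input.

```python
from collections import Counter

# Top-ten rankings for DB5, copied from Table 2.
db5_rankings = {
    "ReliefF": ["Wilson amplitude", "Shannon Entropy", "Norm Entropy", "CPT6",
                "SRAV", "Mean of absolute", "Wave length", "CPT5", "SSC",
                "Zero crossing"],
    "Chi-Square": ["SSC", "Wilson amplitude", "Wave length", "Shannon entropy",
                   "Sure entropy", "Norm entropy", "SRAV", "Mean of absolute",
                   "RMS", "CPT6"],
    "Information Gain": ["Wave length", "Shannon entropy", "SRAV",
                         "Norm entropy", "Mean of absolute", "CPT5", "SSC",
                         "Wilson amplitude", "CPT6", "RMS"],
}

def count_repetitions(rankings):
    """Count how often each feature appears across all top-ten lists."""
    counts = Counter()
    for features in rankings.values():
        # Lower-case the names so "Shannon Entropy" and "Shannon entropy" match.
        counts.update(name.lower() for name in features)
    return counts

db5_counts = count_repetitions(db5_rankings)
for name, n in db5_counts.most_common():
    print(f"{name}: {n}")
```

Applying the same tally to all fifteen lists (five databases, three ranking methods) should give the totals in Table 3, provided the feature names are spelled consistently.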

In DB1, the classification accuracy exceeds 95.0% from the sixth ranked feature of any ranking method, for both the RF and KNN classifiers. The highest accuracy is 99.3%, obtained with KNN and the first nine features ranked by the Chi-Square method. This is slightly higher than the accuracies reported in references [24, 46]. Reference [24] achieved 98.5% with a hierarchical feature selection based on relative dependency for gear fault diagnosis, using features extracted from the three signal domains. Reference [46] achieved 97.08% with multimodal deep support vector classification with homologous features applied to gearbox fault diagnosis, also processing the three domains of the signal.
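The reading of Table 2 used in this discussion can be reproduced with a few lines of Python. The sketch below is only an illustration: the DB1 accuracies are copied from Table 2, while the helper name `features_needed` and the "stays above the threshold" criterion are assumptions made here.

```python
# Accuracy (%) after entering the k-th ranked feature, k = 1..10, for DB1 (Table 2).
db1_accuracy = {
    ("ReliefF", "RF"): [44.8, 69.4, 89.0, 93.8, 93.0, 95.2, 96.8, 97.4, 97.9, 97.7],
    ("ReliefF", "KNN"): [19.8, 43.6, 78.9, 94.0, 94.6, 97.2, 99.0, 98.9, 99.2, 99.1],
    ("Chi-Square", "RF"): [44.8, 75.3, 86.8, 93.3, 94.6, 96.3, 97.2, 96.9, 97.7, 97.8],
    ("Chi-Square", "KNN"): [19.8, 46.2, 75.7, 91.1, 96.9, 98.2, 98.9, 99.2, 99.3, 99.3],
    ("Information Gain", "RF"): [44.6, 67.7, 86.1, 93.0, 94.9, 95.9, 97.4, 98.2, 98.6, 97.8],
    ("Information Gain", "KNN"): [19.8, 45.9, 76.4, 92.2, 96.2, 97.9, 98.8, 99.1, 99.1, 99.3],
}

def features_needed(curve, threshold):
    """Smallest k such that accuracy stays above `threshold` from k features on."""
    k = None
    # Walk backwards from the full feature set while the threshold is exceeded.
    for n in range(len(curve), 0, -1):
        if curve[n - 1] > threshold:
            k = n
        else:
            break
    return k  # None if even the full set does not exceed the threshold

needed = {key: features_needed(curve, 95.0) for key, curve in db1_accuracy.items()}
worst_case = max(needed.values())

# First (method, classifier, k) combination that reaches the highest accuracy.
best = (float("-inf"), None, None, None)
for (method, clf), curve in db1_accuracy.items():
    for k, acc in enumerate(curve, start=1):
        if acc > best[0]:
            best = (acc, method, clf, k)

print(worst_case)  # 6
print(best)        # (99.3, 'Chi-Square', 'KNN', 9)
```

Both values agree with the text: six features suffice for every ranking method and classifier, and the maximum is first reached by KNN with nine Chi-Square features.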

In DB2, the accuracy exceeds 96.0% from the eighth ranked feature of any ranking method, for both the RF and KNN classifiers. The highest accuracy, 98.8%, is achieved with KNN and the first nine features ranked by the Information Gain method. This is higher than the accuracy reported in reference [46] and slightly higher than that in reference [38]. Reference [46] obtained 88.41% with multimodal deep support vector classification with homologous features applied to gearbox fault diagnosis, processing the three domains of the signal. Reference [38] obtained 97.68% with a diagnosis based on deep random forest fusion of acoustic and vibratory signals.

For the DB3 case study, the RF classifier exceeds 80% accuracy from the sixth ranked feature with the ReliefF and Information Gain rankings. For the KNN classifier, 76.9% accuracy is achieved by entering the ten ranked features. The features ranked by the Chi-Square method give the lowest classification accuracy for both RF and KNN. This database was also studied in references [16, 34], where accuracies above 94% were reached, but with 256 and 185 features respectively. Those features were extracted in the time-frequency domain, specifically the energy of the wavelet packet coefficients.

In DB4, the accuracy exceeds 93.0% from the ninth ranked feature of any ranking method, for both the RF and KNN classifiers. The highest accuracy in Table 2 is 95.7%, obtained with KNN and the first nine features ranked by the Information Gain method. This is similar to the 95.5% reported in reference [47], which proposed automatic feature extraction of time series applied to fault severity assessment of a helical gearbox in stationary and non-stationary speed operation.

For DB5, the first three features ranked by the Chi-Square method are enough to reach a classification accuracy of about 97.5% with both RF and KNN. Reaching this accuracy with the remaining two ranking methods requires the first nine features.
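The accuracy columns discussed above come from entering the ranked features one at a time and re-evaluating the classifier after each addition. The Python sketch below illustrates that loop under simplifying assumptions: a plain KNN with leave-one-out evaluation stands in for the RF and KNN setups of the paper, the toy data set is synthetic, and the names `knn_predict`, `loo_accuracy`, and `accuracy_curve` are hypothetical.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of the KNN classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)

def accuracy_curve(X, y, ranking, k=3):
    """Accuracy obtained with the first 1, 2, ... features of `ranking`."""
    curve = []
    for n in range(1, len(ranking) + 1):
        cols = ranking[:n]
        X_sub = [[row[c] for c in cols] for row in X]
        curve.append(loo_accuracy(X_sub, y, k))
    return curve

# Synthetic toy data: four classes on a 2x2 grid, so each feature alone
# separates only two groups of classes and both are needed to tell all four apart.
X, y = [], []
for label in range(4):
    for j in range(10):
        X.append([(label // 2) * 10 + 0.1 * j, (label % 2) * 10 + 0.1 * j])
        y.append(label)

curve = accuracy_curve(X, y, ranking=[0, 1], k=3)
print(curve)
```

As in the first rows of Table 2, the single top-ranked feature gives a low accuracy on this toy problem, and the accuracy rises once the second feature is entered.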

6 Conclusions

This paper presented a methodological framework for selecting the optimal number of features using the Chi-Square, ReliefF, and Information Gain feature ranking methods. The framework was evaluated with the RF and KNN classifiers on five databases.

For each of the five databases, 30 features were calculated in the time domain: 24 are conventional features and 6 are nonconventional features that are generally used in the EMG field.

The features calculated for each database were ranked by the Chi-Square, Information Gain, and ReliefF methods. Table 2 shows that, in all five databases and for all three ranking methods, at least two nonconventional features were present among the first ten features. In addition, the nonconventional features ZC, SSC, WL, and SRAV appeared most often among the first ten features of the different ranking methods. This indicates that including the nonconventional features is relevant for fault diagnosis in rotating machinery.
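As an illustration of how light these nonconventional features are to compute, the Python sketch below implements ZC, SSC, and WL from a raw signal. It assumes the common threshold-free definitions from the EMG literature; the exact formulas used in this work may differ, for example by including an amplitude threshold, and the function names are chosen here for illustration.

```python
def zero_crossings(x):
    """ZC: number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def slope_sign_changes(x):
    """SSC: number of samples that are strict local maxima or minima."""
    return sum(1 for a, b, c in zip(x, x[1:], x[2:]) if (b - a) * (b - c) > 0)

def waveform_length(x):
    """WL: cumulative absolute difference between consecutive samples."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))

signal = [1, -1, 2, -2, 1]
print(zero_crossings(signal))      # 4
print(slope_sign_changes(signal))  # 3
print(waveform_length(signal))     # 12
```

Each feature needs a single pass over the samples, with no transform to the frequency or time-frequency domain.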

On the other hand, for DB1, DB2, DB4, and DB5 the classification accuracy of RF and KNN was high, whereas for DB3 83.0% was reported with RF and 76.9% with KNN. This indicates that the methodology can be improved by including new features in the time, frequency, or time-frequency domains. As noted in the discussion of DB3, good results were obtained in references [16, 34], but they employ a large number of features in the time-frequency domain.

The methodology is easy to implement compared with the works referenced in the discussion, which used the same case studies. The results suggest that nonconventional features help to improve the classification accuracy, and that an online implementation for fault diagnosis in rotating machinery would be feasible, in particular because the features are processed only in the time domain.

Acknowledgments

This work was funded by the Universidad Politécnica Salesiana under grant No. 002-002-2016-03-03. The experimental work was developed at the GIDTEC research group lab of the Universidad Politécnica Salesiana de Cuenca, Ecuador.

References

1. Galar, Thaduri, Catelani, Ciani, Context awareness for maintenance decision making: A diagnosis and prognosis approach, Measurement 67 (2015), 137–150.

2. Lei, Lin, Zuo M.J., He, Condition monitoring and fault diagnosis of planetary gearboxes: A review, Measurement 48 (2014), 292–305.

3. Lei, Zuo M.J., Gear crack level identification based on weighted K nearest neighbor classification algorithm, Mechanical Systems and Signal Processing 23 (5) (2009), 1535–1547.

4. Hussain, Fuzzy information system for condition based maintenance of gearbox, Journal of Intelligent & Fuzzy Systems 28 (6) (2015), 2509–2518.

5. , Sánchez R.-V., Zurita, Cerrada, Cabrera, Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning, Sensors 16 (6) (2016), 895.

6. Niu, Data-driven Technology for Engineering System Health Management: Design Approach, Feature Construction, Fault Diagnosis, Prognostics, Fusion and Decisions, Springer, 2016.

7. Cerrada, Sánchez R.V., Cabrera, Zurita, Li, Multi-stage feature selection by using genetic algorithms for fault diagnosis in gearboxes based on vibration signal, Sensors 15 (9) (2015), 23903–23926.

8. , Valente de Oliveira, Sanchez R.-V., Cerrada, Zurita and Cabrera, Fuzzy determination of informative frequency band for bearing fault detection, Journal of Intelligent & Fuzzy Systems 30 (6) (2016), 3513–3525.

9. Younus A.M.D., Yang B.-S., Intelligent fault diagnosis of rotating machinery using infrared thermal image, Expert Systems with Applications 39 (2) (2012), 2082–2091.

10. Elasha, Greaves, Mba, Fang, A comparative study of the effectiveness of vibration and acoustic emission in diagnosing a defective bearing in a planetry gearbox, Applied Acoustics 115 (2017), 181–195.

11. Mohanty A.R., Kar, Fault detection in a multistage gearbox by demodulation of motor current waveform, IEEE Transactions on Industrial Electronics 53 (4) (2006), 1285–1297.

12. Liu, Chen, Dong, Zero crossing and coupled hidden Markov model for a rolling bearing performance degradation assessment, Journal of Vibration and Control 20 (16) (2014), 2487–2500.

13. Feng, Ma, Zuo M.J., Amplitude and frequency demodulation analysis for fault diagnosis of planet bearings, Journal of Sound and Vibration 382 (2016), 395–412.

14. Ziaran, Darula, Determination of the state of wear of high contact ratio gear sets by means of spectrum and cepstrum analysis, Journal of Vibration and Acoustics 135 (2) (2013), 021008.

15. Mohammed A.A., Neilson R.D., Deans W.F., MacConnell, Crack detection in a rotating shaft using artificial neural networks and PSD characterization, Meccanica 49 (2) (2014), 255–266.

16. Cabrera, Sancho, Sánchez R.-V., Zurita, Cerrada, Li and Vásquez R.E., Fault diagnosis of spur gearbox based on random forest and wavelet packet decomposition, Frontiers of Mechanical Engineering 10 (3) (2015), 277–286.

17. Yan, Gao R.X., Chen, Wavelets for fault diagnosis of rotary machines: A review with applications, Signal Process 96 (2014), 1–15.

18. Lei, Lin, He, Zuo M.J., A review on empirical mode decomposition in fault diagnosis of rotating machinery, Mechanical Systems and Signal Processing 35 (2013), 108–126.

19. Tom K.F., Survey of diagnostic techniques for dynamic components, Army Research Lab Adelphi Md Sensors And Electron Devices Directorate, 2010.

20. Jardine A.K.S., Lin, Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mechanical Systems and Signal Processing 20 (2006), 1483–1510.

21. Ninoslav Z.F., Rusmir, Cvetkovic, Vibration feature extraction methods for gear faults diagnosis-a review, Facta Universitatis, Series: Working and Living Environmental Protection 12 (2015), 63–72.

22. Ang J.C., Mirzal, Haron, Hamed H.N.A., Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection, IEEE/ACM Transactions on Computational Biology and Bioinformatics 13 (5) (2016), 971–989.

23. Bartkowiak, Zimroz, Dimensionality reduction via variables selection – Linear and nonlinear approaches with application to vibration-based condition monitoring of planetary gearbox, Applied Acoustics 77 (2014), 169–177.

24. Cerrada, Sánchez R.-V., Pacheco, Cabrera, Zurita, Li, Hierarchical feature selection based on relative dependency for gear fault diagnosis, Applied Intelligence 44 (2016), 687–703.

25. Liu, Zuo M.J., Xu, Feature ranking for support vector machine classification and its application to machinery fault diagnosis, Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 227 (2013), 2077–2089.

26. Vakharia, Gupta V.K., Kankar P.K., A comparison of feature ranking techniques for fault diagnosis of ball bearing, Soft Computing 20 (4) (2015), 1601–1619.

27. Sharma, Amarnath, Kankar, Feature extraction and fault severity classification in ball bearings, Journal of Vibration and Control 22 (1) (2016), 176–192.

28. Vakharia, Gupta V.K., Kankar P.K., Bearing fault diagnosis using feature ranking methods and fault identification algorithms, Procedia Engineering 144 (2016), 343–350.

29. Vinay, Kumar G.V., Kumar K.P., Application of chi square feature ranking technique and random forest classifier for fault classification of bearing faults, Proceedings of the 22nd International Congress on Sound and Vibration, Florence, Italy (2015), 12–16.

30. Pacheco, de Oliveira J.V., Sánchez R.-V., Cerrada, Cabrera, Li, Zurita and Artés, A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions, Neurocomputing 194 (2016), 192–206.

31. Menze B.H., Kelm B.M., Masuch, Himmelreich, Bachert, Petrich and Hamprecht F.A., A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data, BMC Bioinformatics 10 (2009), 213.

32. Liu, Wang, Li, Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification: Application to the recognition of orange beverage and Chinese vinegar, Sensors and Actuators B: Chemical 177 (2013), 970–980.

33. Caruana, Karampatziakis, Yessenalina, An empirical evaluation of supervised learning in high dimensions, Proceedings of the 25th international conference on Machine learning (2008), 96–103.

34. Cerrada, Zurita, Cabrera, Sánchez R.-V., Artés and Li, Fault diagnosis in spur gears based on genetic algorithm and random forest, Mechanical Systems and Signal Processing 70 (2016), 87–103.

35. Han, Jiang, Rolling bearing fault diagnostic method based on VMD-AR model and random forest classifier, Shock and Vibration 2016 (2016).

36. Satishkumar, Sugumaran, Vibration based health assessment of bearings using random forest classifier, Indian Journal of Science and Technology 9 (2016).

37. Wang, K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited, Mechanical Systems and Signal Processing 70 (2016), 201–208.

38. , Sanchez R.-V., Zurita, Cerrada, Cabrera and Vásquez R.E., Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals, Mechanical Systems and Signal Processing 76 (2016), 283–293.

39. Pacheco, de Oliveira J.V., Sánchez R.-V., Cerrada, Cabrera, Li, Zurita and Artés, A statistical comparison of neuroclassifiers and feature selection methods for gearbox fault diagnosis under realistic conditions, Neurocomputing 194 (2016), 192–206.

40. Pacheco, Cerrada, Sánchez R.V., Cabrera, Li and de Oliveira J.V., A methodological framework using statistical tests for comparing machine learning based models applied to fault diagnosis in rotating machinery, Computational Intelligence (LA-CCI), 2016 IEEE Latin American Conference on (2016), 1–6.

41. Yang B.-S., Di, Han, Random forests classifier for machine fault diagnosis, Journal of Mechanical Science and Technology 22 (2008), 1716–1725.

42. Patel R.K., Giri V.K., Feature selection and classification of mechanical fault of an induction motor using random forest classifier, Perspectives in Science 8 (2016), 334–337.

43. William P.E., Hoffman M.W., Identification of bearing faults using time domain zero-crossings, Mechanical Systems and Signal Processing 25 (2011), 3078–3088.

44. Nayana B.R., Geethanjali, Analysis of statistical time-domain features effectiveness in identification of bearing faults from vibration signal, IEEE Sensors Journal 17 (2017), 5618–5625.

45. Loparo K.A., Bearing vibration data set, (2003). https://csegroups.case.edu/bearingdatacenter/pages/download-data-file.

46. , Sanchez R.-V., Zurita, Cerrada, Cabrera and Vásquez R.E., Multimodal deep support vector classification with homologous features and its application to gearbox fault diagnosis, Neurocomputing (2015), 119–127.

47. Cabrera, Sancho, Li, Cerrada, Sánchez R.-V., Pacheco and de Oliveira J.V., Automatic feature extraction of time-series applied to fault severity assessment of helical gearbox in stationary and non-stationary speed operation, Applied Soft Computing 58 (2017), 53–64.

48. Pacheco, Cerrada, Sánchez R.-V., Cabrera, Li and Valente de Oliveira, Attribute clustering using rough set theory for feature selection in fault severity classification of rotating machinery, Expert Systems with Applications 71 (2017), 69–86.

49. Zhao, Zuo M.J., Liu, Diagnosis of pitting damage levels of planet gears based on ordinal ranking, Prognostics and Health Management (PHM), 2011 IEEE Conference on (2011), 1–8.

50. Phinyomark, Limsakul, Phukpattaranont, A novel feature extraction for robust EMG pattern recognition, arXiv preprint arXiv:0912.3973 (2009).

51. Bagheri, Ahmadi, Labbafi, Application of data mining and feature extraction on intelligent fault diagnosis by artificial neural network and k-nearest neighbor, Electrical Machines (ICEM), 2010 XIX International Conference on (2010), 1–7.

52. Lei, He, Zi, A new approach to intelligent fault diagnosis of rotating machinery, Expert Systems with Applications 35 (4) (2008), 1593–1600.

53. Sharma, Amarnath, Kankar, Feature extraction and fault severity classification in ball bearings, Journal of Vibration and Control 22 (1) (2016), 176–192.

54. Chowdhury, Reaz, Ali, Bakar, Chellappan, Chang, Surface electromyography signal processing and classification techniques, Sensors 13 (9) (2013), 12431–12466.

55. Robnik-Šikonja and Kononenko, Theoretical and empirical analysis of ReliefF and RReliefF, Machine Learning 53 (1-2) (2003), 23–69.

56. Plackett R.L., Karl Pearson and the chi-squared test, International Statistical Review/Revue Internationale de Statistique (1983), 59–72.

57. Novakovic, Strbac, Bulatovic, Toward optimal feature selection using ranking methods and classification algorithms, Yugoslav Journal of Operations Research 21 (1) (2011), 119–135.

58. , Kumar, Ross Quinlan, Ghosh, Yang, Motoda, McLachlan G.J., Ng, Liu, Yu P.S., Zhou Z.-H., Steinbach, Hand D.J. and Steinberg, Top 10 algorithms in data mining, Knowledge and Information Systems 14 (1) (2008), 1–37.

59. L.-Y., Huang M.-W., Ke S.-W., Tsai C.-F., The distance function effect on k-nearest neighbor classification for medical datasets, SpringerPlus 5 (1) (2016), 1304.

60. Breiman, Random forests, Machine Learning 45 (1) (2001), 5–32.

61. Genuer, Poggi J.-M., Tuleau-Malot, Villa-Vialaneix, Random forests for big data, Big Data Research 9 (2017), 28–46.