A semi-supervised approach based on evolving clusters for discovering unknown abnormal condition patterns in gearboxes

Abstract

Fault diagnosis plays a crucial role in maintaining healthy conditions in rotating machinery. In real industrial applications, a Machine Learning based Classifier (ML-C) analyses data from the current machinery condition to detect abnormal behaviours. Usually, this is achieved through previous training of the ML-C model under supervised learning; however, the classifier is not able to correctly identify machinery conditions that were absent from its training data. This paper proposes a framework to detect new patterns of abnormal conditions in gearboxes that could be associated with new faults. The framework relies on an algorithm to build evolving models in simultaneous scenarios of classification and clustering. The design is inspired by the main principles of the K-means and the One Nearest Neighbour (1-NN) algorithms. A heuristic metric is defined to analyse the newly discovered clusters; as a result, these new clusters can be labelled as new classes corresponding to new faulty patterns. Once a new pattern is identified, the associated data feeds a dedicated supervised classifier, which is updated through a new training phase. The proposed framework is tested on data collected from a gearbox test bed under realistic fault conditions. Experimental results show that the algorithm is able to discover valuable new knowledge that can be identified as new faulty classes.

Keywords

Knowledge discovery; machine learning; semi-supervised learning; fault detection; fault diagnosis; gearboxes

1 Introduction

Fault diagnosis in rotating machinery is classically addressed by data-driven classification under supervised learning [6]. Several works follow this approach, such as the recent works in [3, 16], where deep learning and deep convolutional neural networks are used for feature fusion and extraction, and a Support Vector Machine (SVM) is used for classification. A multidimensional hybrid intelligent method for gear fault diagnosis is reported in [24], where several classifiers are trained and their classification results are combined with a genetic algorithm to obtain better performance. The proposal in [22] uses a multi-class SVM for rolling bearing fault diagnosis. In [14, 27], supervised approaches combined with optimization processes are developed to obtain classifiers with high separability and compactness of the different classes.

Unfortunately, supervised approaches for data-driven fault diagnosis are not fully useful in real industrial applications, because all the faulty modes are hard to characterize in advance in order to build a good fault classifier. When a classifier analyses an unlabelled sample, this sample is assigned to the nearest known class. This classification is wrong whenever the real machinery condition was not considered in the classifier training phase. For this real case, a procedure based on knowledge extraction is needed to detect whether a new sample could belong to a new pattern.

Semi-supervised learning is used to exploit unlabeled data in the field of fault diagnosis. The work in [28] analyses labeled and unlabeled data to build classification models for fault detection based on SVM under a semi-supervised scenario. The approach in [25] uses manifold regularization based on semi-supervised learning in order to detect faults by using labeled and unlabeled data. A recent work also addressing the semi-supervised approach by using dictionary learning in fault diagnosis of rotating machinery is developed in [12]. Semi-supervised fuzzy C-means cluster analysis for fault diagnosis is performed in [4].

On the other hand, approaches for discovering faulty patterns in rotating machinery have also been reported. In [18], a neuro-fuzzy system is used to discover knowledge rules from vibration signals associated with faulty bearings. A rule interpretation process allows detecting new faulty behaviours. The proposals in [1, 26] develop unsupervised approaches for fault detection and diagnosis with cluster analysis of vibration signals. Clustering techniques are mainly studied in these types of problems due to their ability to find hidden patterns in data with unknown symptoms [8]. Unsupervised learning algorithms are extended in [13] for machine defect classification, based on some improvements of local and global regressive mapping for machine fault diagnosis.

Knowledge discovery is one of the main focuses of semi-supervised and unsupervised approaches for data-driven applications. Under unsupervised learning, different approaches aim at discovering new clusters that might be associated with new unknown classes. This type of learning is commonly called ‘partial supervision’ [21]. Within the semi-supervised strategy, the most common approaches are oriented to: (i) generate a model with labelled samples, and then evaluate the unlabelled samples over the model [25], and (ii) build an unsupervised model with the unlabelled examples, whose clusters are then analysed by a supervised scheme to define the classes [23, 28]. Recent works have also included evolving properties in the clustering and classification algorithms to reconfigure the cluster and class structures under on-line and data stream scenarios [2, 17]. The main idea is to control the formation of clusters for new knowledge discovery.

This work deals with the problem of discovering new faulty patterns from unlabelled samples that are collected from the monitoring process, by proposing a semi-supervised framework for building fault classifiers that are updated in an on-line manner. This is accomplished by using a new Heuristic Algorithm for Evolving Models in scenarios of Classification and Clustering (HHA-EMCC). HHA-EMCC is able to combine the supervised and unsupervised concepts of 1-NN and K-means, under some criteria. HHA-EMCC follows some guidelines to build classes with labelled data and to discover new patterns from the unlabelled data. These new clusters are evaluated by a metric, which identifies whether the clusters are new classes, i.e. new patterns, or whether they are composed of inconclusive or noisy samples. A dedicated supervised classification model runs in parallel to HHA-EMCC, and it is updated whenever new clusters are discovered, i.e., new patterns feed the classification model for a new training phase. Inconclusive samples are also treated by the classification models, as it is assumed that all the new incoming unlabelled samples belong to the current operational condition. The proposed framework is applied to detect new faulty conditions in gearboxes. Results show that new clusters associated with new faulty conditions are adequately identified.

This paper is organized as follows. Section 2 introduces the main characteristics of HHA-EMCC to build hybrid evolving models. Section 3 details the proposed semi-supervised framework. Section 4 presents the experimental procedure to collect the dataset that will be used to test our proposal. Section 5 evaluates our proposal in different scenarios, and addresses the results analysis. Finally, Section 6 concludes this work and outlines future research directions.

2 Background

HHA-EMCC is an algorithm inspired by the proposal in [19]. This algorithm aims at combining the principles of the 1-NN and K-means approaches to solve: (i) classification tasks with labelled samples, (ii) clustering tasks with unlabelled samples, and (iii) both classification and clustering in the same scenario with labelled and unlabelled samples. These three main features are provided by the same ML model, which in turn is composed of clusters, classes, or both. In general, clusters and classes will be denoted as groups. In this sense, HHA-EMCC can create hybrid models with classes and clusters simultaneously.

Let X = { (x₁, y₁), ⋯, (x_n, y_n) } be the available dataset, with x_i a sample of m-dimensional features and y_i defined by the expression in (1). $y_{i} = \begin{cases} 0 & \text{if } x_{i} \text{ is unlabelled} \\ l_{i} & \text{if } x_{i} \text{ is labelled} \end{cases}$ (1)

where l_i is a label.

Definition 1. A group P_r is a collection of data samples described by the tuple in (2), $P_{r} = (C_{r}, t_{r}, {\bar{X}}_{r}, type, name)$ (2) where C_r is the centroid, t_r is the neighbour threshold, ${\bar{X}}_{r}$ is the set of samples (x_i, y_i) in P_r, type is either ‘class’ or ‘cluster’, and name is the identifier of P_r. If type = class then name = l_r, where l_r is the label concatenating all the different labels y_i = l_i of the samples $x_{i} \in {\bar{X}}_{r}$. Otherwise, type = cluster and name = Cluster_id. The threshold t_r is a measure of the group compactness, and it is given by $t_{r} = \frac{\sum_{i = 1}^{n_{r}} \sum_{j = 1, j \neq i}^{n_{r}} | x_{i} - x_{j} |}{n_{r}}$, where $x_{i}, x_{j} \in {\bar{X}}_{r}$ and n_r is the number of samples in P_r.
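As an illustration of Definition 1, the following minimal Python sketch stores a group and derives its centroid and threshold from the stored samples. The class name `Group`, the use of the sample mean as centroid and the rule that a single labelled sample makes the group a class are assumptions made for this sketch; the distance is taken as Euclidean, in line with the similarity measure used later in this section.

```python
import math
from dataclasses import dataclass, field

UNLABELLED = 0  # y_i = 0 marks an unlabelled sample, as in Eq. (1)


@dataclass
class Group:
    """Sketch of the tuple P_r = (C_r, t_r, X_r, type, name)."""
    samples: list = field(default_factory=list)  # list of (x_i, y_i) pairs
    name: str = ""

    @property
    def kind(self):
        # Assumption: one labelled sample is enough to make the group a class.
        labelled = any(y != UNLABELLED for _, y in self.samples)
        return "class" if labelled else "cluster"

    @property
    def centroid(self):
        # Assumption: C_r is the component-wise mean of the samples.
        xs = [x for x, _ in self.samples]
        return [sum(col) / len(xs) for col in zip(*xs)]

    @property
    def threshold(self):
        # t_r: sum of pairwise Euclidean distances (i != j) divided by n_r.
        xs = [x for x, _ in self.samples]
        total = sum(math.dist(a, b)
                    for i, a in enumerate(xs)
                    for j, b in enumerate(xs) if i != j)
        return total / len(xs)


g = Group(samples=[((0.0, 0.0), 0), ((3.0, 4.0), 0)], name="Cluster_1")
print(g.kind, g.centroid, g.threshold)
```

With two unlabelled samples at distance 5, the double sum counts the pair twice, so the threshold equals 10 / 2 = 5.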

2.1 Model principles

The HHA-EMCC design allows the model to reconfigure its structure (if needed) in the presence of a new incoming sample. More precisely, the core micro-processes for training and testing the model remain active even when the model is already working in an on-line manner. The steps to build and reconfigure the model are: (i) in an off-line manner, the training set provides some parameters to the model in order to guide the process, (ii) the input data, either the training set or an incoming sample, is evaluated by a sample assignment process which decides whether the sample belongs to an existing group or a new group has to be created, (iii) after a sample is assigned to a group, the similarity between this group and its neighbour group is computed. This evaluation can activate the merging or the split process of the current groups in the model.

Figure 1 shows the workflow, where the main result of the training phase is a model composed of groups (clusters and/or classes). The result of the sample assignment, split or merging process modifies the model (these connections are highlighted with blue arrows).

Fig.1

Workflow of the proposed algorithm.

The definitions and functionalities of HHA-EMCC were already given by the authors in [19], in which the workflow in Fig. 1 was used to cluster similar features, also called attributes. The HHA-EMCC principles remain the same as presented in [19], with the following differences: (i) features are replaced by samples, (ii) the similarity between samples is computed through the Euclidean distance. The most important characteristics of each component in Fig. 1 are described as follows.

2.1.1 Sample assignment

The main K-means and 1-NN basis are combined to find the similarity between an incoming sample and the classes or clusters in the model. Given a sample x_i, the similarity is computed through three procedures in the following order:

The centroid similarity proposed by K-means is used to assign the sample x_i into a provisional group P_r. This allows identifying the region where the nearest sample, and neighbour group to x_i may be located.

1-NN is used to locate the nearest sample $x_{{nn}_{P_{r}}}^{i}$ to x_i, considering only the samples in the provisional group P_r assigned to x_i and in its neighbour group P_v. This local search replaces the computation of distances over the whole sample space that the classic 1-NN algorithm performs.

Thereafter, the final assignment is performed in the proper group P_j which could be P_r or P_v. P (x_i) = P_j if the distance between x_i and its nearest neighbor $x_{{nn}_{P_{j}}}^{i}$ meets the inequality $d (x_{i}, x_{{nn}_{P_{j}}}^{i}) < t_{j}$ .

The distance, between the groups where the sample was finally assigned and its neighbours, is compared with reference values in order to perform group merging, group split or new group construction.

After applying the previous procedure, the results of the sample assignment are: (i) an unlabelled sample assigned to a class is labelled with the class label, (ii) groups composed of labelled samples represent a class, (iii) groups composed of unlabelled samples represent a cluster, (iv) a labelled sample assigned to a cluster makes the cluster become a class, and (v) a labelled or unlabelled sample that is not assigned to any existing group creates a new class or cluster, respectively.
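The three assignment steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighbour group P_v is taken to be the group with the second-nearest centroid, the function name `assign` and the dictionary layout are hypothetical, and the merging and split checks are left out.

```python
import math


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def assign(x, groups, thresholds):
    """Return the name of the group that x joins, or None if a new group is needed.

    groups: dict name -> list of sample vectors
    thresholds: dict name -> neighbour threshold t_j
    """
    # Step 1: the nearest centroid gives the provisional group P_r; the
    # second-nearest centroid is assumed to give the neighbour group P_v.
    ranked = sorted(groups, key=lambda name: math.dist(x, centroid(groups[name])))
    candidates = ranked[:2]

    # Step 2: local 1-NN search restricted to the candidate groups.
    nearest = {name: min(math.dist(x, s) for s in groups[name])
               for name in candidates}

    # Step 3: accept the closest candidate that satisfies d(x, x_nn) < t_j.
    for name in sorted(candidates, key=nearest.get):
        if nearest[name] < thresholds[name]:
            return name
    return None  # no existing group accepts x, so a new group is created


groups = {"f1": [(0.0, 0.0), (1.0, 0.0)], "f5": [(10.0, 0.0), (11.0, 0.0)]}
thresholds = {"f1": 1.0, "f5": 1.0}
print(assign((0.5, 0.2), groups, thresholds))  # joins an existing group
print(assign((5.5, 0.0), groups, thresholds))  # too far from both groups
```

Restricting the 1-NN search to two groups keeps the cost of each assignment proportional to the size of those groups rather than to the whole sample space.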

2.1.2 Merging process

As proposed in [19], two neighbour groups (class-cluster or cluster-cluster) should be merged when there is either a low distance or a high density between their borders. This scenario can be detected through the use of: (i) a distance metric to measure the proximity between groups, and (ii) a density metric to measure the proportion of elements in the overlapped area, if any. The merging of two classes is not allowed in our proposal. When samples with another label are successively assigned to a class, this might indicate a data problem commonly associated with unbalanced data or overlapping classes. This case is not studied in this work.

2.1.3 Split process

As proposed in [19], this scenario appears when a new sample x_i does not belong to its provisional group P_i, but is assigned to its neighbour group P_j. After verifying some conditions, it can be concluded that some neighbours of x_i in P_i may be closer to P_j than to P_i. This is verified by measuring the similarity between the neighbours of x_i and P_j. At the end of the split process, some samples initially assigned to P_i are reassigned to P_j.

2.2 Example

An example of the HHA-EMCC reconfiguration capability is presented in Fig. 2. Gaussian mixtures are used to create an artificial dataset whose samples, features, classes and variance are previously defined. The original dataset has four well-defined classes, as shown in Fig. 2(a). Labels 1 and 3 were removed from the samples, and the model was trained with only the labelled samples in classes 2 and 5; next, the unlabelled samples were evaluated. According to Fig. 2(b), HHA-EMCC was able to recreate the original classes 2 and 5, and to create new clusters that represent the original classes 1 and 3 (clusters 10 and 41, respectively). In addition, HHA-EMCC creates other clusters with the samples placed on the borders of clusters 10 and 41 (clusters 33, 130, 133, 135 and 136). The samples in these clusters, which originally belong to classes 1 or 3, are not placed by the algorithm in the corresponding clusters 10 and 41; instead, they are associated with noise or anomalous behaviours in the case study.

Fig.2

(a) Classes from an artificial dataset created by using Gaussian mixtures. (b) Classes and clusters generated by the proposal: training with labelled samples corresponding to classes 2 and 5 and, testing with unlabelled examples corresponding to classes 1 (cluster 10) and 3 (cluster 41), and other clusters associated with noise or anomalous behaviour.

3 Framework for discovering new condition patterns in fault diagnosis using HHA-EMCC and RF

In this section, the framework for discovering new patterns for fault diagnosis is presented. The main objective is to obtain a diagnoser extended with the newly discovered classes. Figure 3 overviews the activities performed by the framework. HHA-EMCC is the core of our proposal, and it allows detecting new knowledge in fault diagnosis through cluster construction. This diagram covers: (i) the historical data analysis through feature extraction and feature selection, which is essential for applying HHA-EMCC, (ii) the training and re-training phases to build the HHA-EMCC-based hybrid model and the classification-based diagnoser, (iii) the iterative process for knowledge discovery, which runs until all the new unlabelled monitoring samples are processed. The short acronym HACC in Fig. 3 stands for Heuristic Algorithm for Classification and Clustering.

Fig.3

Activity diagram of the proposed framework.

To apply our framework, two assumptions are stated:

Assumption 1: there is historical data with normal or certain faulty conditions. At least two labelled condition patterns are known. This assumption helps illustrate the effect of feature selection in the framework; however, it is not a necessary requirement for running HHA-EMCC.

Assumption 2: in the working phase, the data collected in a time interval is related to the current operating condition, and it is stored in a separate dataset as unlabelled samples. The results obtained after applying the knowledge discovery phase are related to this operating condition.

The main components of the framework are detailed as follows. Section 3.1 gives the guidelines to prepare the data for the hybrid process. The training (or re-training) phase is discussed in Section 3.2. In Section 3.3 the hybrid process for discovering new knowledge will be fully detailed.

3.1 Data analysis phase

This phase is composed of the data acquisition process, feature extraction and feature selection. The data acquisition and feature extraction processes depend on the particular application domain. In this work, the data acquisition is oriented to measuring vibration signals of rotating machinery, and the feature extraction is devoted to computing statistical parameters from vibration analysis in the time domain, frequency domain and time-frequency domain, as presented in Section 4. Next, the Random Forest (RF) algorithm is used to perform the feature selection process. RF is a well-known classifier that is also used for feature selection. It identifies the most significant variables through a metric that measures the degree of information contributed by each feature to all the classes. The higher the information degree, the more significant the feature [10].

3.2 Training phase

Once the labelled dataset L is available after the data analysis phase, a training process is accomplished by using HHA-EMCC and the RF classifier. Two models are built: (i) a hybrid model for discovering new patterns, and (ii) a classification model (diagnoser). This component must run at the beginning, and it aims at producing the initial setting of the classification models, according to Assumption 1. Therefore, the first HHA-EMCC-based hybrid model is trained by executing only the classification part, as described in Fig. 3. From this initial setting, HHA-EMCC can run in the hybrid mode when new unlabelled data is obtained for the current operating condition. A classifier is also built in the initial setting, in order to have a specialized model for diagnosing the initial historical labelled samples. In this work, the RF-based model has been selected as diagnoser due to its good performance in classification problems. This initial diagnoser model is able to process samples that are recognized as known patterns by the framework. The training phase for the diagnoser is activated again in the working phase if a new cluster representing a new condition (new group) is detected by HHA-EMCC.

3.3 Knowledge discovery phase

The knowledge discovery phase is activated when new unlabelled samples are available from the monitoring process in the working phase. The best features, identified previously in the data analysis phase, are extracted from the new unlabelled samples, and the new samples are stored in the dataset U. The core of the knowledge discovery phase is the HHA-EMCC described in Section 2. This phase performs an iterative process that runs until the entire new dataset U is processed; consequently, each new behaviour i stored in U is read at each iteration. Assumption 2 applies to the dataset U. The results given by HHA-EMCC are clusters or classes. However, the HHA-EMCC model in this framework must also be able to identify whether the clusters represent new knowledge or not. For this purpose, a heuristic metric H is defined to determine whether a new cluster C_k represents new knowledge. The heuristic metric H_k is defined by (3), where k, j are cluster identifiers and p is the number of prototypes (centroids) created by HHA-EMCC. $H_{k} = \frac{\sum_{j = 1}^{p - 1} M_{kj} d_{kj}}{p - 1}$ (3) where $d_{kj} = \frac{N_{k}}{N_{k} + N_{j}}$ is the density of the cluster C_k with respect to the cluster C_j, and N_k and N_j are the numbers of samples in C_k and C_j, respectively; $M_{kj} = 1 - \frac{{Co}_{k}}{{Sep}_{kj}}, k \neq j$, compares two different clusters or classes by contrasting their cohesion and separation. The cohesion Co is the distance between a sample and its corresponding centroid, while the separation Sep is given by the distance between the two corresponding centroids.

The heuristic metric H can take the following values for decision-making: (i) H_k ≈ 1, the cluster k is defined as a new class, because the cluster is well-formed and has a high density compared to the other clusters and classes, (ii) H_k ≈ 0, the cluster and its samples are marked as noisy, and (iii) H_k ≈ 0.5, the cluster and its samples are marked as inconclusive data. Finally, after applying the metric H_k, the results of the knowledge discovery phase are: (i) the diagnosis of those samples that the framework has assigned to a class, (ii) new labelled data belonging to the class newClass_i that was identified as new knowledge (dense clusters), (iii) inconclusive data associated with samples that belong to low-density clusters, and (iv) noisy data belonging to clusters that are neither dense nor low-density.
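To make the metric concrete, the following Python sketch evaluates H_k for a toy set of groups. It is only an illustration under assumptions that the text does not fix: the cohesion Co_k is taken as the mean Euclidean distance of the samples of C_k to their centroid, the separation Sep_kj as the Euclidean distance between centroids, and the function names are hypothetical. No numerical cut-offs for "≈ 1", "≈ 0.5" and "≈ 0" are given in the text, so none are coded here.

```python
import math


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def cohesion(points):
    # Assumption: Co_k is the mean distance of the samples to their centroid.
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def heuristic(k, groups):
    """H_k: average of M_kj * d_kj over the other p - 1 groups."""
    c_k, n_k, co_k = centroid(groups[k]), len(groups[k]), cohesion(groups[k])
    total = 0.0
    for j, points in groups.items():
        if j == k:
            continue
        sep_kj = math.dist(c_k, centroid(points))   # separation of centroids
        m_kj = 1.0 - co_k / sep_kj                  # cohesion vs. separation
        d_kj = n_k / (n_k + len(points))            # relative density
        total += m_kj * d_kj
    return total / (len(groups) - 1)


groups = {
    "dense": [(0.1, 0.1), (0.1, -0.1), (-0.1, 0.1), (-0.1, -0.1)],
    "stray": [(10.0, 0.0)],
}
for name in groups:
    print(name, round(heuristic(name, groups), 3))
```

In this toy case the compact four-sample group scores well above the isolated single sample, which is the ordering the decision rules above rely on.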

Once new groups are obtained, the known patterns are processed by the classifier to make the final diagnostic decision. The new labelled samples (newClass_i) extend the labelled dataset for a re-training phase that includes this knowledge in the classifier model. From here, and based on Assumption 2, the inconclusive data are validated with the updated classifier (diagnoser), and the samples that match newClass_i are labelled. Finally, these labelled samples extend the dataset again for the next re-training phase. At the end of the process, the diagnoser has been extended with the new class newClass_i. The pseudo code of the knowledge discovery phase is described in Table 1.

Table 1

Pseudo code of the knowledge discovery phase

Macro algorithm
Input:
    Unlabelled dataset U
    HHA-EMCC based hybrid model (HHA-EMCC-HM)
    Diagnoser D
Procedure:
    1. For i = 1 to size(U)
        1.1. Evaluate U(i) in HHA-EMCC-HM
        1.2. If U(i) is a known class,
            i. Send the sample to the diagnoser D
            ii. Obtain the diagnosis diag
        1.3. Else,
            i. Run HHA-EMCC in HHA-EMCC-HM
            ii. Generate new clusters or extend the existing clusters C_k.
    2. For k = 1 to p
        2.1. Evaluate the cluster C_k through the heuristic metric H_k
        2.2. If the cluster C_k is identified as a new pattern, assign the label newClass_i
        2.3. Else, mark the cluster C_k as inconclusive data or noisy data.
    3. If the previous step generated new labelled data (newClass_i) or diagnosis diag,
        3.1. Extend the labelled dataset L with diag or with the newClass_i’s samples.
        3.2. Re-train the classification model and update the diagnoser D
        3.3. If the previous step generated inconclusive data (ID),
            3.3.1. Evaluate ID in the diagnoser D and obtain diag
            3.3.2. If diag matches with the newClass_i, label the examples in ID with diag
            3.3.3. Extend the dataset L with this labelled data for the next re-training phase.
Output:
    Diagnoser D with new discovered knowledge
    HHA-EMCC-HM with new classes and clusters
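
The control flow of Table 1 can be sketched as a Python skeleton. Every callable passed to `knowledge_discovery` is a hypothetical hook standing in for a component of the framework (HHA-EMCC-HM, the metric H_k, the RF diagnoser), and the toy hooks in the usage part, including the cut-offs 0.6 and 0.3, are invented only to make the sketch executable on one-dimensional data.

```python
def knowledge_discovery(U, L, is_known, diagnose, cluster, score, verdict, retrain):
    """Skeleton of the macro algorithm; all callables are hypothetical hooks.

    U: unlabelled samples; L: list of (sample, label) pairs, extended in place.
    """
    clusters, new_labelled, inconclusive = {}, [], []
    # Step 1: known patterns go to the diagnoser, the rest are clustered.
    for x in U:
        if is_known(x):
            new_labelled.append((x, diagnose(x)))
        else:
            clusters.setdefault(cluster(x), []).append(x)
    # Step 2: evaluate every cluster with the heuristic metric.
    for cid, samples in clusters.items():
        outcome = verdict(score(cid, clusters))  # 'new', 'inconclusive', 'noise'
        if outcome == "new":
            new_labelled += [(x, f"newClass_{cid}") for x in samples]
        elif outcome == "inconclusive":
            inconclusive += samples
    # Step 3: extend L, re-train, then revisit the inconclusive samples.
    if new_labelled:
        L.extend(new_labelled)
        diagnose = retrain(L)
        new_names = {y for _, y in new_labelled if y.startswith("newClass_")}
        matched = [(x, diagnose(x)) for x in inconclusive]
        L.extend((x, y) for x, y in matched if y in new_names)
    return diagnose, L


# Toy hooks on 1-D samples (illustrative only).
def retrain_1nn(labelled):
    data = list(labelled)  # snapshot of L at re-training time
    return lambda x: min(data, key=lambda s: abs(s[0] - x))[1]


diag_out, L_out = knowledge_discovery(
    U=[1.0, 10.0, 10.1, 10.2, 10.3, 20.0, 20.5],
    L=[(0.0, "f1")],
    is_known=lambda x: x < 5,
    diagnose=lambda x: "f1",
    cluster=lambda x: 0 if x < 15 else 1,
    score=lambda cid, cl: len(cl[cid]) / sum(len(v) for v in cl.values()),
    verdict=lambda h: "new" if h > 0.6 else "inconclusive" if h > 0.3 else "noise",
    retrain=retrain_1nn,
)
print(sorted({label for _, label in L_out}))
```

In this sketch the samples added in step 3.3.3 are only stored in L; as in Table 1, they take effect at the next re-training phase.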

4 Experimental test bed

This section describes the experimental test bed, and the procedure to build the raw dataset that will be used by our proposal. The feature extraction process transforms signals into condition parameters in order to be used by the ML algorithms. Section 4.1 details the procedure to collect these raw signals from a test bed, and Section 4.2 describes the feature extraction process to obtain the dataset L. More detailed information about the test bed and faults related to this case study are given in [16, 20].

4.1 Measurement procedure

Figure 4 shows all the elements of the experimental test bed. The gearbox structure is depicted in Fig. 5: two spur gears Z1 and Z2, with 53 and 80 teeth respectively, module 2.25 and pressure angle 20°, were installed on the input and output shafts of the gearbox. A three-phase motor generates the rotational motion; this motor is supplied with 220 V at 60 Hz and has a nominal speed of 1650 rpm. The motor was used with an adjustable-speed drive to generate different constant speeds at 8 Hz, 12 Hz and 15 Hz. Variable speed was also considered over the ranges 5–12 Hz, 12–18 Hz and 8–15 Hz. The measurement procedure starts when the motor torque is transmitted to the gearbox; as a consequence, a second torque is produced and transmitted to a pulley. The pulley is part of a magnetic brake control that regulates the load according to voltage inputs of 10 V and 30 V. The main objective is to collect several samples of the vibration signals for each faulty condition, including the healthy one (without fault), under different operational conditions of motor speed and load [20]. This is achieved through the PBC IEPE accelerometer with a sensitivity of 100 mV/g, which is vertically placed on the gearbox case. Data acquisition was performed with the NI CompactDAQ-9191 of National Instruments and the NI 9234 module, which is inserted in the DAQ slot. The data acquisition software was developed in our laboratory in the NI LabVIEW environment.

Fig.4

Experimental test bed for feature extraction from vibration signals associated to several fault conditions in gears.

Fig.5

Gearbox structure for fault analysis.

Table 2 describes the different induced faults. These faults correspond to particular physical conditions of the device, such as healthy operation, moderate or severe faults.

Table 2

Gear fault conditions

Label	Description
f1	Healthy pinion (Z1), healthy gear (Z2)
f2	Pinion tooth chafing (Z1), healthy gear (Z2)
f3	Pinion tooth wear (Z1), healthy gear (Z2)
f4	25% Pinion tooth breakage (Z1), healthy gear (Z2)
f5	50% Pinion tooth breakage (Z1), healthy gear (Z2)
f6	100% Pinion tooth breakage (Z1), healthy gear (Z2)
f7	Healthy pinion (Z1), 25% gear crack (Z2)
f8	Healthy pinion (Z1), 100% gear crack (Z2)
f9	Healthy pinion (Z1), 50% gear chafing (Z2)
f10	25% Pinion tooth breakage (Z1), 25% gear crack (Z2)

4.2 Feature extraction

In the previous step, 900 raw vibration signals have been collected, 90 samples for each fault condition in Table 2. This phase processes the signals in order to extract the most common features that describe all the condition states. Therefore, 817 features were obtained from three different domains:

Time domain parameters: In this experimental setting, seven classical statistical features were obtained over the whole signal length, for each sample in the time domain: root mean square (RMS), energy operator, crest factor, mean, standard deviation, variance and skewness.

Frequency domain parameters: The raw vibration signals were transformed into frequency signals using the Fast Fourier Transform (FFT), as follows: (i) two signals in the decade frequency domain, one with dimensionless magnitude and the other with magnitude in decibels, (ii) two signals in the octave frequency domain, one with dimensionless magnitude and the other with magnitude in decibels. These four frequency domain signals were split into bands: 80 bands in case (i) and 15 bands in case (ii). Statistical parameters such as RMS, mean, standard deviation and kurtosis were calculated for the two signals in case (i), and mean, standard deviation and kurtosis were calculated for the two signals in case (ii). As a result, 730 features were obtained with this analysis.

Time-frequency domain parameters: Wavelet Packet Decomposition (WPD) is applied to the raw vibration signal in the time domain, using five mother wavelets: Daubechies (db7), Symlet (sym), Coiflet (coif4), Biorthogonal (bior6.8) and Reverse Biorthogonal (rbior6.8). These mother wavelets are used due to the good performance that they have exhibited in machine learning classification [5, 7]. WPD is performed up to four levels for each mother wavelet, so that 2⁴ coefficients are obtained for each one. Finally, 80 features associated with the energy parameter are extracted.
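
The total of 817 features can be checked with a short Python computation. The breakdown below assumes that each statistical parameter is computed once per band and per signal, and that one energy value is extracted per level-4 wavelet packet; this reading reproduces the 730 and 80 features stated above.

```python
time_domain = 7            # RMS, energy operator, crest factor, mean,
                           # standard deviation, variance, skewness
decade = 2 * 80 * 4        # 2 signals x 80 bands x (RMS, mean, std, kurtosis)
octave = 2 * 15 * 3        # 2 signals x 15 bands x (mean, std, kurtosis)
frequency_domain = decade + octave
time_frequency = 5 * 2 ** 4  # 5 mother wavelets x 16 level-4 packets

total = time_domain + frequency_domain + time_frequency
print(frequency_domain, time_frequency, total)  # 730 80 817
```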

5 Results and analysis

This section presents the results after applying the framework for knowledge discovery in fault scenarios of gearboxes. The experimental test shows that our proposal can deal with this scenario.

5.1 Data analysis phase

Data acquisition and feature extraction were presented in Section 4 for obtaining the raw dataset. Feature selection is accomplished in two sequential steps: (i) a correlation analysis was carried out to reduce redundancy and retain the low-correlated features, according to a specified threshold, (ii) a Random Forest (RF) algorithm was applied to measure the relevance of the low-correlated features, under complete and partial information. The latter is considered because only a small set of known classes is usually available in real cases, e.g. some common conditions such as normal operation and a few faulty states previously identified from historical analysis.

In step (i), more than 400 features with a correlation higher than 95% were deleted, and only 330 features from the original dataset were retained. In step (ii), under the complete information scenario, the entropy-based information degree was computed by applying the RF algorithm over the complete database with 330 features and 10 fault classes. The feature selection was accomplished considering three thresholds for the entropy values: higher than 40%, 50% and 60%. This yielded 29 features with entropy higher than 40%, 14 features higher than 50% and 9 features higher than 60%. Under the partial information scenario, the samples with labels f1, f5, f6 and f8 were selected as the known classes (see Assumption 1 in Section 3). The rationale for this selection is that these labels correspond to the healthy condition and severe faults, which are usually easy to detect and characterize. RF is applied on this dataset, and 8 features with entropy higher than 50% are retained. At the same 50% threshold, the complete information scenario retained 14 features, i.e. six additional features are required there to keep the same degree of information.
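
Step (i) can be illustrated with a small Python sketch of a correlation filter. The text does not state which feature of a highly correlated pair is kept, so the greedy keep-first rule, the use of the absolute Pearson coefficient and the toy feature names and values are assumptions of this sketch; constant features, for which the coefficient is undefined, are not handled.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if |r| <= threshold with every feature kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "rms": [1.0, 2.0, 3.0, 4.0, 5.0],
    "energy": [2.1, 4.0, 6.1, 8.0, 10.1],    # nearly proportional to "rms"
    "skewness": [0.3, -0.2, 0.5, -0.4, 0.1],
}
print(drop_correlated(columns))
```

Here the second feature is almost a multiple of the first and is discarded, while the weakly correlated third feature is retained.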

In order to evaluate this selection, several classification models were trained with the previously selected feature sets. Table 3 shows that the Random Forest accuracy decreases only slightly with the 8 features selected under partial information (0.9549), compared to the accuracy obtained with the 14 features selected under complete information (0.9629), while the accuracy of the other classifiers does not decrease. Therefore, the 8 features selected under partial information were retained for the next phase.

Table 3
Accuracy values by using different classifiers for diagnosing 10 faults

Classifier	330 features	29 features	14 features	9 features	8 features
Random Forest	0.9932	0.9852	0.9629	0.9555	0.9549
Decision tree	0.87037	0.8963	0.8926	0.8962	0.9390
1-NN	0.7555	0.9888	0.9741	0.9741	1
Gaussian	0.7037	0.8962	0.8629	0.8629	0.9149

5.2 Training phase

As described is Section 3.2, this phase is executed at first, to set the initial configuration of the classification models, according to Assumption 1. Two initial classification models are obtained in the first execution: (i) one model by using the HHA-EMCC under the classification functionality, and (ii) one model by using the supervised Random Forest algorithm. This initial setting was executed under the following scenarios: (a) Faults f1, f5, f6 y f8 are known classes, and the associated samples are labelled samples in the historical dataset L, see Fig. 3, (b). RF model is used as classifier with 100 random trees, and (c) Both classification models based on HACC and RF are trained with the historical labelled dataset L, as proposed by the framework in Section 3.

Once the initial setting is accomplished, both classification models are ready to process new unlabelled data in the next phase. The HHA-EMCC based model is able to discover new patterns that differ from the current known classes; when it does, the RF based model is trained again to create a new classifier updated with the newly discovered pattern.
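
The interplay between classification and clustering in an evolving model can be sketched as follows. This is not the HHA-EMCC algorithm: it is a toy nearest-centroid model that only borrows the K-means and 1-NN principles mentioned in the abstract, and the class name, the radius parameter and the sample values are assumptions made for illustration.

```python
from math import dist

class EvolvingNearestCentroid:
    """Toy evolving model: 1-NN assignment to centroids, new cluster when too far."""

    def __init__(self, radius):
        self.radius = radius      # hypothetical similarity threshold
        self.centroids = {}       # label -> centroid coordinates
        self.counts = {}          # label -> number of absorbed samples
        self._next_cluster = 0

    def _absorb(self, label, x):
        """Add x to the given class/cluster and update its running-mean centroid."""
        n = self.counts.get(label, 0)
        if n == 0:
            self.centroids[label] = list(x)
        else:
            c = self.centroids[label]
            self.centroids[label] = [(ci * n + xi) / (n + 1) for ci, xi in zip(c, x)]
        self.counts[label] = n + 1

    def fit(self, samples, labels):
        """Classification functionality: one centroid per known class."""
        for x, y in zip(samples, labels):
            self._absorb(y, x)
        return self

    def update(self, x):
        """Clustering functionality: join the nearest centroid or open a new cluster."""
        label, d = min(((l, dist(x, c)) for l, c in self.centroids.items()),
                       key=lambda t: t[1])
        if d > self.radius:
            label = f"cluster{self._next_cluster}"
            self._next_cluster += 1
        self._absorb(label, x)
        return label

# Initial training on labelled data (toy values), then processing unlabelled samples.
model = EvolvingNearestCentroid(radius=1.0).fit(
    [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)], ["f1", "f1", "f5"])
labels = [model.update(x) for x in [(0.1, 0.1), (10.0, 0.0), (10.2, 0.1)]]
print(labels)
```

In this sketch, a label that does not belong to the known classes signals a candidate new pattern, which is the point at which a supervised classifier such as RF would be retrained.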

5.3 Knowledge discovery phase

This section details the experiments that test the ability of our approach to discover new knowledge, by processing the available dataset, which has been split into the set L of labelled samples and the set U of unlabelled samples. The following scenarios are assumed: (a) two initial classification models are available from the first training with the labelled data in the set L, related to known faults; (b) the remaining faults are considered unknown classes, and their unlabelled samples are in the set U (see Fig. 3); they are new patterns that could be discovered from the monitoring process; (c) samples related to a given fault condition are analysed as a group of samples, according to Assumption 2 in Section 3; physically, they represent the faulty conditions f2, f7, f9 and f10 of the gears (see Table 2); and (d) 70% of the available samples for each condition are used for training and the remaining samples for test analysis. Tr denotes the number of samples for training.
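
The 70/30 split per condition in item (d) can be sketched as below. The function name, the random seed and the sample counts are hypothetical; the actual number of samples per condition is not restated here.

```python
import random

def split_per_condition(samples_by_condition, train_fraction=0.7, seed=0):
    """Shuffle each condition independently and split it into training and test parts."""
    rng = random.Random(seed)
    train, test = {}, {}
    for condition, samples in samples_by_condition.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        tr = int(round(train_fraction * len(shuffled)))  # Tr for this condition
        train[condition] = shuffled[:tr]
        test[condition] = shuffled[tr:]
    return train, test

# Toy sample identifiers (hypothetical) for two unknown conditions.
data = {"f2": list(range(10)), "f7": list(range(100, 120))}
train, test = split_per_condition(data)
print({c: (len(train[c]), len(test[c])) for c in data})
```

Splitting each condition separately keeps the 70/30 proportion within every fault class rather than only over the pooled dataset.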

The results after running the framework are presented in the following items. Each item is related to a set of samples U_k (i), k = 1, . . . , 4, in the dataset U, where k defines a new working phase for a new condition of the gearbox. All the figures are in the 3-dimensional (3-D) space obtained after applying PCA over the 8 selected features.

Experiment with k = 1:

This experiment considers that the gearbox is in the faulty condition f2, and that a set of unlabelled samples U₁ (i), i = 1, . . . , Tr, related to this condition is available. The sample distribution is shown in Fig. 6(a), with the known dataset (classes) in L and the unlabelled dataset in U₁ (i). Once the framework is executed, the results are the classes and clusters in Fig. 6(b). HHA-EMCC correctly created the classes associated with faults f1, f5, f6 and f8, and new clusters related to the new condition were also identified. The heuristic metric in Equation (5) was able to label one of them as newClass0, corresponding to the newly discovered condition f2. Clusters 5 and 12 are marked as noisy clusters, and their samples are excluded from the next experiments. The newly labelled samples feed the dataset for training the RF classifier with this new knowledge (see Fig. 7(a)). The classification accuracy was around 0.9924.

Experiment with k = 2:

This experiment considers that the gearbox is in the faulty condition f7, and that a set of unlabelled samples U₂ (i), i = 1, . . . , Tr, related to this condition is available. The result obtained from the previous experiment is included as a new class, and the dataset L is updated. Figure 7(a) shows the known dataset (labelled samples) in L, and the unlabelled dataset in U₂ (i). After running, HHA-EMCC effectively detects a new behaviour by creating a new cluster with the unlabelled samples, according to the heuristic metric in Equation (5); these samples are then labelled as newClass1 (see Fig. 7(b)). There are neither inconclusive data nor noisy data in this experiment. The RF classifier is retrained with this new class, and the classification accuracy was around 0.9875.

Experiment with k = 3:

This experiment considers that the gearbox is in the faulty condition f9, and that a set of unlabelled samples U₃ (i), i = 1, . . . , Tr, related to this condition is available. Figure 8(a) shows the current known classes in L, including those discovered in the previous experiments, and the unlabelled dataset in U₃ (i). HHA-EMCC generated the clusters shown in Fig. 8(b). Notice that the unlabelled samples were segmented into 5 low-density clusters (with id 7, 8, 9, 19 and 13) and one cluster that is considered a new class associated with the unknown condition f9. After evaluating the heuristic metric H_k in Equation (5), the new class is labelled as newClass2, and the samples assigned to clusters 7, 8, 9, 19 and 13 are identified as inconclusive data. The RF classifier is trained with the extended dataset including the new class, and the inconclusive data are evaluated for possible labelling. L is extended for the next experiments with the results given by the classifier for each sample in the inconclusive data. According to Assumption 2, these samples are also labelled as newClass2, where applicable. The classification accuracy, after evaluating all the inconclusive data, was around 0.9718. In particular, for the inconclusive data in U₃, 36 samples were correctly classified and 4 samples were incorrectly classified, so the accuracy was around 0.9.
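
The labelling of inconclusive data by the retrained classifier can be sketched as follows. A nearest-centroid rule stands in for the RF classifier, and the centroids, sample values and function names are assumptions for illustration; only the idea of keeping the samples that the classifier assigns to the new class is taken from the text.

```python
from math import dist

def nearest_centroid_predict(centroids, x):
    """Stand-in classifier: label of the closest class centroid."""
    return min(centroids, key=lambda label: dist(x, centroids[label]))

def label_inconclusive(centroids, inconclusive, new_class):
    """Separate the inconclusive samples predicted as new_class from the rest."""
    accepted, rejected = [], []
    for x in inconclusive:
        if nearest_centroid_predict(centroids, x) == new_class:
            accepted.append(x)
        else:
            rejected.append(x)
    return accepted, rejected

# Toy centroids and inconclusive samples (hypothetical values).
centroids = {"f1": (0.0, 0.0), "newClass2": (4.0, 4.0)}
inconclusive = [(3.5, 4.2), (4.4, 3.9), (0.3, 0.1)]
accepted, rejected = label_inconclusive(centroids, inconclusive, "newClass2")
print(len(accepted), len(rejected))
```

Under Assumption 2 the whole group is expected to share one condition, so the rejected samples correspond to the misclassified part of the inconclusive data.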

Experiment with k = 4:

The same procedure was followed by considering the gearbox in the faulty condition f10, so the dataset U₄ (i) contains the samples associated with this condition. HHA-EMCC was able to create a new class labelled as newClass3, and four clusters (with id 8, 12, 13 and 14) were identified as inconclusive data. The RF classifier was trained with this new class, and the global classification accuracy was around 0.9575. Regarding the inconclusive data in U₄, 9 samples were correctly classified and 21 samples were incorrectly classified, so the accuracy was around 0.3. The original data and the results in Fig. 9(a) and (b) show that this low accuracy arises because most of the samples in U₄ (i) overlap with newClass2. On the other hand, they do not meet the similarity criterion in HHA-EMCC for being included in newClass2. This case suggests that HHA-EMCC-HM discriminates these two classes better than the RF classifier.
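
The accuracy values reported for the inconclusive data in experiments k = 3 and k = 4 follow directly from the sample counts given in the text, as this short check shows.

```python
def accuracy(correct, incorrect):
    """Fraction of correctly classified samples."""
    return correct / (correct + incorrect)

acc_u3 = accuracy(36, 4)   # inconclusive data in U3 (experiment k = 3)
acc_u4 = accuracy(9, 21)   # inconclusive data in U4 (experiment k = 4)
print(acc_u3, acc_u4)
```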

Fig. 6

(a) Known classes corresponding to faults f1, f5, f6 and f8, and the unlabelled dataset corresponding to fault f2, experiment k = 1. (b) Resulting classes and clusters after applying the proposed framework: new class newClass0 identified from the unlabelled samples corresponding to fault f2, and clusters 5 and 12 associated with noisy data.

Fig. 7

(a) Known classes corresponding to faults f1, f5, f6, f8 and newClass0, and the unlabelled dataset corresponding to fault f7, experiment k = 2. (b) Resulting classes and clusters after applying the proposed framework: new class newClass1 identified from the unlabelled samples corresponding to fault f7.

Fig. 8

(a) Known classes corresponding to faults f1, f5, f6, f8, newClass0 and newClass1, and the unlabelled dataset corresponding to fault f9, experiment k = 3. (b) Resulting classes and clusters after applying the proposed framework: new class newClass2 identified from the unlabelled samples corresponding to fault f9, and clusters 7, 8, 9, 19 and 13 associated with inconclusive data.

Fig. 9

(a) Known classes corresponding to faults f1, f5, f6, f8, newClass0, newClass1 and newClass2, and the unlabelled dataset corresponding to fault f10, experiment k = 4. (b) Resulting classes and clusters after applying the proposed framework: new class newClass3 identified from the unlabelled samples corresponding to fault f10, and clusters 8, 12, 13 and 14 associated with inconclusive data.

6 Conclusion

This work presents a semi-supervised framework for finding new patterns that could be associated with new faulty conditions in data-driven fault diagnosis applications. The core of the framework is HHA-EMCC, an algorithm that creates on-line evolving models to discover new classes or clusters when new incoming samples feed the current structure. Some new clusters can be associated with a new pattern and, in consequence, with a new faulty or abnormal condition. The resulting clusters are analysed with a heuristic metric that identifies noisy data, inconclusive data or new knowledge patterns.

The approach progressively creates accurate classifiers, as the new classes feed the existing classifier and the inconclusive unlabelled data can be labelled by following a semi-supervised approach. In this sense, two ML models run simultaneously in our framework: one generated by HHA-EMCC, and the other being the supervised classifier. The experimental results show that the framework works correctly and that it was able to identify new patterns from the constructed clusters. The classification accuracy obtained after including the new classes shows that the supervised diagnoser can be refined under our proposed semi-supervised framework.

Most works in fault diagnosis address the problem through classification under supervised schemes. This work aims to change that point of view by treating the problem as a hybrid task that discovers new classes and clusters within an evolving model. In future work, we will address two main aspects: (i) the on-line feature analysis at the time when new samples associated with a new condition feed HHA-EMCC, and (ii) the on-line training of the RF classifier considering only the newly discovered class.

