Abstract
To overcome the low diagnosis accuracy caused by the scarcity of labeled training samples, a fault diagnosis method was proposed using orthogonal Semi-supervised linear local tangent space alignment (OSSLLTSA) for feature extraction and transductive support vector machine (TSVM) for fault identification. Through extracting the statistical features were extracted from the sub-bands of vibration signals decomposed by wavelet packet decomposition (WPD), the high-dimensional feature set could be obtained. Following that, the improved kernel space distance evaluation method was applied to remove non-sensitive fault features. Then, a semi-supervised manifold learning method (OSSLLTSA) was proposed to reduce the dimensionality of the fault feature set, and thus to extract fused fault features with high clustering performance. OSSLLTSA overcomes the over-learning of supervised manifold learning and projection aimlessness of unsupervised manifold learning. Finally, the low-dimensional feature set after dimension reduction was inputted into TSVM for fault diagnosis. TSVM was able to completely utilize the fault information contained in unlabelled samples to modify the model, and the trained fault diagnosis model has better generalization ability. The effectiveness of the proposed method was verified based on the case of gearbox fault. Experimental results showed that the proposed method is able to achieve very high fault diagnosis accuracy even when labeled samples were insufficient.
Introduction
Rotating machinery has become increasingly more important in modern industry, such as wind turbine, steam turbine, etc. However, many factors can cause faults during operation of rotating machines, including poor lubrication, fatigue damage, wear and harsh running environment, etc. It has been reported that gearbox faults account for 11% of total wind turbine faults, and bearing faults account for 44% of total faults in large induction motors [1, 2]. Fortunately, accurate identification of running states can well monitor the condition of machinery and diagnosis the occurring fault, by which the repair time and cost is reduced and economic loss can be avoided in large degree. Currently, machinery condition monitoring and fault diagnosis has developed as an important research issue in mechanical engineering.
The vibration signal generated during operation is the key reference for analyzing rotating machines and their running states. However, the vibration signal has very complicated frequency components and is always non-stationary and nonlinear state. So effectively extracting fault features is essential for machinery fault diagnosis. Many research works have pointed out that energy or information distribution in each frequency band of vibration signals varies with the change of running states, for example D. Wang and C.Q. Shen et al. [3, 4] applied statistical parameters of wavelet packet paving to fault diagnosis of bearings and gears, Y.N. Pan [5] applied wavelet packet node energies to track bearing health. Therefore, wavelet packet decomposition (WPD) was adopted to decompose vibration signals into several sub-bands [6], and then statistical features are extracted from each frequency sub-band to obtain a high-dimensional feature set. The fault features extracted from each sub-bands are more sensitive than those being directly extracted from original vibration signals, and the former is able to more effectively characterize machinery running states. However, the high-dimensional fault feature set contains a large amount of redundant information and noise, therefore it is still a challenging problem to extract the most useful fault information from the high-dimensional fault feature set. Dimensionality reduction can largely compress the dimension of fault feature set and maintain the essential information of original feature sets. Commonly used dimensionality reduction methods include principle component analysis (PCA) [7], multidimensional scaling (MDS) [8], independent component analysis (ICA) [9], etc. These methods are easy to implement with small amount of calculation and stable performance. However, these linear dimensionality reduction methods are inadequate in handling fault data sets with non-linearity and non-Gaussian structures. Manifold learning, represented by local linear embedding (LLE) [10], Isomap [11], local tangent space alignment (LTSA) [12], etc., is a kind of non-linear dimensionality reduction method, which can effectively extract the non-linear structural information contained in datasets. To meet the demand of pattern recognition and fault diagnosis, a number of improved manifold learning methods have been proposed, including the linear improvements of manifold learning, such as neighborhood preserving embedding (NPE) [13], locality preserving projection (LPP) [14], linear local tangent space alignment (LLTSA) [15], etc. These improved methods give out the explicit mapping from input space to low-dimensional feature space, and solve the problem of “out of sample”, therefore, they are more efficient and practical. However, these methods did not take the supervisory role of class label information into consideration. As a pattern recognition problem, one of the main purposes of dimensionality reduction in fault diagnosis is to enlarge diversity of the fault samples. To the end, some supervised variants have been proposed, such as supervised LPP [16], supervised LLTSA (SLLTSA) [17], etc. These methods have already been applied in machinery fault diagnosis [18, 19]. However, in machinery fault diagnosis, labeling fault samples is usually costly and time-consuming, which means that one may always face situation of scarcity of labeled training samples when diagnosing machinery faults. Worse still, supervised methods will be prone to over-learning if labeled training samples are not enough. On the other hand, unlabelled samples are easy to access, so over-learning problem can be relieved or even avoided if information contained in unlabelled samples is used to improve dimensionality reduction process. To deal with this problem, a semi-supervised manifold learning method, orthogonal semi-supervised LLTSA (OSSLLTSA), was proposed based on LLTSA. OSSLLTSA is able to simultaneously extract the structural information contained in labeled and unlabelled samples, overcoming the over-learning of supervised manifold learning and projection aimlessness of unsupervised manifold learning. OSSLTSA also adopts orthogonal iteration method to solve the orthogonal mapping matrix from high-dimensional observation space to low-dimensional feature space, which make it possible to rapidly calculate the optimal low-dimensional coordinates of high dimensional fault samples.
The fault diagnosis of rotating machinery is essentially a pattern recognition problem. Therefore, after feature extraction, pattern recognition algorithms are needed for realizing fault diagnosis or classification after feature extraction. Commonly used pattern recognition algorithms include k-nearest-neighbors classifier (KNNC) [20], artificial neural network (ANN) [21], support vector machine (SVM) [22], etc. As a pattern recognition method based on data statistics, KNNC requires a large numbers of labeled samples for model training, while ANN algorithm easily suffers from over-learning and local optima problems due to using empirical risk minimization criterion. SVM is a kind of pattern recognition algorithm based on structural risk minimization, which can effectively overcome over-learning and local optima problems. SVM has been widely used in machinery fault diagnosis and condition monitoring [23, 24]. However, in the case of lacking labeled samples, the classification hyperplane obtained by SVM seriously deviates from the optimal classification hyperplane, causing unsatisfactory fault recognition accuracy. Transductive SVM (TSVM) [25] is the semi-supervised improvement of SVM, and which can realize model training using the mixture of labeled and unlabelled samples, and modify the model using the fault information in unlabelled samples. The trained fault diagnosis model has better generalization ability. Based on OSSLLTSA and TSVM, a new fault diagnosis method was proposed in this study. Firstly, high dimensional fault features were extracted from vibration signals. Then, OSSLLTSA algorithm was adopted to reduce the dimension of the high dimensional fault feature set and extract out fused fault features with low dimension and high clustering performance. Finally, the fused fault features were inputted into TSVM for fault pattern recognition.
The rest of this paper is organized as follows. In Section 2, the feature extraction by semi-supervised manifold learning is described, and algorithm of Transductive support vector machine is discussed in section 3. Then, a new method for machinery fault diagnosis is presented in Section 4. Experiment set up and signal acquisition is described in Section 5. Following that, fault diagnosis test is performed in Section 6 to verify the proposed method. Finally, some conclusions are drawn in Section 7.
Feature extraction by semi-supervised LLTSA
High-dimensional fault feature set
Original fault feature set extraction
The vibration signals generated during operation of rotating machinery contain rich running states information. The statistic features in time and frequency domains can directly reflect the change of vibration signals and thus characterize different types of faults [26]. When the machinery running states changes, the energy distribution of vibration signals in each sub-band also accordingly changes [27]. The distribution and the information amount contained in each sub-band also vary with the change of running states [28]. Moreover, autoregressive (AR) model parameters are also very sensitive to running states variation [29]. However, the frequency components of vibration signal with non-stationary and non-linear characters, making it difficult to extract effective fault features from original vibration signals. Therefore, WPD was firstly adopted to decompose vibration signals into several sub-bands. Then, the vibration signals in each sub-bands were processed to extract 8 statistical features in both time and frequency domain, including mean value, peak value, root mean square, standard deviation, waveform factor, kurtosis, mean frequency, standard deviation of frequency. AR model parameters, instantaneous amplitude, Shannon entropy, and wavelet packet energy feature are used to form the original high dimensional fault feature set. The definition and calculation methods of relevant features are described in [30].
Fault feature selection
The original high dimensional feature set extracted from vibration signals is inevitably mixed with insensitive features or even interference noise features, which will not only increases computational burden, but also affects the accuracy of fault recognition. Therefore, feature selection for the original feature set is needed to remove the insensitive and interference noise features. Distance evaluation method is a simple and applicable method for sensitive feature selection. However, the performance of this method is largely affected by outliers. To deal with this problem, a kernel space distance evaluation method which uses kernel space distance as discrimination criterion was proposed [31, 32]. Given that there is a feature set containing K classes of states, {fi,k,j, i = 1, …, I
k
, k = 1, …, K, j = 1, …, J}, where I
k
is the number of samples in the k-th class, K is the number of states, and J is the number of features. The implementation of the kernel space distance evaluation method is shown in Table 1. It is clear from the implementation process that the larger the
Implementation of kernel space distance evaluation method
Implementation of kernel space distance evaluation method
By feature selection, the interference information in feature set can be preliminarily removed. However, the non-linear coupling relation between features also exists, causing large amounts of redundant information existing in the feature set. This redundant information can directly affect the accuracy of further fault diagnosis.
Therefore, we need to perform non-linear dimensionality reduction for the high dimensional feature set to extract fused fault feature vector with low dimension, good clustering performance and high sensitivity degree. For this, OSSLLTSA was proposed based on LLTSA algorithm, in order to extract sensitive fault features in the high dimensional feature set.
Brief review of LLTSA
The basic idea of LLTSA is that the local geometric structure of a data set can be represented by low dimensional local tangent space. Performing global alignment for the obtained local tangent space can realize the attribute reduction of high dimensional sample set. Given the high dimensional data set X ={ x
i
∈ R
D
, i = 1, …, N }, the implementation of LLTSA mainly includes the following steps. Constructing local neighborhoods. For each data sample x
i
, (i = 1, …, N), k neighboring points are selected from X according to Euclidean distance, forming the local neighborhood of this sample, X
i
= [xi1, …, x
ik
]. Obtaining the neighborhood tangent space. Find a set of orthogonal bases, Q
i
, from the local neighborhood X
i
of data sample x
i
, and project X
i
to Q
i
in order to extract the main geometric structural information of the neighborhood, i.e.,
Global alignment of local tangent space. The global alignment of local tangent space is indeed the reconstruction process for the intrinsic structure of data set, and it can be converted into the approximate solution process of a minimization problem.
B = SVV
T
S
T
is global alignment matrix; S = [S1, S2, …, S
N
]; S
i
is a selection vector; V = diag (V1, V2, …, V
N
);
where Y
i
is the global low dimension coordinates of X
i
, and L
i
is local transform matrix. When
The generalized eigenvectors corresponding to the first d, (d << D) minimum non-zero generalized eigenvalue in Equation (3) form a matrix A = [α1, α2, …, α d ], which is the mapping matrix from high dimensional observation space to low dimensional feature space. The low dimensional global coordinates of data sample set X is Y = A T XH k .
In the case of lacking labeled samples are insufficient, supervised manifold learning will suffer from over-learning problem in feature extraction, while unsupervised manifold learning has projection aimlessness in feature extraction. An orthogonal semi-supervised linear local tangent space alignment (OSSLLTSA) algorithm was proposed to integrate sample class information into dimensionality reduction process, so as to guide feature extraction and improve the discriminability of high dimensional features. Given a sample set X {x1, x2, …, x
L
, xL+1, xL+2, …, X
N
} containing N training samples, where X
L
={ x1, …, x
L
} is labeled training samples, and X
U
={ x1, …, x
N
} is unlabelled training samples, then the implementation steps of OSSLLTSA algorithm are as follows. Constructing the local neighborhood of sample points
When the local neighborhood X
i
of sample point x
i
is constructed, for labeled training samples x
i
, i = 1, …, L, a supervised method is used to select neighboring points. The distance between samples is redefined in constructing neighborhood [26]:
l i , i = 1, …, L are the class labels of x i ;
D ij , i, j = 1, …, L is the distance between samples x i and x j ;
λ is the distance penalty parameter and usually set to the average distance between samples;
σ ∈ [0, 1] is an adjustment coefficient;
for unlabelled training sample x
i
, i = L + 1, …, N, Euclidean distance is adopted for selection in training sample set X. Constructing objective function
According to the implementation process of LLTSA algorithm, it can be known that the essence of dimension reduction is to realize the extraction of local tangent space using local PCA algorithm. It was proved in [34] that using MDS for extracting local tangent space is equivalent to local PCA. Moreover, the distance information can be sufficiently utilized when MDS is used for extracting local tangent space. Given the squared distance matrix of local neighborhood,
It is clear in the above equation that the local low dimensional coordinates of X
i
is
The local neighborhood X i of sample x i is an approximately linear space, and therefore, X i can be used to reconstruct x i .
Clearly, reconstruction coefficient ω
ij
reflects the importance or contribution of x
ij
in reconstructing x
i
. The larger the reconstruction coefficient ω
ij
is, the higher the probability that sample points x
ij
and x
i
are from the same class is. Therefore, in constructing squared distance matrix, the reconstruction coefficient can be used to weight the distance between samples. To avoid too big weighted distance between samples caused by small or negative ω
ij
, ω
ij
is preprocessed as Equation (7).
It is clear from Equation (7) that the value range of
For samples x
i
, i = L + 1, …, N, since the neighborhood contains at least one unlabelled sample, in constructing the weighted distance matrix of local neighborhood, only the distances between x
i
, i = 1, …, L and the neighboring samples x
ij
, j = 1, …, k are weighted. The nearest neighboring samples x
ij
, j = 1, …, k of sample x
i
, i = 1, …, L are completely labeled samples. Therefore, in constructing the weighted distance matrix of local neighborhood, all the inter-class distances of local neighborhood X
i
are weighted, and the two limit values of
From the above analysis, it can be known that after weighting the inter-class distance within local neighborhood, the inter-class distance within the same class is compressed while the distance between samples of different classes is stretched. Therefore, MDS was used to extract the local structural information of sample neighborhood, so as to effectively separate samples in different classes in local neighborhoods. The average distance matrix of local neighborhood constructed by using weighted distance was adopted, and MDS algorithm was used to calculate the local low dimensional coordinates Solving orthogonal mapping matrix
The advantage of using orthogonal mapping lies in that the low dimensional fault sample set after dimensionality reduction also maintains the geometric structural information contained in high dimensional fault sample set. Compared with non-orthogonal mapping, orthogonal mapping has better discriminability. Let the mapping matrix be M, and then after dimensionality reduction of any two samples ξ i , ξ j by M, the distance calculation in low dimensional sub-space is as follows.
If the mapping matrix M is an orthogonal matrix, MMT = I (where I is an identical matrix). It is clear that the orthogonal projection of ξ
i
, ξ
j
does not change the distance between samples, which maintains the geometric structural information contained in the original sample set. Therefore, orthogonal iteration method was adopted in this study to calculate the orthogonal mapping matrix from high dimensional observation space to low dimensional feature space. The detailed derivation and calculation process of orthogonal iteration algorithm are referred to literature [35]. Let Ψ ={ ψ1, …, ψ
d
} be the orthogonal base vector of the orthogonal mapping matrix to be solved, and define two matrices, namely ψ(m-1) = [ψ1, …, ψm-1] , m = 2, …, d and Φ(m-1) = (Ψm-1)
T
(XH
N
X
T
) Ψm-1. The calculation steps of orthogonal iteration are as follows. Calculate the eigenvector corresponding to the minimum eigenvalue of Equation (3), and define this eigenvector as ψ1; Calculate the following orthogonal optimization objective function; Repeat the above process until all orthogonal base vectors Ψ ={ ψ1, …, ψ
d
} of the orthogonal mapping matrix are solved. Then the mapping process from high dimensional observation space to low dimensional feature space is Y = Ψ
T
XH
N
. To avoid singular matrices generated in dimensionality reduction, PCA transform can be performed for the original data set, and the finally obtained mapping matrix is Ψ
PCA
Ψ.
SVM is a kind of pattern recognition algorithm which realizes actual risk minimization based on structural risk minimization. SVM can realize the class recognition of test samples by constructing a classification hyperplane composed of a few support vectors, moreover, it has advantage in dealing with small sample sets and non-linear pattern recognition. However, when there are too few labeled samples, the classification hyperplane trained by SVM deviates from the optimal hyperplane, reducing the fault diagnosis accuracy. TSVM algorithm is a semi-supervised improvement of SVM, which uses both labeled and unlabelled samples for model training. The sample distribution information contained in unlabeled samples is used to modify the classification hyperplane, so that the finally obtained final classification hyperplane approaches the optimal hyperplane.
Assume there are a labeled training sample set S
l
= { (x
i
, y
i
) } , x
i
∈ R
d
, y
i
∈ { - 1, + 1 } , i = 1, …, L, and an unlabeled training sample set
w is the normal vector of the optimal classification hyperplane; b is the offset;
In real application, to ensure that the obtained classification hyperplane is optimal and avoid over-learning, a relaxation factor,
C and C* are the penalty factors of labeled and unlabelled training samples, respectively; ξ
i
and ξ* are the relaxation factors of labeled and unlabelled training samples, respectively.
Fault diagnosis is mainly multi-class problems, and one-versus-rest SVM model was used to train TSVM. More detailed derivation and calculation processes of TSVM algorithm are referred to [36].
Figure 1 shows the algorithm flowchart of the proposed fault diagnosis method for rotating machinery based on OSSLLTSA and TSVM. The algorithm mainly consists of 3 parts, namely high dimensional feature set extraction, dimensionality reduction for the feature set, and fault pattern recognition.

Fault diagnosis method based on OSSLLTSA and TSVM.
The frequency components of the collected original vibration signals are complicated with non-linear and non-stationary characters, making it difficult to extract fault features that are sensitive to fault states from original vibration signals. WPD was adopted to orthogonally decompose the original vibration signals in several frequency sub-bands. Then the fault features were extracted from sub-band signals to form the original high dimensional fault feature set. Since the high dimensional feature set extracted from vibration signals inevitably contains insensitive features or even interference features, a kernel distance evaluation technique was used to select features and remove insensitive and interference noise features in the original feature set.
By feature selection, the interference information in the feature set can be preliminarily removed. However, the non-linear coupling between features causes much redundant information existing in the feature set, which directly affects the accuracy of fault diagnosis. Therefore, the proposed OSSLLTSA was used to realize the dimensionality reduction for the high dimensional fault feature set, and extract fault features with good clustering performance and high sensitive degree. OSSLLTSA overcomes the over-learning problem of supervised manifold learning in situations where labeled samples are insufficient, and avoids the projection aimlessness of unsupervised manifold learning in feature extraction. Moreover, the optimal low dimensional coordinates of high dimensional fault samples can be rapidly obtained using the orthogonal mapping matrix calculated by OSSLTSA.
Faults states identification can be realized by inputting the extracted low dimensional fault features into pattern recognition algorithm. TSVM algorithm was adopted as the fault recognition algorithm, which is a kind of semi-supervised pattern recognition algorithm. It can simultaneously learn the structural information from both labeled and unlabelled training samples, resulting in a classification hyperplane that is more approximate to the optimal classification hyperplane as well as the trained fault diagnosis model that has better generalization ability.
The effectiveness of the fault diagnosis model for rotating machinery based on OSSLLTSA and TSVM was demonstrated by the simulation experiment of gearbox faults. In real engineering application, gearbox faults account for 80% of transmission system faults and 10% of rotating machinery faults. Therefore, the fault diagnosis for gearboxes is of great importance and engineering application value. In this study, the simulation of operation state of rotating machine was carried out based on one-stage planetary gear box and two-stage parallel shaft gearbox, and the laboratory built gearbox fault simulation experiment table is shown in Fig. 2.

The test point arrangement in gearbox vibration test.
The gearbox test rig mainly consists of speed control device, variable frequency AC drive motor, planetary gear box, second stage parallel shaft gearbox, magnetic brake, etc. The drive motor speed can be adjusted within 0 to 5000 rpm. In this experiment, the test rig is composed of a one-stage planetary gearbox and a two-stage parallel shaft gearbox. In the experiment, the drive motor speed was set to 1200 rpm, the load of magnetic brake was 1.5 pounds, and no radial load was made for the gearbox. The vibration signals were collected by the vibration acceleration sensor installed at the bearing cover of the input end of intermediate shaft, with sampling frequency set at 5 kHz.
A total of 5 kinds of gear states were simulated, namely (1) normal; (2) broken tooth fault; (3) missing tooth fault; (4) tooth surface wear; and (5) synthetic fault, and the main parameters of the experiment are listed in Table 2.
The main parameters of the gearbox fault experiment
The original vibration signals were collected by the acceleration sensors. Figure 3 shows the time domain waveform of the original vibration signals collected by the sensor in different operation states. A number of 120 sets of samples were collected for each state, among which 80 sets were training samples and the rest were used for testing the fault diagnosis model, with the length of each sample being 2048 sampling points.

The time waveform and frequency spectrum of original vibration signals: (a) mixed bearing fault; (b) outer ring fault; (c) missing tooth fault; (d) rolling element fault; and (e) tooth surface wear fault.
Experiment results
Since the frequency components of the collected original vibration signals are complicated with significant non-linear and non-stationary characters, the original signals were decomposed by WPD by using ‘db8’ wavelet basis on 3 scales, and the selection of wavelet basis can refer to [37]. Then, 8 statistic features in time-frequency domain, 4 AR model parameters, 1 instantaneous amplitude Shannon entropy and 1 wavelet packet decomposition energy were extracted from each sub-band to form the high dimensional fault feature set. In other words, the original high dimensional fault feature set contained 112 fault features, which could characterize faults from multiple aspects, and the original feature set was rich of fault information. Since the original high dimensional fault feature set was inevitably mixed with insensitive and interference features, IKMD-FS was adopted to select fault features, with feature selection results shown in Fig. 4. It is clear that after feature selection, 67 fault features remained in the feature set, the dimension of the feature set still being very high. Moreover, the non-linear coupling relation between fault features also caused the high dimensional feature set to contain a large amount of redundant information. Therefore, after feature selection, the high dimensional fault feature set was then inputted into OSSLTSA algorithm for dimensionality reduction. The objective dimension d was set to 5 for dimensionality reduction, and the neighborhood size k was set to 12. Table 3 shows the low dimensional coordinates of each class of fault samples when the ratio of labeled samples to the total samples is a = 0.4. Inputting the obtained low dimensional fault feature set into a pattern recognition method can realize fault class identification. TSVM was used in this study to establish the fault pattern recognition model. Table 4 lists the diagnosis results of the proposed method based on OSSLLTSA and TSVM. It is clear from Table 4 that the proposed method has advantages in situations where labeled training samples are insufficient. When a = 0.4, the fault recognition accuracy reached 95.5%.

Result of feature set after feature selection.
Low-dimensional coordinates of fault samples by using OSSLLTSA
The change of fault diagnosis accuracy with ratio of labeled samples a
To further verify the effectiveness of the proposed method, the following comparative experiments were made. Comparative analysis was made for the effectiveness of different construction methods of high dimensional fault feature sets. For simplicity of analysis, it is assumed in the comparative experiments that all training samples are labeled samples, and KNNC algorithm is adopted for fault diagnosis. Firstly, the necessity and effectiveness of using WPD to preprocess the vibration signals are analyzed. Table 5 lists the diagnosis results of using WPD for preprocessing and those without WPD preprocessing. It is clear from the results that the diagnosis accuracy using original vibration signals for fault feature extraction is 89.5%, while the diagnosis accuracy using the WPD to preprocess vibration signals for fault feature extraction is 94.5%. Moreover, the necessity and effectiveness of feature selection are analyzed. Table 6 lists the fault diagnosis results of using feature selection and those without feature selection. It is clear from Table 6 that feature selection removes the insensitive and interference noise features in the original high dimensional fault feature set. Therefore, the fault diagnosis accuracy using feature selection is higher than that of directly using the original fault feature set. The effectiveness of OSSLTSA algorithm is analyzed. The fault diagnosis accuracy of OSSLLTSA algorithm is compared with PCA, LLTSA and SLLTSA. KNNC algorithm is adopted for fault diagnosis, and only the labeled samples are used for KNNC training. Table 7 shows the fault diagnosis accuracy after using these dimensionality reduction methods for feature extraction. It is clear from Table 7 that when SLLTSA algorithm is used, the fault diagnosis accuracy with fewer labeled samples is very low. The reason is that SLLTSA algorithm is easy to be trapped in over-learning when labeled training samples are insufficient. Conversely, the fault diagnosis accuracy for LLTSA and PCA is low. Moreover, the fault diagnosis accuracy of PCA is lower than LLTSA. The reason is that PCA is unable to extract the nonlinear structural information contained in data set, while LLTSA has certain structural extraction ability. In addition, by comparing SLLTSA and LLTSA, it can be found that when labeled samples increase to a certain degree, the SLLTSA algorithm outperforms LLTSA. OSSLLTSA is a semi-supervised dimensionality reduction method, which is able to simultaneously extract the structural information of labeled and unlabelled samples, avoiding the over-learning problem of SLLTSA where labeled samples are insufficient and the projection aimlessness of LLTSA in dimensionality reduction. Therefore, the performance of using OSSLTSA for dimensionality reduction is better than using other methods. It is clear from Table 7 that the fault diagnosis accuracy of OSSLTSA is the best among these methods. The effectiveness of TSVM is compared with SVM and KNNC. In the comparative experiment, the low dimensional fault features extracted by OSLLTSA were the input of pattern recognition algorithms. Table 8 lists the fault diagnosis results of using 3 pattern recognition methods. KNNC is a pattern recognition algorithm based on statistics, with advantages of low computational cost, stable performance, etc. However, when labeled training samples are insufficient, the recognition accuracy of KNNC algorithm decreases. When the ratio of labeled samples is a = 0.2, the recognition accuracy of KNNC is 90%, which is lower than that of SVM and TSVM. SVM is a pattern recognition algorithm based on structural risk minimization, which realizes fault diagnosis by constructing classification hyperplane. However, when labeled samples are insufficient, the classification hyperplane calculated by SVM deviates from the optimal classification hyperplane, causing the over-learning problem of SVM. When the ratio of labeled samples is a = 0.2, the recognition accuracy of SVM is 92%. TSVM is the semi-supervised improvement of SVM, which is able to simultaneously extract the distribution information of labeled and unlabelled training samples, and use the sample distribution information contained in unlabelled samples to modify the classification hyperplane. Therefore, when labeled samples are rare, TSVM is still able to achieve good fault diagnosis results. When the ratio of labeled samples is a = 0.2, the recognition accuracy of TSVM is 94%, which is better than that of KNNC and SVM.
Fault diagnosis accuracy with versus without WPD
Fault diagnosis accuracy with versus without WPD
Running identification rate with versus without feature selection
The change of fault diagnosis accuracy with ratio of labeled samples a
The change of running state identification accuracy with ratio of labeled samples a
A fault diagnosis model was proposed using OSSLTLTSA for feature extraction and TSVM for fault identification. According to the experimental results, some conclusions are summarized as follows: The features extracted from sub-bands decomposed by WPD can characterize fault better than those extracted from original signals, and feature selection based on IKDM-FS can remove insensitive features or even interference feature from the feature set. OSSLTSA can is able to simultaneously extract the structural information from both labeled and unlabelled samples, thus overcoming the drawback of unsupervised manifold learning that it is aimless in feature extraction. Dimensionality reduction by OSSLLTSA can well extract the sensitive fusion feature for fault diagnosis. TSVM is able to modify the model and improve fault classification accuracy by fully using the fault information in unlabelled samples. The effectiveness of the proposed method was verified by a fault diagnosis test in a gearbox.
Footnotes
Acknowledgments
This project was supported by National Natural Science Foundation of China (Grant Nos. 51705059, 51605065), by Scientific and Technological Research of Chongqing Municipal Education Commission (Grant Nos. KJ1600428, KJ1600443).
