Abstract
Detecting the mechanical faults of rotating machinery in time plays a key role in avoiding accidents. With the coming of the big data era, intelligent fault diagnosis methods based on machine learning models have become promising tools. To improve the feature learning ability, an unsupervised sparse feature learning method called variant sparse filtering is developed. Then, a fault diagnosis method combining variant sparse filtering with a back-propagation algorithm is presented. The back-propagation algorithm further optimizes the weight matrix of variant sparse filtering using labeled data. Finally, the developed diagnosis method is validated by rolling bearing and planetary gearbox experiments. The experimental results indicate that the developed method can achieve high accuracy and good stability in rotating machinery fault diagnosis.
Introduction
Gears, bearings, and shafts are widely used in the transmission systems of automobiles, power equipment, and construction machinery. Defects can appear in rotating machinery after a long period of operation. As a result, the life of a machine is shortened, and casualties may even occur. Such outcomes can largely be avoided by using a health monitoring system. A health monitoring system collects massive amounts of signals. Thus, processing these signals quickly and improving the diagnostic performance of the health monitoring system has become a great challenge. 1
Traditional fault diagnosis methods involving signal processing techniques require specialized knowledge and rich experience to extract appropriate fault information or features.2,3 With the coming of the big data era, data-driven methods using machine learning models are attracting the attention of scholars.4,5 The main diagnostic steps of the data-driven method are (1) obtaining features using a feature learning method and (2) recognizing health conditions using a feature classifier. A data-driven method can achieve the fault diagnosis by training the machine learning models, which can save the time required for extracting features deliberately and manually.
There are various machine learning models that can learn useful features in an unsupervised or supervised way, 6 such as back-propagation neural networks (BPNNs), 7 autoencoders, 8 restricted Boltzmann machines (RBMs), 9 and convolutional neural networks (CNNs). 10 As a traditional machine learning model, the core idea of BPNN has been widely applied in other models. Wang et al. 11 achieved rotating machinery fault diagnosis by combining stacked autoencoders (SAEs) with batch normalization. Stacked RBMs can compose a deep belief network (DBN), 12 which can further improve the feature learning capability of the network. Zhang et al. 13 developed an enhanced CNN for bearing fault diagnosis by combining short-time Fourier transform and hierarchical regularization. An et al. 14 presented a three-stage diagnosis method based on a recurrent neural network to recognize faults under time-varying working conditions. Capsule neural networks have recently become a hot topic. 15 Zhu et al. 16 established a learning model by combining a CNN and a capsule neural network. The model can achieve a strong generalization ability for bearing fault diagnosis.
Sparse feature learning methods have attracted much attention because sparse features are discriminative. Various sparse feature learning methods have been proposed, such as sparse autoencoder, 17 sparse coding, 18 sparse RBM, 19 and sparse filtering. 20 One of the advantages of sparse filtering is that it does not need to reconstruct the input data. Lei et al. 21 successfully applied sparse filtering to bearing fault diagnosis. Compared with other data-driven methods, the diagnosis method based on sparse filtering can achieve high accuracy using fewer training samples. Wang et al. 22 developed an enhanced sparse filtering method by applying an L1/2 regularization term to recognize health conditions under variable rotational speed.
The core idea of sparse filtering is applying l2-normalization to achieve competition between the normalized features. However, it has been demonstrated that l2-normalization is not the optimal choice for learning discriminative features. 23 Besides, the weights of sparse filtering are not optimized when a classifier is trained using label data, which limits the feature learning capability.
One of the main contributions of this work is the development of a variant sparse filtering (VSF) method involving l1-normalization and l3-normalization. Compared with l2-normalization, the application of l1-normalization and l3-normalization can improve the feature learning capability of sparse filtering. Another main contribution is the combination of VSF and back-propagation (BP) algorithm to achieve the VSF-BP method. The weights of the VSF model can be further optimized with label data by applying the BP algorithm.
The rest of the paper is organized as follows. In the section "The proposed method," VSF is developed and further combined with the BP algorithm to achieve a VSF-BP method. In the section "Experimental validation," the developed diagnosis method is verified by a rolling bearing dataset and a planetary gearbox dataset. Conclusions are drawn in the section "Conclusions."
The proposed method
Variant sparse filtering
VSF can be regarded as a two-layer neural network including an input layer and a feature layer. The inputs of VSF are collected samples. It is assumed that there is a training dataset
To achieve a sparse feature representation, each row of the feature matrix
The representation of sparse features can be achieved by minimizing a cost function that involves an l2-norm
This cost function can be minimized with a limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm. 24
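As a rough illustration of the kind of cost being minimized, the sketch below computes a sparse-filtering-style objective with NumPy: features are activated with a soft-absolute function, normalized along rows and then along columns, and summed. The equations of this section are not available here, so which norm VSF applies to rows and which to columns is an assumption left open as the parameters `p_row` and `p_col`; the default value 2 corresponds to the original sparse filtering. The function names and the value of `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # assumed smoothing constant for the soft-absolute activation


def soft_abs(z):
    """Smooth surrogate for |z| that stays differentiable at zero."""
    return np.sqrt(z * z + EPS)


def lp_normalize(a, p, axis):
    """Divide `a` by its lp-norm computed along `axis`."""
    norms = np.sum(np.abs(a) ** p, axis=axis, keepdims=True) ** (1.0 / p)
    return a / norms


def sparse_filtering_cost(W, X, p_row=2.0, p_col=2.0):
    """Sum of normalized features.

    W: weight matrix, shape (n_features, n_inputs)
    X: samples stored as columns, shape (n_inputs, n_samples)
    p_row, p_col: norm orders for row and column normalization
                  (assumption: VSF would replace the default 2 with 1 or 3)
    """
    F = soft_abs(W @ X)                 # (n_features, n_samples)
    F = lp_normalize(F, p_row, axis=1)  # normalize each feature across samples
    F = lp_normalize(F, p_col, axis=0)  # normalize each sample across features
    return float(np.sum(F))             # entries are positive, so this is an l1 sum
```

An optimizer such as L-BFGS would then search over `W` for a minimum of this value. Leaving the norm orders as parameters makes it easy to compare the l2 baseline with other choices on the same data.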
To ensure that the cost function is differentiable at any point, a soft-absolute value function is used to activate the features. Thus, the features are obtained by
To explain the mechanism of VSF, a feature set containing only two feature vectors is taken as an example. Each feature vector contains only two features. The two feature vectors are represented as
Figure 1. Illustration of VSF learning sparse solutions.
The cost function in equation (8) is shown in Figure 1(c). It can be found that there are two sparse solutions due to the constraint in equation (7). The L-BFGS algorithm would search for the sparse solutions from an initial solution along the direction of gradient descent. As a result, VSF can learn a sparse representation of features.
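The two-feature picture can be reproduced numerically. The sketch below is not equations (7) and (8), which are not available here. It assumes a generic constraint that the two features have unit lp-norm with p greater than 1, and scans the l1 cost along that constraint. Under this assumption the cost is smallest where one feature is zero, which gives the two sparse solutions. All names are hypothetical.

```python
import math


def l1_cost_on_unit_lp_sphere(theta, p):
    """l1 cost of the point at angle `theta` after scaling it to unit lp-norm."""
    a, b = abs(math.cos(theta)), abs(math.sin(theta))
    norm = (a ** p + b ** p) ** (1.0 / p)
    return (a + b) / norm


def scan(p, steps=90):
    """Evaluate the cost on a grid of angles in the first quadrant."""
    thetas = [i * (math.pi / 2) / steps for i in range(steps + 1)]
    return [(t, l1_cost_on_unit_lp_sphere(t, p)) for t in thetas]


for p in (2.0, 3.0):
    costs = scan(p)
    best_theta, best_cost = min(costs, key=lambda tc: tc[1])
    worst_theta, worst_cost = max(costs, key=lambda tc: tc[1])
    print(f"p={p}: lowest cost {best_cost:.3f} at {math.degrees(best_theta):.0f} deg, "
          f"highest cost {worst_cost:.3f} at {math.degrees(worst_theta):.0f} deg")
```

The lowest cost sits on a coordinate axis (one active feature) and the highest where both features are equal, which is why gradient-based minimization is drawn toward sparse feature vectors.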
VSF-BP method for fault diagnosis
The VSF-BP method mainly contains three learning stages: training a VSF model, training a softmax regression model, and fine-tuning the network. The training process of the VSF-BP method is shown in Figure 2.
Figure 2. Training process of the VSF-BP method.
(1) Training a VSF model

Frequency-domain data are used as inputs to train the VSF model. Spectra can be easily obtained from time-domain signals using the fast Fourier transform (FFT). Therefore, the training dataset is
(2) Training a softmax regression model
Once the VSF model is trained, a feature set
(3) Fine-tuning the network
Once the VSF model and the softmax regression model are trained, the weight matrices
After the three-layer network is optimized, the spectra are fed into the network for testing. The health conditions of rotating machinery can be recognized by carrying out a forward propagation.
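The forward propagation and one fine-tuning step for such a three-layer network can be sketched as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: the feature layer applies only a soft-absolute activation (any normalization used inside VSF is omitted), the loss is plain cross-entropy without a regularization term, and the names `W1`, `W2`, and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # assumed smoothing constant of the soft-absolute activation


def soft_abs(z):
    return np.sqrt(z * z + EPS)


def softmax(logits):
    shifted = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)


def forward(W1, W2, X):
    """X holds one spectrum per column; returns pre-activations, features, probabilities."""
    Z = W1 @ X            # (n_features, n_samples)
    F = soft_abs(Z)       # feature layer
    P = softmax(W2 @ F)   # (n_classes, n_samples)
    return Z, F, P


def cross_entropy(P, labels):
    n = labels.size
    return float(-np.mean(np.log(P[labels, np.arange(n)] + 1e-12)))


def gradients(W1, W2, X, labels):
    """Back-propagate the cross-entropy loss to both weight matrices."""
    Z, F, P = forward(W1, W2, X)
    n = labels.size
    D = P.copy()
    D[labels, np.arange(n)] -= 1.0
    D /= n                                   # dLoss / dLogits
    grad_W2 = D @ F.T
    dZ = (W2.T @ D) * (Z / np.sqrt(Z * Z + EPS))  # chain rule through soft_abs
    grad_W1 = dZ @ X.T
    return grad_W1, grad_W2


def predict(W1, W2, X):
    """Forward propagation followed by picking the most probable class."""
    return forward(W1, W2, X)[2].argmax(axis=0)
```

A fine-tuning iteration would subtract a small multiple of `grad_W1` and `grad_W2` from the weights obtained in the first two stages. For the bearing case reported below, `W1` would have shape (500, 600) and `W2` shape (10, 500).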
Experimental validation
Case 1: Rolling element bearing fault diagnosis
Description of the rolling element bearing dataset.
In this case, the dimensions of the input layer, the feature layer, and the output layer were 600, 500, and 10, respectively. The regularization parameter
Figure 3 shows the effect of the percentage of samples for training on the average accuracy. Note that the average accuracy increased first and then decreased with the increase in the number of training samples, indicating that the developed method can achieve better performance using fewer training samples. The average accuracy reached its highest value, with a small standard deviation, when 15% of the samples were used for training. Note that the proposed method also obtained great diagnostic performance when 10% of the samples were used for training. To balance the diagnostic performance against the cost of collecting training samples in practice, 10% of the samples were selected for training.
Figure 3. Average accuracy under different percentages of samples for training.
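Selecting a fixed percentage of samples for training can be sketched as below. How the training samples were drawn is not stated here, so the per-class random draw, the seed, and the label list (10 classes with 100 samples each) are all assumptions made only for illustration.

```python
import random


def split_by_fraction(labels, train_fraction, seed=0):
    """Return (train_indices, test_indices), drawing the same fraction from every class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)


# Hypothetical label list: 10 health conditions with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]
train_idx, test_idx = split_by_fraction(labels, 0.10)
print(len(train_idx), "training samples,", len(test_idx), "testing samples")
```

Repeating the split with different seeds gives independent trials from which an average accuracy and a standard deviation can be computed.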
To demonstrate the importance of fine-tuning, the testing accuracy of 20 trials using the VSF and VSF-BP methods is shown in Figure 4. The average accuracy and standard deviation of the VSF method were 99.74% and 0.07%, respectively. Specifically, the testing accuracy varied from 99.58% to 99.89%, showing significant fluctuation. Note that the testing accuracy obtained by the VSF-BP method varied from 99.78% to 99.94%, showing better stability. Therefore, the fine-tuning process achieved by the BP algorithm plays an important role in improving the diagnostic performance of the VSF-BP method. Figure 4 also shows an accuracy comparison between the proposed VSF-BP method and existing methods. It can be seen that the testing accuracy obtained by DNN 26 varied from 98.9% to 99.91%. The testing accuracy obtained by CNN 27 varied from 99.69% to 99.93%. The testing accuracy obtained by CNN 28 varied from 99.3% to 99.86%. The comparison shows that the existing methods are less stable in testing accuracy than the proposed method.
Figure 4. Testing accuracy of 20 trials using the rolling element bearing dataset.
The effect of fine-tuning on the weight vector of the VSF model is shown in Figure 5. Note that the weight vector changed only slightly after fine-tuning, demonstrating that the weights of the VSF model were fine-tuned. Thus, fine-tuning can further optimize the network parameters of the VSF model, which can improve the diagnostic performance.
Figure 5. Weight vectors obtained by the VSF-BP method before and after fine-tuning.
The confusion matrix of the rolling element bearing dataset using the VSF-BP method is shown in Figure 6. It can be seen that the average accuracy of the health conditions NC, BF1, IF1, IF2, and IF3 was 100%. Note that the average accuracy corresponding to the health conditions of OF1, OF2, and OF3 was almost 100%. Besides, the VSF-BP method can also recognize the health conditions of BF2 and BF3 with high accuracy. Specifically, 0.24% of the testing samples belonging to BF2 were misdiagnosed as BF1 and IF3. For the health condition of BF3, 0.43% of the testing samples were misdiagnosed as BF1, while 0.6% of the samples were misdiagnosed as BF2. It can be found that the VSF-BP method shows weaker performance in diagnosing BF. The main reason for misdiagnosis is the similarity between the signals of BF1, BF2, and BF3. On the whole, the proposed method can recognize the rolling element bearing faults effectively.
Figure 6. Confusion matrix of the rolling element bearing dataset using the VSF-BP method.
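A confusion matrix of this kind, with each row giving the percentage of one true health condition assigned to each predicted condition, can be computed as in the following sketch. The label lists are hypothetical toy data and are not results from the experiments.

```python
def confusion_matrix_percent(true_labels, predicted_labels, class_names):
    """Row-normalized confusion matrix: entry [i][j] is the percentage of
    samples of true class i that were predicted as class j."""
    index = {name: i for i, name in enumerate(class_names)}
    k = len(class_names)
    counts = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    percent = []
    for row in counts:
        total = sum(row)
        percent.append([100.0 * c / total if total else 0.0 for c in row])
    return percent


# Hypothetical toy labels, for illustration only.
classes = ["NC", "BF1", "BF2"]
true = ["NC"] * 4 + ["BF1"] * 4 + ["BF2"] * 4
pred = ["NC"] * 4 + ["BF1"] * 3 + ["BF2"] + ["BF2"] * 3 + ["BF1"]
matrix = confusion_matrix_percent(true, pred, classes)
for name, row in zip(classes, matrix):
    print(name, [f"{v:.1f}" for v in row])
```

Off-diagonal entries in a row show where a condition is being confused with another, which is how the misdiagnosis rates quoted above are read from the figure.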
Diagnostic results of the VSF-BP method and existing methods using rolling element bearing dataset.
Case 2: Planetary gearbox fault diagnosis
A planetary gearbox dataset provided by Nanjing University of Aeronautics and Astronautics 29 was used to validate the VSF-BP method. The test bench was mainly composed of a motor, a planetary gearbox, couplings, bearing blocks, and rotary tables, as shown in Figure 7. Seven kinds of planetary gear conditions were deliberately set up, including a normal health condition (NC), worn teeth (WT), a broken tooth (BT), a crack in the planetary gear (CG), pitting on the planetary gear (PG), a compound fault of WT and BT (CF1), and a compound fault of CG and PG (CF2), as shown in Figure 8. The time-domain vibration acceleration signals were collected from the surface of the gearbox. Two hundred signals were obtained for each health condition, and each signal had 1280 data points. The sampling frequency was 12.8 kHz. A sample with 640 spectrum lines was obtained from a signal by applying FFT. Thus, the planetary gearbox dataset contained 1400 samples.
Figure 7. Test bench of the planetary gearbox.
Figure 8. Planetary gears with different health conditions: (a) NC, (b) WT, (c) BT, (d) CG, and (e) PG.
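The conversion of a 1280-point signal sampled at 12.8 kHz into a 640-line spectrum can be sketched with NumPy as follows. The amplitude scaling and the choice to keep the first 640 bins (dropping the Nyquist bin) are assumptions, since the preprocessing details are not given here. The test signal is a synthetic 500 Hz tone, not measured data.

```python
import numpy as np


def signal_to_spectrum(signal, n_lines=None):
    """One-sided amplitude spectrum of a real-valued signal.

    By default the first len(signal) // 2 bins are kept (assumed convention).
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    if n_lines is None:
        n_lines = n // 2
    amplitude = np.abs(np.fft.rfft(signal)) / n  # rfft returns n // 2 + 1 bins
    return amplitude[:n_lines]


fs = 12_800        # sampling frequency in Hz, as stated in the text
n_points = 1_280   # data points per signal, as stated in the text
t = np.arange(n_points) / fs
demo_signal = np.sin(2 * np.pi * 500 * t)  # synthetic tone for illustration

spectrum = signal_to_spectrum(demo_signal)
freqs = np.fft.rfftfreq(n_points, d=1 / fs)[: spectrum.size]
print(spectrum.shape, "peak at", freqs[spectrum.argmax()], "Hz")
```

With these settings the frequency resolution is 12800 / 1280 = 10 Hz per spectrum line, so the 640 lines span 0 Hz to 6390 Hz.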

In this case, the dimensions of the input layer, the feature layer, and the output layer were 640, 500, and 7, respectively. The regularization parameter
The testing accuracy of 20 trials using the VSF and VSF-BP methods is shown in Figure 9. The average accuracy and standard deviation of the VSF method were 97.55% and 1.37%, respectively. The testing accuracy varied from 92.22% to 98.81%, showing significant fluctuation. In contrast, the testing accuracy of the VSF-BP method varied from 99.68% to 100%, showing better stability. Note that the VSF-BP method achieved 100% accuracy in 13 trials. Therefore, the BP algorithm plays an important role in improving the diagnostic performance of the VSF-BP method.
Figure 9. Testing accuracy of 20 trials of the planetary gearbox dataset using the VSF-BP method.
The confusion matrix of the planetary gearbox dataset using the VSF-BP method is shown in Figure 10. Note that the average accuracy of the health conditions of NC, WT, CG, and PG was 100%. The average accuracy corresponding to the health condition of CF2 was almost 100%. The VSF-BP method obtained lower accuracy for the health conditions of BT and CF1. Specifically, 0.17% of the testing samples belonging to BT were misdiagnosed as WT, and 0.14% of the testing samples belonging to CF1 were also misdiagnosed as WT. On the whole, the VSF-BP method can recognize the planetary gearbox faults with high accuracy.
Figure 10. Confusion matrix of the planetary gearbox dataset using the VSF-BP method.
The effect of fine-tuning on the weight vector of the VSF model using the planetary gearbox dataset is shown in Figure 11. It can be observed that the weight vectors before and after fine-tuning differ only slightly, indicating that the weights of the VSF model were fine-tuned. The results again demonstrate the necessity of fine-tuning for the VSF-BP method.
Figure 11. Weight vectors obtained by the VSF-BP method before and after fine-tuning.
Two-dimensional features were obtained by applying t-distributed stochastic neighbor embedding (t-SNE), 30 and the results are shown in Figure 12. Note that the VSF-BP method without fine-tuning can divide the learned features into seven clusters. Most of the features belonging to the same health condition were classified into the same cluster. However, some features that belong to the health condition of WT were misclassified into the cluster of BT. For the VSF-BP method with fine-tuning, all the features belonging to the same health condition were classified into the same cluster, as shown in Figure 12(b). The results indicate that fine-tuning can improve the feature learning capability of the VSF-BP method.
Figure 12. Two-dimensional feature visualization of the planetary gearbox dataset: (a) before fine-tuning and (b) after fine-tuning.
Diagnostic results of the VSF-BP method and existing methods using planetary gearbox dataset.
Conclusions
A three-layer neural network method based on VSF and the BP algorithm was presented in this paper to improve the diagnostic performance for rotating machinery. The performance of the VSF-BP method was verified by rolling bearing and planetary gearbox experiments. The results indicate that the application of l1-normalization and l3-normalization can help the VSF model learn useful features. The results also show that the application of the BP algorithm can optimize the weight matrix of the VSF model and improve the feature learning ability, which can improve diagnostic accuracy. Compared with the existing diagnosis methods, the proposed VSF-BP method can obtain higher accuracy and lower standard deviation using fewer training samples, showing high effectiveness and good stability.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 18KJD460003) and the Natural Science Research Foundation of Jiangsu Normal University (Grant No. 18XLRS009).
