Abstract
Accurately and automatically identifying different faults of the key components in rotating machinery is a great challenge. In this paper, a new method called feature fusion deep belief network is proposed for the intelligent fault diagnosis of rolling bearings. Firstly, a deep belief network (DBN) is constructed from several pre-trained restricted Boltzmann machines for feature learning from the raw vibration data. Secondly, locality preserving projection (LPP) is adopted to fuse the deep features and further enhance their quality. Finally, the fused deep features are fed into a Softmax classifier for automatic and accurate fault diagnosis. The proposed method is applied to experimental rolling bearing signals, and the results show that it is more effective than the traditional intelligent diagnosis methods.
Introduction
Prognostics and health management (PHM) of rotating machinery has attracted increasing attention due to its significance for enhancing overall machine performance. With the rapid development of science and technology, modern rotating machinery has become larger, more complex and more integrated [1], so its key components (rotor, bearing and gearbox) usually run under severe working conditions with heavy load, strong impact and high speed. If no effective and timely actions are taken, these key components will inevitably develop various faults, which may lead to economic losses and serious casualties [2]. Therefore, it is very meaningful to accurately and automatically identify the different faults of rotating machinery.
Vibration signals collected from rotating machinery usually carry rich and valuable information [3]. Currently, two types of methods have been proven effective for mechanical equipment fault diagnosis: signal processing techniques [4] and intelligent diagnosis techniques [5]. Intelligent diagnosis is the more recent development in fault detection technology, in which the artificial neural network (ANN) and the support vector machine (SVM) are the most widely applied [6]. Intelligent fault diagnosis using ANN or SVM can be treated as a typical pattern recognition problem, which mainly contains three steps: feature extraction, feature selection and fault identification [7]. Feature extraction obtains the feature parameters that correctly reflect the different working conditions of the machine. Feature selection chooses the most sensitive feature parameters to reduce the computational cost and avoid information redundancy. Fault identification recognizes the different faults based on the selected features. However, the traditional intelligent diagnosis methods, such as ANN-based or SVM-based methods, have two obvious shortcomings. The first is that the sensitive features input into the intelligent classifiers are selected manually [8]; moreover, although the selected features are effective for some specific diagnosis issues, they are probably unsuitable for a new issue. The second is that ANN-based and SVM-based methods both belong to shallow architectures, which means that only one hidden layer is included [9]. Such shallow architectures limit the capacity to learn the complex non-linear relationships in fault diagnosis issues [10]. Therefore, it is necessary to construct deep networks for effectively identifying the different working conditions of the key components in rotating machinery.
Deep learning is a new machine learning method, which holds the potential to overcome the shortcomings of the traditional intelligent diagnosis methods. Deep learning models contain multiple hidden layers, and the features in these hidden layers are not designed by human engineers; that is, they are learned from the input data automatically [9]. The deep belief network (DBN) is a popular deep learning model, which has been successfully applied to fault diagnosis in the last five years [11–13]. The success of DBN can be mainly attributed to two aspects. Firstly, DBN can automatically learn useful features from the input data, which removes the necessity for manual feature extraction and selection. Secondly, DBN is constructed from several trained restricted Boltzmann machines (RBMs), which makes it more likely to effectively learn the complex non-linear relationships than ANN or SVM.
Although deep learning models such as DBN can, to a great extent, automatically learn useful features from the raw data, the learned deep features are typically high-dimensional and contain redundant information [14], which may decrease the classification accuracy and increase the training time. In order to further enhance the quality of the learned deep features, locality preserving projection (LPP), a new feature fusion method, can be used to fuse the deep features learned by DBN, extracting the most representative information and reducing the dimension. Several studies have confirmed the advantages of LPP over principal component analysis (PCA) for feature fusion [15].
In this paper, a novel feature fusion deep belief network method is proposed for the intelligent fault diagnosis of rotating machinery. First, the deep belief network (DBN) is constructed from a series of pre-trained restricted Boltzmann machines (RBMs) for feature learning from the raw vibration data. Second, locality preserving projection (LPP) is adopted to fuse the deep features and further enhance the quality of the learned features. Finally, the fused deep features are fed into Softmax for automatic fault diagnosis. The proposed method is applied to experimental rolling bearing signals. The results confirm that the proposed method removes the dependence on manual feature extraction and is more effective and reliable than the traditional methods and the standard DBN.
The rest of this paper is organized as follows. The proposed fault diagnosis approach is described in Section 2. In Section 3, the experimental diagnosis results for rolling bearing are analyzed and discussed. Finally, general conclusions are given in Section 4.
The proposed method
The method mainly includes four parts: restricted Boltzmann machine training, deep belief network construction, deep feature fusion using locality preserving projection and the general procedures of the proposed method.
Restricted Boltzmann machine training
A standard DBN can be regarded as a stack of several RBMs. Each RBM consists of one visible layer and one hidden layer, as shown in Fig. 1. The visible layer accepts the input data and passes it to the hidden layer.

RBM structure.
Let
The joint probability of
The hidden units are conditionally independent given the visible units, and vice versa, so their conditional probabilities are
Since there are no hidden-hidden or visible-visible connections, the conditional probability distributions of the units can be expressed as
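For orientation, the standard formulation of a binary RBM, which the description here appears to follow, is sketched below. The notation is an assumption of this sketch and not necessarily the paper's own: $v_i$ and $h_j$ are the states of visible unit $i$ and hidden unit $j$, $a_i$ and $b_j$ are their biases, $w_{ij}$ is the connecting weight, $Z$ is the partition function and $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function. The last two expressions are the unit-wise conditionals used for sampling during training.

```latex
\begin{align*}
E(\mathbf{v},\mathbf{h}) &= -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i w_{ij} h_j \\
P(\mathbf{v},\mathbf{h}) &= \frac{1}{Z}\exp\bigl(-E(\mathbf{v},\mathbf{h})\bigr),
\qquad Z = \sum_{\mathbf{v},\mathbf{h}} \exp\bigl(-E(\mathbf{v},\mathbf{h})\bigr) \\
P(\mathbf{h}\mid\mathbf{v}) &= \prod_{j} P(h_j\mid\mathbf{v}),
\qquad P(\mathbf{v}\mid\mathbf{h}) = \prod_{i} P(v_i\mid\mathbf{h}) \\
P(h_j = 1\mid\mathbf{v}) &= \sigma\Bigl(b_j + \sum_{i} v_i w_{ij}\Bigr) \\
P(v_i = 1\mid\mathbf{h}) &= \sigma\Bigl(a_i + \sum_{j} w_{ij} h_j\Bigr)
\end{align*}
```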
The RBM training process can be described as follows. A training sample is first presented to the visible units to produce $v_i$. Then, the hidden units $h_j$ are sampled according to the probabilities in Equation (9). Repeating this process once more to update the visible and then the hidden units produces the one-step reconstructed states
In this paper, a momentum α ∈ [0, 1) is used in updating the weight. With the momentum, the weight update at the current epoch can be related to the previous epoch and formulated as
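The training step described above corresponds to one-step contrastive divergence (CD-1) with momentum. A minimal sketch is given below, assuming a binary RBM with logistic conditionals. The function name `cd1_step`, the learning rate `lr`, the momentum value `alpha` and the toy data are illustrative choices and are not taken from the paper; bias updates are omitted for brevity.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used for the RBM conditional probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, dW_prev, lr=0.1, alpha=0.5, rng=None):
    """One CD-1 weight update with momentum for a binary RBM.

    v0: (batch, n_visible) data; W: (n_visible, n_hidden) weights;
    a, b: visible and hidden biases; dW_prev: previous weight increment.
    lr and alpha are illustrative values, not the paper's settings.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and sampled states given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct the visible units, then the hidden units.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Gradient estimate <v h>_data - <v h>_recon, averaged over the batch.
    grad = (v0.T @ ph0 - pv1.T @ ph1) / v0.shape[0]
    # Momentum ties the current increment to the previous one.
    dW = alpha * dW_prev + lr * grad
    return W + dW, dW


rng = np.random.default_rng(0)
v = (rng.random((8, 6)) > 0.5).astype(float)   # toy binary batch
W = 0.01 * rng.standard_normal((6, 4))
a, b = np.zeros(6), np.zeros(4)
dW = np.zeros_like(W)
for _ in range(5):
    W, dW = cd1_step(v, W, a, b, dW, rng=rng)
```

Setting `alpha` to 0 recovers a plain CD-1 update, while values closer to 1 let earlier increments carry over more strongly.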
Deep belief network construction
Once the RBMs have been trained, the next step is to construct the DBN model. The construction process of a DBN with three RBMs is shown in Fig. 2. Each RBM layer is trained using the activation probabilities of the lower-layer RBM as its input data, and its output is the input for the next RBM layer up. The visible layer and the first hidden layer (Hidden 1) form the first RBM (RBM 1), the first hidden layer and the second hidden layer (Hidden 2) form the second RBM (RBM 2), and the second hidden layer and the third hidden layer (Hidden 3) form the third RBM (RBM 3). Finally, a Softmax classifier is added at the top layer for fault classification based on the learned deep features [13].

The construction process of a DBN with three RBMs.
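The layer-by-layer data flow in this construction can be sketched as follows. This is only an illustration of how the activation probabilities of one RBM become the input of the next; the weights are random placeholders standing in for pre-trained RBMs, and the helper name `dbn_features` is hypothetical. The layer sizes follow the 512-100-100-100 architecture selected later in the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic activation of the hidden units."""
    return 1.0 / (1.0 + np.exp(-x))


def dbn_features(x, layers):
    """Propagate inputs through stacked RBM layers.

    layers is a list of (W, b) pairs, one per RBM, where W has shape
    (n_in, n_out) and b is the hidden bias. Returns the activation
    probabilities of every hidden layer, lowest layer first.
    """
    activations = []
    h = x
    for W, b in layers:
        h = sigmoid(h @ W + b)   # output of this RBM is the input of the next
        activations.append(h)
    return activations


# Placeholder weights standing in for three pre-trained RBMs.
rng = np.random.default_rng(0)
sizes = [512, 100, 100, 100]
layers = [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.random((4, 512))                     # four toy input samples
hidden1, hidden2, hidden3 = dbn_features(x, layers)
```

In greedy layer-wise pre-training, RBM 2 would be trained on `hidden1` and RBM 3 on `hidden2`, so each RBM only ever sees the output of the layer below it.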
Deep feature fusion using locality preserving projection
The deep features learned by the standard DBN are usually high-dimensional and contain superfluous information. LPP is a new data fusion method, which can effectively recover the significant aspects of the intrinsic manifold structure. Thus, in this paper, LPP is used to remove the redundant information from the high-dimensional deep features and to find valuable low-dimensional representations.
Let
Following some algebraic steps, the minimum optimization problem in Equation (13) can be transformed into the following equation
Then, the transformation matrix can be acquired by solving the generalized eigenvalue problem
The fused low-dimensional vector Y preserves the important local information of the original deep features and is used as the final input to the classifier. Softmax is usually a good choice for fault classification based on deep features. In summary, in the proposed method, the DBN is used as the feature extractor, LPP serves as the feature refiner, and Softmax acts as the fault classifier.
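A minimal sketch of LPP in its commonly used form is given below: a k-nearest-neighbour graph with heat-kernel weights is built, and the projection is obtained from the generalized eigenvalue problem $X^{T} L X \mathbf{a} = \lambda X^{T} D X \mathbf{a}$, keeping the eigenvectors of the smallest eigenvalues. The row-wise data layout, the heat-kernel width `t`, the ridge term `reg` and the default values of `d` and `k` are assumptions of this sketch, not the paper's settings.

```python
import numpy as np


def lpp(X, d=2, k=5, t=1.0, reg=1e-6):
    """Locality preserving projection of X (n_samples, n_features) onto d dimensions.

    k (neighbours), t (heat-kernel width) and reg (ridge term for numerical
    stability) are illustrative defaults.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # k-nearest-neighbour graph with heat-kernel weights, symmetrised.
    S = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[1:k + 1]     # skip the point itself
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    S = np.maximum(S, S.T)
    D = np.diag(S.sum(axis=1))
    L = D - S                                     # graph Laplacian
    # Generalised eigenproblem  X^T L X a = lambda X^T D X a.
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])
    # Reduce to a standard symmetric problem through the Cholesky factor of B.
    C = np.linalg.cholesky(B)
    Cinv = np.linalg.inv(C)
    evals, evecs = np.linalg.eigh(Cinv @ A @ Cinv.T)
    # Eigenvectors of the d smallest eigenvalues give the projection matrix.
    P = Cinv.T @ evecs[:, :d]
    return X @ P, P


rng = np.random.default_rng(0)
features = rng.random((60, 10))       # stand-in for learned deep features
Y, P = lpp(features, d=3, k=5)
```

The projection matrix `P` is learned on the training features and can then be applied to testing features as `X_test @ P`, so that both sets are fused in the same way.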
General procedures of the proposed method
The flowchart of the proposed method is shown in Fig. 3, and the general procedures are summarized as follows.

The flowchart of the proposed method.
The fused deep features are fed into a Softmax classifier for fault diagnosis.
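The final classification step can be sketched as follows. The example assumes 7 fused features and 5 bearing conditions, in line with the values used later in the paper; the weight matrix `W` and bias `b` are random placeholders for parameters that would normally be trained, and the function names are illustrative.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict(Y, W, b):
    """Class probabilities and predicted labels for fused features Y."""
    probs = softmax(Y @ W + b)
    return probs, probs.argmax(axis=1)


# Toy example: 3 samples, 7 fused features, 5 bearing conditions.
rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 7))
W = rng.standard_normal((7, 5))
b = np.zeros(5)
probs, labels = predict(Y, W, b)
```

Each row of `probs` sums to one, and the predicted condition is the class with the largest probability.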
Experimental setup and data acquisition
The rolling bearing is the most widely used component in rotating machinery, and its operating condition directly affects machine safety. Thus, accurate identification of rolling bearing faults is meaningful. In this paper, experimental rolling bearing vibration data are used to verify the effectiveness of the proposed method. The experimental test rig consists of a rotating shaft, a driving motor, test bearings and couplings, as shown in Fig. 4. Both vertical and horizontal vibration signals are acquired by two piezoelectric acceleration sensors. The sampling frequency is 20 kHz, and the rotating speed is about 900 rpm.

Rolling bearing fault test rig.
In this experiment, five rolling bearing operating conditions are created by installing test bearings with different types of faults: normal condition, inner race fault condition, outer race fault condition, ball slight fault condition and ball severe fault condition. This gives a 5-class fault diagnosis problem that involves not only different fault categories but also different fault severities.
Each bearing condition consists of 100 samples, and each sample is a collected vibration signal segment containing 512 data points. In order to avoid contingency in the diagnosis results, 70 randomly selected samples of each condition are used for training and the remaining 30 samples for testing. More details about the five bearing conditions are given in Table 1 and Fig. 5 (first 20000 data points).
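The sample preparation described above can be sketched as follows for a single bearing condition. Non-overlapping segmentation, the fixed seed and the helper name `split_condition` are assumptions of this sketch; the paper only states the sample count, the segment length and the 70/30 random split.

```python
import random


def split_condition(signal, n_samples=100, length=512, n_train=70, seed=0):
    """Cut one condition's signal into segments and split them at random.

    Non-overlapping segments and the seed are assumptions of this sketch.
    """
    segments = [signal[i * length:(i + 1) * length] for i in range(n_samples)]
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    train = [segments[i] for i in order[:n_train]]
    test = [segments[i] for i in order[n_train:]]
    return train, test


# Toy stand-in for one measured vibration record (100 x 512 = 51200 points).
signal = [float(i) for i in range(100 * 512)]
train, test = split_condition(signal)
```

Applied to all five conditions, this yields 350 training samples and 150 testing samples per trial; changing the seed between trials gives a different random split each time.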
Description of the five rolling bearing operation conditions

The measured vibration signals of the five rolling bearing conditions: (1) Normal condition; (2) Inner race fault condition; (3) Outer race fault condition; (4) Ball severe fault condition; (5) Ball slight fault condition.

Detailed diagnosis results of different methods for ten trials.
Diagnosis results of different methods
For comparison, four other methods are also used to analyze the same dataset, including the standard DBN, BP neural network (BPNN), wavelet neural network (WNN) and SVM. Two important points need to be emphasized:
Unlike the traditional intelligent methods, the proposed method focuses on the intelligent fault diagnosis of rolling bearings without manual feature extraction. In other words, the inputs of the proposed method are always the 512-dimensional raw vibration data. The inputs of BPNN, WNN and SVM are of two types: one is the 512-dimensional raw vibration data, and the other is the 10-dimensional feature parameters manually extracted from each raw sample. More details about the 10 feature parameters can be found in Ref. [17].
In order to show the stability and effectiveness of the proposed method, ten trials are carried out on the same dataset. Figure 6 shows the detailed results of each trial, and the average testing accuracies are listed in Table 2. It can be clearly observed from Table 2 that the average testing accuracy of the proposed method is 90.60% (1359/1500), which is 6.87 percentage points higher than that of the standard DBN (83.73%, 1256/1500) and much higher than those of BPNN (52.80%, 792/1500), WNN (58.27%, 874/1500) and SVM (52.20%, 783/1500) using the raw vibration data. After feature extraction, although the testing accuracies of BPNN, WNN and SVM increase to 76.13% (1142/1500), 84.33% (1265/1500) and 85.20% (1287/1500), respectively, they remain below that of the proposed method. The standard deviation of the proposed method is 0.7337, which is smaller than those of the other seven methods (1.3771, 6.0189, 2.8940, 4.0515, 2.2717, 2.4954 and 1.2090). In addition, Table 2 gives the average computing time of all the methods (Core i5, 12-GB memory). The average computing time of the proposed method is 28.29 s, whereas those of the other seven methods are 27.61 s, 15.36 s, 5.55 s, 14.58 s, 4.86 s, 9.64 s and 3.27 s, respectively.
From Fig. 6, the testing accuracy of the proposed method in each trial is 90.00% (135/150), 91.33% (137/150), 90.00% (135/150), 90.67% (136/150), 90.67% (136/150), 91.33% (137/150), 92.00% (138/150), 90.00% (135/150), 90.00% (135/150) and 90.00% (135/150), respectively. Figure 7 gives the multi-class confusion matrix of the proposed method for the first trial. The vertical axis of the confusion matrix represents the actual label, and the horizontal axis represents the predicted label. Therefore, the elements on the main diagonal of the confusion matrix represent the classification accuracy of each condition. From Fig. 7, we can find that the lowest accuracy occurs in distinguishing condition 4 (Ball severe fault condition) from condition 5 (Ball slight fault condition).

The multi-class confusion matrix of the proposed method for the first trial.
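The per-trial accuracies reported for the proposed method can be cross-checked against the averaged figures with a few lines of arithmetic. The sketch below recovers the number of correctly classified samples in each trial from the reported percentages, assuming 150 testing samples per trial (5 conditions with 30 samples each), and computes the sample standard deviation with an n - 1 denominator.

```python
import statistics

# Per-trial testing accuracies of the proposed method, in percent.
accuracies = [90.00, 91.33, 90.00, 90.67, 90.67, 91.33, 92.00, 90.00, 90.00, 90.00]
n_test = 150  # testing samples per trial (5 conditions x 30 samples)

# Recover the number of correctly classified samples in each trial.
correct = [round(p / 100 * n_test) for p in accuracies]
exact = [100 * c / n_test for c in correct]

total_correct = sum(correct)
mean_accuracy = 100 * total_correct / (n_test * len(correct))
sample_std = statistics.stdev(exact)   # n - 1 in the denominator

print(total_correct, round(mean_accuracy, 2), round(sample_std, 4))
```

The result is 1359 correct samples out of 1500, a mean accuracy of 90.60% and a standard deviation of about 0.7337, in agreement with the averaged values quoted in the text.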
From this comparison, several conclusions can be drawn. (1) The diagnosis performance of WNN, SVM and BPNN depends largely on manual feature extraction. Their testing accuracies could probably be further improved by designing new features or by selecting the most sensitive features from the original feature set. However, this is a very time-consuming and labor-intensive task. (2) The proposed method shows much higher testing accuracy and better stability than BPNN, WNN and SVM using the raw vibration data. The superiority of the proposed method mainly arises from the layer-by-layer feature learning process, which is able to automatically capture the useful representative information from the raw data. The feature visualization of the layer-by-layer learning process is shown in Fig. 8, in which LPP1 and LPP2 represent the first two principal components given by LPP. The results show that the higher-layer features represent the measured signals in a more precise and identifiable way than the raw data features and the lower-layer features do. (3) The accuracy of the proposed method is higher than that of the standard DBN. The reason is that the proposed method takes full advantage of both feature learning and feature fusion, which further enhances the quality of the deep features learned from the input data. (4) The computing time of the deep learning models is longer than that of BPNN, WNN and SVM because of the larger number of hidden layers and units. However, as modern hardware technology and training algorithms develop rapidly, we fully believe that various deep learning models can be trained more efficiently in the near future [18].

The two-dimensional LPP projections of the learned features. (a) Raw data features, (b) Features in the first hidden layer, (c) Features in the second hidden layer, (d) Features in the third hidden layer.
Parameters selection of the proposed method
In this study, the main parameters of the proposed method are listed in Table 3. The architecture selection of deep learning models is still a great challenge. Currently, there is no mature theoretical method for selecting the optimal structure of deep learning models [19]. In this paper, when determining the DBN architecture, we follow a simple idea similar to Ref. [20]. Specifically, we investigate how the proposed DBN model behaves as we increase its capacity both in depth (the number of hidden layers) and in breadth (the number of units per hidden layer). Figure 9 shows the evolution of the average diagnosis accuracy (ten trials) as we increase the number of hidden layers (from 1 to 5) and the number of units per hidden layer (from 25 to 250). From Fig. 9, the architecture of the proposed DBN is selected as “512-100-100-100”. In other words, it has 3 hidden layers, and each hidden layer contains 100 units. The unit number of the input layer is decided by the dimension of the samples (512), and the unit number of the output layer is determined by the number of bearing operating conditions (5). Besides, Fig. 10 shows the relationship between the number of units per hidden layer and the computing time (fixed 3 hidden layers). These results show that the number of hidden layers and the number of units per layer both influence the feature learning performance of the deep model. Too few hidden layers and units may lead to lower accuracies, while too many incur a high computational cost.

The relationship between the average testing accuracy and the proposed deep architecture.

The relationship between the number of units per hidden layer and the computing time (fixed 3 hidden layers).
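As a rough illustration of why the computing time grows with the layer width, the number of connection weights can be counted for each candidate architecture. This is only a back-of-the-envelope sketch: the weight count is a crude proxy for training cost, the helper names are hypothetical, and the step of 25 units is an assumption, since the paper only gives the explored range of 25 to 250 units.

```python
def count_weights(sizes):
    """Number of connection weights in a fully connected layer stack."""
    return sum(m * n for m, n in zip(sizes[:-1], sizes[1:]))


def architecture(n_hidden_layers, units, n_in=512, n_out=5):
    """Layer sizes for a DBN with equal-width hidden layers."""
    return [n_in] + [units] * n_hidden_layers + [n_out]


selected = architecture(3, 100)          # 512-100-100-100-5
weights_selected = count_weights(selected)

# Weight counts over the explored range of widths, for 3 hidden layers.
# The step of 25 units is an assumption of this sketch.
growth = {u: count_weights(architecture(3, u)) for u in range(25, 251, 25)}
```

For the selected architecture this gives 71700 weights, compared with 254250 weights for 250 units per hidden layer, which is consistent with the wider networks being more expensive to train.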
In the LPP algorithm, the fused feature dimension d and the number of the nearest neighbors k should be predetermined. In most cases, k is set to 12 by experience. Figure 11 shows the evolution of diagnosis accuracy (ten trials) as we increase the fused feature dimension d (from 1 to 20). It can be seen that the best choice for d of the proposed method is 7.

The relationship between the average testing accuracy and the fused feature dimension.
The main parameters of the other seven methods are described as follows.
Method 2 (Standard DBN): The architecture of the standard DBN is 512-100-100-100-5, which is also determined by experimentation. The learning rate is 0.1 and the iteration number is 120.
Method 3 (BPNN with raw data): The architecture is 512-900-5, which is determined by the guiding principles and experiences. The learning rate is 0.1 and the iteration number is 600.
Method 4 (BPNN with 10 features): The architecture is 10-21-5, the learning rate is 0.1 and the iteration number is 500.
Method 5 (WNN with raw data): Morlet wavelet function is used as the activation function. The architecture is 512-900-5, which is decided by the guiding principles and experiences. The learning rate is 0.1 and the iteration number is 600.
Method 6 (WNN with 10 features): Morlet wavelet function is adopted as the activation function. The architecture is 10-21-5, the learning rate is 0.1 and the iteration number is 500.
Method 7 (SVM with raw data): RBF kernel is applied. The penalty factor and the radius of the kernel function are set to 20 and 0.071, respectively. Each of them is determined through a 10-fold cross validation.
Method 8 (SVM with 10 features): RBF kernel is applied. The penalty factor and the radius of the kernel function are set to 40 and 0.28, respectively.
Conclusions
In this paper, a novel method called feature fusion deep belief network is proposed for rotating machinery fault diagnosis. The proposed method can be divided into three main stages. Firstly, a DBN is constructed from a series of pre-trained RBMs for feature learning from the raw vibration data. Secondly, locality preserving projection (LPP) is adopted to fuse the deep features and further enhance the quality of the learned features. Finally, the fused deep features are fed into Softmax for automatic fault diagnosis.
The proposed DBN method is applied to rolling bearing experimental signals. The results confirm that the proposed method reduces the dependence on manual feature extraction to some degree and is more effective and reliable than the traditional methods and the standard DBN. Future work will focus on improving the computational efficiency of the proposed method. The DBN will also be further investigated for more prognostics and health management (PHM) applications.
Acknowledgments
This research is supported by the National Natural Science Foundation of China (No. 51475368), the Shanghai Engineering Research Center of Civil Aircraft Health Monitoring Foundation of China (No. GCZX-2015-02), and the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University (No. CX201710).
