Abstract
The features of raw bearing vibration signals are not invariant to changes in rotating speed. Consequently, choosing proper features is essential for feature-learning-based intelligent fault diagnosis of rolling element bearings operating at varying rotating speed. To address this issue, a convolutional neural network (CNN) based fault diagnosis approach is proposed. In the proposed method, envelope order spectra extracted from the raw vibration signals provide abundant information about the fault characteristic orders, which are features invariant to the rotating speed. A CNN model is then constructed to extract these representative features automatically, which avoids manual feature selection. Finally, the type of bearing defect is recognized. In the experimental verification, the CNN is trained using a data set corresponding to one rotating speed, expressed in revolutions per minute (RPM), while data sets corresponding to other RPMs are used to verify the classification accuracy of the trained CNN, which reflects the effectiveness of the proposed method for bearing fault detection under different rotating speeds. Experimental results show satisfactory fault-pattern recognition performance for the proposed method, and comparisons with several other intelligent fault diagnosis approaches show its superiority.
Introduction
Process monitoring and fault diagnosis have been a highly significant research direction over the past few decades [1], because they enhance the safety and reliability of the whole system and thereby increase the economic profits of industrial processes [2]. Rolling element bearings (REBs) are widely used as crucial elements in all kinds of rotating machinery. However, their harsh working conditions often result in bearing defects, which significantly affect the healthy operation of the whole machine. As a result, accurate and timely bearing fault detection is of great importance and has drawn considerable attention.
To address this issue, vibration-based fault feature extraction is accepted as a useful tool for the condition monitoring of REBs [3], and many useful signal processing methods have been proposed, such as envelope analysis [4], wavelet transform [5], spectral kurtosis [6], singular value decomposition (SVD) [7], empirical mode decomposition (EMD) [8] and variational mode decomposition (VMD) [9]. However, some problems remain. For example, envelope analysis requires both a proper center frequency and a proper filtering bandwidth. SVD is an efficient denoising method, but it may still fail if the original vibration signal has a low signal-to-noise ratio (SNR). EMD is often used for noise reduction because of its excellent signal decomposition ability, yet its performance may suffer greatly from end effects and mode mixing. Moreover, all these approaches depend heavily on expert knowledge to achieve ideal results.
With the rapid development of intelligent technology, machine learning has received increasing attention and has been applied to the fault diagnosis of REBs, with the aim of identifying the working condition of bearings automatically. Ju et al. [10] proposed an improved multi-scale entropy based fault diagnosis method for REBs in which a support vector machine (SVM) is employed for fault classification. Singh et al. [11] combined the Stockwell transform with an SVM to detect bearing faults in a three-phase induction motor. A time-frequency atoms-driven SVM method was proposed by Liu et al. [12] to identify bearing defects. Jiang et al. [13] developed a probabilistic neural network (PNN) based fault recognition method for rotating machinery. Wu et al. [14] employed a wavelet neural network together with manifold learning to improve the accuracy of fault diagnosis of REBs. Apart from SVMs and neural networks, other machine learning approaches have also been applied, such as the extreme learning machine [15] and k-nearest neighbour [16]. However, the above intelligent fault diagnosis approaches usually require manual feature extraction [17], which is easily influenced by background noise. Therefore, proper selection of fault features is of great importance for ensuring high recognition accuracy.
Recently, deep learning (DL) has shown significant advantages in feature extraction: it can learn representative features from raw signals automatically and avoid manual feature selection [18]. DL methods such as the auto-encoder, deep belief network (DBN), restricted Boltzmann machine (RBM) and recurrent neural network (RNN) have been applied to the fault diagnosis of rotating machinery [19]. Shao et al. [20] developed an optimized DBN for the fault identification of REBs, in which the structure of the DBN is optimized by particle swarm optimization (PSO). Chen et al. [21] employed a deep neural network for the diagnosis of rolling element bearings. Lu et al. [22] proposed a stacked denoising auto-encoder based fault diagnosis method for rotary machinery components. Jiang et al. [23] proposed an improved RNN for intelligent fault diagnosis of rolling bearings. Although DL has achieved excellent performance compared with some traditional machine learning methods, especially on large data sets [19], DL-based fault diagnosis is still developing [24].
As one of the DL approaches, the CNN has been investigated extensively and has been applied to fault diagnosis, including that of REBs. Because a CNN usually takes two-dimensional (2-D) images as input, the measured one-dimensional (1-D) signals used for fault recognition must be converted to a 2-D format before this kind of CNN model can be used, and a time-frequency representation is usually employed for this purpose [25–27]. Han et al. [28] also proposed an adaptive spatiotemporal feature learning approach that extracts features from 1-D signals in a 2-D format, which can raise the diagnostic accuracy of the CNN. Moreover, to avoid the complex preprocessing step above, fault detection based on features of the raw time-domain signals in the 2-D domain has been adopted [29], with 2-D gray images fed to CNN models for fault diagnosis [30–32]. In addition, 1-D CNN models that use raw signals have been developed without a hand-crafted feature extraction process. Zhang et al. [33] proposed a convolutional neural network with training interference (TICNN) for bearing fault diagnosis, which works directly on raw vibration signals. Jing et al. [34] fed multi-sensor signals jointly to a CNN model for the fault diagnosis of a planetary gearbox. Jia et al. [35] developed a deep normalized CNN for imbalanced fault classification of machinery.
Although CNN-based fault diagnosis methods have made great progress, the proper design and use of a CNN remains a great challenge because of the complex working conditions of REBs. In the existing research, the majority of CNN-based bearing fault detection methods are verified using training and testing data from the same source, which means that the working condition of the REBs is stable. In real applications, however, the rotating speed may change, which alters the features of the raw vibration signals and thus hampers the accurate identification of bearing defects. As a result, designing a CNN model that is trained and tested using data corresponding to different rotating speeds may be more meaningful. Several studies have been conducted to address this problem [36–38]. Appana et al. [39] combined the envelope spectrum with a CNN for reliable fault diagnosis of bearings with varying rotating speed and concluded that this combination improves the classification accuracy. However, the features of the envelope spectrum may still change greatly if the RPM varies widely.
In this paper, the fault characteristic orders, which are features invariant to rotating speed, are employed, and a novel fault diagnosis method based on the envelope order spectrum and a CNN is proposed. First, raw vibration signals collected under different RPMs are resampled from the time domain to the angle domain, and the envelope order spectra are then obtained by combining the Hilbert transform and the fast Fourier transform (FFT) and stored in the form of 2-D images. Second, an effective CNN is established to extract features from these 2-D images. Finally, the performance of the proposed method is demonstrated through two experimental studies. The main contributions of this paper are summarized as follows. First, to provide features invariant to changes in rotating speed, the envelope order spectrum is employed as the input of the CNN. Second, to eliminate manual feature extraction and selection, an effective CNN is used to learn features automatically from the input data. Third, compared with other DL and traditional intelligent methods, the proposed method shows significant improvement in detecting bearing defects when the training and testing data are acquired under different rotating speeds.
The remainder of this paper is organized as follows. Section 2 provides a brief introduction to the CNN. The detailed process of the proposed method is described in Section 3, including the structure of the CNN model and the method for converting time-domain signals to envelope order spectra. Verification using experimental bearing fault signals is presented in Section 4, together with some comparisons. Conclusions are drawn in Section 5.
The basic theory of CNN
The CNN is an important type of deep neural network inspired by the structure of the visual system [34] and has been widely accepted as an effective tool for numerous classification problems. A typical CNN consists of three main types of layers: the convolutional layer, the pooling layer and the full-connected layer. These layers are described in detail as follows.
Convolutional layer
In a convolutional layer, the input feature maps are convolved with a number of 2-D trainable kernels [30], and the outputs are used as the input of the next layer. Unlike in traditional neural networks, each kernel in a convolutional layer shares the same weights across all patches of the input feature maps, which is called weight sharing and reduces the training time. Supposing that the convolutional layer is the l-th layer in the CNN, the convolution operation can be expressed as [17]:
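A commonly used way to write this operation is sketched below. The symbols are assumptions chosen for illustration and may differ from the notation of [17]: $x_i^{l-1}$ is the i-th input feature map of layer $l$, $k_{ij}^{l}$ is the trainable kernel linking input map $i$ to output map $j$, $b_j^{l}$ is a bias, $M_j$ is the set of input maps feeding output map $j$, $*$ denotes 2-D convolution and $f(\cdot)$ is the activation function.

```latex
x_j^{l} = f\!\left( \sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l} \right)
```

Because the same kernel $k_{ij}^{l}$ is applied at every position of $x_i^{l-1}$, the number of trainable weights depends on the kernel size and not on the size of the feature map.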
The pooling layer is another important layer in the CNN, which is employed to reduce the resolution of the input feature maps. In the pooling layer, the input feature maps are divided into sub-regions (sub-regions of different sizes can be selected), and the average operator or the max operator is used to generate the output for each sub-region. In this paper, max pooling is chosen, which can be expressed as [30]:
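As an illustration of the max operator over non-overlapping sub-regions, the following Python sketch pools a small feature map stored as nested lists. The function name `max_pool_2d` and the list-based representation are assumptions made for this example only.

```python
def max_pool_2d(feature_map, size=2):
    """Non-overlapping max pooling over size x size sub-regions.

    feature_map is a 2-D list (rows of numbers). Rows or columns that do not
    fill a complete sub-region are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values of one sub-region and keep the largest.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


example = [[1, 2, 5, 6],
           [3, 4, 7, 8],
           [9, 10, 13, 14],
           [11, 12, 15, 16]]
print(max_pool_2d(example))
```

With 2×2 sub-regions, each spatial dimension of the feature map is halved.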
The full-connected layer is the last layer in the CNN and follows a series of combinations of convolutional and pooling layers. As in a traditional multi-layer neural network, each neuron in the full-connected layer is connected to every neuron in the previous layer. This part of the network consists of one or several hidden layers and a classification layer using the softmax function. The output of the softmax function can be calculated as [28]:
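In its standard form, the softmax function maps the outputs $z_1, \dots, z_K$ of the last layer to class probabilities $p_k = e^{z_k} / \sum_{j=1}^{K} e^{z_j}$. The symbols $z_k$ and $p_k$ are assumed here and need not match the notation of [28]. A minimal Python sketch (the function name is hypothetical) is:

```python
import math


def softmax(logits):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Four scores, one per bearing condition (values are made up for illustration).
probabilities = softmax([1.0, 2.0, 3.0, 4.0])
predicted_class = probabilities.index(max(probabilities))
```

The predicted bearing condition is the class with the largest probability.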
2-D image construction method
In this paper, the CNN model is trained with a data set corresponding to one RPM, while fault diagnosis must then be performed on data sets corresponding to other RPMs. Hence, the 2-D images used as the input of the CNN should contain features invariant to rotating speed. The fault characteristic orders are typical features of defective bearings that remain constant as the rotating speed changes, and they are commonly used for bearing fault diagnosis under fluctuating rotating speed. The envelope order spectrum, which can be acquired by applying the FFT to the angle-domain signal, is therefore adopted as the 2-D image in this study. Moreover, to make the fault features more prominent and reduce the influence of background noise, only the part of the envelope order spectrum within a certain range around the fault characteristic order is used.
To acquire enough training samples for the CNN model, a moving window is applied to the original measured signal to create numerous sub-signals. As shown in Fig. 1, if the total length of the signal is $L$, the step is $L_s$ and the length of the sub-samples is $L_t$, then the maximum number of samples $n_s$ is:
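For windows that must lie entirely inside the signal, this count is usually written as $n_s = \lfloor (L - L_t)/L_s \rfloor + 1$; this form is stated here as an assumption rather than taken from the text. The short Python sketch below (function names are hypothetical) applies it and cuts the corresponding sub-signals.

```python
def count_windows(total_len, window_len, step):
    """Maximum number of complete windows of window_len taken every step samples."""
    if total_len < window_len:
        return 0
    return (total_len - window_len) // step + 1


def split_into_windows(signal, window_len, step):
    """Return the list of sub-signals produced by the moving window."""
    n = count_windows(len(signal), window_len, step)
    return [signal[i * step:i * step + window_len] for i in range(n)]


# Toy example: 10 samples, windows of 4 samples, step of 2 samples.
windows = split_into_windows(list(range(10)), window_len=4, step=2)
```

A step smaller than the window length produces overlapping sub-signals, which increases the number of training samples obtained from one recording.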

Technique to acquire sub-signals.
Assuming that x(t) is one of the 1-D sub-signals, the envelope order spectrum is obtained via the following steps:
The angle corresponding to time t for the signal x(t) is then:
All the envelope order spectra of the sub-signals are stored as grey-scale images; the size of these images is set to 128×128 in this study.
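The conversion from a sub-signal to an envelope order spectrum can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the shaft angle at every sample is assumed to be known (for a constant speed it grows linearly with time), the angle-domain signal is obtained by linear interpolation onto a uniform angle grid with the same number of points, and the Hilbert transform is computed through the FFT. The function names are hypothetical, and rendering the spectrum as a 128×128 image is not covered.

```python
import numpy as np


def analytic_envelope(x):
    """Envelope |x + j*H{x}| using an FFT-based Hilbert transform."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def envelope_order_spectrum(x, shaft_revs):
    """Return (orders, amplitudes) of the envelope order spectrum.

    x          : vibration samples of one sub-signal
    shaft_revs : cumulative shaft angle in revolutions at each sample
                 (must be increasing); ASSUMED to be known or measured.
    """
    x = np.asarray(x, dtype=float)
    shaft_revs = np.asarray(shaft_revs, dtype=float)
    n = len(x)
    # Resample from the time domain to a uniform grid in the angle domain.
    uniform_revs = np.linspace(shaft_revs[0], shaft_revs[-1], n)
    x_angle = np.interp(uniform_revs, shaft_revs, x)
    # Envelope via the Hilbert transform, with the DC component removed.
    env = analytic_envelope(x_angle)
    env -= env.mean()
    # FFT of the angle-domain envelope gives amplitudes versus order.
    amplitudes = np.abs(np.fft.rfft(env)) * 2.0 / n
    d_rev = (shaft_revs[-1] - shaft_revs[0]) / (n - 1)
    orders = np.fft.rfftfreq(n, d=d_rev)
    return orders, amplitudes


# Synthetic check: a carrier at order 100 modulated at order 5, constant 1800 RPM.
fs, rpm, n = 12000.0, 1800.0, 4000
revs = np.arange(n) / fs * rpm / 60.0
x = (1.0 + 0.5 * np.cos(2 * np.pi * 5 * revs)) * np.cos(2 * np.pi * 100 * revs)
orders, amps = envelope_order_spectrum(x, revs)
```

In this synthetic example the largest peak of `amps` lies at the modulation order, independent of the rotating speed used to generate the signal, which is the property the proposed method relies on.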
Once the raw signals have been converted to 2-D images, a CNN model can be trained to classify these images [24]. A CNN model is established based on the input size of the 2-D grey-scale images; its parameters are listed in Table 1 and its structure is shown in Fig. 2. Five convolutional layers are applied to learn features from the envelope order spectra, with a filter size of 5×5 for the first two convolutional layers and 3×3 for the other three, while the number of kernels ranges from 16 to 256. A max-pooling layer of size 2×2 follows each convolutional layer. ReLU is chosen as the activation function, and batch normalization is added before the activation function to improve the training efficiency of the network and enhance its generalization ability. Dropout (neurons are randomly deactivated with probability P during training, with P = 0.5 in this study) is employed before the full-connected layer to prevent over-fitting, which further improves the generalization ability of the CNN model. Moreover, zero-padding is applied in the CNN model (see Table 1) to prevent dimension loss in the feature maps.
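To make the effect of zero-padding and 2×2 pooling on the feature-map size concrete, the sketch below traces a 128×128 input through five convolution and pooling stages. The kernel counts (16, 32, 64, 128, 256) are an assumption consistent with the stated range of 16 to 256, and a stride of 1 with padding that preserves the size is also assumed; the function name is hypothetical.

```python
def trace_feature_maps(input_size=128,
                       kernel_counts=(16, 32, 64, 128, 256),  # ASSUMED values
                       kernel_sizes=(5, 5, 3, 3, 3),
                       pool=2):
    """Return the (height, width, channels) shape after each conv + pool stage."""
    shapes = []
    size = input_size
    for count, k in zip(kernel_counts, kernel_sizes):
        pad = (k - 1) // 2             # zero-padding that keeps the size (stride 1)
        size = size + 2 * pad - k + 1  # convolution output size
        size = size // pool            # 2x2 max pooling halves each dimension
        shapes.append((size, size, count))
    return shapes


for stage, shape in enumerate(trace_feature_maps(), start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions the spatial size shrinks only through pooling, from 128 to 4 after five stages, so the input to the full-connected layer has 4 × 4 × 256 values.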
Parameters of the CNN model

Structure of the proposed CNN model.
The proposed fault diagnosis method for REBs with variable rotating speed mainly comprises the following steps, and a flowchart is presented in Fig. 3:

Flowchart of the proposed method.
In this part, experimental signals are employed to verify the effectiveness of the proposed CNN-based fault diagnosis method. The CNN model and the comparison methods are implemented in MATLAB; the CPU and GPU are an Intel Core i5-4200 and an NVIDIA GeForce GT 750M, respectively.
Case 1: Study on well-known fault diagnosis data set
In this section, the experimental signals come from Case Western Reserve University [40]. Four bearing conditions are simulated in the experiment: normal (No), inner race fault (IF), outer race fault (OF) and roller fault (RF). Signals from the drive-end bearing with a fault diameter of 0.54 mm are selected for this verification. Three motor speeds are selected (1797 RPM, 1772 RPM and 1750 RPM), corresponding to motor loads of 0 hp, 1 hp and 2 hp, respectively. Each signal is recorded at a sampling frequency of 12000 Hz.
The fault characteristic orders for different localized defects can be calculated by the following equations:
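The standard kinematic expressions for a bearing with a fixed outer race and an inner race rotating with the shaft are reproduced below as a sketch. The subscripts are assumed for illustration: $O_o$, $O_i$ and $O_b$ denote the outer race, inner race and rolling element orders, $Z$ is the number of rolling elements, $d$ their diameter, $D$ the pitch diameter and $\alpha$ the contact angle.

```latex
O_o = \frac{Z}{2}\left(1 - \frac{d}{D}\cos\alpha\right), \qquad
O_i = \frac{Z}{2}\left(1 + \frac{d}{D}\cos\alpha\right), \qquad
O_b = \frac{D}{2d}\left[1 - \left(\frac{d}{D}\cos\alpha\right)^{2}\right]
```

Each order is the corresponding fault characteristic frequency divided by the shaft rotating frequency, so it depends only on the bearing geometry and not on the rotating speed.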
To ensure sufficient resolution of the order spectrum, the length of the sub-signals is set to $L_t = 4000$ in this verification. Based on the fault characteristic orders, all the envelope order spectra used for the CNN model are displayed over the order range from 2.5 to 8. In addition, all the envelope order spectra are normalized in advance so that the amplitudes of all components range from 0 to 1.
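The link between sub-signal length and order resolution can be checked with a short calculation: the spacing between adjacent order bins is the reciprocal of the number of shaft revolutions contained in one sub-signal. The Python sketch below (hypothetical function names, constant speed assumed) evaluates this spacing for the sub-signal length and sampling frequency quoted above, and also shows a min-max scaling as one possible way to normalize amplitudes to the range from 0 to 1.

```python
def order_resolution(n_samples, fs_hz, rpm):
    """Order-bin spacing of a sub-signal recorded at constant speed."""
    revolutions = n_samples / fs_hz * rpm / 60.0
    return 1.0 / revolutions


def min_max_normalize(values):
    """Scale amplitudes linearly so that they span the range from 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


for rpm in (1797, 1772, 1750):
    print(rpm, round(order_resolution(4000, 12000, rpm), 4))
```

With 4000 samples at 12000 Hz, each sub-signal covers roughly ten shaft revolutions at all three speeds, so the order spacing is close to 0.1.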
Three data sets with 450×4 samples (450 samples for each condition) are established for rotating speeds of 1797 r/min, 1772 r/min and 1750 r/min, respectively. Randomly selected conversion results are presented in Fig. 4.

Conversion results for all bearing conditions.
Stochastic gradient descent with momentum is employed to train the CNN by minimizing the loss function, for which cross entropy is adopted. In the training process of this study, the maximum number of epochs is 20. To prevent over-fitting, early stopping is used: training stops once the loss on the validation set has been larger than or equal to the previously smallest loss 5 times. Based on the results of numerous trials, the learning rate and batch size are set to 0.002 and 50, respectively.
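The early-stopping rule can be sketched as a simple counter over the sequence of validation losses. Whether the counter is reset when a new smallest loss is found is not stated in the text; the sketch below assumes that it is reset, and the function name is hypothetical.

```python
def early_stopping_index(val_losses, patience=5):
    """Return the index of the validation check at which training stops.

    Training stops once the validation loss has been larger than or equal to
    the smallest loss seen so far `patience` times. ASSUMPTION: the counter is
    reset whenever a new smallest loss is found. Returns None if the rule never
    triggers.
    """
    best = float("inf")
    strikes = 0
    for i, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            strikes = 0
        else:
            strikes += 1
            if strikes >= patience:
                return i
    return None


# Made-up validation losses: the minimum 0.8 is reached at index 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.8, 0.95, 0.9, 0.7]
stop_at = early_stopping_index(losses, patience=5)
```

In this example the rule triggers before the later, lower loss of 0.7 is ever reached, which illustrates the trade-off between patience and training time.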
Six tests are conducted in which the training and testing data come from different data sets; detailed information on these tests is presented in Table 2. For each test, all 1800 samples of the training data set are used for network training, while 600 samples (150 samples for each bearing condition) are randomly selected from the testing data set for network validation and testing. Each test is performed for ten trials, and the average results are presented in Table 3. The time for training and classification is also recorded.
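One way to arrive at six cross-speed tests from three data sets is to take every ordered pair of distinct speeds, with the first used for training and the second for testing. This reading, and the order of the pairs, are assumptions; the numbering of the tests in Table 2 may differ. The sketch below also draws the same number of test samples from each bearing condition; the helper names are hypothetical.

```python
import random
from itertools import permutations

SPEEDS_RPM = (1797, 1772, 1750)


def cross_speed_tests(speeds=SPEEDS_RPM):
    """All (training speed, testing speed) pairs with different speeds."""
    return list(permutations(speeds, 2))


def stratified_subset(labels, per_class, seed=0):
    """Randomly pick per_class sample indices for every class label."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], per_class))
    return chosen


# 450 samples for each of the four bearing conditions, 150 drawn per condition.
labels = [c for c in ("No", "IF", "OF", "RF") for _ in range(450)]
test_indices = stratified_subset(labels, per_class=150)
```

Drawing the same number of samples per condition keeps the test set balanced, so the overall accuracy is not dominated by any single bearing condition.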
Detailed information of the six tests in case 1
Results of the four methods
As the results in Table 3 show, test 4 achieves the best result, with a mean accuracy of 99.22%, while test 2 achieves the lowest accuracy of 96.38%. All six tests achieve high classification accuracy using the proposed fault diagnosis method, which reveals that the proposed method can detect bearing faults effectively and stably under varying rotating speed.
Three comparison methods are applied to further demonstrate the superiority of the proposed method for bearing fault diagnosis under varying rotating speed: the CNN-based approach proposed in Ref. [24] with one full-connected layer, a DBN-based approach with two hidden layers and a stacked auto-encoder (SAE) based approach with only one hidden layer. For the DBN- and SAE-based methods, the number of neurons in each layer is 200, the learning rate is 1 and the number of training epochs is 100. The feature vectors for the DBN and SAE are constructed according to Ref. [21] using time-domain, frequency-domain and time-frequency-domain features, with the wavelet packet decomposition level set to 5. According to a preliminary test, all three comparison methods achieve high diagnostic accuracy if the training and testing samples come from the same data set.
The results of the three comparison methods are also presented in Table 3. To show the comparison more clearly, the average classification accuracies of the six tests using all four methods are plotted in Fig. 5. As shown in Table 3 and Fig. 5, the diagnostic accuracy of the three comparison methods is not stable across test 1 to test 6; for example, the best result for the SAE is 98% in test 4, while the worst is only 66.17% in test 5. Moreover, within each test, the best result is almost always achieved by the proposed method, which further demonstrates its effectiveness.

Analyzed results of test 1 to test 6 using the four methods.
Because five convolutional layers are employed and the 2-D input images are larger than those in Ref. [24], the proposed method is more time-consuming than the other three methods in both training and classification. However, classification takes only around 1.65 seconds per test with 600 testing samples using the proposed method. Hence, if the proposed method is applied to real-time fault detection and the CNN model is trained offline in advance, the time required for the online implementation phase could be acceptable.
Meanwhile, constructing the input samples of the proposed method is simple compared with the DBN- and SAE-based methods (this time is not included in the training and classification times shown in Table 3), which makes the proposed method suitable for real-time application.
In case 1, the signals exhibit little variation in rotating speed, so the change of rotating speed may not strongly affect the features of the raw vibration signals. Hence, to further demonstrate the validity of the proposed method, experimental signals acquired from a test rig with larger speed variation are used in this section.
The experimental data
The test rig shown in Fig. 6(a) consists of several main components: the bearing support structure, main shaft, experimental bearing, lubricating oil system, servo-driven motor, radial loading device, axial loading device and control system.

The experimental setup in case 2: (a) the test rig, (b) bearing with inner race fault, (c) bearing with outer race fault, (d) bearing with roller fault.
Two measuring points are selected to collect the vibration signals: one (point 1) is near the defective bearing, while the other (point 2) is away from the defective bearing. The test bearing is an NSK 7010c; its outer and inner diameters are 80 mm and 50 mm, respectively, so the pitch diameter is D = 65 mm; the diameter and number of rolling elements are d = 8.7 mm and Z = 19; and the contact angle is α = 15°. Hence, the theoretical fault characteristic orders of the inner race, outer race and rolling element ($O_i$, $O_o$ and $O_b$) are calculated as 10.71, 8.27 and 3.67, respectively. The experimental signals are collected by a B&K vibration test system; the models of the acceleration sensor and of the signal conditioning and acquisition module are B&K4354B-004 and B&K3053-B-120, respectively.
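As a worked check of these values, the Python sketch below evaluates the standard kinematic expressions for a bearing with a fixed outer race, using the geometry quoted above. The formulas and the function name are assumptions of this example and are not taken from the text.

```python
import math


def fault_characteristic_orders(pitch_d, roller_d, n_rollers, contact_angle_deg):
    """Outer race, inner race and rolling element orders (fixed outer race)."""
    ratio = roller_d / pitch_d * math.cos(math.radians(contact_angle_deg))
    return {
        "inner": n_rollers / 2.0 * (1.0 + ratio),
        "outer": n_rollers / 2.0 * (1.0 - ratio),
        "ball": pitch_d / (2.0 * roller_d) * (1.0 - ratio ** 2),
    }


# Geometry of the test bearing: D = 65 mm, d = 8.7 mm, Z = 19, alpha = 15 deg.
orders = fault_characteristic_orders(65.0, 8.7, 19, 15.0)
for name, value in orders.items():
    print(f"{name}: {value:.2f}")
```

With these expressions the outer race and rolling element orders agree with the quoted 8.27 and 3.67 to two decimals, while the inner race order evaluates to about 10.73, close to the quoted 10.71.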
During the experiment, four REB conditions are again used: inner race fault (shown in Fig. 6(b)), outer race fault (shown in Fig. 6(c)), roller fault (shown in Fig. 6(d)) and the normal condition (NO). Three rotating speeds are employed: $f_{r1}$ = 2000 r/min, $f_{r2}$ = 3000 r/min and $f_{r3}$ = 4000 r/min. The sampling frequency of the data acquisition device is 32768 Hz. Figure 7 displays the vibration signals of the four bearing conditions under all three rotating speeds.

Experimental signals in case 2.
In this section, the length of the sub-signals is chosen to be $L_t = 8192$, and the step length is $L_s = L_t/2 = 4096$. For the measured signal with the roller fault, the second harmonic of the fault characteristic order is more prominent in the envelope order spectrum than the other components; therefore, all the envelope order spectra used for the CNN model are displayed over the order range from 7 to 12, and normalization is also conducted. Three data sets with 450×4 samples (450 samples for each condition) are established for rotating speeds of 2000 r/min, 3000 r/min and 4000 r/min, respectively. The conversion results are shown in Fig. 8.
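The choice of the order range from 7 to 12 can be related to the theoretical orders quoted for this bearing (10.71, 8.27 and 3.67). The small Python sketch below, with a hypothetical function name, lists which harmonics of each characteristic order fall inside a given order window.

```python
def harmonics_in_window(base_order, low, high, max_harmonic=5):
    """Harmonic numbers k for which k * base_order lies inside [low, high]."""
    return [k for k in range(1, max_harmonic + 1)
            if low <= k * base_order <= high]


# Theoretical orders quoted for the test bearing in case 2.
for name, order in (("inner race", 10.71), ("outer race", 8.27), ("rolling element", 3.67)):
    print(name, harmonics_in_window(order, 7.0, 12.0))
```

The window contains the fundamental inner and outer race orders, while for the rolling element only the second and third harmonics fall inside it, which is consistent with relying on the second harmonic for the roller fault.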

Conversion results for all bearing conditions.
The basic structure of the CNN and the diagnosis process in this case are the same as in case 1. Six tests are used to show the performance of the proposed method, and detailed information on these tests is shown in Table 4.
Detailed information of the six tests in case 2
As point 2 is away from the defective bearing, the vibration signals measured at this point may contain more noise and interference components, which may influence the performance of the proposed method. The experimental signals measured at point 2 are therefore used first, and the results are shown in Table 5.
Results of the proposed method using signals measured at point 2
As the results in Table 5 show, the proposed fault diagnosis method provides reliable diagnostic performance. All six tests achieve high classification accuracy with low standard deviation, which reveals that the proposed method can detect bearing faults effectively and stably under large rotating speed variation.
To show the influence of the complicated transmission path, signals measured at point 1 are employed. Furthermore, a fault feature enhancement approach is utilized based on the maximum second-order cyclostationary blind deconvolution (CYCBD) [41], and the preprocessed signals are then used for analysis using the proposed method. Therefore, three types of signal are employed in the next step, which are the raw signal at point 1, the raw signal at point 2 and the preprocessed signal at point 2.
Six tests are again conducted, and the results are presented in Fig. 9. The results show that: (1) because spectral analysis is highly sensitive to noise, signals with little noise and few interference components provide more prominent features in the envelope order spectra, which benefits fault diagnosis under varying rotating speed using the proposed method; as a result, better performance is achieved using the raw signal at point 1; (2) good performance can be achieved using the preprocessed signals, which indicates that simple signal processing can enhance the diagnostic ability of the proposed method, especially for raw signals with a low signal-to-noise ratio.

Analyzed results of test 1 to test 6 using the three types of signals.
Moreover, the effectiveness of the proposed method is also demonstrated using training and testing samples from the same data set. For each data set, 350×4 samples are randomly selected for training, while the other 100×4 serve as testing samples; the results for the three types of signals described above are presented in Fig. 10. Fig. 10 shows that nearly 100% diagnostic accuracy is achieved regardless of the input samples; even the worst accuracy, obtained with the preprocessed signal at point 2, is 99.9% (CNN parameters that are not entirely appropriate may account for this result), which further shows the effectiveness of the proposed method.

Analyzed results when training and testing samples are from the same data set.
The results of the proposed CNN-based method are again compared with those of the three DL methods. In addition, two traditional machine learning approaches are employed: an SVM-based method and a PNN-based method. All methods use the experimental signals collected at point 1. Because traditional intelligent fault diagnosis methods require manual feature extraction in advance, the ten statistical features shown in Table 6 [30, 42] are selected for the SVM and PNN. The radial basis function (RBF) is employed as the kernel function in the SVM, and the spread is set to 0.6 in the PNN. Table 7 shows the average classification accuracy of the six tests using the five comparison approaches, and the results are also plotted in Fig. 11.

Analyzed results of test 1 to test 6 using the six methods.
Ten statistical features for SVM and PNN
Comparison results using signals measured at point 1
The results clearly show that the proposed CNN method outperforms the five comparison methods in terms of mean accuracy and also achieves more stable results across the six tests. The comparison indicates that the features of the raw vibration signals change with the rotating speed, which poses a great challenge to fault diagnosis methods based on feature learning. In contrast, the envelope order spectra provide features invariant to rotating speed, which gives the proposed method its superior ability to diagnose faults under varying rotating speed.
In this paper, a new fault classification algorithm based on the CNN and the envelope order spectrum is proposed to detect the fault type of REBs under variable rotating speed. The envelope order spectrum, which provides fault-related features invariant to rotating speed, is employed as the input of the CNN. A robust CNN is then constructed to learn the underlying fault features automatically and complete the fault classification. Experimental signals are employed to demonstrate the effectiveness of the proposed method. The results show that the proposed method achieves high diagnostic accuracy under different rotating speeds. Moreover, the proposed method is more effective and robust than several other intelligent approaches, and its diagnostic ability can be further improved when it is combined with a proper signal preprocessing method. However, some limitations remain: more work is needed to optimize the construction of the CNN model and to evaluate its performance when analyzing bearing signals with compound faults.
