Abstract
Intelligent bearing fault diagnosis plays an important role in improving equipment safety and reducing maintenance costs. Noise in the signal can seriously reduce the accuracy of fault diagnosis. To improve diagnostic accuracy, a novel noise reduction method based on a weighted multi-scale morphological filter (WMMF) is proposed. First, the Teager energy operator (TEO) is used to amplify the morphological information of the signal. Then, a scale filtering operator using envelope entropy (SFOEE) is proposed to select appropriate scales, at which the noise in the signal can be adequately suppressed. A new weighting method is proposed to integrate the selected scales and construct the WMMF. Finally, a multi-headed self-attention capsule restricted Boltzmann network (MSCRBN) is proposed to diagnose bearing faults. The performance of the TEO-SFOEE-WMMF-MSCRBN fault diagnosis method is verified on the CWRU dataset, where it achieves 100% identification accuracy and outperforms the existing fault diagnosis methods used for comparison. The experimental results indicate that the proposed diagnosis method can effectively resist noise and precisely diagnose bearing faults.
Keywords
Abbreviations
TEO: Teager energy operator
SFOEE: scale filtering operator using envelope entropy
WMMF: weighted multi-scale morphological filter
RBM: restricted Boltzmann machine
MSCRBN: multi-headed self-attention capsule restricted Boltzmann network
Introduction
Rolling bearings are among the most important components of rotating machinery, and their faults can affect the safety of the equipment. Bearings operate under harsh conditions of high temperature, high speed and high load, so their performance degrades over time. Degraded bearings prevent machinery from working properly, and some failures can have catastrophic consequences, leading to safety incidents and huge economic losses. To reduce the financial losses caused by untimely handling of equipment failures, intelligent bearing fault diagnosis methods are widely used. However, most current fault diagnosis methods are data-driven, and the quality of the collected data usually affects diagnostic accuracy. Because the acquisition equipment is subject to strong noise interference from the environment and from itself, the collected data are often contaminated with noise across various frequency bands. Noise in the signal can seriously interfere with the diagnosis of bearing faults. Suppressing noise in signals and improving diagnostic accuracy is therefore one of the key issues in the field of bearing fault diagnosis [1–3].
Bearing vibration signals are non-linear and non-stationary, and the acquired signals are mostly disturbed by strong noise. As a result, traditional signal analysis methods cannot accurately diagnose bearing faults in practical applications [4]. In recent years, data-driven bearing fault diagnosis has become a new research topic, and combining deep learning with traditional signal analysis has increased the diagnostic accuracy for bearing faults. To distinguish different fault signals, various feature extraction methods are widely used. Methods such as the wavelet transform (WT) [5], empirical mode decomposition [6] and variational mode decomposition [7] have played a great role in fault diagnosis as a supplement to traditional time-frequency domain features. Machine learning methods such as the support vector machine (SVM) also play an important role in the field of fault diagnosis [8–10]. However, the parameters of such methods are difficult to determine and usually rely on parameter optimization algorithms [11], and their accuracy is relatively insufficient when the sample size is large. Deep learning based fault diagnosis methods have therefore gradually come into wide use. Deep learning models such as the Long Short-Term Memory network (LSTM) [12] and the Deep Belief Network (DBN) [13] are widely used in the field of fault diagnosis. Building on these traditional deep learning methods, several more recent diagnostic models have been proposed. Zhang et al. proposed a fault diagnosis method that combines multi-domain image features of signals with an improved CNN [14]. Feng et al. proposed a diagnosis model combining improved singular value decomposition (ISVD) with a deep residual network (ResNet) [15]. Wang et al. proposed a diagnosis model combining WaveletKernelNetwork, BiLSTM and an attention mechanism [16]. Chen et al. proposed a fault diagnosis method based on envelope analysis [17].
The acquired bearing vibration signals are inevitably disturbed by noise, and noise reduction methods can effectively reduce the proportion of noise in the signal. Mathematical morphology, a nonlinear signal noise reduction technique, plays a great role in the field of bearing fault diagnosis. It applies morphological operations with different types of structure elements to the signal to achieve noise reduction. With the development of mathematical morphology, many morphological operators have been proposed to filter low-frequency and high-frequency noise. Compared with traditional noise reduction methods such as the wavelet transform, mathematical morphology can retain and enhance the morphological information in the signal while reducing noise. Therefore, signals processed by mathematical morphology are well suited to feature extraction by deep learning models. Mathematical morphology was originally proposed as an image processing algorithm [18]. In recent years, however, it has been applied to the processing of one-dimensional signals and has produced good results in both feature extraction [19] and signal noise reduction [20]. Yan et al. combined morphology-hat product operation (MHPO) and diagonal section spectroscopy (DSS) to propose an enhanced scale morphological-hat product filtering (ESMHPF) method [21]. Li et al. proposed a method for selecting morphological operators in bearing fault diagnosis [22]. Because of the complexity of the signal, a single-scale morphological operator does not work well, so multi-scale morphological filtering methods have come into wide use. Li et al. proposed a multi-scale morphological filter based on a feature selection framework [23]. Yu et al. proposed a morphological filter based on the whale optimisation algorithm (WOA) [24]. Luo et al. proposed an adaptive multi-scale morphological filtering method [25].
All of these studies have made significant contributions to the fault diagnosis of rolling bearings. However, the noise immunity of their methods is not sufficient and the accuracy can still be improved. The method proposed in this paper effectively reduces the noise in the signal and improves the accuracy of fault identification: TEO-SFOEE-WMMF is proposed to reduce noise and MSCRBN is proposed to identify bearing faults. The performance of the proposed diagnosis method is verified on the CWRU dataset. The experimental results show that the proposed multi-scale morphological filter outperforms the other morphological filters compared and that the proposed fault diagnosis method outperforms the other diagnosis methods compared.
The main contributions of this paper are as follows:
(a) TEO is introduced to amplify the morphological information of the signal.
(b) SFOEE is proposed to select less noisy scales, which optimizes the size of the scale space of the WMMF and reduces the computational effort in constructing the WMMF.
(c) A new weighting method is proposed to integrate the scales selected by SFOEE and to construct the WMMF.
(d) A new classification model MSCRBN is proposed. The signals processed by the TEO-SFOEE-WMMF are fed to the MSCRBN to recognize bearing faults.
The rest of this paper is organized as follows. Section 2 details the TEO-SFOEE-WMMF filtering method. Section 3 details the structure of MSCRBN. Section 4 verifies the effectiveness and superiority of the proposed diagnosis method, and Section 5 concludes the paper.
The proposed filtering method using an improved weighted multi-scale morphological filter
The mathematical morphological theory of one-dimensional signal
Let f (n) be a one-dimensional discrete signal on the set F = {0, 1, 2, ⋯ , n - 1} and g (m) be a structure element (SE) on the set G = {0, 1, 2, ⋯ , m - 1} and set n ≥ m. The basic one-dimensional morphological operators include the erosion operator, the dilation operator, the opening operator, and the closing operator. All morphological operators are derived from these four basic operators. The four operators are defined as follows.
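As an illustration of the four basic operators, the following is a minimal sketch for the special case of a flat structure element, where erosion reduces to a sliding minimum and dilation to a sliding maximum. The function names, the centred window and the truncation of the window at the signal boundaries are assumptions made for this example and are not taken from the paper.

```python
def erode(signal, se_len):
    """Flat erosion: sliding minimum over a centred window of se_len samples.

    Assumes se_len is odd; the window is truncated at the signal boundaries.
    """
    r = se_len // 2
    n = len(signal)
    return [min(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def dilate(signal, se_len):
    """Flat dilation: sliding maximum over the same centred window."""
    r = se_len // 2
    n = len(signal)
    return [max(signal[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def opening(signal, se_len):
    """Opening = erosion followed by dilation (suppresses positive pulses)."""
    return dilate(erode(signal, se_len), se_len)


def closing(signal, se_len):
    """Closing = dilation followed by erosion (suppresses negative pulses)."""
    return erode(dilate(signal, se_len), se_len)


# Toy signal with one positive pulse (5) and one negative pulse (-4).
f = [0, 0, 5, 0, 0, -4, 0, 0]
print(opening(f, 3))  # [0, 0, 0, 0, 0, -4, 0, 0]
print(closing(f, 3))  # [0, 0, 5, 0, 0, 0, 0, 0]
```

On this toy signal the opening removes the positive pulse and keeps the negative one, while the closing does the opposite, which is the behaviour the cascaded operators below rely on.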
Because the opening operator suppresses positive pulses and the closing operator suppresses negative pulses, the cascaded opening-closing operator (FOC) and closing-opening operator (FCO) are able to remove both the positive and negative pulse components of the signal. The two operators are defined as follows.
The two operators are statistically biased: the FOC operator makes the amplitude of the signal smaller, while the FCO operator makes the amplitude of the signal larger. Neither FCO nor FOC can therefore be used directly for signal noise reduction. To solve this problem, the combination morphological filter (CMF) is introduced as follows.
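For reference, the cascaded operators and the CMF are conventionally written as below. This is the standard textbook form, offered as a sketch of the notation; the symbols ∘ and • are assumed here to denote opening and closing with the structure element g.

```latex
\begin{aligned}
F_{OC}(f) &= (f \circ g) \bullet g, \\
F_{CO}(f) &= (f \bullet g) \circ g, \\
\mathrm{CMF}(f) &= \tfrac{1}{2}\left[ F_{OC}(f) + F_{CO}(f) \right].
\end{aligned}
```

Averaging the two outputs is what offsets the downward bias of FOC against the upward bias of FCO.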
The ACDIF [20] is adopted as the basic morphological filter for the proposed filtering method. The prototype of the ACDIF operator is the CMF operator, and the two share a similar analytic form. To introduce ACDIF, the dilation-closing operator, the closing-dilation operator, the erosion-opening operator and the opening-erosion operator are defined as follows.
Combining the four operators above gives the ACDIF operator as follows. Both the erosion operator and the opening operator suppress positive pulses, so combining them enhances the suppression of positive pulses. ACDIF builds on this idea and therefore has superior noise reduction performance.
Introduction of multi-scale morphological filtering theory
Single-scale morphological filtering has theoretical limitations: the optimal scale is difficult to choose, and important feature information in the signal tends to be lost. Multi-scale morphological filtering compensates for these shortcomings. It processes the signal with structure elements at multiple scales, which effectively reduces the information loss during filtering. The multi-scale structure elements are defined as follows.
Here λ is the scale. The erosion operator and the dilation operator can then be rewritten in multi-scale form as follows.
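As a reference for the notation, multi-scale structure elements and the corresponding erosion and dilation are commonly written as below. This is the form usually found in the multi-scale morphology literature and is given as an assumption about the intended definitions.

```latex
\begin{aligned}
\lambda g &= \underbrace{g \oplus g \oplus \cdots \oplus g}_{\lambda - 1 \text{ dilations}}, \\
(f \ominus g)_{\lambda}(n) &= (f \ominus \lambda g)(n), \\
(f \oplus g)_{\lambda}(n) &= (f \oplus \lambda g)(n).
\end{aligned}
```

Under this convention the structure element grows with λ, so larger scales smooth the signal over longer windows.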
The operators FDC, FCD and FEO can be rewritten in multi-scale form as follows.
The multi-scale ACDIF operator is defined as follows.
The shape of the structure element is extremely important for the construction of a morphological filter. The structure elements come in several shapes: triangular, circular and flat. Considering that the vibration signal is one-dimensional data, the flat-shaped structure element (e.g. g = {0, 0, 0}) is used in this paper.
The determination of the scale range for multi-scale morphology is also important. The scale corresponds to the length of the structure element. The maximum value of the structure element length can be given by the spacing of the signal peak points [26], or can be calculated by
In practice, it is usually not necessary to use so many scales to reduce the noise of a signal, and a large number of scales leads to excessive computational effort. As the scale increases, the morphological information in the filter output decreases. Even though the output at a large scale contains relatively little noise, it retains too little morphological information to be useful for building the model.
In this paper, a novel method of scale selection based on envelope entropy is proposed. Let the envelope entropy be H_e(λ) and define the scale filtering operator using envelope entropy (SFOEE) as follows.
Equations 24–26 give the formula for SFOEE. Here h is the output signal of ACDIF at a given scale, f − h is the noise filtered out by ACDIF, and k represents the loss of information in the signal after filtering. Equation 26 normalizes k to the range between 0 and 1. The envelope entropy reflects the sparsity of the signal: as the envelope entropy decreases, the sparsity of the signal increases. Envelope entropy is therefore used to calculate the loss of information in the filtered signal, and SFOEE measures the information loss during filtering. As the value of SFOEE decreases, the importance of the scale increases. The SFOEE-based scale filtering method is defined as follows. First, the SFOEE is computed for each scale in the scale space {λ_1, λ_2, ⋯, λ_max}. Then the scales whose SFOEE values are greater than the threshold are removed.
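The thresholding step can be sketched as follows, assuming the SFOEE value of every scale has already been computed. The function name, the dictionary container and the numeric values are hypothetical and purely illustrative; defaulting the threshold to the mean of all SFOEE values mirrors the choice reported in the experimental section.

```python
def select_scales(sfoee_by_scale, threshold=None):
    """Keep the scales whose SFOEE value does not exceed the threshold.

    sfoee_by_scale: dict mapping scale -> SFOEE value (hypothetical container).
    threshold: if None, the mean of all SFOEE values is used.
    """
    if not sfoee_by_scale:
        raise ValueError("at least one scale is required")
    if threshold is None:
        threshold = sum(sfoee_by_scale.values()) / len(sfoee_by_scale)
    return sorted(s for s, v in sfoee_by_scale.items() if v <= threshold)


# Illustrative values only; they are not results from the paper.
sfoee = {1: 0.10, 2: 0.30, 3: 0.55, 4: 0.80, 5: 0.95}
print(select_scales(sfoee))  # [1, 2]
```

In this toy example the mean is 0.54, so only scales 1 and 2 are retained and passed on to the weighting step.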
Once the scales have been determined, it is important to integrate the filtering results at these scales. Typically, integration schemes for multi-scale problems use the idea of weighted averaging. This is why it is important to determine the weights of the different scales. The traditional way of setting weights would be to set them all to 1, but this does not work well. This is because the results of the analysis at different scales generally have a different impact on the overall noise reduction performance. There are two modern methods for setting weights: 1. adaptive estimation of weights using heuristic algorithms [24], and 2. formal definition of the relationship between weights and scales.
This paper proposes a new definition of the relationship between weights and scales. For the input signal f and the structure element g, the weight corresponding to the scale λ is defined as follows.
Here f1 denotes the negative pulses of the noise and f2 denotes the positive pulses of the noise, while α and β represent the weights of positive and negative impulses, respectively. The weight w_λ represents the importance of the scale: a larger w_λ indicates that the output at that scale is less noisy and that the scale is more important. Based on this weighting method, the improved weighted multi-scale morphological filter is defined as follows.
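The integration step can be illustrated with a generic weighted average over the per-scale filter outputs. This is a sketch under stated assumptions: the weights are taken as given, and normalising by the sum of the weights is a common convention for weighted averaging rather than a detail confirmed by the text. The function name is hypothetical.

```python
def weighted_multiscale_output(outputs, weights):
    """Combine per-scale filter outputs with a normalised weighted average.

    outputs: list of equally long signals, one per selected scale.
    weights: one non-negative weight per scale (assumed to be given).
    """
    if len(outputs) != len(weights):
        raise ValueError("one weight per scale output is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    n = len(outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, outputs)) / total
        for i in range(n)
    ]


# Two toy scale outputs; the second scale is weighted three times as much.
print(weighted_multiscale_output([[1.0, 2.0], [3.0, 6.0]], [1.0, 3.0]))  # [2.5, 5.0]
```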
A new morphological filter is proposed in this paper. The filtering framework proposed in this paper introduces the Teager Energy Operator (TEO) as a data pre-processing operator. TEO can detect the transient energy of the signal. After TEO processing, the morphological information in the signal is amplified and more easily captured. For the signal f, TEO is defined as follows.
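The discrete TEO is commonly written as ψ[f(n)] = f(n)² − f(n−1)·f(n+1). The sketch below implements this standard form; copying the nearest interior value to the two end points, so that the output keeps the input length, is an assumption made for this example.

```python
def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = f[n]^2 - f[n-1] * f[n+1].

    The two end points reuse the nearest interior value (an assumption),
    so the output has the same length as the input.
    """
    n = len(signal)
    if n < 3:
        raise ValueError("at least three samples are required")
    psi = [signal[i] ** 2 - signal[i - 1] * signal[i + 1] for i in range(1, n - 1)]
    return [psi[0]] + psi + [psi[-1]]


# A unit-amplitude sinusoid sampled at a quarter of its period.
print(teager_energy([0, 1, 0, -1, 0]))  # [1, 1, 1, 1, 1]
```

For a pure sinusoid the operator returns a constant that depends on both amplitude and frequency, which is why impulsive, high-energy transients stand out after TEO processing.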
The filtering framework (TEO-SFOEE-WMMF) proposed in this paper first processes the signal using TEO. The processed signal is then filtered with the proposed WMMF. In this process, SFOEE is used for scale number optimization. Finally, the filtered signal is used to train the MSCRBN. The flow of TEO-SFOEE-WMMF is shown in Algorithm 1.
Algorithm 1: TEO-SFOEE-WMMF
Input: raw signal f, maximum scale of analysis λ_max
Output: the filtered signal f_o
1: Use TEO to pre-process f.
2: Extract the filter output at different scales.
3: Initialize scale_array.
4: Scale filtering with SFOEE.
   for i = 1 to λ_max do
      if SFOEE(λ_i) ≤ T then
         scale_array[i] = λ_i
      end if
   end for
5: Build the multi-scale morphological filter.
6: Filter f and return the result.
Restricted boltzmann machine
The Restricted Boltzmann Machine (RBM) is a typical unsupervised non-linear feature learning model built as a two-layer neural network. The data input layer (also called the visible layer) is composed of visible units v = (v_1, v_2, v_3, ..., v_n), and the hidden layer is composed of hidden units h = (h_1, h_2, h_3, ..., h_m). Visible units and hidden units are connected by an n × m weight matrix w. The RBM can represent a discrete distribution. The energy function of the RBM is defined as follows:
Here v_i and h_j represent the states of the i-th unit in the visible layer and the j-th unit in the hidden layer, respectively. a_i and b_j are the biases, and w_ij = w_ji is the bidirectional weight between the i-th visible unit and the j-th hidden unit. The smaller the value of the energy function, the higher the Boltzmann probability. The conditional activation probabilities of the hidden and visible units are defined as follows.
Here σ is the sigmoid function, which maps its input to a value between 0 and 1. In this paper, the contrastive divergence algorithm (CD-k) is used to accelerate the training of the RBM, with k = 1.
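For a binary RBM, the energy and the conditional probabilities take the standard forms E(v, h) = −Σ a_i v_i − Σ b_j h_j − Σ v_i w_ij h_j, P(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij) and P(v_i = 1 | h) = σ(a_i + Σ_j w_ij h_j). The sketch below evaluates these textbook expressions; the function names and the toy parameters are hypothetical, and the code is not the training procedure used in the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real input to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, a, b, w):
    """Energy of a joint configuration (v, h) of a binary RBM."""
    visible = sum(ai * vi for ai, vi in zip(a, v))
    hidden = sum(bj * hj for bj, hj in zip(b, h))
    interaction = sum(
        v[i] * w[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible + hidden + interaction)


def p_hidden_given_visible(v, b, w):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [
        sigmoid(b[j] + sum(v[i] * w[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]


def p_visible_given_hidden(h, a, w):
    """P(v_i = 1 | h) for every visible unit i."""
    return [
        sigmoid(a[i] + sum(w[i][j] * h[j] for j in range(len(h))))
        for i in range(len(a))
    ]


# Toy RBM with two visible units and one hidden unit.
a, b, w = [0.5, -0.5], [0.0], [[2.0], [1.0]]
print(energy([1, 0], [1], a, b, w))  # -2.5
```

One CD-1 update would sample h from P(h | v), reconstruct v from P(v | h), and compare the statistics of the data with those of the reconstruction.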
Because CNNs ignore spatial information and rely on inefficient pooling operations, capsule networks were first proposed as an alternative to CNNs in 2011 by Hinton et al. An improved version of capsule networks was proposed by their team in 2017. The framework of a capsule network contains a convolutional layer, a primary capsule layer, a digital capsule layer and a decoder. The convolutional layer extracts the features of the input data. The features are further processed by the capsule layers, and two adjacent capsule layers rely on a dynamic routing algorithm to pass information between them. The decoder reconstructs the input data from the features [27].
Capsule networks have achieved great results in the field of target detection and are also widely used in the field of fault diagnosis. The primary capsule layer is the lowest of the capsule layers. A primary capsule consists of a number of convolutional units, which perform feature processing on the output of the convolutional layer. The number of capsules in the digital capsule layer is equal to the number of classes in the classification problem. Unlike the scalar output of neurons, the capsule outputs are vectors. To express the degree of association of a sample with each category through the length of the output vector, the squashing function is used to activate the output vector of the capsule. The squashing function is a non-linear function. For the output s_j of the j-th capsule, squashing is defined as follows.
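The squashing function is usually given as v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖), which keeps the direction of s_j and maps its length into [0, 1). A minimal sketch of this standard form follows; the small constant eps, which guards against division by zero, is an assumption added for numerical safety.

```python
import math


def squash(s, eps=1e-9):
    """Squashing activation: keeps the direction of s, maps its length to [0, 1)."""
    squared_norm = sum(x * x for x in s)
    norm = math.sqrt(squared_norm)
    scale = squared_norm / (1.0 + squared_norm) / (norm + eps)
    return [scale * x for x in s]


# A vector of length 5 is squashed to length 25/26, roughly 0.9615.
v = squash([3.0, 4.0])
print(math.sqrt(sum(x * x for x in v)))
```

Long vectors are squashed to a length just below 1 and short vectors to a length near 0, so the length can be read as a class probability.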
The digital capsule traverses all the capsule outputs of the previous layer, weighting and summing them. The formula is as follows, where u_i is the capsule output of the previous layer.
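In the usual dynamic-routing formulation, this weighted sum is written as below. The transformation matrices W_ij, the coupling coefficients c_ij and the routing logits b_ij follow the conventional capsule-network notation and are assumptions here; b_ij is unrelated to the RBM bias b_j.

```latex
\begin{aligned}
\hat{u}_{j|i} &= W_{ij}\, u_i, \\
c_{ij} &= \frac{\exp(b_{ij})}{\sum_{k} \exp(b_{ik})}, \\
s_j &= \sum_{i} c_{ij}\, \hat{u}_{j|i}.
\end{aligned}
```

Each routing iteration updates the logits b_ij according to how well the prediction û_{j|i} agrees with the squashed output of capsule j.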
The output of the digital capsule layer will be used to calculate the margin loss. The formula of calculating the loss is as follows.
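The margin loss is commonly defined per class k as L_k = T_k · max(0, m⁺ − ‖v_k‖)² + λ · (1 − T_k) · max(0, ‖v_k‖ − m⁻)², where T_k = 1 only for the true class. The sketch below uses the defaults m⁺ = 0.9, m⁻ = 0.1 and λ = 0.5 from the original capsule-network formulation; these values are assumptions and may differ from the settings used in this paper.

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss for one sample.

    lengths: lengths (2-norms) of the digital capsule output vectors.
    target: index of the true class.
    m_pos, m_neg, lam: assumed defaults from the capsule-network literature.
    """
    loss = 0.0
    for k, length in enumerate(lengths):
        if k == target:
            # Penalise a true-class capsule that is too short.
            loss += max(0.0, m_pos - length) ** 2
        else:
            # Penalise a wrong-class capsule that is too long.
            loss += lam * max(0.0, length - m_neg) ** 2
    return loss


# The true class (index 0) is confident; class 2 is slightly too active.
print(round(margin_loss([0.95, 0.05, 0.30], target=0), 6))  # 0.02
```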
The self-attention mechanism is well suited to capturing relationships between distant elements in one-dimensional data. The self-attention module has three weight matrices Q, K and V. In a multi-headed self-attention module, each head has its own weight matrices [28, 29], and scaled dot-product attention is used to calculate the output of each head. The outputs of the heads are concatenated and then multiplied by the weight matrix W. Let x be the input data and d_i be the scale factor; the formula for the multi-headed self-attention module is as follows.
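In the conventional notation, multi-headed scaled dot-product attention reads as below. The per-head projection matrices W_i^Q, W_i^K and W_i^V, the head count h (not to be confused with the RBM hidden state) and the reading of d_i as the key dimension of head i are assumptions of this sketch.

```latex
\begin{aligned}
Q_i &= x W_i^{Q}, \quad K_i = x W_i^{K}, \quad V_i = x W_i^{V}, \\
\mathrm{head}_i &= \mathrm{softmax}\!\left( \frac{Q_i K_i^{\top}}{\sqrt{d_i}} \right) V_i, \\
\mathrm{MultiHead}(x) &= \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W.
\end{aligned}
```

Dividing by the square root of d_i keeps the dot products in a range where the softmax does not saturate.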
In this paper, a multi-head self-attention capsule restricted Boltzmann network (MSCRBN) is proposed to diagnose bearing faults. The proposed model consists of three parts: the RBMs-convolution feature detector, the capsule network layer and the multi-head self-attention layer.
In this paper, the RBMs-Convolution feature detector is constructed by combining the RBM with a one-dimensional convolution layer. The RBM compresses the data when extracting features and the compressed data is more easily parsed by the convolution layer. The capsule network layer is the backbone part of the whole model. This layer processes the features extracted from the RBMs-Convolution layer into features that can be used to classify faults. The multi-headed self-attention layer is used to enhance the capsule network layer. In the capsule network layer, the decoder reconstructs the input data. The multi-headed self-attention mechanism is used to parse the long-term dependency information in the reconstructed data and extract global features of the reconstructed data.
The structure of MSCRBN is given in Fig. 1. A three-layer RBM stack was found to work well in this structure; its hidden layers have 1024, 512 and 256 units, respectively. The convolution layer has 1 input channel and 20 output channels, with a kernel size of 15 and a stride of 2. ReLU is used as the activation function of the convolution layer. The primary capsule layer has 20 channels, and each primary capsule contains 4 one-dimensional convolution units, each with a kernel size of 15 and a stride of 2. The digital capsule layer has 6 digital capsules, corresponding to the 6 categories of bearing state, and each digital capsule has a dimension of 8. The 2-norm of the activation vector of a digital capsule represents the probability that the sample belongs to the corresponding class. The decoder layer receives the output from the previous layer and reconstructs the original data. The multi-headed self-attention mechanism is used to augment the reconstructed data. The other parameters are as follows: the dynamic routing algorithm runs for 3 iterations, the model is trained with the Adam optimization algorithm at a learning rate of 0.001, the batch size is 30 and the maximum number of epochs is 500. The flow chart of the fault diagnosis method proposed in this paper is shown in Fig. 2.

Fig. 1. The structure of MSCRBN.

Fig. 2. The flowchart of the TEO-SFOEE-WMMF-MSCRBN fault diagnosis method.
Introduction to experimental data
The performance of the proposed TEO-SFOEE-WMMF-MSCRBN fault diagnosis method is validated with the CWRU bearing dataset [30]. The experimental platform is shown in Fig. 2. The platform consists of a 1.5 kW motor, a torque sensor, an encoder and a dynamometer. The bearing faults are all single-point faults introduced by electrical discharge machining. The bearings are divided into drive end bearings and fan end bearings. The fault types include inner race faults, outer race faults and ball faults. The outer race faults are further divided into 3 o’clock, 6 o’clock and 12 o’clock faults.
The data of the drive end bearing (6205-2RS JEM) under different loads are used as experimental data in this paper. The bearing fault diameter is 0.007 inches and the sampling frequency is 12 kHz. Data for one normal state and five fault states were sampled at different loads. Overlapping sampling is used to increase the number of samples [31]. For each state, 500 samples are collected, each containing 1024 points. Of these, 350 samples are randomly selected as the training set, 50 as the validation set and the remaining 100 as the test set. The structure of the experimental dataset is shown in Table 1. The descriptions of the bearing states are shown in Table 2. Accuracy, precision, recall and weighted F1-score are used to evaluate the experimental results.
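The overlapping sampling and the 350/50/100 split described above can be sketched as follows. The stride of 256 points, the random seed and the function names are hypothetical choices for illustration; the text specifies only the window length of 1024 points and the sample counts.

```python
import random


def overlapping_segments(signal, window=1024, stride=256, count=500):
    """Cut up to `count` windows of `window` points, shifting by `stride` each time.

    A stride smaller than the window makes consecutive samples overlap.
    The stride value is an assumption, not a setting from the paper.
    """
    segments = []
    start = 0
    while start + window <= len(signal) and len(segments) < count:
        segments.append(signal[start:start + window])
        start += stride
    return segments


def split_samples(samples, n_train=350, n_val=50, n_test=100, seed=0):
    """Randomly split the samples of one bearing state into train/val/test sets."""
    if n_train + n_val + n_test > len(samples):
        raise ValueError("not enough samples for the requested split")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    train = [samples[i] for i in order[:n_train]]
    val = [samples[i] for i in order[n_train:n_train + n_val]]
    test = [samples[i] for i in order[n_train + n_val:n_train + n_val + n_test]]
    return train, val, test


# Small demonstration with a window of 4 points and a stride of 2.
print(overlapping_segments(list(range(10)), window=4, stride=2, count=3))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```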
Table 1. The structure of the experimental dataset.
Table 2. Description of bearing states.
To determine the number of RBM layers, MSCRBNs with different numbers of RBM layers were trained on Dataset1, Dataset2 and Dataset3. Since each sample contains 1024 points, the candidate depths range from 1 to 3. The corresponding RBM structures are 256, 512-256 and 1024-512-256. The samples are first processed by the TEO-SFOEE-WMMF and then used to train the MSCRBN. To fully extract the morphological information of the signals while limiting the computational overhead, the maximum analysis scale of the WMMF is set to 28. Taking Dataset1 as an example, the scale filtering effect of SFOEE is shown in Fig. 3. The SFOEE threshold is set to the average of all SFOEE values. For each curve in Fig. 3, the WMMF is constructed using only the scales at which the curve lies below the line y = threshold. As shown in Table 3, with three-layer RBMs, MSCRBN achieves 100% accuracy on all three datasets.

Fig. 3. SFOEE curves of different bearing states.
Table 3. Classification accuracy of MSCRBN with different depths of RBMs.
To verify the performance of the proposed TEO-SFOEE-WMMF filter, several different filters are used for comparison. Reference [19] proposed the Teager energy kurtosis to find the optimal morphological scale, resulting in an optimal-scale morphological filter (TEKMF). Reference [24] proposed a multi-scale morphological filter (WOA-MMF) based on the Whale Optimisation Algorithm (WOA). The proposed method is compared with the version of TEO-SFOEE-WMMF without TEO, with TEKMF and with WOA-MMF on Dataset1; the results are shown in Table 4 and Fig. 4. The experimental results show that TEO-SFOEE-WMMF outperforms the other methods. The main conclusions are as follows. TEO enhances the performance of the proposed filter. Multi-scale morphological filters are significantly better than single-scale morphological filters. The proposed weighting method is superior to the adaptive weighting method (WOA-MMF).
Table 4. Classification indicator values of different filtering methods.

Fig. 4. Classification effects of different filtering methods.
After being processed by the TEO-SFOEE-WMMF, the data are fed to MSCRBN, CNN, BP and SVM, respectively. The CNN consists of three one-dimensional convolutional layers, whose output channels are 512, 256 and 128 and whose kernel sizes are 64, 32 and 16, respectively. The model structure of BP is 1024-512-256-6. The learning rate, batch size and optimiser of CNN and BP are the same as those of MSCRBN. The SVM penalty factor is set to 50, and the kernel function is a Gaussian kernel with kernel parameter 0.0001. The comparison results are shown in Table 5 and Fig. 5. The experimental results show that MSCRBN outperforms the other models and is the most suitable of them for use with the TEO-SFOEE-WMMF filter.
Table 5. Classification indicator values of different models.

Fig. 5. Classification effect of different models.
Signals can be disturbed by background noise during the acquisition process. Usually the amount of noise in the signal is given by the signal-to-noise ratio SNR. Let PS be the signal power and PN be the noise power, the SNR is defined as follows.
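Expressed in decibels, the SNR is conventionally SNR = 10·log10(PS / PN). The sketch below computes this quantity and scales Gaussian white noise so that it reaches a requested SNR; the function names, the Gaussian noise model and the fixed seed are assumptions made for illustration.

```python
import math
import random


def power(x):
    """Mean power of a discrete signal."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(PS / PN)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def add_white_noise(signal, target_snr_db, seed=0):
    """Add Gaussian white noise scaled so that the SNR equals target_snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the noise so that PS / PN = 10 ** (target_snr_db / 10).
    scale = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    noise = [scale * v for v in noise]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


# Add noise at 4 dB to a toy sinusoid and check the resulting SNR.
clean = [math.sin(0.1 * i) for i in range(1000)]
noisy, noise = add_white_noise(clean, target_snr_db=4.0)
print(round(snr_db(clean, noise), 6))  # 4.0
```

At 0 dB the scaled noise has the same power as the signal, and lower SNR values correspond to stronger noise.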
A smaller SNR represents a larger proportion of noise in the signal, and an SNR of 0 dB means that the noise power equals the signal power. To verify the effectiveness of MSCRBN in strong noise environments, white noise is added to the TEO-SFOEE-WMMF preprocessed data at SNRs of 0 dB, 2 dB, 4 dB, 6 dB and 8 dB, and the noisy data are then used to train the model. The model performance under the different noise levels is shown in Table 6 and Fig. 6. The accuracy of MSCRBN decreases slightly at 2 dB and 0 dB, but remains above 99.67%. The experimental results show that the noise resistance of MSCRBN is superior to that of the other models.
Table 6. Classification accuracy of different models under white noise.

Fig. 6. Classification effect of different models under white noise.
The effectiveness and superiority of the proposed TEO-SFOEE-WMMF filtering method and MSCRBN model have been verified through the comparative experiments above. To further validate the performance of the proposed method, several other fault diagnosis methods are compared with the TEO-SFOEE-WMMF-MSCRBN fault diagnosis method. The experimental results are shown in Table 7 and Fig. 7. They show that the proposed method outperforms the other diagnosis methods compared.
Table 7. Classification indicator values of different fault diagnosis methods.

In this paper, a noise reduction method based on a multi-scale morphological filter is proposed. First, TEO is used to amplify the morphological information of the signal, which makes the signal more amenable to morphological analysis. Then, a new multi-scale mathematical morphological filter is constructed with ACDIF as the base operator. SFOEE is proposed to select the useful scales, which reduces the number of scales that need to be integrated into the morphological filter and improves the performance of the filter. A new weighting method, which gives a formal definition of the relationship between scales and weights, is proposed to construct the WMMF. A new fault diagnosis model, MSCRBN, is also proposed. MSCRBN combines unsupervised and supervised learning methods to extract signal features and identify bearing faults. The superiority of the TEO-SFOEE-WMMF-MSCRBN fault diagnosis approach is verified on the CWRU dataset. In the future, we will refine the proposed methodology to address bearing fault diagnosis under variable operating conditions.
Acknowledgments
This work has been supported by the National Natural Science Foundation of China (No. 52175379) and the Liaoning Provincial Science and Technology Department of China (No. 2022JH2/101300268).
Conflict of interest
The authors declare that they have no conflict of interest.
