Abstract
This study presents a fault diagnosis method for rolling bearings based on a multi-scale deep subdomain adaptation network (MSDSAN). The proposed MSDSAN, an improvement of the deep subdomain adaptation network (DSAN), is an unsupervised transfer learning method. MSDSAN reduces the subdomain distribution discrepancy between domains rather than the marginal distribution discrepancy, so that better domain-invariant fault features are derived and misalignment between domains is avoided. To avoid the loss of fault information caused by feature extraction with a fixed receptive field, a selective kernel convolution module is introduced into the feature extractor of MSDSAN, so that multiple receptive fields are applied and a suitable receptive field is available for each working condition. Moreover, contribution rates are adaptively assigned to the receptive fields, which further suppresses the disturbing information extracted by inappropriate receptive fields. As a result, more comprehensive and effective fault information is derived for bearing fault diagnosis. A bearing fault diagnosis experiment is performed to verify the proposed method, and the experimental results demonstrate that MSDSAN achieves better transfer performance and higher accuracy than state-of-the-art methods under varying working conditions.
Introduction
Rolling element bearings are key components of the transmission system in mechanical equipment, and their health status is critical to the operational stability and safety of the equipment [1, 2]. However, rolling element bearings often fail because of harsh working environments and long running times, resulting in equipment shutdown, economic losses and even serious accidents. According to statistics, more than half of mechanical equipment failures are related to bearings [3, 4]. Therefore, condition monitoring and fault diagnosis of bearings are vital to ensure the normal operation of mechanical equipment and to avoid catastrophic accidents.
Compared with traditional machine learning methods, deep learning has powerful nonlinear feature extraction capabilities and can adaptively extract feature information from raw data to achieve pattern recognition [5–7]. In recent years, several deep learning methods, such as convolutional neural networks (CNN) [8–10], deep belief networks (DBN) [11, 12], and deep autoencoders [13, 14], have been applied to the fault diagnosis of rolling element bearings. However, deep learning is based on the assumption that training data and testing data have the same distribution [15]. In fault diagnosis, this means that the fault samples used to train and to test the diagnosis model must be collected under the same working condition (load and rotating speed). Unfortunately, the working conditions of mechanical equipment often vary over a wide range in practical applications, resulting in a large distribution discrepancy between fault samples from different working conditions. Meanwhile, it is almost impossible to collect enough labeled fault samples to train a fault diagnosis model for each specific working condition, which greatly limits the generalization capacity of deep learning-based fault diagnosis.
Transfer learning mines the similarity between data with different distributions and can thereby transfer knowledge from a source domain to a target domain [16]. Therefore, transfer learning is a natural solution for rolling element bearing fault diagnosis under varying working conditions [17]. Furthermore, labeling the fault samples for all working conditions is difficult and time-consuming, so unsupervised transfer learning fault diagnosis has particular practical value. Several studies have addressed this issue. For example, Wen et al. [18] combined a three-layer sparse auto-encoder with maximum mean discrepancy (MMD) to diagnose motor bearing faults under varying working conditions; Li et al. [19] applied deep generative neural networks to address the domain shift phenomenon in transfer learning fault diagnosis of bearings; and Udmale et al. [20] developed a CNN-based transfer learning approach for bearing fault diagnosis with limited fault data. However, the above methods only aim to reduce the marginal distribution discrepancy between the source domain and target domain without considering subdomain distribution similarity, and as a result some fault categories of the target domain may be completely misaligned.
The deep subdomain adaptation network (DSAN) is a recent unsupervised transfer learning method whose key innovation is the local maximum mean discrepancy (LMMD) [21]. LMMD captures the fine-grained information of each category by aligning the relevant subdomain distributions, so that more domain-invariant feature information is extracted and the distribution similarity between the source domain and target domain is increased. Therefore, DSAN has better cross-domain transfer ability and avoids misalignment of the target domain. However, the working conditions of mechanical equipment, such as rotating speed, vary greatly in practical applications. As a result, the optimal receptive field for feature extraction differs from one working condition to another. In other words, a fixed receptive field cannot ensure that effective fault information is well extracted under all working conditions: with an inappropriate receptive field, a large amount of useless information may be extracted, or the fault information may not be fully extracted.
To address this problem, this paper proposes an improvement of DSAN, denoted as the multi-scale deep subdomain adaptation network (MSDSAN), for the fault diagnosis of rolling element bearings. In MSDSAN, the selective kernel convolution module is introduced to extract multi-scale fault information, that is, multiple receptive fields are applied for fault feature extraction [22]. Therefore, a relatively optimal receptive field is available for each specific working condition, and more effective fault information can be extracted for bearing fault diagnosis under different working conditions. Moreover, contribution rates are adaptively assigned to the receptive fields to suppress the disturbing information extracted by inappropriate receptive fields. In summary, the proposed MSDSAN inherits the ability of DSAN to capture the fine-grained information of each fault category, and in addition applies multiple receptive fields to ensure the quality of fault information extraction under different working conditions, which further improves fault diagnosis accuracy.
The rest of this paper is organized as follows. Section II introduces the theoretical background of the proposed method. Section III describes the proposed multi-scale deep subdomain adaptation network in detail. Section IV presents the experimental results and analysis. The conclusion is given in Section V.
Theoretical background of the proposed method
Problem definition
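This paper focuses on unsupervised transfer learning fault diagnosis of rolling element bearings, in which the fault samples in the target domain are unlabeled [23]. In this scenario, the source domain $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ contains $n_s$ labeled fault samples and the target domain $D_t = \{x_j^t\}_{j=1}^{n_t}$ contains $n_t$ unlabeled fault samples, drawn from the distributions $p$ and $q$ respectively, with $p \neq q$ because the two domains are collected under different working conditions.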
Deep subdomain adaptation network (DSAN)
DSAN addresses the problem that merely reducing the marginal distribution discrepancy may lead to misalignment between the source domain and target domain, and LMMD is its key innovation. LMMD captures the fine-grained information of each category by aligning the relevant subdomain distributions, so that more domain-invariant feature information is extracted. LMMD is described in detail in the next section.
Local maximum mean discrepancy (LMMD)
The drawback of MMD is illustrated in Fig. 1. MMD aims to reduce the marginal distribution discrepancy between the source domain and target domain, namely global domain adaptation, and ignores the subdomain distributions of the different categories. As a result, fault categories may be misaligned between the source domain and target domain, leading to poor fault diagnosis performance.

Fig. 1. The distribution discrepancy of global domain adaptation.
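The misalignment described above can be made concrete with a toy example. The sketch below uses made-up one-dimensional features (not data from the experiments) and assumes a linear kernel, for which the squared MMD reduces to the squared distance between the domain means. The two classes are swapped between the domains, so the marginal discrepancy is zero while the class-wise (subdomain) discrepancy is large.

```python
from statistics import fmean

# Toy 1-D features (made-up numbers, not from the experiments).
# Source: class 0 near -1, class 1 near +1. Target: the two classes are swapped.
source = {0: [-1.1, -0.9], 1: [0.9, 1.1]}
target = {0: [0.9, 1.1], 1: [-1.1, -0.9]}

def mean_of(domain):
    """Mean of all samples in a domain, ignoring the class labels."""
    return fmean(x for samples in domain.values() for x in samples)

# With a linear kernel, MMD^2 is the squared distance between the domain means.
global_gap = (mean_of(source) - mean_of(target)) ** 2

# Subdomain view: squared distance between class means, averaged over classes.
local_gap = fmean((fmean(source[c]) - fmean(target[c])) ** 2 for c in source)

print(f"global gap = {global_gap:.3f}, class-wise gap = {local_gap:.3f}")
```

A method that only minimizes the global gap would regard these two domains as already aligned, although every target sample lies on the source cluster of the wrong class.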
To overcome this drawback of MMD, LMMD takes the subdomain distribution similarity between the source domain and target domain into consideration, so that the distribution discrepancy is evaluated more comprehensively. Specifically, LMMD reduces the distribution discrepancy of each specific category between the source domain and target domain, so as to extract more domain-invariant feature information. Following [21], LMMD is defined in Equation (1):

$$d_{\mathcal{H}}(p, q) \triangleq \mathbb{E}_c \left\| \mathbb{E}_{p^{(c)}}\left[\phi(x^s)\right] - \mathbb{E}_{q^{(c)}}\left[\phi(x^t)\right] \right\|_{\mathcal{H}}^2 \tag{1}$$

where $x^s$ and $x^t$ are fault samples of $D_s$ and $D_t$ respectively, $p^{(c)}$ and $q^{(c)}$ are the distributions of the c-th fault category in the source domain and target domain respectively, and $\phi(\cdot)$ maps the samples into the reproducing kernel Hilbert space $\mathcal{H}$. Unlike MMD, which reduces the discrepancy of the marginal distributions, Equation (1) measures the discrepancy of the subdomain distributions. By minimizing Equation (1), the fine-grained information of each category is captured and the relevant subdomain distributions are aligned, yielding more domain-invariant feature information.
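Moreover, denoting the weight of the i-th fault sample $x_i$ with respect to the c-th fault category as $w_i^c$, Equation (1) is estimated by Equation (2) [21]:

$$\hat{d}_{\mathcal{H}}(p, q) = \frac{1}{C} \sum_{c=1}^{C} \left\| \sum_{x_i^s \in D_s} w_i^{sc} \phi(x_i^s) - \sum_{x_j^t \in D_t} w_j^{tc} \phi(x_j^t) \right\|_{\mathcal{H}}^2 \tag{2}$$

where $C$ is the number of fault categories, and $w_i^{sc}$ and $w_j^{tc}$ are the weights of $x_i^s$ and $x_j^t$ for the c-th fault category. The weight $w_i^c$ is calculated by Equation (3):

$$w_i^c = \frac{y_{ic}}{\sum_{(x_j, y_j) \in D} y_{jc}} \tag{3}$$

where $y_{ic}$ is the c-th entry of the fault label $y_i$. For the source domain, $y_i$ is the true label of $x_i$, given as a one-hot vector. Since the target domain has no labeled data, the probability distribution output by the transfer learning model under training is used to calculate Equation (3) for the target domain. Using probability predictions (soft predictions) rather than hard labels alleviates the negative impact of wrongly predicted labels. By calculating the weighted class center of each class, the source domain and target domain are aligned by reducing the distribution discrepancy of each class. Hence, DSAN has better cross-domain transfer ability.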
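The weighting of Equation (3) and the alignment of weighted class centers can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: it assumes a linear kernel (so that each class term is the squared distance between weighted class centers), and the names `class_weights` and `lmmd_linear` as well as the toy features are hypothetical.

```python
import numpy as np

def class_weights(labels):
    """Equation (3)-style weights: every class column is normalised to sum to 1.

    `labels` has shape (n_samples, n_classes); rows are one-hot vectors for the
    source domain or predicted class probabilities for the target domain.
    """
    labels = np.asarray(labels, dtype=float)
    col_sums = labels.sum(axis=0, keepdims=True)
    # Guard against classes that receive no mass in the batch.
    return labels / np.where(col_sums > 0, col_sums, 1.0)

def lmmd_linear(feat_s, y_s, feat_t, prob_t):
    """LMMD estimate with a linear kernel (an assumption made for brevity)."""
    centers_s = class_weights(y_s).T @ feat_s      # (C, d) source class centers
    centers_t = class_weights(prob_t).T @ feat_t   # (C, d) target class centers
    return float(np.mean(np.sum((centers_s - centers_t) ** 2, axis=1)))

# Toy data: two classes, two features (made-up values).
feat_s = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
y_s = np.eye(2)[[0, 0, 1, 1]]                      # one-hot source labels
feat_t = feat_s + np.array([1.0, 0.0])             # target = shifted source
prob_t = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])  # soft predictions

loss = lmmd_linear(feat_s, y_s, feat_t, prob_t)
```

Because the target weights come from soft predictions, a sample that is predicted with low confidence contributes to several class centers with small weights instead of being assigned entirely to one, possibly wrong, class.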
In practical scenarios, the working conditions of rolling element bearings vary over a wide range, and a fixed receptive field cannot ensure that effective fault information is well extracted under all working conditions. Low-quality fault features seriously limit further improvement of fault diagnosis accuracy. To address this problem, an improvement of DSAN, denoted as MSDSAN, is proposed by introducing the selective kernel convolution module into DSAN as the feature extractor [22]. In this way, multiple receptive fields are applied for fault feature extraction so that a relatively optimal receptive field is available for each specific working condition, and more effective fault information can be extracted for bearing fault diagnosis under different working conditions. In this section, the architecture of the selective kernel convolution module is described first, and then the detailed structure of the proposed MSDSAN is introduced.
Selective kernel convolution module
The main idea of the selective kernel convolution module (SK module) is to apply multiple receptive fields for feature extraction and to adaptively assign a contribution rate to each receptive field so that disturbing information is suppressed. Therefore, the SK module can extract more feature information and avoid the information loss that is common in feature extraction with a fixed receptive field. The SK module contains three operations: Split, Fuse and Select. Its structure is shown in Fig. 2.

Fig. 2. The structure of the selective kernel convolution module.
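Split operation: Given a feature map $X \in \mathbb{R}^{H' \times W' \times C'}$, two transformations $\tilde{F}: X \to \tilde{U} \in \mathbb{R}^{H \times W \times C}$ and $\hat{F}: X \to \hat{U} \in \mathbb{R}^{H \times W \times C}$ are applied with convolution kernels of different sizes, so that the two branches have different receptive fields [22].
Fuse operation: The goal of the Fuse operation is to obtain the contribution rates of all receptive fields. Specifically, a gate mechanism is designed to control the flow of information into the different branches of the next convolution layer. First, the features of all branches are fused by a simple element-wise addition before being fed into the gate mechanism:

$$U = \tilde{U} + \hat{U}$$

Then the global average pooling operation ($F_{gp}$) is applied to embed the global information and generate the channel-wise statistics $s \in \mathbb{R}^{C}$, whose c-th element is

$$s_c = F_{gp}(U_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j)$$

Next, a compact feature $z \in \mathbb{R}^{d \times 1}$ is generated by a simple fully connected layer ($F_{fc}$):

$$z = F_{fc}(s) = \delta(\mathcal{B}(W s))$$

where $\delta$ is the ReLU function, $\mathcal{B}$ denotes batch normalization, and $W \in \mathbb{R}^{d \times C}$ [22].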
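Select operation: The contribution rates of the different receptive fields are derived by cross-channel soft attention, which is guided by the compact feature descriptor z. Specifically, the softmax operator is applied to calculate the contribution rates. The contribution rates are adaptively assigned to the receptive fields to suppress the disturbing information, from which the final feature map V is derived. Following [22], the calculation is as follows:

$$a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \qquad b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}$$

$$V_c = a_c \cdot \tilde{U}_c + b_c \cdot \hat{U}_c, \qquad a_c + b_c = 1$$

where $\tilde{U}$ and $\hat{U}$ are the feature maps of the two branches produced by the Split operation, $A, B \in \mathbb{R}^{C \times d}$, and $A_c$ and $B_c$ are the c-th rows of $A$ and $B$. The vectors $a$ and $b$ are the softmax probabilities (contribution rates) of the two branches, and $a_c$ and $b_c$ are their c-th elements. The final feature map is $V = [V_1, V_2, \cdots, V_C]$ with $V_c \in \mathbb{R}^{H \times W}$.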
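The Fuse and Select operations can be sketched with NumPy on random data. This is only an illustration under several assumptions: the two branch outputs are random stand-ins for the convolution outputs, the weight matrices `W_fc`, `A` and `B` are random instead of learned, batch normalization is omitted, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, d = 4, 4, 8, 4                    # illustrative sizes only

U_small = rng.normal(size=(H, W, C))       # branch with the smaller kernel
U_large = rng.normal(size=(H, W, C))       # branch with the larger kernel

# Fuse: element-wise sum, global average pooling, one dense layer with ReLU.
U = U_small + U_large
s = U.mean(axis=(0, 1))                    # (C,) channel-wise statistics
W_fc = rng.normal(size=(d, C))             # random stand-in for learned weights
z = np.maximum(W_fc @ s, 0.0)              # (d,) compact descriptor

# Select: softmax across the two branches, separately for every channel.
A = rng.normal(size=(C, d))
B = rng.normal(size=(C, d))
logits = np.stack([A @ z, B @ z])          # (2, C)
logits -= logits.max(axis=0, keepdims=True)    # numerical stability
weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
a, b = weights                             # contribution rates, each of shape (C,)

V = a * U_small + b * U_large              # rates broadcast over H and W
```

For every channel the two contribution rates sum to one, so a channel for which one receptive field is uninformative can be dominated by the other branch.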
The framework of the proposed MSDSAN is shown in Fig. 3, and the flow chart of fault diagnosis based on MSDSAN is shown in Fig. 4. MSDSAN consists of three parts: the feature extractor, the domain adaptation part and the classifier. First, the original vibration signal is transformed by the wavelet transform into a time-frequency representation, which serves as the input of the network. Second, the feature extractor with multiple receptive fields extracts the multi-scale feature information, and the predicted labels of the target domain are then obtained through the classifier. Third, the local maximum mean discrepancy between the source domain and target domain is calculated by Equation (2) using the true labels of the source domain and the predicted labels of the target domain, so as to reduce the subdomain distribution discrepancy between the two domains and extract domain-invariant fault features. Finally, the above procedure is repeated until the optimal parameters are obtained, and the trained model can be applied to the fault diagnosis of rolling element bearings.

Fig. 3. The framework of MSDSAN.

Fig. 4. The flow chart of fault diagnosis based on MSDSAN.
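By aligning the distributions of the relevant subdomains in the source domain and target domain, MSDSAN captures the fine-grained information of each category. Applying Equation (2) directly as the domain adaptation loss on a specific l-th layer, the final optimization objective, following [21], is:

$$\min_f \frac{1}{n_s} \sum_{i=1}^{n_s} J\left(f(x_i^s), y_i^s\right) + \lambda \, \hat{d}_l(p, q)$$

where $f$ is the network, $n_s$ is the number of labeled source samples, $J(\cdot, \cdot)$ is the cross-entropy loss function (classification loss), $\hat{d}_l(p, q)$ is the LMMD of Equation (2) computed on the activations of the l-th layer (domain adaptation loss), and $\lambda > 0$ is the trade-off parameter between the two losses.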
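How the two loss terms are combined can be sketched as follows. The sketch assumes that the domain adaptation loss has already been computed (for example by an LMMD routine) and that a scalar trade-off weight balances it against the classification loss, as in the DSAN formulation [21]. The function names, the parameter `trade_off` and the numbers are hypothetical.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy for predicted class probabilities and integer labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def total_loss(source_probs, source_labels, lmmd_value, trade_off):
    """Classification loss on labeled source data plus weighted adaptation loss."""
    return cross_entropy(source_probs, source_labels) + trade_off * lmmd_value

# Two source samples with uninformative predictions (made-up values).
source_probs = [[0.5, 0.5], [0.5, 0.5]]
source_labels = [0, 1]
loss = total_loss(source_probs, source_labels, lmmd_value=0.2, trade_off=0.5)
```

Only the source samples enter the classification term, since the target domain has no labels; the target samples influence training solely through the adaptation term.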
Description of experimental data
To verify the effectiveness of the proposed fault diagnosis method, a drivetrain diagnostics simulator (DDS) is used as the experimental platform. The DDS is mainly composed of a driving motor, a planetary gearbox, a two-stage fixed-axis gearbox and a magnetic brake [26]. To acquire the original vibration signal, an accelerometer (model: PCB352C03) is installed in the vertical direction on the middle shaft of the fixed-axis gearbox, as shown in Fig. 5.

Fig. 5. The experimental site of DDS.
In this paper, four states of rolling bearings are simulated: normal state, inner-race fault, outer-race fault and rolling element fault, as listed in Table 1. To simulate the working conditions of bearings in practice, experiments are performed at driving speeds of 20 Hz, 30 Hz and 40 Hz. For convenience, the fault datasets collected under the three working conditions are denoted as datasets A, B and C respectively, as shown in Table 2.
Table 1. Health states of the rolling bearing
Table 2. Description of three working conditions
During the experiment, the vibration signals are collected by an NI 9234 data acquisition card at a sampling frequency of 12800 Hz. Fig. 6 shows the time-domain waveforms of the vibration signals collected at a driving speed of 20 Hz. For each fault category, 800 fault samples are collected, and each fault sample contains 1024 signal points. As there are 4 bearing health states, 3200 fault samples are collected under each working condition. The vibration signals are transformed into time-frequency representations by the wavelet transform before being fed into MSDSAN. Selecting an appropriate wavelet basis function is necessary for feature extraction, and Daubechies (db1) is selected as the wavelet basis function in this paper after several experiments. For details on the selection of the wavelet basis function, please refer to [27–29]. The fault samples of each fault category are divided into a training set and a testing set: 400 samples are randomly selected as the training set, and the remaining samples are used as the testing set.
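The sample construction described above can be sketched as follows. The figures (12800 Hz, 1024 points, 800 samples per class, 400 for training) come from the text; the use of consecutive non-overlapping windows, the fixed random seed, the function names and the all-zero stand-in signal are assumptions made for illustration.

```python
import random

SAMPLING_HZ = 12_800
POINTS_PER_SAMPLE = 1_024
SAMPLES_PER_CLASS = 800
TRAIN_PER_CLASS = 400

def segment(signal, length=POINTS_PER_SAMPLE):
    """Cut a 1-D signal into consecutive, non-overlapping windows (assumed)."""
    n = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n)]

def split(samples, n_train=TRAIN_PER_CLASS, seed=0):
    """Randomly pick n_train samples for training; the rest form the test set."""
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    train = [samples[i] for i in order[:n_train]]
    test = [samples[i] for i in order[n_train:]]
    return train, test

# Stand-in signal with just enough points for one fault category.
signal = [0.0] * (SAMPLES_PER_CLASS * POINTS_PER_SAMPLE)
train, test = split(segment(signal))

# Each sample spans 1024 / 12800 = 0.08 s of signal.
duration_s = POINTS_PER_SAMPLE / SAMPLING_HZ
# Shaft revolutions covered by one sample at each driving speed (Hz).
revolutions = {speed: duration_s * speed for speed in (20, 30, 40)}
```

One sample covers 0.08 s, which corresponds to about 1.6, 2.4 and 3.2 shaft revolutions at 20 Hz, 30 Hz and 40 Hz respectively. The same fault therefore appears at different time scales within a sample, which is one way to see why a single fixed receptive field may not suit every working condition.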

Fig. 6. Vibration signals of the four health states: (a) Outer race fault; (b) Inner race fault; (c) Rolling element fault; (d) Normal state.
To ensure the reliability of the experimental results, each fault diagnosis experiment is repeated 5 times [30, 31]. Six transfer tasks are performed: A⟶C, B⟶C, A⟶B, C⟶B, B⟶A and C⟶A. To validate the advantage of the proposed method, the fault diagnosis performance of MSDSAN is compared with several state-of-the-art methods.
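The experimental protocol can be sketched as follows: the six transfer tasks are the ordered pairs of the three datasets, and the repeated runs of each task are summarized by their mean and spread. The accuracies in `runs` are placeholders, not measured values, and `summarize` is a hypothetical helper.

```python
from itertools import permutations
from statistics import mean, stdev

datasets = ["A", "B", "C"]

# Every ordered (source, target) pair of distinct datasets is one transfer task.
tasks = [f"{src}->{tgt}" for src, tgt in permutations(datasets, 2)]

def summarize(accuracies):
    """Mean and sample standard deviation over the repeated runs of one task."""
    return mean(accuracies), stdev(accuracies)

# Placeholder accuracies for five repeats of a single task (not measured values).
runs = [0.88, 0.87, 0.89, 0.88, 0.88]
avg, spread = summarize(runs)

print(tasks)
```

Reporting the spread alongside the mean makes it possible to judge whether differences between methods exceed the run-to-run variation.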
First, the proposed method is compared with a standard CNN. It can be seen from Table 3 and Fig. 7 that the average fault diagnosis accuracy of CNN is only 59.1%. A CNN without a transfer learning strategy requires the training data and the testing data to have the same distribution. However, the distributions of fault samples under different working conditions differ considerably. As a result, a fault diagnosis model trained under one working condition is not applicable to the other working conditions.
Table 3. Results of fault diagnosis under cross-domain tasks

Fig. 7. Cross-domain fault diagnosis results based on different methods.
As shown in Table 3 and Fig. 7, the average fault diagnosis accuracies of the deep adaptation network (DAN) [32] and the domain adversarial neural network (DANN) [33] are 72.3% and 73.0% respectively, which are much higher than that of CNN. This is because both DAN and DANN are transfer learning methods: DAN applies MMD and DANN applies a domain discriminator to decrease the distribution discrepancy between the source domain and target domain. However, both DAN and DANN only reduce the marginal distribution discrepancy without considering subdomain distribution similarity, and as a result some fault categories of the target domain may be completely misaligned.
Moreover, the proposed method is compared with the conditional deep adaptation network (CDAN) [4], which focuses on reducing the conditional distribution discrepancy between the source domain and target domain. CDAN applies the conditional maximum mean discrepancy (CMMD) as the distribution discrepancy measure, which uses hard predicted pseudo-labels to reduce the subdomain distribution discrepancy. However, misclassified pseudo-labels sharply degrade the performance and may even enlarge the distribution discrepancy between the source domain and target domain. It can be seen from Table 3 and Fig. 7 that the fault diagnosis accuracy of CDAN is only 46.6%.
Compared with the above methods, DSAN captures the fine-grained information of each category by aligning the distributions of subdomains, so that more domain-invariant feature information is extracted and the distribution similarity between the source domain and target domain is increased. However, the fixed receptive field applied in DSAN is not optimal for all working conditions, because the optimal receptive field for feature extraction differs from one working condition to another. Therefore, loss of fault information is unavoidable if a fixed receptive field is applied to all working conditions. The fault diagnosis accuracy of DSAN is 82.0%, which is lower than that of the proposed method. To overcome this shortcoming of DSAN, the proposed MSDSAN introduces the selective kernel convolution module to extract multi-scale fault information. In other words, multiple receptive fields are used to extract fault features, which further improves the accuracy of fault diagnosis. As shown in Table 3 and Fig. 7, the fault diagnosis accuracy of MSDSAN reaches 90.0%, which is considerably higher than those of the other methods.
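The margins between MSDSAN and the compared methods follow directly from the average accuracies quoted in the text. The short sketch below only performs this arithmetic; the dictionary name `reported` is illustrative.

```python
# Average accuracies (%) over the six transfer tasks, as quoted in the text.
reported = {
    "CNN": 59.1, "DAN": 72.3, "DANN": 73.0,
    "CDAN": 46.6, "DSAN": 82.0, "MSDSAN": 90.0,
}

# Gain of MSDSAN over each compared method, in percentage points.
gains = {
    method: round(reported["MSDSAN"] - acc, 1)
    for method, acc in reported.items()
    if method != "MSDSAN"
}

for method, gain in sorted(gains.items(), key=lambda item: item[1]):
    print(f"MSDSAN vs {method}: +{gain} points")
```

The smallest margin, 8.0 percentage points over DSAN, isolates the contribution of the multi-scale feature extractor, since the two methods share the same subdomain adaptation loss.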
For a more detailed comparison, the confusion matrices of the diagnosis results of CNN, DAN, DANN, CDAN, DSAN and MSDSAN on the cross-domain task A⟶B are shown in Fig. 8. It can be seen from Fig. 8(a) that CNN produces noticeable classification errors, concentrated mainly in fault class-0 and fault class-2. This result indicates that there is a significant distribution discrepancy between the source domain and target domain, and that a CNN trained in the source domain is not applicable to fault diagnosis in the target domain. As shown in Fig. 8(b), the fault diagnosis results of DAN are better than those of CNN. The classification errors of fault class-0 and fault class-2 are reduced, and the number of correctly classified class-2 samples reaches 313, compared with only 133 for CNN. However, a large number of fault samples are still misclassified, especially in fault class-0. This is because the MMD applied by DAN only reduces the overall distance between the source domain and target domain without considering subdomain distribution similarity, which may still lead to misclassification of some fault categories in the target domain.

Fig. 8. The confusion matrices of different methods under the A⟶B task.
The fault diagnosis accuracy of DANN is similar to that of DAN, and the confusion matrix of DANN is shown in Fig. 8(c). DANN applies adversarial learning to extract domain-invariant features, but it still does not consider the subdomain distributions. CDAN applies the conditional maximum mean discrepancy (CMMD) as the distribution discrepancy measure, which introduces hard predicted pseudo-labels to reduce the subdomain distribution discrepancy. However, the pseudo-labels are often wrong, which degrades the performance and may even enlarge the distribution discrepancy between the source domain and target domain. As shown in Fig. 8(d), the fault diagnosis performance of CDAN is even worse than that of CNN.
As shown in Fig. 8(e), the fault diagnosis accuracy of DSAN is better than those of DAN, DANN and CDAN. This is because DSAN reduces the subdomain distribution discrepancy between the source domain and target domain using the probability predictions of the fault samples in the target domain. The probability predictions act as soft pseudo-labels and alleviate the negative impact of wrong pseudo-labels. However, the fixed receptive field applied in DSAN cannot ensure that effective fault information is well extracted under all working conditions, which restricts further improvement of fault diagnosis accuracy. It can be seen from Fig. 8(f) that the proposed MSDSAN achieves the best fault diagnosis accuracy, and the misclassified fault samples of all fault categories are far fewer than with the other methods. This is because multiple receptive fields are applied for fault feature extraction, and the contribution rate of each receptive field is adaptively assigned. Therefore, more effective fault information can be extracted for bearing fault diagnosis under different working conditions. The fault diagnosis accuracy of MSDSAN on task A⟶B is 87.8%, which is consistent with the result shown in Table 3.
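The relation between a confusion matrix and the reported accuracy can be sketched as follows. The matrix below is made up for illustration and is not one of the matrices in Fig. 8; it only reuses the setting of four classes with 400 test samples each, with rows as true classes and columns as predicted classes.

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def per_class_recall(confusion):
    """Fraction of each true class (row) that is classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# Made-up 4-class matrix, 400 test samples per class (rows = true class).
confusion = [
    [350, 20, 20, 10],
    [10, 380, 5, 5],
    [30, 10, 340, 20],
    [0, 5, 5, 390],
]

overall = accuracy(confusion)
recalls = per_class_recall(confusion)
```

The per-class recalls show which fault categories are misaligned, which the overall accuracy alone does not reveal.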
The fault features are visualized using t-SNE to compare the feature extraction capacities of the different methods [34]. The fault diagnosis task A⟶B is again chosen for analysis, and the two-dimensional fault features are shown in Fig. 9. The visualization shows that MSDSAN achieves the best transfer performance: fault samples of the same category from the source domain and target domain are clustered together. The clustering results of CNN, DAN, DANN, CDAN and DSAN are not satisfactory, as samples of the same fault category from the source domain and target domain are separated from each other or even misaligned. The feature visualization results further confirm that MSDSAN handles the distribution discrepancy in bearing fault diagnosis under varying working conditions well, resulting in better fault diagnosis performance.

Fig. 9. Feature visualization under the A⟶B task: (a) CNN, (b) DAN, (c) DANN, (d) CDAN, (e) DSAN, (f) MSDSAN.
This study proposed a multi-scale deep subdomain adaptation network (MSDSAN) for rolling element bearing fault diagnosis under varying working conditions. MSDSAN extracts the fine-grained information of each fault category to reduce the subdomain distribution discrepancy. Moreover, MSDSAN applies multiple receptive fields to ensure more complete fault information extraction under different working conditions. Finally, a bearing fault diagnosis experiment was performed, in which the average fault diagnosis accuracies of CNN, DAN, DANN, CDAN, DSAN and MSDSAN were 59.1%, 72.3%, 73.0%, 46.6%, 82.0% and 90.0% respectively. The experimental results show that MSDSAN achieves good transfer performance and provides a new solution for rolling element bearing fault diagnosis under varying working conditions.
