Abstract
Domain adaptation (DA) techniques can solve fault diagnosis (FD) problems under variable operating conditions. However, DA faces two issues: (1) vibration signals inevitably contain noise, which makes it difficult to extract discriminative features; (2) the target domain contains unknown fault types. Both issues degrade diagnostic performance. To address them, a new cross-domain open-set transfer FD method called the feature improvement adversarial network (FIAN) is proposed in this article. Specifically, to alleviate noise interference, a feature improvement module (FIM) is proposed and embedded into the backbone convolutional neural network to form a new feature extractor. FIM uses a soft threshold function to enhance important information and suppress redundant information. Furthermore, open-set DA by back-propagation (OSBP) is introduced into FIAN. OSBP predicts the probability that a target domain sample belongs to an unknown category, so that unknown and known category samples can be identified effectively. Experimental results on two bearing datasets demonstrate the effectiveness and superiority of FIAN.
Keywords
Introduction
Rolling bearings are among the most basic components of rotating machinery. They often operate in harsh environments and under complex working conditions, so their degradation and failure are often inevitable [1-3]. Rolling bearing fault diagnosis (FD) can not only reduce the maintenance cost of mechanical equipment but also help prevent major accidents. Therefore, rolling bearing FD has long been a topic of great interest to researchers.
In recent years, numerous FD methods have been proposed. Traditional FD methods based on signal processing have been widely used, but they offer no advantages in feature learning or autonomous decision-making. FD methods based on machine learning (ML), such as support vector machines [4], artificial neural networks [5, 6], and their variants [7], alleviate these problems to a certain extent. However, the diagnostic effectiveness of ML-based methods is limited by their reliance on expert experience and their shallow structure [8]. Deep learning (DL)-based diagnostic methods [9] are considered an effective remedy and have achieved satisfactory progress in bearing FD. Compared with shallow methods, DL-based methods have excellent feature extraction capability and can process large amounts of data. Moreover, they are end-to-end, which reduces the dependence on manual labor and expert knowledge. FD methods based on convolutional neural networks (CNNs), autoencoders, transformers, graph neural networks, and their variants are among the most prominent. For example, Du et al. [10] proposed a 1D-2D joint CNN for rotating machinery FD, which combines the adaptive ability of a 1D CNN with the feature learning ability of a 2D CNN. Li et al. [11] used two cost functions based on the acoustic emission network structure to reconstruct the input data while preserving its local and global geometry, thereby revealing mechanical health information. Ding et al. [12] proposed an end-to-end FD framework that uses a time-frequency transformer to extract effective information from time-frequency maps. Wang et al. [13] designed a few-shot learning method based on graph neural networks for mechanical FD.
DL models can achieve ideal diagnostic performance when the training and testing samples follow the same distribution. However, this assumption rarely holds in real-world scenarios because of changing working conditions, environmental disturbances, and equipment degradation. The diagnostic performance of some models deteriorates significantly when the training and testing distributions are misaligned [14, 15]. This discrepancy between domains is called domain shift [16]. Re-collecting and labeling data in the FD field is very time-consuming and, in some cases, impossible. It is therefore necessary to develop more effective methods that exploit labeled samples from the source domain (SD) and transfer the learned knowledge to unlabeled samples in the target domain (TD).
Domain adaptation (DA) is a branch of transfer learning [16] that explores the relationship between domains and learns domain-invariant features. DA alleviates the negative impact of domain shift and improves the performance of the TD classifier when target samples are unlabeled. DA methods can be roughly divided into two categories: statistical distance-based and adversarial-based methods. Statistical distance-based methods add discrepancy metrics to the objective function. Liu et al. [17] proposed deep feature adaptation networks, which use Gaussian priors to build a shared latent space that facilitates distribution alignment. Wang et al. [18] developed an FD model based on pseudo-label learning and subdomain adaptation to further improve bearing FD accuracy across working conditions or machines. Meng et al. [19] designed a rolling bearing FD method based on local central moment discrepancy. Adversarial-based methods use adversarial training to extract domain-invariant features, in which the domain discriminator and the feature extractor compete with each other. Jiao et al. [20] used a residual network to extract features, which improved the final diagnostic performance. Wu et al. [21] proposed a network based on Gaussian distribution and dual-classifier domain adversarial learning for bearing FD. Wu et al. [22] proposed a domain-adversarial rolling bearing FD method based on an attention mechanism. Kuang et al. [23] proposed a self-supervised dual-classifier adversarial transfer network for rotating machinery FD.
Although the above approaches achieve important results, two issues remain. First, vibration signals usually contain noise, which hinders the extraction of discriminative features and degrades the final diagnostic performance. Second, unknown fault types may occur in the TD and harm diagnosis. This happens because collecting data incurs high economic and labor costs, so labeled training data covering every fault type is generally difficult to obtain. This scenario is called the open-set DA problem. Figure 1 shows the difference between closed-set DA and open-set DA. Most existing methods do not consider these two issues simultaneously, and this gap urgently needs to be addressed.

Differences between (a) closed-set DA and (b) open-set DA.
Motivated by these issues, a new cross-domain open-set transfer FD method called the feature improvement adversarial network (FIAN) is designed in this paper for cross-domain open-set FD under noisy conditions. To alleviate the negative impact of noise, a feature improvement module (FIM) is designed and embedded into the CNN backbone to form a new feature extractor. FIM uses parallel convolutions with different receptive fields to obtain more discriminative features, and uses soft thresholding to enhance useful features and suppress useless ones. To identify samples of unknown categories in the TD, open-set DA by back-propagation (OSBP) is introduced into FIAN. OSBP uses the domain classifier to calculate the probability that a TD sample belongs to an unknown category and thereby identifies unknown-category samples. The main contributions of this paper are as follows:
(1) FIM is designed to enhance valuable information and suppress redundant information, which alleviates the negative impact of noise.
(2) OSBP is introduced into FIAN to identify unknown fault types, which improves diagnostic performance in the open-set DA setting.
(3) The performance and advantages of FIAN are demonstrated on transfer FD tasks built from two bearing datasets.
The rest of the paper is organized as follows. Section 2 presents the preliminaries of FIAN. Section 3 describes FIAN in detail. Section 4 presents the results of FIAN on two bearing datasets. Section 5 concludes the paper.
Problem description
O = {χ, P (χ)} represents a domain, where χ is the feature space and P (χ) represents the marginal probability distribution. SD data is denoted by
Domain adversarial neural network
To solve the DA problem, domain adversarial networks (DAN) were proposed [24], which learn domain-invariant features through an adversarial learning strategy. The feature extractor tries to obtain high-level representations of SD and TD data that fool the domain discriminator, while the discriminator tries to tell the two domains apart. This adversarial learning reduces the discrepancy between SD and TD and yields domain-invariant features.
In the cross-domain FD problem, DAN consists of three components, namely the feature extractor $G_f$, the health state classifier $G_y$, and the domain discriminator $G_d$, whose parameters are denoted by $\theta_f$, $\theta_y$ and $\theta_d$, respectively. $G_f$ learns to maximize the domain discriminator loss $L_d$ and thereby deceive $G_d$. Conversely, $\theta_d$ is learned to minimize $L_d$ so that $G_d$ is not deceived by $G_f$. In addition, $\theta_y$ is learned by minimizing the classifier loss $L_y$.
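For reference, the adversarial game described above is commonly written as a single saddle-point objective, following the original domain-adversarial formulation [24]. The trade-off weight $\lambda$ is an assumption not specified in this section. Maximizing $E$ over $\theta_d$ is the same as minimizing $L_d$, and minimizing $E$ over $\theta_f$ both minimizes $L_y$ and maximizes $L_d$, which matches the roles given above:

$$
E(\theta_f,\theta_y,\theta_d) = L_y(\theta_f,\theta_y) - \lambda\, L_d(\theta_f,\theta_d)
$$

$$
(\hat{\theta}_f,\hat{\theta}_y) = \arg\min_{\theta_f,\theta_y} E(\theta_f,\theta_y,\hat{\theta}_d), \qquad \hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f,\hat{\theta}_y,\theta_d)
$$

In practice this saddle point is usually reached with a gradient reversal layer between $G_f$ and $G_d$, so that a single backward pass minimizes $L_d$ for $\theta_d$ while pushing $\theta_f$ in the opposite direction.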
This section presents the proposed method in three parts: the feature extractor named FICN, the composition of OSBP, and the optimization algorithm. An overview of FIAN is shown in Fig. 2.

Structure diagram of FIAN in training and testing.
FICN is proposed as the feature extractor; it uses a CNN as the backbone, in which the newly designed FIM is embedded. Figure 3 shows the structure of the feature extractor.

Network structure of feature extractor.
FIM focuses on enhancing important features and suppressing useless features in mechanical vibration signals. The specific structure of FIM is shown in Fig. 4; it consists of four parallel convolutional branches with different receptive fields and a soft thresholding operation. The input feature map first passes through a Bconv layer with a kernel size of 1×1 that sets the number of channels to 16. Each Bconv layer consists of a convolution, batch normalization (BN), and a rectified linear unit (ReLU). Each branch has a Bconv layer with a dilation rate of (2i-1) (when i>2) and a kernel size of 3×1, which is used to capture contextual features. Next, the output features of the four branches are concatenated and fed into a 1×1 Bconv layer. Finally, the input and output features are combined through a skip connection. The above calculation process can be represented by the following equation:
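To make the dilated branches concrete, the following pure-Python sketch computes the span of a 3×1 kernel with dilation d, which is (3 - 1)·d + 1, and applies a zero-padded dilated convolution to a 1-D signal. The dilation rates d_i = 2i - 1 for i = 1 to 4 in the usage line are an illustrative assumption (the condition on i in the text is ambiguous), BN, learned weights and the surrounding 1×1 layers are omitted, and all helper names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d_same(x, weights, dilation):
    """1-D dilated convolution (cross-correlation, as in CNN layers) with zero
    padding, so the output has the same length as the input. Odd kernels only."""
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("odd kernel size expected")
    half = (k - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # for k = 3: taps at t - d, t, t + d
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def parallel_branches(x, weights, dilations):
    """Hypothetical FIM-like branch stack: one dilated conv + ReLU per dilation
    rate; returning the list of branch outputs mimics channel concatenation."""
    return [[max(0.0, v) for v in dilated_conv1d_same(x, weights, d)]
            for d in dilations]


if __name__ == "__main__":
    dilations = [2 * i - 1 for i in range(1, 5)]  # assumed rates: 1, 3, 5, 7
    print([effective_kernel_size(3, d) for d in dilations])  # [3, 7, 11, 15]
```

Dilation enlarges the receptive field without adding weights: every branch has three taps, yet under the assumed rates the branches see 3, 7, 11 and 15 consecutive input samples.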

Network structure of FIM.
Although the above operations enhance useful features, they may also amplify useless ones. To solve this problem, and inspired by [25], a soft threshold function and a squeeze-and-excitation operation [26] are applied to filter useless components out of the extracted features. Soft thresholding treats near-zero features in the feature map as noise and sets them to zero. The soft threshold function is represented as:
Then, the scaling parameters are multiplied by the output of the squeeze operation to obtain the threshold v:
Finally, a skip connection adds the input to the output, which avoids performance degradation and facilitates FIM training.
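A minimal sketch of the channel-wise soft thresholding described above, assuming the deep residual shrinkage design of [25]: the squeeze step averages the absolute values of each channel, a scaling parameter α in (0, 1) (normally produced by fully connected layers followed by a sigmoid) multiplies that average to give the threshold v, and each feature x is mapped to sign(x)·max(|x| - v, 0). The excitation layers are replaced here by hypothetical pre-sigmoid values passed in directly, so only the thresholding arithmetic is shown; the function names are illustrative.

```python
import math


def soft_threshold(x: float, v: float) -> float:
    """sign(x) * max(|x| - v, 0): shrink toward zero, zero out |x| <= v."""
    if x > v:
        return x - v
    if x < -v:
        return x + v
    return 0.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_thresholds(feature_map, scale_logits):
    """One threshold per channel: v_c = alpha_c * mean(|x_c|).
    `scale_logits` stands in for the (omitted) excitation layers' outputs."""
    thresholds = []
    for channel, logit in zip(feature_map, scale_logits):
        squeeze = sum(abs(v) for v in channel) / len(channel)  # GAP of |x|
        alpha = sigmoid(logit)  # scaling parameter in (0, 1)
        thresholds.append(alpha * squeeze)
    return thresholds


def shrink(feature_map, scale_logits):
    """Apply channel-wise soft thresholding to a (channels x length) map."""
    taus = channel_thresholds(feature_map, scale_logits)
    return [[soft_threshold(v, tau) for v in channel]
            for channel, tau in zip(feature_map, taus)]
```

Because α < 1, the threshold stays below the channel's mean absolute value, so the largest-magnitude features in each channel always survive while small, noise-like values are zeroed.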
As shown in Fig. 2, the domain classifier $G_{C1}$ replaces $G_y$ and $G_d$, and its input is the output of $G_f$. In OSBP, $G_{C1}$ therefore outputs an (E+1)-dimensional probability vector, where E is the number of known types and the (E+1)th value is the probability of the unknown class. The softmax activation function produces this (E+1)-dimensional class probability vector, where the meaning of
The objective function includes the following two terms: 1) the cross-entropy (CE) loss on SD data, which needs to be minimized by training;
2) the binary CE loss of $G_{C1}$, which needs to be maximized through adversarial training to distinguish known and unknown TD data.
The threshold δ is usually set to 0.5, an empirically determined value.
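The two OSBP terms can be sketched for a single sample as follows, assuming the formulation of the OSBP work [30]: an (E+1)-way softmax, a CE loss on labeled SD samples, and a binary CE between the unknown-class probability of a TD sample and the threshold δ = 0.5. In OSBP, $G_{C1}$ minimizes the binary term while the feature extractor maximizes it (for example through gradient reversal), pushing each TD sample's unknown probability away from δ toward either the known or the unknown side. The function names are illustrative, not the authors' code.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def source_ce(logits, label):
    """CE loss of a labeled SD sample; `label` is one of the E known classes."""
    return -math.log(softmax(logits)[label])


def osbp_binary_ce(logits, delta=0.5):
    """Binary CE between the unknown-class probability (last of the E + 1
    outputs) and the threshold delta; it is smallest when p_unknown == delta."""
    p_unknown = softmax(logits)[-1]
    return -(delta * math.log(p_unknown) + (1.0 - delta) * math.log(1.0 - p_unknown))


def predict(logits):
    """Index of the largest output; index E (the last) means 'unknown fault'."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

With δ = 0.5 the binary term reaches its minimum, log 2, exactly when the unknown probability equals 0.5, which is why the classifier side drives samples toward that boundary and the feature extractor side drives them away from it.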
The proposed method is trained with the standard back-propagation algorithm using Adam [27]. The parameters of the feature extractor, the label predictor, and the domain discriminator are denoted by $\theta_f$, $\theta_g$ and $\theta_d$, respectively. The parameters $\theta_f$, $\theta_g$ and $\theta_d$ are updated as follows:
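Equations (10) and (11) are not reproduced in this section; as a generic illustration, the sketch below implements the Adam update (bias-corrected first and second moment estimates) for a list of scalar parameters and applies it to the toy objective (x - 3)². In FIAN the gradients would come from back-propagation through the network, with the adversarial term entering the feature extractor's gradient with a reversed sign; the class and the toy objective are assumptions for illustration only. The default learning rate matches the 0.001 reported in the implementation details.

```python
import math


class Adam:
    """Plain Adam optimizer (Kingma and Ba) for a list of scalar parameters."""

    def __init__(self, params, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = list(params)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.params)  # first-moment (momentum) estimates
        self.v = [0.0] * len(self.params)  # second-moment estimates
        self.t = 0  # step counter used for bias correction

    def step(self, grads):
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return self.params


if __name__ == "__main__":
    opt = Adam([0.0], lr=0.01)
    for _ in range(3000):
        x = opt.params[0]
        opt.step([2.0 * (x - 3.0)])  # gradient of (x - 3)^2
    print(opt.params[0])  # close to 3
```

Because of the bias correction, the very first step moves each parameter by roughly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.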
Algorithm 1 shows the pseudocode of FIAN.
Input: SD data $D_s$ and TD data $D_t$
Initialize network parameters
While the maximum number of training epochs is not reached:
Forward-propagation:
1) Feed the SD and TD data into the feature extractor $G_f$ to extract common features;
2) Feed the common features of the SD and TD data into the domain classifier $G_{C1}$;
3) For $G_{C1}$, obtain the CE loss for the SD with (2);
4) Obtain the loss of $G_{C1}$ with (6).
Back-propagation:
Use Adam to optimize the parameters of FIAN by (10) and (11)
Output: FIAN model
Dataset description
CWRU bearing dataset
The Case Western Reserve University (CWRU) [28] bearing dataset was selected for experimentation. The CWRU test bench is shown in Fig. 5. The vibration data used in this article were collected at the motor drive end under four loads (0, 1, 2, and 3 hp) and cover four health states: (1) normal state (N), (2) outer ring fault (OF), (3) inner ring fault (IF), and (4) ball fault (BF). The three fault types were artificially manufactured with diameters of 7, 14, and 21 mils. Therefore, ten bearing conditions (the normal state plus three fault types at three sizes each) were considered for two speed domains. Detailed information on the CWRU dataset is given in Table 1, and the cross-domain diagnostic tasks on the CWRU dataset are listed in Table 2.

Experimental platform of CWRU.
Information of the FD tasks in this CWRU dataset
The Shandong University (SDU) dataset is a self-built dataset collected on our own experimental platform. The test rig, designed to acquire vibration signals, is shown in Fig. 6. It includes a three-phase asynchronous motor, a motor control system, a support shaft, a three-support rolling bearing, and a radial force loading system. The motor control system controls the motor speed. Three rolling bearings are mounted on the shaft, one of which is used to simulate real faults. N205 and N205U denote bearings with non-detachable and detachable inner rings, respectively. All faults in the various bearing parts are rectangular grooves. N denotes the healthy state, and O, I, and B denote faults located on the outer ring, inner ring, and rolling elements, respectively.

Experimental platform of SDU.
In this experiment, data for one normal state and three fault types are collected. Each sample consists of 1024 sampling points. Details of the SDU dataset are given in Table 3, and the cross-domain diagnostic tasks on the SDU dataset are listed in Table 4.
Description of the SDU dataset
Information of the FD tasks in this SDU dataset
(1) CNN: This baseline does not use DA, and its feature extractor consists only of the backbone structure. The feature extractor and classifier are trained on SD data and tested directly on TD data.
(2) Multi-kernel maximum mean discrepancy (MK-MMD): Built on the CNN baseline, this method uses an MK-MMD loss to learn domain-invariant features, thereby minimizing the distribution discrepancy between SD and TD.
(3) Correlation alignment network (CORAL) [29]: CORAL has the same network structure as CNN but minimizes the distribution discrepancy with second-order moment matching (correlation alignment) instead of MK-MMD.
(4) OSBP [30]: Built on the CNN baseline, OSBP is a classic method for solving open-set FD problems.
(5) Adversarial network with multiple auxiliary classifiers (ANMAC) [31]: ANMAC assigns weights that distinguish known from unknown target data.
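For the MK-MMD and CORAL baselines, the discrepancy terms added to the CNN loss can be sketched in pure Python as below. The kernel bandwidths and equal kernel weights in `mk_mmd2` are hypothetical choices (MK-MMD implementations typically fix or learn a set of Gaussian kernels), the MMD estimate is the simple biased one, and `coral_loss` uses the squared Frobenius distance between feature covariances scaled by 1/(4d²), the form used in Deep CORAL. Features are plain lists of d-dimensional vectors.

```python
import math


def gaussian_kernel(a, b, sigma):
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mk_mmd2(xs, ys, sigmas=(0.5, 1.0, 2.0)):
    """Biased squared-MMD estimate with an equal-weight mix of Gaussian kernels."""
    def k(a, b):
        return sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)

    def mean_k(p, q):
        return sum(k(a, b) for a in p for b in q) / (len(p) * len(q))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def covariance(xs):
    """Unbiased (n - 1) sample covariance of a list of d-dimensional vectors."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[j] for x in xs) / n for j in range(d)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in xs) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(xs, ys):
    """Squared Frobenius distance between SD and TD covariances over 4 d^2."""
    d = len(xs[0])
    cs, ct = covariance(xs), covariance(ys)
    return sum((cs[i][j] - ct[i][j]) ** 2
               for i in range(d) for j in range(d)) / (4.0 * d * d)
```

The contrast matches the descriptions above: a pure translation of the TD features changes the MMD term but leaves the CORAL term at zero, because CORAL only matches second-order statistics. Neither term knows which TD samples belong to unknown classes, which is why both align them with SD indiscriminately.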
Implementation details
The number of training epochs and the batch size are 300 and 64, respectively. The input size is 1024. The Adam algorithm is used for network optimization with a learning rate of 0.001 and a step decay schedule. FIAN is implemented in Python 3.9 with PyTorch 1.12 and runs on Windows 10 with an Intel Xeon(R) E5-2650 v4 CPU and an NVIDIA Tesla V100 GPU. CWRU and SDU adopt the same preprocessing strategy: samples are obtained by sliding a fixed-length window of 1024 points over the vibration signal, with no overlap between consecutive samples. Each health condition includes 100 samples. To eliminate the effect of randomness, each experiment is repeated five times and the average is reported. The noise level is expressed by the signal-to-noise ratio (SNR), which is defined as:
To test the performance of each method under noisy conditions, noise with an SNR of 8 dB was added to the signals.
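The preprocessing and noise injection can be sketched as follows. The SNR is taken as SNR = 10·log10(P_signal / P_noise) in dB, the usual definition; the noise is assumed to be zero-mean Gaussian white noise (its type is not stated here), and the sketch rescales the drawn noise so that its empirical power hits the target exactly, rather than only setting its standard deviation. The helper names are hypothetical.

```python
import math
import random


def segment(signal, window=1024):
    """Cut a 1-D signal into non-overlapping windows; a trailing remainder is dropped."""
    n = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n)]


def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    return 10.0 * math.log10(power(signal) / power(noise))


def add_noise(signal, snr=8.0, seed=0):
    """Add zero-mean Gaussian white noise, rescaled so the SNR equals `snr` dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target = power(signal) / (10.0 ** (snr / 10.0))  # required noise power
    scale = math.sqrt(target / power(noise))
    noise = [scale * v for v in noise]
    return [s + n for s, n in zip(signal, noise)], noise
```

At 8 dB the noise power is 10^(-0.8), roughly 16% of the signal power. With 100 non-overlapping samples of 1024 points per health condition, each condition needs at least 102,400 raw points.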

FD result in the CWRU bearing dataset under the first circumstance.
To further evaluate the ability of the proposed method to detect new faults, the H-score [32] is introduced in addition to classification accuracy. Its formula is as follows:
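The H-score is commonly defined as the harmonic mean of the accuracy on known classes and the accuracy on the unknown class, H = 2·Acc_known·Acc_unknown / (Acc_known + Acc_unknown). The sketch below assumes [32] follows this definition and computes both accuracies over samples (some works average per-class accuracies instead); the function name and label convention are illustrative.

```python
def h_score(y_true, y_pred, unknown_label):
    """Harmonic mean of known-class accuracy and unknown-class accuracy.
    Samples whose true label equals `unknown_label` are unknown-fault TD samples;
    both groups are assumed to be non-empty."""
    known = [(t, p) for t, p in zip(y_true, y_pred) if t != unknown_label]
    unknown = [(t, p) for t, p in zip(y_true, y_pred) if t == unknown_label]
    acc_known = sum(t == p for t, p in known) / len(known)
    acc_unknown = sum(t == p for t, p in unknown) / len(unknown)
    if acc_known + acc_unknown == 0:
        return 0.0
    return 2.0 * acc_known * acc_unknown / (acc_known + acc_unknown)
```

Because it is a harmonic mean, H is high only when both known and unknown faults are recognized well; for example, a known-class accuracy of 0.75 and an unknown-class accuracy of 0.5 give H = 0.6, below their arithmetic mean of 0.625.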
To demonstrate the performance of the method, 12 cross-domain diagnostic tasks are designed on the CWRU dataset for both the first and the second case. Table 5 and Fig. 7 show the classification accuracy under the first circumstance, and Table 6 and Fig. 8 show the classification accuracy under the second circumstance. It can be seen from Table 4 that the average classification accuracy of FIAN over the 12 cross-domain diagnostic tasks reaches 97.55%, which is better than that of the four comparison methods. CNN without DA ignores the domain differences between SD and TD and performs unsatisfactorily in the cross-domain FD tasks. Although MK-MMD and CORAL account for domain differences, they assume a closed set. When TD contains outlier classes, their emphasis on aligning marginal distributions also pulls unknown-class samples toward SD classes, which leads to negative transfer and reduces diagnostic accuracy. OSBP is an effective method for the open-set cross-domain FD problem and clearly improves on MK-MMD and CORAL. The average diagnostic accuracy of ANMAC in both cases is higher than that of OSBP because ANMAC additionally weights different TD samples. The proposed method achieves the highest average diagnostic accuracy because it considers both the open-set and noise problems: FIM suppresses useless information and highlights valuable information.
Classification accuracy for the CWRU bearing dataset under the first circumstance
Classification accuracy for the CWRU bearing dataset under the second circumstance

FD result in the CWRU bearing dataset under the second circumstance.

FD result in the SDU bearing dataset under the first circumstance.
Classification accuracy for the SDU bearing dataset under the first circumstance
Classification accuracy for the SDU bearing dataset under the second circumstance

FD result in the SDU bearing dataset under the second circumstance.
To further highlight the superiority of the proposed method, we compare the H-scores of the proposed method, OSBP, and ANMAC. Figures 11 and 12 show the H-scores of these methods in the first case of the CWRU dataset and the second case of the SDU dataset, respectively. The H-score of the proposed method is higher than those of OSBP and ANMAC in the various tasks on both datasets, indicating that the proposed method is better at identifying known and unknown fault types and can extract discriminative features from noisy signals.

H-score in the CWRU bearing dataset under the first circumstance.

H-score in the SDU bearing dataset under the second circumstance.
To display the FD performance of each method more intuitively, t-distributed stochastic neighbor embedding (t-SNE) [33] is used to visualize the learned features. We selected the 2→0 transfer task on the CWRU bearing dataset under the first circumstance and the 1750→2000 transfer task on the SDU bearing dataset under the second circumstance. The high-dimensional features learned by each method are shown in Figs. 13 and 14. FIAN achieves satisfactory cross-domain feature alignment and separates all known and unknown classes well: samples of the same class from SD and TD are projected to the same region. FIM improves discriminative feature extraction and reduces the negative impact of noise in the signal, and OSBP enables the proposed method to identify unknown classes in TD, which are separated into distinct regions. CNN, MK-MMD, and CORAL remain limited and cannot effectively distinguish known from unknown classes. OSBP and ANMAC produce better clusters than these methods; however, their diagnostic performance is weaker than that of the proposed method because of the negative impact of noise.

Feature visualization for transfer diagnosis task in CWRU bearing dataset. (a) CNN. (b) MK-MMD. (c) CORAL. (d) OSBP. (e) ANMAC. (f) Proposed.

Feature visualization for transfer diagnosis task in SDU bearing dataset. (a) CNN. (b) MK-MMD. (c) CORAL. (d) OSBP. (e) ANMAC. (f) Proposed.
In this work, a new cross-domain open-set transfer FD method named FIAN is proposed for cross-domain open-set bearing FD tasks under noisy conditions. The proposed method targets a more practical scenario in which the collected signals contain noise and the TD data contain fault categories that never appear in the SD. In FIAN, FIM is designed to enhance valuable information and suppress redundant information, which alleviates the negative impact of noise, and OSBP identifies the unknown-class samples in the TD. Experimental results on two bearing datasets demonstrate the effectiveness and superiority of FIAN in solving open-set FD problems under noisy conditions.
In future work, we will study the more challenging and complex cross-domain open-set FD problem with both noise and class imbalance.
Footnotes
Acknowledgement
This work was supported by the National Key Research and Development Project [grant number 2023YFB3709601], the National Natural Science Foundation of China [grant number 62373215, 62373219, 62073193], the Key Research and Development Plan of Shandong Province [grant number 2021CXGC010204, 2022CXGC020902], the Fundamental Research Funds of Shandong University [grant number 2021JCG008].
