Abstract
To address data distribution discrepancy across scenarios, deep transfer learning is used to help the target scenario complete the recognition task using data from a similar scenario. However, deep transfer models have weak expressive ability in cross-scenario applications, so faults may be misrecognized or diagnosed with low accuracy. The Convolutional Block Attention Module (CBAM) can independently learn the importance of each channel and spatial feature, recalibrate the channel and spatial features, and improve image classification performance. This study introduces the CBAM module into the Residual Network (ResNet) and proposes a transfer learning model that combines the CBAM module with an improved ResNet, denoted as TL_CBAM_ResNet17. A miniature deep model, ResNet17, is constructed from the ResNet50 model. The location at which the CBAM module is embedded in ResNet17 is then determined so as to strengthen the model's expressive ability. To achieve effective cross-scenario transfer and reduce the data distribution discrepancy between the source and target domains, a multi-kernel Maximum Mean Discrepancy (MK–MMD) layer is added in front of the classifier layer of the ResNet17 model to learn features common to both domains. The research object is a reciprocating compressor. Cross-scenario datasets are produced from vibration signals measured on a test bench and from simulation signals generated by a dynamic simulation model, and mutual transfer experiments are conducted using these datasets. The proposed method (TL_CBAM_ResNet17) shows better classification performance than TCA, JDA, the TL_ResNet50 model, the TL_ResNet17 model, and TL_ResNet17 models integrated with other attention mechanism modules. It also substantially improves the fault diagnosis accuracy and generalization of the model in cross-scenario applications.
Introduction
A reciprocating compressor, as the heart of an oil and gas pipeline network, offers high efficiency, wide coverage of pressure and flow, and few limitations on the compression medium. Thus, it is widely used in gas injection exploitation, gas gathering and treatment, pipeline transportation, and underground natural gas storage. Stable operation of a reciprocating compressor is of great economic significance in terms of production costs, production efficiency, and safe production. The working conditions of a reciprocating compressor are not constant, and changes in complex working conditions produce monitoring data with large distribution discrepancies, resulting in weak model generalization. In cross-scenario diagnosis, information from one or more scenarios is used to improve performance in another scenario. There the data distribution discrepancy between scenarios is even larger, and weak model generalization becomes more prominent. Transfer learning (TL) [1] removes the requirement of traditional machine learning that training and test samples follow the same probability distribution. It is therefore an effective way to solve cross-scenario problems by transferring knowledge from a source domain to a target domain. Through TL, latent knowledge shared by the source and target domains can be obtained, and domain-invariant features can be extracted to improve performance on the target task [2].
Early TL studies proposed transfer component analysis (TCA) [3] and joint distribution adaptation (JDA) [4]. However, these methods rely mainly on experience to extract features manually, which is time-consuming and labor-intensive. With the continuous development of deep learning (DL), domain adaptation layers were added to deep neural networks. DL models could then adapt the source and target domains while extracting features automatically, making the source and target feature distributions more similar and improving network transfer. Tzeng et al. [5] proposed the deep domain confusion (DDC) method. DDC relies on an AlexNet network for adaptive learning, with a maximum mean discrepancy (MMD) adaptation layer added at the final FC layer, and was the first to address the domain adaptation problem within a deep network. Building on DDC, Long et al. [6] proposed the deep adaptation network (DAN), in which three multi-kernel MMD (MK–MMD) adaptation layers are applied to the three fully connected layers of the AlexNet classifier, further improving the generalization of deep transfer learning networks on large-scale data. Recently, with increasing industrial demand for reliable fault diagnosis under different operating conditions, cross-scenario diagnosis tasks have attracted growing attention [7]. Commonly used transfer networks include AlexNet [8], VGGNet [9], GoogLeNet [10], ResNet [11], and numerous ResNet variants (such as DenseNet [12] and ResNeXt [13]). ResNet and its variants address the degradation problem, in which error increases as the depth of a deep neural network increases [14]. DenseNet introduces dense connections. ResNeXt introduces Inception-style multi-branch transformations but is harder to design and more computationally costly than ResNet.
Deep transfer learning can mitigate the weak generalization caused by inconsistent data distributions across scenarios, but the resulting models have weak expressive ability. They cannot focus on the important target features in the recognition task, which leads to misrecognition or low diagnostic accuracy. An attention mechanism makes a neural network concentrate on the important features of an image during feature extraction while suppressing unimportant ones, which greatly improves the model's expressive ability [15]. The convolutional block attention module (CBAM) [16] combines a channel attention module and a spatial attention module to apply attention along both the channel and spatial dimensions. It is more suitable for computer vision recognition tasks than attention modules that act only in the channel domain or only in the spatial domain. In reference [17], a new fault diagnosis model based on a residual network and an attention mechanism was proposed to address feature extraction and classifier training in bearing fault diagnosis. In reference [18], an improved multi-scale convolutional neural network integrated with a feature attention mechanism (IMS–FACNN) was proposed to address the instability of traditional CNN-based models. However, these studies improved and combined attention mechanisms and convolutional model structures but were verified on a single dataset, without considering the data distribution discrepancy across scenarios. Therefore, this study uses a ResNet network with a CBAM attention mechanism to improve the model's expressive ability. It also uses the MK–MMD criterion to investigate cross-scenario fault diagnosis of a reciprocating compressor.
Research has also been conducted on cross-scenario transfer diagnosis. In reference [19], a multi-source subdomain adaptation transfer learning method was proposed to transfer diagnostic knowledge from multiple sources for cross-domain fault diagnosis. In reference [20], cross-machine fault diagnosis was performed using labeled data from related but different machines. In reference [21], motivated by transfer learning, a novel intelligent deep transfer network (DTN) method with multi-kernel dynamic distribution adaptation (MDDA) was presented for cross-machine fault diagnosis. Previous cross-scenario research has focused mainly on rotating equipment, and most datasets were public datasets or laboratory test-bench data. There are no public datasets for reciprocating compressors, and such equipment is not permitted to operate with faults on site, so acquiring fault data requires enormous labor and material resources. Fault data under changing working conditions are therefore scarce, which further increases the difficulty of fault diagnosis for reciprocating compressor units. To address the shortage of cross-scenario fault data, this study draws on dynamic simulation research and test-bench simulation experiments. In [22], a reciprocating compressor fault simulation test bench was established to study fault diagnosis of the small head bush of the connecting rod, which is difficult to monitor and diagnose. In [23], a multi-body dynamics simulation method was studied for bearing faults in a reciprocating compressor kinematic pair with clearance. The aim was to compensate for the insufficient data caused by the difficulty of testing multiple fault types. In [24], a Bentley RCK-1 reciprocating compressor was used to study the crankshaft clearance fault under multiple working conditions.
The cross-scenario datasets in this study consist of vibration signals measured on a test bench and simulation signals from a dynamic simulation model. Normal and fault data under multiple working conditions were obtained through experiments and simulations.
The main purpose of this study is to address the data distribution discrepancy between reciprocating compressor scenarios. Deep transfer learning is used to help the target scenario complete the recognition task with data from a similar scenario.
The main contributions of this study can be summarized as follows:
(1) Improving the expressive ability of a deep transfer learning model. The ResNet50 model is improved, and the integration locations of CBAM modules in the improved residual network are determined and analyzed.
(2) Reducing the data distribution discrepancy between the source and target domains across scenarios. The MK–MMD criterion is introduced into the deep model to reduce the discrepancy between the source and target data distributions and to improve cross-scenario learning.
(3) Obtaining a dataset from a simulation scenario. A rigid–flexible coupling dynamic simulation model of a reciprocating compressor is established to simulate normal and wear states, and vibration signals are simulated under different working conditions.
The remainder of this paper is structured as follows. Section 2 briefly explains the theoretical background of the methods used in this paper. Section 3 details the proposed method. Section 4 describes the experiment design, method comparison, and visualization display. Finally, Section 5 presents the conclusion.
Theoretical background
Transfer learning
Transfer learning (TL) is a machine learning paradigm defined in terms of a domain and a task [25]. A domain consists of a feature space $\chi$ and a marginal probability distribution $p(x)$, denoted as $D = \{\chi, p(x)\}$, where $\{x_1, x_2, \ldots, x_n\} \subset \chi$ is the sample set. Consider labeled source domain data $D_s = \{\chi_s, p(x_s)\}$ and unlabeled target domain data $D_t = \{\chi_t, p(x_t)\}$. Because the data come from different sources, the two distributions differ, i.e., $p(x_s) \neq p(x_t)$. The TL task is to map the source and target domain data into a common feature space and to find representative features for which the source and target domain data have the same distribution.
Domain adaptation (DA) is a representative TL method. It aligns the data distributions of the source and target domains by learning new feature representations, so that a model trained on labeled source domain data can be transferred directly to an unlabeled target domain without significant degradation in performance [26]. The DA schematic is presented in Fig. 1.

DA schematic.
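MMD, the most widely used loss function in DA, measures the distance between two different but related distributions. In 2012, Gretton et al. [27] proposed MK–MMD, an extension of MMD that constructs the total kernel from multiple kernels and performs better than single-kernel MMD. MK–MMD is the distance between the mean embeddings of the source and target domain probability distributions in a reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$:
$$d_k^2(p, q) = \left\| \mathbb{E}_{p}\left[\phi(x_s)\right] - \mathbb{E}_{q}\left[\phi(x_t)\right] \right\|_{\mathcal{H}_k}^2, \qquad k = \sum_{u=1}^{m} \beta_u k_u, \quad \beta_u \ge 0, \quad \sum_{u=1}^{m} \beta_u = 1,$$
where $p$ and $q$ are the source and target distributions, $\phi(\cdot)$ is the feature map associated with the kernel $k$, and $k_u$ are the $m$ base kernels.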
In this study, an MK–MMD criterion is added to the deep model to build a deep transfer model. Its effect on cross-scenario transfer diagnosis is explored from theoretical and experimental perspectives. The aims are to reduce the discrepancy between the source and target data probability distributions and to improve the generalization ability of the deep transfer model in a new scenario.
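As a rough illustration of the MK–MMD criterion, the sketch below computes the biased (quadratic-time) estimate of the squared MK–MMD between two batches of feature vectors. The total kernel is an equally weighted sum of Gaussian kernels. The bandwidths, the equal weights β_u, and the function names are illustrative assumptions. The text does not state which kernels, weights, or estimator TL_CBAM_ResNet17 uses.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    return (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T

def mk_mmd2(xs, xt, bandwidths=(0.5, 1.0, 2.0, 4.0, 8.0), betas=None):
    """Biased estimate of squared MK-MMD between source features xs and target features xt.

    Total kernel: k = sum_u beta_u * exp(-||a - b||^2 / (2 sigma_u^2)), with sum_u beta_u = 1.
    The bandwidths sigma_u and the equal weights beta_u are illustrative assumptions.
    """
    xs = np.asarray(xs, dtype=float)
    xt = np.asarray(xt, dtype=float)
    if betas is None:
        betas = np.full(len(bandwidths), 1.0 / len(bandwidths))
    z = np.vstack([xs, xt])
    d2 = np.maximum(sq_dists(z, z), 0.0)  # clip small negative values caused by rounding
    k = sum(beta * np.exp(-d2 / (2.0 * s**2)) for beta, s in zip(betas, bandwidths))
    n = len(xs)
    # ||mean phi(xs) - mean phi(xt)||^2 = mean k(s, s) + mean k(t, t) - 2 mean k(s, t)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()
```

In a deep transfer model, such a term is typically computed on the features entering the classifier. It is then added, scaled by a trade-off weight, to the source-domain cross-entropy loss. That weight is not given in the text.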
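Because increasing the number of layers in an artificial neural network (ANN) causes vanishing gradients, ResNet uses skip (shortcut) connections that add shallow features to deep features. This effectively prevents the vanishing gradients that otherwise arise during backpropagation [28]. The process is shown in Fig. 2. The mapping between the input and output of residual learning is expressed as
$$y = \mathcal{F}(x, \{W_i\}) + x,$$
where $x$ and $y$ are the input and output of the residual block, and $\mathcal{F}(x, \{W_i\})$ is the residual mapping learned by the stacked layers with weights $\{W_i\}$.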

Residual connection structure.
The ResNet model consists of five convolution blocks (Conv1, Conv2_x, Conv3_x, Conv4_x, Conv5_x) and one fully connected layer [29]. Two types of residual modules are used in ResNet: (1) BasicBlock, a two-layer residual module containing two convolutional layers with 3×3 kernels; (2) Bottleneck, a three-layer residual module containing three convolutional layers with 1×1, 3×3, and 1×1 kernels. Compared with BasicBlock, Bottleneck first uses a 1×1 convolution to reduce the channel dimension of the input before the 3×3 convolution, then uses another 1×1 convolution to restore it. The Bottleneck structure is used in ResNet50, ResNet101, and ResNet152. According to reference [30], ResNet101 and ResNet152 require approximately 2–3 times as many floating-point operations as ResNet50, so these models are more complex and slower to compute. Therefore, the ResNet50 model is selected as the basis of the deep transfer model.
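A rough count of convolution weights shows why the Bottleneck design suits deeper networks. Bias and batch normalization are ignored. A Bottleneck on 256 channels with a 64-channel inner width has about as many weights as a BasicBlock on 64 channels, and far fewer than a BasicBlock kept at 256 channels. The channel widths are illustrative choices that match the first stage of the standard ResNet50. They are not values taken from this paper.

```python
def conv_weights(k, c_in, c_out):
    """Weights of a k x k convolution (no bias; ResNet convolutions are followed by batch norm)."""
    return k * k * c_in * c_out

def basic_block_weights(c):
    """BasicBlock: two 3x3 convolutions, c -> c -> c."""
    return 2 * conv_weights(3, c, c)

def bottleneck_weights(c, width):
    """Bottleneck: 1x1 reduce (c -> width), 3x3 (width -> width), 1x1 restore (width -> c)."""
    return conv_weights(1, c, width) + conv_weights(3, width, width) + conv_weights(1, width, c)

print(basic_block_weights(256))     # 1179648
print(bottleneck_weights(256, 64))  # 69632
print(basic_block_weights(64))      # 73728
```

The 256-channel Bottleneck therefore costs slightly less than a 64-channel BasicBlock while operating on four times as many channels.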
CBAM is a lightweight attention module containing two sub-modules: the channel attention module (CAM) and the spatial attention module (SAM). As a plug-and-play module, CBAM can be seamlessly integrated into any CNN architecture and trained end-to-end with the base CNN. It adds only a negligible number of parameters and little computation. The overall CBAM network structure is shown in Fig. 3.

CBAM network structure.
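The calculation process of a CBAM module can be divided into two stages [31].

First, an intermediate feature map $F$ is aggregated by global max pooling and global average pooling over the spatial dimensions. The two pooled one-dimensional vectors are fed into a shared multi-layer perceptron (MLP), and the outputs are added together. A sigmoid activation then produces the one-dimensional channel attention map $M_C$, which is multiplied element-wise with the input to yield the channel-refined feature map $F'$.

Second, $F'$ is max-pooled and average-pooled along the channel axis, and the two resulting two-dimensional maps are concatenated and convolved. A sigmoid activation then produces the two-dimensional spatial attention map $M_S$, which is multiplied element-wise with $F'$.

The overall attention generation process of the CBAM can be described as
$$F' = M_C(F) \otimes F, \qquad F'' = M_S(F') \otimes F',$$
with
$$M_C(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \qquad M_S(F') = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big),$$
where $\otimes$ denotes element-wise multiplication (with the attention maps broadcast over the missing dimensions), $\sigma$ is the sigmoid function, and $f^{7 \times 7}$ is the convolution with a 7×7 kernel used in the original CBAM [16].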
The CBAM module mainly addresses the weak expressive ability of a CNN. In this study, CBAM modules are introduced into the residual model. Their number and integration locations are analyzed and determined to further improve the classification performance of the model.
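The two-stage CBAM computation can be sketched in NumPy for a single feature map of shape (C, H, W). The weights are untrained placeholders. The reduction ratio is implied by the shapes of `w0` and `w1`, and the 7×7 kernel follows the original CBAM design [16] rather than settings stated in this paper. In practice, the module would be written in a deep learning framework and trained end-to-end.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w0, w1):
    """M_C for f of shape (C, H, W); w0: (C // r, C) and w1: (C, C // r) form the shared MLP."""
    def mlp(v):
        return w1 @ np.maximum(w0 @ v, 0.0)  # hidden layer with ReLU
    avg = f.mean(axis=(1, 2))                # global average pooling -> (C,)
    mx = f.max(axis=(1, 2))                  # global max pooling -> (C,)
    return sigmoid(mlp(avg) + mlp(mx))       # (C,)

def spatial_attention(f, kernel):
    """M_S for f of shape (C, H, W); kernel: (2, k, k) weights of a one-output-channel conv, k odd."""
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])  # pool along channels -> (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))    # zero padding keeps the H x W size
    h, w = f.shape[1:]
    out = np.zeros((h, w))
    for i in range(k):                                   # cross-correlation, as in CNN frameworks
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + h, j:j + w])
    return sigmoid(out)                                  # (H, W)

def cbam(f, w0, w1, kernel):
    f1 = channel_attention(f, w0, w1)[:, None, None] * f   # F'  = M_C(F)  (x) F
    return spatial_attention(f1, kernel)[None, :, :] * f1  # F'' = M_S(F') (x) F'
```

Because both attention maps lie in (0, 1), CBAM can only rescale features and never amplifies them. With all-zero weights, each map equals 0.5, and the output is simply 0.25·F.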
A deep transfer model with strong expressive ability is proposed to improve diagnostic accuracy on target domain data by training with related source domain data. Huang et al. [32] demonstrated that not all layers in ResNet are necessary and that ResNet contains many redundant parameters. To speed up the model, the redundant parameters are removed following the residual-model improvement method of a previous study [33]:
(1) The numbers of layers in the residual convolutional blocks Conv1 and Conv2_x remain unchanged, and the number of convolutional kernels is reduced to 25%.
(2) A single Bottleneck block is kept in each of the Conv3_x and Conv4_x layers, again with the number of convolutional kernels reduced to 25%.
(3) The residual convolutional block Conv5_x is removed.
The number of network layers is thereby reduced from 50 to 17 (1 + 3×3 + 3×1 + 3×1 + 1), and the parameter size is reduced from 94.1 MB to 0.58 MB. The improved model is denoted as ResNet17, and its network structure is shown in Table 1.
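The layer arithmetic above can be checked in a few lines. The sketch counts weighted layers as the Conv1 stem, three layers per Bottleneck block, and the final fully connected layer. The ResNet50 stage sizes (3, 4, 6, 3) are those of the standard architecture, and the function names are illustrative.

```python
def resnet_depth(blocks_per_stage, layers_per_block=3):
    """Weighted layers: Conv1 stem + Bottleneck layers + final fully connected layer."""
    return 1 + layers_per_block * sum(blocks_per_stage) + 1

def conv_weights(k, c_in, c_out):
    """Weights of a k x k convolution without bias."""
    return k * k * c_in * c_out

print(resnet_depth([3, 4, 6, 3]))  # ResNet50 (Conv2_x..Conv5_x): 50
print(resnet_depth([3, 1, 1]))     # ResNet17 (Conv2_x kept, one block in Conv3_x/Conv4_x, no Conv5_x): 17
# Keeping 25% of the kernels on both the input and output side leaves 1/16 of a convolution's weights
print(conv_weights(3, 64, 64) // conv_weights(3, 16, 16))  # 16
print(round(94.1 / 0.58))          # reported size reduction factor: about 162
```

The 1/16 factor applies only to convolutions whose input and output channel counts are both reduced. The Conv1 stem still takes a 3-channel image, so its weights shrink only to 25%.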
ResNet17 network structure
To further verify the performance of the ResNet17 model, it is compared with the DenseNet121, DenseNet169, ResNeXt50, and ResNet50 models. The experimental dataset is from the reciprocating compressor test bench in Section 4.1.1. The experimental results are shown in Table 2.
Performance comparison of ResNet17 and other models
As shown in Table 2, the ResNet17 model has the fewest parameters and the shortest running time per epoch, which greatly reduces the computational cost. ResNet17 needs three epochs to reach 100% accuracy, giving it better overall performance than the other models.
After the ResNet17 model is established, the CBAM module is introduced to extract deep image features and improve the expressive ability of the model. An MK–MMD criterion is added before the last layer of the ResNet17 model to establish the deep transfer model TL_CBAM_ResNet17. This further reduces the data distribution discrepancy between the source and target domains and improves transfer diagnosis. The locations of the embedded CBAM modules are shown in Table 3.
Comparison and analysis of embedded CBAM module locations
The ResNet17 model consists of four convolution blocks (Conv1, Conv2_x, Conv3_x, Conv4_x) and one fully connected layer. Take the second configuration in Table 3 as an example. ‘2’ indicates two CBAM modules, and ‘Conv1, Conv2_x’ indicates that these two modules are placed after the Conv1 layer and the Conv2_x layer, respectively. At most one CBAM module follows each residual convolution block. The experimental results show that transfer performance is best when one CBAM module is added after each of the Conv1 and Conv4_x layers of the ResNet17 model. The architecture of the proposed TL_CBAM_ResNet17 model is shown in Fig. 4.
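Table 3's notation describes a subset of blocks, each followed by at most one CBAM module. Under that rule, the full search space can be enumerated as below. Table 3 is not reproduced here, so it is not known how many of these configurations it actually lists.

```python
from itertools import combinations

BLOCKS = ["Conv1", "Conv2_x", "Conv3_x", "Conv4_x"]

def placements(blocks=BLOCKS):
    """Yield (number of CBAM modules, blocks they follow) for every non-empty placement."""
    for n in range(1, len(blocks) + 1):
        for combo in combinations(blocks, n):
            yield n, combo

all_configs = list(placements())
print(len(all_configs))                          # 15 candidate placements
print((2, ("Conv1", "Conv4_x")) in all_configs)  # True: the placement reported as best
```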

Architecture of proposed TL_CBAM_ResNet17 model.
A novel cross-scenario transfer model (TL_CBAM_ResNet17) is proposed for intelligent fault diagnosis, as shown in Fig. 4. The flowchart of the proposed method is shown in Fig. 5. The procedure is summarized as follows:

Flowchart of proposed method.
Step 1: In data preprocessing, a wavelet transform (WT) is used to convert the experimental and simulation signals into time–frequency images.
Step 2: Datasets from the source and target domains are fed into the TL_CBAM_ResNet17 model, and all parameters are initialized.
Step 3: The roles of the source and target domain datasets are swapped, and cross-scenario transfer experiments are conducted in both directions to verify the performance of the proposed method.
Step 4: Grad–CAM and t-SNE methods are used to visually explain the decisions made by the proposed method.
Acquisition and preprocessing of data
Acquisition of experimental data from reciprocating compressor
The experimental system consists of a reciprocating compressor, signal monitor, data acquisition unit, and laptop, as shown in Fig. 6. The monitoring point of the vibration acceleration sensor is located in the radial direction of the crosshead of the reciprocating compressor; the sampling frequency is 20 kHz. During the experiment, the collected vibration acceleration signals were sequentially transmitted to the signal monitor and data acquisition unit through the signal line, and to the laptop for data collection and analysis.

Experimental system: ① Laptop; ② Signal monitor; ③ Data acquisition unit; ④ Reciprocating compressor.
Normal operation of the reciprocating compressor and the crosshead pin clearance fault are reproduced experimentally using normal and faulty crosshead pin components. The parameters are presented in Table 4.
Crosshead pin parameters
Twelve groups of samples were collected in the experiment, covering a normal state (15.80 mm) and a fault state (15.50 mm) under each of six working conditions. Each group contains 20,000 points, i.e., 1 s at the 20 kHz sampling frequency. The working conditions are denoted as A, B, C, D, E, and F, as shown in Table 5.
Working condition parameters
The laboratory reciprocating compressor test bench serves as the physical prototype; its kinematic pair structure is shown in Fig. 7. The components were modeled in SolidWorks 2020 based on the technical parameters in the test bench manual and the measured dimensions of the kinematic pair parts. They comprise the piston, crosshead, crosshead pin, connecting rod, big head bush of the connecting rod, crankshaft pin, and crankshaft. The parts were assembled from left to right using coaxial, parallel, and coincident mates. The assembly was then checked to confirm that the kinematic pairs could move without interference, as shown in Fig. 8.

Motion pair structure diagram of reciprocating compressor testbed: ① piston assembly; ② crosshead; ③ connecting rod; ④ big head bush of connecting rod; ⑤ crankshaft pin; ⑥ crankshaft.

Rigid–flexible reciprocating compressor coupling model: ① piston; ② piston rod; ③ crosshead; ④ connecting rod; ⑤ crankshaft; ⑥ crosshead pin (ANSYS flexible body); ⑥ crankshaft pin; ⑦ big head bush of connecting rod.
After the 3-D model was established in SolidWorks, it was imported into ADAMS to define the kinematic pairs, loads, and drives, and the motion curves of the required parts were obtained. A completely rigid model does not meet the accuracy requirements when component deformation must be considered, so some components were modeled as flexible bodies. Modal analysis of the crosshead pin was conducted in ANSYS. The material, density, elastic modulus, and Poisson’s ratio of the crosshead pin were set to 40Cr, 7.82 × 10⁻⁶ kg/mm³, 2.06 × 10⁵ N/mm², and 0.290, respectively. An intelligent mesh with 4,456 nodes and 23,063 elements was created and exported as a modal neutral file (MNF). The MNF was imported into ADAMS to replace the original rigid component, yielding the rigid–flexible coupling model.
Contact between the flexible crosshead pin and the rigid bodies was calculated using the IMPACT function. The force exponent was 1.5, the damping coefficient was 1% of the contact stiffness, and the penetration depth at which full damping is reached was 0.01 mm. Coulomb friction was selected for frictional damping, with a static friction coefficient of 0.13 and a dynamic friction coefficient of 0.09 [34]. The solver was the default GSTIFF integrator in ADAMS 2018, which gives satisfactory results for conventional problems. To make the simulation more accurate, the piston load was defined with the AKISPL spline function AKISPL(time, 0, SPLINE_1, 0). The simulation time was set to 1 s with a step size of 5.0 × 10⁻⁵ s, and acceleration curves were obtained for the normal and fault states. The simulated working conditions were consistent with those of the test bench and are labeled A1, B1, C1, D1, E1, and F1.
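To make the contact parameters concrete, the sketch below writes out the penalty-type contact law commonly documented for the ADAMS IMPACT(x, ẋ, x1, k, e, cmax, d) function, together with its cubic STEP ramp. It is an illustration based on that documented form, not the solver's code. Units are assumed to be N, mm, and s. The contact stiffness k is not given in the text, so the value used in the check is hypothetical. The exponent e = 1.5, cmax = 1% of k, and d = 0.01 mm follow the settings above.

```python
def step(x, x0, h0, x1, h1):
    """Cubic STEP ramp: h0 for x <= x0, h1 for x >= x1, smooth (zero slope at ends) in between."""
    if x <= x0:
        return h0
    if x >= x1:
        return h1
    d = (x - x0) / (x1 - x0)
    return h0 + (h1 - h0) * d * d * (3.0 - 2.0 * d)

def impact_force(x, xdot, x1, k, e, cmax, d):
    """Penalty contact force; x is the gap coordinate and contact starts when x < x1."""
    if x >= x1:
        return 0.0  # no penetration, no force
    penetration = x1 - x
    # Damping ramps from 0 at first contact to cmax once the penetration reaches d
    damping = step(x, x1 - d, cmax, x1, 0.0)
    return k * penetration ** e - damping * xdot  # xdot < 0 while the bodies approach

K = 1.0e5  # hypothetical contact stiffness (N/mm^1.5); not stated in the text
force = impact_force(x=0.98, xdot=-1.0, x1=1.0, k=K, e=1.5, cmax=0.01 * K, d=0.01)
```

In this example, the penetration (0.02 mm) exceeds d, so the full damping coefficient applies, and the force is the elastic term plus cmax times the approach speed.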
Taking working condition A as an example, the time-domain waveforms of the experimental and simulation signals are compared in Figs. 9 and 10. The overall waveform trends and the locations of the apparent impacts are generally consistent between experiment and simulation, supporting the validity of the simulation model.

Experimental signals.

Simulation signals.
During signal preprocessing, the experimental and simulation signals were normalized, and samples were extracted using a sliding window. Each sample is 1024 points long, and 101 samples were extracted for each state. The cmor3-3 complex Morlet wavelet was used as the wavelet basis function, and the one-dimensional signals were transformed into two-dimensional time–frequency images by the WT. The experimental and simulated signals each cover six working conditions, with normal and fault data in each working condition. The experimental dataset therefore contains 1212 samples (6 × 2 × 101), as does the simulation dataset. The dataset division and transfer strategies across scenarios are shown in Table 6.
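The preprocessing chain can be sketched with NumPy under several stated assumptions. Min-max normalization, evenly spaced windows, and the scale range are assumptions, because the text gives only the window length, the window count, and the wavelet name. The CWT is a direct convolution with the complex Morlet wavelet in the 'cmorB-C' form used by PyWavelets (B = C = 3 for cmor3-3). PyWavelets itself offers `pywt.cwt(x, scales, 'cmor3-3')`, but its normalization and sign conventions may differ from this sketch.

```python
import numpy as np

FS = 20_000        # test-bench sampling frequency (Hz)
N_POINTS = 20_000  # points per recorded group (1 s at 20 kHz)
WIN, N_WIN = 1024, 101

# The simulation (1 s with a 5.0e-5 s step) also yields 20,000 intervals, i.e. an equivalent 20 kHz rate
assert round(1.0 / 5.0e-5) == N_POINTS

def minmax(x):
    """Scale a record to [0, 1]; the normalization actually used is not stated (assumption)."""
    return (x - x.min()) / (x.max() - x.min())

def sliding_windows(x, win=WIN, count=N_WIN):
    """Cut `count` evenly spaced windows of length `win`; the stride is an assumption."""
    stride = (len(x) - win) // (count - 1)
    return np.stack([x[i * stride:i * stride + win] for i in range(count)])

def cmor(t, b=3.0, c=3.0):
    """Complex Morlet 'cmorB-C': exp(-t^2/B) * exp(j*2*pi*C*t) / sqrt(pi*B)."""
    return np.exp(-t**2 / b) * np.exp(2j * np.pi * c * t) / np.sqrt(np.pi * b)

def cwt_cmor(x, scales, b=3.0, c=3.0):
    """Direct-convolution CWT: W(s, n) = s^(-1/2) * sum_m x[m] * conj(psi((m - n) / s))."""
    out = np.empty((len(scales), len(x)), dtype=complex)
    for row, s in enumerate(scales):
        half = int(np.ceil(4 * np.sqrt(b) * s))  # envelope exp(-t^2/B) < 1e-6 beyond |t| = 4*sqrt(B)
        t = np.arange(-half, half + 1) / s
        kernel = np.conj(cmor(t[::-1], b, c)) / np.sqrt(s)
        full = np.convolve(x, kernel, mode="full")
        out[row] = full[half:half + len(x)]      # column n is centred on sample n
    return out

def pseudo_frequency(scales, c=3.0, fs=FS):
    """Centre frequency (Hz) of the cmor wavelet at each scale: C * fs / s."""
    return c * fs / np.asarray(scales, dtype=float)
```

For one 1024-point window, `np.abs(cwt_cmor(window, np.arange(6, 121)))` gives a scale × time magnitude map covering roughly 10 kHz down to 500 Hz. That map would then be resized (for example, to 96 × 96) and rendered as a 3-channel image. The resizing step is omitted here.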
Dataset division and transfer strategies across scenarios
For example, A⟶A1 in Table 6 denotes transfer from the experimental signal to the simulation signal under the working condition with a rotating speed of 100 rpm and an exhaust pressure of 20 psi. Here, A is the source domain dataset and A1 is the target domain dataset. Conversely, A1⟶A denotes transfer from the simulation signal to the experimental signal under the same working condition, with A1 as the source domain dataset and A as the target domain dataset. The transfer strategies for the other working conditions follow the same pattern.
The TL_CBAM_ResNet17 network parameters were set as follows: batch size = 16, epochs = 200, and CrossEntropyLoss() as the loss function. The Adam optimizer was used to accelerate network training, with a learning rate (lr) of 0.001 and the default exponential decay rates β1 = 0.9 and β2 = 0.999. To prevent oscillation of the accuracy during training, the lr was adjusted with the equal-interval lr_scheduler.StepLR method, decaying to one-tenth of its current value every 7 epochs. The TL_CBAM_ResNet17 model loaded the experimental and simulation datasets simultaneously, and the input image size was 96 × 96 × 3.
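The equal-interval schedule can be written as a one-line formula. The sketch assumes the scheduler is stepped once per epoch, as is conventional for StepLR; the text does not state this explicitly.

```python
def step_lr(epoch, base_lr=1e-3, step_size=7, gamma=0.1):
    """Learning rate in effect during 0-based `epoch` under equal-interval (step) decay.

    Mirrors torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
    when scheduler.step() is called once at the end of every epoch.
    """
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 6, 7, 14, 28, 199):
    print(epoch, f"{step_lr(epoch):.1e}")
```

Under this assumption, the lr falls to 10⁻⁷ by epoch 28 and to about 10⁻³¹ by the last of the 200 epochs. Most later epochs therefore make only negligible parameter updates.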
Cross-scenario mutual-transfer diagnostic experiments
Reciprocating compressor signals are nonlinear and non-stationary [35], and feature indexes of the vibration signals are obtained through time-domain and frequency-domain analysis. Following reference [36], eight time-domain features and eight frequency-domain features were combined into a multi-dimensional feature set with 16 feature indexes in total. The feature indexes include the standard deviation, mean value, root mean square value, square root amplitude, peak value, skewness, kurtosis value, and kurtosis coefficient. This feature set is used for comparison with the non-transfer method support vector machine (SVM) and the non-deep transfer methods TCA and JDA. The deep transfer models TL_ResNet50, TL_ResNet17, TL_SAM_ResNet17, and TL_CAM_ResNet17 were also established, with batch size = 16, epochs = 200, and CrossEntropyLoss() as the loss function. The Adam optimizer was used to accelerate network training, with lr = 0.0001 and the default decay rates β1 = 0.9 and β2 = 0.999. The input time–frequency image size was 224 × 224 × 3 for TL_ResNet50 and 96 × 96 × 3 for TL_ResNet17, TL_SAM_ResNet17, and TL_CAM_ResNet17.
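Below is a minimal NumPy sketch of the time-domain indexes listed above for one 1-D signal segment. The formulas are common definitions from vibration analysis:
- square root amplitude: the squared mean of √|x|;
- kurtosis value: the fourth raw moment;
- kurtosis coefficient: the fourth raw moment divided by RMS⁴.

Reference [36] may define some of these differently. Its frequency-domain indexes are not listed in the text, so they are omitted.

```python
import numpy as np

def time_features(x):
    """Common time-domain indexes of a 1-D segment (definitions assumed, see lead-in)."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std()
    rms = np.sqrt(np.mean(x**2))
    return {
        "std": sd,
        "mean": mu,
        "rms": rms,
        "sqrt_amplitude": np.mean(np.sqrt(np.abs(x))) ** 2,
        "peak": np.max(np.abs(x)),
        "skewness": np.mean((x - mu) ** 3) / sd**3,
        "kurtosis": np.mean(x**4),                       # kurtosis value (fourth raw moment)
        "kurtosis_coefficient": np.mean(x**4) / rms**4,  # about 3 for Gaussian noise
    }
```

Impulsive faults raise the kurtosis coefficient well above the Gaussian value of about 3, which is one reason such indexes are used as hand-crafted inputs for SVM, TCA, and JDA.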
To further verify the transfer diagnosis performance and generalization ability of the proposed method in cross-scenario applications, two experimental schemes were used. In the first, simulation signals are transferred to experimental signals under the same working condition. In the second, experimental signals are transferred to simulation signals under the same working condition. The transfer results are shown in Tables 7 and 8. The running time of each iteration was also measured to show the speed-up of the improved deep model, as shown in Table 9.
Simulation signals transferred to experimental signals in same working condition
Experimental signals transferred to simulation signals in same working condition
Running time for each iteration of deep transfer learning models
Take A1⟶A in Table 7 as an example. With the sixteen manually extracted features, the non-transfer method SVM achieves an accuracy of 39.60%, and the non-deep transfer methods TCA and JDA achieve 52.97% and 55.45%, respectively. In contrast, all the deep transfer models (TL_ResNet50, TL_ResNet17, TL_SAM_ResNet17, TL_CAM_ResNet17, and TL_CBAM_ResNet17) exceed 70% accuracy. The proposed TL_CBAM_ResNet17 model achieves the highest accuracy (88.61%). This is 17.82, 13.86, 15.34, and 3.96 percentage points higher than that of the TL_ResNet50, TL_ResNet17, TL_SAM_ResNet17, and TL_CAM_ResNet17 models, respectively. As Tables 7 and 8 show, the cross-scenario transfer results for the other working conditions are consistent with those for A1⟶A. This indicates that deep transfer models, which learn directly from the original data and extract features automatically, are more accurate than methods based on manual feature extraction. Moreover, the proposed TL_CBAM_ResNet17 model has the strongest comprehensive feature extraction ability and the highest accuracy in cross-scenario transfer diagnosis.
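The A1⟶A figures can be cross-checked with simple arithmetic. The sketch assumes that the A1⟶A target set contains all 2 × 101 = 202 samples of working condition A. Table 6 is not reproduced here, so this is an assumption. The sketch recovers the other deep models' accuracies from the stated gaps, treated as percentage points. It then finds the number of correctly classified samples implied by each reported accuracy.

```python
N_TARGET = 2 * 101  # assumed A1->A target set: normal + fault samples of condition A

reported = {"SVM": 39.60, "TCA": 52.97, "JDA": 55.45, "TL_CBAM_ResNet17": 88.61}
gaps = {"TL_ResNet50": 17.82, "TL_ResNet17": 13.86, "TL_SAM_ResNet17": 15.34, "TL_CAM_ResNet17": 3.96}

# The stated gains are differences in percentage points relative to TL_CBAM_ResNet17
for name, gap in gaps.items():
    reported[name] = round(reported["TL_CBAM_ResNet17"] - gap, 2)

def implied_correct(acc, n=N_TARGET):
    """Number of correct samples whose percentage rounds to `acc` (two decimals), or None."""
    for k in range(n + 1):
        if round(100 * k / n, 2) == acc:
            return k
    return None

for name, acc in reported.items():
    print(f"{name:18s} {acc:6.2f}%  ->  {implied_correct(acc)} / {N_TARGET}")
```

Under this assumption, every reported value corresponds to a whole number of correctly classified samples (for example, 179 of 202 for TL_CBAM_ResNet17). Since one sample is worth about 0.5 percentage points, the 3.96-point margin over TL_CAM_ResNet17 amounts to 8 samples.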

Classification accuracy in different working conditions.
As shown in Table 9, the improved models require much less running time per iteration than the baseline ResNet50 model. This indicates that the proposed ResNet17 model effectively reduces the number of model parameters and accelerates training. Integrating different attention mechanism modules into ResNet17 changes the running time per iteration only slightly.
Visualization of Grad–CAM focus area
Gradient-weighted class activation mapping (Grad–CAM) [37] visualizes the regions of the input that contribute most to a neural network's output. It gives an intuitive view of the features learned by the CNN and of its decision process. The two-dimensional time–frequency image is used as the input sample of the deep model, and the regions of interest identified by the model are visualized, as shown in Fig. 12.
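The core Grad–CAM computation is short once two arrays are available: the activations of the chosen convolutional layer, and the gradients of the class score with respect to those activations. In a framework, both are usually captured with hooks. The NumPy sketch below assumes these two arrays are given. Which layer was visualized in Fig. 12 is not stated, and upsampling the map to the input size for overlay is omitted.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map for one image.

    activations: (K, H, W) feature maps A^k of the chosen convolutional layer
    gradients:   (K, H, W) gradients d y^c / d A^k of the target class score y^c
    """
    weights = gradients.mean(axis=(1, 2))                              # alpha_k: pooled gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    if cam.max() > 0:
        cam = cam / cam.max()                                          # scale to [0, 1] for display
    return cam
```

The ReLU keeps only regions that increase the class score, which is why Grad–CAM highlights evidence for a class rather than against it.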

Focus area visualization.
Fig. 12 compares the visualization results of the baseline ResNet50 model, the ResNet17 model, and the ResNet17-based integrated networks (ResNet17+SAM, ResNet17+CAM, and ResNet17+CBAM). The improved ResNet17 model covers the relevant feature regions noticeably better than the baseline ResNet50 model. The visualization of the ResNet17+SAM model is similar to that of the baseline ResNet50 model, and the ResNet17+CAM model performs better than the ResNet17 model. The proposed ResNet17+CBAM model focuses on a wider range of features than the ResNet17+CAM model, indicating that the features it learns are more comprehensive and accurate.
To better analyze the distribution and transferability of fault features in the two scenarios, the t-distributed stochastic neighbor embedding (t-SNE) [38] algorithm is used to visualize the extracted features. The eight models are compared for the A1⟶A task, as shown in Fig. 13.

Feature visualization.
The proposed TL_CBAM_ResNet17 uses an MK–MMD criterion to map the sample features of the source and target domains into an RKHS for domain adaptation. This substantially reduces the distribution discrepancy between the two domains. Compared with the other models, TL_CBAM_ResNet17 clusters samples of the same fault type more tightly and separates different fault types more clearly.
Data distribution discrepancy across scenarios weakens the expressive ability and generalization of traditional deep transfer models in a new scenario. This study proposed a cross-scenario transfer method for reciprocating compressors based on CBAM and ResNet to improve the expressive ability and generalization of a deep transfer model. Because the ResNet50 model has many redundant parameters, a miniature deep model, ResNet17, was constructed by removing redundant parameters to improve running speed. The number and positions of CBAM modules integrated into the ResNet17 model were analyzed, and the integration structure of the attention mechanism and the residual network was determined to improve the model's expressive ability. An MK–MMD criterion was added before the last layer of the ResNet17 model. This significantly reduced the data distribution discrepancy between the source and target domains and improved the transfer diagnosis performance of the deep transfer model. Experimental comparisons indicate that the classification performance of the proposed TL_CBAM_ResNet17 model is significantly better than that of the other models, verifying its effectiveness and superiority. A limitation of the proposed model is that it has been applied only to two-dimensional images and has not been experimentally verified on one-dimensional data. In future research, a model that can process both one-dimensional data and two-dimensional images will be developed. Field datasets covering different compressor models, working conditions, and reciprocating compressor fault types will be collected to verify the robustness of the model and to provide new methods for field fault detection of reciprocating compressors.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 51674277) and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-05-02).
