Abstract
With advances in detection technology, modern industrial equipment is monitored by various types of sensors. Compared with sensors of a single type, heterogeneous sensors can collect more comprehensive and complementary fault information. However, heterogeneous sensor data collected at industrial sites exhibit large distribution differences and heavy noise contamination, which complicates the design of heterogeneous data fusion strategies. To address the large distribution difference between the feature spaces of heterogeneous data and the resulting difficulty of fusing fault information effectively, this paper presents a multi-scale deep coupling convolutional neural network (MDCN), which maps heterogeneous fault information from different feature spaces into a common space for full fusion. Specifically, a multi-scale convolution module (MSC) with multiple filters of different sizes is adopted to extract multi-scale fault features from heterogeneous sensor data. Then, the maximum mean discrepancy (MMD) is applied in the coupling layer to measure the distance between features from different spaces, and the common fault information in the heterogeneous data is mined by minimizing the MMD, so that the features are fused effectively to identify the fault state of the device. The validity of this method is verified with data collected on a one-stage parallel gearbox mixed-fault experimental platform.
Introduction
As a research hot spot in modern industry, condition monitoring and fault diagnosis of rotating machinery play an important role in ensuring the safe and stable operation of industrial production [1, 2]. Because of long-term operation under varying loads and speeds, key mechanical components such as bearings and gears tend to develop fatigue cracks, pitting and other failures as their service life increases [3, 4]. These failures may appear in a single component, or different types of faults may occur in different components at the same time, and such combined faults make effective machine state recognition challenging [5]. An efficient and reliable system for monitoring and diagnosing the key components of mechanical equipment is therefore needed to reduce operation and maintenance costs and to avoid heavy production losses [6, 7].
In recent years, progress has been made in sensor-based real-time monitoring, condition monitoring, and control and optimization of complex industrial processes with respect to key performance indicators [8]. The development of artificial intelligence, machine learning and related technologies has prompted a shift from traditional corrective maintenance to more effective condition-based maintenance (CBM) centered on prognostics and health management (PHM) [9]. Moreover, as mechanical systems become more complex and integrated, information from a single sensor cannot fully reflect the operating status of the system. In industry, different types of sensors are therefore placed at key parts of a mechanical system to collect various types of fault information [10]. These sensors produce large amounts of heterogeneous data, such as acoustic, vibration and speed signals, and data fusion technology is widely applied to extract information from multiple sensors because of its inherent advantages [11, 12]. Data fusion can be carried out at the data level, the feature level or the decision level [13]. If multiple signals are collected by sensors of the same type, they can be fused directly at the data level. Data collected by different types of sensors, however, have different physical characteristics, so feature-level or decision-level fusion is chosen according to the specific situation [14]. Compared with same-type sensor data, which capture fault information from only one space, heterogeneous data contain redundant and complementary information from multiple spaces and can therefore characterize the operating status of the equipment more comprehensively. However, when multiple signal sources are fused, not all of the combined information is usually complementary. 
Because of external noise interference, different sensors may convey inconsistent or even contradictory information, leading to wrong decisions [15]. Therefore, mining the correlations and common information among different sensor signals for effective condition monitoring remains a major challenge in information fusion [16].
Intelligent fault diagnosis methods have been widely studied and reported in the literature because they can capture the state information of mechanical equipment by learning from historical sensor data. Praveen Kumar et al. [17] used a decision tree algorithm to select the optimal wavelet features from sound, vibration and acoustic emission signals for fusion, and trained artificial neural network (ANN), support vector machine (SVM) and probabilistic support vector machine (PSVM) classifiers to compare their performance in gearbox fault diagnosis. Using SVM and the short-time Fourier transform (STFT), Banerjee et al. [18] developed a fault signal classification method based on feature-level data fusion. Wu et al. [19] proposed a multi-indicator fusion dynamic feature matching model based on online wear debris images to identify the evolution of the wear state over the service life. Using Bayesian networks, Krishnamoorthy et al. [20] developed a multi-sensor system to monitor sensor faults and process faults simultaneously. In [21], a composition spectrum data fusion method was proposed, which simplified the fault diagnosis process of rotating machinery and improved the performance of condition monitoring. Although these traditional information fusion methods for intelligent diagnosis have achieved some success in condition monitoring, several inherent shortcomings remain to be resolved [22]. (1) Most traditional methods depend on the quality of manual feature selection, which requires extensive signal processing techniques and diagnostic expertise and may miss important information. (2) Feature engineering and pattern recognition are designed and executed separately; this decoupled design complicates the diagnosis process and degrades state recognition performance. 
(3) Most traditional intelligent diagnosis methods perform only shallow learning and therefore struggle to characterize the complex relationship between sensor information and fault status [23, 24].
Deep learning is an attractive choice for processing mechanical big data in multimodal information fusion because it automatically discovers representative features through multiple nonlinear layers and does not depend on a specific application domain, which effectively overcomes the shortcomings of traditional intelligent fault diagnosis [25]. Chopra and Yadav [26] employed a sparse autoencoder (SAE) to extract unsupervised features from data collected independently at four different locations on an engine and applied majority voting to decide the engine fault type. Chen et al. [27] adopted a channel stacking method, feeding the raw horizontal and vertical vibration data into two channels of a convolutional neural network (CNN) for fusion, and thereby realized automatic feature extraction and diagnosis. Zhang et al. [28] stacked the frequency spectra of multiple vibration signals in parallel and used a deep belief network (DBN) to estimate the remaining useful life of a ball screw under seven different degradation states. Chen and Li [29] further improved the reliability of fault diagnosis by extracting time-domain and frequency-domain features of different vibration sensor signals with multiple SAEs and feeding the fused features to a DBN for training. In [30], Jiao et al. developed a deep coupled dense convolutional network with complementary information, which automatically fuses features extracted in parallel from encoder signals and vibration signals and finally recognizes gearbox fault states. However, most of these deep learning methods fuse only multiple vibration signals; little existing research addresses the fusion of heterogeneous signals, and the correlations and spatial differences between different types of sensor signals are not considered.
Heterogeneous sensor sources contain redundant and complementary information from multiple spaces and can characterize the operating status of equipment more comprehensively than sensor sources of the same type. Therefore, projecting heterogeneous information with different spatial distributions into a common mapping space helps to enhance fault diagnosis performance. To address the large differences in the feature-space distributions of heterogeneous data and the difficulty of fusing their fault information effectively, a multi-scale deep coupling convolutional neural network (MDCN) is proposed. First, multiple convolution kernels of different sizes are used in the same convolutional layer to extract multi-scale fault features. Then, the maximum mean discrepancy (MMD) is applied in the coupling layer to measure the distance between heterogeneous spatial features, and the common fault information in the heterogeneous data is mined by minimizing the MMD. Finally, the heterogeneous sensor features carrying this common fault information are combined to identify the fault state of the equipment.
The main contributions of this work are summarized as follows. (1) A novel multi-scale deep coupling convolutional neural network is proposed for fault diagnosis based on heterogeneous signal fusion, in which a multi-scale convolution module and a deep coupling strategy are applied to fully fuse multi-scale information. (2) In the deep coupling strategy, the MMD is adopted to measure the distance between features from different spaces in the coupling layer, and the common fault information in the heterogeneous data is mined by minimizing the MMD, so that the features are fused effectively to identify the fault state of the device. (3) Fault diagnosis experiments were carried out on a gearbox hybrid fault experimental platform, and comparisons with several other methods verify the effectiveness of MDCN.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical background of CNN and the proposed MDCN architecture. Section 3 compares and analyzes the diagnosis results of several methods on a parallel gearbox mixed-fault experimental platform. Finally, conclusions are given in Section 4.
Theory background
Typical CNN structure
As a representative deep learning model, the CNN is widely applied in fault diagnosis because it can automatically learn features and perform translation-invariant classification of input information through a multi-layer nonlinear structure [31]. The input of a one-dimensional CNN naturally matches a sensor signal sequence, and various improved one-dimensional CNN architectures have been studied to obtain better diagnostic results [32]. Figure 1 shows a typical one-dimensional CNN structure, which includes one or more convolution-pooling modules and fully connected layers, followed by a classifier that computes the output probability of each category [33]. The typical one-dimensional CNN structure is described as follows:

Architecture of a typical one-dimensional CNN.
A convolution-pooling module consists of a convolutional layer and a pooling layer. In the convolutional layer, several convolution kernels with shared weights slide over the input to generate feature maps in a high-dimensional space, and the more abstract representation obtained in this way aids pattern recognition [34]. Specifically, the convolutional layer operation is formulated as follows:
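As an illustration of the convolutional-layer operation, the minimal sketch below (Python, standard library only) computes one output channel of a 1-D convolutional layer on a single-channel input: a shared-weight kernel slides over a zero-padded input with a given stride, and each weighted sum plus bias passes through an activation. The ReLU activation, the function names and the single-channel simplification are assumptions for illustration; like most deep-learning frameworks, the sketch computes a cross-correlation (the kernel is not flipped).

```python
def relu(z):
    """Rectified linear unit, a common CNN activation (assumed here)."""
    return z if z > 0.0 else 0.0


def conv1d(signal, kernel, bias=0.0, stride=1, pad=0):
    """One output channel of a 1-D convolutional layer on a single-channel input.

    Computes relu(sum_j kernel[j] * x[start + j] + bias) for each window start,
    i.e. a cross-correlation, as in common deep-learning frameworks.
    """
    # Zero-pad both ends of the input sequence.
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    out = []
    # Slide the shared-weight kernel over the padded input with the given stride.
    for start in range(0, len(padded) - k + 1, stride):
        window = padded[start:start + k]
        z = sum(w * x for w, x in zip(kernel, window)) + bias
        out.append(relu(z))
    return out


# A difference kernel responds to the local slope of the input.
print(conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [-1.0, 0.0, 1.0], pad=1))  # [2.0, 2.0, 2.0, 2.0, 0.0]
```

The last output is 0.0 because the zero padding at the right edge produces a negative response, which the ReLU clips.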
The pooling layer usually follows the convolution operation to shrink the feature maps, reduce information redundancy, and retain classification-sensitive, spatially invariant features. This down-sampling operation not only greatly reduces the number of parameters in subsequent layers and improves computational efficiency, but also helps prevent the network from over-fitting. Pooling replaces the network output at a given location of a feature map with a summary statistic of the feature at that location and its neighboring features. Max pooling is the most common choice in CNNs, and it is calculated as follows:
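Max pooling can be sketched in the same style: each window of `pool_size` neighboring values in a feature map is replaced by its maximum. The window size and stride of 2 are illustrative defaults, not the settings listed in Table 1.

```python
def max_pool1d(feature_map, pool_size=2, stride=2):
    """1-D max pooling over a single feature map (non-overlapping by default)."""
    out = []
    for start in range(0, len(feature_map) - pool_size + 1, stride):
        # Each local neighbourhood is summarised by its largest activation.
        out.append(max(feature_map[start:start + pool_size]))
    return out


print(max_pool1d([1, 3, 2, 5, 4, 0]))  # [3, 5, 4]
```

With `pool_size = stride = 2` the feature map is halved, so a 64-point map becomes 32 points.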
The fully connected layer and the classifier form the classification part of the CNN model. The fully connected layer maps the high-level abstract features extracted by the preceding layers to the sample label space, which helps the classifier assign each sample to the correct class. As the generalization of the logistic regression model to multi-class problems, softmax readily computes the probability distribution of each sample over the different categories. Assuming there are k categories, the probability that a sample belongs to each category is:
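A minimal, numerically stable softmax over k class scores (logits) is sketched below; subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow. The function name is illustrative.

```python
import math


def softmax(logits):
    """Convert k class scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probs])
```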
With softmax as the classifier, a corresponding cost function must be chosen to train the model and update its parameters. The cross-entropy loss function is commonly used together with softmax to measure the difference between the class probability distribution predicted by the model and the target class distribution. The formula is as follows:
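Because the target distribution is one-hot, the cross-entropy reduces to the negative log-probability of the true class, averaged over a mini-batch. The sketch below assumes the probabilities come from a softmax layer; the small `eps` floor, which prevents `log(0)`, is an implementation assumption.

```python
import math


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target.

    With a one-hot target, only the true-class term survives: -log(p[label]).
    `eps` guards against log(0).
    """
    return -math.log(max(probs[label], eps))


def batch_cross_entropy(batch_probs, labels):
    """Mean cross-entropy over a mini-batch, the quantity minimised in training."""
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)


# A confident correct prediction costs little; a confident wrong one costs a lot.
print(cross_entropy([0.05, 0.90, 0.05], 1), cross_entropy([0.05, 0.90, 0.05], 0))
```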
Because of changing loads and the coupling between different parts of mechanical equipment, multi-sensor signals collected in industrial settings are usually highly nonlinear and complex, contain multiple vibration modes, and carry mechanical fault characteristics distributed across different scales [36]. Convolution kernels of different sizes have different receptive fields and therefore extract different local features. On this basis, a multi-scale convolution module (MSC) is proposed to learn multi-scale characteristics from multi-sensor signals simultaneously. Specifically, convolutional layers with kernels of different sizes learn abstract features at various scales in parallel, and the multi-scale features are then fused across channels by convolution. The module aims to mine richer features of the sensor signals at multiple scales, thereby enhancing the robustness of fault feature recognition. The MSC is shown in Fig. 2. First, three parallel convolution blocks with different kernel sizes learn features at different scales from the output feature map of the previous layer at the same time (in this paper, the kernel sizes are 1 × 1, 1 × 3 and 1 × 5), so that each convolution captures local information at a different scale. Then, the learned feature maps of the different scales are concatenated along the channel dimension to integrate the information.

Multi-scale convolution module.
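The sketch below illustrates the MSC idea on a single-channel input: parallel branches with kernel sizes 1, 3 and 5 each preserve the input length through zero "same" padding with stride 1, so their output maps can be stacked along the channel dimension. The stride of 1, the "same" padding, the ReLU activation and the dictionary-of-kernels interface are assumptions for illustration; the actual MSC operates on multi-channel feature maps with learned kernels and the strides given in Table 1.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """Stride-1 1-D convolution with zero 'same' padding (odd kernel sizes), then ReLU."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [max(0.0, sum(w * x for w, x in zip(kernel, padded[i:i + k])) + bias)
            for i in range(len(signal))]


def multi_scale_conv(signal, branches):
    """Apply parallel branches with different kernel sizes and stack them as channels.

    `branches` maps a kernel size (e.g. 1, 3, 5) to a list of kernels of that size.
    Every branch keeps the input length, so the maps can be concatenated along
    the channel dimension.
    """
    channels = []
    for size, kernels in branches.items():
        for kernel in kernels:
            assert len(kernel) == size, "kernel length must match its branch size"
            channels.append(conv1d_same(signal, kernel))
    return channels


signal = [1.0, 2.0, 3.0, 4.0]
branches = {1: [[1.0]], 3: [[1.0, 1.0, 1.0]], 5: [[0.2] * 5]}
maps = multi_scale_conv(signal, branches)
print(len(maps), [len(m) for m in maps])  # 3 [4, 4, 4]
```

Keeping the length identical across branches is what makes channel-wise concatenation possible; with unequal lengths the maps would first have to be cropped or resampled.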
Because different types of sensor signals have different physical characteristics, their features differ in spatial distribution. If the features learned from different types of sensor signals are fused directly, features of different fault classes may become aliased. Therefore, measures must be taken to fuse the fault information of corresponding fault types properly. Inspired by domain adaptation, this paper introduces the maximum mean discrepancy (MMD) to measure the distribution difference between heterogeneous sensor features. MMD maps data with different distributions into a reproducing kernel Hilbert space (RKHS) for measurement [37], and it is defined as follows:
To represent the heterogeneous feature information more accurately in the RKHS, a multi-kernel MMD is constructed by combining kernel functions with multiple bandwidths, as follows:
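For concreteness, the sketch below computes the standard biased empirical estimate of the squared MMD between two feature sets, MMD² = mean k(x, x') + mean k(y, y') − 2 · mean k(x, y), using an equal-weight combination of Gaussian kernels with several bandwidths. The bandwidth values and the equal weighting are assumptions; the exact kernel combination used for MDCN may differ.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multi_kernel(a, b, sigmas):
    # Equal-weight combination of Gaussian kernels with different bandwidths (assumed).
    return sum(gaussian_kernel(a, b, s) for s in sigmas) / len(sigmas)


def mmd2(xs, ys, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Biased empirical estimate of the squared MMD between feature sets xs and ys."""
    def mean_k(p, q):
        return sum(multi_kernel(a, b, sigmas) for a in p for b in q) / (len(p) * len(q))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


# Hypothetical 2-D features from two sensors: the closer the sets, the smaller the MMD.
acc_feats = [[0.0, 0.0], [1.0, 1.0]]
print(mmd2(acc_feats, [[0.5, 0.5], [1.5, 1.5]]), mmd2(acc_feats, [[3.0, 3.0], [4.0, 4.0]]))
```

Minimizing this quantity with respect to the network parameters pulls the two feature distributions toward each other, which is the role of the coupling loss.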
MMD is taken as the distribution-discrepancy loss of the network model, and the value of $L_{mmd}$ is reduced by updating the network parameters, so that heterogeneous features with different distributions are drawn close to each other in space. This avoids unnecessary feature aliasing and improves fault classification accuracy.
Because different types of sensor signals have different physical and statistical characteristics, each type implies a different sensitive feature representation, so a single network with shared parameters cannot adequately learn features from all types of sensor signals. Therefore, this paper proposes a dual-network structure, MDCN, which extracts representative features from two types of heterogeneous sensor data and couples them. The advantage of this model is that it uses a separate subnet for each sensor type while accounting for the correlation between the features of different sensors. Figure 3 shows the overall architecture of MDCN, which is described as follows:

Framework of the proposed MDCN.
As illustrated in Fig. 3, the proposed MDCN consists of two identical subnets, a coupling layer, a fully connected layer and a classifier. Each subnet is composed of an initial convolutional layer and two multi-scale convolution modules. Since the two subnets share the same structure, the parameters of only one subnet are listed. In addition, because the spectrum shows fault frequencies more intuitively and clearly, the spectra of the heterogeneous signals are used as the network inputs. In the first convolutional layer, a large kernel of size 1 × 32 is used to process the frequency-domain signal, so that the extracted high-level features correspond to a larger receptive field and have global attributes [38]. Because the sampling frequencies of heterogeneous signals may differ, different convolution strides can be set in this layer so that the resulting feature maps have the same size. Next, two multi-scale convolution modules extract features at multiple scales, and the two modules use different convolution strides to control the feature map size. A max pooling layer is added after the multi-scale modules to reduce the size of the feature maps. The detailed structural parameters are listed in Table 1. In all convolution operations, zero padding is used together with the convolution stride to set the size of the feature maps. After multi-scale feature learning, the state features with different spatial distributions extracted by the two subnets are mapped into the same distribution space by minimizing the MMD and then concatenated into one vector. A fully connected layer further fuses the heterogeneous features so that heterogeneous features of the same fault type cluster together. Finally, a softmax classifier is applied to recognize the different health states of the mechanical equipment.
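The stride adjustment in the first convolutional layer can be checked with simple length arithmetic. The sketch assumes an input spectrum of 512 points (a one-sided spectrum of a 1024-point sample) and TensorFlow-style "same" padding, where the output length is ceil(n / stride); these values, the explicit padding of 12 in the example, and the helper names are hypothetical and are not taken from Table 1.

```python
import math


def conv_out_len(n, kernel, stride, pad):
    """Output length of a 1-D convolution with explicit zero padding `pad` on each side."""
    return (n + 2 * pad - kernel) // stride + 1


def same_out_len(n, stride):
    """Output length under TensorFlow-style 'same' padding: ceil(n / stride)."""
    return math.ceil(n / stride)


def stride_for_target(n, target_len):
    """Smallest stride whose 'same'-padded output does not exceed target_len (hypothetical helper)."""
    return math.ceil(n / target_len)


# A 1 x 32 kernel with stride 8 on a 512-point spectrum, padded by 12 on each side.
print(conv_out_len(512, 32, 8, 12), same_out_len(512, 8))  # 64 64
# A sensor sampled twice as fast over the same window gives a 1024-point spectrum.
print(stride_for_target(1024, 64))  # 16
```

With 512-point spectra from both sensors, a stride of 8 yields 64-point feature maps in both subnets; a sensor whose spectrum had 1024 points would need a stride of 16 to produce maps of the same size, which is what allows the two branches to be coupled and concatenated.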
Structural parameters of MDCN
Parameter k is the number of health states.
To learn a joint feature representation that follows the same spatial distribution from heterogeneous sensor data, a linear combination of $L_{ce}$ and $L_{mmd}$ is used as the total loss function for training the MDCN model. The total loss function is defined as follows:
Here, α is a regularization parameter that adjusts the weight of the distribution-discrepancy loss of the heterogeneous sensor features.
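Assuming the linear combination takes the common form L = L_ce + α · L_mmd, the total objective can be sketched as below. Setting α = 0 removes the coupling term and leaves only the classification loss, which amounts to plain feature concatenation without coupling; a larger α pulls the two feature distributions together more strongly. The loss values in the example are hypothetical.

```python
def total_loss(ce_loss, mmd_loss, alpha=0.5):
    """Assumed MDCN objective: L = L_ce + alpha * L_mmd.

    alpha = 0 drops the coupling term (plain concatenation fusion);
    larger alpha penalises the feature-distribution mismatch more strongly.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return ce_loss + alpha * mmd_loss


# Hypothetical per-batch loss values, only to show how alpha shifts the balance.
for a in (0.0, 0.5, 1.0):
    print(a, total_loss(0.30, 0.10, a))
```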
During training, the MDCN model follows a typical supervised learning scheme and optimizes the network parameters by minimizing the loss function. Specifically, the network weights $\theta_w$ and biases $\theta_b$ are optimized so that the learned heterogeneous features are projected into the same region and contain discriminative information about the different mechanical health states. The optimization problem can be formulated as follows:
In each mini-batch iteration, the parameters are updated as follows:
Here, $\eta_t$ is the exponentially decaying learning rate, which is updated automatically as training proceeds according to:
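The parameter update and learning-rate schedule can be sketched with plain gradient descent. The sketch assumes the common exponential-decay form η_t = η0 · β^(t / T), with decay rate β and a hypothetical decay period T (`decay_steps`); the defaults η0 = 0.01 and β = 0.99 are the values used in the experiments. The same rule applies to both the weights θ_w and the biases θ_b.

```python
def decayed_lr(step, lr0=0.01, decay_rate=0.99, decay_steps=1):
    """Exponentially decaying learning rate: lr0 * decay_rate ** (step / decay_steps).

    `decay_steps` (hypothetical here) sets how many updates one decay factor spans.
    """
    return lr0 * decay_rate ** (step / decay_steps)


def sgd_step(params, grads, lr):
    """Gradient-descent update theta <- theta - lr * dL/dtheta (weights and biases alike)."""
    return [p - lr * g for p, g in zip(params, grads)]


# Toy check: minimise L(theta) = theta^2, whose gradient is 2 * theta.
theta = [1.0]
for t in range(200):
    theta = sgd_step(theta, [2.0 * theta[0]], decayed_lr(t))
print(theta[0])
```

Whether the decay is applied per mini-batch or per epoch changes how quickly η_t shrinks, so `decay_steps` should be matched to the actual training schedule.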
In summary, the two key features of MDCN are multi-scale feature learning and feature coupling. The two subnets of MDCN learn multi-scale features layer by layer through the MSC, and the coupling layer maps the different multi-scale features into the same representation space to learn similar features from different modalities. The condition of the equipment can then be monitored on the basis of the joint representation learned automatically by the deep network.
To evaluate the diagnostic performance of the developed method, gearbox condition monitoring, a typical rotating machinery application, is taken as an example: a rotating machinery experimental platform for vibration analysis and fault diagnosis was built to collect two kinds of sensor data, vibration acceleration and vibration velocity, for experimental analysis.
Heterogeneous sensing data acquisition
As shown in Fig. 4, a gearbox hybrid fault experimental platform was built, consisting of an AC variable-frequency motor with a power of 0.75 kW, a planetary gearbox, a fixed-shaft speed-change gearbox, and a magnetic powder brake for applying different loads. In the experiment, the planetary gearbox was always kept in a normal state, and different fault states of the fixed-shaft gearbox were monitored. The fixed-shaft gearbox is a one-stage transmission system consisting of a large gear with 75 teeth and a small gear with 55 teeth, both made of S45C steel. To collect heterogeneous sensing signals, an acceleration sensor and a magnetoelectric velocity sensor were installed on the gearbox housing and on the bearing housing of the input shaft, respectively, to collect vibration acceleration and vibration velocity signals. Both signals were sampled at 5120 Hz.

Test platform for vibration analysis and fault diagnosis.
In this experiment, six gearbox operating conditions were simulated: the normal state, single-gear faults (wear, pitting and broken tooth), and compound faults involving both gears. Table 2 lists the detailed fault conditions. The motor speed was set to 880 rpm, and the control current of the magnetic powder brake was set to 0.05 A, 0.1 A and 0.2 A to simulate three different load conditions.
Description of gearbox health conditions
Under each of the three operating loads, 200 training samples and 20 test samples were collected for each gearbox health state, giving a total of 3600 training samples and 360 test samples, each containing 1024 sampling points to cover a full rotation cycle. To visualize the collected heterogeneous signals, one randomly selected sample of the vibration acceleration and vibration velocity signals under the 0.05 A load is shown in the time and frequency domains in Fig. 5. In addition, Gaussian white noise was added to the collected heterogeneous sensing signals at a controlled level to simulate the noisy environment of an industrial site. Following formula 14, heterogeneous sensing signals with an SNR of 0 dB are constructed.

Heterogeneous signals and their spectrums of 6 health conditions. (a) Vibration acceleration signals. (b) Vibration velocity signals. (1) Normal, (2) Big gear pitting, (3) Big gear tooth broken, (4) Big gear tooth broken and pinion wear, (5) Big gear pitting and pinion wear, (6) Pinion wear.
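A common way to construct a noisy signal at a target SNR, using the standard definition SNR_dB = 10 · log10(P_signal / P_noise), is sketched below; whether formula 14 uses exactly this definition is an assumption, and the function name and the 50 Hz test tone are hypothetical. At 0 dB, 10^(0/10) = 1, so the variance of the added noise equals the mean signal power.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return signal + Gaussian white noise with 10*log10(P_signal / P_noise) = snr_db.

    The noise power matches the target in expectation; a finite noise
    realisation will deviate slightly.
    """
    if rng is None:
        rng = random.Random()
    p_signal = sum(x * x for x in signal) / len(signal)  # mean signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))        # target noise power
    sigma = math.sqrt(p_noise)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Example: a hypothetical 50 Hz tone sampled at 5120 Hz, 1024 points (0.2 s), at 0 dB.
fs, n = 5120, 1024
tone = [math.sin(2 * math.pi * 50 * i / fs) for i in range(n)]
noisy = add_noise_at_snr(tone, 0.0, random.Random(0))
```

Passing a seeded `random.Random` instance makes the noise reproducible across repeated trials.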
After the noisy heterogeneous sensor signals are obtained, the fast Fourier transform is applied to convert the vibration acceleration and velocity signals into their frequency spectra, and all spectra are standardized to zero mean and unit variance. In each iteration, a mini-batch of 100 samples is randomly selected from the training set and fed into the two subnets to train the MDCN. Because the two sensors have the same sampling rate, the stride of the first convolutional layer is set to 8 in both subnets. The initial learning rate η0 is set to 0.01, the learning-rate decay rate β to 0.99, and the regularization parameter α to 0.5, and the network is trained for 15 epochs. To reduce the effect of randomness, each trial was repeated ten times.
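The preprocessing step can be sketched as a one-sided magnitude spectrum followed by z-score standardization. For clarity the sketch uses a direct O(n²) DFT from the standard library; a practical pipeline would use an FFT (for example `numpy.fft.rfft`). Standardizing each spectrum separately, rather than across the whole data set, is an assumption. With 1024 samples at 5120 Hz, each spectral bin spans 5120 / 1024 = 5 Hz, and the one-sided spectrum extends to just below the 2560 Hz Nyquist frequency.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided DFT magnitude (bins 0 .. n/2 - 1). O(n^2); real pipelines use an FFT."""
    n = len(signal)
    spec = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spec.append(abs(acc) / n)
    return spec


def standardize(values):
    """Z-score normalisation: zero mean, unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values] if std > 0 else [0.0] * len(values)


def bin_frequency(k, fs=5120.0, n=1024):
    """Frequency in Hz of DFT bin k; the bin spacing fs / n is 5 Hz for these settings."""
    return k * fs / n


# A short cosine that completes exactly 4 cycles in 64 samples peaks at bin 4.
spec = magnitude_spectrum([math.cos(2 * math.pi * 4 * t / 64) for t in range(64)])
features = standardize(spec)
```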
To verify the effectiveness of the multi-scale convolution module and the feature coupling of MDCN, several convolutional neural network structures are designed for comparison. Apart from the network structure, all other settings of these networks are the same as those of the proposed MDCN.
Baseline CNN: A traditional single-branch CNN is first introduced as the baseline for the entire experiment, to assess how the improvements proposed in this paper affect fault diagnosis performance. This network takes a single sensor signal as input and uses a convolutional layer with a kernel size of 1 × 3 instead of the multi-scale convolution module. The networks using the vibration acceleration and vibration velocity signals as inputs are called CNN-V and CNN-S, respectively.
Multi-scale CNN: To verify the effectiveness of the proposed multi-scale convolution module, a multi-scale CNN is built on the basis of the baseline CNN, with a structure similar to that of an MDCN subnet. Likewise, the versions using vibration acceleration and vibration velocity as input are called MCN-V and MCN-S, respectively.
Fusion CNN: To verify the effectiveness of the proposed MMD coupling layer, a traditional information fusion network is designed, which removes the MMD coupling from MDCN and directly concatenates the features learned by the two subnets into a long vector. This method is called MCN-F.
The diagnostic accuracies of the different methods over ten trials are shown in Fig. 6, and the average accuracy and standard deviation are listed in Table 3. The results show that the data fusion strategies achieve significantly higher diagnostic accuracy than the single-sensor schemes. When only the vibration velocity signal or only the vibration acceleration signal is used, the average gear state recognition rates of CNN-S and CNN-V are only 80.86% and 92.69%, respectively. With the MSC, the diagnostic accuracies of MCN-S and MCN-V increase to 82.22% and 94.67%, respectively. This shows that the MSC can extract more sensitive features at different scales than a convolutional layer with a fixed kernel size, which benefits pattern recognition. In addition, the vibration acceleration signal performs better than the vibration velocity signal, which indicates that the gear fault information is mainly distributed in the high-frequency band, whereas the energy of the vibration velocity signal is concentrated in a lower frequency band than that of the vibration acceleration signal, as can be seen in Fig. 5. When the data fusion strategy is used to diagnose the two heterogeneous signals jointly, the average diagnostic accuracy of MCN-F is 98.39%, and MDCN further improves it to 99.50%. This is because MCN-F simply concatenates the features extracted by the individual subnets without considering the correlated information between the heterogeneous signals. When the spatial distributions of the heterogeneous features extracted by the two subnets differ, feature aliasing occurs, making it difficult for the classifier to find a reliable decision boundary. 
As for the standard deviations in Table 3, the models using the MSC and the fusion strategy have smaller standard deviations, and MDCN has the smallest of all methods, which shows that the proposed multi-scale convolution and information fusion through the coupling layer increase the robustness of the model.

Diagnostic accuracy of ten trials with different methods.
Average accuracy and standard deviation of different methods
To analyze the diagnostic performance of the different methods for each gearbox health condition, Table 4 lists the average F1 scores over ten trials for the six gear conditions. Among the single-sensor methods, CNN-S performs poorly on all classes except the fourth (big gear pitting and pinion wear), whose F1 score reaches 95.75%, while CNN-V outperforms CNN-S on every class except the fourth. Compared with CNN-S and CNN-V, MCN-S and MCN-V achieve higher F1 scores overall, reflecting the benefit of multi-scale convolution. In addition, the diagnostic scores of the different fault types differ between the vibration acceleration and vibration velocity signals, indicating that the two heterogeneous sensor signals carry complementary state information. Therefore, a data fusion strategy for joint fault diagnosis can effectively improve model performance: the F1 scores of MCN-F and MDCN are greatly improved in every fault category. In particular, the average F1 score of the proposed MDCN exceeds 99% for every state, with small fluctuations, which indicates that feature mapping and fusion by minimizing MMD is superior to the traditional fusion method.
Average F1 scores of ten trials with different methods
To further compare MDCN coupling-layer fusion with the traditional fusion method, the high-level fusion features learned by the deep structure are visualized: t-distributed stochastic neighbor embedding (t-SNE) [39] is applied to reduce the high-dimensional features of the fully connected layer to two dimensions, as shown in Fig. 7. Figure 7(a) shows that with the traditional fusion method of MCN-F, heterogeneous features of the same fault category are separated, while features of different categories are confused. This occurs because the heterogeneous sensing signals have different spatial distributions; if these distribution differences between heterogeneous features are not eliminated by appropriate methods, the diagnostic accuracy of the model decreases. The proposed MDCN projects heterogeneous features into the same distribution space by measuring their distribution distance and then merges them, which overcomes this shortcoming of traditional fusion methods. As shown in Fig. 7(b), MDCN gathers heterogeneous features of the same fault category together and separates the features of different categories, thereby properly fusing heterogeneous complementary information and improving diagnostic accuracy. The confusion matrix of one MDCN trial is shown in Fig. 8, where the last row gives the recall of each fault category and the last column gives the precision. MDCN obtains high scores on both metrics. Consistent with Fig. 7(b), only one test sample of the third type (big gear tooth broken and pinion wear) is misclassified as the fourth type (big gear pitting and pinion wear) in this trial, possibly because both types contain pinion wear fault information, which makes their extracted features highly similar.

Feature visualization via t-SNE. (a) MCN-F. (b) The proposed MDCN.

Confusion matrix of the proposed MDCN.
In addition, Fig. 9 compares the training loss curves of the six methods. The loss curves of the single-sensor diagnosis methods fluctuate more strongly during convergence. Among them, CNN-S and MCN-S, which take vibration velocity signals as network input, converge more slowly than CNN-V and MCN-V and keep a higher loss throughout training. The loss curves of the data fusion methods fluctuate only slightly during training; for MDCN in particular, the loss decreases steadily, reflecting better model stability. Although the MDCN loss includes an additional distribution-discrepancy term, its final value converges to almost the same level as that of MCN-F. The curves of the two MDCN loss components, Lce and Lmmd, are also shown. Lmmd remains small and decreases steadily, while Lce stays close to the total MDCN loss; both follow the same downward trend as the overall MDCN loss.
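The loss described above can be sketched as follows, assuming that the total MDCN objective has the form L = Lce + α·Lmmd (α being the regularization parameter analyzed next) and that Lmmd is the biased empirical estimate of the squared MMD with a Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the names `gaussian_kernel`, `mmd2`, and `total_loss` are illustrative assumptions rather than details given in this section; in the network itself, the two inputs would be mini-batch features from the two sensor branches at the coupling layer.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between two feature sets."""

    def mean_kernel(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def total_loss(l_ce: float, l_mmd: float, alpha: float = 0.5) -> float:
    """Assumed MDCN objective: cross-entropy plus an alpha-weighted MMD penalty."""
    return l_ce + alpha * l_mmd


if __name__ == "__main__":
    # Hypothetical coupling-layer features from two sensor branches.
    branch_a = [[0.0, 0.0], [0.1, 0.0]]
    branch_b = [[3.0, 3.0], [3.1, 3.0]]
    print(mmd2(branch_a, branch_a))  # identical sets give 0.0
    print(mmd2(branch_a, branch_b))  # well-separated sets give a large value
    print(total_loss(l_ce=0.8, l_mmd=mmd2(branch_a, branch_b)))
```

Minimizing this term pulls the feature distributions of the two branches toward each other, which is the mechanism the t-SNE comparison in Fig. 7 illustrates; a framework implementation would compute the same quantity on tensors so that gradients flow into both branches.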

Training loss curve.
Finally, Fig. 10 shows the influence of the regularization parameter α on the diagnostic accuracy of MDCN. MDCN achieves excellent diagnostic accuracy over a fairly wide range of α. For α < 0.5, the accuracy increases gradually with α; it peaks at α = 0.5 and then remains stable or decreases slightly as α grows further. Therefore, α = 0.5 is used in the experiments of this paper.
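Choosing α as described above amounts to a one-dimensional grid search. The minimal sketch below assumes a hypothetical callback `evaluate` that would retrain MDCN with a given α and return its test accuracy (for example averaged over several trials); the candidate grid and the toy accuracy function are made-up placeholders, not the values behind Fig. 10.

```python
from typing import Callable, Iterable, Optional, Tuple


def select_alpha(
    candidates: Iterable[float], evaluate: Callable[[float], float]
) -> Tuple[float, float]:
    """Return (best_alpha, best_accuracy) over a grid of candidate alpha values.

    `evaluate` is a hypothetical callback: train with the given alpha and
    return the resulting accuracy. Ties keep the first candidate evaluated.
    """
    best_alpha: Optional[float] = None
    best_acc = float("-inf")
    for alpha in candidates:
        acc = evaluate(alpha)
        if acc > best_acc:  # strict ">" keeps the earliest alpha on ties
            best_alpha, best_acc = alpha, acc
    if best_alpha is None:
        raise ValueError("no candidate alpha values given")
    return best_alpha, best_acc


if __name__ == "__main__":
    # Made-up stand-in for a real training run; it simply peaks at alpha = 0.5.
    def toy_accuracy(alpha: float) -> float:
        return 1.0 - (alpha - 0.5) ** 2

    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    print(select_alpha(grid, toy_accuracy))  # (0.5, 1.0)
```

Because the accuracy curve is flat over a wide range of α, the strict comparison matters little in practice; it only makes the choice deterministic when several values tie.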

Test accuracy under different regularization parameter values.
In this paper, an MDCN using heterogeneous sensor data was proposed for intelligent fault diagnosis of rotating machinery. The model adopts an information fusion strategy that minimizes the MMD to project features from different types of sensors into a common distribution space, thereby integrating heterogeneous information effectively. In addition, the deep parallel multi-scale convolution architecture of MDCN extracts sensitive, complementary, and correlated features at multiple scales, which benefits both the fusion of heterogeneous fault information and the recognition of component states. The effectiveness of the proposed method was verified with operating-state data collected under different working conditions on a first-stage parallel gearbox mixed-fault experimental platform. The experimental results show that, compared with single-sensor diagnosis methods, the multi-sensor information fusion strategy significantly improves diagnostic accuracy and reliability. Compared with traditional information fusion methods, the proposed MDCN takes the correlation between heterogeneous signals into account during feature extraction, which effectively avoids aliasing between different types of heterogeneous features and therefore yields better diagnostic performance. Despite this good performance, the method currently fuses only two kinds of heterogeneous sensor signals. Fusing more heterogeneous data sources would inevitably require more network branches and thus reduce computational efficiency. How to fuse data from more heterogeneous sensors effectively without sacrificing computational efficiency will be investigated in future work.
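The parallel multi-scale convolution idea summarized above can be illustrated with a minimal pure-Python sketch: the same input is filtered by kernels of several widths, each output keeps the input length through zero padding, and the results are stacked as separate channels after a ReLU. The filter widths (1, 3, 5), the all-ones weights, the ReLU, and the function names are illustrative assumptions; this section does not give the actual configuration of the MSC module, whose filters would be learned during training.

```python
from typing import List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1-D cross-correlation with zero padding; output length equals input length."""
    k = len(kernel)
    left = (k - 1) // 2  # zero-padding on the left (odd kernels are centred)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            idx = i + j - left
            if 0 <= idx < len(signal):  # positions outside the signal count as zeros
                acc += kernel[j] * signal[idx]
        out.append(acc)
    return out


def multi_scale_block(
    signal: Sequence[float], kernels: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Run parallel filters of different widths and stack the ReLU outputs as channels."""
    return [[max(0.0, v) for v in conv1d_same(signal, kern)] for kern in kernels]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # Three hypothetical filters of widths 1, 3 and 5 (all-ones for readability).
    filters = [[1.0], [1.0] * 3, [1.0] * 5]
    for row in multi_scale_block(x, filters):
        print(row)
    # [1.0, 2.0, 3.0, 4.0]
    # [3.0, 6.0, 9.0, 7.0]
    # [6.0, 10.0, 10.0, 9.0]
```

Wider filters aggregate over longer stretches of the signal, so each channel summarizes the input at a different time scale. In a deep-learning framework the same structure corresponds to several 1-D convolution layers applied in parallel whose outputs are concatenated along the channel axis.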
Acknowledgments
This work was funded by the National Natural Science Foundation of China (Grant Nos. 51875500 and 61973262), the Natural Science Foundation of Hebei Province (Grant Nos. E2020203147 and E2019203146), and the High-Level Talents Funding Project of Hebei Province (Grant No. A201803001).
