Abstract
As the service life of assembly equipment is short, the tightening data it produces are very limited. Therefore, data-driven assembly quality diagnosis remains a challenging task in industry. Transfer learning can be used to address small data problems; however, it places strict requirements on the training dataset that are hard to satisfy. To solve this problem, an Improved Deep Convolution Generative Adversarial Transfer Learning Model (IDCGAN-TM) is proposed, which integrates three modules. The generative learning module automatically produces source datasets from small target datasets using an improved generative adversarial theory. The feature learning module improves feature extraction by building a lightweight deep learning model (DL). The transfer learning module consists of the pre-trained DL and one fully connected layer, so that intelligent quality diagnosis can be performed on small training samples. A parallel computing method is adopted to produce source data efficiently. Real assembly quality diagnosis cases are designed and discussed to validate the advantages of the proposed model. In addition, comparison experiments show that the proposed approach achieves better transfer diagnosis performance than three existing state-of-the-art approaches.
Introduction
Bolts are essential components in mechanical assembly. The quality of a single bolt can directly affect the overall assembly quality and lead to machine failures, equipment damage and casualties. It is therefore essential for assembly companies to carry out condition monitoring and quality diagnosis of the bolt tightening process.
There are three types of methods to monitor and diagnose bolt problems: model-driven, signal-based and data-driven approaches. Model-driven and signal-based methods require sufficient a priori knowledge of bolt properties to design mathematical or signal processing models specifically and manually; they can achieve high diagnosis accuracy but are time-consuming and rely on experts. By contrast, data-driven approaches [1, 2] have become popular because they can monitor and diagnose bolt tightening data automatically without a priori knowledge. However, millions or even billions of samples are required to train such models to high diagnosis accuracy. Therefore, the mismatch between the small amount of bolt tightening data available in real assembly factories and the big data demands of data-driven models limits their application to bolt quality diagnosis.
Transfer learning (TL) makes learning from small data possible: a deep learning model is first trained on sufficient source data, and the small target dataset is then used to fine-tune its parameters. To perform TL well, the discrepancy between the source data and the target data should be small. There are three types of TL methods to narrow the gap between the two data distributions: instance-based TL [3], feature-based TL [4], and model-based TL [5, 6]. Instance-based TL selects samples from the source domain that are similar to the target domain for training; this supplements the small dataset with more effective data, but the selection criterion is hard to define. Feature-based TL seeks a shared feature space in which the common features of the source and target data are preserved; it maps the source data toward the target data through this space, but there is no uniform method for building the feature space. Model-based TL pre-trains the model on the source dataset (labeled, sufficient) and then fine-tunes it on the target data (unlabeled, insufficient).
Many recent works also focus on making transfer learning simpler and more effective by screening the source data or improving the relevance between the two datasets [7, 8]. For feature-based TL, Luo W et al. [4] applied deep convolutional neural networks to preprocess the source data so that, after feature mapping, the source data remain similar to the target data, reducing the risk of negative transfer. The same team also used model-based TL to further improve performance. For instance-based TL, Liu B et al. [9] proposed a Selective Multiple Instance Transfer Learning Method, which ensures the similarity of the two datasets by filtering source samples. In addition, the KL divergence method [10] and the graph matching loss method [11] are widely used to measure and reduce the discrepancy between the source and target distributions.
Although various improved transfer learning methods have emerged to address small data learning, the field still faces bottlenecks. First, model performance depends heavily on the selection of source data, but selecting appropriate source data is hard, time-consuming and sometimes impossible in real assembly industry, because producing and selecting source data cannot be done without human intervention. Second, using multiple similar machines to produce source data is uneconomical for real assembly industry, because each machine is expensive and must be fully utilized for assembly tasks.
Deep Convolutional Generative Adversarial Networks (DCGAN) [12, 13] can learn to generate data samples that follow the distribution of a given domain, which makes it possible to supplement a small dataset by generating new data autonomously.
To address the above problems in real assembly situations, this paper proposes an Improved Deep Convolution Generative Adversarial Transfer Learning Model (IDCGAN-TM), which aims to produce and select high-quality source data autonomously and to achieve better fault diagnosis performance than existing models. IDCGAN-TM consists of three modules: generative learning, feature learning and transfer learning. The generative learning module automatically produces source datasets from small target datasets using generative adversarial theory. The feature learning module improves feature extraction by building a lightweight deep learning model (DL). The transfer learning module transfers the parameters of the model pre-trained in the previous stage to a new model with the same structure and adds a fully connected layer, so that the model can learn from the small target data. With these three modules, IDCGAN-TM can be trained effectively on small sample data without manually preparing massive source data, and parallel computing is adopted to obtain the source data efficiently. To verify the robustness of the proposed model, we built a tightening assembly test platform. The platform generates bolt torque curves for seven different bolt quality problems, and the model is used to learn the typical features of these curves. After training, IDCGAN-TM can be applied directly to online quality diagnosis of the tightening assembly process.
Related works
Basic GAN model
A Generative Adversarial Network (GAN) [14], composed of a generator and a discriminator, is regarded as an effective way to produce desired data autonomously. The generator produces fake data from random noise, and the discriminator compares the generated fake data with real data. The generator tries to fool the discriminator, while the discriminator tries to determine whether its input is real or fake. After each iteration, the generator loss G_Loss and the discriminator loss D_Loss are computed as performance indexes of the two networks and back-propagated to update the weights and biases of the generator and the discriminator, respectively. After many iterations, the fake data produced by the generator become similar to the real data.
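As an illustration of how G_Loss and D_Loss are commonly computed in a basic GAN, the sketch below assumes the standard binary cross-entropy formulation with the non-saturating generator loss of [14]. It is not the modified I-GAN loss of this paper; the function names are hypothetical, and the inputs are discriminator output probabilities rather than outputs of a trained network.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)

def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy: real samples are labeled 1, fake samples 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating loss: the generator is rewarded when D(G(z)) approaches 1."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# giving D_Loss = 2 ln 2 and G_Loss = ln 2 (up to EPS).
balanced_d = discriminator_loss([0.5, 0.5], [0.5, 0.5])
balanced_g = generator_loss([0.5, 0.5])
```

A confident discriminator (high outputs on real data, low on fake data) drives D_Loss down and G_Loss up, which is the signal back-propagated to each network.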
In general, when training reaches a Nash equilibrium [15], the model is considered to have reached the optimum or to have fallen into a local optimum.
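For reference, the value function of the original GAN [14] and its ideal equilibrium can be written as follows. This is the textbook result for the basic GAN, not the modified I-GAN losses of formulas (1) and (2). At the global equilibrium the generator distribution p_g equals the data distribution, and the optimal discriminator outputs 1/2 everywhere, i.e. it can no longer distinguish real from fake data.

```latex
\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
                     + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

D^{*}_{G}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_g(x)},
\qquad
p_g = p_{\mathrm{data}} \;\Rightarrow\; D^{*}_{G}(x) = \tfrac{1}{2},
\quad V(D^{*}_{G}, G) = -\log 4
```

In practice, gradient-based training of two finite networks may settle at a local equilibrium rather than this global one, which is why a local optimum is also possible.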
Transfer learning
The core idea of transfer learning is domain adaptation. A domain refers to the probability distribution over the feature space of a task. Specifically, the source (target) domain can be denoted as P(Y_S | X_S) (P(Y_T | X_T)), which represents the probability distribution of the label space Y_S (Y_T) given the data instances X_S (X_T). If P(Y_S | X_S) = P(Y_T | X_T), the source domain does not need to be adapted to the target domain, and the problem reduces to a traditional machine learning problem. Transfer learning addresses the domain adaptation problem when P(Y_S | X_S) ≠ P(Y_T | X_T). However, two conditions must hold: (1) the source data must be sufficient to pre-train the deep learning model well; (2) the source and target datasets must have a low distribution discrepancy. If either condition is not met, negative transfer [16] may occur, degrading the performance of the transfer learning model.
Architecture of IDCGAN-TM
The logical structure of IDCGAN-TM, shown in Fig. 1, is composed of a generative learning module, a feature learning module and a transfer learning module. In the generative learning module, an I-GAN is proposed to learn important features of the target data and generate abundant related source data. This process actively generates high-quality source data that meet the conditions of transfer learning while avoiding the limitations of the basic GAN. In the feature learning module, a lightweight deep learning model learns to distinguish the key features of source data with different category labels. In the transfer learning module, the deep learning model pre-trained in the feature learning stage is transferred to a new model. After fine-tuning, the final model can distinguish target data and efficiently achieve intelligent diagnosis of small sample data.

The logical structure of the IDCGAN-TM.
The first step of IDCGAN-TM is generative learning (GL). GL aims to produce source data that are highly relevant to the target data, avoiding the influence of negative transfer on the model. We propose an Improved Deep Convolution GAN (I-GAN) as the core model of GL, which generates sufficient eligible source samples. The basic principle of I-GAN is shown in Fig. 2. First, a new generator composed of convolutional networks takes random noise as input and generates fake samples. Then, the real and fake samples are fed to the new discriminator in random order, without revealing their origin. Finally, the discriminator determines whether each input is real or fake. The loss functions of the new generator and discriminator are given in formulas (1) and (2).

The model of I-GAN.
In formula (1), a maximum mean discrepancy (MMD) term [17] is added to G_Loss to measure the discrepancy between the generated data and the real data. τ and μ are the weights of the loss terms in formula (1), and ρ is a decaying weight that alleviates the penalty for regarding generated data as real data. With these loss functions, the generated data are not required to be identical to the real data, and users can control the distribution of the generated data by tuning the hyper-parameters (τ, μ, ρ).
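Formula (1) itself is not reproduced here, so the following is only a minimal sketch of a biased empirical estimate of the squared MMD between a set of generated curves and a set of real curves. It assumes a Gaussian (RBF) kernel with a hand-picked bandwidth `sigma`; the kernel, the bandwidth and the function names are illustrative assumptions, and how the term is weighted inside G_Loss follows formula (1). In an actual I-GAN the same computation would be written in a differentiable framework so that its gradient reaches the generator.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def rbf_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd2(X: List[Vector], Y: List[Vector], sigma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between sample sets X and Y."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in X for b in X) / (len(X) ** 2)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in Y for b in Y) / (len(Y) ** 2)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2.0 * k_xy

# Toy "curves" with two points each: a slightly shifted copy of the real set
# scores a much smaller MMD than a distant set.
real = [[0.0, 0.0], [1.0, 1.0]]
near = [[0.1, 0.1], [1.1, 1.1]]
far = [[3.0, 3.0], [4.0, 4.0]]
```

A smaller value means the generated set is closer in distribution to the real set, which is why adding this term to the generator loss pulls the fake data toward the real data.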
When the generator and the discriminator have comparable learning ability, the generator is likely to produce data that are almost identical to the real samples. Therefore, a decision rule for selecting eligible data is proposed: the similarity degree between the generated data and the real data is calculated, and only fake samples whose similarity falls within an effective similarity degree interval are put into the training set. In this paper, the Structural Similarity Index (SSIM) is used to calculate the similarity degree between fake and real samples because, compared with other measures such as the Kullback–Leibler (KL) divergence, MMD and the mean squared error (MSE), its output is bounded in [-1, 1], which makes the selection threshold easy to set (formula (3)).
\mathrm{SSIM}(S, T) = \frac{(2\mu_S \mu_T + c_1)(2\sigma_{ST} + c_2)}{(\mu_S^2 + \mu_T^2 + c_1)(\sigma_S^2 + \sigma_T^2 + c_2)} \quad (3)
where μ_S and μ_T are the means of the source data and the target data, σ_S² and σ_T² are their variances, σ_ST is the covariance between the source data and the target data, and c_1 and c_2 are positive constants that prevent the denominator from being zero.
When SSIM > 0, the real data and the fake data are considered correlated. The closer SSIM is to 1, the more similar the two data are, and the closer it is to -1, the more dissimilar they are. By setting the effective similarity degree interval [SSIM_min, SSIM_max], source data with a specific degree of similarity are selected. This prevents IDCGAN from supplying fake data identical to the real data, which would cause the feature learning model to overfit, and also prevents it from supplying fake data totally different from the target data, which may lead to underfitting.
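The selection rule can be sketched as follows. The sketch assumes a single global SSIM window over each whole curve (standard SSIM implementations average over local windows) and the common defaults c_1 = (0.01 L)² and c_2 = (0.03 L)² with data range L = 1; `ssim` and `select_eligible` are hypothetical helper names. A fake curve is kept only if its SSIM to the reference real curve lies strictly inside the interval, so an exact copy (SSIM = 1) is rejected.

```python
from typing import List, Sequence, Tuple

def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)

def ssim(s: Sequence[float], t: Sequence[float],
         c1: float = 1e-4, c2: float = 9e-4) -> float:
    """Global SSIM of two equal-length curves, following the form of formula (3)."""
    if len(s) != len(t):
        raise ValueError("curves must have the same length")
    mu_s, mu_t = _mean(s), _mean(t)
    var_s = _mean([(x - mu_s) ** 2 for x in s])
    var_t = _mean([(y - mu_t) ** 2 for y in t])
    cov = _mean([(x - mu_s) * (y - mu_t) for x, y in zip(s, t)])
    num = (2 * mu_s * mu_t + c1) * (2 * cov + c2)
    den = (mu_s ** 2 + mu_t ** 2 + c1) * (var_s + var_t + c2)
    return num / den

def select_eligible(fakes: List[List[float]], real: Sequence[float],
                    interval: Tuple[float, float] = (0.4, 0.6)):
    """Keep (fake, score) pairs whose SSIM to `real` lies strictly inside the interval."""
    lo, hi = interval
    kept = []
    for fake in fakes:
        score = ssim(fake, real)
        if lo < score < hi:
            kept.append((fake, score))
    return kept
```

With the interval (0.4, 0.6) used later in the experiments, a near-copy of a real curve scores close to 1 and is discarded, while a moderately different curve is kept.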
In this module, the source data generated by the generative learning module are used to train the feature learning model. Data acquired in engineering practice usually exhibit long time-series features, and conventional deep learning models often suffer from vanishing gradients when learning such long-term features. A lightweight deep learning model [18] is therefore used in the feature learning phase to improve the learning of long time-series features. This model adopts depthwise separable convolution, which has better learning performance than traditional convolution models. The two convolution methods are compared in Fig. 3. In traditional convolution, an a × a × b input is convolved with d kernels of size c × c × b (d > b), producing an a × a × d feature map. In depthwise separable convolution, the input goes through two convolution steps: depthwise convolution and pointwise convolution. The depthwise convolution splits the c × c × b kernel into b kernels of size c × c × 1, each applied to one input channel, which yields an a × a × b feature map. The pointwise convolution then convolves this feature map with d kernels of size 1 × 1 × b to output an a × a × d feature map.

The principle of the two convolution methods.
Depthwise separable convolution therefore requires fewer parameters and computations than traditional convolution, which makes deep learning networks built on it efficient at learning features from complex data.
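The saving can be made concrete with a short calculation. The sketch assumes stride 1, "same" padding (so the spatial size a × a is preserved, as in the description above) and no bias terms; the function names are illustrative. Under these assumptions the separable-to-standard cost ratio is 1/d + 1/c².

```python
from typing import Tuple

def standard_conv_cost(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Weights and multiply-adds of a c x c convolution mapping a x a x b to a x a x d."""
    params = c * c * b * d        # d kernels of size c x c x b
    mult_adds = a * a * params    # every weight is used once per output position
    return params, mult_adds

def separable_conv_cost(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Same mapping done as depthwise (c x c x 1 per channel) plus pointwise (1 x 1 x b)."""
    depthwise = c * c * b         # b kernels of size c x c x 1
    pointwise = b * d             # d kernels of size 1 x 1 x b
    params = depthwise + pointwise
    mult_adds = a * a * params
    return params, mult_adds

# Example: 64 x 64 x 32 input, 3 x 3 kernels, 64 output channels.
std_params, std_ops = standard_conv_cost(64, 32, 3, 64)    # 18432 weights
sep_params, sep_ops = separable_conv_cost(64, 32, 3, 64)   # 2336 weights
```

In this example the separable layer needs roughly 7.9 times fewer weights and multiply-adds, and the gap widens as the number of output channels d grows.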
By feeding enough source samples generated in the generative learning phase into the deep learning model, a pre-trained model that can identify the features of the source samples is obtained. This pre-trained model is the basis of the next stage, transfer learning.
In the transfer learning stage, the ability to extract source data features learned in the feature learning stage is transferred to the real target data, and a fully connected layer is added at the end of the transferred deep learning model to map its output to the target classes; the number of neurons in this layer equals the number of target data categories. The transfer learning model is shown in Fig. 4. The gray part of the model denotes convolutional layers whose parameters are frozen (not trainable), while the orange part is the newly added fully connected layer, whose parameters are trainable.
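A minimal, framework-free sketch of this arrangement is given below. The pre-trained convolutional part is represented by a fixed feature function that is never updated (the gray part in Fig. 4), and only a new fully connected softmax layer with one output neuron per target class is trained with cross-entropy. The hand-crafted `frozen_features` function and the toy curves are hypothetical stand-ins for the pre-trained lightweight network and the tightening data; in a deep learning framework the same effect is usually obtained by marking the backbone parameters as non-trainable.

```python
import math
import random
from typing import List, Sequence

def frozen_features(curve: Sequence[float]) -> List[float]:
    # Stand-in for the frozen pre-trained layers: it has no trainable parameters.
    return [sum(curve) / len(curve), max(curve) - min(curve)]

def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

class DenseHead:
    """Newly added fully connected layer: the only trainable part of the model."""

    def __init__(self, in_dim: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def forward(self, f: List[float]) -> List[float]:
        logits = [sum(wi * fi for wi, fi in zip(row, f)) + bk
                  for row, bk in zip(self.w, self.b)]
        return softmax(logits)

    def sgd_step(self, f: List[float], label: int, lr: float) -> None:
        p = self.forward(f)
        for k in range(len(self.b)):
            g = p[k] - (1.0 if k == label else 0.0)  # d(cross-entropy)/d(logit_k)
            for i, fi in enumerate(f):
                self.w[k][i] -= lr * g * fi
            self.b[k] -= lr * g

    def predict(self, f: List[float]) -> int:
        p = self.forward(f)
        return max(range(len(p)), key=p.__getitem__)

def fine_tune(curves, labels, n_classes: int, epochs: int = 200, lr: float = 0.5) -> DenseHead:
    feats = [frozen_features(c) for c in curves]  # backbone output stays fixed
    head = DenseHead(len(feats[0]), n_classes)
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            head.sgd_step(f, y, lr)
    return head
```

For the tightening data, `n_classes` would be 7, one output neuron per bolt quality type. Because gradients never reach `frozen_features`, only the small head has to be learned from the scarce target samples.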

The principle of transfer learning.
Since the performance of the transfer learning module depends on whether the source and target samples are domain-adaptable, this paper designs a loss function that accounts for domain adaptability by incorporating three indicators of generative learning performance (the generator loss, the discriminator loss and the similarity value) into its design. This further reduces the negative transfer caused by low domain adaptability. The loss function is as follows:
where L_total is the total loss of the transfer learning model; L_C(Y_t, Q_t) is the classifier loss, in which Y_t is the set of correct classification labels of the input data and Q_t is the set of output classifications; N is the number of source samples; G_Loss,i and D_Loss,i are the generator loss and the discriminator loss of the i-th source sample; R_i is the similarity value of the i-th source sample; M is the number of target samples; y_j and q_j are the correct and the output classification values of the j-th target sample; and α, β, γ are constant coefficients.
In summary, the dataflow of IDCGAN-TM is shown in Fig. 5 and its pseudo-code is given in Table 1. First, the generative learning model generates a large amount of fake data from random noise, and the fake data that meet the similarity degree interval requirement are selected as source data. Second, the source data are used to pre-train the deep learning model, which thereby learns to extract and identify the key features of the source data. Third, the parameters of the pre-trained model are transferred to a new deep learning model and a fully connected layer is added. Finally, the target data are used to train this new model. In this way, IDCGAN-TM can identify small target data while avoiding the negative transfer problem that traditional transfer learning models encounter during training.
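The four steps can be summarized as a small orchestration function. This is not the pseudo-code of Table 1; it is a hypothetical sketch in which every module (generator, similarity measure, pre-training, fine-tuning) is passed in as a callable, and it assumes that a generated sample is kept when its similarity to at least one real sample of the same class falls strictly inside the interval.

```python
from typing import Callable, Dict, List, Tuple

Sample = List[float]
Labeled = Tuple[Sample, int]

def idcgan_tm_pipeline(
    target: List[Labeled],
    generate: Callable[[List[Labeled]], List[Labeled]],
    similarity: Callable[[Sample, Sample], float],
    interval: Tuple[float, float],
    pretrain: Callable[[List[Labeled]], object],
    fine_tune: Callable[[object, List[Labeled]], object],
) -> Tuple[List[Labeled], object]:
    lo, hi = interval
    real_by_class: Dict[int, List[Sample]] = {}
    for x, y in target:
        real_by_class.setdefault(y, []).append(x)
    # Step 1: generate labeled fakes and keep those inside the similarity interval.
    source = [
        (x, y)
        for x, y in generate(target)
        if any(lo < similarity(x, r) < hi for r in real_by_class.get(y, []))
    ]
    # Step 2: pre-train the feature learning model on the selected source data.
    backbone = pretrain(source)
    # Steps 3-4: transfer the parameters, add a fully connected layer, train on target data.
    model = fine_tune(backbone, target)
    return source, model

# Toy usage with hypothetical stand-ins for each module.
def shift_generator(data: List[Labeled]) -> List[Labeled]:
    return [([v + d for v in x], y) for x, y in data for d in (0.0, 0.5, 5.0)]

def l1_similarity(a: Sample, b: Sample) -> float:
    return 1.0 / (1.0 + sum(abs(u - v) for u, v in zip(a, b)))

source, model = idcgan_tm_pipeline(
    [([0.0, 1.0], 0), ([2.0, 3.0], 1)],
    shift_generator, l1_similarity, (0.4, 0.6),
    pretrain=lambda src: {"n_source": len(src)},
    fine_tune=lambda bb, tgt: {**bb, "n_target": len(tgt)},
)
```

In the toy run, exact copies (similarity 1.0) and distant shifts (similarity about 0.09) are rejected, so only the moderately shifted sample of each class becomes source data.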

The dataflow of IDCGAN-TM.
The pseudo-code of IDCGAN-TM
Datasets
To verify the effectiveness of the proposed model on engineering data, we built an assembly tightening platform and tightened bolts with different quality problems (details can be found in [4]). The tightening data can be divided into three stages: the loading stage, the tightening stage and the yield stage (Fig. 6). In the loading stage, the bolt is being screwed in but has not yet contacted the workpiece, so the torque is low and is used entirely to overcome the friction in the screw pair between the nut and the bolt (this stage is described by l1 and h2). In the tightening stage, the torque increases linearly with the rotation angle and the bolt is gradually tightened (described by l2 and α1). In the yield stage, the bolt begins to deform plastically, meaning that the tightening torque exceeds the maximum torque the bolt can tolerate (described by l3 and h1). All tightening curves share a similar pattern, but bolts with different quality problems differ in these curve parameters (l1, l2, l3, h1, h2, α1), so the parameters can serve as indicators of the bolt fault type in real factories.

Bolt connection structure and its assembly diagram.
The test bench consists of a servo tightening device, a controller, and a display. The servo tightening device is equipped with a torque sensor, which can monitor the change of the tightening torque during the bolt tightening process in real time.
The tightening shaft is an SCI-FU20 with a torque resolution of 0.01 N·m, a tightening accuracy of 2% and a maximum speed of 6000 rpm (revolutions per minute). The controller has a maximum power of 400 W, and the test bench uses the MODBUS-RTU communication protocol. Rust, burrs, embrittlement, fracture and material differences (A3 cast iron, 35# steel, 45# steel) of bolts significantly influence assembly quality. Tightening tests were carried out for 7 different types of bolts and repeated 10 times each, yielding 10 curves per bolt type. To ensure the credibility of the experiment, each bolt was tightened only once, and every repetition used a new batch of bolts. A detailed description of the tightening data can be found in [4]. All of the following training and testing experiments are run on these tightening data to evaluate the performance of IDCGAN-TM in learning small engineering datasets with temporal features.
Through the tightening experiment, we obtained 70 tightening curves (7 types) as the target data, but this amount is far too small to meet the training requirements of deep learning networks. In IDCGAN-TM, the generative learning module extracts the important time-series features of the real tightening data and generates, from random noise, fake data highly similar to the real tightening data. The number of epochs is set to 500, and the module produces one fake image per epoch. After tuning, the hyper-parameters (τ, μ, ρ) are set to (1, 1e-5, 1). After running the generative learning module, 500 fake samples are produced. To visualize the performance of the generative learning module, the maximum mean discrepancy (MMD) between each produced sample and the corresponding real data is shown in Fig. 7. As the number of epochs increases, the fake data produced by this module become closer to the real data. The fake data are then selected according to the similarity degree interval. To keep fake data that have a low discrepancy from, but are not identical to, the real data (to avoid overfitting), the effective similarity interval is set to (0.4, 0.6). As a result, the number of training samples increases from 70 to 2270, which alleviates the overfitting caused by the small sample size.

The MMD performance of the generative learning module.
Nine examples of the fake data generation process (covering the 7 types) are shown in Fig. 8.

The training process of IDCGAN.
Fig. 8 shows that the data generated by IDCGAN become similar to the real data after 20,000 iterations. The generated results indicate that IDCGAN learns the average distribution of the target domain and can generate similar data samples from it. To further select effective samples, all generated fake data are filtered through the effective similarity degree interval, and the data that meet the interval requirement are used as the source data in the feature learning stage of IDCGAN-TM.
The source data generated in the previous stage are used to train the deep learning model. The pre-trained model and one added fully connected layer together form IDCGAN-TM. In this section, the transfer learning performance of IDCGAN-TM is validated on the target data: 70% of the target data are used for training and 30% for validation. The accuracy and loss curves of IDCGAN-TM are shown in Figs. 9 and 10.
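The 70/30 division of the 70 target curves can be sketched as below. The text does not state how the split was drawn; this sketch assumes it is stratified by bolt type, which with 10 curves per type gives 7 training and 3 validation curves per type, i.e. 49 and 21 curves in total. `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices), splitting each class separately."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[int] = []
    val: List[int] = []
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        n_train = round(train_frac * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:]
    return train, val

# 7 bolt types x 10 curves each, as in the tightening experiment.
labels = [k for k in range(7) for _ in range(10)]
train_idx, val_idx = stratified_split(labels)
```

Stratifying keeps every bolt type represented in the validation set, which matters when each type has only 10 curves.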

The accuracy curve of IDCGAN-TM.

The loss curve of IDCGAN-TM.
Because training is unstable at the early stage, the training curves fluctuate noticeably. However, the intensity of these early fluctuations has no direct effect on whether the model converges. Therefore, to show the trend of each curve clearly, the curves are smoothed with the weighted smoothing method given in formula (6).
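Formula (6) is not reproduced here. A common choice for this kind of weighted smoothing of training curves is an exponential moving average, sketched below under that assumption with a smoothing weight w in [0, 1); the name `smooth` and the default weight are illustrative. Larger w gives a smoother curve, and the smoothing affects only the plotted curve, not the training itself.

```python
from typing import List, Optional, Sequence

def smooth(values: Sequence[float], weight: float = 0.6) -> List[float]:
    """Exponential moving average: s_0 = x_0, s_t = w * s_{t-1} + (1 - w) * x_t."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    out: List[float] = []
    last: Optional[float] = None
    for x in values:
        last = x if last is None else weight * last + (1.0 - weight) * x
        out.append(last)
    return out

# A noisy accuracy curve and its smoothed trend.
smoothed = smooth([0.2, 0.6, 0.3, 0.7, 0.5, 0.8], weight=0.6)
```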
Because the model is intended for use in real assembly plants, diagnosis accuracy and training time are chosen as the comparison indexes. In real assembly tasks, the assembly object changes frequently, so retraining the model on a new dataset is unavoidable, which makes training time essential in real industry. The diagnosis accuracy, in turn, reflects the reliability and robustness of the proposed model in diagnosing tightening problems.
If only the 70 real samples are used as training data (no fake data are added), overfitting occurs. The training and validation process on this small dataset is shown in Figs. 11 and 12. The training accuracy is 98.57%, while the validation accuracy is only 68.7%; the training loss is 0.1376, while the validation loss is 0.9535. This comparison shows that the fake data produced by IDCGAN help avoid the overfitting caused by training on small sample data.

The accuracy curve of IDCGAN-TM without fake data.

The loss curve of IDCGAN-TM without fake data.
In Section 2.2, a lightweight deep learning model was adopted as the feature learning module of IDCGAN-TM, but this does not mean that it is the best feature learning module for IDCGAN-TM. Therefore, leading deep learning models used in intelligent data diagnosis, namely EfficientNet [19], BigGAN [20] and Inception-ResNet [21], are selected for comparison. These models are chosen as baselines because they all contain skip connection blocks, which can simplify the model and reduce overfitting during training. The validation accuracy and loss comparison results are shown in Figs. 13–14. IDCGAN-TM and EfficientNet perform well in the validation tests, as their final validation accuracy and loss values are better than those of the other models.

The validation accuracy curves of different models.

The validation loss curves of different models.
In detail, the classification accuracies, loss values and time costs of the four algorithms are listed in Table 2. ResNet and BigGAN overfit when learning small sample data, as their test results are significantly worse than their training results. By contrast, IDCGAN-TM and EfficientNet outperform the other methods in the algorithm test and perform well in both training and testing. Because training time is essential in industry, the feature learning model with the better trade-off between accuracy and training time should be chosen for industrial fault diagnosis.
Comparison of algorithm characteristics
From the training and validation results, after 500 epochs the diagnostic test accuracy of IDCGAN-TM reaches 93.7% with a loss value of 0.3151, which indicates that the model performs well on intelligent diagnosis of engineering data (no overfitting is observed).
Learning from small sample data is challenging for existing machine learning methods, and the results of IDCGAN-TM show its capability to address this problem.
The high validation accuracy of IDCGAN-TM indicates that the proposed model overcomes the overfitting problem during training.
EfficientNet, one of the state-of-the-art models, has the highest validation accuracy and the lowest validation loss among the four models. However, its training time is three times that of our model, so IDCGAN-TM is more suitable for real assembly industry. Future work will improve the existing feature learning model by designing more concise and efficient deep networks.
Conclusion
In this paper, generative learning and transfer learning modules were introduced into the intelligent fault diagnosis of tightening machines with small sample data, and a new learning model, IDCGAN-TM, was proposed and verified through comparison with three state-of-the-art deep learning models. The following conclusions were drawn. IDCGAN-TM integrates generative adversarial networks, a lightweight depthwise separable network and transfer learning into a comprehensive model for quality diagnosis of small sample engineering data. The results show that IDCGAN-TM outperforms ResNet and BigGAN in both validation accuracy and time cost. Although EfficientNet achieves accuracy similar to that of IDCGAN-TM, its training time is higher, which indicates that our model is more suitable for real industrial applications. IDCGAN-TM also has limitations: GANs are unstable and prone to mode collapse. Future work will therefore focus on improving the robustness of GANs.
Overall, IDCGAN-TM achieves relatively high recognition accuracy for tightening quality detection with the lowest training time among the compared state-of-the-art deep learning transfer methods, and it can effectively classify tightening curves acquired from the machine after training on small sample datasets.
Acknowledgments
We gratefully acknowledge the financial support from the National Defence Basic Scientific Research Program of China (JCKY2018208A001), and Tsinghua University-Weichai Power Joint Institute of Intelligent Manufacturing (JIIM02).
