Abstract
A novel deep learning segmentation method based on Conditional Generative Adversarial Nets (CGAN), named U-GAN, is proposed in this paper to overcome the difficulties posed by metallographic images of GCr15 bearing steel, such as heavy noise, low contrast and hard-to-segment structures. The experimental results indicate that, on carbide particle segmentation, the proposed model is more accurate than the digital image processing methods and the deep learning methods it is compared with. The average Dice similarity coefficient is 0.9158, the best performance on this dataset among the compared methods.
Introduction
GCr15 is a high-carbon chromium bearing steel commonly used in industry, with a carbon content of about 1%. The content, distribution and size of the carbides in bearing steel have a vital impact on its service life. Carbide particle content has therefore become an important index of the performance of GCr15 steel, and analysis of the metallographic structure is crucial.
At present, most methods for segmenting metallographic structure are limited to traditional digital image processing techniques and image processing software. Wu [1] used the watershed algorithm to extract object contours; however, the segmented boundary was not distinct and the method was prone to over-segmentation. Zhao [2] proposed a morphology-based digital method to segment the carbides in metallographic structure, but the morphological dilation and erosion operations extracted the contours inaccurately. Hecht [3] used the ImageJ software to quantify carbide network connectivity. Dutta [4] segmented the metallographic phases with the OTSU algorithm, an automatic threshold selection technique based on the gray-level histogram. Overall, these digital image processing methods cannot achieve the expected segmentation of undissolved carbide particles.
In recent years, deep learning has shown outstanding performance in object detection, classification and semantic segmentation. Convolutional neural networks (CNNs) reduce the subjectivity caused by human factors and make it easy to obtain the feature information of an image. CNNs have achieved great results in semantic segmentation tasks [5–7].
Generative Adversarial Network (GAN) [8], an adversarial framework that consists of a generator network and a discriminator network, was proposed in 2014. In GAN, the generator produces output that is consistent with the training sample distribution and as close as possible to the ground truth, while the discriminator distinguishes whether an image comes from the generator or from the ground truth. The generator and the discriminator each optimize their own parameters. This adversarial training can improve the accuracy of deep learning results. However, the data generated by an unconditioned generator cannot be controlled. To overcome this shortcoming, Mirza proposed CGAN [9], which introduces controllable condition variables into GAN to control the quality of the generator's output.
To improve carbide segmentation in metallographic structure, we propose a novel deep learning approach based on CGAN, named U-GAN. Compared with existing metallographic image segmentation methods, our method achieves better segmentation performance.
U-GAN
U-GAN structure
CGAN is an extension of GAN in which additional information is supplied to both the generator and the discriminator. In this paper, the carbide particle labels of the metallographic structure are fed into the generator and the discriminator as the supplementary condition.
U-GAN consists of two models, a generator and a discriminator. The generator is an improved U-Net model [10]. Compared with the original U-Net structure, the improved U-Net adds a batch normalization layer after each convolution layer to improve the stability and efficiency of image feature extraction, and it removes the image overlap strategy to reduce the amount of computation. The overlap strategy can be dropped because the carbides in the metallographic structure have similar shapes, mostly spherical or circular, and their features remain distinct even at edge positions. The discriminator is a VGG16 structure [11], which has great advantages in image feature extraction: compared with other models, VGG16 has fewer layers but still maintains high accuracy. The discriminator and the generator optimize their parameters in a two-player min-max game to achieve better segmentation. Figure 1 shows the U-GAN structure. In the figure, each blue box corresponds to a multi-channel feature map, with the number of channels given above the box; each red box represents a batch normalization layer, and each white box is a copied feature map.
Loss function
The generator and the discriminator are denoted G and D, respectively. The model optimizes the parameters of G to maximize D(G(z|y)), and optimizes the parameters of D to maximize D(x|y). D can be regarded as a binary classifier. The loss function of CGAN can be written as follows:
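For reference, the description above matches the standard CGAN objective of Mirza [9], in which x is a real sample, z is the generator input and y is the condition. The form below is that standard objective; whether this paper's own equation uses exactly this notation is an assumption.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]
```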
Previous approaches have found that adding a further loss term can improve the results effectively, so an IOU loss is combined with the CGAN objective function. The IOU loss can be formulated as:
Thus, the final loss function for segmentation is as follows:
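A minimal sketch of one common soft IOU loss, written in plain Python so that it has no dependencies. It assumes the prediction and the label are flattened to equally long sequences of values in [0, 1]; the function name, the smoothing term `eps` and this particular formulation are assumptions for illustration and may differ from the formula used in the paper.

```python
def soft_iou_loss(pred, target, eps=1e-7):
    """Return 1 - IoU for two equally long flat sequences of values in [0, 1].

    pred   : predicted carbide probabilities, one per pixel (assumed layout)
    target : ground-truth labels, 0 for matrix and 1 for carbide
    eps    : small constant that avoids division by zero on empty masks
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Soft intersection: product of prediction and label, summed over pixels.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |A| + |B| - |A and B|, summed over pixels.
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (intersection + eps) / (union + eps)
```

The loss is 0 when the prediction matches the label exactly and approaches 1 when the two do not overlap at all, so minimizing it pushes the generator output toward the labelled carbide regions.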

Figure 1. U-GAN structure.
Data preparation
A tapered roller bearing is taken as the research object. After the sample is polished and etched with 4% nitric acid solution, metallographic images at different magnifications are collected with the JSM-6700 scanning electron microscope and annotated with the Labelme software to produce the ground truth. The images are divided into two subsets: a training dataset of 30 images and a test dataset of 10 images. Each image is normalized and augmented by 30 degrees of rotation.
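The normalization and rotation steps can be sketched as follows in plain Python. The sketch assumes 8-bit gray-level images stored as nested lists, scaling to [0, 1], and nearest-neighbour sampling about the image centre; the function names and these choices are hypothetical and are not taken from the paper's pipeline. Whether the rotation appears clockwise or counter-clockwise depends on the axis convention of the viewer.

```python
import math


def normalize(image):
    """Scale 8-bit gray levels (0-255) to the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def rotate(image, degrees, fill=0.0):
    """Rotate a 2-D image about its centre with nearest-neighbour sampling.

    Output pixels whose source falls outside the image receive `fill`.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    cos_a = math.cos(math.radians(degrees))
    sin_a = math.sin(math.radians(degrees))
    rotated = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: locate the source pixel of each output pixel.
            dx, dy = x - cx, y - cy
            src_x = int(round(cos_a * dx + sin_a * dy + cx))
            src_y = int(round(-sin_a * dx + cos_a * dy + cy))
            if 0 <= src_x < width and 0 <= src_y < height:
                rotated[y][x] = image[src_y][src_x]
    return rotated
```

In a segmentation setting the same rotation has to be applied to the image and to its ground-truth mask, for example `rotate(normalize(image), 30)` together with `rotate(mask, 30)`, so that the labels stay aligned with the pixels.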
Experimental setup
The methods in this paper are implemented in Keras with TensorFlow as the backend and trained on an NVIDIA RTX 2080Ti with 11 GB of memory. In the experiment, Adam and stochastic gradient descent (SGD) are applied to optimize the generator and the discriminator. The weighting coefficients are selected as 𝛼 = 4 and 𝛽 = 1. The batch size is set to 4 and the number of epochs to 100.
Carbide segmentation comparison in digital image processing
The OTSU algorithm, the improved Canny algorithm and U-GAN are applied to the test dataset. The results are shown in Fig. 2.

Figure 2. Different segmentation methods. (a) image; (b) ground truth; (c) OTSU algorithm; (d) Canny algorithm; (e) U-GAN.
According to Fig. 2, the OTSU algorithm cannot determine the image threshold accurately, and large areas are over-segmented. The improved Canny algorithm removes the smaller carbide particles, and because the edge contours of the carbides are unclear, the edges it extracts are often not closed and part of the martensite is segmented as carbide. The segmentation results of the model in this paper have low noise and clear, recognizable contours, and essentially all carbide particles are detected and segmented.
With the development of deep learning, semantic segmentation algorithms have also improved continuously. Since the Fully Convolutional Network (FCN) framework was proposed in 2015, more and more excellent segmentation frameworks have been proposed and applied to many fields with great results [12–14]. This paper trains the U-Net model and the DeepLabv3+ [15] model separately and compares their test results with those of U-GAN. The results are shown in Fig. 3.
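For context, the OTSU baseline picks a single global threshold from the gray-level histogram by maximizing the between-class variance. The following plain-Python sketch illustrates that selection rule, assuming integer gray levels in a flat list; the function name is hypothetical and this is not the implementation used in the experiments.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels with value > t the other.
    """
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for level in range(levels):
        weight_low += histogram[level]
        sum_low += level * histogram[level]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, so no valid split at this level
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        # Between-class variance up to a constant factor of 1 / total**2,
        # which does not change the position of the maximum.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold
```

Because one threshold is applied to the whole image, any matrix region brighter than the threshold is labelled as carbide, which is consistent with the over-segmentation described above for low-contrast, noisy images.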

Figure 3. Segmentation methods in deep learning. (a) image; (b) ground truth; (c) U-Net model; (d) DeepLabv3+ model; (e) U-GAN.
It can be seen from Fig. 3 that the U-Net model segments some flaky carbides inaccurately: some obvious flaky carbides are not recognized and are missing from the result. The DeepLabv3+ model segments most of the carbides, but some small spheroidal carbides are neglected, and the edge contours of some segmented carbides are rough and imprecise. Compared with these two deep learning methods, the proposed method is more accurate, with little under-segmentation or omission and with fine, smooth edge contours.
In sub-section 2.2, a weighted loss from the generator and an IOU loss were added to the CGAN loss to enhance the segmentation. The Dice coefficient is used to evaluate the results obtained with the different loss functions; its formula is written as Eq. (5). The performance is shown in Table 1.
As shown in Table 1, the combination of the IOU loss and the weighted loss obtains the highest Dice coefficient, so the final loss function proposed in this paper gives the best result.
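For reference, the Dice coefficient has the standard definition below, where X denotes the set of pixels predicted as carbide and Y the set of pixels labelled as carbide in the ground truth. The symbol names are assumptions and may differ from those used in the paper's Eq. (5).

```latex
\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}
```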
Table 1. Dice coefficient of different loss functions
The Dice coefficient, Precision and F1 measure are used to objectively characterize the accuracy of the five segmentation methods mentioned in Section 3. Table 2 compares the average results, and Fig. 4 illustrates the accuracy on each image for the different segmentation methods.
Table 2. Average Dice coefficient of different segmentation methods
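As an illustration of how such per-image scores and their averages can be computed, the following plain-Python sketch covers the Dice coefficient and precision for binary masks flattened to lists of 0 and 1. The function names and the conventions chosen for empty masks are assumptions, not the paper's evaluation code.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return tp, fp, fn


def dice_coefficient(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, truth)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 1.0


def precision(pred, truth):
    """Precision = TP / (TP + FP); defined as 0.0 when nothing is predicted."""
    tp, fp, _ = confusion_counts(pred, truth)
    return tp / (tp + fp) if tp + fp else 0.0


def mean_score(metric, pairs):
    """Average a metric over (prediction, ground truth) pairs, one per image."""
    scores = [metric(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)
```

A call such as `mean_score(dice_coefficient, pairs)` over the test images gives one average value per method, while the individual scores correspond to a per-image comparison.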

Figure 4. Accuracy of each image for different segmentation methods.
Figure 4 and Table 2 show that the OTSU algorithm has the lowest segmentation accuracy, with the smallest Dice coefficient, F1 measure and Precision. Its segmentation improves slightly when more carbide particles are present, but the result is poor when carbides are sparse, and parts of the martensite matrix are still mis-segmented as carbide particles. The improved Canny operator extracts the edge information after selecting the optimal threshold, but unclosed edges are ubiquitous in the segmented image and small spherical carbides are easily missed, so the segmentation accuracy is not high. Although the U-Net model shows no mis-segmentation, when carbides of different sizes are interspersed some of the flaky carbides are not recognized and segmented, which lowers the segmentation accuracy. The DeepLabv3+ model segments essentially all the carbides, but the carbide edges are not fine and smooth enough. The results of the proposed model have clear contours and high recognizability: essentially all carbide particles are detected and segmented without mis-segmentation, and the model obtains the best results on all evaluation metrics.
In this paper, a U-GAN model is proposed to segment carbide particles in metallographic images. To cope with low contrast, low resolution, heavy noise and a mixed distribution of carbide particles, a weighted loss from the generator and an IOU loss are combined with the CGAN loss function to improve the segmentation results. The structure uses the improved U-Net model as the generator and the VGG16 model as the discriminator. Compared with the digital image processing methods and the two deep learning methods, the proposed model obtains more accurate segmentation results, achieving the highest average Dice coefficient, F1 measure and precision, which are 0.9158, 0.926 and 0.9096, respectively.
Acknowledgements
This work was financially supported by the Key Research and Development Program of Gansu Province (18YF1GA063).
