Metallographic image segmentation of GCr15 bearing steel based on CGAN

Abstract

A novel deep learning segmentation method based on Conditional Generative Adversarial Nets (CGAN), named U-GAN in this paper, is proposed to overcome the difficulties posed by metallographic images of GCr15 bearing steel, such as heavy noise, low contrast and hard-to-segment structures. Experimental results indicate that the proposed model segments carbide particles more accurately than the digital image processing methods and deep learning methods it is compared with. Its average Dice similarity coefficient is 0.9158, the best performance on the dataset among the compared methods.

Keywords

Metallographic image, image processing, carbide particle segmentation, deep learning, CGAN

1. Introduction

GCr15 is a high-carbon chromium bearing steel commonly used in industry, with a carbon content of about 1%. The content, distribution and size of carbides have a vital impact on the service life of bearing steel. Carbide particle content has therefore become an important index of the performance of GCr15 steel, and it is crucial to analyze the metallographic structure.

At present, most methods for segmenting metallographic structure are limited to traditional digital image processing techniques and image processing software. Wu [1] used the watershed algorithm to extract object contours; however, the segmented boundaries were indistinct and the method was prone to over-segmentation. Zhao [2] proposed a morphology-based method to segment carbides in metallographic structure, but morphological dilation and erosion extracted the contours inaccurately. Hecht [3] used the ImageJ software to quantify carbide network connectivity. Dutta [4] segmented metallographic phases with the OTSU algorithm, an automatic threshold selection technique based on the gray-level histogram. Overall, these digital image processing methods cannot achieve the expected segmentation of undissolved carbide particles.

In recent years, deep learning has shown outstanding performance in object detection, classification and semantic segmentation. Convolutional neural networks (CNNs) reduce the subjectivity introduced by human factors and make it easy to extract feature information from images. CNNs have achieved strong results in semantic segmentation tasks [5–7].

The Generative Adversarial Network (GAN) [8], an adversarial framework consisting of a generator network and a discriminator network, was proposed in 2014. In a GAN, the generator produces output that is consistent with the training sample distribution and as close as possible to the ground truth, while the discriminator distinguishes whether an image was sampled from the generator or from the ground truth. The generator and the discriminator each optimize their own parameters. This adversarial training can improve the accuracy of deep learning results. However, an unconditioned generator offers no control over the data being generated. To overcome this shortcoming, Mirza [9] proposed CGAN, which introduces controllable condition variables into GAN to steer the result produced by the generator.

To improve carbide segmentation in metallographic structure, we propose U-GAN, a novel deep learning approach based on CGAN. Compared with existing metallographic image segmentation methods, our method achieves better segmentation performance.

2. U-GAN

2.1. U-GAN structure

CGAN is an extension of GAN in which additional information is supplied to both the generator and the discriminator. In this paper, carbide particle labels in the metallographic structure are fed into the generator and the discriminator as supplementary conditions.

U-GAN consists of two models, a generator and a discriminator. The generator is an improved U-Net model [10]. Compared with the original U-Net structure, the improved U-Net adds a batch normalization layer after each convolution layer to improve the stability and efficiency of image feature extraction, and removes the image overlap strategy to reduce the amount of computation. The overlap strategy can be dropped because the carbides in the metallographic structure have similar shapes, mostly spherical or circular, and their features remain distinct even at edge positions. The discriminator uses the VGG16 structure [11], which is well suited to image feature extraction: compared with other models, VGG16 has fewer layers but still maintains high accuracy. The discriminator and the generator optimize their parameters in a two-player min-max game to achieve better segmentation. Figure 1 shows the U-GAN structure. In the figure, each blue box corresponds to a multi-channel feature map with the number of channels given above the box, each red box represents a batch normalization layer, and each white box is a copied feature map.

2.2. Loss function

The generator and the discriminator are denoted G and D, respectively. The model optimizes the parameters of G to maximize D(G(z|y)) and the parameters of D to maximize D(x|y). D can be regarded as a binary classifier. The loss function of CGAN can be written as follows:

$\min_{G}\max_{D}V(D,G)=E_{x\sim p_{\text{data}}(x)}[\log D(x|y)]+E_{z\sim p_{z}(z)}[\log (1-D(G(z|y)))]$ (1)

where z is the random noise of the metallographic image, p is the data distribution, x is the image and y is the carbide particle label. D(x|y) is the discriminator's judgment on the real data x under the condition y, and D(G(z|y)) is its judgment on the synthetic data G(z|y) under the condition y. During training, the discriminator maximizes D(x|y) and minimizes D(G(z|y)) so that the objective function reaches its maximum, whereas the generator seeks to maximize D(G(z|y)) so that the objective function reaches its minimum.
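The behaviour of the objective in Eq. (1) can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' Keras implementation: the function name `cgan_value`, the clipping constant `eps` and the sample discriminator outputs are assumptions made for illustration only.

```python
import numpy as np

def cgan_value(d_real, d_fake, eps=1e-7):
    """Sample estimate of V(D, G) in Eq. (1).

    d_real: discriminator outputs D(x|y) on real image/label pairs.
    d_fake: discriminator outputs D(G(z|y)) on generated pairs.
    eps is an assumed clipping constant that keeps the logarithms finite.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# Hypothetical discriminator outputs, chosen only for illustration.
# A discriminator that separates real from generated data pushes V toward 0.
strong_d = cgan_value(d_real=[0.99, 0.98], d_fake=[0.01, 0.02])
# A generator that fools the discriminator drives V far below 0.
fooled_d = cgan_value(d_real=[0.99, 0.98], d_fake=[0.95, 0.97])
```

In this toy case `strong_d` is larger than `fooled_d`, which matches the min-max reading of Eq. (1): D tries to raise the value and G tries to lower it.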

Previous approaches have found that adding auxiliary loss terms can improve results effectively, so an IOU loss is combined with the CGAN objective function. It can be formulated as:

$L_{IOU}=\frac{y\cdot G(z|y)}{y+G(z|y)-y\cdot G(z|y)}$ (2)

In addition, the segmentation task can use a weighted generator loss to penalize the distance between the ground truth and the outputs. The original loss function of the generator is:

$L_{\text{Seg}}=-\frac{1}{n}\sum_{x}[y\ln a+(1-y)\ln (1-a)]$ (3)

where x is the input sample, n is the number of inputs and a is the synthetic data G(z|y).

Thus, the final loss function for segmentation is as follows:

$L=L_{CGAN}+\alpha L_{\text{Seg}}+\beta L_{IOU}$ (4)

where $L_{CGAN}$ is the CGAN objective of Eq. (1) and 𝛼 and 𝛽 are weighting coefficients.
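The terms of Eqs. (2)–(4) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `iou_term`, `seg_loss` and `total_loss` are hypothetical, the products and sums in Eq. (2) are assumed to be accumulated over all pixels, and `eps` is an assumed stabilizing constant. The ratio in Eq. (2) grows as overlap improves; the paper does not state which sign convention is used when it enters the minimized loss, so the sketch returns the ratio as written and leaves that choice to the caller. The default weights 𝛼 = 4 and 𝛽 = 1 are the values reported in Section 3.2.

```python
import numpy as np

def iou_term(y, g, eps=1e-7):
    """Soft IOU ratio of Eq. (2), accumulated over all pixels (assumption)."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    intersection = np.sum(y * g)
    union = np.sum(y) + np.sum(g) - intersection
    return float(intersection / (union + eps))

def seg_loss(y, a, eps=1e-7):
    """Binary cross-entropy of Eq. (3), averaged over all n elements."""
    y = np.asarray(y, dtype=float)
    a = np.clip(np.asarray(a, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)))

def total_loss(l_cgan, l_seg, l_iou, alpha=4.0, beta=1.0):
    """Weighted sum of Eq. (4)."""
    return l_cgan + alpha * l_seg + beta * l_iou

y = np.array([[1, 1], [0, 0]])          # toy ground-truth label
g = np.array([[0.9, 0.4], [0.1, 0.2]])  # toy generator output
# l_cgan=0.7 is a made-up value standing in for the adversarial term.
loss = total_loss(l_cgan=0.7, l_seg=seg_loss(y, g), l_iou=iou_term(y, g))
```

Passing the three terms in as plain numbers keeps the weighting of Eq. (4) separate from how each term is computed.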

Fig. 1.

U-GAN Structure.

3. Carbide segmentation

3.1. Data preparation

A tapered roller bearing is taken as the research object. After the samples are polished and etched with a 4% nitric acid solution, metallographic images at different magnifications are collected with a JSM-6700 scanning electron microscope and annotated with the Labelme software to produce the ground truth. The images are divided into two subsets: a training dataset of 30 images and a test dataset of 10 images. Each image is normalized and augmented by rotation of 30 degrees.
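The normalization and rotation steps might be sketched as below. This is a minimal NumPy illustration, not the authors' pipeline: min-max scaling to [0, 1], rotation in repeated 30-degree steps, nearest-neighbor resampling and the helper names `normalize`, `rotate_nearest` and `augment` are all assumptions, since the paper does not specify these details. What the sketch is meant to show is that every rotation has to be applied identically to an image and to its ground-truth mask.

```python
import numpy as np

def normalize(image):
    """Min-max scale an image to [0, 1] (assumed normalization scheme)."""
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    if span == 0:
        return np.zeros_like(image)
    return (image - image.min()) / span

def rotate_nearest(image, angle_deg):
    """Rotate a 2-D array about its center with nearest-neighbor sampling."""
    image = np.asarray(image)
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: find the source pixel for every output pixel.
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi = np.rint(src_x).astype(int)
    yi = np.rint(src_y).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
    out = np.zeros_like(image)  # pixels with no source inside the frame stay 0
    out[inside] = image[yi[inside], xi[inside]]
    return out

def augment(image, mask, step_deg=30):
    """Return (image, mask) pairs rotated by 0, step, 2*step, ... degrees."""
    return [(rotate_nearest(image, a), rotate_nearest(mask, a))
            for a in range(0, 360, step_deg)]

# Toy 5x5 "image" and mask, used only to exercise the functions.
image = normalize(np.arange(25).reshape(5, 5))
mask = (image > 0.5).astype(np.uint8)
pairs = augment(image, mask)
```

Nearest-neighbor sampling keeps the rotated mask binary, which is why it is used here for both arrays; a real pipeline would more likely interpolate the image and use a library routine.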

3.2. Experimental setup

The methods in this paper are implemented in Keras with TensorFlow as the backend and trained on an NVIDIA RTX 2080Ti GPU with 11 GB of memory. In the experiment, Adam and stochastic gradient descent (SGD) are applied to optimize the generator and the discriminator. The weighting coefficients are set to 𝛼 = 4 and 𝛽 = 1, the batch size to 4 and the number of epochs to 100.

3.3. Carbide segmentation comparison in digital image processing

The OTSU algorithm, the improved Canny algorithm and U-GAN are applied to the test dataset. The results are shown in Fig. 2.

Fig. 2.

Different segmentation methods. (a) image; (b) ground truth; (c) OTSU algorithm; (d) Canny algorithm; (e) U-GAN.

According to Fig. 2, the OTSU algorithm cannot determine the image threshold accurately, and large areas are over-segmented. The improved Canny algorithm removes the smaller carbide particles; moreover, because the edge contours of the carbides are unclear, the edges it extracts are often not closed, and part of the martensite is segmented as carbide. The segmentation results of the proposed model show low noise and clear, recognizable contours, and essentially all carbide particles are detected and segmented.
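For reference, the OTSU baseline selects the gray level that maximizes the between-class variance of the histogram. The sketch below is a plain NumPy version of that standard technique, assuming an 8-bit grayscale input; the function name `otsu_threshold` and the toy image are illustrative and are not taken from the paper.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t maximizing between-class variance.

    gray is assumed to be an 8-bit grayscale image (integer values 0..255).
    Pixels with value > t form one class, pixels <= t the other.
    """
    hist = np.bincount(np.asarray(gray).ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)          # probability of the class <= t
    mu = np.cumsum(prob * levels)    # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0        # levels that leave one class empty
    return int(np.argmax(sigma_b))

# Toy bimodal image: a dark background (50) and a bright region (200).
toy = np.full((4, 4), 50, dtype=np.uint8)
toy[:, 2:] = 200
t = otsu_threshold(toy)
binary = toy > t
```

Because a single global threshold is chosen from the histogram alone, the method has no notion of particle shape, which is consistent with the over-segmentation reported for low-contrast images.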

3.4. Carbide segmentation comparison in deep learning

With the development of deep learning, semantic segmentation algorithms have improved continuously. Since the Fully Convolutional Network (FCN) framework was proposed in 2015, a growing number of segmentation frameworks have been proposed and applied to many fields with good results [12–14]. This paper trains the U-Net model and the DeepLabV3+ model [15] separately and compares their test results with those of U-GAN. The results are shown in Fig. 3.

Fig. 3.

Segmentation methods in deep learning. (a) image; (b) ground truth; (c) U-Net model; (d) DeepLabV3+ model; (e) U-GAN.

It can be seen from Fig. 3 that the U-Net model segments some flaky carbides inaccurately: obvious flaky carbides are not recognized and are missing from the result. The DeepLabV3+ model segments most of the carbides, but some small spheroidal carbides are neglected, and the edge contours of some segmented carbides are rough and imprecise. Compared with these two deep learning methods, the proposed method is more accurate, with little under-segmentation or omission and with fine, smooth edge contours.

4. Evaluation

In Section 2.2, a weighted generator loss and an IOU loss were added to the CGAN loss to enhance the segmentation. The Dice coefficient is used to evaluate the results obtained with the different loss functions. It is defined as:

$\text{Dice}=\frac{2\times |A\cap B|}{|A|+|B|}$ (5)

where A and B are the sets of segmented carbide pixels and ground-truth carbide pixels, respectively. The performance is shown in Table 1.
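Eq. (5) translates directly into a few lines of NumPy when A and B are given as binary masks. In this sketch the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient of Eq. (5) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

# Toy masks: 2 predicted pixels, 1 ground-truth pixel, 1 pixel in common.
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [0, 0]])
score = dice_coefficient(pred, truth)  # 2 * 1 / (2 + 1)
```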

As shown in Table 1, adding both the IOU loss and the weighted loss yields the highest Dice coefficient, so the final loss function proposed in this paper gives the best result.

Table 1

Dice coefficient of different loss functions

Methods	Added IOU	Added weighted loss	U-GAN
Dice coefficient	0.882	0.8993	0.9158

The Dice coefficient, Precision and F1 measure are used to objectively characterize the accuracy of the five segmentation methods discussed in Section 3. Table 2 compares the average results, and Fig. 4 shows the accuracy of each method on each image.
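The paper does not define how Precision and the F1 measure were computed. The sketch below assumes the standard pixel-wise definitions based on true positives, false positives and false negatives, which may differ from the authors' counting or averaging procedure; the function name `precision_f1` is hypothetical.

```python
import numpy as np

def precision_f1(pred, truth):
    """Pixel-wise Precision and F1 for two binary masks (assumed definitions)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()    # carbide predicted and present
    fp = np.logical_and(pred, ~truth).sum()   # carbide predicted, not present
    fn = np.logical_and(~pred, truth).sum()   # carbide present, not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return float(precision), 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return float(precision), float(f1)

# Toy masks with one true positive, one false positive and one false negative.
pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
precision, f1 = precision_f1(pred, truth)
```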

Table 2

Average Dice coefficient, F1 measure and Precision of different segmentation methods

Methods	OTSU algorithm	Improved Canny algorithm	U-Net	DeepLabV3+	U-GAN
Dice coefficient	0.2647	0.3949	0.6429	0.8746	0.9158
F1 measure	0.2558	0.64	0.9046	0.898	0.926
Precision	0.161	0.6316	0.866	0.8781	0.9096

Fig. 4.

Accuracy of each image for different segmentation methods.

Figure 4 and Table 2 show that the OTSU algorithm has the lowest segmentation accuracy, with the smallest Dice coefficient, F1 measure and Precision. Its segmentation improves slightly when many carbide particles are present, but is poor when carbides are sparse, and parts of the martensite matrix are still mis-segmented as carbide particles. The improved Canny algorithm extracts edge information after selecting the optimal threshold, but unclosed edges are ubiquitous in its segmented images and small spherical carbides are easily missed, so its segmentation accuracy is not high. The U-Net model shows no mis-segmentation, but when carbides of different sizes are interspersed, some of the flaky carbides are not recognized and segmented, which lowers its accuracy. The DeepLabV3+ model segments essentially all the carbides, but the carbide edges are not fine and smooth enough. The proposed model produces clear contours with high recognition; it detects and segments essentially all carbide particles without mis-segmentation and achieves the best results on all evaluation metrics.

5. Conclusion

In this paper, a U-GAN model is proposed to segment carbide particles in metallographic images. To cope with low contrast, low resolution, heavy noise and mixed carbide particle distributions, a weighted generator loss and an IOU loss are combined with the CGAN loss function to improve segmentation. The structure uses the improved U-Net model as the generator and the VGG16 model as the discriminator. Compared with the digital image processing methods and the two deep learning methods, the proposed model obtains more accurate segmentation results, achieving the highest average Dice coefficient, F1 measure and Precision, which are 0.9158, 0.926 and 0.9096, respectively.

Acknowledgements

This work was financially supported by the Key Research and Development Program of Gansu Province (18YF1GA063).

References

1. J.J., Zhang Y.H., Quantitative analysis of metallographic organization of TC4 titanium alloy based on digital image processing, Failure Analysis and Prevention 9(02) (2014), 75–79.

2. Jin W.Y., Zhao X.X., X.L., Quantitative analysis of metallographic organization of bearing steel GCr15 based on digital image processing, Journal of Lanzhou University of Technology 39(1) (2013), 6–9.

3. Hecht M.D., Webler B.A. and Picard Y.N., Digital image analysis to quantify carbide networks in ultrahigh carbon steels, Materials Characterization 117 (2016), 134–143.

4. Dutta, Banerjee, Saha S.K., Noise removal and image segmentation in micrographs of ferrite-martensite dual-phase steel, Destech Transactions on Engineering and Technology Research (2017).

5. Long, Shelhamer and Darrell, Fully convolutional networks for semantic segmentation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, 2015, pp. 3431–3440.

6. Badrinarayanan, Kendall and Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(12) (2017), 2481–2495.

7. Gkioxari, Dollár, Mask R-CNN, in: IEEE International Conference on Computer Vision (ICCV), Venice, 2017, pp. 2961–2969.

8. Goodfellow, Pouget-Abadie, Mirza, Generative adversarial nets, in: Advances in Neural Information Processing Systems, MIT Press, Cambridge, 2014, pp. 2672–2680.

9. Mirza and Osindero, Conditional generative adversarial nets, Computer Science (2014), 2672–2680.

10. Ronneberger, Fischer and Brox, U-Net: convolutional networks for biomedical image segmentation, in: The Medical Image Computing and Computer Assisted Intervention Society (MICCAI), Munich, 2015, pp. 234–241.

11. Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR), San Diego, 2015.

12. Tong G.F., Chen H.R., Improved U-NET network for pulmonary nodules segmentation, Optik 174 (2018), 460–469.

13. Fabijańska, Segmentation of corneal endothelium images using a U-Net-based convolutional neural network, Artificial Intelligence in Medicine 88 (2018), 1–13.

14. Han Z.Y., Wei B.Z., Mercado, Spine-GAN: Semantic segmentation of multiple spinal structures, Medical Image Analysis 50 (2018), 23–35.

15. Chen L.C., Zhu Y.K., Papandreou, Encoder-decoder with atrous separable convolution for semantic image segmentation, in: European Conference on Computer Vision (ECCV), Munich, 2018, pp. 801–818.