Abstract
Accurate segmentation of industrial CT images is of great significance in industrial fields such as quality inspection and defect analysis. However, reconstructed industrial CT images often suffer from typical metal artifacts caused by factors such as beam hardening, scattering, statistical noise, and partial volume effects. Because of these artifacts, traditional segmentation methods struggle to segment CT images precisely. Furthermore, acquiring the paired CT image data required by fully supervised networks is extremely challenging. To address these issues, this paper introduces an improved CycleGAN approach for semi-supervised segmentation of industrial CT images. The method not only eliminates the need to remove metal artifacts and noise beforehand, but also converts metal-artifact-contaminated images directly into segmented images without requiring paired data. The method reaches an average Dice Similarity Coefficient (Dice) of 0.96645 and an average Intersection over Union (IoU) of 0.93718. Compared with traditional segmentation methods, it shows significant improvements in both quantitative metrics and visual quality, and provides valuable insights for further research.
I Introduction
Industrial computed tomography (CT) imaging plays a crucial role in non-destructive testing and evaluation of materials and components [1]. It enables the visualization of internal structures and defects with high resolution. CT image segmentation, which aims to separate regions of interest within the images, is essential for quantitative analysis and defect detection in industrial applications. Accurate segmentation can provide valuable insights into the characterization and evaluation of materials, ensuring the reliability and safety in various industries, such as aerospace, automotive, and manufacturing.
However, one of the primary challenges in industrial CT image segmentation arises from the presence of metal artifacts. Metal objects, such as fasteners, joints, or metallic components, generate severe artifacts due to beam hardening, scattering, statistical noise, and partial volume effects [2]. These artifacts cause intensity distortions, streaking, and local blurring, reducing the visibility of the underlying structures and compromising the accuracy of subsequent segmentation algorithms. Consequently, traditional segmentation methods designed for artifact-free CT images often struggle to provide satisfactory results in the presence of metal artifacts.
Several traditional segmentation approaches have been developed, such as thresholding [3], region-based methods [4], and edge-based methods [5]. These methods often rely on intensity-based properties, texture analysis, or gradient information to delineate different regions. However, these approaches face limitations when applied to metal-artifact contaminated industrial CT images. The intensity distortions caused by metal artifacts can lead to inaccurate region boundaries and result in poor segmentation performance. Additionally, manual annotation of large-scale industrial CT datasets for supervised segmentation [6] is time-consuming and impractical.
To address the challenges posed by metal artifacts and the limitations of traditional segmentation methods, we propose an improved CycleGAN architecture for semi-supervised segmentation of metal-artifact contaminated industrial CT images. Our approach leverages the power of CycleGAN [7], a deep learning image translation model, to overcome the need for paired CT image data, reducing the dependence on laborious and costly manual annotations. By incorporating additional modules and preprocessing steps [8], our method aims to enhance segmentation accuracy while preserving fine image details, yielding more reliable and comprehensive results.
By harnessing the benefits of unsupervised learning and image translation, our proposed approach offers a promising solution for industrial CT image segmentation in the presence of metal artifacts. This enables improved characterization, defect detection, and quality assessment in diverse industrial applications.
Related work
Industrial CT images with metal artifacts are typically segmented in two steps: metal artifact correction followed by segmentation. Numerous techniques have been proposed to mitigate metal artifacts in CT images. These techniques can be broadly categorized into three groups: pre-processing-based approaches [9], iterative-based approaches [10], and deep learning approaches [11, 12]. Pre-processing-based approaches aim to reduce metal artifacts by correcting the projections before reconstruction, involving methods such as interpolation, extrapolation, or sinogram inpainting. Iterative-based approaches incorporate metal artifact estimation and correction steps into the reconstruction process, iteratively refining the estimated data to minimize artifacts. Deep learning approaches, on the other hand, utilize convolutional neural networks (CNNs) and other deep learning architectures to directly learn and remove metal artifacts from the reconstructed images. Advances in deep learning have also made it possible to segment CT images with metal artifacts directly.
Fully supervised segmentation methods [13] rely on a large amount of manually annotated data for training, which is often scarce and expensive to obtain in industrial CT imaging. Moreover, the presence of metal artifacts adds further complexity to the segmentation process, as accurate annotations are challenging due to the distortion and blurring caused by these artifacts. Consequently, fully supervised methods may struggle to achieve satisfactory segmentation results in industrial CT imaging scenarios.
CycleGAN is a popular and effective generative model that has been widely applied to various image translation tasks [14, 15]. It consists of two generators and two discriminators [16] that are trained adversarially: the generators try to produce translated images that are indistinguishable from the target domain, while the discriminators aim to distinguish the generated images from the real images of the target domain. CycleGAN has been successfully used for tasks such as style transfer [17], image-to-image translation [18], and domain adaptation.
Despite the advancements in metal artifact reduction and traditional fully supervised segmentation methods, there is still a research gap in addressing the challenges of metal-artifact contaminated industrial CT image segmentation. The limited availability of annotated data and the difficulty of accurately annotating such data in the presence of metal artifacts hinder the widespread adoption of fully supervised approaches. To bridge this gap, a semi-supervised segmentation approach using an improved CycleGAN can leverage the power of unsupervised learning and alleviate the dependency on annotated data. By exploiting the ability of CycleGAN to translate images between the metal-artifact contaminated domain and the segmented-image domain, the proposed approach can provide more robust and accurate segmentation results, thereby enhancing the analysis and evaluation of industrial CT images.
Methodology
A. Overview of the proposed improved CycleGAN architecture
The proposed enhancement to the CycleGAN architecture aims to address the challenge of metal artifact contamination in CT image segmentation. As shown in Fig. 1, the architecture consists of two generator networks and two discriminator networks. Specifically, the GA2B generator network transforms images from domain A (CT images with metal artifacts) to domain B (segmented CT images), while the GB2A generator network performs the inverse transformation. The DB discriminator network is responsible for discriminating between generated domain B images and real domain B images, while the DA discriminator network performs the same task for domain A images.

Overview of our model.
We have drawn inspiration from the architectures proposed by Zhu et al. [16] and Johnson et al. [19] for our generator network design. Figure 1 illustrates the main framework of the Cycle-Consistent Adversarial Network and the architecture of the generator and discriminator networks used in this paper. The generator is a ResNet-style network that consists of three convolutional layers, several residual blocks [20], two fractionally-strided convolutions with stride 1/2, and a final convolutional layer that maps features to RGB images. To accommodate our datasets of 512 × 512 CT images, we use nine residual blocks and apply instance normalization [21]. For the discriminator network, we use a 70 × 70 PatchGAN [22], which classifies overlapping 70 × 70 image patches as real or generated.
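The 70 × 70 figure for the PatchGAN is the receptive field of one output unit of the discriminator. The sketch below shows how that number can be checked. The layer list (three stride-2 and two stride-1 convolutions with 4 × 4 kernels and padding 1) is an assumption taken from the commonly used pix2pix/CycleGAN layout; the paper does not list the layers, and `PATCHGAN_LAYERS`, `receptive_field`, and `output_size` are hypothetical names.

```python
# Assumed (kernel, stride, padding) per convolution of a 70x70 PatchGAN.
# This follows the common pix2pix/CycleGAN layout, not a spec from the paper.
PATCHGAN_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]


def receptive_field(layers):
    """Side length of the input patch seen by one output unit."""
    rf = 1
    # Walk from the last layer back to the first.
    for kernel, stride, _padding in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


def output_size(input_size, layers):
    """Side length of the output map for a square input."""
    size = input_size
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


if __name__ == "__main__":
    print(receptive_field(PATCHGAN_LAYERS))   # 70
    print(output_size(512, PATCHGAN_LAYERS))  # 62
```

Under these assumptions, a 512 × 512 input yields a 62 × 62 map of real/generated decisions, each covering a 70 × 70 patch of the input.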
The generators and discriminators are trained using adversarial loss functions, which enable them to competitively improve the quality of generated images over time. A cycle-consistency loss function is introduced to constrain the generator networks by reconstructing the original images, ensuring consistency when transforming back to the original domain. An identity loss is employed to preserve the identity information of the original images, yielding results that remain as unchanged as possible for images that do not require transformation. Additionally, a segmentation metric, specifically the Dice coefficient, is incorporated as an additional component of the loss function. This metric compares the generated segmentation results with the ground truth segmentation, allowing the network to optimize performance for the segmentation task. Figure 2 provides a clear explanation of the composition of the loss in our improved CycleGAN.

Improved CycleGAN’s workflow diagram and the composition of its loss functions in the workflow.
B. Loss function
For the mapping function GA2B(A→B) and its discriminator DB, we express the adversarial loss as Eq. (1).
For the second mapping function GB2A(B→A), we can formulate a similar loss function, Eq. (2).
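Eqs. (1) and (2) are not reproduced in this text. Assuming the paper follows the standard CycleGAN formulation of [16], the two adversarial losses would take the following form, where a and b denote samples from domains A and B (symbols introduced here for illustration):

```latex
\begin{align}
\mathrm{Loss}_{GAN}(G_{A2B}, D_B, A, B)
  &= \mathbb{E}_{b \sim p_{data}(b)}\big[\log D_B(b)\big]
   + \mathbb{E}_{a \sim p_{data}(a)}\big[\log\big(1 - D_B(G_{A2B}(a))\big)\big] \tag{1} \\
\mathrm{Loss}_{GAN}(G_{B2A}, D_A, B, A)
  &= \mathbb{E}_{a \sim p_{data}(a)}\big[\log D_A(a)\big]
   + \mathbb{E}_{b \sim p_{data}(b)}\big[\log\big(1 - D_A(G_{B2A}(b))\big)\big] \tag{2}
\end{align}
```

Many CycleGAN implementations replace the logarithmic terms with a least-squares objective; the text does not say which variant was used here.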
Adversarial loss contributes to the model’s ability to capture more authentic image features, thereby enhancing the realism of the generated segmentation results. The generators and discriminators are trained using adversarial loss functions, which enable them to competitively improve the quality of generated images over time. CycleGAN attempts to minimize the sum of the two GAN losses.
The cycle-consistency loss LossCycle(GA2B, GB2A), given in Eq. (3), is introduced to constrain the generator networks by reconstructing the original images, ensuring consistency when transforming back to the original domain.
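Eq. (3) is not reproduced in this text. Assuming the usual L1 form of the cycle-consistency loss from [16], it would read:

```latex
\begin{equation}
\mathrm{Loss}_{Cycle}(G_{A2B}, G_{B2A})
  = \mathbb{E}_{a \sim p_{data}(a)}\big[\lVert G_{B2A}(G_{A2B}(a)) - a \rVert_1\big]
  + \mathbb{E}_{b \sim p_{data}(b)}\big[\lVert G_{A2B}(G_{B2A}(b)) - b \rVert_1\big] \tag{3}
\end{equation}
```

The first term penalizes the A→B→A round trip and the second the B→A→B round trip.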
Identity loss is employed to preserve the identity information of the original images, yielding results that remain as unchanged as possible for images that do not require transformation.
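The identity loss is not written out in this text. A common form, assumed here from the original CycleGAN work [16] and numbered (4) only by position, feeds each generator an image that is already in its target domain and penalizes any change:

```latex
\begin{equation}
\mathrm{Loss}_{Identity}(G_{A2B}, G_{B2A})
  = \mathbb{E}_{b \sim p_{data}(b)}\big[\lVert G_{A2B}(b) - b \rVert_1\big]
  + \mathbb{E}_{a \sim p_{data}(a)}\big[\lVert G_{B2A}(a) - a \rVert_1\big] \tag{4}
\end{equation}
```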
Additionally, a segmentation metric, namely the Dice coefficient, is incorporated as an additional component of the loss function. This metric compares the generated segmentation results with the ground truth segmentation, allowing the network to optimize performance for the segmentation task.
The Dice coefficient, named after Lee Raymond Dice [24], is a similarity metric commonly used to quantify the similarity between two samples, with values ranging from 0 to 1.
In the context of segmentation tasks, X and Y represent the ground truth and predicted masks, respectively, and |X| and |Y| denote their numbers of elements. Furthermore, the formula for the Dice Loss can be derived as Eq. (6).
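The two formulas referred to above are missing from this text. The standard definitions, which match the description of X, Y, and the 0 to 1 range, are:

```latex
\begin{align}
\mathrm{Dice} &= \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert} \tag{5} \\
\mathrm{Loss}_{Dice} &= 1 - \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert} \tag{6}
\end{align}
```

A perfect overlap gives Dice = 1 and a loss of 0; disjoint masks give Dice = 0 and a loss of 1.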
The CycleGAN network aims to improve the segmentation performance of the generator network by minimizing LossDice.
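To make the Dice loss concrete, here is a minimal sketch in plain Python. It uses the "soft" form in which the intersection is computed as a sum of element-wise products, so it also works for predicted probabilities. The function names and the small `eps` stabilizer are assumptions for illustration, not details from the paper.

```python
def dice_coefficient(truth, pred, eps=1e-7):
    """Soft Dice between two flattened masks with values in [0, 1]."""
    if len(truth) != len(pred):
        raise ValueError("masks must have the same number of pixels")
    # Sum of products equals |X ∩ Y| when both masks are binary.
    intersection = sum(t * p for t, p in zip(truth, pred))
    total = sum(truth) + sum(pred)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def dice_loss(truth, pred, eps=1e-7):
    """Dice loss: 0 for a perfect match, approaching 1 for no overlap."""
    return 1.0 - dice_coefficient(truth, pred, eps)


if __name__ == "__main__":
    ground_truth = [1, 1, 0, 0]
    prediction = [1, 0, 0, 0]
    print(dice_loss(ground_truth, prediction))
```

In a training loop the same expression would be written with differentiable tensor operations so that its gradient can flow back into the generator.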
The complete loss function used to train the network is the sum of the two GAN losses, the cycle-consistency loss, the identity loss, and the Dice loss. The weight factors λ, ω, and μ control the relative importance of the cycle-consistency loss, the identity loss, and the Dice loss, respectively, in the overall loss. A higher weight means that reducing the corresponding loss matters more relative to the other losses. By combining the adversarial, cycle-consistency, identity, and Dice losses, we can effectively constrain the training process of the model and enhance its segmentation performance.
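The combined objective is not written out in this text. One plausible reading, assuming the weights are applied in the order in which the losses are listed (λ for cycle consistency, ω for identity, μ for Dice), is:

```latex
\begin{equation}
\mathrm{Loss}
  = \mathrm{Loss}_{GAN}(G_{A2B}, D_B, A, B)
  + \mathrm{Loss}_{GAN}(G_{B2A}, D_A, B, A)
  + \lambda\,\mathrm{Loss}_{Cycle}
  + \omega\,\mathrm{Loss}_{Identity}
  + \mu\,\mathrm{Loss}_{Dice} \tag{7}
\end{equation}
```

The assignment of μ to the Dice term agrees with the ablation described later, where μ = 0 corresponds to the original CycleGAN without the Dice loss.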
C. Data acquisition and preprocessing
The industrial CT image datasets used in this experiment were obtained through practical industrial CT scanning and an analytical simulation method [25], yielding a series of CT images containing metal artifacts.
In the practical industrial CT scanning procedure, the high-precision cone beam CT device illustrated in Fig. 3 was employed. The scanned objects are common in industrial applications and include conical roller bearings and gears, as depicted in Fig. 4. These objects are primarily composed of bearing steel and cast steel. The operating voltage of the CT system is 150 to 300 kV, and the other operating parameters of the experiment can be found in Table 1. The projection data, consisting of 500 projections, were reconstructed using the FDK algorithm to obtain the corresponding CT images.

The cone beam CT system, where each pixel represents a physical distance of 25 μm.

The 1.5-module 18-tooth convex gear and the 42-mm-diameter cone bar bearing used in the experiment.
Operating parameters of CT system
In the simulation process, the underlying principle of the analytical simulation method is to represent the essence of the problem using simple formulas that accurately describe or approximately analyze macroscopic physical properties. Multispectral X-ray beams with energy spectrum Ω(E) can be divided into different energy regions based on their energy levels, where photons within each region possess similar energies and approximately follow the Lambert-Beer law. The material’s response to multispectral X-ray beams is an integration of the responses across all energy regions. If the X-ray spectrum and the attenuation coefficients μ_{E,S} are known, the relationship between the incident X-ray intensity I_i and the transmitted X-ray intensity I_o can be expressed as Eq. (8).
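The equation itself is missing from this text. Under the usual polychromatic Lambert-Beer model, and assuming that Ω(E) is normalized to integrate to 1 and that S denotes position along the ray path L, the relationship would read:

```latex
\begin{equation}
I_o = I_i \int \Omega(E)\,
      \exp\!\left(-\int_{L} \mu_{E,S}\,\mathrm{d}S\right) \mathrm{d}E \tag{8}
\end{equation}
```

Because low-energy photons are attenuated more strongly than high-energy ones, the effective attenuation falls as the path through metal grows, which is the origin of beam-hardening artifacts.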
In this study, the energy spectrum of an X-ray tube operating at 150 kV to 300 kV was acquired, as shown in Fig. 5. During the simulation process, the size of the object is 512 × 512 × 512, and it is predominantly made of materials such as aluminum alloy, alloy steel, and copper alloy. The simulation parameters are set as shown in Table 2. A series of projection data was obtained using Eq. (8). Through filtered backprojection reconstruction, a series of industrial CT image slices of size 512 × 512 were generated.
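The energy-region idea described above can be sketched numerically by replacing the integral over energy with a weighted sum over bins. The function below is a minimal illustration, not the simulation code used in the paper; the function names are hypothetical, and the two-bin spectrum and attenuation values in the demo are made-up numbers chosen only to show the effect.

```python
import math


def transmitted_intensity(incident, spectrum, attenuation, path_lengths):
    """Discretized polychromatic Lambert-Beer law.

    spectrum[k]        relative weight of energy bin k
    attenuation[k][m]  attenuation coefficient of material m in bin k (1/cm)
    path_lengths[m]    ray path length through material m (cm)
    """
    total_weight = sum(spectrum)
    transmitted = 0.0
    for weight, mu_row in zip(spectrum, attenuation):
        # Line integral of attenuation along the ray for this energy bin.
        line_integral = sum(mu * length for mu, length in zip(mu_row, path_lengths))
        transmitted += (weight / total_weight) * math.exp(-line_integral)
    return incident * transmitted


def effective_attenuation(spectrum, attenuation, thickness):
    """Apparent attenuation coefficient of a single-material slab."""
    ratio = transmitted_intensity(1.0, spectrum, attenuation, [thickness])
    return -math.log(ratio) / thickness


if __name__ == "__main__":
    # Illustrative values only: one soft and one hard energy bin.
    spectrum = [0.5, 0.5]
    attenuation = [[1.0], [0.2]]
    for thickness in (1.0, 5.0):
        print(thickness, effective_attenuation(spectrum, attenuation, thickness))
```

The apparent attenuation coefficient drops as the slab gets thicker, which is the beam-hardening behavior that produces cupping and streak artifacts after reconstruction.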

Energy spectrum of 150 keV to 300 keV used in the simulation.
Parameters of simulation CT system
The datasets comprise 3000 CT images, 2500 in the training set and 500 in the test set, carefully selected to cover a diverse range of industrial components with varying levels of metal artifacts.
Industrial CT images often suffer from metal artifacts, which cause uneven distribution of intensity values and degrade image quality. Metal artifacts can lead to decreased contrast and loss of details, making it challenging for segmentation algorithms to accurately identify object boundaries. To improve the performance of CycleGAN in transforming metal artifact-contaminated CT images, a pre-processing step is necessary to enhance image quality and improve the input data for better segmentation results. Adaptive local histogram equalization (AHE) [23] is an image enhancement technique that aims to improve the local contrast of an image. It divides the image into smaller regions, calculates the histogram of each region, and redistributes the intensity values according to the local histograms. By performing contrast enhancement locally, AHE can effectively enhance image details while avoiding over-enhancement and preserving the global image statistics. This technique is particularly effective in handling images with uneven illumination, varying contrast, and intensity variations caused by metal artifacts, making it suitable for enhancing industrial CT images.
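The three steps described above (divide into regions, compute each local histogram, redistribute intensities) can be sketched as follows. This is a deliberately simplified version that equalizes each tile independently; it omits the interpolation between neighbouring tiles and the contrast clipping that practical implementations use, and the tile size and function names are assumptions for illustration.

```python
def equalize_tile(tile, levels=256):
    """Histogram-equalize one tile (list of rows of ints in [0, levels))."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1
    # Cumulative distribution of intensities inside this tile.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    n_pixels = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if n_pixels == cdf_min:
        # Constant tile: there is no contrast to stretch.
        return [row[:] for row in tile]
    scale = (levels - 1) / (n_pixels - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in tile]


def tiled_equalization(image, tile_size, levels=256):
    """Apply histogram equalization independently to each square tile."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            equalized = equalize_tile(tile, levels)
            for dy, eq_row in enumerate(equalized):
                out[top + dy][left:left + len(eq_row)] = eq_row
    return out


if __name__ == "__main__":
    image = [[100, 101, 10, 10],
             [102, 103, 10, 10]]
    print(tiled_equalization(image, tile_size=2))
```

In the demo, the low-contrast left tile is stretched to the full intensity range while the constant right tile is left unchanged. For real images, a library routine such as OpenCV's `cv2.createCLAHE` (a contrast-limited variant with tile interpolation) would normally be preferred over this sketch.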
The industrial CT images undergo adaptive local histogram equalization, and the before-and-after CT images are depicted in Fig. 6. It can be observed that adaptive local histogram equalization is capable of adapting to the local characteristics of different regions within CT images. The processed images exhibit enhanced contrast, and the visibility of subtle details is improved, which is crucial for achieving accurate segmentation.

Comparison of CT images before (first row) and after (second row) AHE processing. AHE enhances the details and edges in the CT images, improving image contrast and clarity.
D. Training details
The network training process was implemented using the PyTorch toolkit, with the parameters λ set to 10, ω set to 0.5, and μ set to 5. The Adam optimizer was used with a learning rate of 0.0002. The models were trained for 200 epochs with a batch size of 1. The experimental setup included an NVIDIA GeForce RTX 3060 GPU and a 12th Gen Intel® Core™ i7-10700F 2.10 GHz CPU.
After network training is complete, the trained generator network G is used to process the simulated CT images containing metal artifacts and the real CT images to obtain segmented CT images. Figure 7 shows the segmentation results obtained by feeding the CT images from the test set into the generator network G, giving a visual impression of the algorithm’s segmentation performance.

The segmentation results of CT images in the test set and their comparison with the corresponding ground truth.
To analyze the performance of the proposed algorithm quantitatively, segmentation quality is measured by the Dice Similarity Coefficient and the Intersection over Union (IoU). The Dice coefficient is described in Section III. IoU is the ratio of the intersection of the segmentation result and the ground truth to their union, and it is used to evaluate the accuracy of segmentation algorithms. The IoU value ranges from 0 to 1, with higher values indicating better segmentation results. The calculation process of IoU is illustrated in Fig. 8.
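As a concrete illustration of the two metrics, the sketch below computes IoU and Dice for a pair of binary masks stored as nested lists. The function names are hypothetical, and returning 1.0 when both masks are empty is a convention chosen for this example rather than something stated in the paper.

```python
def _overlap_counts(truth, pred):
    """Count intersection, union, and the two mask areas of binary masks."""
    pairs = [(bool(t), bool(p))
             for t_row, p_row in zip(truth, pred)
             for t, p in zip(t_row, p_row)]
    intersection = sum(1 for t, p in pairs if t and p)
    union = sum(1 for t, p in pairs if t or p)
    truth_area = sum(1 for t, _ in pairs if t)
    pred_area = sum(1 for _, p in pairs if p)
    return intersection, union, truth_area, pred_area


def iou(truth, pred):
    """Intersection over Union of two binary masks."""
    intersection, union, _, _ = _overlap_counts(truth, pred)
    return intersection / union if union else 1.0


def dice(truth, pred):
    """Dice Similarity Coefficient of two binary masks."""
    intersection, _, truth_area, pred_area = _overlap_counts(truth, pred)
    total = truth_area + pred_area
    return 2 * intersection / total if total else 1.0


if __name__ == "__main__":
    ground_truth = [[1, 1], [0, 0]]
    prediction = [[1, 0], [1, 0]]
    print(iou(ground_truth, prediction), dice(ground_truth, prediction))
```

For a single image the two metrics are linked by IoU = Dice / (2 − Dice), so they always rank two segmentations of the same image in the same order.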

Calculation process of Intersection over Union(IoU).
In the quantitative analysis, we compared the outputs of our improved CycleGAN network with those of the original CycleGAN network and calculated the IoU and Dice coefficients for both. The improved network and the original network correspond to μ = 5 and μ = 0, respectively, and together constitute an ablation experiment that verifies the impact of adding the Dice Loss on network performance. Alongside the Dice and IoU values, the outputs were compared visually with the ground truth. The evaluation metrics are shown in Figs. 9 and 10, where the differences between each output and its ground truth are highlighted by mapping them to black and white regions.
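The difference maps and the per-setting averages used in this comparison can be sketched as below. The helper names are hypothetical; following the figure captions, disagreeing pixels are drawn black (0) and agreeing pixels white (255) by default.

```python
def difference_map(truth, pred, differ_value=0, agree_value=255):
    """Mark pixels where two binary masks disagree (black by default)."""
    return [
        [differ_value if bool(t) != bool(p) else agree_value
         for t, p in zip(t_row, p_row)]
        for t_row, p_row in zip(truth, pred)
    ]


def mean_score(scores):
    """Average a per-image metric (e.g. Dice or IoU) over an image set."""
    if not scores:
        raise ValueError("no scores to average")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    ground_truth = [[1, 1], [0, 0]]
    prediction = [[1, 0], [1, 0]]
    print(difference_map(ground_truth, prediction))
```

Running the same evaluation once on the outputs of the μ = 5 model and once on those of the μ = 0 model gives the two sets of means that the ablation compares.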

The Dice and IoU values of the improved CycleGAN network’s segmentation results for part of the CT images in the test set. The differences between the segmentation results and the ground truth are mapped to black areas. For μ = 5, the mean Dice coefficient was 0.96645 and the mean IoU was 0.93718.

The Dice and IoU values of the original CycleGAN network’s segmentation results for part of the CT images in the test set. The differences between the segmentation results and the ground truth are mapped to black areas. For μ = 0, the mean Dice coefficient was 0.9316 and the mean IoU was 0.8799.

The comparison of the proposed method with various approaches in dealing with industrial CT images without ground truth.
From the quantitative analysis, we can see that the addition of the Dice Loss significantly improves the segmentation performance of the network. In the task of direct segmentation of CT images with metal artifacts, we face multiple challenges such as image quality degradation, artifact removal, and segmentation accuracy. The combination of multiple loss functions we adopted improved the performance and robustness of the network, effectively addressing these challenges.
To compare the segmentation results, we processed a subset of industrial CT images without ground truth using our proposed method, the single-threshold Otsu method [26], the multi-threshold Otsu method [27], and the original CycleGAN network [16]. The results are shown in Fig. 11.
The improved CycleGAN network applied to industrial CT image segmentation achieves excellent visual segmentation results, preserving the overall characteristics of the CT detection targets. As illustrated in Figs. 9 and 10, the quantitative evaluation shows that the improved CycleGAN raises the mean Dice coefficient from 0.9316 to 0.96645 (about 3.5 percentage points) and the mean IoU from 0.8799 to 0.93718 (about 5.7 percentage points) relative to the original CycleGAN. The CT images produced by the improved semi-supervised segmentation algorithm are very close to the ground truth, with a mean Dice coefficient of 0.96645, indicating high-quality segmentation. The algorithm successfully preserves the structural features, boundary features, and other detailed characteristics of the detected targets.
In contrast, when different image segmentation methods are applied to actual industrial CT images without ground truth information, the results in Fig. 11 show that the single-threshold Otsu algorithm is completely inadequate for segmenting CT images containing metal artifacts. Although the multi-threshold Otsu algorithm can partially accomplish the segmentation task when the artifacts are not severe, it fails to accurately preserve the desired boundaries, and it cannot complete the segmentation task when metal artifacts are severe. The original CycleGAN network, without the introduction of image segmentation quality metrics, produces unsatisfactory segmentation results, often with redundant or missing details.
On the other hand, Fig. 11 shows that the proposed semi-supervised industrial CT image segmentation method based on the improved CycleGAN segments effectively in practical tasks, further supporting the generalization capability of the proposed approach.
It is worth noting that this method operates without the need for paired data sets, which eases the burden of data collection. Nonetheless, the reliance on a limited data set poses limitations to the generalization capability of the network. Future work should focus on expanding the data set to enhance the network’s robustness.
Additionally, exploring the integration of attention mechanisms and multi-scale information fusion modules holds promise in improving the network’s perceptual capabilities for higher segmentation quality and more accurate target detection.
Conclusion
In this study, we introduce an improved CycleGAN algorithm to achieve semi-supervised segmentation of industrial CT images with metal artifacts. In the absence of paired data, the proposed method can directly transform industrial CT images contaminated with artifacts into segmented images, something fully supervised networks cannot do without paired training data. The average values of the segmentation quality metrics were 0.96645 for the Dice coefficient and 0.93718 for the Intersection over Union (IoU), closely approaching the ground truth. Compared with other segmentation methods, the proposed approach not only eliminates the cumbersome steps of metal artifact removal and denoising, but also directly transforms metal-contaminated images into segmentation results while preserving the fine details of CT images. The method demonstrates outstanding performance in practical applications and provides valuable insights for further research.
