Research on the effectiveness of multi-view slice correction strategy based on deep learning in high pitch helical CT reconstruction

Abstract

BACKGROUND:

Recent studies have explored layered correction strategies, employing a slice-by-slice approach to mitigate the prominent limited-view artifacts present in reconstructed images from high-pitch helical CT scans. However, challenges persist in determining the angles, quantity, and sequencing of slices.

OBJECTIVE:

This study aims to explore the optimal slicing method for high pitch helical scanning 3D reconstruction. We investigate the impact of slicing angle, quantity, order, and model on correction effectiveness, aiming to offer valuable insights for the clinical application of deep learning methods.

METHODS:

In this study, we constructed and developed a series of data-driven slice correction strategies for 3D high pitch helical CT images using slice theory, and conducted extensive experiments by adjusting the order, increasing the number, and replacing the model.

RESULTS:

The experimental results indicate that indiscriminately augmenting the number of correction directions does not significantly enhance the quality of 3D reconstruction. Instead, optimal reconstruction outcomes are attained by aligning the final corrected slice direction with the observation direction.

CONCLUSIONS:

The data-driven slicing correction strategy can effectively suppress the artifacts that arise in high pitch helical scanning. Increasing the number of correction directions does not significantly improve the quality of the reconstruction results; the best reconstruction quality is achieved when the final correction angle is consistent with the observation angle.

Keywords

Computed tomography, 3D medical image restoration, slice correction strategy, multi-view slicing, experimental verification, deep learning

1 Introduction

Data-driven 3D medical image restoration presents a complex and formidable challenge, requiring the recovery of pristine images from damaged or corrupted 3D images riddled with artifacts [1–3]. In recent years, end-to-end deep learning methods have become increasingly prevalent in tasks such as segmentation, generation, denoising, and artifact removal. Nevertheless, the intricate nature and computational demands involved in building end-to-end 3D networks for 3D image restoration, coupled with hardware limitations and computational constraints, severely restrict the size of images that can be effectively processed. Therefore, the majority of deep learning models are designed for 2D domains.

However, the 3D features of medical images are of great significance for clinical diagnosis. Many diseases cannot be diagnosed accurately by relying solely on 2D slices [6]. For example, it may not be possible to fully assess the shape, size, and location of lung nodules using only 2D slices, although these properties are crucial for determining whether the nodules are benign or malignant [7, 8]. For cardiovascular diseases like aneurysms or heart valve abnormalities, 3D images are essential for evaluating the shape, diameter, and blood flow within the vessels [9]. In addition, joint diseases such as arthritis or cartilage injury also require 3D images to assess the structure and extent of damage to the joint [10]. Therefore, numerous previous studies have endeavored to transform 3D objects into 2D planes through slicing techniques and to use end-to-end networks to restore each slice individually; the restored slices are then superimposed to recover the true 3D image. However, these studies varied in their methods of slicing and rectifying 3D images. Some studies sliced the 3D volume from a single direction and recovered it layer by layer, while others sliced it from different directions separately and recovered it multiple times before superimposing the rectified 2D images.

High pitch helical CT scanning can effectively accelerate scanning and reduce radiation dose in clinical diagnosis. For this scanning method, this article aims to answer three questions through theory and experiment. Firstly, which direction of slicing yields the best results when employing single-directional slicing correction? Secondly, can multiple-directional corrections significantly enhance recovery accuracy compared to single-directional corrections? Lastly, how can 3D volume slicing and correction be optimized for maximum benefit with limited computing resources? Unlike the method of estimating the possible composition of 3D anisotropic media from multiple perspectives using statistical methods in materials science [11], our research focuses on constructing realistic and accurate 3D images through slicing methods, rather than estimating them. In this article's experiment, to cover as many damaged image scenarios found in existing medical images as possible and to fit real clinical application scenarios, we chose the helical computed tomography structure that is most widely used in clinical diagnosis. We introduced sparse damage between layers and limited-angle damage within layers by adjusting the pitch, so that the experimental images are damaged in all possible directions [12]. Building upon this foundation, we apply image post-processing techniques for image restoration [13], further validating our findings. The challenge of reconstructing 3D images affected by such scenarios has long been a formidable obstacle in clinical medical diagnosis. Restoring such 3D images through data-driven methods stands as one of today's most prominent research directions [14], which is crucial for improving the quality of medical imaging and for accurate diagnosis and treatment. Through the experimental results of this study, we hope to confirm whether a single directional correction is sufficient and whether slicing in multiple directions with one-by-one correction can significantly improve recovery accuracy. This will help to improve the methods and technologies of 3D image recovery and provide more reliable and high-quality 3D image recovery solutions for practical applications.

This article represents the inaugural exploration into slicing methods pertaining to the medical 3D image restoration process. Our principal contributions are outlined as follows:

We have established a general theoretical model for 3D volume correction using slice methods, applicable to data reconstruction and correction in 3D helical scanning scenarios.

Through extensive experimentation, we have validated the effects of correction direction, sequence, and their combinations on the improvement of correction results, providing quantitative analysis.

We have assessed the performance of CNN, GAN, and Transformer architecture networks in 3D volume correction under different pitch conditions, providing references for selecting the optimal network architecture.

For clinical practices employing high pitch and accelerated helical scanning methods, we have proposed strategies and recommendations for data reconstruction and correction to optimize image quality and reduce artifacts.

The rest of this paper is organized as follows. Section 2 reviews the relevant research on using 2D slices for 3D volume correction. In Section 3, we establish a 2D slice correction model for a 3D image and detail the possible sources of error in the correction process, as well as the experimental scenario and setup. Section 4 presents and analyzes the experimental results. Section 5 discusses the experimental results in depth. Finally, Section 6 summarizes this work and gives our suggestions.

2 Background

3D damaged images are closer to the real clinical situation, and more than half of clinical diagnoses require the use of 3D data. The restoration of 3D-damaged medical images is important for the accuracy of clinical diagnosis [15]. In previous studies, data-driven 3D feature recovery has been widely applied to recover 3D image damage caused by various scenarios such as low dose, metal artifacts, and motion artifacts. However, the computational complexity of 3D end-to-end deep learning networks is often too large, and they are limited by the memory size of graphics processors, so they can only handle small-sized 3D volumes [16]. Therefore, most research focuses on 2D images or slicing 3D objects and then using end-to-end networks for learning. Initially, to reduce computational complexity, slices along one direction are usually used for correction. For example, a classic study used a Wasserstein Generative Adversarial Network (WGAN) to reconstruct 2D computed tomography (CT) slice images from a limited number of projection images and then performed longitudinal superposition to complete 3D image reconstruction [17]. This is a general method that can be used by almost all deep learning networks with end-to-end nonlinear fitting capabilities. In addition, the Department of Mechanical Engineering at Tsinghua University constructed a 3D slicing reconstruction model (3DSR) to super-resolve and stitch 3D volumes layer by layer, improving accuracy while reducing data acquisition resources and storage space during the 3D modeling process [18].

However, research has found that there are always problems with the accuracy of using 2D slices to characterize 3D features [19], especially in scenarios where there is a significant pixel shift along the direction perpendicular to the slice. In complex textures or porous microstructures [20], single-direction correction cannot meet the accuracy requirements of 3D reconstruction [21]. Therefore, many studies have attempted to improve the inter-layer consistency during single-direction correction. For example, Patrik Kamencay et al. used SURF (Speeded-Up Robust Features) descriptor and SSD (Sum of Squared Differences) matching algorithm to improve the quality of 3D reconstruction under slice methods using the idea of inter-layer feature point matching [22], but the drawback is longer computation time. L Salvolini et al. proposed a fast multi-GPU acceleration framework for slice-to-volume reconstruction to address the problems of motion artifacts and slow algorithms in MRI imaging. The proposed method ensures high reconstruction accuracy by accurately calculating the point spread function (PSF) for each input data point [23]. K Choi et al. first used the correlation between slices for model training in the context of low-dose CT (LDCT). They constructed a self-supervision framework (Corr2Self) to train a CNN denoiser. The proposed method incorporated the inter- and intra-slice correlation of LDCT images into the self-supervision learning procedure by using adjacent slices and thicker slices, without unnecessary radiation dose transfer [24]. In addition, some simple techniques can also increase the 3D consistency of 2D slices in the perpendicular direction. For example, K Choi et al. used multiple consecutive slices as inputs in low-dose CT (LDCT) reconstruction tasks, which is essentially also a 3D reconstruction network, but significantly reduces the computational load to a certain extent [25].

Another method is to slice and correct the 3D volume from different directions in multiple stages. This method often selects orthogonal slicing angles to minimize the possible shift of features in 3D space. Although this method can further improve the accuracy of 3D feature recovery, it is not widely used due to cumbersome steps and exponentially increased recovery time. For example, JW Hayes et al. used a U-Net network to correct the 3D volume from two directions, the coronal and sagittal planes, in high-pitch clinical CT, which can simultaneously eliminate finite-angle artifacts in sections and sparse artifacts between layers [26]. In addition, in the field of fast multi-layer magnetic resonance imaging (MRI), K Kim et al. adopted a method called slice intersection motion correction (SIMC) to directly align multiple slice stacks by obtaining the matching structure of orthogonal slices and all intersecting slices, to provide a single high isotropic resolution 3D image. The registration effect of this method has significantly improved compared to previous methods [27].

However, the above studies all assume the validity of slicing and have not explored the impact of slicing methods or angle selection on 3D volume recovery from a theoretical or experimental perspective. To our knowledge, our study is the first to attempt to investigate the effectiveness of single-direction and multi-direction slicing and their effects on 3D volume accuracy recovery.

3 Methods

3.1 2D slicing methods for high pitch helical CT volume correction

Most image translation work directly targeted at 3D images uses 3D convolution operations in the network encoding stage, which requires a large amount of memory, so it can only handle very small-sized 3D volumes. Therefore, the current approach is to correct damaged 3D images by slicing them layer by layer. Taking the 3D body structure of the chest and abdomen of the human body as an example, the slicing method along the sagittal plane (x-axis) is shown in Fig. 1(b). In addition, there are also common slicing methods along two mutually perpendicular coronal planes as shown in Fig. 1(c) and (d). The above process can be generalized to the general form of slicing 3D images at any angle. Given the angle θ, a unit vector ν = [cos (θ) , sin(θ) , 0] can be defined to represent the direction of slicing, and then the 3D volume is sliced along this direction. The process of slicing inside a rectangular prism P with a side length of d and a height of h can be expressed as follows:

Fig. 1

Three reconstruction slicing methods for 3D volumes.

$P_{θ} (x, y, z) = \int_{- \frac{d}{2}}^{\frac{d}{2}} f (x - t \cos (θ), y - t \sin (θ), z) \, dt, \quad z \in [0, h],$ (1) where P_θ (x, y, z) represents the slice formed by cutting the 3D volume described by the function f at the coordinate (x, y, z) and at a given angle θ.

To describe the slicing process more generally, let us assume that we have a plane $S$ with which we want to slice a 3D volume structure. This description allows us to generalize the slicing process to any angle and plane. Suppose that the parameterization of the plane $S$ is r (u, v) = [x (u, v) , y (u, v) , z (u, v)], where u and v are parameters, so that any point on the plane $S$ can be described using u and v. The slicing process can then be represented as: $P_{θ} (x', y', z') = \iint_{S} f (x (u, v), y (u, v), z (u, v)) \left( \frac{\partial x}{\partial u} \cos (θ) + \frac{\partial y}{\partial u} \sin (θ) \right) du \, dv.$ (2)
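For the axis-aligned cases used in the experiments (slicing along the x-, y-, or z-axis), the slicing and re-stacking steps can be illustrated with a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the helper names `slice_volume` and `stack_slices` are hypothetical, the volume is random toy data, and slicing at an arbitrary angle θ (which would require interpolation) is not covered.

```python
import numpy as np


def slice_volume(volume, axis):
    """Cut a 3D array into a list of 2D slices perpendicular to `axis`."""
    return [np.take(volume, i, axis=axis) for i in range(volume.shape[axis])]


def stack_slices(slices, axis):
    """Re-assemble a 3D array from 2D slices that were cut along `axis`."""
    return np.stack(slices, axis=axis)


# Toy volume standing in for a reconstructed CT volume (assumption).
volume = np.random.default_rng(0).random((4, 5, 6))

for axis in range(3):
    slices = slice_volume(volume, axis)
    restored = stack_slices(slices, axis)
    # Slicing followed by stacking along the same axis is lossless.
    print(axis, len(slices), slices[0].shape, np.array_equal(restored, volume))
```

Because slicing and stacking along the same axis are exact inverses, any difference between the input and output volume in a slice-based pipeline comes only from the per-slice correction step.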

3.2 Deep learning to correct 2D slices

The deep learning method that emerged in recent years abandoned the idea of accurately solving the analytical methods from a mathematical perspective but instead constructed a nonlinear space to realize image-to-image transformation. In theory, with sufficient training parameters, the deep learning model can be trained to approximate any complex space, thus achieving damaged image restoration. Especially in the field of repairing damaged medical images, this process can also be vividly referred to as “image translation”.

Assume that we have a 3D volume V and a tangent plane f (x, y, z). By moving through the 3D volume V along the tangent plane f (x, y, z) and cutting, we can obtain a series of 2D slices. Each slice can be parameterized as Γ (s, t), where s is the parameter on the tangent plane, and t is the positional parameter of the slice on that plane.

The entire 3D volume V can be represented as the accumulation of slices, that is: $V = \nabla f (x, y, z) \int Γ (s, t) \, ds \, dt,$ (3) where ds and dt are the differentials of the parameter on the tangent plane and of the positional parameter of the slice, respectively. ∇f (x, y, z) is a calculus operation on a slice, and Γ (s, t) describes the positional relationship of slices on the tangent plane. This formula therefore transforms the 3D geometric problem into correction problems on 2D slices.

If we denote a deep learning network model as $S$, Eq. (4) represents the restoration of a damaged image.

$S ɛ = Z,$ (4) where ɛ denotes the damaged 2D image slices, i.e., $ɛ = \sum_{i = 1}^{T} ɛ_{i}$, ɛ_i = f_i (x, y, z) is the slice image of the i-th layer, and Z is the ground-truth image.

The loss of applying a deep learning network to each slice in each direction can be expressed as: $E (x) = \min_{S} \sum_{j = 1}^{M} {‖ Z^{(j)} - S ɛ^{(j)} ‖}^{2},$ (5) where x is a vector containing all correction parameters. This error function measures the correction error of each slice in a specific direction. The goal of the deep learning network is to find an optimal correction parameter vector x^* that minimizes the error function E (x): $x^{*} = \arg\min_{x} E (x).$ (6) We can expand the error function E (x) into the sum of the errors for each slice: $E (x) = \sum_{i = 1}^{N} E_{i} (x),$ (7) where N is the total number of slices, and E_i (x) is the error function of the i-th slice.

When considering multiple directions, we can add up the error functions in each direction and then minimize the overall error function. Assume that there are D directions with correction parameter vectors {x₁, x₂, …, x_D}, where each x_d is the correction parameter vector for the corresponding direction d. The overall error function can be expressed as: $E (x_{1}, x_{2}, \dots, x_{D}) = \sum_{d = 1}^{D} \sum_{i = 1}^{N_{d}} E_{d, i} (x_{d}),$ (8) where N_d is the number of slices in the d-th direction, and E_{d,i} (x_d) is the error function of the i-th slice in the d-th direction.

Therefore, our goal is to find the optimal set of correction parameter vectors $\{x_{1}^{*}, x_{2}^{*}, \dots, x_{D}^{*}\}$ that minimizes the overall error function E (x₁, x₂, …, x_D).

This problem can be formalized as a multivariate function optimization problem: $(x_{1}^{*}, x_{2}^{*}, \dots, x_{D}^{*}) = \arg\min E (x_{1}, x_{2}, \dots, x_{D}).$ (9)

This single network objective function can be described as follows: $x^{*} = \min_{S} \sum_{j = 1}^{M} {‖ Z^{(j)} - S ɛ^{(j)} ‖}^{2} .$ (10)
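The multi-directional scheme described above can be sketched as a sequence of per-direction corrections followed by an error that is summed over every slice of every direction, in the spirit of Eq. (8). The code below is only an illustration under stated assumptions: `correct_along`, `correct_sequence`, and `total_error` are hypothetical helper names, and the "corrector" is a trivial stand-in function rather than a trained network.

```python
import numpy as np


def correct_along(volume, axis, corrector):
    """Slice `volume` along `axis`, apply `corrector` to each 2D slice, restack."""
    slices = [corrector(np.take(volume, i, axis=axis))
              for i in range(volume.shape[axis])]
    return np.stack(slices, axis=axis)


def correct_sequence(volume, axes, correctors):
    """Apply per-direction correctors one after another, e.g. axes=(0, 1)."""
    for axis in axes:
        volume = correct_along(volume, axis, correctors[axis])
    return volume


def total_error(corrected, truth, axes):
    """Sum of squared slice errors over all slices of all listed directions."""
    total = 0.0
    for axis in axes:
        for i in range(truth.shape[axis]):
            diff = np.take(corrected, i, axis=axis) - np.take(truth, i, axis=axis)
            total += float(np.sum(diff ** 2))
    return total


# Stand-in corrector (assumption): halves the residual towards zero.
halve = {axis: (lambda s: 0.5 * s) for axis in range(3)}

truth = np.zeros((2, 3, 4))
damaged = np.ones((2, 3, 4))
after_xy = correct_sequence(damaged, (0, 1), halve)   # X then Y
print(total_error(damaged, truth, (0, 1, 2)))
print(total_error(after_xy, truth, (0, 1, 2)))
```

Note that each direction covers every voxel once, so summing over D directions counts each voxel error D times; this is a property of the summed objective, not of any particular network.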

To verify the influence of the correction angle on the recovery of the 3D volume, we used a variety of classic deep learning models from the field of computer vision. A brief introduction to the models and the experimental setup follows:

U-Net is a renowned neural network architecture featuring an encoder-decoder structure that is extensively employed in tasks such as image segmentation and restoration [28]. The U-Net used in the experiment is a classic five-layer structure with a maximum downsampling rate of 16.

The GAN method is widely used in restoring damaged medical images due to its ability to adaptively generate complex images by refining the loss function during training. Among these methods, pix2pixGAN [29] is notable. It learns the mapping between input and target images by training both the generator and discriminator simultaneously. ResNet-9 was used as its generator in the experiment, while PatchGAN was used as the discriminator.

CycleGAN [30] is an image conversion model that can convert between different domains, such as converting horse images to zebra images. Unlike pix2pixGAN, CycleGAN does not require paired training data as input.

Swin-Unet [31] is an important work in the field of image segmentation, which combines the characteristics and advantages of Swin Transformer and U-Net. Compared to traditional convolutional neural networks, it has excellent modeling capabilities and effective processing of long-distance dependencies. The code used in the experiment can be found in reference [31].

3.3 Experiment setup

In the experiment, we wrote the program in Python, implemented all deep learning methods using PyTorch, and ran them on a computer equipped with an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40 GHz and eight NVIDIA RTX 3090 GPUs. The overall flow of the experiment is outlined in Fig. 2.

Fig. 2

Experimental steps and end-to-end network architecture diagram for 3D volume slice validation.

3.3.1 Experimental scenario and method

In theory, precise reconstruction with the 3D helical FDK algorithm requires satisfying two data completeness conditions: (1) the scanning angle interval must be sufficiently small, and the rays must be dense enough to prevent sparse angle artifacts; (2) the scanning trajectory needs to meet the Tuy condition to prevent helical artifacts between layers and limited-angle artifacts within layers. Only by meeting these conditions can it be ensured that the rays adequately irradiate all voxels and achieve accurate reconstruction. Reference [26] has already demonstrated that in helical scanning trajectories, as shown in Fig. 3, ensuring a standard pitch P < 1.375 satisfies the data completeness condition. In practice, some CT vendors have also introduced data extrapolation schemes to extend the pitch to P = 1.5. To fully demonstrate the correction effect of end-to-end deep learning algorithms, we selected scanning conditions with pitches of 2, 3, and 4, as shown in Fig. 4.

Fig. 3

Scanning schematic diagram of different pitches. When the standard pitch is less than 1, the scanning trajectory rotates one revolution and is still within the detector plane range. Otherwise, it will cause the projection data to be missing along the detector direction.

Fig. 4

Reconstruction results of FDK algorithm for standard helical trajectories under different pitches: (a-4), (b-4), (c-4), and their cross-sections (a-2), (b-2), and (c-2); Schematic diagram of coronal plane (a-3) (b-3) (b-4). As the pitch increases, the severity of damage to the 3D reconstruction results and slices in all directions gradually increases, and the training difficulty faced by the end-to-end model also gradually increases. The display windows are [–1000 1000] HU.

3.3.2 Datasets

To match the 3D nature of this study, we used the AbdomenCT-1K dataset, a large-scale abdominal CT dataset that includes 1112 CT scans for the segmentation of four abdominal organs: the liver, kidney, spleen, and pancreas. To facilitate the training of end-to-end deep learning models, we first proportionally scaled all 2D slice data along the x-axis from the original size of 512 × 512 to 288 × 288. Then, we symmetrically padded the volumes with fewer than 512 images along the y-axis and applied center-symmetric cropping to those exceeding 512. Finally, all 3D volumes have a size of 288 × 288 × 512, and the slice sizes along the x-axis, y-axis, and z-axis are 288 × 288, 512 × 512, and 512 × 512, respectively.
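The symmetric padding and center cropping step can be sketched as follows. This is an illustrative sketch only: the function name `pad_or_crop` is hypothetical, the fill value used for padding is not stated in the text and is assumed here to be a constant (0 by default), and the in-plane rescaling from 512 × 512 to 288 × 288 is not shown.

```python
import numpy as np


def pad_or_crop(volume, target, axis, fill=0.0):
    """Symmetrically pad or center-crop `volume` along `axis` to length `target`.

    `fill` is an assumed constant padding value; the source does not specify it.
    """
    n = volume.shape[axis]
    if n < target:
        before = (target - n) // 2
        pad = [(0, 0)] * volume.ndim
        pad[axis] = (before, target - n - before)
        return np.pad(volume, pad, mode="constant", constant_values=fill)
    start = (n - target) // 2
    index = [slice(None)] * volume.ndim
    index[axis] = slice(start, start + target)
    return volume[tuple(index)]


# Two toy cases: one volume shorter and one longer than the target length.
short_case = np.ones((288, 288, 300), dtype=np.float32)
long_case = np.ones((288, 288, 600), dtype=np.float32)
print(pad_or_crop(short_case, 512, axis=2).shape)
print(pad_or_crop(long_case, 512, axis=2).shape)
```

For CT data stored in Hounsfield units, a fill value corresponding to air may be a more natural choice than 0, which is why `fill` is exposed as a parameter.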

We randomly selected 1062 cases and obtained 174,540, 305,856, and 305,856 2D slices along the x, y, and z axes, respectively, using the slicing method. The training set, test set, and validation set were divided in a ratio of 8:1:1. Finally, we implemented the U-net, pix2pixGAN, and Swin-Unet methods in the PyTorch framework. The training, validation, and testing of the models were performed on an Intel Core i7-7700 (3.60 GHz) central processing unit (CPU) and eight RTX 3090 graphics processing units (GPUs) with 24 GB of video memory. To fully verify the effectiveness of multi-angle 2D slices for 3D feature recovery, we designed five sets of controlled experiments covering different correction directions, correction orders, increased correction dimensions, different deep learning methods, and different levels of correction difficulty. The data involved in the training reached 9.8 TB, and all model training lasted for 10 weeks and 3 days.

3.3.3 Selection of Evaluation Indicators

When evaluating the results of 3D large pitch CT reconstruction, we used a series of objective evaluation indicators, including SSIM (structural similarity index), PSNR (peak signal-to-noise ratio), as well as 3D SSIM and 3D PSNR. These indicators can provide a comprehensive evaluation, particularly suitable for our scenario of using end-to-end deep learning data-driven methods for reconstruction, while comparing the effects of X, Y, Z, and slices in different directions.

Firstly, we use the Structural Similarity Index (SSIM) to measure the structural similarity between the reconstructed image and the original image. The calculation formula for SSIM is as follows: $SSIM (x, y) = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{xy} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})},$ (11) where x and y represent the reconstructed image and the original image, respectively, μ_x and μ_y represent their mean values, σ_x² and σ_y² represent their variances, σ_xy represents their covariance, and C₁ and C₂ are constants that stabilize the division when the denominator is close to zero.
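Eq. (11) can be evaluated directly from global image statistics, as in the sketch below. Several points are assumptions rather than details from this study: the function name `ssim_global` is hypothetical, the constants follow the commonly used convention C₁ = (k₁L)² and C₂ = (k₂L)² with k₁ = 0.01 and k₂ = 0.03, and the statistics are computed over the whole array, whereas practical SSIM implementations usually average the index over local windows. Because the function accepts arrays of any dimension, the same code gives a simple 3D analogue.

```python
import numpy as np


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM of Eq. (11) using global statistics (no sliding window)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2          # assumed stabilizing constants
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(0)
reference = rng.random((8, 8, 8))              # toy 3D "ground truth"
noisy = reference + 0.1 * rng.standard_normal(reference.shape)
print(ssim_global(reference, reference))
print(ssim_global(noisy, reference))
```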

Secondly, we use the Peak Signal-to-Noise Ratio (PSNR) to quantify the signal-to-noise ratio between the reconstructed image and the original image. The calculation formula for PSNR is as follows: $PSNR = 10 \cdot \log_{10} \left( \frac{L^{2}}{MSE} \right),$ (12) where L represents the maximum pixel value and MSE stands for the Mean Squared Error, calculated as the sum of the squared pixel differences between the reconstructed image and the original image, divided by the total number of pixels. Two additional metrics were also considered: 3D SSIM and 3D PSNR. They are calculated analogously to their 2D counterparts, with the statistics taken over all three dimensions of the 3D volume.
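As a worked illustration of Eq. (12), the sketch below computes MSE, RMSE, and PSNR for arrays of any dimension, so the same function serves for a 2D slice or a whole 3D volume. The function names and the default `max_value=1.0` (that is, intensities normalized to [0, 1]) are assumptions made for the example and are not taken from the experimental setup.

```python
import numpy as np


def mse(x, y):
    """Mean squared error over all pixels (2D) or voxels (3D)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))


def rmse(x, y):
    """Root mean squared error."""
    return float(np.sqrt(mse(x, y)))


def psnr(x, y, max_value=1.0):
    """PSNR in dB as in Eq. (12); returns infinity for identical inputs."""
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))


truth = np.zeros((4, 4, 4))
recon = np.full((4, 4, 4), 0.1)     # constant error of 0.1 everywhere
print(rmse(recon, truth))           # about 0.1
print(psnr(recon, truth))           # about 20 dB, since MSE = 0.01
```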

4 Results

4.1 Results on single directional correction

Table 1 and Fig. 5 depict the results of slice images corrected from the cross-sectional, coronal, and sagittal directions using the pix2pixGAN model, with a pitch of 2. The results clearly illustrate the ability of end-to-end data-driven models to effectively restore original images from damaged 3D high-pitch scenarios, with outcomes closely approximating the ground-truth slice images.

Traditional analytical 3D FDK algorithms exhibit significant limited-angle artifacts within cross-sectional planes, manifested as extensive data voids and arc-like artifacts, while coronal and sagittal planes present severe helical artifacts, hindering discernment of internal structural textures. In contrast, all slices corrected using the Pix2pixGAN model show no apparent artifacts or errors. Examination of Error maps in Fig. 5 reveals the error distribution across slice directions. As per the objective evaluation metrics in Table 1, results corrected along directions perpendicular to the final observed slice direction demonstrate optimal performance, while slices in other directions exhibit some small, dense, vertical, or horizontal stripe artifacts, as indicated by the red arrows in Fig. 5. Moreover, Table 1 indicates minimal discrepancies in the 3D reconstruction results along various correction directions.

Fig. 5

Slices along different axes, corrected using the FDK (II) and pix2pixGAN algorithm with a pitch of (p = 2), are presented. Slice (a) along the x-axis of the cross-section and its corresponding error map (a1), slice (b) along the y-axis of the coronal plane and its error map (b1), and slice (c) along the z-axis of the sagittal plane and its error map (c1) are depicted. The window size is set to [–1000, 1000].

Table 1

The quantitative evaluation index for image slices corrected in different directions and orders using the pix2pixGAN model with a pitch of 2. Bold font represents the maximum value in a column

Slices	FDK			X			Y			Z
	RMSE	PSNR	SSIM	RMSE	PSNR	SSIM	RMSE	PSNR	SSIM	RMSE	PSNR	SSIM
x-slices	0.1219	25.3483	0.7294	0.0091	37.9223	0.9751	0.0150	36.1752	0.9683	0.0163	37.2323	0.9720
y-slices	0.0918	23.9458	0.6401	0.0124	34.3742	0.9345	0.0119	38.9822	0.9778	0.0121	34.2534	0.9423
z-slices	0.1126	24.8732	0.7043	0.0136	35.7323	0.9487	0.0131	37.9834	0.9570	0.0110	39.3154	0.9716
3D	0.1074	23.9763	0.7213	0.0109	37.2854	0.9703	0.0133	37.1984	0.9643	0.0132	36.0201	0.9699

4.2 Results on multi-directional correction

Figure 6 and Table 2 elucidate the reconstruction outcomes of the pix2pixGAN model under a pitch of 2, following the augmentation of correction angles and adjustment of correction sequences. Relative to the 3D FDK model, any sequence or quantity of corrections significantly enhances the quality of resulting images, effectively mitigating circular artifacts in cross-sectional planes and helical artifacts in coronal and sagittal planes. Objective evaluation metrics in Table 2 indicate a minor improvement in 3D image quality with increased correction angles, albeit not substantial. However, noteworthy is the observation, as indicated by the red arrows in the Error Map of Fig. 6, that augmenting correction directions not only fails to noticeably reduce image errors but may introduce artifacts along other directions, leading to a further increase in overall error, a phenomenon corroborated by the objective evaluation metrics in Table 2.

Fig. 6

Results obtained using the FDK (II) and pix2pixGAN algorithm with a pitch of (p = 2) have been presented with corrections along the x-axis as follows: (III) corrected along the x-axis, (IV) x-axis before y-axis, (V) x-axis before z-axis, and (VI) in the order of x, y, and z-axis. The results are depicted along the x-axis of the cross-section (a) and its corresponding error map (a1), along the y-axis of the coronal plane (b) and its error map (b1), and along the z-axis of the sagittal plane (c) and its error map (c1). The window size is set to [–1000, 1000].

Table 2

The quantitative evaluation index for image slices corrected in different directions and orders using the pix2pixGAN model with a pitch of 2, (PSNR/SSIM). The column directions represent slicing the 3D volume along different directions and then correcting it. The row direction represents the objective evaluation indicators of slicing the corrected results along the cross-sectional (x-slice), coronal (y-slices), and sagittal (z-slices) planes. 3D represents the evaluation results using 3D objective evaluation indicators. Bold font represents the maximum value in a column

Slices	X⇒Y	Y⇒X	Y⇒Z	Z⇒Y	X⇒Z	Z⇒X	X⇒Y⇒Z	X⇒Z⇒Y
x-slices	36.7834/0.9731	37.7223/0.9751	37.1752/0.9683	37.2323/0.9620	36.8321/0.9762	38.8872/0.9763	36.9821/0.9732	37.8234/0.9740
y-slices	37.9458/0.9843	36.3742/0.9645	36.9822/0.9678	38.9832/0.9763	35.8019/0.9593	37.8321/0.9634	37.8732/0.9701	38.9834/0.9737
z-slices	35.5732/0.9543	36.9832/0.9692	38.1401/0.9818	36.3123/0.9870	38.7213/0.9831	37.9328/0.9731	38.7812/0.9713	37.6384/0.9794
3D	36.9233/0.9793	37.1854/0.9763	37.0984/0.9719	37.0201/0.9792	36.8741/0.9800	38.2714/0.9762	38.7651/0.9803	38.2145/0.9832

Furthermore, the 3D reconstruction outcomes following correction sequence permutations, as depicted in Table 2, exhibit negligible discrepancies. However, upon closer examination from a columnar perspective, it becomes apparent that optimal results are generally achieved when the final corrected slice aligns with the observed direction.
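The observation about the final correction direction can be checked against the PSNR values reported in Table 2. The short script below copies the PSNR entries of the three slice rows and, for each observation direction, finds the correction order with the highest PSNR. Only PSNR is examined here; the ASCII ">" in the order labels stands for the "⇒" of the table, and the variable and function names are illustrative.

```python
# PSNR values copied from Table 2 (pix2pixGAN, pitch 2).
orders = ["X>Y", "Y>X", "Y>Z", "Z>Y", "X>Z", "Z>X", "X>Y>Z", "X>Z>Y"]
psnr_table = {
    "x": [36.7834, 37.7223, 37.1752, 37.2323, 36.8321, 38.8872, 36.9821, 37.8234],
    "y": [37.9458, 36.3742, 36.9822, 38.9832, 35.8019, 37.8321, 37.8732, 38.9834],
    "z": [35.5732, 36.9832, 38.1401, 36.3123, 38.7213, 37.9328, 38.7812, 37.6384],
}


def best_order(observed):
    """Correction order with the highest PSNR for the observed slice direction."""
    values = psnr_table[observed]
    return orders[values.index(max(values))]


for observed in psnr_table:
    order = best_order(observed)
    final_axis = order.split(">")[-1].lower()
    # Reports whether the last corrected axis matches the observed direction.
    print(observed, order, final_axis == observed)
```

For all three observation directions, the order with the highest PSNR ends with a correction along that same direction (Z⇒X for x-slices, X⇒Z⇒Y for y-slices, and X⇒Y⇒Z for z-slices).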

4.3 Results of different models and pitches

Table 3 illustrates the results of correction along the cross-sectional slice (x-axis) by different models at various pitches. With increasing pitch, all models exhibit a certain degree of quality deterioration, especially evident in the traditional CNN-based Unet model, which shows the lowest reconstruction quality at a pitch of 4. In contrast, the pix2pixGAN model demonstrates consistently superior reconstruction quality across pitches of 2 and 3, surpassing the unsupervised CycleGAN model. Additionally, although the Transformer-based Swin-Unet model does not consistently achieve the best reconstruction quality across all pitch scenarios, its reconstruction results exhibit considerable stability across varying pitches, without significant degradation as pitch increases (implying increased network training difficulty). Furthermore, as evident from Table 3, the reconstruction results along the cross-sectional slice (x-axis) direction remain optimal across all directional slices.
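The stability claim can be made concrete by computing, for each model, the drop in 3D PSNR between a pitch of 2 and a pitch of 4. The values below are copied from the 3D PSNR row of Table 3; the variable names are illustrative, and the drop in dB is just one simple way to summarize stability.

```python
# 3D PSNR values from Table 3, ordered as (P = 2, P = 3, P = 4).
psnr_3d = {
    "U-net": (37.0017, 35.8742, 33.0271),
    "Pix2pixGAN": (38.8913, 35.8732, 34.1501),
    "CycleGAN": (37.9831, 35.9822, 33.9717),
    "Swin-Unet": (37.5735, 36.9714, 36.3045),
}

# PSNR lost when the pitch increases from 2 to 4, in dB.
drop = {model: round(values[0] - values[2], 4) for model, values in psnr_3d.items()}
most_stable = min(drop, key=drop.get)

for model, value in drop.items():
    print(f"{model}: {value:.4f} dB")
print("smallest drop:", most_stable)
```

With these numbers, Swin-Unet loses about 1.27 dB from P = 2 to P = 4, while the other three models lose roughly 4 to 4.7 dB.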

Table 3
Quantitative evaluation indicators for image slices corrected from the cross-sectional (x-axis) direction using different models with pitches of 2, 3, and 4, respectively. Bold font represents the maximum value in a row

Model		U-net	U-net	U-net	Pix2pixGAN	Pix2pixGAN	Pix2pixGAN	CycleGAN	CycleGAN	CycleGAN	Swin-Unet	Swin-Unet	Swin-Unet
Pitch (P)		P = 2	P = 3	P = 4	P = 2	P = 3	P = 4	P = 2	P = 3	P = 4	P = 2	P = 3	P = 4
x	PSNR	37.9525	35.9102	33.8450	38.4223	34.3742	33.2854	37.1752	35.9822	34.8984	37.9598	36.9872	36.3902
	SSIM	0.9576	0.9674	0.9352	0.9762	0.9654	0.9462	0.9752	0.9576	0.9300	0.9893	0.9765	0.9601
y	PSNR	36.3458	35.8724	33.7256	37.9852	36.0981	34.7653	36.8500	34.9751	33.4874	37.4871	37.0821	35.8183
	SSIM	0.9591	0.9601	0.9483	0.9683	0.9582	0.9501	0.9693	0.9621	0.9543	0.9633	0.9601	0.9576
z	PSNR	35.8732	33.0725	31.2511	37.9184	37.2874	35.0194	37.0934	35.8992	34.9724	37.9411	36.0183	35.9663
	SSIM	0.9702	0.9581	0.9491	0.9721	0.9682	0.9427	0.9774	0.9573	0.9401	0.9739	0.9651	0.9510
3D	PSNR	37.0017	35.8742	33.0271	38.8913	35.8732	34.1501	37.9831	35.9822	33.9717	37.5735	36.9714	36.3045
	SSIM	0.9701	0.9611	0.9453	0.9748	0.9593	0.9463	0.9701	0.9534	0.9489	0.9671	0.9695	0.9587
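As a quick arithmetic check on the stability observation, the 3D PSNR values reported in Table 3 can be compared between pitch 2 and pitch 4. The snippet below only restates numbers from the table; the variable names are illustrative.

```python
# 3D PSNR (dB) from Table 3 at pitches 2, 3, and 4 for each model.
psnr_3d = {
    "U-net": (37.0017, 35.8742, 33.0271),
    "Pix2pixGAN": (38.8913, 35.8732, 34.1501),
    "CycleGAN": (37.9831, 35.9822, 33.9717),
    "Swin-Unet": (37.5735, 36.9714, 36.3045),
}

# Drop in 3D PSNR when the pitch increases from 2 to 4.
drops = {model: round(values[0] - values[-1], 4)
         for model, values in psnr_3d.items()}
most_stable = min(drops, key=drops.get)

for model, drop in drops.items():
    print(f"{model}: {drop:.4f} dB")
print("Smallest drop:", most_stable)
```

On these values the drop is 3.9746 dB for U-net, 4.7412 dB for Pix2pixGAN, 4.0114 dB for CycleGAN, and 1.2690 dB for Swin-Unet, so Swin-Unet loses the least 3D PSNR between pitch 2 and pitch 4.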

Figure 7 presents the objective evaluation metrics of different models as correction directions are added at a pitch of 4. Notably, increasing the number of correction directions does not universally improve reconstruction quality, particularly for traditional CNN and GAN architectures. The results indicate that the Swin-Unet model achieves consistently superior reconstruction performance on average, with the most stable outcomes, particularly in the challenging pitch-4 scenario.

Fig. 7

Quantitative evaluation index curves of image slices corrected along the X, X⇒Y and X⇒Y⇒Z directions using different models in the pitch 4 scenario.

5 Discussions

Recovering data from high-pitch helical CT scans, where data loss is inherent in 3D space, presents considerable challenges in clinical applications. Our research focuses on investigating the predominant method for restoring compromised 3D reconstruction outcomes: slice correction. Our objective is to identify the correction approach that can offer the most advantageous outcomes.

The widely used 3D FDK analytical algorithm relies on global filtering and back-projection, so even minor data loss is amplified during back-projection, producing prominent artifacts such as inter-slice helical artifacts and intra-slice limited-angle artifacts. The results degrade further as the pitch increases. This is because the FDK filter is a high-pass filter: the sharp discontinuities that missing data introduce into the projections are enhanced by filtering, which further increases the impact of artifacts on the reconstruction results. Nonetheless, its simplicity and efficiency make it a natural first step in end-to-end data-driven approaches. In the subsequent stage, we selected slices along three different directions and explored various permutations of correction sequences, encompassing both single-directional and multi-directional corrections. Our objective was to investigate whether the choice of correction direction or an increase in the number of correction directions could effectively enhance correction accuracy. To this end, we conducted experiments covering a wide array of mainstream 2D deep learning network architectures. Both objective and subjective evaluations demonstrate significant improvements in reconstruction quality when employing the slice correction method compared to 3D FDK.
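The high-pass amplification of missing data described above can be illustrated on a single synthetic projection row. The sketch below is not the FDK implementation used in the study: it applies an idealized ramp filter through the FFT, and the Gaussian profile, the position and width of the zeroed gap, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

def ramp_filter(row):
    """Apply an ideal ramp (|frequency|) filter to a 1D projection row via FFT."""
    freqs = np.fft.fftfreq(row.shape[0])
    return np.real(np.fft.ifft(np.fft.fft(row) * np.abs(freqs)))

n = 256
t = np.linspace(-1.0, 1.0, n)
smooth = np.exp(-(t / 0.4) ** 2)   # synthetic, smooth projection profile
gapped = smooth.copy()
gapped[120:136] = 0.0              # hypothetical block of missing detector data

filtered_smooth = ramp_filter(smooth)
extra = ramp_filter(gapped) - filtered_smooth  # contribution of the gap alone

# Where the gap's filtered response peaks, and how large it is relative to
# the filtered response of the intact profile.
peak_index = int(np.argmax(np.abs(extra)))
amplification = float(np.max(np.abs(extra)) / np.max(np.abs(filtered_smooth)))
print(peak_index, round(amplification, 1))
```

In this toy setting the filtered response of the gap is concentrated at its edges and is many times larger than the filtered response of the intact profile, which is the mechanism by which a small amount of missing data can dominate the back-projected result.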

Indeed, if a model had sufficiently strong nonlinear fitting capability, so that each slice corrected by the slice method closely resembled the corresponding label slice, correction from a second direction would be unnecessary regardless of the direction along which artifacts corrupt the 3D volume. However, the nonlinear fitting capability of any end-to-end deep learning model is inherently limited, especially in challenging scenarios such as 3D high-pitch reconstruction, and artifacts may persist along certain directions even after model training. Hence, research into correction directions, sequences, and model performance becomes crucial. Through experimentation, we observed that the correction sequence does not significantly influence the results for different models. Furthermore, increasing the number of corrections does not notably enhance 3D reconstruction quality, even in severely compromised scenarios, as illustrated in Fig. 5. One potential explanation is that models prioritize different aspects of recovery under different correction directions. For instance, during recovery along the cross-sectional plane, a model may reduce errors within the plane but overlook the coherence between adjacent cross-sectional planes. This oversight leads to severe artifacts along the longitudinal axis of the coronal plane, as indicated by the red arrows in Fig. 5 and Fig. 6, and poses challenges for subsequent corrections in other directions. Similarly, during subsequent corrections in other directions, even if a network model can reduce image errors within a specific plane to a sufficiently low level, it cannot guarantee coherence across other directions. It may even reintroduce artifacts from previous corrections, which explains why the objective evaluation metrics of 3D reconstruction, as shown in Table 2, do not significantly improve with an increase in correction directions.
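The sequential slice-correction procedure discussed above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the mapping of X, Y, Z to array axes, the function names, and the use of one callable for every direction are assumptions, and `box_smooth` is merely a stand-in for a trained 2D network.

```python
import numpy as np
from itertools import permutations

AXES = {"X": 0, "Y": 1, "Z": 2}  # assumed mapping of slice direction to axis

def correct_along(volume, axis, model_2d):
    """Slice `volume` along `axis`, apply `model_2d` to each slice, restack."""
    slices = np.moveaxis(volume, axis, 0)
    corrected = np.stack([model_2d(s) for s in slices], axis=0)
    return np.moveaxis(corrected, 0, axis)

def correct_sequence(volume, order, model_2d):
    """Apply slice correction along each direction in `order`, e.g. "XYZ"."""
    out = np.asarray(volume, dtype=float)
    for name in order:
        out = correct_along(out, AXES[name], model_2d)
    return out

def box_smooth(img):
    """Stand-in for a trained 2D network: 3x3 mean filter with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w]
               for i in range(3) for j in range(3)) / 9.0

# Enumerate single-, two-, and three-direction correction orders.
volume = np.random.default_rng(0).random((8, 9, 10))
orders = ["".join(p) for k in (1, 2, 3) for p in permutations("XYZ", k)]
results = {order: correct_sequence(volume, order, box_smooth)
           for order in orders}

# A volume that differs only between x-slices is left untouched by correction
# along X, because a 2D model sees each slice in isolation.
stack = np.arange(8.0)[:, None, None] * np.ones((8, 9, 10))
unchanged = np.allclose(correct_along(stack, 0, box_smooth), stack)
print(len(orders), unchanged)
```

The last check makes the coherence argument concrete: an operator that acts within each cross-sectional slice cannot alter how adjacent slices relate to one another, so inconsistencies along the slicing axis can only be addressed by a correction along a different direction.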

On the other hand, we have observed the advantages of transformer-based models in reconstructing images from ultra-high pitch helical scans. It is imperative to acknowledge that different models exhibit varying learning capacities, and we conducted extensive experiments on prevalent methods, as evidenced in Table 3 and Fig. 5. Notably, although our data-driven approaches employ similar types and sizes of datasets, their nonlinear fitting capabilities differ. Transformer architectures like Swin-Unet have demonstrated good stability across training scenarios of varying difficulty. In contrast, CNN and GAN architectures exhibit more variability. For scenarios involving 3D helical reconstruction, large-scale, global limited-angle artifacts are prevalent within the planes. Traditional CNN architectures struggle to recognize such extensive global features due to constraints imposed by convolutional kernel sizes and pooling layers. This limitation hinders the effective utilization of long-range dependencies for image recovery, particularly in scenarios with increased helical pitches leading to significant degradation in reconstruction quality. Moreover, GAN-based methods excel in image restoration accuracy compared to CNN architectures but lag in training time and inference speed. This delineates the trade-offs between these methodologies in the context of 3D helical reconstruction, with Transformer models like Swin-Unet demonstrating promising stability and efficacy across various training difficulties.

6 Conclusion

Through theoretical analysis and extensive experiments, our study has provided further clarity on the impact of correction directions on the quality of 3D and 2D slice reconstruction. We found that increasing the number of correction directions does not significantly enhance reconstruction quality once the model is fully trained. Similarly, selecting correction sequences in multiple directions does not notably affect the accuracy of 3D reconstruction.

Moreover, we observed that during the multi-directional correction process, artifacts in the correction results tend to align with the direction perpendicular to the previous correction. While models with the ability to capture long-range dependencies may face challenges in training and inference speed, they demonstrate excellent performance in scenarios with high pitch.

Given these insights, we recommend cautious clinical use of 2D slicing methods for correcting high-pitch helical reconstructions. Considering limited computing resources and time constraints, it is advisable to correct from a single direction. Optimal results are achieved when the correction direction is aligned with the final clinical observation direction.

For scenarios with abnormally high pitch (p ≥ 4), models with stronger capabilities in modeling long-range dependencies can be appropriately utilized for correction. If significant errors persist after single-direction correction, multi-directional correction can be employed to achieve the highest 3D reconstruction accuracy, ensuring that the final correction direction aligns as closely as possible with the observation direction.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant No. 52075133) and the CGN-HIT Advanced Nuclear and New Energy Research Institute (Grant No. CGN-HIT202215).
