Ultra-sparse view lung CT image reconstruction using generative adversarial networks and compressed sensing

Abstract

X-ray ionizing radiation from Computed Tomography (CT) scanning increases cancer risk for patients, thus making sparse view CT, which diminishes X-ray exposure by lowering the number of projections, highly significant in diagnostic imaging. However, reducing the number of projections inherently degrades image quality, negatively impacting clinical diagnosis. Consequently, attaining reconstructed images that meet diagnostic imaging criteria for sparse view CT is challenging. This paper presents a novel network (CSUF), specifically designed for ultra-sparse view lung CT image reconstruction. The CSUF network consists of three cohesive components including (1) a compressed sensing-based CT image reconstruction module (VdCS module), (2) a U-shaped end-to-end network, CT-RDNet, enhanced with a self-attention mechanism, acting as the generator in a Generative Adversarial Network (GAN) for CT image restoration and denoising, and (3) a feedback loop. The VdCS module enriches CT-RDNet with enhanced features, while CT-RDNet supplies the VdCS module with prior images infused with rich details and minimized artifacts, facilitated by the feedback loop. Engineering simulation experimental results demonstrate the robustness of the CSUF network and its potential to deliver lung CT images with diagnostic imaging quality even under ultra-sparse view conditions.

Keywords

computed tomography image reconstruction compressed sensing generative adversarial network

Introduction

X-ray Computed Tomography (CT) is extensively employed clinically due to its superior spatial resolution and rapid imaging capabilities, particularly for traumatic and oncological diagnosis. Recent increases in CT usage have heightened concerns regarding the potential risks associated with its X-ray ionizing radiation, recognizing that prolonged or excessive exposure may elevate the risk of cancer.¹ For instance, PET/CT scans are generally associated with elevated radiation doses,² whereas cardiac CT frequently necessitates repeated imaging for the assessment of coronary artery disease.³ Particularly, Periodic CT screening for individuals at high risk of lung cancer is crucial, but frequent scans may lead to the accumulation of significant radiation doses.⁴ The development of technologies aimed at minimizing X-ray radiation doses, adhering to the ALARA (as low as reasonably achievable) principle, is critical in medical imaging. The primary objective of these technologies is to generate clinically diagnostic CT images while ensuring minimal radiation exposure to patients.

A straightforward way to reduce X-ray radiation dosage is to lower the number of projections in CT scans. Ref.^5,6 have developed a CT iterative reconstruction algorithm based on the Compressed Sensing (CS) theory, the View-driven Compressed Sensing (VdCS) algorithm, which is capable of reconstructing images from sparse views without prior data. However, the VdCS algorithm may exhibit staircase artifacts, a common drawback in CS-based CT image reconstructions, leading to discontinuous edges and detail loss in high-frequency areas,⁷ hence undermining its clinical utility. Specifically in lung CT images, the edges of small nodules or lesions may become discontinuous or exhibit a step-like appearance, which reduces the accuracy with which radiologists assess the precise shape of subtle lesions. To address this, we explore advanced techniques to enhance the quality of images reconstructed by the VdCS algorithm. Generative Adversarial Network (GAN)⁸ have shown exceptional performance in medical image restoration,⁹ denoising,^10–12 and reconstruction¹³ by exploiting the strengths of CNN.¹⁴ The combination of GAN and CNN effectively capture intricate image features, leading to high-quality and noise-free generations, which forms the basis of our methodological improvements.

In this study, we developed an end-to-end network, CT-RDNet, with a U-shaped architecture inspired by the classic U-Net,¹⁵ specifically designed to act as the generator in a GAN for CT image restoration and denoising. We also introduced a self-attention (SA) mechanism into the CT-RDNet to enhance its ability to capture both local features and global information, thereby improving overall performance. For the GAN training, a combined loss function that incorporates LSGAN, SSIM, and L1 losses was constructed. This combined loss function not only stabilizes the GAN training but also balances the structural similarity and overall visual quality of the generated image. Then we integrated the CT-RDNet with the enhanced VdCS algorithm through a feedback loop, developing a novel framework, the Compressed Sensing U-shaped network with Feedback (CSUF), for ultra-sparse view lung CT image reconstruction. In CSUF, the VdCS algorithm is utilized for the image reconstruction from ultra-sparse view projection data, providing enhanced features to the CT-RDNet. The CT-RDNet is used for restoring high-frequency details and denoising the images reconstructed by the VdCS algorithm. Through the feedback loop, the CT-RDNet supplies valuable priors to the VdCS algorithm, thereby improving overall performance of the VdCS algorithm.

Our main contributions are summarized as follows:

This paper proposes the CSUF network, particularly designed for ultra-sparse view lung CT image reconstruction, which has the potential to generate lung CT images with diagnostic imaging quality even under conditions of ultra-sparse views. As such, the CSUF has the potential to reduce X-ray radiation exposure for patients during CT scans.

The CSUF network is a novel framework that combines the CS theory with GAN technology, comprising three key components: (1) a CS-based CT image reconstruction module (VdCS module), (2) a U-shaped end-to-end network (CT-RDNet), and (3) a feedback loop that is proposed to enable a cooperative integration between the VdCS module and the CT-RDNet.

The CT-RDNet is specifically designed for CT image restoration and denoising. This paper introduces a self-attention mechanism into the CT-RDNet and also designs a combined loss function, both aimed at optimizing network performance.

This paper is organized as follows: Section 2 briefly reviews related work. Section 3 outlines the architecture and components of the proposed CSUF network. Section 4 presents the engineering simulation experiments conducted using synthetic CT projection data. The methods used are discussed in Section 5. Section 6 draws conclusions and sketches future directions.

Related work

Numerous CT iterative reconstruction algorithms have been developed to reduce X-ray ionizing radiation dose. Sidky et al.,¹⁶ for instance, proposed a Total Variation (TV) minimization algorithm for divergent-beam CT, achieving more accurate reconstruction from sparsely sampled projection data. In contrast to the discrete gradient transform used in TV minimization, dictionary learning has proven more effective for sparse representation. For instance, Xu et al.¹⁷ incorporated a redundant dictionary into a statistical iterative reconstruction framework, to capture local structural information, which can suppress artifacts and retain detailed structural features. Differing from algorithms based on dictionary learning, the Adaptive Statistical Iterative Reconstruction (ASIR) algorithm¹⁸ relies on adaptively captured statistical properties of the image, modeling X-ray photon statistics and electronic noise to mitigate noise and artifacts in low-dose CT images.

Compressed sensing (CS) theory,¹⁹ a landmark advancement, enables accurate restoration of original signals from undersampled data. Chen et al.,²⁰ building upon CS theory, proposed the Prior Image Constrained Compressed Sensing (PICCS) method, leveraging a prior image to guide the reconstruction from highly undersampled projection data for precise CT imaging. However, the clinical utility of PICCS is limited by the need to obtain accurately matched high-quality prior images. Our previous work^5,6 introduced a CS-based image reconstruction algorithm, VdCS, which effectively reconstruct images from sparse projection data without prior information, demonstrating strong engineering applicability. However, CS-based iterative image reconstruction algorithms, including VdCS, frequently suffer from the loss of high-frequency details and the emergence of blur artifacts attributed to the staircase effect.

Deep learning-based image reconstruction algorithms have emerged as a new frontier in medical imaging,^21,22 demonstrating superior practical efficacy compared to traditional methods. These algorithms excel at suppressing noise and artifacts, thereby enhancing image quality. A notable approach is the DL-PICCS proposed by Zhang et al.,²³ which integrates deep learning with the PICCS algorithm for sparse view CT image reconstruction. This method employs a U-Net framework to eliminate artifacts in the images reconstructed from undersampled data by Filtered Back Projection (FBP) techniques. It utilizes the output of the U-Net as the prior image for the PICCS algorithm. Then, it applies an additional light U-Net to restore image details. Deep learning-based image reconstruction algorithms can be classified into three categories: image-domain reconstruction,^24–26 sinogram-domain reconstruction,^27–29 and hybrid-domain reconstruction.^30–32

Image-domain reconstruction methods aim to establish non-linear mappings between noisy and high-quality images, primarily for artifact elimination and detail restoration. Zhang et al.²⁵ introduced DD-Net, combining DenseNet and a deconvolution network with shortcut connections to improve artifact suppression and detail recovery in low-quality FBP-reconstructed images. GANs have also been applied to CT image reconstruction.^13,30,33 Q. Yang et al.³³ proposed a GAN-based low-dose CT image denoising method that optimizes the Wasserstein distance and employs perceptual loss for high-frequency detail restoration. Overall, since the input of the image-domain methods is an image reconstructed from sparse projections by the conventional FBP algorithm, the output inevitably suffers from artifacts.

Sinogram-domain reconstruction methods preprocess projection data prior to image reconstruction. Lee et al.²⁷ proposed a method that employs residual U-Net to generate missing data from sparse projection data, followed by image reconstruction using the FBP algorithm. This technique outperforms conventional interpolation methods and iterative reconstruction methods in terms of reconstructed image quality. To overcome the limitations of traditional supervised learning methods, which require sparse-full view image pairs, Guan et al.²⁹ proposed a fully unsupervised score-based generative model for sparse view CT image reconstruction in the sinogram domain. This method trains a score-based generative network using full-view projection data and employs a multi-channel strategy to capture prior information, delivering comparable or better performance than the supervised learning counterparts.

Hybrid-domain reconstruction methods simultaneously process projection data and reconstructed images to suppress artifacts and enhance detail restoration. Wu et al.³⁰ proposed the DRONE network, which integrates embedding, refinement, and awareness modules for artifact suppression, detail restoration, and measurement consistency, respectively, ultimately improving edge preservation and reconstruction accuracy. Zhang et al.³² developed LEARN++, an extension of the LEARN framework,³⁴ featuring parallel sub-networks for the projection and image domains. This architecture enables simultaneous denoising and restoration while ensuring mutual interaction between domains for superior reconstruction performance.

Our work centered on accurately reconstructing lung CT images for clinical diagnosis utilizing ultra-sparse view CT scanning. The engineering simulation experiments presented in this paper demonstrate that the proposed CSUF network facilities the acquisition of sufficiently accurate lung CT images to meet clinical diagnostic standards, using only 20 view angles within the [0, 2π) interval for standard fan-beam scanning.

Theory and methods

Architecture of the CSUF network

The CSUF network, as depicted in Figure 1, comprises two modules: the VdCS module and the CT-RDNet, interconnected by a feedback loop. The workflow of CSUF is as follows: first, the VdCS algorithm is used for the pre-reconstruction of ultra-sparse view projection data, followed by the restoration of high-frequency information and denoising using the CT-RDNet. Then, the output of the CT-RDNet is fed back as prior information to the VdCS algorithm for the next reconstruction. Finally, the CT-RDNet is applied once more to generate the final reconstruction.

Figure 1.

The CSUF network. (a) Architecture: Integrating the VdCS module and the CT-RDNet with a feedback loop. (b) The interaction between the VdCS module and the CT-RDNet.

The VdCS algorithm

The CT iterative image reconstruction based on the CS theory under ultra-sparse view can be regarded as solving the ill-posed linear equation:

\begin{matrix} g = W f \end{matrix}

(1)

where g is the M-dimensional column vector of projection data,

f

is the N-dimensional image vector to be reconstructed, usually

M << N

, and

W

is the system matrix. According to CS theory, if the system matrix satisfies the strict isometry condition,¹⁹ Equation (1) can be transformed into a constrained L1 norm optimization problem:

\begin{matrix} min_{f} ∥ Ψ f ∥_{l 1} s . t . g = W f \end{matrix}

(2)

where

Ψ f

is the sparse representation of the image to be reconstructed

f

We enhance the VdCS algorithm that was proposed in Ref,^5,6 which is presented in Algorithm 1 and depicted in Figure 2. The VdCS algorithm consists of two modules: rough reconstruction and optimization. The rough reconstruction module performs an initial approximation of the image, while the optimization module further refines this approximation to find the optimal solution. Specifically, the rough reconstruction module iteratively solves Equation (1) to obtain solutions that approximately meet the constraints specified in Equation (2). This iterative process involves four steps: forward projection, difference computation, backward projection, and correction, as summarized in Equation (3).

\begin{matrix} f^{(n, r, θ)} = f^{(n, r, θ - 1)} + φ_{θ} \cdot M_{θ}^{T} \cdot (g_{θ} - W_{θ} \cdot f^{(n, r, θ - 1)}) \end{matrix}

(3)

where

f^{(n, r, θ)}

represents the image vector at the

θ

-th view in the

r

-th rough reconstruction sub-iteration of the

n

-th overall iteration,

g_{θ}

is the sparse projection data at the

θ

-th view and

φ_{θ}

is the correction factor, an experimental parameter with a significant impact on the module's performance.

W_{θ}

and

M_{θ}

are the forward/backward projection matrix at the

θ

-th view angle respectively. In the rough reconstruction process (Equation (3)), the steps are as follows: first, the image vector

f^{(n, r, θ - 1)}

undergoes forward projection; then, the difference with

g_{θ}

is computed; subsequently, this difference is backward projected; and finally, the reconstructed image is refined through correction.

Figure 2.

The VdCS algorithm: N is the number of the overall iterations, and R and K are the number of the rough reconstruction sub-iterations and optimization sub-iterations, respectively. $x 0$ is the initial image with all pixel values set to zero. $x n$ and $x n + 1$ are the images at overall iterations $n$ and $n + 1$ , respectively. $x r$ and $x r + 1$ are the images at rough reconstruction sub-iterations $r$ and $r + 1$ , respectively. $x k$ and $x k + 1$ are the images at optimization sub-iterations $k$ and $k + 1$ , respectively.

The optimization module takes $∥ Ψ f ∥_{l 1}$ of Eqaution (2) as the objective function and solves the minimization problem. $Ψ$ is the discrete gradient transform (DGT), and $Ψ f$ is the sparse representation of the image to be reconstructed. This module employs the gradient descent method to optimize the objective function, as shown in Equation (4):

\begin{matrix} f^{(n, k)} = f^{(n, k - 1)} - α \cdot d^{(n)} \cdot {\frac{\partial ∥ Ψ f ∥_{l 1}}{\partial f} |}_{f = f^{(n, k - 1)}} \end{matrix}

(4)

where

f^{(n, k)}

denotes the image vector in the

k

-th optimization sub-iteration of the

n

-th overall iteration,

α

is the step size factor, and

d^{(n)}

represents the balance factor calculated in the

n

-th overall iteration.

The forward and backward projection modules of VdCS are distance-drive, referred to as DDFP and DDBP module, respectively. It should be noted that, in the engineering simulation experiments of this study, the DDFP module was also used to simulate the forward projection procedure and generate synthetic projection data.

Algorithm 1

VdCS

Input: The projection data g, Overall iterations N, Rough reconstruction iterations R, Optimization iterations K, Projection angles

Θ

, DGT

Ψ

, Relax factor

λ

, Step size factor

α

Output: The reconstructed image

f^{(N)}

1: for n = 1

\to

N do

f^{(1, 1, 0)}

=

0 // Initial image

3: ▷ 1. Rough Reconstruction

4: for r = 1

\to

R do

5: for

θ

= 1, 2, …,

Θ

φ_{θ}

=

\frac{λ}{\sqrt{ω_{θ} \cdot r}}

f^{(n, r, θ)} = f^{(n, r, θ - 1)} + φ_{θ} \cdot M_{θ}^{T} \cdot (g_{θ} - W_{θ} \cdot f^{(n, r, θ - 1)})

8: end for

9: end for

10:

d^{(n)}

=

\frac{{| f^{(n, 1, 0)} - f^{(n)} |}_{2}}{{| f^{(n)} |}_{2}}, w h e r e f^{(n)} = {\begin{matrix} f^{(n, R, Θ)}, f^{(n, R, Θ)} \geq 0 \\ 0, f^{(n, R, Θ)} < 0 \end{matrix}

11: ▷ 2. Optimization

12: for k = 1

\to

K do

13:

v^{(n, k)} = \frac{\partial {∥ Ψ f ∥}_{l 1}}{\partial f} ∣_{f = f^{(n, k)}}

14:

f^{(n, k)} = f^{(n, k - 1)} - α \cdot d^{(n)} \cdot v^{(n, k - 1)}

15: end for

16: end for

17: return

f^{(N)}

CT-RDNet

A GAN consists of a generator and a discriminator that engage in adversarial learning to enhance the quality of generated data. The architecture of the GAN in this paper is shown in Figure 3, where the CT-RDNet is the generator. As illustrated, the CT-RDNet is a U-shaped end-to-end network specifically designed for CT image restoration and denoising.

Figure 3.

Architecture of the GAN, comprises the CT-RDNet (the generator) and the discriminator. The CT- RDNet is augmented with a self-attention module.

The design of the CT-RDNet is inspired by the conventional U-Net, retaining the advantages of its symmetrical encoder-decoder structure and skip connections. But unlike the conventional U-Net, the CT-RDNet uses convolutional layers with strides for downsampling, rather than max-pooling, to reduce noise sensitivity while preserving critical image details. The CT-RDNet's upsampling layers mirror the downsampling layers’ parameters, with a 4×4 kernel size, a stride of 2, and padding of 1, ensuring that the feature map size is doubled during upsampling and halved during downsampling.

To enhance the ability of the CT-RDNet to understand the overall image structure, we introduced the self-attention (SA)^35,36 mechanism implemented with convolution into the CT-RDNet. The SA module is applied to the concatenated feature maps formed by the first upsampling-generated feature maps and the corresponding downsampling-generated feature maps of the same spatial dimensions, enabling adaptive focus on different regions and capturing global information. As shown in Figure 4, the SA module processes the feature maps through three 1×1 convolutional layers to create query, key, and value tensors. Then the query tensor is multiplied by the key tensor, normalized with SoftMax, and the product is multiplied by the value tensor. Finally, the result passes through another 1×1 convolution layer to produce the self-attention feature maps.

Figure 4.

The architecture of the self-attention (SA) module, which is integrated into the CT-RDNet for capturing global information.

Feedback loop

The feedback loop plays a vital role in the CSUF network, enabling a cooperative integration between the VdCS module and the CT-RDNet. Through the integration, the CSUF network effectively reconstructs ultra-sparse view CT images, reducing noise and artifacts, and restoring detailed structures.

The VdCS module iteratively reconstructs the image from the ultra-sparse sinogram starting with a zero-pixel value image as the initial estimate. While this approach facilitates preliminary reconstruction, it is challenged by artifacts due to severe undersampling. By utilizing the feedback loop, the CT-RDNet replaces the zero-pixel value image with its output, supplying the VdCS module with rich high-frequency details and reduced artifacts as prior information, which leads to higher-quality reconstruction and augmented artifact suppression. Actually, the image generated by the CT-RDNet may exhibit artifacts, such as local distortions or strange textures, due to the inherent limitations of GAN. The feedback loop enables the VdCS module to mitigate these issues, enhancing the overall image quality.

Loss functions

The Least Squares GAN (LSGAN) loss³⁷ addresses issues such as mode collapse and gradient vanishing in GAN training, thereby facilitating more effective learning between the generator and the discriminator. In this paper, the CT-RDNet's LSGAN loss is defined as Equation (5).

\begin{matrix} L_{L S G A N} (G) = E_{(z \sim p (z))} [{(D (G (z)) - 1)}^{2}] \end{matrix}

(5)

where,

z

is the output image.

G

represents the CT-RDNet and

D

represents the discriminator. The notation

E_{(z \sim p (z))}

denotes the expectation operation over a sample

z

drawn from the generated data distribution

p (z)

To balance training stability and the quality of the generated image, the total loss function in this study further combines the aforementioned LSGAN with L1 loss and SSIM loss,^38,39 defined as Equation. (6). L1 loss focus more on pixel accuracy, while SSIM loss focus more on overall image quality. The combination enhances the denoising and restoration performance of the CT-RDNet.

L_{total} (G) = L_{LSGAN} (G) + c \cdot (β \cdot L_{L1} (G) + (1 - β) \cdot L_{SSIM} (G))

(6)

where c is a constant,

L_{L 1} (G)

and

L_{S S I M} (G)

denote L1 and SSIM loss functions respectively. The parameter

β

is a weight that balances the contribution of L1 loss and SSIM loss, and its value is determined experimentally. Unlike L2 loss, which can overly penalize large errors between the output image and the label image, L1 loss mitigates the blurriness and unnaturalness that may arise from L2 loss. To evaluate the similarity between the denoised image generated by the CT-RDNet and the label image, (1-SSIM) is used as the SSIM loss function in this study.

Engineering simulation experiments

Synthetic projection data

The LIDC-IDRI dataset,⁴⁰ developed as a reference for the early diagnosis of lung cancer within the medical imaging research community, was utilized to generate synthetic projection data. A total of 987 thoracic CT images from different patients were selected. Each image has a resolution of 512×512 pixels. Simulating a standard fan-beam CT scanning procedure, we generated synthetic projection data using the DDFP algorithm, with varying numbers of projections. For the training dataset (511 out of the total 987 images), projection numbers of 20, 45, 90, and 180 were selected to generate sparse projection data. For the testing dataset (476 out of the total 987 images), only 20 and 30 at ultra-sparse conditions were selected.

Preparation of the dataset for the GAN training

After obtaining the synthetic projection data for training, we use the VdCS algorithm to initially reconstruct sparse view CT images. These reconstruct images were paired with the original CT images, used as the ground truth (GT) images, to form a sparse-full image dataset for training the GAN. We randomly selected different number of images for each projection number and performed data augmentation technique to enhance the sparse-full image dataset.

Implementation details

DDFP. The relevant parameters are listed in Table 1. $D e t e c t o r N u m$ specifies the number of detectors, and $d l$ indicates the spacing between detector units. $F O V$ represents the field of view, and $p l$ denotes the pixel spacing. $D 1$ refers to the distance from the light source to the rotation center, and $D 2$ refers to the distance from the rotation center to the detector. The projection angle range is set to [0, 2 $π$ ), with projection angles uniformly selected according to the number of projections.

Table 1.

The parameters of DDFP.

DetectorNum	dl(mm)	FOV	pl(mm)	D1(mm)	D2(mm)	Projection angle range
512	1.0	256	0.5	640	640	[0, 2π)

VdCS. A total of 10 iterations were performed, each consisting of 4 sub-iterations for rough reconstruction and 50 sub-iterations for optimization with a step size of 0.01. The parameters related to the detector and the selection of projection angles were consistent with those used in the DDFP algorithm.

GAN. The Adam optimizer⁴¹ was used with a learning rate was of 0.0002 and parameters $β 1 = 0.5$ and $β 2 = 0.999$ to ensure model convergence. The training process was carried out over 200 epochs with a batch size of 8.

In this study, the codes for DDFP and VdCS algorithms were implemented with C++ programming language under the ISO/IEC 14882:2020 standard. And the GAN was trained on an NVIDIA RTX 3090 GPU running Ubuntu 18.04.6 LTS, while the CSUF network was tested using CPU-only. We implemented GAN with PyTorch version 1.10.0 and CUDA version 11.4.

Experimental results

Qualitative analysis

To assess the performance of the CSUF network, we respectively reconstructed images from the testing projection dataset with 20 and 30 angles and performed qualitative analysis for the quality of image. For each of the projection angles, we randomly selected two instances, resulting in four instances (Instance 1–4) in total. The CT reconstructed images at each stage are shown in Figure 5. In this context, GT represents GT images, CS denotes images generated by the VdCS algorithm, CS-U refers to the CT-RDNet enhanced images, CS-U-CS indicates the reconstructed images obtained by the VdCS algorithm with prior image, and CSUF represents the final reconstructed images.

Figure 5.

Reconstructed images at different stages (CS, CS-U, CS-U-CS, and CSUF) and corresponding region of interest (ROI) images. (a) and (b) respectively show the reconstructed images and the ROI images, for Instance 1 at 20 projection angles, (c) and (d) for Instance 2 at 20 projection angles, (e) and (f) for Instance 3 at 30 projection angles, and (g) and (h) for Instance 4 at 30 projection angles.

The CSUF network demonstrates its capability to reconstruct CT images with rich detailed structures and minimal artifacts at 30 projection angles, as shown in the corresponding ROI images of (e) and (g), where the lung textures are restored with high fidelity. Even in more sparse conditions with 20 projection angles, as illustrated in (a) and (c), the CSUF network largely restores the detailed structure of the images. By comparing CS and CS-U in (a), (c), (e), and (g), it can be found that the CT-RDNet effectively suppresses the blur artifacts caused by the VdCS algorithm and significantly restores the texture structure of the CS images. Based on a thorough evaluation of the imaging results, experts from the Department of Medical Imaging at the University of Saskatchewan and the Saskatoon Health Region concluded that the CSUF network proposed in this study has the potential to achieve clinical diagnostic standards for ultra-sparse view CT image reconstruction.

Quantitative analysis

In this study, Peak Signal-to-Noise Ratio (PSNR) and SSIM were employed as image quality metrics. we calculated the PSNR and SSIM values for some images from the testing projection dataset and presented the results in Figure 6. Figure 6(a) and (b) show the PSNR and SSIM box plots, respectively, for 20 projection angles, while (c) and (d) depict the results for 30 projection angles. The results in (a) and (b) indicate that the reconstruction performance of the VdCS algorithm is limited when the projection number is reduced to 20. However, the introduction of the CT-RDNet significantly improves both PSNR and SSIM metrics and reduces the fluctuation of the quality indices of the reconstructed images.

Figure 6.

The PSNR and the SSIM metrics for reconstructed images at different stages (CS, CS-U, CS-U-CS, and CSUF). (a) and (b) show the metrics, respectively, for 20 projection angles, while (c) and (d) present the metrics for 30 projection angles.

Performance evaluation

The complexity of the CT-RDNet was evaluated by calculating its FLOPs and the parameter scale revealing that it comprises 56 million parameters and requires 71 GFLOPs for a single forward pass.

We also measured the computational efficiency of the CSUF network under the condition of 20 projection angles. The average inference time was computed across 100 test samples, and memory usage was monitored during execution. Conducting on an Intel Xeon Silver 4210R CPU @2.40GHz, the CSUF network achieved an average inference time of 40 s and utilized 203 MB of memory.

Discussion

In this study, the feedback loop, together with the self-attention mechanism in CT-RDNet, significantly enhances the performance of the CSUF network. Additionally, the design of the loss function contributes to further improvements.

Feedback loop. The feedback loop enables the VdCS algorithm to reconstruct images using a prior image generated by the CT-RDNet, helping to suppress image artifacts caused by the staircase effect. Moreover, it can suppress or even eliminate local hallucinations introduced by the GAN.⁴² We selected Instance 5 from the testing dataset at 20 projection angles and conducted a comparative analysis on the images at each stage, as shown in Figure 7. The comparison between Figure 7(b) and (d) illustrates that the feedback loop allows CS-U to provide an effective prior image for VdCS, resulting in a noticeably improved reconstruction. Furthermore, the regions marked with the red arrows of (a), (c), and (d) demonstrate that the VdCS algorithm can suppress local hallucination in the CS-U image to some extent.

Figure 7.

Reconstructed images at different stages for Instance 5. (a) GT, (b) CS, (c) CS-U, and (d) CS-U-CS.

The introduction of the feedback loop enhances the CSUF network's adaptability to different iteration numbers of the VdCS algorithm. We conducted experiments with different iteration numbers, selecting Instance 6 from testing projection dataset at 20 projection angles, as shown in Figure 8 (A). The PSNR and SSIM metrics are shown in Figure 8(B). Figure 8(A) and (B) reveal that both the visual quality and metrics of reconstructed image of the CSUF network remain relatively consistent across different iteration numbers.

Figure 8.

A: Ground truth and reconstructed images under different iteration numbers for Instance 6. (a) GT, (b) 5 iterations, (c) 15 iterations, (d) 50 iterations. B: PSNR and SSIM results.

Self-Attention module . We introduce the SA mechanism into the CT-RDNet to emphasize long-distance dependencies in images and capture global information. This approach helps preserve image details and improve visual quality. To evaluate the impact of SA on CT images, we selected Instance 5 and 7 from the testing dataset at 20 angles and reconstructed images using the CSUF network with and without SA, referred to as CSUF_WithSA and CSUF_WithoutSA, as shown in Figure 9.

Figure 9.

Ground truth and CSUF reconstructed images with and without self-attention (SA). Results for Instance 5: (a) GT, (b) CSUF_WithSA, (c) CSUF_WithoutSA. (d), (e), and (f) are results for Instance 7.

The comparison between Figure 9(b) and (c), as well as Figure 9(e) and (f), demonstrates that the images reconstructed by CSUF network with SA are clearer and exhibit more detailed texture structures. Subjectively, the images reconstructed by CSUF network with SA better meet diagnostic imaging needs, indicating that SA plays a crucial role in enhancing the performance of the CSUF network.

Loss function parameters. The construction of the loss function is essential for training the CT-RDNet, which in turn affects the performance of the CSUF network. The loss function for the CT-RDNet combines LSGAN loss, L1 loss, and SSIM loss. To test the impact of different weight parameters $β$ in Equation (6) on the performance of the CSUF network, we selected 20 test instances (Instance 8–27) from the testing dataset at 20 angles. The CT-RDNet was trained with varying $β$ values and the images were reconstructed using the CSUF network. The PSNR and SSIM metrics for the reconstructed images corresponding to each $β$ value were shown in Figure 10(A).

Figure 10.

A: PSNR and SSIM metrics for the reconstructed images under different weight parameters $β$ (0.3, 0.5, and 0.4) for 20 instances. B: Ground truth and the reconstructed images for different $β$ values. For Instance 8, (a) is the GT image, while (b), (c), and (d) are the reconstructed images with $β$ = 0.3, 0.5, and 0.4, respectively. For Instance 15, (e) is the GT image, with (f), (g), and (h) representing the reconstructed images with the same $β$ values.

From Figure 10 A(a), we observe that the PSNR values fluctuate slightly but do not differ significantly when $β$ is set to 0.3, 0.4, and 0.5. Remarkably, Figure 10 A(b) reveals that the SSIM values are generally higher when $β$ is set to 0.4 compared to 0.3 or 0.5. Therefore, considering both PSNR and SSIM metrics, the optimal performance of the CSUF network is achieved when $β$ is set to 0.4. Figure 10 (B) shows the reconstructed images for Instance 8 and 15 at different $β$ values. The results demonstrate that the image details and texture features are more effectively restored at $β$ =0.4.

Limitation and future directions. Although the CSUF network demonstrates excellent reconstruction performance under extremely ultra-sparse conditions, the simulation experiments conducted utilizing the LIDC dataset in this paper is limited. Extensive validation of the CSUF network using multiple datasets, including real projection data, is necessary in future studies. Additionally, the CSUF network can achieve high-fidelity reconstruction of lung nodules; however, there are still certain limitations in reconstructing other anatomical structures, such as the abdomen and spine. In the future, we will focus on further refining the CSUF network architecture to overcome these limitations and on making targeted adjustments to better serve specific clinical applications, including lung nodule detection.

In the future, we will also conduct detailed experimental comparisons with state-of-the-art (SOTA) methods, such as FBPConvNet and DD-Net, to provide a more comprehensive evaluation of our approach and explore the integration of other advanced deep learning techniques. Furthermore, we aim to facilitate the clinical adoption and implementation of the CSUF network.

Conclusion

CT scanning is an indispensable diagnostic imaging technique in modern medicine, and it will maintain its irreplaceable clinical status even in the foreseeable future. However, for more than half a century since CT scanning was introduced into clinical practice, concerns about the potential harm from X-ray radiation have persisted. Ultra-sparse view CT scanning can significantly reduce the hazards associated with X-ray radiation. Motivated by the goal of enhancing quality of life and ensuring health and safety, this paper focuses on ultra-sparse view lung CT image reconstruction methods that integrate emerging technologies from relevant fields. Overall, this paper contributes a robust and promising methodology to the field of ultra-sparse view lung CT image reconstruction, with potential implications for reducing the X-ray radiation exposure to patients during CT scanning.

Footnotes

Acknowledgments

This work is supported by the Nature Science Foundation of Shandong Province (Grant No. ZR2019MF048) and the Natural Science Foundation of China (Grant No. 12204273), and partly supported by the National Key R&D Program of China (Grant No. 2021YFB1407001).

The authors sincerely thank the editor and reviewers for their valuable feedback and insightful suggestions, which have greatly enhanced the overall presentation of our work.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Zhaoguang Li

Zhengxiang Sun

Lin Lv

Yuhan Liu

Feng-rong Sun

References

Bosch de Basea Gomez

, et al. Risk of hematological malignancies from CT radiation exposure in children, adolescents and young adults. Nat Med 2023; 29: 3111–3119.

Bernard

Comby

Lemogne

, et al. Deep learning reconstruction versus iterative reconstruction for cardiac CT angiography in a stroke imaging protocol: reduced radiation dose and improved image quality. Quant Imaging Med Surg 2021; 11: 392–401.

Tan

Cao

, et al. Ultralow-dose [¹⁸F] FDG PET/CT imaging: demonstration of feasibility in dynamic and static images. Eur Radiol 2023; 33: 5017–5027.

Ruano-Ravina

, et al. Low-dose CT for lung cancer screening. Lancet Oncol 2018; 19: e131–e132.

Sun

Yang

, et al. View-driven compressed sensing method of CT image reconstruction. In: 3rd Int. Symp. Appl. Mater. Sci. Energy Mater. (SAMSE 2019), Shanghai, China, 2019.

. View-driven compressed sensing method of CT image recon- struction . M.S. thesis, Dept. Inf. Commun. Eng., Shandong Univ., Jinan, China, 2021.

Zhang

,., et al. X-ray CT image reconstruction from few-views via total generalized p-variation minimization. In 2015 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Milan, Italy, 2015.

Goodfellow

. NIPS 2016 tutorial: Generative adversarial networks. (2017) [Online]. Available: http://arxiv.org/abs/1701.00160

Deng

, et al. RFormer: transformer-based generative adversarial network for real Fundus image restoration on a new clinical bench-mark. IEEE J. Biomed. Health Inform 2022; 26: 4645–4655.

10.

Wei

Feng

, et al. Low-Dose CT image denoising using a generative adversarial network with a hybrid loss function for noise learning. IEEE Access 2020; 8: 67519–67529.

11.

Kulathilake

KASH

Abdullah

Bandara

AMR

, et al. InNetGAN: inception network-based generative adversarial network for denoising low-dose computed tomography. J. Healthc. Eng 2021; 2021: 1–20.

12.

Babyn

. Sharpness-aware low-dose CT denoising using conditional generative adversarial network. J. Digit. Imaging 2018; 31: 655–669.

13.

Chen

, et al. Medical Image Reconstruction Using Generative Adversarial Network for Alzheimer Disease Assessment with Class-Imbalance Problem. In: Proc. 2020 IEEE 6th Int. Conf. Comput. Commun. (ICCC), Chengdu, China, 2020.

14.

Krizhevsky

Hinton

. ImageNet classification with deep convolutional neural networks. In: Proc. NIPS, 2012.

15.

Ronneberger

Fischer

Brox

. U-Net: Convolutional networks for biomedical image segmentation. In: Proc. Int. Conf. Med. Image Comput. Comput. Assist. Intervent., 2015.

16.

Sidky

Kao

C-M

Pan

. Accurate image reconstruction from few-views and limited-angle data in divergent-beam CT. J X-Ray Sci Technol 2006; 14: 119–139.

17.

Mou

, et al. Low-dose X-ray CT reconstruction via dictionary learning. IEEE Trans Med Imag 2012; 31: 1682–1697.

18.

Singh

, et al. Abdominal CT: comparison of adaptive statistical iterative and filtered back projection reconstruction techniques. Radiology 2010; 257: 373–383.

19.

Donoho

. Compressed sensing. IEEE Trans Inf Theory 2006; 52: 1289–1306.

20.

Chen

G-H

Tang

Leng

. Prior image constrained compressed sensing (PICCS): a method to accurately reconstruct dynamic CT images from highly undersampled projection data sets. Med Phys 2008; 35: 660–663.

21.

Wang

. A perspective on deep imaging. IEEE Access 2016; 4: 8914–8924.

22.

Wang

Mueller

, et al. Image reconstruction is a new frontier of machine learning. IEEE Trans Med Imag 2018; 37: 1289–1296.

23.

Zhang

Chen

. Accurate and robust sparse-view angle CT image reconstruction using deep learning and prior image constrained compressed sensing (DL-PICCS). Med Phys 2021; 48: 5765–5781.

24.

Jin

McCann

Froustey

, et al. Deep convolutional neural network for inverse problems in imaging. IEEE Trans Image Process 2017; 26: 4509–4522.

25.

Zhang

Liang

Dong

, et al. A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Trans Med Imag 2018; 37: 1407–1417.

26.

Sun

Liu

Yang

. Degradation-Aware deep learning framework for sparse-view CT reconstruction. Tomography 2021; 7: 932–949.

27.

Lee

Lees

Kim

, et al. Deep-neural-network-based sinogram synthesis f or sparse-view CT image reconstruction. IEEE Trans Radiat Plasma Med Sci 2019; 3: 109–119.

28.

Okamoto

Ohnishi

Haneishi

. Artifact reduction for sparse-view ct using deep learning with band patch. IEEE Trans Radiat Plasma Med Sci 2022; 6: 859–873.

29.

Guan

, et al. Generative modeling in sinogram domain for sparse-view CT reconstruction. IEEE Trans Radiat Plasma Med Sci 2024; 8: 195–207.

30.

Niu

, et al. DRONE: dual-domain residual-based optimization network for sparse-view CT reconstruction. IEEE Trans Med Imag 2021; 40: 3002–3014.

31.

, et al. DDPTransformer: dual-domain with parallel transformer network for sparse view CT image reconstruction. IEEE Trans Comput Imaging 2022; 8: 1101–1116.

32.

Zhang

Chen

Xia

, et al. LEARN++: recurrent dual-domain reconstruction network for compressed sensing CT. IEEE Trans Radiat Plasma Med Sci 2023; 7: 132–142.

33.

Yang

, et al. Low-dose CT image denoising using a generative adversarial network with wasserstein distance and perceptual loss. IEEE Trans Med Imag 2018; 37: 1348–1357.

34.

Chen

, et al. LEARN: learned experts’ assessment-based reconstruction network for sparse-data CT. IEEE Trans Med Imag 2018; 37: 1333–1347.

35.

Zhang

., Goodfellow

., Metaxas

., et al. : Self-Attention Generative Adversarial Networks. In Proc. Int. Conf. Mach. Learn. (ICML), PMLR, 2019.

36.

Vaswani

, et al. Attention is all you need. In: Proc. Adv. Neural Inf. Process. Syst., 2017.

37.

Mao

Xie

, et al. Least squares generative adversarial networks. 2016. http://arxiv.org/abs/1611.04076. Online]. Available: .

38.

Wang

Bovik

Sheikh

, et al. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004; 13: 600–612.

39.

Zhao

Gallo

Frosio

, et al. Loss functions for image restoration with neural networks. IEEE Trans Comput Imag 2017; 3: 1–11.

40.

Armato

III ,., et al. The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011; 38: 915–931.

41.

Kingma

. Adam: A method for stochastic optimization. (2014) [Online]. Available: http://arxiv.org/abs/1412.6980

42.

Xie

, et al. DeSRA: detect and delete the artifacts of GAN-based real-world super-resolution models. In: Proc. 40th Int. Conf. Mach. Learn. (ICML’23), 2023.