Abstract
BACKGROUND:
An effective method for achieving low-dose CT is to keep the number of projection angles constant while reducing radiation dose at each angle. However, this leads to high-intensity noise in the reconstructed image, adversely affecting subsequent image processing, analysis, and diagnosis.
OBJECTIVE:
This paper proposes a novel Channel Graph Perception based U-shaped Transformer (CGP-Uformer) network, aiming to achieve high-performance denoising of low-dose CT images.
METHODS:
The network consists of convolutional feed-forward Transformer (ConvF-Transformer) blocks, a channel graph perception block (CGPB), and spatial cross-attention (SC-Attention) blocks. The ConvF-Transformer blocks enhance feature representation and information transmission through a CNN-based feed-forward network. The CGPB introduces a Graph Convolutional Network (GCN) for channel-to-channel feature extraction, propagating information across channels and enabling inter-channel information exchange. The SC-Attention blocks reduce the semantic gap in feature fusion between the encoder and decoder by computing spatial cross-attention.
RESULTS:
Experiments on the 2016 NIH AAPM-Mayo LDCT challenge dataset show that CGP-Uformer achieves a peak signal-to-noise ratio (PSNR) of 35.56 dB and a structural similarity (SSIM) of 0.9221.
CONCLUSIONS:
Compared with four current representative denoising networks, the new network demonstrates superior denoising performance and better preservation of image details.
Introduction
Currently, computed tomography (CT) is one of the most widely used medical imaging techniques [1]. Because high X-ray doses pose radiation hazards to the human body, the X-ray dose used in CT scanning should be kept as low as possible [2, 3]. However, as the X-ray dose decreases, noise in the reconstructed CT images increases, which significantly affects the accuracy of diagnosis. Denoising low-dose CT images has therefore become an important research focus.
As a deep learning method, CNNs have been shown to achieve excellent results in image denoising [4, 5]. In 2016, Zhang et al. proposed the Denoising Convolutional Neural Network (DnCNN), a CNN-based end-to-end architecture that incorporates residual learning and batch normalization to accomplish image denoising effectively [6]. In 2017, Chen et al. proposed the Residual Encoder-Decoder Convolutional Neural Network (RED-CNN), achieving effective low-dose CT image denoising [7]. In 2020, Wang et al. proposed the Two-Stream Sparse Network (TSSN), which learns shallow and deep features separately through a shallow stream and a deep stream [8]. Each stream uses Sparse Residual Blocks with sparse connections and residual learning to achieve efficient hierarchical feature aggregation and improve image restoration quality. In 2021, Tian et al. proposed the Dual Denoising Network (DudeNet), which consists of two sub-networks that extract global and local features, respectively [9]. These features are then fused to obtain salient features, enabling effective recovery of fine details in complex noisy images. In 2021, Jia et al. proposed the Pixel Attention Convolutional Neural Network (PACNN), which uses a pixel attention mechanism to generate pixel-level feature maps for removing random noise [10]. It also improves denoising performance on color noisy images by introducing a color-related loss function.
In recent years, Transformers have made significant breakthroughs in natural language processing and computer vision, compensating for the limited global connectivity of CNN methods [11]. Because self-attention connects every position with every other position, Transformers can efficiently model long-range dependencies and capture global features, and they have outperformed CNNs in a variety of tasks. In 2020, Dosovitskiy et al. proposed the Vision Transformer (ViT), which divides an image into patches and trains the model on the resulting patch sequences, achieving remarkable performance in image classification [12]. In 2021, Wang et al. proposed the U-shaped Transformer (Uformer), which uses non-overlapping window-based self-attention to significantly reduce the computational cost on high-resolution feature maps and performs well in image restoration tasks [13]. In 2021, Liu et al. proposed the Shifted Window based Transformer (Swin Transformer), which uses a shifted-window mechanism and a hierarchical structure for visual tasks [14]. The shifted-window mechanism confines self-attention to non-overlapping local windows while still allowing connections between windows, which improves computational efficiency. In 2021, Liang et al. proposed SwinIR (Image Restoration Using Swin Transformer), which uses residual Swin Transformer blocks for deep feature extraction and recovers high-quality images from low-quality inputs [15]. In 2021, Dong et al. proposed the CSWin Transformer (Transformer with Cross-Shaped Windows), which computes self-attention within cross-shaped windows formed by horizontal and vertical stripes [16]. It also incorporates locally-enhanced positional encoding and performs strongly in image classification, object detection, and semantic segmentation. In 2022, Zamir et al. proposed Restormer (Efficient Transformer for High-Resolution Image Restoration) [17]. This network implicitly models global context by applying self-attention across channels and uses a gated depthwise-convolution feed-forward network for feature transformation, achieving outstanding performance in various image restoration tasks.
CNNs are mainly applied to data with regular grid structures, while Transformers are suited to sequential data modeling [18]. Neither traditional CNNs nor Transformers are inherently applicable to irregularly structured data. In contrast, GCNs can efficiently handle irregularly structured graph data. A significant advantage of GCN is its ability to capture both local and global structure among the nodes of a graph and thereby extract effective feature representations [19]. GCNs have achieved remarkable results in fields such as social network analysis, recommendation systems, and chemical computation, and have become one of the essential tools for graph data analysis [20–22]. In 2021, Chen et al. proposed an Encoder-Decoder based Graph Convolutional Network (ED-GCN) [23]. This approach introduces different encoder and decoder structures and combines CNN and GCN to handle local and non-local features simultaneously, effectively improving low-dose CT image denoising. In 2022, Chen et al. proposed the Graph Convolutional Network with Multi-Information Fusion (GCN-MIF) [24]. This network builds intra-slice and inter-slice graphs based on top-k similarity to exploit non-local relationships and contextual information among pixels, achieving efficient low-dose CT image denoising. In 2023, Fu et al. proposed a Two-branch Network Architecture with Swin Transformer Units and Graph Convolution Operation (SW-GCN) [25]. Its spatial adaptive branch uses Swin Transformer units to capture global contextual information, and its channel adaptive branch constructs the topological structure among feature maps, providing a good initialization for the connections of each feature map; together, these significantly improve the resolution of short-axis PET images.
Although existing CNN-based networks can accurately capture similarities between local features, they focus only on local information interaction, neglect global context, and tend to weight all regions of a feature map equally. However, the noise in low-dose CT images is non-uniformly distributed, so different regions should be weighted differently. We therefore use a Transformer to handle global relationships in the image: self-attention captures long-range dependencies between pixels and thus better models contextual relationships. In addition, the message-passing mechanism and feature-learning capability of GCN allow the model to dynamically construct neighborhoods in feature space and detect latent correlations among the feature maps produced by hidden layers. Building on these ideas, we propose a Channel Graph Perception based U-Shaped Transformer Network (CGP-Uformer). The network consists of three key modules: (1) a convolutional feed-forward Transformer module, (2) a channel graph perception module, and (3) a spatial cross-attention module. The CNN components efficiently extract local features from the input data, capturing local correlations and providing strong feature-learning capability. The Transformer components greatly facilitate spatial interactions in the image through self-attention. The channel graph perception module uses GCN to process the graph-structured information among image channels and flexibly fuses information from different channels through an adjacency matrix. This helps model global pixel relationships, provides the network with comprehensive channel graph perception features, and further improves its denoising performance.
Method
In this section, we first present the overall structure of the CGP-Uformer network, followed by a detailed description of its core components: (1) Convolutional Feed-forward Transformer module; (2) Channel Graph Perception module; (3) Spatial Cross-Attention module.
Overall structure
The detailed structure of CGP-Uformer is shown in Fig. 1(a). Given a noisy input image

Fig. 1. The structure of the CGP-Uformer network. (a) The overall structure of CGP-Uformer. (b) The structure of the convolutional feed-forward Transformer module. (c) The structure of the channel graph perception module. (d) The structure of the spatial cross-attention module.
Starting from the noisy input image, the encoders extract feature information layer by layer, and the decoders gradually recover the feature-map representation. The encoders and decoders are composed of multiple convolutional feed-forward Transformer (ConvF-Transformer) blocks, each capable of feature extraction and reconstruction; the ConvF-Transformer block is detailed in Section 2.2. The channel graph perception module processes the feature maps at the bottom of the network, strengthening channel information fusion and, through a message-passing mechanism, the feature extraction capacity of the network; it is explained in Section 2.3. To enhance feature representation and information exchange within the network, we use a spatial cross-attention module, which computes spatial cross-attention between the encoder feature maps and the upsampled feature maps at the corresponding level and feeds the result into the corresponding decoder. This module is described in Section 2.4.
Convolutional feed-forward Transformer module
A traditional feed-forward network typically consists of two fully connected layers, in which each neuron is connected to all neurons in the previous layer, enabling forward propagation of information. Although fully connected layers perform well in tasks such as image classification, they are prone to overfitting. We therefore propose the ConvF-Transformer module, shown in Fig. 1(b), to enhance feature representation and information transmission.
The ConvF-Transformer consists of the classic Transformer self-attention mechanism and a convolution-based feed-forward network. In the convolution-based feed-forward network, the input is processed by parallel branches, each of which applies 3×3, 5×5, and 7×7 convolutions in sequence. This design serves two purposes: (1) enlarging the kernel size step by step gradually expands the receptive field, so local features are extracted at several receptive-field scales and fused; (2) because the parallel branches learn different weights during backpropagation, they enhance the feature representation capability of the network. The process of the convolutional feed-forward Transformer is defined as follows:
Here, $X$ and $X_C$ represent the input and output feature maps, MHA denotes classic multi-head self-attention, ConvF denotes the convolution-based feed-forward network, LN denotes layer normalization, and $W_T$ and $W_C$ are learnable weights. In summary, the feed-forward network of ConvF-Transformer complements the information flowing through the encoder and decoder via parallel-branch convolutions, thereby extracting more refined features.
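The two-branch feed-forward design can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes that each branch applies stride-1, zero-padded ("same") 3×3, 5×5, and 7×7 convolutions without bias or activation, that the two branch outputs are combined by element-wise multiplication (the Discussion states only that the branches are multiplied), and it omits MHA, LN, $W_T$, and $W_C$. The helper names (`receptive_field`, `conv2d_same`, `convf`) are hypothetical. Feeding a single impulse through the block shows that each output pixel depends on a 13×13 input neighborhood, because a stack of stride-1 convolutions widens the receptive field by k − 1 pixels per layer: 1 + 2 + 4 + 6 = 13.

```python
import numpy as np

def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1, undilated convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1  # each k x k layer widens the field by k - 1 pixels
    return rf

def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation without bias.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels for this kernel offset and accumulate.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def conv_branch(x, weights):
    """One branch: 3x3 -> 5x5 -> 7x7 convolutions applied in sequence."""
    for w in weights:
        x = conv2d_same(x, w)
    return x

def convf(x, branch_a, branch_b):
    """Hypothetical ConvF feed-forward: two parallel branches multiplied element-wise."""
    return conv_branch(x, branch_a) * conv_branch(x, branch_b)

def make_branch(rng, channels, sizes=(3, 5, 7)):
    # Positive random weights keep the example easy to inspect; real weights are learned.
    return [rng.uniform(0.1, 1.0, size=(channels, channels, k, k)) for k in sizes]

rng = np.random.default_rng(0)
channels = 2
x = np.zeros((channels, 15, 15))
x[:, 7, 7] = 1.0  # a single impulse in the centre of the map
y = convf(x, make_branch(rng, channels), make_branch(rng, channels))
print(receptive_field([3, 5, 7]))   # 13
print(int((y[0] > 0).sum()))        # 169 = 13 x 13 pixels reached by the impulse
```

The multiplication acts as a gate: a position contributes to the output only where both branches respond, which is one way two branches with different weights can shape each other's output.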
Channel graph perception module
During channel feature extraction, a CNN treats the information in every channel equally and disregards the potential structural relationships among channels, which makes effective fusion of feature maps more difficult. In contrast, a GCN can take global structural information into account and provide more comprehensive context modeling of the image channels. Moreover, a GCN propagates information across channels through graph convolution, enabling cross-channel information exchange that helps the model better understand the semantic structure of the image and extract richer feature representations. We therefore propose the Channel Graph Perception Block (CGPB) to better fuse channel information. Its structure is shown in Fig. 1(c).
We convert each channel feature into a vertex, obtaining the corresponding vertex features. The cosine similarity between vertex features gives the correlation between vertices and forms an adjacency matrix. We then compute, for each vertex, the mean of its correlations with the other vertices in the adjacency matrix and binarize the correlations: values greater than or equal to the mean are set to 1, and values below the mean are set to 0. Finally, the vertex features and the binarized adjacency matrix are fed into a two-layer GCN to fuse channel information effectively. The process of the CGPB is defined as follows:
$$A_{ik} = \mathrm{Cos}(X_i, X_k)$$
Here, $X_i$ and $X_k$ represent the $i$-th and $k$-th vertex features, Cos denotes the cosine similarity, and $A_{ik}$ represents the correlation between the $i$-th and $k$-th vertices.
$$A_{ik} = \begin{cases} 1, & A_{ik} \ge \mu_i \\ 0, & A_{ik} < \mu_i \end{cases}$$
Here, $\mu_i$ represents the mean of the correlations between the $i$-th vertex and the other vertices in the adjacency matrix. When the correlation between the $i$-th and $k$-th vertices is greater than or equal to the mean, $A_{ik}$ is set to 1; when it is less than the mean, $A_{ik}$ is set to 0.
Here, A is the adjacency matrix composed of the correlations between different vertices.
Here,
In CGPB, we construct a graph structure to make information interaction between channel features more efficient and flexible. By computing the cosine similarity between the vertex features of the channels and binarizing the result, we construct an adjacency matrix that connects each channel vertex only to the channel vertices whose correlation with it is at least its mean correlation, thereby reducing information redundancy. This design enables more efficient and responsive information exchange between channels.
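The adjacency construction and the two-layer GCN can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: each channel map is flattened into its vertex feature, $\mu_i$ averages the correlations with the other vertices only ($k \ne i$), and the GCN uses a row-normalized adjacency with a ReLU between the two layers, because the propagation rule is not reproduced in this text. The function names and weight shapes are hypothetical.

```python
import numpy as np

def channel_adjacency(feat, eps=1e-8):
    """Binarized channel adjacency from cosine similarity.

    feat: (C, H, W); each channel is flattened into one vertex feature.
    """
    c = feat.shape[0]
    x = feat.reshape(c, -1)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    xn = x / np.maximum(norms, eps)
    cos = xn @ xn.T                                  # A_ik = Cos(X_i, X_k)
    # Assumption: mu_i averages over the other vertices (k != i).
    mu = (cos.sum(axis=1) - np.diag(cos)) / (c - 1)
    return (cos >= mu[:, None]).astype(float)        # 1 if A_ik >= mu_i, else 0

def two_layer_gcn(adj, x, w1, w2):
    """Two-layer GCN over channel vertices with a row-normalized adjacency (assumption)."""
    # cos(i, i) is the row maximum, so it is never below mu_i: every row keeps a self-loop.
    a_hat = adj / adj.sum(axis=1, keepdims=True)
    h = np.maximum(a_hat @ x @ w1, 0.0)              # layer 1 + ReLU
    return a_hat @ h @ w2                            # layer 2 (linear output)

def cgpb(feat, w1, w2):
    """Hypothetical CGPB: build the channel graph, then fuse channels with the GCN."""
    c, h, w = feat.shape
    adj = channel_adjacency(feat)
    out = two_layer_gcn(adj, feat.reshape(c, -1), w1, w2)
    return out.reshape(c, h, w), adj

# Toy graph: channels 0 and 1 are identical, channel 2 is orthogonal to both.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]).reshape(3, 1, 2)
print(channel_adjacency(toy))  # expected rows: [1, 1, 0], [1, 1, 0], [1, 1, 1]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))      # C = 8 channels of a 4x4 bottleneck map
w1 = rng.standard_normal((16, 32)) * 0.1   # vertex dim H*W = 16 -> hidden 32
w2 = rng.standard_normal((32, 16)) * 0.1   # hidden 32 -> back to 16
out, adj = cgpb(feat, w1, w2)
print(out.shape)                           # (8, 4, 4)
```

In the toy case, channel 2 has a mean correlation of 0, so under the "greater than or equal to" rule it connects to every vertex. Because $\mu_i$ differs from row to row, the binarized matrix need not be symmetric, which is why this sketch row-normalizes instead of using the symmetric normalization common in GCNs.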
Spatial cross-attention module
Traditional U-shaped networks concatenate encoder and decoder feature maps through skip connections, but the two may differ semantically, which hampers feature fusion. To better integrate the spatial information flow between the encoder and decoder and reduce this semantic gap, we propose the Spatial Cross-Attention module (SC-Attention). This module computes SC-Attention between the upsampled feature maps and the encoder feature maps at the corresponding level and feeds the results into the corresponding decoder. SC-Attention is computed with a sliding-window technique. By using the encoder features to guide the removal of blur introduced during progressive feature restoration in the decoder, this module effectively enhances feature clarity. The structure of SC-Attention is shown in Fig. 1(d). The process of SC-Attention is defined as follows:
Here, $X_{enc}$ represents the feature map from the corresponding encoder layer, $X_{up}$ the feature map obtained after upsampling, SCA the SC-Attention operation, $W_{sca}$ the learnable parameters, $A_{sc}$ the resulting SC-Attention, AP the adaptive pooling operation, and softmax the Softmax activation function.
The application of SC-Attention enables better collaboration between the encoder and decoder, enhancing spatial information fusion, reducing semantic disparities, improving feature representation, and effectively improving the clarity and accuracy of feature extraction in the network. Consequently, it enhances the performance and efficacy of the network in low-dose CT image denoising tasks.
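Because the SC-Attention equations are not reproduced in this text, the following NumPy sketch only illustrates one plausible form of spatial cross-attention, and every design choice in it is an assumption: queries come from $X_{up}$, keys and values from $X_{enc}$ after adaptive average pooling (standing in for AP), the projection matrices `wq`, `wk`, and `wv` stand in for $W_{sca}$, attention is computed globally rather than in sliding windows, and a residual connection is added. It shows how each decoder position receives a softmax-weighted mixture of encoder features.

```python
import numpy as np

def adaptive_avg_pool(x, p):
    """Average-pool (C, H, W) to (C, p, p); assumes H and W are divisible by p."""
    c, h, w = x.shape
    return x.reshape(c, p, h // p, p, w // p).mean(axis=(2, 4))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_cross_attention(x_up, x_enc, wq, wk, wv, pool=4):
    """Hypothetical SC-Attention: decoder queries attend to pooled encoder keys/values."""
    c, h, w = x_up.shape
    q = x_up.reshape(c, -1).T @ wq                               # (H*W, d)
    kv_tokens = adaptive_avg_pool(x_enc, pool).reshape(c, -1).T  # (pool*pool, C)
    k, v = kv_tokens @ wk, kv_tokens @ wv                        # (pool*pool, d), (pool*pool, C)
    a_sc = softmax(q @ k.T / np.sqrt(q.shape[1]))                # (H*W, pool*pool)
    out = (a_sc @ v).T.reshape(c, h, w)                          # back to (C, H, W)
    return x_up + out, a_sc                                      # residual connection (assumption)

rng = np.random.default_rng(0)
c, d = 8, 8
x_up = rng.standard_normal((c, 16, 16))
x_enc = rng.standard_normal((c, 16, 16))
wq, wk = rng.standard_normal((c, d)) * 0.1, rng.standard_normal((c, d)) * 0.1
wv = rng.standard_normal((c, c)) * 0.1
out, a_sc = spatial_cross_attention(x_up, x_enc, wq, wk, wv)
print(out.shape, a_sc.shape)  # (8, 16, 16) (256, 16)
```

Pooling the encoder map keeps the attention matrix at (H·W) × 16 instead of (H·W) × (H·W); a sliding-window variant would instead restrict each query to keys inside its own window.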
Construction of the dataset
To evaluate the performance of the CGP-Uformer network, we used an updated dataset provided by The Cancer Imaging Archive (TCIA), based on the AAPM-Mayo Clinic 2016 Low-Dose CT Challenge (https://www.cancerimagingarchive.net). The dataset covers three types of CT scans (abdomen, chest, and head) from 140 patients. For each patient, the data include both quarter-dose CT images and the corresponding normal-dose CT (NDCT) images. From this dataset, we randomly selected 5,000 image pairs of 256×256 pixels and split them into 80% for training, 10% for validation, and 10% for testing. The quarter-dose CT images are used as the network input, and the corresponding normal-dose CT images serve as labels for performance evaluation.
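The 80/10/10 split of the 5,000 selected pairs can be reproduced with a seeded shuffle. This sketch assumes the split is made at the image-pair level with a fixed seed; `split_pairs` is a hypothetical helper, and the source does not say how the random selection was seeded.

```python
import random

def split_pairs(n_pairs=5000, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split image-pair indices into train/validation/test subsets."""
    idx = list(range(n_pairs))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for reproducibility
    n_train = round(ratios[0] * n_pairs)
    n_val = round(ratios[1] * n_pairs)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_pairs()
print(len(train), len(val), len(test))  # 4000 500 500
```

Splitting by patient instead of by image pair would keep slices from the same scan out of both the training and test sets; the source does not state which strategy was used.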
Hyper-parameters setting
In the experiments, the model uses the Charbonnier loss [26] and is trained with the Adam optimizer (β1 = 0.9, β2 = 0.999) [27] for 100 epochs with a batch size of 10. The initial learning rate is 2×10⁻⁴ and is gradually decreased to 5×10⁻⁵.
The CPU is an Intel(R) Xeon(R) E5-2620 v4 @ 2.10 GHz, and the GPU is an NVIDIA GeForce RTX 3090 Ti. The experiments are implemented in Python with the PyTorch framework.
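The loss and learning-rate settings can be sketched in plain Python. The Charbonnier loss is a smooth approximation of the L1 loss, sqrt((x − y)² + ε²); the value ε = 10⁻³ is an assumption, as is the cosine shape of the decay, since the source states only that the learning rate decreases gradually from 2×10⁻⁴ to 5×10⁻⁵ over 100 epochs.

```python
import math

def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier loss: sqrt((x - y)^2 + eps^2), a smooth L1 variant (eps assumed)."""
    terms = [math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target)]
    return sum(terms) / len(terms)

def cosine_lr(epoch, total_epochs=100, lr_max=2e-4, lr_min=5e-5):
    """Assumed cosine annealing from lr_max (epoch 0) to lr_min (epoch == total_epochs)."""
    cos_term = 0.5 * (1 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_max - lr_min) * cos_term

print(charbonnier_loss([0.5, 0.2], [0.5, 0.2]))  # approximately eps when pred == target
print(cosine_lr(0), cosine_lr(50), cosine_lr(100))  # approximately 2e-4, 1.25e-4, 5e-5
```

In PyTorch, the corresponding built-ins are `torch.optim.Adam(params, lr=2e-4, betas=(0.9, 0.999))` and `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=5e-5)`; whether the authors used this scheduler is not stated.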
Comparison of denoising ability of low dose CT images
We evaluated CGP-Uformer on low-dose CT image denoising and compared it with four classical networks (DnCNN, RED-CNN, Uformer, and SwinIR). Here we focus on low-dose lung CT images. As shown in Fig. 2, CGP-Uformer achieves the best performance. In particular, within the red rectangular region, the most alveolar structures are visible and the image is markedly clearer, which may help improve the accuracy of medical diagnosis. Fig. 3 shows the enlarged images corresponding to the red box in Fig. 2, with the red circle marking the main observation area. In Fig. 3, the alveolar structure is more distinct and clearer in the CGP-Uformer result; the DnCNN and RED-CNN results contain obvious noise, and the SwinIR and Uformer results still contain some noise.

Fig. 2. Denoising results of low-dose lung CT images.

Fig. 3. The enlarged images correspond to the red box in Fig. 2.
For low-dose abdominal CT images, the comparison is shown in Fig. 4. The DnCNN and RED-CNN results are overly blurry, with a significant loss of detail. The Uformer and SwinIR networks denoise better but still leave some noise. In comparison, CGP-Uformer excels at preserving image details. The enlarged regions in Fig. 5 show that the edges in the DnCNN and RED-CNN results are too blurry and their fine edge details are lost, and that the SwinIR and Uformer results also lose some edge details. In terms of overall visual impression, the proposed method suppresses noise better and is closer to the NDCT image.

Fig. 4. Denoising results of low-dose abdominal CT images.

Fig. 5. The enlarged images correspond to the red box in Fig. 4.
We used peak signal-to-noise ratio (PSNR) [28], structural similarity (SSIM) [29], root mean square error (RMSE) [30], training time, and number of parameters as evaluation metrics to quantitatively assess the different networks in low-dose CT image denoising. The quantitative results of the different networks are detailed in Table 1.
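For reference, RMSE and PSNR are related by PSNR = 20·log10(MAX / RMSE), where MAX is the dynamic range of the images. The sketch below assumes intensities normalized to [0, 1] (MAX = 1), which the source does not specify; the reported PSNR depends on this choice.

```python
import math

def rmse(x, y):
    """Root mean square error between two equally sized flat sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def psnr(x, y, data_range=1.0):
    """PSNR in dB: 20 * log10(data_range / RMSE); infinite for identical inputs."""
    err = rmse(x, y)
    return float("inf") if err == 0 else 20 * math.log10(data_range / err)

ndct = [0.0, 0.0, 0.0, 0.0]
denoised = [0.1, 0.1, 0.1, 0.1]
print(rmse(ndct, denoised), psnr(ndct, denoised))  # approximately 0.1 and 20.0 dB
```

SSIM is more involved; scikit-image provides `skimage.metrics.structural_similarity` and `skimage.metrics.peak_signal_noise_ratio`, both of which take a `data_range` argument.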
Table 1. Comparison of quantitative results of the five networks on the test images
To ensure that each feature map focuses on the most relevant features and to avoid redundancy during feature fusion, we introduce CGPB. This module uses graph convolution to capture relationships among channels and significantly enhances the denoising and reconstruction capabilities of the network. The position of CGPB in CGP-Uformer therefore matters, and we investigated its placement to determine whether the network's performance could be further improved.
CGP-Uformer is a U-shaped network in which the encoder gradually extracts feature information from the input feature maps and the decoder progressively restores the feature-map representation. Because CGPB effectively integrates structural information from the channel maps, we add CGPB layer by layer from the bottom up to make the most of channel-level feature information and better recover the feature-map representation. CGP-Uformer_low denotes CGPB added only at the bottom of the network; CGP-Uformer_low-4 denotes CGPB added at the bottom and the 4th encoder layer; CGP-Uformer_low-4-3 denotes CGPB added at the bottom and the 4th and 3rd encoder layers; and CGP-Uformer_low-4-3-2 denotes CGPB added at the bottom and the 4th, 3rd, and 2nd encoder layers.
As shown in Fig. 6, for low-dose chest CT images, adding CGPB only at the bottom of the network gives the best performance, and the image details at the boundaries within the red rectangular region are the most prominent. Fig. 7 shows the enlarged images corresponding to the red box in Fig. 6. In Fig. 7, the organ tissue details in the CGP-Uformer_low-4-3-2 and CGP-Uformer_low-4-3 results are blurred and have low contrast with the surrounding background. CGP-Uformer_low-4 denoises slightly better but loses some fine details. CGP-Uformer_low shows the clearest organ tissue details and the most obvious contrast with the surrounding background.

Fig. 6. Denoising results of low-dose chest CT images.

Fig. 7. The enlarged images correspond to the red box in Fig. 6.
We used PSNR, SSIM, RMSE, training time, and number of parameters to quantitatively assess the denoising performance when CGPB is added at different positions in the network. The detailed results are given in Table 2.
Table 2. Comparison of quantitative results for layer-by-layer addition of CGPB
Table 2 shows an inverse relationship between network performance and the total number of layers to which CGPB is added: as more CGPB layers are added, performance gradually decreases. We attribute this to the number of channels, which decreases as the feature maps progress from the bottom of the network to the output of the fourth encoder layer. The feature maps at the bottom of the network have the most channels, so adding CGPB only there makes maximal use of channel information and fuses channel features most effectively. Each additional CGPB layer operates on feature maps with fewer channels, which limits the channel information the module can use and thus its ability to learn channel features. Both this analysis and the experimental results indicate that adding CGPB only at the bottom of the network is the most effective choice.
To further explore the contribution of each module in CGP-Uformer to low-dose CT image reconstruction, we conducted ablation experiments: (1) without ConvF-Transformer, using the classic Swin Transformer instead; (2) without CGPB; and (3) without SC-Attention, with the encoder and decoder joined by concatenating skip connections. As shown in Fig. 8, for low-dose chest CT images, the networks without ConvF-Transformer and without CGPB produce highly blurry images and lose a significant amount of detail. The network without SC-Attention denoises somewhat better, but its results are still slightly blurry and lose a small amount of detail. Fig. 9 shows the enlarged images corresponding to the red box in Fig. 8, with the red circle marking the main observation area. In Fig. 9, the result without CGPB is over-smoothed, tissue edges are blurred, and tissue details show little contrast with the surrounding background. In the results without ConvF-Transformer and without SC-Attention, tissue details contrast somewhat more visibly with the surrounding background.

Fig. 8. Results of the ablation experiments.

Fig. 9. The enlarged images correspond to the red box in Fig. 8.
We evaluated the performance of the three networks mentioned above in terms of low-dose CT image reconstruction using five metrics: PSNR, SSIM, RMSE, training time, and parameters. The evaluation results have been compiled in Table 3.
Table 3. Ablation experiments for the CGP-Uformer network
Reading Table 3 column by column, PSNR and SSIM increase gradually while RMSE decreases gradually, indicating that the quality of low-dose CT image reconstruction improves across the network variants. Ranking the modules by the performance drop caused by removing each one gives, in order of importance, CGPB, ConvF-Transformer, and SC-Attention. The experimental results thus show that CGPB contributes most to the denoising performance of CGP-Uformer.
Computed tomography (CT) is one of the most widely used medical imaging techniques. Keeping the number of projection angles constant while decreasing the radiation dose at each angle is an effective approach to low-dose CT. However, it increases the noise in the reconstructed image, which affects subsequent image processing and medical diagnosis. This paper proposes a Channel Graph Perception based U-shaped Transformer (CGP-Uformer) network, aiming to achieve high-performance denoising of low-dose CT images. The network combines the local correlation modeling of CNN, the global information capturing of Transformer, and the graph-structure modeling of GCN. Compared with four current representative denoising networks, it demonstrates superior denoising performance and better preservation of image details.
Inspired by the ability of CNNs to extract local features, we propose the ConvF-Transformer block. In its feed-forward network, we gradually increase the convolution kernel size to expand the receptive field, so that local features at different receptive-field scales are extracted and fused. The feed-forward network is implemented with two parallel branches whose outputs are multiplied, allowing each branch to learn different weights and thereby enhancing the feature representation ability of the network.
Inspired by the ability of GCNs to model graph structures, we propose the CGPB, which constructs a graph structure to make information interaction between channel features more efficient and flexible. The adjacency matrix is formed by computing the cosine similarity between the vertex features of the channels and binarizing it, so that each channel vertex is connected only to the vertices whose correlation with it is at least its mean correlation, which reduces information redundancy. The CGPB propagates information across channels, helping the model better understand the semantic structure of the image and extract richer feature representations.
Inspired by the ability of Transformer to model long-range dependencies in images, we propose SC-Attention block. This block performs SC-Attention calculations between the upsampled feature maps and the corresponding level of encoder feature maps, subsequently feeding the resulting outputs into the respective decoder. The SC-Attention block can better integrate the spatial information flow between encoder and decoder and reduce semantic differences.
In summary, the contributions of this paper are as follows. We propose CGP-Uformer, an end-to-end deep learning method for high-performance denoising of low-dose CT images, consisting of ConvF-Transformer blocks, CGPB, and SC-Attention blocks. The ConvF-Transformer blocks enhance feature representation and information transmission. The CGPB promotes the propagation of information across channels, enabling inter-channel information exchange. The SC-Attention blocks reduce the semantic difference in feature fusion between the encoder and decoder by computing spatial cross-attention.
Conclusion
This paper proposes CGP-Uformer for low-dose CT image denoising. The network combines the local feature extraction capability of CNN, the long-range dependency modeling capability of Transformer, and the structural modeling capability of GCN to improve denoising performance. With its strong local feature extraction, CNN captures local correlations in the input data and provides strong support for feature learning. Transformer, through self-attention, greatly promotes interaction between spatial pixels in the image. Furthermore, the channel graph perception module uses GCN to process the structural information among image channels and flexibly uses the adjacency matrix to integrate information from different channels. This helps model relationships among global pixels and provides the network with more comprehensive channel graph perception features.
Compared with classical networks such as DnCNN, RED-CNN, Uformer, and SwinIR, the proposed CGP-Uformer network demonstrates superior performance in low-dose CT image denoising while effectively preserving image details. Nevertheless, our model has the limitations of a long training time and a large number of parameters. To reduce training time and parameter count without compromising the accuracy of low-dose CT image reconstruction, we plan to apply data preprocessing and depthwise separable convolution in the proposed method. In addition, we are currently applying this network to 3D Electron Paramagnetic Resonance Imaging (EPRI) to evaluate its performance in this imaging modality.
Funding
This work is supported in part by the following grants: (1) National Natural Science Foundation of China (Award Number: 62071281), (2) Local Science and Technology Development Fund Project Guided by the Central Government (Award Number: YDZJSX2021A003), and (3) Research Project Supported by Shanxi Scholarship Council of China (Award Number: 2020-008).
