Enhancement based convolutional dictionary network with adaptive window for low-dose CT denoising

Abstract

BACKGROUND:

Recently, one promising approach to suppress noise/artifacts in low-dose CT (LDCT) images is the CNN-based approach, which learns the mapping function from LDCT to normal-dose CT (NDCT). However, most CNN-based methods are purely data-driven, thus lacking sufficient interpretability and often losing details.

OBJECTIVE:

To solve this problem, we propose a deep convolutional dictionary learning method for LDCT denoising, in which a novel convolutional dictionary learning model with adaptive window (CDL-AW) is designed, and a corresponding enhancement-based convolutional dictionary learning network (called ECDAW-Net) is constructed to unfold the CDL-AW model iteratively using the proximal gradient descent technique.

METHODS:

In detail, the adaptive window-constrained convolutional dictionary atom is proposed to alleviate spectrum leakage caused by data truncation during convolution. Furthermore, in ECDAW-Net, a multi-scale edge extraction module consisting of LoG and Sobel convolution layers is introduced into the unfolded iterations to supplement lost textures and details. Additionally, to further improve detail retention, ECDAW-Net is trained with a compound loss that combines the pixel-level MSE loss with the proposed patch-level loss, which helps retain richer structural information.

RESULTS:

Applied to the Mayo dataset, ECDAW-Net obtained the highest peak signal-to-noise ratio (33.94) and the second-best structural similarity (0.92) among the compared methods.

CONCLUSIONS:

Compared with some state-of-the-art methods, the interpretable ECDAW-Net performs well in suppressing noise/artifacts and preserving tissue textures.

Keywords

Low-dose CT; deep convolutional dictionary learning; adaptive window; multi-scale edge extraction; patch-level loss

1 Introduction

Although X-ray computed tomography (CT) has been used in medical imaging for decades, the radiation dose delivered to patients is raising growing concern. X-rays may lead to deterministic and stochastic risks, such as genetic or cancerous diseases [1]; thus, the CT radiation dose should be kept as low as reasonably achievable. However, low-dose CT (LDCT) images reconstructed by traditional methods, for example filtered back projection (FBP), are contaminated with noise and streak artifacts. Methods for improving LDCT images fall into three categories: sinogram filtering, iterative reconstruction, and image post-processing after reconstruction. Since post-processing needs no access to projection data and does not depend on the scanning geometry of the CT system, it is more portable. Therefore, estimating high-quality images from LDCT images has attracted increasing attention. In this work, we focus on the post-processing category.

In the past decade, several image processing techniques have been applied to LDCT images, such as non-local means [2], block-matching 3D (BM3D) [3] and dictionary learning [4]. However, these traditional methods often require difficult parameter tuning and tend to over-smooth the denoised images or introduce additional artifacts [5]. In recent years, deep learning has achieved great success in image denoising, and a growing number of researchers are exploring the role of convolutional neural networks (CNNs) in LDCT image processing [6, 7]. Chen et al. [8] suppressed noise and artifacts with an encoder-decoder CNN (REDCNN), which has few model parameters and a satisfactory inference time but limited noise removal capability. Fan et al. [9] adopted quadratic neurons to construct an encoder-decoder structure, referred to as the quadratic autoencoder (QAE), and improved model efficiency in LDCT denoising. By extracting the directional components of artifacts via the directional wavelet transform, Kang [10] proposed a wavelet-domain deep learning algorithm (WavNet) to suppress CT-specific noise, although some textures are not fully recovered. Kang later proposed a wavelet-domain residual network (WavResNet) to better recover textures [11] and obtained improvements in both qualitative and quantitative analysis. Because noise and artifacts are irregularly distributed in LDCT images, these two wavelet-based networks still show limitations, such as blurred edges and details. To better protect anatomical structures and textures during denoising, Geng et al. [12] applied the content-noise complementary learning (CNCL) strategy and proposed a generative adversarial network (GAN), which demonstrated robust generalization capability. Han et al. [13] proposed a GAN with a dual-encoder-single-decoder structure, in which a pyramid non-local attention module is designed in the main coding branch. Marcos et al. [14] proposed a GAN composed of a spatial attention module and a channel attention module. However, GAN frameworks still suffer from an unsatisfactory ability to describe noise and an unstable training process [5]. Liang et al. [15] proposed EDCNN, which is based on edge enhancement and dense connections. Benefiting from a trainable Sobel convolution, the quality of CT images processed by EDCNN is significantly improved. Jiang et al. [16] applied a frequency separation network to recover the low- and high-frequency components separately in LDCT denoising and achieved encouraging noise/artifact removal while retaining detailed information. Although CNN-based methods have achieved promising results, they lack an intuitive explanation because the process of fitting the mapping function is abstract [17–19].

Because its cost function is composed of a data fidelity term and regularization terms, convolutional dictionary learning [19] makes the logic of model optimization easier to understand. A convolutional dictionary represents an image as the convolution of a set of filters with their corresponding feature maps. As the cost function is iteratively minimized by an optimization algorithm, the estimated image gradually approaches the optimal result. Zheng [17] unfolded the iterations of convolutional dictionary learning with deep learning and proposed a deep convolutional dictionary learning framework (Dcdicl) for Gaussian noise reduction in natural images, achieving both interpretability and effective noise suppression. Wang et al. [18] adopted a concise rain convolutional dictionary model (RCDNet) to encode rain shapes and obtained encouraging de-raining performance. Fu et al. [20] proposed a model-driven deep unfolding method for JPEG artifact removal and generated competitive deblocking results. Li et al. [21] used an online multi-scale convolutional sparse coding model to encode snow in video and finely captured the sparse scattering and multi-scale shapes of real snow. Wang [22] also adopted a convolutional dictionary to represent metal artifacts in CT images. Liu et al. [23] combined the advantages of residual convolutional networks and convolutional dictionary learning and proposed an LDCT denoising model (DRCSC) to recover high-frequency information. However, their model violates the mathematical constraint that the convolutional dictionary D and D^T are rotations of each other, so its interpretability is limited. Furthermore, they did not compare it with leading CNN methods such as EDCNN. In our previous study [24], we proposed a transfer learning densely connected convolutional dictionary learning (TLD-CDL) framework for LDCT denoising, which achieves a good balance between noise reduction and the preservation of details.

It is worth noting that the convolution kernel and convolutional dictionary atom can be regarded as a special filter with a rectangular window of amplitude one, and its small size makes it susceptible to spectral leakage. Spectral leakage refers to the generation of additional amplitudes in multiple frequency bands. To solve the problem, Tomen et al. [25] suggested applying the standard Hamming window and proved that the convolution kernel employing Hamming window increases classification accuracy and improves robustness.

Inspired by these pioneering studies on convolutional dictionary learning and spectrum leakage, we propose a new convolutional dictionary learning model with adaptive window (CDL-AW) to encode noise/artifacts in LDCT images, in which an adaptive window is designed to reduce spectrum leakage. We then propose an enhancement-based convolutional dictionary network with adaptive window (ECDAW-Net) to solve the CDL-AW model. In addition, a multi-scale edge extraction module composed of trainable LoG and Sobel convolution layers is designed to suppress edge/texture blurring in the iterative process. Finally, a compound loss is used to train ECDAW-Net to further prevent excessive smoothing of the image.

The remainder of this paper is organized as follows. Section 2 reviews convolutional dictionary learning and spectral leakage. In Section 3, the proposed CDL-AW model and ECDAW-Net are elaborated, including the proposed adaptive window, the optimization procedure, as well as the multi-scale edge extraction module. In Section 4, the experimental designs and representative results are given. Finally, we will discuss relevant issues and conclude this paper in Section 5.

2 Related work

2.1 Convolutional dictionary learning

Many CNN-based noise reduction methods aim to estimate the mapping function T from X to μ:

$μ = T (X),$ (1)

where X ∈ R^{H×W} and μ ∈ R^{H×W} are the LDCT image and its noise/artifacts, respectively. However, the end-to-end direct mapping in (1) is not well suited to this task, as the noise/artifacts in LDCT images are related to the real signal and the denoised results are easily blurred. In addition, a noise reduction network constructed from (1) is hard to explain for LDCT denoising because the logic of its internal operation is difficult to understand.

In contrast, dictionary learning has a clear physical meaning (clear mathematical formula) for image denoising. Convolutional dictionary learning decomposes an image by using convolution operation instead of matrix multiplication used in patch-based dictionary learning [4]. The noise μ in LDCT image, with the size of H×W, can be represented as the summation of convolutions between a set of filters (also called convolutional dictionary) and their corresponding feature maps:

$μ = \sum_{i = 1}^{N} D_{i} \otimes Z_{i},$ (2)

where ⊗ denotes convolution, and D_i ∈ R^{k×k} and Z_i ∈ R^{H×W} are the i-th filter (or convolutional dictionary atom) and its corresponding coefficient map, respectively. The optimal estimates of $\{D_{i}\}_{i = 1, 2, ..., N}$ and $\{Z_{i}\}_{i = 1, 2, ..., N}$ can be obtained by solving the following minimization problem:

$\min_{\{D_{i}\}, \{Z_{i}\}} {‖ μ - \sum_{i = 1}^{N} D_{i} \otimes Z_{i} ‖}_{F}^{2} + λ_{Z} \sum_{i = 1}^{N} {‖ Z_{i} ‖}_{1},$ (3)

where the L₁ norm ‖Z_i‖₁ is the regularization term and λ_Z is its weight parameter. This convolutional optimization strategy overcomes the redundancy of traditional patch-based dictionary learning and avoids the loss of underlying structure caused by image block vectorization. Several optimization methods can be used to solve the objective function in (3), including iterative shrinkage-thresholding [26], the alternating direction method of multipliers [27] and half-quadratic splitting [28].
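For the L₁ regularizer in (3), shrinkage-thresholding solvers rely on the element-wise soft-thresholding operator, which is the proximal operator of the L₁ norm. A minimal Python sketch is given below; the function names and the plain-list representation of a coefficient map are illustrative choices, not part of the original method.

```python
def soft_threshold(z, thr):
    """Proximal operator of thr * |z| for a scalar z."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0


def soft_threshold_map(Z, thr):
    """Apply soft-thresholding element-wise to a 2-D coefficient map (list of lists)."""
    return [[soft_threshold(v, thr) for v in row] for row in Z]


Z = [[3.0, -0.5], [0.2, -3.0]]
# Entries with magnitude below the threshold become zero; the others shrink by thr.
print(soft_threshold_map(Z, 1.0))
```

Small coefficients are set exactly to zero, which is what makes the learned coefficient maps sparse.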

Although the convolutional dictionary learning in (3) can obtain rotation-invariant features, it may be limited in some further applications [17, 24, 25, 29]. One reason is that the handcrafted sparse prior (e.g., the L₁ sparsity in (3)) is fixed; the other is that the solution process neglects the design of the dictionary filters (e.g., their size and shape).

2.2 Spectral leakage

According to signal theory, a short signal observed in a finite window can be regarded as a long signal truncated by a rectangular window of amplitude 1, and the product of the signal and the rectangular window in the spatial domain corresponds to the convolution of their spectra in the frequency domain. Because the window cannot be infinitely wide, its spectrum cannot be an impulse function, so the convolution in the frequency domain inevitably produces broadening and sidelobes, which is called spectrum leakage. A two-dimensional convolution kernel can be thought of as being truncated by the product of two one-dimensional rectangular windows of magnitude 1; thus, the frequency-domain characteristics of the rectangular window cause the dictionary kernel to exhibit spectrum leakage. This leakage may reduce the signal-to-noise ratio and blur the image. Convolutional dictionary atoms can be regarded as special filters with rectangular windows, which are easily affected by spectrum leakage during truncated filtering. For this reason, the convolutional dictionary atom can be constrained by a two-dimensional adaptive window function, so that the image blur caused by spectrum leakage is effectively alleviated.

As the rectangular function introduces window artifacts into the frequency response in the form of “ripples”, Tomen et al. [25] proposed to reduce spectral leakage in CNNs by introducing the 2-dim Hamming window. The 1-dim Hamming window is defined in (4), and the 2-dim Hamming window is the outer product of two 1-dim Hamming windows.

$w (m) = σ - (1 - σ) \cdot \cos (\frac{2 π m}{M}), \quad 0 ⩽ m ⩽ M,$ (4)

where σ = 25/46, m is the discrete sample index, and M determines the window length. However, hand-designed window functions cannot get rid of hyper-parameters, which often need to be manually adjusted for each task.
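The effect described in this section can be checked numerically. The sketch below builds the window in (4), forms the 2-dim window as an outer product, and compares the highest sidelobe of a rectangular window with that of the Hamming window. It assumes a k-point window sampled at m = 0, ..., k-1 with M = k-1, and the helper names and the zero-padded DFT length are illustrative choices rather than anything specified in the paper.

```python
import cmath
import math

SIGMA = 25 / 46  # value of sigma stated with Eq. (4)


def hamming_1d(k):
    """k-point window from Eq. (4), sampled at m = 0..k-1 with M = k - 1 (assumed)."""
    M = k - 1
    return [SIGMA - (1 - SIGMA) * math.cos(2 * math.pi * m / M) for m in range(k)]


def hamming_2d(k):
    """2-D window as the outer product of two 1-D windows."""
    w = hamming_1d(k)
    return [[a * b for b in w] for a in w]


def peak_sidelobe_db(window, n_fft=256):
    """Highest sidelobe of the zero-padded DFT magnitude, in dB relative to the DC peak."""
    mags = []
    for q in range(n_fft // 2 + 1):
        s = sum(v * cmath.exp(-2j * math.pi * q * n / n_fft) for n, v in enumerate(window))
        mags.append(abs(s))
    # Walk down the main lobe until the magnitude stops decreasing.
    i = 1
    while i < len(mags) - 1 and mags[i + 1] < mags[i]:
        i += 1
    return 20 * math.log10(max(mags[i:]) / mags[0])


k = 9  # same size as the 9x9 atoms used later in the paper
rect_db = peak_sidelobe_db([1.0] * k)
ham_db = peak_sidelobe_db(hamming_1d(k))
print(f"rectangular: {rect_db:.1f} dB, Hamming: {ham_db:.1f} dB")
```

A lower (more negative) sidelobe level means that less energy leaks into neighbouring frequency bands, at the cost of a wider main lobe.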

3 Materials and methods

Although using well-designed window functions to reduce spectral leakage is not new, their application in systems based on convolutional dictionary learning has been overlooked. The rich textures of CT images are distributed over different frequency bands, so spectrum leakage will inevitably degrade visual quality. To fill this gap, we propose an adaptive window and incorporate it into a denoising model based on convolutional dictionary learning, namely the CDL-AW model, and then design a convolutional dictionary network (ECDAW-Net) to solve the proposed CDL-AW model.

3.1 Adaptive window

Different CT images have different spectral characteristics, so a window function with fixed parameters can hardly fit all CT images, and its optimal parameters must be tuned repeatedly. Instead of using the standard Hamming window, we propose the adaptive window w, obtained by applying an adaptive coefficient matrix α to the 2-dim Hamming window w_Ham, namely,

$w_{Ham} = w (m) \times w (m),$ (5-1)

$w = α ⊙ w_{Ham},$ (5-2)

where × and ⊙ are the outer product and the Hadamard product, respectively, and α is a coefficient matrix whose size is determined by the size of the Hamming window. Taking M = 3 as an example, α has nine elements to adjust the Hamming window, as shown in Fig. 1(a). We make α adaptive by setting it as a tensor with gradient information, so that it can be updated during network backpropagation. In the process of convolutional sparse coding, the convolutional dictionary atom and w are multiplied element-wise, which helps to make the truncated region smoother. In specific applications, the value of M is determined by the size of the convolutional dictionary atom. Figure 1(b) shows a randomly initialized convolutional dictionary $\{D_{i}\}_{i = 1, 2, ..., N}$ constrained by a set of adaptive 2-dim windows $\{w_{i}\}_{i = 1, 2, ..., N}$, and the convolutional dictionary with adaptive window $D_{adaptive} \in R^{k \times k \times N}$ is denoted by

$D_{adaptive} = \{w_{i} ⊙ D_{i}\}_{i = 1, 2, ..., N},$ (6)

where w_i and D_i are continually updated through network training. Thus, $D_{adaptive}$ is more suitable for images containing different frequency information.

Fig. 1

The windowing operation. (a) the adaptive window and (b) convolutional dictionary with the adaptive window.
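The windowing in (5-2) and (6) amounts to two element-wise products per atom. The following sketch illustrates this with plain Python lists. It assumes that α is initialized to ones and that the Hamming window is sampled with M = k-1; neither detail is stated in the text, and all function names are hypothetical. In a trainable implementation α would be a learnable tensor (for example a `torch.nn.Parameter`) rather than a fixed list.

```python
import math
import random

SIGMA = 25 / 46


def hamming_2d(k):
    """2-D Hamming window (outer product of two k-point 1-D windows, M = k - 1 assumed)."""
    w = [SIGMA - (1 - SIGMA) * math.cos(2 * math.pi * m / (k - 1)) for m in range(k)]
    return [[a * b for b in w] for a in w]


def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized 2-D lists."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def adaptive_atoms(atoms, alphas):
    """Eq. (5-2) and Eq. (6): w_i = alpha_i * w_Ham, then D_adaptive^i = w_i * D_i."""
    k = len(atoms[0])
    w_ham = hamming_2d(k)
    return [hadamard(hadamard(alpha, w_ham), D) for D, alpha in zip(atoms, alphas)]


random.seed(0)
N, k = 4, 3  # toy sizes; the paper uses N = 32 atoms of size 9x9
atoms = [[[random.uniform(0.0, 0.2) for _ in range(k)] for _ in range(k)] for _ in range(N)]
alphas = [[[1.0] * k for _ in range(k)] for _ in range(N)]  # assumed initial value of alpha
D_adaptive = adaptive_atoms(atoms, alphas)
print(len(D_adaptive), len(D_adaptive[0]), len(D_adaptive[0][0]))  # 4 3 3
```

With α equal to one everywhere, the result reduces to the standard Hamming-windowed atom: the centre tap is kept and the taps near the border are attenuated.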

3.2 CDL-AW model

Like the noise model in (2), we sparsely represent noise and artifacts by the proposed $D_{adaptive}$ as follows:

$μ = \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i} .$ (7)

Then, the convolutional dictionary learning model with adaptive window (CDL-AW) is proposed to predict the clean image Y from the noisy image X by solving the following minimization problem:

$\min_{Y, Z} {‖ X - Y - \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i} ‖}_{F}^{2} + ζ_{1} φ (Z) + ζ_{2} φ (Y),$ (8)

where $Z = \{Z_{i}\}_{i = 1, 2, ..., N}$, $φ (Z)$ and φ (Y) are regularization terms, and ζ₁ and ζ₂ are weight parameters. We use the proximal gradient technique [26] and solve the cost function in (8) by alternately updating Y and $Z$. In the t-th stage:

$Z^{(t)} = \underset{Z}{\arg\min} f (Z^{(t - 1)}) + \frac{1}{2 η_{1}} {‖ Z - Z^{(t - 1)} ‖}_{F}^{2} + 〈 Z - Z^{(t - 1)}, \nabla f (Z^{(t - 1)}) 〉 + ζ_{1} φ (Z),$ (9-1)

$Y^{(t)} = \underset{Y}{\arg\min} g (Y^{(t - 1)}) + \frac{1}{2 η_{2}} {‖ Y - Y^{(t - 1)} ‖}_{F}^{2} + 〈 Y - Y^{(t - 1)}, \nabla g (Y^{(t - 1)}) 〉 + ζ_{2} φ (Y),$ (9-2)

where $f (Z^{(t - 1)}) = {‖ X - Y^{(t - 1)} - \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t - 1)} ‖}_{F}^{2}$, $g (Y^{(t - 1)}) = {‖ X - Y^{(t - 1)} - \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t)} ‖}_{F}^{2}$, and η₁ and η₂ are the step-size parameters. It is easy to prove that the problems in (9-1) and (9-2) are equivalent to:

$\min_{Z} \frac{1}{2} {‖ Z - (Z^{(t - 1)} - η_{1} \nabla f (Z^{(t - 1)})) ‖}_{F}^{2} + ζ_{1} η_{1} φ (Z),$ (10-1)

$\min_{Y} \frac{1}{2} {‖ Y - (Y^{(t - 1)} - η_{2} \nabla g (Y^{(t - 1)})) ‖}_{F}^{2} + ζ_{2} η_{2} φ (Y) .$ (10-2)
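The equivalence can be verified by completing the square. As a short worked step for (9-1), using only the symbols defined above, drop the constant $f (Z^{(t - 1)})$ and write $\nabla f$ for $\nabla f (Z^{(t - 1)})$:

$\frac{1}{2 η_{1}} {‖ Z - Z^{(t - 1)} ‖}_{F}^{2} + 〈 Z - Z^{(t - 1)}, \nabla f 〉 = \frac{1}{2 η_{1}} {‖ Z - (Z^{(t - 1)} - η_{1} \nabla f) ‖}_{F}^{2} - \frac{η_{1}}{2} {‖ \nabla f ‖}_{F}^{2} .$

The last term does not depend on Z, so multiplying the remaining objective by η₁ > 0 gives (10-1). The same argument with Y, g, η₂ and ζ₂ gives (10-2).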

Moreover, by substituting

$\nabla f (Z^{(t - 1)}) = (D_{adaptive}) \otimes^{T} (\sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t - 1)} + Y^{(t - 1)} - X),$ (11)

$\nabla g (Y^{(t - 1)}) = \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t)} + Y^{(t - 1)} - X,$ (12)

we can obtain the updating formulas (10-1) and (10-2) as:

$Z^{(t)} = prox_{ζ_{1} η_{1}} [Z^{(t - 1)} - η_{1} D_{adaptive} \otimes^{T} (\sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t - 1)} + Y^{(t - 1)} - X)],$ (13)

$Y^{(t)} = prox_{ζ_{2} η_{2}} [(1 - η_{2}) Y^{(t - 1)} + η_{2} (X - \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t)})],$ (14)

where ⊗^T denotes the transposed convolution, prox (·) is the proximal operator dependent on the regularization terms. The solving process of CDL-AW is depicted in Algorithm 1:

Algorithm 1: Solving $\min_{Y, Z} {‖ X - Y - \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i} ‖}_{F}^{2} + ζ_{1} φ (Z) + ζ_{2} φ (Y)$
For t = 1, ..., S do
1. ${\hat{μ}}^{(t)} = X - Y^{(t - 1)}$
2. ${\tilde{μ}}^{(t)} = \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t - 1)}$
3. $ɛ^{(t)} = η_{1} D_{adaptive} \otimes^{T} ({\tilde{μ}}^{(t)} - {\hat{μ}}^{(t)})$
4. $Z^{(t)} = prox_{ζ_{1} η_{1}} (Z^{(t - 1)} - ɛ^{(t)})$
5. $μ^{(t)} = \sum_{i = 1}^{N} D_{adaptive}^{i} \otimes Z_{i}^{(t)}$
6. ${\hat{Y}}^{(t)} = X - μ^{(t)}$
7. $Y^{(t)} = prox_{ζ_{2} η_{2}} ((1 - η_{2}) Y^{(t - 1)} + η_{2} {\hat{Y}}^{(t)})$
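The seven steps of Algorithm 1 can be sketched with NumPy as follows. This is only an illustration under stated assumptions: convolutions use circular boundaries (so they can be computed with the FFT), the proximal operator for Z is soft-thresholding (an L₁ prior), the proximal operator for Y is the identity, Y is initialized to zero, and the value of ζ₁ is arbitrary. In ECDAW-Net the proximal operators are learned networks and Y⁽⁰⁾ comes from (15), so none of these choices should be read as the authors' implementation.

```python
import numpy as np


def conv_sum(D, Z):
    """sum_i D_i (*) Z_i with circular boundaries, computed via the FFT."""
    H, W = Z.shape[1:]
    Df = np.fft.fft2(D, s=(H, W))
    return np.real(np.fft.ifft2((Df * np.fft.fft2(Z)).sum(axis=0)))


def conv_transpose(D, R):
    """Adjoint of conv_sum: correlates the residual R with every atom D_i."""
    H, W = R.shape
    Df = np.fft.fft2(D, s=(H, W))
    return np.real(np.fft.ifft2(np.conj(Df) * np.fft.fft2(R)[None]))


def soft(x, thr):
    """Soft-thresholding, the proximal operator of the L1 norm (assumed phi(Z))."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)


def cdl_aw_iterations(X, D, S=10, eta1=0.1, eta2=0.5, zeta1=0.01):
    """Steps 1-7 of Algorithm 1; the prox for Y is taken as the identity (assumption)."""
    Z = np.zeros((D.shape[0],) + X.shape)
    Y = np.zeros_like(X)  # assumed start; the paper initializes Y with a network, Eq. (15)
    for _ in range(S):
        mu_hat = X - Y                                     # step 1
        mu_tilde = conv_sum(D, Z)                          # step 2
        eps = eta1 * conv_transpose(D, mu_tilde - mu_hat)  # step 3
        Z = soft(Z - eps, zeta1 * eta1)                    # step 4
        mu = conv_sum(D, Z)                                # step 5
        Y_hat = X - mu                                     # step 6
        Y = (1 - eta2) * Y + eta2 * Y_hat                  # step 7
    return Y, Z


rng = np.random.default_rng(0)
X = rng.normal(size=(16, 16))             # stand-in for an LDCT slice
D = rng.uniform(0.0, 0.2, size=(4, 3, 3))  # stand-in for the windowed atoms
Y, Z = cdl_aw_iterations(X, D)
print(Y.shape, Z.shape)  # (16, 16) (4, 16, 16)
```

As in (11) and (12), the factor 2 of the gradient of the squared norm is absorbed into the step sizes.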

3.3 The proposed ECDAW-Net architecture

The traditional methods of solving the proximal operator often require a complex computational process, such as the work in [29]. Therefore, we unfold the optimization algorithm iteratively by adopting two networks to construct the proximal operators, then propose the ECDAW-Net architecture.

1) Network design

As shown in Fig. 2, the proposed ECDAW-Net consists of S stages, corresponding to S iterations of Algorithm 1 for solving (9). In each stage, we design two networks, $proxNet_{θ_{Z}^{(t)}}$ and $proxNet_{θ_{Y}^{(t)}}$, with parameters $θ_{Z}^{(t)}$ and $θ_{Y}^{(t)}$, respectively, to construct the proximal operators. Figure 2 visualizes the t-th stage; the quantities shown there correspond one-to-one to those in Algorithm 1. The dictionary $\{D_{i}\}_{i = 1, 2, ..., N}$ is randomly initialized before the first stage, and $\{w_{i}\}_{i = 1, 2, ..., N}$ is the set of corresponding adaptive windows. In the t-th stage, the previous outputs $Y^{(t - 1)}$ and $Z^{(t - 1)}$ and the LDCT image X are the inputs; $Z^{(t)}$ is updated by $proxNet_{θ_{Z}^{(t)}}$ and then used to obtain $Y^{(t)}$ through $proxNet_{θ_{Y}^{(t)}}$. In detail, ${\hat{μ}}^{(t)}$ is the difference between X and $Y^{(t - 1)}$, ${\tilde{μ}}^{(t)}$ is computed from $D_{adaptive}$ and the estimated $Z^{(t - 1)}$, while $μ^{(t)}$ is computed from $Z^{(t)}$, which is in turn obtained from $Z^{(t - 1)}$ and the residual information $ɛ^{(t)}$.

Fig. 2

The iteration process of ECDAW-Net.

Apart from windowing, signals with a narrower spatial support are more susceptible to spectral leakage, and enlarging the spatial width may help to obtain sharp boundaries. To extend the spatial width of the convolution kernels in $proxNet_{θ_{Z}^{(t)}}$ and $proxNet_{θ_{Y}^{(t)}}$, we use hybrid dilated convolution [31] as the basic unit and construct the two proximal operators from the hybrid dilated module (as shown in Fig. 3). In detail, a skip connection is applied to improve feature utilization and prevent vanishing gradients, and BN layers are used to reduce over-fitting.

Fig. 3

The hybrid dilated module.

In particular, we carefully designed the process marked by the dotted rectangle in Fig. 2. Unlike $proxNet_{θ_{Z}^{(t)}}$, whose input has N channels, the single-channel input of $proxNet_{θ_{Y}^{(t)}}$ is not beneficial for feature learning, since most of the previous updating information would be compressed. To expand the channels, we introduce an auxiliary variable $B^{(t - 1)}$ with N channels and concatenate it behind the feature map (as shown in Fig. 4). In addition, an edge enhancement module based on the Sobel and LoG operators is designed to alleviate blurring and low contrast during noise reduction. In detail, the 3×3 trainable Sobel convolution layer designed in [15] is used for edge extraction. Besides, a trainable LoG convolution layer is designed based on the well-known LoG edge detection operator. Since the LoG operator first smooths the image with a Gaussian function and then uses the Laplacian operator to detect edges from the zero-crossings of the second derivative, it has the following advantages: (1) strong noise immunity, as the LoG template restrains noise from interfering with edge features; (2) no directionality, which saves computation; (3) high positioning accuracy, as it can extract edge points with weak contrast. Benefiting from these points, and following the trainable Sobel convolution layer in [15], we design a 5×5 trainable LoG convolution layer to extract edge features, as shown in Fig. 5(a). Its good performance benefits from the learnable factor β, a tensor with gradient information that is updated during network backpropagation; like the parameters of an ordinary convolution layer in a CNN, β is learned from the training data. Since the 5×5 LoG template may not be able to extract small-scale features, we design a multi-scale edge extraction module consisting of the 3×3 trainable Sobel convolution layer [15] and the trainable LoG convolution layer, as shown in Fig. 5(b). The 1×1 Conv can be seen as a linear transformation of the input channels. We then add the first feature map of the output of $proxNet_{θ_{Y}^{(t)}}$ (the remaining channels are denoted as the auxiliary variable $B^{(t)}$) to the edges extracted by the proposed multi-scale edge extraction module, and finally obtain $Y^{(t)}$ in the t-th stage.
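To make the two edge templates concrete, the sketch below builds a 3×3 Sobel pair and a 5×5 LoG template, each scaled by a single factor, and applies them to a toy step edge. The integer LoG template used here is a common textbook approximation and is an assumption; the exact template and the way β enters it are given only in Fig. 5(a). The filtering routine is a plain zero-padded cross-correlation written for illustration, with hypothetical function names.

```python
def sobel_kernels(factor=1.0):
    """Horizontal and vertical 3x3 Sobel templates scaled by a learnable factor."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    scale = lambda K: [[factor * v for v in row] for row in K]
    return scale(gx), scale(gy)


def log_kernel(beta=1.0):
    """A common 5x5 integer LoG template scaled by the factor beta (assumed form)."""
    base = [
        [0, 0, -1, 0, 0],
        [0, -1, -2, -1, 0],
        [-1, -2, 16, -2, -1],
        [0, -1, -2, -1, 0],
        [0, 0, -1, 0, 0],
    ]
    return [[beta * v for v in row] for row in base]


def filter2d(img, K):
    """'Same'-size cross-correlation with zero padding (what a conv layer computes)."""
    H, W, k = len(img), len(img[0]), len(K)
    p = k // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - p, x + j - p
                    if 0 <= yy < H and 0 <= xx < W:
                        acc += K[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


# Toy image with a vertical step edge between columns 5 and 6.
img = [[0.0] * 6 + [1.0] * 6 for _ in range(12)]
log_resp = filter2d(img, log_kernel(beta=0.5))
gx_resp = filter2d(img, sobel_kernels()[0])
# Flat region, the two sides of the edge (LoG), and the Sobel response at the edge.
print(log_resp[6][2], log_resp[6][5], log_resp[6][6], gx_resp[6][5])
```

The LoG response is zero in flat regions and changes sign across the edge (the zero-crossing mentioned above), while the Sobel response peaks at the edge itself, which is why the two layers complement each other.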

Fig. 4

The process in red dotted rectangle in Fig. 2.

Fig. 5

The multi-scale edge extraction module.

2) Training phase

In the training phase, based on experimental experience, the number and size of the convolutional dictionary atoms are set to 32 and 9×9, respectively. Too many atoms do not significantly improve performance but increase memory usage. Small atoms cannot effectively represent streak artifacts, while excessively large atoms increase the inference time of the model. The convolutional dictionary is initialized with random values in [0, 0.2]. The proposed adaptive window (M = 9) has the same number of channels as the convolutional dictionary. Besides, we set $Z^{(0)} = 0$, and Y⁽⁰⁾ and the auxiliary variable $B^{(0)}$ are initialized by:

$\{Y^{(0)}, B^{(0)}\} = prox_{θ_{Y}^{(0)}} (cat (X, C \otimes X)),$ (15)

where $C \in R^{M \times M \times 1 \times N}$ is a set of convolution kernels that is updated as a convolutional layer during training, and cat denotes the concatenation operation. The network settings are outlined in Table 1. Based on trial-and-error experiments, the total number of iterations S is set to 10, and the hyper-parameters η₁ and η₂ are set to 0.1 and 0.5, respectively. During training, the mini-batch size is 8, and the learning rate starts from 1e-4 and slowly decreases to 4e-6 after 30 epochs. The Adam optimizer with default configuration is used to update the learnable parameters.

Table 1

The configuration of the proposed modules (f: kernel size, s: stride, p: padding, d: dilation rate)

Module	Layers	Parameters	Channel (input)	Channel (output)
proxNet_{θ_Z}	Conv	f = 5; p = 2; s = 1; d = 1	32	32
	Conv	f = 3; p = 2; s = 1; d = 2	32	32
	Conv	f = 3; p = 3; s = 1; d = 3	32	32
proxNet_{θ_Y}	Conv	f = 5; p = 2; s = 1; d = 1	33	33
	Conv	f = 3; p = 2; s = 1; d = 2	33	33
	Conv	f = 3; p = 3; s = 1; d = 3	33	33
Multi-scale edge extraction module	Sobel	f = 3; p = 1; s = 1; d = 1	1	16
	LoG	f = 5; p = 2; s = 1; d = 1	1	16
	Conv	f = 1; p = 0; s = 1; d = 1	32	1
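The padding values in Table 1 are chosen so that each dilated layer preserves the spatial size. The short check below applies the standard convolution output-size formula to the three proxNet layers and also accumulates the receptive field of the stack; the function name is illustrative, and the input width of 512 is the slice size reported for the Mayo data.

```python
def conv_out_size(n, f, p, s=1, d=1):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * p - d * (f - 1) - 1) // s + 1


# (f, p, s, d) for the three layers of proxNet listed in Table 1.
layers = [(5, 2, 1, 1), (3, 2, 1, 2), (3, 3, 1, 3)]

n = 512  # slice width used in the Mayo experiments
rf = 1   # receptive field of the stacked layers along one axis
for f, p, s, d in layers:
    n = conv_out_size(n, f, p, s, d)
    rf += d * (f - 1)  # valid because every stride is 1

print(n, rf)  # 512 15
```

Three layers therefore cover a 15×15 neighbourhood, whereas three undilated layers with the same kernel sizes would cover only 9×9.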

The compound loss $L$, which weights a global (pixel-level) loss and a local (patch-level) loss, is adopted in each stage:

$L = l_{MSE} + τ \cdot l_{Patch},$ (16)

$l_{Patch} = {‖ E_{local} (I) - E_{local} (O) ‖}_{L_{1}},$ (17)

where τ is the weight parameter and set to 0.01 in this study, l_Patch is the proposed local energy consistency loss based on image blocks, which is defined as the L₁ norm of “local energy graph difference” between input I and output O. The energy graph is defined as follows:

$E_{local} (I (x, y)) = \frac{1}{8} \sum_{s, t \in N_{x, y}} {(I (x, y) - I (s, t))}^{2},$ (18)

where N_x,y is the neighborhood of the pixel (x, y). With the patch-level loss l_Patch, the over-smoothing caused by the pixel-level MSE loss is alleviated to some extent. To reduce the number of model parameters and the inference time, the total loss function only involves the errors in the first and last stages. The total loss is the weighted sum:

$L_{total} = λ_{1} L (Y, Y^{(0)}) + λ_{2} L (Y, Y^{(S)}) + λ_{3} L (X - Y, μ^{(S)}),$ (19)

where λ₁, λ₂, and λ₃ are weight parameters. We assign a smaller weight to the first stage to prevent its larger loss from weakening the role of subsequent stages, and a larger weight to the last stage. Through trial-and-error experiments, we set λ₁ = 0.1, λ₂ = 0.9, and λ₃ = 1 in this study.
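The compound loss in (16)-(18) can be sketched in plain Python as follows. Several details are assumptions made for illustration: N_x,y is taken as the 8-neighbourhood (suggested by the factor 1/8), borders use replicate padding, and both terms are averaged over pixels so that they share a scale. The function names are hypothetical.

```python
def local_energy(img):
    """Eq. (18) over the 8-neighbourhood; borders use replicate padding (assumed)."""
    H, W = len(img), len(img[0])
    E = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    s = min(max(y + dy, 0), H - 1)
                    t = min(max(x + dx, 0), W - 1)
                    acc += (img[y][x] - img[s][t]) ** 2
            E[y][x] = acc / 8.0
    return E


def mean_over_pixels(A, B, fn):
    """Average of fn(a, b) over all pixel pairs of two equally sized images."""
    n = len(A) * len(A[0])
    return sum(fn(a, b) for ra, rb in zip(A, B) for a, b in zip(ra, rb)) / n


def compound_loss(I, O, tau=0.01):
    """Eq. (16): MSE plus tau times the L1 distance between local energy maps."""
    l_mse = mean_over_pixels(I, O, lambda a, b: (a - b) ** 2)
    l_patch = mean_over_pixels(local_energy(I), local_energy(O), lambda a, b: abs(a - b))
    return l_mse + tau * l_patch


flat = [[0.0] * 5 for _ in range(5)]
impulse = [row[:] for row in flat]
impulse[2][2] = 1.0  # one bright pixel in an otherwise flat patch
print(local_energy(impulse)[2][2], local_energy(impulse)[1][1])  # 1.0 0.125
print(compound_loss(flat, impulse))
```

A flat image has zero local energy everywhere, so an output that smooths away a real structure is penalized by the patch term even when its pixel-wise error is small.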

All experiments were implemented in Python with the PyTorch library on a Linux computer (Intel Core i7-9700K CPU, 32 GB RAM, NVIDIA GeForce RTX 2080 SUPER), and NVIDIA CUDA 10.1 was used to accelerate training.

4 Experiments and results

In this section, we first describe the experiment setting. Then we compare our proposed method with other existing methods. Finally, we introduce the ablation study.

4.1 Experimental setting

We compare ECDAW-Net with a traditional method, typical leading deep learning-based methods, and a convolutional dictionary learning-based method, namely K-SVD [4], REDCNN [8], EDCNN [15], CTformer [35], QAE [9], CNCL [12] and TLD-CDL [24]. In particular, CTformer is the first pure Transformer architecture for LDCT denoising. For a fair comparison, we trained all networks with our training data. The dataset from “the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge” [32], authorized by Mayo Clinic, and the Piglet dataset [33] are used to train our model. The Mayo dataset includes paired LDCT and corresponding normal-dose CT (NDCT) images of size 512×512, collected from 10 patients. A total of 760 pairs of CT images from 9 patients were randomly selected as the training dataset, and 35 images from one patient were randomly selected as the testing dataset. Furthermore, the proposed method was also evaluated on the Piglet dataset acquired with a GE scanner (Discovery CT750 HD), in which the images obtained at 300 mAs serve as the normal dose and images at a range of lower doses (50%, 25%, 10%, and 5%) are the LDCT images [34]. A total of 400 pairs of piglet images (150 mAs and 300 mAs tube current) were randomly selected as the training dataset, and 50 images were selected from each of the 25%, 10%, and 5% dose levels. In addition, we tested our model on 40 real clinical LDCT images; their use was approved by our hospital’s ethics committee.

For the Mayo and Piglet datasets, we judged model performance by multiple quantitative measures as well as subjective evaluations by two radiologists. For the clinical data, we relied directly on the subjective evaluation of the two radiologists. Six quality metrics are used for the objective comparison between ECDAW-Net and the other methods mentioned above: peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [36], gradient magnitude similarity deviation (GMSD) [37], feature similarity (FSIM) [38], visual information fidelity (VIF) [39] and its multi-scale pixel-domain implementation (VIFs) [39]. SSIM computes the product of luminance similarity, contrast similarity, and structural similarity to predict the local image quality at each position. A higher SSIM means better similarity between the denoised image and the reference image. FSIM combines the similarities of the phase congruency maps and gradient magnitude maps of the reference and processed images. VIF quantifies the information that could ideally be extracted by the brain, and its score indicates the amount of information shared between the reference and processed images. As with SSIM, higher values of FSIM and VIF mean better image quality. GMSD indicates the pixel-wise similarity of the gradient magnitude maps of the reference and processed images, and a good result has a low GMSD score. The compared methods are implemented from their official codes, with parameters set according to the recommendations of the corresponding papers.
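Of the six metrics, PSNR is the simplest to state explicitly. The sketch below computes it from the mean squared error between a reference and a processed image. The `data_range` argument (the maximum possible intensity span) is an assumption, because the text does not say how the CT values were normalized before the metric was computed.

```python
import math


def psnr(reference, processed, data_range):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    n = len(reference) * len(reference[0])
    mse = sum((r - p) ** 2 for rr, pr in zip(reference, processed) for r, p in zip(rr, pr)) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


ref = [[0.5, 0.5], [0.5, 0.5]]
out = [[0.6, 0.4], [0.6, 0.4]]
# Every pixel is off by 0.1, so the MSE is about 0.01 and the PSNR is about 20 dB.
print(round(psnr(ref, out, data_range=1.0), 2))
```

Because PSNR depends only on the pixel-wise error, it is reported together with structure-aware metrics such as SSIM, FSIM and GMSD.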

4.2 Comparison with other methods

4.2.1 Mayo experiments

1) Visual assessment

Figures 6 and 7 show the results of different methods on two representative slices (denoted Case1 and Case2) from the Mayo test dataset. All axial CT images are displayed in the window [-160, 240] HU. For closer comparison, three regions of interest (ROIs), marked by rectangles in Figs. 6(a) and 7(a), are magnified in Fig. 8. ROI1 and ROI2 contain lesions, marked by red and blue circles respectively in Fig. 8, and ROI3 shows soft tissue. The lesions are clearly visible in the NDCT images (Figs. 6(b) and 7(b)), but in the LDCT images (Figs. 6(a) and 7(a)) they are contaminated by severe artifacts and noise and are therefore difficult to outline precisely. KSVD brings no significant denoising improvement (Figs. 6(c) and 7(c)) and over-smooths the images (Fig. 8(c1)-(c3)). REDCNN suppresses noise to a certain degree, as shown in Figs. 6(d) and 7(d). Images (e) in Figs. 6 and 7 show that EDCNN preserves edges more satisfactorily, but the streak artifacts in ROI2 are still not completely suppressed (see the details marked by arrows in Fig. 8(e2)). Images (f) in Figs. 6 and 7 show that CTformer suppresses noise and artifacts significantly; however, the lesions in ROI1 and ROI2 lose their sharp boundaries (see (f1) and (f2) in Fig. 8). Results (g) and (i) in Figs. 6 and 7 show that QAE and TLD-CDL remove most of the speckle noise in the LDCT images but still cannot remove artifacts effectively (see details in Fig. 8(g2) and (i2)). In contrast, CNCL and our ECDAW-Net obtain better results, removing noise and artifacts effectively while preserving tissue and structure. The contours of the lesions in ROI1 and ROI2 (Fig. 8(h1)-(h2) and 8(j1)-(j2)) can be readily recognized. In addition, both CNCL and ECDAW-Net render the soft tissue in ROI3 satisfactorily (Fig. 8(h3) and 8(j3)), with ECDAW-Net giving the clearest textures. These results demonstrate that ECDAW-Net has an outstanding capacity for texture preservation and noise suppression.
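The display window quoted above determines how HU values are mapped to gray levels in the figures. The following is a minimal sketch of that mapping, assuming a plain linear window with clipping; the function names `window_hu` and `window_image` are illustrative and do not come from the paper.

```python
def window_hu(hu, lo=-160.0, hi=240.0):
    """Map one HU value to a gray level in [0, 1] for the display window [lo, hi]."""
    clipped = min(max(hu, lo), hi)      # values outside the window saturate
    return (clipped - lo) / (hi - lo)   # linear ramp inside the window


def window_image(image, lo=-160.0, hi=240.0):
    """Apply the window to a 2-D image given as a list of rows of HU values."""
    return [[window_hu(v, lo, hi) for v in row] for row in image]


# Toy 2 x 3 "slice" in HU: air, window edges, window centre, bone-like value.
toy_slice = [[-1000.0, -160.0, 40.0],
             [240.0, 1000.0, 0.0]]
print(window_image(toy_slice))
```

With the [-160, 240] HU window, everything at or below -160 HU is rendered black and everything at or above 240 HU white, so the 400 HU range of soft tissue receives the full gray scale.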

Fig. 6

Comparison of processed images by different methods for Case1.

Fig. 7

Comparison of processed images by different methods for Case2.

Fig. 8

The zoomed ROIs in Figs. 6, 7.

2) Quantitative assessment

Six objective metrics (PSNR, SSIM, GMSD, FSIM, VIF, and VIFs) were used to compare the denoising performance of the proposed ECDAW-Net with that of several competing denoising methods. The metric values of each image are shown in its upper-left and upper-right corners in Figs. 6 and 7. In addition, all methods were applied to 35 LDCT images from the test dataset, and Table 2 lists the average metric values obtained by each method over these images. The best and second-best values are indicated in red and blue, respectively. From Table 2, CNCL and KSVD are inferior on PSNR, but the gaps with KSVD, REDCNN, and EDCNN in other quantitative metrics are very small. For VIF and VIFs, KSVD and REDCNN have the lowest scores, which is consistent with the visual impression that their images are over-smoothed and low in contrast. Although CNCL is inferior on PSNR, its SSIM, VIF, and VIFs values are satisfactory. QAE and CTformer are superior to the two typical CNN-based methods above. TLD-CDL and ECDAW-Net are ahead of the other methods in all indicators, which indicates that convolutional dictionary learning is promising for LDCT post-processing. In particular, ECDAW-Net has the best value for every metric except SSIM, where it trails TLD-CDL by only 0.0001. This indicates that ECDAW-Net achieves good noise/artifact suppression (PSNR) and feature information preservation (FSIM, VIF, and VIFs), and that its result is close to the corresponding NDCT image (GMSD).

Table 2

Quantitative results of the different algorithms

	PSNR↑	SSIM↑	GMSD↓	FSIM↑	VIF↑	VIFs↑
LDCT	30.2805±0.5197	0.8585±0.0124	0.0876±0.0063	0.9452±0.0039	0.3931±0.0161	0.4643±0.0157
K-SVD	33.2760±0.3758	0.9117±0.0064	0.0688±0.0014	0.9553±0.0025	0.4214±0.0123	0.5023±0.0112
REDCNN	33.3665±0.3933	0.9132±0.0074	0.0584±0.0030	0.9621±0.0023	0.4271±0.0139	0.5193±0.0132
EDCNN	33.2217±0.4164	0.9081±0.0081	0.0605±0.0031	0.9614±0.0025	0.4324±0.0145	0.5204±0.0138
CTformer	33.3942±0.3801	0.9141±0.0075	0.0589±0.0030	0.9625±0.0023	0.4352±0.0147	0.5259±0.0131
QAE	33.6436±0.4001	0.9161±0.0074	0.0581±0.0030	0.9625±0.0023	0.4399±0.0146	0.5302±0.0137
CNCL	33.1521±0.5604	0.9186±0.0064	0.0636±0.0040	0.9598±0.0022	0.4412±0.0115	0.5371±0.0107
TLD-CDL	33.9225±0.3422	0.9201±0.0068	0.0579±0.0036	0.9636±0.0022	0.4467±0.0136	0.5377±0.0123
ECDAW-Net	33.9413±0.3527	0.9200±0.0069	0.0575±0.0033	0.9639±0.0022	0.4472±0.0138	0.5388±0.0126
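For reference, PSNR, the first metric in Table 2, has a simple closed form based on the mean squared error between a denoised image and its NDCT reference. The sketch below is a generic implementation, not the authors' evaluation code; the `data_range` argument (the span of valid intensities) is an assumption, since the normalization behind the reported values is not stated here.

```python
import math


def psnr(reference, test, data_range):
    """PSNR in dB between two equally sized 2-D images given as lists of rows."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy example: every pixel is off by 1 on a scale whose range is 10.
ndct = [[0.0, 0.0], [0.0, 0.0]]
denoised = [[1.0, 1.0], [1.0, 1.0]]
print(psnr(ndct, denoised, data_range=10.0))
```

Because PSNR depends only on the pixel-wise error, it is usually read together with structure-aware metrics such as SSIM, FSIM, and VIF, as is done in Table 2.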

To compare the distribution characteristics of the different methods on the test dataset, we analyzed PSNR and SSIM with boxplots, as shown in Fig. 9. A boxplot summarizes a set of data through five statistics: the maximum, the minimum, the lower and upper quartiles, and the median. From the widths of the boxes and the ranges of PSNR and SSIM in Fig. 9, we can conclude that ECDAW-Net is highly robust. Ranked by the gray line (median value) in the boxplots, the denoising methods are ordered as follows: (PSNR) EDCNN < KSVD < CNCL < REDCNN < CTformer < QAE < TLD-CDL < ECDAW-Net, (SSIM) EDCNN < KSVD < REDCNN < CTformer < QAE < CNCL < TLD-CDL/ECDAW-Net. The highest median values confirm the average quantitative performance of ECDAW-Net.
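The five statistics behind each box can be computed directly. The sketch below uses Python's standard `statistics` module; the sample values are hypothetical and are not taken from the paper, and the "inclusive" quartile convention is an assumption, since plotting tools differ in how they interpolate quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical per-image PSNR values, for illustration only.
example_psnr = [33.2, 33.6, 33.9, 34.1, 34.5]
print(five_number_summary(example_psnr))
```

A narrow gap between the lower and upper quartiles corresponds to a narrow box, which is the visual cue used above to judge robustness.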

Fig. 9

Boxplots of the denoised results obtained by different denoising methods on the AAPM test set.

Given that doctors tend to focus on ROIs related to medical diagnosis, we also quantified the performance of each method within the ROIs using PSNR and SSIM, as shown in Fig. 10. In every ROI, the proposed ECDAW-Net has the highest PSNR. For SSIM, there is no obvious difference in ROI1, but ECDAW-Net is clearly ahead in ROI2 and ROI3. Although EDCNN has a similar edge enhancement module and SSIM scores that compete with those of ECDAW-Net, ECDAW-Net has higher PSNRs. Therefore, ECDAW-Net outperforms the competing methods qualitatively and quantitatively, both over the whole image and in local regions.

Fig. 10

Quantitative performance of three ROIs in Fig. 8.

In addition, we plot the PSNR and SSIM of the models on the test dataset after different numbers of training epochs, as shown in Fig. 11. Whereas the comparison methods are trained for nearly 100 epochs, our model reaches satisfactory performance after only 20 epochs, which demonstrates the efficiency of our strategy.

Fig. 11

Visualization of PSNR and SSIM curves along with epoch on the test dataset.

4.2.2 Piglet experiments

1) Visual assessment

Figure 12 shows the results of different methods on a representative piglet slice (denoted Case3) at different dose levels (50%, 25%, 10%, and 5%). All CT images are displayed in the window [40, 400] HU. KSVD produces over-smoothed results (Fig. 12(a1) and (b1)). Because the models were trained on 150 mAs LDCT images, the CNN-based and convolutional dictionary-based methods maintain a certain denoising performance on test images at 50% and 25% dose. In detail, REDCNN, CTformer, and QAE blur details (see (a2), (a4), and (a5) in Fig. 12). EDCNN retains edges and details but is inferior in noise suppression. TLD-CDL over-smooths the images in Fig. 12(a7) and (b7). The results of CNCL and the proposed ECDAW-Net are close to the corresponding NDCT image when the tube current is 150 mAs or 75 mAs (Fig. 12(a6)-(b6) and (a8)-(b8)). However, when the images are heavily corrupted (30 mAs and 15 mAs), none of the methods achieves satisfactory results.

Fig. 12

Comparison of processed images by different methods for Case3 on the Piglet dataset.

2) Quantitative assessment

Table 3 shows the average quantitative performance obtained in the robustness verification experiments on the whole Piglet dataset. The best and second-best values are indicated in red and blue, respectively. KSVD and CNCL are inferior in PSNR, although CNCL attains the best or second-best values in some other indicators. REDCNN, EDCNN, CTformer, QAE, and TLD-CDL are very close in all indicators, which is consistent with the visual assessment. When the noise is not very strong (150 mAs and 75 mAs), the proposed ECDAW-Net not only leads in PSNR but is also at the forefront in most of the other indicators. Except for VIF and VIFs, KSVD and TLD-CDL show the leading values when the tube current is 15 mAs. This may be because these methods adjust their noise parameters according to the noise level.

Table 3

Quantitative results of the different algorithms on the Piglet dataset

Dose level	Method	PSNR↑	SSIM↑	GMSD↓	FSIM↑	VIF↑	VIFs↑
50%	LDCT	31.6362	0.9041	0.0721	0.9651	0.3755	0.4657
	K-SVD	32.0547	0.9283	0.0711	0.9646	0.3858	0.4736
	REDCNN	32.8117	0.9323	0.0685	0.9658	0.3972	0.4897
	EDCNN	32.6239	0.9237	0.0629	0.9692	0.3950	0.4859
	CTformer	32.8068	0.9314	0.0680	0.9664	0.3964	0.4887
	QAE	32.6588	0.9315	0.0672	0.9662	0.3920	0.4829
	CNCL	31.1114	0.9283	0.0635	0.9691	0.3943	0.4737
	TLD-CDL	32.7512	0.9347	0.0701	0.9656	0.3983	0.4892
	ECDAW-Net	34.1201	0.9420	0.0580	0.9686	0.4226	0.5082
25%	LDCT	28.6568	0.8683	0.1042	0.9429	0.2983	0.3847
	K-SVD	29.4760	0.9095	0.0988	0.9445	0.3114	0.3963
	REDCNN	30.0259	0.9106	0.0859	0.9528	0.3262	0.4168
	EDCNN	29.8145	0.8982	0.0868	0.9526	0.3219	0.4106
	CTformer	29.9176	0.9058	0.0860	0.9528	0.3206	0.4106
	QAE	29.8093	0.9074	0.0867	0.9520	0.3192	0.4074
	CNCL	29.3702	0.9075	0.0794	0.9581	0.3346	0.4149
	TLD-CDL	29.8932	0.9149	0.0933	0.9483	0.3247	0.4138
	ECDAW-Net	31.1917	0.9171	0.0798	0.9538	0.3449	0.4293
10%	LDCT	27.8282	0.8444	0.1296	0.9324	0.2786	0.3627
	K-SVD	29.4323	0.8861	0.0967	0.9495	0.2976	0.3807
	REDCNN	30.3414	0.8929	0.0898	0.9547	0.3156	0.4058
	EDCNN	29.7752	0.8772	0.1012	0.9494	0.3083	0.3974
	CTformer	29.7070	0.8782	0.0944	0.9514	0.3007	0.3899
	QAE	29.7131	0.8842	0.0971	0.9505	0.3050	0.3914
	CNCL	29.6315	0.8915	0.0887	0.9557	0.3210	0.4031
	TLD-CDL	30.2021	0.9010	0.0852	0.9556	0.3139	0.4006
	ECDAW-Net	30.9769	0.8882	0.0965	0.9491	0.3304	0.4154
5%	LDCT	25.0477	0.8073	0.1692	0.9017	0.2170	0.2922
	K-SVD	27.8818	0.8759	0.1164	0.9345	0.2478	0.3270
	REDCNN	27.1331	0.8456	0.1361	0.9244	0.2432	0.3263
	EDCNN	26.5800	0.8323	0.1464	0.9181	0.2371	0.3187
	CTformer	26.5983	0.8310	0.1387	0.9213	0.2311	0.3124
	QAE	26.4767	0.8346	0.1452	0.9179	0.2331	0.3126
	CNCL	27.4803	0.8555	0.1221	0.9344	0.2627	0.3433
	TLD-CDL	27.8580	0.8754	0.1090	0.9381	0.2527	0.3343
	ECDAW-Net	27.7343	0.8418	0.1437	0.9187	0.2557	0.3356

4.2.3 Real clinic experiments

As shown in Fig. 13, we carried out a robustness verification experiment on real clinical CT images and selected a representative slice (denoted Case4). For closer comparison, two ROIs (marked by rectangles in Fig. 13(a)) are magnified and placed in the upper-left and upper-right corners. In the ROIs of the LDCT image (Fig. 13(a)), severe artifacts and noise make textures and lesions (such as those in ROI5) difficult to outline precisely. Texture is hardly visible in the KSVD result. REDCNN, CTformer, and QAE suppress noise and artifacts to some extent. CNCL, TLD-CDL, and ECDAW-Net suppress noise best while preserving lesions and tissue textures well, but the brightness of the CNCL result still needs improvement (see ROI5 in Fig. 13(g)).

Fig. 13

The denoised results for Case4. The display window is [-160,240] HU.

4.2.4 Model efficiency

Model complexity is a major concern in LDCT denoising. We compared the trainable parameter numbers and testing time of all methods and show them in Table 4. The parameters are those trainable parameters that are updated during the backpropagation.

Table 4

Numbers of parameters and inference times of the compared models (M: million, s: second)

	Parameters	Testing time
REDCNN	0.47M	0.157s
EDCNN	0.08M	0.031s
CTformer	1.45M	2.385s
QAE	0.22M	0.016s
CNCL	46.59M	0.472s
TLD-CDL	39.86M	1.197s
ECDAW-Net	3.46M	0.179s

Concerning model size, CNCL and TLD-CDL have the highest memory usage, each requiring more than 30 million parameters. This is because CNCL uses two deep learning predictors to learn the respective content, while TLD-CDL has multiple hyper-parameter prediction networks and model update networks. EDCNN and QAE have the fewest parameters. Although ECDAW-Net does not have the smallest memory usage, its 3.46M parameters are acceptable. Concerning testing time, EDCNN and QAE are also the fastest. The proposed ECDAW-Net takes almost six times as long as EDCNN (0.179 s versus 0.031 s) but still less than 0.2 seconds. It is followed, somewhat surprisingly, by CNCL (0.472 s), because CNCL's inference uses only the generators. TLD-CDL and CTformer have the highest time cost of all the compared methods. Overall, considering both denoising performance and model efficiency, ECDAW-Net is competitive with the other denoising algorithms.
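The relative costs discussed above follow directly from Table 4. The short sketch below recomputes them; the numbers are copied from the table, while the helper name `relative_to` and the choice of EDCNN as the baseline are illustrative.

```python
# Values copied from Table 4: (trainable parameters in millions, testing time in seconds).
table4 = {
    "REDCNN": (0.47, 0.157),
    "EDCNN": (0.08, 0.031),
    "CTformer": (1.45, 2.385),
    "QAE": (0.22, 0.016),
    "CNCL": (46.59, 0.472),
    "TLD-CDL": (39.86, 1.197),
    "ECDAW-Net": (3.46, 0.179),
}


def relative_to(baseline, table=table4):
    """Return {method: (parameter ratio, time ratio)} relative to `baseline`."""
    base_params, base_time = table[baseline]
    return {name: (params / base_params, seconds / base_time)
            for name, (params, seconds) in table.items()}


ratios = relative_to("EDCNN")
# Print the methods from fastest to slowest.
for name, (p_ratio, t_ratio) in sorted(ratios.items(), key=lambda kv: kv[1][1]):
    print(f"{name:10s} params x{p_ratio:7.2f}  time x{t_ratio:6.2f}")
```

The time ratio for ECDAW-Net is 0.179 / 0.031, or about 5.8, which is the "almost six times" figure quoted in the text.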

4.3 Ablation studies

In this part, ablation experiments were conducted on the Mayo dataset to analyze the impact of each contribution in ECDAW-Net. Briefly, the four ablation experiments are “w/o the compound loss” (the compound loss is replaced with the MSE loss), “w/o the adaptive window” (the adaptive window is removed), “w/o the hybrid dilated module” (the hybrid dilated module is replaced with an ordinary 3×3 convolution), and “w/o the multi-scale edge extraction module” (the multi-scale edge extraction module is removed).

As shown in Fig. 14, we selected another representative slice (denoted Case5) from the Mayo dataset and magnified two ROIs (marked by rectangles in Fig. 14) for closer comparison. Fig. 14(c) and (d) show that, without the adaptive window or with the hybrid dilated module replaced, the results are blurred and retain noise to various extents. The model trained with the MSE loss alone (Fig. 14(e)) is visually similar to ECDAW-Net (Fig. 14(g)). Although the visual difference between Fig. 14(f) and (g) is also subtle, ECDAW-Net is better at the lesion edges and in artifact suppression (marked by red circles).

Fig. 14

The denoised results obtained by performing the ablation experiments for Case5 on Mayo dataset. The display window is [-160,240] HU.

Table 5 lists the average values of the six metrics on the test images; the best and second-best values are indicated in red and blue, respectively. The proposed ECDAW-Net achieves the best score for every metric except GMSD, while the model without the multi-scale edge extraction module has the second-best values except for GMSD and FSIM. This shows that the edge enhancement has a positive effect. The MSE loss only minimizes pixel-level differences, and the model trained with the MSE loss alone has inferior PSNR, SSIM, VIF, and VIFs, although Table 5 shows that it attains the lowest GMSD (0.0572). This indicates that the proposed compound loss strengthens the detail retention ability. Replacing the hybrid dilated module with an ordinary convolution lowers PSNR and VIF noticeably, which reflects the contribution of the hybrid dilated module. Compared with ECDAW-Net, the model without the adaptive window has inferior values in all indicators (especially PSNR, VIF, and VIFs), which is consistent with the visual assessment.

Table 5

Quantitative performance of ablation studies (mean value)

	PSNR↑	SSIM↑	GMSD↓	FSIM↑	VIF↑	VIFs↑
w/o the adaptive window	33.6356	0.9167	0.0585	0.9628	0.4372	0.5279
w/o the hybrid dilated module	33.7552	0.9178	0.0575	0.9635	0.4413	0.5324
w/o the compound loss	33.8131	0.9187	0.0572	0.9637	0.4446	0.5350
w/o the multi-scale edge extraction module	33.8536	0.9188	0.0576	0.9635	0.4452	0.5366
ECDAW-Net	33.9413	0.9200	0.0575	0.9639	0.4472	0.5388

The degradation observed without the adaptive window is a sign that the convolutional dictionary with the adaptive window is a reasonable remedy for spectral leakage, which is liable to impair denoising performance. In short, the proposed ECDAW-Net is effective and meaningful.
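To make the relative contribution of each component concrete, the PSNR loss of every ablated variant can be computed from Table 5. The sketch below only performs this subtraction; the PSNR values are copied from the table and the variable names are illustrative.

```python
# Mean PSNR values copied from Table 5.
psnr_table5 = {
    "w/o the adaptive window": 33.6356,
    "w/o the hybrid dilated module": 33.7552,
    "w/o the compound loss": 33.8131,
    "w/o the multi-scale edge extraction module": 33.8536,
    "ECDAW-Net": 33.9413,
}

full = psnr_table5["ECDAW-Net"]
psnr_drop = {name: full - value
             for name, value in psnr_table5.items() if name != "ECDAW-Net"}

# List the ablated variants from largest to smallest PSNR loss.
for name, drop in sorted(psnr_drop.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: -{drop:.4f} dB")
```

Removing the adaptive window costs about 0.31 dB, the largest loss of the four variants, while removing the multi-scale edge extraction module costs about 0.09 dB.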

5 Discussion and conclusions

Convolutional dictionary learning decomposes an image using convolution instead of the matrix multiplication of patch-based dictionary learning, and considers the consistency constraint of pixels in overlapping patches. In this paper, we used a convolutional dictionary to represent noise and artifacts, proposed the CDL-AW model in view of the unique advantages of convolutional dictionary learning, and designed ECDAW-Net to solve the CDL-AW model by proximal gradient optimization. Compared with most CNN-based LDCT denoising methods, which behave like black boxes, the proposed ECDAW-Net is more explainable, and the denoising process can be followed from Algorithm 1 and Fig. 2. In detail, we proposed an adaptive window to alleviate spectral leakage, and extended the spatial width by replacing the traditional convolution kernel with a hybrid dilated module, which further reduces the influence of spectral leakage. Furthermore, the proposed multi-scale edge extraction module, composed of learnable LoG and Sobel convolutions, effectively compensates for lost edge information. In addition, instead of the L1 norm loss and MSE loss used in most CNNs (both are pixel-level), we proposed a patch-level loss based on the local energy of image blocks, which protects the CT texture to a certain extent.
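The link between windowing and spectral leakage can be illustrated in one dimension. The sketch below is not the learned adaptive window of CDL-AW: it assumes a fixed Hann window and an 8-tap box kernel purely for illustration, and measures leakage as the peak sidelobe level of the zero-padded DFT. All function names are hypothetical.

```python
import cmath
import math


def dft_magnitude(signal, n_fft):
    """Magnitude of the zero-padded DFT for bins 0 .. n_fft // 2."""
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def peak_sidelobe_db(signal, n_fft=256):
    """Highest sidelobe relative to the main-lobe peak at DC, in dB."""
    mags = dft_magnitude(signal, n_fft)
    k = 0
    # Walk down the main lobe until the magnitude stops decreasing.
    while k < len(mags) - 1 and mags[k + 1] < mags[k]:
        k += 1
    return 20.0 * math.log10(max(mags[k:]) / mags[0])


def hann(length):
    """Symmetric Hann window (a fixed stand-in for a learned window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


box = [1.0] * 8                                   # abruptly truncated kernel
windowed = [b * w for b, w in zip(box, hann(8))]  # same kernel, tapered
print(peak_sidelobe_db(box), peak_sidelobe_db(windowed))
```

Tapering the kernel towards its borders lowers the sidelobes at the price of a wider main lobe. Letting the network learn the taper, as the adaptive window does, is one way to balance these two effects per atom.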

To support these points, we conducted comparative experiments on the Mayo dataset and the public Piglet dataset. The results show that the proposed ECDAW-Net achieves encouraging noise/artifact removal while retaining detailed information. Compared with the competing methods considered in this paper, ECDAW-Net achieves the most satisfactory denoising and detail preservation, does not introduce additional artifacts during denoising as CTformer does, and restores tissue texture that is closer to the NDCT image. In addition, we carried out real clinical experiments to explore the application potential and generality of the proposed method. The efficiency comparison shows that our method has an acceptable number of parameters and inference time, and is strongly competitive with other methods. Finally, ablation studies verified the contribution of each improvement in the proposed architecture. The main contributions of this work are summarized as follows:

A novel convolutional dictionary learning model with adaptive window (CDL-AW) is designed to relieve spectral leakage caused by data truncation during convolution, and its corresponding convolutional dictionary network (ECDAW-Net) is constructed to solve the CDL-AW model iteratively, with an explicit physical meaning.

Since there is a certain correlation between the noise/artifacts and the inherent information, to restrain edge/texture blurring in iterations, we designed a multi-scale edge extraction module consisting of trainable LoG and Sobel convolution layers.

In addition, a patch-level loss is proposed and combined with MSE (pixel-level) loss to train the ECDAW-Net, to further prevent image over-smoothing.

Comparison experiments and the ablation studies show that the proposed ECDAW-Net achieves superb ability of noise/artifacts suppression and edge/detail preserving, and each improvement has a positive contribution.
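As a rough illustration of the two filters named in the second contribution, the sketch below applies a fixed Sobel kernel and a fixed LoG kernel to a toy step edge. It is not the module used in ECDAW-Net, whose layers are trainable and multi-scale; the kernel size of 5, the value sigma = 1.0, and all function names are assumptions made for this example. The LoG kernel is sampled from (r² − 2σ²)/σ⁴ · exp(−r²/(2σ²)), which equals the analytic Laplacian of Gaussian up to a positive constant factor.

```python
import math

# Horizontal-gradient Sobel kernel (responds to vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]


def log_kernel(size=5, sigma=1.0):
    """Sampled Laplacian-of-Gaussian kernel, shifted so that it sums to zero."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            r2 = x * x + y * y
            row.append((r2 - 2 * sigma ** 2) / sigma ** 4
                       * math.exp(-r2 / (2 * sigma ** 2)))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Toy image: dark left half, bright right half (a vertical step edge).
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
sobel_map = correlate_valid(step, SOBEL_X)      # 6 x 6 response
log_map = correlate_valid(step, log_kernel())   # 4 x 4 response
print(sobel_map[0])
```

The Sobel response is non-zero only where the 3×3 window straddles the step, and the zero-sum LoG kernel gives no response on constant regions. This is why such filters isolate edges and fine structure rather than smooth background.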

Although the proposed ECDAW-Net achieves promising noise suppression and detail preservation, it trains a single universal convolutional dictionary, which limits the flexibility of the image representation. In the future, we will investigate how to adapt ECDAW-Net to an adaptive convolutional dictionary, in which each input image has its own convolutional dictionary. Furthermore, our method requires paired CT images, which are not easy to obtain in practice. We will therefore also try to design the architecture in an unsupervised manner and explore small-sample scenarios.

Footnotes

Declaration of interest

The authors have no relevant financial or non-financial interests to disclose.

Acknowledgments

The authors would like to thank the editors and reviewers for improving the content of this article and thank the Mayo Clinic and the Ethics Committee for providing the data used.

Funding information

This work was supported in part by the National Key Research and Development Program of China under Grant (2022YFE0116700, 2022YFC2408500, and 2022YFC2401600), the Natural Science Foundation of Shanxi Province of China (202103021224204), Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education (K93-9-2022-02).
