Abstract
Magnetic Resonance Imaging (MRI) is a cornerstone of modern medical diagnosis due to its ability to visualize intricate soft tissues without ionizing radiation. However, noise artifacts significantly degrade image quality, hindering accurate diagnosis. Traditional denoising methods struggle to preserve details while effectively reducing noise. While deep learning approaches show promise, they often focus on local information, neglecting long-range dependencies. To address these limitations, this study proposes the deep and shallow feature fusion denoising network (DAS-FFDNet) for MRI denoising. DAS-FFDNet combines shallow and deep feature extraction with a tailored fusion module, effectively capturing both local and global image information. This approach surpasses existing methods in preserving details and reducing noise, as demonstrated on publicly available T1-weighted and T2-weighted brain image datasets. The proposed model offers a valuable tool for enhancing MRI image quality and subsequent analyses.
Introduction
Magnetic Resonance Imaging (MRI) has become an indispensable diagnostic tool in modern medicine due to its capacity to visualize intricate soft tissue details without ionizing radiation. However, MRI images are susceptible to noise and artifacts, which can significantly impede diagnostic accuracy. Noise reduction is crucial for enhancing image clarity and contrast, enabling better visualization of anatomical structures, tumors, lesions, and blood vessels. Ultimately, this leads to more accurate diagnoses, improved treatment planning, and better patient outcomes.
Traditional image denoising techniques, such as filter-based [1,2,3] and transform-domain methods [4,5,6], often struggle to preserve image details while effectively reducing noise. Moreover, these methods require intricate parameter tuning. The advent of deep learning, particularly convolutional neural networks (CNNs), has revolutionized image processing, with models like DnCNN [7], FFDNet [8], SADNet [9], SCNN [10], and MCDnCNN [11] demonstrating promising results in MRI denoising. However, these models primarily focus on local information, limiting their ability to capture long-range dependencies inherent in MRI images.
To address this limitation, recent studies have incorporated global information into the denoising process, as seen in FFA-DMRI [12], CNN-DMRI [13], and ADNet [14]. Additionally, the encoder-decoder architecture, popularized by UNet [15], has gained traction in image denoising. While effective for conventional images, this architecture may not fully exploit the complex anatomical structures present in MRI data.
Preserving anatomical details while eliminating noise in medical images remains a critical challenge. While techniques like filtering [16,17,18], wavelet transforms [19], denoising autoencoders [20], and non-local means [21] have shown some success, they often struggle to balance detail preservation and noise reduction. CNN-based models, despite their impressive performance, face challenges such as overfitting, computational demands, and limited interpretability.
To address these limitations, this study proposes a novel MRI denoising network, the deep and shallow feature fusion denoising network (DAS-FFDNet), which combines extracted deep and shallow features. The model prioritizes the preservation of both local and global image details, aiming for improved denoising performance. Key contributions include: (1) a shallow feature extraction module based on a nested double residual structure for efficient feature transmission; (2) an RMCSAM-UNet, a UNet enhanced with the recursive multi-scale channel-spatial attention module (RMCSAM), to capture long-range spatial information; (3) a feature fusion module tailored to MRI denoising that combines shallow and deep features.
Related work
Zhang et al. [7] pioneered the use of deep convolutional neural networks (CNNs) for image denoising with their DnCNN model. Subsequently, the UNet architecture [15] gained prominence as a denoising baseline due to its effective multi-scale feature extraction. This encoder-decoder structure progressively downsamples and upsamples image features to capture deep receptive fields, which are then fused with high-resolution features via skip connections to mitigate information loss. Building upon this foundation, various approaches have been proposed, such as the residual dense neural network (RDUNet) [22] and the prior-based denoising network [23], which incorporate techniques like improved BM3D [24] and the non-subsampled shearlet transform (NSST-UNET) [25] to enhance denoising performance.
Denoising high-resolution brain images presents unique challenges due to their intricate anatomical structures and complex spatial relationships. Capturing long-range dependencies in these images is crucial for preserving detailed morphological information. Conventional methods often struggle to effectively address this, hindering accurate restoration. Inspired by the Noise2Noise (N2N) [26] framework, RA-UNet [27] combines the UNet with convolutional block attention module (CBAM) [28] to create a multi-scale residual block that improves noise identification and removal while preserving structural details. FONDUE [29] further advances denoising of structural MRIs by employing densely connected UNets to achieve high-quality results across different resolutions.
Recent research has explored the integration of transformers with UNet architectures to enhance image restoration. TransUNet [30] pioneers this approach, while Uformer [31] and Restormer [32] adopt transformer-based encoder-decoder structures to effectively capture both local and global image dependencies.
Methods
Overview of the DAS-FFDNet architecture
As illustrated in Fig. 1a, our proposed denoising network, DAS-FFDNet, comprises three primary components: deep feature extraction, shallow feature extraction, and feature fusion. The deep feature extraction module generates high-resolution features within a UNet framework enhanced by the RMCSAM [33] (Fig. 1b). The shallow feature extraction module employs multiple dual residual blocks (DRBs) [34] (Fig. 1c) to efficiently capture shallow-level image information. Finally, the adaptive fusion block combines shallow and deep features to produce the denoised image.

Proposed DAS-FFDNet architecture. (a) Overview of the DAS-FFDNet architecture, comprising shallow feature extraction, RMCSAM-UNet, and feature fusion modules. (b) Detailed structure of the RMCSAM module, incorporating channel-wise and spatial-wise attention mechanisms. (c) Dual residual block (DRB) architecture employing a dual residual structure and autocorrelation weights (ACW) for efficient feature expansion and fusion.
The shallow feature extraction module utilizes local and global skip connections to facilitate efficient feature transmission and preserve fine details. Given an input image X, the output feature
The DRB [34] comprises a nested residual structure (Fig. 1c) with an outer and inner residual unit. Each unit contains two convolutional layers with a ReLU activation. Autocorrelation weight units extract feature information after the second convolutional layer in both inner and outer units. The outer unit expands and compresses feature channels, while the inner unit focuses on feature refinement. Local and global skip connections enhance feature transmission.
The autocorrelation weight unit employs global average pooling followed by a Sigmoid function to generate weights for feature fusion. This mechanism stabilizes the model and improves feature learning.
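The pooling-then-Sigmoid step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `autocorrelation_weight`, the (C, H, W) layout, and the choice to apply each weight by rescaling its own channel are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def autocorrelation_weight(features):
    """Hypothetical ACW sketch for a feature tensor of shape (C, H, W).

    Returns the reweighted features and the per-channel weights.
    """
    # Global average pooling: one scalar per channel.
    pooled = features.mean(axis=(1, 2))   # shape (C,)
    # Sigmoid turns each pooled value into a weight in (0, 1).
    weights = sigmoid(pooled)             # shape (C,)
    # Assumption: each weight rescales its own channel before fusion.
    return features * weights[:, None, None], weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))
reweighted, w = autocorrelation_weight(feats)
```

Because the weights are bounded in (0, 1), this step can only attenuate a channel, never amplify it, which is one plausible reading of the stabilizing effect mentioned in the text.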
RMCSAM-UNet for deep feature extraction
The RMCSAM [33] captures multi-scale attention information in feature maps, enhancing long-range dependencies. We integrate RMCSAM into a UNet architecture to form the RMCSAM-UNet module (Fig. 1a). The encoder downsamples features and the decoder upsamples them, with RMCSAM modules at each level to capture multi-scale information. Skip connections transfer features between encoder and decoder for effective fusion.
Feature fusion module
To combine shallow and deep features, we employ an adaptive fusion block. The shallow features are upsampled and concatenated with deep features. Subsequent convolutional and deformable convolutional layers [35] process the combined features. Adaptive fusion is achieved using element-wise multiplication and skip connections. The output feature
Datasets
We evaluated our model using three publicly available MRI datasets: the IXI dataset (
The IXI dataset includes three MRI contrasts, T1-weighted (T1w), T2-weighted (T2w), and proton density (PD)-weighted images, acquired at three different sites using 1.5T or 3T scanners. The MICCAI dataset contains only T1w brain images. The KUH dataset comprises T1w images with two levels of signal-to-noise ratio (SNR), acquired at 3T using MPRAGE (magnetization-prepared rapid acquisition with gradient-echo) and wave-CAIPI (Controlled Aliasing In Parallel Imaging) MPRAGE sequences. MPRAGE is the gold-standard sequence for assessing brain anatomy and is generally preferred for voxel-based morphometric (VBM) analyses. Conventional MPRAGE sequences were acquired with a 2-fold GRAPPA accelerated protocol, taking 4:26 minutes to acquire a whole-brain 3D T1w volume with 1 mm isotropic spatial resolution. The 9-fold accelerated wave-CAIPI MPRAGE protocol took only 2 minutes to acquire a whole-brain 3D T1w volume with the same spatial resolution.
From the IXI (T1w and T2w) and MICCAI (T1w) datasets, we randomly selected 80 volumes of 3D MRI brain images per dataset, splitting them into training, validation, and test sets with an 8:1:1 ratio. The central slab of each volume was cropped to 100 slices of 2D images, each 256
The KUH dataset contains T1w image volumes at two different SNR levels. We selected 48 volumes of 3D T1w MRI brain images at each SNR level, cropping the central slab of each volume to 120 slices of 2D images, each 256
Implementation details
Experiments were conducted on an HP Z840 Linux workstation (Ubuntu 20.04) equipped with dual Intel Xeon E5-2687W V4 CPUs, 512 GB DDR4-2400 ECC registered SDRAM, and an NVIDIA Tesla V100 PCIe GPU with 16 GB of memory. The model was implemented in PyTorch 1.11.0 with CUDA 11.3 for GPU acceleration. For training, the Adam optimizer [37] was employed with a learning rate of 1e-4 and momentum decay of 1e-8. The model was trained for 50 epochs with a batch size of 4. The proposed model was benchmarked against several state-of-the-art denoising methods: DnCNN [7], FFDNet [8], CNN-DMRI [13], SADNet [9], RDUNet [22], MSANet [35], and LADCNN [38].
Metrics for performance evaluation
To assess model performance, the peak signal-to-noise ratio (PSNR) [39] and the structural similarity index measure (SSIM) [40] were computed. PSNR assesses image quality by comparing a reconstructed image with its original counterpart. It is expressed in decibels (dB), with higher values indicating better image quality. PSNR is calculated from the mean squared error between the two images, followed by a logarithmic transformation. While simple to compute, PSNR is highly sensitive to noise and often correlates poorly with human perception of image quality. SSIM is a perceptual metric that measures the structural similarity between two images. It combines luminance, contrast, and structure information to provide a more comprehensive assessment of image quality. SSIM values typically range from 0 to 1, with values closer to 1 indicating higher similarity. Unlike PSNR, SSIM aligns better with human perception and is more robust to noise. Consequently, SSIM is generally preferred over PSNR for evaluating image quality.
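As an illustration of the two metrics, the sketch below computes PSNR from the mean squared error and a simplified SSIM in NumPy. The function names and the `data_range` parameter are assumptions for this example. The SSIM here is a single-window (global) variant using the conventional constants k1 = 0.01 and k2 = 0.03; standard SSIM implementations average the same expression over local sliding windows, so values from this sketch will not match those reported in the tables.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range^2 / MSE)."""
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM computed over the whole image (simplified)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * data_range) ** 2   # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2   # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Usage: compare a clean image with a noisy copy (synthetic data).
rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, (64, 64))
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 0.0, 1.0)
score_psnr = psnr(clean, noisy)
score_ssim = global_ssim(clean, noisy)
```

For intensities scaled to [0, 1], a constant error of 0.1 at every pixel gives an MSE of 0.01 and therefore a PSNR of 20 dB, which is a convenient sanity check for any PSNR implementation.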
Results
The main experimental results are summarized in Table 1. To facilitate comparison, the PSNR and SSIM values obtained with the various denoising methods are plotted in Fig. 2. As expected, image quality deteriorates with increasing noise level, regardless of the denoising approach. The PSNR decreased by approximately 5 dB when the noise level tripled. The PSNR of the different denoising algorithms varied by about 3–4 dB. Overall, the proposed DAS-FFDNet demonstrated superior performance across all datasets. Its advantage was particularly evident at the higher noise levels (7% and 9%). RDUNet performed well at the lower noise levels (3% and 5%). Our model was robust to increasing noise and excelled at preserving spatial information. This highlights the method's precision in retaining intricate brain details, effectively restoring both morphology and anatomical structures.
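For readers who want to reproduce a comparable noise sweep, the sketch below shows one common way to simulate percentage noise levels. It rests on assumptions that this section does not confirm: that the noise is Rician (the usual model for magnitude MRI) and that a level of p% means a standard deviation of p% of the maximum image intensity. The function name `add_rician_noise` is hypothetical.

```python
import numpy as np

def add_rician_noise(image, level_percent, rng):
    """Simulate magnitude-MRI noise at a given percentage level.

    Assumption: sigma = level_percent% of the maximum image intensity.
    """
    sigma = level_percent / 100.0 * image.max()
    # Gaussian noise is added independently to the real and imaginary
    # channels; the magnitude of the result is Rician distributed.
    real = image + rng.normal(0.0, sigma, image.shape)
    imag = rng.normal(0.0, sigma, image.shape)
    return np.sqrt(real ** 2 + imag ** 2)

# Usage: generate the four noise levels discussed in the text
# from a synthetic non-negative image.
base = np.random.default_rng(0).uniform(0.0, 1.0, (64, 64))
noisy_versions = {
    level: add_rician_noise(base, level, np.random.default_rng(1))
    for level in (3, 5, 7, 9)
}
```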
Denoising performance comparison of the proposed RMCSAM-UNet with state-of-the-art methods in terms of PSNR (dB) and SSIM on the IXI and MICCAI datasets. The best results in each comparison group are highlighted in bold.

Denoising performance evaluation of the proposed RMCSAM-UNet in comparison to state-of-the-art methods. PSNR and SSIM metrics are plotted as a function of noise level for IXI (T1w, T2w) and MICCAI (T1w) datasets to assess image quality preservation.

Visual comparison of denoising performance between the proposed DAS-FFDNet and state-of-the-art methods on IXI (T1w, T2w) and MICCAI (T1w) datasets with 9% added noise. Representative sagittal, axial, and coronal slices are shown for each dataset, along with zoomed-in regions (marked by red rectangles) and difference images between denoised and ground truth images.
Figure 3 depicts representative denoised images produced by the different methods, accompanied by zoomed-in views. For visual comparison, the differences between the denoised and ground-truth images are also shown. These comparisons confirm that the proposed method preserves fine brain details while removing noise more effectively than the other approaches.
To investigate the impact of the number of DRBs on model performance, we varied the number of DRBs in DAS-FFDNet from 0 to 4 and evaluated the results. The ablation study on the IXI (T2w) dataset across different noise levels is summarized in Table 2. The results indicate that increasing the number of DRBs generally improves PSNR and SSIM. As shown in Fig. 4, adding a single DRB initially led to a slight decline in PSNR and SSIM compared with the baseline. However, as the number of DRBs increased, performance recovered and continued to improve, highlighting the need to balance performance and model complexity. We therefore selected three DRBs for the proposed method.
PSNR (dB)/SSIM performances of ablation studies on the IXI(T2w) dataset.
Summary of the VBM results for the two serial datasets acquired from a single subject using the two protocol variants to assess the effect of noise levels. The overlapping voxels are voxels with partially overlapping GM and WM tissues.

Bar graphs of the PSNR (a) and SSIM (b) metrics for the IXI (T2w) dataset after denoising with DAS-FFDNet models with different numbers of DRBs.
To assess how noise levels in MRI data affect image segmentation, we compared VBM results for a time-series T1w dataset from KUH acquired using 2x GRAPPA and 9x Wave-CAIPI acceleration techniques. Figure 5 displays cross-sectional views of a typical 1 mm isotropic resolution T1w volume acquired using both protocols. As expected, noise levels differed significantly between the time-series datasets. This is further supported by the statistical analysis of the corresponding VBM results shown in Fig. 6, which depicts cross-sectional views of the mean GM concentration (a and b), SD (c and d), and CVAR (e and f) for the VBM serial datasets derived from 20 repeated MPRAGE measurements in a single subject. The VBM data acquired with the 9-fold Wave-CAIPI encoding protocol exhibited increased noise levels, reflected in more voxels with higher SD and CVAR (see Fig. 6c-f), particularly in the central brain region and at the boundaries between GM and WM tissues. Interestingly, some frontal regions showed lower noise levels than in the 2-fold GRAPPA scans. Table 3 summarizes the VBM results for both serial datasets using the two protocols. After denoising with the proposed DAS-FFDNet framework, the PSNR of T1w images acquired with 9x Wave-CAIPI was approximately 5 dB lower than that of images acquired with 2x GRAPPA, and the corresponding SSIM was 11.8% lower. The reduced image quality associated with faster acquisition affected the segmentation results. As shown in Table 3, VBM analysis for the 9-fold Wave-CAIPI MPRAGE protocol yielded 0.47% less GM and 0.13% more WM compared to the 2-fold GRAPPA protocol. Additionally, the VBM results for the 2-fold GRAPPA scans detected 0.33% more voxels with overlapping GM and WM tissues.
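The voxel-wise statistics used in this comparison can be sketched as follows. This is an illustrative NumPy example, not the authors' pipeline: the function name `repeat_statistics`, the (T, X, Y, Z) array layout, the use of the sample standard deviation (`ddof=1`), and the definition of CVAR as SD divided by the mean are all assumptions.

```python
import numpy as np

def repeat_statistics(series, eps=1e-12):
    """Voxel-wise mean, SD, and CVAR over repeated measurements.

    series: array of shape (T, X, Y, Z), e.g. T = 20 GM concentration
    maps from repeated scans of one subject (layout is an assumption).
    """
    mean = series.mean(axis=0)
    sd = series.std(axis=0, ddof=1)   # sample SD across repeats
    # Assumption: CVAR = SD / mean, set to 0 where the mean is ~0
    # (background voxels) to avoid division by zero.
    cvar = np.where(mean > eps, sd / np.maximum(mean, eps), 0.0)
    return mean, sd, cvar

# Usage with synthetic data standing in for 20 repeated GM maps.
rng = np.random.default_rng(0)
gm_series = np.clip(0.5 + rng.normal(0.0, 0.02, (20, 8, 8, 8)), 0.0, 1.0)
gm_mean, gm_sd, gm_cvar = repeat_statistics(gm_series)
```

Under these definitions, a noisier acquisition shows up as larger SD and CVAR at the same voxel, which is how the maps in Fig. 6c-f are read in the text.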

Cross-sectional displays of a typical T1w 3D MRI volume for a healthy adult subject at 1 mm isotropic resolution acquired at 3T using the 2x-GRAPPA (a) and 9x-Wave CAIPI (b) acceleration methods.

Cross-sectional displays of the mean GM concentration (a and b), standard deviation (SD; c and d), and coefficient of variation (CVAR; e and f) of the VBM serial datasets derived from the 20 timeframes of repeated measurements from a single subject. The left (a, c, and e) and right (b, d, and f) panels are the corresponding statistical maps (mean, SD, and CVAR) derived from the 2x-GRAPPA and 9x-Wave CAIPI protocols, respectively. The crossing green lines indicate the location of the cross sections.
The impact of noise on MRI image quality and subsequent analysis
Noise is ubiquitous in MRI and significantly degrades image quality, compromising diagnostic accuracy and subsequent image analysis tasks. It reduces image contrast, obscures fine details, and introduces artifacts that can mislead interpretation. This study underscores the critical role of noise reduction in optimizing MRI image quality and downstream analyses such as segmentation.
While the proposed DAS-FFDNet effectively mitigates noise and preserves image details, its limitations become apparent in scenarios with extremely high noise levels, often introduced by accelerated acquisition techniques. As demonstrated by VBM analysis, noise-induced discrepancies in brain tissue volume measurements highlight the need for robust denoising techniques to ensure accurate neuroimaging analysis.
The DAS-FFDNet: A novel approach to image denoising
The DAS-FFDNet represents a significant advancement in MRI image denoising through its innovative combination of shallow and deep feature extraction. This approach effectively addresses the complex challenge of preserving both fine-grained local details and essential global image structures, which are crucial for accurate clinical diagnosis. By incorporating DRBs within the shallow feature extraction module, the model excels in preserving image details, particularly in challenging high-noise scenarios.
Comparative analysis with state-of-the-art CNN-based denoising methods (DnCNN, FFDNet, CNN-DMRI, SADNet, RDUNet, MSANet and LADCNN) demonstrates the DAS-FFDNet’s superior performance in terms of noise reduction and image detail preservation. The model’s robustness and adaptability to various image content make it a promising tool for a wide range of clinical applications.
Challenges and future directions in denoising MRI data of different contrasts
While this study focused on T1w and T2w brain images, the complexities inherent to different MRI contrasts and anatomical regions necessitate tailored denoising approaches. For instance, PD-weighted images, often used for visualizing knee ligaments, require distinct denoising strategies compared to brain imaging. The specific diagnostic targets and underlying image characteristics of each modality influence the optimal denoising techniques. Future research should explore the development of adaptive denoising methods that can be tailored to different MRI contrasts and anatomical regions. This would involve considering factors such as tissue contrast, noise patterns, and specific diagnostic requirements to optimize denoising performance.
Limitations and future directions in MRI image denoising
While the DAS-FFDNet exhibits promising results, further research is necessary to fully realize its potential. Key areas for exploration include: (1) Modality expansion: Extending the model’s applicability to other MRI modalities (DWI, fMRI) to address the unique challenges posed by these imaging techniques. (2) Generative adversarial networks (GANs) integration: Incorporating GANs to enhance image realism and quality, potentially leading to improved perceptual image quality. (3) Advanced evaluation metrics: Developing more sophisticated evaluation metrics that correlate closely with human perception to accurately assess the model’s impact on downstream tasks. (4) Real-time implementation: Investigating real-time implementations for clinical adoption and integration into clinical workflows. By addressing these challenges, the DAS-FFDNet can be further refined to become an indispensable tool for improving diagnostic accuracy and patient care.
Conclusion
The DAS-FFDNet represents a significant advancement in MRI image denoising, offering a robust and effective solution for preserving image quality while reducing noise. Its superior performance compared to state-of-the-art methods underscores its potential to improve medical image analysis. However, the diverse nature of MRI contrasts and anatomical regions necessitates further research to develop tailored denoising approaches. By addressing these challenges and exploring future research directions, the model’s impact on clinical practice can be maximized.
Acknowledgments
This research was supported by a grant from the Zhejiang Natural Science Foundation of China (No. LY23F010005). Our study utilized three publicly available magnetic resonance imaging (MRI) datasets: the IXI dataset (
].
