Abstract
In the field of super-resolution image reconstruction, the deep plug-and-play super-resolution (DPSR) algorithm, a learning-based method, can be used to find the blur kernel by means of existing blind deblurring methods. However, DPSR is not flexible enough when processing images that contain both high- and low-frequency information. Because a channel attention mechanism can distinguish low-frequency information from other features in low-resolution images, in this paper we first introduce this mechanism and design a residual channel attention network (RCAN); the RCAN then replaces the deep feature extraction part of DPSR to achieve adaptive adjustment of channel features. In experiments on the Set5, Set14, Urban100, and BSD100 datasets, we find that, under different blur kernels and scale factors, the average peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values of the proposed method increase by 0.31 dB and 0.55%, respectively; under different noise levels, the average PSNR and SSIM values increase by 0.26 dB and 0.51%, respectively.
Introduction
Research on the single-image super-resolution (SISR) reconstruction problem, that is, how to reconstruct low-resolution (LR) images into high-resolution (HR) ones accurately and efficiently, has high academic and practical value [1]. In recent years, with the rapid development of deep learning, super-resolution methods based on deep learning have been widely adopted because they can directly learn the mapping between LR and HR images by training an end-to-end network model [2]. At present, most deep-learning-based SISR methods focus on improving the network architecture, learning strategy, model framework, up-sampling method, and other aspects [3, 4]. Although each of these aspects can bring some performance improvement, most of these methods are based on the widely used bicubic degradation model. Therefore, super-resolving LR images with arbitrary blur kernels remains a considerable challenge for these methods [1].
As one of the deep-learning-based SISR methods, the deep plug-and-play super-resolution (DPSR) algorithm provides a new solution to the above challenge [5]. Specifically, the contribution of DPSR is reflected in the following aspects. Firstly, DPSR designs a new degradation model that differs from both the bicubic degradation model and the general degradation model, striking a balance between the computational cost of the degradation model and how accurately it characterizes the degradation process. Secondly, to solve the SISR problem under the new degradation model, a DPSR framework is proposed that is applicable beyond bicubic degradation and can handle LR images with arbitrary blur kernels. Together, these measures show that the plug-and-play prior for SISR is not limited to Gaussian denoising, and that the iterative scheme for solving the energy function induced by the new degradation model is well principled.
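For orientation, the three degradation models contrasted above are commonly written as follows. The notation is an assumption introduced here rather than taken from this paper: $x$ is the HR image, $y$ the LR image, $k$ the blur kernel, $\downarrow_{s}$ a down-sampling operation with scale factor $s$ (bicubic in the last two lines), $\otimes$ convolution, and $n$ additive noise. The third line is the DPSR model as we understand it from [5].

```latex
\begin{align}
  y &= (x \otimes k)\downarrow_{s} + n  && \text{(general degradation model: blur, then down-sample)} \\
  y &= x\downarrow_{s}                  && \text{(bicubic degradation model)} \\
  y &= (x\downarrow_{s}) \otimes k + n  && \text{(DPSR degradation model: down-sample, then blur)}
\end{align}
```

Written this way, the third model reduces to the bicubic model when $k$ is a delta kernel and $n = 0$, which is one way to read the claim that it extends beyond bicubic degradation.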
Although DPSR provides a new way to reconstruct LR images with arbitrary blur kernels, it has a clear limitation. LR images contain valuable low-frequency information that can be forwarded directly to the final HR output, but DPSR treats this large amount of low-frequency information equally across feature channels when processing the input LR images, and cannot learn to discriminate between channels. Hence, it lacks flexibility in processing different types of information, which severely hinders the representation capability of the neural network. To address this shortcoming, this paper proposes an enhanced DPSR algorithm with a channel attention mechanism (EDPSR), which distinguishes different image features so that the convolutional neural network (CNN) treats each channel feature differently, thereby improving the flexibility of processing different information.
Related work
Residual network
In 2015, He et al. proposed a 152-layer residual network (ResNet) [6]. Because ResNet improves both the training speed and the prediction accuracy of deep networks, Ledig et al. proposed SRResNet, which applies ResNet to the SISR problem with remarkable results [7]. Building on SRResNet, Zhang et al. proposed SRResNet+ [5], whose network structure is shown in Fig. 1.

The Network Architecture of DPSR.
As can be seen from Fig. 1, the DPSR network structure consists of four parts. Firstly, a convolutional layer (Conv) extracts the shallow features of the input LR image. Secondly, a long skip connection and a deep feature extractor composed of 16 basic blocks (each a residual block) extract the deep features of the LR image. Thirdly, the features from the deep and shallow feature extractors are fused and passed to an up-sampler for scale adjustment. Finally, a Conv serves as the reconstruction layer that produces the high-resolution image. Compared with SRResNet, SRResNet+ is improved in three respects: (1) it removes the batch normalization layer, which is detrimental to feature extraction and consumes a lot of memory; (2) it increases the number of feature maps from 64 to 96; (3) in addition to the original SRResNet input, it takes a noise level map as an extra input. With these improvements, SRResNet+ can stack more network layers and extract more features in each layer using the same computing resources. However, SRResNet+ cannot distinguish between image features and therefore lacks the flexibility to handle different types of information. In this paper, we address this problem as follows: based on SRResNet+, we introduce a channel attention mechanism that learns the correlation between spatial and channel features, adaptively adjusts the features through the interdependence between channels, makes the network focus more on recovering the potential high-frequency information of LR images, and ultimately improves the feature expression capability of the network.
The attention mechanism mimics signal processing in the human brain: a probability distribution of attention is used to emphasize the effect of critical inputs on the output [8]. Since 2014, it has improved the performance of neural networks in natural language processing, speech recognition, and image recognition, and it has been favored by many researchers in deep learning [9, 10]. In the field of image recognition, Zhang et al. proposed the residual channel attention network (RCAN) in 2018 [11]. As shown in Fig. 2, the main contribution of RCAN has two aspects. On the one hand, the attention mechanism is introduced to treat different channels differently, which improves the representational ability of the CNN. On the other hand, a new residual in residual (RIR) structure is introduced, which allows a large amount of low-frequency information to be transmitted through multiple skip connections so that the CNN can focus on learning high-frequency information. Compared with algorithms without a channel attention mechanism, RCAN gains the ability to learn discriminatively across feature channels. Hence, we introduce the RCAN module to transfer the various types of high-frequency and low-frequency information from LR images to HR images, and then propose EDPSR, which is described in detail in the following section.

The Network Architecture of RCAN.
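The channel attention idea described above can be illustrated with a minimal sketch in plain Python. It assumes the usual squeeze-and-rescale form (global average pooling, a channel-reduction layer with ReLU, a channel-expansion layer with sigmoid); biases are omitted, and the function name and all weight values are hypothetical, chosen only for illustration.

```python
import math


def channel_attention(features, w_down, w_up):
    """Rescale each channel of `features` by a gate in (0, 1).

    features: list of C channels, each an H x W list of lists of floats.
    w_down:   (C // r) x C weight matrix (channel reduction); illustrative values.
    w_up:     C x (C // r) weight matrix (channel expansion); illustrative values.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excite: reduction -> ReLU -> expansion -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w_down]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_up]
    # Rescale: multiply every pixel of channel c by its gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(features, gates)]
    return scaled, gates


# Toy input: two 2 x 2 channels, reduced to a single hidden unit.
feats = [[[1.0, 1.0], [1.0, 1.0]],
         [[4.0, 0.0], [0.0, 4.0]]]
w_down = [[0.5, 0.5]]
w_up = [[1.0], [-1.0]]
scaled, gates = channel_attention(feats, w_down, w_up)
```

With these toy weights the first channel receives a larger gate than the second, which is the sense in which the network can treat channels differently rather than equally.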
In Section 2, we analyzed the limitations of the DPSR network structure and explained the improvement offered by the attention mechanism introduced in RCAN. This section focuses on the design of the super-resolution model based on the channel attention mechanism.
Selection of blur kernels
There are many types of blur kernels, such as linear kernels, polynomial kernels, Gaussian kernels, exponential kernels, and ANOVA kernels [12]. To demonstrate the effectiveness of the improved algorithm, we select three types of blur kernels of the same size (disk blur kernels, Gaussian blur kernels, and motion blur kernels), as used in the DPSR algorithm [5]; examples are shown in Fig. 3.

Examples of blur kernels.
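As an illustration of two of the kernel types named above, the following sketch builds a normalized isotropic Gaussian kernel and a normalized disk kernel in plain Python. The kernel size, standard deviation, and radius used here are hypothetical and are not the settings of the experiments.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size isotropic Gaussian blur kernel that sums to 1."""
    center = (size - 1) / 2.0
    kernel = [[math.exp(-((i - center) ** 2 + (j - center) ** 2)
                        / (2.0 * sigma ** 2))
               for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def disk_kernel(size, radius):
    """Return a size x size disk blur kernel (uniform inside the radius)."""
    center = (size - 1) / 2.0
    kernel = [[1.0 if math.hypot(i - center, j - center) <= radius else 0.0
               for j in range(size)] for i in range(size)]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


# Illustrative settings only (assumptions, not the paper's values).
gauss = gaussian_kernel(size=5, sigma=1.0)
disk = disk_kernel(size=5, radius=2.0)
```

Normalizing each kernel to unit sum keeps the mean brightness of the blurred image unchanged, which is the usual convention for blur kernels.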
In this section, the EDPSR algorithm is proposed by introducing residual groups [11]. As discussed in Sections 1 and 2.2, the DPSR algorithm achieves good results in both deblurring and denoising. However, because it cannot effectively distinguish feature information of differing importance in LR images, much useful feature information may be lost during reconstruction, which lowers the quality of the reconstructed image. To solve this problem, RCAN is introduced to enhance the perception ability of the DPSR algorithm and its transmission ability during reconstruction. Specifically, the improvement has two parts: the residual channel attention block (RCAB) adaptively rescales channel-wise features by considering the interdependencies among channels; and the RIR structure, through the cooperation of multiple short skip connections (SSC) and one long skip connection (LSC), directly transmits the useful low-frequency feature information so that the main network can focus on learning high-frequency information.
In this way, the network can learn the relative importance of different channel features, so introducing RCAN can effectively address the information loss problem. As shown in Fig. 2, the RCAN network comprises four parts: a shallow feature extraction part, a deep feature extraction part, an up-sampling module, and an image reconstruction part. The working process is as follows: firstly, a 3×3 Conv extracts shallow features from the LR input to obtain a feature map; secondly, deep feature extraction is carried out through the RIR structure, which contains ten residual groups (RG), a 3×3 Conv, and an LSC, where each RG is formed by stacking 20 RCABs with one SSC; thirdly, up-sampling is performed by the pixel-shuffle module; finally, image reconstruction is performed by a 3×3 Conv. Based on the above analysis, this paper proposes the enhanced deep plug-and-play super-resolution algorithm with residual channel attention networks, whose network structure is shown in Fig. 4. The improvement of the EDPSR network lies mainly in the deep feature extraction part.
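The pixel-shuffle up-sampling step mentioned above can be sketched in plain Python as a pure rearrangement of values. The channel ordering below follows the common sub-pixel convolution convention and is an assumption here, since the text does not spell it out; the function name is illustrative.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r) x H x W tensor into C x (H*r) x (W*r).

    x is a nested list indexed as x[channel][row][col]; r is the scale factor.
    """
    channels_in = len(x)
    h, w = len(x[0]), len(x[0][0])
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    c_out = channels_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # Each r x r output cell gathers from r*r consecutive channels.
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out


# Four 1 x 1 channels become one 2 x 2 channel for r = 2.
x = [[[float(k)]] for k in range(4)]
y = pixel_shuffle(x, r=2)
```

No values are interpolated or learned in this step; the preceding convolution is what produces the extra channels that are redistributed into spatial positions.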

The Network Architecture of EDPSR.
In Fig. 4, after the first Conv extracts the shallow feature $F_{SF}$ of the input, $F_{SF}$ is fed into the RIR structure to obtain the deep feature $F_{DF}$. Before $F_{SF}$ enters the first basic block (RG) of the RIR, it is identically mapped to the tail of the RIR structure through the LSC. When $F_{SF}$ enters the first RG of the RIR structure (as discussed above, an RG is composed of N RCABs, an SSC, and a Conv), it is passed on through the SSC identity mapping. At the same time, when $F_{SF}$ is passed to the first RCAB in this RG, a Conv-ReLU-Conv operation is performed, and the residual $f_{1,1}$ with dimension H × W × C is obtained. Then $f_{1,1}$ is rescaled by a CA module to obtain the adjusted feature $f^{c}_{1,1}$. In the end, to obtain the output feature
It should be emphasized that, after RCAN is introduced, increasing the number of RGs and the number of RCABs stacked in each RG would itself improve the representational capacity of the original network to some extent, simply because the network becomes deeper. However, the purpose of this paper is to explore the difference that the attention mechanism makes to DPSR. To avoid or reduce the influence of confounding factors (such as network depth or width), we deliberately forgo the advantage of RCAN in deepening the network. Specifically, in the deep feature extraction part of EDPSR, the number of RGs, G, is set to 16, and the number of RCABs in each RG, N, is set to 1, so that the network depth of EDPSR is close to that of DPSR. The rest of the EDPSR network is identical to that of DPSR.
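The way the skip connections nest in this configuration can be sketched as follows. Only the wiring is meant to be informative: G = 16 and N = 1 come from the text, while the three layer functions are hypothetical placeholders (simple scalings of a feature vector) standing in for learned convolutions and the CA module.

```python
def add(a, b):
    """Element-wise sum of two feature vectors (the skip connection)."""
    return [u + v for u, v in zip(a, b)]


def make_rcab(body, attention):
    """RCAB: x + attention(body(x)); body stands in for Conv-ReLU-Conv."""
    def rcab(x):
        return add(x, attention(body(x)))
    return rcab


def make_rg(rcabs, conv):
    """Residual group: stacked RCABs, a Conv, and a short skip connection."""
    def rg(x):
        y = x
        for block in rcabs:
            y = block(y)
        return add(x, conv(y))
    return rg


def make_rir(groups, conv):
    """Residual in residual: stacked RGs, a Conv, and a long skip connection."""
    def rir(f_sf):
        y = f_sf
        for group in groups:
            y = group(y)
        return add(f_sf, conv(y))
    return rir


# Placeholder layers (assumptions): real layers are learned, not fixed scalings.
def body(x):
    return [0.1 * v for v in x]


def attention(x):
    return [0.5 * v for v in x]


def conv(x):
    return [0.1 * v for v in x]


G, N = 16, 1  # numbers of RGs and of RCABs per RG, as stated in the text
groups = [make_rg([make_rcab(body, attention) for _ in range(N)], conv)
          for _ in range(G)]
deep_features = make_rir(groups, conv)([1.0, 2.0])
```

Because every level adds its input back to its output, the shallow feature reaches the tail of the structure unchanged through the long skip connection, and the stacked blocks only need to model the residual.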
Experiments
We use the DIV2K dataset [13] as the training set and Set5 [14], Set14 [15], BSD100 [16], and Urban100 [17] as the test sets. The minibatch size is set to 16, and the patch size of the LR input is set to 48 × 48. We use the Adam optimizer to train our model. The initial learning rate is set to $10^{-4}$ and is halved after every $5 \times 10^{5}$ backpropagation iterations; training ends when the learning rate falls below $10^{-7}$. The experimental environment is shown in Table 1.
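The learning-rate schedule described above can be checked with a few lines of Python. This is a sketch of a step-decay rule under the stated settings; the function names are illustrative and do not come from any training framework.

```python
def learning_rate(iteration, base_lr=1e-4, step=5 * 10**5, gamma=0.5):
    """Step decay: multiply the rate by gamma after every `step` iterations."""
    return base_lr * gamma ** (iteration // step)


def training_length(min_lr=1e-7, base_lr=1e-4, step=5 * 10**5, gamma=0.5):
    """Return (number of halvings, iteration count) when the rate drops below min_lr."""
    halvings = 0
    while base_lr * gamma ** halvings >= min_lr:
        halvings += 1
    return halvings, halvings * step


halvings, total_iterations = training_length()
```

Under these settings the rate first falls below the threshold after ten halvings, since 2^10 = 1024 exceeds the factor of 1000 between the initial rate and the stopping threshold; this corresponds to 5 × 10^6 iterations.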
Experimental environment

Comparison between PSNR, SSIM and LOSS.
Fig. 5 shows the curves of the average LOSS, PSNR, and SSIM values over 5000 training epochs for Set5. In Fig. 5a, EDPSR needs 3.59% fewer training iterations than DPSR to reach a LOSS value of 0.05. Furthermore, as the EDPSR loss drops, its curve fluctuates more than that of DPSR; we attribute this to the attention mechanism, which improves the feature learning ability of the original model. In Fig. 5b and 5c, the EDPSR curves rise faster and higher than those of DPSR. Compared with DPSR, EDPSR needs 27.87% fewer training iterations to reach a PSNR of 25 dB and 31.48% fewer iterations to reach an SSIM of 0.7. The reason is that DPSR does not effectively distinguish between input information of different levels of importance.
In this section, we focus on the performance differences between EDPSR and DPSR in deblurring. Firstly, after convolution with three types of blur kernels and down-sampling, all HR images in the four public benchmarks are processed into LR images with blur kernels. Secondly, EDPSR is used to deblur and reconstruct LR images. Finally, the reconstructed images are compared with their corresponding HR images.
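The first step of this procedure, turning an HR image into a blurred LR image, can be sketched as follows. Several choices are assumptions made only for illustration: the order (blur, then down-sample) follows the sentence above, edge pixels are replicated at the border, down-sampling keeps every s-th pixel as a simple stand-in for the actual down-sampler, and the 3 × 3 box kernel is hypothetical.

```python
def blur(image, kernel):
    """Filter `image` with `kernel`, replicating edge pixels at the border.

    This computes a correlation, which equals a convolution for the
    symmetric kernel used below.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y = min(max(i + a - kh // 2, 0), h - 1)
                    x = min(max(j + b - kw // 2, 0), w - 1)
                    acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def downsample(image, scale):
    """Keep every `scale`-th pixel in both directions."""
    return [row[::scale] for row in image[::scale]]


# Toy 8 x 8 "HR image" and an illustrative 3 x 3 box kernel.
hr = [[float(10 * i + j) for j in range(8)] for i in range(8)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
lr = downsample(blur(hr, box), scale=2)
```

For a scale factor of 2 the 8 × 8 input becomes a 4 × 4 output; a reconstruction method is then asked to recover the original from this degraded image together with the kernel.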
Deblurring test results (PSNR(dB) /SSIM) on public benchmarks

Visual comparison of deblurring performance between DPSR and EDPSR.
In this section, we focus on the performance differences between EDPSR and DPSR in denoising. The procedure is roughly the same as in Section 4.2. Firstly, all HR images in the four public benchmarks are down-sampled and corrupted with noise at different levels to produce noisy LR images. Secondly, EDPSR is used to denoise and reconstruct the LR images. Finally, the reconstructed images are compared with their corresponding HR images. The only difference is that we fix the scale factor at 4 for a clean and straightforward comparison of denoising performance.
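To make the roles of the noise level and of PSNR concrete, the sketch below adds zero-mean Gaussian noise to a toy image and measures PSNR against the clean version. It is illustrative only: the noise levels are hypothetical values on a 0 to 255 scale, pixel values are not clipped, and published PSNR figures are often computed on the luminance channel, which this sketch does not model.

```python
import math
import random


def add_gaussian_noise(image, noise_level, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `noise_level`."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_level) for v in row] for row in image]


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    errors = [(a - b) ** 2
              for ref_row, est_row in zip(reference, estimate)
              for a, b in zip(ref_row, est_row)]
    mse = sum(errors) / len(errors)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 16 x 16 image with values in [0, 255].
clean = [[float(16 * i + j) for j in range(16)] for i in range(16)]
for level in (5.0, 50.0):  # hypothetical noise levels
    noisy = add_gaussian_noise(clean, level)
    print(f"noise level {level:g}: PSNR = {psnr(clean, noisy):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, multiplying the noise standard deviation by ten lowers PSNR by 20 dB, which gives a sense of scale for gains of a few tenths of a dB.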
Denoising test results (PSNR(dB) /SSIM) on public benchmarks

Visual comparison of denoising performance between DPSR and EDPSR.
In this paper, EDPSR is proposed to improve the performance of DPSR. During image reconstruction, low-resolution images contain a wealth of low-frequency information that DPSR treats equally and cannot effectively distinguish, which hinders the representational ability of the CNN. We introduce an attention mechanism into DPSR that adaptively rescales channel features by taking into account the interdependencies between channels; then, through the cooperation of the SSC and LSC, a large amount of low-frequency information is bypassed through multiple skip connections, allowing the main network to focus on learning high-frequency information. The experimental results show that, compared with DPSR, EDPSR achieves higher training and test accuracy and better reconstruction quality in both image deblurring and denoising. These results demonstrate the effectiveness of the attention mechanism in solving the super-resolution reconstruction problem for LR images with arbitrary blur kernels, and can help advance super-resolution image reconstruction technology in urban management, medical imaging, and satellite and aerial imaging.
Acknowledgments
This work is supported by the National Science Foundation of China (51804249, 61603295) and the Shaanxi Postdoctoral Science Foundation (2018BSHEDZZ124).
