Learning depth super-resolution by using multi-scale convolutional neural network

Abstract

Imaging techniques are among the most rapidly growing areas of computer vision, and image resolution has reached a new level. Super-resolution (SR) is a technique that enhances the resolution of images from a low-resolution (LR) input and helps to accurately analyze and derive data. Recently, convolutional neural networks (CNNs) have become mainstream in computer vision. Most existing CNN-based super-resolution models either reconstruct the low-resolution input directly and improve the resolution only at the last layer, or first enlarge the low-resolution input to high resolution (HR) and then reconstruct the HR image to obtain the desired output. These models suffer from two major flaws: large computational cost and loss of information. In this paper, we adopt a gradual process for training the CNN and propose an efficient super-resolution model. The gradual strategy helps the network progressively magnify and reconstruct the LR image at each step, and can thereby avoid losing information (the second problem). In addition, we optimize the number of layers and add residual learning and skip connections to the proposed network to ease the difficulty of training (the first problem). The proposed model not only achieves performance competitive with existing prominent methods but also efficiently reduces the computational cost.

Keywords

Super-resolution; deep network; skip connections; image processing

1 Introduction

Image resolution describes the detail in an image, and higher-resolution targets require more image detail. Initially there was little interest in this technology, but as it has grown over time, the need for resolution enhancement can no longer be overlooked in many crucial applications, such as security surveillance [21], medical imaging [28], remote sensing [1, 14], and object recognition [18, 32]. However, it is very difficult to restore high-resolution images from a low-resolution input, because HR targets usually contain many more pixels than LR images.

In essence, the idea behind super-resolution is to combine the useful, non-redundant information contained in the low-resolution observations to create a high-resolution image. Before taking the study further, we briefly describe the degrading factors in image capture. When a digital image is captured, resolution is naturally lost because of the optics, a limited shutter speed, noise that occurs during transmission or within the sensor, and insufficient sensor density. The affected images are therefore noisy, indistinct and deficient in resolution [11–15]. Super-resolution methods are a natural remedy, and many researchers have developed and designed methods for solving the resolution problem in low-quality images. Deep learning, especially convolutional neural networks, is becoming mainstream in many computer vision applications and is considered to be the best solution for the image resolution problem, because a deep network gradually learns the information in an image by cascading convolutional and nonlinear layers [27]. The first CNN-based image super-resolution method was introduced by Dong et al. [4] and termed SRCNN.

They used only three convolutional layers in their network. A couple of years later, they [3] extended their work by increasing the number and size of the filters while keeping the depth of the CNN fixed. These two studies showed that deeper models are not only hard to train but may even fail to improve performance. In contrast, many other researchers used deep networks to improve image resolution. However, their observations indicate that stand-alone deeper networks for image super-resolution tend to degrade performance and may even be unsuitable for the super-resolution problem.

This can be attributed to two problems. First, with more convolutional layers there is a significant risk of losing image details, and the fewer the details, the lower the quality of the resolution (the vanishing gradient problem). Second, computation and optimization become harder, which leads to a difficult and lengthy training process. In practice, a model with more parameters is harder to train, and adding more layers can even increase the training error [5, 23]. Therefore, combining a deep CNN with residual learning is expected to exploit the correlation between low- and high-resolution images, and adding skip connections should ease the vanishing gradient problem in deeper networks.

Most existing SR networks adopt one of two strategies. The first group upsamples the low-resolution input with a simple interpolation method such as bicubic interpolation at the initial step and then improves the resolution [7, 23]. The second group upsamples only at the last layer, typically using a sub-pixel convolution layer to recover the HR result. Both strategies have major flaws when it comes to achieving high quality. Based on these observations, in this paper we compensate for the undesired behavior of deep CNN architectures in image super-resolution by optimizing the network structure.

Figure 1 shows the structure of the proposed network. It consists of multistep upsampling layers that gradually reconstruct the image and improve its resolution. The gradual network contains several subnetworks, each of which is trained for only a small magnification, so that more image details can be extracted at each step. To improve training and reduce the number of parameters, residual learning and skip connections, as suggested by [7], are added to the network. The proposed structure is simple and provides superior accuracy compared with state-of-the-art methods. The contributions of this paper are threefold: (1) we present an end-to-end convolutional neural network between low-resolution (input) and high-resolution (target) images that gradually upsamples and extracts features while preserving all the image information; (2) to train the network better and avoid losing image details, we add skip connections between the convolution and upsampling layers; and (3) we demonstrate that gradual upsampling helps a deep CNN (when carefully combined with residual learning and skip connections) on the classical computer vision problem of super-resolution, achieving good quality at a lower time cost. We evaluate our model on the Set14 and BSDS100 datasets and compare it with a number of recently published methods using different evaluation metrics, showing that it outperforms existing approaches in most settings.

2 Related work

Because high-resolution images have great potential and are widely used, while low-resolution images are a formidable obstacle to displaying reality, super-resolution has been one of the most active research areas in the literature. The main advantages of the SR approach are its low cost and the fact that existing low-resolution imaging systems can still be used. Indeed, many current image processing and computer vision applications that demand high display quality have switched to super-resolution techniques [24–26, 30]. Convolutional neural networks have been applied successfully in a wide range of computer vision areas, such as classification, recognition, detection, and super-resolution. Since this paper proposes a deep CNN model for image super-resolution, this section places special emphasis on the most recent prominent deep-learning work in image super-resolution.

Fig. 1

The pipeline of the proposed deep network.

Dong et al. [3] proposed a deep convolutional neural network for image super-resolution with 2 to 4 CNN layers, termed SRCNN. They showed experimentally that deeper models were not suitable for image super-resolution, and that a CNN with larger filters performed better than one with more layers. Later, the authors improved and accelerated SRCNN in a model termed FSRCNN [4]. FSRCNN uses a transposed convolution (deconvolution) layer in place of a normal convolution to process the low-resolution input easily and quickly. RAISR and FSRCNN have been reported to be 10 to 100 times faster than other prominent deep-learning-based super-resolution methods. In contrast, Kim et al. [7] proposed VDSR, a very deep CNN with 20 layers and a long skip connection that learns the residual image. Their model outperforms the SRCNN proposed by Dong et al. [4]. The deeply recursive convolutional network (DRCN) is another method proposed by Kim et al. for image super-resolution [8]. It has a recursion depth of 16 and 20 convolutional layers, which would involve a huge number of parameters.

To limit the number of parameters, they share the CNN weights across recursions, which mitigates the training problem of deep CNNs and achieves significant performance. Yaniv Romano et al. [15] proposed a shallow model with a fast learning technique instead of a deep one. Their model, called Rapid and Accurate Image Super Resolution (RAISR), classifies the input image patches according to their gradient angle, strength and coherence, and then maps the low-resolution input to the high-resolution target using the clustered patches.

Another deep-learning-based super-resolution method is the deep Residual Encoder-Decoder Network (RED) proposed in [13]. RED is based on residual learning and uses symmetric convolution layers as the encoder and deconvolution layers as the decoder. In addition, it uses skip connections to connect every two or three layers. This structure makes it possible to train models of up to 30 layers with acceptable performance. In this paper, we build on the advantages of existing methods to improve both image quality and the training process.

3 Deep convolutional neural network for image super-resolution

Existing observations show that CNN-based SR methods outperform other techniques. Super-resolution is in strong demand across many industries, yet pioneering CNN-based work dates back only to 2014 [2–4, 8]. We started with a single CNN layer and increased the number of layers until performance stopped improving; we then added other components such as activation functions, dropout, and batch normalization. After several trials, we selected the structure and hyperparameters best suited to image super-resolution as our final model. Figure 2 illustrates how the proposed model differs from existing methods.

Fig. 2

Overall structural difference between other models and the proposed model.

3.1 Model architecture

Before explaining the proposed model, we briefly describe the challenges of existing CNN-based SR methods. In principle, the deeper the CNN, the more accurate the results should be. However, previous methods show that deeper networks face three main issues: overfitting, loss of information in deep layers, and very large models [6, 19]. Researchers working on deep SR networks have also noted training problems, especially exploding/vanishing gradients. Unlike other methods, we propose an SR method with a deep network that extracts more subtle image details (as discussed above, more details give better resolution); in addition, the proposed model accepts input images of any size and is not patch-based.

To address these issues and compensate for the undesired behavior of deep CNNs, we replace direct upsampling with gradual upsampling and carefully optimize the number of layers (possibly many). Another issue with deep networks is the huge number of parameters. In this paper we aim for far fewer parameters than existing methods by adopting residual learning [19, 24] and skip connections [13, 18], which help control the number of parameters in deep networks. Moreover, most existing deep CNN models for image super-resolution follow one of two strategies: some first enlarge the low-resolution input to high resolution and then reconstruct the HR input [2, 17]; others reconstruct the low-resolution image directly and recover the high-resolution target only at the last layer [4, 28].

The proposed architecture differs technically from current approaches. We gradually enlarge the image in a multistep process to extract subtle details and information and thereby improve image quality and resolution, since the quality and resolution of an image depend strongly on its details: the more image details, the better the image resolution. Rectification layers are added after each convolutional and upsampling layer. The structure of the proposed model is shown in Fig. 1, and its differences from existing models are depicted in Fig. 2. We do not use pooling in the proposed network, since it discards useful image information that may be crucial for super-resolution. Structurally, convolutional layers extract features while preserving the important components of objects in the image. However, important image content is often lost during training in deep networks, and we address this problem with the gradual upsampling strategy. The basic structure of the model is as follows: to exploit all the features of the image, the outputs of all hidden layers are merged into the upsampling network through skip connections.
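The merging of hidden-layer outputs can be illustrated with a small NumPy sketch. It assumes channel-first (C, H, W) feature maps and that "merging" means channel-wise concatenation; the paper does not state whether outputs are concatenated or summed, and `extract_and_merge` and the toy layers are hypothetical names used for illustration only.

```python
import numpy as np


def extract_and_merge(x, layers):
    """Run x through a chain of layers and concatenate every hidden output.

    x      : (C, H, W) array, channel-first.
    layers : list of callables mapping (C_k, H, W) -> (C_{k+1}, H, W);
             zero-padded convolutions keep H and W fixed, so shapes agree.
    Returns an array of shape (sum of output channels, H, W) that an
    upsampling stage could consume.
    """
    hidden = []
    h = x
    for layer in layers:
        h = layer(h)
        hidden.append(h)  # keep each hidden output for the skip connection
    return np.concatenate(hidden, axis=0)  # merge along the channel axis


# Toy "layers" standing in for conv + ReLU blocks (hypothetical).
toy_layers = [lambda t: 2.0 * t, lambda t: t + 1.0]
merged = extract_and_merge(np.ones((1, 4, 4)), toy_layers)
print(merged.shape)  # (2, 4, 4)
```

Concatenation keeps every hidden output intact and lets the next layer learn how to weight them, at the price of more input channels for the upsampling stage.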

After merging, blocks of CNN layers are used to recreate the details of the image. The final CNN layer outputs a number of images equal to the square of the scale factor, and the upsampled image is predicted by adding these outputs to the image created by upscaling. The novelty of the proposed deep CNN model is thus the use of gradual upsampling and a residual strategy with skip connections to learn a mapping from the low-resolution input to a high-resolution image. This network is used twice in the proposed model. In earlier work, an upsampled image was generally used as the input of CNN-based models. As Fig. 1 suggests, it appears inefficient to extract features from an upsampled image rather than from the original-size image. The proposed model therefore has two levels of feature extraction: before and after upsampling.
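One common way to turn r² output maps into an image r times larger is the sub-pixel (pixel-shuffle) rearrangement used by sub-pixel convolution layers. The paper does not specify how its r² outputs are arranged, so the sketch below is an assumption: `pixel_shuffle` and `reconstruct` are hypothetical NumPy helpers working on channel-first arrays.

```python
import numpy as np


def pixel_shuffle(features, r):
    """Rearrange (C*r*r, H, W) maps into a (C, H*r, W*r) image (sub-pixel layout)."""
    c_rr, h, w = features.shape
    c = c_rr // (r * r)
    x = features.reshape(c, r, r, h, w)  # input channel = c*r*r + i*r + j
    x = x.transpose(0, 3, 1, 4, 2)       # -> (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)    # output pixel (h*r + i, w*r + j)


def reconstruct(features, upscaled, r):
    """Add the rearranged r*r output maps to an interpolated (upscaled) image."""
    return upscaled + pixel_shuffle(features, r)


feats = np.arange(4.0).reshape(4, 1, 1)  # r = 2: one 1x1 map per sub-pixel position
print(pixel_shuffle(feats, 2)[0])
# [[0. 1.]
#  [2. 3.]]
```

Each of the r² maps supplies one sub-pixel position of every r×r output block, so the network can work at low resolution until this final rearrangement.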

There are also hidden layers that concatenate each pair of network blocks. The novelties of this model are, first, feature extraction before and after upsampling and, second, hidden layers between each pair of convolutional blocks. The proposed model uses fewer convolutional layers than other approaches, and skip connections and residual learning are used to reduce the computational cost and processing time. Moreover, we show experimentally that very deep networks considerably improve super-resolution performance: in Fig. 5, we vary the network depth from 5 to 20, and performance improves as depth increases.

Fig. 4

Residual networks quickly reach the desired performance, whereas networks without residual learning reach their maximum performance only after many epochs.

3.2 Model formulation

The proposed network conceptually consists of several interdependent components. Input layer: the input layer is a convolutional layer activated by ReLU. For a low-resolution input X of size m_LR × n_LR, the input layer computes: $F_{in}(X) = \mathrm{ReLU}(w_{in} * X + b_{in})$ (1)

where X is the low-resolution input image, '*' denotes convolution, and w_in and b_in are the weights and bias of the input layer. In this paper, the number of filters ranges from 64 to 512, with a 3×3 kernel in the input layer. All convolutions in the proposed model use zero padding, so each convolution leaves the spatial resolution unchanged.
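Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: `conv2d_same` and `input_layer` are hypothetical names, the convolution is computed as cross-correlation (as most deep learning libraries do), and the 64 random 3×3 filters simply mirror the smallest filter count mentioned above.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Zero-padded, stride-1 2-D convolution (cross-correlation).

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    Returns (C_out, H, W): the spatial size is unchanged.
    """
    c_out, c_in, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    out = np.zeros((c_out, h, wd))
    for di in range(k):
        for dj in range(k):
            # mix input channels for this kernel offset, at all pixels at once
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out + b[:, None, None]


def input_layer(x, w_in, b_in):
    """F_in(X) = ReLU(w_in * X + b_in), Eq. (1)."""
    return np.maximum(conv2d_same(x, w_in, b_in), 0.0)


rng = np.random.default_rng(0)
x_lr = rng.random((1, 16, 16))                    # single-channel LR patch
w_in = 0.1 * rng.standard_normal((64, 1, 3, 3))   # 64 filters of size 3x3
b_in = np.zeros(64)
print(input_layer(x_lr, w_in, b_in).shape)        # (64, 16, 16)
```

Because of the zero padding, the 16×16 input yields 64 feature maps that are still 16×16, which is what allows the later upsampling steps to control the size explicitly.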

Upsampling: this operation gradually enlarges the low-resolution image to high resolution according to: $F_{up}^{l}(X^{l}) = U_{\uparrow}(X^{l})$ (2)

where X^l is the input of layer l and $U_{\uparrow}$ denotes the upsampling operation. Bicubic interpolation is used in each upsampling layer. Deconvolutional layers can also be used to magnify image resolution; however, a deconvolution layer can only upscale by a fixed integer factor, which is tightly constrained by its filter size. The upsampling layer is therefore more flexible and can enlarge the input to any desired resolution.
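The flexibility argument can be made concrete with a separable bicubic resize that accepts any output size, including non-integer ratios. This sketch assumes the Keys cubic kernel with a = -0.5, half-pixel centre alignment and replicated borders; the paper only says that bicubic interpolation is used, so these choices and the helper names are our own.

```python
import numpy as np


def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is a common choice (assumed here)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def resize_matrix(n_in, n_out, a=-0.5):
    """(n_out, n_in) matrix that bicubically resamples a length-n_in signal."""
    scale = n_in / n_out
    m = np.zeros((n_out, n_in))
    for o in range(n_out):
        src = (o + 0.5) * scale - 0.5          # half-pixel centre alignment
        base = int(np.floor(src))
        for tap in range(base - 1, base + 3):  # four nearest input samples
            idx = min(max(tap, 0), n_in - 1)   # replicate border pixels
            m[o, idx] += cubic_kernel(src - tap, a)
    return m


def bicubic_resize(img, out_h, out_w):
    """Separable bicubic resize of a 2-D array to any (out_h, out_w)."""
    mh = resize_matrix(img.shape[0], out_h)
    mw = resize_matrix(img.shape[1], out_w)
    return mh @ img @ mw.T


img_lr = np.random.default_rng(0).random((10, 12))
print(bicubic_resize(img_lr, 17, 21).shape)  # non-integer ratio: (17, 21)
```

A 10×12 image can be taken to 17×21 in one call, whereas a transposed convolution with integer stride s can only produce sizes tied to s and its kernel.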

Convolutional layers: as discussed before, the low-resolution input is progressively upsampled over several steps, which we term gradual upsampling. Structurally, each step contains an upsampling layer and several convolutional layers. We use a stack of 3×3 convolutional layers; however, such a stack contains no 1×1 convolution, which has proved effective in super-resolution [11]. Therefore, to obtain a better target resolution we insert a 1×1 convolutional layer at the end of each step, whose output can be formulated as: $F^{l}(X^{l}) = \mathrm{ReLU}(w^{l} * X^{l} + b^{l})$ (3)

where w^l denotes the weights of the last layer of each step, X^l is the input of each layer, and b^l is the bias added after each convolutional layer to enhance the capacity of the network.
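A 1×1 convolution applies the same linear map across channels at every pixel. The NumPy sketch below shows Eq. (3) for the 1×1 case; `conv1x1_relu` is a hypothetical helper, and the channel counts (64 in, 32 out) are illustrative assumptions.

```python
import numpy as np


def conv1x1_relu(x, w, b):
    """1x1 convolution followed by ReLU (Eq. (3) with a 1x1 kernel).

    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,).
    Channels are mixed independently at every pixel; no spatial context is used.
    """
    return np.maximum(np.einsum("oc,chw->ohw", w, x) + b[:, None, None], 0.0)


rng = np.random.default_rng(1)
x = rng.random((64, 8, 8))
w = 0.1 * rng.standard_normal((32, 64))
b = np.zeros(32)
y = conv1x1_relu(x, w, b)
print(y.shape)  # (32, 8, 8)
```

Such a layer has C_out × C_in + C_out parameters (2,080 here), against 9 × C_out × C_in + C_out (18,464) for a 3×3 layer with the same channels, which makes it a cheap way to recombine features at the end of each step.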

Output layer: the output of the network is computed as: $Y = w_{op} * X^{l} + b_{op}$ (4)

The result of this calculation, Y, is the high-resolution image. w_op and b_op have sizes 3×3×64×1 and 1×1, respectively. According to [28], pooling and subsampling are not appropriate when a CNN is used for super-resolution, since they discard image details. Pooling layers are therefore usually avoided in super-resolution methods, which in turn has its own weakness: overfitting may occur. In this paper, we address this downside with gradual upsampling. The low-resolution input X is gradually and progressively reconstructed in R steps. Assume a low-resolution image of size m_LR × n_LR and a target of size m_HR × n_HR; the upsampled size at the jth step is:

$\begin{aligned} &(m_{LR} + j\Delta_{m}) \times (n_{LR} + j\Delta_{n}), \quad j = 1, 2, \dots, R-1, \\ &\Delta_{m} = D\left(\frac{m_{HR} - m_{LR}}{R}\right), \quad \Delta_{n} = D\left(\frac{n_{HR} - n_{LR}}{R}\right) \end{aligned}$ (5)

where D(·) denotes the round-down (floor) function, and the final (Rth) step produces the target size m_HR × n_HR.
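The step sizes of Eq. (5) are easy to compute directly. This plain-Python sketch assumes that the Rth step jumps to the exact HR size, so any remainder left by the round-down lands in the last step; `gradual_sizes` is a hypothetical helper.

```python
def gradual_sizes(m_lr, n_lr, m_hr, n_hr, R):
    """Intermediate sizes of Eq. (5); the R-th step outputs the target size."""
    dm = (m_hr - m_lr) // R  # D(.) = round-down (floor)
    dn = (n_hr - n_lr) // R
    sizes = [(m_lr + j * dm, n_lr + j * dn) for j in range(1, R)]
    sizes.append((m_hr, n_hr))  # final step lands exactly on the HR size
    return sizes


print(gradual_sizes(32, 32, 128, 128, 3))  # [(64, 64), (96, 96), (128, 128)]
print(gradual_sizes(33, 50, 100, 150, 4))  # [(49, 75), (65, 100), (81, 125), (100, 150)]
```

In the second example the height increment is floor(67 / 4) = 16, so the last step covers 19 rows instead of 16; each step remains a small magnification, as the gradual strategy intends.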

3.3 Training the proposed model

CNN models can be trained to solve highly nonlinear regression problems. As noted before, several CNN strategies exist for super-resolution, but gradual upsampling is not commonly used in CNN-based SR models. Let X^LR and Y^HR denote the low-resolution and high-resolution images, respectively, and let {X_i^LR, Y_i^HR}, i = 1, 2, …, k, be the training pairs. The CNN learns a mapping function g(X^LR) that predicts the target Y^HR from X^LR by minimizing: $\ell_{ms} = \frac{1}{2} \sum_{i=1}^{k} \lVert Y_{i}^{HR} - g(X_{i}^{LR}) \rVert^{2}$ (6)
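Eq. (6) is a summed squared error over the k training pairs; a NumPy sketch follows. `ms_loss` is a hypothetical name, and dividing the result by k gives the averaged form used in many implementations.

```python
import numpy as np


def ms_loss(pred, target):
    """Eq. (6): 1/2 * sum_i ||Y_i - g(X_i)||^2 over a batch.

    pred, target: arrays of shape (k, H, W) holding g(X_i^LR) and Y_i^HR.
    """
    return 0.5 * np.sum((target - pred) ** 2)


y_hr = np.ones((2, 3, 3))   # k = 2 toy targets
pred = np.zeros((2, 3, 3))  # a model that predicts all zeros
print(ms_loss(pred, y_hr))  # 0.5 * 18 = 9.0
```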

Table 1

The PSNR and SSIM comparisons

Dataset	Scale	SRCNN [2]	VDSR [7]	DRCN [8]	FSRCNN [4]	RED [13]	Romano et al. [15]	Proposed (GUSR)
		PSNR/SSIM	PSNR/SSIM	PSNR/SSIM	PSNR/SSIM	PSNR/SSIM	PSNR/SSIM	PSNR/SSIM
Set 14	×2	31.70/0.9063	33.70/0.9205	32.86/0.9246	32.06/0.8862	32.11/0.8890	32.69/0.9132	33.82/0.9427
	×3	28.78/0.8209	33.04/0.8315	32.82/0.8756	30.00/0.8154	31.69/0.8842	31.06/0.8805	33.64/0.9017
	×4	28.93/0.7503	29.11/0.7774	28.06/0.7632	27.49/0.8102	30.13/0.7753	27.37/0.7904	29.88/0.7875
BSDS 100	×2	30.98/0.8879	31.62/0.8960	30.91/0.8982	29.64/0.8103	29.37/0.7792	31.89/0.8539	31.98/0.8993
	×3	28.33/0.7863	28.73/0.7967	28.73/0.7921	25.79/0.7805	29.04/0.7831	30.58/0.8347	28.72/0.7966
	×4	26.90/0.7101	27.29/0.7252	27.30/0.7251	24.05/0.7740	26.18/0.7359	23.76/0.8294	27.92/0.7753

(Scale factors ×2, ×3, ×4.) The best performance is shown in italics and the second-best performance in bold.

To improve efficacy and address the training difficulty, we combine the CNN architecture with residual learning, as suggested by [7, 13]. Similar to [20], since the input and output images are very similar, we define the residual image as Y^HR - X^LR (most of whose values are small or zero). The objective function of the proposed model is formulated as: $\min \frac{1}{2} \sum_{i=1}^{k} \lVert Y_{i}^{HR} - X_{i}^{LR} - g(X_{i}^{LR}) \rVert^{2}$ (7)

Because the low-resolution input and the high-resolution output are very similar, the network only needs to learn their residual and can skip the information they share; residual learning can thereby improve the training process.
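Eq. (7) can be sketched as below. The subtraction Y^HR - X^LR is only defined when both images share a grid, so this sketch assumes X^LR has already been interpolated to the HR size (called `x_up`); `residual_objective` and `predict` are hypothetical helpers.

```python
import numpy as np


def residual_objective(g_out, x_up, y_hr):
    """Eq. (7): 1/2 * sum_i ||Y_i - X_i - g(X_i)||^2, where g predicts the residual."""
    return 0.5 * np.sum((y_hr - x_up - g_out) ** 2)


def predict(g_out, x_up):
    """Final estimate: interpolated input plus the learned residual."""
    return x_up + g_out


rng = np.random.default_rng(0)
x_up = rng.random((1, 8, 8))                          # interpolated LR input
y_hr = x_up + 0.01 * rng.standard_normal((1, 8, 8))  # HR target differs only slightly
residual = y_hr - x_up
print(np.abs(residual).max() < 0.1)                   # True: residual values are small
print(residual_objective(residual, x_up, y_hr))       # 0.0 for a perfect g
```

Because the residual is close to zero almost everywhere, g only has to model small corrections rather than reproduce the whole image.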

4 Experiments and discussion

To evaluate the performance of the proposed model for image super-resolution, two publicly available image datasets are used. The results are reported in Table 1 and Figs. 4–7. In this section we discuss two important properties of super-resolution models. The first is network depth: we show that depth is very important for SR models, since a deep network can capture more details, and more details yield better resolution. The second is residual learning: we show that combining residual learning with the CNN improves the training process considerably compared with a standard CNN. The implementation details are given in the following subsections.

4.1 Residual learning and skip connection

Skip connections can reduce the vanishing gradient problem in deep networks and enhance feature extraction. Recent SR research observes that deeper networks for image SR tend to degrade performance, so integrating residual learning and skip connections into the network makes training easier. Figure 3 shows an example of the residual architecture used in this paper. Our skip connections differ from those of [6] because we use them as a bridge that passes image details from the convolutional layers to the upsampling layers, which helps recover the image details. The feature maps are then passed to the next layer after rectification for further processing. Current methods in the literature share a key idea: linking the input data and the final reconstruction layer in SR. However, they adopt only a single skip connection, which may not fully exploit the advantages of skip connections. Building many skip connections between layers is essential to effectively train a very deep network, and in this work we therefore use several skip connections in the residual network.

On the other hand, the degradation of training accuracy indicates that not all systems are equally easy to optimize. In principle, a deeper network should not produce a higher training error, yet in practice deep networks do show higher errors if they are not properly optimized.

Residual learning has been used in many applications [5], but it has not been widely applied in CNN-based super-resolution methods. In this paper we address the degradation problem by adopting residual learning in the network. Figure 3 shows the details of the residual unit, which contains two convolutional layers and one skip connection. We combine the CNN with residual learning and skip connections to improve the network structure and ease the training process. As shown in Fig. 1, the proposed model uses two such residual units. We evaluated two networks: one with residual learning and one without.
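A residual unit with two convolutions and one identity skip connection can be sketched in NumPy as follows. Fig. 3 is not reproduced here, so the ordering (convolution, ReLU, convolution, addition, ReLU), the 3×3 kernels and the constant channel count are assumptions, and all names are hypothetical.

```python
import numpy as np


def conv3x3_same(x, w, b):
    """Zero-padded 3x3 convolution, stride 1. x: (C, H, W), w: (C_out, C, 3, 3)."""
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for di in range(3):
        for dj in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, di, dj], xp[:, di:di + h, dj:dj + wd])
    return out + b[:, None, None]


def relu(t):
    return np.maximum(t, 0.0)


def residual_unit(x, w1, b1, w2, b2):
    """Two 3x3 convolutions plus one identity skip connection.

    Both convolutions keep the channel count so that the sum is well defined.
    """
    h = relu(conv3x3_same(x, w1, b1))
    h = conv3x3_same(h, w2, b2)
    return relu(h + x)  # skip connection, then rectification


c = 8
rng = np.random.default_rng(0)
x = rng.random((c, 6, 6))  # non-negative maps, e.g. the output of a ReLU layer
zeros_w, zeros_b = np.zeros((c, c, 3, 3)), np.zeros(c)
# With all-zero weights the unit is the identity on non-negative inputs,
# which is why stacking such units does not make optimisation harder.
print(np.allclose(residual_unit(x, zeros_w, zeros_b, zeros_w, zeros_b), x))  # True
```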

Fig. 3

The architecture of residual unit.

Figure 4 shows the effect of residual learning in the deep CNN architecture. For Fig. 4 we did not evaluate all scale factors; we used only scale factor ×4, with an initial learning rate of 0.004.

4.2 Depth image super-resolution

In contrast to existing super-resolution methods [2–4, 21], we show that shallow models (with only a few layers) are not suitable for image super-resolution.

As the number of layers increases, the network can capture more subtle image details and information (Fig. 5). As discussed before, image details are very important in image SR: the more details, the better the resolution. However, the information shared between low- and high-resolution images is usually lost in deep layers. This makes the task very challenging, since image information is closely related to resolution.

Fig. 5

Depth vs Performance (Scale Factor ×2 & ×3).

To obtain a high-resolution image from the low-resolution input, we propose a deep convolutional network with a gradual (multistep) upsampling structure. Figure 5 shows the performance as a function of network depth; we train the network with depths from 5 to 20. All layers use the same 3×3 filter size. The receptive field of the first layer is 3×3, and each subsequent layer enlarges it by 2. Therefore, for N layers the receptive field is (2N + 1)×(2N + 1) in height and width, so its size grows linearly with depth. With a large receptive field, the network can use more context to predict image details, and for SR, collecting and using more details yields better resolution.
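The (2N + 1) formula follows from each stride-1 3×3 layer adding k - 1 = 2 pixels of context. The sketch below computes the receptive field layer by layer for a general kernel size and stride (`receptive_field` is a hypothetical helper) and checks it against 2N + 1.

```python
def receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field (side length) of a stack of identical conv layers."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) input steps
        jump *= stride             # distance between adjacent outputs, in input pixels
    return rf


for n in (1, 5, 20):
    print(n, receptive_field(n), 2 * n + 1)  # matches (2N + 1) for 3x3, stride 1
```

At the largest depth tried here (N = 20), each output pixel therefore sees a 41×41 input window.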

4.3 Implementation details

We use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) as evaluation metrics. In this section, we give the details of the proposed model and the hyper-parameters used in this work. Let X^LR be the low-resolution image; the goal is to obtain a high-resolution output Y^HR through multistep magnification. At each step, the low-resolution image $X_{i}^{LR}$ is converted into a higher-resolution image $X_{i + 1}^{LR}$, which becomes the input of the next step. In this way, we can minimize corruption and information loss. All biases and ELUs in the proposed model are initialized to 0. During training, dropout of 0.5 is applied to the output of each ELU layer. To minimize the loss, we use the Adamax [4] optimization algorithm with an initial learning rate of 0.004, decreased by a factor of 5 every 20 epochs. The objective is to find the model f that best predicts y = f(x).
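The optimization settings can be sketched in NumPy: a step schedule that divides the learning rate by 5 every 20 epochs, and one Adamax update as defined by Kingma and Ba. The moment decay rates β1 = 0.9 and β2 = 0.999 are that paper's defaults and are assumed here, the small `eps` guard is our addition, and the toy quadratic with its learning rate of 0.05 is illustrative only; `step_lr` and `adamax_step` are hypothetical names.

```python
import numpy as np


def step_lr(epoch, base_lr=0.004, factor=5.0, every=20):
    """Learning rate divided by `factor` every `every` epochs."""
    return base_lr / factor ** (epoch // every)


def adamax_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax update; `state` holds the moments m, u and the step count t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["u"] = np.maximum(beta2 * state["u"], np.abs(grad))  # infinity-norm moment
    lr_t = lr / (1 - beta1 ** state["t"])                      # bias correction
    return theta - lr_t * state["m"] / (state["u"] + eps)


print([step_lr(e) for e in (0, 19, 20, 40)])

# Toy problem: minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
state = {"t": 0, "m": np.zeros(1), "u": np.zeros(1)}
for _ in range(500):
    theta = adamax_step(theta, 2 * theta, state, lr=0.05)
print(abs(theta[0]))
```

Adamax divides by a decayed running maximum of |g| instead of a running mean of g², so the effective step stays bounded by the learning rate even when gradients are occasionally large.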

In super-resolution, the mean squared error $\frac{1}{2} \lVert y - f(x) \rVert^{2}$ averaged over the training set is generally minimized. The same settings are used for both networks, with and without residual learning. Table 1 reports quantitative comparisons of PSNR and SSIM for ×2, ×3 and ×4 scale factors. Compared with the state-of-the-art models, the proposed model achieves the best PSNR in four of the six dataset and scale-factor settings. An example of the results is shown in Fig. 7(a) and (b).
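PSNR, one of the two metrics in Table 1, can be computed as below. This sketch assumes 8-bit images (peak value 255); details such as evaluating on the luminance channel or cropping borders are not specified in the paper, `psnr` is a hypothetical helper, and SSIM is omitted for brevity.

```python
import numpy as np


def psnr(reference, estimate, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)


ref = np.zeros((4, 4))
est = np.ones((4, 4))            # every pixel off by 1, so MSE = 1
print(round(psnr(ref, est), 2))  # 48.13
```

Since PSNR is logarithmic in the MSE, the 0.1 dB to 0.6 dB gaps in Table 1 correspond to roughly 2% to 13% differences in mean squared error.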

Fig. 7

×3 scale factor upsampling result for different SR models.

Since this work focuses on improving image quality at a reasonable time cost, and we used different platforms and libraries to evaluate the baselines, a direct comparison of execution times would not be entirely fair. However, to show the effectiveness of our model, we computed PSNR against execution time at scale factor 3 and report the results in Fig. 6. The baselines were carefully selected to include both deep and shallow models so as to show the performance of our model effectively. We used six well-known baselines and analyzed their performance at different scale factors: SRCNN [2], VDSR [7], DRCN [8], RED [13], FSRCNN [4] and RAISR [15]. Fig. 7(a) and (b) show the results of the proposed model on the Set14 dataset at scale factor ×3. The zoomed-in areas are marked, and the HR images are also given.

Fig. 6

Computation time vs. PSNR. This figure is based on the Set14 dataset with scale factor ×4.

4.4 Discussion

The residual CNN with skip connections makes the proposed architecture a fully residual deep CNN. It therefore exploits residual properties directly, both inside and across levels, which makes it possible to build a deeper network with higher accuracy. Note that the proposed network is not only much deeper than other existing approaches but also fully residual thanks to the skip connections, which makes it more robust and accurate.

The residual unit with its skip connection is shown in Fig. 3. We use two residual units in our architecture, each containing two convolutional layers and one skip connection. This makes the network more robust and smooth, which is important for the resolution problem. Furthermore, the gradual upsampling strategy lets us progressively improve the quality of the low-resolution image without information loss and, more importantly, helps keep the execution time at an optimal level, as shown in Fig. 6.

As Fig. 6 shows, our proposed model outperforms the baselines in execution time. The proposed strategy significantly reduces evaluation time while keeping the results competitive with the baselines. The second-best execution times are achieved by VDSR and DRCN, both recent techniques in image super-resolution. Compared with other methods, FSRCNN has a short execution time, but its PSNR is weak. Figure 7 shows an example of the quality and clarity of our model's results compared with other methods; we used the maximum scale factor, 4, in Fig. 7. Comparing the magnified edges of the selected areas leads to the following observations. In Fig. 7(a), where the area around the eye is selected, the results of SRCNN and VDSR are blurrier than those of DRCN and the proposed GUSR. In Fig. 7(b), first, the text on the watch is almost lost by most models at scale factor ×4. Second, two deep learning networks, DRCN and GUSR, can build sharp edges for ×4 SR. Finally, the proposed GUSR achieves clearer and sharper results than the other models.

We now summarize the strengths and limitations of the proposed GUSR. Layer-by-layer upsampling and feature extraction can reduce the problems of straightforward one-step upsampling during training. Additionally, the proposed model can be trained quickly and progressively optimizes its weights. Unlike VDSR [7] and DRCN [8], GUSR does not use gradient clipping, yet it still converges quickly thanks to its residual layers. GUSR is designed specifically for the SR scenario and does not perform well on other image reconstruction tasks such as denoising. Last but not least, as Fig. 7(b) shows, GUSR does not perform well on text and cannot properly recover missing text components. In future work, we therefore plan to exploit the advantages of the model proposed by Ledig et al. [10] to overcome this limitation. The advantages of our proposed model are shown in Table 1 and Figs. 6 and 7. Table 1 shows the quantitative performance of the proposed model against several well-known super-resolution methods at different scale factors. The best performances on each dataset are highlighted in red, and the second-best in blue.

5 Conclusion

We proposed an efficient deep convolutional neural network for super-resolution, termed gradual upsampling SR (GUSR). The network simply consists of an input layer, multi-scale upsampling and convolutional layers, and an output layer. In particular, the proposed model allows the execution time to be kept low. Note that, for a fair comparison, our two networks share the same architecture and number of layers: the first is a plain network without residual connections, and the second contains two residual networks. The results of both networks are given in Fig. 3. The proposed model eases the training process and is designed to avoid information loss. In the optimized network structure, feature maps are fed to the upsampling layers through skip connections. With these structures, our model achieves state-of-the-art performance at a lower computational cost. Experimental results on two representative image datasets demonstrate that GUSR improves image resolution and produces results competitive with other prominent super-resolution methods.
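The conclusion describes the architecture only at a high level (input layer, multi-scale upsampling and convolutional layers, residual networks, skip connections into the upsampling layers, output layer). The NumPy forward-pass sketch below illustrates one way such gradual upsampling could be wired: a 4× result is reached through two 2× stages, each extracting features with a residual block, enlarging them, and adding a skip connection from the enlarged stage input. It is not the authors' GUSR implementation; the single-channel layout, 3×3 kernels, nearest-neighbor upsampling, random untrained weights, and all function names are hypothetical simplifications.

```python
import numpy as np


def conv3x3(x, k):
    """Single-channel 3x3 'same' convolution (cross-correlation) with edge padding."""
    h, w = x.shape
    p = np.pad(x, 1, mode="edge")
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out


def upsample2x(x):
    """Nearest-neighbor 2x enlargement (a stand-in for a learned upsampling layer)."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def residual_block(x, k1, k2):
    """x + conv(relu(conv(x))): the identity path eases optimization."""
    return x + conv3x3(np.maximum(conv3x3(x, k1), 0.0), k2)


def gradual_upsample(lr, scale=4, weight_std=0.01, seed=0):
    """Magnify `lr` by an integer power-of-two `scale` through repeated 2x stages."""
    if scale < 1 or scale & (scale - 1):
        raise ValueError("scale must be a positive power of two")
    stages = scale.bit_length() - 1  # e.g. scale 4 -> two 2x stages
    rng = np.random.default_rng(seed)
    x = np.asarray(lr, dtype=float)
    for _ in range(stages):
        # Hypothetical untrained weights for this stage's three 3x3 kernels
        k1, k2, k3 = rng.normal(0.0, weight_std, (3, 3, 3))
        feat = residual_block(x, k1, k2)          # feature extraction at current scale
        skip = upsample2x(x)                      # skip connection: enlarged stage input
        x = skip + conv3x3(upsample2x(feat), k3)  # reconstruct at twice the size
    return x
```

For example, `gradual_upsample(np.zeros((8, 8)), scale=4)` returns a 32×32 array. Splitting 4× into two 2× steps mirrors the gradual strategy described in the text: each stage only has to recover detail for a modest magnification, and in this sketch the skip path means that a stage with all-zero weights degrades gracefully to plain nearest-neighbor enlargement.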


Acknowledgments

This research was supported by NSFC, China (No. 61603171) and the 863 Plan, China (No. 2015AA042308).

References

1. Shamsolmoali, Zareapoor and Yang, Convolutional neural network in network (CNNiN): Hyperspectral image classification and dimensionality reduction, IET Image Process (2018). 10.1049/iet-ipr.2017.1375.

2. Dong, C.C. Loy, He and Tang, Image super-resolution using deep convolutional networks, IEEE Trans Pattern Anal Mach Intell 38(2) (2016), 295–307.

3. Dong, C.C. Loy, He and Tang, Learning a deep convolutional network for image super-resolution, Computer Vision-ECCV 2014, Springer, 2014, pp. 184–199.

4. Dong, C.C. Loy and Tang, Accelerating the super-resolution convolutional neural network, European Conference on Computer Vision (2016).

5. Zeyde, Elad and Protter, On single image scale-up using sparse-representations, Curves and Surfaces, Springer, 2012, pp. 711–730.

6. He, Zhang, Ren and Sun, Deep residual learning for image recognition, Proc IEEE Conf Comp Vis Pattern Recogn (2016).

7. Kim, J.K. Lee and K.M. Lee, Accurate image super-resolution using very deep convolutional networks. arXiv preprint arXiv:1511.04587 (2015).

8. Kim, J.K. Lee and K.M. Lee, Deeply-recursive convolutional network for image super-resolution. arXiv preprint arXiv:1511.04491 (2015).

9. R.K. Srivastava, Greff and Schmidhuber, Training very deep networks, Proc Adv Neural Inf Process Syst (2015).

10. Ledig, Theis, Huszar, et al., Photo-realistic single image super-resolution using a generative adversarial network, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 105–114.

11. Liu, Wang, Nasrabadi and Huang, Learning a mixture of deep networks for single image super-resolution, Asian Conference on Computer Vision (ACCV), 2017, pp. 145–156.

12. Xu, S.E. Smith, Grunwald, A.A. Elrahman and S.P. Wani, Evaluating the effect of remote sensing image spatial resolution on soil exchangeable potassium prediction models in smallholder farm settings, J Environ Manage 200(15) (2017), 423–433.

13. X.J. Mao, Shen and Y.B. Yang, Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections, Neural Information Processing Systems (2016).

14. Zhang, Li and Cui, High resolution remote sensing image change detection based on law of cosines with box-whisker plot, International Workshop on Remote Sensing with Intelligent Processing (RSIP), 2017. 10.1109/RSIP.2017.7958805.

15. Romano, Isidoro and Milanfar, RAISR: Rapid and accurate image super resolution, IEEE Trans Comput Imaging 3(1) (2017), 110–125.

16. Shi, Caballero, Huszar, et al., Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, Proc IEEE Conf Comput Vis Pattern Recogn 1 (2016), 1874–1883.

17. Wang, Wang and Li, End-to-end image super-resolution via deep and shallow convolutional networks. arXiv preprint arXiv:1607.07680 (2016).

18. Yamanaka, Sh. Kuwashima and Kurita, Fast and accurate image super resolution by deep CNN with skip connection and network in network, International Conference on Neural Information Processing (ICONIP), 2017, pp. 217–225.

19. Yang, Feng, Yang, et al., Deep edge guided recurrent residual learning for image super-resolution. arXiv preprint arXiv:1604.08671 (2016).

20. Zhao, Wang, Dong, Jia, Yang, Liu and Gao, GUN: Gradual upsampling network for single image super-resolution. CoRR abs/1703.04244 (2017).

21. Shamsolmoali, Zareapoor, D.K. Jain, V.K. Jain and Yang, Deep convolution network for surveillance records super-resolution. Multimed Tools Appl (2018). 10.1007/s11042-018-5915-7.

22. Liang, Yang, Zhang, He, Wang and Zheng, Single image super-resolution with a parameter economic residual-like convolutional neural network. CoRR abs/1703.08173 (2017).

23. Timofte, V.D. Smet and L.V. Gool, Supplementary material (A+: Adjusted anchored neighborhood regression for fast super-resolution), http://www.vision.ee.ethz.ch/timofter/ACCV2014ID820SUPPLEMENTARY/.

24. Yang, Zhang, Liang and Wang, Single Image Super-Resolution with a Parameter Economic Residual-Like Convolutional Neural Network, Springer International Publishing, 2017, pp. 353–364.

25. J.L.G. Guirao and Gao, Multi-frame super-resolution reconstruction via kernel regression regularized sparse learning, J Intell Fuzzy Syst 33(5) (2017), 3051–3058.

26. Shelhamer, Long and Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans Pattern Anal Mach Intell 39(4) (2017), 640–651.

27. Liu, Wang, N.M. Nasrabadi and T.S. Huang, Learning a mixture of deep networks for single image super-resolution. CoRR abs/1701.00823 (2017).

28. Sano, Mori, Goto, Hirano and Funahashi, Super-resolution method and its application to medical image processing, IEEE 6th Global Conference on Consumer Electronics, 2017. 10.1109/GCCE.2017.8229301.

29. Haris, Shakhnarovich and Ukita, Task-driven super resolution: Object detection in low-resolution images. CVPR (2017). arXiv:1803.11316 [cs.CV].

30. Li, Xie and Li, Hyperspectral image reconstruction by deep convolutional neural network for classification, Pattern Recogn 63 (2017), 371–383.

31. Greaves and Winter, Multi-frame video super-resolution using convolutional neural networks. http://cs231n.stanford.edu/reports/2016/pdfs/212Report.pdf (accessed 16-04-17).

32. Yang, Wu, Liu, P.W. Kim, A.K. Sangaiah and Jeon, Long-distance object recognition with image super resolution: A comparative study, IEEE Access 6 (2018), 13429–13438.