Residual learning of deep convolutional neural networks for image denoising

Abstract

Image denoising is a hot topic in many research fields, such as image processing and computer vision. With the development of deep learning, deep neural networks have been widely used for image denoising and have achieved good results. Inspired by the characteristics of the feed-forward denoising convolutional neural network (DnCNN) and by the responses of biological neurons, we propose a Symmetry-Rectifier Linear Unit (SyReLU) and the corresponding SyReLU activation function, which is more consistent with the characteristics of biological neurons than other activation functions such as the Rectified Linear Unit (ReLU) and the Leaky Rectified Linear Unit (LReLU). To denoise images, we use the SyReLU activation function for residual learning in a CNN (here, DnCNN). In particular, the experimental results indicate that DnCNN with SyReLU achieves better denoising performance than DnCNN with other activation functions (ReLU and LReLU) on the Set12 and BSD68 datasets. In brief, the proposed method contributes to the development of activation functions and is useful in deep neural networks for image denoising.

Keywords

Image denoising; Symmetry-Rectifier Linear Unit; convolutional neural networks; SyReLU activation function; residual learning

1 Introduction

Image denoising is not only a classical and active topic in low-level vision but also an indispensable step in many practical applications. The goal of image denoising is to recover a clean image x from a noisy observation y that follows the image degradation model y = x + v. Generally, v is assumed to be additive white Gaussian noise (AWGN) with standard deviation σ. From a Bayesian viewpoint, when the likelihood is known, image prior modeling plays a key role in image denoising. In the past few decades, many models have been used to construct image priors, including nonlocal self-similarity (NSS) models [1–4], sparse models [4–6], gradient models [7–9] and Markov random field (MRF) models [10–12]. In particular, NSS models are popular in state-of-the-art methods such as BM3D [2], LSSC [4], NCSR [6] and WNNM [13]. Although most image prior-based methods achieve high denoising quality, they usually have two major drawbacks. First, these methods usually involve a complex optimization problem in the test phase, which makes the denoising process time-consuming [6, 13]. Therefore, most prior-based methods can hardly achieve high performance without sacrificing computational efficiency. Second, the models are usually nonconvex and involve a number of manually chosen parameters, which leaves some leeway for improving the denoising performance.
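As a concrete illustration of the degradation model y = x + v, the sketch below adds AWGN with standard deviation σ to a flat test signal using only the Python standard library. The function name `add_awgn`, the flat list-of-pixels representation and the absence of clipping to [0, 255] are assumptions made for illustration; they are not part of the original method.

```python
import random
import statistics


def add_awgn(clean, sigma, seed=0):
    """Return y = x + v, where v is i.i.d. Gaussian noise with mean 0 and standard deviation sigma.

    `clean` is a flat list of pixel values; no clipping to [0, 255] is applied (an assumption).
    """
    rng = random.Random(seed)  # seeded generator for reproducible noise
    return [x + rng.gauss(0.0, sigma) for x in clean]


# Hypothetical flat "image": 10,000 pixels at grey level 128.
x = [128.0] * 10_000
y = add_awgn(x, sigma=25.0)

# The noise realisation is recovered exactly as v = y - x.
v = [yi - xi for yi, xi in zip(y, x)]
print(f"sample mean {statistics.mean(v):.2f}, sample std {statistics.pstdev(v):.2f}")
```

With 10,000 samples the printed standard deviation should be close to σ = 25 and the mean close to 0. The residual v is exactly the quantity that residual learning in DnCNN is trained to predict.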

In order to overcome the limitations of prior-based approaches, several discriminative learning methods have been developed to learn image prior models in the context of a truncated inference procedure [14–19]. The resulting models can dispense with the iterative optimization procedure in the test phase. Schmidt and Roth [14] proposed a cascade of shrinkage fields (CSF) method that unifies the random field-based model and the unrolled half-quadratic optimization algorithm into a single learning framework. Chen et al. [15, 16] proposed a trainable nonlinear reaction diffusion (TNRD) model, which learns a modified fields-of-experts [12] image prior by unfolding a fixed number of gradient descent inference steps. Other related works can be found in [17, 18]. Although CSF and TNRD have shown promising results toward bridging the gap between computational efficiency and denoising quality, their performance is inherently restricted to the specified forms of prior. Specifically, the priors adopted in CSF and TNRD are based on the analysis model, which is limited in capturing the full characteristics of image structures. In addition, the parameters are learned by stage-wise greedy training plus joint fine-tuning among all stages, and many handcrafted parameters are involved. Another non-negligible drawback is that they train a specific model for a certain noise level and are therefore limited in blind image denoising.

Recently, there have been several attempts to handle the denoising problem with deep neural networks. Jain and Seung [19] proposed using convolutional neural networks (CNNs) for image denoising and claimed that CNNs have similar or even better representation power than the MRF model. The multi-layer perceptron (MLP) [20] was successfully applied to image denoising. The stacked sparse denoising auto-encoder method [21] was adopted to handle Gaussian noise removal and achieved results comparable to K-SVD [5]. Zhang et al. [22] proposed a denoising convolutional neural network (DnCNN) instead of learning a discriminative model with an explicit image prior. DnCNN treats image denoising as a typical discriminative learning problem, i.e., separating the noise from a noisy image with a feed-forward convolutional neural network (CNN). DnCNN is an end-to-end trainable deep CNN for Gaussian denoising. In contrast to existing deep neural network-based methods, which directly estimate the latent clean image, DnCNN adopts a residual learning strategy: it implicitly removes the latent clean image in the hidden layers and outputs the noise. Residual learning and batch normalization not only speed up training but also boost the denoising performance of DnCNN. For Gaussian denoising with a certain noise level, DnCNN outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality. Moreover, a single DnCNN model can handle three general image denoising tasks, i.e., blind Gaussian denoising, single image super-resolution (SISR) and JPEG deblocking.

From the view of neural networks, the activation function is crucial to the performance of deep neural networks. The Rectified Linear Unit (ReLU) [23] is one of the keys to the breakthrough of deep neural networks and is used in DnCNN. To further improve the performance of deep neural networks, a number of new activation functions based on ReLU have been proposed, such as Leaky ReLU (LReLU) [24], the Parametric Rectified Linear Unit (PReLU) [25], the Exponential Linear Unit (ELU) [26], the S-shaped Rectified Linear Unit (SReLU) [27] and the Multiple Parametric Exponential Linear Unit (MPELU) [28]. Furthermore, neurobiology studies have shown that all nerve cells have a resting potential [29]. The electrical signals generated by nerve cells fall into two broad categories: local graded potentials and action potentials (also known as neural activity). It is the action potential that plays an important role in transmitting signals through the retina. Neurobiology studies further show that once the stimulation intensity exceeds a threshold, action potentials are generated and their magnitude grows as the stimulation intensity increases, whereas beyond a second, larger threshold the magnitude no longer increases. Clearly, ReLU, LReLU, PReLU, ELU, SReLU and MPELU are not compatible with this biological characteristic of nerve cells. In addition, there should be a third, even larger threshold of stimulation intensity, beyond which the magnitude of action potentials is zero. For example, the human auditory system works only within a range of frequencies: when the vibration frequency of an object exceeds about 20,000 Hz, we cannot hear it. ReLU, LReLU, PReLU, ELU, SReLU and MPELU do not meet this condition either. Inspired by these characteristics, we explore an upper bound on the response of activation functions. Considering further that the gradient may become zero, we present the Symmetry-Rectifier Linear Unit (SyReLU). We then combine DnCNN with LReLU and with SyReLU, respectively, for image denoising.

This paper is organized as follows. Section 2 describes the response characteristics of biological neurons, defines the Symmetry-Rectifier Linear Unit (SyReLU), and constructs DnCNN with SyReLU for image denoising. Section 3 presents comparative experimental results of DnCNN with ReLU, LReLU and SyReLU, together with the corresponding analysis. Section 4 draws conclusions.

2 Residual learning of deep CNN for image denoising

2.1 Response characteristics of biological neurons

Neurobiology studies have shown that all nerve cells have a resting potential, i.e., the intracellular fluid is negative relative to the extracellular fluid (by less than 100 mV) [29]. All electrical signals produced by nerve cells are superimposed on the resting potential. Some signals depolarize the cell membrane, reducing the magnitude of the resting potential, while others hyperpolarize it, increasing that magnitude. The electrical signals generated by nerve cells fall into two broad categories. The first category is local graded potentials. These potentials are generated by external physical stimuli, such as light falling on the photoreceptors of the eye, sound waves that deform hair cells in the ear, mechanical stimulation of sensory nerve endings in the skin, and activity at synaptic sites (junctions between nerve cells and their target cells). The action potential (also known as neural activity) is the second major category of electrical signals. An action potential is generated when a local graded potential depolarizes the cell membrane beyond a critical level (called the threshold). Once generated, it propagates rapidly over long distances. Unlike local graded potentials, the action potentials that occur in neurons are fixed in amplitude and duration, much like the discrete symbols of a code.

It is the action potential that plays an important role in the process described above. An important feature of the action potential is that it is a triggered, regenerative, all-or-none event. In the retina, signals from bipolar cells and amacrine cells act on ganglion cells; if their effect reaches the threshold, an action potential is generated. Once generated, its amplitude and duration are independent of the amplitude and duration of the stimulus: larger stimulating currents do not generate larger action potentials, and longer stimuli do not prolong them. The occurrence of an action potential (AP) therefore depends on the magnitude of the local graded potential (LGP) as follows:

$AP = f(LGP) = \begin{cases} 0, & LGP < LGP^{*} \\ 1, & LGP \geq LGP^{*} \end{cases}$ (1)

where $LGP^{*}$ denotes the threshold of membrane depolarization caused by the local graded potential, 1 denotes that an action potential occurs (the "all" event), and 0 denotes that it does not (the "none" event). Another action potential can be triggered at the same position only after the entire sequence of the previous one has been completed. After each action potential there is a quiet period (the refractory period, denoted P), usually lasting a few milliseconds, during which a second impulse cannot be triggered. If the last action potential was triggered at time $T_{0}$ (ms), the occurrence of an action potential at time T (ms) is

$AP = g(T) = \begin{cases} 1, & T = T_{0} \\ 0, & T_{0} < T \leq T_{0} + P \end{cases}$ (2)

where 1 and 0 have the same meaning as in Equation (1).
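Equations (1) and (2) can be combined in a toy simulation on a discrete time grid. This is a hypothetical illustration rather than a biophysical model: one time step stands for 1 ms, and `spike_train`, `lgp`, `threshold` and `refractory` are names introduced here only for the sketch.

```python
def spike_train(lgp, threshold, refractory):
    """Toy all-or-none neuron combining Eq. (1) and Eq. (2).

    lgp        -- local graded potential at each time step (one step = 1 ms, an assumption)
    threshold  -- LGP*, the firing threshold of Eq. (1)
    refractory -- P, the number of steps after a spike during which no spike can fire (Eq. (2))
    Returns a list of 0/1 values: 1 = an action potential fires ("all"), 0 = none.
    """
    out = []
    last_spike = None  # T_0, the time of the most recent action potential
    for t, g in enumerate(lgp):
        # Eq. (2): AP = 0 for T_0 < T <= T_0 + P, whatever the stimulus.
        in_refractory = last_spike is not None and t - last_spike <= refractory
        # Eq. (1): outside the refractory period, fire iff LGP >= LGP*.
        fired = 1 if (g >= threshold and not in_refractory) else 0
        if fired:
            last_spike = t
        out.append(fired)
    return out


# A strong stimulus (3.0) inside the refractory period still produces no spike,
# and no stimulus ever produces a "larger" spike than 1.
print(spike_train([0.2, 1.5, 1.5, 1.5, 1.5, 0.5, 3.0], threshold=1.0, refractory=2))
# [0, 1, 0, 0, 1, 0, 0]
```

The output is binary by construction, which is the all-or-none property; stimulus strength can only change *when* spikes occur, not their size.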

Since the magnitude of the action potential is fixed, information about stimulus intensity is transmitted by encoding it in the discharge frequency. A more effective visual stimulus produces a larger local potential, which results in a higher discharge frequency of the ganglion cell [30]. This phenomenon was first described by Adrian [31], who found that in a sensory nerve of the skin the discharge frequency of the action potential is a measure of stimulus intensity. Adrian also observed that the stronger the stimulation of the skin, the more sensory fibers were activated. Thus, the action potential of a single nerve fiber does not grow as the stimulation intensity increases beyond the threshold. The sciatic nerve, by contrast, is composed of many nerve fibers. When the sciatic nerve trunk is stimulated above threshold, the number of fibers that generate action potentials increases with the stimulation intensity, so the magnitude of the compound action potential becomes larger and larger. Once all fibers generate action potentials, the magnitude no longer increases. Therefore, the amplitude of the action potential of the sciatic nerve trunk increases with stimulation intensity only within a certain range. ReLU, LReLU, PReLU, ELU, SReLU and MPELU match the property that the magnitude of the action potential increases with stimulus intensity, but not the property that it stops increasing once the stimulus is strong enough, so they are not fully consistent with the biological behavior. Furthermore, the frequency bandwidth of neurons is not infinite; in other words, the response of an organism to external stimuli does not increase without bound. For example, the range of human hearing is about 20–20,000 Hz, and humans cannot see anything when the light is extremely bright. Accordingly, an organism's response to external stimuli should have an upper bound threshold: when the stimulation intensity exceeds it, the organism no longer responds. ReLU, LReLU, PReLU, ELU, SReLU and MPELU, however, can grow without bound as the stimulus intensity increases. Therefore, they are not fully consistent with biological fact.

2.2 Symmetry-rectifier linear units

Neurobiology studies have shown that an action potential is excited when the external physical stimulus reaches the firing threshold, and the generation of a single action potential is an all-or-none event. Since a piece of biological tissue contains many nerve cells, its response to an external physical stimulus begins when the stimulus intensity reaches the threshold. Then, as the stimulus intensity continues to increase, the response becomes more and more intense. Finally, as the stimulus intensity increases further, the response no longer changes. Many biological facts support this account, such as the relationship between the response of the frog gastrocnemius muscle and the stimulus intensity, and the relationship between stimulus intensity, stimulus frequency and muscle contraction. A natural question is what happens when the response is no longer enhanced but the stimulus intensity continues to increase. There are few such biological experiments, but there are some clues to the answer: for example, the hearing of animals has an upper bound threshold, and a muscle goes numb for a short time after being hit by a high-intensity external force. Most previous studies on activation functions never considered an upper bound threshold of the stimulus intensity; an example is the ReLU activation function, defined in Equation (3).

$ReLU(x) = \begin{cases} 0, & x < 0 \\ x, & x \geq 0 \end{cases}$ (3)

Biologically, no response occurs beyond the upper bound threshold. Based on these characteristics and the ReLU activation function, we put forward SyReLU with upper bound thresholds. To avoid vanishing gradients, SyReLU does not keep its response constant once the stimulus is strong enough, since a flat segment would have zero gradient; instead, the response decreases linearly. The SyReLU function is defined in Equation (4).

$SyReLU(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \leq x \leq α \\ 2α - x, & α < x \leq 2α \\ 0, & x > 2α \end{cases}$ (4)

where α is a constant that can be chosen according to requirements. The SyReLU function is shown in Figure 1. Neurons that possess the property of SyReLU are called SyReLU neurons.

Fig.1

SyReLU, i.e. a symmetrical form of ReLU.
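A minimal scalar sketch of Equations (3) and (4) in Python is given below, using α = 6.0 as in Section 3. The helper `syrelu_via_relu` is our own reformulation, not something stated in the paper: checking the four intervals of Equation (4) shows that SyReLU(x) = ReLU(x) − 2·ReLU(x − α) + ReLU(x − 2α), which lets SyReLU be assembled from an existing ReLU primitive in any framework.

```python
def relu(x):
    """ReLU, Eq. (3)."""
    return x if x >= 0 else 0.0


def syrelu(x, alpha=6.0):
    """SyReLU, Eq. (4): identity on [0, alpha], mirrored about x = alpha on (alpha, 2*alpha], zero elsewhere."""
    if x < 0:
        return 0.0
    if x <= alpha:
        return x
    if x <= 2 * alpha:
        return 2 * alpha - x
    return 0.0


def syrelu_via_relu(x, alpha=6.0):
    """Equivalent form built only from ReLU: ReLU(x) - 2 ReLU(x - alpha) + ReLU(x - 2 alpha)."""
    return relu(x) - 2 * relu(x - alpha) + relu(x - 2 * alpha)


for x in [-2.0, 0.0, 3.0, 6.0, 9.0, 12.0, 15.0]:
    print(x, syrelu(x), syrelu_via_relu(x))
```

The peak value α is reached at x = α, and the output returns to 0 at x = 2α, so α controls both the maximum response and the width of the active region.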

Mathematically, on [0, 2α] SyReLU is obtained by reflecting the segment of ReLU on [0, α] about the vertical line x = α, so its graph is symmetric about x = α. The paper [23] pointed out that ReLU is sparser than the sigmoid or tanh activation function: when x < 0 the value of ReLU is 0, so its nonzero output is concentrated in the region x > 0. Unlike ReLU, sigmoid and tanh do not have this property. The paper [23] also discussed the rationale for and benefits of the sparsity of ReLU. In neuroscience, beyond the activation (firing-rate) function itself, researchers have discovered that neuron activation is sparse. In 2001, Attwell et al. [32], based on observations of brain energy consumption, speculated that neuronal coding is sparse and distributed. In 2003, Lennie et al. [33] estimated that only 1–4 percent of the neurons in the brain are active at the same time, which further indicates the sparsity of neuronal activity. That is to say, neurons respond selectively to only a small portion of the input signal at any one time, and a large portion of the signals is deliberately shielded. Such selectivity is thought to improve learning accuracy and to allow sparse features to be extracted better and faster. By contrast, with the traditional sigmoid function almost half of the neurons are activated at the same time, which is inconsistent with neuroscience findings and also makes deep networks hard to train. ReLU satisfies the sparsity requirements of both the network and neuroscience. SyReLU goes further: it is also 0 when x > 2α, so it is zero wherever ReLU is zero and additionally for all inputs above 2α, making it sparser than ReLU whenever some pre-activations exceed 2α.
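The sparsity comparison can be checked numerically. The sketch below draws hypothetical Gaussian pre-activations (mean 0, standard deviation 8, an arbitrary assumption chosen so that some inputs exceed 2α = 12) and counts the fraction of zero outputs for ReLU and SyReLU; `zero_fraction` is an illustrative helper. The actual gain in sparsity depends entirely on the distribution of pre-activations in a trained network, which this toy does not model.

```python
import random


def relu(x):
    return x if x >= 0 else 0.0


def syrelu(x, alpha=6.0):
    if x < 0:
        return 0.0
    if x <= alpha:
        return x
    if x <= 2 * alpha:
        return 2 * alpha - x
    return 0.0


def zero_fraction(act, xs):
    """Fraction of inputs for which the activation outputs exactly zero."""
    return sum(1 for x in xs if act(x) == 0) / len(xs)


rng = random.Random(0)
# Hypothetical pre-activations; N(0, 8^2) is an arbitrary choice for illustration.
xs = [rng.gauss(0.0, 8.0) for _ in range(100_000)]

print(f"ReLU zeros:   {zero_fraction(relu, xs):.3f}")
print(f"SyReLU zeros: {zero_fraction(syrelu, xs):.3f}")
```

Under this assumed distribution, ReLU outputs zero for about half of the inputs and SyReLU for roughly 57% of them, the extra share being the inputs at or above 2α.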

According to the definition of the SyReLU function in Equation (4), its derivative is given by Equation (5).

${SyReLU}'(x) = \begin{cases} 0, & x < 0 \\ 1, & 0 \leq x \leq α \\ -1, & α < x \leq 2α \\ 0, & x > 2α \end{cases}$ (5)

As Equation (5) shows, the derivative of the SyReLU function is easy to compute. (Strictly, SyReLU is not differentiable at x = 0, α and 2α; Equation (5) assigns one-sided values at these points, as is customary for ReLU.) The saturation of an activation function is important for training a network, because it determines whether the gradient remains suitable during training or vanishes. The vanishing gradient is a very important issue in artificial neural network research, and the saturation of activation functions has therefore received attention. Bengio et al. [34] defined the saturation of an activation function as follows. An activation function f(x) that is differentiable everywhere in its domain and whose derivative approaches 0 on both sides (i.e. $\lim_{|x| \to \infty} f'(x) = 0$) is called a soft saturated activation function. Depending on the direction of the limit, saturation is divided into left saturation, $\lim_{x \to -\infty} f'(x) = 0$, and right saturation, $\lim_{x \to +\infty} f'(x) = 0$. In contrast to soft saturation, hard saturation means that f′(x) = 0 when |x| > c, where c is a constant. According to these definitions, both sigmoid and tanh are soft saturated activation functions, while ReLU is left hard saturated and right unsaturated; its derivative is given in Equation (6).

${ReLU}'(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$ (6)

Because its derivative stays 1 for all positive inputs, ReLU alleviates the vanishing gradient problem far more effectively than sigmoid and tanh, which explains much of its breakthrough effect. According to the definitions above, SyReLU is left hard saturated, right hard saturated and unsaturated in the middle. Within [0, 2α] the magnitude of its derivative is 1, so, like ReLU, it does not shrink the gradients that pass through active units; in this respect SyReLU inherits ReLU's advantage against vanishing gradients.
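Equation (5) can be sanity-checked against a central finite difference at points away from the kinks x = 0, α and 2α, where SyReLU is not differentiable. The sketch below assumes α = 6.0; `syrelu_grad` and `central_difference` are illustrative names, and the step size h = 1e-4 is an arbitrary choice.

```python
def syrelu(x, alpha=6.0):
    """SyReLU, Eq. (4)."""
    if x < 0:
        return 0.0
    if x <= alpha:
        return x
    if x <= 2 * alpha:
        return 2 * alpha - x
    return 0.0


def syrelu_grad(x, alpha=6.0):
    """Derivative of SyReLU, Eq. (5); at the kinks 0, alpha, 2*alpha it uses the values Eq. (5) assigns."""
    if x < 0:
        return 0.0
    if x <= alpha:
        return 1.0
    if x <= 2 * alpha:
        return -1.0
    return 0.0


def central_difference(f, x, h=1e-4):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Points chosen away from the kinks 0, 6 and 12.
for x in [-3.0, 2.0, 5.0, 7.0, 11.0, 20.0]:
    print(x, syrelu_grad(x), round(central_difference(syrelu, x), 6))
```

Outside [0, 2α] the derivative is exactly 0 (hard saturation on both sides), and inside it has magnitude 1, so backpropagation through an active SyReLU unit neither shrinks nor amplifies the gradient; on (α, 2α] it only flips its sign.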

2.3 Residual learning of deep CNN with SyReLU for image denoising

Here, we take DnCNN as an example of a deep CNN. To improve the performance of DnCNN for image denoising, we propose residual learning of DnCNN with SyReLU, i.e., replacing ReLU with SyReLU; the activation function is the only difference between the two networks. The architecture of DnCNN with SyReLU is shown in Figure 2. Given DnCNN with SyReLU of depth D, there are three types of layers, shown in Figure 2 in three different colors. (i) Conv+SyReLU: for the first layer, 64 filters of size 3 × 3 × c are used to generate 64 feature maps, followed by a SyReLU for nonlinearity. Here c represents the number of image channels, i.e., c = 1 for a grayscale image. (ii) Conv+BN+SyReLU: for layers 2 ∼ (D − 1), 64 filters of size 3 × 3 × 64 are used, and batch normalization (BN) [35] is added between the convolution and SyReLU. (iii) Conv: for the last layer, c filters of size 3 × 3 × 64 are used to reconstruct the output.

Fig.2

The Architecture of Residual Learning of DnCNN with SyReLU for Image Denoising.
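The three layer types can be enumerated programmatically. The sketch below lists the layers of DnCNN with SyReLU for depth D = 17 and c = 1 (the grayscale setting of Section 3) and counts the 3 × 3 convolution weights and the receptive field. `dncnn_spec`, `conv_weight_count` and `receptive_field` are hypothetical helpers; the weight count deliberately excludes biases and BN parameters, since whether the convolutions carry biases is not specified here.

```python
def dncnn_spec(depth=17, c=1, features=64, act="SyReLU"):
    """One (block type, in_channels, out_channels) tuple per 3x3 conv layer, following Sec. 2.3."""
    spec = [(f"Conv+{act}", c, features)]                           # layer 1
    spec += [(f"Conv+BN+{act}", features, features)] * (depth - 2)  # layers 2 .. D-1
    spec.append(("Conv", features, c))                              # layer D: outputs the residual
    return spec


def conv_weight_count(spec, k=3):
    """Number of k x k convolution weights (biases and BN parameters excluded)."""
    return sum(k * k * cin * cout for _, cin, cout in spec)


def receptive_field(depth, k=3):
    """Each stride-1 k x k conv layer widens the receptive field by k - 1 pixels."""
    return 1 + depth * (k - 1)


spec = dncnn_spec()
for i, (kind, cin, cout) in enumerate(spec, start=1):
    print(f"layer {i:2d}: {kind:15s} 3x3x{cin} -> {cout}")
print("conv weights:", conv_weight_count(spec))        # 554112
print("receptive field:", receptive_field(len(spec)))  # 35 (i.e. 35 x 35)
```

Almost all of the weights sit in the 15 middle Conv+BN+SyReLU layers (15 × 36,864), so swapping ReLU for SyReLU adds no trainable parameters, since α is a fixed constant.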

The input of DnCNN is a noisy observation y = x + v. Rather than learning a mapping function $F(y) = x$ that directly predicts the latent clean image, DnCNN adopts the residual learning formulation and trains a residual mapping $R(y) \approx v$, from which $x = y - R(y)$. Formally, the mean squared error [36–41] between the desired residual images and those estimated from the noisy input,

$ℓ(Θ) = \frac{1}{2N} \sum_{i=1}^{N} \| R(y_{i}; Θ) - (y_{i} - x_{i}) \|_{F}^{2}$ (7)

can be adopted as the loss function to learn the trainable parameters Θ of DnCNN. Here $\{(y_{i}, x_{i})\}_{i=1}^{N}$ represents N noisy-clean training image pairs. Figure 2 illustrates the architecture of DnCNN for learning $R(y)$. For the i-th noisy-clean training image pair $(y_i, x_i)$, let $F_i$ denote the feature map output by the i-th layer of DnCNN. Then $F_0 = y_i$ is the noisy input image, and $F_D = R(y_i; Θ)$ is the predicted noise image. Let $W_i$ $(0 \leq i \leq D - 1)$ denote the convolution kernel parameters of each layer of DnCNN.
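The residual formulation and the loss of Equation (7) can be written out directly. In this minimal sketch, images are flat lists of pixels, the squared Frobenius norm becomes a sum of squared pixel differences, and `R` is any callable standing in for the network; `residual_loss`, `denoise`, `zero_net` and `oracle` are hypothetical names used only for illustration.

```python
def residual_loss(R, pairs):
    """Eq. (7): (1 / 2N) * sum_i ||R(y_i) - (y_i - x_i)||_F^2 over N noisy/clean pairs."""
    total = 0.0
    for y, x in pairs:
        target = [yi - xi for yi, xi in zip(y, x)]                 # desired residual v_i = y_i - x_i
        pred = R(y)                                                # predicted residual R(y_i)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))  # squared Frobenius norm
    return total / (2 * len(pairs))


def denoise(R, y):
    """Recover the clean estimate x_hat = y - R(y)."""
    return [yi - ri for yi, ri in zip(y, R(y))]


x1, y1 = [10.0, 20.0, 30.0], [12.0, 19.0, 30.0]  # noise v1 = [2, -1, 0]
x2, y2 = [0.0, 0.0], [3.0, 4.0]                  # noise v2 = [3, 4]
pairs = [(y1, x1), (y2, x2)]


def zero_net(y):
    """A 'network' that predicts no noise at all."""
    return [0.0] * len(y)


def oracle(y):
    """A 'network' that knows x1, so R(y1) = v1 exactly."""
    return [a - b for a, b in zip(y, x1)]


print(residual_loss(zero_net, pairs))                          # (5 + 25) / (2 * 2) = 7.5
print(residual_loss(oracle, [(y1, x1)]), denoise(oracle, y1))  # 0.0 [10.0, 20.0, 30.0]
```

A perfect residual predictor drives the loss to zero and returns the clean image exactly through x = y − R(y), which is the point of training on residuals rather than on clean images.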

Then, (i) Conv+SyReLU: for the first layer F₁, we have $F_{1} = SyReLU (F_{0} * W_{0}),$ (8) where “*” indicates convolution operation.

(ii) Conv+BN+SyReLU: for layers 2 ∼ (D - 1), we have $F_{i} = SyReLU (BN (F_{i - 1} * W_{i - 1})),$ (9) where BN indicates batch normalization operation.

(iii) Conv: for the last layer F_D, we have $F_{D} = R(y_{i}; Θ) = F_{D-1} * W_{D-1}.$ (10) The squared error of the image pair (y_i, x_i) is $ℓ_{i}(Θ) = \frac{1}{2} \| R(y_{i}; Θ) - (y_{i} - x_{i}) \|_{F}^{2}.$ (11) The loss ℓ(Θ) is the mean of the ℓ_i(Θ), i.e. $ℓ(Θ) = \frac{1}{N} \sum_{i=1}^{N} ℓ_{i}(Θ)$, which gives the mean squared error of Equation (7).
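Equations (8) to (10) describe a plain feed-forward pass. The sketch below reproduces that structure for a hypothetical single-channel, one-dimensional toy network in pure Python: each layer uses one length-3 kernel with zero padding instead of 64 filters of size 3 × 3, and batch normalization is omitted. Like most deep-learning libraries, `conv1d_same` actually computes cross-correlation; because the kernels are learned, this does not change what the network can represent. All names here are illustrative.

```python
def syrelu(x, alpha=6.0):
    """SyReLU, Eq. (4)."""
    if x < 0:
        return 0.0
    if x <= alpha:
        return x
    if x <= 2 * alpha:
        return 2 * alpha - x
    return 0.0


def conv1d_same(signal, kernel):
    """Zero-padded 'same'-size 1-D cross-correlation with an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(signal))]


def toy_forward(y, kernels, alpha=6.0):
    """Residual prediction R(y) for a toy network of depth D = len(kernels), BN omitted."""
    f = y                                                  # F_0 = y
    for w in kernels[:-1]:                                 # layers 1 .. D-1, Eqs. (8)-(9)
        f = [syrelu(v, alpha) for v in conv1d_same(f, w)]  # F_i = SyReLU(F_{i-1} * W_{i-1})
    return conv1d_same(f, kernels[-1])                     # Eq. (10): F_D = F_{D-1} * W_{D-1}


identity = [0.0, 1.0, 0.0]
y = [1.0, 8.0, -2.0]
r = toy_forward(y, [identity] * 3)     # D = 3, every kernel is the identity
x_hat = [a - b for a, b in zip(y, r)]  # x_hat = y - R(y)
print(r, x_hat)                        # [1.0, 4.0, 0.0] [0.0, 4.0, -2.0]
```

With identity kernels the network reduces to applying SyReLU, so the input 8.0 (which lies in (α, 2α]) is folded back to 2α − 8 = 4.0 and the negative input is zeroed.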

3 Experimental results and analysis

In the experiments, we use 400 images [22] of size 180 × 180 to train DnCNN with ReLU, DnCNN with LReLU and DnCNN with SyReLU (collectively, DnCNNs). To train the DnCNNs for Gaussian denoising with a known noise level, we consider the noise level σ = 25. We set the patch size to 40 × 40 and crop 128 × 1,600 patches to train the models. For testing, we use two different datasets: one contains 12 widely used test images (Set12), and the other contains 68 natural images from the Berkeley segmentation dataset (BSD68). In addition, we set the network depth of the DnCNNs to 17 and α = 6.00 for SyReLU. The loss function in Equation (7) is adopted to learn the residual mapping $R(y)$ for predicting the residual v. We use the Adam algorithm [42] with a batch size of 128 and train the DnCNNs for 50 epochs on a machine with 4 NVIDIA Tesla K40c graphics cards.
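The training setup implies some simple bookkeeping. Assuming the training set consists of 128 × 1,600 patches of size 40 × 40 (following DnCNN [22]) and every patch is used once per epoch with batch size 128, each epoch has 1,600 iterations and 50 epochs amount to 80,000 iterations. The sketch below makes this explicit and crops random 40 × 40 patches from a 180 × 180 image; random cropping, the helper name `random_patches` and the all-zero stand-in image are assumptions for illustration, since the text does not specify how patches are sampled or augmented.

```python
import random


def random_patches(image, patch=40, count=4, seed=0):
    """Crop `count` random patch x patch windows from a 2-D image stored as a list of rows."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(count):
        r = rng.randrange(h - patch + 1)  # top row, so the window fits vertically
        c = rng.randrange(w - patch + 1)  # left column, so the window fits horizontally
        patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches


# Hypothetical stand-in for one 180 x 180 training image.
image = [[0.0] * 180 for _ in range(180)]
patches = random_patches(image)

total_patches = 128 * 1600                     # 204,800 training patches
batch_size = 128
iters_per_epoch = total_patches // batch_size  # 1,600 mini-batches per epoch
total_iters = 50 * iters_per_epoch             # 80,000 mini-batches over 50 epochs
print(total_patches, iters_per_epoch, total_iters)
```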

We conduct comparative experiments with DnCNN with ReLU, DnCNN with LReLU and DnCNN with SyReLU. The relationship between the average PSNR (dB) and the number of epochs on the test datasets (Set12 and BSD68) is shown in Figures 3 and 4. Figure 3 shows this relationship for the three activation functions (ReLU, LReLU and SyReLU) on Set12 and BSD68 with batch normalization (BN). From Figure 3, we find that both DnCNN with SyReLU (green line) and DnCNN with LReLU (red line) converge faster and more stably than DnCNN with ReLU (black line), and DnCNN with SyReLU converges fastest. The variances of the PSNR curves of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU are Var(SyReLU) = 0.19, Var(LReLU) = 0.22 and Var(ReLU) = 0.39 on Set12, respectively. Similarly, on BSD68 they are Var(SyReLU) = 0.15, Var(LReLU) = 0.23 and Var(ReLU) = 1.26, respectively. These results indicate that DnCNN with SyReLU trains more stably than the other models. Figure 4 shows the relationship between the average PSNR (dB) and the number of epochs for two activation functions (ReLU and SyReLU) on Set12 and BSD68 without BN. Figure 4 confirms that DnCNN with SyReLU (green line) converges faster and more stably than DnCNN with ReLU (black line). In this setting, the variances of the PSNR curves of DnCNN with SyReLU and DnCNN with ReLU are Var(SyReLU) = 0.02 and Var(ReLU) = 0.04 on Set12, and Var(SyReLU) = 0.01 and Var(ReLU) = 0.02 on BSD68, respectively. Based on these results, DnCNN with SyReLU achieves better denoising effectiveness than DnCNN with ReLU and DnCNN with LReLU.
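For reference, PSNR for 8-bit images is 10·log10(255² / MSE), and the variances above summarize how much a PSNR curve fluctuates across epochs. The sketch below computes both. The paper does not state whether the population or the sample variance was used, so `curve_variance` assumes the population variance, and the three-point curve in the example is hypothetical, not data from Figures 3 and 4.

```python
import math
import statistics


def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, estimate)) / len(clean)
    return 10.0 * math.log10(peak ** 2 / mse)


def curve_variance(psnr_per_epoch):
    """Population variance of a PSNR-versus-epoch curve (choice of estimator is an assumption)."""
    return statistics.pvariance(psnr_per_epoch)


# A uniform error of 25 grey levels (MSE = 625, i.e. sigma = 25) gives about 20.17 dB.
print(round(psnr([0.0] * 4, [25.0] * 4), 2))
# Hypothetical PSNR curve over three epochs.
print(curve_variance([28.0, 29.0, 30.0]))
```

The first line also gives a sense of scale: the noisy input at σ = 25 sits near 20 dB, so the roughly 29 to 30 dB reported in Tables 1 and 2 corresponds to removing most of the noise energy.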

Fig.3

The Gaussian denoising results (i.e. the average PSNR (dB) versus epochs) of three models that combine batch normalization (BN) with different activation functions (ReLU, LReLU and SyReLU), trained with noise level σ = 25, on (Left) Set12 and (Right) BSD68 test datasets.

Fig.4

The Gaussian denoising results (i.e. the average PSNR (dB) versus epochs) of two models without batch normalization (BN), using the ReLU and SyReLU activation functions respectively, trained with noise level σ = 25, on (Left) Set12 and (Right) BSD68 test datasets.

The comparative denoising performance of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU is shown in Tables 1 and 2. Specifically, Table 1 presents the PSNR of every image in Set12 and the average PSNR after 50 training epochs. With BN, the average PSNRs of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU are 30.39 dB, 30.37 dB and 30.34 dB, respectively, so DnCNN with SyReLU improves on DnCNN with ReLU and DnCNN with LReLU by 0.05 dB and 0.02 dB, respectively. Per image, DnCNN with SyReLU and DnCNN with LReLU achieve a higher PSNR than DnCNN with ReLU on 10 and 7 of the 12 images, respectively. The largest PSNR gain of DnCNN with SyReLU over DnCNN with ReLU is on the C.man image (0.15 dB), and the largest gain of DnCNN with LReLU over DnCNN with ReLU is on the Monar. image (0.17 dB). Without BN, the average PSNRs of DnCNN with SyReLU and DnCNN with ReLU are 30.25 dB and 30.22 dB, respectively, an improvement of 0.03 dB for SyReLU, which achieves a higher PSNR than ReLU on 9 of the 12 images.

Table 1

The PSNRs (dB) of different networks on Set12 dataset with noise level σ = 25.

Images	BN+ReLU	BN+LReLU	BN+SyReLU	ReLU	SyReLU
C.man	29.92	30.07	30.07	29.95	29.94
House	32.94	33.09	32.95	32.86	32.82
Peppers	30.76	30.76	30.81	30.58	30.55
Starfish	29.37	29.37	29.48	29.21	29.29
Monar.	30.24	30.41	30.37	30.21	30.31
Airpl.	29.11	28.99	29.05	29.03	29.04
Parrot	29.39	29.41	29.46	29.26	29.32
Lena	32.38	32.32	32.40	32.16	32.21
Barbara	29.91	29.85	29.87	29.56	29.64
Boat	30.10	30.14	30.11	29.99	30.07
Man	29.99	30.04	30.00	29.91	29.95
Couple	30.01	30.04	30.10	29.87	29.90
Average	30.34	30.37	30.39	30.22	30.25
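The summary statistics quoted from Table 1 can be re-derived from the per-image values. The sketch below transcribes Table 1 and recomputes the column averages, the number of images on which each model beats its ReLU baseline, and the image with the largest gain; `TABLE1`, `COLS` and the helper names are introduced here only for this check.

```python
COLS = ("BN+ReLU", "BN+LReLU", "BN+SyReLU", "ReLU", "SyReLU")
TABLE1 = {  # PSNR (dB) on Set12, sigma = 25, transcribed from Table 1
    "C.man":    (29.92, 30.07, 30.07, 29.95, 29.94),
    "House":    (32.94, 33.09, 32.95, 32.86, 32.82),
    "Peppers":  (30.76, 30.76, 30.81, 30.58, 30.55),
    "Starfish": (29.37, 29.37, 29.48, 29.21, 29.29),
    "Monar.":   (30.24, 30.41, 30.37, 30.21, 30.31),
    "Airpl.":   (29.11, 28.99, 29.05, 29.03, 29.04),
    "Parrot":   (29.39, 29.41, 29.46, 29.26, 29.32),
    "Lena":     (32.38, 32.32, 32.40, 32.16, 32.21),
    "Barbara":  (29.91, 29.85, 29.87, 29.56, 29.64),
    "Boat":     (30.10, 30.14, 30.11, 29.99, 30.07),
    "Man":      (29.99, 30.04, 30.00, 29.91, 29.95),
    "Couple":   (30.01, 30.04, 30.10, 29.87, 29.90),
}


def column_average(col):
    """Average PSNR of one column, rounded to two decimals as in Table 1."""
    j = COLS.index(col)
    return round(sum(row[j] for row in TABLE1.values()) / len(TABLE1), 2)


def wins(col, baseline):
    """Number of images where `col` has a strictly higher PSNR than `baseline`."""
    j, b = COLS.index(col), COLS.index(baseline)
    return sum(1 for row in TABLE1.values() if row[j] > row[b])


def largest_gain(col, baseline):
    """Image with the largest PSNR gain of `col` over `baseline`, and that gain in dB."""
    j, b = COLS.index(col), COLS.index(baseline)
    name = max(TABLE1, key=lambda k: TABLE1[k][j] - TABLE1[k][b])
    return name, round(TABLE1[name][j] - TABLE1[name][b], 2)


print({c: column_average(c) for c in COLS})
print(wins("BN+SyReLU", "BN+ReLU"), wins("BN+LReLU", "BN+ReLU"), wins("SyReLU", "ReLU"))
print(largest_gain("BN+SyReLU", "BN+ReLU"), largest_gain("BN+LReLU", "BN+ReLU"))
```

Running it reproduces the averages in the last row of Table 1 (30.34, 30.37, 30.39, 30.22 and 30.25 dB), the 10, 7 and 9 per-image wins, and the largest gains on C.man (0.15 dB) and Monar. (0.17 dB) reported in the text.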

Table 2 presents the average PSNR on the BSD68 dataset after 50 training epochs. With BN, the average PSNRs of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU are 29.18 dB, 29.14 dB and 29.17 dB, respectively, so DnCNN with SyReLU improves on DnCNN with ReLU and DnCNN with LReLU by 0.01 dB and 0.04 dB, respectively. Without BN, the average PSNRs of DnCNN with SyReLU and DnCNN with ReLU are 29.09 dB and 29.06 dB, respectively, an improvement of 0.03 dB. Based on the above analysis, DnCNN with SyReLU performs better than DnCNN with LReLU and DnCNN with ReLU in image denoising, especially on images with lower grey values.

Table 2

The PSNRs (dB) of different networks on BSD68 dataset with noise level σ = 25.

Methods	BN+ReLU	BN+LReLU	BN+SyReLU	ReLU	SyReLU
Average	29.17	29.14	29.18	29.06	29.09

4 Conclusion

In order to improve the performance of DnCNN for image denoising, we propose the Symmetry-Rectifier Linear Unit (SyReLU) with an upper threshold, based on the activation response characteristics of biological neurons. We further explore the influence of ReLU, LReLU and SyReLU on image denoising. Experimental results demonstrate that DnCNN with SyReLU performs better than DnCNN with LReLU and DnCNN with ReLU for image denoising, especially on images with lower grey values. From the viewpoint of activation functions, SyReLU is more consistent with the nature of biological neurons than LReLU and ReLU. Therefore, the proposed SyReLU neuron and DnCNN with SyReLU are of significance for image denoising. To improve denoising performance, we set α = 6.00 in DnCNN with SyReLU; however, this value was obtained by gradually adjusting α according to the experimental results. How to set α quickly and accurately is left for future work. In addition, we plan to apply the method to color image denoising.

References

1. Buades, Coll and Morel J.-M., A non-local algorithm for image denoising, IEEE Conference on Computer Vision and Pattern Recognition 2 (2005), 60–65.

2. Dabov, Foi, Katkovnik and Egiazarian, Image denoising by sparse 3-D transform-domain collaborative filtering, IEEE Transactions on Image Processing 16(8) (2007), 2080–2095.

3. Buades, Coll and Morel J.-M., Nonlocal image and movie denoising, International Journal of Computer Vision 76(2) (2008), 123–139.

4. Mairal, Bach, Ponce, Sapiro and Zisserman, Non-local sparse models for image restoration, IEEE International Conference on Computer Vision, 2009, pp. 2272–2279.

5. Elad and Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image Processing 15(12) (2006), 3736–3745.

6. Dong, Zhang, Shi and Li, Nonlocally centralized sparse representation for image restoration, IEEE Transactions on Image Processing 22(4) (2013), 1620–1630.

7. Rudin L.I., Osher and Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena 60(1) (1992), 259–268.

8. Osher, Burger, Goldfarb, Xu and Yin, An iterative regularization method for total variation-based image restoration, Multiscale Modeling & Simulation 4(2) (2005), 460–489.

9. Weiss and Freeman W.T., What makes a good model of natural images? IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.

10. Lan, Roth, Huttenlocher and Black M.J., Efficient belief propagation with learned higher-order Markov random fields, European Conference on Computer Vision, 2006, pp. 269–282.

11. Li S.Z., Markov random field modeling in image analysis, Springer Science & Business Media, 2009.

12. Roth and Black M.J., Fields of experts, International Journal of Computer Vision 82(2) (2009), 205–229.

13. Gu, Zhang, Zuo and Feng, Weighted nuclear norm minimization with application to image denoising, IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2862–2869.

14. Schmidt and Roth, Shrinkage fields for effective image restoration, IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2774–2781.

15. Chen, Yu and Pock, On learning optimized reaction diffusion processes for effective image restoration, IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5261–5269.

16. Chen and Pock, Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.

17. Schmidt, Rother, Nowozin, Jancsary and Roth, Discriminative non-blind deblurring, IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 604–611.

18. Schmidt, Jancsary, Nowozin, Roth and Rother, Cascades of regression tree fields for image restoration, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(4) (2016), 677–689.

19. Jain and Seung, Natural image denoising with convolutional networks, Advances in Neural Information Processing Systems, 2009, pp. 769–776.

20. Burger H.C., Schuler C.J. and Harmeling, Image denoising: Can plain neural networks compete with BM3D? IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2392–2399.

21. Xie, Xu and Chen, Image denoising and inpainting with deep neural networks, Advances in Neural Information Processing Systems, 2012, pp. 341–349.

22. Zhang, Zuo, Chen, Meng and Zhang, Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising, IEEE Transactions on Image Processing 26(7) (2017), 3142–3155.

23. Glorot, Bordes and Bengio, Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

24. Maas A.L., Hannun A.Y. and Ng A.Y., Rectifier nonlinearities improve neural network acoustic models, Proc. ICML 30(1) (2013), 3–8.

25. He, Zhang, Ren, et al., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.

26. Clevert D.A., Unterthiner and Hochreiter, Fast and accurate deep network learning by exponential linear units (ELUs), ICLR, 2015.

27. Jin, Xu, Feng, et al., Deep learning with S-shaped rectified linear activation units, Thirtieth AAAI Conference on Artificial Intelligence, 2016.

28. Li, Fan, Li, et al., Improving deep neural network with multiple parametric exponential linear units, Neurocomputing 301 (2018), 11–24.

29. Nicholls J.G., Martin A.R., Fuchs P.A., Brown D.A., Diamond M.E. and Weisblat D.A., From neuron to brain, 5th edition, Sinauer Associates, Sunderland, MA, 2012.

30. Baylor D.A. and Fettiplace, Synaptic drive and impulse generation in ganglion cells of turtle retina, Journal of Physiology 288(1) (1979), 107–127.

31. Adrian E.D., The Physical Background of Perception, Clarendon, Oxford, England, 1946.

32. Attwell and Laughlin S.B., An energy budget for signaling in the grey matter of the brain, Journal of Cerebral Blood Flow & Metabolism 21(10) (2001), 1133–1145.

33. Lennie P., The cost of cortical computation, Current Biology 13(6) (2003), 493–497.

34. Gulcehre, Moczulski, Denil and Bengio, Noisy activation functions, International Conference on Machine Learning, JMLR.org, 2016, pp. 3059–3068.

35. Ioffe and Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning, JMLR.org, 2015, pp. 448–456.

36. de Jesús Rubio J., Edwin, et al., Neural network updating via argument Kalman filter for modeling of Takagi-Sugeno fuzzy models, Journal of Intelligent & Fuzzy Systems 35(2) (2018), 2585–2596.

37. Soares A.M., Fernandes B.J.T. and Bastos-Filho C.J.A., Pyramidal neural networks with evolved variable receptive fields, Neural Computing and Applications 29(12) (2018), 1443–1453.

38. de Jesús Rubio J., Error convergence analysis of the SUFIN and CSUFIN, Applied Soft Computing 72 (2018), 587–595.

39. Liu, Wang, Yuan, et al., Partial-nodes-based state estimation for complex networks with unbounded distributed delays, IEEE Transactions on Neural Networks and Learning Systems 29(8) (2018), 3906–3912.

40. de Jesús Rubio J., SOFMLS: Online self-organizing fuzzy modified least-squares network, IEEE Transactions on Fuzzy Systems 17(6) (2009), 1296–1309.

41. Xiaotong, Hua, Bingzhen, et al., Assessing information security risk for an evolving smart city based on fuzzy and grey FMEA, Journal of Intelligent & Fuzzy Systems 34(4) (2018), 2491–2501.

42. Kingma and Ba, Adam: A method for stochastic optimization, International Conference for Learning Representations, 2015.