Abstract
Image denoising is an active research topic in fields such as image processing and computer vision. With the development of deep learning, deep neural networks have been widely used for image denoising and have achieved good results. Inspired by the characteristics of the feed-forward denoising convolutional neural network (DnCNN) and of biological neuron responses, we propose the Symmetry-Rectifier Linear Unit (SyReLU) and its corresponding activation function. SyReLU is more consistent with the response characteristics of biological neurons than other activation functions, e.g., the Rectified Linear Unit (ReLU) and the Leaky Rectified Linear Unit (LReLU). To denoise images, we use the SyReLU activation function for residual learning in a CNN (here, DnCNN). The experimental results indicate that DnCNN with SyReLU achieves better denoising performance than DnCNN with other activation functions (e.g., ReLU and LReLU) on the Set12 and BSD68 datasets. In brief, the proposed activation function contributes to the development of activation functions and is useful in deep neural networks for image denoising.
Keywords
Introduction
Image denoising is not only a classical and active topic in low-level vision but also an indispensable step in many practical applications. The goal of image denoising is to recover a clean image x from a noisy observation y that follows the degradation model y = x + v. Generally, v is additive white Gaussian noise (AWGN) with standard deviation σ. From a Bayesian perspective, when the likelihood is known, image prior modeling plays a key role in image denoising. Over the past few decades, many models have been used to construct image priors, including nonlocal self-similarity (NSS) models [1–4], sparse models [4–6], gradient models [7–9] and Markov random field (MRF) models [10–12]. In particular, NSS models are popular in state-of-the-art methods such as BM3D [2], LSSC [4], NCSR [6] and WNNM [13]. Although most image prior-based methods achieve high denoising quality, they usually have two major drawbacks. First, they typically involve a complex optimization problem in the test phase, which makes denoising time-consuming [6, 13]; as a result, most prior-based methods can hardly achieve high performance without sacrificing computational efficiency. Second, the models are usually nonconvex and involve a number of manually chosen parameters, leaving some leeway for improving denoising performance.
To overcome the limitations of prior-based approaches, several discriminative learning methods have been developed to learn image prior models in the context of a truncated inference procedure [14–19]. The resulting models avoid the iterative optimization procedure in the test phase. Schmidt and Roth [14] proposed the cascade of shrinkage fields (CSF) method, which unifies the random field-based model and the unrolled half-quadratic optimization algorithm into a single learning framework. Chen et al. [15, 16] proposed a trainable nonlinear reaction diffusion (TNRD) model, which learns a modified fields-of-experts [12] image prior by unfolding a fixed number of gradient descent inference steps. Other related works can be found in [17, 18]. Although CSF and TNRD have shown promising results toward bridging the gap between computational efficiency and denoising quality, their performance is inherently restricted to the specified forms of prior. Specifically, the priors adopted in CSF and TNRD are based on the analysis model, which is limited in capturing the full characteristics of image structures. In addition, the parameters are learned by stage-wise greedy training plus joint fine-tuning among all stages, and many handcrafted parameters are involved. Another non-negligible drawback is that they train a specific model for a certain noise level and are therefore limited in blind image denoising. Recently, there have been several attempts to handle denoising problems with deep neural networks. Jain and Seung [19] proposed using convolutional neural networks (CNNs) for image denoising and claimed that CNNs have similar or even better representation power than the MRF model. The multi-layer perceptron (MLP) [20] was successfully applied to image denoising. The stacked sparse denoising auto-encoder method [21] was adopted to handle Gaussian noise removal and achieved results comparable to K-SVD [5].
Zhang et al. [22] proposed a denoising convolutional neural network (DnCNN) instead of learning a discriminative model with an explicit image prior. DnCNN treats image denoising as a typical discriminative learning problem, i.e., separating the noise from a noisy image with a feed-forward convolutional neural network (CNN). DnCNN is an end-to-end trainable deep CNN for Gaussian denoising. In contrast to existing deep neural network-based methods, which directly estimate the latent clean image, DnCNN adopts a residual learning strategy: it predicts the noise (residual) image and thereby implicitly removes the latent clean image in its hidden layers. Residual learning and batch normalization both speed up training and boost the denoising performance of DnCNN. For Gaussian denoising with a specific noise level, DnCNN outperforms state-of-the-art methods in terms of both quantitative metrics and visual quality. Moreover, a single DnCNN model can handle three general image denoising tasks, i.e., blind Gaussian denoising, single image super-resolution (SISR) and JPEG deblocking.
From the neural network perspective, the activation function is crucial to the performance of deep neural networks. The Rectified Linear Unit (ReLU) [23] is one of the keys to the breakthrough of deep neural networks and is used in DnCNN. To further improve the performance of deep neural networks, a number of new activation functions based on ReLU have been proposed, such as Leaky ReLU (LReLU) [24], the Parametric Rectified Linear Unit (PReLU) [25], the Exponential Linear Unit (ELU) [26], the S-shaped Rectified Linear Unit (SReLU) [27] and the Multiple Parametric Exponential Linear Unit (MPELU) [28]. Meanwhile, neurobiology studies have shown that all nerve cells have a resting potential [29]. The electrical signals generated by nerve cells fall into two broad categories: local graded potentials and action potentials (also known as neural activity). Action potentials play an important role in transmitting signals through the retina. Neurobiology studies further show that, once the stimulation intensity exceeds a threshold, the magnitude of action potentials grows as the intensity increases, but it stops growing once the intensity exceeds a second, larger threshold. ReLU and the variants above, whose outputs keep growing with their input, are therefore not consistent with this biological characteristic of nerve cells. In addition, we argue that there should be a third, even larger threshold of stimulation intensity, above which the magnitude of the response is zero. For example, the human auditory system works only within a range of frequencies: when a sound source vibrates faster than 20,000 Hz, we cannot hear it. None of these activation functions satisfies this condition. Inspired by these characteristics, we explore activation functions with an upper-bounded response.
Further considering that the gradient may be zero, we present the Symmetry-Rectifier Linear Unit (SyReLU). We then combine DnCNN with LReLU and with SyReLU, respectively, for image denoising.
This paper is organized as follows. Section 2 presents the response characteristics of biological neurons and the definition of Symmetry-Rectifier Linear Unit (SyReLU) and constructs DnCNN with LReLU and SyReLU respectively for image denoising. Section 3 offers the comparative experimental results of DnCNN with LReLU and SyReLU and the corresponding analysis. Section 4 draws the conclusions.
Residual learning of deep CNN for image denoising
Response characteristics of biological neurons
Neurobiology studies have shown that all nerve cells have a resting potential, i.e., the intracellular fluid is negative relative to the extracellular fluid (by less than 100 mV) [29]. All electrical signals produced by nerve cells are superimposed on the resting potential. Some signals depolarize the cell membrane, reducing the resting potential, and others hyperpolarize it, increasing the resting potential. The electrical signals generated by nerve cells fall into two broad categories. The first category is local graded potentials. These potentials are generated by external physical stimuli, such as light falling on the photoreceptors in the eye, sound waves that deform hair cells in the ear, tactile stimulation of sensory nerve endings in the skin, and activity at synaptic sites (junctions between nerve cells and their target cells). The action potential (also known as neural activity) is the second major category of electrical signals. An action potential is generated when the local graded potential depolarizes the cell membrane beyond a critical level (called the threshold). Once generated, an action potential propagates rapidly over long distances. Unlike local graded potentials, action potentials in neurons are fixed in amplitude and duration, like the dots in Morse code.
Action potentials play an important role in transmitting signals through the retina. An important feature of the action potential is that it is a triggered, regenerative, all-or-none event. Signals from bipolar cells and amacrine cells act on ganglion cells; if their effect is sufficient to reach the threshold, an action potential is generated. Once generated, its amplitude and duration are not determined by the amplitude and duration of the stimulus: larger stimulating currents do not generate larger action potentials, and longer stimuli do not prolong them. The relationship between the occurrence of the action potential (denoted AP) and the magnitude of the local graded potential (denoted LGP) is therefore all-or-none: an action potential of fixed amplitude occurs if and only if LGP reaches the threshold.
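As a minimal sketch, the all-or-none relation can be written as a step function. The threshold symbol θ is our own notation, and AP is treated here as a 0/1 indicator of whether an action potential of fixed amplitude occurs; this is an assumed formalization rather than a reproduction of the authors' equation.

```latex
\mathrm{AP} =
\begin{cases}
1, & \mathrm{LGP} \ge \theta,\\
0, & \mathrm{LGP} < \theta.
\end{cases}
```

Because the output is either nothing or a spike of fixed amplitude, stimulus strength above θ cannot be read from the amplitude, which is why it has to be carried by the discharge frequency.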
Since the amplitude of the action potential is fixed, information about stimulus intensity is transmitted by encoding it in the discharge frequency. A more effective visual stimulus produces a larger local potential, which results in a higher discharge frequency of the ganglion cell [30]. This phenomenon was first described by Adrian [31], who found that in a sensory nerve of the skin, the discharge frequency of action potentials is a measure of stimulus intensity. In addition, Adrian observed that the stronger the stimulation of the skin, the more sensory fibers were activated. Thus, above the threshold, the action potential of a single nerve fiber does not grow as stimulation intensity increases. The sciatic nerve, by contrast, is composed of many nerve fibers. When the sciatic nerve trunk is stimulated above threshold, the number of fibers that generate action potentials depends on the stimulation intensity: as the intensity increases, more fibers fire, so the magnitude of the action potential recorded from the trunk becomes larger. Once all fibers fire, the magnitude no longer increases. Therefore, the amplitude of the action potential of the sciatic nerve trunk increases with stimulation intensity only within a certain range. ReLU and the ReLU-based variants listed in the Introduction match the property that the magnitude of the response increases with stimulus intensity, but not the property that it stops increasing once the stimulus is strong enough; they are therefore not fully consistent with biology. Furthermore, the frequency bandwidth of neurons is not infinite; in other words, an organism's response to external stimuli does not increase without bound.
For example, the range of human hearing is about 20 Hz to 20,000 Hz, and human vision fails when the light is extremely bright. Accordingly, an organism's response to external stimuli should have an upper-bound threshold: when the stimulation intensity exceeds it, the organism no longer responds. In contrast, the outputs of ReLU and its variants grow without bound as the stimulus intensity increases, so they are not fully consistent with biological facts.
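To make the sciatic-nerve argument concrete, the sketch below models a nerve trunk as a set of fibers, each firing an all-or-none action potential of unit amplitude once the stimulus reaches its own threshold. The fiber thresholds are hypothetical values chosen only for illustration. The summed response grows with stimulus intensity and then saturates once every fiber fires, whereas ReLU keeps growing.

```python
# Hypothetical per-fiber firing thresholds (arbitrary units), for illustration only.
FIBER_THRESHOLDS = [1.0, 1.5, 2.0, 2.5, 3.0]


def compound_response(stimulus, thresholds=FIBER_THRESHOLDS):
    """Count of fibers that fire; each fiber contributes a fixed unit amplitude (all-or-none)."""
    return sum(1 for t in thresholds if stimulus >= t)


def relu(x):
    """ReLU grows without bound for positive inputs."""
    return max(0.0, x)


if __name__ == "__main__":
    for s in [0.5, 1.0, 2.0, 3.0, 10.0, 100.0]:
        print(f"stimulus={s:6.1f}  nerve={compound_response(s)}  relu={relu(s)}")
```

The toy model shows only the "grow, then saturate" regime; the hypothesized third threshold, beyond which the response vanishes, is what SyReLU adds.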
Neurobiology studies have shown that an action potential is fired when the external physical stimulus reaches the firing threshold, and that the generation of a single action potential is an all-or-none event. Since a piece of biological tissue contains many nerve cells, its response to an external physical stimulus behaves as follows: the response begins when the stimulus intensity reaches the threshold; as the stimulus intensity continues to increase, the response becomes increasingly intense; finally, further increases in stimulus intensity no longer change the response. Many biological facts support this, such as the relationship between the intensity and frequency of stimulation and the response of the frog gastrocnemius muscle, and the relationship between stimulus intensity, stimulus frequency and muscle contraction. This raises the question of what happens when the response no longer increases but the stimulus intensity keeps increasing. Few biological experiments address this, but there are some clues; for example, animal hearing has an upper-bound threshold, and a muscle feels numb for a short time after being hit by a strong external force. Most previous studies of activation functions have not considered an upper-bound threshold on the stimulus intensity; an example is the ReLU activation function, defined as

$$\mathrm{ReLU}(x) = \max(0, x). \qquad (3)$$
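For reference, here is a minimal sketch of ReLU (Equation (3)) and two of the variants named in the Introduction, LReLU and ELU, in their standard textbook forms. The LReLU slope of 0.01 and the ELU scale of 1.0 are common defaults assumed here, not values taken from this paper. All three are unbounded above: each returns x itself for every positive x.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, slope * x otherwise (slope 0.01 is an assumed default)."""
    return x if x > 0 else slope * x


def elu(x: float, a: float = 1.0) -> float:
    """ELU: x for x > 0, a * (exp(x) - 1) otherwise (a = 1.0 is an assumed default)."""
    return x if x > 0 else a * (math.exp(x) - 1.0)


if __name__ == "__main__":
    # Large positive inputs pass straight through: none of these has an upper bound.
    for x in [-2.0, 0.0, 1.0, 1e3, 1e6]:
        print(x, relu(x), leaky_relu(x), elu(x))
```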

SyReLU, i.e. a symmetrical form of ReLU.
Mathematically, SyReLU coincides with ReLU for x ≤ α and is symmetric about the axis x = α. The paper [23] pointed out that ReLU is sparser than the sigmoid or tanh activation functions: for x < 0, ReLU outputs 0, so its nonzero responses are concentrated in the region x > 0, whereas sigmoid and tanh have no such property. The paper [23] also explained the rationale for and benefits of the sparsity of ReLU. In neuroscience, besides the activation-frequency response of neurons, researchers have also discovered that neurons are activated sparsely. In 2001, Attwell et al. [32], based on observations of brain energy consumption, speculated that neuronal coding is sparse and distributed. In 2003, Lennie et al. [33] estimated that only 1–4 percent of the neurons in the brain are active at the same time, further showing the sparsity of neuronal activity. That is, at any moment neurons respond selectively to only a small portion of the input signals, and most signals are deliberately suppressed. This can improve learning accuracy and allow sparse features to be extracted better and faster. By contrast, with the traditional sigmoid function almost half of the neurons are activated at the same time, which is inconsistent with neuroscience research and also causes serious problems in training deep networks. ReLU satisfies the sparsity requirements of both network training and neuroscience. SyReLU is additionally 0 for x > 2α, so it is at least as sparse as ReLU, and strictly sparser whenever some pre-activations exceed 2α.
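The sparsity argument can be checked numerically. The sketch below evaluates ReLU and SyReLU on an evenly spaced grid of pre-activations and reports the fraction of exact zeros. SyReLU is written here as max(0, min(x, 2α − x)), which equals ReLU for x ≤ α and is symmetric about x = α, matching the description above; the grid over [−20, 20] is an arbitrary choice, and α = 6 is the value used in the experiments. In a real network the fraction depends on the distribution of pre-activations, so this only illustrates that SyReLU's zero set contains ReLU's.

```python
def relu(x: float) -> float:
    return max(0.0, x)


def syrelu(x: float, alpha: float = 6.0) -> float:
    """SyReLU as ReLU mirrored about x = alpha: zero for x < 0 and for x > 2 * alpha."""
    return max(0.0, min(x, 2.0 * alpha - x))


def zero_fraction(f, xs):
    """Fraction of inputs that f maps exactly to zero."""
    return sum(1 for x in xs if f(x) == 0.0) / len(xs)


if __name__ == "__main__":
    xs = [-20.0 + 0.01 * i for i in range(4001)]  # grid over [-20, 20]
    print("ReLU zeros:  ", zero_fraction(relu, xs))
    print("SyReLU zeros:", zero_fraction(syrelu, xs))
```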
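Following this description, SyReLU with parameter α > 0 is defined as

$$\mathrm{SyReLU}(x) = \begin{cases} 0, & x < 0,\\ x, & 0 \le x \le \alpha,\\ 2\alpha - x, & \alpha < x \le 2\alpha,\\ 0, & x > 2\alpha, \end{cases} \qquad (4)$$

and, according to definition (4), its derivative is

$$\frac{d\,\mathrm{SyReLU}(x)}{dx} = \begin{cases} 0, & x < 0,\\ 1, & 0 < x < \alpha,\\ -1, & \alpha < x < 2\alpha,\\ 0, & x > 2\alpha. \end{cases} \qquad (5)$$

At the kinks x = 0, α and 2α the derivative is not defined.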
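A minimal sketch of SyReLU and its piecewise derivative, written from the description above (ReLU mirrored about x = α, zero beyond 2α). Returning 0 at the kinks x = 0, α and 2α is our own convention, similar to the common choice of a zero gradient for ReLU at x = 0. A central finite difference at points away from the kinks confirms the piecewise slopes.

```python
def syrelu(x: float, alpha: float = 6.0) -> float:
    """SyReLU: 0 for x < 0, x on [0, alpha], 2*alpha - x on (alpha, 2*alpha], 0 beyond."""
    if x < 0.0 or x > 2.0 * alpha:
        return 0.0
    if x <= alpha:
        return x
    return 2.0 * alpha - x


def syrelu_grad(x: float, alpha: float = 6.0) -> float:
    """Piecewise derivative; at the kinks x = 0, alpha, 2*alpha we return 0 (a convention)."""
    if 0.0 < x < alpha:
        return 1.0
    if alpha < x < 2.0 * alpha:
        return -1.0
    return 0.0


def numeric_grad(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in [-3.0, 2.0, 9.0, 15.0]:  # points away from the kinks
        print(x, syrelu(x), syrelu_grad(x), round(numeric_grad(syrelu, x), 6))
```

Note that the gradient is exactly zero for x > 2α, so units pushed far above the upper threshold stop receiving gradient, just as ReLU units do for x < 0.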
Here, we take DnCNN as an example of a deep CNN. To improve the performance of DnCNN for image denoising, we propose residual learning of DnCNN with SyReLU, i.e., replacing ReLU with SyReLU; the two networks differ only in the activation function. The architecture of DnCNN with SyReLU is shown in Figure 2. For DnCNN with SyReLU of depth D, there are three types of layers, shown in Figure 2 in three different colors. (i) Conv+SyReLU: for the first layer, 64 filters of size 3 × 3 × c are used to generate 64 feature maps, and SyReLU is then applied for nonlinearity. Here c represents the number of image channels, i.e., c = 1 for a grayscale image. (ii) Conv+BN+SyReLU: for layers 2 ∼ (D − 1), 64 filters of size 3 × 3 × 64 are used, and batch normalization (BN) [35] is added between convolution and SyReLU. (iii) Conv: for the last layer, c filters of size 3 × 3 × 64 are used to reconstruct the output.

The Architecture of Residual Learning of DnCNN with SyReLU for Image Denoising.
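The three layer types can be summarized programmatically. The sketch below lists, for depth D and c image channels, each layer's type and filter shape (kernel height, kernel width, input channels, number of filters), and computes the receptive field of D stacked 3 × 3 stride-1 convolutions, (2D + 1) × (2D + 1), which is 35 × 35 for D = 17, as noted for DnCNN [22]. The function names and the tuple layout are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Layer = Tuple[str, Tuple[int, int, int, int]]  # (layer type, (kh, kw, in_channels, filters))


def dncnn_syrelu_layers(depth: int = 17, channels: int = 1, features: int = 64) -> List[Layer]:
    """Layer list for the DnCNN-with-SyReLU layout described in the text."""
    if depth < 3:
        raise ValueError("depth must be at least 3 (first, middle and last layers)")
    layers: List[Layer] = [("Conv+SyReLU", (3, 3, channels, features))]
    for _ in range(2, depth):  # layers 2 .. D-1
        layers.append(("Conv+BN+SyReLU", (3, 3, features, features)))
    layers.append(("Conv", (3, 3, features, channels)))  # outputs the c-channel residual
    return layers


def receptive_field(depth: int, kernel: int = 3) -> int:
    """Side length of the receptive field of `depth` stacked stride-1 convolutions."""
    return 1 + depth * (kernel - 1)


if __name__ == "__main__":
    for i, (kind, shape) in enumerate(dncnn_syrelu_layers(), start=1):
        print(i, kind, shape)
    print("receptive field:", receptive_field(17))
```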
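The input of DnCNN is a noisy observation y = x + v. Rather than learning a mapping function F(y) = x that predicts the latent clean image, DnCNN learns a residual mapping R(y) ≈ v, and the clean image is then estimated as x = y − R(y).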
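Then, writing W_l and b_l for the filters and biases of the l-th layer and * for convolution:
(i) Conv+SyReLU: for the first layer F_1, we have
$$F_1(y) = \mathrm{SyReLU}(W_1 * y + b_1);$$
(ii) Conv+BN+SyReLU: for layers 2 ∼ (D − 1), we have
$$F_l(y) = \mathrm{SyReLU}\big(\mathrm{BN}(W_l * F_{l-1}(y) + b_l)\big), \quad l = 2, \dots, D-1;$$
(iii) Conv: for the last layer F_D, we have
$$F_D(y) = W_D * F_{D-1}(y) + b_D,$$
which is the estimated residual (noise) image.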
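In residual learning, the last layer outputs the estimated noise, and the denoised image is obtained by subtracting that output from y. The sketch below illustrates this without a network: it synthesizes y = x + v with AWGN of standard deviation σ = 25 on a toy flattened "image", keeps v as the residual target, and shows that a predictor returning v exactly would recover x as y minus its output. The oracle predictor is hypothetical and merely stands in for the trained network.

```python
import random
from typing import Callable, List, Sequence, Tuple


def add_awgn(clean: Sequence[float], sigma: float = 25.0,
             seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return (y, v) with y = x + v and v drawn i.i.d. from N(0, sigma^2) per pixel."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in clean]
    noisy = [x + v for x, v in zip(clean, noise)]
    return noisy, noise


def denoise(noisy: Sequence[float],
            residual_predictor: Callable[[Sequence[float]], Sequence[float]]) -> List[float]:
    """Residual learning at test time: estimate x as y minus the predicted residual."""
    predicted_noise = residual_predictor(noisy)
    return [y - r for y, r in zip(noisy, predicted_noise)]


if __name__ == "__main__":
    clean = [float(p) for p in range(0, 256, 32)]  # toy 8-pixel "image"
    noisy, noise = add_awgn(clean)

    def oracle(_y):
        """Hypothetical perfect residual predictor standing in for the trained network."""
        return noise

    restored = denoise(noisy, oracle)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(restored, clean)))
```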
In the experiments, we use 400 images [22] of size 180 × 180 to train DnCNN with ReLU, DnCNN with LReLU and DnCNN with SyReLU (collectively, DnCNNs). To train the DnCNNs for Gaussian denoising with a known noise level, we consider the noise level σ = 25. We set the patch size to 40 × 40 and crop 128 × 1,600 patches to train the models. For testing, we use two datasets: Set12, which contains 12 widely used test images, and BSD68, which contains 68 natural images from the Berkeley segmentation dataset. We set the network depth to 17 for all DnCNNs and α = 6.00 for SyReLU. Following DnCNN [22], the averaged mean squared error between the desired residual images and the estimated ones is adopted as the loss function to learn the residual mapping:

$$\ell(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \big\| R(y_i; \Theta) - (y_i - x_i) \big\|_F^2,$$

where {(y_i, x_i)}_{i=1}^{N} are N pairs of noisy and clean training patches, Θ denotes the trainable parameters and R(y_i; Θ) is the residual estimated by the network.
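A dependency-free sketch of the residual loss follows. As a simplification of our own, patches are represented as flat lists of pixel values and the network output is passed in as precomputed predictions, so the sketch shows only how the loss is evaluated, not how Θ is trained. The patch count from the setup is also spelled out: 128 × 1,600 = 204,800 patches.

```python
from typing import Sequence


def residual_loss(preds: Sequence[Sequence[float]],
                  noisy: Sequence[Sequence[float]],
                  clean: Sequence[Sequence[float]]) -> float:
    """l(Theta) = 1/(2N) * sum_i ||R(y_i) - (y_i - x_i)||_F^2 over N flattened patches."""
    n = len(preds)
    total = 0.0
    for r, y, x in zip(preds, noisy, clean):
        # Squared Frobenius norm of the error between predicted and true residual.
        total += sum((ri - (yi - xi)) ** 2 for ri, yi, xi in zip(r, y, x))
    return total / (2.0 * n)


NUM_PATCHES = 128 * 1600  # number of 40 x 40 training patches in the setup above
```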
We conduct comparative experiments on DnCNN with ReLU, DnCNN with LReLU and DnCNN with SyReLU. Figures 3 and 4 show the average PSNR (dB) versus training epochs on the Set12 and BSD68 test datasets. Figure 3 shows this relationship for the three activation functions (ReLU, LReLU and SyReLU) with batch normalization (BN). As Figure 3 shows, both DnCNN with SyReLU (green line) and DnCNN with LReLU (red line) converge faster and more stably than DnCNN with ReLU (black line), and DnCNN with SyReLU converges fastest. The variances of the PSNR of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU on Set12 are Var(SyReLU) = 0.19, Var(LReLU) = 0.22 and Var(ReLU) = 0.39, respectively. Similarly, on BSD68 they are Var(SyReLU) = 0.15, Var(LReLU) = 0.23 and Var(ReLU) = 1.26, respectively. These results indicate that, of the three models, DnCNN with SyReLU has the most stable PSNR during training. Figure 4 shows the average PSNR (dB) versus epochs for two activation functions (ReLU and SyReLU) on Set12 and BSD68 without BN. In Figure 4, DnCNN with SyReLU (green line) again converges faster and more stably than DnCNN with ReLU (black line). In this case, the variances of the PSNR of DnCNN with SyReLU and DnCNN with ReLU are Var(SyReLU) = 0.02 and Var(ReLU) = 0.04 on Set12, and Var(SyReLU) = 0.01 and Var(ReLU) = 0.02 on BSD68, respectively. Based on these results, DnCNN with SyReLU achieves better denoising behavior than DnCNN with ReLU and DnCNN with LReLU.

The Gaussian denoising results (i.e., the average PSNR (dB) versus epochs) of three models with different combinations of activation functions (ReLU, LReLU and SyReLU) and batch normalization (BN), trained with noise level σ = 25, on the Set12 and BSD68 test datasets.

The Gaussian denoising results (i.e., the average PSNR (dB) versus epochs) of three models with different combinations of activation functions (ReLU and SyReLU) and batch normalization (BN), trained with noise level σ = 25, on the Set12 and BSD68 test datasets.
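Training behavior is summarized above by the average PSNR per epoch and by its variance across epochs. A minimal sketch of both quantities follows. It assumes 8-bit images with a peak value of 255 and uses the population variance; the text does not state which variance estimator was used, and the per-epoch values in the usage line are hypothetical.

```python
import math
from statistics import pvariance
from typing import Sequence


def psnr(clean: Sequence[float], estimate: Sequence[float], peak: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for a perfect estimate."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, estimate)) / len(clean)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_variance(per_epoch_avg_psnr: Sequence[float]) -> float:
    """Population variance of the average PSNR across epochs."""
    return pvariance(per_epoch_avg_psnr)


if __name__ == "__main__":
    print(psnr([0.0, 0.0], [25.5, 25.5]))           # MSE = 650.25, i.e. 20 dB
    print(psnr_variance([29.8, 30.1, 30.3, 30.2]))  # hypothetical PSNR curve
```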
Tables 1 and 2 compare the denoising performance of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU. Specifically, Table 1 presents the PSNR of each image and the average PSNR on Set12 after the DnCNNs are trained for 50 epochs. As Table 1 shows, with BN the average PSNRs of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU are 30.39 dB, 30.37 dB and 30.34 dB, respectively. Thus DnCNN with SyReLU improves on DnCNN with ReLU and DnCNN with LReLU by 0.05 dB and 0.02 dB, respectively. At the level of individual images, DnCNN with SyReLU and DnCNN with LReLU achieve a higher PSNR than DnCNN with ReLU on 10 and 7 images, respectively. The largest PSNR gap between DnCNN with SyReLU and DnCNN with ReLU occurs on the C.man image, where DnCNN with SyReLU is 0.15 dB higher. The largest gap between DnCNN with LReLU and DnCNN with ReLU occurs on the Monar image, where DnCNN with LReLU is 0.17 dB higher. Without BN, the average PSNRs of DnCNN with SyReLU and DnCNN with ReLU are 30.25 dB and 30.22 dB, respectively, an improvement of 0.03 dB for SyReLU; DnCNN with SyReLU achieves a higher PSNR than DnCNN with ReLU on 9 images.
The PSNRs (dB) of different networks on Set12 dataset with noise level σ = 25.
Table 2 presents the average PSNR on BSD68 after the DnCNNs are trained for 50 epochs. As Table 2 shows, with BN the average PSNRs of DnCNN with SyReLU, DnCNN with LReLU and DnCNN with ReLU are 29.18 dB, 29.14 dB and 29.17 dB, respectively, so DnCNN with SyReLU improves on DnCNN with ReLU and DnCNN with LReLU by 0.01 dB and 0.04 dB, respectively. Without BN, the average PSNRs of DnCNN with SyReLU and DnCNN with ReLU are 29.09 dB and 29.06 dB, respectively, an improvement of 0.03 dB. Based on the above analysis, DnCNN with SyReLU outperforms DnCNN with LReLU and DnCNN with ReLU in image denoising, especially for images with lower gray values.
The PSNRs (dB) of different networks on BSD68 dataset with noise level σ = 25.
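The reported gains can be recomputed from the average PSNRs given in the text for Tables 1 and 2. The sketch below copies those averages and rounds each SyReLU-minus-baseline difference to 0.01 dB. It only checks the arithmetic and says nothing about statistical significance.

```python
# Average PSNR (dB) at sigma = 25, as reported in the text for Tables 1 and 2.
REPORTED = {
    ("Set12", "BN"): {"SyReLU": 30.39, "LReLU": 30.37, "ReLU": 30.34},
    ("Set12", "no BN"): {"SyReLU": 30.25, "ReLU": 30.22},
    ("BSD68", "BN"): {"SyReLU": 29.18, "LReLU": 29.14, "ReLU": 29.17},
    ("BSD68", "no BN"): {"SyReLU": 29.09, "ReLU": 29.06},
}


def gains_over_baselines(results):
    """SyReLU minus each baseline in every setting, rounded to 0.01 dB."""
    out = {}
    for setting, scores in results.items():
        for name, value in scores.items():
            if name != "SyReLU":
                out[(setting, name)] = round(scores["SyReLU"] - value, 2)
    return out


if __name__ == "__main__":
    for key, gain in gains_over_baselines(REPORTED).items():
        print(key, f"{gain:+.2f} dB")
```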
To improve the performance of DnCNN for image denoising, we propose the Symmetry-Rectifier Linear Unit (SyReLU), which has an upper threshold, based on the activation response characteristics of biological neurons. We further explore the influence of ReLU, LReLU and SyReLU on image denoising. Experimental results demonstrate that DnCNN with SyReLU outperforms DnCNN with LReLU and DnCNN with ReLU in image denoising, especially for images with lower gray values. As an activation function, SyReLU is more consistent with the behavior of biological neurons than LReLU and ReLU. We therefore consider the SyReLU neuron and DnCNN with SyReLU to be valuable for image denoising. In DnCNN with SyReLU we set α = 6.00, but this value was tuned manually according to experimental results; how to set α quickly and accurately remains future work. We also plan to extend the method to color image denoising.
