GAN-based classifier protection against adversarial attacks

Abstract

In recent years, deep neural networks have made significant progress in image classification, object detection and face recognition. However, they still misclassify adversarial examples. To address this security issue and improve the robustness of neural networks, we propose a novel defense network based on a generative adversarial network (GAN). The network matches the distributions of clean and adversarial examples, which guides it to remove imperceptible noise accurately and restore an adversarial example to a clean example, thereby achieving defense. In addition, to maintain the classification accuracy on clean examples and improve the fidelity of the network, we also feed clean examples into the proposed network for denoising. Our method effectively removes the noise of adversarial examples, so that the denoised adversarial examples can be correctly classified. Extensive experiments are conducted on five benchmark datasets, namely MNIST, Fashion-MNIST, CIFAR10, CIFAR100 and ImageNet, and six mainstream attack methods (FGSM, PGD, MIM, JSMA, C&W and Deep-Fool) are used to test the robustness of our defense. The results show that our method has strong defensive capabilities against the tested attack methods, which confirms its effectiveness.

Keywords

Deep neural network; generative adversarial network; adversarial example; defense

1 Introduction

With the rapid development of deep learning in the field of computer vision [5, 27], image recognition has achieved great success. However, recent studies have revealed that deep neural networks are very vulnerable to adversarial examples. The phenomenon was first discovered by Szegedy [24]. If a small perturbation is intentionally added to a clean sample before it is input into a neural network, the classifier will output a wrong prediction. For example, a perturbed picture of a cat can be misclassified as a dog, even though the perturbation is difficult for the human eye to detect. This exposes the vulnerability of deep neural networks. Researchers exploit this vulnerability [3, 4] to identify and understand the potential weaknesses of deep neural networks, and thus to improve the defense ability and robustness of the network.

Adversarial examples pose a potential security threat to real-world deep learning applications. Studies have shown that adversarial examples of traffic signs may cause the recognition systems of autonomous vehicles to make wrong decisions, which may lead to dangerous maneuvers [7]. For example, a stop sign could be classified as an 80 speed-limit sign, which is obviously dangerous for autonomous driving. It is therefore important and urgent to develop defense methods against adversarial examples. Since adversarial examples are constructed by adding noise to original images, a natural idea is to denoise adversarial examples before sending them to the target model.

The non-linear characteristics of neural networks were once considered to be the cause of adversarial examples [1]. In 2014, Goodfellow argued that adversarial examples arise from the linear characteristics of deep neural networks rather than the non-linear ones, which was supported by the Fast Gradient Sign Method (FGSM) [8]. Since the high-dimensional linear properties of deep networks are difficult to avoid in practical applications, defending against adversarial examples is all the more difficult. Small perturbations contained in adversarial examples (crafted for a specific classifier) can lead the classifier to make a wrong classification. How to defend against adversarial examples therefore remains a challenging problem. If an algorithm can eliminate the perturbation in adversarial examples while keeping the result visually consistent with the clean images, it can be used to defend against adversarial examples. In adversarial training [24] and defensive distillation [21], adversarial examples are fed back into the training process to train the model. Adversarial training adds adversarial examples to the training set for joint training, but this requires a large amount of computation. It has also been argued that no matter how many adversarial examples are added, there are always new adversarial examples that can deceive the network. Defensive distillation uses soft labels to soften the final model output and improve the robustness of the new model. However, this approach does not fundamentally solve the poor robustness of the model against adversarial examples. Therefore, reducing the impact of adversarial examples on neural networks remains a major challenge.

To defend against adversarial examples and improve the accuracy with which they are classified, we propose a new defense method. Because of its strong generative ability, we adopt a generative adversarial network (GAN) to model the image distribution and assist denoising. In summary, our proposed method has three objectives: 1) eliminate the perturbation effectively, 2) make the classifier classify correctly, 3) enhance the robustness of the classifier.

As shown in Fig. 1, 66.17%, 7.88% and 46.27% are the confidence scores with which the three images of the handwritten digit are predicted as the digit “5”, and 78.1%, 27.5% and 79.8% are the confidence scores with which the three images of the car are predicted as a car. It can be seen that the proposed method can be applied to both gray-scale images and RGB images.

Fig. 1

Denoising of two different images. From left to right: the clean image, the adversarial image, and the reconstructed image. (Best viewed in color.)

Our proposed method focuses on retaining the features of clean samples while denoising the adversarial samples, which reduces the training difficulty and speeds up fitting. Existing defense methods ignore the characteristics of clean samples, so their classification accuracy decreases when clean samples are denoised.

Our contributions are as follows. Firstly, a denoising model consisting of a generator and a discriminator is proposed, which models the approximate distributions of clean and adversarial examples in order to eliminate the difference between them.

Secondly, an isomorphic loss on images and labels between denoised adversarial examples and denoised clean examples is proposed to generate better denoised images.

Thirdly, the effectiveness of the defense is demonstrated on five datasets: MNIST, Fashion-MNIST, CIFAR10, CIFAR100 and ImageNet. Our defense model is tested against six attack methods, and the results show that the proposed method defends strongly against all of them.

The rest of the paper is organized as follows. In Section 2, we introduce the background of the problem and some commonly used attack and defense methods. In Section 3, we discuss the details of the proposed method, including its principle and the design of the objective function. Section 4 presents the experimental settings and results, including the experimental details and the algorithm settings. Finally, Section 5 summarizes the paper and discusses prospects for future work.

2 Related work

Problem.

Szegedy et al. [24] first discovered the perceptual inconsistency between humans and machine learning models. If an image is perturbed, a deep neural network may treat it as a completely different image and give a totally different prediction, while humans may not even see the difference [12]. Inputs that cause wrong predictions after such small changes are called adversarial examples. These blind spots make neural network models vulnerable to malicious attacks.

Therefore, we can use an approximate function to model the task of generating adversarial examples, as in Equation (1): $\underset{p_{x}}{\arg\min} \, \| p_{x} \| \quad s.t. \quad \left\{ \begin{matrix} y_{t} = f (x + p_{x}) \\ y = f (x) \\ y_{t} \neq y \\ x_{adv} = x + p_{x} \end{matrix} \right.$ (1) where f(·) denotes the classifier; x denotes the clean (ground-truth) image; x_adv denotes the adversarial example; p_x denotes the required perturbation; y denotes the ground-truth label; y_t denotes the output label for the adversarial example, with y ≠ y_t; and each dimension of the input images is scaled to [0,1].

Attack.

Many methods for generating adversarial examples have been proposed. FGSM (Fast Gradient Sign Method) is a basic method for generating adversarial examples. Goodfellow et al. [8] proposed this method to find an adversarial perturbation for a given input. Equation (2) gives the formula for calculating the noise: $p_{x} = \varepsilon \cdot sign (\nabla_{x} J (w, x, y_{t}))$ (2) where J(·, ·, ·) denotes the loss function used to train the deep neural network, ∇_x J(·, ·, ·) denotes the gradient of the loss function with respect to the input x, w denotes the trained network parameters, ε is a constant that controls the noise intensity, and sign denotes the sign function.

FGSM computes the sign of the gradient of the loss function, multiplies it by a constant, and adds the result to the clean image as noise to obtain an adversarial example. The advantages of this method are that it is fast and that the generated adversarial examples transfer well; the disadvantage is that the added noise can be removed easily, for example by median filtering.

Moosavi-Dezfooli [18] proposed the Deep-Fool method. It iteratively computes the minimum-norm perturbation for a given image: at each step it finds the projection of the input x onto the decision surface and moves x a small amount along that direction. The advantage of this approach is its high success rate in white-box attacks; the disadvantage is that it is hard to use for black-box attacks.

In addition, Madry [16] proposed the Projected Gradient Descent (PGD) attack. It is an iterative attack that can be regarded as K-step FGSM (K is the number of iterations): it takes one small step per iteration and clips the perturbation to a specified range after each step. In general, PGD attacks are stronger than FGSM.
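To make the difference between the single-step and the iterative attack concrete, the following is a minimal sketch in plain Python. It is not the attack code used in this paper: it assumes a toy logistic-regression classifier with hypothetical parameters `w` and `b`, a three-value input, and the gradient of the binary cross-entropy loss taken on the true label, as in the original FGSM formulation. The function names `fgsm` and `pgd` and all numeric values are illustrative only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a logistic-regression classifier p = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(w, b, x, y, eps):
    """Single step: x_adv = clip(x + eps * sign(gradient), 0, 1)."""
    g = loss_gradient(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, iters):
    """Iterative variant: repeated small signed-gradient steps, each followed
    by clipping to the eps-ball around x and to the valid range [0, 1]."""
    x_adv = list(x)
    for _ in range(iters):
        g = loss_gradient(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xi - eps, xi + eps), 0.0, 1.0)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy model and input (values chosen only for illustration).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.8, 0.2, 0.5], 1          # clean input scaled to [0, 1], true label 1
x_fgsm = fgsm(w, b, x, y, eps=0.3)
x_pgd = pgd(w, b, x, y, eps=0.3, step=0.1, iters=5)
```

In this toy setting the clean input is assigned to class 1, while the FGSM output falls on the other side of the decision boundary. The PGD output stays within the same ε-range because every step is clipped back to it.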

A momentum-based iterative attack, MIM (Momentum Iterative Method), was proposed by Dong [6]. It stabilizes the update direction and avoids poor local maxima during the iterations, thus generating more transferable adversarial examples. It is effective not only for white-box attacks but also for black-box attacks. Equally important but different in principle, the JSMA (Jacobian-based Saliency Map Attack) method proposed by Papernot [20] uses the Jacobian matrix of the outputs with respect to the inputs to build a saliency map. It can change the predicted category by modifying only a small number of input features. The C&W attack was proposed by Carlini [3]. It imposes two conditions for a successful attack: the adversarial and clean examples should be visually consistent, and the perturbation should be as small as possible while achieving a high classification error rate. A patch-based attack [3] was proposed by Liu in 2019. Its purpose is to make the region containing the patch the only effective region of interest (RoI), so that the classifier relies on this region and ignores the features of other regions.

In addition, empirical data show that adversarial examples can be transferred among models trained on the same task [19], and Goodfellow explains this phenomenon through the transferability of adversarial examples [8]. Further research indicates that adversarial examples occupy contiguous regions, or adversarial subspaces.

Defense.

There are several main defense methods based on adversarial learning. One representative method is MagNet [17]. It does not modify the protected classifier and assumes no knowledge of the process that generates the adversarial examples. MagNet contains one or more separate detector networks and a fine-tuning network. Its detector also needs to distinguish adversarial samples from clean samples, but it does so differently from other detection methods: MagNet distinguishes the two by approximating the manifold of normal samples. The MagNet paper discusses the difficulty of defending against white-box attacks and proposes a method for defending against gray-box attacks (inspired by randomness in cryptography). Its experiments show that MagNet is effective against both gray-box and black-box attacks. MagNet defends against adversarial samples from the perspective of detection, while our method defends from the perspective of modifying the input image.

The next method is Defense-GAN [17]. It uses the expressive capability of the WGAN [2] generative model to defend against adversarial samples. Defense-GAN is trained to model the distribution of unperturbed images. Its purpose is to find an approximation close to the clean sample and feed it to the classifier. It also does not modify the structure or training process of the classifier. Although Defense-GAN defends from the perspective of modifying the input image, it models the distribution of the adversarial sample after the perturbation has been removed, that is, it has no reference for the distribution of the clean image. In contrast, our method feeds the clean image into the generator to obtain prior knowledge of the distribution of clean samples, which reduces the training difficulty and also helps in denoising the adversarial samples.

The last defense method we introduce is APE-GAN [9], which motivated our work. APE-GAN defends from the perspective of changing the input image, that is, it processes the image before classification. However, it does not consider the distribution of clean samples, and it cannot handle more complex images. We therefore build on it so that more complex images can be processed. Moreover, our proposed method takes the distribution of clean samples into account, which reduces the difficulty and time of training. Our method aims to improve the generation quality of the generator while reducing the training difficulty, so that the network fits faster and denoises well. At the same time, we reduce the impact of adversarial examples on the classifier through multiple pairwise losses.

3 Our approach

3.1 Preliminary

In our method, a generative adversarial network is used to reduce the magnitude of the perturbation. The generator network is used for image denoising, and the discriminator network is used to discriminate between adversarial examples and real examples. During training, the discriminator tries to distinguish real and fake examples as accurately as possible. The generator minimizes the gap between the clean and adversarial examples at the image level, aiming to confuse the discriminator and thereby produce a better denoised adversarial example. The network architecture is shown in Fig. 2. The adversarial example x_adv is input into the generator, and the denoised adversarial example x_fake is obtained as its output. The clean image x_real is also input into the generator for denoising, which yields the denoised clean image x_dec. There are three pairs of loss functions: the loss between x_fake and x_real at the pixel level is the repair loss l_xf; the loss between x_real and x_dec at the pixel level is the recovery loss l_xd; and the loss between x_fake and x_dec at the pixel level is the isomorphic loss l_df. The generative adversarial network is trained with these three loss functions to obtain a good denoising result.

Fig. 2

Overview of the defense network. Clean samples and adversarial samples are input into the generator to obtain denoised clean samples and denoised adversarial samples, and a discriminator judges whether they are real or fake to achieve the defense.

In short, our method relies on the powerful generative ability of the generator, which denoises the adversarial samples while retaining the features of the clean samples contained in them.

3.2 Learning objective

Following the network structure and main ideas described above, three loss functions are used for training, each targeting a different objective: the repair loss, the recovery loss and the isomorphic loss. Each is explained below.

Repair loss. To ensure that the denoised adversarial samples are visually similar to the clean samples, we minimize the loss between the two and call it the repair loss l_xf, since it measures how well the adversarial samples are repaired. The repair loss contains two parts: the first is the Mean Square Error (MSE) loss between the denoised adversarial example and the clean example; the second is the Binary Cross Entropy (BCE) loss between the predicted label and the ground-truth label, as shown in Equation (3): $l_{xf} = l_{xf\_mse} + l_{xf\_bce}$ (3)

Among them, the role of l_{xf_mse} is to reduce the error between x_real and x_fake at the pixel level, prompting the generator to generate denoised images of higher quality, as shown in Equation (4): $l_{xf\_mse} = \frac{1}{WH} \sum_{w, h = 1}^{W, H} {({(x_{real})}_{w, h} - {(x_{fake})}_{w, h})}^{2}$ (4)

The label-level error for the binary (real versus fake) classification is the cross-entropy loss shown in Equation (5). It measures the difference between the ground-truth label and the predicted label of the denoised adversarial example. The purpose is to reduce the label-level loss between x_real and x_fake so that the denoised examples can be correctly classified, and to guide the generator. $l_{xf\_bce} = \sum_{n = 1}^{N} [\log D_{\theta_{D}} (x_{real}) + \log (1 - D_{\theta_{D}} (x_{fake}))]$ (5)

Recovery loss. To aid denoising while retaining the characteristics of clean samples and constraining the output quality of the generator, we want the distribution of clean samples after reconstruction by the encoder and decoder to be the same as their distribution before reconstruction. We call the corresponding loss the recovery loss l_xd, since it preserves the features of clean samples and improves the generation performance of the generator. This loss also has two parts, and its purpose is to reduce the distribution gap between the clean image and the reconstructed clean image so that the generator denoises better, as shown in Eqs. (6), (7) and (8): $l_{xd} = l_{xd\_mse} + l_{xd\_bce}$ (6) $l_{xd\_mse} = \frac{1}{WH} \sum_{w, h = 1}^{W, H} {({(x_{real})}_{w, h} - {(x_{dec})}_{w, h})}^{2}$ (7) $l_{xd\_bce} = \sum_{n = 1}^{N} [\log (1 - D_{\theta_{D}} (x_{dec}))]$ (8)

Isomorphic loss. Since x_dec and x_fake are both trained to minimize their loss with respect to the clean samples, we propose a loss that constrains x_dec and x_fake against each other. Both images are derived from the same clean image and have the same structure, so we call it the isomorphic loss l_df. This loss measures the similarity of x_dec and x_fake. Reducing it improves the generation quality of the generator and the denoising effect. Specifically, by reducing the pixel-level difference between x_dec and x_fake, the denoised clean image is brought closer to the denoised adversarial example, and their predicted labels become as close as possible, as shown in Eqs. (9), (10) and (11): $l_{df} = l_{df\_mse} + l_{df\_bce}$ (9) $l_{df\_mse} = \frac{1}{WH} \sum_{w, h = 1}^{W, H} {({(x_{dec})}_{w, h} - {(x_{fake})}_{w, h})}^{2}$ (10) $l_{df\_bce} = \sum_{n = 1}^{N} [\log D_{\theta_{D}} (x_{dec}) + \log (1 - D_{\theta_{D}} (x_{fake}))]$ (11)

In summary, the total loss function L_all is composed of the above three loss functions, and its purpose is to obtain high-quality denoised images, as shown in Equation (12): $L_{all} = \alpha_{1} l_{xf} + \alpha_{2} l_{xd} + \alpha_{3} l_{df}$ (12) The weight hyperparameters were chosen after testing multiple combinations; the values finally selected are α₁ = 0.7, α₂ = 0.2 and α₃ = 0.1.
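As an illustration of how the pixel-level terms and the weighting in Equation (12) fit together, here is a small sketch in plain Python. It is an assumption-laden example, not the training code of this paper: images are represented as flat lists of pixel values, the helper names `mse`, `pixel_losses` and `total_loss` are hypothetical, and the discriminator-based BCE terms of Eqs. (5), (8) and (11) are passed in as precomputed numbers (defaulting to zero) because they depend on a trained discriminator.

```python
def mse(a, b):
    """Mean squared error between two equally sized, flattened images."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pixel_losses(x_real, x_fake, x_dec):
    """Pixel-level parts of the repair, recovery and isomorphic losses."""
    return {
        "xf": mse(x_real, x_fake),   # repair: clean vs. denoised adversarial
        "xd": mse(x_real, x_dec),    # recovery: clean vs. denoised clean
        "df": mse(x_dec, x_fake),    # isomorphic: the two generator outputs
    }

def total_loss(pix, bce=None, alphas=(0.7, 0.2, 0.1)):
    """Weighted sum of the three losses; `bce` holds the optional
    discriminator terms, assumed here to be computed elsewhere."""
    bce = bce or {"xf": 0.0, "xd": 0.0, "df": 0.0}
    a1, a2, a3 = alphas
    return (a1 * (pix["xf"] + bce["xf"])
            + a2 * (pix["xd"] + bce["xd"])
            + a3 * (pix["df"] + bce["df"]))

# Hypothetical 4-pixel images, used only to exercise the functions.
x_real = [0.0, 0.5, 1.0, 0.5]
x_fake = [0.1, 0.5, 0.9, 0.5]
x_dec = [0.0, 0.4, 1.0, 0.6]
pix = pixel_losses(x_real, x_fake, x_dec)
loss = total_loss(pix)
```

With these weights the repair term dominates the objective, which matches the emphasis on restoring adversarial examples, while the recovery and isomorphic terms act as lighter regularizers.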

4 Experimental results and analysis

To verify the feasibility of the proposed model, qualitative and quantitative experiments are conducted on various defense methods and datasets. Specifically, the six attack methods described in Section 2 are used to generate adversarial examples so as to verify the effectiveness of the proposed defense method. Three different Convolutional Neural Network (CNN) classifiers are used across the datasets to classify the adversarial examples as correctly as possible. We use a general indicator (classification accuracy) and image quality indicators (peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [25]) to measure the effectiveness of our defense.

4.1 Datasets

We use five common datasets for the experiments, covering both grayscale and color images. The first grayscale dataset is the simple handwritten digit dataset MNIST [13], which contains ten classes (digits 0-9). The second is the clothing dataset Fashion-MNIST [26], which has relatively complex texture features and also contains ten classes (mainly clothing and bags). The grayscale images have a size of 28×28. The advantage of grayscale images is that they are relatively small and simpler to process. The color datasets are divided into the smaller CIFAR [10] series (CIFAR10 and CIFAR100) and the larger ImageNet dataset [11]. The images in the CIFAR series have a size of 32×32. The difference between CIFAR10 and CIFAR100 is the number of categories: CIFAR10 contains 10 classes while CIFAR100 contains 100 classes. ImageNet, the largest of these datasets, was collected by Li Fei-Fei and can be used for classification and object detection tasks. The ImageNet images we use have a size of 128×128 and are natural photographs. Among these datasets, MNIST was compiled by the National Institute of Standards and Technology, so experiments on it can be regarded as simulation experiments, whereas ImageNet was created to cover real-world objects, so our experiments include both simulation experiments and real-world experiments.

4.2 Experimental details

Training of the generative adversarial network: the entire network is trained for 70,00 epochs. The learning rate is initialized to 0.0002, and the Adam optimizer is used to update the parameters. The batch size is set to 32. The hardware is as follows. CPU: Intel i7-8700; GPU: RTX2080Ti-11G; memory: DDR4-3000-32G. The code runs under the PyTorch deep learning framework, and training the entire network takes about 5 days. The number of floating-point operations (FLOPs) is about 1.5B, mainly from the convolution and deconvolution operations and the fully connected layers in the network.

4.3 Algorithm structure

Algorithm 1 Training process algorithm (taking FGSM as an example)
Initialization:
    number of epochs: epochs = 200; learning rate = 0.0002; loss function parameters: α₁ = 0.7, α₂ = 0.2, α₃ = 0.1; batch size = 64; training data loader: train_loader
Pre-training:
    classification networks: LeNet, AlexNet
Generate data: {x_real, x_adv}
while it < epochs do
    number = 2
    for i in train_loader() do
        train G:
            G(x_real) → x_dec
            G(x_adv) → x_fake
            calculate l_{xf_mse}, l_{df_mse}, l_{xd_mse}
        if i % number == 0 then
            train D:
                D(x_dec, x_fake, x_real) → y_dec, y_fake, y_real
                calculate l_{xf_bce}, l_{df_bce}, l_{xd_bce}
        end if
    end for
    it ← it + 1
end while
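The update schedule in Algorithm 1 can be read as "train the generator on every batch, train the discriminator on every second batch". The short Python sketch below only illustrates that schedule; the function name `update_schedule` is hypothetical, batches are represented by their index, and no networks are actually trained.

```python
def update_schedule(num_batches, number=2):
    """Yield (batch_index, train_g, train_d) following Algorithm 1:
    G is updated on every batch, D only when the index is divisible
    by `number`."""
    for i in range(num_batches):
        yield i, True, (i % number == 0)

schedule = list(update_schedule(6))
g_updates = sum(1 for _, g, _ in schedule if g)
d_updates = sum(1 for _, _, d in schedule if d)
```

For six batches this gives six generator updates and three discriminator updates, so the generator receives twice as many updates as the discriminator.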

4.4 Qualitative experiments

Because showing every combination of attack method and dataset would be unwieldy, our qualitative experiments only show the defense against adversarial samples generated by the FGSM method [8]; all displayed images are correctly classified. Following the principle of FGSM, we use different perturbation multipliers for grayscale and RGB images: 0.3 for grayscale images and 0.15 for RGB images. These settings give the best attack effect, that is, the neural network misclassifies the image while the perturbation remains invisible to the human eye. As shown in Fig. 3, the denoised results of the proposed method are visually plausible. Fig. 4 shows the experiment on the Fashion-MNIST dataset, where the features of the dataset are also well preserved. For example, in the third row the effect is obvious and the features of the clothes are revealed. This also indicates that the denoising effect is easier to see on MNIST and Fashion-MNIST. The proposed model is also verified on RGB image datasets. As shown in Figs. 5 and 6, experiments are performed on CIFAR10 and CIFAR100 separately. Compared with CIFAR10, CIFAR100 better reflects the detailed features of the image, and the semantic features of the image are clearer. For completeness, experiments are also performed on the ImageNet dataset, as shown in Fig. 7. The experimental results show that although the texture of RGB images is more complex than that of grayscale images, the method still achieves good denoising and defense.

Fig. 3

Experimental results of the FGSM method on MNIST. The first row, “Normal”, shows clean samples; the second row, “Adv”, shows the generated adversarial examples; the third row, “Ours”, shows the results denoised by our method.

Fig. 4

Experimental results of the FGSM method on Fashion-MNIST. The first row, “Normal”, shows clean samples; the second row, “Adv”, shows the generated adversarial examples; the third row, “Ours”, shows the results denoised by our method.

Fig. 5

Experimental results of the FGSM method on CIFAR10. The first row, “Normal”, shows clean samples; the second row, “Adv”, shows the generated adversarial examples; the third row, “Ours”, shows the results denoised by our method.

Fig. 6

Experimental results of the FGSM method on CIFAR100. The first row, “Normal”, shows clean samples; the second row, “Adv”, shows the generated adversarial examples; the third row, “Ours”, shows the results denoised by our method.

Fig. 7

Experimental results of the FGSM method on ImageNet. The first row, “Normal”, shows clean samples; the second row, “Adv”, shows the generated adversarial examples; the third row, “Ours”, shows the results denoised by our method.

4.5 Quantitative experiments

The quantitative experiments use the six attack methods described above and report classification accuracy, peak signal-to-noise ratio, and structural similarity, as shown in Tables 1, 3 and 4. Table 1 reports classification accuracy. The "Base" column gives the classification accuracy of the undefended classification model on the adversarial examples, and the "Ours" column gives the classification accuracy on the samples denoised by the proposed model. The table covers adversarial examples generated from the five datasets with the six attack methods, together with the accuracy obtained by the three target classifiers. Table 1 shows that the proposed method recovers adversarial examples well and that the classification accuracy after denoising is significantly improved. For example, for the MIM attack on the complex ImageNet dataset, the classification accuracy after denoising increases from 22.6% to 78.7%.

Table 1
Classification accuracy of five datasets under six attack methods

ATTACK	MNIST_CNN		FMNIST_LeNet		CIFAR10_LeNet		CIFAR100_AlexNet		ImageNet_AlexNet
	Base	Ours	Base	Ours	Base	Ours	Base	Ours	Base	Ours
Normal	0.762	0.782	0.791	0.801	0.674	0.757	0.694	0.745	0.781	0.782
FGSM	0.27	0.761	0.26	0.768	0.502	0.712	0.512	0.723	0.275	0.798
PGD	0.343	0.763	0.322	0.762	0.388	0.71	0.352	0.782	0.234	0.752
MIM	0.318	0.754	0.321	0.765	0.399	0.707	0.355	0.745	0.226	0.787
Deep-Fool	0.265	0.652	0.299	0.752	0.272	0.724	0.326	0.788	0.208	0.745
JSMA	0.252	0.664	0.301	0.758	0.254	0.687	0.335	0.721	0.251	0.732
C&W	0.194	0.601	0.297	0.755	0.124	0.605	0.32	0.741	0.244	0.712

Table 2

Comparative study of three defense methods on five classifiers

Classifier\Defense	No attack	No defense	APE-GAN	Defense-GAN [17]	Our proposed method
MNIST_CNN	0.752	0.271	0.721	0.732	0.761
FMNIST_LeNet	0.761	0.262	0.715	0.742	0.768
CIFAR10_LeNet	0.674	0.502	0.701	0.706	0.712
CIFAR100_AlexNet	0.694	0.512	0.711	0.703	0.723
ImageNet_AlexNet	0.781	0.275	0.724	0.755	0.798

Table 3

PSNR (dB) under the six attack methods before and after denoising

ATTACK	MNIST_CNN		FMNIST_LeNet		CIFAR10_LeNet		CIFAR100_AlexNet		ImageNet_AlexNet
	Base	Ours	Base	Ours	Base	Ours	Base	Ours	Base	Ours
FGSM	25.11	26.63	29.04	31.49	29.85	30.43	30.21	30.23	33.21	33.34
PGD	25.25	27.87	29.22	30.86	26.65	27.72	30.24	31.32	33.14	34.22
MIM	26.66	28.24	29.23	30.81	25.56	26.76	30.25	31.33	32.25	33.16
Deep-Fool	25.01	26.32	25.13	26.33	25.03	26.35	25.33	26.45	24.22	25.13
JSMA	25.22	25.62	26.37	27.87	26.62	27.65	25.12	25.99	25.32	26.21
C&W	25.32	26.96	25.97	26.85	25.87	26.54	26.53	27.32	26.15	27.65

To assess how well the classifier is defended, we compare our proposed method with existing methods, as shown in Table 2. The results show that our method is more effective than the other two methods. This is clearest on the ImageNet dataset, where the classification accuracy of Defense-GAN [17] is 4.3 percentage points lower than that of our method and the accuracy of APE-GAN [9] is 7.4 percentage points lower. To evaluate the proposed model, it is necessary not only to classify the images correctly, but also to make the denoised image as similar as possible to the original image. Therefore, a quantitative analysis of the PSNR and SSIM of the denoised images and the adversarial examples is also carried out, as shown in Tables 3 and 4. PSNR is a common quality metric, used here to evaluate the quality of the generated image relative to the clean image. Its typical range is 20-40 dB. SSIM (structural similarity index) measures the similarity of two images and ranges from 0 to 1; when two images are identical, the SSIM equals 1. For both PSNR and SSIM, the larger the value, the better the image quality. $MSE (X, Y) = \frac{1}{H \times W} \sum_{i, j = 1}^{H, W} (X (i, j) - Y (i, j))^{2}$ (13) $PSNR (X, Y) = 10 \log_{10} (\frac{{(2^{n} - 1)}^{2}}{MSE})$ (14) where X and Y are the two images being compared, H and W are the height and width of the image respectively, and n is the number of bits per pixel, which is generally set to 8.
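Equations (13) and (14) can be checked with a few lines of plain Python. The sketch below assumes single-channel images stored as nested lists of integer pixel values with n = 8 bits per pixel; the function names and the 2×2 example images are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def mse(x, y):
    """Equation (13): mean squared error over an H x W image."""
    h, w = len(x), len(x[0])
    return sum((x[i][j] - y[i][j]) ** 2
               for i in range(h) for j in range(w)) / (h * w)

def psnr(x, y, bits=8):
    """Equation (14): PSNR in dB, with peak value 2^bits - 1."""
    err = mse(x, y)
    if err == 0:
        return math.inf          # identical images
    peak = 2 ** bits - 1
    return 10 * math.log10(peak ** 2 / err)

# Hypothetical 2x2 images: every pixel differs by exactly 10 gray levels.
clean = [[0, 128], [255, 64]]
noisy = [[10, 118], [245, 74]]
value = psnr(clean, noisy)
```

Here the MSE is 100, so the PSNR is 10·log10(255² / 100), roughly 28.1 dB, which lies inside the typical 20-40 dB range mentioned above.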

Table 4

SSIM for the six attack methods before and after denoising

ATTACK	MNIST_CNN		FMNIST_LeNet		CIFAR10_LeNet		CIFAR100_AlexNet		ImageNet_AlexNet
	Base	Ours	Base	Ours	Base	Ours	Base	Ours	Base	Ours
FGSM	0.823	0.844	0.815	0.822	0.832	0.843	0.811	0.826	0.836	0.841
PGD	0.711	0.721	0.727	0.730	0.724	0.729	0.735	0.741	0.738	0.744
MIM	0.761	0.766	0.755	0.767	0.745	0.754	0.751	0.759	0.746	0.753
Deep-Fool	0.731	0.736	0.754	0.760	0.726	0.734	0.744	0.752	0.729	0.731
JSMA	0.715	0.722	0.732	0.745	0.711	0.723	0.714	0.726	0.736	0.748
C&W	0.725	0.734	0.721	0.733	0.742	0.754	0.745	0.749	0.747	0.750

Tables 3 and 4 show the PSNR and SSIM of the proposed method under the six attack methods. The "Base" columns give the PSNR and SSIM computed between the adversarial examples and the clean images; the "Ours" columns give the PSNR and SSIM computed between the denoised images and the clean images. The increase in values is mainly due to the constraint of the isomorphic loss, which enhances the correlation between the adversarial examples and the clean examples.
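The "Base" versus "Ours" comparison can be sketched for SSIM as follows. This is a simplified illustration, not the procedure used to produce Table 4: it evaluates the SSIM formula of Wang et al. [25] once over the whole image, whereas the standard index averages it over local windows. The function name `global_ssim`, the stabilizing constants (0.01·L)² and (0.03·L)², and the 2×2 example images are assumptions of this sketch.

```python
def global_ssim(x, y, bits=8):
    """Single-window SSIM over the whole image (simplification, see lead-in).

    Images are lists of rows of pixel intensities with identical shapes.
    """
    px = [v for row in x for v in row]
    py = [v for row in y for v in row]
    if len(px) != len(py):
        raise ValueError("images must have the same number of pixels")
    n = len(px)
    mu_x, mu_y = sum(px) / n, sum(py) / n
    var_x = sum((v - mu_x) ** 2 for v in px) / n
    var_y = sum((v - mu_y) ** 2 for v in py) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(px, py)) / n
    dyn = 2 ** bits - 1                      # dynamic range L
    c1, c2 = (0.01 * dyn) ** 2, (0.03 * dyn) ** 2  # assumed default constants
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den


# Hypothetical images: the adversarial one is perturbed by +/-30,
# the denoised one retains a residual perturbation of +/-5.
clean = [[50, 200], [200, 50]]
adversarial = [[80, 170], [230, 20]]
denoised = [[55, 195], [205, 45]]

base = global_ssim(clean, adversarial)  # plays the role of a "Base" entry
ours = global_ssim(clean, denoised)     # plays the role of an "Ours" entry
print(f"Base SSIM = {base:.4f}, Ours SSIM = {ours:.4f}")
```

When the denoiser removes part of the perturbation, the second value is closer to 1 than the first, which is the pattern reported in each "Base"/"Ours" pair of Table 4.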

5 Conclusion

In this paper, we propose a new method to defend against invisible noise. A discriminator is used to judge the output of the generator so that the classification accuracy of clean samples is maintained, which improves both the defense ability of the network against adversarial samples and its robustness. We then applied six common attack methods to the network and tested the effectiveness of the proposed defense network through qualitative and quantitative experiments. The experiments demonstrate the denoising effect of our method and the robustness of the network. Our method still has some weaknesses. The most typical one is its limited ability to deal with patch-based attacks, which is mainly due to our generators. In the future, we will focus on improving the network for denoising complex adversarial examples and on combining it with new denoising methods to achieve better denoising at the semantic level.

Acknowledgments

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper. This work was supported by grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245) and the Fundamental Research Funds for the Central Universities (18CX02140A).

Footnotes

Except for the parameters, the attacker knows everything else about the target model.

References

1. Akhtar and Mian, Threat of adversarial attacks on deep learning in computer vision: A survey, IEEE Access 6 (2018), 14410–14430.

2. Arjovsky, Chintala and Bottou, Wasserstein generative adversarial networks. In International Conference on Learning Representations (ICLR), 2017.

3. Carlini and Wagner, Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

4. K.T., Muñoz-González, de Maupeou and Lupu E.C., Procedural noise adversarial examples for blackbox attacks on deep convolutional networks. In ACM SIGSAC Conference on Computer and Communications Security, pages 275–289. ACM, 2019.

5. Ding, Guo, Lei and Yun, One-shot face recognition via generative learning. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018.

6. Dong, Liao, Pang, Su, Zhu, Hu and Li, Boosting adversarial attacks with momentum. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 9185–9193, 2018.

7. Eykholt, Evtimov, Fernandes, Li, Rahmati, Xiao, Prakash, Kohno and Song, Robust physical-world attacks on deep learning visual classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1625–1634, 2018.

8. Goodfellow I.J., Shlens and Szegedy, Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

9. Jin, Shen, Zhang, Dai and Zhang, Ape-gan: Adversarial perturbation elimination with gan. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3842–3846. IEEE, 2019.

10. Krizhevsky, Learning multiple layers of features from tiny images. Technical report, 2009.

11. Krizhevsky, Sutskever and Hinton G.E., Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

12. Kurakin, Goodfellow I.J. and Bengio, Adversarial examples in the physical world. In International Conference on Learning Representations (ICLR), 2017.

13. Lecun, Bottou, Bengio and Haffner, Gradient-based learning applied to document recognition, 86 (1998), 2278–2324.

14. Liu, Sun, Fang and Di, Cross-modal zero-shot-learning for tactile object recognition, IEEE Transactions on Systems Man & Cybernetics Systems PP(99) (2018), 1–9.

15. Liu, Yang, Liu, Song, Chen and Li, Dpatch: An adversarial patch attack on object detectors. In SafeAI@AAAI, 2019.

16. Madry, Makelov, Schmidt, Tsipras and Vladu, Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.

17. Meng and Chen, Magnet: a two-pronged defense against adversarial examples. In ACM SIGSAC Conference on Computer and Communications Security, pages 135–147, 2017.

18. Moosavi-Dezfooli S.-M., Fawzi and Frossard, Deepfool: A simple and accurate method to fool deep neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2574–2582. IEEE Computer Society, 2016.

19. Papernot, McDaniel and Goodfellow, Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.

20. Papernot, McDaniel, Jha, Fredrikson, Celik Z.B. and Swami, The limitations of deep learning in adversarial settings. In IEEE Symposium on Security and Privacy (SP), pages 372–387. IEEE, 2016.

21. Papernot, McDaniel, Wu, Jha and Swami, Distillation as a defense to adversarial perturbations against deep neural networks. In IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.

22. Samangouei, Kabkab and Chellappa, Defense-gan: Protecting classifiers against adversarial attacks using generative models. In International Conference on Learning Representations (ICLR), 2018.

23. Shen, Li, Zhang, Tao and Zeng, Compressed sensing-based inpainting of aqua moderate resolution imaging spectroradiometer band 6 using adaptive spectrum-weighted sparse bayesian dictionary learning, IEEE Trans Geosci Remote Sens 52(2) (2014), 894–906.

24. Szegedy, Zaremba, Sutskever, Bruna, Erhan, Goodfellow I.J. and Fergus, Intriguing properties of neural networks. In Yoshua Bengio and Yann LeCun, editors, International Conference on Learning Representations (ICLR), 2014.

25. Wang, Bovik A.C., Sheikh H.R. and Simoncelli E.P., Image quality assessment: from error visibility to structural similarity, Image Processing, IEEE Transactions on 13(4) (2004), 600–612.

26. Xiao, Rasul and Vollgraf, Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

27. Yang, Wu, Liang, Sun and Wang, Self-paced balance learning for clinical skin disease recognition, IEEE Transactions on Neural Networks and Learning Systems PP(99) (2019), 1–15.