
Selective Untargeted Evasion Attack: An Adversarial Example That Will Not Be Classified as Certain Avoided Classes

HYUN KWON 1, YONGCHUL KIM 2, HYUNSOO YOON 1, AND DAESEON CHOI 3, (Member, IEEE)

1 School of Computing, Korea Advanced Institute of Science and Technology, Daejeon 34141, South Korea
2 Department of Electrical Engineering, Korea Military Academy, Seoul 01819, South Korea
3 Department of Medical Information, Kongju National University, Gongju-si 32588, South Korea

Corresponding author: Daeseon Choi (sunchoi@kongju.ac.kr)

This work was supported in part by the Institute for Information and Communications Technology Promotion (IITP) under Grant 2016-0-00173 and in part by the National Research Foundation of Korea (NRF) under Grant 2017R1A2B4006026.

ABSTRACT Deep neural networks (DNNs) have useful applications in machine learning tasks involving recognition and pattern analysis. Despite the favorable applications of DNNs, these systems can be exploited by adversarial examples. An adversarial example, which is created by adding a small amount of noise to an original sample, can cause misclassification by the DNN. Under specific circumstances, it may be necessary to create a selective untargeted adversarial example that will not be classified as certain avoided classes. Such is the case, for example, if a modified tank cover can cause misclassification by a DNN, but the bandit equipped with the DNN must misclassify the modified tank as a class other than certain avoided classes, such as a tank, armored vehicle, or self-propelled gun. That is, selective untargeted adversarial examples are needed that will not be perceived as certain classes, such as tanks, armored vehicles, or self-propelled guns. In this study, we propose a selective untargeted adversarial example that exhibits 100% attack success with minimum distortion. The proposed scheme creates a selective untargeted adversarial example that will not be classified as certain avoided classes while minimizing distortion of the original sample. To generate such adversarial examples, a transformation is performed that minimizes both the probability of the avoided classes and the distortion of the original sample. As experimental datasets, we used MNIST and CIFAR-10, with the TensorFlow library. The experimental results demonstrate that the proposed scheme creates a selective untargeted adversarial example that exhibits 100% attack success with minimum distortion (1.325 and 34.762 for MNIST and CIFAR-10, respectively).

INDEX TERMS Machine learning, adversarial example, deep neural network (DNN), avoided classes.

I. INTRODUCTION

Deep neural networks (DNNs) [1] exhibit superior performance in machine learning tasks involving image recognition [2], speech recognition [3], pattern analysis [4], and intrusion detection [5]. However, an example created by adding a small amount of noise to an original sample can result in misclassification by the DNN. Such adversarial examples [6] pose serious security threats to DNNs. For example, if a modified left-turn symbol is incorrectly categorized by the DNN as a U-turn symbol, an autonomous vehicle that has been configured with the DNN will also incorrectly categorize the modified symbol as a U-turn symbol; however, a person will correctly categorize the modified symbol as a left-turn symbol. Research on these adversarial examples is actively under way.

The associate editor coordinating the review of this manuscript and approving it for publication was Guitao Cao.

Adversarial examples are classified as targeted and untargeted. A targeted adversarial example is one that is misclassified as a particular target class chosen by the attacker. On the other hand, an untargeted adversarial example is one that is misclassified as any arbitrary class other than the original class. There are several advantages and disadvantages associated with the two methods. The targeted adversarial example has the disadvantage of requiring more processing time and introducing greater distortion; however, its advantage is that the attacker can perform a sophisticated attack in which the example is misclassified as a target class chosen by the attacker.


The untargeted adversarial example has the disadvantage of not being applicable to sophisticated attacks; however, it has the advantage of requiring less processing time and introducing less distortion.

In specific situations such as military scenarios, an attacker may need to create a selective untargeted adversarial example that will not be classified as certain classes while minimizing distortion of the original sample. For example, it may be necessary for an enemy aircraft to misidentify allied armored vehicles as objects other than allied armored vehicles or artillery vehicles. In this case, it is important to reduce distortion while simultaneously maintaining 100% attack success and preventing recognition as certain particular classes.

In this paper, we propose a selective untargeted adversarial example exhibiting 100% attack success with minimum distortion. The proposed scheme can create an untargeted adversarial example that will not be classified as certain classes while minimizing distortion of the original sample. The contributions of this paper include the following:

• To the best of our knowledge, this is the first study to present a selective untargeted adversarial example in which the attacker can select several classes that the example should not be recognized as. In addition, we systematically organize the structure and principle of the proposed scheme compared with the targeted and untargeted adversarial examples.

• We analyzed the attack success rates and distortions for targeted, untargeted, and the proposed adversarial examples. Although the proposed method introduces some distortion compared with the untargeted adversarial example, it maintains a 100% success rate, remains similar to the original to human perception, and avoids the specified avoided classes. We also analyzed the average distortion according to the number of avoided classes that were input to the proposed method.

• We demonstrate the effectiveness of our proposed scheme through experiments using the MNIST [7] and CIFAR-10 [8] datasets. The proposed method exhibits performance similar to that of the untargeted adversarial example in terms of human perception, distortion, and attack success rate.

The remainder of this paper is structured as follows: In Section 2, we introduce the background and related works. In Section 3, we present the definition of the problem. In Section 4, we present the proposed scheme. In Section 5, we present the experiment and its evaluation. The proposed scheme is discussed in Section 6, and Section 7 finally concludes the paper.

II. BACKGROUND AND RELATED WORK

The concept of adversarial examples was introduced by Szegedy et al. [6]. An adversarial example, which is created by adding a small amount of noise to the original image, causes misclassification by a DNN while minimizing distortion of the image.

Section II-A presents an overview of neural networks. The process of generating adversarial examples is described in Section II-B. Section II-C presents defense methods against adversarial examples. Adversarial examples can be categorized based on target model information, recognition, the distance measured, the method of generation, and the method of recognizing different classes for multiple models; these are described in Sections II-D, II-E, II-F, II-G, and II-H, respectively.

A. NEURAL NETWORKS

A neural network [2] is a machine learning model that forms a network based on a combination of nodes and weights. The structure of a neural network consists of an input layer, hidden layers, and an output layer. The input layer contains one neuron for each input variable, matched 1:1. Hidden layers contain neurons created from combinations of the previous neurons and weights. The complexity of the model is determined by the number of hidden layers. In the output layer, neurons are created by combining the neurons and weights of the last hidden layer, and the number of output neurons is determined by the type of output to be predicted. Neurons in the hidden and output layers calculate the weighted sum of the values from the previous layer. These neurons also apply an activation function and pass the result as the input value to the next layer. The neural network learns from the training data and sets the parameters of each layer by selecting the parameters with optimal loss values using backpropagation and gradient descent.
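To make this description concrete, the following is a minimal TensorFlow/Keras sketch of such a network; the layer sizes and optimizer are hypothetical choices for illustration, not the models used in this paper.

```python
# Minimal sketch (not the paper's model): a small feed-forward network in
# TensorFlow/Keras illustrating input, hidden, and output layers trained with
# backpropagation and gradient descent. Layer sizes are hypothetical.
import tensorflow as tf

def build_mlp(input_dim: int = 784, num_classes: int = 10) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),          # one neuron per input variable
        tf.keras.layers.Dense(128, activation="relu"),      # hidden layer: weighted sum + activation
        tf.keras.layers.Dense(64, activation="relu"),       # model complexity grows with depth
        tf.keras.layers.Dense(num_classes, activation="softmax"),  # output size set by the prediction task
    ])
    # A gradient-descent-style optimizer minimizes the loss via backpropagation.
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```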

B. ADVERSARIAL EXAMPLE GENERATION

The structure used for creating an adversarial example consists of two elements: a transformer and a target model. The transformer accepts the original class y and the original sample x as inputs. The transformer then creates a transformed example x* = x + δ with noise value δ and provides the transformed example x* to the target model. As feedback on the transformed example, the target model provides the transformer with the classification results (i.e., the loss). The transformer changes the noise value δ such that the probabilities of other classes become higher than the probability of the original class, while minimizing the noise value δ.

C. DEFENSES AGAINST ADVERSARIAL EXAMPLES

Defenses against adversarial examples can be divided into two types: input data modification [9], [10] and classifier modification [6], [11], [12]. Input data modification can be divided into three methods: the filtering module [9], [10], feature squeezing [13], and MagNet [14] methods. First, the filtering module method [10] eliminates adversarial perturbations using generative adversarial nets [15]. Although this method preserves the classification accuracy of the original sample, it requires an additional module process to generate the filtering module. Second, the feature squeezing method [13] modifies the input samples.


This method can detect an adversarial example by reducing the depth of each pixel in an image and measuring the difference between each pair of corresponding pixels. However, a separate processing procedure is required to alter the input data. MagNet [14] consists of a reformer and a detector for resisting adversarial example attacks. In this method, the reformer maps the input data to the nearest original sample, and the detector detects the adversarial example by measuring large distances. However, the MagNet method is vulnerable to white box attacks and requires an additional process for creating the reformer and detector modules.

Classifier modification can be divided into two methods: the defensive distillation method [12] and the adversarial training method [6], [11]. The defensive distillation method [12] resists adversarial example attacks by blocking the calculation of gradient descent. To block the gradient descent calculation, this method uses two neural networks: the output class probabilities of the first classifier are used as input for the second stage of classifier training. However, this method is vulnerable to white box attacks and requires additional structural improvements. On the other hand, the adversarial training method [6], [11] imposes an additional learning process on adversarial examples. However, this method has a high probability of reducing the classification accuracy of the original sample.

D. CATEGORIZATION BASED ON TARGET MODEL INFORMATION

Depending on the amount of information available concerning the target, an adversarial example can be categorized as a white box attack [16], [17] or a black box attack [18]–[20]. A white box attack occurs when an attacker can access target model information such as the class probabilities of the output, the model parameters, and the model architecture. Therefore, the success rate of white box attacks is almost 100%. A black box attack occurs when an attacker cannot access information associated with the target model. Black box attacks typically involve transfer attacks [18], substitute networks [19], and universal perturbation methods [21]. A transfer attack attacks a black box model using an adversarial example created on a local model that is known to the attacker. The substitute network method creates a model similar to the black box model through a series of queries on the black box model and then attacks using adversarial examples created on this similar model. The universal perturbation method creates universal noise that can be mistakenly recognized over several original images, which can then be used to attack a black box model.

E. CATEGORIZATION BASED ON THE RECOGNITION OF ADVERSARIAL EXAMPLES

Adversarial examples can be divided into two types according to the wrong class that is intended for misrecognition by the target model [16], [22]: targeted adversarial examples and untargeted adversarial examples. In the first type, the targeted adversarial example is misclassified by the target model as a particular intended class chosen by the attacker and can be expressed mathematically as follows:

Given a target class y, an original sample x ∈ X, and a target model, an optimization problem creates a targeted adversarial example x* as follows:

x* : argmin_{x*} L(x*, x)  such that  f(x*) = y,   (1)

where L(·) is a measure of the distance between the adversarial example x* and the original sample x, argmin_x L(x) is the value of x at which the function L(x) is minimal, and f(·) is the operation function of the target model that returns classification results for the input values.

In the second type, the untargeted adversarial example is misclassified by the target model as any class other than the original class; it can be expressed mathematically as follows: Given an original class y, an original sample x ∈ X, and a target model, an optimization problem is used to create an untargeted adversarial example x* as follows:

x* : argmin_{x*} L(x*, x)  such that  f(x*) ≠ y.   (2)

Because the untargeted adversarial example has the advantages of shorter learning time and less distortion compared with the targeted adversarial example, we focus on a scenario involving an untargeted adversarial example in this study.

F. CATEGORIZATION BASED ON THE DISTANCE MEASURED

In order to measure the distortion [16] between the adversarial example and the original sample, three distance measures are used: L0, L2, and L∞. The L0 distance counts the number of changed pixels:

Σ_{i=0}^{n} ‖x_i* − x_i‖_0,

where x_i* is the i-th pixel in the adversarial example x* and x_i is the i-th pixel in the original sample x. The L2 distance is the square root of the sum of the squared differences between each pair of corresponding pixels:

√( Σ_{i=0}^{n} (x_i* − x_i)^2 ).

The L∞ distance is the maximum difference between x_i and x_i*:

max_i |x_i* − x_i|.   (3)

The smaller these three distances are, the more similar the adversarial example is to the original sample from a human perspective. In this study, we use the L2 distance measure.
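As a small illustration of these measures, the sketch below computes the three distances for a pair of NumPy arrays; the function names are hypothetical helpers, not part of the paper's code.

```python
# Minimal sketch (assumed helpers): L0, L2, and L-infinity distances between an
# original sample x and an adversarial example x_adv, given as NumPy arrays.
import numpy as np

def l0_distance(x: np.ndarray, x_adv: np.ndarray) -> int:
    # Number of pixels that changed at all.
    return int(np.sum(x != x_adv))

def l2_distance(x: np.ndarray, x_adv: np.ndarray) -> float:
    # Square root of the sum of squared per-pixel differences (used in this paper).
    return float(np.sqrt(np.sum((x_adv - x) ** 2)))

def linf_distance(x: np.ndarray, x_adv: np.ndarray) -> float:
    # Largest absolute per-pixel difference.
    return float(np.max(np.abs(x_adv - x)))
```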


TABLE 1. Class scores of an adversarial example: ‘‘3’’ → ‘‘9’’.

G. CATEGORIZATION BASED ON THE ADVERSARIAL EXAMPLE GENERATION METHOD

There are five typical attacks that create adversarial examples: the fast-gradient sign method (FGSM) [11], iterative FGSM (I-FGSM) [23], DeepFool [24], the Jacobian-based saliency map attack (JSMA) [17], and the Carlini–Wagner attack [16]. FGSM [11] obtains x* using the L∞ distance:

x* = x + ε · sign(∇ loss_{F,t}(x)),   (4)

where F is the object function and t is a target class. In the single step of FGSM, the input is changed from the original x by ε along the sign of the gradient, and x* is obtained through this optimization. It is a simple method that demonstrates good performance. I-FGSM [23] is an extension of FGSM. Instead of changing the input by the amount ε in a single step, a smaller amount α is applied at every step, and the result is eventually clipped by ε:

x_i* = x_{i−1}* − clip_ε( α · sign(∇ loss_{F,t}(x_{i−1}*)) ).   (5)

As I-FGSM creates the adversarial example over a given number of iterations on the target model, it exhibits a higher attack success rate in white box attacks and thus better performance than FGSM.
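The following is a minimal TensorFlow sketch of FGSM and I-FGSM following Eqs. (4) and (5) as written above (including their sign conventions); it assumes a tf.keras classifier `model` with softmax outputs, a target label `t`, and hypothetical values for ε, α, and the step count, and it is not the authors' implementation.

```python
# Minimal sketch of Eqs. (4) and (5) (assumptions: `model` is a tf.keras
# classifier with softmax outputs, `x` is a batched input tensor, `t` holds
# target labels; eps, alpha, and steps are hypothetical values).
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm(model, x, t, eps=0.1):
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(t, model(x))                      # loss_{F,t}(x)
    grad = tape.gradient(loss, x)
    return x + eps * tf.sign(grad)                       # Eq. (4) as written above

def i_fgsm(model, x, t, eps=0.1, alpha=0.01, steps=10):
    x = tf.convert_to_tensor(x)
    x_adv = tf.identity(x)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)
            loss = loss_fn(t, model(x_adv))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv - alpha * tf.sign(grad)              # small step per iteration (Eq. (5))
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # total perturbation clipped by eps
    return x_adv
```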

The DeepFool method [24] creates an adversarial example in a manner that is more efficient than FGSM and produces an example more similar to the original image. To create an adversarial example, this method searches for x* using a linearization approximation of the neural network. However, DeepFool is a more complicated process than FGSM because the neural network is not completely linear and many iterations are required.

The JSMA method [17] is a targeted attack using a simple iterative method. To induce minimum distortion, the method focuses on a component that reduces the adversarial example's saliency value. The saliency value, as a measure of an element, indicates how strongly it determines the output class of the model. This method performs targeted attacks with minimal distortion by reducing saliency values, but it is computationally time-consuming.

The Carlini–Wagner attack [16] is a state-of-the-art attack method with 100% attack success and exhibits superior performance compared with FGSM and I-FGSM. The key principle of this method is to optimize a different objective function:

D(x*, x) + c · f(x*).

That is, this method adds an attack-success loss term c · f(x*) to the distance objective D(x*, x) instead of using D(x*, x) alone. In addition, this method can control the attack success rate, even at the cost of some increase in distortion, by incorporating a confidence value. We apply a modified Carlini–Wagner attack to construct the proposed method.
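As a brief illustration of the Carlini–Wagner-style attack-success term, the sketch below computes f(x*) for a target class t with a confidence margin κ from the model's class scores; the function and its arguments are assumptions for illustration only.

```python
# Minimal sketch (assumption: `logits` holds the class scores Z(x*) of the
# target model for one sample) of a Carlini–Wagner-style attack-success term
# f(x*) for a target class t with confidence margin kappa; illustrative only.
import tensorflow as tf

def cw_target_loss(logits: tf.Tensor, t: int, kappa: float = 0.0) -> tf.Tensor:
    # f(x*) = max( max{Z(x*)_j : j != t} - Z(x*)_t, -kappa ):
    # the term stops pushing once class t already wins by margin kappa.
    num_classes = tf.shape(logits)[0]
    mask = tf.one_hot(t, num_classes, on_value=float("-inf"), off_value=0.0)
    max_other = tf.reduce_max(logits + mask)       # best score among classes j != t
    return tf.maximum(max_other - logits[t], -kappa)

# The full objective D(x*, x) + c * f(x*) is then minimized over the noise delta.
```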

H. METHOD OF RECOGNIZING DIFFERENT CLASSES OF MULTIPLE MODELS

Adversarial example methods that enable multiple models to recognize different classes by sequentially modifying each image have recently been proposed. One of these methods, the friend-safe method [25], creates a friend-safe adversarial example that is properly recognized by a friendly classifier and is not properly recognized by an enemy classifier when friends and enemies are combined, as in a military situation. Because this friend-safe adversarial example includes minimal distortion, a human cannot discern any differences from the original sample. Another method creates a multi-targeted adversarial example [26] that is meant to be recognized as a different class by each model when several models are being attacked. For example, if there are models A, B, and C, the attacker can create an adversarial example that makes A incorrectly recognize it as a right turn, makes B recognize it as a left turn, and makes C recognize it as a U-turn. This method is an extended version of the friend-safe adversarial example, and its performance was evaluated on the MNIST dataset.

III. PROBLEM DEFINITION

Table 1 presents the class scores for an adversarial example that has been incorrectly classified as class ‘‘9’’ instead of class ‘‘3’’. This table illustrates the process of generating an adversarial example, wherein the process continues until the score of the wrong class is slightly higher than that of the original class, because the distance from the original sample is being minimized.

The concept of decision boundaries is illustrated in Fig. 1. The figure presents the decision boundaries of the model M, which misclassifies the untargeted, targeted, and proposed adversarial examples for the original sample x. Each line denotes a decision boundary of the model function f(·). For example, in Fig. 1, the original sample x is correctly classified as the truck class, y_truck, in the decision boundary area of the truck class:

f(x) = y_truck.   (6)

The distance between the original sample and the adversarial example is generally minimized in all three methods; the difference among the three methods is the set of classes as which x* may be misidentified.

FIGURE 1. Examples of decision boundaries of model M for untargeted, targeted, and proposed adversarial examples.

Let x* be x_u* in the case of an untargeted adversarial example, x_t* in the case of a targeted adversarial example, and x_p* in the case of the proposed adversarial example. First, in Fig. 1 (a), the untargeted adversarial method generates the x* nearest to x that is recognized as an incorrect class, not that of x. For example, in Fig. 1 (a), the untargeted adversarial method generates x_u* in the car class closest to x, which belongs to the truck class:

f(x_u*) ≠ y_truck.   (7)

Second, in Fig. 1 (b), the targeted adversarial method generates x_t* at the closest distance from x to the target class chosen by the attacker. For example, in Fig. 1 (b), the targeted adversarial method generates x_t* at the closest distance from the truck-class sample x to the horse class chosen by the attacker:

f(x_t*) = y_horse.   (8)

Third, in Fig. 1 (c), the proposed method generates x_p* at the nearest distance from x among the remaining classes, excluding the avoided classes chosen by the attacker. For example, in Fig. 1 (c), the proposed method generates x_p* at the closest distance from the truck-class sample x to the remaining classes, excluding the car and boat classes that the attacker must avoid. In this case, the avoided classes are vehicles (car, boat, and truck):

f(x_p*) ≠ y_car,  f(x_p*) ≠ y_boat,  and  f(x_p*) ≠ y_truck.   (9)

In this paper, we propose a selective untargeted adversarial example that maintains a 100% attack success rate and has a smaller distortion than the targeted adversarial example.
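As a small illustration of the constraint in Eq. (9), the following hypothetical helper checks whether a prediction stays outside the attacker's avoided classes; the class indices in the example follow the usual CIFAR-10 ordering, which is an assumption here.

```python
# Minimal sketch (assumed helper): the selective untargeted condition of
# Eq. (9) is satisfied when the predicted class of x* is not in the set of
# avoided classes chosen by the attacker.
import numpy as np

def satisfies_selective_untargeted(probs: np.ndarray, avoided_classes: set) -> bool:
    predicted = int(np.argmax(probs))        # f(x*) = class with the highest score
    return predicted not in avoided_classes  # must differ from every avoided class z_i

# Example: assumed CIFAR-10-style indices, avoiding car (1), ship (8), truck (9).
probs = np.array([0.05, 0.10, 0.02, 0.03, 0.40, 0.05, 0.05, 0.20, 0.05, 0.05])
print(satisfies_selective_untargeted(probs, {1, 8, 9}))  # True: predicted class 4 (deer)
```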

IV. PROPOSED SCHEME

The proposed architecture for creating a selective untargeted adversarial example comprises a transformer and a target model M, as shown in Fig. 2. The transformer takes the original class y, the original sample x, and the avoided classes z_i (1 ≤ i ≤ n) as input values. The transformer then creates the transformed example as output and provides it to the target model M. The target model M takes the transformed example as an input value and provides the results of classification (i.e., the loss) as feedback to the transformer.

FIGURE 2. Schematic of the proposed architecture.

The proposed scheme aims to create a transformed example x* that is misclassified by model M as any wrong class other than the specific avoided classes while minimizing the distance between the transformed example x* and the original sample x. In the mathematical expressions, the operation function of M is denoted by f(x). Given the avoided classes z_i (1 ≤ i ≤ n), the original sample x, and a pretrained model M, this optimization problem creates an adversarial example x*:

x* : argmin_{x*} L(x*, x)  such that  f(x*) ≠ z_i (1 ≤ i ≤ n),   (10)

where L(·) denotes the distance between the transformed example x* and the original sample x. To achieve the objective of the proposed scheme, the transformer creates a selective untargeted adversarial example x* by considering the avoided classes z_i and the original sample x as input values. The method proposed in this study comprises a transformer architecture, which is a modification of the methods in [16], [27], and x*, which is defined as follows:

x* = x + δ,   (11)

where δ is the modifier, indicating the amount of noise. The model M takes x* as the input value from the transformer. As feedback, the model M provides the loss results for x* to the transformer. The transformer then calculates the total loss loss_T and changes the transformed example x* through the iterative process above to optimally minimize loss_T. This total loss is defined as follows:

loss_T = loss_distortion + loss_attack,   (12)

where loss_distortion denotes the distortion function of the transformed example and loss_attack denotes the classification loss function of model M.

loss_distortion denotes the distance between the transformed example x* and the original sample x:

loss_distortion = ‖δ‖.   (13)

The distortion loss function is given by the distance ‖x* − x‖ = ‖δ‖. To satisfy f(x*) ≠ z_i (1 ≤ i ≤ n), loss_attack must be minimized:

loss_attack = c · Σ_{i}^{n} g_i(x*),   (14)

where g_i(k) = Z(k)_{z_i} − max{ Z(k)_j : j ≠ z_i }. The value c denotes the weight of loss_attack, with an initial value of 1. Z(·) [16], [17] provides the class probabilities predicted by the model M. By optimally minimizing loss_attack, f(x*) will assign a higher probability to a wrong class than to the avoided classes.

Detailed procedures involved in creating the selective untargeted adversarial example are presented in Algorithm 1.

Algorithm 1 Selective Untargeted Adversarial Example Generation

Input: original sample x, avoided classes z_i (1 ≤ i ≤ n), iteration count r, weighted value c
Output: adversarial example x*
  δ ← 0
  z ← z_i
  for r steps do
    x* ← x + δ
    loss_a ← Σ_{i}^{n} ( Z(x*)_{z_i} − max{ Z(x*)_j : j ≠ z_i } )
    loss_T ← ‖δ‖ + c × loss_a
    Update δ by minimizing loss_T
  end for
  return x*
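A compact TensorFlow sketch of the transformer loop in Algorithm 1 is given below. It assumes a tf.keras model `model` that returns class scores for a batched input, a single sample `x` with a leading batch dimension, and hypothetical hyperparameters; the authors' actual settings are listed in Table 7 and are not reproduced here.

```python
# Minimal sketch of Algorithm 1 (assumptions: `model` is a tf.keras classifier
# returning class scores Z(x) for a batched input; `x` has shape (1, H, W, C);
# `avoided` is a list of avoided class indices z_i). Hyperparameters are
# hypothetical, not the settings from Table 7.
import tensorflow as tf

def selective_untargeted_attack(model, x, avoided, steps=1000, c=1.0, lr=0.1):
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    delta = tf.Variable(tf.zeros_like(x))                # delta <- 0
    opt = tf.keras.optimizers.Adam(learning_rate=lr)     # Adam optimizer, as in the paper [30]

    for _ in range(steps):                               # for r steps do
        with tf.GradientTape() as tape:
            x_adv = x + delta                            # x* <- x + delta
            z = model(x_adv)[0]                          # class scores Z(x*)
            loss_a = 0.0
            for zi in avoided:                           # loss_a <- sum_i g_i(x*), Eq. (14)
                mask = tf.one_hot(zi, tf.shape(z)[0],
                                  on_value=float("-inf"), off_value=0.0)
                max_other = tf.reduce_max(z + mask)      # max{Z(x*)_j : j != z_i}
                # g_i(x*) = Z(x*)_{z_i} - max{Z(x*)_j : j != z_i}
                # (a max(g_i, 0) clamp, as in [16], is a common refinement)
                loss_a += z[zi] - max_other
            loss_t = tf.norm(delta) + c * loss_a         # loss_T = ||delta|| + c * loss_a
        grads = tape.gradient(loss_t, [delta])
        opt.apply_gradients(zip(grads, [delta]))         # update delta to minimize loss_T
    return x + delta                                     # return x*
```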

V. EXPERIMENT AND EVALUATION

We demonstrate, through experiments, that the proposed scheme creates a selective untargeted adversarial example that is misclassified by model M as a wrong class other than the avoided classes while minimizing the distortion from the original sample. We used TensorFlow [28] as the machine learning library.

A. DATASETS

MNIST [7] and CIFAR-10 [8] were used in the experiment. MNIST is a standard dataset of handwritten images (the digits 0 to 9). MNIST comprises (28 × 28 × 1)-pixel matrices and has the advantages of fast learning time and ease of use in experiments because of the one-dimensionality (single channel) of the images. We used 10,000 test data and 60,000 training data from MNIST. CIFAR-10 includes color images of 10 classes: planes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. CIFAR-10 comprises (32 × 32 × 3)-pixel matrices, i.e., three-dimensional images. The dataset has been widely used in machine learning experiments. CIFAR-10 includes 10,000 test data and 50,000 training data.
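For reference, a minimal sketch of loading the two datasets with tf.keras.datasets is shown below; this is an assumed setup rather than the paper's preprocessing.

```python
# Minimal sketch (assumed setup, not the paper's code): loading MNIST and
# CIFAR-10 with tf.keras.datasets and checking the shapes described above.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, x_test.shape)    # (60000, 28, 28) (10000, 28, 28): 28 x 28 x 1 once a channel axis is added

(cx_train, cy_train), (cx_test, cy_test) = tf.keras.datasets.cifar10.load_data()
print(cx_train.shape, cx_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)

# Pixels are commonly scaled to [0, 1] before training and attack generation.
x_train = x_train.astype("float32") / 255.0
cx_train = cx_train.astype("float32") / 255.0
```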

B. PRETRAINING OF THE TARGET MODEL

Table 6 in the Appendix presents the MNIST model used to create the proposed architecture, which is a convolutional neural network architecture [29]. Table 7 presents the MNIST model parameters. The MNIST model M demonstrated a 99% accuracy rate on the 10,000 test data.

Table 8 in the Appendix presents the CIFAR-10 model, a VGG19 network [2], used for creating the proposed architecture. Table 7 presents the CIFAR-10 model parameters. The CIFAR-10 model M demonstrated a 90% accuracy rate on the 10,000 test data.

Adam [30] was used as the optimizer of the transformer to create the adversarial examples. For the MNIST dataset, the number of iterations was 1000, the initial constant was 0.001, and the learning rate was 0.1. For the CIFAR-10 dataset, the number of iterations was 10,000, the initial constant was 0.001, and the learning rate was 0.001.

C. EXPERIMENTAL RESULTS

The experimental results illustrate the performance of the proposed scheme in two parts, for the MNIST dataset and the CIFAR-10 dataset. The attack success rate formula depends on whether the untargeted, targeted, or proposed adversarial example is considered. For the untargeted adversarial example, the attack success rate is the rate of inconsistency between the original class and the output class of the adversarial example. For the targeted adversarial example, the attack success rate is the rate of coincidence between the target class and the output class of the adversarial example. For the selective untargeted adversarial example, the attack success rate is the rate of inconsistency between the avoided classes and the output class of the adversarial example. Distortion is defined by the L2 distance, which is the square root of the sum of each pixel's squared difference from the original sample.
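The sketch below expresses these success criteria and the L2 distortion as small hypothetical helper functions; it is illustrative only.

```python
# Minimal sketch (assumed helpers, not from the paper): attack success checks
# for the three attack types and the L2 distortion used to report results.
import numpy as np

def untargeted_success(pred_class: int, original_class: int) -> bool:
    return pred_class != original_class               # any class but the original

def targeted_success(pred_class: int, target_class: int) -> bool:
    return pred_class == target_class                 # exactly the attacker's target

def selective_untargeted_success(pred_class: int, avoided_classes: set) -> bool:
    return pred_class not in avoided_classes          # none of the avoided classes

def l2_distortion(x: np.ndarray, x_adv: np.ndarray) -> float:
    diff = x_adv.astype(np.float64) - x.astype(np.float64)
    return float(np.sqrt(np.sum(diff ** 2)))
```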

1) MNIST

Table 2 presents a sampling of selective untargeted adversarial examples that are incorrectly classified by model M as a wrong class other than the specific avoided classes (the number of avoided classes is five) when the attack success rate is 100%. The targeted adversarial examples presented in the table were created in 400 iterations, while the untargeted adversarial examples and the proposed adversarial examples were created in 300 iterations. Each adversarial example presented in the table is misclassified by the model M, but to the human eye, each adversarial example is similar to the corresponding original sample.


TABLE 2. Sampling of untargeted, targeted, and proposed adversarial examples for MNIST when the attack success rate is 100%. This sampling shows the image at the average distortion for each method, as presented in Table 3. In Table 3, the proposed method exhibits 0.171 more distortion in MNIST than the untargeted adversarial method. However, because the difference in distortion between the untargeted adversarial example and the proposed method is so small, there is little difference in terms of human perception.

FIGURE 3. Average attack success rate and distortion for 900 random untargeted, targeted, and proposed adversarial examples in MNIST.

TABLE 3. Comparison of the untargeted, targeted, and proposed methods when the attack success rate is 100% in MNIST. SD denotes standard deviation.

Fig. 3 (a) illustrates the average attack success rate for 900 random adversarial examples of each type (untargeted, targeted, and proposed). In the proposed adversarial example, five specific classes were randomly chosen. As observed in the graph, the targeted adversarial example is a sophisticated attack, requiring more iterations than the other adversarial examples to achieve a 100% attack success rate. On the other hand, the number of iterations required for the proposed adversarial example is 300, which is similar to that of the untargeted adversarial example, and the attack success rate is 100%.

Fig. 3 (b) shows the average distortion for 900 random adversarial examples of each type (untargeted, targeted, and proposed). In the proposed adversarial example, five specific classes were randomly chosen. As observed in the graph, the targeted adversarial example includes more distortion than the other adversarial examples because it is a sophisticated targeted attack. The proposed adversarial example exhibits slightly more distortion than the untargeted adversarial example because of the imposed condition that excludes specific classes. However, to the human eye, the proposed adversarial example is not that different from the untargeted adversarial example, as shown in Table 2.

Table 3 shows the distortion and iteration count required to achieve 100% attack success for each set of 900 adversarial examples. The proposed adversarial example and the untargeted adversarial example reach 100% faster than the targeted adversarial example, and the distortion in the proposed and untargeted cases is also smaller than in the targeted case.

2) CIFAR-10

Table 4 presents a sampling of selective untargeted adversarial examples that are incorrectly classified by model M as a wrong class other than the specific avoided classes (the number of avoided classes is five) when the attack success rate is 100%. The targeted adversarial examples presented in the table were created in 10,000 iterations, and the untargeted adversarial examples and the proposed adversarial examples were created in 4000 iterations. Because the CIFAR-10 dataset (Table 4) comprises three-dimensional images, unlike the MNIST dataset (Table 2), the adversarial examples do not appear different from their corresponding original samples to the human eye.

TABLE 4. Sampling of untargeted, targeted, and proposed adversarial examples for the CIFAR-10 dataset when the attack success rate is 100%. This sampling shows the image at the average distortion for each method, as presented in Table 5. In Table 5, the proposed method includes 28.09 more distortion in CIFAR-10 than the untargeted adversarial method. However, because the difference in distortion between the untargeted adversarial example and the proposed method is so small, there is little difference in human perception. Particularly for CIFAR-10, it is almost impossible to identify the difference because it is a color image.

FIGURE 4. Average attack success rate and distortion for 900 random untargeted, targeted, and proposed adversarial examples in the CIFAR-10 dataset.

Fig. 4 (a) illustrates the average attack success rate for 900 random adversarial examples of each type (untargeted, targeted, and proposed). In the proposed adversarial example, five specific classes were randomly chosen. To achieve a 100% attack success rate, the CIFAR-10 dataset required over 4000 iterations for each adversarial example, compared with the 300 iterations required by the MNIST dataset. Specifically, comparing the CIFAR-10 dataset with MNIST, it can be observed that the targeted adversarial example does not exhibit a large increase in attack success rate per iteration.

Fig. 4 (b) presents the average distortion for 900 random adversarial examples of each type (untargeted, targeted, and proposed). In the proposed adversarial example, five specific classes were randomly chosen. The average distortion pattern of each adversarial example is similar to that of MNIST (Fig. 3 (b)). On the other hand, for CIFAR-10, the reduction in distortion of the targeted adversarial example was smaller than that for the other methods (Fig. 4 (b)).

TABLE 5. Comparison of the untargeted, targeted, and proposed methods when the attack success rate is 100% in the CIFAR-10 dataset. SD denotes standard deviation.

Table 5 shows the distortion and iteration count required to achieve 100% attack success for each set of 900 adversarial examples. Similar to MNIST, the untargeted adversarial example and the proposed adversarial example reach 100% faster than the targeted adversarial example, and the distortion in the proposed and untargeted cases is also smaller than in the targeted case.

Fig. 5 shows the average distortion according to the number of avoided classes for 900 random adversarial examples of the proposed type when the attack success rate was 100%, for both MNIST and CIFAR-10. For MNIST, the number of iterations required was 300. As the number of avoided classes increases, the average distortion increases, as shown in the figure. However, the rate of increase in distortion was not large. With the number of avoided classes set to five, the proposed adversarial examples shown in Table 2 exhibit an average distortion of 1.325. For the CIFAR-10 dataset, although CIFAR-10 exhibits more distortion (34.762) than MNIST, each adversarial example is more similar to the corresponding original sample, owing to the three-dimensionality of the images.

FIGURE 5. Average distortion of 900 random selective untargeted adversarial examples for each quantity of avoided classes with a 100% attack success rate.

VI. DISCUSSION

A. ASSUMPTION

The proposed method considers a white box attack because the transformer requires information on the classification probabilities for the input values to compute the gradient descent. Although this assumption is somewhat strong, it represents a feasible attack. A substitute method [19] has recently demonstrated that white box attacks can be executed even when the model is a black box. A white box model that is similar to a black box model can be created using multiple queries. An adversarial example created with this similar white box model can then be used to transfer attacks against the black box model.

B. DATASET

The experimental results demonstrate that the proposed scheme can create a selective untargeted adversarial example using the MNIST and CIFAR-10 datasets, with a 100% attack success rate, while minimizing distortion. The experimental results also show that the required distortion and the number of iterations depend on the dataset. Because distortion is defined as the square root of the sum of each pixel's squared difference from the original sample, the entire set of pixels in the image contributes. For example, the average distortion on the CIFAR-10 dataset is larger than on the MNIST dataset because the total pixel count of a CIFAR-10 image is 3072 (32 × 32 × 3), while that of an MNIST image is 784 (28 × 28 × 1). Although the average distortion on the CIFAR-10 dataset is high, the adversarial example does not look different from the original sample to the human eye because of the three-dimensionality of the image. In terms of iterations, because a CIFAR-10 image has more pixels than an MNIST image, more iterations are required to create an adversarial example that is misclassified with minimal distortion. For these reasons, CIFAR-10 requires about 10 times more iterations than MNIST.

C. COMPARISON WITH THE UNTARGETED ADVERSARIAL EXAMPLE

The untargeted adversarial example is misclassified as any class other than the original class. On the other hand, the proposed example is misclassified as any class except those avoided by the attacker. Therefore, the proposed method adds more distortion than the untargeted method.

However, the proposed method has a significant advantage over the existing baseline method because it can overcome the problems associated with pattern vulnerability. The untargeted adversarial example can be recognized as a wrong class that is similar to the original class because of pattern vulnerability. For example, in the CIFAR-10 dataset, the original sample ‘‘car’’ is similar to the ‘‘truck’’ image, and so an untargeted adversarial example based on the original sample ‘‘car’’ is more likely to be misclassified as the ‘‘truck’’ class. In contrast, the proposed method allows the attacker to select several classes that the example should not be recognized as. This is useful for scenarios in which an attacker must avoid a certain area covering multiple classes. For example, if an attacker chooses specific avoided classes (transport classes) such as car, truck, boat, or plane, the DNN misclassifies the proposed example as a class other than those avoided classes. Therefore, the proposed method has the advantage of not being recognized as specific classes, even though it adds more distortion than the untargeted adversarial example.

In terms of the attack success rate, although the untargeted adversarial example increases at a faster rate than the proposed method, the proposed method can also achieve a 100% attack success rate after 300 and 4000 iterations on the MNIST and CIFAR-10 datasets, respectively.

In terms of distortion, the proposed method is slightly more distorted than the untargeted adversarial example, but the difference is small and barely noticeable from a human perspective. Therefore, the proposed method exhibits performance similar to that of the untargeted adversarial example in terms of human perception and distortion.

D. APPLICATION

The proposed scheme has potential applications in situations involving the disguise of friendly military equipment. For example, if a self-propelled gun is to be camouflaged, the camouflage must be recognized as a class other than a specific subset of classes, such as artillery equipment, which includes self-propelled guns. The proposed scheme can also be applied to traffic signs to prevent recognition as certain classes.

TABLE 6. Model M architecture for MNIST: Convolutional layer is denoted by Con; ReLU is denoted by RL; Fully connected layer is denoted by FC; Max pooling is denoted by MP.

E. ATTACK CONSIDERATION

For an attacker that must prevent recognition as specific classes, the proposed method provides a 100% attack success rate with lower distortion and fewer iterations than the targeted method. The proposed method thus has the advantage of requiring less distortion than the targeted method. It produces slightly more distortion than the untargeted method, but the difference is not significant. In addition, the proposed method does not exhibit the pattern vulnerability of the untargeted method, which can result in classification as avoided classes that are similar to the original class. Because the average distortion increases as the number of avoided classes increases, an attacker must consider the number of avoided classes.

VII. CONCLUSION

In this study, we proposed a selective untargeted adversarial example with a 100% attack success rate and minimum distortion. The proposed scheme creates a selective untargeted adversarial example that is misclassified by model M as a wrong class other than the specific avoided classes while maintaining a minimal distortion distance from the original sample. The experimental results demonstrate that the proposed scheme can create a selective untargeted adversarial example with a 100% attack success rate and minimum distortion (1.325 and 34.762 using the MNIST and CIFAR-10 datasets, respectively), even when the number of avoided classes is five.

Future studies can expand this method to other datasets, such as those in the voice and video domains [31]. Research on adversarial examples in the voice and video domains is more complex than in the image domain because the data are sequential. In addition, future studies can be extended to the black box setting using the proposed method. Moreover, instead of using a transformer, the proposed adversarial example could be created using a generative adversarial network [15]. Finally, an additional challenge would be to develop a countermeasure against the proposed scheme.

TABLE 7. Model M parameters.

TABLE 8. Model M architecture [2] for CIFAR-10: Convolutional layer is denoted by Con; ReLU is denoted by RL; Fully connected layer is denoted by FC; Max pooling is denoted by MP.

APPENDIX
See Tables 6–8.

REFERENCES

[1] J. Schmidhuber, ‘‘Deep learning in neural networks: An overview,’’ Neural Netw., vol. 61, pp. 85–117, Jan. 2015.

[2] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015. [Online]. Available: http://arxiv.org/abs/1409.1556

[3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, ‘‘Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,’’ IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.

[4] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, ‘‘Mastering the game of Go with deep neural networks and tree search,’’ Nature, vol. 529, no. 7587, pp. 484–489, 2016.

[5] S. Potluri and C. Diedrich, ‘‘Accelerated deep neural networks for enhanced intrusion detection system,’’ in Proc. IEEE 21st Int. Conf. Emerg. Technol. Factory Automat. (ETFA), Sep. 2016, pp. 1–8.


[6] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus, ‘‘Intriguing properties of neural networks,’’ in Proc. 2nd Int. Conf. Learn. Represent. (ICLR), Banff, AB, Canada, Apr. 2014. [Online]. Available: http://arxiv.org/abs/1312.6199

[7] Y. LeCun, C. Cortes, and C. J. Burges. (2010). MNIST Handwritten Digit Database. AT&T Labs. [Online]. Available: http://yann.lecun.com/exdb/mnist/

[8] A. Krizhevsky, V. Nair, and G. Hinton. (2014). The CIFAR-10 Dataset. [Online]. Available: http://www.cs.toronto.edu/~kriz/cifar.html

[9] A. Fawzi, O. Fawzi, and P. Frossard, ‘‘Analysis of classifiers’ robustness to adversarial perturbations,’’ Mach. Learn., vol. 107, no. 3, pp. 481–508, 2015.

[10] G. Jin et al., ‘‘APE-GAN: Adversarial perturbation elimination with GAN,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2019, pp. 3842–3846.

[11] I. J. Goodfellow, J. Shlens, and C. Szegedy, ‘‘Explaining and harnessing adversarial examples,’’ in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015. [Online]. Available: http://arxiv.org/abs/1412.6572

[12] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, ‘‘Distillation as a defense to adversarial perturbations against deep neural networks,’’ in Proc. IEEE Symp. Secur. Privacy (SP), May 2016, pp. 582–597.

[13] W. Xu, D. Evans, and Y. Qi, ‘‘Feature squeezing: Detecting adversarial examples in deep neural networks,’’ 2017, arXiv:1704.01155. [Online]. Available: https://arxiv.org/abs/1704.01155

[14] D. Meng and H. Chen, ‘‘MagNet: A two-pronged defense against adversarial examples,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur., 2017, pp. 135–147.

[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, ‘‘Generative adversarial nets,’’ in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 2672–2680.

[16] N. Carlini and D. Wagner, ‘‘Towards evaluating the robustness of neural networks,’’ in Proc. IEEE Symp. Secur. Privacy (SP), May 2017, pp. 39–57.

[17] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, ‘‘The limitations of deep learning in adversarial settings,’’ in Proc. IEEE Eur. Symp. Secur. Privacy (EuroS&P), Mar. 2016, pp. 372–387.

[18] Y. Liu, X. Chen, C. Liu, and D. Song, ‘‘Delving into transferable adversarial examples and black-box attacks,’’ in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Toulon, France, Apr. 2017. [Online]. Available: https://openreview.net/forum?id=Sys6GJqxl

[19] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, ‘‘Practical black-box attacks against machine learning,’’ in Proc. ACM Asia Conf. Comput. Commun. Secur., 2017, pp. 506–519.

[20] H. Kwon, Y. Kim, K.-W. Park, H. Yoon, and D. Choi, ‘‘Advanced ensemble adversarial example on unknown deep neural network classifiers,’’ IEICE Trans. Inf. Syst., vol. 101, no. 10, pp. 2485–2500, 2018.

[21] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, ‘‘Universal adversarial perturbations,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1765–1773.

[22] G. L. Oliveira, A. Valada, C. Bollen, W. Burgard, and T. Brox, ‘‘Deep learning for human part discovery in images,’’ in Proc. IEEE Int. Conf. Robot. Automat. (ICRA), May 2016, pp. 1634–1641.

[23] A. Kurakin, I. J. Goodfellow, and S. Bengio, ‘‘Adversarial examples in the physical world,’’ in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Toulon, France, Apr. 2017. [Online]. Available: https://openreview.net/forum?id=HJGU3Rodl

[24] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, ‘‘DeepFool: A simple and accurate method to fool deep neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2574–2582.

[25] H. Kwon, Y. Kim, K.-W. Park, H. Yoon, and D. Choi, ‘‘Friend-safe evasion attack: An adversarial example that is correctly recognized by a friendly classifier,’’ Comput. Secur., vol. 78, pp. 380–397, Sep. 2018.

[26] H. Kwon, Y. Kim, K.-W. Park, H. Yoon, and D. Choi, ‘‘Multi-targeted adversarial example in evasion attack on deep neural network,’’ IEEE Access, vol. 6, pp. 46084–46096, 2018.

[27] F. Zhang, P. P. K. Chan, B. Biggio, D. S. Yeung, and F. Roli, ‘‘Adversarial feature selection against evasion attacks,’’ IEEE Trans. Cybern., vol. 46, no. 3, pp. 766–777, Mar. 2016.

[28] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, and M. Isard, ‘‘TensorFlow: A system for large-scale machine learning,’’ in Proc. OSDI, vol. 16, 2016, pp. 265–283.

[29] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.

[30] D. P. Kingma and J. Ba, ‘‘Adam: A method for stochastic optimization,’’ in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015. [Online]. Available: http://arxiv.org/abs/1412.6980

[31] M. Osadchy, J. Hernandez-Castro, S. Gibson, O. Dunkelman, and D. Pérez-Cabo, ‘‘No bot expects the DeepCAPTCHA! Introducing immutable adversarial examples, with applications to CAPTCHA generation,’’ IEEE Trans. Inf. Forensics Security, vol. 12, no. 11, pp. 2640–2653, 2017.

HYUN KWON received the B.S. degree in mathematics from Korea Military Academy, South Korea, in 2010, and the M.S. degree from the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), in 2015, where he is currently pursuing the Ph.D. degree with the School of Computing. His research interests include information security, computer security, and intrusion tolerant systems.

YONGCHUL KIM received the B.E. degree in electrical engineering from Korea Military Academy, South Korea, in 1998, the M.S. degree in electrical engineering from the University of Surrey, U.K., in 2001, and the Ph.D. degree in electrical and computer engineering from the Department of Electrical and Computer Engineering, North Carolina State University, in 2007. He has been a Professor with the Department of Electrical Engineering, Korea Military Academy. His research interests include WiMAX and wireless relay networks.

HYUNSOO YOON received the B.E. degree in electronics engineering from Seoul National University, South Korea, in 1979, the M.S. degree in computer science from the Korea Advanced Institute of Science and Technology (KAIST), in 1981, and the Ph.D. degree in computer and information science from The Ohio State University, Columbus, OH, USA, in 1988. He has been a Faculty Member of the School of Computing, KAIST, since 1989. His main research interests include wireless sensor networks, 4G networks, and network security.

DAESEON CHOI received the B.S. degree in computer science from Dongguk University, South Korea, in 1995, the M.S. degree in computer science from the Pohang Institute of Science and Technology (POSTECH), South Korea, in 1997, and the Ph.D. degree in computer science from the Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2009. He is currently a Professor with the Department of Medical Information, Kongju National University, South Korea. His research interests include information security and identity management.
