Adversarial vulnerability for any classifier

Alhussein Fawzi (DeepMind), Omar Fawzi (ENS de Lyon)∗, Hamza Fawzi (Department of Applied Mathematics & Theoretical Physics, University of Cambridge)

∗ Univ Lyon, ENS de Lyon, CNRS, UCBL, LIP, F-69342, Lyon Cedex 07, France

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Abstract

Despite achieving impressive performance, state-of-the-art classifiers remain highly vulnerable to small, imperceptible, adversarial perturbations. This vulnerability has proven empirically to be very intricate to address. In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk. Our analysis of the robustness also provides insights into key properties of generative models, such as their smoothness and the dimensionality of their latent space. We conclude with numerical experimental results showing that our bounds provide informative baselines for the maximal achievable robustness on several datasets.

1 Introduction

Deep neural networks are powerful models that achieve state-of-the-art performance across several domains, such as bioinformatics [1, 2], speech [3], and computer vision [4, 5]. Though deep networks have exhibited very good performance in classification tasks, they have recently been shown to be unstable to adversarial perturbations of the data [6, 7]. In fact, very small and often imperceptible perturbations of the data samples are sufficient to fool state-of-the-art classifiers and result in incorrect classification. This discovery of the surprising vulnerability of classifiers to perturbations has led to a large body of work that attempts to design robust classifiers [8, 9, 10, 11, 12, 13]. However, advances in designing robust classifiers have been accompanied by stronger perturbation schemes that defeat such defenses [14, 15, 16].

In this paper, we assume that the data distribution is defined by a smooth generative model (mapping latent representations to images), and study theoretically the existence of small adversarial perturbations for arbitrary classifiers. We summarize our main contributions as follows:

• We show fundamental upper bounds on the robustness of any classifier to perturbations, which provides a baseline to the maximal achievable robustness. When the latent space of the data distribution is high dimensional, our analysis shows that any classifier is vulnerable to very small perturbations. Our results further suggest the existence of a tight relation between robustness and linearity of the classifier in the latent space.

• We prove the existence of adversarial perturbations that transfer across different classifiers. This provides theoretical justification to previous empirical findings that highlighted the existence of such transferable perturbations.

• We quantify the difference between the robustness to adversarial examples in the data manifold and unconstrained adversarial examples, and show that the two notions of robustness can be precisely related: for any classifier f with in-distribution robustness r, there exists a classifier f̃ that achieves unconstrained robustness r/2. This further provides support to the empirical observations in [17, 18].

• We evaluate our bounds in several experimental setups (CIFAR-10 and SVHN), and show that they yield informative baselines for the maximal achievable robustness.

Our robustness analysis provides in turn insights into desirable properties of generative models capturing real-world distributions. In particular, the intriguing generality of our analysis implies that when the data distribution is modeled through a smooth generative model with a high-dimensional latent space, there exist small-norm perturbations of images that fool humans for any discriminative task defined on the data distribution. If, on the other hand, it is the case that the human visual system is inherently robust to small perturbations (e.g., in ℓ_p norm), then our analysis shows that a distribution over natural images cannot be modeled by smooth and high-dimensional generative models. Going forward in modeling complex natural image distributions, our results hence suggest that low dimensionality and non-smoothness of the generative model are important constraints for capturing the real-world distribution of images; not satisfying such constraints can lead to small adversarial perturbations for any classifier, including the human visual system.

2 Related work

It was proven in [19, 20] that for certain families of classifiers, there exist adversarial perturbations of magnitude O(1/√d), where d is the data dimension, that cause misclassification, provided the robustness to random noise is fixed (which is typically the case if, e.g., the data is normalized). In addition, fundamental limits on the robustness of classifiers were derived in [19] for some simple classification families. Other works have instead studied the existence of adversarial perturbations under strong assumptions on the data distribution [18, 21]. In this work, motivated by the success of generative models mapping latent representations with a normal prior, we instead study the existence of robust classifiers under this general data-generating procedure and derive bounds on the robustness that hold for any classification function. A large number of techniques have recently been proposed to improve the robustness of classifiers to perturbations, such as adversarial training [8], stochastic networks [13], robust optimization [9, 10], regularization [11], and distillation [12]. Unfortunately, such techniques have been shown to fail whenever a more complex attack strategy is used [14, 15], or when they are evaluated on a more complex dataset. Other works have recently studied procedures and algorithms to provably guarantee a certain level of robustness [22, 23, 24, 25, 26]; these have been applied to small datasets (e.g., MNIST). For large-scale, high-dimensional datasets, the problem of designing robust classifiers is entirely open. We finally note that adversarial examples for generative models have recently been considered in [27]; our aim here is however different, as our goal is to bound the robustness of classifiers when data comes from a generative model.
3 Definitions and notations

Let g : Z := R^d → X := R^m be a generative model that maps latent vectors z ∈ Z to the space of images X, with m denoting the number of pixels. To generate an image according to the distribution of natural images μ, we generate a random vector z ∼ ν according to the standard Gaussian distribution ν = N(0, I_d), and we apply the map g; the resulting image is then g(z).² This data-generating procedure is motivated by numerous previous works on generative models, whereby natural-looking images are obtained by transforming normal vectors through a deep neural network [28], [29], [30], [31], [32]. Let f : R^m → {1, ..., K} be a classifier mapping images in R^m to discrete labels {1, ..., K}. The discriminator f partitions X into K sets C_i = {x ∈ X : f(x) = i}, each of which corresponds to a different predicted label. The relative proportion of points in class i is equal to P(C_i) = ν(g^{-1}(C_i)), the Gaussian measure of g^{-1}(C_i) in Z.

² Instead of sampling from N(0, I_d), some generative models sample from the uniform distribution in [−1, 1]^d. The results of this paper can be easily extended to such generative procedures.
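To fix notation, the following Python sketch samples from the data-generating procedure z ∼ N(0, I_d), x = g(z), and estimates the class proportions P(C_i) by Monte Carlo. It is only an illustration of the setup: the toy `generator` and `classifier` are hypothetical stand-ins, not the DCGAN and deep networks used later in the experiments.

```python
import numpy as np

# Minimal sketch of the data-generating procedure of Section 3 and of the class
# proportions P(C_i) = nu(g^{-1}(C_i)). The toy generator/classifier below are
# hypothetical stand-ins for a trained generative model g and a classifier f.
d, m, K = 100, 3 * 32 * 32, 10          # latent dimension, number of pixels, classes
rng = np.random.default_rng(0)
W = rng.normal(size=(m, d)) / np.sqrt(d)

def generator(z):                        # g : R^d -> R^m (a real g would be a deep net)
    return np.tanh(z @ W.T)

def classifier(x):                       # f : R^m -> {0, ..., K-1}
    return np.digitize(x.mean(axis=-1), np.linspace(-0.01, 0.01, K - 1))

# z ~ N(0, I_d), image x = g(z); P(C_i) is the Gaussian measure of g^{-1}(C_i),
# estimated here by Monte Carlo sampling in the latent space.
z = rng.standard_normal((10_000, d))
labels = classifier(generator(z))
print("estimated P(C_i):", np.bincount(labels, minlength=K) / len(labels))
```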

The goal of this paper is to study the robustness of f to additive perturbations, under the assumption that the data is generated according to g. We define two notions of robustness. These effectively measure the minimum distance one has to travel in image space to change the classification decision.

• In-distribution robustness: For x = g(z), we define the in-distribution robustness r_in(x) as follows:

    r_in(x) = min_{r ∈ Z} ‖g(z + r) − x‖  s.t.  f(g(z + r)) ≠ f(x),

where ‖·‖ denotes an arbitrary norm on X. Note that the perturbed image g(z + r) is constrained to lie in the image of g, and hence belongs to the support of the distribution μ.

• Unconstrained robustness: Unlike the in-distribution setting, we measure here the robustness to arbitrary perturbations in the image space; that is, the perturbed image is not constrained anymore to belong to the data distribution μ:

    r_unc(x) = min_{r ∈ X} ‖r‖  s.t.  f(x + r) ≠ f(x).

This notion of robustness corresponds to the widely used definition of adversarial perturbations. It is easy to see that this robustness definition is smaller than the in-distribution robustness; i.e., r_unc(x) ≤ r_in(x).

In this paper, we assume that the generative model is smooth, in the sense that it satisfies a modulus of continuity property, defined as follows:

Assumption 1. We assume that g admits a monotone invertible modulus of continuity ω;³ i.e.,

    ∀ z, z′ ∈ Z,  ‖g(z) − g(z′)‖ ≤ ω(‖z − z′‖₂).     (1)

Note that the above assumption is milder than assuming Lipschitz continuity. In fact, the Lipschitz property corresponds to choosing ω(t) to be a linear function of t. In particular, the above assumption does not require that ω(0) = 0, which potentially allows us to model distributions with disconnected support.⁴ It should be noted that generator smoothness is a desirable property of generative models. This property is often illustrated empirically by generating images along a straight path in the latent space [30], and verifying that the images undergo gradual semantic changes between the two endpoints. In fact, smooth transitions are often used as qualitative evidence that the generator has learned relevant factors of variation.

Fig. 1 summarizes the problem setting and notations. Assuming that the data is generated according to g, we analyze in the remainder of the paper the robustness of arbitrary classifiers to perturbations.

4 Analysis of the robustness to perturbations

4.1 Upper bounds on robustness

We state a general bound on the robustness to perturbations and derive two special cases to make more explicit the dependence on the distribution and number of classes.

Theorem 1. Let f : R^m → {1, ..., K} be an arbitrary classification function defined on the image space. Then, the fraction of datapoints having robustness less than η satisfies:

    P(r_in(x) ≤ η) ≥ Σ_{i=1}^{K} ( Φ(a_{≠i} + ω^{-1}(η)) − Φ(a_{≠i}) ),     (2)

where a_{≠i} = Φ^{-1}( P(∪_{j≠i} C_j) ), and Φ is the cdf of N(0, 1).

³ This assumption can be extended to random z (see C.2 in the supp. material). For ease of exposition however, we use here the deterministic assumption.
⁴ In this paper, we use the term smooth generative models to denote that the function ω(δ) takes small values for small δ.
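The lower bound in Eq. (2) is straightforward to evaluate numerically. The sketch below (assuming SciPy is available; the helper name and the example numbers are ours, not taken from the paper) computes the right-hand side for given class probabilities and a given inverse modulus of continuity ω^{-1}.

```python
import numpy as np
from scipy.stats import norm

def robustness_bound(eta, class_probs, omega_inv):
    """Right-hand side of Eq. (2): a lower bound on P(r_in(x) <= eta).

    class_probs : array of P(C_i), i = 1..K (must sum to 1).
    omega_inv   : inverse modulus of continuity of the generator
                  (for an L-Lipschitz generator, omega_inv = lambda eta: eta / L).
    """
    class_probs = np.asarray(class_probs, dtype=float)
    a = norm.ppf(1.0 - class_probs)          # a_{!=i} = Phi^{-1}(P(union_{j!=i} C_j))
    t = omega_inv(eta)
    return float(np.sum(norm.cdf(a + t) - norm.cdf(a)))

# Illustrative example (not from the paper): K = 10 equiprobable classes,
# a hypothetical 1-Lipschitz generator, and perturbation budget eta = 0.5.
print(robustness_bound(0.5, np.full(10, 0.1), omega_inv=lambda eta: eta))
```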

Figure 1: Setting used in this paper. The data distribution is obtained by mapping N(0, I_d) through g (we set d = 1 and g(z) = (cos(2πz), sin(2πz)) in this example). The thick circle indicates the support of the data distribution μ in R^m (m = 2 here). The binary discriminative function f separates the data space into two classification regions (red and blue colors). While the in-distribution perturbed image is required to belong to the data support, this is not necessarily the case in the unconstrained setting. In this paper, we do not put any assumption on f, resulting in a potentially arbitrary partitioning of the data space. While the existence of very small adversarial perturbations seems counter-intuitive in this low-dimensional illustrative example (i.e., r_in and r_unc can be large for some choices of f), we show in the next sections that this is the case in high dimensions.

In particular, if P(C_i) ≤ 1/2 for all i (the classes are not too unbalanced), we have

    P(r_in(x) ≤ η) ≥ 1 − √(π/2) e^{−ω^{-1}(η)²/2}.     (3)

To see the dependence on the number of classes more explicitly, consider the setting where the classes are equiprobable, i.e., P(C_i) = 1/K for all i. If K ≥ 5, then

    P(r_in(x) ≤ η) ≥ 1 − √(π/2) e^{−ω^{-1}(η)²/2} e^{−ω^{-1}(η)√(2 log(K) − log(4π log(K)))}.     (4)

This theorem is a consequence of the Gaussian isoperimetric inequality, first proved in [33] and [34]. The proofs can be found in the supplementary material.

Remark 1. Interpretation. For ease of interpretation, we assume that the function g is Lipschitz continuous, in which case ω^{-1}(η) is replaced with η/L, where L is the Lipschitz constant. Then, Eq. (3) shows the existence of perturbations of norm η ∝ L that can fool any classifier. This norm should be compared to the typical norm given by E‖g(z)‖. By normalizing the data, we can assume E‖g(z)‖ = E‖z‖₂ without loss of generality.⁵ As z has a normal distribution, we have E‖z‖₂ ∈ [√(d − 1), √d], and thus the typical norm of an element in the data set satisfies E‖g(z)‖ ≥ √(d − 1). Now if we plug in η = 2L, we obtain that the robustness is less than 2L with probability exceeding 0.8. This should be compared to the typical norm, which is at least √(d − 1). Our result therefore shows that when d is large and g is smooth (in the sense that L ≪ √d), there exist small adversarial perturbations that can fool arbitrary classifiers f. Fig. 2 provides an illustration of the upper bound, in the case where ω is the identity function.

Remark 2. Dependence on K. Theorem 1 shows an increasing probability of misclassification with the number of classes K. In other words, it is easier to find adversarial perturbations in the setting where the number of classes is large than for a binary classification task.⁶ This dependence confirms empirical results whereby the robustness is observed to decrease with the number of classes. The dependence on K captured in our bounds is in contrast to previous bounds that showed a decreasing probability of fooling the classifier for larger numbers of classes [20].

Remark 3. Classification-agnostic bound. Our bounds hold for any classification function f, and are not specific to a family of classifiers. This is unlike the work of [19] that establishes bounds on the robustness for specific classes of functions (e.g., linear or quadratic classifiers).

⁵ Without this assumption, the following discussion applies if we replace the Lipschitz constant with the normalized Lipschitz constant L′ = L · E‖z‖₂ / E‖g(z)‖.
⁶ We assume here equiprobable classes.
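In the equiprobable-class setting with ω(t) = t, the curves of Fig. 2 can be reproduced directly from Theorem 1 by solving the bound for the median. The snippet below (assuming SciPy; the helper name is ours) computes this upper bound on the median of the normalized robustness r_in/√d.

```python
import numpy as np
from scipy.stats import norm

def median_bound(K, d):
    """Upper bound on the median of the normalized robustness r_in / sqrt(d)
    (Theorem 1 with omega(t) = t and K equiprobable classes), as plotted in Fig. 2."""
    a = norm.ppf(1.0 - 1.0 / K)                   # a_{!=i} for P(C_i) = 1/K
    eta = norm.ppf(1.0 - 0.5 / K) - a             # solve K * (Phi(a + eta) - Phi(a)) = 1/2
    return eta / np.sqrt(d)

for K in (2, 10, 100, 1000):
    print(K, round(median_bound(K, d=1000), 4))
```

The bound decreases with K, consistent with Remark 2: the more classes, the smaller the perturbation needed to fool half of the datapoints.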

Remark 4. How tight is the upper bound on robustness in Theorem 1? Assuming that the smoothness assumption in Eq. (1) is an equality, let the classifier f be such that f ∘ g separates the latent space into B₁ = g^{-1}(C₁) = {z : z₁ ≥ 0} and B₂ = g^{-1}(C₂) = {z : z₁ < 0}. Then, it follows that

    P(r_in(g(z)) ≤ η) = P(∃r : ‖g(z + r) − g(z)‖ ≤ η, f(g(z + r)) ≠ f(g(z)))
                      = P(∃r : ‖r‖₂ ≤ ω^{-1}(η), sgn(z₁ + r₁) sgn(z₁) < 0)
                      = P(z ∈ B₁, z₁ < ω^{-1}(η)) + P(z ∈ B₂, z₁ ≥ −ω^{-1}(η)) = 2(Φ(ω^{-1}(η)) − Φ(0)),

which precisely corresponds to Eq. (2). In this case, the bound in Eq. (2) is therefore an equality. More generally, this bound is an equality if the classifier induces linearly separable regions in the latent space.⁷ This suggests that classifiers are maximally robust when the induced classification boundaries in the latent space are linear. We stress that boundaries in the Z-space can be very different from the boundaries in the image space. In particular, as g is in general non-linear, f might be a highly non-linear function of the input space, while z ↦ (f ∘ g)(z) is a linear function in z. We provide an explicit example in the supplementary material illustrating this remark.

Figure 2: Upper bound (Theorem 1) on the median of the normalized robustness r_in/√d, as a function of the dimension d (0 to 1000), for different values of the number of classes K (K = 2, 10, 100, 1000), in the setting where ω(t) = t. We assume that classes have equal measure (i.e., P(C_i) = 1/K).

Remark 5. Adversarial perturbations in the latent space. While the quantities introduced in Section 3 measure the robustness in the image space, an alternative is to measure the robustness in the latent space, defined as r_Z = min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)). For natural images, latent vectors provide a decomposition of images into meaningful factors of variation, such as features of objects in the image, illumination, etc. Hence, perturbations of vectors in the latent space measure the amount of change one needs to apply to such meaningful latent features to cause data misclassification. A bound on the magnitude of the minimal perturbation in the latent space (i.e., r_Z) can be directly obtained from Theorem 1 by setting ω to the identity (i.e., ω(t) = t). Importantly, note that no assumptions on the smoothness of the generator g are required for our bounds to hold when considering this notion of robustness.

Relation between in-distribution robustness and unconstrained robustness. While the previous bound specifically concerns the in-distribution robustness, in many cases we are interested in achieving unconstrained robustness; that is, the perturbed image is not constrained to belong to the data distribution (or equivalently to the range of g). It is easy to see that any bound derived for the in-distribution robustness r_in(x) also holds for the unconstrained robustness r_unc(x), since it clearly holds that r_unc(x) ≤ r_in(x). One may wonder whether it is possible to get a better upper bound on r_unc(x) directly. We show here that this is not possible if we require our bound to hold for any general classifier. Specifically, we construct a family of classifiers for which r_unc(x) ≥ ½ r_in(x), which we now present:

⁷ In the case where Eq. (1) is an inequality, we will not exactly achieve the bound, but get closer to it when f ∘ g is linear.

For a given classifier f, define the classifier f̃ constructed in a nearest neighbour strategy in the image space:

    f̃(x) = f(g(z*))  with  z* = argmin_{z} ‖g(z) − x‖.     (5)

Note that f̃ behaves exactly in the same way as f on the image of g (in particular, it has the same risk and in-distribution robustness). We show here that it has an unconstrained robustness that is at least half of the in-distribution robustness of f.

Theorem 2. For the classifier f̃, we have r_unc(x) ≥ ½ r_in(x).

This result shows that if a classifier has in-distribution robustness r, then we can construct a classifier with unconstrained robustness r/2, through a simple modification of the original classifier f. Hence, classification-agnostic limits derived for both notions of robustness are essentially the same. It should further be noted that the procedure in Eq. (5) provides a constructive method to increase the robustness of any classifier to unconstrained perturbations. Such a nearest neighbour strategy is useful when the in-distribution robustness is much larger than the unconstrained robustness, and permits the latter to match the former. This approach has recently been found to be successful in increasing the robustness of classifiers when accurate generative models can be learned [35]. Other techniques [17] build on this approach, and further use methods to increase the in-distribution robustness.
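The construction in Eq. (5) amounts to projecting an input onto the range of the generator before classifying it. The following is a minimal sketch, not the authors' implementation: the optimizer choice (L-BFGS with random restarts and finite-difference gradients) is our own assumption, and in practice one would backpropagate through a differentiable generator.

```python
import numpy as np
from scipy.optimize import minimize

def project_onto_range(g, x, d, n_restarts=5, seed=0):
    """Approximate z* = argmin_z ||g(z) - x||_2 by multi-start local optimization.

    Sketch of the projection step in Eq. (5). The paper does not prescribe a particular
    optimizer; L-BFGS with random restarts and numerically estimated gradients is used
    here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    objective = lambda z: float(np.sum((g(z) - x) ** 2))
    best = None
    for _ in range(n_restarts):
        res = minimize(objective, rng.standard_normal(d), method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best.x

def robustified_classifier(f, g, d):
    """Nearest-neighbour classifier f~ of Eq. (5): classify the projection g(z*) instead of x."""
    return lambda x: f(g(project_onto_range(g, x, d)))
```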
4.2 Transferability of perturbations

One of the most intriguing properties of adversarial perturbations is their transferability across different models [6, 36]. Under our data model distribution, we study the existence of transferable adversarial perturbations, and show that two models with approximately zero risk will have shared adversarial perturbations.

Theorem 3 (Transferability of perturbations). Let f, h be two classifiers. Assume that P(f ∘ g(z) ≠ h ∘ g(z)) ≤ δ (e.g., if f and h have a risk bounded by δ/2 for the data set generated by g). In addition, assume that P(C_i(f)) + δ ≤ 1/2 for all i.⁸ Then,

    P{ ∃v : ‖v‖ ≤ η and f(g(z) + v) ≠ f(g(z)) and h(g(z) + v) ≠ h(g(z)) } ≥ 1 − √(π/2) e^{−ω^{-1}(η)²/2} − 2δ.     (6)

Compared to Theorem 1, which bounds the robustness to adversarial perturbations, the extra price to pay here to find transferable adversarial perturbations is the 2δ term, which is small if the risk of both classifiers is small. Hence, our bounds provide a theoretical explanation for the existence of transferable adversarial perturbations, which were previously shown to exist in [6, 36]. The existence of transferable adversarial perturbations across several models with small risk has important security implications, as adversaries can, in principle, fool different classifiers with a single, classifier-agnostic perturbation. The existence of such perturbations significantly reduces the difficulty of attacking (potentially black-box) machine learning models.

4.3 Approximate generative model

In the previous results, we have assumed that the data distribution is exactly described by the generative model g (i.e., μ = g_*(ν), where g_*(ν) is the pushforward of ν via g). However, in many cases, such generative models only provide an approximation to the true data distribution μ. In this section, we specifically assume that the generated distribution g_*(ν) provides an approximation to the true underlying distribution in the 1-Wasserstein sense on the metric space (X, ‖·‖); i.e., W(g_*(ν), μ) ≤ δ, and derive upper bounds on the robustness. This assumption is in line with recent advances in generative models, whereby the generator provides a good approximation (in the Wasserstein sense) to the true distribution, but does not exactly fit it [31]. We show here that similar upper bounds on the robustness (in expectation) hold, as long as g_*(ν) provides an accurate approximation of the true distribution μ.

⁸ This assumption is only to simplify the statement; a general statement can be easily derived in the same way.

Theorem 4. We use the same notations as in Theorem 1. Assume that the generator g provides a δ-approximation of the true distribution μ in the 1-Wasserstein sense on the metric space (X, ‖·‖); that is, W(g_*(ν), μ) ≤ δ (where g_*(ν) is the pushforward of ν via g). Then, provided ω is concave, the following inequality holds:

    E_{x∼μ} r_unc(x) ≤ ω( Σ_{i=1}^{K} [ e^{−a_{≠i}²/2}/√(2π) − a_{≠i} Φ(−a_{≠i}) ] ) + δ,

where r_unc(x) is the unconstrained robustness in the image space. In particular, for K ≥ 5 equiprobable classes, we have

    E_{x∼μ} r_unc(x) ≤ ω( log(4π log(K)) / √(2 log(K)) ) + δ.

In words, when the data is defined according to a distribution which can be approximated by a smooth, high-dimensional generative model, our results show that arbitrary classifiers will have small adversarial examples in expectation. We also note that as K grows, this bound decreases and even goes to zero, under the sole condition that ω is continuous at 0. Note however that the decrease is slow, as it is only logarithmic.
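As a quick sanity check of this slow decrease in K, the snippet below (a minimal sketch; the function name and example values are ours, not from the paper) evaluates the equiprobable-class bound of Theorem 4 with ω equal to the identity and δ = 0.

```python
import numpy as np

def expected_unconstrained_bound(K, omega=lambda t: t, delta=0.0):
    """Equiprobable-class bound of Theorem 4: omega(log(4*pi*log K) / sqrt(2*log K)) + delta."""
    arg = np.log(4 * np.pi * np.log(K)) / np.sqrt(2 * np.log(K))
    return omega(arg) + delta

# Illustrative values: the bound shrinks with K, but only logarithmically.
for K in (5, 10, 100, 1000):
    print(K, round(expected_unconstrained_bound(K), 3))
```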
5 Experimental evaluation

We now evaluate our bounds on the SVHN dataset [37], which contains color images of house numbers, where the task is to classify the digit at the center of the image. Throughout this section, computations of perturbations are done using the algorithm in [38].⁹ The dataset contains 73,257 training images and 26,032 test images (we do not use the images in the 'extra' set). We train a DCGAN [30] generative model on this dataset, with a latent vector dimension d = 100, and further consider several neural network architectures for classification.¹⁰ For each classifier, the empirical robustness is compared to our upper bound.¹¹ In addition to reporting the in-distribution and unconstrained robustness, we also report the robustness in the latent space: r_Z = min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)). For this setting, note that the upper bound exactly corresponds to Theorem 1 with ω set to the identity map. Results are reported in Table 1.

Observe first that the upper bound on the robustness in the latent space is of the same order of magnitude as the empirical robustness computed in the Z-space, for the different tested classifiers. This suggests that the isoperimetric inequality (which is the only source of inequality in our bound, when factoring out smoothness) provides a reasonable baseline that is on par with the robustness of the best classifiers. In the image space, the theoretical prediction from our classifier-agnostic bounds is one order of magnitude larger than the empirical estimates. Note however that our bound is still non-vacuous, as it predicts the norm of the required perturbation to be approximately 1/3 of the norm of images (i.e., a normalized robustness of 0.36). This potentially leaves room for improving the robustness in the image space. Moreover, we believe that the bound on the robustness in the image space is not tight (unlike the bound in the Z-space), as the smoothness assumption on g can be conservative. Further comparison of the figures between in-distribution and unconstrained robustness in the image space interestingly shows that for the simple LeNet architecture, a large gap exists between these two quantities. However, by using more complex classifiers (ResNet-18 and ResNet-101), the gap between in-distribution and unconstrained robustness gets smaller.

⁹ Note that in order to estimate robustness quantities (e.g., r_in), we do not need the ground truth label, as the definition only involves the change of the estimated label. Estimation of the robustness can therefore be readily done for automatically generated images.
¹⁰ For the SVHN and CIFAR-10 experiments, we show examples of generated images and perturbed images in the supplementary document (Section C.3). Moreover, we provide in C.1 details on the architectures of the used models.
¹¹ To evaluate numerically the upper bound, we have used a probabilistic version of the modulus of continuity, where the property is not required to be satisfied for all z, z′, but rather with high probability, and accounted for the error probability in the bound. We refer to the supp. material for the detailed optimization used to estimate the smoothness parameters.
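A probabilistic modulus of continuity of the kind mentioned in footnote 11 can be estimated by simple Monte-Carlo sampling in the latent space. The sketch below is an illustrative stand-in for such an estimate (the helper `estimate_modulus` and the toy generator are ours; the paper's actual estimation procedure is detailed in its supplementary material).

```python
import numpy as np

def estimate_modulus(g, d, radii, n_samples=1000, quantile=0.99, seed=0):
    """Probabilistic estimate of the modulus of continuity of a generator g.

    For each latent radius t, sample pairs (z, z + r) with ||r||_2 = t and record the
    `quantile`-quantile of ||g(z + r) - g(z)||_2 as an estimate of omega(t) holding
    with probability `quantile`. This is only a Monte-Carlo sketch, not the paper's
    estimation procedure.
    """
    rng = np.random.default_rng(seed)
    omega_hat = {}
    for t in radii:
        z = rng.standard_normal((n_samples, d))
        r = rng.standard_normal((n_samples, d))
        r *= t / np.linalg.norm(r, axis=1, keepdims=True)        # rescale to ||r||_2 = t
        diffs = np.linalg.norm(g(z + r) - g(z), axis=1)
        omega_hat[t] = float(np.quantile(diffs, quantile))
    return omega_hat

# Usage with a toy generator standing in for a trained DCGAN (hypothetical example):
d = 100
W = np.random.default_rng(1).normal(size=(3072, d)) / np.sqrt(d)
toy_g = lambda Z: np.tanh(Z @ W.T)
print(estimate_modulus(toy_g, d, radii=[0.05, 0.1, 0.2]))
```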

                               2-Layer LeNet    ResNet-18       ResNet-101      Upper bound on robustness
Error rate                     11%              4.8%            4.2%            -
Robustness in the Z-space      6.1 × 10^{-3}    6.6 × 10^{-3}   6.6 × 10^{-3}   16 × 10^{-3}
In-distribution robustness     1.3 × 10^{-2}    3.3 × 10^{-2}   3.1 × 10^{-2}   36 × 10^{-2}
Unconstrained robustness       0.39 × 10^{-2}   1.1 × 10^{-2}   1.4 × 10^{-2}   36 × 10^{-2}

Table 1: Experiments on the SVHN dataset. We report the 25th percentile of the normalized robustness in each cell, where probabilities are computed either theoretically (for the upper bound) or empirically. More precisely, we report the following quantities for the upper bound column: for the robustness in the Z-space, we report t/E(‖z‖₂) such that P(min_r ‖r‖₂ s.t. f(g(z + r)) ≠ f(g(z)) ≤ t) ≥ 0.25, using Theorem 1 with ω taken as the identity. For the robustness in image space, we report t/E(‖g(z)‖) such that P(r_in(x) ≤ t) ≥ 0.25, using Theorem 1, with ω estimated empirically (Section C.2 in supp. material).

                               VGG [40]         Wide ResNet [41]   Wide ResNet + Adv. training [10, 15]   Upper bound on robustness
Error rate                     5.5%             3.9%               16.0%                                   -
Robustness in the Z-space      0.6 × 10^{-3}    3.0 × 10^{-3}      3.5 × 10^{-3}                           0.016
In-distribution robustness     4.8 × 10^{-3}    5.9 × 10^{-3}      8.3 × 10^{-3}                           0.10
Unconstrained robustness       0.2 × 10^{-3}    2.0 × 10^{-3}      2.3 × 10^{-3}                           0.10

Table 2: Experiments on CIFAR-10 (same setting as in Table 1). See supp. for details about models.

Recall that Theorem 2 says that any classifier can be modified in a way that the in-distribution robustness and unconstrained robustness only differ by a factor 2, while preserving the accuracy. But this modification may result in a more complicated classifier compared to the original one; for example, starting with a linear classifier, the modified classifier will in general not be linear. This interestingly matches our numerical values for this experiment, as the multiplicative gap between in-distribution and unconstrained robustness approaches 2 as we make the classification function more complex (e.g., an in-distribution robustness of 3.1 × 10^{-2} and an unconstrained robustness of 1.4 × 10^{-2} for ResNet-101).

We now consider the more complex CIFAR-10 dataset [39]. The CIFAR-10 dataset consists of 10 classes of 32 × 32 color natural images. Similarly to the previous experiment, we used a DCGAN generative model with d = 100, and tested the robustness of state-of-the-art deep neural network classifiers. Quantitative results are reported in Table 2. Our bounds notably predict that any classifier defined on this task will have perturbations not exceeding 1/10 of the norm of the image, for 25% of the datapoints in the distribution. Note that using the PGD adversarial training strategy of [10] (which constitutes one of the most robust models to date [15]), the robustness is significantly improved, despite still being ∼1 order of magnitude smaller than the baseline of 0.1 for the in-distribution robustness. The construction of more robust classifiers, alongside better empirical estimates of the quantities involved in the bound (and improved bounds), will hopefully lead to a convergence of these two quantities, hence guaranteeing the optimality of the robustness of our classifiers.

6 Discussion

We have shown the existence of a baseline robustness that no classifier can surpass, whenever the distribution is approximable by a generative model mapping latent representations to images.
The bounds lead to informative numerical results: for example, on the CIFAR-10 task (with a DCGAN approximator), our upper bound shows that a significant portion of datapoints can be fooled with a perturbation of magnitude 10% that of an image. Existing classifiers however do not match the derived upper bound. Moving forward, we expect the design of more robust classifiers to get closer to this upper bound. The existence of a baseline robustness is fundamental in that context in order to measure the progress made and compare to the optimal robustness we can hope to achieve.

In addition to providing a baseline, this work has several practical implications on the robustness front. To construct classifiers with better robustness, our analysis suggests that these should have linear decision boundaries in the latent space; in particular, classifiers with multiple disconnected classification regions will be more prone to small perturbations. We further provided a constructive way to provably close the gap between unconstrained robustness and in-distribution robustness. Our analysis at the intersection of classifiers' robustness and generative modeling has further led to insights into generative models, due to its intriguing generality. If we take as a premise that the human visual system requires large-norm perturbations to be fooled (which is implicitly assumed in many works on adversarial robustness, though see [42]), our work shows that natural image distributions cannot be modeled as very high-dimensional and smooth mappings. While current dimensions used for the latent space (e.g., d = 100) do not lead to any contradiction with this assumption (as the upper bounds are sufficiently large), moving to higher dimensions for more complex datasets might lead to very small bounds. To model such datasets, the prior distribution, smoothness, and dimension properties should therefore be carefully set to avoid contradictions with the premise. For example, conditional generative models can be seen as non-smooth generative models, as different generating functions are used for each class. We finally note that the derived results do bound the norm of the perturbation, and not the human perceptibility, which is much harder to quantify. We leave it as an open question to derive bounds on more perceptual metrics.

Acknowledgments

A.F. would like to thank Seyed Moosavi, Wojtek Czarnecki, Neil Rabinowitz, Bernardino Romera-Paredes and the DeepMind team for useful feedback and discussions.

References

[1] M. Spencer, J. Eickholt, and J. Cheng, "A deep learning network approach to ab initio protein secondary structure prediction," IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 12, no. 1, pp. 103–112, 2015.
[2] D. Chicco, P. Sadowski, and P. Baldi, "Deep autoencoder neural networks for gene ontology annotation predictions," in ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, pp. 533–540, 2014.
[3] G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, 2012.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv preprint arXiv:1512.03385, 2015.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105, 2012.
[6] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," in International Conference on Learning Representations (ICLR), 2014.
[7] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, "Evasion attacks against machine learning at test time," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402, 2013.
[8] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," in International Conference on Learning Representations (ICLR), 2015.
[9] U. Shaham, Y. Yamada, and S. Negahban, "Understanding adversarial training: Increasing local stability of neural nets through robust optimization," arXiv preprint arXiv:1511.05432, 2015.
[10] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," arXiv preprint arXiv:1706.06083, 2017.
[11] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin, and N. Usunier, "Parseval networks: Improving robustness to adversarial examples," in International Conference on Machine Learning, pp. 854–863, 2017.
[12] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, "Distillation as a defense to adversarial perturbations against deep neural networks," arXiv preprint arXiv:1511.04508, 2015.
[13] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, "Deep variational information bottleneck," arXiv preprint arXiv:1612.00410, 2016.
[14] N. Carlini and D. Wagner, "Adversarial examples are not easily detected: Bypassing ten detection methods," in Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14, ACM, 2017.

[15] J. Uesato, B. O'Donoghue, A. v. d. Oord, and P. Kohli, "Adversarial risk and the dangers of evaluating against weak attacks," arXiv preprint arXiv:1802.05666, 2018.
[16] J. Rauber and W. Brendel, "The robust vision benchmark." http://robust.vision, 2017.
[17] A. Ilyas, A. Jalal, E. Asteri, C. Daskalakis, and A. G. Dimakis, "The robust manifold defense: Adversarial training using generative models," arXiv preprint arXiv:1712.09196, 2017.
[18] J. Gilmer, L. Metz, F. Faghri, S. S. Schoenholz, M. Raghu, M. Wattenberg, and I. Goodfellow, "Adversarial spheres," arXiv preprint arXiv:1801.02774, 2018.
[19] A. Fawzi, O. Fawzi, and P. Frossard, "Analysis of classifiers' robustness to adversarial perturbations," CoRR, vol. abs/1502.02590, 2015.
[20] A. Fawzi, S. Moosavi-Dezfooli, and P. Frossard, "Robustness of classifiers: from adversarial to random noise," in Neural Information Processing Systems (NIPS), 2016.
[21] T. Tanay and L. Griffin, "A boundary tilting persepective on the phenomenon of adversarial examples," arXiv preprint arXiv:1608.07690, 2016.
[22] M. Hein and M. Andriushchenko, "Formal guarantees on the robustness of a classifier against adversarial manipulation," in Advances in Neural Information Processing Systems, pp. 2263–2273, 2017.
[23] J. Peck, J. Roels, B. Goossens, and Y. Saeys, "Lower bounds on the robustness to adversarial perturbations," in Advances in Neural Information Processing Systems, pp. 804–813, 2017.
[24] A. Sinha, H. Namkoong, and J. Duchi, "Certifiable distributional robustness with principled adversarial training," arXiv preprint arXiv:1710.10571, 2017.
[25] A. Raghunathan, J. Steinhardt, and P. Liang, "Certified defenses against adversarial examples," arXiv preprint arXiv:1801.09344, 2018.
[26] K. Dvijotham, R. Stanforth, S. Gowal, T. Mann, and P. Kohli, "A dual approach to scalable verification of deep networks," arXiv preprint arXiv:1803.06567, 2018.
[27] J. Kos, I. Fischer, and D. Song, "Adversarial examples for generative models," arXiv preprint arXiv:1702.06832, 2017.
[28] D. P. Kingma and M. Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
[30] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
[31] M. Arjovsky, S. Chintala, and L. Bottou, "Wasserstein generative adversarial networks," in International Conference on Machine Learning, pp. 214–223, 2017.
[32] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville, "Improved training of wasserstein gans," in Advances in Neural Information Processing Systems, pp. 5769–5779, 2017.
[33] C. Borell, "The Brunn-Minkowski inequality in Gauss space," Inventiones mathematicae, vol. 30, no. 2, pp. 207–216, 1975.
[34] V. N. Sudakov and B. S. Tsirel'son, "Extremal properties of half-spaces for spherically invariant measures," Journal of Soviet Mathematics, vol. 9, no. 1, pp. 9–18, 1978.
[35] P. Samangouei, M. Kabkab, and R. Chellappa, "Defense-gan: Protecting classifiers against adversarial attacks using generative models," in International Conference on Learning Representations, 2018.
[36] Y. Liu, X. Chen, C. Liu, and D. Song, "Delving into transferable adversarial examples and black-box attacks," arXiv preprint arXiv:1611.02770, 2016.
[37] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[38] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, "Deepfool: a simple and accurate method to fool deep neural networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[39] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Master's thesis, Department of Computer Science, University of Toronto, 2009.
[40] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in International Conference on Learning Representations (ICLR), 2014.
[41] S. Zagoruyko and N. Komodakis, "Wide residual networks," arXiv preprint arXiv:1605.07146, 2016.
[42] G. F. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein, "Adversarial examples that fool both human and computer vision," arXiv preprint arXiv:1802.08195, 2018.
