Adversarial Robustness Against the Union of Multiple Perturbation Models

123 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Pratyush Maini

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Pratyush Maini - Eric Wong - J. Zico Kolter

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Owing to the susceptibility of deep learning systems to adversarial attacks, there has been a great deal of work in developing (both empirically and certifiably) robust classifiers. While most work has defended against a single type of attack, recent work has looked at defending against multiple perturbation models using simple aggregations of multiple attacks. However, these methods can be difficult to tune, and can easily result in imbalanced degrees of robustness to individual perturbation models, resulting in a sub-optimal worst-case loss over the union. In this work, we develop a natural generalization of the standard PGD-based procedure to incorporate multiple perturbation models into a single attack, by taking the worst-case over all steepest descent directions. This approach has the advantage of directly converging upon a trade-off between different perturbation models which minimizes the worst-case performance over the union. With this approach, we are able to train standard architectures which are simultaneously robust against $ell_infty$, $ell_2$, and $ell_1$ attacks, outperforming past approaches on the MNIST and CIFAR10 datasets and achieving adversarial accuracy of 47.0% against the union of ($ell_infty$, $ell_2$, $ell_1$) perturbations with radius = (0.03, 0.5, 12) on the latter, improving upon previous approaches which achieve 40.6% accuracy.

قيم البحث

342 - Ji Gao , Beilun Wang , Zeming Lin 2017

Recent studies have shown that deep neural networks (DNN) are vulnerable to adversarial samples: maliciously-perturbed samples crafted to yield incorrect model outputs. Such attacks can severely undermine DNN systems, particularly in security-sensiti ve settings. It was observed that an adversary could easily generate adversarial samples by making a small perturbation on irrelevant feature dimensions that are unnecessary for the current classification task. To overcome this problem, we introduce a defensive mechanism called DeepCloak. By identifying and removing unnecessary features in a DNN model, DeepCloak limits the capacity an attacker can use generating adversarial samples and therefore increase the robustness against such inputs. Comparing with other defensive approaches, DeepCloak is easy to implement and computationally efficient. Experimental results show that DeepCloak can increase the performance of state-of-the-art DNN models against adversarial samples.

التعلم الآلي الذكاء الاصطناعي التشفير والأمن

Towards Robustness against Unsuspicious Adversarial Examples

93 - Liang Tong , Minzhe Guo , Atul Prakash 2020

Despite the remarkable success of deep neural networks, significant concerns have emerged about their robustness to adversarial perturbations to inputs. While most attacks aim to ensure that these are imperceptible, physical perturbation attacks typi cally aim for being unsuspicious, even if perceptible. However, there is no universal notion of what it means for adversarial examples to be unsuspicious. We propose an approach for modeling suspiciousness by leveraging cognitive salience. Specifically, we split an image into foreground (salient region) and background (the rest), and allow significantly larger adversarial perturbations in the background, while ensuring that cognitive salience of background remains low. We describe how to compute the resulting non-salience-preserving dual-perturbation attacks on classifiers. We then experimentally demonstrate that our attacks indeed do not significantly change perceptual salience of the background, but are highly effective against classifiers robust to conventional attacks. Furthermore, we show that adversarial training with dual-perturbation attacks yields classifiers that are more robust to these than state-of-the-art robust learning approaches, and comparable in terms of robustness to conventional attacks.

التعلم الآلي التشفير والأمن التعلم الالي

A PAC-Bayes Analysis of Adversarial Robustness

102 - Guillaume Vidot 2021

We propose the first general PAC-Bayesian generalization bounds for adversarial robustness, that estimate, at test time, how much a model will be invariant to imperceptible perturbations in the input. Instead of deriving a worst-case analysis of the risk of a hypothesis over all the possible perturbations, we leverage the PAC-Bayesian framework to bound the averaged risk on the perturbations for majority votes (over the whole class of hypotheses). Our theoretically founded analysis has the advantage to provide general bounds (i) independent from the type of perturbations (i.e., the adversarial attacks), (ii) that are tight thanks to the PAC-Bayesian framework, (iii) that can be directly minimized during the learning phase to obtain a robust model on different attacks at test time.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Regularizing Neural Networks via Adversarial Model Perturbation

258 - Yaowei Zheng , Richong Zhang , Yongyi Mao 2020

Effective regularization techniques are highly desired in deep learning for alleviating overfitting and improving generalization. This work proposes a new regularization scheme, based on the understanding that the flat local minima of the empirical r isk cause the model to generalize better. This scheme is referred to as adversarial model perturbation (AMP), where instead of directly minimizing the empirical risk, an alternative AMP loss is minimized via SGD. Specifically, the AMP loss is obtained from the empirical risk by applying the worst norm-bounded perturbation on each point in the parameter space. Comparing with most existing regularization schemes, AMP has strong theoretical justifications, in that minimizing the AMP loss can be shown theoretically to favour flat local minima of the empirical risk. Extensive experiments on various modern deep architectures establish AMP as a new state of the art among regularization schemes. Our code is available at https://github.com/hiyouga/AMP-Regularizer.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Householder Activations for Provable Robustness against Adversarial Attacks

73 - Sahil Singla , Surbhi Singla , Soheil Feizi 2021

Training convolutional neural networks (CNNs) with a strict Lipschitz constraint under the l_{2} norm is useful for provable adversarial robustness, interpretable gradients and stable training. While 1-Lipschitz CNNs can be designed by enforcing a 1- Lipschitz constraint on each layer, training such networks requires each layer to have an orthogonal Jacobian matrix (for all inputs) to prevent gradients from vanishing during backpropagation. A layer with this property is said to be Gradient Norm Preserving (GNP). To construct expressive GNP activation functions, we first prove that the Jacobian of any GNP piecewise linear function is only allowed to change via Householder transformations for the function to be continuous. Building on this result, we introduce a class of nonlinear GNP activations with learnable Householder transformations called Householder activations. A householder activation parameterized by the vector $mathbf{v}$ outputs $(mathbf{I} - 2mathbf{v}mathbf{v}^{T})mathbf{z}$ for its input $mathbf{z}$ if $mathbf{v}^{T}mathbf{z} leq 0$; otherwise it outputs $mathbf{z}$. Existing GNP activations such as $mathrm{MaxMin}$ can be viewed as special cases of $mathrm{HH}$ activations for certain settings of these transformations. Thus, networks with $mathrm{HH}$ activations have higher expressive power than those with $mathrm{MaxMin}$ activations. Although networks with $mathrm{HH}$ activations have nontrivial provable robustness against adversarial attacks, we further boost their robustness by (i) introducing a certificate regularization and (ii) relaxing orthogonalization of the last layer of the network. Our experiments on CIFAR-10 and CIFAR-100 show that our regularized networks with $mathrm{HH}$ activations lead to significant improvements in both the standard and provable robust accuracy over the prior works (gain of 3.65% and 4.46% on CIFAR-100 respectively).

التعلم الآلي