Generating Natural Adversarial Hyperspectral examples with a modified Wasserstein GAN

72 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jean-Christophe Burnel

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jean-Christophe Burnel

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Adversarial examples are a hot topic due to their abilities to fool a classifiers prediction. There are two strategies to create such examples, one uses the attacked classifiers gradients, while the other only requires access to the clas-sifiers prediction. This is particularly appealing when the classifier is not full known (black box model). In this paper, we present a new method which is able to generate natural adversarial examples from the true data following the second paradigm. Based on Generative Adversarial Networks (GANs) [5], it reweights the true data empirical distribution to encourage the classifier to generate ad-versarial examples. We provide a proof of concept of our method by generating adversarial hyperspectral signatures on a remote sensing dataset.

قيم البحث

اقرأ أيضاً

Natural Adversarial Examples

230 - Dan Hendrycks , Kevin Zhao , Steven Basart 2019

We introduce two challenging datasets that reliably cause machine learning model performance to substantially degrade. The datasets are collected with a simple adversarial filtration technique to create datasets with limited spurious cues. Our datase ts real-world, unmodified examples transfer to various unseen models reliably, demonstrating that computer vision models have shared weaknesses. The first dataset is called ImageNet-A and is like the ImageNet test set, but it is far more challenging for existing models. We also curate an adversarial out-of-distribution detection dataset called ImageNet-O, which is the first out-of-distribution detection dataset created for ImageNet models. On ImageNet-A a DenseNet-121 obtains around 2% accuracy, an accuracy drop of approximately 90%, and its out-of-distribution detection performance on ImageNet-O is near random chance levels. We find that existing data augmentation techniques hardly boost performance, and using other public training datasets provides improvements that are limited. However, we find that improvements to computer vision architectures provide a promising path towards robust models.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط التعلم الالي

AI-GAN: Attack-Inspired Generation of Adversarial Examples

159 - Tao Bai , Jun Zhao , Jinlin Zhu 2020

Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to inputs. Recently different attacks and strategies have been proposed, but how to generate adversarial examples perceptually realistic and more efficiently remains unsolved. This paper proposes a novel framework called Attack-Inspired GAN (AI-GAN), where a generator, a discriminator, and an attacker are trained jointly. Once trained, it can generate adversarial perturbations efficiently given input images and target classes. Through extensive experiments on several popular datasets eg MNIST and CIFAR-10, AI-GAN achieves high attack success rates and reduces generation time significantly in various settings. Moreover, for the first time, AI-GAN successfully scales to complicated datasets eg CIFAR-100 with around $90%$ success rates among all classes.

التعلم الآلي التشفير والأمن التعلم الالي

Wasserstein Adversarial Examples via Projected Sinkhorn Iterations

72 - Eric Wong , Frank R. Schmidt , J. Zico Kolter 2019

A rapidly growing area of work has studied the existence of adversarial examples, datapoints which have been perturbed to fool a classifier, but the vast majority of these works have focused primarily on threat models defined by $ell_p$ norm-bounded perturbations. In this paper, we propose a new threat model for adversarial attacks based on the Wasserstein distance. In the image classification setting, such distances measure the cost of moving pixel mass, which naturally cover standard image manipulations such as scaling, rotation, translation, and distortion (and can potentially be applied to other settings as well). To generate Wasserstein adversarial examples, we develop a procedure for projecting onto the Wasserstein ball, based upon a modified version of the Sinkhorn iteration. The resulting algorithm can successfully attack image classification models, bringing traditional CIFAR10 models down to 3% accuracy within a Wasserstein ball with radius 0.1 (i.e., moving 10% of the image mass 1 pixel), and we demonstrate that PGD-based adversarial training can improve this adversarial accuracy to 76%. In total, this work opens up a new direction of study in adversarial robustness, more formally considering convex metrics that accurately capture the invariances that we typically believe should exist in classifiers. Code for all experiments in the paper is available at https://github.com/locuslab/projected_sinkhorn.

التعلم الآلي التعلم الالي

Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

64 - Puyudi Yang , Jianbo Chen , Cho-Jui Hsieh 2018

We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeo ffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN and an LSTM. As as example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection by modifying only five characters through Greedy Attack.

التعلم الآلي الذكاء الاصطناعي الحساب واللغة

Reevaluating Adversarial Examples in Natural Language

151 - John X. Morris , Eli Lifland , Jack Lanchantin 2020

State-of-the-art attacks on NLP models lack a shared definition of a what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the mo del and follows some linguistic constraints. We then analyze the outputs of two state-of-the-art synonym substitution attacks. We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors. Human surveys reveal that to successfully preserve semantics, we need to significantly increase the minimum cosine similarities between the embeddings of swapped words and between the sentence encodings of original and perturbed sentences.With constraints adjusted to better preserve semantics and grammaticality, the attack success rate drops by over 70 percentage points.

الحساب واللغة الذكاء الاصطناعي التشفير والأمن