Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

65 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Puyudi Yang

تاريخ النشر 2018

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Puyudi Yang - Jianbo Chen - Cho-Jui Hsieh

التعلم الآلي الذكاء الاصطناعي الحساب واللغة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a probabilistic framework for studying adversarial attacks on discrete data. Based on this framework, we derive a perturbation-based method, Greedy Attack, and a scalable learning-based method, Gumbel Attack, that illustrate various tradeoffs in the design of attacks. We demonstrate the effectiveness of these methods using both quantitative metrics and human evaluation on various state-of-the-art models for text classification, including a word-based CNN, a character-based CNN and an LSTM. As as example of our results, we show that the accuracy of character-based convolutional networks drops to the level of random selection by modifying only five characters through Greedy Attack.

قيم البحث

80 - Tianjin Huang , Vlado Menkovski , Yulong Pei 2021

Deep neural networks are vulnerable to adversarial examples that are crafted by imposing imperceptible changes to the inputs. However, these adversarial examples are most successful in white-box settings where the model and its parameters are availab le. Finding adversarial examples that are transferable to other models or developed in a black-box setting is significantly more difficult. In this paper, we propose the Direction-Aggregated adversarial attacks that deliver transferable adversarial examples. Our method utilizes aggregated direction during the attack process for avoiding the generated adversarial examples overfitting to the white-box model. Extensive experiments on ImageNet show that our proposed method improves the transferability of adversarial examples significantly and outperforms state-of-the-art attacks, especially against adversarial robust models. The best averaged attack success rates of our proposed method reaches 94.6% against three adversarial trained models and 94.8% against five defense methods. It also reveals that current defense approaches do not prevent transferable adversarial attacks.

التعلم الآلي التشفير والأمن

Energy Attack: On Transferring Adversarial Examples

112 - Ruoxi Shi , Borui Yang , Yangzhou Jiang 2021

In this work we propose Energy Attack, a transfer-based black-box $L_infty$-adversarial attack. The attack is parameter-free and does not require gradient approximation. In particular, we first obtain white-box adversarial perturbations of a surrogat e model and divide these perturbations into small patches. Then we extract the unit component vectors and eigenvalues of these patches with principal component analysis (PCA). Base on the eigenvalues, we can model the energy distribution of adversarial perturbations. We then perform black-box attacks by sampling from the perturbation patches according to their energy distribution, and tiling the sampled patches to form a full-size adversarial perturbation. This can be done without the available access to victim models. Extensive experiments well demonstrate that the proposed Energy Attack achieves state-of-the-art performance in black-box attacks on various models and several datasets. Moreover, the extracted distribution is able to transfer among different model architectures and different datasets, and is therefore intrinsic to vision architectures.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

Fast Local Attack: Generating Local Adversarial Examples for Object Detectors

117 - Quanyu Liao , Xin Wang , Bin Kong 2020

The deep neural network is vulnerable to adversarial examples. Adding imperceptible adversarial perturbations to images is enough to make them fail. Most existing research focuses on attacking image classifiers or anchor-based object detectors, but t hey generate globally perturbation on the whole image, which is unnecessary. In our work, we leverage higher-level semantic information to generate high aggressive local perturbations for anchor-free object detectors. As a result, it is less computationally intensive and achieves a higher black-box attack as well as transferring attack performance. The adversarial examples generated by our method are not only capable of attacking anchor-free object detectors, but also able to be transferred to attack anchor-based object detector.

الرؤية الحاسوبية وتمييز الأنماط

Adversarial Examples on Graph Data: Deep Insights into Attack and Defense

258 - Huijun Wu , Chen Wang , Yuriy Tyshetskiy 2019

Graph deep learning models, such as graph convolutional networks (GCN) achieve remarkable performance for tasks on graph data. Similar to other types of deep models, graph deep learning models often suffer from adversarial attacks. However, compared with non-graph data, the discrete features, graph connections and different definitions of imperceptible perturbations bring unique challenges and opportunities for the adversarial attacks and defenses for graph data. In this paper, we propose both attack and defense techniques. For attack, we show that the discreteness problem could easily be resolved by introducing integrated gradients which could accurately reflect the effect of perturbing certain features or edges while still benefiting from the parallel computations. For defense, we observe that the adversarially manipulated graph for the targeted attack differs from normal graphs statistically. Based on this observation, we propose a defense approach which inspects the graph and recovers the potential adversarial perturbations. Our experiments on a number of datasets show the effectiveness of the proposed methods.

التعلم الآلي التشفير والأمن التعلم الالي

AI-GAN: Attack-Inspired Generation of Adversarial Examples

159 - Tao Bai , Jun Zhao , Jinlin Zhu 2020

Deep neural networks (DNNs) are vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to inputs. Recently different attacks and strategies have been proposed, but how to generate adversarial examples perceptually realistic and more efficiently remains unsolved. This paper proposes a novel framework called Attack-Inspired GAN (AI-GAN), where a generator, a discriminator, and an attacker are trained jointly. Once trained, it can generate adversarial perturbations efficiently given input images and target classes. Through extensive experiments on several popular datasets eg MNIST and CIFAR-10, AI-GAN achieves high attack success rates and reduces generation time significantly in various settings. Moreover, for the first time, AI-GAN successfully scales to complicated datasets eg CIFAR-100 with around $90%$ success rates among all classes.

التعلم الآلي التشفير والأمن التعلم الالي