Defending Against Physically Realizable Attacks on Image Classification

127 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Tong Wu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Tong Wu - Liang Tong - Yevgeniy Vorobeychik

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the problem of defending deep neural network approaches for image classification from physically realizable attacks. First, we demonstrate that the two most scalable and effective methods for learning robust models, adversarial training with PGD attacks and randomized smoothing, exhibit very limited effectiveness against three of the highest profile physical attacks. Next, we propose a new abstract adversarial model, rectangular occlusion attacks, in which an adversary places a small adversarially crafted rectangle in an image, and develop two approaches for efficiently computing the resulting adversarial examples. Finally, we demonstrate that adversarial training using our new attack yields image classification models that exhibit high robustness against the physically realizable attacks we study, offering the first effective generic defense against such attacks.

قيم البحث

125 - Jonathan Hayase , Weihao Kong , Raghav Somani 2021

Modern machine learning increasingly requires training on a large collection of data from multiple sources, not all of which can be trusted. A particularly concerning scenario is when a small fraction of poisoned data changes the behavior of the trai ned model when triggered by an attacker-specified watermark. Such a compromised model will be deployed unnoticed as the model is accurate otherwise. There have been promising attempts to use the intermediate representations of such a model to separate corrupted examples from clean ones. However, these defenses work only when a certain spectral signature of the poisoned examples is large enough for detection. There is a wide range of attacks that cannot be protected against by the existing defenses. We propose a novel defense algorithm using robust covariance estimation to amplify the spectral signature of corrupted data. This defense provides a clean model, completely removing the backdoor, even in regimes where previous methods have no hope of detecting the poisoned examples. Code and pre-trained models are available at https://github.com/SewoongLab/spectre-defense .

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Defending against adversarial attacks by randomized diversification

122 - Olga Taran , Shideh Rezaeifar , Taras Holotyak 2019

The vulnerability of machine learning systems to adversarial attacks questions their usage in many applications. In this paper, we propose a randomized diversification as a defense strategy. We introduce a multi-channel architecture in a gray-box sce nario, which assumes that the architecture of the classifier and the training data set are known to the attacker. The attacker does not only have access to a secret key and to the internal states of the system at the test time. The defender processes an input in multiple channels. Each channel introduces its own randomization in a special transform domain based on a secret key shared between the training and testing stages. Such a transform based randomization with a shared key preserves the gradients in key-defined sub-spaces for the defender but it prevents gradient back propagation and the creation of various bypass systems for the attacker. An additional benefit of multi-channel randomization is the aggregation that fuses soft-outputs from all channels, thus increasing the reliability of the final score. The sharing of a secret key creates an information advantage to the defender. Experimental evaluation demonstrates an increased robustness of the proposed method to a number of known state-of-the-art attacks.

التعلم الآلي التشفير والأمن التعلم الالي

Deep Bayesian Image Set Classification: A Defence Approach against Adversarial Attacks

113 - Nima Mirnateghi , Syed Afaq Ali Shah , Mohammed Bennamoun 2021

Deep learning has become an integral part of various computer vision systems in recent years due to its outstanding achievements for object recognition, facial recognition, and scene understanding. However, deep neural networks (DNNs) are susceptible to be fooled with nearly high confidence by an adversary. In practice, the vulnerability of deep learning systems against carefully perturbed images, known as adversarial examples, poses a dire security threat in the physical world applications. To address this phenomenon, we present, what to our knowledge, is the first ever image set based adversarial defence approach. Image set classification has shown an exceptional performance for object and face recognition, owing to its intrinsic property of handling appearance variability. We propose a robust deep Bayesian image set classification as a defence framework against a broad range of adversarial attacks. We extensively experiment the performance of the proposed technique with several voting strategies. We further analyse the effects of image size, perturbation magnitude, along with the ratio of perturbed images in each image set. We also evaluate our technique with the recent state-of-the-art defence methods, and single-shot recognition task. The empirical results demonstrate superior performance on CIFAR-10, MNIST, ETH-80, and Tiny ImageNet datasets.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Defending against Contagious Attacks on a Network with Resource Reallocation

116 - Rufan Bai , Haoxing Lin , Xinyu Yang 2020

In classic network security games, the defender distributes defending resources to the nodes of the network, and the attacker attacks a node, with the objective to maximize the damage caused. Existing models assume that the attack at node u causes da mage only at u. However, in many real-world security scenarios, the attack at a node u spreads to the neighbors of u and can cause damage at multiple nodes, e.g., for the outbreak of a virus. In this paper, we consider the network defending problem against contagious attacks. Existing works that study shared resources assume that the resource allocated to a node can be shared or duplicated between neighboring nodes. However, in real world, sharing resource naturally leads to a decrease in defending power of the source node, especially when defending against contagious attacks. To this end, we study the model in which resources allocated to a node can only be transferred to its neighboring nodes, which we refer to as a reallocation process. We show that this more general model is difficult in two aspects: (1) even for a fixed allocation of resources, we show that computing the optimal reallocation is NP-hard; (2) for the case when reallocation is not allowed, we show that computing the optimal allocation (against contagious attack) is also NP-hard. For positive results, we give a mixed integer linear program formulation for the problem and a bi-criteria approximation algorithm. Our experimental results demonstrate that the allocation and reallocation strategies our algorithm computes perform well in terms of minimizing the damage due to contagious attacks.

علوم الكمبيوتر ونظرية الألعاب

Defending against Backdoor Attacks in Natural Language Generation

435 - Chun Fan , Xiaoya Li , Yuxian Meng 2021

The frustratingly fragile nature of neural network models make current natural language generation (NLG) systems prone to backdoor attacks and generate malicious sequences that could be sexist or offensive. Unfortunately, little effort has been inves ted to how backdoor attacks can affect current NLG models and how to defend against these attacks. In this work, we investigate this problem on two important NLG tasks, machine translation and dialogue generation. By giving a formal definition for backdoor attack and defense, and developing corresponding benchmarks, we design methods to attack NLG models, which achieve high attack success to ask NLG models to generate malicious sequences. To defend against these attacks, we propose to detect the attack trigger by examining the effect of deleting or replacing certain words on the generation outputs, which we find successful for certain types of attacks. We will discuss the limitation of this work, and hope this work can raise the awareness of backdoor risks concealed in deep NLG systems. (Code and data are available at https://github.com/ShannonAI/backdoor_nlg.)

الحساب واللغة