Today's state-of-the-art image classifiers fail to correctly classify carefully manipulated adversarial images. In this work, we develop a new, localized adversarial attack that generates adversarial examples by imperceptibly altering the backgrounds of normal images. We first use this attack to highlight the unnecessary sensitivity of neural networks to changes in the background of an image, and then use it as part of a new training technique: localized adversarial training. By including locally adversarial images in the training set, we are able to create a classifier that suffers less loss than a non-adversarially trained counterpart on both natural and adversarial inputs. Evaluation of our localized adversarial training algorithm on the MNIST and CIFAR-10 datasets shows decreased accuracy loss on natural images and increased robustness against adversarial inputs.
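A minimal sketch of how such a background-restricted attack could be implemented, assuming a PGD-style update masked to background pixels; the rule that "background" is everything outside a central crop, and the epsilon, step size, and iteration count, are illustrative assumptions rather than the exact attack described above.

    # Sketch: PGD-style attack restricted to background pixels via a binary mask.
    # The "background = outer border" rule and the hyperparameters are assumptions.
    import torch
    import torch.nn.functional as F

    def background_mask(images, border=8):
        # 1 on assumed background pixels (outer border), 0 on the central region.
        mask = torch.ones_like(images)
        mask[..., border:-border, border:-border] = 0.0
        return mask

    def localized_pgd(model, images, labels, eps=0.03, alpha=0.007, steps=10):
        mask = background_mask(images)
        delta = torch.zeros_like(images, requires_grad=True)
        for _ in range(steps):
            loss = F.cross_entropy(model(images + delta * mask), labels)
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
            delta.requires_grad_(True)
        return (images + delta * mask).clamp(0, 1)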
Deep networks are well known to be fragile to adversarial attacks. We conduct an empirical analysis of deep representations under the state-of-the-art PGD attack and find that the attack causes the internal representation to shift closer to the false class. Motivated by this observation, we propose to regularize the representation space under attack with metric learning to produce more robust classifiers. By carefully sampling examples for metric learning, our learned representation not only increases robustness, but also detects previously unseen adversarial samples. Quantitative experiments show improvements over prior work of up to 4% in robust accuracy and up to 6% in detection efficiency, measured by Area Under the Curve (AUC) score. Our code is available at https://github.com/columbia/Metric_Learning_Adversarial_Robustness.
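One plausible instantiation of the metric-learning regularizer described above is a triplet loss on penultimate-layer features that pulls an adversarial example toward its clean true-class anchor and away from a sample of another class. The attribute names model.features and model.head, the naive negative sampling, the margin, and the loss weight below are illustrative assumptions, not the paper's careful sampling scheme.

    # Sketch: cross-entropy on adversarial logits plus a triplet regularizer that
    # pulls adversarial features back toward their clean anchors. x_adv is assumed
    # to have been generated elsewhere (e.g. by PGD).
    import torch
    import torch.nn.functional as F

    triplet = torch.nn.TripletMarginLoss(margin=1.0)

    def metric_regularized_loss(model, x_clean, x_adv, y, lam=1.0):
        # model.features: assumed penultimate representation; model.head: classifier.
        f_clean = model.features(x_clean)
        f_adv = model.features(x_adv)
        # Negative sampling: simply a shuffled clean batch member (may occasionally
        # share a class); this replaces the paper's sampling strategy.
        perm = torch.randperm(x_clean.size(0), device=x_clean.device)
        neg = f_clean[perm]
        ce = F.cross_entropy(model.head(f_adv), y)
        return ce + lam * triplet(anchor=f_adv, positive=f_clean, negative=neg)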
Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our free adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks. The code is available at https://github.com/ashafahi/free_adv_train.
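The core of the "free" training loop, as described above, is minibatch replay: each batch is revisited m times, and the single backward pass per replay supplies both the parameter gradient for the optimizer step and the input gradient used to update a persistent perturbation. The sketch below follows that scheme with illustrative hyperparameters.

    # Sketch of free adversarial training: one backward pass per replay updates the
    # model parameters and the persistent perturbation delta.
    import torch
    import torch.nn.functional as F

    def free_adv_train(model, optimizer, loader, eps=8/255, m=4, device="cuda"):
        delta = None
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if delta is None or delta.shape != x.shape:
                delta = torch.zeros_like(x)
            for _ in range(m):                      # minibatch replay
                adv = (x + delta).clamp(0, 1).requires_grad_(True)
                loss = F.cross_entropy(model(adv), y)
                optimizer.zero_grad()
                loss.backward()                     # one backward pass ...
                optimizer.step()                    # ... updates the parameters
                # ... and its input gradient updates the perturbation for "free"
                delta = (delta + eps * adv.grad.sign()).clamp(-eps, eps).detach()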
Meta-learning models can quickly adapt to new tasks using few-shot labeled data. However, despite achieving good generalization on few-shot classification tasks, improving the adversarial robustness of meta-learning models in few-shot learning remains challenging. Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the adversarially robust performance of meta-learning models, AT remains computationally expensive. Moreover, meta-learning models trained with AT suffer a significant drop in accuracy on clean images. This paper proposes an adversarially robust meta-learning method called Long-term Cross Adversarial Training (LCAT). LCAT updates meta-learning model parameters alternately along the natural and adversarial sample distribution directions over a long-term schedule, improving both adversarial and clean few-shot classification accuracy. Owing to cross adversarial training, LCAT needs only half as many adversarial training epochs as AQ, resulting in low adversarial training computation. Experimental results show that LCAT achieves superior clean and adversarial few-shot classification accuracy compared with SOTA adversarial training methods for meta-learning models.
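The abstract leaves the exact update rule open; one loose reading of the cross training schedule is that meta-update steps alternate between clean and adversarial query batches, so only every other step pays for adversarial example generation. The helper names below (meta_update, pgd_attack) are assumptions for illustration, not the paper's algorithm.

    # Sketch of an alternating clean/adversarial meta-training schedule, under the
    # assumption that "cross" training interleaves the two update directions.
    def lcat_epoch(model, tasks, meta_update, pgd_attack, step=0):
        for support, query, labels in tasks:
            if step % 2 == 0:
                meta_update(model, support, query, labels)          # natural step
            else:
                adv_query = pgd_attack(model, query, labels)        # adversarial step
                meta_update(model, support, adv_query, labels)
            step += 1
        return step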
The vulnerabilities of deep neural networks to adversarial examples have become a significant concern for deploying these models in sensitive domains. Devising a definitive defense against such attacks has proven challenging, and methods that rely on detecting adversarial samples are only valid when the attacker is oblivious to the detection mechanism. In this paper, we first present an adversarial example detection method that provides a performance guarantee against norm-constrained adversaries. The method is based on the idea of training adversarially robust subspace detectors using asymmetrical adversarial training (AAT). The novel AAT objective presents a minimax problem similar to that of GANs; it has the same convergence property and consequently supports the learning of class-conditional distributions. We demonstrate that the minimax problem can be reasonably solved by the PGD attack, and then use the learned class-conditional generative models to define generative detection and classification models that are both robust and more interpretable. We provide comprehensive evaluations of the above methods and demonstrate their competitive performance and compelling properties on adversarial detection and robust classification problems.
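A minimal sketch of one asymmetrical adversarial training step for a single-class detector, assuming a logistic (softplus) loss: in-class examples are pushed toward high scores, while out-of-class examples perturbed by an inner PGD maximization are pushed toward low scores. The loss form and all hyperparameters are assumptions, not the exact AAT objective.

    # Sketch: inner PGD maximization on out-of-class inputs, outer logistic loss
    # separating them from real class-k data. All specifics are illustrative.
    import torch
    import torch.nn.functional as F

    def aat_step(detector, optimizer, x_in_class, x_out_class,
                 eps=0.3, alpha=0.01, steps=40):
        # Inner maximization: perturb out-of-class inputs to raise the detector score.
        delta = torch.zeros_like(x_out_class, requires_grad=True)
        for _ in range(steps):
            score = detector((x_out_class + delta).clamp(0, 1)).sum()
            grad, = torch.autograd.grad(score, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        x_adv = (x_out_class + delta).clamp(0, 1).detach()

        # Outer minimization: high scores on in-class data, low scores on the
        # adversarially perturbed out-of-class data.
        optimizer.zero_grad()
        loss = F.softplus(-detector(x_in_class)).mean() + F.softplus(detector(x_adv)).mean()
        loss.backward()
        optimizer.step()
        return loss.item()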
Despite their unmatched performance, deep neural networks remain susceptible to targeted attacks by nearly imperceptible levels of adversarial noise. While the underlying cause of this sensitivity is not well understood, theoretical analyses can be simplified by reframing each layer of a feed-forward network as an approximate solution to a sparse coding problem. Iterative solutions using basis pursuit are theoretically more stable and have improved adversarial robustness. However, cascading layer-wise pursuit implementations suffer from error accumulation in deeper networks. In contrast, our new method of deep pursuit approximates the activations of all layers as a single global optimization problem, allowing us to consider deeper, real-world architectures with skip connections such as residual networks. Experimentally, our approach demonstrates improved robustness to adversarial noise.
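A minimal sketch of the global pursuit idea, assuming linear dictionaries per layer: all layer codes are optimized jointly with proximal-gradient (ISTA-style) steps on the summed reconstruction objective, rather than solving each layer's pursuit in cascade. The dictionaries, step size, and sparsity weight are illustrative; the paper's formulation for real architectures with skip connections is not reproduced here.

    # Sketch: jointly optimize sparse codes z_1..z_L minimizing
    # sum_l 0.5*||z_{l-1} - W_l z_l||^2 + lam*||z_l||_1, with z_0 = x.
    import torch
    import torch.nn.functional as F

    def deep_pursuit(x, dictionaries, lam=0.1, lr=0.1, iters=50):
        # dictionaries: list of matrices W_l of shape (d_{l-1}, d_l), chaining from x.
        codes = [torch.zeros(x.size(0), W.size(1), device=x.device, requires_grad=True)
                 for W in dictionaries]
        opt = torch.optim.SGD(codes, lr=lr)
        for _ in range(iters):
            opt.zero_grad()
            prev, loss = x, 0.0
            for W, z in zip(dictionaries, codes):
                loss = loss + 0.5 * ((prev - z @ W.t()) ** 2).sum()
                prev = z
            loss.backward()
            opt.step()
            with torch.no_grad():                   # proximal (soft-threshold) step
                for z in codes:
                    z.copy_(F.softshrink(z, lam * lr))
        return [z.detach() for z in codes]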