No Arabic abstract
Training convolutional neural networks (CNNs) with a strict Lipschitz constraint under the l_{2} norm is useful for provable adversarial robustness, interpretable gradients and stable training. While 1-Lipschitz CNNs can be designed by enforcing a 1-Lipschitz constraint on each layer, training such networks requires each layer to have an orthogonal Jacobian matrix (for all inputs) to prevent gradients from vanishing during backpropagation. A layer with this property is said to be Gradient Norm Preserving (GNP). To construct expressive GNP activation functions, we first prove that the Jacobian of any GNP piecewise linear function is only allowed to change via Householder transformations for the function to be continuous. Building on this result, we introduce a class of nonlinear GNP activations with learnable Householder transformations called Householder activations. A householder activation parameterized by the vector $mathbf{v}$ outputs $(mathbf{I} - 2mathbf{v}mathbf{v}^{T})mathbf{z}$ for its input $mathbf{z}$ if $mathbf{v}^{T}mathbf{z} leq 0$; otherwise it outputs $mathbf{z}$. Existing GNP activations such as $mathrm{MaxMin}$ can be viewed as special cases of $mathrm{HH}$ activations for certain settings of these transformations. Thus, networks with $mathrm{HH}$ activations have higher expressive power than those with $mathrm{MaxMin}$ activations. Although networks with $mathrm{HH}$ activations have nontrivial provable robustness against adversarial attacks, we further boost their robustness by (i) introducing a certificate regularization and (ii) relaxing orthogonalization of the last layer of the network. Our experiments on CIFAR-10 and CIFAR-100 show that our regularized networks with $mathrm{HH}$ activations lead to significant improvements in both the standard and provable robust accuracy over the prior works (gain of 3.65% and 4.46% on CIFAR-100 respectively).
Recent studies have shown that deep neural networks (DNNs) are highly vulnerable to adversarial attacks, including evasion and backdoor (poisoning) attacks. On the defense side, there have been intensive interests in both empirical and provable robustness against evasion attacks; however, provable robustness against backdoor attacks remains largely unexplored. In this paper, we focus on certifying robustness against backdoor attacks. To this end, we first provide a unified framework for robustness certification and show that it leads to a tight robustness condition for backdoor attacks. We then propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks. Moreover, we evaluate the certified robustness of a family of smoothed models which are trained in a differentially private fashion, and show that they achieve better certified robustness bounds. In addition, we theoretically show that it is possible to train the robust smoothed models efficiently for simple models such as K-nearest neighbor classifiers, and we propose an exact smooth-training algorithm which eliminates the need to sample from a noise distribution. Empirically, we conduct comprehensive experiments for different machine learning (ML) models such as DNNs, differentially private DNNs, and K-NN models on MNIST, CIFAR-10 and ImageNet datasets (focusing on binary classifiers), and provide the first benchmark for certified robustness against backdoor attacks. In addition, we evaluate K-NN models on a spambase tabular dataset to demonstrate the advantages of the proposed exact algorithm. Both the theoretical analysis and the comprehensive benchmark on diverse ML models and datasets shed lights on further robust learning strategies against training time attacks or other general adversarial attacks.
Following the recent adoption of deep neural networks (DNN) accross a wide range of applications, adversarial attacks against these models have proven to be an indisputable threat. Adversarial samples are crafted with a deliberate intention of undermining a system. In the case of DNNs, the lack of better understanding of their working has prevented the development of efficient defenses. In this paper, we propose a new defense method based on practical observations which is easy to integrate into models and performs better than state-of-the-art defenses. Our proposed solution is meant to reinforce the structure of a DNN, making its prediction more stable and less likely to be fooled by adversarial samples. We conduct an extensive experimental study proving the efficiency of our method against multiple attacks, comparing it to numerous defenses, both in white-box and black-box setups. Additionally, the implementation of our method brings almost no overhead to the training procedure, while maintaining the prediction performance of the original model on clean samples.
Attention-based networks have achieved state-of-the-art performance in many computer vision tasks, such as image classification. Unlike Convolutional Neural Network (CNN), the major part of the vanilla Vision Transformer (ViT) is the attention block that brings the power of mimicking the global context of the input image. This power is data hunger and hence, the larger the training data the better the performance. To overcome this limitation, many ViT-based networks, or hybrid-ViT, have been proposed to include local context during the training. The robustness of ViTs and its variants against adversarial attacks has not been widely invested in the literature. Some robustness attributes were revealed in few previous works and hence, more insight robustness attributes are yet unrevealed. This work studies the robustness of ViT variants 1) against different $L_p$-based adversarial attacks in comparison with CNNs and 2) under Adversarial Examples (AEs) after applying preprocessing defense methods. To that end, we run a set of experiments on 1000 images from ImageNet-1k and then provide an analysis that reveals that vanilla ViT or hybrid-ViT are more robust than CNNs. For instance, we found that 1) Vanilla ViTs or hybrid-ViTs are more robust than CNNs under $L_0$, $L_1$, $L_2$, $L_infty$-based, and Color Channel Perturbations (CCP) attacks. 2) Vanilla ViTs are not responding to preprocessing defenses that mainly reduce the high frequency components while, hybrid-ViTs are more responsive to such defense. 3) CCP can be used as a preprocessing defense and larger ViT variants are found to be more responsive than other models. Furthermore, feature maps, attention maps, and Grad-CAM visualization jointly with image quality measures, and perturbations energy spectrum are provided for an insight understanding of attention-based models.
We analyze the properties of adversarial training for learning adversarially robust halfspaces in the presence of agnostic label noise. Denoting $mathsf{OPT}_{p,r}$ as the best robust classification error achieved by a halfspace that is robust to perturbations of $ell_{p}$ balls of radius $r$, we show that adversarial training on the standard binary cross-entropy loss yields adversarially robust halfspaces up to (robust) classification error $tilde O(sqrt{mathsf{OPT}_{2,r}})$ for $p=2$, and $tilde O(d^{1/4} sqrt{mathsf{OPT}_{infty, r}} + d^{1/2} mathsf{OPT}_{infty,r})$ when $p=infty$. Our results hold for distributions satisfying anti-concentration properties enjoyed by log-concave isotropic distributions among others. We additionally show that if one instead uses a nonconvex sigmoidal loss, adversarial training yields halfspaces with an improved robust classification error of $O(mathsf{OPT}_{2,r})$ for $p=2$, and $O(d^{1/4}mathsf{OPT}_{infty, r})$ when $p=infty$. To the best of our knowledge, this is the first work to show that adversarial training provably yields robust classifiers in the presence of noise.
We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $ell_infty$ norm less than $epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.