We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $\epsilon$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness for all base classifiers with $p>2$ must satisfy $\mathbb{E}[\eta_i^2]=\Omega(d^{1-2/p}\epsilon^2(1-\delta)/\delta^2)$ for 99% of the features (pixels) of the vector $\eta\sim\mathcal{D}$, where $\epsilon$ is the robust radius and $\delta$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
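As a rough numerical illustration of this bound (a minimal sketch: the hidden constant in the $\Omega(\cdot)$ notation is unknown and is taken to be 1 here, and the dimension, radius, and gap below are illustrative values, not figures from the paper):

```python
import math

def min_noise_std(d, p, eps, delta):
    """Per-feature noise std implied by E[eta_i^2] = Omega(d^(1-2/p) * eps^2 * (1-delta)/delta^2),
    with the unknown Omega-constant set to 1 for illustration."""
    return math.sqrt(d ** (1.0 - 2.0 / p) * eps ** 2 * (1.0 - delta) / delta ** 2)

# ImageNet-scale input (d = 224*224*3), l_inf threat model (p -> infinity),
# radius eps = 2/255 on a [0,1] pixel scale, and a score gap delta = 0.1.
d, p, eps, delta = 224 * 224 * 3, float("inf"), 2.0 / 255.0, 0.1
print(min_noise_std(d, p, eps, delta))  # ~29, i.e. the required noise dwarfs the [0,1] pixel range
```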
In this paper we introduce a provably stable architecture for Neural Ordinary Differential Equations (ODEs) which achieves non-trivial adversarial robustness under white-box adversarial attacks even when the network is trained naturally. Most existing defense methods that withstand strong white-box attacks need to train the network adversarially to improve its robustness, and hence must strike a trade-off between natural accuracy and adversarial robustness. Inspired by dynamical systems theory, we design a stabilized neural ODE network named SONet whose ODE blocks are skew-symmetric and proven to be input-output stable. With natural training, SONet can achieve robustness comparable to state-of-the-art adversarial defense methods, without sacrificing natural accuracy. Even replacing only the first layer of a ResNet by such an ODE block yields a further improvement in robustness; e.g., under the PGD-20 ($\ell_\infty=0.031$) attack on the CIFAR-10 dataset, it achieves 91.57% natural accuracy and 62.35% robust accuracy, while a counterpart ResNet architecture trained with TRADES achieves 76.29% natural accuracy and 45.24% robust accuracy. To understand possible reasons behind this surprisingly good result, we further explore the mechanism underlying such adversarial robustness. We show that the adaptive-stepsize numerical ODE solver, DOPRI5, has a gradient-masking effect that defeats PGD attacks, which are sensitive to gradient information of the training loss; on the other hand, it cannot fool the CW attack, which uses robust gradients, or the gradient-free SPSA attack. This provides a new explanation that the adversarial robustness of ODE-based networks mainly comes from the obfuscated gradients in numerical ODE solvers.
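A minimal sketch of what such a skew-symmetric ODE block could look like; the exact parameterization, activation, damping term, and solver settings are assumptions for illustration, not necessarily the formulation used for SONet. The sketch uses torchdiffeq, whose adaptive solver `dopri5` corresponds to DOPRI5:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # adaptive-stepsize solvers, including dopri5

class SkewSymmetricODEFunc(nn.Module):
    """dx/dt = tanh((W - W^T - gamma*I) x + b): constraining the weight matrix to be
    skew-symmetric (plus a small damping term) keeps the linear part's spectrum in the
    closed left half-plane, which is one way to promote input-output stability."""
    def __init__(self, dim, gamma=0.1):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(dim))
        self.gamma = gamma

    def forward(self, t, x):
        A = self.W - self.W.t() - self.gamma * torch.eye(self.W.shape[0], device=x.device)
        return torch.tanh(x @ A.t() + self.b)

class ODEBlock(nn.Module):
    """Integrates the ODE from t=0 to t=T and returns the terminal state."""
    def __init__(self, func, T=1.0):
        super().__init__()
        self.func, self.T = func, T

    def forward(self, x):
        t = torch.tensor([0.0, self.T], device=x.device)
        return odeint(self.func, x, t, method="dopri5")[-1]
```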
Deep Neural Networks (DNNs) can be easily fooled by Adversarial Examples (AEs) whose differences from the original samples are imperceptible to human eyes. To keep the difference imperceptible, existing attacks bound the adversarial perturbations by the $\ell_\infty$ norm, which then serves as the standard for aligning different attacks in a fair comparison. However, when investigating attack transferability, i.e., the capability of AEs crafted on one surrogate DNN to fool other black-box DNNs, we find that using only the $\ell_\infty$ norm is not sufficient to measure the attack strength, according to our comprehensive experiments covering 7 transfer-based attacks, 4 white-box surrogate models, and 9 black-box victim models. Specifically, we find that the $\ell_2$ norm greatly affects transferability in $\ell_\infty$ attacks. Since AEs with larger perturbations naturally transfer better, we advocate that the strength of all attacks should be measured by both the widely used $\ell_\infty$ norm and the $\ell_2$ norm. Although our conclusion and advocacy may seem intuitive, they are necessary for the community, because common evaluations (bounding only the $\ell_\infty$ norm) allow tricky enhancement of attack transferability by increasing the attack strength ($\ell_2$ norm), as shown by our simple counter-example method, and the good transferability of several existing methods may be due to their large $\ell_2$ distances.
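A minimal sketch of the measurement this advocates, reporting both norms of a perturbation (the tensor names are hypothetical):

```python
import torch

def perturbation_norms(x_adv: torch.Tensor, x: torch.Tensor):
    """Report both the l_inf and l_2 norms of the adversarial perturbation, per example.
    Images are flattened so the norms are taken over all pixels and channels."""
    delta = (x_adv - x).flatten(start_dim=1)
    linf = delta.abs().max(dim=1).values
    l2 = delta.norm(p=2, dim=1)
    return linf, l2

# Two AEs with the same l_inf budget can have very different l_2 norms,
# so reporting only l_inf can hide an effectively stronger (more transferable) attack.
```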
Several recent works have shown that state-of-the-art classifiers are vulnerable to worst-case (i.e., adversarial) perturbations of the data points. On the other hand, it has been empirically observed that these same classifiers are relatively robust to random noise. In this paper, we propose to study a \textit{semi-random} noise regime that generalizes both the random and worst-case noise regimes. We propose the first quantitative analysis of the robustness of nonlinear classifiers in this general noise regime. We establish precise theoretical bounds on the robustness of classifiers in this general regime, which depend on the curvature of the classifier's decision boundary. Our bounds confirm and quantify the empirical observation that classifiers satisfying curvature constraints are robust to random noise. Moreover, we quantify the robustness of classifiers in terms of the subspace dimension in the semi-random noise regime, and show that our bounds remarkably interpolate between the worst-case and random noise regimes. We perform experiments and show that the derived bounds provide very accurate estimates when applied to various state-of-the-art deep neural networks and datasets. This result suggests bounds on the curvature of the classifiers' decision boundaries that we support experimentally, and more generally offers important insights into the geometry of high-dimensional classification problems.
This paper investigates the theory of robustness against adversarial attacks. It focuses on the family of randomization techniques that consist in injecting noise into the network at inference time. These techniques have proven effective in many contexts but lack theoretical arguments. We close this gap by presenting a theoretical analysis of these approaches, hence explaining why they perform well in practice. More precisely, we make two new contributions. The first one relates the randomization rate to robustness against adversarial attacks. This result applies to the general family of exponential distributions, and thus extends and unifies the previous approaches. The second contribution consists in devising a new upper bound on the adversarial generalization gap of randomized neural networks. We support our theoretical claims with a set of experiments.
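A minimal sketch of the kind of inference-time noise injection analyzed here, using Gaussian or Laplace noise as concrete members of the exponential family; the averaging over draws and the parameter values are illustrative assumptions, not the paper's exact procedure:

```python
import torch

def randomized_predict(model, x, noise="gaussian", sigma=0.25, n_samples=32):
    """Inject i.i.d. noise from an exponential-family distribution (Gaussian or Laplace)
    into the input at inference time and average the class probabilities over draws."""
    probs = 0.0
    for _ in range(n_samples):
        if noise == "gaussian":
            eta = sigma * torch.randn_like(x)
        else:  # Laplace noise, another exponential-family member
            eta = torch.distributions.Laplace(0.0, sigma).sample(x.shape).to(x.device)
        probs = probs + torch.softmax(model(x + eta), dim=-1)
    return probs / n_samples
```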
Great advancement in deep neural networks (DNNs) has led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which raises great concerns when deploying these models in safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: 1) empirical defenses, which can be adaptively attacked again without providing robustness certification; and 2) certifiably robust approaches, which consist of robustness verification, providing a lower bound on robust accuracy against any attack under certain conditions, and corresponding robust training approaches. In this paper, we focus on these certifiably robust approaches and provide the first work to perform a large-scale systematic analysis of different robustness verification and training approaches. In particular, we 1) provide a taxonomy for the robustness verification and training approaches, as well as discuss the detailed methodologies of representative algorithms, 2) reveal the fundamental connections among these approaches, 3) discuss current research progress, theoretical barriers, main challenges, and several promising future directions for certified defenses for DNNs, and 4) provide an open-source unified platform to evaluate 20+ representative verification and corresponding robust training approaches on a wide range of DNNs.