No Arabic abstract
Deep Neural Networks (DNNs) could be easily fooled by Adversarial Examples (AEs) with the imperceptible difference to original samples in human eyes. To keep the difference imperceptible, the existing attacking bound the adversarial perturbations by the $ell_infty$ norm, which is then served as the standard to align different attacks for a fair comparison. However, when investigating attack transferability, i.e., the capability of the AEs from attacking one surrogate DNN to cheat other black-box DNN, we find that only using the $ell_infty$ norm is not sufficient to measure the attack strength, according to our comprehensive experiments concerning 7 transfer-based attacks, 4 white-box surrogate models, and 9 black-box victim models. Specifically, we find that the $ell_2$ norm greatly affects the transferability in $ell_infty$ attacks. Since larger-perturbed AEs naturally bring about better transferability, we advocate that the strength of all attacks should be measured by both the widely used $ell_infty$ and also the $ell_2$ norm. Despite the intuitiveness of our conclusion and advocacy, they are very necessary for the community, because common evaluations (bounding only the $ell_infty$ norm) allow tricky enhancements of the attack transferability by increasing the attack strength ($ell_2$ norm) as shown by our simple counter-example method, and the good transferability of several existing methods may be due to their large $ell_2$ distances.
Adversarial attacks aim to confound machine learning systems, while remaining virtually imperceptible to humans. Attacks on image classification systems are typically gauged in terms of $p$-norm distortions in the pixel feature space. We perform a behavioral study, demonstrating that the pixel $p$-norm for any $0le p le infty$, and several alternative measures including earth movers distance, structural similarity index, and deep net embedding, do not fit human perception. Our result has the potential to improve the understanding of adversarial attack and defense strategies.
Evaluating adversarial robustness amounts to finding the minimum perturbation needed to have an input sample misclassified. The inherent complexity of the underlying optimization requires current gradient-based attacks to be carefully tuned, initialized, and possibly executed for many computationally-demanding iterations, even if specialized to a given perturbation model. In this work, we overcome these limitations by proposing a fast minimum-norm (FMN) attack that works with different $ell_p$-norm perturbation models ($p=0, 1, 2, infty$), is robust to hyperparameter choices, does not require adversarial starting points, and converges within few lightweight steps. It works by iteratively finding the sample misclassified with maximum confidence within an $ell_p$-norm constraint of size $epsilon$, while adapting $epsilon$ to minimize the distance of the current sample to the decision boundary. Extensive experiments show that FMN significantly outperforms existing attacks in terms of convergence speed and computation time, while reporting comparable or even smaller perturbation sizes.
The vulnerability of machine learning systems to adversarial attacks questions their usage in many applications. In this paper, we propose a randomized diversification as a defense strategy. We introduce a multi-channel architecture in a gray-box scenario, which assumes that the architecture of the classifier and the training data set are known to the attacker. The attacker does not only have access to a secret key and to the internal states of the system at the test time. The defender processes an input in multiple channels. Each channel introduces its own randomization in a special transform domain based on a secret key shared between the training and testing stages. Such a transform based randomization with a shared key preserves the gradients in key-defined sub-spaces for the defender but it prevents gradient back propagation and the creation of various bypass systems for the attacker. An additional benefit of multi-channel randomization is the aggregation that fuses soft-outputs from all channels, thus increasing the reliability of the final score. The sharing of a secret key creates an information advantage to the defender. Experimental evaluation demonstrates an increased robustness of the proposed method to a number of known state-of-the-art attacks.
Diffusion is a fundamental graph procedure and has been a basic building block in a wide range of theoretical and empirical applications such as graph partitioning and semi-supervised learning on graphs. In this paper, we study computationally efficient diffusion primitives beyond random walk. We design an $widetilde{O}(m)$-time randomized algorithm for the $ell_2$-norm flow diffusion problem, a recently proposed diffusion model based on network flow with demonstrated graph clustering related applications both in theory and in practice. Examples include finding locally-biased low conductance cuts. Using a known connection between the optimal dual solution of the flow diffusion problem and the local cut structure, our algorithm gives an alternative approach for finding such cuts in nearly linear time. From a technical point of view, our algorithm contributes a novel way of dealing with inequality constraints in graph optimization problems. It adapts the high-level algorithmic framework of nearly linear time Laplacian system solvers, but requires several new tools: vertex elimination under constraints, a new family of graph ultra-sparsifiers, and accelerated proximal gradient methods with inexact proximal mapping computation.
We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $ell_p$ ball of radius $epsilon$ when $p>2$. Although random smoothing has been well understood for the $ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $ell_infty$ threat model. In this work, we show that any noise distribution $mathcal{D}$ over $mathbb{R}^d$ that provides $ell_p$ robustness for all base classifiers with $p>2$ must satisfy $mathbb{E}eta_i^2=Omega(d^{1-2/p}epsilon^2(1-delta)/delta^2)$ for 99% of the features (pixels) of vector $etasimmathcal{D}$, where $epsilon$ is the robust radius and $delta$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.