ترغب بنشر مسار تعليمي؟ اضغط هنا

Attacking Adversarial Attacks as A Defense

109   0   0.0 ( 0 )
 نشر من قبل Boxi Wu
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

It is well known that adversarial attacks can fool deep neural networks with imperceptible perturbations. Although adversarial training significantly improves model robustness, failure cases of defense still broadly exist. In this work, we find that the adversarial attacks can also be vulnerable to small perturbations. Namely, on adversarially-trained models, perturbing adversarial examples with a small random noise may invalidate their misled predictions. After carefully examining state-of-the-art attacks of various kinds, we find that all these attacks have this deficiency to different extents. Enlightened by this finding, we propose to counter attacks by crafting more effective defensive perturbations. Our defensive perturbations leverage the advantage that adversarial training endows the ground-truth class with smaller local Lipschitzness. By simultaneously attacking all the classes, the misled predictions with larger Lipschitzness can be flipped into correct ones. We verify our defensive perturbation with both empirical experiments and theoretical analyses on a linear model. On CIFAR10, it boosts the state-of-the-art model from 66.16% to 72.66% against the four attacks of AutoAttack, including 71.76% to 83.30% against the Square attack. On ImageNet, the top-1 robust accuracy of FastAT is improved from 33.18% to 38.54% under the 100-step PGD attack.



قيم البحث

اقرأ أيضاً

127 - Bin Zhu , Zhaoquan Gu , Le Wang 2021
Recent work shows that deep neural networks are vulnerable to adversarial examples. Much work studies adversarial example generation, while very little work focuses on more critical adversarial defense. Existing adversarial detection methods usually make assumptions about the adversarial example and attack method (e.g., the word frequency of the adversarial example, the perturbation level of the attack method). However, this limits the applicability of the detection method. To this end, we propose TREATED, a universal adversarial detection method that can defend against attacks of various perturbation levels without making any assumptions. TREATED identifies adversarial examples through a set of well-designed reference models. Extensive experiments on three competitive neural networks and two widely used datasets show that our method achieves better detection performance than baselines. We finally conduct ablation studies to verify the effectiveness of our method.
211 - Zifei Zhang , Kai Qiao , Jian Chen 2020
Though deep neural networks perform challenging tasks excellently, they are susceptible to adversarial examples, which mislead classifiers by applying human-imperceptible perturbations on clean inputs. Under the query-free black-box scenario, adversa rial examples are hard to transfer to unknown models, and several methods have been proposed with the low transferability. To settle such issue, we design a max-min framework inspired by input transformations, which are benificial to both the adversarial attack and defense. Explicitly, we decrease loss values with inputs affline transformations as a defense in the minimum procedure, and then increase loss values with the momentum iterative algorithm as an attack in the maximum procedure. To further promote transferability, we determine transformed values with the max-min theory. Extensive experiments on Imagenet demonstrate that our defense-guided transferable attacks achieve impressive increase on transferability. Experimentally, we show that our ASR of adversarial attack reaches to 58.38% on average, which outperforms the state-of-the-art method by 12.1% on the normally trained models and by 11.13% on the adversarially trained models. Additionally, we provide elucidative insights on the improvement of transferability, and our method is expected to be a benchmark for assessing the robustness of deep models.
Adversarial attacks expose important vulnerabilities of deep learning models, yet little attention has been paid to settings where data arrives as a stream. In this paper, we formalize the online adversarial attack problem, emphasizing two key elemen ts found in real-world use-cases: attackers must operate under partial knowledge of the target model, and the decisions made by the attacker are irrevocable since they operate on a transient data stream. We first rigorously analyze a deterministic variant of the online threat model by drawing parallels to the well-studied $k$-secretary problem in theoretical computer science and propose Virtual+, a simple yet practical online algorithm. Our main theoretical result show Virtual+ yields provably the best competitive ratio over all single-threshold algorithms for $k<5$ -- extending previous analysis of the $k$-secretary problem. We also introduce the textit{stochastic $k$-secretary} -- effectively reducing online blackbox transfer attacks to a $k$-secretary problem under noise -- and prove theoretical bounds on the performance of textit{any} online algorithms adapted to this setting. Finally, we complement our theoretical results by conducting experiments on both MNIST and CIFAR-10 with both vanilla and robust classifiers, revealing not only the necessity of online algorithms in achieving near-optimal performance but also the rich interplay of a given attack strategy towards online attack selection, enabling simple strategies like FGSM to outperform classically strong whitebox adversaries.
This paper introduces stochastic sparse adversarial attacks (SSAA), simple, fast and purely noise-based targeted and untargeted $L_0$ attacks of neural network classifiers (NNC). SSAA are devised by exploiting a simple small-time expansion idea widel y used for Markov processes and offer new examples of $L_0$ attacks whose studies have been limited. They are designed to solve the known scalability issue of the family of Jacobian-based saliency maps attacks to large datasets and they succeed in solving it. Experiments on small and large datasets (CIFAR-10 and ImageNet) illustrate further advantages of SSAA in comparison with the-state-of-the-art methods. For instance, in the untargeted case, our method called Voting Folded Gaussian Attack (VFGA) scales efficiently to ImageNet and achieves a significantly lower $L_0$ score than SparseFool (up to $frac{2}{5}$ lower) while being faster. Moreover, VFGA achieves better $L_0$ scores on ImageNet than Sparse-RS when both attacks are fully successful on a large number of samples. Codes are publicly available through the link https://github.com/SSAA3/stochastic-sparse-adv-attacks
78 - Yueyao Yu , Pengfei Yu , Wenye Li 2019
Deep learning models are vulnerable to adversarial examples, which poses an indisputable threat to their applications. However, recent studies observe gradient-masking defenses are self-deceiving methods if an attacker can realize this defense. In th is paper, we propose a new defense method based on appending information. We introduce the Aux Block model to produce extra outputs as a self-ensemble algorithm and analytically investigate the robustness mechanism of Aux Block. We have empirically studied the efficiency of our method against adversarial examples in two types of white-box attacks, and found that even in the full white-box attack where an adversary can craft malicious examples from defense models, our method has a more robust performance of about 54.6% precision on Cifar10 dataset and 38.7% precision on Mini-Imagenet dataset. Another advantage of our method is that it is able to maintain the prediction accuracy of the classification model on clean images, and thereby exhibits its high potential in practical applications

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا