
Gotta Catch 'Em All: Using Honeypots to Catch Adversarial Attacks on Neural Networks

Published by: Shawn Shan
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Numerous efforts either try to patch weaknesses in trained models, or try to make it difficult or costly to compute adversarial examples that exploit them. In our work, we explore a new honeypot approach to protect DNN models. We intentionally inject trapdoors, honeypot weaknesses in the classification manifold that attract attackers searching for adversarial examples. Attackers' optimization algorithms gravitate towards trapdoors, leading them to produce attacks similar to trapdoors in the feature space. Our defense then identifies attacks by comparing the neuron activation signatures of inputs to those of trapdoors. In this paper, we introduce trapdoors and describe an implementation of a trapdoor-enabled defense. First, we analytically prove that trapdoors shape the computation of adversarial attacks so that attack inputs will have feature representations very similar to those of trapdoors. Second, we experimentally show that trapdoor-protected models can detect, with high accuracy, adversarial examples generated by state-of-the-art attacks (PGD, optimization-based CW, Elastic Net, BPDA), with negligible impact on normal classification. These results generalize across classification domains, including image, facial, and traffic-sign recognition. We also present significant results measuring the robustness of trapdoors against customized adaptive attacks (countermeasures).
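
As a rough illustration of the detection step described in the abstract, the sketch below (in PyTorch) compares the internal activation signature of an incoming input against the mean signature of trapdoor-stamped inputs using cosine similarity. The toy feature extractor, the similarity threshold, and the variable names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def layer_signature(feature_extractor, x):
    """Feature representation of x at the monitored internal layer."""
    with torch.no_grad():
        return feature_extractor(x).flatten(start_dim=1)

def detect(feature_extractor, x, trapdoor_signature, threshold=0.85):
    """Flag inputs whose internal representation gravitates toward the trapdoor."""
    sig = layer_signature(feature_extractor, x)
    cos = F.cosine_similarity(sig, trapdoor_signature.unsqueeze(0), dim=1)
    return cos > threshold  # True -> likely an adversarial example caught by the trapdoor

# Toy example: build the trapdoor signature from trapdoor-stamped inputs,
# then screen incoming queries against it.
extractor = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
trapdoored_inputs = torch.rand(64, 3, 32, 32)        # inputs carrying the trapdoor pattern
trapdoor_signature = layer_signature(extractor, trapdoored_inputs).mean(dim=0)
flags = detect(extractor, torch.rand(8, 3, 32, 32), trapdoor_signature)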




Read also

The existence of adversarial examples underscores the importance of understanding the robustness of machine learning models. Bayesian neural networks (BNNs), due to their calibrated uncertainty, have been shown to possess favorable adversarial robustness properties. However, when approximate Bayesian inference methods are employed, the adversarial robustness of BNNs is still not well understood. In this work, we employ gradient-free optimization methods to find adversarial examples for BNNs. In particular, we consider genetic algorithms, surrogate models, and zeroth-order optimization methods, and adapt them to the goal of finding adversarial examples for BNNs. In an empirical evaluation on the MNIST and Fashion MNIST datasets, we show that for various approximate Bayesian inference methods the use of gradient-free algorithms can greatly improve the rate of finding adversarial examples compared to state-of-the-art gradient-based methods.
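
As a hedged illustration of one gradient-free ingredient mentioned above, the sketch below estimates a loss gradient with zeroth-order finite differences and takes a single projected ascent step. The loss function, query budget, and step sizes are placeholder assumptions, not the paper's experimental setup.

import numpy as np

def zeroth_order_grad(loss_fn, x, num_queries=50, mu=1e-2):
    """Estimate grad_x loss_fn(x) from random directional finite differences."""
    grad = np.zeros_like(x)
    base = loss_fn(x)
    for _ in range(num_queries):
        u = np.random.randn(*x.shape)
        grad += (loss_fn(x + mu * u) - base) / mu * u
    return grad / num_queries

def attack_step(loss_fn, x, x_orig, step=1e-2, eps=0.1):
    """One ascent step on the adversarial loss, projected onto an L_inf ball."""
    x_new = x + step * np.sign(zeroth_order_grad(loss_fn, x))
    return np.clip(x_new, x_orig - eps, x_orig + eps)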
Recent years have witnessed the emergence and development of graph neural networks (GNNs), which have been shown to be a powerful approach to graph representation learning in many tasks, such as node classification and graph classification. Research on the robustness of these models has also started to attract attention in the machine learning field. However, most existing work in this area focuses on GNNs for node-level tasks, while little work has been done to study the robustness of GNNs for the graph classification task. In this paper, we aim to explore the vulnerability of Hierarchical Graph Pooling (HGP) neural networks, which are advanced GNNs that perform very well on graph classification in terms of prediction accuracy. We propose an adversarial attack framework for this task. Specifically, we design a surrogate model consisting of convolutional and pooling operators to generate adversarial samples that fool hierarchical GNN-based graph classification models. We set the nodes preserved by the pooling operator as our attack targets, and then perturb these targets slightly to fool the pooling operator in hierarchical GNNs so that it selects the wrong nodes to preserve. We show that the adversarial samples generated by our surrogate model on multiple datasets have enough transferability to attack current state-of-the-art graph classification models. Furthermore, we conduct robust training on the target models and demonstrate that the retrained graph classification models are better able to defend against attacks from the adversarial samples. To the best of our knowledge, this is the first work on adversarial attacks against hierarchical GNN-based graph classification models.
Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. However, recent studies have shown that GNNs are vulnerable to adversarial attacks. In this paper, we study a recently introduced, realistic attack scenario on graphs: the graph injection attack (GIA). In the GIA scenario, the adversary is not able to modify the existing link structure and node attributes of the input graph; instead, the attack is performed by injecting adversarial nodes into it. We present an analysis of the topological vulnerability of GNNs under the GIA setting, based on which we propose the Topological Defective Graph Injection Attack (TDGIA) for effective injection attacks. TDGIA first introduces a topological defective edge selection strategy to choose the original nodes to connect with the injected ones. It then designs a smooth feature optimization objective to generate the features of the injected nodes. Extensive experiments on large-scale datasets show that TDGIA can consistently and significantly outperform various attack baselines in attacking dozens of defense GNN models. Notably, the performance drop on target GNNs resulting from TDGIA is more than double the damage brought by the best attack solution among hundreds of submissions to KDD-CUP 2020.
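
The sketch below gives a schematic, simplified view of a graph injection attack in the spirit of the description above: injected nodes are wired to low-degree original nodes and their features are optimized against a differentiable surrogate. The degree-based edge selection and the plain cross-entropy objective are simplifying stand-ins, not TDGIA's exact defectiveness score or smooth feature objective.

import torch
import torch.nn.functional as F

def inject_nodes(adj, feats, surrogate, labels, n_inject=5, edges_per_node=3, steps=50, lr=0.1):
    """adj: (n, n) dense adjacency; surrogate(adj, feats) -> per-node logits."""
    n = adj.shape[0]
    # Edge selection: wire each injected node to low-degree original nodes
    # (a simple stand-in for the topological defective edge selection).
    targets = torch.argsort(adj.sum(dim=1))[: n_inject * edges_per_node]
    big_adj = torch.zeros(n + n_inject, n + n_inject)
    big_adj[:n, :n] = adj
    for i in range(n_inject):
        for t in targets[i * edges_per_node:(i + 1) * edges_per_node]:
            big_adj[n + i, t] = big_adj[t, n + i] = 1.0
    # Feature optimization: gradient ascent on the surrogate's loss with respect
    # to the injected nodes' features only; the original graph stays untouched.
    inj_feats = torch.zeros(n_inject, feats.shape[1], requires_grad=True)
    opt = torch.optim.Adam([inj_feats], lr=lr)
    for _ in range(steps):
        big_feats = torch.cat([feats, inj_feats], dim=0)
        loss = -F.cross_entropy(surrogate(big_adj, big_feats)[:n], labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return big_adj, torch.cat([feats, inj_feats.detach()], dim=0)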
Recently, object detection based on deep learning has proven to be vulnerable to adversarial patch attacks. Attackers holding a specially crafted patch can hide themselves from state-of-the-art person detectors, e.g., YOLO, even in the physical world. This kind of attack can bring serious security threats, such as escaping from surveillance cameras. In this paper, we explore in depth the problem of detecting adversarial patch attacks on object detection. First, we identify a leverageable signature of existing adversarial patches from the perspective of visualization-based explanation. A fast signature-based defense method is proposed and demonstrated to be effective. Second, we design an improved patch generation algorithm to reveal the risk that the signature-based approach may be bypassed by techniques emerging in the future. The newly generated adversarial patches can successfully evade the proposed signature-based defense. Finally, we present a novel signature-independent detection method based on internal content-semantics consistency rather than any attack-specific prior knowledge. The fundamental intuition is that the adversarial object can appear locally but disappear globally in an input image. The experiments demonstrate that the signature-independent method can effectively detect both the existing and the improved attacks. It has also proven to be a general method, detecting unforeseen and even other types of attacks without any attack-specific prior knowledge. The two proposed detection methods can be adopted in different scenarios, and we believe that combining them can offer comprehensive protection.
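
As a rough sketch of the "appears locally, disappears globally" intuition (not the paper's actual method), the code below runs the same detector on the full image and on overlapping crops and flags objects found only in the crops. The detector callable, crop size, and thresholds are hypothetical placeholders.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def inconsistent_detections(detector, image, crop=256, stride=128, thresh=0.5):
    """Boxes detected inside local crops but absent from the global pass."""
    global_boxes = detector(image)                 # detector(img) -> list of (x1, y1, x2, y2)
    suspicious = []
    h, w = image.shape[:2]
    for y in range(0, max(1, h - crop + 1), stride):
        for x in range(0, max(1, w - crop + 1), stride):
            for bx in detector(image[y:y + crop, x:x + crop]):
                box = (bx[0] + x, bx[1] + y, bx[2] + x, bx[3] + y)   # back to image coords
                if all(iou(box, g) < thresh for g in global_boxes):
                    suspicious.append(box)         # locally visible, globally missing
    return suspicious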
Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing, and other areas. In most instances where these tasks are deployed in real-world scenarios, the models used in them have been shown to be susceptible to adversarial attacks, making it imperative to address the challenge of their adversarial robustness. Existing techniques for adversarial robustness fall into three broad categories: defensive distillation techniques, adversarial training techniques, and randomized or non-deterministic model-based techniques. In this paper, we propose a novel neural network paradigm that falls under the category of randomized models for adversarial robustness, but differs from all existing techniques under this category in that it models each parameter of the network as a statistical distribution with learnable parameters. We show experimentally that this framework is highly robust to a variety of white-box and black-box adversarial attacks, while preserving the task-specific performance of the traditional neural network model.
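
A minimal sketch of the core idea, each network parameter modeled as a distribution with learnable parameters, is given below as a PyTorch layer whose weights are Gaussians resampled on every forward pass via the reparameterization trick. This is an illustrative stand-in, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticLinear(nn.Module):
    """Linear layer whose weights are Gaussians with learnable mean and scale."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.w_log_sigma = nn.Parameter(torch.full((out_features, in_features), -5.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Reparameterization: w = mu + sigma * eps keeps mu and sigma trainable
        # while making every forward pass randomized.
        eps = torch.randn_like(self.w_mu)
        w = self.w_mu + torch.exp(self.w_log_sigma) * eps
        return F.linear(x, w, self.bias)

# The same input gives different logits on repeated passes, which is the
# randomness the robustness claim relies on.
layer = StochasticLinear(784, 10)
x = torch.rand(4, 784)
logits_a, logits_b = layer(x), layer(x)   # differ because weights are resampled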
