ترغب بنشر مسار تعليمي؟ اضغط هنا

Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review

94   0   0.0 ( 0 )
 نشر من قبل Yansong Gao Dr
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attackers capability and affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and then formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning and post-deployment. Accordingly, attacks under each categorization are combed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review countermeasures, and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor.Overall, the research on defense is far behind the attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing the insights from the systematic review, we also present key areas for future research on the backdoor, such as empirical security evaluations from physical trigger attacks, and in particular, more efficient and practical countermeasures are solicited.



قيم البحث

اقرأ أيضاً

Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks, where hidden features (patterns) trained to a normal model, which is only activated by some specific input (called triggers), trick the model into producing unexpected behav ior. In this paper, we create covert and scattered triggers for backdoor attacks, invisible backdoors, where triggers can fool both DNN models and human inspection. We apply our invisible backdoors through two state-of-the-art methods of embedding triggers for backdoor attacks. The first approach on Badnets embeds the trigger into DNNs through steganography. The second approach of a trojan attack uses two types of additional regularization terms to generate the triggers with irregular shape and size. We use the Attack Success Rate and Functionality to measure the performance of our attacks. We introduce two novel definitions of invisibility for human perception; one is conceptualized by the Perceptual Adversarial Similarity Score (PASS) and the other is Learned Perceptual Image Patch Similarity (LPIPS). We show that the proposed invisible backdoors can be fairly effective across various DNN models as well as four datasets MNIST, CIFAR-10, CIFAR-100, and GTSRB, by measuring their attack success rates for the adversary, functionality for the normal users, and invisibility scores for the administrators. We finally argue that the proposed invisible backdoor attacks can effectively thwart the state-of-the-art trojan backdoor detection approaches, such as Neural Cleanse and TABOR.
The physical, black-box hard-label setting is arguably the most realistic threat model for cyber-physical vision systems. In this setting, the attacker only has query access to the model and only receives the top-1 class label without confidence info rmation. Creating small physical stickers that are robust to environmental variation is difficult in the discrete and discontinuous hard-label space because the attack must both design a small shape to perturb within and find robust noise to fill it with. Unfortunately, we find that existing $ell_2$ or $ell_infty$ minimizing hard-label attacks do not easily extend to finding such robust physical perturbation attacks. Thus, we propose GRAPHITE, the first algorithm for hard-label physical attacks on computer vision models. We show that survivability, an estimate of physical variation robustness, can be used in new ways to generate small masks and is a sufficiently smooth function to optimize with gradient-free optimization. We use GRAPHITE to attack a traffic sign classifier and a publicly-available Automatic License Plate Recognition (ALPR) tool using only query access. We evaluate both tools in real-world field tests to measure its physical-world robustness. We successfully cause a Stop sign to be misclassified as a Speed Limit 30 km/hr sign in 95.7% of physical images and cause errors in 75% of physical images for the ALPR tool.
Although deep neural networks (DNNs) have made rapid progress in recent years, they are vulnerable in adversarial environments. A malicious backdoor could be embedded in a model by poisoning the training dataset, whose intention is to make the infect ed model give wrong predictions during inference when the specific trigger appears. To mitigate the potential threats of backdoor attacks, various backdoor detection and defense methods have been proposed. However, the existing techniques usually require the poisoned training data or access to the white-box model, which is commonly unavailable in practice. In this paper, we propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. We introduce a gradient-free optimization algorithm to reverse-engineer the potential trigger for each class, which helps to reveal the existence of backdoor attacks. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models. Extensive experiments on hundreds of DNN models trained on several datasets corroborate the effectiveness of our method under the black-box setting against various backdoor attacks.
Numerous previous works have studied deep learning algorithms applied in the context of side-channel attacks, which demonstrated the ability to perform successful key recoveries. These studies show that modern cryptographic devices are increasingly t hreatened by side-channel attacks with the help of deep learning. However, the existing countermeasures are designed to resist classical side-channel attacks, and cannot protect cryptographic devices from deep learning based side-channel attacks. Thus, there arises a strong need for countermeasures against deep learning based side-channel attacks. Although deep learning has the high potential in solving complex problems, it is vulnerable to adversarial attacks in the form of subtle perturbations to inputs that lead a model to predict incorrectly. In this paper, we propose a kind of novel countermeasures based on adversarial attacks that is specifically designed against deep learning based side-channel attacks. We estimate several models commonly used in deep learning based side-channel attacks to evaluate the proposed countermeasures. It shows that our approach can effectively protect cryptographic devices from deep learning based side-channel attacks in practice. In addition, our experiments show that the new countermeasures can also resist classical side-channel attacks.
Deep Neural Networks (DNNs) have been utilized in various applications ranging from image classification and facial recognition to medical imagery analysis and real-time object detection. As our models become more sophisticated and complex, the compu tational cost of training such models becomes a burden for small companies and individuals; for this reason, outsourcing the training process has been the go-to option for such users. Unfortunately, outsourcing the training process comes at the cost of vulnerability to backdoor attacks. These attacks aim at establishing hidden backdoors in the DNN such that the model performs well on benign samples but outputs a particular target label when a trigger is applied to the input. Current backdoor attacks rely on generating triggers in the image/pixel domain; however, as we show in this paper, it is not the only domain to exploit and one should always check the other doors. In this work, we propose a complete pipeline for generating a dynamic, efficient, and invisible backdoor attack in the frequency domain. We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks through extensive experiments on various datasets and network architectures. The backdoored models are shown to break various state-of-the-art defences. We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them. We conclude the work with some remarks regarding a networks learning capacity and the capability of embedding a backdoor attack in the model.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا