
Adversarial YOLO: Defense Human Detection Patch Attacks via Detecting Adversarial Patches

Added by Xueshuang Xiang
Publication date: 2021
Language: English





The security of object detection systems has attracted increasing attention, especially when facing adversarial patch attacks. Since patch attacks change the pixels only within a restricted area on objects, they are easy to implement in the physical world, especially for attacking human detection systems. Existing defenses against patch attacks are mostly designed for image classification problems and have difficulty resisting human detection attacks. To address this critical issue, we propose an efficient and effective plug-in defense component for the YOLO detection system, which we name Ad-YOLO. The main idea is to add a patch class to the YOLO architecture, which incurs a negligible increase in inference cost. Thus, Ad-YOLO is expected to directly detect both the objects of interest and adversarial patches. To the best of our knowledge, our approach is the first defense strategy against human detection attacks. We investigate Ad-YOLO's performance on the YOLOv2 baseline. To improve Ad-YOLO's ability to detect a variety of patches, we first use an adversarial training process to develop a patch dataset based on the Inria dataset, which we name Inria-Patch. Then, we train Ad-YOLO on a combination of the Pascal VOC, Inria, and Inria-Patch datasets. With a slight drop of $0.70\%$ mAP on the VOC 2007 test set, Ad-YOLO achieves $80.31\%$ AP on persons, which substantially outperforms the $33.93\%$ AP of YOLOv2 when facing white-box patch attacks. Furthermore, results against a physical-world attack are also included to demonstrate Ad-YOLO's excellent generalization ability compared with YOLOv2.
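To make the core idea concrete, here is a minimal sketch, assuming a YOLOv2-style detection head, of how appending a single patch class changes the output layer. The class and parameter names are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of the Ad-YOLO idea: a YOLOv2-style 1x1 conv detection
# head extended with one extra "adversarial patch" class. All names and
# defaults here are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class AdYOLOHead(nn.Module):
    def __init__(self, in_channels=1024, num_anchors=5, num_classes=20):
        super().__init__()
        # Each anchor predicts (tx, ty, tw, th, objectness) plus class
        # scores. Ad-YOLO appends one class, so the per-anchor vector grows
        # by a single entry -- a negligible inference increment.
        out_channels = num_anchors * (5 + num_classes + 1)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, features):
        # features: backbone output, e.g. (N, 1024, 13, 13) for 416x416 input
        return self.conv(features)

head = AdYOLOHead()
out = head(torch.randn(1, 1024, 13, 13))
print(out.shape)  # (1, 130, 13, 13) vs. (1, 125, 13, 13) for plain YOLOv2
```

Because only the final 1x1 convolution gains a few output channels, both the parameter count and the inference cost grow negligibly, which matches the plug-in character claimed above.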



Related research

Ali Borji, 2020
Humans rely heavily on shape information to recognize objects. In contrast, convolutional neural networks (CNNs) are biased more towards texture, which is perhaps the main reason why CNNs are vulnerable to adversarial examples. Here, we explore how shape bias can be incorporated into CNNs to improve their robustness. Two algorithms are proposed, based on the observation that edges are invariant to moderate, imperceptible perturbations. In the first, a classifier is adversarially trained on images with the edge map as an additional channel; at inference time, the edge map is recomputed and concatenated to the image. In the second, a conditional GAN is trained to translate edge maps, from clean and/or perturbed images, into clean images, and inference is done on the generated image corresponding to the input's edge map. Extensive experiments over 10 datasets demonstrate the effectiveness of the proposed algorithms against FGSM and $\ell_\infty$ PGD-40 attacks. Further, we show that a) edge information can also benefit other adversarial training methods, and b) CNNs trained on edge-augmented inputs are more robust against natural image corruptions such as motion blur, impulse noise, and JPEG compression than CNNs trained solely on RGB images. From a broader perspective, our study suggests that CNNs do not adequately account for image structures that are crucial for robustness. Code is available at https://github.com/aliborji/Shapedefence.git.
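The first algorithm lends itself to a short illustration. The sketch below, a hedged approximation, computes a Canny edge map and appends it as a fourth input channel; the edge detector, thresholds, and image size are assumptions for illustration rather than the authors' exact choices.

```python
# Sketch of edge-augmented input: recompute the edge map at inference time
# and concatenate it to the RGB image as a fourth channel.
import numpy as np
import cv2  # OpenCV, used here for Canny edge detection

def with_edge_channel(rgb_image: np.ndarray) -> np.ndarray:
    """Append a Canny edge map to an HxWx3 uint8 image, giving HxWx4."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    # Edges stay largely stable under small imperceptible perturbations,
    # which is the robustness signal the classifier learns to exploit.
    edges = cv2.Canny(gray, threshold1=100, threshold2=200)
    return np.dstack([rgb_image, edges])

x = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
print(with_edge_channel(x).shape)  # (224, 224, 4)
```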
Although great progress has been made on adversarial attacks for deep neural networks (DNNs), their transferability is still unsatisfactory, especially for targeted attacks. Two long-overlooked problems lie behind this: 1) the conventional setting of $T$ iterations with a step size of $\epsilon/T$ to comply with the $\epsilon$-constraint, under which most pixels are allowed to add only very small noise, much less than $\epsilon$; and 2) the usual practice of manipulating pixel-wise noise, even though the features of a pixel extracted by DNNs are influenced by its surrounding regions, and different DNNs generally focus on different discriminative regions in recognition. To tackle these issues, our previous work proposed a patch-wise iterative method (PIM) aimed at crafting adversarial examples with high transferability. Specifically, we introduce an amplification factor to the step size in each iteration, and a pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel. However, targeted attacks aim to push the adversarial examples into the territory of a specific class, and the amplification factor may lead to underfitting. Thus, we introduce a temperature and propose a patch-wise++ iterative method (PIM++) to further improve transferability without significantly sacrificing the performance of the white-box attack. Our method can be generally integrated into any gradient-based attack method. Compared with the current state-of-the-art attack methods, we significantly improve the success rate by 33.1% for defense models and 31.4% for normally trained models on average.
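The following sketch illustrates one PIM-style update step under the assumptions stated in the abstract: an amplification factor enlarges the per-iteration step size, and noise overflowing the $\epsilon$-ball is redistributed to neighboring pixels through a project kernel. The uniform kernel and all hyperparameter values are illustrative guesses, not the paper's settings.

```python
# One patch-wise (PIM-style) iterative update, simplified for illustration.
import torch
import torch.nn.functional as F

def pim_step(x_adv, x, grad, eps=16 / 255, iters=10, beta=10.0, kern=3):
    alpha = beta * eps / iters                 # amplified step size
    x_new = x_adv + alpha * grad.sign()
    clipped = torch.clamp(x_new, x - eps, x + eps)
    overflow = x_new - clipped                 # noise exceeding the eps-ball
    # Redistribute the overflow onto each pixel's neighborhood with a
    # uniform project kernel (one depthwise convolution per RGB channel).
    kernel = torch.ones(3, 1, kern, kern) / kern ** 2
    projected = F.conv2d(overflow, kernel, padding=kern // 2, groups=3)
    return torch.clamp(clipped + projected, x - eps, x + eps).clamp(0, 1)

x = torch.rand(1, 3, 224, 224)
print(pim_step(x.clone(), x, torch.randn_like(x)).shape)  # (1, 3, 224, 224)
```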
Machine learning models are increasingly used in industry to make decisions such as credit insurance approval. Some people may be tempted to manipulate specific variables, such as age or salary, to improve their chances of approval. In this ongoing work, we propose to discuss, with a first proposition, the issue of detecting a potential local adversarial example on classical tabular data by providing a human expert with the locally critical features for the classifier's decision, in order to control the provided information and avoid fraud.
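As a rough illustration of surfacing "locally critical features" to a reviewer, the sketch below ranks the features of a single tabular input by how strongly a small nudge to each one moves the classifier's score. The perturbation probe, the model, and the synthetic data are all assumptions; the paper's actual criterion is not specified here.

```python
# Hypothetical probe for locally critical features on tabular data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 5)), rng.integers(0, 2, 500)
clf = LogisticRegression().fit(X, y)

def locally_critical(clf, x, delta=0.1):
    """Rank features by local sensitivity of the positive-class score."""
    base = clf.predict_proba(x[None])[0, 1]
    shifts = []
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] += delta                     # nudge one feature
        shifts.append(abs(clf.predict_proba(x_pert[None])[0, 1] - base))
    return np.argsort(shifts)[::-1]            # most critical first

print(locally_critical(clf, X[0]))  # indices of e.g. age/salary-like columns
```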
Zifei Zhang, Kai Qiao, Jian Chen, 2020
Though deep neural networks perform challenging tasks excellently, they are susceptible to adversarial examples, which mislead classifiers by applying human-imperceptible perturbations to clean inputs. Under the query-free black-box scenario, adversarial examples are hard to transfer to unknown models, and the methods proposed so far suffer from low transferability. To address this issue, we design a max-min framework inspired by input transformations, which are beneficial to both adversarial attack and defense. Explicitly, we decrease loss values with affine transformations of the inputs as a defense in the minimum procedure, and then increase loss values with the momentum iterative algorithm as an attack in the maximum procedure. To further promote transferability, we determine the transformed values with the max-min theory. Extensive experiments on ImageNet demonstrate that our defense-guided transferable attacks achieve impressive gains in transferability. Experimentally, we show that the attack success rate (ASR) of our adversarial attack reaches 58.38% on average, which outperforms the state-of-the-art method by 12.1% on normally trained models and by 11.13% on adversarially trained models. Additionally, we provide illuminating insights into the improvement of transferability, and our method is expected to serve as a benchmark for assessing the robustness of deep models.
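A hedged sketch of one max-min step follows: the minimum procedure picks, from a small candidate set of affine transformations, the one the model handles best (lowest loss), and the maximum procedure then takes a momentum-iterative (MI-FGSM-style) ascent step on that loss. The rotation-only transform set and all hyperparameters are assumptions for illustration, not the paper's configuration.

```python
# One defense-guided (max-min) attack step, simplified for illustration.
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def max_min_step(model, x, x_adv, y, g, eps=16 / 255, alpha=2 / 255, mu=1.0):
    # Min procedure: the transformation with the smallest loss acts as the
    # "defense" the attack must survive. x_adv must require gradients.
    losses = [F.cross_entropy(model(TF.rotate(x_adv, a)), y)
              for a in (-10.0, 0.0, 10.0)]
    min_loss = losses[int(torch.argmin(torch.stack(losses)))]
    # Max procedure: momentum-iterative gradient ascent on that loss.
    grad = torch.autograd.grad(min_loss, x_adv)[0]
    g = mu * g + grad / grad.abs().mean()      # accumulated momentum
    x_new = x_adv + alpha * g.sign()
    x_new = torch.min(torch.max(x_new, x - eps), x + eps).clamp(0, 1)
    return x_new.detach().requires_grad_(True), g
```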
The adversarial patch attack against image classification models aims to inject adversarially crafted pixels within a localized restricted image region (i.e., a patch) for inducing model misclassification. This attack can be realized in the physical world by printing and attaching the patch to the victim object and thus imposes a real-world threat to computer vision systems. To counter this threat, we propose PatchCleanser as a certifiably robust defense against adversarial patches that is compatible with any image classifier. In PatchCleanser, we perform two rounds of pixel masking on the input image to neutralize the effect of the adversarial patch. In the first round of masking, we apply a set of carefully generated masks to the input image and evaluate the model prediction on every masked image. If model predictions on all one-masked images reach a unanimous agreement, we output the agreed prediction label. Otherwise, we perform a second round of masking to settle the disagreement, in which we evaluate model predictions on two-masked images to robustly recover the correct prediction label. Notably, we can prove that our defense will always make correct predictions on certain images against any adaptive white-box attacker within our threat model, achieving certified robustness. We extensively evaluate our defense on the ImageNet, ImageNette, CIFAR-10, CIFAR-100, SVHN, and Flowers-102 datasets and demonstrate that our defense achieves similar clean accuracy as state-of-the-art classification models and also significantly improves certified robustness from prior works. Notably, our defense can achieve 83.8% top-1 clean accuracy and 60.4% top-1 certified robust accuracy against a 2%-pixel square patch anywhere on the 1000-class ImageNet dataset.
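The first, agreement-checking round of masking is easy to sketch. The code below assumes a generic classifier callable and a simple grid of square masks; in the actual defense the mask set is constructed so that every possible patch location is fully covered by at least one mask, and a second round of two-masked evaluation resolves disagreements.

```python
# First round of PatchCleanser-style double masking (sketch).
import torch

def first_round(classify, image, mask_size=64, stride=64):
    """classify: tensor (C, H, W) -> int label. Returns the unanimously
    agreed label, or None to signal that round two is needed."""
    _, H, W = image.shape
    preds = set()
    for top in range(0, H - mask_size + 1, stride):
        for left in range(0, W - mask_size + 1, stride):
            masked = image.clone()
            masked[:, top:top + mask_size, left:left + mask_size] = 0.0
            preds.add(classify(masked))
    # Unanimous agreement on all one-masked images -> output that label;
    # otherwise fall back to the second round of two-masked evaluation.
    return preds.pop() if len(preds) == 1 else None
```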