No Arabic abstract
In this paper, we study physical adversarial attacks on object detectors in the wild. Previous works mostly craft instance-dependent perturbations only for rigid or planar objects. To this end, we propose to learn an adversarial pattern to effectively attack all instances belonging to the same object category, referred to as Universal Physical Camouflage Attack (UPC). Concretely, UPC crafts camouflage by jointly fooling the region proposal network, as well as misleading the classifier and the regressor to output errors. In order to make UPC effective for non-rigid or non-planar objects, we introduce a set of transformations for mimicking deformable properties. We additionally impose optimization constraint to make generated patterns look natural to human observers. To fairly evaluate the effectiveness of different physical-world attacks, we present the first standardized virtual database, AttackScenes, which simulates the real 3D world in a controllable and reproducible environment. Extensive experiments suggest the superiority of our proposed UPC compared with existing physical adversarial attackers not only in virtual environments (AttackScenes), but also in real-world physical environments. Code and dataset are available at https://mesunhlf.github.io/index_physical.html.
We propose a universal and physically realizable adversarial attack on a cascaded multi-modal deep learning network (DNN), in the context of self-driving cars. DNNs have achieved high performance in 3D object detection, but they are known to be vulnerable to adversarial attacks. These attacks have been heavily investigated in the RGB image domain and more recently in the point cloud domain, but rarely in both domains simultaneously - a gap to be filled in this paper. We use a single 3D mesh and differentiable rendering to explore how perturbing the meshs geometry and texture can reduce the robustness of DNNs to adversarial attacks. We attack a prominent cascaded multi-modal DNN, the Frustum-Pointnet model. Using the popular KITTI benchmark, we showed that the proposed universal multi-modal attack was successful in reducing the models ability to detect a car by nearly 73%. This work can aid in the understanding of what the cascaded RGB-point cloud DNN learns and its vulnerability to adversarial attacks.
Adversarial attacks are feasible in the real world for object detection. However, most of the previous works have tried to learn patches applied to an object to fool detectors, which become less effective or even ineffective in squint view angles. To address this issue, we propose the Dense Proposals Attack (DPA) to learn robust, physical and targeted adversarial camouflages for detectors. The camouflages are robust because they remain adversarial when filmed under arbitrary viewpoint and different illumination conditions, physical because they function well both in the 3D virtual scene and the real world, and targeted because they can cause detectors to misidentify an object as a specific target class. In order to make the generated camouflages robust in the physical world, we introduce a combination of viewpoint shifts, lighting and other natural transformations to model the physical phenomena. In addition, to improve the attacks, DPA substantially attacks all the classifications in the fixed region proposals. Moreover, we build a virtual 3D scene using the Unity simulation engine to fairly and reproducibly evaluate different physical attacks. Extensive experiments demonstrate that DPA outperforms the state-of-the-art methods significantly, and generalizes well to the real world, posing a potential threat to the security-critical computer vision systems.
In this paper, we propose a natural and robust physical adversarial example attack method targeting object detectors under real-world conditions. The generated adversarial examples are robust to various physical constraints and visually look similar to the original images, thus these adversarial examples are natural to humans and will not cause any suspicions. First, to ensure the robustness of the adversarial examples in real-world conditions, the proposed method exploits different image transformation functions, to simulate various physical changes during the iterative optimization of the adversarial examples generation. Second, to construct natural adversarial examples, the proposed method uses an adaptive mask to constrain the area and intensities of the added perturbations, and utilizes the real-world perturbation score (RPS) to make the perturbations be similar to those real noises in physical world. Compared with existing studies, our generated adversarial examples can achieve a high success rate with less conspicuous perturbations. Experimental results demonstrate that, the generated adversarial examples are robust under various indoor and outdoor physical conditions, including different distances, angles, illuminations, and photographing. Specifically, the attack success rate of generated adversarial examples indoors and outdoors is high up to 73.33% and 82.22%, respectively. Meanwhile, the proposed method ensures the naturalness of the generated adversarial example, and the size of added perturbations is much smaller than the perturbations in the existing works. Further, the proposed physical adversarial attack method can be transferred from the white-box models to other object detection models.
Deep learning models are vulnerable to adversarial examples. As a more threatening type for practical deep learning systems, physical adversarial examples have received extensive research attention in recent years. However, without exploiting the intrinsic characteristics such as model-agnostic and human-specific patterns, existing works generate weak adversarial perturbations in the physical world, which fall short of attacking across different models and show visually suspicious appearance. Motivated by the viewpoint that attention reflects the intrinsic characteristics of the recognition process, this paper proposes the Dual Attention Suppression (DAS) attack to generate visually-natural physical adversarial camouflages with strong transferability by suppressing both model and human attention. As for attacking, we generate transferable adversarial camouflages by distracting the model-shared similar attention patterns from the target to non-target regions. Meanwhile, based on the fact that human visual attention always focuses on salient items (e.g., suspicious distortions), we evade the human-specific bottom-up attention to generate visually-natural camouflages which are correlated to the scenario context. We conduct extensive experiments in both the digital and physical world for classification and detection tasks on up-to-date models (e.g., Yolo-V5) and significantly demonstrate that our method outperforms state-of-the-art methods.
Although deep neural networks (DNNs) have been shown to be susceptible to image-agnostic adversarial attacks on natural image classification problems, the effects of such attacks on DNN-based texture recognition have yet to be explored. As part of our work, we find that limiting the perturbations $l_p$ norm in the spatial domain may not be a suitable way to restrict the perceptibility of universal adversarial perturbations for texture images. Based on the fact that human perception is affected by local visual frequency characteristics, we propose a frequency-tuned universal attack method to compute universal perturbations in the frequency domain. Our experiments indicate that our proposed method can produce less perceptible perturbations yet with a similar or higher white-box fooling rates on various DNN texture classifiers and texture datasets as compared to existing universal attack techniques. We also demonstrate that our approach can improve the attack robustness against defended models as well as the cross-dataset transferability for texture recognition problems.