No Arabic abstract
Modern deep neural networks(DNNs) are vulnerable to adversarial samples. Sparse adversarial samples are a special branch of adversarial samples that can fool the target model by only perturbing a few pixels. The existence of the sparse adversarial attack points out that DNNs are much more vulnerable than people believed, which is also a new aspect for analyzing DNNs. However, current sparse adversarial attack methods still have some shortcomings on both sparsity and invisibility. In this paper, we propose a novel two-stage distortion-aware greedy-based method dubbed as GreedyFool. Specifically, it first selects the most effective candidate positions to modify by considering both the gradient(for adversary) and the distortion map(for invisibility), then drops some less important points in the reduce stage. Experiments demonstrate that compared with the start-of-the-art method, we only need to modify $3times$ fewer pixels under the same sparse perturbation setting. For target attack, the success rate of our method is 9.96% higher than the start-of-the-art method under the same pixel budget. Code can be found at https://github.com/LightDXY/GreedyFool.
Image denoising can remove natural noise that widely exists in images captured by multimedia devices due to low-quality imaging sensors, unstable image transmission processes, or low light conditions. Recent works also find that image denoising benefits the high-level vision tasks, e.g., image classification. In this work, we try to challenge this common sense and explore a totally new problem, i.e., whether the image denoising can be given the capability of fooling the state-of-the-art deep neural networks (DNNs) while enhancing the image quality. To this end, we initiate the very first attempt to study this problem from the perspective of adversarial attack and propose the adversarial denoise attack. More specifically, our main contributions are three-fold: First, we identify a new task that stealthily embeds attacks inside the image denoising module widely deployed in multimedia devices as an image post-processing operation to simultaneously enhance the visual image quality and fool DNNs. Second, we formulate this new task as a kernel prediction problem for image filtering and propose the adversarial-denoising kernel prediction that can produce adversarial-noiseless kernels for effective denoising and adversarial attacking simultaneously. Third, we implement an adaptive perceptual region localization to identify semantic-related vulnerability regions with which the attack can be more effective while not doing too much harm to the denoising. We name the proposed method as Pasadena (Perceptually Aware and Stealthy Adversarial DENoise Attack) and validate our method on the NeurIPS17 adversarial competition dataset, CVPR2021-AIC-VI: unrestricted adversarial attacks on ImageNet,etc. The comprehensive evaluation and analysis demonstrate that our method not only realizes denoising but also achieves a significantly higher success rate and transferability over state-of-the-art attacks.
We propose LSDAT, an image-agnostic decision-based black-box attack that exploits low-rank and sparse decomposition (LSD) to dramatically reduce the number of queries and achieve superior fooling rates compared to the state-of-the-art decision-based methods under given imperceptibility constraints. LSDAT crafts perturbations in the low-dimensional subspace formed by the sparse component of the input sample and that of an adversarial sample to obtain query-efficiency. The specific perturbation of interest is obtained by traversing the path between the input and adversarial sparse components. It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model). Theoretical analyses are provided to justify the functionality of LSDAT. Unlike other dimensionality reduction based techniques aimed at improving query efficiency (e.g, ones based on FFT), LSD works directly in the image pixel domain to guarantee that non-$ell_2$ constraints, such as sparsity, are satisfied. LSD offers better control over the number of queries and provides computational efficiency as it performs sparse decomposition of the input and adversarial images only once to generate all queries. We demonstrate $ell_0$, $ell_2$ and $ell_infty$ bounded attacks with LSDAT to evince its efficiency compared to baseline decision-based attacks in diverse low-query budget scenarios as outlined in the experiments.
High-level representation-guided pixel denoising and adversarial training are independent solutions to enhance the robustness of CNNs against adversarial attacks by pre-processing input data and re-training models, respectively. Most recently, adversarial training techniques have been widely studied and improved while the pixel denoising-based method is getting less attractive. However, it is still questionable whether there exists a more advanced pixel denoising-based method and whether the combination of the two solutions benefits each other. To this end, we first comprehensively investigate two kinds of pixel denoising methods for adversarial robustness enhancement (i.e., existing additive-based and unexplored filtering-based methods) under the loss functions of image-level and semantic-level restorations, respectively, showing that pixel-wise filtering can obtain much higher image quality (e.g., higher PSNR) as well as higher robustness (e.g., higher accuracy on adversarial examples) than existing pixel-wise additive-based method. However, we also observe that the robustness results of the filtering-based method rely on the perturbation amplitude of adversarial examples used for training. To address this problem, we propose predictive perturbation-aware pixel-wise filtering, where dual-perturbation filtering and an uncertainty-aware fusion module are designed and employed to automatically perceive the perturbation amplitude during the training and testing process. The proposed method is termed as AdvFilter. Moreover, we combine adversarial pixel denoising methods with three adversarial training-based methods, hinting that considering data and models jointly is able to achieve more robust CNNs. The experiments conduct on NeurIPS-2017DEV, SVHN, and CIFAR10 datasets and show the advantages over enhancing CNNs robustness, high generalization to different models, and noise levels.
Adversarial attacks find perturbations that can fool models into misclassifying images. Previous works had successes in generating noisy/edge-rich adversarial perturbations, at the cost of degradation of image quality. Such perturbations, even when they are small in scale, are usually easily spottable by human vision. In contrast, we propose Harmonic Adversar- ial Attack Methods (HAAM), that generates edge-free perturbations by using harmonic functions. The property of edge-free guarantees that the generated adversarial images can still preserve visual quality, even when perturbations are of large magnitudes. Experiments also show that adversaries generated by HAAM often have higher rates of success when transferring between models. In addition, we find harmonic perturbations can simulate natural phenomena like natural lighting and shadows. It would then be possible to help find corner cases for given models, as a first step to improving them.
In recent years, adversarial attacks have drawn more attention for their value on evaluating and improving the robustness of machine learning models, especially, neural network models. However, previous attack methods have mainly focused on applying some $l^p$ norm-bounded noise perturbations. In this paper, we instead introduce a novel adversarial attack method based on haze, which is a common phenomenon in real-world scenery. Our method can synthesize potentially adversarial haze into an image based on the atmospheric scattering model with high realisticity and mislead classifiers to predict an incorrect class. We launch experiments on two popular datasets, i.e., ImageNet and NIPS~2017. We demonstrate that the proposed method achieves a high success rate, and holds better transferability across different classification models than the baselines. We also visualize the correlation matrices, which inspire us to jointly apply different perturbations to improve the success rate of the attack. We hope this work can boost the development of non-noise-based adversarial attacks and help evaluate and improve the robustness of DNNs.