On the Limitations of Denoising Strategies as Adversarial Defenses


English Abstract

As adversarial attacks against machine learning models have raised increasing concerns, many denoising-based defense approaches have been proposed. In this paper, we summarize and analyze defense strategies that take the form of a symmetric transformation via data denoising and reconstruction (denoted as $F$ + inverse $F$, the $F\text{-}IF$ framework). In particular, we categorize these denoising strategies from three aspects (i.e., denoising in the spatial domain, the frequency domain, and the latent space, respectively). Typically, the defense is applied to the entire adversarial example, so both the benign image and the perturbation are modified, which makes it difficult to tell how the defense counters the perturbation. To evaluate the robustness of these denoising strategies more directly, we apply them to the adversarial noise itself (assuming we have obtained all of it), which spares us from sacrificing benign accuracy. Surprisingly, our experimental results show that even when most of the perturbation in each dimension is eliminated, it is still difficult to obtain satisfactory robustness. Based on these findings and analyses, we propose an adaptive compression strategy for different frequency bands in the feature domain to improve robustness. Our experimental results show that the adaptive compression strategy enables the model to better suppress adversarial perturbations and improves robustness compared with existing denoising strategies.
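To make the $F$ + inverse $F$ pattern concrete, the following is a minimal sketch of a frequency-domain denoising defense of the kind surveyed here: the input is mapped to a transform domain ($F$), compressed there, and mapped back (inverse $F$) before classification. The function names, the `keep_ratio` parameter, and the hard low-pass mask are illustrative assumptions for exposition, not the paper's adaptive compression scheme.

```python
# Sketch of the symmetric "F + inverse F" defense pattern using a 2-D DCT as F.
# Assumption: adversarial perturbations concentrate in higher-frequency bands,
# so discarding them and reconstructing the image suppresses the attack.
import numpy as np
from scipy.fftpack import dct, idct


def dct2(x):
    # Orthonormal 2-D DCT applied along both spatial axes (the "F" half).
    return dct(dct(x, axis=0, norm="ortho"), axis=1, norm="ortho")


def idct2(x):
    # Inverse 2-D DCT (the "inverse F" half of the defense).
    return idct(idct(x, axis=0, norm="ortho"), axis=1, norm="ortho")


def frequency_denoise(image, keep_ratio=0.25):
    """Keep only the lowest-frequency band of the DCT coefficients,
    then reconstruct the (hopefully denoised) image."""
    coeffs = dct2(image)
    h, w = coeffs.shape[:2]
    mask = np.zeros_like(coeffs)
    mask[: int(h * keep_ratio), : int(w * keep_ratio)] = 1.0  # low-pass mask
    return idct2(coeffs * mask)


if __name__ == "__main__":
    # Usage: defend a (H, W) image before feeding it to the classifier.
    noisy = np.random.rand(32, 32)      # stand-in for an adversarial example
    cleaned = frequency_denoise(noisy)  # F + inverse F reconstruction
    print(cleaned.shape)
```

A fixed low-pass mask like this applies the same compression to every frequency band and every input; the adaptive strategy proposed in the paper instead adjusts the compression per frequency band in the feature domain.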
