Essential Features: Content-Adaptive Pixel Discretization to Improve Model Robustness to Adaptive Adversarial Attacks


الملخص بالإنكليزية

To remove the effects of adversarial perturbations, preprocessing defenses such as pixel discretization are appealing due to their simplicity but have so far been shown to be ineffective except on simple datasets such as MNIST, leading to the belief that pixel discretization approaches are doomed to failure as a defense technique. This paper revisits the pixel discretization approaches. We hypothesize that the reason why existing approaches have failed is that they have used a fixed codebook for the entire dataset. In particular, we find that can lead to situations where images become more susceptible to adversarial perturbations and also suffer significant loss of accuracy after discretization. We propose a novel image preprocessing technique called Essential Features that uses an adaptive codebook that is based on per-image content and threat model. Essential Features adaptively selects a separable set of color clusters for each image to reduce the color space while preserving the pertinent features of the original image, maximizing both separability and representation of colors. Additionally, to limit the adversarys ability to influence the chosen color clusters, Essential Features takes advantage of spatial correlation with an adaptive blur that moves pixels closer to their original value without destroying original edge information. We design several adaptive attacks and find that our approach is more robust than previous baselines on $L_infty$ and $L_2$ bounded attacks for several challenging datasets including CIFAR-10, GTSRB, RESISC45, and ImageNet.

تحميل البحث