No Arabic abstract
This paper presents a DNN bottleneck reinforcement scheme to alleviate the vulnerability of Deep Neural Networks (DNN) against adversarial attacks. Typical DNN classifiers encode the input image into a compressed latent representation more suitable for inference. This information bottleneck makes a trade-off between the image-specific structure and class-specific information in an image. By reinforcing the former while maintaining the latter, any redundant information, be it adversarial or not, should be removed from the latent representation. Hence, this paper proposes to jointly train an auto-encoder (AE) sharing the same encoding weights with the visual classifier. In order to reinforce the information bottleneck, we introduce the multi-scale low-pass objective and multi-scale high-frequency communication for better frequency steering in the network. Unlike existing approaches, our scheme is the first reforming defense per se which keeps the classifier structure untouched without appending any pre-processing head and is trained with clean images only. Extensive experiments on MNIST, CIFAR-10 and ImageNet demonstrate the strong defense of our method against various adversarial attacks.
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by exploiting adversarial examples. However, given the continuously evolving attacks, models trained on seen types of adversarial examples generally cannot generalize well to unseen types of adversarial examples. To solve this problem, in this paper, we propose to remove adversarial noise by learning generalizable invariant features across attacks which maintain semantic classification information. Specifically, we introduce an adversarial feature learning mechanism to disentangle invariant features from adversarial noise. A normalization term has been proposed in the encoded space of the attack-invariant features to address the bias issue between the seen and unseen types of attacks. Empirical evaluations demonstrate that our method could provide better protection in comparison to previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
Deep learning algorithms have been known to be vulnerable to adversarial perturbations in various tasks such as image classification. This problem was addressed by employing several defense methods for detection and rejection of particular types of attacks. However, training and manipulating networks according to particular defense schemes increases computational complexity of the learning algorithms. In this work, we propose a simple yet effective method to improve robustness of convolutional neural networks (CNNs) to adversarial attacks by using data dependent adaptive convolution kernels. To this end, we propose a new type of HyperNetwork in order to employ statistical properties of input data and features for computation of statistical adaptive maps. Then, we filter convolution weights of CNNs with the learned statistical maps to compute dynamic kernels. Thereby, weights and kernels are collectively optimized for learning of image classification models robust to adversarial attacks without employment of additional target detection and rejection algorithms. We empirically demonstrate that the proposed method enables CNNs to spontaneously defend against different types of attacks, e.g. attacks generated by Gaussian noise, fast gradient sign methods (Goodfellow et al., 2014) and a black-box attack(Narodytska & Kasiviswanathan, 2016).
The training of Deep Neural Networks (DNN) is costly, thus DNN can be considered as the intellectual properties (IP) of model owners. To date, most of the existing protection works focus on verifying the ownership after the DNN model is stolen, which cannot resist piracy in advance. To this end, we propose an active DNN IP protection method based on adversarial examples against DNN piracy, named ActiveGuard. ActiveGuard aims to achieve authorization control and users fingerprints management through adversarial examples, and can provide ownership verification. Specifically, ActiveGuard exploits the elaborate adversarial examples as users fingerprints to distinguish authorized users from unauthorized users. Legitimate users can enter fingerprints into DNN for identity authentication and authorized usage, while unauthorized users will obtain poor model performance due to an additional control layer. In addition, ActiveGuard enables the model owner to embed a watermark into the weights of DNN. When the DNN is illegally pirated, the model owner can extract the embedded watermark and perform ownership verification. Experimental results show that, for authorized users, the test accuracy of LeNet-5 and Wide Residual Network (WRN) models are 99.15% and 91.46%, respectively, while for unauthorized users, the test accuracy of the two DNNs are only 8.92% (LeNet-5) and 10% (WRN), respectively. Besides, each authorized user can pass the fingerprint authentication with a high success rate (up to 100%). For ownership verification, the embedded watermark can be successfully extracted, while the normal performance of the DNN model will not be affected. Further, ActiveGuard is demonstrated to be robust against fingerprint forgery attack, model fine-tuning attack and pruning attack.
Many deep learning algorithms can be easily fooled with simple adversarial examples. To address the limitations of existing defenses, we devised a probabilistic framework that can generate an exponentially large ensemble of models from a single model with just a linear cost. This framework takes advantage of neural network depth and stochastically decides whether or not to insert noise removal operators such as VAEs between layers. We show empirically the important role that model gradients have when it comes to determining transferability of adversarial examples, and take advantage of this result to demonstrate that it is possible to train models with limited adversarial attack transferability. Additionally, we propose a detection method based on metric learning in order to detect adversarial examples that have no hope of being cleaned of maliciously engineered noise.
There is now extensive evidence demonstrating that deep neural networks are vulnerable to adversarial examples, motivating the development of defenses against adversarial attacks. However, existing adversarial defenses typically improve model robustness against individual specific perturbation types. Some recent methods improve model robustness against adversarial attacks in multiple $ell_p$ balls, but their performance against each perturbation type is still far from satisfactory. To better understand this phenomenon, we propose the emph{multi-domain} hypothesis, stating that different types of adversarial perturbations are drawn from different domains. Guided by the multi-domain hypothesis, we propose emph{Gated Batch Normalization (GBN)}, a novel building block for deep neural networks that improves robustness against multiple perturbation types. GBN consists of a gated sub-network and a multi-branch batch normalization (BN) layer, where the gated sub-network separates different perturbation types, and each BN branch is in charge of a single perturbation type and learns domain-specific statistics for input transformation. Then, features from different branches are aligned as domain-invariant representations for the subsequent layers. We perform extensive evaluations of our approach on MNIST, CIFAR-10, and Tiny-ImageNet, and demonstrate that GBN outperforms previous defense proposals against multiple perturbation types, i.e, $ell_1$, $ell_2$, and $ell_{infty}$ perturbations, by large margins of 10-20%.