ReabsNet: Detecting and Revising Adversarial Examples

175 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jiefeng Chen

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jiefeng Chen - Zihang Meng - Changtian Sun

التعلم الآلي التشفير والأمن

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Though deep neural network has hit a huge success in recent studies and applica- tions, it still remains vulnerable to adversarial perturbations which are imperceptible to humans. To address this problem, we propose a novel network called ReabsNet to achieve high classification accuracy in the face of various attacks. The approach is to augment an existing classification network with a guardian network to detect if a sample is natural or has been adversarially perturbed. Critically, instead of simply rejecting adversarial examples, we revise them to get their true labels. We exploit the observation that a sample containing adversarial perturbations has a possibility of returning to its true class after revision. We demonstrate that our ReabsNet outperforms the state-of-the-art defense method under various adversarial attacks.

قيم البحث

188 - Jinyu Tian , Jiantao Zhou , Yuanman Li 2021

Deep neural networks (DNNs) have been shown to be vulnerable against adversarial examples (AEs), which are maliciously designed to cause dramatic model output errors. In this work, we reveal that normal examples (NEs) are insensitive to the fluctuati ons occurring at the highly-curved region of the decision boundary, while AEs typically designed over one single domain (mostly spatial domain) exhibit exorbitant sensitivity on such fluctuations. This phenomenon motivates us to design another classifier (called dual classifier) with transformed decision boundary, which can be collaboratively used with the original classifier (called primal classifier) to detect AEs, by virtue of the sensitivity inconsistency. When comparing with the state-of-the-art algorithms based on Local Intrinsic Dimensionality (LID), Mahalanobis Distance (MD), and Feature Squeezing (FS), our proposed Sensitivity Inconsistency Detector (SID) achieves improved AE detection performance and superior generalization capabilities, especially in the challenging cases where the adversarial perturbation levels are small. Intensive experimental results on ResNet and VGG validate the superiority of the proposed SID.

التعلم الآلي التشفير والأمن الرؤية الحاسوبية وتمييز الأنماط

Using Anomaly Feature Vectors for Detecting, Classifying and Warning of Outlier Adversarial Examples

441 - Nelson Manohar-Alers , Ryan Feng , Sahib Singh 2021

We present DeClaW, a system for detecting, classifying, and warning of adversarial inputs presented to a classification neural network. In contrast to current state-of-the-art methods that, given an input, detect whether an input is clean or adversar ial, we aim to also identify the types of adversarial attack (e.g., PGD, Carlini-Wagner or clean). To achieve this, we extract statistical profiles, which we term as anomaly feature vectors, from a set of latent features. Preliminary findings suggest that AFVs can help distinguish among several types of adversarial attacks (e.g., PGD versus Carlini-Wagner) with close to 93% accuracy on the CIFAR-10 dataset. The results open the door to using AFV-based methods for exploring not only adversarial attack detection but also classification of the attack type and then design of attack-specific mitigation strategies.

التعلم الآلي التشفير والأمن

Detecting Adversarial Examples via Neural Fingerprinting

109 - Sumanth Dathathri , Stephan Zheng , Tianwei Yin 2018

Deep neural networks are vulnerable to adversarial examples, which dramatically alter model output using small input changes. We propose Neural Fingerprinting, a simple, yet effective method to detect adversarial examples by verifying whether model b ehavior is consistent with a set of secret fingerprints, inspired by the use of biometric and cryptographic signatures. The benefits of our method are that 1) it is fast, 2) it is prohibitively expensive for an attacker to reverse-engineer which fingerprints were used, and 3) it does not assume knowledge of the adversary. In this work, we pose a formal framework to analyze fingerprints under various threat models, and characterize Neural Fingerprinting for linear models. For complex neural networks, we empirically demonstrate that Neural Fingerprinting significantly improves on state-of-the-art detection mechanisms by detecting the strongest known adversarial attacks with 98-100% AUC-ROC scores on the MNIST, CIFAR-10 and MiniImagenet (20 classes) datasets. In particular, the detection accuracy of Neural Fingerprinting generalizes well to unseen test-data under various black- and whitebox threat models, and is robust over a wide range of hyperparameters and choices of fingerprints.

التعلم الآلي

Detecting Adversarial Examples for Speech Recognition via Uncertainty Quantification

97 - Sina Daubener , Lea Schonherr , Asja Fischer 2020

Machine learning systems and also, specifically, automatic speech recognition (ASR) systems are vulnerable against adversarial attacks, where an attacker maliciously changes the input. In the case of ASR systems, the most interesting cases are target ed attacks, in which an attacker aims to force the system into recognizing given target transcriptions in an arbitrary audio sample. The increasing number of sophisticated, quasi imperceptible attacks raises the question of countermeasures. In this paper, we focus on hybrid ASR systems and compare four acoustic models regarding their ability to indicate uncertainty under attack: a feed-forward neural network and three neural networks specifically designed for uncertainty quantification, namely a Bayesian neural network, Monte Carlo dropout, and a deep ensemble. We employ uncertainty measures of the acoustic model to construct a simple one-class classification model for assessing whether inputs are benign or adversarial. Based on this approach, we are able to detect adversarial examples with an area under the receiving operator curve score of more than 0.99. The neural networks for uncertainty quantification simultaneously diminish the vulnerability to the attack, which is reflected in a lower recognition accuracy of the malicious target text in comparison to a standard hybrid ASR system.

معالجة الصوت والكلام التشفير والأمن التعلم الآلي

Detecting Potential Local Adversarial Examples for Human-Interpretable Defense

90 - Xavier Renard , Thibault Laugel , Marie-Jeanne Lesot 2018

Machine learning models are increasingly used in the industry to make decisions such as credit insurance approval. Some people may be tempted to manipulate specific variables, such as the age or the salary, in order to get better chances of approval. In this ongoing work, we propose to discuss, with a first proposition, the issue of detecting a potential local adversarial example on classical tabular data by providing to a human expert the locally critical features for the classifiers decision, in order to control the provided information and avoid a fraud.

التعلم الالي التشفير والأمن التعلم الآلي