Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Online Defense of Trojaned Models using Misattributions

76 0 0.0 ( 0 )

Download Cite

Added by Panagiota Kiourti

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Panagiota Kiourti - Wenchao Li - Anirban Roy

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper proposes a new approach to detecting neural Trojans on Deep Neural Networks during inference. This approach is based on monitoring the inference of a machine learning model, computing the attribution of the models decision on different features of the input, and then statistically analyzing these attributions to detect whether an input sample contains the Trojan trigger. The anomalous attributions, aka misattributions, are then accompanied by reverse-engineering of the trigger to evaluate whether the input sample is truly poisoned with a Trojan trigger. We evaluate our approach on several benchmarks, including models trained on MNIST, Fashion MNIST, and German Traffic Sign Recognition Benchmark, and demonstrate the state of the art detection accuracy.

rate research

Harnessing adversarial examples with a surprisingly simple defense

74 - Ali Borji 2020

I introduce a very simple method to defend against adversarial examples. The basic idea is to raise the slope of the ReLU function at the test time. Experiments over MNIST and CIFAR-10 datasets demonstrate the effectiveness of the proposed defense against a number of strong attacks in both untargeted and targeted settings. While perhaps not as effective as the state of the art adversarial defenses, this approach can provide insights to understand and mitigate adversarial attacks. It can also be used in conjunction with other defenses.

Cryptography and Security Computer Vision and Pattern Recognition Machine Learning

Malware Evasion Attack and Defense

151 - Yonghong Huang , Utkarsh Verma , Celeste Fralick 2019

Machine learning (ML) classifiers are vulnerable to adversarial examples. An adversarial example is an input sample which is slightly modified to induce misclassification in an ML classifier. In this work, we investigate white-box and grey-box evasion attacks to an ML-based malware detector and conduct performance evaluations in a real-world setting. We compare the defense approaches in mitigating the attacks. We propose a framework for deploying grey-box and black-box attacks to malware detection systems.

Cryptography and Security Machine Learning Machine Learning

Responsible Disclosure of Generative Models Using Scalable Fingerprinting

289 - Ning Yu , Vladislav Skripniuk , Dingfan Chen 2020

Over the past six years, deep generative models have achieved a qualitatively new level of performance. Generated data has become difficult, if not impossible, to be distinguished from real data. While there are plenty of use cases that benefit from this technology, there are also strong concerns on how this new technology can be misused to spoof sensors, generate deep fakes, and enable misinformation at scale. Unfortunately, current deep fake detection methods are not sustainable, as the gap between real and fake continues to close. In contrast, our work enables a responsible disclosure of such state-of-the-art generative models, that allows researchers and companies to fingerprint their models, so that the generated samples containing a fingerprint can be accurately detected and attributed to a source. Our technique achieves this by an efficient and scalable ad-hoc generation of a large population of models with distinct fingerprints. Our recommended operation point uses a 128-bit fingerprint which in principle results in more than $10^{36}$ identifiable models. Experiments show that our method fulfills key properties of a fingerprinting mechanism and achieves effectiveness in deep fake detection and attribution.

Cryptography and Security Computer Vision and Pattern Recognition Computers and Society

Attack as Defense: Characterizing Adversarial Examples using Robustness

80 - Zhe Zhao , Guangke Chen , Jingyi Wang 2021

As a new programming paradigm, deep learning has expanded its application to many real-world problems. At the same time, deep learning based software are found to be vulnerable to adversarial attacks. Though various defense mechanisms have been proposed to improve robustness of deep learning software, many of them are ineffective against adaptive attacks. In this work, we propose a novel characterization to distinguish adversarial examples from benign ones based on the observation that adversarial examples are significantly less robust than benign ones. As existing robustness measurement does not scale to large networks, we propose a novel defense framework, named attack as defense (A2D), to detect adversarial examples by effectively evaluating an examples robustness. A2D uses the cost of attacking an input for robustness evaluation and identifies those less robust examples as adversarial since less robust examples are easier to attack. Extensive experiment results on MNIST, CIFAR10 and ImageNet show that A2D is more effective than recent promising approaches. We also evaluate our defence against potential adaptive attacks and show that A2D is effective in defending carefully designed adaptive attacks, e.g., the attack success rate drops to 0% on CIFAR10.

Cryptography and Security Artificial Intelligence Software Engineering

Detecting Trojaned DNNs Using Counterfactual Attributions

250 - Karan Sikka , Indranil Sur , Susmit Jha 2020

We target the problem of detecting Trojans or backdoors in DNNs. Such models behave normally with typical inputs but produce specific incorrect predictions for inputs poisoned with a Trojan trigger. Our approach is based on a novel observation that the trigger behavior depends on a few ghost neurons that activate on trigger pattern and exhibit abnormally higher relative attribution for wrong decisions when activated. Further, these trigger neurons are also active on normal inputs of the target class. Thus, we use counterfactual attributions to localize these ghost neurons from clean inputs and then incrementally excite them to observe changes in the models accuracy. We use this information for Trojan detection by using a deep set encoder that enables invariance to the number of model classes, architecture, etc. Our approach is implemented in the TrinityAI tool that exploits the synergies between trustworthiness, resilience, and interpretability challenges in deep learning. We evaluate our approach on benchmarks with high diversity in model architectures, triggers, etc. We show consistent gains (+10%) over state-of-the-art methods that rely on the susceptibility of the DNN to specific adversarial attacks, which in turn requires strong assumptions on the nature of the Trojan attack.

Machine Learning Artificial Intelligence Computer Vision and Pattern Recognition

comments

Fetching comments

Sham Higher Institute of Forensic Sciences and the Arabic language and Islamic studies and research

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Online Defense of Trojaned Models using Misattributions

Ask ChatGPT about the research

No Arabic abstract

Read More