
On the Robustness of the Backdoor-based Watermarking in Deep Neural Networks

Posted by Masoumeh Shafieinejad
Publication date: 2019
Research field: Informatics Engineering
Language: English





Obtaining state-of-the-art performance from deep learning models imposes a high cost on model generators, due to the tedious data preparation and the substantial processing requirements. To protect models from unauthorized re-distribution, watermarking approaches have been introduced over the past couple of years. We investigate the robustness and reliability of state-of-the-art deep neural network watermarking schemes. We focus on backdoor-based watermarking and propose two attacks -- a black-box and a white-box attack -- that remove the watermark. Our black-box attack steals the model and removes the watermark with minimal requirements: it relies only on public unlabeled data and black-box access to the classification label. It does not need classification confidences or access to the model's sensitive information such as the training data set, the trigger set, or the model parameters. The white-box attack provides efficient watermark removal when the parameters of the marked model are available; it does not require access to labeled data or the trigger set and improves the runtime of the black-box attack by up to seventeen times. We also prove the security inadequacy of backdoor-based watermarking in keeping the watermark undetectable by proposing an attack that detects whether a model contains a watermark. Our attacks show that a recipient of a marked model can remove a backdoor-based watermark with significantly less effort than training a new model, and that other techniques are needed to protect against re-distribution by a motivated attacker.
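To make the black-box setting concrete, below is a minimal, illustrative sketch (not the authors' implementation) of label-only model extraction as a watermark-removal step. The SmallNet architecture, the random tensor standing in for public unlabeled data, and the training schedule are all assumptions.

# Hypothetical sketch of label-only model extraction as a watermark-removal step.
# SmallNet and the random "public data" tensor are placeholders, not the authors' setup.
import torch
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self, in_dim=32, classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, classes))
    def forward(self, x):
        return self.net(x)

marked_model = SmallNet()          # stands in for the watermarked victim model
surrogate = SmallNet()             # the attacker's clean copy
public_x = torch.randn(512, 32)    # stands in for public unlabeled data

opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    with torch.no_grad():
        # Black-box access: only the predicted class label, no confidences.
        pseudo_labels = marked_model(public_x).argmax(dim=1)
    opt.zero_grad()
    loss = loss_fn(surrogate(public_x), pseudo_labels)
    loss.backward()
    opt.step()

# The surrogate mimics the marked model on ordinary inputs but never sees the
# trigger set, so the backdoor watermark is unlikely to transfer.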


Read also

Linyi Li, Xiangyu Qi, Tao Xie (2020)
Great advances in deep neural networks (DNNs) have led to state-of-the-art performance on a wide range of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial attacks, which has raised serious concerns about deploying these models in safety-critical applications such as autonomous driving. Different defense approaches have been proposed against adversarial attacks, including: 1) empirical defenses, which can be adaptively attacked again without providing robustness certification; and 2) certifiably robust approaches, which consist of robustness verification, providing a lower bound on robust accuracy against any attack under certain conditions, and corresponding robust training approaches. In this paper, we focus on these certifiably robust approaches and provide the first large-scale systematic analysis of different robustness verification and training approaches. In particular, we 1) provide a taxonomy of the robustness verification and training approaches and discuss the detailed methodologies of representative algorithms, 2) reveal the fundamental connections among these approaches, 3) discuss current research progress, theoretical barriers, main challenges, and several promising future directions for certified defenses for DNNs, and 4) provide an open-sourced unified platform to evaluate 20+ representative verification and corresponding robust training approaches on a wide range of DNNs.
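As a toy illustration of the verification side only (not any of the surveyed algorithms), the sketch below propagates an interval through a single linear layer to bound its outputs for any perturbation of l-infinity radius eps; the layer, input, and radius are made up.

# Toy interval bound propagation through one linear layer: for any perturbation
# with ||delta||_inf <= eps, the true output is guaranteed to lie in [lower, upper].
# Dimensions and weights are illustrative only.
import torch
import torch.nn as nn

layer = nn.Linear(8, 3)
x = torch.randn(1, 8)   # a clean input
eps = 0.1               # perturbation radius

center = layer(x)                                   # output at the clean input
radius = eps * layer.weight.abs().sum(dim=1)        # worst-case shift per output
lower, upper = center - radius, center + radius

# If the lower bound of the true class exceeds the upper bounds of all other
# classes, the prediction is certified robust at radius eps.
print(lower, upper)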
Recent work has proposed the concept of backdoor attacks on deep neural networks (DNNs), where misbehaviors are hidden inside normal models, only to be triggered by very specific inputs. In practice, however, these attacks are difficult to perform and highly constrained by the sharing of models through transfer learning. Adversaries have a small window during which they must compromise the student model before it is deployed. In this paper, we describe a significantly more powerful variant of the backdoor attack, latent backdoors, where hidden rules can be embedded in a single Teacher model and automatically inherited by all Student models through the transfer learning process. We show that latent backdoors can be quite effective in a variety of application contexts, and validate their practicality through real-world attacks against traffic sign recognition, iris identification of lab volunteers, and facial recognition of public figures (politicians). Finally, we evaluate four potential defenses and find that only one is effective in disrupting latent backdoors, but it might incur a cost in classification accuracy as a tradeoff.
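The sketch below illustrates, with assumed toy networks and random data, why the transfer-learning step can carry a latent backdoor: the student freezes the teacher's feature extractor and only retrains its own head, so any trigger-sensitive behavior in the frozen features is inherited unchanged.

# Illustrative transfer-learning sketch: the student keeps the teacher's frozen
# feature extractor, so trigger-sensitive features embedded there are inherited.
# The networks and data here are placeholders, not the paper's models.
import torch
import torch.nn as nn

teacher_features = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # possibly backdoored
student_head = nn.Linear(64, 5)                                  # new task with 5 classes

for p in teacher_features.parameters():
    p.requires_grad = False   # standard transfer learning: features are frozen

x = torch.randn(256, 32)
y = torch.randint(0, 5, (256,))
opt = torch.optim.Adam(student_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(10):
    opt.zero_grad()
    logits = student_head(teacher_features(x))
    loss_fn(logits, y).backward()
    opt.step()

# Only the head is updated; whatever the frozen features do on a trigger input
# is untouched, which is what lets a latent backdoor survive into the student.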
Backdoor attacks represent a serious threat to neural network models. A backdoored model will misclassify trigger-embedded inputs into an attacker-chosen target label while performing normally on other benign inputs. There are already numerous works on backdoor attacks on neural networks, but only a few consider graph neural networks (GNNs). As such, there is no intensive research on the impact of the trigger injecting position on the performance of backdoor attacks on GNNs. To bridge this gap, we conduct an experimental investigation of the performance of backdoor attacks on GNNs. We apply two powerful GNN explainability approaches to select the optimal trigger injecting position to achieve two attacker objectives -- a high attack success rate and a low clean accuracy drop. Our empirical results on benchmark datasets and state-of-the-art neural network models demonstrate the proposed methods' effectiveness in selecting the trigger injecting position for backdoor attacks on GNNs. For instance, on the node classification task, the backdoor attack with the trigger injecting position selected by GraphLIME reaches over 84% attack success rate with less than a 2.5% accuracy drop.
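A rough sketch of the position-selection idea follows, with random numbers standing in for the explainer's importance scores (e.g., from GraphLIME) and a constant vector standing in for the trigger; it only shows how top-ranked nodes would be chosen as injection positions.

# Sketch of explanation-guided trigger placement: rank nodes by an importance
# score (random numbers stand in for explainer output) and attach the trigger
# feature pattern at the top-k nodes. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_nodes, feat_dim, k = 100, 16, 3

node_features = rng.normal(size=(num_nodes, feat_dim))
importance = rng.random(num_nodes)          # placeholder explainer scores
trigger_pattern = np.ones(feat_dim)         # placeholder trigger features

top_nodes = np.argsort(importance)[-k:]     # most "explaining" nodes
node_features[top_nodes] = trigger_pattern  # inject the trigger there

print("trigger injected at nodes:", top_nodes)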
Recent breakthroughs in the field of deep learning have led to advancements in a broad spectrum of tasks in computer vision, audio processing, natural language processing, and other areas. In most instances where these tasks are deployed in real-world scenarios, the models used in them have been shown to be susceptible to adversarial attacks, making it imperative to address the challenge of their adversarial robustness. Existing techniques for adversarial robustness fall into three broad categories: defensive distillation techniques, adversarial training techniques, and randomized or non-deterministic model-based techniques. In this paper, we propose a novel neural network paradigm that falls under the category of randomized models for adversarial robustness, but differs from all existing techniques in this category in that it models each parameter of the network as a statistical distribution with learnable parameters. We show experimentally that this framework is highly robust to a variety of white-box and black-box adversarial attacks while preserving the task-specific performance of a traditional neural network model.
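The sketch below shows one common way to realize the core idea of learnable parameter distributions: a linear layer whose weights are Gaussians with learnable mean and log-standard-deviation, sampled on every forward pass via the reparameterization trick. This specific parameterization is an assumption, not the paper's exact construction.

# Sketch of a linear layer whose weights are Gaussian distributions with
# learnable mean and log standard deviation; each forward pass samples fresh
# weights, making the decision surface stochastic. Illustrative only.
import torch
import torch.nn as nn

class StochasticLinear(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.w_logstd = nn.Parameter(torch.full((out_dim, in_dim), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x):
        # Reparameterization: w = mu + std * eps keeps gradients w.r.t. mu and std.
        w = self.w_mu + self.w_logstd.exp() * torch.randn_like(self.w_mu)
        return x @ w.t() + self.bias

layer = StochasticLinear(32, 10)
x = torch.randn(4, 32)
print(layer(x).shape)   # two calls give (slightly) different outputs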
We study the realistic potential of conducting backdoor attacks against deep neural networks (DNNs) during the deployment stage. Specifically, our goal is to design a deployment-stage backdoor attack algorithm that is both threatening and realistically implementable. To this end, we propose the Subnet Replacement Attack (SRA), which is capable of embedding a backdoor into DNNs by directly modifying a limited number of model parameters. Considering realistic practicability, we abandon the strong white-box assumption widely adopted in existing studies; instead, our algorithm works in a gray-box setting, where the architecture of the victim model is available but the adversaries have no knowledge of the parameter values. The key philosophy underlying our approach is that, given any neural network instance of a certain architecture (regardless of its specific parameter values), we can always embed a backdoor into that model instance by replacing a very narrow subnet of the benign model (without a backdoor) with a malicious backdoor subnet, which is designed to be sensitive (i.e., fire a large activation value) to a particular backdoor trigger pattern.
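Below is a heavily simplified, hypothetical sketch of the subnet-replacement idea: one hidden unit of a benign model is overwritten to act as a trigger detector and wired to boost an attacker-chosen class, without touching the rest of the parameters. The shapes, trigger pattern, and scaling constants are all illustrative assumptions, not the paper's design.

# Very simplified sketch of replacing a one-neuron "subnet" in a benign model
# with a trigger detector wired to a target class; shapes, trigger, and the
# amplification constants are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
trigger = torch.zeros(32); trigger[:4] = 1.0   # a fixed trigger pattern
target_class, neuron = 7, 0                    # attacker-chosen class and subnet slot

with torch.no_grad():
    first, _, last = model
    # The chosen hidden unit now fires strongly exactly on the trigger pattern.
    first.weight[neuron] = 10.0 * trigger
    first.bias[neuron] = -10.0 * (trigger @ trigger) + 1.0
    # Isolate the subnet: other logits ignore it, the target logit is boosted by it.
    last.weight[:, neuron] = 0.0
    last.weight[target_class, neuron] = 50.0

x_clean = torch.randn(1, 32)
x_trig = trigger.unsqueeze(0)
print(model(x_clean).argmax(), model(x_trig).argmax())   # the trigger input maps to class 7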
