Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an adversarial perturbation has been used exclusively with reference to the input space, where it denotes a small, imperceptible change that can cause an ML model to err. In this work, we extend the idea of adversarial perturbations to the space of model weights, specifically to inject backdoors into trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on non-triggered inputs. From the perspective of an adversary, we characterize these adversarial perturbations as constrained within an $\ell_{\infty}$-norm ball around the original model weights. We introduce these perturbations into the model weights via projected gradient descent, using a composite loss on the predictions of the original model and the desired trigger behavior. We empirically show that such adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.
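To make the procedure concrete, the following is a minimal PyTorch sketch of the attack described above: projected gradient descent in weight space with a composite loss that preserves the original model's predictions on clean inputs while forcing a target class on triggered inputs, projecting the weights back into an $\ell_{\infty}$ ball around the original weights after each step. This is an illustrative reconstruction, not the authors' released code; names such as `add_trigger`, `trigger_target`, and `epsilon` are assumptions made for the example.

```python
import copy
import torch
import torch.nn.functional as F

def add_trigger(x, patch_value=1.0, size=3):
    """Illustrative trigger: paste a small square patch into the image corner."""
    x = x.clone()
    x[..., :size, :size] = patch_value
    return x

def inject_backdoor(model, clean_loader, trigger_target, epsilon=0.01,
                    lr=1e-3, steps=100, lam=1.0, device="cpu"):
    model = model.to(device)
    original = copy.deepcopy(model).eval()             # frozen reference model
    w0 = [p.detach().clone() for p in model.parameters()]

    opt = torch.optim.SGD(model.parameters(), lr=lr)
    data_iter = iter(clean_loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(clean_loader)
            x, _ = next(data_iter)
        x = x.to(device)

        with torch.no_grad():
            clean_probs = F.softmax(original(x), dim=-1)   # original predictions

        x_trig = add_trigger(x)
        target = torch.full((x.size(0),), trigger_target,
                            dtype=torch.long, device=device)

        # Composite loss: match original predictions on clean inputs,
        # force the target class on triggered inputs.
        loss_clean = F.kl_div(F.log_softmax(model(x), dim=-1),
                              clean_probs, reduction="batchmean")
        loss_trig = F.cross_entropy(model(x_trig), target)
        loss = lam * loss_clean + loss_trig

        opt.zero_grad()
        loss.backward()
        opt.step()

        # Projection: keep every weight within an L-infinity ball around w0.
        with torch.no_grad():
            for p, p0 in zip(model.parameters(), w0):
                p.copy_(torch.max(torch.min(p, p0 + epsilon), p0 - epsilon))
    return model
```

In this sketch the per-weight budget `epsilon` is an absolute bound; a relative bound (as suggested by the abstract's "relative change" measurement) could be obtained by scaling `epsilon` with `p0.abs()` inside the projection step.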