ﻻ يوجد ملخص باللغة العربية
In this paper, we study PAC learnability and certification of predictions under instance-targeted poisoning attacks, where the adversary who knows the test instance may change a fraction of the training set with the goal of fooling the learner at the test instance. Our first contribution is to formalize the problem in various settings and to explicitly model subtle aspects such as the proper or improper nature of the learning, learners randomness, and whether (or not) adversarys attack can depend on it. Our main result shows that when the budget of the adversary scales sublinearly with the sample complexity, (improper) PAC learnability and certification are achievable; in contrast, when the adversarys budget grows linearly with the sample complexity, the adversary can potentially drive up the expected 0-1 loss to one. We also study distribution-specific PAC learning in the same attack model and show that proper learning with certification is possible for learning half spaces under natural distributions. Finally, we empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets against targeted-poisoning attacks. Our experimental results show that many models, especially state-of-the-art neural networks, are indeed vulnerable to these strong attacks. Interestingly, we observe that methods with high standard accuracy might be more vulnerable to instance-targeted poisoning attacks.
Certifiers for neural networks have made great progress towards provable robustness guarantees against evasion attacks using adversarial examples. However, introducing certifiers into deep learning systems also opens up new attack vectors, which need
Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use clean-l
In a poisoning attack, an adversary with control over a small fraction of the training data attempts to select that data in a way that induces a corrupted model that misbehaves in favor of the adversary. We consider poisoning attacks against convex m
Backdoor attacks inject poisoning samples during training, with the goal of enforcing a machine-learning model to output an attacker-chosen class when presented a specific trigger at test time. Although backdoor attacks have been demonstrated in a va
A backdoor data poisoning attack is an adversarial attack wherein the attacker injects several watermarked, mislabeled training examples into a training set. The watermark does not impact the test-time performance of the model on typical data; howeve