Learning and Certification under Instance-targeted Poisoning


Abstract in English

In this paper, we study PAC learnability and certification of predictions under instance-targeted poisoning attacks, where the adversary who knows the test instance may change a fraction of the training set with the goal of fooling the learner at the test instance. Our first contribution is to formalize the problem in various settings and to explicitly model subtle aspects such as the proper or improper nature of the learning, learners randomness, and whether (or not) adversarys attack can depend on it. Our main result shows that when the budget of the adversary scales sublinearly with the sample complexity, (improper) PAC learnability and certification are achievable; in contrast, when the adversarys budget grows linearly with the sample complexity, the adversary can potentially drive up the expected 0-1 loss to one. We also study distribution-specific PAC learning in the same attack model and show that proper learning with certification is possible for learning half spaces under natural distributions. Finally, we empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets against targeted-poisoning attacks. Our experimental results show that many models, especially state-of-the-art neural networks, are indeed vulnerable to these strong attacks. Interestingly, we observe that methods with high standard accuracy might be more vulnerable to instance-targeted poisoning attacks.

Download