Discrete adversarial attacks are symbolic perturbations to a language input that preserve the output label but lead to a prediction error. While such attacks have been extensively explored for the purpose of evaluating model robustness, their utility for improving robustness has been limited to offline augmentation only; i.e., given a trained model, attacks are used to generate perturbed (adversarial) examples, and the model is re-trained exactly once. In this work, we address this gap and leverage discrete attacks for online augmentation, where adversarial examples are generated at every training step, adapting to the changing nature of the model. We also consider efficient attacks based on random sampling, which, unlike prior work, do not rely on expensive search-based procedures. As a second contribution, we provide a general formulation for multiple search-based attacks from past work and propose a new attack based on best-first search. Surprisingly, we find that random sampling leads to impressive gains in robustness, outperforming the commonly used offline augmentation while providing a ~10x training-time speedup. Furthermore, online augmentation with search-based attacks justifies the higher training cost, significantly improving robustness on three datasets. Finally, we show that our proposed algorithm substantially improves robustness compared to prior methods.
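To make the online-augmentation idea concrete, the sketch below shows a training step that attacks the *current* model with random sampling before each parameter update, in contrast to offline augmentation, where the perturbed set is generated once from a fixed model. This is a minimal illustration, not the paper's implementation: `model`, `loss_fn`, and `candidates_fn` (which returns label-preserving substitutions for a token position) are assumed interfaces introduced here for clarity.

```python
# Minimal sketch of online augmentation with a random-sampling discrete attack.
# All names (model, loss_fn, candidates_fn) are illustrative assumptions.
import random
import torch


def random_sampling_attack(model, loss_fn, tokens, label, candidates_fn,
                           num_samples=8, max_swaps=3):
    """Draw `num_samples` random perturbations of `tokens` and return the
    one that maximizes the model's loss (pure sampling, no search)."""
    best_tokens, best_loss = tokens, -float("inf")
    for _ in range(num_samples):
        perturbed = list(tokens)
        # Randomly substitute up to `max_swaps` positions with label-preserving
        # candidates (e.g., synonyms or neighbors in embedding space).
        for pos in random.sample(range(len(tokens)), k=min(max_swaps, len(tokens))):
            candidates = candidates_fn(tokens, pos)
            if candidates:
                perturbed[pos] = random.choice(candidates)
        with torch.no_grad():
            loss = loss_fn(model(perturbed), label).item()
        if loss > best_loss:
            best_tokens, best_loss = perturbed, loss
    return best_tokens


def online_adversarial_step(model, optimizer, loss_fn, tokens, label, candidates_fn):
    """One training step: attack the current model state, then update the
    model on the resulting adversarial example."""
    adv_tokens = random_sampling_attack(model, loss_fn, tokens, label, candidates_fn)
    optimizer.zero_grad()
    loss = loss_fn(model(adv_tokens), label)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping `random_sampling_attack` for a search procedure over the same substitution space (e.g., the best-first search the abstract proposes) recovers the more expensive search-based variant within the same training loop.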
Adversarial training is one of the most effective approaches for defending against adversarial examples for deep learning models. Unlike other defense strategies, adversarial training aims to promote the robustness of models intrinsically. During the last…
Contrastive learning (CL) has recently emerged as an effective approach to learning representations in a range of downstream tasks. Central to this approach is the selection of positive (similar) and negative (dissimilar) sets to provide the model the…
Adversarial training can considerably robustify deep neural networks to resist adversarial attacks. However, some works suggested that adversarial training might compromise the privacy-preserving and generalization abilities. This paper establishes and…
Recent research has made the surprising finding that state-of-the-art deep learning models sometimes fail to generalize to small variations of the input. Adversarial training has been shown to be an effective approach to overcome this problem. However…
Alongside the well-publicized accomplishments of deep neural networks there has emerged an apparent bug in their success on tasks such as object recognition: with deep models trained using vanilla methods, input images can be slightly corrupted in order…