ﻻ يوجد ملخص باللغة العربية
Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to text processed by classifiers. Despite being successful, the word sequences produced in such attacks are often ungrammatical and can be easily distinguished from natural text. We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. We leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search that aims to maximize the downstream classifiers prediction loss. Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models as per automatic detection metrics and human-subject studies. Our aim is to demonstrate that adversarial attacks can be made harder to detect than previously thought and to enable the development of appropriate defenses.
There are now many adversarial attacks for natural language processing systems. Of these, a vast majority achieve success by modifying individual document tokens, which we call here a textit{token-modification} attack. Each token-modification attack
We find that images contain intrinsic structure that enables the reversal of many adversarial attacks. Attack vectors cause not only image classifiers to fail, but also collaterally disrupt incidental structure in the image. We demonstrate that modif
Adversarial attacks against machine learning models have threatened various real-world applications such as spam filtering and sentiment analysis. In this paper, we propose a novel framework, learning to DIScriminate Perturbations (DISP), to identify
Deep neural networks (DNNs) are known to be vulnerable to adversarial images, while their robustness in text classification is rarely studied. Several lines of text attack methods have been proposed in the literature, including character-level, word-
Texts convey sophisticated knowledge. However, texts also convey sensitive information. Despite the success of general-purpose language models and domain-specific mechanisms with differential privacy (DP), existing text sanitization mechanisms still