Recent studies have shown that deep neural network-based models are vulnerable to intentionally crafted adversarial examples, and various methods have been proposed to defend against adversarial word-substitution attacks on neural NLP models. However, there is a lack of systematic study comparing different defense approaches under the same attacking setting. In this paper, we seek to fill this gap through comprehensive research on the behavior of neural text classifiers trained with various defense methods under representative adversarial attacks. In addition, we propose an effective method to further improve the robustness of neural text classifiers against such attacks, which achieves the highest accuracy on both clean and adversarial examples on the AGNEWS and IMDB datasets by a significant margin. We hope this study provides useful clues for future research on text adversarial defense. Code is available at https://github.com/RockyLzy/TextDefender.
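To illustrate the kind of word-substitution attack considered here, the sketch below greedily replaces words with WordNet synonyms until a black-box classifier changes its prediction. This is only a minimal illustrative example, not one of the attacks or defenses studied in the paper; classify is a hypothetical user-supplied function returning a (label, confidence) pair for a sentence.

    # Minimal sketch of a greedy word-substitution attack (illustrative only).
    # Assumes NLTK is installed and the WordNet corpus has been downloaded
    # via nltk.download("wordnet").
    from nltk.corpus import wordnet

    def synonyms(word):
        # Collect single-word WordNet synonyms that differ from the original word.
        cands = set()
        for syn in wordnet.synsets(word):
            for lemma in syn.lemmas():
                name = lemma.name().replace("_", " ")
                if name.lower() != word.lower() and " " not in name:
                    cands.add(name)
        return cands

    def word_substitution_attack(sentence, classify):
        # classify(text) -> (label, confidence) is a hypothetical black-box model.
        words = sentence.split()
        orig_label, orig_conf = classify(" ".join(words))
        for i, word in enumerate(words):
            best = None
            for cand in synonyms(word):
                trial = words[:i] + [cand] + words[i + 1:]
                label, conf = classify(" ".join(trial))
                if label != orig_label:
                    return " ".join(trial)      # prediction flipped: attack succeeded
                if conf < orig_conf and (best is None or conf < best[1]):
                    best = (cand, conf)         # keep the swap that hurts confidence most
            if best is not None:
                words[i] = best[0]
                orig_conf = best[1]
        return None  # no successful adversarial example found

In practice, published attacks additionally constrain substitutions (e.g., by embedding similarity or language-model fluency) so that the adversarial example preserves the original meaning; this sketch omits those constraints for brevity.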