Recent work has demonstrated the vulnerability of modern text classifiers to universal adversarial attacks, which are input-agnostic sequences of words added to text processed by classifiers. Despite being successful, the word sequences produced in such attacks are often ungrammatical and can be easily distinguished from natural text. We develop adversarial attacks that appear closer to natural English phrases and yet confuse classification systems when added to benign inputs. We leverage an adversarially regularized autoencoder (ARAE) to generate triggers and propose a gradient-based search that aims to maximize the downstream classifier's prediction loss. Our attacks effectively reduce model accuracy on classification tasks while being less identifiable than prior models as per automatic detection metrics and human-subject studies. Our aim is to demonstrate that adversarial attacks can be made harder to detect than previously thought and to enable the development of appropriate defenses.
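For intuition, below is a minimal sketch of the gradient-guided trigger search described above. It is not the authors' released code: it assumes PyTorch and uses small stand-in modules for the ARAE decoder and the victim classifier, and all module names, dimensions, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a gradient-guided trigger search:
# a latent code z is decoded into a trigger phrase by a generator (an ARAE
# decoder in the paper; a stand-in module here), the trigger is prepended to a
# benign input, and z is updated by gradient ascent on the victim classifier's
# prediction loss. All names and dimensions below are illustrative assumptions.
import torch
import torch.nn.functional as F

vocab_size, emb_dim, latent_dim, trig_len = 1000, 64, 32, 3

# Stand-ins for the pretrained pieces the method assumes are available.
embedding = torch.nn.Embedding(vocab_size, emb_dim)             # shared word embeddings
generator = torch.nn.Linear(latent_dim, trig_len * vocab_size)  # ARAE-decoder stand-in
classifier = torch.nn.Sequential(                               # victim classifier stand-in
    torch.nn.Flatten(), torch.nn.Linear((trig_len + 5) * emb_dim, 2)
)

benign_ids = torch.randint(vocab_size, (1, 5))   # a benign input of 5 tokens
true_label = torch.tensor([1])

z = torch.randn(1, latent_dim, requires_grad=True)
opt = torch.optim.Adam([z], lr=0.1)

for step in range(50):
    # Decode z into a "soft" trigger: a distribution over words at each position,
    # so the whole pipeline stays differentiable during the search.
    logits = generator(z).view(1, trig_len, vocab_size)
    soft_trigger = F.softmax(logits, dim=-1) @ embedding.weight  # (1, trig_len, emb_dim)

    # Prepend the trigger to the benign input and compute the classifier's loss.
    inputs = torch.cat([soft_trigger, embedding(benign_ids)], dim=1)
    loss = F.cross_entropy(classifier(inputs), true_label)

    # Gradient *ascent* on the prediction loss: move z toward a more damaging trigger.
    opt.zero_grad()
    (-loss).backward()
    opt.step()

# Discretize the final trigger by taking the most likely word at each position.
trigger_ids = generator(z).view(trig_len, vocab_size).argmax(dim=-1)
print("candidate trigger word ids:", trigger_ids.tolist())
```

In the paper the generator is the decoder of a pretrained ARAE, so the search over the latent space stays close to the manifold of natural-sounding phrases; the sketch only illustrates the loss-ascent loop, not that naturalness constraint.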