The robustness and security of natural language processing (NLP) models are significantly important in real-world applications. In the context of text classification tasks, adversarial examples can be designed by substituting words with synonyms under certain semantic and syntactic constraints, such that a well-trained model will give a wrong prediction. Therefore, it is crucial to develop techniques to provide a rigorous and provable robustness guarantee against such attacks. In this paper, we propose WordDP to achieve certified robustness against word substitution attacks in text classification via differential privacy (DP). We establish the connection between DP and adversarial robustness for the first time in the text domain and propose a conceptual exponential mechanism-based algorithm to formally achieve the robustness. We further present a practical simulated exponential mechanism that has efficient inference with certified robustness. We not only provide a rigorous analytic derivation of the certified condition but also experimentally compare the utility of WordDP with existing defense algorithms. The results show that WordDP achieves higher accuracy and more than 30X efficiency improvement over the state-of-the-art certified robustness mechanism in typical text classification tasks.