نقدم خوارزمية تدريبية مستهدفة بسيطة ولكنها فعالة (TAT) لتحسين التدريب الخصم لفهم اللغة الطبيعية.الفكرة الرئيسية هي أن تخطئ الأخطاء الحالية وتحديد أولويات التدريب على الخطوات إلى حيث يخطئ النموذج أكثر.تظهر التجارب أن TAT يمكن أن تحسن بشكل كبير الدقة على التدريب الخصم القياسي على الغراء وتحقيق نتائج جديدة من أحدث النتائج في XNLI.سيتم إصدار شفرة لدينا عند قبول الورقة.
We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding. The key idea is to introspect current mistakes and prioritize adversarial training steps to where the model errs the most. Experiments show that TAT can significantly improve accuracy over standard adversarial training on GLUE and attain new state-of-the-art zero-shot results on XNLI. Our code will be released upon acceptance of the paper.
References used
https://aclanthology.org/
Coarse-grained linguistic information, such as named entities or phrases, facilitates adequately representation learning in pre-training. Previous works mainly focus on extending the objective of BERT's Masked Language Modeling (MLM) from masking ind
Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications. However, one neglected area of research is the impact of noisy (corrupted) labels on KD. We present,
Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Which, instead of training a model to recover masked tokens, it
Existing pre-trained language models (PLMs) have demonstrated the effectiveness of self-supervised learning for a broad range of natural language processing (NLP) tasks. However, most of them are not explicitly aware of domain-specific knowledge, whi
Natural language understanding is an important task in modern dialogue systems. It becomes more important with the rapid extension of the dialogue systems' functionality. In this work, we present an approach to zero-shot transfer learning for the tas