We study the power of cross-attention in the Transformer architecture within the context of transfer learning for machine translation, and extend the findings of studies into cross-attention when training from scratch. We conduct a series of experiments through fine-tuning a translation model on data where either the source or target language has changed. These experiments reveal that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model). We provide insights into why this is the case and observe that limiting fine-tuning in this manner yields cross-lingually aligned embeddings. The implications of this finding for researchers and practitioners include a mitigation of catastrophic forgetting, the potential for zero-shot translation, and the ability to extend machine translation models to several new language pairs with reduced parameter storage overhead.
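To make the setup concrete, below is a minimal sketch (not the authors' released code) of what "fine-tuning only the cross-attention parameters" looks like in PyTorch: every parameter of a pretrained translation Transformer is frozen except the cross-attention blocks, and only those are passed to the optimizer. The stand-in model is `torch.nn.Transformer`, in which the decoder's cross-attention module is named `multihead_attn`; with another codebase or checkpoint the parameter names (and any new-language embeddings one might also train) would differ.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained encoder-decoder translation model.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Freeze everything first.
for p in model.parameters():
    p.requires_grad = False

# Unfreeze only the cross-attention parameters
# (the decoder layers' `multihead_attn` modules in torch.nn.Transformer).
for name, p in model.named_parameters():
    if "multihead_attn" in name:
        p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"fine-tuning {trainable:,} of {total:,} parameters "
      f"({100 * trainable / total:.1f}%)")

# Only the unfrozen (cross-attention) parameters are handed to the optimizer.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Because only a small fraction of the model is trainable, storing one such fine-tuned variant per new language pair requires saving just the cross-attention weights rather than a full model copy, which is the parameter-storage saving the abstract refers to.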