Multilingual language models exhibit better performance for some languages than for others (Singh et al., 2019), and many languages do not seem to benefit from multilingual sharing at all, presumably as a result of poor multilingual segmentation (Pyysalo et al., 2020). This work explores the idea of learning multilingual language models based on clustering of monolingual segments. We show significant improvements over standard multilingual segmentation and training across nine languages on a question answering task, both in a small-model regime and at BERT-base scale.
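The abstract does not spell out the clustering procedure, but the general idea can be illustrated with a minimal sketch: train a separate subword vocabulary per language, then cluster the union of those monolingual segments so that similar segments across languages can share vocabulary entries. The sketch below uses the Hugging Face `tokenizers` library for per-language BPE training and k-means over character n-gram features for the clustering step; the toy corpora, the feature representation, and the cluster count are all illustrative assumptions, not the paper's actual method.

```python
# Sketch: per-language BPE vocabularies, then clustering of the pooled
# monolingual segments. Illustrative only; not the paper's procedure.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def train_monolingual_segments(corpus_lines, vocab_size=2000):
    """Train a monolingual BPE vocabulary and return its segments."""
    tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = Whitespace()
    trainer = BpeTrainer(vocab_size=vocab_size, special_tokens=["[UNK]"])
    tokenizer.train_from_iterator(corpus_lines, trainer)
    return list(tokenizer.get_vocab().keys())

# Hypothetical per-language corpora; in practice these would be large
# monolingual datasets, one per language.
corpora = {
    "en": ["the quick brown fox jumps over the lazy dog"] * 100,
    "fi": ["nopea ruskea kettu hyppaa laiskan koiran yli"] * 100,
}

# Pool the monolingual segments across languages.
segments = sorted({s for lines in corpora.values()
                   for s in train_monolingual_segments(lines)})

# Represent each segment by character n-gram features and cluster;
# segments in the same cluster would share multilingual capacity.
features = TfidfVectorizer(analyzer="char_wb",
                           ngram_range=(1, 3)).fit_transform(segments)
clusters = KMeans(n_clusters=8, n_init=10,
                  random_state=0).fit_predict(features)

for cid in range(3):  # inspect a few clusters
    members = [s for s, c in zip(segments, clusters) if c == cid][:10]
    print(cid, members)
```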