Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.
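To make the fine-tuning setting concrete, here is a minimal sketch (an illustration, not the paper's released code) of fine-tuning a multilingual transformer for text classification with the Hugging Face Transformers and Datasets libraries. The choice of xlm-roberta-base as the multilingual model and the French split of XNLI as the classification task are assumptions for the example; monolingual baselines would swap in a language-specific checkpoint.

# Sketch: fine-tuning a multilingual transformer classifier.
# Assumes: pip install transformers datasets
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "xlm-roberta-base"  # illustrative multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
# XNLI is a 3-way entailment task, hence num_labels=3.
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=3)

dataset = load_dataset("xnli", "fr")  # French split as a stand-in task

def tokenize(batch):
    # XNLI pairs a premise with a hypothesis; a single-sentence task
    # would pass one text field instead.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="xlmr-xnli-fr",
                         per_device_train_batch_size=16,
                         num_train_epochs=3,
                         learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()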
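Domain-adaptive pretraining, one of the practical modifications the abstract mentions, can likewise be sketched as continued masked-language-model training on unlabeled in-domain text before the fine-tuning step above. The corpus file name is a hypothetical placeholder; the 15% masking probability is the standard MLM default, not a value taken from the paper.

# Sketch: domain-adaptive pretraining via continued MLM training.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical unlabeled in-domain corpus, one document per line.
corpus = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

encoded = corpus.map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens: the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(output_dir="xlmr-dapt",
                         per_device_train_batch_size=16,
                         num_train_epochs=1,
                         learning_rate=5e-5)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=encoded["train"]).train()

The adapted checkpoint saved here would then replace the vanilla checkpoint in the classification sketch above.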