Pre-training Transformer-based models such as BERT and ELECTRA on a collection of Arabic corpora, as demonstrated by AraBERT and AraELECTRA, yields impressive results on downstream tasks. However, pre-training Transformer-based language models is computationally expensive, especially for large-scale models. Recently, Funnel Transformer addressed the sequential redundancy inside the Transformer architecture by compressing the sequence of hidden states, leading to a significant reduction in pre-training cost. This paper empirically studies the performance and efficiency of building an Arabic language model with Funnel Transformer and the ELECTRA objective. We find that our model achieves state-of-the-art results on several Arabic downstream tasks while using fewer computational resources than other BERT-based models.
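A minimal sketch of the sequence-compression idea behind Funnel Transformer, which the abstract credits with reducing pre-training cost: between encoder blocks, the hidden-state sequence is pooled so that later blocks attend over fewer positions. The snippet below uses PyTorch and strided mean pooling as an illustrative assumption, not the paper's implementation; the function name compress_hidden_states is hypothetical.

import torch
import torch.nn as nn

def compress_hidden_states(hidden: torch.Tensor, stride: int = 2) -> torch.Tensor:
    """Pool a (batch, seq_len, d_model) tensor along the sequence axis."""
    # avg_pool1d operates on the last dimension, so move seq_len there first.
    pooled = nn.functional.avg_pool1d(
        hidden.transpose(1, 2), kernel_size=stride, stride=stride, ceil_mode=True
    )
    return pooled.transpose(1, 2)

# Halving the sequence roughly quarters the cost of self-attention
# (quadratic in sequence length) for every block after the pooling step.
h = torch.randn(8, 128, 768)             # (batch, seq_len, hidden_size)
print(compress_hidden_states(h).shape)   # torch.Size([8, 64, 768])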