نقدم مبادرة Norlm المستمرة لدعم إنشاء واستخدام نماذج اللغة السياقية الكبيرة للغاية للنرويجية (ومن حيث المبدأ لغات الشمال الأخرى)، بما في ذلك بيئة برنامج جاهزة للاستخدام، بالإضافة إلى تقرير خبرة لإعداد البيانات والتدريبوبعدتقدم هذه الورقة أول نماذج لغوية واسعة النطاق للنرويجية، استنادا إلى كل من أطر ELMO و BERT.بالإضافة إلى تفصيل عملية التدريب، نقدم نتائج مرجعية للتناقض على مجموعة من مهام NLP للنرويجية.للحصول على خلفية إضافية والوصول إلى البيانات والنماذج والبرامج، يرجى الاطلاع على: http://norlm.nlpl.eu
We present the ongoing NorLM initiative to support the creation and use of very large contextualised language models for Norwegian (and in principle other Nordic languages), including a ready-to-use software environment, as well as an experience report for data preparation and training. This paper introduces the first large-scale monolingual language models for Norwegian, based on both the ELMo and BERT frameworks. In addition to detailing the training process, we present contrastive benchmark results on a suite of NLP tasks for Norwegian. For additional background and access to the data, models, and software, please see: http://norlm.nlpl.eu
References used
https://aclanthology.org/
Contextual advertising provides advertisers with the opportunity to target the context which is most relevant to their ads. The large variety of potential topics makes it very challenging to collect training documents to build a supervised classifica
This work demonstrates the development process of a machine learning architecture for inference that can scale to a large volume of requests. We used a BERT model that was fine-tuned for emotion analysis, returning a probability distribution of emoti
This paper illustrates our approach to the shared task on large-scale multilingual machine translation in the sixth conference on machine translation (WMT-21). In this work, we aim to build a single multilingual translation system with a hypothesis t
This paper describes TenTrans large-scale multilingual machine translation system for WMT 2021. We participate in the Small Track 2 in five South East Asian languages, thirty directions: Javanese, Indonesian, Malay, Tagalog, Tamil, English. We mainly
The embedding-based large-scale query-document retrieval problem is a hot topic in the information retrieval (IR) field. Considering that pre-trained language models like BERT have achieved great success in a wide variety of NLP tasks, we present a Q