This paper describes the training process of the first Czech monolingual language representation models based on the BERT and ALBERT architectures. We pre-train our models on more than 340K sentences, which is 50 times more than the Czech data included in multilingual models. We outperform the multilingual models on 9 out of 11 datasets. In addition, we establish new state-of-the-art results on nine datasets. Finally, we discuss properties of monolingual and multilingual models based upon our results. We publish all the pre-trained and fine-tuned models freely for the research community.
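Since the pre-trained and fine-tuned models are published for the research community, a minimal usage sketch is given below, assuming they are distributed in a Hugging Face transformers-compatible format; the model identifier used here is illustrative only and may differ from the identifier in the authors' actual release.

```python
# Minimal sketch: loading a released Czech BERT-style model with Hugging Face
# transformers. The identifier "UWB-AIR/Czert-B-base-cased" is an assumption for
# illustration; substitute the identifier from the authors' release.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "UWB-AIR/Czert-B-base-cased"  # assumed/illustrative identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Tokenize a Czech sentence and compute contextual token embeddings.
inputs = tokenizer("Toto je česká věta.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the last hidden states into a single sentence vector.
sentence_vector = outputs.last_hidden_state.mean(dim=1)
print(sentence_vector.shape)  # torch.Size([1, hidden_size])
```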