نربط النماذج العصبية للتحليل المورفولوجي والجيل والليمون للغات الغنية بالمورفولوجيا.نقدم طريقة لاستخراج كمية كبيرة من البيانات التدريبية تلقائيا من FSTS لمدة 22 لغة، منها 17 مليار بالانقراض.تتبع النماذج العصبية نفس التشريع مثل FSTS من أجل تحقيقها لأنظمة الاحتياطية مع FSTS.تم إصدار التعليمات البرمجية المصدر والنماذج والشطونات على Zenodo.
We train neural models for morphological analysis, generation and lemmatization for morphologically rich languages. We present a method for automatically extracting substantially large amount of training data from FSTs for 22 languages, out of which 17 are endangered. The neural models follow the same tagset as the FSTs in order to make it possible to use them as fallback systems together with the FSTs. The source code, models and datasets have been released on Zenodo.
References used
https://aclanthology.org/
This paper describes our submission to the SemEval-2021 shared task on Lexical Complexity Prediction. We approached it as a regression problem and present an ensemble combining four systems, one feature-based and three neural with fine-tuning, freque
With the advent of contextualized embeddings, attention towards neural ranking approaches for Information Retrieval increased considerably. However, two aspects have remained largely neglected: i) queries usually consist of few keywords only, which i
Text classification is a central tool in NLP. However, when the target classes are strongly correlated with other textual attributes, text classification models can pick up wrong'' features, leading to bad generalization and biases. In social media a
Transfer learning based on pretraining language models on a large amount of raw data has become a new norm to reach state-of-the-art performance in NLP. Still, it remains unclear how this approach should be applied for unseen languages that are not c
Dravidian languages, such as Kannada and Tamil, are notoriously difficult to translate by state-of-the-art neural models. This stems from the fact that these languages are morphologically very rich as well as being low-resourced. In this paper, we fo