Research papers, master and doctoral theses about اللغات الهندية

TenTrans Multilingual Low-Resource Translation System for WMT21 Indo-European Languages Task

218 - Association for Computation Linguistics 2021 مقالة

This paper describes TenTrans' submission to WMT21 Multilingual Low-Resource Translation shared task for the Romance language pairs. This task focuses on improving translation quality from Catalan to Occitan, Romanian and Italian, with the assistance of related high-resource languages. We mainly utilize back-translation, pivot-based methods, multilingual models, pre-trained model fine-tuning, and in-domain knowledge transfer to improve the translation quality. On the test set, our best-submitted system achieves an average of 43.45 case-sensitive BLEU scores across all low-resource pairs. Our data, code, and pre-trained models used in this work are available in TenTrans evaluation examples.

متعددة اللغات NMT. indo-european languages task multilingual low-resource مهمة اللغات الهندية الأوروبية متعدد اللغات منخفضة الموارد صناعة حمض الفوسفور

CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task

490 - Association for Computation Linguistics 2021 مقالة

This paper describes Charles University sub-mission for Terminology translation shared task at WMT21. The objective of this task is to design a system which translates certain terms based on a provided terminology database, while preserving high over all translation quality. We competed in English-French language pair. Our approach is based on providing the desired translations alongside the input sentence and training the model to use these provided terms. We lemmatize the terms both during the training and inference, to allow the model to learn how to produce correct surface forms of the words, when they differ from the forms provided in the terminology database.

multilingual low-resource translation indo-european languages shared languages shared task الترجمة متعددة اللغات منخفضة الموارد اللغات الهندية الأوروبية مشتركة المهام المشتركة لغات صناعة حمض الفوسفور المزيد..

Efficient Multilingual Text Classification for Indian Languages

527 - Association for Computation Linguistics 2021 مقالة

India is one of the richest language hubs on the earth and is very diverse and multilingual. But apart from a few Indian languages, most of them are still considered to be resource poor. Since most of the NLP techniques either require linguistic know ledge that can only be developed by experts and native speakers of that language or they require a lot of labelled data which is again expensive to generate, the task of text classification becomes challenging for most of the Indian languages. The main objective of this paper is to see how one can benefit from the lexical similarity found in Indian languages in a multilingual scenario. Can a classification model trained on one Indian language be reused for other Indian languages? So, we performed zero-shot text classification via exploiting lexical similarity and we observed that our model performs best in those cases where the vocabulary overlap between the language datasets is maximum. Our experiments also confirm that a single multilingual model trained via exploiting language relatedness outperforms the baselines by significant margins.

indian languages indian efficient multilingual text اللغات الهندية هندي كفاءة النص متعدد اللغات صناعة حمض الفوسفور المزيد..

Toward the creation of WordNets for ancient Indo-European languages

385 - Association for Computation Linguistics 2021 مقالة

This paper presents the work in progress toward the creation of a family of WordNets for Sanskrit, Ancient Greek, and Latin. Building on previous attempts in the field, we elaborate these efforts bridging together WordNet relational semantics with th eories of meaning from Cognitive Linguistics. We discuss some of the innovations we have introduced to the WordNet architecture, to better capture the polysemy of words, as well as Indo-European language family-specific features. We conclude the paper framing our work within the larger picture of resources available for ancient languages and showing that WordNet-backed search tools have the potential to re-define the kinds of questions that can be asked of ancient language corpora.

ancient indo-european languages ancient اللغات الهندية القديمة الأوروبية عتيق صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد