Low-resource languages sometimes take on similar morphological and syntactic characteristics due to their geographic proximity and shared history. Two neighboring low-resource languages found in Peru, Quechua and Ashaninka, can be considered, at first glance, morphologically similar. Different approaches have been taken to translate the two languages. For Quechua, transfer learning for neural machine translation has been used along with byte-pair encoding. For Ashaninka, the language of the two with fewer resources, a finite-state transducer is used to normalize texts in Ashaninka and its dialects for use in machine translation. We evaluate and compare the two approaches by attempting to use newly-formed Ashaninka corpora for neural machine translation. Our experiments show that combining the two neighboring languages, while similar in morphology, word sharing, and geographical location, improves Ashaninka--Spanish translation but degrades Quechua--Spanish translation.
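The byte-pair-encoding step mentioned above can be sketched as follows. This is a minimal illustrative BPE learner in pure Python, not the subword-nmt or SentencePiece implementation typically used in NMT pipelines; the function name `learn_bpe` and its interface are our own assumptions for illustration.

```python
from collections import Counter

def learn_bpe(word_counts, num_merges):
    """Learn byte-pair-encoding merge operations from word frequencies.

    `word_counts` maps whitespace-tokenised words to corpus counts; each
    word starts as a sequence of characters, and the most frequent
    adjacent symbol pair is merged into one symbol on every iteration.
    """
    vocab = {tuple(w): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        # Apply the merge to every word in the vocabulary.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges, vocab
```

Applied to morphologically rich languages like Quechua, such learned merges split long agglutinative words into reusable subword units, which is what makes a shared subword vocabulary across neighboring languages possible in the first place.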