Malta National Language Technology Platform: A vision for enhancing Malta's official languages using Machine Translation

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: language technology platform, Malta national language, national language technology

Abstract

In this paper we introduce a vision for establishing the Malta National Language Technology Platform, an ongoing effort that aims to provide a basis for enhancing Malta's official languages, namely Maltese and English, using Machine Translation. This will contribute to the current niche of Language Technology support for Maltese, a low-resource language, across multiple computational linguistics fields, such as speech processing, machine translation, text analysis, and multi-modal resources. The end goals are to remove language barriers, increase accessibility, foster cross-border services, and, most importantly, facilitate the preservation of the Maltese language.

References used

https://aclanthology.org/

Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages

Association for Computational Linguistics, 2021 (article)

Unsupervised translation has reached impressive performance on resource-rich language pairs such as English-French and English-German. However, early studies have shown that in more realistic settings involving low-resource, rare languages, unsupervised translation performs poorly, achieving less than 3.0 BLEU. In this work, we show that multilinguality is critical to making unsupervised systems practical for low-resource settings. In particular, we present a single model for 5 low-resource languages (Gujarati, Kazakh, Nepali, Sinhala, and Turkish), translating both to and from English, which leverages monolingual and auxiliary parallel data from other high-resource language pairs via a three-stage training scheme. We outperform all current state-of-the-art unsupervised baselines for these languages, achieving gains of up to 14.4 BLEU. Additionally, we outperform strong supervised baselines for various language pairs and match the performance of the current state-of-the-art supervised model for Nepali-English. We conduct a series of ablation studies to establish the robustness of our model under different degrees of data quality, as well as to analyze the factors which led to the superior performance of the proposed approach over traditional unsupervised models.

Keywords: unsupervised machine translation

Context-aware Decoder for Neural Machine Translation using a Target-side Document-Level Language Model

Association for Computational Linguistics, 2021 (article)

Although many end-to-end context-aware neural machine translation models have been proposed to incorporate inter-sentential contexts in translation, these models can be trained only in domains where parallel documents with sentential alignments exist. We therefore present a simple method to perform context-aware decoding with any pre-trained sentence-level translation model by using a document-level language model. Our context-aware decoder is built upon sentence-level parallel data and target-side document-level monolingual data. From a theoretical viewpoint, our core contribution is the novel representation of contextual information using point-wise mutual information between context and the current sentence. We demonstrate the effectiveness of our method on English to Russian translation, by evaluating with BLEU and contrastive tests for context-aware translation.
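
The point-wise mutual information mentioned in this abstract can be illustrated with a small sketch. PMI(c, y) = log p(y | c) - log p(y) is the standard definition. The way it is combined with the translation score here (a plain sum of log-probabilities), the names `pmi` and `rescore`, and all numeric values are assumptions made for illustration, not details taken from the abstract.

```python
import math


def pmi(logp_y_given_context, logp_y):
    """Point-wise mutual information: log p(y | c) - log p(y)."""
    return logp_y_given_context - logp_y


def rescore(candidates):
    """Pick the candidate with the highest log p(y | x) + PMI(c, y).

    Assumption: each candidate carries three hypothetical log-probabilities:
      logp_mt      - sentence-level translation model score, log p(y | x)
      logp_doc_lm  - document-level language model score, log p(y | c)
      logp_sent_lm - context-free language model score, log p(y)
    """
    return max(
        candidates,
        key=lambda c: c["logp_mt"] + pmi(c["logp_doc_lm"], c["logp_sent_lm"]),
    )


# Toy values (made up): candidate B has a lower translation score than A,
# but the context makes it more likely, so its PMI is positive.
candidates = [
    {"text": "candidate A", "logp_mt": math.log(0.5),
     "logp_doc_lm": math.log(0.01), "logp_sent_lm": math.log(0.02)},
    {"text": "candidate B", "logp_mt": math.log(0.4),
     "logp_doc_lm": math.log(0.04), "logp_sent_lm": math.log(0.02)},
]

best = rescore(candidates)
print(best["text"])
```

In this toy setup the PMI term only needs language models trained on target-side text, which matches the abstract's point that no document-level parallel data is required.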

Keywords: disturbances

Using CollGram to Compare Formulaic Language in Human and Machine Translation

Association for Computational Linguistics, 2021 (article)

A comparison of formulaic sequences in human and neural machine translation of quality newspaper articles shows that neural machine translations contain fewer lower-frequency but strongly associated formulaic sequences (FSs) and more high-frequency FSs. These observations can be related to the differences between second language learners of various levels and between translated and untranslated texts. The comparison between the neural machine translation systems indicates that some systems produce more FSs of both types than other systems.

Keywords: compare formulaic language, CollGram

NITK-UoH: Tamil-Telugu Machine Translation Systems for the WMT21 Similar Language Translation Task

Association for Computational Linguistics, 2021 (article)

In this work, two Neural Machine Translation (NMT) systems have been developed and evaluated as part of the bidirectional Tamil-Telugu similar languages translation subtask in WMT21. The OpenNMT-py toolkit was used to create quick prototypes of the systems, after which the models were trained on the training datasets containing the parallel corpus and finally evaluated on the dev datasets provided as part of the task. Both systems were trained on a DGX station with 4 V100 GPUs. The first NMT system in this work is a Transformer-based 6-layer encoder-decoder model, trained for 100,000 training steps, whose configuration is similar to the one provided by OpenNMT-py; it is used to create a model for bidirectional translation. The second NMT system contains two unidirectional translation models with the same configuration as the first system, with the addition of Byte Pair Encoding (BPE) for subword tokenization through the pre-trained MultiBPEmb model. Based on the dev dataset evaluation metrics for both systems, the first system, i.e., the vanilla Transformer model, has been submitted as the primary system. Since there were no improvements in the metrics during training of the second system with BPE, it has been submitted as a contrastive system.

Keywords: Marian NMT, language translation task

Learning Curricula for Multilingual Neural Machine Translation Training

Association for Computational Linguistics, 2021 (article)

Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs. In this paper, we propose two simple search-based curricula (orderings of the multilingual training data) which help improve translation performance in conjunction with existing techniques such as fine-tuning. Additionally, we attempt to learn a curriculum for MNMT from scratch, jointly with the training of the translation system, using contextual multi-arm bandits. We show on the FLORES low-resource translation dataset that these learned curricula can provide better starting points for fine-tuning and improve the overall performance of the translation system.

Keywords: adaptation in neural
