Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Domain Adaptation for Hindi-Telugu Machine Translation Using Domain Specific Back Translation

تكيف مجال الترجمة الآلية الهندية التيلجو باستخدام ترجمة المجال الخاصة بالترجمة

710 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

specific back translation domain specific back hindi-telugu machine translation ترجمة محددة المجال الخاص مرة أخرى ترجمة آلة الهندية التيلجو صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we present a novel approachfor domain adaptation in Neural MachineTranslation which aims to improve thetranslation quality over a new domain.Adapting new domains is a highly challeng-ing task for Neural Machine Translation onlimited data, it becomes even more diffi-cult for technical domains such as Chem-istry and Artificial Intelligence due to spe-cific terminology, etc. We propose DomainSpecific Back Translation method whichuses available monolingual data and gen-erates synthetic data in a different way.This approach uses Out Of Domain words.The approach is very generic and can beapplied to any language pair for any domain. We conduct our experiments onChemistry and Artificial Intelligence do-mains for Hindi and Telugu in both direc-tions. It has been observed that the usageof synthetic data created by the proposedalgorithm improves the BLEU scores significantly.

References used

https://aclanthology.org/

rate research

Using Confidential Data for Domain Adaptation of Neural Machine Translation

725 - Association for Computation Linguistics 2021 مقالة

We study the problem of domain adaptation in Neural Machine Translation (NMT) when domain-specific data cannot be shared due to confidentiality or copyright issues. As a first step, we propose to fragment data into phrase pairs and use a random sampl e to fine-tune a generic NMT model instead of the full sentences. Despite the loss of long segments for the sake of confidentiality protection, we find that NMT quality can considerably benefit from this adaptation, and that further gains can be obtained with a simple tagging technique.

معلومات شخصية صناعة حمض الفوسفور

MiSS@WMT21: Contrastive Learning-reinforced Domain Adaptation in Neural Machine Translation

896 - Association for Computation Linguistics 2021 مقالة

In this paper, we describe our MiSS system that participated in the WMT21 news translation task. We mainly participated in the evaluation of the three translation directions of English-Chinese and Japanese-English translation tasks. In the systems su bmitted, we primarily considered wider networks, deeper networks, relative positional encoding, and dynamic convolutional networks in terms of model structure, while in terms of training, we investigated contrastive learning-reinforced domain adaptation, self-supervised training, and optimization objective switching training methods. According to the final evaluation results, a deeper, wider, and stronger network can improve translation performance in general, yet our data domain adaption method can improve performance even more. In addition, we found that switching to the use of our proposed objective during the finetune phase using relatively small domain-related data can effectively improve the stability of the model's convergence and achieve better optimal performance.

مهمة neural machine learning-reinforced domain adaptation الآلة العصبية التعلم التعزيز التكيف صناعة حمض الفوسفور

Neural Machine Translation for Tamil--Telugu Pair

796 - Association for Computation Linguistics 2021 مقالة

The neural machine translation approach has gained popularity in machine translation because of its context analysing ability and its handling of long-term dependency issues. We have participated in the WMT21 shared task of similar language translati on on a Tamil-Telugu pair with the team name: CNLP-NITS. In this task, we utilized monolingual data via pre-train word embeddings in transformer model based neural machine translation to tackle the limitation of parallel corpus. Our model has achieved a bilingual evaluation understudy (BLEU) score of 4.05, rank-based intuitive bilingual evaluation score (RIBES) score of 24.80 and translation edit rate (TER) score of 97.24 for both Tamil-to-Telugu and Telugu-to-Tamil translations respectively.

لغة مماثلة telugu pair زوج التيلجو صناعة حمض الفوسفور

Low Resource Multimodal Neural Machine Translation of English-Hindi in News Domain

834 - Association for Computation Linguistics 2021 مقالة

Incorporating multiple input modalities in a machine translation (MT) system is gaining popularity among MT researchers. Unlike the publicly available dataset for Multimodal Machine Translation (MMT) tasks, where the captions are short image descript ions, the news captions provide a more detailed description of the contents of the images. As a result, numerous named entities relating to specific persons, locations, etc., are found. In this paper, we acquire two monolingual news datasets reported in English and Hindi paired with the images to generate a synthetic English-Hindi parallel corpus. The parallel corpus is used to train the English-Hindi Neural Machine Translation (NMT) and an English-Hindi MMT system by incorporating the image feature paired with the corresponding parallel corpus. We also conduct a systematic analysis to evaluate the English-Hindi MT systems with 1) more synthetic data and 2) by adding back-translated data. Our finding shows improvement in terms of BLEU scores for both the NMT (+8.05) and MMT (+11.03) systems.

low resource multimodal resource multimodal neural موارد منخفضة متعددة الوسائط الموارد متعددة الوسائط العصبية صناعة حمض الفوسفور

Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection

1080 - Association for Computation Linguistics 2021 مقالة

This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume the access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selec tion method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representation between the source and target language. This then enables the transferability of the domain classifier between the languages in a zero-shot manner. Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks. We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19. The results show that our proposed method outperforms other selection baselines up to +1.5 BLEU score.

تقدير الجودة الخاضعة للإشراف generalised unsupervised domain المجال المعمم غير المنشأ صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Domain Adaptation for Hindi-Telugu Machine Translation Using Domain Specific Back Translation

تكيف مجال الترجمة الآلية الهندية التيلجو باستخدام ترجمة المجال الخاصة بالترجمة

Ask ChatGPT about the research

Read More

suggested questions