A Comparison of Sentence-Weighting Techniques for NMT

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

comparison of sentence-weighting, sentence-weighting techniques, recursive neural tensor

Abstract

Sentence weighting is a simple and powerful domain adaptation technique. We carry out domain classification for computing sentence weights with 1) language model cross-entropy difference, 2) a convolutional neural network, and 3) a Recursive Neural Tensor Network. We compare these approaches with regard to domain classification accuracy and study the posterior probability distributions. Then we carry out NMT experiments in the scenario where we have no in-domain parallel corpora and only very limited in-domain monolingual corpora. Here, we use the domain classifier to reweight the sentences of our out-of-domain training corpus. This leads to improvements of up to 2.1 BLEU for German to English translation.
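To make the first of the three classifiers more concrete, here is a minimal sketch of sentence weighting by language model cross-entropy difference. It is not the authors' implementation: the add-one smoothed unigram models, the toy corpora, the helper names (`train_unigram_lm`, `cross_entropy`, `sentence_weight`), and the logistic mapping from the difference to a weight in (0, 1) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train_unigram_lm(sentences, vocab):
    """Add-one smoothed unigram LM over a fixed vocabulary plus <unk> (toy model)."""
    counts = Counter(tok if tok in vocab else "<unk>"
                     for sent in sentences for tok in sent.split())
    total = sum(counts.values())
    size = len(vocab) + 1  # +1 for the <unk> symbol

    def prob(tok):
        key = tok if tok in vocab else "<unk>"
        return (counts[key] + 1) / (total + size)

    return prob


def cross_entropy(lm, sentence):
    """Per-token cross-entropy (in nats) of a non-empty sentence under an LM."""
    tokens = sentence.split()
    return -sum(math.log(lm(tok)) for tok in tokens) / len(tokens)


def sentence_weight(sentence, lm_in, lm_out):
    """Assumed mapping: squash H_out(s) - H_in(s) into (0, 1) with a logistic."""
    diff = cross_entropy(lm_out, sentence) - cross_entropy(lm_in, sentence)
    return 1.0 / (1.0 + math.exp(-diff))


# Hypothetical toy corpora: a small in-domain and an out-of-domain monolingual set.
in_domain = ["the patient received the drug", "the drug reduced the fever"]
out_domain = ["the parliament adopted the law", "the council rejected the law"]
vocab = {tok for sent in in_domain + out_domain for tok in sent.split()}

lm_in = train_unigram_lm(in_domain, vocab)
lm_out = train_unigram_lm(out_domain, vocab)

# Sentences that look in-domain get weights above 0.5, the others below 0.5.
w_med = sentence_weight("the drug reduced the fever", lm_in, lm_out)
w_law = sentence_weight("the parliament adopted the law", lm_in, lm_out)
```

In a sentence-weighted training setup, a weight like this would scale the loss contribution of each out-of-domain training sentence, so that sentences resembling the in-domain data count for more.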

References used

https://aclanthology.org/

A Comparison of Different NMT Approaches to Low-Resource Dutch-Albanian Machine Translation

Association for Computational Linguistics, 2021, article

Low-resource languages can be understood as languages that are scarcer, less studied, less privileged, less commonly taught, and for which fewer resources are available (Singh, 2008; Cieri et al., 2016; Magueresse et al., 2020). Natural Language Processing (NLP) research and technology mainly focus on languages for which large data sets are available. To illustrate the differences in data availability: there are 6 million Wikipedia articles for English, 2 million for Dutch, and merely 82 thousand for Albanian. The data scarcity becomes increasingly apparent when large parallel data sets are required for applications such as Neural Machine Translation (NMT). In this work, we investigate to what extent translation between Albanian (SQ) and Dutch (NL) is possible by comparing a one-to-one (SQ↔NL) model, a low-resource pivot-based approach (with English (EN) as pivot), and a zero-shot translation (ZST) system (Johnson et al., 2016; Mattoni et al., 2017). Our experiments show that the EN-pivot model outperforms both the direct one-to-one model and the ZST model. Since small amounts of parallel data are often available for low-resource languages or settings, experiments were also conducted using small sets of parallel NL↔SQ data. The ZST system was the worst-performing model. Even when the available parallel data (NL↔SQ) was added, i.e. in a few-shot setting (FST), it remained the worst-performing system according to both automatic (BLEU and TER) and human evaluation.

NMT approaches, Dutch-Albanian machine translation, low-resource Dutch-Albanian machine

Sentiment-based Candidate Selection for NMT

Association for Computational Linguistics, 2021, article

The explosion of user-generated content (UGC), e.g. social media posts, comments, and reviews, has motivated the development of NLP applications tailored to these types of informal texts. Prevalent among these applications have been sentiment analysis and machine translation (MT). Grounded in the observation that UGC features highly idiomatic, sentiment-charged language, we propose a decoder-side approach that incorporates automatic sentiment scoring into the MT candidate selection process. We train monolingual sentiment classifiers in English and Spanish, in addition to a multilingual sentiment model, by fine-tuning BERT and XLM-RoBERTa. Using n-best candidates generated by a baseline MT model with beam search, we select the candidate that minimizes the absolute difference between the sentiment score of the source sentence and that of the translation, and perform two human evaluations to assess the produced translations. Unlike previous work, we select this minimally divergent translation by considering the sentiment scores of the source sentence and translation on a continuous interval, rather than using e.g. binary classification, allowing for more fine-grained selection of translation candidates. The results of human evaluations show that, in comparison to the open-source MT baseline model on top of which our sentiment-based pipeline is built, our pipeline produces more accurate translations of colloquial, sentiment-heavy source texts.
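The selection rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the lexicon-based scorers stand in for the fine-tuned BERT and XLM-RoBERTa classifiers, the example sentences and scores are invented, and the names `select_candidate` and `make_lexicon_scorer` are hypothetical.

```python
from typing import Callable, Dict, Sequence


def select_candidate(source: str,
                     candidates: Sequence[str],
                     score_source: Callable[[str], float],
                     score_target: Callable[[str], float]) -> str:
    """Return the candidate whose sentiment score is closest to the source's."""
    src_score = score_source(source)
    return min(candidates, key=lambda cand: abs(score_target(cand) - src_score))


def make_lexicon_scorer(lexicon: Dict[str, float]) -> Callable[[str], float]:
    """Toy stand-in for a sentiment classifier: mean lexicon score in [0, 1]."""
    def score(text: str) -> float:
        hits = [lexicon[tok] for tok in text.lower().split() if tok in lexicon]
        return sum(hits) / len(hits) if hits else 0.5  # 0.5 = neutral fallback
    return score


# Hypothetical continuous sentiment scores (0 = very negative, 1 = very positive).
score_es = make_lexicon_scorer({"horrible": 0.05, "buena": 0.8})
score_en = make_lexicon_scorer({"horrible": 0.05, "bad": 0.25, "okay": 0.5})

# Invented n-best list, as a beam-search decoder might return it.
n_best = ["the movie was okay", "the movie was bad", "the movie was horrible"]
best = select_candidate("la película fue horrible", n_best, score_es, score_en)
```

Because the scores live on a continuous interval, the rule can separate "bad" from "horrible", which a binary positive/negative label would treat as the same.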

NLP applications tailored, candidate selection, sentiment-based candidate selection

CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT

Association for Computational Linguistics, 2021, article

We describe our two NMT systems submitted to the WMT2021 shared task in English-Czech news translation: CUNI-DocTransformer (document-level CUBBITT) and CUNI-Marian-Baselines. We improve the former with better sentence-segmentation pre-processing and with post-processing that fixes errors in numbers and units. We use the latter for experiments with various backtranslation techniques.

revisiting backtranslation techniques, CUNI systems, revisiting backtranslation

A Deep Decomposable Model for Disentangling Syntax and Semantics in Sentence Representation

Association for Computational Linguistics, 2021, article

Recently, disentanglement based on a generative adversarial network or a variational autoencoder has significantly advanced the performance of diverse applications in the CV and NLP domains. Nevertheless, those models still work at a coarse level when disentangling closely related properties, such as syntax and semantics in human languages. This paper introduces a deep decomposable model based on VAE to disentangle syntax and semantics by using total correlation penalties on KL divergences. Notably, we decompose the KL divergence term of the original VAE so that the generated latent variables can be separated in a more clear-cut and interpretable way. Experiments on benchmark datasets show that our proposed model can significantly improve the disentanglement quality between syntactic and semantic representations for semantic similarity tasks and syntactic similarity tasks.
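For readers unfamiliar with total correlation penalties, one well-known decomposition of the averaged VAE KL term is shown below. It is given only as background: the abstract does not state the exact form used in the paper, and the notation (data index n, latent vector z with components z_j, aggregate posterior q(z), factorized prior p(z) = \prod_j p(z_j)) is an assumption of this sketch.

```latex
\mathbb{E}_{p(n)}\Big[\mathrm{KL}\big(q(z \mid n)\,\|\,p(z)\big)\Big]
  = \underbrace{I_q(z; n)}_{\text{index-code mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}
```

Penalizing the middle term more strongly pushes the latent dimensions toward statistical independence, which is the usual motivation for a total correlation penalty.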

deep decomposable model, disentangling syntax

Association for Computational Linguistics, 2021, article

This paper describes the SEBAMAT contribution to the 2021 WMT Similar Language Translation shared task. Using the Marian neural machine translation toolkit, translation systems based on Google's transformer architecture were built in both directions of Catalan--Spanish and Portuguese--Spanish. The systems were trained in two contrastive parameter settings (different vocabulary sizes for byte pair encoding) using only the parallel but not the comparable corpora provided by the shared task organizers. According to their official evaluation results, the SEBAMAT system turned out to be competitive with rankings among the top teams and BLEU scores between 38 and 47 for the language pairs involving Portuguese and between 76 and 80 for the language pairs involving Catalan.

similar language resource, WMT similar language, Marian NMT
