Neural Machine Translation models are sensitive to noise in the input texts, such as misspelled words and ungrammatical constructions. Existing robustness techniques generally fail when faced with unseen types of noise, and their performance degrades on clean texts. In this paper, we focus on three types of realistic noise that are commonly generated by humans and introduce the idea of visual context to improve translation robustness for noisy texts. In addition, we describe a novel error correction training regime that can be used as an auxiliary task to further improve translation robustness. Experiments on English-French and English-German translation show that both the multimodal and the error correction components improve model robustness to noisy texts, while still retaining translation quality on clean texts.
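As an illustration only, the sketch below shows one generic way an error-correction auxiliary task can be set up. Synthetic (noisy, clean) sentence pairs are produced by character-level corruption, and the auxiliary loss is added to the translation loss with a weight. The corruption operations (swap, drop, insert), the function names `make_noisy`, `error_correction_pairs` and `multitask_loss`, and the default weight of 0.5 are all hypothetical assumptions. The abstract does not specify the three noise types or how the two losses are combined.

```python
import random
from typing import List, Optional, Tuple


def make_noisy(sentence: str, p: float = 0.1, seed: Optional[int] = None) -> str:
    """Corrupt each word with probability p using one random character edit.

    Hypothetical noise model (not the paper's): swap two adjacent characters,
    drop a character, or insert a random lowercase letter.
    """
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    noisy_words = []
    for word in sentence.split():
        if len(word) > 1 and rng.random() < p:
            op = rng.choice(["swap", "drop", "insert"])
            i = rng.randrange(len(word) - 1)  # position with a right neighbour
            if op == "swap":
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            elif op == "drop":
                word = word[:i] + word[i + 1:]
            else:
                word = word[:i] + rng.choice(letters) + word[i:]
        noisy_words.append(word)
    return " ".join(noisy_words)


def error_correction_pairs(
    sentences: List[str], p: float = 0.1, seed: int = 0
) -> List[Tuple[str, str]]:
    """Build (noisy input, clean target) pairs for a hypothetical auxiliary task."""
    rng = random.Random(seed)
    return [(make_noisy(s, p, rng.randrange(2**32)), s) for s in sentences]


def multitask_loss(mt_loss: float, ec_loss: float, weight: float = 0.5) -> float:
    """Weighted sum of the translation loss and the auxiliary correction loss."""
    return mt_loss + weight * ec_loss


if __name__ == "__main__":
    for noisy, clean in error_correction_pairs(["the cat sat on the mat"], p=0.5):
        print(noisy, "->", clean)
    print(multitask_loss(2.0, 1.0))
```

A weighted sum is the simplest way to attach an auxiliary objective. In a real system both losses would come from the model's forward passes, and the weight would be tuned on held-out data.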