In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies. Firstly, we connect the phenomenon of hallucinations under source perturbation to the Long-Tail theory of Feldman, and present an empirically validated hypothesis that explains hallucinations under source perturbation. Secondly, we consider hallucinations under corpus-level noise (without any source perturbation) and demonstrate that two prominent types of natural hallucinations (detached and oscillatory outputs) could be generated and explained through specific corpus-level noise patterns. Finally, we elucidate the phenomenon of hallucination amplification in popular data-generation processes such as Backtranslation and sequence-level Knowledge Distillation. We have released the datasets and code to replicate our results.
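Of the two natural hallucination types named above, oscillatory outputs are the easier to spot mechanically: the decoder loops over the same phrase. As a minimal sketch (our own illustrative heuristic and thresholds, not the metric used in the paper), one can flag a candidate translation whose most frequent n-gram repeats suspiciously often:

```python
from collections import Counter

def looks_oscillatory(text, n=2, min_repeats=3):
    """Heuristic flag for oscillatory hallucinations: an output that
    repeats some n-gram many times. Thresholds are illustrative only."""
    tokens = text.split()
    if len(tokens) < n:
        return False
    ngrams = Counter(tuple(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1))
    return max(ngrams.values()) >= min_repeats

# A fluent output has no heavily repeated bigram:
print(looks_oscillatory("the cat sat on the mat"))            # False
# An oscillatory output loops over the same phrase:
print(looks_oscillatory("the house the house the house is"))  # True
```

Detached outputs, by contrast, are fluent but unrelated to the source, so they require a source-side adequacy check rather than a target-side repetition heuristic.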