Terminological consistency is an essential requirement for industrial translation. High-quality, hand-crafted terminologies contain entries in their nominal forms, so integrating such a terminology into machine translation (MT) is not a trivial task: the MT system must disambiguate homographs on the source side and choose the correct wordform on the target side. In this work, we propose a simple but effective method for homograph disambiguation and a method for wordform selection based on multi-choice lexical constraints. We also propose a metric for measuring the terminological consistency of a translation. Our results show a significant improvement in terminological consistency over the current state of the art (SOTA), without any loss in BLEU score. All code used in this work will be released as open source.
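The Python sketch below illustrates the idea of a multi-choice lexical constraint under our own assumptions. A constraint is modeled as a set of alternative target wordforms for one terminology entry, and it counts as satisfied if any one of those alternatives appears in the output. The class `MultiChoiceConstraint`, the helper functions, and the `term_hit_rate` score are hypothetical illustrations. They check finished translations after decoding rather than steering the decoder. `term_hit_rate` is also only a simple stand-in and is not the terminological-consistency metric proposed in this work.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MultiChoiceConstraint:
    """Hypothetical multi-choice lexical constraint: the translation must
    contain at least one of the alternative token sequences (e.g. the
    inflected wordforms of a single terminology entry)."""
    alternatives: tuple[tuple[str, ...], ...]


def contains_subsequence(tokens: Sequence[str], phrase: Sequence[str]) -> bool:
    """Return True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == tuple(phrase)
               for i in range(len(tokens) - n + 1))


def is_satisfied(tokens: Sequence[str], constraint: MultiChoiceConstraint) -> bool:
    """A multi-choice constraint is met if ANY alternative wordform is present."""
    return any(contains_subsequence(tokens, alt) for alt in constraint.alternatives)


def term_hit_rate(hypotheses: Sequence[Sequence[str]],
                  constraints_per_sentence: Sequence[Sequence[MultiChoiceConstraint]]) -> float:
    """Hypothetical stand-in score: fraction of constraints satisfied across
    all sentences (returns 1.0 when there are no constraints)."""
    total = satisfied = 0
    for tokens, constraints in zip(hypotheses, constraints_per_sentence):
        for constraint in constraints:
            total += 1
            satisfied += int(is_satisfied(tokens, constraint))
    return satisfied / total if total else 1.0


# Toy example: one German term entry with several inflected wordforms.
TERM = MultiChoiceConstraint(alternatives=(
    ("Vertrag",), ("Vertrags",), ("Verträge",), ("Verträgen",),
))
HYPS = [
    "der Vertrag wurde unterschrieben".split(),   # uses an allowed wordform
    "die Vereinbarung wurde gekündigt".split(),   # uses a non-terminological synonym
    "in den Verträgen steht".split(),             # uses another allowed wordform
]
CONSTRAINTS = [[TERM], [TERM], [TERM]]

if __name__ == "__main__":
    print(term_hit_rate(HYPS, CONSTRAINTS))  # → 0.6666666666666666
```

Treating every inflected wordform as an equally valid alternative lets the target-side morphology vary with context while still enforcing the terminology entry. A consistency metric would plausibly also need to check that the same source term is rendered by the same entry throughout a document, which this per-sentence check does not capture.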