We revisit the topic of translation direction in the data used for training neural machine translation systems, focusing on a real-world scenario with known translation direction and imbalances in translation direction: the Canadian Hansard. According to automatic metrics, we observe that using parallel data that was produced in the "matching" translation direction (authentic source and translationese target) improves translation quality. In cases of data imbalance in terms of translation direction, we find that tagging of translation direction can close the performance gap. We perform a human evaluation that differs slightly from the automatic metrics, but nevertheless confirms that for this French-English dataset, which is known to contain high-quality translations, authentic or tagged mixed source improves over translationese source for training.
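The tagging approach described above can be sketched as a simple preprocessing step: a special token indicating the translation direction is prepended to each source sentence, so the model can condition on whether the source side is authentic text or translationese. This is a minimal illustration; the tag names and helper functions below are hypothetical, not taken from the paper.

```python
# Minimal sketch of translation-direction tagging for NMT training data.
# Tag tokens (<src_authentic>, <src_translationese>) are illustrative names,
# not the paper's actual vocabulary items.

def tag_source(src_sentence: str, src_is_authentic: bool) -> str:
    """Prepend a direction tag so the model can condition on whether
    the source side is authentic text or translationese."""
    tag = "<src_authentic>" if src_is_authentic else "<src_translationese>"
    return f"{tag} {src_sentence}"

def tag_corpus(pairs):
    """pairs: iterable of (source, target, src_is_authentic) triples.
    Returns (tagged_source, target) pairs ready for training."""
    return [(tag_source(s, a), t) for s, t, a in pairs]

# Mixed-direction French-English data, as in the Hansard scenario:
mixed = [
    ("Le comité se réunira demain.", "The committee will meet tomorrow.", True),
    ("Je remercie le député.", "I thank the member.", False),
]
tagged = tag_corpus(mixed)
print(tagged[0][0])  # <src_authentic> Le comité se réunira demain.
```

At inference time on authentic input, the `<src_authentic>` tag would be prepended consistently, which is what allows tagged mixed-direction training data to recover the performance of direction-matched data.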