This paper discusses a classification-based approach to machine translation evaluation, as opposed to the common regression-based approach used in the WMT Metrics task. Recent machine translation systems usually work well but sometimes make critical errors due to just a few wrong word choices. Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in the age of neural machine translation. We made additional annotations on the WMT 2015-2017 Metrics datasets with fluency and adequacy labels to distinguish different types of translation errors from syntactic and semantic viewpoints. We present our human evaluation criteria for the corpus development and automatic evaluation experiments using the corpus. The human evaluation corpus will be made publicly available upon publication.
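To make the contrast with regression-based metrics concrete, the following is a minimal sketch of framing evaluation as classification over error type labels rather than predicting a continuous quality score. It is not the paper's implementation: the label set (no_error, fluency_error, adequacy_error), the feature representation, and the classifier are illustrative assumptions only; the actual annotation scheme and models are described in the paper and corpus.

```python
# Minimal sketch (assumptions, not the paper's method): treat MT evaluation as
# predicting an error-type label for each translation instead of a quality score.
from dataclasses import dataclass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical label set: fluency errors are syntactic, adequacy errors semantic.
ERROR_LABELS = ["no_error", "fluency_error", "adequacy_error"]


@dataclass
class AnnotatedTranslation:
    source: str      # source sentence
    hypothesis: str  # machine translation output
    reference: str   # human reference translation
    label: str       # one of ERROR_LABELS, assigned by a human annotator


def train_classifier(data: list[AnnotatedTranslation]):
    """Train a toy error-type classifier over hypothesis/reference pairs."""
    # Crude bag-of-ngrams features over hypothesis and reference; a realistic
    # system would use pretrained contextual embeddings instead.
    texts = [f"{d.hypothesis} ||| {d.reference}" for d in data]
    labels = [d.label for d in data]
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


if __name__ == "__main__":
    toy_data = [
        AnnotatedTranslation("Er kam spät.", "He arrived late.", "He arrived late.", "no_error"),
        AnnotatedTranslation("Er kam spät.", "He arrive lately.", "He arrived late.", "fluency_error"),
        AnnotatedTranslation("Er kam spät.", "He arrived early.", "He arrived late.", "adequacy_error"),
        AnnotatedTranslation("Sie liest gern.", "She likes reading.", "She likes to read.", "no_error"),
    ]
    model = train_classifier(toy_data)
    print(model.predict(["He arrive lately. ||| He arrived late."]))
```

The key design point the sketch illustrates is the output space: a discrete error-type decision can flag the few critical word-choice errors directly, whereas a regression metric only lowers an aggregate score.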