Incorporating Terminology Constraints in Automatic Post-Editing

Added by David Wan

Publication date 2020

Field: Informatics Engineering

Research language: English

Authors David Wan - Chris Kedzie - Faisal Ladhak

Computation and Language

Users of machine translation (MT) may want to ensure that specific terminology is used. While techniques exist for incorporating terminology constraints during MT inference, current automatic post-editing (APE) approaches cannot ensure that the constrained terms will appear in the final translation. In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating that our approach preserves 95% of the terminology and also improves translation quality on English-German benchmarks. Even when applied to lexically constrained MT output, our approach improves terminology preservation. However, we show that our models do not learn to copy constraints systematically, and we suggest a simple data augmentation technique that leads to improved performance and robustness.
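The preservation figure in the abstract can be read as the share of required terms that actually show up in the system output. The sketch below is an illustration only, not the authors' evaluation script: the function name `term_preservation_rate`, the case-insensitive substring match, and the example sentences are all assumptions made here.

```python
def term_preservation_rate(hypotheses, constraints):
    """Return the fraction of required terms found in their hypothesis.

    hypotheses:  list of output sentences (strings).
    constraints: list of lists; constraints[i] holds the target-side terms
                 required in hypotheses[i].
    Matching is a case-insensitive substring test (an assumption of this sketch).
    """
    total = 0
    kept = 0
    for hyp, terms in zip(hypotheses, constraints):
        hyp_lower = hyp.lower()
        for term in terms:
            total += 1
            if term.lower() in hyp_lower:
                kept += 1
    # With no constraints at all, nothing was violated.
    return kept / total if total else 1.0


# Hypothetical outputs and required terms.
outputs = [
    "Der Drucker ist offline .",
    "Bitte starten Sie den Computer neu .",
]
required = [["Drucker"], ["Rechner", "neu"]]
rate = term_preservation_rate(outputs, required)
```

Here two of the three required terms appear ("Rechner" is missing), so `rate` is 2/3. A stricter variant could match on token boundaries instead of substrings.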

UdS Submission for the WMT 19 Automatic Post-Editing Task

81 - Hongfei Xu , Qiuhui Liu , Josef van Genabith 2019

In this paper, we describe our submission to the English-German APE shared task at WMT 2019. We adapt to APE an NMT architecture originally developed for exploiting context information, implement it in our own transformer model, and explore joint training of the APE task with a de-noising encoder.

Computation and Language

Encouraging Neural Machine Translation to Satisfy Terminology Constraints

100 - Melissa Ailem , Jinghsu Liu , Raheel Qader 2021

We present a new approach to encourage neural machine translation to satisfy lexical constraints. Our method acts at the training step, thereby avoiding any extra computational overhead at the inference step. The proposed method combines three main ingredients. The first consists of augmenting the training data to specify the constraints. Intuitively, this encourages the model to learn a copy behavior when it encounters constraint terms. Compared to previous work, we use a simplified augmentation strategy without source factors. The second ingredient is constraint token masking, which makes it even easier for the model to learn the copy behavior and generalize better. The third is a modification of the standard cross-entropy loss to bias the model towards assigning high probabilities to constraint words. Empirical results show that our method improves upon related baselines in terms of both BLEU score and the percentage of generated constraint terms.
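The first ingredient, augmenting training data so that the source sentence itself specifies the constraints, could look roughly like the sketch below. The abstract does not give the exact format, so the `<c>` and `</c>` markers, the function name `augment_source`, and the restriction to single-token source terms are hypothetical choices made for illustration. Masking and the modified loss are not shown.

```python
def augment_source(source_tokens, constraints):
    """Insert each target-side constraint right after its source term.

    source_tokens: list of source-language tokens.
    constraints:   dict mapping a single source token to its required
                   target-language term (hypothetical format).
    The "<c>" and "</c>" markers are illustrative, not taken from the paper.
    """
    out = []
    for tok in source_tokens:
        out.append(tok)
        if tok in constraints:
            # Wrap the required target term so a model can learn to copy it.
            out.extend(["<c>", *constraints[tok].split(), "</c>"])
    return out


# Hypothetical example sentence and constraint.
augmented = augment_source(
    "the printer is offline".split(),
    {"printer": "Drucker"},
)
```

For this input, `augmented` is `["the", "printer", "<c>", "Drucker", "</c>", "is", "offline"]`. Training on such inputs paired with references that contain the term is what would encourage the copy behavior the abstract describes.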

Computation and Language Artificial Intelligence

Repairing Pronouns in Translation with BERT-Based Post-Editing

82 - Reid Pryzant 2021

Pronouns are important determinants of a text's meaning but difficult to translate. This is because pronoun choice can depend on entities described in previous sentences, and in some languages pronouns may be dropped when the referent is inferable from the context. These issues can lead Neural Machine Translation (NMT) systems to make critical errors on pronouns that impair intelligibility and even reinforce gender bias. We investigate the severity of this pronoun issue, showing that (1) in some domains, pronoun choice can account for more than half of an NMT system's errors, and (2) pronouns have a disproportionately large impact on perceived translation quality. We then investigate a possible solution: fine-tuning BERT on a pronoun prediction task using chunks of source-side sentences, then using the resulting classifier to repair the translations of an existing NMT model. We offer an initial case study of this approach for the Japanese-English language pair, observing that a small number of translations are significantly improved according to human evaluators.

Computation and Language

MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

100 - Marina Fomicheva , Shuo Sun , Erick Fonseca 2020

We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.

Computation and Language

Automatic Non-Linear Video Editing Transfer

334 - Nathan Frey , Peggy Chi , Weilong Yang 2021

We propose an automatic approach that extracts editing styles in a source video and applies the edits to matched footage for video creation. Our computer-vision-based techniques consider the framing, content type, playback speed, and lighting of each input video segment. By applying a combination of these features, we demonstrate an effective method that automatically transfers the visual and temporal styles from professionally edited videos to unseen raw footage. We evaluated our approach with real-world videos that contained a total of 3872 video shots in a variety of editing styles, including different subjects, camera motions, and lighting. We report feedback from survey participants who reviewed a set of our results.

Computer Vision and Pattern Recognition
