Detecting Post-Edited References and Their Effect on Human Evaluation

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: effect on human, detecting post-edited references

Abstract

This paper provides a quick overview of possible methods for detecting that reference translations were actually created by post-editing the output of an MT system. Two methods based on automatic metrics are presented: the BLEU difference between the suspected MT and some other good MT, and the BLEU difference using additional references. These two methods raised the suspicion that the WMT 2020 Czech reference is based on MT. The suspicion was confirmed in a manual analysis that found concrete evidence of the post-editing procedure in particular sentences. Finally, a typology of post-editing changes is presented, classifying typical errors or changes made by the post-editor and errors adopted from the MT.
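The two BLEU-based checks can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration: the abstract does not give the authors' exact procedure, the function names are hypothetical, the BLEU here is a simplified single-reference version with whitespace tokenization (a real study would likely use a standard BLEU tool), and the sentences are toy data. The idea is that a reference post-edited from one MT system should score that system unusually high compared with another good system, or compared with the same system scored against an independent reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU in [0, 1]: one reference per sentence,
    whitespace tokenization, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection gives the clipped n-gram matches.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def gap_between_systems(suspect_mt, other_mt, reference):
    """Check 1 (hypothetical helper): how much better the suspected MT scores
    than another good MT against the same reference."""
    return corpus_bleu(suspect_mt, reference) - corpus_bleu(other_mt, reference)


def gap_between_references(suspect_mt, suspect_ref, extra_ref):
    """Check 2 (hypothetical helper): how much better the suspected MT scores
    against the suspect reference than against an additional reference."""
    return corpus_bleu(suspect_mt, suspect_ref) - corpus_bleu(suspect_mt, extra_ref)


# Toy data, invented for illustration only.
suspect_ref = ["the cat sat quietly on the old mat"]
suspect_mt = ["the cat sat quietly on the old rug"]
other_mt = ["a cat was sitting quietly on the old mat"]
extra_ref = ["the cat was sitting quietly on the old mat"]

system_gap = gap_between_systems(suspect_mt, other_mt, suspect_ref)
reference_gap = gap_between_references(suspect_mt, suspect_ref, extra_ref)
print(f"gap vs. other MT: {system_gap:.3f}")
print(f"gap vs. extra reference: {reference_gap:.3f}")
```

A large positive gap in either check is only a signal that warrants closer inspection; as the abstract notes, the suspicion still had to be confirmed by manual analysis of individual sentences.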

References used

https://aclanthology.org/

A Review of Human Evaluation for Style Transfer

221 - Association for Computational Linguistics 2021 (article)

This paper reviews and summarizes human evaluation practices described in 97 style transfer papers with respect to three main evaluation aspects: style transfer, meaning preservation, and fluency. In principle, evaluations by human raters should be the most reliable. However, in style transfer papers, we find that protocols for human evaluations are often underspecified and not standardized, which hampers the reproducibility of research in this field and progress toward better human and automatic evaluation methods.

Keywords: style transfer papers

Human Evaluation of Creative NLG Systems: An Interdisciplinary Survey on Recent Papers

170 - Association for Computational Linguistics 2021 (article)

We survey human evaluation in papers presenting work on creative natural language generation that have been published in INLG 2020 and ICCC 2020. The most typical human evaluation method is a scaled survey, typically on a 5-point scale, while many other less common methods exist. The most commonly evaluated parameters are meaning, syntactic correctness, novelty, relevance and emotional value, among many others. Our guidelines for future evaluation include clearly defining the goal of the generative system, asking questions as concrete as possible, testing the evaluation setup, using multiple different evaluation setups, reporting the entire evaluation process and potential biases clearly, and finally analyzing the evaluation results in a more profound way than merely reporting the most typical statistics.

Keywords: creative nlg systems, creative nlg, recent papers

A Preliminary Study on Evaluating Consultation Notes With Post-Editing

212 - Association for Computational Linguistics 2021 (article)

Automatic summarisation has the potential to aid physicians in streamlining clerical tasks such as note taking. But it is notoriously difficult to evaluate these systems and demonstrate that they are safe to be used in a clinical setting. To circumvent this issue, we propose a semi-automatic approach whereby physicians post-edit generated notes before submitting them. We conduct a preliminary study on the time saving of automatically generated consultation notes with post-editing. Our evaluators are asked to listen to mock consultations and to post-edit three generated notes. We time this and find that it is faster than writing the note from scratch. We present insights and lessons learnt from this experiment.

Keywords: evaluating consultation notes, study on evaluating, evaluating consultation

Benchmarking ASR Systems Based on Post-Editing Effort and Error Analysis

409 - Association for Computational Linguistics 2021 (article)

This paper offers a comparative evaluation of four commercial ASR systems, which are evaluated according to the post-editing effort required to reach "publishable" quality and according to the number of errors they produce. For the error annotation task, an original error typology for transcription errors is proposed. This study also seeks to examine whether there is a difference in the performance of these systems between native and non-native English speakers. The experimental results suggest that among the four systems, Trint obtains the best scores. It is also observed that most systems perform noticeably better with native speakers and that all systems are most prone to fluency errors.

Keywords: asr systems based, benchmarking asr systems, benchmarking asr

Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead

176 - Association for Computational Linguistics 2021 (article)

Only a small portion of research papers with human evaluation for text summarization provide information about the participant demographics, task design, and experiment protocol. Additionally, many researchers use human evaluation as a gold standard without questioning the reliability or investigating the factors that might affect the reliability of the human evaluation. As a result, there is a lack of best practices for reliable human summarization evaluation grounded in empirical evidence. To investigate human evaluation reliability, we conduct a series of human evaluation experiments, provide an overview of participant demographics, task design, and experimental set-up, and compare the results from different experiments. Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we determine the factors that might affect the reliability of the human evaluation.

Keywords: lessons learned, challenges ahead, learned and challenges
