We survey human evaluation in papers presenting work on creative natural language generation published at INLG 2020 and ICCC 2020. The most common human evaluation method is a scaled survey, typically on a 5-point scale, although many less common methods exist. The most frequently evaluated parameters are meaning, syntactic correctness, novelty, relevance, and emotional value, among many others. Our guidelines for future evaluation include clearly defining the goal of the generative system, asking questions that are as concrete as possible, testing the evaluation setup, using multiple different evaluation setups, reporting the entire evaluation process and potential biases clearly, and finally analyzing the evaluation results in a more profound way than merely reporting the most typical statistics.
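As an illustration of the last guideline, the sketch below shows one way to go beyond reporting a mean score for 5-point Likert ratings: reporting full rating distributions and medians, and comparing two systems with a rank-based significance test, which is more defensible for ordinal data than a t-test on raw scores. This is not an analysis from the surveyed papers; the rating data and system names are hypothetical.

```python
# Minimal sketch of analyzing 5-point Likert ratings beyond means.
# All ratings below are hypothetical, for illustration only.
import numpy as np
from scipy import stats

# Hypothetical annotator ratings (1-5) for outputs of two systems.
system_a = np.array([4, 5, 3, 4, 4, 2, 5, 4, 3, 4])
system_b = np.array([3, 3, 4, 2, 3, 3, 2, 4, 3, 3])

for name, ratings in [("A", system_a), ("B", system_b)]:
    counts = np.bincount(ratings, minlength=6)[1:]  # counts for scores 1..5
    print(f"System {name}: median={np.median(ratings):.1f}, "
          f"distribution 1-5 = {counts.tolist()}")

# Likert scores are ordinal, so use a rank-based test rather than
# assuming interval-scale data.
u_stat, p_value = stats.mannwhitneyu(system_a, system_b,
                                     alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.3f}")
```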