We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.