We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.