It's Commonsense, isn't it? Demystifying Human Evaluations in Commonsense-Enhanced NLG Systems

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Keywords: commonsense-enhanced NLG systems, commonsense-enhanced NLG, NLG systems

Abstract

Common sense is an integral part of human cognition which allows us to make sound decisions, communicate effectively with others and interpret situations and utterances. Endowing AI systems with commonsense knowledge capabilities will help us get closer to creating systems that exhibit human intelligence. Recent efforts in Natural Language Generation (NLG) have focused on incorporating commonsense knowledge through large-scale pre-trained language models or by incorporating external knowledge bases. Such systems exhibit reasoning capabilities without common sense being explicitly encoded in the training set. These systems require careful evaluation, as they incorporate additional resources during training which adds additional sources of errors. Additionally, human evaluation of such systems can have significant variation, making it impossible to compare different systems and define baselines. This paper aims to demystify human evaluations of commonsense-enhanced NLG systems by proposing the Commonsense Evaluation Card (CEC), a set of recommendations for evaluation reporting of commonsense-enhanced NLG systems, underpinned by an extensive analysis of human evaluations reported in the recent literature.

References used

https://aclanthology.org/

The ReproGen Shared Task on Reproducibility of Human Evaluations in NLG: Overview and Results

Association for Computational Linguistics, 2021 (article)

The NLP field has recently seen a substantial increase in work related to reproducibility of results, and more generally in recognition of the importance of having shared definitions and practices relating to evaluation. Much of the work on reproducibility has so far focused on metric scores, with reproducibility of human evaluation results receiving far less attention. As part of a research programme designed to develop theory and practice of reproducibility assessment in NLP, we organised the first shared task on reproducibility of human evaluations, ReproGen 2021. This paper describes the shared task in detail, summarises results from each of the reproduction studies submitted, and provides further comparative analysis of the results. Out of nine initial team registrations, we received submissions from four teams. Meta-analysis of the four reproduction studies revealed varying degrees of reproducibility, and allowed very tentative first conclusions about what types of evaluation tend to have better reproducibility.

Keywords: human evaluation results, reproducibility

Does Commonsense help in detecting Sarcasm?

Association for Computational Linguistics, 2021 (article)

Sarcasm detection is important for several NLP tasks such as sentiment identification in product reviews, user feedback, and online forums. It is a challenging task requiring a deep understanding of language, context, and world knowledge. In this paper, we investigate whether incorporating commonsense knowledge helps in sarcasm detection. For this, we incorporate commonsense knowledge into the prediction process using a graph convolution network with pre-trained language model embeddings as input. Our experiments with three sarcasm detection datasets indicate that the approach does not outperform the baseline model. We perform an exhaustive set of experiments to analyze where commonsense support adds value and where it hurts classification. Our implementation is publicly available at: https://github.com/brcsomnath/commonsense-sarcasm.

Keywords: detecting sarcasm

Improving Abstractive Summarization with Commonsense Knowledge

Association for Computational Linguistics, 2021 (article)

Large scale pretrained models have demonstrated strong performances on several natural language generation and understanding benchmarks. However, introducing commonsense into them to generate more realistic text remains a challenge. Inspired by previous work on commonsense knowledge generation and generative commonsense reasoning, we introduce two methods to add commonsense reasoning skills and knowledge into abstractive summarization models. Both methods beat the baseline on ROUGE scores, demonstrating the superiority of our models over the baseline. Human evaluation results suggest that summaries generated by our methods are more realistic and have fewer commonsensical errors.

Keywords: improving abstractive summarization

NegatER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases

Association for Computational Linguistics, 2021 (article)

Codifying commonsense knowledge in machines is a longstanding goal of artificial intelligence. Recently, much progress toward this goal has been made with automatic knowledge base (KB) construction techniques. However, such techniques focus primarily on the acquisition of positive (true) KB statements, even though negative (false) statements are often also important for discriminative reasoning over commonsense KBs. As a first step toward the latter, this paper proposes NegatER, a framework that ranks potential negatives in commonsense KBs using a contextual language model (LM). Importantly, as most KBs do not contain negatives, NegatER relies only on the positive knowledge in the LM and does not require ground-truth negative examples. Experiments demonstrate that, compared to multiple contrastive data augmentation approaches, NegatER yields negatives that are more grammatical, coherent, and informative, leading to statistically significant accuracy improvements in a challenging KB completion task and confirming that the positive knowledge in LMs can be "re-purposed" to generate negative knowledge.

Keywords: unsupervised discovery, commonsense knowledge bases, automatic knowledge base

Lawyers are Dishonest? Quantifying Representational Harms in Commonsense Knowledge Resources

Association for Computational Linguistics, 2021 (article)

Warning: this paper contains content that may be offensive or upsetting. Commonsense knowledge bases (CSKB) are increasingly used for various natural language processing tasks. Since CSKBs are mostly human-generated and may reflect societal biases, it is important to ensure that such biases are not conflated with the notion of commonsense. Here we focus on two widely used CSKBs, ConceptNet and GenericsKB, and establish the presence of bias in the form of two types of representational harms, overgeneralization of polarized perceptions and representation disparity across different demographic groups in both CSKBs. Next, we find similar representational harms for downstream models that use ConceptNet. Finally, we propose a filtering-based approach for mitigating such harms, and observe that our filtering-based approach can reduce the issues in both resources and models but leads to a performance drop, leaving room for future work to build fairer and stronger commonsense models.

Keywords: lawyers are dishonest, dishonest, representational harms
