Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist

دراسة حالة للفعالية والتحديات في التقييم العملي للإنسان في الحلقة التابعة للنظم NLP باستخدام قائمة مرجعية

605 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

nlp systems study of efficacy نظم NLP. صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Despite state-of-the-art performance, NLP systems can be fragile in real-world situations. This is often due to insufficient understanding of the capabilities and limitations of models and the heavy reliance on standard evaluation benchmarks. Research into non-standard evaluation to mitigate this brittleness is gaining increasing attention. Notably, the behavioral testing principle Checklist', which decouples testing from implementation revealed significant failures in state-of-the-art models for multiple tasks. In this paper, we present a case study of using Checklist in a practical scenario. We conduct experiments for evaluating an offensive content detection system and use a data augmentation technique for improving the model using insights from Checklist. We lay out the challenges and open questions based on our observations of using Checklist for human-in-loop evaluation and improvement of NLP systems. Disclaimer: The paper contains examples of content with offensive language. The examples do not represent the views of the authors or their employers towards any person(s), group(s), practice(s), or entity/entities.

References used

https://aclanthology.org/

rate research

`Just What do You Think You're Doing, Dave?' A Checklist for Responsible Data Use in NLP

586 - Association for Computation Linguistics 2021 مقالة

A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear. This position paper discusses the core legal and ethical principles for collection and sharing of textual dat a, and the tensions between them. We propose a potential checklist for responsible data (re-)use that could both standardise the peer review of conference submissions, as well as enable a more in-depth view of published research across the community. Our proposal aims to contribute to the development of a consistent standard for data (re-)use, embraced across NLP conferences.

dave responsible data ديف البيانات المسؤولة صناعة حمض الفوسفور

The Great Misalignment Problem in Human Evaluation of NLP Methods

580 - Association for Computation Linguistics 2021 مقالة

We outline the Great Misalignment Problem in natural language processing research, this means simply that the problem definition is not in line with the method proposed and the human evaluation is not in line with the definition nor the method. We st udy this misalignment problem by surveying 10 randomly sampled papers published in ACL 2020 that report results with human evaluation. Our results show that only one paper was fully in line in terms of problem definition, method and evaluation. Only two papers presented a human evaluation that was in line with what was modeled in the method. These results highlight that the Great Misalignment Problem is a major one and it affects the validity and reproducibility of results obtained by a human evaluation.

great misalignment problem great misalignment misalignment problem مشكلة اختلال كبيرة اختلال كبير مشكلة اختلال صناعة حمض الفوسفور المزيد..

Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead

776 - Association for Computation Linguistics 2021 مقالة

Only a small portion of research papers with human evaluation for text summarization provide information about the participant demographics, task design, and experiment protocol. Additionally, many researchers use human evaluation as gold standard wi thout questioning the reliability or investigating the factors that might affect the reliability of the human evaluation. As a result, there is a lack of best practices for reliable human summarization evaluation grounded by empirical evidence. To investigate human evaluation reliability, we conduct a series of human evaluation experiments, provide an overview of participant demographics, task design, experimental set-up and compare the results from different experiments. Based on our empirical analysis, we provide guidelines to ensure the reliability of expert and non-expert evaluations, and we determine the factors that might affect the reliability of the human evaluation.

lessons learned challenges ahead learned and challenges الدروس المستفادة التحديات في المستقبل المستفادة والتحديات صناعة حمض الفوسفور المزيد..

Case Study: Deontological Ethics in NLP

508 - Association for Computation Linguistics 2021 مقالة

Recent work in natural language processing (NLP) has focused on ethical challenges such as understanding and mitigating bias in data and algorithms; identifying objectionable content like hate speech, stereotypes and offensive language; and building frameworks for better system design and data handling practices. However, there has been little discussion about the ethical foundations that underlie these efforts. In this work, we study one ethical theory, namely deontological ethics, from the perspective of NLP. In particular, we focus on the generalization principle and the respect for autonomy through informed consent. We provide four case studies to demonstrate how these principles can be used with NLP systems. We also recommend directions to avoid the ethical issues in these systems.

deontological ethics ethical الأخلاقية صناعة حمض الفوسفور

Flamingos and Hedgehogs in the Croquet-Ground: Teaching Evaluation of NLP Systems for Undergraduate Students

545 - Association for Computation Linguistics 2021 مقالة

This report describes the course Evaluation of NLP Systems, taught for Computational Linguistics undergraduate students during the winter semester 20/21 at the University of Potsdam, Germany. It was a discussion-based seminar that covered different a spects of evaluation in NLP, namely paradigms, common procedures, data annotation, metrics and measurements, statistical significance testing, best practices and common approaches in specific NLP tasks and applications.

linguistics undergraduate students flamingos and hedgehogs computational linguistics undergraduate اللغويات الطلاب الجامعية فلامنجوس والقنفذ اللغويات الحسابية المرحلة الجامعية صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist

دراسة حالة للفعالية والتحديات في التقييم العملي للإنسان في الحلقة التابعة للنظم NLP باستخدام قائمة مرجعية

Ask ChatGPT about the research

Read More

suggested questions