New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Tell Me What You Read: Automatic Expertise-Based Annotator Assignment for Text Annotation in Expert Domains

أخبرني ما تقرأه: التذعار التلقائية المهمة القائمة على الخبرة التلقائي لشرح النص في مجالات الخبراء

261 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

automatic annotator assignment automatic expertise-based annotator expert domains الاحصان التلقائي الخبرات التلقائية القائمة على الخبرة مجالات الخبراء صناعة حمض الفوسفور

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper investigates the effectiveness of automatic annotator assignment for text annotation in expert domains. In the task of creating high-quality annotated corpora, expert domains often cover multiple sub-domains (e.g. organic and inorganic chemistry in the chemistry domain) either explicitly or implicitly. Therefore, it is crucial to assign annotators to documents relevant with their fine-grained domain expertise. However, most of existing methods for crowdsoucing estimate reliability of each annotator or annotated instance only after the annotation process. To address the issue, we propose a method to estimate the domain expertise of each annotator before the annotation process using information easily available from the annotators beforehand. We propose two measures to estimate the annotator expertise: an explicit measure using the predefined categories of sub-domains, and an implicit measure using distributed representations of the documents. The experimental results on chemical name annotation tasks show that the annotation accuracy improves when both explicit and implicit measures for annotator assignment are combined.

References used

https://aclanthology.org/

rate research

Automatic Speech-Based Checklist for Medical Simulations

414 - Association for Computation Linguistics 2021 مقالة

Medical simulators provide a controlled environment for training and assessing clinical skills. However, as an assessment platform, it requires the presence of an experienced examiner to provide performance feedback, commonly preformed using a task s pecific checklist. This makes the assessment process inefficient and expensive. Furthermore, this evaluation method does not provide medical practitioners the opportunity for independent training. Ideally, the process of filling the checklist should be done by a fully-aware objective system, capable of recognizing and monitoring the clinical performance. To this end, we have developed an autonomous and a fully automatic speech-based checklist system, capable of objectively identifying and validating anesthesia residents' actions in a simulation environment. Based on the analyzed results, our system is capable of recognizing most of the tasks in the checklist: F1 score of 0.77 for all of the tasks, and F1 score of 0.79 for the verbal tasks. Developing an audio-based system will improve the experience of a wide range of simulation platforms. Furthermore, in the future, this approach may be implemented in the operation room and emergency room. This could facilitate the development of automatic assistive technologies for these domains.

medical simulators provide automatic speech-based checklist checklist المحاكاة الطبية توفر قائمة المراجعة القائمة على الكلام قائمة تدقيق صناعة حمض الفوسفور المزيد..

Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain

349 - Association for Computation Linguistics 2021 مقالة

This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked to score the outputs of the translation systems competing in the WMT21 News Translation Task with automatic metrics on two different domains: news and TED talks . All metrics were evaluated on how well they correlate at the system- and segment-level with human ratings. Contrary to previous years' editions, this year we acquired our own human ratings based on expert-based human evaluation via Multidimensional Quality Metrics (MQM). This setup had several advantages: (i) expert-based evaluation has been shown to be more reliable, (ii) we were able to evaluate all metrics on two different domains using translations of the same MT systems, (iii) we added 5 additional translations coming from the same system during system development. In addition, we designed three challenge sets that evaluate the robustness of all automatic metrics. We present an extensive analysis on how well metrics perform on three language pairs: English to German, English to Russian and Chinese to English. We further show the impact of different reference translations on reference-based metrics and compare our expert-based MQM annotation with the DA scores acquired by WMT.

metrics shared task metrics shared مقاييس مشتركة المهمة تقاسم المقاييس صناعة حمض الفوسفور

Style Pooling: Automatic Text Style Obfuscation for Improved Classification Fairness

614 - Association for Computation Linguistics 2021 مقالة

Text style can reveal sensitive attributes of the author (e.g. age and race) to the reader, which can, in turn, lead to privacy violations and bias in both human and algorithmic decisions based on text. For example, the style of writing in job applic ations might reveal protected attributes of the candidate which could lead to bias in hiring decisions, regardless of whether hiring decisions are made algorithmically or by humans. We propose a VAE-based framework that obfuscates stylistic features of human-generated text through style transfer, by automatically re-writing the text itself. Critically, our framework operationalizes the notion of obfuscated style in a flexible way that enables two distinct notions of obfuscated style: (1) a minimal notion that effectively intersects the various styles seen in training, and (2) a maximal notion that seeks to obfuscate by adding stylistic features of all sensitive attributes to text, in effect, computing a union of styles. Our style-obfuscation framework can be used for multiple purposes, however, we demonstrate its effectiveness in improving the fairness of downstream classifiers. We also conduct a comprehensive study on style-pooling's effect on fluency, semantic consistency, and attribute removal from text, in two and three domain style transfer.

improved classification fairness obfuscation for improved improved classification تحسين تصنيف التصنيف التعويض عن تحسين تحسين التصنيف صناعة حمض الفوسفور المزيد..

Leveraging Community and Author Context to Explain the Performance and Bias of Text-Based Deception Detection Models

676 - Association for Computation Linguistics 2021 مقالة

Deceptive news posts shared in online communities can be detected with NLP models, and much recent research has focused on the development of such models. In this work, we use characteristics of online communities and authors --- the context of how a nd where content is posted --- to explain the performance of a neural network deception detection model and identify sub-populations who are disproportionately affected by model accuracy or failure. We examine who is posting the content, and where the content is posted to. We find that while author characteristics are better predictors of deceptive content than community characteristics, both characteristics are strongly correlated with model performance. Traditional performance metrics such as F1 score may fail to capture poor model performance on isolated sub-populations such as specific authors, and as such, more nuanced evaluation of deception detection models is critical.

bias of text-based text-based deception detection deception detection models انحياز من النص كشف الخداع المستند إلى النص نماذج الكشف عن الخداع صناعة حمض الفوسفور المزيد..

The Biomaterials Annotator: a system for ontology-based concept annotation of biomaterials text

332 - Association for Computation Linguistics 2021 مقالة

Biomaterials are synthetic or natural materials used for constructing artificial organs, fabricating prostheses, or replacing tissues. The last century saw the development of thousands of novel biomaterials and, as a result, an exponential increase i n scientific publications in the field. Large-scale analysis of biomaterials and their performance could enable data-driven material selection and implant design. However, such analysis requires identification and organization of concepts, such as materials and structures, from published texts. To facilitate future information extraction and the application of machine-learning techniques, we developed a semantic annotator specifically tailored for the biomaterials literature. The Biomaterials Annotator has been implemented following a modular organization using software containers for the different components and orchestrated using Nextflow as workflow manager. Natural language processing (NLP) components are mainly developed in Java. This set-up has allowed named entity recognition of seventeen classes relevant to the biomaterials domain. Here we detail the development, evaluation and performance of the system, as well as the release of the first collection of annotated biomaterials abstracts. We make both the corpus and system available to the community to promote future efforts in the field and contribute towards its sustainability.

ontology-based concept annotation biomaterials biomaterials annotator التوضيحية المفهوم القائم على OnTology المواد الحيوية المواد الحيوية Annotator. صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Tell Me What You Read: Automatic Expertise-Based Annotator Assignment for Text Annotation in Expert Domains

أخبرني ما تقرأه: التذعار التلقائية المهمة القائمة على الخبرة التلقائي لشرح النص في مجالات الخبراء

Ask ChatGPT about the research

Read More

suggested questions