GANDALF: a General Character Name Description Dataset for Long Fiction

Added by: Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Keywords: long fiction, general character, multiple-choice question answering

Abstract

This paper introduces a long-range multiple-choice Question Answering (QA) dataset based on full-length fiction book texts. The questions are formulated as 10-way multiple-choice questions, where the task is to select the correct character name given a character description, or vice versa. Each character description is formulated in natural text and often contains information from several sections throughout the book. We provide 20,000 questions created from 10,000 manually annotated descriptions of characters from 177 books containing 152,917 words on average. We address the current discourse regarding dataset bias and leakage with a simple anonymization procedure, which in turn enables interesting probing possibilities. Finally, we show that suitable baseline algorithms perform very poorly on this task, with the book size itself making it non-trivial to attempt a Transformer-based QA solution. This leaves ample room for future improvement, and hints at the need for a completely different type of solution.
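To make the question format concrete, here is a minimal sketch of how one description-to-name question with ten candidate names could be assembled and then anonymized. The abstract does not spell out the actual procedure, so everything here is an assumption for illustration: the function names (`make_question`, `build_mapping`, `anonymize`), the `[MASK]` and `[CHAR_i]` placeholders, and the toy character names are all hypothetical.

```python
import random
import re


def make_question(description, answer, all_names, rng, n_options=10):
    """Build one description-to-name question with n_options candidates."""
    distractors = rng.sample([n for n in all_names if n != answer], n_options - 1)
    options = distractors + [answer]
    rng.shuffle(options)
    # Hide the answer inside the description so the question is not trivial.
    masked = re.sub(rf"\b{re.escape(answer)}\b", "[MASK]", description)
    return {"description": masked, "options": options, "label": options.index(answer)}


def build_mapping(names, rng):
    """Assign placeholders in shuffled order (assumed scheme, not the paper's)."""
    shuffled = list(names)
    rng.shuffle(shuffled)
    return {name: f"[CHAR_{i}]" for i, name in enumerate(shuffled)}


def anonymize(text, mapping):
    """Replace every known name in text with its placeholder."""
    # Longest names first, so a full name is replaced before a name it contains.
    for name in sorted(mapping, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", mapping[name], text)
    return text


# Toy data: hypothetical names and description, not taken from the dataset.
names = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank",
         "Grace", "Heidi", "Ivan", "Judy", "Mallory", "Niaj"]
rng = random.Random(0)
question = make_question(
    "Alice is a stubborn miller who raises Bob after the flood.",
    "Alice", names, rng,
)
mapping = build_mapping(names, rng)
anonymous_question = {
    "description": anonymize(question["description"], mapping),
    "options": [mapping[name] for name in question["options"]],
    "label": question["label"],
}
print(anonymous_question)
```

Under this assumed scheme, the same mapping would also have to be applied to the book text itself, so that a model cannot answer from prior knowledge of well-known character names.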

Automatic Resolution of Domain Name Disputes

Association for Computational Linguistics, 2021, article

We introduce the new task of domain name dispute resolution (DNDR), which predicts the outcome of a process for resolving disputes about legal entitlement to a domain name. The ICANN UDRP establishes a mandatory arbitration process for a dispute between a trademark owner and a domain name registrant pertaining to a generic Top-Level Domain (gTLD) name (one ending in .COM, .ORG, .NET, etc.). The nature of the problem leads to a very skewed data set, which stems from being able to register a domain name with extreme ease, very little expense, and no need to prove an entitlement to it. In this paper, we describe the task and associated data set. We also present benchmarking results based on a range of models, which show that simple baselines are in general difficult to beat due to the skewed data distribution, but in the specific case of the respondent having submitted a response, a fine-tuned BERT model offers considerable improvements over a majority-class model.
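A short sketch may help show why a majority-class model is hard to beat on skewed data: its accuracy simply equals the share of the most frequent outcome. The label names and the 9-to-1 split below are invented for illustration and are not figures from the paper.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _case: majority


def accuracy(predict, cases, labels):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(predict(c) == y for c, y in zip(cases, labels)) / len(labels)


# Hypothetical skew: 9 of 10 disputes end the same way.
labels = ["transfer"] * 9 + ["denied"]
cases = [f"case-{i}" for i in range(len(labels))]
predict = majority_baseline(labels)
print(accuracy(predict, cases, labels))  # 0.9
```

Any learned model has to exceed that share to count as an improvement, which is why the abstract singles out the subset where the respondent submitted a response.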

Keywords: automatic resolution, domain, dispute resolution

CodeQA: A Question Answering Dataset for Source Code Comprehension

Association for Computational Linguistics, 2021, article

We propose CodeQA, a free-form question answering dataset for the purpose of source code comprehension: given a code snippet and a question, a textual answer is required to be generated. CodeQA contains a Java dataset with 119,778 question-answer pairs and a Python dataset with 70,085 question-answer pairs. To obtain natural and faithful questions and answers, we implement syntactic rules and semantic analysis to transform code comments into question-answer pairs. We present the construction process and conduct systematic analysis of our dataset. Experimental results achieved by several neural baselines on our dataset are shown and discussed. While research on question answering and machine reading comprehension develops rapidly, few prior works have drawn attention to code question answering. This new dataset can serve as a useful research benchmark for source code comprehension.

Keywords: source code comprehension

Character Analysis in Abdoul Karim Nassif's "Al Makhtoofoon"

Al-Baath University, 2016, research paper

This research presents the concept of character as a basic artistic element in the novel's structure, according to the structural approach, and then analyzes the characters in Abdoul Karim Nassif's "Al Makhtoofoon", presenting the methods the writer adopts to construct the characters, their roles, and the significance of their names. Through this analysis, we identify the artistic style the writer adopts to employ character as an artistic component that plays a prominent role in the novel's artistic structure.

Keywords: character analysis, the novel Al Makhtoofoon, Abdoul Karim Nassif

Single-dataset Experts for Multi-dataset Question Answering

Association for Computational Linguistics, 2021, article

Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to over- or under-fitting different sub-distributions and might transfer worse compared to source models with more overlap with the target dataset. Our approach is to model multi-dataset question answering with an ensemble of single-dataset experts, by training a collection of lightweight, dataset-specific adapter modules (Houlsby et al., 2019) that share an underlying Transformer model. We find that these Multi-Adapter Dataset Experts (MADE) outperform all our baselines in terms of in-distribution accuracy, and simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance, offering a strong and versatile starting point for building new reading comprehension systems.
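The abstract mentions "simple methods based on parameter-averaging" without giving details. As a rough sketch of the general idea only, the snippet below averages the weights of several dataset-specific adapters element by element. It assumes a uniform average by default, represents each parameter as a flat list of floats, and uses hypothetical adapter and parameter names; it is not the authors' implementation.

```python
def average_adapters(adapters, weights=None):
    """Combine adapters by a weighted element-wise average of their parameters.

    adapters: {adapter_name: {param_name: [float, ...]}}, all with the same shapes.
    weights:  optional {adapter_name: float}; defaults to a uniform average.
    """
    names = list(adapters)
    if weights is None:
        weights = {n: 1.0 / len(names) for n in names}
    reference = adapters[names[0]]
    return {
        param: [
            sum(weights[n] * adapters[n][param][i] for n in names)
            for i in range(len(values))
        ]
        for param, values in reference.items()
    }


# Two hypothetical single-dataset adapters sharing one parameter layout.
adapters = {
    "dataset_a": {"down_proj": [1.0, 2.0], "up_proj": [0.0, 4.0]},
    "dataset_b": {"down_proj": [3.0, 4.0], "up_proj": [2.0, 0.0]},
}
merged = average_adapters(adapters)
print(merged)  # {'down_proj': [2.0, 3.0], 'up_proj': [1.0, 2.0]}
```

The merged adapter can then serve as a single starting point for a new dataset, in place of choosing one of the dataset-specific experts.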

Keywords: multi-dataset question answering, multi-dataset question

Integrating Higher-Level Semantics into Robust Biomedical Name Representations

Association for Computational Linguistics, 2021, article

Neural encoders of biomedical names are typically considered robust if representations can be effectively exploited for various downstream NLP tasks. To achieve this, encoders need to model domain-specific biomedical semantics while rivaling the universal applicability of pretrained self-supervised representations. Previous work on robust representations has focused on learning low-level distinctions between names of fine-grained biomedical concepts. These fine-grained concepts can also be clustered together to reflect higher-level, more general semantic distinctions, such as grouping the names nettle sting and tick-borne fever together under the description puncture wound of skin. It has not yet been empirically confirmed that training biomedical name encoders on fine-grained distinctions automatically leads to bottom-up encoding of such higher-level semantics. In this paper, we show that this bottom-up effect exists, but that it is still relatively limited. As a solution, we propose a scalable multi-task training regime for biomedical name encoders which can also learn robust representations using only higher-level semantic classes. These representations can generalise both bottom-up and top-down among various semantic hierarchies. Moreover, we show how they can be used out-of-the-box for improved unsupervised detection of hypernyms, while retaining robust performance on various semantic relatedness benchmarks.

Keywords: self-report detection, representations, downstream NLP tasks
