Research papers, master's and doctoral theses about "multiple"

Enhancing Multiple-choice Machine Reading Comprehension by Punishing Illogical Interpretations

208 - Association for Computational Linguistics 2021 Article

Machine Reading Comprehension (MRC), which requires a machine to answer questions given the relevant documents, is an important way to test machines' ability to understand human language. Multiple-choice MRC is one of the most studied tasks in MRC due to the convenience of evaluation and the flexibility of answer format. Post-hoc interpretation aims to explain a trained model and reveal how the model arrives at the prediction. One of the most important interpretation forms is to attribute model decisions to input features. Based on post-hoc interpretation methods, we assess attributions of paragraphs in multiple-choice MRC and improve the model by punishing the illogical attributions. Our method can improve model performance without any external information and model structure change. Furthermore, we also analyze how and why such a self-training method works.

word problems, multiple-choice machine reading

KERS: A Knowledge-Enhanced Framework for Recommendation Dialog Systems with Multiple Subgoals

147 - Association for Computational Linguistics 2021 Article

Recommendation dialogs require the system to build a social bond with users to gain trust and develop affinity in order to increase the chance of a successful recommendation. It is beneficial to divide such conversations into multiple subgoals (such as social chat, question answering, recommendation, etc.), so that the system can retrieve appropriate knowledge with better accuracy under different subgoals. In this paper, we propose a unified framework for common knowledge-based multi-subgoal dialog: knowledge-enhanced multi-subgoal driven recommender system (KERS). We first predict a sequence of subgoals and use them to guide the dialog model to select knowledge from a subset of an existing knowledge graph. We then propose three new mechanisms to filter noisy knowledge and to enhance the inclusion of cleaned knowledge in the dialog response generation process. Experiments show that our method obtains state-of-the-art results on the DuRecDial dataset in both automatic and human evaluation.

recommendation dialog systems, multiple subgoals, recommendation

Novel Natural Language Summarization of Program Code via Leveraging Multiple Input Representations

131 - Association for Computational Linguistics 2021 Article

The lack of a description for a given piece of program code is a major hurdle for developers who are new to the code base and trying to understand it. To tackle this problem, previous work on code summarization, the task of automatically generating a code description given a piece of code, reported that an auxiliary learning model trained to produce API (Application Programming Interface) embeddings showed promising results when applied to a downstream code summarization model. However, different pieces of code with different summaries can have the same set of API sequences. If we train a model to generate summaries given only an API sequence, the model will not be able to learn effectively. Nevertheless, we note that the API sequence can still be useful and has not been actively utilized. This work proposes a novel multi-task approach that simultaneously trains two similar tasks: 1) summarizing a given code (code to summary), and 2) summarizing a given API sequence (API sequence to summary). We propose a novel code-level encoder based on BERT capable of expressing the semantics of code, and obtain representations for every line of code. Our work is the first code summarization work that utilizes a natural language-based contextual pre-trained language model in its encoder. We evaluate our approach using two common datasets (Java and Python) that have been widely used in previous studies. Our experimental results show that our multi-task approach improves over the baselines and achieves the new state-of-the-art.

leveraging multiple input, multiple input representations, leveraging multiple

GANDALF: a General Character Name Description Dataset for Long Fiction

157 - Association for Computational Linguistics 2021 Article

This paper introduces a long-range multiple-choice Question Answering (QA) dataset, based on full-length fiction book texts. The questions are formulated as 10-way multiple-choice questions, where the task is to select the correct character name given a character description, or vice versa. Each character description is formulated in natural text and often contains information from several sections throughout the book. We provide 20,000 questions created from 10,000 manually annotated descriptions of characters from 177 books containing 152,917 words on average. We address the current discourse regarding dataset bias and leakage by a simple anonymization procedure, which in turn enables interesting probing possibilities. Finally, we show that suitable baseline algorithms perform very poorly on this task, with the book size itself making it non-trivial to attempt a Transformer-based QA solution. This leaves ample room for future improvement, and hints at the need for a completely different type of solution.

long fiction, general character, multiple-choice question answering

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents

168 - Association for Computational Linguistics 2021 Article

We propose MultiDoc2Dial, a new task and dataset on modeling goal-oriented dialogues grounded in multiple documents. Most previous works treat document-grounded dialogue modeling as a machine reading comprehension task based on a single given document or passage. In this work, we aim to address more realistic scenarios where a goal-oriented information-seeking conversation involves multiple topics, and hence is grounded on different documents. To facilitate such a task, we introduce a new dataset that contains dialogues grounded in multiple documents from four different domains. We also explore modeling the dialogue-based and document-based contexts in the dataset. We present strong baseline approaches and various experimental results, aiming to support further research efforts on such a task.

dialogues grounded, multiple documents, grounded in multiple

Unsupervised Multiple Choices Question Answering: Start Learning from Basic Knowledge

61 - Association for Computational Linguistics 2021 Article

In this paper, we study the possibility of unsupervised Multiple Choices Question Answering (MCQA). From very basic knowledge, the MCQA model knows that some choices have higher probabilities of being correct than others. The information, though very noisy, guides the training of an MCQA model. The proposed method is shown to outperform the baseline approaches on RACE and is even comparable with some supervised learning approaches on MC500.

choices question answering, multiple choices question, unsupervised multiple choices

When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions

194 - Association for Computational Linguistics 2021 Article

Scenario-based question answering (SQA) requires retrieving and reading paragraphs from a large corpus to answer a question that is contextualized by a long scenario description. Since a scenario contains both keyphrases for retrieval and much noise, retrieval for SQA is extremely difficult. Moreover, it can hardly be supervised due to the lack of relevance labels of paragraphs for SQA. To meet the challenge, in this paper we propose a joint retriever-reader model called JEEVES where the retriever is implicitly supervised only using QA labels via a novel word weighting mechanism. JEEVES significantly outperforms a variety of strong baselines on multiple-choice questions in three SQA datasets.

scenario-based question answering, scenario-based multiple-choice questions, SQA

Relation Extraction Using Multiple Pre-Training Models in Biomedical Domain

242 - Association for Computational Linguistics 2021 Article

The number of biomedical documents is increasing rapidly. Accordingly, the demand for extracting knowledge from large-scale biomedical texts is also increasing. BERT-based models are known for their high performance in various tasks. However, they are often computationally expensive, and a high-end GPU environment is not available in many situations. To attain both high accuracy and fast extraction speed, we propose combinations of simpler pre-trained models. Our method outperforms the latest state-of-the-art model and BERT-based models on the GAD corpus. In addition, our method shows approximately three times faster extraction speed than the BERT-based models on the ChemProt corpus and reduces the memory size to one sixth of that of the BERT-based models.

multiple pre-training models, multiple pre-training, biomedical domain

ur-iw-hnt at GermEval 2021: An Ensembling Strategy with Multiple BERT Models

206 - Association for Computational Linguistics 2021 Article

This paper describes our approach (ur-iw-hnt) for the Shared Task of GermEval2021 to identify toxic, engaging, and fact-claiming comments. We submitted three runs using an ensembling strategy by majority (hard) voting with multiple different BERT models of three different types: German-based, Twitter-based, and multilingual models. All ensemble models outperform single models, while BERTweet is the winner of all individual models in every subtask. Twitter-based models perform better than GermanBERT models, and multilingual models perform worse but by a small margin.

multiple BERT models, ensembling strategy, BERT models
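
The majority (hard) voting mentioned in the ur-iw-hnt abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `hard_vote` and the toy binary predictions are assumptions made for illustration, and the sketch only shows the general idea of picking, for each comment, the label predicted by most models.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority (hard) voting.

    predictions: list of lists, one inner list of labels per model,
    all of the same length (one label per example).
    Returns one label per example: the label chosen by the most models.
    """
    voted = []
    # zip(*predictions) groups the labels that all models gave to one example
    for labels in zip(*predictions):
        # most_common(1) returns [(label, count)] for the most frequent label
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical binary predictions (1 = toxic, 0 = not toxic) from three models
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

print(hard_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

With an odd number of models and binary labels, no ties can occur. With an even number of models or more than two labels, a tie-breaking rule would have to be chosen; in this sketch, a tie falls back to the label seen first among the tied ones.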

Learning Entity-Likeness with Multiple Approximate Matches for Biomedical NER

197 - Association for Computational Linguistics 2021 Article

Biomedical Named Entities are complex, so approximate matching has been used to improve entity coverage. However, the usual approximate matching approach fetches only one matching result, which is often noisy. In this work, we propose a method for biomedical NER that fetches multiple approximate matches for a given phrase to leverage their variations to estimate entity-likeness. The model uses pooling to discard the unnecessary information from the noisy matching results, and learns the entity-likeness of the phrase with multiple approximate matches. Experimental results on three benchmark datasets from the biomedical domain, BC2GM, NCBI-disease, and BC4CHEMD, demonstrate the effectiveness. Our model improves the average by up to +0.21 points compared to a BioBERT-based NER.

multiple approximate matches, biomedical named entities, approximate matches
