Relational knowledge bases (KBs) are commonly used to represent world knowledge in machines. However, while advantageous for their high degree of precision and interpretability, KBs are usually organized according to manually-defined schemas, which limit their expressiveness and require significant human efforts to engineer and maintain. In this review, we take a natural language processing perspective to these limitations, examining how they may be addressed in part by training deep contextual language models (LMs) to internalize and express relational knowledge in more flexible forms. We propose to organize knowledge representation strategies in LMs by the level of KB supervision provided, from no KB supervision at all to entity- and relation-level supervision. Our contributions are threefold: (1) We provide a high-level, extensible taxonomy for knowledge representation in LMs; (2) Within our taxonomy, we highlight notable models, evaluation tasks, and findings, in order to provide an up-to-date review of current knowledge representation capabilities in LMs; and (3) We suggest future research directions that build upon the complementary aspects of LMs and KBs as knowledge representations.
With the emergence of the COVID-19 pandemic, the political and the medical aspects of disinformation merged, as the problem escalated to become the first global infodemic. Fighting this infodemic has been declared one of the most important focus areas of the World Health Organization, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading xenophobia and panic. Addressing the issue requires solving a number of challenging problems, such as identifying messages containing claims, determining their check-worthiness and factuality, and assessing their potential to do harm as well as the nature of that harm, to mention just a few. To support work on these problems, we release a large dataset of 16K manually annotated tweets for fine-grained disinformation analysis that (i) focuses on COVID-19, (ii) combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society, and (iii) covers Arabic, Bulgarian, Dutch, and English. Finally, we show strong evaluation results using pretrained Transformers, confirming the practical utility of the dataset in monolingual vs. multilingual and single-task vs. multitask settings.
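A minimal sketch of the kind of multitask setup such an evaluation implies: a shared multilingual encoder with one classification head per annotation question, fine-tuned jointly. The checkpoint name, task names, and label counts below are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch of a shared-encoder multitask classifier for annotated tweets.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskTweetClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 tasks=None):
        super().__init__()
        # Hypothetical annotation questions and label counts.
        tasks = tasks or {"check_worthy": 2, "harmful": 2, "factual": 3}
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per fine-grained disinformation question.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in tasks.items()}
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] representation
        return {name: head(cls) for name, head in self.heads.items()}

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = MultiTaskTweetClassifier()
batch = tokenizer(["Example tweet about a miracle cure."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```

Dropping all but one head recovers the single-task setting; restricting training data to one language gives the monolingual comparison.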
Question answering (QA) systems are now available through numerous commercial applications for a wide variety of domains, serving millions of users who interact with them via speech interfaces. However, current benchmarks in QA research do not account for the errors that speech recognition models might introduce, nor do they consider the language variations (dialects) of the users. To address this gap, we augment an existing QA dataset to construct a multi-dialect, spoken QA benchmark covering five languages (Arabic, Bengali, English, Kiswahili, Korean) with more than 68k audio prompts in 24 dialects from 255 speakers. We provide baseline results showcasing the real-world performance of QA systems and analyze the effect of language variety and other sensitive speaker attributes on downstream performance. Lastly, we study the fairness of the ASR and QA models with respect to the underlying user populations.
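One way to picture the cascaded setting this benchmark evaluates is an ASR model feeding its transcript to a downstream QA model. A hedged sketch follows; both checkpoint names are stand-ins, not the benchmark's actual models.

```python
# Sketch of a spoken-QA cascade: transcribe the audio prompt, then answer.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def spoken_qa(audio_path: str, context: str) -> dict:
    """Transcribe a spoken question, then answer it from the context."""
    question = asr(audio_path)["text"]
    answer = qa(question=question, context=context)
    # Keeping the transcript lets us attribute failures to ASR vs. QA,
    # e.g. when comparing performance across dialects or speaker groups.
    return {"transcript": question, **answer}
```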
Generating high-quality question-answer pairs is a hard but meaningful task. Although previous works have achieved great results on answer-aware question generation, they are difficult to apply in practical educational settings. This paper addresses, for the first time, the question-answer pair generation task on real-world examination data, and proposes a new unified framework on RACE. To capture the important information of the input passage, we first automatically generate (rather than extract) keyphrases, thus reducing the task to keyphrase-question-answer triplet joint generation. Accordingly, we propose a multi-agent communication model that generates and optimizes the question and keyphrases iteratively, and then applies the generated question and keyphrases to guide the generation of answers. To establish a solid benchmark, we build our model on a strong generative pre-trained model. Experimental results show that our model achieves substantial improvements on the question-answer pair generation task. Moreover, we conduct a comprehensive analysis of our model, suggesting new directions for this challenging task.
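The iterative generate-and-refine loop can be sketched schematically as below; the prompts, the number of communication rounds, and the t5-small checkpoint are illustrative assumptions, not the paper's actual multi-agent model.

```python
# Schematic triplet generation: keyphrase and question agents refine each
# other's output for a few rounds, then the answer is generated from both.
from transformers import pipeline

gen = pipeline("text2text-generation", model="t5-small")

def generate(prompt: str) -> str:
    return gen(prompt, max_new_tokens=48)[0]["generated_text"]

def triplet(passage: str, rounds: int = 2):
    keyphrase = generate(f"extract keyphrase: {passage}")
    question = generate(f"generate question: {passage} keyphrase: {keyphrase}")
    for _ in range(rounds):  # agents communicate and refine iteratively
        keyphrase = generate(f"refine keyphrase: {passage} question: {question}")
        question = generate(f"refine question: {passage} keyphrase: {keyphrase}")
    # Keyphrase and question jointly guide answer generation.
    answer = generate(f"answer: {question} context: {passage} focus: {keyphrase}")
    return keyphrase, question, answer
```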
In this work, we address the open-world classification problem with a method called ODIST: open-world classification via distributionally shifted instances. This novel and straightforward method creates out-of-domain instances from the in-domain training instances with the help of a pre-trained generative language model. Experimental results show that ODIST outperforms the state-of-the-art decision-boundary-finding method.
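The abstract leaves the generation procedure unspecified; the following sketch shows one plausible reading, in which a pre-trained masked LM re-fills random tokens of in-domain sentences to produce distributionally shifted pseudo-out-of-domain examples. The mask-and-infill strategy, mask rate, and checkpoint are all assumptions for illustration.

```python
# Sketch: corrupt in-domain sentences into pseudo-out-of-domain instances.
import random
from transformers import pipeline

filler = pipeline("fill-mask", model="roberta-base")

def shift_instance(sentence: str, mask_rate: float = 0.3) -> str:
    """Shift an in-domain sentence by re-filling randomly chosen tokens."""
    tokens = sentence.split()
    n_masks = max(1, int(mask_rate * len(tokens)))
    for i in random.sample(range(len(tokens)), n_masks):
        masked = " ".join(t if j != i else filler.tokenizer.mask_token
                          for j, t in enumerate(tokens))
        tokens[i] = filler(masked)[0]["token_str"].strip()
    return " ".join(tokens)

# Shifted instances would receive the "open"/unknown label during training.
ood_examples = [shift_instance(s) for s in ["book a flight to Boston"]]
```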
As AI reaches wider adoption, designing systems that are explainable and interpretable becomes a critical necessity. In particular, when it comes to dialogue systems, their reasoning must be transparent and must comply with human intuitions in order for them to be integrated seamlessly into day-to-day collaborative human-machine activities. Here, we describe our ongoing work on a (general-purpose) dialogue system equipped with a spatial specialist that has explanatory capabilities. We applied this system to the task of characterizing spatial configurations of blocks in a simple physical Blocks World (BW) domain using natural locative expressions, as well as generating justifications for the proposed spatial descriptions by indicating the factors that the system used to arrive at a particular conclusion.
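A toy illustration of the explanatory pattern described here: a spatial predicate is scored as a combination of interpretable factors, and the same factors are surfaced as the justification. The factor definitions and their aggregation are invented for illustration and do not reflect the system's actual spatial model.

```python
# Sketch: score a spatial predicate from named factors, then explain it.
import math

def near_score(a: dict, b: dict, scale: float = 1.0):
    """Score 'a is near b' for blocks with x/y coords; expose the factors."""
    dist = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    factors = {
        "proximity": math.exp(-dist / scale),  # closer -> higher
        "no_blocker": 1.0,                     # placeholder occlusion factor
    }
    score = sum(factors.values()) / len(factors)
    reasons = ", ".join(f"{k}={v:.2f}" for k, v in factors.items())
    return score, f"'near' holds with score {score:.2f} because {reasons}"

score, justification = near_score({"x": 0, "y": 0}, {"x": 1, "y": 1})
print(justification)
```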
Due to recent advances in natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider semantic correction, while the phonetic features of words are neglected. Semantic-only post-correction consequently degrades performance, since homophonic errors are fairly common in Chinese ASR. In this paper, we propose a novel approach that collectively exploits the contextualized representations and the phonetic information between an error and its replacement candidates to reduce the error rate of Chinese ASR. Our experimental results on real-world speech recognition datasets show that our proposed method achieves an evidently lower CER than the baseline model, which utilizes a pre-trained BERT MLM as the corrector.
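A hedged sketch of the scoring idea: for a suspected error position, replacement candidates are ranked by a weighted mix of BERT-MLM contextual probability and phonetic similarity to the original character. The pypinyin-based homophone check and the mixing weight alpha are assumptions for illustration, not the paper's actual formulation.

```python
# Sketch: combine MLM probability with phonetic similarity for correction.
import torch
from transformers import BertTokenizer, BertForMaskedLM
from pypinyin import lazy_pinyin

tok = BertTokenizer.from_pretrained("bert-base-chinese")
mlm = BertForMaskedLM.from_pretrained("bert-base-chinese").eval()

def phonetic_sim(a: str, b: str) -> float:
    """Crude homophone check via pinyin equality (an assumption)."""
    return 1.0 if lazy_pinyin(a) == lazy_pinyin(b) else 0.0

def rank_candidates(sent: str, pos: int, candidates, alpha: float = 0.7):
    chars = list(sent)
    original = chars[pos]
    chars[pos] = tok.mask_token  # mask the suspected error position
    inputs = tok("".join(chars), return_tensors="pt")
    mask_idx = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        probs = mlm(**inputs).logits[0, mask_idx].softmax(-1)
    scores = {c: alpha * probs[tok.convert_tokens_to_ids(c)].item()
                 + (1 - alpha) * phonetic_sim(original, c)
              for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)
```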
We investigate the question of how adaptive feedback from a virtual agent impacts the linguistic input of the user in a shared world game environment. To do so, we carry out an exploratory pilot study to observe how individualized linguistic feedback affects the user's speech input. We introduce a speech-controlled game, Apple Core-dination, in which an agent learns complex tasks using a base knowledge of simple actions. The agent is equipped with a learning mechanism for mapping new commands to sequences of simple actions, as well as the ability to incorporate user input into written responses. The agent repeatedly shares its internal knowledge state by describing what it knows and does not know about language meaning and the shared environment. Our paper focuses on the linguistic feedback loop in order to analyze the nature of user input. Feedback from the agent is provided in the form of visual movement and written linguistic responses. Particular attention is given to incorporating user input into agent responses and updating the speech-to-action mappings based on commands provided by the user. Through our pilot study, we analyze task success and compare the lexical features of user input. Results show variation in input length and lexical variety across users, suggesting a correlation between the two that can be studied further.
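The command-learning loop can be illustrated with a minimal sketch: unknown commands are defined by the user as sequences of known simple actions, and the agent reports what it does and does not know. The action inventory and data structures below are illustrative assumptions.

```python
# Sketch: an agent that learns command -> action-sequence mappings and
# shares its knowledge state in written responses.
SIMPLE_ACTIONS = {"move_left", "move_right", "jump", "grab"}

class Agent:
    def __init__(self):
        self.mappings: dict[str, list[str]] = {}  # command -> action sequence

    def handle(self, command: str) -> str:
        if command in SIMPLE_ACTIONS:
            return f"Doing '{command}'."
        if command in self.mappings:
            steps = " then ".join(self.mappings[command])
            return f"I know '{command}': {steps}."
        # Share the knowledge gap and invite a definition from the user.
        return f"I don't know '{command}' yet. Can you explain it with actions I know?"

    def teach(self, command: str, steps: list[str]) -> str:
        self.mappings[command] = steps
        return f"Learned '{command}' as {steps}."

agent = Agent()
print(agent.handle("fetch the apple"))
print(agent.teach("fetch the apple", ["move_right", "grab", "move_left"]))
```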
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.
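As a concrete framing of the setting, interaction logs can be converted into (state, action, reward) transitions for offline RL; the field names and the implicit-feedback reward heuristic below are assumptions for illustration.

```python
# Sketch: turn raw interaction logs into an offline RL dataset.
from dataclasses import dataclass

@dataclass
class Transition:
    state: str      # e.g. the dialogue context shown to the deployed system
    action: str     # the system's response or prediction
    reward: float   # derived from logged user feedback, often implicit

def log_to_transitions(log_records: list) -> list:
    """Map raw log records to transitions for offline policy learning."""
    dataset = []
    for rec in log_records:
        # Implicit feedback (clicks, corrections, dwell time) stands in for
        # explicit rewards -- one of the core challenges in this setting.
        reward = 1.0 if rec.get("user_accepted") else 0.0
        dataset.append(Transition(rec["context"], rec["system_output"], reward))
    return dataset
```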
We investigate grounded language learning through real-world data, by modelling teacher-learner dynamics through the natural interactions occurring between users and search engines; in particular, we explore the emergence of semantic generalization from unsupervised dense representations outside of synthetic environments. A grounding domain, a denotation function, and a composition function are learned from user data only. We show that the resulting semantics for noun phrases exhibits compositional properties while being fully learnable without any explicit labelling. We benchmark our grounded semantics on compositionality and zero-shot inference tasks, and show that it provides better results and better generalization than SOTA non-grounded models such as word2vec and BERT.
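The three learned components named here can be sketched schematically: a grounding domain of entity vectors, a denotation function mapping (composed) vectors to sets of entities, and a composition function over noun-phrase parts. The random vectors, the 64-dimensional space, and the element-wise-product composition are placeholders; in the paper these are learned from user data.

```python
# Sketch: grounding domain, denotation, and composition over dense vectors.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical grounding domain: entities embedded as dense vectors.
entities = {e: rng.normal(size=64) for e in ["shoe_1", "shoe_2", "dress_1"]}
words = {w: rng.normal(size=64) for w in ["red", "shoes"]}

def denotation(vec: np.ndarray, k: int = 2) -> list:
    """Map a (composed) vector to its k most similar entities in the domain."""
    sims = {e: float(vec @ v) / (np.linalg.norm(vec) * np.linalg.norm(v))
            for e, v in entities.items()}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def compose(modifier: str, head: str) -> np.ndarray:
    """Learned from user data in the paper; element-wise product here."""
    return words[modifier] * words[head]

print(denotation(compose("red", "shoes")))  # entities denoted by "red shoes"
```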