Research papers, master's and doctoral theses about Reading

Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension

207 - Association for Computational Linguistics 2021 Article

Multi-party dialogue machine reading comprehension (MRC) poses a tremendous challenge because it involves multiple speakers in one dialogue, resulting in intricate speaker information flows and noisy dialogue contexts. To alleviate such difficulties, previous models focus on how to incorporate this information using complex graph-based modules and additional manually labeled data, which is usually rare in real scenarios. In this paper, we design two labour-free self- and pseudo-self-supervised prediction tasks on speaker and key-utterance to implicitly model the speaker information flows and capture salient clues in a long dialogue. Experimental results on two benchmark datasets have justified the effectiveness of our method over competitive baselines and current state-of-the-art models.

Keywords: dialogue reading comprehension; multi-party dialogue reading
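
Because speaker names are already present in a dialogue transcript, a speaker-prediction task needs no extra annotation. The sketch below shows one hypothetical way to build such training examples: hide one utterance's speaker and keep the name as the label. The `Utterance` layout and the function `make_speaker_examples` are illustrative assumptions, not the paper's actual construction.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    text: str


def make_speaker_examples(dialogue, mask_token="[MASK]"):
    """Build (masked dialogue, target speaker) pairs from a raw dialogue.

    Hypothetical construction: for each utterance whose speaker has already
    appeared earlier in the dialogue, replace that speaker's name with a mask
    token and keep the true name as the label. No manual labels are needed.
    """
    examples = []
    for i, utt in enumerate(dialogue):
        earlier = {u.speaker for u in dialogue[:i]}
        if utt.speaker not in earlier:
            continue  # the label must be recoverable from earlier context
        lines = []
        for j, u in enumerate(dialogue):
            name = mask_token if j == i else u.speaker
            lines.append(f"{name}: {u.text}")
        examples.append(("\n".join(lines), utt.speaker))
    return examples
```

For a three-turn dialogue Alice, Bob, Alice, only the third turn yields an example, because it is the first turn whose speaker can be inferred from earlier context.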

Summarize-then-Answer: Generating Concise Explanations for Multi-hop Reading Comprehension

572 - Association for Computational Linguistics 2021 Article

How can we generate concise explanations for multi-hop Reading Comprehension (RC)? Current strategies that identify supporting sentences can be seen as extractive, question-focused summarization of the input text. However, these extractive explanations are not necessarily concise, i.e., not minimally sufficient for answering a question. Instead, we advocate an abstractive approach: we propose to generate a question-focused, abstractive summary of the input paragraphs and then feed it to an RC system. Given a limited amount of human-annotated abstractive explanations, we train the abstractive explainer in a semi-supervised manner: we start from the supervised model and then train it further through trial and error, maximizing a conciseness-promoted reward function. Our experiments demonstrate that the proposed abstractive explainer can generate more compact explanations than an extractive explainer with limited supervision (only 2k instances) while maintaining sufficiency.

Keywords: multi-hop reading comprehension; multi-hop reading
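
The abstract does not give the exact form of the conciseness-promoted reward. As an illustration only, a reward that first requires sufficiency (the RC system still answers correctly from the summary) and then pays a bonus for brevity relative to the input might look like this; the function name, the weight `alpha`, and the whitespace tokenization are assumptions.

```python
def conciseness_reward(summary: str, source: str, answer_correct: bool,
                       alpha: float = 1.0) -> float:
    """Hypothetical reward: sufficiency first, then brevity.

    Returns 0.0 when the RC system fails on the summary (insufficient).
    Otherwise returns 1.0 plus a bonus that grows as the summary gets
    shorter relative to the source paragraphs (word counts).
    """
    if not answer_correct:
        return 0.0
    src_len = max(len(source.split()), 1)
    compression = len(summary.split()) / src_len  # 1.0 means no compression
    return 1.0 + alpha * (1.0 - min(compression, 1.0))
```

Gating the brevity bonus on correctness keeps the explainer from collapsing to an empty summary, which would be maximally short but useless.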

Extract, Integrate, Compete: Towards Verification Style Reading Comprehension

155 - Association for Computational Linguistics 2021 Article

In this paper, we present a new verification-style reading comprehension dataset named VGaokao, built from the Chinese language tests of the Gaokao. Unlike existing efforts, the new dataset was originally designed to evaluate native speakers and thus requires more advanced language understanding skills. To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach, which iteratively selects complementary evidence with a novel query updating mechanism and adaptively distills supportive evidence, followed by a pairwise competition that pushes models to learn the subtle differences among similar text pieces. Experiments show that our methods outperform various baselines on VGaokao with retrieved complementary evidence, while having the merits of efficiency and explainability. Our dataset and code are released for further research.

Keywords: verification style reading; style reading comprehension; style reading
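
To make "iteratively selects complementary evidence with a query updating mechanism" concrete, here is a deliberately simple stand-in: greedy selection by word overlap, where query words already covered by chosen evidence are dropped so the next pick must contribute something new. The overlap scoring, the update rule, and the name `select_complementary` are assumptions; the paper's mechanism is learned, not rule-based.

```python
def select_complementary(query: str, candidates: list[str], k: int = 2) -> list[str]:
    """Greedily pick up to k evidence sentences that cover new query words."""
    remaining = set(query.lower().split())  # query terms not yet covered
    pool = list(candidates)
    chosen = []
    for _ in range(k):
        if not remaining or not pool:
            break
        # Score each candidate by how many still-uncovered query words it has;
        # max() keeps the first candidate on ties.
        best = max(pool, key=lambda c: len(remaining & set(c.lower().split())))
        gain = remaining & set(best.lower().split())
        if not gain:
            break  # no candidate adds new information
        chosen.append(best)
        pool.remove(best)
        remaining -= gain  # query update: drop the covered terms
    return chosen
```

Removing covered terms is what makes the second pick complementary rather than a near-duplicate of the first.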

Cross-Lingual Leveled Reading Based on Language-Invariant Features

137 - Association for Computational Linguistics 2021 Article

Leveled reading (LR) aims to automatically classify texts by the cognitive levels of readers, which is fundamental to providing reading materials appropriate to different reading capabilities. However, most state-of-the-art LR methods rely on the availability of copious annotated resources, which prevents their adaptation to low-resource languages like Chinese. In our work, to tackle LR in Chinese, we explore how different language transfer methods perform on English-Chinese LR. Specifically, we focus on adversarial training and cross-lingual pre-training methods to transfer the LR knowledge learned from annotated data in the resource-rich English language to Chinese. For evaluation, we first introduce an age-based standard to align datasets with different leveling standards. We then conduct experiments in both zero-shot and few-shot settings. Comparing these two methods, quantitative and qualitative evaluations show that the cross-lingual pre-training method effectively captures the language-invariant features between English and Chinese. We also conduct analyses to suggest further improvements for cross-lingual LR.

Keywords: leveled reading based; reading based; leveled reading

RoR: Read-over-Read for Long Document Machine Reading Comprehension

185 - Association for Computational Linguistics 2021 Article

Transformer-based pre-trained models, such as BERT, have achieved remarkable results on machine reading comprehension. However, due to the constraint on encoding length (e.g., 512 WordPiece tokens), a long document is usually split into multiple chunks that are read independently. As a result, the reading field is limited to individual chunks, with no information shared across chunks, in long document machine reading comprehension. To address this problem, we propose RoR, a read-over-read method that expands the reading field from chunk to document. Specifically, RoR includes a chunk reader and a document reader. The former first predicts a set of regional answers for each chunk, which are then compacted into a highly condensed version of the original document that is guaranteed to be encodable in one pass. The latter further predicts global answers from this condensed document. Finally, a voting strategy is used to aggregate and rerank the regional and global answers for the final prediction. Extensive experiments on two benchmarks, QuAC and TriviaQA, demonstrate the effectiveness of RoR for long document reading. Notably, RoR ranked 1st on the QuAC leaderboard (https://quac.ai/) at the time of submission (May 17th, 2021).

Keywords: constrained question answering; long document machine; document machine reading
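
The chunking constraint that motivates RoR, and a voting step over regional and global answers, can be sketched as follows. The window and stride values, the per-answer scores, and the "sum of scores per answer text" vote are illustrative assumptions; the abstract does not specify RoR's actual voting strategy.

```python
from collections import defaultdict


def split_into_chunks(tokens, max_len=512, stride=128):
    """Split a token list into overlapping windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens, so any answer span no
    longer than `stride` tokens lies entirely inside at least one chunk.
    """
    if max_len <= stride:
        raise ValueError("max_len must exceed stride")
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            return chunks
        start += max_len - stride


def vote(regional, global_):
    """Aggregate (answer_text, score) pairs by summing scores per answer.

    Hypothetical rule: an answer proposed by several chunk readers and by the
    document reader accumulates support. Ties keep first-seen order.
    """
    totals = defaultdict(float)
    for text, score in list(regional) + list(global_):
        totals[text] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

For a 1,000-token document with the defaults, `split_into_chunks` produces three windows starting at tokens 0, 384, and 768; each window is read independently, which is exactly the limited reading field the paper sets out to widen.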

Less Is More: Domain Adaptation with Lottery Ticket for Reading Comprehension

322 - Association for Computational Linguistics 2021 Article

In this paper, we propose a simple few-shot domain adaptation paradigm for reading comprehension. We first identify the lottery subnetwork structure within the Transformer-based source domain model via gradual magnitude pruning. Then, we fine-tune only the lottery subnetwork, a small fraction of all parameters, on the annotated target domain data for adaptation. To obtain more adaptable subnetworks, we introduce self-attention attribution to weigh parameters, going beyond simply pruning the smallest-magnitude parameters; this can be seen as softly combining structured pruning and unstructured magnitude pruning. Experimental results show that our method outperforms full-model fine-tuning adaptation on four out of five domains when only a small amount of annotated data is available for adaptation. Moreover, introducing self-attention attribution retains more parameters for important attention heads in the lottery subnetwork and improves target domain model performance. Our further analyses reveal that, besides exploiting fewer parameters, the choice of subnetwork is critical to effectiveness.

Keywords: recommendation; ticket for reading; lottery ticket
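
Gradual magnitude pruning raises the sparsity target step by step and, at each step, zeroes out the mask entries for the weights with the smallest absolute values. The sketch below uses the cubic sparsity schedule of Zhu and Gupta (2017), a common choice for gradual pruning; whether this paper uses that exact schedule is an assumption, and the self-attention attribution weighting is not shown. It also reads "fine-tune only the lottery subnetwork" as updating the masked-in weights while the rest keep their source-domain values. Weights are a flat Python list for simplicity.

```python
def cubic_sparsity(step, total_steps, s_init=0.0, s_final=0.9):
    """Sparsity target at `step` under a cubic schedule (after Zhu & Gupta, 2017)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    round(sparsity * n) weights are masked out, smallest |w| first;
    ties are broken by index order because sorted() is stable.
    """
    n = len(weights)
    n_prune = int(round(sparsity * n))
    order = sorted(range(n), key=lambda i: abs(weights[i]))  # smallest first
    mask = [1] * n
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """Update only the subnetwork (mask == 1); other weights stay unchanged."""
    return [w - lr * g if m else w for w, g, m in zip(weights, grads, mask)]
```

For example, `magnitude_mask([0.5, -0.1, 2.0, 0.05], 0.5)` keeps the two largest-magnitude weights and returns `[1, 0, 1, 0]`.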

Smoothing Dialogue States for Open Conversational Machine Reading

206 - Association for Computational Linguistics 2021 Article

Conversational machine reading (CMR) requires machines to communicate with humans through multi-turn interactions between two salient dialogue states: decision making and question generation. In open CMR settings, the more realistic scenario, the retrieved background knowledge is noisy, which poses severe challenges for information transmission. Existing studies commonly train independent or pipeline systems for the two subtasks. However, these methods simply use hard-label decisions to activate question generation, which ultimately hinders model performance. In this work, we propose an effective gating strategy that smooths the two dialogue states within a single decoder and bridges decision making and question generation to provide a richer dialogue state reference. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.

Keywords: conversational machine reading; open conversational machine
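
The contrast between a hard-label decision and a smoothed one can be shown with a toy gate. Here the probability of an "inquire" decision either switches between the two states (hard) or mixes them continuously (soft). The state vectors, the convex-combination formula, the threshold, and both function names are hypothetical; this is not the paper's decoder.

```python
def hard_gate(p_inquire, decision_state, question_state, threshold=0.5):
    """Pipeline style: a hard label selects one state and discards the other."""
    return question_state if p_inquire >= threshold else decision_state


def soft_gate(p_inquire, decision_state, question_state):
    """Smoothed: mix both states by the decision probability, so question
    generation always receives graded decision information."""
    return [p_inquire * q + (1.0 - p_inquire) * d
            for d, q in zip(decision_state, question_state)]
```

With `p_inquire = 0.49`, the hard gate throws away the question-generation state entirely, while the soft gate passes on a nearly even blend; this is the kind of information loss that hard-label activation can cause.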

Self Question-answering: Aspect-based Sentiment Analysis by Role Flipped Machine Reading Comprehension

404 - Association for Computational Linguistics 2021 Article

The pivot for unified Aspect-based Sentiment Analysis (ABSA) is to couple aspect terms with their corresponding opinion terms, which can in turn make sentiment prediction easier. In this paper, we investigate the unified ABSA task from the perspective of Machine Reading Comprehension (MRC), observing that aspect and opinion terms can serve interchangeably as the query and the answer in MRC. We propose a new paradigm named Role Flipped Machine Reading Comprehension (RF-MRC) to solve it. At its heart, the predicted results of either Aspect Term Extraction (ATE) or Opinion Term Extraction (OTE) are regarded as the queries, and the matched opinion or aspect terms, respectively, are considered the answers. The queries and answers can be flipped for multi-hop detection. Finally, every matched aspect-opinion pair is predicted by the sentiment classifier. RF-MRC can solve the ABSA task without any additional data annotation or transformation. Experiments on three widely used benchmarks and a challenging dataset demonstrate the superiority of the proposed framework.

Keywords: linguistic hierarchy; flipped machine reading
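
The role-flipping idea can be illustrated with a toy, rule-based stand-in for the MRC model: aspects are used as queries to find opinions, then opinions are flipped into queries to find aspects, and a pair is kept when both directions agree. The nearest-word "reader", the lexicons, and the agreement rule are all hypothetical simplifications of what the paper does with a trained model.

```python
def nearest(query_idx, candidates):
    """Toy 'reader': answer a query by returning the closest candidate index."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (abs(c - query_idx), c))


def role_flipped_pairs(tokens, aspect_lexicon, opinion_lexicon):
    """Match aspect and opinion terms by querying in both directions."""
    aspects = [i for i, t in enumerate(tokens) if t in aspect_lexicon]
    opinions = [i for i, t in enumerate(tokens) if t in opinion_lexicon]
    # Aspect as query -> opinion as answer.
    a2o = {a: nearest(a, opinions) for a in aspects}
    # Flip roles: opinion as query -> aspect as answer.
    o2a = {o: nearest(o, aspects) for o in opinions}
    # Keep only pairs on which both directions agree.
    return [(tokens[a], tokens[o]) for a, o in a2o.items()
            if o is not None and o2a.get(o) == a]
```

On "the pasta was great but the service was slow", with aspects {pasta, service} and opinions {great, slow}, this returns `[("pasta", "great"), ("service", "slow")]`; each pair would then go to a sentiment classifier.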

What If Sentence-hood is Hard to Define: A Case Study in Chinese Reading Comprehension

270 - Association for Computational Linguistics 2021 Article

Machine reading comprehension (MRC) is a challenging NLP task because it requires carefully handling all linguistic granularities, from word and sentence to passage. For extractive MRC, the answer span has been shown to be mostly determined by key evidence linguistic units, which in most cases are sentences. However, we recently discovered that, to different extents, sentences may not be clearly defined in many languages. This causes the so-called location unit ambiguity problem and, as a result, makes it difficult for the model to determine which sentence exactly contains the answer span when the sentence itself has not been clearly defined at all. Taking the Chinese language as a case study, we explain and analyze this linguistic phenomenon and correspondingly propose a reader with Explicit Span-Sentence Predication to alleviate the problem. Our proposed reader achieves a new state of the art on a Chinese MRC benchmark and shows great potential for dealing with other languages.

Keywords: hard to define; sentence-hood is hard; Chinese reading comprehension
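
The location unit ambiguity is easy to reproduce: whether a comma-delimited clause counts as a "sentence" changes which unit contains the answer. The two delimiter sets and the example text below are assumptions chosen for illustration, not the paper's segmentation rules.

```python
import re


def split_sentences(text, delimiters):
    """Split after each delimiter character, keeping the delimiter."""
    pattern = "(?<=[" + re.escape(delimiters) + "])"  # zero-width split point
    return [s for s in re.split(pattern, text) if s]


def locate(answer, sentences):
    """Index of the first unit containing the whole answer, else None."""
    for i, s in enumerate(sentences):
        if answer in s:
            return i
    return None


text = "他昨天去了北京，参观了故宫。今天他回家了。"
strict = split_sentences(text, "。！？")     # end-of-sentence marks only
loose = split_sentences(text, "。！？，；")  # also treat commas as boundaries
```

The answer 故宫 falls in unit 0 under the strict split but in unit 1 under the loose split, so a model asked to point to "the sentence containing the answer" has no single correct target.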

WebSRC: A Dataset for Web-Based Structural Reading Comprehension

342 - Association for Computational Linguistics 2021 Article

Web search is an essential way for humans to obtain information, but it is still a great challenge for machines to understand the contents of web pages. In this paper, we introduce the task of web-based structural reading comprehension. Given a web page and a question about it, the task is to find an answer from the web page. This task requires a system to understand not only the semantics of texts but also the structure of the web page. Moreover, we propose WebSRC, a novel Web-based Structural Reading Comprehension dataset. WebSRC consists of 400K question-answer pairs collected from 6.4K web pages with corresponding HTML source code, screenshots, and metadata. Each question in WebSRC requires a certain structural understanding of a web page to answer, and the answer is either a text span on the web page or yes/no. We evaluate various strong baselines on our dataset to show the difficulty of our task. We also investigate the usefulness of structural information and visual features. Our dataset and baselines are publicly available.

Keywords: structural reading comprehension; web-based structural reading
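
Part of what makes the task structural is that each answer span sits at a position in the HTML tree. One minimal way to expose that structure, using Python's standard `html.parser`, is to record every text node together with its tag path. This is an illustrative preprocessing sketch; the class `TextWithPath` and function `text_nodes` are hypothetical names, not the dataset's official tooling.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextWithPath(HTMLParser):
    """Collect (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    parser = TextWithPath()
    parser.feed(html)
    parser.close()
    return parser.nodes
```

On a product table such as `<table><tr><th>Price</th><td>$999</td></tr></table>`, the paths `table/tr/th` and `table/tr/td` tell a model that "$999" is the value for the "Price" header, information that plain text extraction loses.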
