Does Structure Matter? Encoding Documents for Machine Reading Comprehension

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Machine reading comprehension is a challenging task, especially when querying documents with deep and interconnected contexts. Transformer-based methods have achieved strong performance on this task; however, most of them still treat a document as a flat sequence of tokens. This work proposes a new Transformer-based method that reads a document as tree slices. It contains two modules, one for identifying the more relevant text passage and one for identifying the best answer span, which are not only jointly trained but also jointly consulted at inference time. Our evaluation results show that the proposed method outperforms several competitive baseline approaches on two datasets from varied domains.
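To make the idea of reading a document as tree slices and consulting two modules jointly more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not define a slice or the combination rule, so this sketch assumes that a slice is the chain of headings from the root to a leaf plus the leaf's text, and that the passage-relevance score and the span score are combined by a weighted sum. The names `Node`, `tree_slices`, `joint_select`, and the weight `alpha` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A section of a document: a heading, optional text, and subsections."""
    title: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def tree_slices(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield one slice per leaf: (headings from root to leaf, leaf text).

    Assumption: only leaf text is emitted; text on inner nodes is ignored.
    """
    path = path + (node.title,)
    if not node.children:
        yield path, node.text
    for child in node.children:
        yield from tree_slices(child, path)


def joint_select(candidates, alpha: float = 0.5):
    """Pick the best candidate from (slice_id, relevance, span_text, span_score).

    Assumption: both scores are consulted through a weighted sum;
    the paper may combine them differently.
    """
    return max(candidates, key=lambda c: alpha * c[1] + (1 - alpha) * c[3])


doc = Node("Manual", children=[
    Node("Install", "Run the installer."),
    Node("Usage", children=[Node("CLI", "Use the run command.")]),
])
slices = list(tree_slices(doc))

# Hypothetical scores that the two modules might produce for each slice.
candidates = [
    (0, 0.2, "installer", 0.9),    # strong span, weakly relevant passage
    (1, 0.9, "run command", 0.6),  # weaker span, highly relevant passage
]
best = joint_select(candidates)
```

With these made-up scores, the second candidate is selected because its passage relevance outweighs the first candidate's higher span score, which is the kind of interaction a span-only reader would miss.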

References used

https://aclanthology.org/

Adversarial Training for Machine Reading Comprehension with Virtual Embeddings

Association for Computational Linguistics, 2021, article

Adversarial training (AT) as a regularization method has proved its effectiveness on various tasks. Though there are successful applications of AT on some NLP tasks, the distinguishing characteristics of NLP tasks have not been exploited. In this paper, we aim to apply AT on machine reading comprehension (MRC) tasks. Furthermore, we adapt AT for MRC tasks by proposing a novel adversarial training method called PQAT that perturbs the embedding matrix instead of word vectors. To differentiate the roles of passages and questions, PQAT uses additional virtual P/Q-embedding matrices to gather the global perturbations of words from passages and questions separately. We test the method on a wide range of MRC tasks, including span-based extractive RC and multiple-choice RC. The results show that adversarial training is universally effective, and that PQAT further improves performance.

What does BERT Learn from Arabic Machine Reading Comprehension Datasets?

Association for Computational Linguistics, 2021, article

In machine reading comprehension tasks, a model must extract an answer from the available context given a question and a passage. Recently, transformer-based pre-trained language models have achieved state-of-the-art performance in several natural language processing tasks. However, it is unclear whether such performance reflects true language understanding. In this paper, we propose adversarial examples to probe an Arabic pre-trained language model (AraBERT), leading to a significant performance drop over four Arabic machine reading comprehension datasets. We present a layer-wise analysis of the transformer's hidden states to offer insights into how AraBERT reasons to derive an answer. The experiments indicate that AraBERT relies on superficial cues and keyword matching rather than text understanding. Furthermore, hidden state visualization demonstrates that prediction errors can be recognized from vector representations in earlier layers.

RoR: Read-over-Read for Long Document Machine Reading Comprehension

Association for Computational Linguistics, 2021, article

Transformer-based pre-trained models, such as BERT, have achieved remarkable results on machine reading comprehension. However, due to the constraint on encoding length (e.g., 512 WordPiece tokens), a long document is usually split into multiple chunks that are read independently. As a result, the reading field is limited to individual chunks, with no information shared across chunks for long document machine reading comprehension. To address this problem, we propose RoR, a read-over-read method, which expands the reading field from chunk to document. Specifically, RoR includes a chunk reader and a document reader. The former first predicts a set of regional answers for each chunk, which are then compacted into a highly condensed version of the original document that is guaranteed to be encoded once. The latter further predicts the global answers from this condensed document. Finally, a voting strategy is used to aggregate and rerank the regional and global answers for the final prediction. Extensive experiments on two benchmarks, QuAC and TriviaQA, demonstrate the effectiveness of RoR for long document reading. Notably, RoR ranked first on the QuAC leaderboard (https://quac.ai/) at the time of submission (May 17th, 2021).
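The pipeline described in this abstract (split into chunks, collect regional answers, then vote together with global answers) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RoR itself: the readers are left out and their outputs are replaced by made-up (answer, score) pairs, the chunks are non-overlapping, and the voting rule (summing the scores of identical answer strings) is an assumption, since the abstract does not specify it. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Answer = Tuple[str, float]  # (answer text, reader score)


def split_into_chunks(tokens: Sequence, max_len: int = 512) -> List[Sequence]:
    """Split a token sequence into consecutive, non-overlapping chunks
    of at most max_len tokens (the encoder's length limit)."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def vote(regional: List[Answer], global_answers: List[Answer]) -> List[Answer]:
    """Aggregate regional and global answers and rerank them.

    Assumption: identical answer strings pool their scores by summation.
    """
    totals: Dict[str, float] = defaultdict(float)
    for text, score in list(regional) + list(global_answers):
        totals[text] += score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# A 1200-token document does not fit into one 512-token window.
chunks = split_into_chunks(list(range(1200)), max_len=512)

# Made-up reader outputs: the chunk reader slightly prefers "Lyon",
# but the document reader's global answer supports "Paris".
regional = [("Paris", 0.6), ("Lyon", 0.7)]
global_answers = [("Paris", 0.5)]
ranking = vote(regional, global_answers)
```

In this toy case the answer that looks best within a single chunk loses once the global answer is counted, which is the effect the voting step is meant to capture.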

Relying on Discourse Analysis to Answer Complex Questions by Neural Machine Reading Comprehension

Association for Computational Linguistics, 2021, article

Machine reading comprehension (MRC) is one of the most challenging tasks in the natural language processing domain. Recent state-of-the-art results for MRC have been achieved with pre-trained language models, such as BERT and its modifications. Despite the high performance of these models, they still suffer from the inability to retrieve correct answers from detailed and lengthy passages. In this work, we introduce a novel scheme for incorporating the discourse structure of the text into a self-attention network and thus enrich the embedding obtained from the standard BERT encoder with additional linguistic knowledge. We also investigate the influence of different types of linguistic information on the model's ability to answer complex questions that require deep understanding of the whole text. Experiments performed on the SQuAD benchmark and on more complex question answering datasets have shown that linguistic enhancement significantly boosts the performance of the standard BERT model.

A Study on Contextualized Language Modeling for Machine Reading Comprehension

Association for Computational Linguistics, 2021, article

With the recent breakthrough of deep learning technologies, research on machine reading comprehension (MRC) has attracted much attention and found versatile applications in many use cases. MRC is an important natural language processing (NLP) task aiming to assess the ability of a machine to understand natural language expressions, which is typically operationalized by first asking questions based on a given text paragraph and then receiving machine-generated answers in accordance with the given context paragraph and questions. In this paper, we leverage two novel pretrained language models built on top of Bidirectional Encoder Representations from Transformers (BERT), namely BERT-wwm and MacBERT, to develop effective MRC methods. In addition, we investigate whether incorporating categorical information about a context paragraph, obtained by clustering the context paragraphs of the training dataset, can benefit MRC. Finally, an ensemble learning approach is proposed to harness the synergistic power of the two BERT-based models so as to further improve MRC performance.
