ترغب بنشر مسار تعليمي؟ اضغط هنا

A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension

55   0   0.0 ( 0 )
 نشر من قبل Jie Cai
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed. But little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g. synonyms are closer to each other) and fine-tuning further improves its capabilities which require more complicated reasoning (e.g. coreference resolution, entity boundary detection, etc). However, how to verify these arguments analytically and quantitatively is a challenging task and there are few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g. coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. According to pairwise probing tasks, we compare the performance of each layers hidden representation of pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates the problem of distraction from inaccurate model training and makes a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) Fine-tuning has little effect on the fundamental and low-level information and general semantic tasks. (2) For specific abilities required for downstream tasks, fine-tuned BERT is better than pre-trained BERT and such gaps are obvious after the fifth layer.



قيم البحث

اقرأ أيضاً

Achieving human-level performance on some of Machine Reading Comprehension (MRC) datasets is no longer challenging with the help of powerful Pre-trained Language Models (PLMs). However, the internal mechanism of these artifacts still remains unclear, placing an obstacle for further understanding these models. This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final performance, trying to analyze the potential explainability in PLM-based MRC models. We perform quantitative analyses on SQuAD (English) and CMRC 2018 (Chinese), two span-extraction MRC datasets, on top of BERT, ALBERT, and ELECTRA in various aspects. We discover that {em passage-to-question} and {em passage understanding} attentions are the most important ones, showing strong correlations to the final performance than other parts. Through visualizations and case studies, we also observe several general findings on the attention maps, which could be helpful to understand how these models solve the questions.
Over 97 million people speak Vietnamese as their native language in the world. However, there are few research studies on machine reading comprehension (MRC) for Vietnamese, the task of understanding a text and answering questions related to it. Due to the lack of benchmark datasets for Vietnamese, we present the Vietnamese Question Answering Dataset (UIT-ViQuAD), a new dataset for the low-resource language as Vietnamese to evaluate MRC models. This dataset comprises over 23,000 human-generated question-answer pairs based on 5,109 passages of 174 Vietnamese articles from Wikipedia. In particular, we propose a new process of dataset creation for Vietnamese MRC. Our in-depth analyses illustrate that our dataset requires abilities beyond simple reasoning like word matching and demands single-sentence and multiple-sentence inferences. Besides, we conduct experiments on state-of-the-art MRC methods for English and Chinese as the first experimental models on UIT-ViQuAD. We also estimate human performance on the dataset and compare it to the experimental results of powerful machine learning models. As a result, the substantial differences between human performance and the best model performance on the dataset indicate that improvements can be made on UIT-ViQuAD in future research. Our dataset is freely available on our website to encourage the research community to overcome challenges in Vietnamese MRC.
Recent studies on machine reading comprehension have focused on text-level understanding but have not yet reached the level of human understanding of the visual layout and content of real-world documents. In this study, we introduce a new visual mach ine reading comprehension dataset, named VisualMRC, wherein given a question and a document image, a machine reads and comprehends texts in the image to answer the question in natural language. Compared with existing visual question answering (VQA) datasets that contain texts in images, VisualMRC focuses more on developing natural language understanding and generation abilities. It contains 30,000+ pairs of a question and an abstractive answer for 10,000+ document images sourced from multiple domains of webpages. We also introduce a new model that extends existing sequence-to-sequence models, pre-trained with large-scale text corpora, to take into account the visual layout and content of documents. Experiments with VisualMRC show that this model outperformed the base sequence-to-sequence models and a state-of-the-art VQA model. However, its performance is still below that of humans on most automatic evaluation metrics. The dataset will facilitate research aimed at connecting vision and language understanding.
This paper surveys 54 English Machine Reading Comprehension datasets, with a view to providing a convenient resource for other researchers interested in this problem. We categorize the datasets according to their question and answer form and compare them across various dimensions including size, vocabulary, data source, method of creation, human performance level, and first question word. Our analysis reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.
102 - Hai Wang , Dian Yu , Kai Sun 2019
Remarkable success has been achieved in the last few years on some limited machine reading comprehension (MRC) tasks. However, it is still difficult to interpret the predictions of existing MRC models. In this paper, we focus on extracting evidence s entences that can explain or support the answers of multiple-choice MRC tasks, where the majority of answer options cannot be directly extracted from reference documents. Due to the lack of ground truth evidence sentence labels in most cases, we apply distant supervision to generate imperfect labels and then use them to train an evidence sentence extractor. To denoise the noisy labels, we apply a recently proposed deep probabilistic logic learning framework to incorporate both sentence-level and cross-sentence linguistic indicators for indirect supervision. We feed the extracted evidence sentences into existing MRC models and evaluate the end-to-end performance on three challenging multiple-choice MRC datasets: MultiRC, RACE, and DREAM, achieving comparable or better performance than the same models that take as input the full reference document. To the best of our knowledge, this is the first work extracting evidence sentences for multiple-choice MRC.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا