Span-extraction reading comprehension models have made tremendous advances, enabled by the availability of large-scale, high-quality training datasets. Despite such rapid progress and widespread application, extractive reading comprehension datasets in languages other than English remain scarce, and creating a sufficient amount of training data for each language is costly, if not impossible. An alternative to creating large-scale, high-quality monolingual span-extraction training datasets is to develop multilingual modeling approaches and systems that can transfer to the target language without requiring training data in that language. In this paper, to address the scarcity of extractive reading comprehension training data in the target language, we propose a multilingual extractive reading comprehension approach called XLRC that simultaneously models the existing extractive reading comprehension training data in a multilingual environment using self-adaptive attention and multilingual attention. Specifically, we first construct multilingual parallel corpora by translating existing extractive reading comprehension datasets (i.e., CMRC 2018) from the target language (i.e., Chinese) into different language families (e.g., English). Second, to enhance the final target representation, we adopt self-adaptive attention (SAA), which combines self-attention and inter-attention to extract the semantic relations from each pair of target and source languages. Furthermore, we propose multilingual attention (MLA) to learn rich knowledge from the various language families. Experimental results show that our model outperforms the state-of-the-art baseline (i.e., RoBERTa_Large) on the CMRC 2018 task, which demonstrates the effectiveness of our proposed multilingual modeling approach and shows its potential for multilingual NLP tasks.
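The abstract names two mechanisms, SAA and MLA, without giving their equations. Below is a minimal PyTorch sketch of one plausible reading: SAA fuses self-attention over the target language with inter-attention from target to one source language, and MLA pools the resulting per-source-language representations. The module names, the sigmoid gate in `SelfAdaptiveAttention`, and the softmax pooling in `MultilingualAttention` are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class SelfAdaptiveAttention(nn.Module):
    """Sketch of SAA: combine self-attention over the target sequence with
    inter-attention from the target to one source language. The learned gate
    mixing the two streams is an assumption; the abstract only states that
    SAA "combines self-attention and inter-attention"."""

    def __init__(self, hidden: int, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, tgt: torch.Tensor, src: torch.Tensor) -> torch.Tensor:
        # Self-attention: target tokens attend to each other.
        self_out, _ = self.self_attn(tgt, tgt, tgt)
        # Inter-attention: target tokens attend to source-language tokens.
        inter_out, _ = self.inter_attn(tgt, src, src)
        # Element-wise gate deciding how much cross-lingual signal to keep.
        g = torch.sigmoid(self.gate(torch.cat([self_out, inter_out], dim=-1)))
        return g * self_out + (1 - g) * inter_out

class MultilingualAttention(nn.Module):
    """Sketch of MLA: attention-weighted pooling over the per-source-language
    representations produced by SAA. The softmax scoring is an assumption."""

    def __init__(self, hidden: int):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, per_lang: torch.Tensor) -> torch.Tensor:
        # per_lang: (batch, n_langs, seq_len, hidden)
        weights = torch.softmax(self.score(per_lang), dim=1)
        return (weights * per_lang).sum(dim=1)  # (batch, seq_len, hidden)
```

Under this reading, the final target representation fed to the span-extraction head would be `MultilingualAttention` applied to the stack of SAA outputs, one per source language in the translated parallel corpora.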
Entity extraction is a key technology for obtaining information from massive texts in natural language processing. … The further interaction between them does not meet the standards of human reading comprehension, thus limiting the understanding of the …
Span extraction is an essential problem in machine reading comprehension. Most of the existing algorithms predict the start and end positions of an answer span in the given context by generating two probability vectors. In this paper, we …
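For readers unfamiliar with this setup, here is a minimal PyTorch sketch of the standard start/end span head this snippet describes; the `SpanHead` name and the single shared projection are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Standard span-extraction head: two linear scores per token, turned
    into the start and end probability vectors by softmax over positions."""

    def __init__(self, hidden: int):
        super().__init__()
        self.start_end = nn.Linear(hidden, 2)

    def forward(self, token_repr: torch.Tensor):
        # token_repr: (batch, seq_len, hidden) from any encoder.
        logits = self.start_end(token_repr)              # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        start_probs = torch.softmax(start_logits, dim=-1)
        end_probs = torch.softmax(end_logits, dim=-1)
        return start_probs, end_probs
```

At inference time the predicted answer is typically the pair (i, j) with i ≤ j that maximizes `start_probs[i] * end_probs[j]`, often subject to a maximum span length.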
Remarkable success has been achieved in the last few years on some limited machine reading comprehension (MRC) tasks. However, it is still difficult to interpret the predictions of existing MRC models. In this paper, we focus on extracting evidence …
The development of natural language processing (NLP) in general, and machine reading comprehension in particular, has attracted great attention from the research community. In recent years, a few datasets for machine reading comprehension …
We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages …