تبسيط النص هو تقنية قيمة.ومع ذلك، يقتصر البحث الحالي على تبسيط الجملة.في هذه الورقة، نحدد والتحقيق في مهمة جديدة من تبسيط نص المستندات على مستوى المستند، والتي تهدف إلى تبسيط وثيقة تتكون من جمل متعددة.بناء على مقالب ويكيبيديا، نقوم أولا ببناء مجموعة بيانات واسعة النطاق تسمى D-Wikipedia وأداء التحليل والتقييم البشري عليه لإظهار أن مجموعة البيانات موثوقة.بعد ذلك، نقترح مقياس تقييم تلقائي جديد يسمى D-SARI هو أكثر ملاءمة لمهمة تبسيط مستوى المستند.أخيرا، نقوم باختيار العديد من النماذج التمثيلية كطرازات أساسية لهذه المهمة وأداء التقييم التلقائي والتقييم البشري.نحن نحلل النتائج وأشرح أوجه القصور في النماذج الأساسية.
Text simplification is a valuable technique. However, current research is limited to sentence simplification. In this paper, we define and investigate a new task of document-level text simplification, which aims to simplify a document consisting of multiple sentences. Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia and perform analysis and human evaluation on it to show that the dataset is reliable. Then, we propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task. Finally, we select several representative models as baseline models for this task and perform automatic evaluation and human evaluation. We analyze the results and point out the shortcomings of the baseline models.
References used
https://aclanthology.org/
The task of document-level text simplification is very similar to summarization with the additional difficulty of reducing complexity. We introduce a newly collected data set of German texts, collected from the Swiss news magazine 20 Minuten (20 Minu
Reviewing contracts is a time-consuming procedure that incurs large expenses to companies and social inequality to those who cannot afford it. In this work, we propose document-level natural language inference (NLI) for contracts'', a novel, real-wor
We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplificati
Document-level relation extraction is a challenging task, requiring reasoning over multiple sentences to predict a set of relations in a document. In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that
The quality of fully automated text simplification systems is not good enough for use in real-world settings; instead, human simplifications are used. In this paper, we examine how to improve the cost and quality of human simplifications by leveragin