Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Document-Level Text Simplification: Dataset, Criteria and Baseline

تبسيط النص على مستوى المستند: مجموعة البيانات والمعايير والخط الأساسي

1397 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Text simplification is a valuable technique. However, current research is limited to sentence simplification. In this paper, we define and investigate a new task of document-level text simplification, which aims to simplify a document consisting of multiple sentences. Based on Wikipedia dumps, we first construct a large-scale dataset named D-Wikipedia and perform analysis and human evaluation on it to show that the dataset is reliable. Then, we propose a new automatic evaluation metric called D-SARI that is more suitable for the document-level simplification task. Finally, we select several representative models as baseline models for this task and perform automatic evaluation and human evaluation. We analyze the results and point out the shortcomings of the baseline models.

References used

https://aclanthology.org/

rate research

A New Dataset and Efficient Baselines for Document-level Text Simplification in German

783 - Association for Computation Linguistics 2021 مقالة

The task of document-level text simplification is very similar to summarization with the additional difficulty of reducing complexity. We introduce a newly collected data set of German texts, collected from the Swiss news magazine 20 Minuten (20 Minu tes') that consists of full articles paired with simplified summaries. Furthermore, we present experiments on automatic text simplification with the pretrained multilingual mBART and a modified version thereof that is more memory-friendly, using both our new data set and existing simplification corpora. Our modifications of mBART let us train at a lower memory cost without much loss in performance, in fact, the smaller mBART even improves over the standard model in a setting with multiple simplification levels.

dataset and efficient efficient baselines document-level text simplification DataSet وفعال خطوط أساس فعالة تبسيط نص المستند صناعة حمض الفوسفور المزيد..

ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts

651 - Association for Computation Linguistics 2021 مقالة

Reviewing contracts is a time-consuming procedure that incurs large expenses to companies and social inequality to those who cannot afford it. In this work, we propose document-level natural language inference (NLI) for contracts'', a novel, real-wor ld application of NLI that addresses such problems. In this task, a system is given a set of hypotheses (such as Some obligations of Agreement may survive termination.'') and a contract, and it is asked to classify whether each hypothesis is entailed by'', contradicting to'' or not mentioned by'' (neutral to) the contract as well as identifying evidence'' for the decision as spans in the contract. We annotated and release the largest corpus to date consisting of 607 annotated contracts. We then show that existing models fail badly on our task and introduce a strong baseline, which (a) models evidence identification as multi-label classification over spans instead of trying to predict start and end tokens, and (b) employs more sophisticated context segmentation for dealing with long documents. We also show that linguistic characteristics of contracts, such as negations by exceptions, are contributing to the difficulty of this task and that there is much room for improvement.

document-level natural language اللغة الطبيعية على مستوى المستند صناعة حمض الفوسفور

Paragraph-level Simplification of Medical Texts

776 - Association for Computation Linguistics 2021 مقالة

We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplificati on does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing jargon'' terms; we find that this yields improvements over baselines in terms of readability.

medical texts simplify medical texts paragraph-level simplification النصوص الطبية تبسيط النصوص الطبية تبسيط مستوى الفقرة صناعة حمض الفوسفور المزيد..

Entity and Evidence Guided Document-Level Relation Extraction

744 - Association for Computation Linguistics 2021 مقالة

Document-level relation extraction is a challenging task, requiring reasoning over multiple sentences to predict a set of relations in a document. In this paper, we propose a novel framework E2GRE (Entity and Evidence Guided Relation Extraction) that jointly extracts relations and the underlying evidence sentences by using large pretrained language model (LM) as input encoder. First, we propose to guide the pretrained LM's attention mechanism to focus on relevant context by using attention probabilities as additional features for evidence prediction. Furthermore, instead of feeding the whole document into pretrained LMs to obtain entity representation, we concatenate document text with head entities to help LMs concentrate on parts of the document that are more related to the head entity. Our E2GRE jointly learns relation extraction and evidence prediction effectively, showing large gains on both these tasks, which we find are highly correlated.

document-level relation extraction guided relation extraction استخراج العلاقة على مستوى المستند استخراج العلاقة الموجهة استخراج العلاقة صناعة حمض الفوسفور

Improving Human Text Simplification with Sentence Fusion

729 - Association for Computation Linguistics 2021 مقالة

The quality of fully automated text simplification systems is not good enough for use in real-world settings; instead, human simplifications are used. In this paper, we examine how to improve the cost and quality of human simplifications by leveragin g crowdsourcing. We introduce a graph-based sentence fusion approach to augment human simplifications and a reranking approach to both select high quality simplifications and to allow for targeting simplifications with varying levels of simplicity. Using the Newsela dataset (Xu et al., 2015) we show consistent improvements over experts at varying simplification levels and find that the additional sentence fusion simplifications allow for simpler output than the human simplifications alone.

improving human text human text simplification human simplifications تحسين النص البشري تبسيط النص البشري التبسيط البشري صناعة حمض الفوسفور المزيد..

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Document-Level Text Simplification: Dataset, Criteria and Baseline

تبسيط النص على مستوى المستند: مجموعة البيانات والمعايير والخط الأساسي

Ask ChatGPT about the research

Read More

suggested questions