
A Comparative Study on Abstractive and Extractive Approaches in Summarization of European Legislation Documents



Added by Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Keywords: analytical description, European legislation documents, European legislation


Abstract

Extracting the most important parts of legislation documents has great business value because the texts are usually very long and hard to understand. The aim of this article is to evaluate different text summarization algorithms on EU legislation documents, whose content contains domain-specific words. We collected a text summarization dataset of 1563 EU legal documents, in which the mean summary length is 424 words. Experiments were conducted with different algorithms on the new dataset. A simple extractive algorithm was selected as a baseline. Advanced extractive algorithms, which use encoders, show better results than the baseline. The best result, as measured by ROUGE scores, was achieved by a fine-tuned abstractive T5 model that was adapted to work with long texts.
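As a rough illustration of the kind of evaluation the abstract describes, the sketch below pairs a lead-k extractive baseline with a unigram-overlap ROUGE-1 F1 score. Both choices are assumptions made for illustration: the abstract does not say which simple baseline was used, and published ROUGE scores normally come from an established implementation with stemming and further variants. The names `tokenize`, `lead_k`, and `rouge1_f1` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lead_k(document, k=2):
    """Hypothetical baseline: use the first k sentences as the summary."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(sentences[:k])


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference summary."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy document and reference, invented for this example only.
    doc = (
        "The regulation sets emission limits. It applies to new vehicles. "
        "Member states report yearly."
    )
    reference = "The regulation sets emission limits for new vehicles."
    summary = lead_k(doc, k=2)
    print(summary)
    print(round(rouge1_f1(summary, reference), 3))
```

A lead-k baseline is cheap and needs no training, which is why it is a common point of comparison; whether it matches the baseline in the study is not stated in the abstract.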

References used

https://aclanthology.org/


Abstractive Document Summarization with Word Embedding Reconstruction

316 - Association for Computational Linguistics 2021 (article)

Neural sequence-to-sequence (Seq2Seq) models and BERT have achieved substantial improvements in abstractive document summarization (ADS) without and with pre-training, respectively. However, they sometimes repeatedly attend to unimportant source phrases while mistakenly ignoring important ones. We present reconstruction mechanisms on two levels to alleviate this issue. The sequence-level reconstructor reconstructs the whole document from the hidden layer of the target summary, while the word embedding-level one rebuilds the average of word embeddings of the source at the target side to guarantee that as much critical information is included in the summary as possible. Based on the assumption that inverse document frequency (IDF) measures how important a word is, we further leverage the IDF weights in our embedding-level reconstructor. The proposed frameworks lead to promising improvements for ROUGE metrics and human rating on both the CNN/Daily Mail and Newsroom summarization datasets.
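To make the IDF-weighted average of source word embeddings more concrete, here is a minimal sketch of that quantity alone. It does not implement the reconstructor itself. The smoothed IDF formula, the toy data layout, and the function names `idf_weights` and `idf_weighted_average` are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in documents for tok in set(doc))
    return {
        tok: math.log((1 + n_docs) / (1 + count)) + 1
        for tok, count in df.items()
    }


def idf_weighted_average(tokens, embeddings, idf):
    """IDF-weighted mean of the embeddings of tokens known to both tables."""
    known = [t for t in tokens if t in embeddings and t in idf]
    if not known:
        raise ValueError("no token has both an embedding and an IDF weight")
    dim = len(embeddings[known[0]])
    total_weight = sum(idf[t] for t in known)
    return [
        sum(idf[t] * embeddings[t][i] for t in known) / total_weight
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Toy corpus and 2-d embeddings, invented for this example only.
    docs = [["the", "court"], ["the", "law"]]
    emb = {"the": [0.0, 0.0], "court": [1.0, 1.0], "law": [1.0, 0.0]}
    idf = idf_weights(docs)
    print(idf_weighted_average(["the", "court"], emb, idf))
```

Because rarer tokens receive larger weights, the average is pulled toward words that appear in few documents, which matches the abstract's assumption that IDF reflects word importance.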

Keywords: abstractive document summarization, document summarization, word embedding reconstruction

The Effect of Pretraining on Extractive Summarization for Scientific Documents

547 - Association for Computational Linguistics 2021 (article)

Large pretrained models have seen enormous success in extractive summarization tasks. In this work, we investigate the influence of pretraining on a BERT-based extractive summarization system for scientific documents. We derive significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets and report state-of-the-art results on a recently released scientific summarization dataset, SciTLDR. We systematically analyze the intermediate pretraining step by varying the size and domain of the pretraining corpus, changing the length of the input sequence in the target task and varying target tasks. We also investigate how intermediate pretraining interacts with contextualized word embeddings trained on different domains.

Keywords: multilingual bootstrapping, extractive summarization tasks

On Reducing Repetition in Abstractive Summarization

558 - Association for Computational Linguistics 2021 (article)

Repetition in natural language generation reduces the informativeness of text and makes it less appealing. Various techniques have been proposed to alleviate it. In this work, we explore and propose techniques to reduce repetition in abstractive summarization. First, we explore the application of unlikelihood training and embedding matrix regularizers from previous work on language modeling to abstractive summarization. Next, we extend the coverage and temporal attention mechanisms to the token level to reduce repetition. In our experiments on the CNN/Daily Mail dataset, we observe that these techniques reduce the amount of repetition and increase the informativeness of the summaries, which we confirm via human evaluation.
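One simple way to quantify the amount of repetition in a generated summary is the share of n-grams that repeat an earlier n-gram. The sketch below shows that proxy only. The abstract does not say how repetition was measured, so the metric, the default `n=3`, and the name `repeated_ngram_rate` are assumptions for illustration.

```python
from collections import Counter


def repeated_ngram_rate(tokens, n=3):
    """Fraction of n-grams that duplicate an n-gram seen earlier in the text."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n tokens: nothing can repeat
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as one repetition.
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(ngrams)


if __name__ == "__main__":
    # Toy summary, invented for this example only.
    summary = "the law applies the law applies to all states".split()
    print(repeated_ngram_rate(summary, n=3))
```

A lower value indicates less verbatim repetition, but it says nothing about informativeness, which the abstract reports was checked through human evaluation.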

Keywords: reducing repetition

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

722 - Association for Computational Linguistics 2021 (article)

Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.

Keywords: intensive language tasks, low-resource scenarios

Extractive Opinion Summarization in Quantized Transformer Spaces

469 - Association for Computational Linguistics 2021 (article)

We present the Quantized Transformer (QT), an unsupervised system for extractive opinion summarization. QT is inspired by Vector-Quantized Variational Autoencoders, which we repurpose for popularity-driven summarization. It uses a clustering interpretation of the quantized space and a novel extraction algorithm to discover popular opinions among hundreds of reviews, a significant step towards opinion summarization of practical scope. In addition, QT enables controllable summarization without further training, by utilizing properties of the quantized space to extract aspect-specific summaries. We also make publicly available Space, a large-scale evaluation benchmark for opinion summarizers, comprising general and aspect-specific summaries for 50 hotels. Experiments demonstrate the promise of our approach, which is validated by human studies where judges showed clear preference for our method over competitive baselines.

Keywords: quantized transformer, quantized transformer spaces, quantized variational autoencoders
