
Centrality Meets Centroid: A Graph-based Approach for Unsupervised Document Summarization

 Added by Haopeng Zhang
Publication date: 2021
Language: English





Unsupervised document summarization has regained considerable attention in recent years thanks to its simplicity and data independence. In this paper, we propose a graph-based unsupervised approach for extractive document summarization. Instead of ranking sentences by salience and extracting them one by one, our approach works at the summary level by exploiting graph centrality and the document centroid. We first extract summary candidates as subgraphs of the sentence graph based on centrality, and then select among the candidates by matching them to the centroid. We perform extensive experiments on two benchmark summarization datasets, and the results demonstrate the effectiveness of our model compared to state-of-the-art baselines.
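The centrality-then-centroid pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact method: the embedding scheme, the use of degree centrality, and the candidate-enumeration strategy (fixed-size subsets of the most central sentences) are all simplifying assumptions.

```python
import itertools
import numpy as np

def summarize(sentence_vectors, k=3):
    """Toy sketch of summary-level extraction via centrality and centroid.

    `sentence_vectors` is an (n, d) array of sentence embeddings; how
    they are obtained (averaged word vectors, a sentence encoder, ...)
    is left open here.
    """
    vecs = sentence_vectors / np.linalg.norm(sentence_vectors, axis=1, keepdims=True)
    sim = vecs @ vecs.T                       # sentence graph: cosine similarities
    np.fill_diagonal(sim, 0.0)
    centrality = sim.sum(axis=1)              # degree centrality of each sentence node
    centroid = vecs.mean(axis=0)              # document centroid

    # Candidate summaries: k-sentence subsets of the m most central sentences.
    m = min(2 * k, len(vecs))
    top = np.argsort(centrality)[::-1][:m]
    best, best_score = None, -np.inf
    for cand in itertools.combinations(top, k):
        cand_vec = vecs[list(cand)].mean(axis=0)   # summary-level representation
        score = float(cand_vec @ centroid)         # match candidate to centroid
        if score > best_score:
            best, best_score = sorted(cand), score
    return best
```

Scoring whole candidates against the centroid, rather than adding the single best sentence greedily, is what makes the selection summary-level: a candidate wins on how its sentences combine, not on their individual ranks.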



Related research

Lay summarization aims to generate lay summaries of scientific papers automatically. It is an essential task that can increase the relevance of science for all of society. In this paper, we build a lay summary generation system based on the BART model. We leverage sentence labels as extra supervision signals to improve the performance of lay summarization. In the CL-LaySumm 2020 shared task, our model achieves a ROUGE-1 F1 score of 46.00%.
Qiwei Bi, Haoyuan Li, Kun Lu (2021)
Previous abstractive methods apply sequence-to-sequence architectures to generate summaries without a module that helps the system detect vital mentions and relationships within a document. To address this problem, we utilize a semantic graph to boost generation performance. First, we extract important entities from each document and then establish a graph inspired by the idea of distant supervision (Mintz et al., 2009). Then, we combine a Bi-LSTM with a graph encoder to obtain the representation of each graph node. A novel neural decoder is presented to leverage the information in such entity graphs. Automatic and human evaluations show the effectiveness of our technique.
The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also shows that our models generate more informative summaries with fewer unfaithful errors.
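The memory saving behind an attention with head-wise positional strides can be illustrated with a toy sketch. The interpretation below, where each head attends only to source positions in its own stride class so that the heads jointly cover the full source, is an assumption for illustration; the paper's exact parameterization of Hepos may differ, and a real implementation would use a deep-learning framework rather than NumPy.

```python
import numpy as np

def hepos_attention(q, k, v, n_heads=4, stride=2):
    """Toy head-wise positional-stride cross-attention.

    Head h attends only to source positions p with p % stride == h % stride,
    so each head sees 1/stride of the keys, and together the heads cover
    the whole source sequence.
    """
    d = q.shape[-1] // n_heads
    out = np.zeros_like(q)
    for h in range(n_heads):
        sl = slice(h * d, (h + 1) * d)
        keys = k[h % stride::stride, sl]    # this head's strided key slice
        vals = v[h % stride::stride, sl]
        scores = q[:, sl] @ keys.T / np.sqrt(d)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)   # softmax over strided keys
        out[:, sl] = weights @ vals
    return out
```

Because each head materializes attention weights over only 1/stride of the source positions, the per-head score matrix shrinks proportionally, which is what lets such models process far longer inputs than full attention.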
We propose Editorial Network, a mixed extractive-abstractive summarization approach applied as a post-processing step over a given sequence of extracted sentences. Our network tries to imitate the decision process of a human editor during summarization: each extracted sentence may be kept untouched, rephrased, or rejected entirely. We further suggest an effective way to train the editor based on a novel soft-labeling approach. Using the CNN/DailyMail dataset, we demonstrate the effectiveness of our approach compared to state-of-the-art extractive-only and abstractive-only baselines.
Canonical automatic summary evaluation metrics, such as ROUGE, suffer from two drawbacks. First, semantic similarity and linguistic quality are not captured well. Second, a reference summary, which is expensive or impossible to obtain in many cases, is needed. Existing efforts to address the two drawbacks are done separately and have limitations. To holistically address them, we introduce an end-to-end approach for summary quality assessment by leveraging sentence or document embedding and introducing two negative sampling approaches to create training data for this supervised approach. The proposed approach exhibits promising results on several summarization datasets of various domains including news, legislative bills, scientific papers, and patents. When rating machine-generated summaries in TAC2010, our approach outperforms ROUGE in terms of linguistic quality, and achieves a correlation coefficient of up to 0.5702 with human evaluations in terms of modified pyramid scores. We hope our approach can facilitate summarization research or applications when reference summaries are infeasible or costly to obtain, or when linguistic quality is a focus.
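Negative sampling for a reference-free summary scorer can be sketched as below. The abstract does not specify its two strategies, so the two shown here, permuting sentence order to damage coherence and swapping in an off-topic sentence, are illustrative assumptions only.

```python
import random

def negative_samples(summary, corpus_sentences, seed=0):
    """Create two kinds of negative examples from a good summary.

    `summary` is a list of its sentences; `corpus_sentences` are
    sentences drawn from other documents (assumed off-topic).
    """
    rng = random.Random(seed)

    # Negative 1: permute the sentence order to damage coherence
    # while keeping the content identical.
    permuted = summary[:]
    while permuted == summary and len(summary) > 1:
        rng.shuffle(permuted)

    # Negative 2: replace one sentence with an off-topic sentence,
    # damaging content relevance while keeping the structure.
    swapped = summary[:]
    swapped[rng.randrange(len(swapped))] = rng.choice(corpus_sentences)

    return permuted, swapped
```

Pairs of (original, negative) summaries then give a supervised scorer something to rank without any human-written reference, which is the point of the negative-sampling idea.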