Lay summarization aims to generate lay summaries of scientific papers automatically. It is an essential task that can increase the relevance of science for all of society. In this paper, we build a lay summary generation system based on the BART model. We leverage sentence labels as extra supervision signals to improve the performance of lay summarization. In the CL-LaySumm 2020 shared task, our model achieves 46.00% Rouge1-F1 score.
Unsupervised document summarization has re-acquired lots of attention in recent years thanks to its simplicity and data independence. In this paper, we propose a graph-based unsupervised approach for extractive document summarization. Instead of ranking sentences by salience and extracting sentences one by one, our approach works at a summary-level by utilizing graph centrality and centroid. We first extract summary candidates as subgraphs based on centrality from the sentence graph and then select from the summary candidates by matching to the centroid. We perform extensive experiments on two bench-marked summarization datasets, and the results demonstrate the effectiveness of our model compared to state-of-the-art baselines.
The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also shows that our models generate more informative summaries with fewer unfaithful errors.
We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.
We suggest a new idea of Editorial Network - a mixed extractive-abstractive summarization approach, which is applied as a post-processing step over a given sequence of extracted sentences. Our network tries to imitate the decision process of a human editor during summarization. Within such a process, each extracted sentence may be either kept untouched, rephrased or completely rejected. We further suggest an effective way for training the editor based on a novel soft-labeling approach. Using the CNN/DailyMail dataset we demonstrate the effectiveness of our approach compared to state-of-the-art extractive-only or abstractive-only baseline methods.
Canonical automatic summary evaluation metrics, such as ROUGE, suffer from two drawbacks. First, semantic similarity and linguistic quality are not captured well. Second, a reference summary, which is expensive or impossible to obtain in many cases, is needed. Existing efforts to address the two drawbacks are done separately and have limitations. To holistically address them, we introduce an end-to-end approach for summary quality assessment by leveraging sentence or document embedding and introducing two negative sampling approaches to create training data for this supervised approach. The proposed approach exhibits promising results on several summarization datasets of various domains including news, legislative bills, scientific papers, and patents. When rating machine-generated summaries in TAC2010, our approach outperforms ROUGE in terms of linguistic quality, and achieves a correlation coefficient of up to 0.5702 with human evaluations in terms of modified pyramid scores. We hope our approach can facilitate summarization research or applications when reference summaries are infeasible or costly to obtain, or when linguistic quality is a focus.