Constrained Abstractive Summarization: Preserving Factual Consistency with Constrained Generation


Abstract in English

Summaries generated by abstractive summarization are supposed to only contain statements entailed by the source documents. However, state-of-the-art abstractive methods are still prone to hallucinate content inconsistent with the source documents. In this paper, we propose constrained abstractive summarization (CAS), a general setup that preserves the factual consistency of abstractive summarization by specifying tokens as constraints that must be present in the summary. We explore the feasibility of using lexically constrained decoding, a technique applicable to any abstractive method with beam search decoding, to fulfill CAS and conduct experiments in two scenarios: (1) Standard summarization without human involvement, where keyphrase extraction is used to extract constraints from source documents; (2) Interactive summarization with human feedback, which is simulated by taking missing tokens in the reference summaries as constraints. Automatic and human evaluations on two benchmark datasets demonstrate that CAS improves the quality of abstractive summaries, especially on factual consistency. In particular, we observe up to 11.2 ROUGE-2 gains when several ground-truth tokens are used as constraints in the interactive summarization scenario.

Download