Fill in the BLANC: Human-free quality estimation of document summaries

75 0 0.0 ( 0 )

Download Cite

Added by Oleg Vasilyev

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Oleg Vasilyev - Vedant Dharnidharka - John Bohannon

Computation and Language

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the documents text. We present evidence that BLANC scores have as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.

rate research

Sensitivity of BLANC to human-scored qualities of text summaries

71 - Oleg Vasilyev , Vedant Dharnidharka , Nicholas Egan 2020

We explore the sensitivity of a document summary quality estimator, BLANC, to human assessment of qualities for the same summaries. In our human evaluations, we distinguish five summary qualities, defined by how fluent, understandable, informative, compact, and factually correct the summary is. We make the case for optimal BLANC parameters, at which the BLANC sensitivity to almost all of summary qualities is about as good as the sensitivity of a human annotator.

Computation and Language

Quality Estimation without Human-labeled Data

484 - Yi-Lin Tuan , Ahmed El-Kishky , Adithya Renduchintala 2021

Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many approaches exist for quality estimation, they are based on supervised machine learning requiring costly human labelled data. As an alternative, we propose a technique that does not rely on examples from human-annotators and instead uses synthetic training data. We train off-the-shelf architectures for supervised quality estimation on our synthetic data and show that the resulting models achieve comparable performance to models trained on human-annotated data, both for sentence and word-level prediction.

Computation and Language

Towards Human-Free Automatic Quality Evaluation of German Summarization

84 - Neslihan Iskender , Oleg Vasilyev , Tim Polzehl 2021

Evaluating large summarization corpora using humans has proven to be expensive from both the organizational and the financial perspective. Therefore, many automatic evaluation metrics have been developed to measure the summarization quality in a fast and reproducible way. However, most of the metrics still rely on humans and need gold standard summaries generated by linguistic experts. Since BLANC does not require golden summaries and supposedly can use any underlying language model, we consider its application to the evaluation of summarization in German. This work demonstrates how to adjust the BLANC metric to a language other than English. We compare BLANC scores with the crowd and expert ratings, as well as with commonly used automatic metrics on a German summarization data set. Our results show that BLANC in German is especially good in evaluating informativeness.

Computation and Language

A Study of Human Summaries of Scientific Articles

121 - Odellia Boni , Guy Feigenblat , Doron Cohen 2020

Researchers and students face an explosion of newly published papers which may be relevant to their work. This led to a trend of sharing human summaries of scientific papers. We analyze the summaries shared in one of these platforms Shortscience.org. The goal is to characterize human summaries of scientific papers, and use some of the insights obtained to improve and adapt existing automatic summarization systems to the domain of scientific papers.

Computation and Language

Generating (Factual?) Narrative Summaries of RCTs: Experiments with Neural Multi-Document Summarization

55 - Byron C. Wallace , Sayantan Saha , Frank Soboczenski 2020

We consider the problem of automatically generating a narrative biomedical evidence summary from multiple trial reports. We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews previously conducted by members of the Cochrane collaboration, using the authors conclusions section of the review abstract as our target. We enlist medical professionals to evaluate generated summaries, and we find that modern summarization systems yield consistently fluent and relevant synopses, but that they are not always factual. We propose new approaches that capitalize on domain-specific models to inform summarization, e.g., by explicitly demarcating snippets of inputs that convey key findings, and emphasizing the reports of large and high-quality trials. We find that these strategies modestly improve the factual accuracy of generated summaries. Finally, we propose a new method for automatically evaluating the factuality of generated narrative evidence syntheses using models that infer the directionality of reported findings.

Computation and Language