We propose a new reference-free summary quality evaluation measure, with an emphasis on faithfulness. The measure is designed to find and count all possible minute inconsistencies of the summary with respect to the source document. The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates with expert scores on the summary-level SummEval dataset more strongly than other common evaluation measures, not only in Consistency but also in Fluency. We also introduce a method of generating subtle factual errors in human summaries, and we show that ESTIME is more sensitive to such subtle errors than other common evaluation measures.
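As a rough illustration of the mismatched-embeddings idea, the sketch below embeds summary and document tokens with a contextual language model and counts summary tokens whose most similar document embedding belongs to a different token. The choice of BERT, the layer used, and the function names are assumptions made for illustration; this is a minimal sketch, not the released ESTIME implementation.

```python
# Minimal sketch of the mismatched-embeddings idea: a summary token whose
# nearest document embedding corresponds to a different token counts as a
# potential inconsistency. Special tokens are not filtered here, for brevity.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def token_embeddings(text, layer=-1):
    """Return token ids and contextual embeddings for one text."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = model(**enc)
    ids = enc["input_ids"][0]
    emb = out.hidden_states[layer][0]  # (seq_len, hidden)
    return ids, emb

def count_mismatches(summary, document, layer=-1):
    """Count summary tokens whose closest document embedding is a different token."""
    s_ids, s_emb = token_embeddings(summary, layer)
    d_ids, d_emb = token_embeddings(document, layer)
    s_emb = torch.nn.functional.normalize(s_emb, dim=-1)
    d_emb = torch.nn.functional.normalize(d_emb, dim=-1)
    sims = s_emb @ d_emb.T              # cosine similarities, summary x document
    nearest = sims.argmax(dim=-1)       # best-matching document position per summary token
    return (s_ids != d_ids[nearest]).sum().item()
```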
The goal of a summary is to concisely state the most important information in a document. With this principle in mind, we introduce new reference-free summary evaluation metrics that use a pretrained language model to estimate the information shared between a document and its summary. These metrics are a modern take on the Shannon Game, a method of summary quality scoring proposed decades ago, in which we replace the human annotators with language models. We also view these metrics as an extension of BLANC, a recently proposed approach to summary quality measurement based on the performance of a language model with and without the help of a summary. Using GPT-2, we empirically verify that the introduced metrics correlate with human judgments of coverage, overall quality, and five summary dimensions.
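The sketch below illustrates one way such an information estimate can be set up with GPT-2: compare how well the model predicts the document with and without the summary supplied as context. The scoring formula and the helper names are assumptions for illustration, not the exact metrics defined in the paper.

```python
# Minimal sketch of a Shannon-Game-style estimate with GPT-2: a good summary
# should make the document easier to predict when prepended as context.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def doc_nll(document, context=""):
    """Average negative log-likelihood of the document tokens,
    optionally conditioned on a context prefix (e.g. the summary)."""
    doc_ids = tokenizer(document, return_tensors="pt")["input_ids"]
    if context:
        ctx_ids = tokenizer(context + "\n", return_tensors="pt")["input_ids"]
        input_ids = torch.cat([ctx_ids, doc_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : ctx_ids.shape[1]] = -100   # score only the document tokens
    else:
        input_ids = doc_ids
        labels = doc_ids.clone()
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss
    return loss.item()

def information_gain(document, summary):
    """How much the summary helps the model predict the document."""
    return doc_nll(document) - doc_nll(document, context=summary)
```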
Normally, summary quality measures are compared with quality scores produced by human annotators. A higher correlation with human scores is considered to be a fair indicator of a better measure. We discuss observations that cast doubt on this view. We attempt to show the possibility of an alternative indicator. Given a family of measures, we explore a criterion for selecting the best measure that does not rely on correlations with human scores. Our observations for the BLANC family of measures suggest that the criterion is universal across very different styles of summaries.
The prevalence of ambiguous acronyms makes scientific documents harder to understand for humans and machines alike, presenting a need for models that can automatically identify acronyms in text and disambiguate their meaning. We introduce new methods for acronym identification and disambiguation: our acronym identification model projects learned token embeddings onto tag predictions, and our acronym disambiguation model finds training examples whose sentence embeddings are similar to those of test examples. Both of our systems achieve significant performance gains over previously suggested methods, and perform competitively on the SDU@AAAI-21 shared task leaderboard. Our models were trained in part on new distantly supervised datasets for these tasks, which we call AuxAI and AuxAD. We also identified a duplication conflict issue in the SciAD dataset, and formed a deduplicated version of SciAD that we call SciAD-dedupe. We publicly released all three of these datasets, and hope that they help the community make further strides in scientific document understanding.
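The disambiguation step can be pictured as a nearest-neighbour search in sentence-embedding space, as in the sketch below. The encoder (sentence-transformers) and the toy data are stand-ins for illustration; they are not the system described above.

```python
# Minimal sketch of disambiguation by sentence-embedding similarity: label a
# test sentence with the expansion of its most similar training sentence.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def disambiguate(test_sentence, train_sentences, train_expansions):
    """Return the expansion of the training example closest in embedding space."""
    embs = encoder.encode(train_sentences + [test_sentence], normalize_embeddings=True)
    train_embs, test_emb = embs[:-1], embs[-1]
    best = int(np.argmax(train_embs @ test_emb))   # cosine similarity (embeddings normalized)
    return train_expansions[best]

# Toy example: the acronym "CNN" resolved by context.
train_sents = ["The CNN model uses convolution layers.",
               "CNN reported the election results."]
train_exps = ["convolutional neural network", "Cable News Network"]
print(disambiguate("We train a CNN on image patches.", train_sents, train_exps))
```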
We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the document's text. We present evidence that BLANC scores correlate with human evaluations as well as the ROUGE family of summary quality measurements does. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.
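A rough sketch of the BLANC-help idea follows: mask words in a document sentence and compare how often a masked language model recovers them when the summary, rather than a neutral filler of similar length, is prepended. The masking scheme, the filler, and the function names are simplifications assumed for illustration, not the released BLANC implementation.

```python
# Minimal sketch of BLANC-help: masked-token accuracy with the summary prepended,
# minus the accuracy with a same-length filler, approximates the summary's "help".
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_accuracy(prefix, sentence, mask_every=4):
    """Fraction of masked sentence tokens recovered when prefix is prepended."""
    enc = tokenizer(prefix, sentence, return_tensors="pt", truncation=True)
    ids = enc["input_ids"].clone()
    sep = torch.nonzero(ids[0] == tokenizer.sep_token_id)[0].item()  # end of prefix
    positions = list(range(sep + 1, ids.shape[1] - 1, mask_every))
    originals = ids[0, positions].clone()
    ids[0, positions] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(ids, attention_mask=enc["attention_mask"]).logits
    preds = logits[0, positions].argmax(dim=-1)
    return (preds == originals).float().mean().item()

def blanc_help(summary, sentence):
    """Accuracy boost from seeing the summary instead of a neutral filler."""
    filler = " ".join(["."] * len(summary.split()))
    return masked_accuracy(summary, sentence) - masked_accuracy(filler, sentence)
```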
We propose a novel method for generating titles for unstructured text documents. We reframe the problem as a sequential question-answering task. A deep neural network is trained on document-title pairs with decomposable titles, meaning that the vocabulary of the title is a subset of the vocabulary of the document. To train the model we use a corpus of millions of publicly available document-title pairs: news articles and headlines. We present the results of a randomized double-blind trial in which subjects were unaware of which titles were human- or machine-generated. When trained on approximately 1.5 million news articles, the model generates headlines that humans judge to be as good as or better than the original human-written headlines in the majority of cases.
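The notion of a decomposable title admits a simple filter over document-title training pairs, sketched below under the assumption of plain lowercase word tokenization.

```python
# Minimal sketch of the "decomposable title" filter: keep a document-title pair
# only if every title word also appears in the document.
import re

def is_decomposable(title, document):
    """True if the title's vocabulary is a subset of the document's vocabulary."""
    words = lambda text: set(re.findall(r"\w+", text.lower()))
    return words(title) <= words(document)

# This pair would be kept, since every headline word occurs in the article text.
print(is_decomposable("Storm hits coast", "A powerful storm hits the coast today."))
```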