Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

99 0 0.0 ( 0 )

Download Cite

Added by Nan Ding

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Xi Chen - Nan Ding - Tomer Levinboim

Computation and Language Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements. At the same time, it has been argued that contextualized word representations exhibit sub-optimal statistical properties for encoding the true similarity between words or sentences. In this paper, we present two techniques for improving encoding representations for similarity metrics: a batch-mean centering strategy that improves statistical properties; and a computationally efficient tempered Word Mover Distance, for better fusion of the information in the contextualized word representations. We conduct numerical experiments that demonstrate the robustness of our techniques, reporting results over various BERT-backbone learned metrics and achieving state of the art correlation with human ratings on several benchmarks.

rate research

SDA: Improving Text Generation with Self Data Augmentation

105 - Ping Yu , Ruiyi Zhang , Yang Zhao 2021

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision. However, less work has been done in the context of text, partially due to its discrete nature and the complexity of natural languages. In this paper, we propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation. Unlike most existing sentence-level augmentation strategies, which are only applied to specific models, our method is more general and could be easily adapted to any MLE-based training procedure. In addition, our framework allows task-specific evaluation metrics to be designed to flexibly control the generated sentences, for example, in terms of controlling vocabulary usage and avoiding nontrivial repetitions. Extensive experimental results demonstrate the superiority of our method on two synthetic and several standard real datasets, significantly improving related baselines.

Computation and Language Machine Learning

Evaluation of Text Generation: A Survey

144 - Asli Celikyilmaz , Elizabeth Clark , Jianfeng Gao 2020

The paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges still being faced, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples for task-specific NLG evaluations for automatic text summarization and long text generation, and conclude the paper by proposing future research directions.

Computation and Language Machine Learning

Perception Score, A Learned Metric for Open-ended Text Generation Evaluation

209 - Jing Gu , Qingyang Wu , Zhou Yu 2020

Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show a low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score. The method measures the overall quality of the generation and scores holistically instead of only focusing on one evaluation criteria, such as word overlapping. Moreover, it also shows the amount of uncertainty about its evaluation result. By connecting the uncertainty, Perception Score gives a more accurate evaluation for the generation system. Perception Score provides state-of-the-art results on two conditional generation tasks and two unconditional generation tasks.

Computation and Language Machine Learning

CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation

183 - Fei Huang , Dazhen Wan , Zhihong Shao 2020

In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored but lead to unfair evaluation and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fast development and fair evaluation of text generation. In model development, CoTK helps handle the cumbersome issues, such as data processing, metric implementation, and reproduction. It standardizes the development steps and reduces human errors which may lead to inconsistent experimental settings. In model evaluation, CoTK provides implementation for many commonly used metrics and benchmark models across different experimental settings. As a unique feature, CoTK can signify when and which metric cannot be fairly compared. We demonstrate that it is convenient to use CoTK for model development and evaluation, particularly across different experimental settings.

Computation and Language Machine Learning

MOVER: Mask, Over-generate and Rank for Hyperbole Generation

77 - Yunxiang Zhang , Xiaojun Wan 2021

Despite being a common figure of speech, hyperbole is under-researched with only a few studies addressing its identification task. In this paper, we introduce a new task of hyperbole generation to transfer a literal sentence into its hyperbolic paraphrase. To tackle the lack of available hyperbolic sentences, we construct HYPO-XL, the first large-scale hyperbole corpus containing 17,862 hyperbolic sentences in a non-trivial way. Based on our corpus, we propose an unsupervised method for hyperbole generation with no need for parallel literal-hyperbole pairs. During training, we fine-tune BART to infill masked hyperbolic spans of sentences from HYPO-XL. During inference, we mask part of an input literal sentence and over-generate multiple possible hyperbol

Computation and Language

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions