This paper surveys evaluation methods for natural language generation (NLG) systems developed in recent years. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress made and the challenges that remain, focusing on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples of task-specific NLG evaluation, automatic text summarization and long text generation, and conclude by proposing future research directions.
Automatic evaluation for open-ended natural language generation tasks remains a challenge. Existing metrics such as BLEU show low correlation with human judgment. We propose a novel and powerful learning-based evaluation metric: Perception Score.
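To make the BLEU-versus-human comparison concrete, the following is a minimal sketch of how segment-level correlation between an automatic metric and human judgments is typically measured. The hypotheses, references, and human ratings are made up for illustration; the Perception Score model itself is not reproduced here.

```python
# Correlating an untrained metric (sentence-level BLEU) with human ratings.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from scipy.stats import pearsonr, spearmanr

# Toy data: each hypothesis has one reference and one human adequacy rating (1-5).
references = [["the cat sat on the mat".split()],
              ["a storm hit the coast overnight".split()],
              ["she finished the report before noon".split()]]
hypotheses = ["the cat is on the mat".split(),
              "a storm struck the coastline at night".split(),
              "the report was done before lunch".split()]
human_scores = [4.5, 4.0, 3.0]

smooth = SmoothingFunction().method1
bleu_scores = [sentence_bleu(ref, hyp, smoothing_function=smooth)
               for ref, hyp in zip(references, hypotheses)]

r, _ = pearsonr(bleu_scores, human_scores)
rho, _ = spearmanr(bleu_scores, human_scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Learned metrics such as Perception Score are evaluated against the same kind of human-judgment correlations, only with a trained scoring model in place of BLEU.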
The goal of text generation is to enable machines to express themselves in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models, pioneered by Seq2Seq, have been proposed.
Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those produced by BERT encoders, are helpful for designing metrics that correlate well with human judgements.
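A minimal sketch of how contextualized BERT embeddings can be turned into a reference-based metric via greedy cosine matching, in the spirit of BERTScore. The model name, the use of the last hidden layer, and the absence of IDF weighting and baseline rescaling are simplifications for illustration, not the published metric.

```python
# BERTScore-style similarity: greedy cosine matching over contextualized tokens.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def token_embeddings(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]      # (seq_len, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)   # unit-length rows

def contextual_f1(candidate, reference):
    c, r = token_embeddings(candidate), token_embeddings(reference)
    sim = c @ r.T                              # pairwise cosine similarities
    precision = sim.max(dim=1).values.mean()   # best reference match per candidate token
    recall = sim.max(dim=0).values.mean()      # best candidate match per reference token
    return (2 * precision * recall / (precision + recall)).item()

print(contextual_f1("the weather is cold today", "it is freezing outside"))
```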
In text generation evaluation, many practical issues, such as inconsistent experimental settings and metric implementations, are often ignored, leading to unfair comparisons and untenable conclusions. We present CoTK, an open-source toolkit aiming to support fair evaluation of text generation by standardizing experimental settings and metric implementations.
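The kind of implementation inconsistency the toolkit targets can be shown in a few lines. In this hypothetical example, the same hypothesis/reference pair receives different BLEU scores purely because of the pre-tokenization choice; the sentences and tokenization rules are made up for illustration.

```python
# Same sentence pair, different pre-tokenization -> different BLEU scores.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The model's output wasn't fluent."
hypothesis = "The model's output was not fluent."
smooth = SmoothingFunction().method1

tokenizations = {
    "whitespace": lambda s: s.split(),
    "punct-split": lambda s: s.replace("'", " '").replace(".", " .").split(),
}
for name, tokenize in tokenizations.items():
    score = sentence_bleu([tokenize(reference)], tokenize(hypothesis),
                          smoothing_function=smooth)
    print(f"{name:12s} BLEU = {score:.4f}")
```

Unless every paper fixes such details the same way, reported scores are not directly comparable, which is the motivation for a shared evaluation toolkit.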
Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs).
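As a concrete illustration of the pretrained-language-model paradigm, here is a minimal generation sketch with GPT-2 via Hugging Face Transformers. The model name, prompt, and decoding hyperparameters are illustrative choices only; a task-specific fine-tuned model would be loaded the same way.

```python
# Open-ended generation with a pretrained causal language model.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Automatic evaluation of generated text is difficult because"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling keeps the output open-ended rather than greedy.
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```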