ﻻ يوجد ملخص باللغة العربية
In open-domain dialogues, predictive uncertainties are mainly evaluated in a domain shift setting to cope with out-of-distribution inputs. However, in real-world conversations, there could be more extensive distributional shifted inputs than the out-of-distribution. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), enabling gradual distributional shifts by corruption on the dialogue dataset. We then investigate the effect of distributional shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades with intensifying the shift. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shifts.
Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive {em uncertainty}. Quantifying uncertaint
Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers. Automatic metrics such as BLEU correlate weakly with human annotations, resulting in a significant bias across different models and datasets. Some
Despite improvements in performances on different natural language generation tasks, deep neural models are prone to hallucinating facts that are incorrect or nonexistent. Different hypotheses are proposed and examined separately for different tasks,
Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to
Knowledge-grounded dialogue agents are systems designed to conduct a conversation based on externally provided background information, such as a Wikipedia page. Such dialogue agents, especially those based on neural network language models, often pro