ﻻ يوجد ملخص باللغة العربية
We study estimating inherent human disagreement (annotation label distribution) in natural language inference task. Post-hoc smoothing of the predicted label distribution to match the expected label entropy is very effective. Such simple manipulation can reduce KL divergence by almost half, yet will not improve majority label prediction accuracy or learn label distributions. To this end, we introduce a small amount of examples with multiple references into training. We depart from the standard practice of collecting a single reference per each training example, and find that collecting multiple references can achieve better accuracy under the fixed annotation budget. Lastly, we provide rich analyses comparing these two methods for improving label distribution estimation.
We present a principled approach to incorporating labels in VAEs that captures the rich characteristic information associated with those labels. While prior work has typically conflated these by learning latent variables that directly correspond to l
Despite the recent success of deep neural networks in natural language processing, the extent to which they can demonstrate human-like generalization capacities for natural language understanding remains unclear. We explore this issue in the domain o
To build robust question answering systems, we need the ability to verify whether answers to questions are truly correct, not just good enough in the context of imperfect QA datasets. We explore the use of natural language inference (NLI) as a way to
To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the readers pick up the structure of the discourse and com
A semantic equivalence assessment is defined as a task that assesses semantic equivalence in a sentence pair by binary judgment (i.e., paraphrase identification) or grading (i.e., semantic textual similarity measurement). It constitutes a set of task