Do We Need Neural Models to Explain Human Judgments of Acceptability?

Published by Matthew A. Kelly

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authored by Wang Jing

Computation and Language; Artificial Intelligence; Machine Learning

Abstract

Native speakers can judge whether a sentence is an acceptable instance of their language. Acceptability provides a means of evaluating whether computational language models are processing language in a human-like manner. We test the ability of computational language models, simple language features, and word embeddings to predict native English speakers' judgments of acceptability on English-language essays written by non-native speakers. We find that much of the sentence acceptability variance can be captured by a combination of features including misspellings, word order, and word similarity (Pearson's r = 0.494). While predictive neural models fit acceptability judgments well (r = 0.527), we find that a 4-gram model with statistical smoothing is just as good (r = 0.528). Thanks to incorporating a count of misspellings, our 4-gram model surpasses both the previous unsupervised state of the art (Lau et al., 2015; r = 0.472) and the average non-expert native speaker (r = 0.46). Our results demonstrate that acceptability is well captured by n-gram statistics and simple language features.
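As a rough illustration of the kind of pipeline the abstract describes, the sketch below scores sentences with a word-level 4-gram model and correlates the scores with human ratings using Pearson's r. It is not the authors' implementation. The abstract does not say which smoothing method was used, so add-k smoothing is an assumption here, and the class name AddKNgramModel, the helper names, and the parameter k are all hypothetical. The misspelling count mentioned in the abstract is not modeled.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list, padded with start/end markers."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


class AddKNgramModel:
    """Word-level n-gram model with add-k smoothing (an assumed choice)."""

    def __init__(self, n=4, k=0.1):
        self.n = n
        self.k = k
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"</s>"}

    def fit(self, sentences):
        """Count n-grams and their (n-1)-word contexts in tokenized sentences."""
        for tokens in sentences:
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        return self

    def mean_logprob(self, tokens):
        """Average log-probability per predicted token; higher means more typical."""
        grams = ngrams(tokens, self.n)
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for gram in grams:
            numerator = self.ngram_counts[gram] + self.k
            denominator = self.context_counts[gram[:-1]] + self.k * vocab_size
            total += math.log(numerator / denominator)
        return total / len(grams)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy usage: train on two sentences, then compare a seen word order
# with a scrambled one.
model = AddKNgramModel(n=4, k=0.1).fit([
    "the cat sat".split(),
    "the dog sat".split(),
])
seen_score = model.mean_logprob("the cat sat".split())
scrambled_score = model.mean_logprob("sat cat the".split())
```

In an evaluation of the kind the abstract reports, one would compute mean_logprob for every test sentence and pass those scores, together with the human acceptability ratings, to pearson_r.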

Read also

Do We Need Online NLU Tools?

79 - Petr Lorenc , Petr Marek , Jan Pichl 2020

Intent recognition is an essential algorithm of any conversational AI application. It is responsible for the classification of an input message into meaningful classes. In many bot development platforms, we can configure the NLU pipeline. Several intent recognition services are currently available as an API, or we can choose from many open-source alternatives. However, there is no comparison of intent recognition services and open-source algorithms. Many factors make the selection of the right approach to intent recognition challenging in practice. In this paper, we suggest criteria to choose the best intent recognition algorithm for an application. We present a dataset for evaluation. Finally, we compare selected public NLU services with selected open-source algorithms for intent recognition.

Computation and Language

Do we need soft cosmology?

221 - Emmanuel N. Saridakis 2021

We examine the possibility of soft cosmology, namely small deviations from the usual framework due to the effective appearance of soft-matter properties in the Universe sectors. One effect of such a case would be the dark energy to exhibit a different equation-of-state parameter at large scales (which determine the universe expansion) and at intermediate scales (which determine the sub-horizon clustering and the large scale structure formation). Concerning soft dark matter, we show that it can effectively arise due to the dark-energy clustering, even if dark energy is not soft. We propose a novel parametrization introducing the softness parameters of the dark sectors. As we see, although the background evolution remains unaffected, due to the extreme sensitivity and significant effects on the global properties even a slightly non-trivial softness parameter can improve the clustering behavior and alleviate e.g. the $f\sigma_8$ tension. Lastly, an extension of the cosmological perturbation theory and a detailed statistical mechanical analysis, in order to incorporate complexity and estimate the scale-dependent behavior from first principles, is necessary and would provide a robust argumentation in favour of soft cosmology.

Cosmology and Nongalactic Astrophysics; General Relativity and Quantum Cosmology; High Energy Physics - Theory

History for Visual Dialog: Do we really need it?

232 - Shubham Agarwal , Trung Bui , Joon-Young Lee 2020

Visual Dialog involves understanding the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response. In this paper, we show that co-attention models which explicitly encode dialog history outperform models that don't, achieving state-of-the-art performance (72% NDCG on val set). However, we also expose shortcomings of the crowd-sourcing dataset collection procedure by showing that history is indeed only required for a small amount of the data and that the current evaluation metric encourages generic replies. To that end, we propose a challenging subset (VisDialConv) of the VisDial val set and provide a benchmark of 63% NDCG.

Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language

Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

96 - Zae Myung Kim , Laurent Besacier , Vassilina Nikoulina 2021

Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most of such work has been conducted in a black-box manner, this paper aims to analyze individual components of a multilingual neural translation (NMT) model. In particular, we look at the encoder self-attention and encoder-decoder attention heads (in a many-to-one NMT model) that are more specific to the translation of a certain language pair than others by (1) employing metrics that quantify some aspects of the attention weights such as variance or confidence, and (2) systematically ranking the importance of attention heads with respect to translation quality. Experimental results show that surprisingly, the set of most important attention heads are very similar across the language pairs and that it is possible to remove nearly one-third of the less important heads without hurting the translation quality greatly.

Computation and Language; Artificial Intelligence; Machine Learning

Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts?

96 - Youssef Skandarani , Pierre-Marc Jodoin , Alain Lalande 2021

Deep learning methods are the de-facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application which, like many others, requires a large number of annotated data so a trained network can generalize well. Unfortunately, the process of having a large number of manually curated images by medical experts is both slow and utterly expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated datasets that machine learning can successfully train on. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert groundtruth for cardiac cine-MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. Results reveal that generalization performances of a segmentation neural network trained on non-expert groundtruth data is, to all practical purposes, as good as on expert groundtruth data, in particular when the non-expert gets a decent level of training, highlighting an opportunity for the efficient and cheap creation of annotations for cardiac datasets.

Computer Vision and Pattern Recognition; Artificial Intelligence