What is on your mind? Automated Scoring of Mindreading in Childhood and Early Adolescence

Published by Venelin Kovatchev

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Venelin Kovatchev, Phillip Smith, Mark Lee

Computation and Language; Machine Learning

In this paper we present the first work on the automated scoring of mindreading ability in middle childhood and early adolescence. We create MIND-CA, a new corpus of 11,311 question-answer pairs in English from 1,066 children aged 7 to 14. We perform machine learning experiments and carry out extensive quantitative and qualitative evaluation. We obtain promising results, demonstrating the applicability of state-of-the-art NLP solutions to a new domain and task.

Venelin Kovatchev, Phillip Smith, Mark Lee, 2021

In this paper we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or mindreading). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine the capabilities of automatic systems to generalize to unseen data, we create UK-MIND-20, a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving macro-F1-score by 6 points. Results indicate that both the number of training examples and the quality of the augmentation strategies affect the performance of the systems. The task-specific augmentations generally outperform task-agnostic augmentations. Automatic augmentations based on vectors (GloVe, FastText) perform the worst. We find that systems trained on MIND-CA generalize well to UK-MIND-20. We demonstrate that data augmentation strategies also improve the performance on unseen data.

Computation and Language; Machine Learning

Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, 2021

Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection. To understand this discrepancy, we profile 8 active learning methods on a per-example basis, and identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn (e.g., questions that ask about text in images or require external knowledge). Through systematic ablation experiments and qualitative visualizations, we verify that collective outliers are a general phenomenon responsible for degrading pool-based active learning. Notably, we show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases. We conclude with a discussion and prescriptive recommendations for mitigating the effects of these outliers in future work.

Computation and Language; Artificial Intelligence; Computer Vision and Pattern Recognition

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Xuanli He, Lingjuan Lyu, Qiongkai Xu, 2021

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pre-trained language models such as BERT. This allows corporations to easily build powerful APIs by encapsulating fine-tuned BERT models for downstream tasks. However, when a fine-tuned BERT model is deployed as a service, it may suffer from different attacks launched by malicious users. In this work, we first present how an adversary can steal a BERT-based API service (the victim/target model) on multiple benchmark datasets with limited prior knowledge and queries. We further show that the extracted model can lead to highly transferable adversarial attacks against the victim model. Our studies indicate that the potential vulnerabilities of BERT-based API services still hold, even when there is an architectural mismatch between the victim model and the attack model. Finally, we investigate two defence strategies to protect the victim model and find that unless the performance of the victim model is sacrificed, both model extraction and adversarial transferability can effectively compromise the target models.

Computation and Language

Introducing Graph Cumulants: What is the Variance of Your Social Network?

Lee M. Gunderson, Gecia Bravo-Hermsdorff, 2020

In an increasingly interconnected world, understanding and summarizing the structure of these networks becomes increasingly relevant. However, this task is nontrivial; proposed summary statistics are as diverse as the networks they describe, and a standardized hierarchy has not yet been established. In contrast, vector-valued random variables admit such a description in terms of their cumulants (e.g., mean, (co)variance, skew, kurtosis). Here, we introduce the natural analogue of cumulants for networks, building a hierarchical description based on correlations between an increasing number of connections, seamlessly incorporating additional information, such as directed edges, node attributes, and edge weights. These graph cumulants provide a principled and unifying framework for quantifying the propensity of a network to display any substructure of interest (such as cliques to measure clustering). Moreover, they give rise to a natural hierarchical family of maximum entropy models for networks (i.e., ERGMs) that do not suffer from the degeneracy problem, a common practical pitfall of other ERGMs.

Statistics Theory; Discrete Mathematics; Social and Information Networks

Is human scoring the best criteria for summary evaluation?

Oleg Vasilyev, John Bohannon, 2020

Normally, summary quality measures are compared with quality scores produced by human annotators. A higher correlation with human scores is considered to be a fair indicator of a better measure. We discuss observations that cast doubt on this view. We attempt to show a possibility of an alternative indicator. Given a family of measures, we explore a criterion of selecting the best measure not relying on correlations with human scores. Our observations for the BLANC family of measures suggest that the criterion is universal across very different styles of summaries.

Computation and Language