
Understanding Model Robustness to User-generated Noisy Texts


Added by: Association for Computational Linguistics (article)

Publication date: 2021

Field: Artificial Intelligence

Language: English

Created by Shamra Editor

Keywords: user-generated noisy texts, noisy texts, user-generated noisy


Abstract

The sensitivity of deep neural models to input noise is a known and challenging problem. In NLP, model performance often deteriorates on naturally occurring noise, such as spelling errors. To mitigate this issue, models may be trained on artificially noised data. However, the amount and type of generated noise have so far been chosen arbitrarily. We therefore propose to model the errors statistically from grammatical-error-correction corpora. We present a thorough evaluation of the robustness of several state-of-the-art NLP systems in multiple languages, on tasks including morpho-syntactic analysis, named entity recognition, neural machine translation, a subset of the GLUE benchmark, and reading comprehension. We also compare two approaches to addressing the performance drop: (a) training the NLP models on noised data generated by our framework, and (b) reducing the input noise with an external system for natural language correction. The code is released at https://github.com/ufal/kazitext.
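The idea of modelling errors statistically from grammatical-error-correction (GEC) data can be illustrated with a minimal Python sketch. It assumes the GEC corpus is available as (correct, erroneous) sentence pairs, aligns each pair at the character level with the standard-library `difflib.SequenceMatcher`, turns the counted substitutions, insertions and deletions into per-character rates, and then samples those edits when noising clean text. The helper names `estimate_char_error_profile` and `add_noise` are hypothetical; this is not the code in the ufal/kazitext repository, and a realistic noise model would likely distinguish more error types (for example casing or punctuation errors) than these three character edits.

```python
import random
from collections import Counter
from difflib import SequenceMatcher


def estimate_char_error_profile(pairs):
    """Estimate per-character substitution/insertion/deletion rates.

    `pairs` is an iterable of (correct, erroneous) strings, e.g. taken from a
    GEC corpus (hypothetical input format). Rates are normalised by the number
    of characters in the correct text.
    """
    ops = Counter()
    total_chars = 0
    for clean, noisy in pairs:
        total_chars += len(clean)
        matcher = SequenceMatcher(a=clean, b=noisy, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            n_clean, n_noisy = i2 - i1, j2 - j1
            if tag == "replace":
                # Pair characters up as substitutions; any surplus is a deletion or insertion.
                ops["substitute"] += min(n_clean, n_noisy)
                if n_clean > n_noisy:
                    ops["delete"] += n_clean - n_noisy
                elif n_noisy > n_clean:
                    ops["insert"] += n_noisy - n_clean
            elif tag == "delete":
                ops["delete"] += n_clean
            elif tag == "insert":
                ops["insert"] += n_noisy
    if total_chars == 0:
        return {}
    return {op: count / total_chars for op, count in ops.items()}


def add_noise(text, profile, alphabet, rng):
    """Corrupt `text` by sampling edits at the rates stored in `profile`."""
    p_del = profile.get("delete", 0.0)
    p_sub = profile.get("substitute", 0.0)
    p_ins = profile.get("insert", 0.0)
    out = []
    for ch in text:
        r = rng.random()
        if r < p_del:
            continue  # drop this character (and skip the insertion step)
        if r < p_del + p_sub:
            out.append(rng.choice(alphabet))  # substitute a random character (may equal the original)
        else:
            out.append(ch)  # keep the original character
        if rng.random() < p_ins:
            out.append(rng.choice(alphabet))  # insert a random character after it
    return "".join(out)


if __name__ == "__main__":
    gec_pairs = [("the cat sat", "teh cat sat"), ("hello world", "helo world")]
    profile = estimate_char_error_profile(gec_pairs)
    print(profile)
    print(add_noise("robustness to noisy input", profile, "abcdefghijklmnopqrstuvwxyz", random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the generated noise reproducible, which helps when comparing models trained on differently noised copies of the same data.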

References used

https://aclanthology.org/

Related research

The Korean Morphologically Tight-Fitting Tokenizer for Noisy User-Generated Texts

Association for Computational Linguistics, 2021 (article)

User-generated texts include various types of stylistic properties, or noise. Such texts are not processed properly by existing morpheme analyzers or language models built from formal texts such as encyclopedias or news articles. In this paper, we propose a simple morphologically tight-fitting tokenizer (K-MT) that can better process proper nouns, coinages, and internet slang, among other types of noise in Korean user-generated texts. We tested our tokenizer on classification tasks using Korean user-generated movie reviews and hate speech datasets, and on the Korean Named Entity Recognition dataset. Our tests show that K-MT is better suited to processing internet slang, proper nouns, and coinages than a morpheme analyzer and a character-level WordPiece tokenizer.

Keywords: noisy user-generated texts, noisy user-generated, morphologically tight-fitting tokenizer

Evaluating Deception Detection Model Robustness To Linguistic Variation

Association for Computational Linguistics, 2021 (article)

With the increasing use of machine-learning-driven algorithmic judgements, it is critical to develop models that are robust to evolving or manipulated inputs. We propose an extensive analysis of model robustness against linguistic variation in the setting of deceptive news detection, an important task in the context of misinformation spreading online. We consider two prediction tasks and compare three state-of-the-art embeddings to highlight consistent trends in model performance, high-confidence misclassifications, and high-impact failures. By measuring the effectiveness of adversarial defense strategies and evaluating model susceptibility to adversarial attacks using character- and word-perturbed text, we find that character or mixed ensemble models are the most effective defenses and that character-perturbation-based attack tactics are more successful.

Keywords: deception detection model, evaluating deception detection, deception detection
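Character- and word-level perturbations of the kind described in the abstract above can be sketched in a few lines of Python. Both functions are illustrative assumptions, not the attacks used in that paper: the hypothetical `perturb_chars` swaps two adjacent interior letters in a fraction of words (keeping each word's first and last letters fixed), and the hypothetical `perturb_words` drops words at random.

```python
import random


def perturb_chars(text, rate, rng):
    """Swap two adjacent interior characters in roughly `rate` of the words.

    Words shorter than 4 characters are left unchanged, so the first and
    last characters of every word always stay in place.
    """
    words = text.split(" ")
    for idx, word in enumerate(words):
        if len(word) >= 4 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # guarantees 1 <= i and i + 1 <= len(word) - 2
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            words[idx] = "".join(chars)
    return " ".join(words)


def perturb_words(text, rate, rng):
    """Drop each whitespace-separated word with probability `rate`.

    If every word would be dropped, the first word is kept so the output
    is never empty for non-empty input.
    """
    words = text.split()
    kept = [w for w in words if rng.random() >= rate]
    return " ".join(kept if kept else words[:1])


if __name__ == "__main__":
    rng = random.Random(42)
    claim = "officials confirmed the report was accurate"
    print(perturb_chars(claim, 0.5, rng))
    print(perturb_words(claim, 0.2, rng))
```

Comparing a classifier's accuracy on original and perturbed copies of the same test set gives a simple robustness gap; seeding the generator keeps that comparison reproducible.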

Understanding and Interpreting the Impact of User Context in Hate Speech Detection

Association for Computational Linguistics, 2021 (article)

As hate speech spreads on social media and online communities, researchers continue to work on its automatic detection. Recently, detection performance has been improving thanks to advances in deep learning and the integration of user features. This work investigates the effects that such features can have on a detection model. Unlike previous research, we show that a simple performance comparison does not expose the full impact of including contextual and user information. By leveraging explainability techniques, we show (1) that user features play a role in the model's decision and (2) how they affect the feature space learned by the model. Besides revealing that user features are the reason for the performance gains, and illustrating why, we show how such techniques can be combined to better understand the model and to detect unintended bias.

Keywords: racist language, understanding and interpreting

Understanding and predicting user dissatisfaction in a neural generative chatbot

Association for Computational Linguistics, 2021 (article)

Neural generative dialogue agents have shown an increasing ability to hold short chitchat conversations when evaluated by crowdworkers in controlled settings. However, their performance in real-life deployment, talking to intrinsically motivated users in noisy environments, is less well explored. In this paper, we perform a detailed case study of a neural generative model deployed as part of Chirpy Cardinal, an Alexa Prize socialbot. We find that unclear user utterances are a major source of generative errors such as ignoring, hallucination, unclearness, and repetition. However, even in unambiguous contexts the model frequently makes reasoning errors. Though users express dissatisfaction in correlation with these errors, certain dissatisfaction types (such as offensiveness and privacy objections) depend on additional factors, such as the user's personal attitudes and prior unaddressed dissatisfaction in the conversation. Finally, we show that dissatisfied user utterances can be used as a semi-supervised learning signal to improve the dialogue system. We train a model to predict next-turn dissatisfaction and show through human evaluation that, as a ranking function, it selects higher-quality neural-generated utterances.

Keywords: neural generative chatbot, neural generative, understanding and predicting, chatbot

Google-trickers, Yaminjeongeum, and Leetspeak: An Empirical Taxonomy for Intentionally Noisy User-Generated Text

Association for Computational Linguistics, 2021 (article)

WARNING: This article contains content that may offend readers. Strategies that insert intentional noise into text when posting it are commonly observed online, and sometimes they aim to let only certain community users understand the genuine semantics. In this paper, we explore the purpose of such actions by categorizing them into tricks, memes, fillers, and codes, and we organize the linguistic strategies used for each purpose. Through this, we identify that authors can use such strategies for multiple purposes, depending on the presence of stakeholders such as 'Peers' and 'Others'. Finally, we analyze how these strategies appear differently in each circumstance and present the unified taxonomy with accompanying examples.

Keywords: natural language processing, text classification, intentionally noisy user-generated text, noisy user-generated text, intentionally noisy
