tWT–WT: A Dataset to Assert the Role of Target Entities for Detecting Stance of Tweets

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

The stance detection task aims at detecting the stance of a tweet or a text toward a target. These targets can be named entities or free-form sentences (claims). Though the task involves reasoning about the tweet with respect to a target, we find that it is possible to achieve high accuracy on several publicly available Twitter stance detection datasets without looking at the target sentence. Specifically, a simple tweet classification model achieved human-level performance on the WT–WT dataset and more than two-thirds accuracy on various other datasets. We investigate the existence of biases in such datasets to find the potential spurious correlations of sentiment-stance relations and lexical choice associated with the stance category. Furthermore, we propose a new large dataset free of such biases and demonstrate its aptness on the existing stance detection systems. Our empirical findings show much scope for research on the stance detection task and suggest several considerations for creating future stance detection datasets.
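The idea of a classifier that never looks at the target can be illustrated with a minimal sketch. This is not the authors' model: it is a simple word-voting stand-in, and the toy examples, the label names, and the function names `train_target_blind` and `predict_target_blind` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy examples (target, tweet, stance); not taken from any real dataset.
TRAIN = [
    ("merger A-B", "regulators will block this deal", "refute"),
    ("merger C-D", "the deal is confirmed and approved", "support"),
    ("merger A-B", "approved at last, deal confirmed", "support"),
    ("merger C-D", "court moves to block the merger", "refute"),
]


def train_target_blind(examples):
    """Count how often each word co-occurs with each stance, ignoring the target."""
    word_label_counts = defaultdict(Counter)
    for _target, tweet, stance in examples:
        for word in set(tweet.lower().split()):
            word_label_counts[word][stance] += 1
    return word_label_counts


def predict_target_blind(word_label_counts, tweet, default="comment"):
    """Let the words of the tweet vote on a stance; the target is never consulted."""
    votes = Counter()
    for word in set(tweet.lower().split()):
        votes.update(word_label_counts.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else default


model = train_target_blind(TRAIN)
print(predict_target_blind(model, "they will block it"))
print(predict_target_blind(model, "deal approved"))
```

Because the prediction function takes no target argument, the same tweet receives the same label for every target. If such a model scores well on a dataset, the labels are likely predictable from lexical cues alone, which is the kind of bias the abstract describes.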

References used

https://aclanthology.org/

RED: A Novel Dataset for Romanian Emotion Detection from Tweets

Association for Computational Linguistics, 2021 (article)

For the Romanian language there are some resources for automatic text comprehension, but none for Emotion Detection that are not lexicon-based. To cover this gap, we extracted data from Twitter and created the first dataset containing tweets annotated with five types of emotions: joy, fear, sadness, anger and neutral, intended for use in opinion mining and analysis tasks. In this article we present some features of our novel dataset, and create a benchmark to achieve the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches like fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification and our experiments show that the BERT-based model has the best performance for the task of Emotion Detection from Romanian tweets. Keywords: Emotion Detection, Twitter, Romanian, Supervised Machine Learning

DreamDrug - A crowdsourced NER dataset for detecting drugs in darknet markets

Association for Computational Linguistics, 2021 (article)

We present DreamDrug, a crowdsourced dataset for detecting mentions of drugs in noisy user-generated item listings from darknet markets. Our dataset contains nearly 15,000 manually annotated drug entities in over 3,500 item listings scraped from the darknet market platform "DreamMarket" in 2017. We also train and evaluate baseline models for detecting these entities, using contextual language models fine-tuned in a few-shot setting and on the full dataset, and examine the effect of pretraining on in-domain unannotated corpora.

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

Association for Computational Linguistics, 2021 (article)

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its important difference from previous datasets is annotation of nested named entities, as well as relations within nested entities and at the discourse level. NEREL can facilitate development of novel models that can extract relations between nested named entities, as well as relations on both sentence and document levels. NEREL also contains the annotation of events involving named entities and their roles in the events. The NEREL collection is available via https://github.com/nerel-ds/NEREL.
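To make "nested named entities" concrete, here is a small sketch of one way to represent entity spans and detect nesting. It does not reflect NEREL's actual file format or label set: the `Entity` class, the label names, and the English example text are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


def is_nested_in(inner, outer):
    """True if inner's span lies inside outer's span and the two spans differ."""
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return not same_span and outer.start <= inner.start and inner.end <= outer.end


def nested_pairs(entities):
    """Return all (inner, outer) pairs where inner is nested in outer."""
    return [(i, o) for i in entities for o in entities if is_nested_in(i, o)]


# Hypothetical example: an organization name that contains a city name.
TEXT = "Moscow State University"
ENTITIES = [Entity(0, 23, "ORGANIZATION"), Entity(0, 6, "CITY")]

for inner, outer in nested_pairs(ENTITIES):
    print(TEXT[inner.start:inner.end], "is nested in", TEXT[outer.start:outer.end])
```

A flat annotation scheme would have to choose one of the two spans. Keeping both is what allows a relation to be annotated between an outer entity and an entity nested inside it.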

Target-Aware Data Augmentation for Stance Detection

Association for Computational Linguistics, 2021 (article)

The goal of stance detection is to identify whether the author of a text is in favor of, neutral or against a specific target. Despite substantial progress on this task, one of the remaining challenges is the scarcity of annotations. Data augmentation is commonly used to address annotation scarcity by generating more training samples. However, the augmented sentences that are generated by existing methods are either less diversified or inconsistent with the given target and stance label. In this paper, we formulate the data augmentation of stance detection as a conditional masked language modeling task and augment the dataset by predicting the masked word conditioned on both its context and the auxiliary sentence that contains target and label information. Moreover, we propose another simple yet effective method that generates target-aware sentences by replacing a target mention with another. Experimental results show that our proposed methods significantly outperform previous augmentation methods on 11 targets.
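The two augmentation ideas in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the auxiliary-sentence template, the `[MASK]` and `[SEP]` token strings, and the function names `swap_target` and `build_masked_input` are hypothetical, and the sketch only builds the model input; predicting the masked word would require an actual masked language model.

```python
import re


def swap_target(sentence, old_target, new_target):
    """Replace whole-word mentions of old_target with new_target (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(old_target) + r"\b", flags=re.IGNORECASE)
    # A function replacement avoids backslash interpretation in new_target.
    return pattern.sub(lambda _match: new_target, sentence)


def build_masked_input(sentence, mask_index, target, label,
                       mask_token="[MASK]", sep_token="[SEP]"):
    """Prepend an auxiliary sentence with target and label, and mask one word."""
    words = sentence.split()
    words[mask_index] = mask_token
    auxiliary = f"The author is {label} {target}."  # hypothetical template
    return f"{auxiliary} {sep_token} {' '.join(words)}"


print(swap_target("I think solar power is the future", "solar power", "wind power"))
print(build_masked_input("Solar power is great", 3, "solar power", "in favor of"))
```

Conditioning on the auxiliary sentence is what is meant to keep the predicted word consistent with the target and stance label. How the stance label should be handled after a target swap is not specified in the abstract and is left out of this sketch.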

Jibes & Delights: A Dataset of Targeted Insults and Compliments to Tackle Online Abuse

Association for Computational Linguistics, 2021 (article)

Online abuse and offensive language on social media have become widespread problems in today's digital age. In this paper, we contribute a Reddit-based dataset, consisting of 68,159 insults and 51,102 compliments targeted at individuals instead of targeting a particular community or race. Secondly, we benchmark multiple existing state-of-the-art models for both classification and unsupervised style transfer on the dataset. Finally, we analyse the experimental results and conclude that the transfer task is challenging, requiring the models to understand the high degree of creativity exhibited in the data.
