New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Arabic Offensive Language on Twitter: Analysis and Experiments

اللغة الهجومية العربية على تويتر: التحليل والتجارب

317 0 0 0.0 ( 0 )

Download Cite

Added by Association for Computation Linguistics مقالة

Publication date 2021

fields Artificial Intelligence

and research's language is English

Created by Shamra Editor

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers useoffensive language. Lastly, we conduct many experiments to produce strong results (F1 =83.2) on the dataset using SOTA techniques.

References used

https://aclanthology.org/

rate research

Leveraging Offensive Language for Sarcasm and Sentiment Detection in Arabic

372 - Association for Computation Linguistics 2021 مقالة

Sarcasm detection is one of the top challenging tasks in text classification, particularly for informal Arabic with high syntactic and semantic ambiguity. We propose two systems that harness knowledge from multiple tasks to improve the performance of the classifier. This paper presents the systems used in our participation to the two sub-tasks of the Sixth Arabic Natural Language Processing Workshop (WANLP); Sarcasm Detection and Sentiment Analysis. Our methodology is driven by the hypothesis that tweets with negative sentiment and tweets with sarcasm content are more likely to have offensive content, thus, fine-tuning the classification model using large corpus of offensive language, supports the learning process of the model to effectively detect sentiment and sarcasm contents. Results demonstrate the effectiveness of our approach for sarcasm detection task over sentiment analysis task.

leveraging offensive language sixth arabic natural الاستفادة من اللغة الهجومية السادسة العربية الطبيعية صناعة حمض الفوسفور

ROFF - A Romanian Twitter Dataset for Offensive Language

272 - Association for Computation Linguistics 2021 مقالة

This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.

romanian twitter dataset romanian twitter مجموعة بيانات Twitter الرومانية الرومانية تويتر صناعة حمض الفوسفور

Hate Towards the Political Opponent: A Twitter Corpus Study of the 2020 US Elections on the Basis of Offensive Speech and Stance Detection

339 - Association for Computation Linguistics 2021 مقالة

The 2020 US Elections have been, more than ever before, characterized by social media campaigns and mutual accusations. We investigate in this paper if this manifests also in online communication of the supporters of the candidates Biden and Trump, b y uttering hateful and offensive communication. We formulate an annotation task, in which we join the tasks of hateful/offensive speech detection and stance detection, and annotate 3000 Tweets from the campaign period, if they express a particular stance towards a candidate. Next to the established classes of favorable and against, we add mixed and neutral stances and also annotate if a candidate is mentioned with- out an opinion expression. Further, we an- notate if the tweet is written in an offensive style. This enables us to analyze if supporters of Joe Biden and the Democratic Party communicate differently than supporters of Donald Trump and the Republican Party. A BERT baseline classifier shows that the detection if somebody is a supporter of a candidate can be performed with high quality (.89 F1 for Trump and .91 F1 for Biden), while the detection that somebody expresses to be against a candidate is more challenging (.79 F1 and .64 F1, respectively). The automatic detection of hate/offensive speech remains challenging (with .53 F1). Our corpus is publicly available and constitutes a novel resource for computational modelling of offensive language under consideration of stances.

twitter corpus study political opponent twitter corpus Twitter Corpus الدراسة الخصم السياسي تويتر كوربوس صناعة حمض الفوسفور المزيد..

DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT

420 - Association for Computation Linguistics 2021 مقالة

The objective of this work was the introduction of an effective approach based on the AraBERT language model for fighting Tweets COVID-19 Infodemic. It was arranged in the form of a two-step pipeline, where the first step involved a series of pre-pro cessing procedures to transform Twitter jargon, including emojis and emoticons, into plain text, and the second step exploited a version of AraBERT, which was pre-trained on plain text, to fine-tune and classify the tweets with respect to their Label. The use of language models pre-trained on plain texts rather than on tweets was motivated by the necessity to address two critical issues shown by the scientific literature, namely (1) pre-trained language models are widely available in many languages, avoiding the time-consuming and resource-intensive model training directly on tweets from scratch, allowing to focus only on their fine-tuning; (2) available plain text corpora are larger than tweet-only ones, allowing for better performance.

fighting the arabic arabert language model plain text قتال العربي نموذج لغة أرابيرت نص عادي صناعة حمض الفوسفور المزيد..

OffendES: A New Corpus in Spanish for Offensive Language Research

363 - Association for Computation Linguistics 2021 مقالة

Offensive language detection and analysis has become a major area of research in Natural Language Processing. The freedom of participation in social media has exposed online users to posts designed to denigrate, insult or hurt them according to gende r, race, religion, ideology, or other personal characteristics. Focusing on young influencers from the well-known social platforms of Twitter, Instagram, and YouTube, we have collected a corpus composed of 47,128 Spanish comments manually labeled on offensive pre-defined categories. A subset of the corpus attaches a degree of confidence to each label, so both multi-class classification and multi-output regression studies are possible. In this paper, we introduce the corpus, discuss its building process, novelties, and some preliminary experiments with it to serve as a baseline for the research community.

تكامل المعجم offensive language research أبحاث اللغة الهجومية صناعة حمض الفوسفور

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Arabic Offensive Language on Twitter: Analysis and Experiments

اللغة الهجومية العربية على تويتر: التحليل والتجارب

Ask ChatGPT about the research

Read More

suggested questions