Kawarith: an Arabic Twitter Corpus for Crisis Events

Added by Association for Computational Linguistics (article)

Publication date 2021

Field: Artificial Intelligence

Research language: English

Created by Shamra Editor

Abstract

Social media (SM) platforms such as Twitter provide large quantities of real-time data that can be leveraged during mass emergencies. Developing tools to support crisis-affected communities requires available datasets, which often do not exist for low-resource languages. This paper introduces Kawarith, a multi-dialect Arabic Twitter corpus for crisis events, comprising more than a million Arabic tweets collected during 22 crises that occurred between 2018 and 2020 and involved several types of hazard. Exploration of this content revealed the most discussed topics and information types, and the paper presents a labelled dataset from seven emergency events that serves as a gold standard for several tasks in crisis informatics research. Using annotated data from the same event, a BERT model is fine-tuned to classify tweets into different categories in the multi-label setting. Results show that BERT-based models yield good performance on this task even with small amounts of task-specific training data.
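To illustrate the multi-label setting mentioned in the abstract, the following minimal sketch shows how per-label scores can be turned into a set of categories: each label gets its own sigmoid and threshold, so a tweet may receive several labels or none. The label names, the example logits, and the 0.5 threshold are hypothetical and not taken from the paper, and no BERT model is involved here; the logits stand in for whatever a fine-tuned classifier would output.

```python
import math

# Hypothetical label set for illustration only; the paper's categories may differ.
LABELS = ["affected_individuals", "infrastructure_damage", "caution_and_advice"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold.

    Unlike single-label (softmax) decoding, the labels do not compete:
    zero, one, or several labels can be returned for the same tweet.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Assumed classifier outputs for a single tweet (one logit per label).
example_logits = [2.0, -1.5, 0.3]
predicted = decode_multilabel(example_logits)
print(predicted)
```

Lowering or raising `threshold` trades recall against precision per label; 0.5 is only a common default, not a value reported for this corpus.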

References used

https://aclanthology.org/

Arabic Offensive Language on Twitter: Analysis and Experiments

Association for Computational Linguistics, 2021 (article)

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using SOTA techniques.

Hate Towards the Political Opponent: A Twitter Corpus Study of the 2020 US Elections on the Basis of Offensive Speech and Stance Detection

Association for Computational Linguistics, 2021 (article)

The 2020 US Elections have been, more than ever before, characterized by social media campaigns and mutual accusations. In this paper, we investigate whether this also manifests in the online communication of supporters of the candidates Biden and Trump, in the form of hateful and offensive communication. We formulate an annotation task in which we join the tasks of hateful/offensive speech detection and stance detection, and annotate 3000 tweets from the campaign period for whether they express a particular stance towards a candidate. Next to the established classes of favorable and against, we add mixed and neutral stances and also annotate whether a candidate is mentioned without an opinion expression. Further, we annotate whether the tweet is written in an offensive style. This enables us to analyze whether supporters of Joe Biden and the Democratic Party communicate differently than supporters of Donald Trump and the Republican Party. A BERT baseline classifier shows that detecting whether somebody is a supporter of a candidate can be performed with high quality (.89 F1 for Trump and .91 F1 for Biden), while detecting that somebody expresses opposition to a candidate is more challenging (.79 F1 and .64 F1, respectively). The automatic detection of hate/offensive speech remains challenging (with .53 F1). Our corpus is publicly available and constitutes a novel resource for computational modelling of offensive language under consideration of stances.

EnKhCorp1.0: An English-Khasi Corpus

Association for Computational Linguistics, 2021 (article)

In machine translation, corpus preparation is one of the crucial tasks, particularly for low-resource pairs. In multilingual countries like India, machine translation plays a vital role in communication among people with various linguistic backgrounds. Online automatic translation systems from Google and Microsoft cover many languages but lack support for the Khasi language, which can hence be considered low-resource. This paper gives an overview of the development of EnKhCorp1.0, a corpus for the English-Khasi pair, and of baseline systems implemented for English-to-Khasi and Khasi-to-English translation based on the neural machine translation approach.

A Novel Framework for Detecting Important Subevents from Crisis Events via Dynamic Semantic Graphs

Association for Computational Linguistics, 2021 (article)

Social media is an essential tool to share information about crisis events, such as natural disasters. Event Detection aims at extracting information in the form of an event, but considers each event in isolation, without combining information across sentences or events. Many posts in Crisis NLP contain repetitive or complementary information which needs to be aggregated (e.g., the number of trapped people and their location) for disaster response. Although previous approaches in Crisis NLP aggregate information across posts, they only use shallow representations of the content (e.g., keywords), which cannot adequately represent the semantics of a crisis event and its sub-events. In this work, we propose a novel framework to extract critical sub-events from a large-scale crisis event by combining important information across relevant tweets. Our framework first converts all the tweets from a crisis event into a temporally-ordered set of graphs. Then it extracts sub-graphs that represent semantic relationships connecting verbs and nouns in 3 to 6 node sub-graphs. It does this by learning edge weights via Dynamic Graph Convolutional Networks (DGCNs) and extracting smaller, relevant sub-graphs. Our experiments show that our extracted structures (1) are semantically meaningful sub-events and (2) contain information important for the large crisis-event. Furthermore, we show that our approach significantly outperforms event detection baselines, highlighting the importance of aggregating information across tweets for our task.

IceSum: An Icelandic Text Summarization Corpus

Association for Computational Linguistics, 2021 (article)

Automatic Text Summarization (ATS) is the task of generating concise and fluent summaries from one or more documents. In this paper, we present IceSum, the first Icelandic corpus annotated with human-generated summaries. IceSum consists of 1,000 online news articles and their extractive summaries. We train and evaluate several neural network-based models on this dataset, comparing them against a selection of baseline methods. We find that an encoder-decoder model with a sequence-to-sequence based extractor obtains the best results, outperforming all baseline methods. Furthermore, we evaluate how the size of the training corpus affects the quality of the generated summaries. We release the corpus and the models with an open license.
