The ability to accurately detect and filter offensive content automatically is important for ensuring a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent in social media but underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset was created by collecting samples from well-known datasets and reannotating them according to precise definitions of different categories of offensive content. The dataset has 12,490 samples, split across five classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech. It encompasses content from Twitter, Reddit, and Wikipedia Talk Pages. Models trained on our dataset show appreciable performance without any significant hyperparameter tuning, suggesting they can learn meaningful linguistic information effectively. We find that these models are sensitive to data ablation, which suggests that the dataset is largely devoid of spurious statistical artefacts that could otherwise mislead classification models.