مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

Trawling for Trolling: A Dataset

116 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Hitkul Jangid

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Hitkul - Karmanya Aggarwal - Pakhi Bamdev

أجهزة الكمبيوتر والمجتمع الشبكات الاجتماعية والمعلومات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The ability to accurately detect and filter offensive content automatically is important to ensure a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent in social media, but is underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset was created by collecting samples from well-known datasets and reannotating them along precise definitions of different categories of offensive content. The dataset has 12,490 samples, split across 5 classes; Normal, Profanity, Trolling, Derogatory and Hate Speech. It encompasses content from Twitter, Reddit and Wikipedia Talk Pages. Models trained on our dataset show appreciable performance without any significant hyperparameter tuning and can potentially learn meaningful linguistic information effectively. We find that these models are sensitive to data ablation which suggests that the dataset is largely devoid of spurious statistical artefacts that could otherwise distract and confuse classification models.

قيم البحث

85 - Julio C. S. Reis , Philipe de Freitas Melo , Kiran Garimella 2020

Recently, messaging applications, such as WhatsApp, have been reportedly abused by misinformation campaigns, especially in Brazil and India. A notable form of abuse in WhatsApp relies on several manipulated images and memes containing all kinds of fa ke stories. In this work, we performed an extensive data collection from a large set of WhatsApp publicly accessible groups and fact-checking agency websites. This paper opens a novel dataset to the research community containing fact-checked fake images shared through WhatsApp for two distinct scenarios known for the spread of fake news on the platform: the 2018 Brazilian elections and the 2019 Indian elections.

أجهزة الكمبيوتر والمجتمع الشبكات الاجتماعية والمعلومات

Birdspotter: A Tool for Analyzing and Labeling Twitter Users

82 - Rohit Ram , Quyu Kong , Marian-Andrei Rizoiu 2020

The impact of online social media on societal events and institutions is profound; and with the rapid increases in user uptake, we are just starting to understand its ramifications. Social scientists and practitioners who model online discourse as a proxy for real-world behavior, often curate large social media datasets. A lack of available tooling aimed at non-data science experts frequently leaves this data (and the insights it holds) underutilized. Here, we propose birdspotter -- a tool to analyze and label Twitter users --, and birdspotter.ml -- an exploratory visualizer for the computed metrics. birdspotter provides an end-to-end analysis pipeline, from the processing of pre-collected Twitter data, to general-purpose labeling of users, and estimating their social influence, within a few lines of code. The package features tutorials and detailed documentation. We also illustrate how to train birdspotter into a fully-fledged bot detector that achieves better than state-of-the-art performances without making any Twitter API online calls, and we showcase its usage in an exploratory analysis of a topical COVID-19 dataset.

أجهزة الكمبيوتر والمجتمع الشبكات الاجتماعية والمعلومات

MUTLA: A Large-Scale Dataset for Multimodal Teaching and Learning Analytics

135 - Fangli Xu , Lingfei Wu , KP Thai 2019

Automatic analysis of teacher and student interactions could be very important to improve the quality of teaching and student engagement. However, despite some recent progress in utilizing multimodal data for teaching and learning analytics, a thorou gh analysis of a rich multimodal dataset coming for a complex real learning environment has yet to be done. To bridge this gap, we present a large-scale MUlti-modal Teaching and Learning Analytics (MUTLA) dataset. This dataset includes time-synchronized multimodal data records of students (learning logs, videos, EEG brainwaves) as they work in various subjects from Squirrel AI Learning System (SAIL) to solve problems of varying difficulty levels. The dataset resources include user records from the learner records store of SAIL, brainwave data collected by EEG headset devices, and video data captured by web cameras while students worked in the SAIL products. Our hope is that by analyzing real-world student learning activities, facial expressions, and brainwave patterns, researchers can better predict engagement, which can then be used to improve adaptive learning selection and student learning outcomes. An additional goal is to provide a dataset gathered from real-world educational activities versus those from controlled lab environments to benefit the educational learning community.

أجهزة الكمبيوتر والمجتمع التعلم الالي

Mobilkit: A Python Toolkit for Urban Resilience and Disaster Risk Management Analytics using High Frequency Human Mobility Data

121 - Enrico Ubaldi , Takahiro Yabe , Nicholas K. W. Jones 2021

Increasingly available high-frequency location datasets derived from smartphones provide unprecedented insight into trajectories of human mobility. These datasets can play a significant and growing role in informing preparedness and response to natur al disasters. However, limited tools exist to enable rapid analytics using mobility data, and tend not to be tailored specifically for disaster risk management. We present an open-source, Python-based toolkit designed to conduct replicable and scalable post-disaster analytics using GPS location data. Privacy, system capabilities, and potential expansions of textit{Mobilkit} are discussed.

أجهزة الكمبيوتر والمجتمع الشبكات الاجتماعية والمعلومات الفيزياء والمجتمع

An Ethical Highlighter for People-Centric Dataset Creation

200 - Margot Hanley , Apoorv Khandelwal , Hadar Averbuch-Elor 2020

Important ethical concerns arising from computer vision datasets of people have been receiving significant attention, and a number of datasets have been withdrawn as a result. To meet the academic need for people-centric datasets, we propose an analy tical framework to guide ethical evaluation of existing datasets and to serve future dataset creators in avoiding missteps. Our work is informed by a review and analysis of prior works and highlights where such ethical challenges arise.

أجهزة الكمبيوتر والمجتمع الرؤية الحاسوبية وتمييز الأنماط قواعد البيانات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الأميركية في بيروت

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Trawling for Trolling: A Dataset

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً