Online abuse and offensive language on social media have become widespread problems in today's digital age. In this paper, we first contribute a Reddit-based dataset consisting of 68,159 insults and 51,102 compliments targeted at individuals rather than at a particular community or race. Second, we benchmark multiple existing state-of-the-art models for both classification and unsupervised style transfer on the dataset. Finally, we analyse the experimental results and conclude that the transfer task is challenging, requiring models to understand the high degree of creativity exhibited in the data.
References used
https://aclanthology.org/