تصف هذه الورقة عملية التوضيحية لبيانات لغة مسيئة محددة لرومانية على وسائل التواصل الاجتماعي.لتسهيل البحوث القابلة للمقارنة متعددة اللغات حول اللغة الهجومية، تتبع المبادئ التوجيهية التوضيحي بعض جهود التوضيح الحديثة لغات أخرى.يحتوي Corpus النهائي على 5000 وظيفة مدونات دقيقة مشروح من عدد كبير من المحن المعلقين المتطوعين.إن اتفاقية المعلن والتمييز التلقائي الأولي الناتج نواجهها تتماشى مع جهود التوضيحية السابقة.
This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.
References used
https://aclanthology.org/
Online misogyny has become an increasing worry for Arab women who experience gender-based online abuse on a daily basis. Misogyny automatic detection systems can assist in the prohibition of anti-women Arabic toxic content. Developing such systems is
Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a datas
In Romanian language there are some resources for automatic text comprehension, but for Emotion Detection, not lexicon-based, there are none. To cover this gap, we extracted data from Twitter and created the first dataset containing tweets annotated
Reviewing contracts is a time-consuming procedure that incurs large expenses to companies and social inequality to those who cannot afford it. In this work, we propose document-level natural language inference (NLI) for contracts'', a novel, real-wor
In this paper, we introduce a new English Twitter-based dataset for cyberbullying detection and online abuse. Comprising 62,587 tweets, this dataset was sourced from Twitter using specific query terms designed to retrieve tweets with high probabiliti