In this paper we study pejorative language, an under-explored topic in computational linguistics. Unlike offensive language and hate speech, which existing models already address, pejorative language manifests itself primarily at the lexical level: it describes a word that is used with a negative connotation, which distinguishes it from offensive language and other more studied categories. Pejorativity is also context-dependent: the same word can be used with or without pejorative connotations, so pejorativity detection is essentially a problem similar to word sense disambiguation. We leverage online dictionaries to build a multilingual lexicon of pejorative terms for English, Spanish, Italian, and Romanian. We additionally release a dataset of tweets annotated for pejorative use. Based on these resources, we analyze the usage and occurrence of pejorative words in social media and make an attempt to automatically disambiguate pejorative usage in our dataset.
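To make the framing concrete, the following is a minimal sketch of the two steps the abstract names: looking up candidate terms in a lexicon, then deciding from context whether a given use is pejorative. It is not the method used in the paper. The lexicon entries, the labeled examples, and the function names (`find_candidates`, `is_pejorative`) are all hypothetical, and the context-overlap scoring is a simple Lesk-style stand-in chosen only to illustrate the analogy with word sense disambiguation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: candidate term -> language code.
LEXICON = {"snake": "en", "pig": "en"}

# Hypothetical labeled examples: (term, text, is_pejorative).
EXAMPLES = [
    ("snake", "that guy is a lying snake, never trust him", True),
    ("snake", "we saw a snake in the grass at the zoo", False),
    ("pig", "he eats like a pig and is so rude", True),
    ("pig", "the farm has a pig and two cows", False),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_candidates(text, lexicon=LEXICON):
    """Return the tokens of `text` that appear in the lexicon."""
    return [tok for tok in tokenize(text) if tok in lexicon]


def context_profile(term, label, examples=EXAMPLES):
    """Count context words seen with `term` under the given label."""
    counts = Counter()
    for ex_term, ex_text, ex_label in examples:
        if ex_term == term and ex_label == label:
            counts.update(t for t in tokenize(ex_text) if t != term)
    return counts


def is_pejorative(term, text, examples=EXAMPLES):
    """Compare context overlap with pejorative vs. neutral examples."""
    context = [t for t in tokenize(text) if t != term]
    pejorative = context_profile(term, True, examples)
    neutral = context_profile(term, False, examples)
    pejorative_score = sum(pejorative[t] for t in context)
    neutral_score = sum(neutral[t] for t in context)
    # Ties default to "not pejorative" in this sketch.
    return pejorative_score > neutral_score
```

The same surface word receives different labels depending on its context, which is the sense in which the task resembles word sense disambiguation. A realistic system would replace the overlap count with a trained classifier over the annotated tweets.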