This paper describes the Helsinki-Ljubljana contribution to the VarDial 2021 shared task on social media variety geolocation. Following our successful participation in VarDial 2020, we again propose constrained and unconstrained systems based on the BERT architecture. In this paper, we report experiments with different tokenization settings and different pre-trained models, and we contrast our parameter-free regression approach with various classification schemes proposed by other participants at VarDial 2020. Both the code and the best-performing pre-trained models are made freely available.
References used
https://aclanthology.org/