Detecting stance on Twitter is especially challenging because of the short length of each tweet, the continuous coinage of new terminology and hashtags, and the deviation of sentence structure from standard prose. Fine-tuned language models using large-scale in-domain data have been shown to be the new state-of-the-art for many NLP tasks, including stance detection. In this paper, we propose a novel BERT-based fine-tuning method that enhances the masked language model for stance detection. Instead of random token masking, we propose using a weighted log-odds-ratio to identify words with high stance distinguishability and then model an attention mechanism that focuses on these words. We show that our proposed approach outperforms the state of the art for stance detection on Twitter data about the 2020 US Presidential election.
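The core scoring step described above — using a weighted log-odds-ratio to find words that best distinguish stances — can be sketched as follows. This is a minimal illustration of the standard weighted log-odds ratio with an informative Dirichlet prior (Monroe et al., 2008), not the paper's exact implementation; the corpus names and the choice of the pooled corpus as the prior are assumptions.

```python
import math
from collections import Counter

def weighted_log_odds(favor_tokens, against_tokens, alpha0=100.0):
    """Rank words by how strongly they distinguish a 'favor' corpus from
    an 'against' corpus, via the weighted log-odds ratio with an
    informative Dirichlet prior. Returns {word: z-score}; a large |z|
    marks a highly stance-distinctive word."""
    yi, yj = Counter(favor_tokens), Counter(against_tokens)
    ni, nj = sum(yi.values()), sum(yj.values())
    background = yi + yj                     # pooled corpus used as the prior (an assumption)
    nb = sum(background.values())
    z = {}
    for w in background:
        aw = alpha0 * background[w] / nb     # word-specific prior pseudo-count
        delta = (math.log((yi[w] + aw) / (ni + alpha0 - yi[w] - aw))
                 - math.log((yj[w] + aw) / (nj + alpha0 - yj[w] - aw)))
        var = 1.0 / (yi[w] + aw) + 1.0 / (yj[w] + aw)  # approximate variance of delta
        z[w] = delta / math.sqrt(var)
    return z

# Toy usage with hypothetical tweets: 'great' leans favor, 'fraud' leans against.
favor = "great win great victory hope".split()
against = "fraud rigged fraud angry".split()
scores = weighted_log_odds(favor, against)
```

In the approach the abstract describes, the top-scoring words under such a ranking would then be the preferred masking targets during fine-tuning, rather than tokens chosen uniformly at random.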