Stance Detection in Web and Social Media: A Comparative Study

126 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Saptarshi Ghosh Dr.

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shalmoli Ghosh - Prajwal Singhania - Siddharth Singh

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literature. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work, we explore the reproducibility of several existing stance detection models, including both neural models and classical classifier-based models. Through experiments on two datasets -- (i)~the popular SemEval microblog dataset, and (ii)~a set of health-related online news articles -- we also perform a detailed comparative analysis of various methods and explore their shortcomings. Implementations of all algorithms discussed in this paper are available at https://github.com/prajwal1210/Stance-Detection-in-Web-and-Social-Media.

قيم البحث

396 - Chunyuan Yuan , Wanhui Qian , Qianwen Ma 2021

The rapid development of social media changes the lifestyle of people and simultaneously provides an ideal place for publishing and disseminating rumors, which severely exacerbates social panic and triggers a crisis of social trust. Early content-bas ed methods focused on finding clues from the text and user profiles for rumor detection. Recent studies combine the stances of users comments with news content to capture the difference between true and false rumors. Although the users stance is effective for rumor detection, the manual labeling process is time-consuming and labor-intensive, which limits the application of utilizing it to facilitate rumor detection. In this paper, we first finetune a pre-trained BERT model on a small labeled dataset and leverage this model to annotate weak stance labels for users comment data to overcome the problem mentioned above. Then, we propose a novel Stance-aware Reinforcement Learning Framework (SRLF) to select high-quality labeled stance data for model training and rumor detection. Both the stance selection and rumor detection tasks are optimized simultaneously to promote both tasks mutually. We conduct experiments on two commonly used real-world datasets. The experimental results demonstrate that our framework outperforms the state-of-the-art models significantly, which confirms the effectiveness of the proposed framework.

الحساب واللغة الذكاء الاصطناعي

A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality

101 - Shubhanshu Mishra , Daniel Collier 2020

In this paper we introduce a framework for annotating a social media text corpora for various categories. Since, social media data is generated via individuals, it is important to annotate the text for the individuals demographic attributes to enable a socio-technical analysis of the corpora. Furthermore, when analyzing a large data-set we can often annotate a small sample of data and then train a prediction model using this sample to annotate the full data for the relevant categories. We use a case study of a Facebook comment corpora on student loan discussion which was annotated for gender, military affiliation, age-group, political leaning, race, stance, topicalilty, neoliberlistic views and civility of the comment. We release three datasets of Facebook comments for further research at: https://github.com/socialmediaie/StudentDebtFbComments

الحساب واللغة أجهزة الكمبيوتر والمجتمع

Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis

110 - Jo~ao A. Leite , Diego F. Silva , Kalina Bontcheva 2020

Hate speech and toxic comments are a common concern of social media platform users. Although these comments are, fortunately, the minority in these platforms, they are still capable of causing harm. Therefore, identifying these comments is an importa nt task for studying and preventing the proliferation of toxicity in social media. Previous work in automatically detecting toxic comments focus mainly in English, with very few work in languages like Brazilian Portuguese. In this paper, we propose a new large-scale dataset for Brazilian Portuguese with tweets annotated as either toxic or non-toxic or in different types of toxicity. We present our dataset collection and annotation process, where we aimed to select candidates covering multiple demographic groups. State-of-the-art BERT models were able to achieve 76% macro-F1 score using monolingual data in the binary case. We also show that large-scale monolingual data is still needed to create more accurate models, despite recent advances in multilingual approaches. An error analysis and experiments with multi-label classification show the difficulty of classifying certain types of toxic comments that appear less frequently in our data and highlights the need to develop models that are aware of different categories of toxicity.

الحساب واللغة التعلم الآلي الشبكات الاجتماعية والمعلومات

360{deg} Stance Detection

77 - Sebastian Ruder , John Glover , Afshin Mehrabani 2018

The proliferation of fake news and filter bubbles makes it increasingly difficult to form an unbiased, balanced opinion towards a topic. To ameliorate this, we propose 360{deg} Stance Detection, a tool that aggregates news with multiple perspectives on a topic. It presents them on a spectrum ranging from support to opposition, enabling the user to base their opinion on multiple pieces of diverse evidence.

الحساب واللغة استرجاع المعلومات الشبكات الاجتماعية والمعلومات

Echo Chambers on Social Media: A comparative analysis

103 - Matteo Cinelli , Gianmarco De Francisci Morales , Alessandro Galeazzi 2020

Recent studies have shown that online users tend to select information adhering to their system of beliefs, ignore information that does not, and join groups - i.e., echo chambers - around a shared narrative. Although a quantitative methodology for t heir identification is still missing, the phenomenon of echo chambers is widely debated both at scientific and political level. To shed light on this issue, we introduce an operational definition of echo chambers and perform a massive comparative analysis on more than 1B pieces of contents produced by 1M users on four social media platforms: Facebook, Twitter, Reddit, and Gab. We infer the leaning of users about controversial topics - ranging from vaccines to abortion - and reconstruct their interaction networks by analyzing different features, such as shared links domain, followed pages, follower relationship and commented posts. Our method quantifies the existence of echo-chambers along two main dimensions: homophily in the interaction networks and bias in the information diffusion toward likely-minded peers. We find peculiar differences across social media. Indeed, while Facebook and Twitter present clear-cut echo chambers in all the observed dataset, Reddit and Gab do not. Finally, we test the role of the social media platform on news consumption by comparing Reddit and Facebook. Again, we find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.

الفيزياء والمجتمع أجهزة الكمبيوتر والمجتمع الشبكات الاجتماعية والمعلومات