ترغب بنشر مسار تعليمي؟ اضغط هنا

A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality

102   0   0.0 ( 0 )
 نشر من قبل Shubhanshu Mishra
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper we introduce a framework for annotating a social media text corpora for various categories. Since, social media data is generated via individuals, it is important to annotate the text for the individuals demographic attributes to enable a socio-technical analysis of the corpora. Furthermore, when analyzing a large data-set we can often annotate a small sample of data and then train a prediction model using this sample to annotate the full data for the relevant categories. We use a case study of a Facebook comment corpora on student loan discussion which was annotated for gender, military affiliation, age-group, political leaning, race, stance, topicalilty, neoliberlistic views and civility of the comment. We release three datasets of Facebook comments for further research at: https://github.com/socialmediaie/StudentDebtFbComments



قيم البحث

اقرأ أيضاً

The rapid development of social media changes the lifestyle of people and simultaneously provides an ideal place for publishing and disseminating rumors, which severely exacerbates social panic and triggers a crisis of social trust. Early content-bas ed methods focused on finding clues from the text and user profiles for rumor detection. Recent studies combine the stances of users comments with news content to capture the difference between true and false rumors. Although the users stance is effective for rumor detection, the manual labeling process is time-consuming and labor-intensive, which limits the application of utilizing it to facilitate rumor detection. In this paper, we first finetune a pre-trained BERT model on a small labeled dataset and leverage this model to annotate weak stance labels for users comment data to overcome the problem mentioned above. Then, we propose a novel Stance-aware Reinforcement Learning Framework (SRLF) to select high-quality labeled stance data for model training and rumor detection. Both the stance selection and rumor detection tasks are optimized simultaneously to promote both tasks mutually. We conduct experiments on two commonly used real-world datasets. The experimental results demonstrate that our framework outperforms the state-of-the-art models significantly, which confirms the effectiveness of the proposed framework.
Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances. Several methodologies for automatic stance detection from text have been proposed in literatur e. To our knowledge, there has not been any systematic investigation towards their reproducibility, and their comparative performances. In this work, we explore the reproducibility of several existing stance detection models, including both neural models and classical classifier-based models. Through experiments on two datasets -- (i)~the popular SemEval microblog dataset, and (ii)~a set of health-related online news articles -- we also perform a detailed comparative analysis of various methods and explore their shortcomings. Implementations of all algorithms discussed in this paper are available at https://github.com/prajwal1210/Stance-Detection-in-Web-and-Social-Media.
Bill writing is a critical element of representative democracy. However, it is often overlooked that most legislative bills are derived, or even directly copied, from other bills. Despite the significance of bill-to-bill linkages for understanding th e legislative process, existing approaches fail to address semantic similarities across bills, let alone reordering or paraphrasing which are prevalent in legal document writing. In this paper, we overcome these limitations by proposing a 5-class classification task that closely reflects the nature of the bill generation process. In doing so, we construct a human-labeled dataset of 4,721 bill-to-bill relationships at the subsection-level and release this annotated dataset to the research community. To augment the dataset, we generate synthetic data with varying degrees of similarity, mimicking the complex bill writing process. We use BERT variants and apply multi-stage training, sequentially fine-tuning our models with synthetic and human-labeled datasets. We find that the predictive performance significantly improves when training with both human-labeled and synthetic data. Finally, we apply our trained model to infer section- and bill-level similarities. Our analysis shows that the proposed methodology successfully captures the similarities across legal documents at various levels of aggregation.
The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic. It has also raised concerns about the spread of hateful language and prejudice online, especially hostili ty directed against East Asia. In this paper we report on the creation of a classifier that detects and categorizes social media posts from Twitter into four classes: Hostility against East Asia, Criticism of East Asia, Meta-discussions of East Asian prejudice and a neutral class. The classifier achieves an F1 score of 0.83 across all four classes. We provide our final model (coded in Python), as well as a new 20,000 tweet training dataset used to make the classifier, two analyses of hashtags associated with East Asian prejudice and the annotation codebook. The classifier can be implemented by other researchers, assisting with both online content moderation processes and further research into the dynamics, prevalence and impact of East Asian prejudice online during this global pandemic.
The Ubiquitous nature of smartphones has significantly increased the use of social media platforms, such as Facebook, Twitter, TikTok, and LinkedIn, etc., among the public, government, and businesses. Facebook generated ~70 billion USD in 2019 in adv ertisement revenues alone, a ~27% increase from the previous year. Social media has also played a strong role in outbreaks of social protests responsible for political changes in different countries. As we can see from the above examples, social media plays a big role in business intelligence and international politics. In this paper, we present and discuss a high-level functional intelligence model (recipe) of Social Media Analysis (SMA). This model synthesizes the input data and uses operational intelligence to provide actionable recommendations. In addition, it also matches the synthesized function of the experiences and learning gained from the environment. The SMA model presented is independent of the application domain, and can be applied to different domains, such as Education, Healthcare and Government, etc. Finally, we also present some of the challenges faced by SMA and how the SMA model presented in this paper solves them.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا