Efficient Twitter Sentiment Classification using Subjective Distant Supervision

Published by: Venkata Naveen Reddy Chedeti

Publication date: 2017

Research field: Informatics Engineering

Language: English

Authors: Tapan Sahni, Chinmay Chandak, Naveen Reddy Chedeti

As microblogging services like Twitter become more and more influential in today's globalised world, aspects of them such as sentiment analysis are being studied extensively. We are no longer constrained by our own opinion. Others' opinions and sentiments play a huge role in shaping our perspective. In this paper, we build on previous work on Twitter sentiment analysis using Distant Supervision. The existing approach requires substantial computational resources to analyse a large number of tweets. We propose techniques to speed up the computation for sentiment analysis. We use tweet subjectivity to select the right training samples. We also introduce the concept of the EFWS (Effective Word Score) of a tweet, which is derived from the polarity scores of frequently used words and serves as an additional heuristic for speeding up sentiment classification with standard machine learning algorithms. We performed our experiments using 1.6 million tweets. Experimental evaluations show that our proposed technique is more efficient and has higher accuracy than previously proposed methods. We achieve overall accuracies of around 80% (the EFWS heuristic gives an accuracy of around 85%) on a training dataset of 100K tweets, which is half the size of the dataset used for the baseline model. The accuracy of our proposed model is 2-3% higher than that of the baseline model, and the model effectively trains at twice the speed of the baseline model.
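The abstract does not give the exact definition of subjectivity filtering or of EFWS, so the following is only a rough sketch of the general idea under stated assumptions: a tiny invented polarity lexicon (`POLARITY`), a subjectivity measure defined as the fraction of a tweet's words found in that lexicon, and a stand-in word score that counts strongly positive words minus strongly negative ones. All names, thresholds, and lexicon values here are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical polarity lexicon (word -> score in [-1, 1]).
# The values are invented for illustration only.
POLARITY: Dict[str, float] = {
    "love": 0.9, "great": 0.8, "happy": 0.7,
    "hate": -0.9, "awful": -0.8, "sad": -0.6,
}


def tokenize(tweet: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips simple punctuation."""
    tokens = (w.strip(".,!?#@").lower() for w in tweet.split())
    return [t for t in tokens if t]


def subjectivity(tweet: str) -> float:
    """Assumed measure: fraction of tokens that carry a polarity score."""
    words = tokenize(tweet)
    if not words:
        return 0.0
    return sum(1 for w in words if w in POLARITY) / len(words)


def select_subjective(tweets: List[str], threshold: float = 0.2) -> List[str]:
    """Keep only tweets subjective enough to be useful training samples."""
    return [t for t in tweets if subjectivity(t) >= threshold]


def word_score_sketch(tweet: str, cutoff: float = 0.5) -> int:
    """Stand-in for a word-score heuristic: strong positives minus strong negatives."""
    score = 0
    for w in tokenize(tweet):
        p = POLARITY.get(w, 0.0)
        if p >= cutoff:
            score += 1
        elif p <= -cutoff:
            score -= 1
    return score


def quick_label(tweet: str, margin: int = 1) -> Optional[str]:
    """Label clear-cut tweets directly; return None to defer to a trained classifier."""
    s = word_score_sketch(tweet)
    if s >= margin:
        return "positive"
    if s <= -margin:
        return "negative"
    return None
```

In a pipeline of this shape, `select_subjective` would shrink the training set before fitting a standard classifier, and `quick_label` would handle only the tweets whose score is unambiguous, leaving the rest to the trained model.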

129 - Shihan Wang, Marijn Schraagen, Erik Tjong Kim Sang, 2020

Public sentiment (the opinions, attitudes or feelings expressed by the public) is a factor of interest for government, as it directly influences the implementation of policies. Given the unprecedented nature of the COVID-19 crisis, having an up-to-date representation of public sentiment on governmental measures and announcements is crucial. While the staying-at-home policy makes face-to-face interactions and interviews challenging, analysing real-time Twitter data that reflects public opinion toward policy measures is a cost-effective way to access public sentiment. In this context, we collect streaming data using the Twitter API starting from the COVID-19 outbreak in the Netherlands in February 2020, and track Dutch general public reactions on governmental measures and announcements. We provide temporal analysis of tweet frequency and public sentiment over the past seven months. We also identify public attitudes towards two Dutch policies in case studies: one regarding social distancing and one regarding wearing face masks. By presenting those preliminary results, we aim to provide visibility into the social media discussions around COVID-19 to the general public, scientists and policy makers. The data collection and analysis will be updated and expanded over time.

Social and Information Networks, Computation and Language, Information Retrieval

Multitask Learning for Fine-Grained Twitter Sentiment Analysis

105 - Georgios Balikas, Simon Moura, Massih-Reza Amini, 2017

Traditional sentiment analysis approaches tackle problems like ternary (3-category) and fine-grained (5-category) classification by learning the tasks separately. We argue that such classification tasks are correlated and we propose a multitask approach based on a recurrent neural network that benefits by jointly learning them. Our study demonstrates the potential of multitask models on this type of problem and improves the state-of-the-art results in the fine-grained sentiment classification problem.

Information Retrieval, Computation and Language, Machine Learning

Storywrangler: A massive exploratorium for sociolinguistic, cultural, socioeconomic, and political timelines using Twitter

73 - Thayer Alshaabi, Jane L. Adams, Michael V. Arnold, 2020

In real time, social media data strongly imprints world events, popular culture, and day-to-day conversations by millions of ordinary people at a scale that is scarcely conventionalized and recorded. Vitally, and absent from many standard corpora such as books and news archives, sharing and commenting mechanisms are native to social media platforms, enabling us to quantify social amplification (i.e., popularity) of trending storylines and contemporary cultural phenomena. Here, we describe Storywrangler, a natural language processing instrument designed to carry out an ongoing, day-scale curation of over 100 billion tweets containing roughly 1 trillion 1-grams from 2008 to 2021. For each day, we break tweets into unigrams, bigrams, and trigrams spanning over 100 languages. We track n-gram usage frequencies, and generate Zipf distributions, for words, hashtags, handles, numerals, symbols, and emojis. We make the data set available through an interactive time series viewer, and as downloadable time series and daily distributions. Although Storywrangler leverages Twitter data, our method of extracting and tracking dynamic changes of n-grams can be extended to any similar social media platform. We showcase a few examples of the many possible avenues of study we aim to enable, including how social amplification can be visualized through contagiograms. We also present some example case studies that bridge n-gram time series with disparate data sources to explore sociotechnical dynamics of famous individuals, box office success, and social unrest.
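As a rough illustration of the n-gram counting and rank-frequency (Zipf-style) tabulation described above, the sketch below counts n-grams over one day's tweets and ranks them by relative frequency. It assumes plain lowercase whitespace tokenization, which is a simplification and not the parser Storywrangler actually uses; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Gram]:
    """Return all contiguous n-grams of a token list (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def daily_counts(tweets: List[str], n: int) -> Counter:
    """Count n-grams across one day's tweets (whitespace tokenization assumed)."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(ngrams(tweet.lower().split(), n))
    return counts


def rank_frequency(counts: Counter) -> List[Tuple[int, Gram, float]]:
    """Rank n-grams by count (ties broken alphabetically) with relative frequency."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, gram, c / total)
            for rank, (gram, c) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    day = ["good morning world", "good morning", "good night"]
    for rank, gram, freq in rank_frequency(daily_counts(day, 1)):
        print(rank, " ".join(gram), round(freq, 3))
```

Repeating `daily_counts` per day and per value of n would give the kind of daily time series the abstract mentions; plotting frequency against rank on log-log axes is the usual way to inspect a Zipf distribution.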

Social and Information Networks, Computation and Language, Physics and Society

TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification

218 - Georgios Balikas, Massih-Reza Amini, 2016

This paper describes the participation of the team TwiSE in the SemEval 2016 challenge. Specifically, we participated in Task 4, namely Sentiment Analysis in Twitter, for which we implemented sentiment classification systems for subtasks A, B, C and D. Our approach consists of two steps. In the first step, we generate and validate diverse feature sets for Twitter sentiment evaluation, inspired by the work of participants of previous editions of such challenges. In the second step, we focus on the optimization of the evaluation measures of the different subtasks. To this end, we examine different learning strategies by validating them on the data provided by the task organisers. For our final submissions we used an ensemble learning approach (stacked generalization) for Subtask A and single linear models for the rest of the subtasks. In the official leaderboard we were ranked 9/35, 8/19, 1/11 and 2/14 for subtasks A, B, C and D respectively. We make the code available for research purposes at https://github.com/balikasg/SemEval2016-Twitter_Sentiment_Evaluation.

Computation and Language, Information Retrieval, Machine Learning

Discourse Analysis of Covid-19 in Persian Twitter Social Networks Using Graph Mining and Natural Language Processing

173 - Omid Shokrollahi, Niloofar Hashemi, Mohammad Dehghani, 2021

One of the new scientific ways of understanding discourse dynamics is analyzing the public data of social networks. This research's aim is a Post-structuralist Discourse Analysis (PDA) of the Covid-19 phenomenon (inspired by Laclau and Mouffe's Discourse Theory) using intelligent data mining for Persian society. The examined big data is five million tweets from 160,000 users of the Persian Twitter network, used to compare two discourses. Besides analyzing the tweet texts individually, a social network graph database has been created based on retweet relationships. We use the VoteRank algorithm to introduce and rank people whose posts become word of mouth, provided that the total information spreading scope is maximized over the network. These users are also clustered according to their word usage pattern (the Gaussian Mixture Model is used). The constructed discourse of influential spreaders is compared to that of the most active users. This analysis is done based on Covid-related posts over eight episodes. Also, by relying on statistical content analysis and the polarity of tweet words, discourse analysis is done for all of the mentioned subpopulations, especially for the top individuals. The most important result of this research is that the Twitter subjects' discourse construction is government-based rather than community-based. The analyzed Iranian society does not consider itself responsible for the Covid-19 wicked problem, does not believe in participation, and expects the government to solve all problems. The similarity between the most active and the most influential users is that political, national, and critical discourse construction is the predominant one. In addition to the advantages of its research methodology, it is necessary to pay attention to the study's limitations. Suggestions for future encounters of Iranian society with similar crises are given.

Social and Information Networks, Computation and Language
