No Arabic abstract
We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force. We quantify these aspects with existing lexical methods, and propose clustering of tweet embeddings as a means to identify salient topics for analysis across events; human evaluations show that our approach generates more cohesive topics than traditional LDA-based models. We apply our methods to study 4.4M tweets on 21 mass shootings. We provide evidence that the discussion of these events is highly polarized politically and that this polarization is primarily driven by partisan differences in framing rather than topic choice. We identify framing devices, such as grounding and the contrasting use of the terms terrorist and crazy, that contribute to polarization. Results pertaining to topic choice, affect and illocutionary force suggest that Republicans focus more on the shooter and event-specific facts (news) while Democrats focus more on the victims and call for policy changes. Our work contributes to a deeper understanding of the way group divisions manifest in language and to computational methods for studying them.
Parody is a figurative device used to imitate an entity for comedic or critical purposes and represents a widespread phenomenon in social media through many popular parody accounts. In this paper, we present the first computational study of parody. We introduce a new publicly available data set of tweets from real politicians and their corresponding parody accounts. We run a battery of supervised machine learning models for automatically detecting parody tweets with an emphasis on robustness by testing on tweets from accounts unseen in training, across different genders and across countries. Our results show that political parody tweets can be predicted with an accuracy up to 90%. Finally, we identify the markers of parody through a linguistic analysis. Beyond research in linguistics and political communication, accurately and automatically detecting parody is important to improving fact checking for journalists and analytics such as sentiment analysis through filtering out parodical utterances.
Over the past two decades, school shootings within the United States have repeatedly devastated communities and shaken public opinion. Many of these attacks appear to be `lone wolf ones driven by specific individual motivations, and the identification of precursor signals and hence actionable policy measures would thus seem highly unlikely. Here, we take a system-wide view and investigate the timing of school attacks and the dynamical feedback with social media. We identify a trend divergence in which college attacks have continued to accelerate over the last 25 years while those carried out on K-12 schools have slowed down. We establish the copycat effect in school shootings and uncover a statistical association between social media chatter and the probability of an attack in the following days. While hinting at causality, this relationship may also help mitigate the frequency and intensity of future attacks.
Interest surrounding cryptocurrencies, digital or virtual currencies that are used as a medium for financial transactions, has grown tremendously in recent years. The anonymity surrounding these currencies makes investors particularly susceptible to fraud---such as pump and dump scams---where the goal is to artificially inflate the perceived worth of a currency, luring victims into investing before the fraudsters can sell their holdings. Because of the speed and relative anonymity offered by social platforms such as Twitter and Telegram, social media has become a preferred platform for scammers who wish to spread false hype about the cryptocurrency they are trying to pump. In this work we propose and evaluate a computational approach that can automatically identify pump and dump scams as they unfold by combining information across social media platforms. We also develop a multi-modal approach for predicting whether a particular pump attempt will succeed or not. Finally, we analyze the prevalence of bots in cryptocurrency related tweets, and observe a significant increase in bot activity during the pump attempts.
The ongoing Coronavirus (COVID-19) pandemic highlights the inter-connectedness of our present-day globalized world. With social distancing policies in place, virtual communication has become an important source of (mis)information. As increasing number of people rely on social media platforms for news, identifying misinformation and uncovering the nature of online discourse around COVID-19 has emerged as a critical task. To this end, we collected streaming data related to COVID-19 using the Twitter API, starting March 1, 2020. We identified unreliable and misleading contents based on fact-checking sources, and examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets. In addition, we provide examples of the spreading patterns of prominent misinformation tweets. The analysis is presented and updated on a publically accessible dashboard (https://usc-melady.github.io/COVID-19-Tweet-Analysis) to track the nature of online discourse and misinformation about COVID-19 on Twitter from March 1 - June 5, 2020. The dashboard provides a daily list of identified misinformation tweets, along with topics, sentiments, and emerging trends in the COVID-19 Twitter discourse. The dashboard is provided to improve visibility into the nature and quality of information shared online, and provide real-time access to insights and information extracted from the dataset.
Despite the high consumption of dietary supplements (DS), there are not many reliable, relevant, and comprehensive online resources that could satisfy information seekers. The purpose of this research study is to understand consumers information needs on DS using topic modeling and to evaluate its accuracy in correctly identifying topics from social media. We retrieved 16,095 unique questions posted on Yahoo! Answers relating to 438 unique DS ingredients mentioned in sub-section, Alternative medicine under the section, Health. We implemented an unsupervised topic modeling method, Correlation Explanation (CorEx) to unveil the various topics consumers are most interested in. We manually reviewed the keywords of all the 200 topics generated by CorEx and assigned them to 38 health-related categories, corresponding to 12 higher-level groups. We found high accuracy (90-100%) in identifying questions that correctly align with the selected topics. The results could be used to guide us to generate a more comprehensive and structured DS resource based on consumers information needs.