Social media use has grown to over two billion users to date, and it is commonly used as a means to share information and shape world events. Evidence suggests that passive social media usage (i.e., viewing without taking action) has an impact on the user's perspective. This influence over perspective could have a significant impact on social events. Therefore, it is important to understand how social media contributes to the formation of an individual's perspective. A set of experimental tasks was designed to investigate empirically derived thresholds for opinion formation as a result of passive interactions with different social media data types (i.e., videos, images, and messages). With a better understanding of how humans passively interact with social media information, a paradigm can be developed that allows the exploitation of this interaction and plays a significant role in future military plans and operations.
Stance detection, which aims to determine whether an individual is for or against a target concept, promises to uncover public opinion from large streams of social media data. Yet even human annotation of social media content does not always capture stance as measured by public opinion polls. We demonstrate this by directly comparing an individual's self-reported stance to the stance inferred from their social media data. Leveraging a longitudinal public opinion survey with respondent Twitter handles, we conducted this comparison for 1,129 individuals across four salient targets. We find that recall is high for both Pro and Anti stance classifications, but precision is variable in a number of cases. We identify three factors leading to the disconnect between text and author stance: temporal inconsistencies, differences in constructs, and measurement errors from both survey respondents and annotators. By presenting a framework for assessing the limitations of stance detection models, this work provides important insight into what stance detection truly measures.
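As a minimal sketch of the comparison described above, per-label precision and recall can be computed by treating the self-reported stance as ground truth and the stance inferred from social media data as the prediction. That choice of ground truth, the function name `precision_recall`, and the toy data are assumptions made for illustration and do not come from the study.

```python
from typing import Sequence, Tuple


def precision_recall(self_reported: Sequence[str],
                     inferred: Sequence[str],
                     label: str) -> Tuple[float, float]:
    """Precision and recall for one stance label.

    Assumption: survey self-reports are the ground truth and the stance
    inferred from social media text is the prediction.
    """
    if len(self_reported) != len(inferred):
        raise ValueError("both sequences must describe the same individuals")
    # Individuals for whom survey and inferred stance both equal the label.
    true_pos = sum(1 for s, i in zip(self_reported, inferred)
                   if s == label and i == label)
    predicted = sum(1 for i in inferred if i == label)
    actual = sum(1 for s in self_reported if s == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Toy data, invented for illustration only.
survey = ["Pro", "Pro", "Anti", "Anti", "Pro"]
from_tweets = ["Pro", "Anti", "Anti", "Pro", "Pro"]
for stance in ("Pro", "Anti"):
    p, r = precision_recall(survey, from_tweets, stance)
    print(f"{stance}: precision={p:.2f} recall={r:.2f}")
```

High recall with variable precision, as reported above, would mean that most individuals who hold a stance are found, while some individuals assigned that stance do not report holding it.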
The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze its massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of a meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured by existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between the number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
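The combination by pairwise maximization mentioned above can be sketched as follows: for each pair of messages, compute one similarity per feature type and keep the largest. The use of Jaccard similarity, the feature names `words`, `hashtags`, and `mentions`, and the message layout are assumptions for illustration; the abstract does not specify the individual measures.

```python
from typing import Dict, Iterable, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(m1: Dict[str, Set[str]],
                        m2: Dict[str, Set[str]],
                        features: Iterable[str] = ("words", "hashtags", "mentions")) -> float:
    """Pairwise maximum over per-feature similarities (hypothetical features)."""
    return max(jaccard(m1[f], m2[f]) for f in features)


# Two toy messages that share a hashtag but little text.
msg_a = {"words": {"vote", "today"}, "hashtags": {"#election"}, "mentions": set()}
msg_b = {"words": {"today", "polls"}, "hashtags": {"#election"}, "mentions": set()}
print(combined_similarity(msg_a, msg_b))
```

Taking the maximum needs no feature weights to tune, which is consistent with the finding that it performs as well as a non-trivial optimization of parameters.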
The COVID-19 pandemic has affected people's lives around the world on an unprecedented scale. We investigate hoarding behaviors in response to the pandemic using large-scale social media data. First, we collect hoarding-related tweets shortly after the outbreak of the coronavirus. Next, we analyze the hoarding and anti-hoarding patterns of over 42,000 unique Twitter users in the United States from March 1 to April 30, 2020, and dissect the hoarding-related tweets by age, gender, and geographic location. We find that the percentage of females in both hoarding and anti-hoarding groups is higher than that of the general Twitter user population. Furthermore, using topic modeling, we investigate the opinions expressed towards hoarding behavior by categorizing these topics according to demographic and geographic groups. We also calculate the anxiety scores for the hoarding and anti-hoarding related tweets using a lexical approach. By comparing their anxiety scores with the baseline Twitter anxiety score, we reveal further insights. The LIWC anxiety mean for the hoarding-related tweets is significantly higher than the baseline Twitter anxiety mean. Interestingly, beer has the highest calculated anxiety score compared to other hoarded items mentioned in the tweets.
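A lexical anxiety score of the kind described above can be sketched as the percentage of a tweet's words that appear in an anxiety word list, averaged over a set of tweets. The word list below is a hypothetical stand-in, since the LIWC dictionary is proprietary, and the tokenizer and function names are likewise assumptions rather than the study's implementation.

```python
import re
from statistics import mean
from typing import Iterable, List, Set

# Hypothetical stand-in word list; the actual LIWC anxiety category is
# proprietary and is not reproduced here.
ANXIETY_WORDS: Set[str] = {"afraid", "anxious", "nervous", "panic", "scared", "worried"}


def anxiety_score(text: str, lexicon: Iterable[str] = ANXIETY_WORDS) -> float:
    """Percentage of tokens in `text` that appear in the lexicon."""
    words = set(lexicon)
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in words)
    return 100.0 * hits / len(tokens)


def mean_anxiety(tweets: List[str], lexicon: Iterable[str] = ANXIETY_WORDS) -> float:
    """Mean per-tweet anxiety score; 0.0 for an empty list."""
    words = set(lexicon)
    return mean(anxiety_score(t, words) for t in tweets) if tweets else 0.0


# Toy tweets, invented for illustration only.
print(mean_anxiety(["I am worried and scared about shortages", "Stocked up on beer"]))
```

Comparing such a mean for hoarding-related tweets against a baseline sample would additionally require a significance test, which this sketch leaves out.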
Social media sites are information marketplaces, where users produce and consume a wide variety of information and ideas. In these sites, users typically choose their information sources, which in turn determine what specific information they receive, how much information they receive, and how quickly this information is shown to them. In this context, a natural question is how efficient social media users are at selecting their information sources. In this work, we propose a computational framework to quantify users' efficiency at selecting information sources. Our framework is based on the assumption that the goal of users is to acquire a set of unique pieces of information. To quantify a user's efficiency, we ask if the user could have acquired the same pieces of information from another set of sources more efficiently. We define three different notions of efficiency -- link, in-flow, and delay -- corresponding to the number of sources the user follows, the amount of (redundant) information she acquires, and the delay with which she receives the information. Our definitions of efficiency are general and applicable to any social media system with an underlying information network, in which every user follows others to receive the information they produce. In our experiments, we measure the efficiency of Twitter users at acquiring different types of information. We find that Twitter users exhibit sub-optimal efficiency across the three notions of efficiency, although they tend to be more efficient at acquiring non-popular than popular pieces of information. We then show that this lack of efficiency is a consequence of the triadic closure mechanism by which users typically discover and follow other users in social media. Finally, we develop a heuristic algorithm that enables users to be significantly more efficient at acquiring the same unique pieces of information.
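The link notion of efficiency can be illustrated with a small sketch. It assumes, as a working definition not stated in the abstract, that link efficiency is the number of sources in a smaller covering set divided by the number of sources the user actually follows. The covering set is found with the standard greedy set-cover approximation, which is a swapped-in technique and not necessarily the heuristic developed in the paper; all names and data are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_cover(needed: Set[Hashable], sources: Dict[str, Set[Hashable]]) -> List[str]:
    """Greedy set cover: repeatedly pick the source covering the most uncovered items."""
    remaining = set(needed)
    chosen: List[str] = []
    while remaining:
        best = max(sources, key=lambda s: len(sources[s] & remaining))
        gain = sources[best] & remaining
        if not gain:
            raise ValueError("needed items cannot be covered by the given sources")
        chosen.append(best)
        remaining -= gain
    return chosen


def link_efficiency(followed: List[str], sources: Dict[str, Set[Hashable]]) -> float:
    """Assumed definition: size of a greedy cover divided by the number of followed sources."""
    if not followed:
        raise ValueError("the user must follow at least one source")
    # Unique pieces of information the user currently receives.
    needed: Set[Hashable] = set().union(*(sources[s] for s in followed))
    if not needed:
        return 1.0  # nothing to acquire, so no smaller set is needed
    return len(greedy_cover(needed, sources)) / len(followed)


# Toy network: source "c" alone provides what "a" and "b" provide together.
all_sources = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}, "d": {4}}
print(link_efficiency(["a", "b", "d"], all_sources))
```

In this toy network the user follows three sources, but two ("c" and "d") would deliver the same four pieces of information, so the score is below one.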
Users online tend to consume information adhering to their system of beliefs and to ignore dissenting information. During the COVID-19 pandemic, users were exposed to a massive amount of information about a new topic with a high level of uncertainty. In this paper, we analyze two social media platforms that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation concerning COVID-19. We compare the two platforms on about three million pieces of content, analyzing user interaction with respect to news articles. We first describe users' consumption patterns on the two platforms, focusing on the political leaning of news outlets. Finally, we characterize the echo chamber effect by modeling the dynamics of users' interaction networks. Our results show that the moderation pursued by Twitter produces a significant reduction of questionable content, with a consequent affiliation towards reliable sources in terms of engagement and comments. Conversely, the lack of clear regulation on Gab results in a tendency for users to engage with both types of content, showing a slight preference for the questionable ones, which may account for a dissing/endorsement behavior. Twitter users show segregation towards reliable content with a uniform narrative. Gab, instead, offers a more heterogeneous structure where users, independently of their leaning, follow people who are slightly polarized towards questionable news.