We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 4000 topics, both popular and less popular. On a data set containing approximately 10 million users and a comprehensive scrape of all tweets posted by these users between June 2009 and August 2009 (approximately 200 million tweets), we perform a rigorous temporal and spatial analysis, investigating the time-evolving properties of the subgraphs formed by the users discussing each topic. We focus on two different notions of space: the network topology formed by follower-following links on Twitter, and the geospatial location of the users. We investigate the effect of initiators on the popularity of topics and find that users with a high number of followers have a strong impact on popularity. We deduce that topics become popular when disjoint clusters of users discussing them begin to merge and form one giant component that grows to cover a significant fraction of the network. Our geospatial analysis shows that highly popular topics are those that cross regional boundaries aggressively.
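The merging of disjoint clusters into one giant component can be illustrated with a minimal sketch. This is not the authors' method: the function name `largest_component_fraction`, the input format, and the choice to treat follower links as undirected are all assumptions made for illustration. It computes the share of a topic's users that fall in the largest connected component of the subgraph induced on those users.

```python
def largest_component_fraction(users, follow_edges):
    """Fraction of `users` inside the largest connected component of the
    subgraph induced on them. Hypothetical helper: follower links are
    treated as undirected, which is an assumption of this sketch."""
    users = set(users)
    if not users:
        return 0.0

    # Union-find over the users discussing the topic.
    parent = {u: u for u in users}
    size = {u: 1 for u in users}

    def find(u):
        # Path halving keeps the trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in follow_edges:
        # Ignore links that leave the topic's user set.
        if a in users and b in users:
            ra, rb = find(a), find(b)
            if ra != rb:
                if size[ra] < size[rb]:
                    ra, rb = rb, ra
                parent[rb] = ra
                size[ra] += size[rb]

    return max(size[find(u)] for u in users) / len(users)


# Two disjoint clusters, then a single link that merges them.
topic_users = ["a", "b", "c", "d"]
before = largest_component_fraction(topic_users, [("a", "b"), ("c", "d")])
after = largest_component_fraction(
    topic_users, [("a", "b"), ("c", "d"), ("b", "c")]
)
```

Evaluating this fraction on successive snapshots of a topic's user set would show the jump that occurs when clusters merge.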
On social media platforms such as Twitter, users are often interested in gaining influence and popularity by growing their set of followers, that is, their audience. Several studies have described the properties of users on Twitter based on static snapshots of their follower network. Other studies have analyzed the general process of link formation. Here, rather than investigating the dynamics of this process itself, we study how the characteristics of the audience and follower links change as a user's audience grows in size on the road to the user's popularity. To begin with, we find that early followers tend to be more elite users than late followers, i.e., they are more likely to have verified and expert accounts. Moreover, early followers are significantly more similar to the person they follow than late followers are. Namely, they are more likely to share a time zone, language, and topics of interest with the followed user. To some extent, these phenomena are related to the growth of Twitter itself, wherein the early followers tend to be early adopters of Twitter, while the late followers are late adopters. We isolate, however, the effect of the growth of audiences consisting of followers from the growth of Twitter's user base itself. Finally, we measure the engagement of such audiences with the content of the followed user by measuring the probability that an early or late follower becomes a retweeter.
A large amount of content is generated every day on social media. One of the main goals of content creators is to spread their information to a large audience. Many factors affect information spread, such as posting time, location, type of information, and number of social connections. In this paper, we look at the problem of finding the best posting time(s) to achieve high content visibility. The posting time is derived by taking other factors into account, such as location and type of information. We conduct our analysis on Facebook pages. We propose six posting schedules that can be used for individual pages or for groups of pages with similar audience reaction profiles. We perform our experiments on a Facebook pages dataset containing 0.3 million posts and 10 million audience reactions. Our best posting schedule can lead to seven times more audience reactions than the average number of audience reactions that users would get without following any optimized posting schedule. We also present some interesting audience reaction patterns obtained through daily, weekly, and monthly audience reaction analysis.
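As a rough illustration of deriving a posting time from past audience reactions, the sketch below ranks hours of the day by mean reactions per post. It is not one of the six schedules proposed in the paper, whose definitions are not given here. The function name `best_posting_hours`, the `(hour, reactions)` input format, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def best_posting_hours(posts, top_k=1):
    """Rank hours of the day by mean reactions per post.

    posts: iterable of (hour_of_day, reaction_count) pairs. This input
    format is an assumption of the sketch, not the paper's data schema.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for hour, reactions in posts:
        totals[hour] += reactions
        counts[hour] += 1

    means = {hour: totals[hour] / counts[hour] for hour in totals}
    # Highest mean first; ties are broken by the earlier hour.
    return sorted(means, key=lambda hour: (-means[hour], hour))[:top_k]


# Made-up history: (hour of day, reactions received by one post).
history = [(9, 10), (9, 30), (20, 50), (20, 70), (13, 5)]
top_two = best_posting_hours(history, top_k=2)
```

Using the mean rather than the total avoids favoring hours that simply have more posts. A fuller treatment would also condition on the other factors the abstract mentions, such as location and type of information.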
The outbreak of COVID-19 highlights the need for a more harmonized, less privacy-concerning, easily accessible approach to monitoring human mobility, which has been shown to be associated with viral transmission. In this study, we analyzed 587 million tweets worldwide to see how global collaborative efforts to reduce human mobility are reflected in user-generated information at the global, country, and U.S. state scales. Considering the multifaceted nature of mobility, we propose two types of distance: the single-day distance and the cross-day distance. To quantify the responsiveness in certain geographical regions, we further propose a mobility-based responsive index (MRI) that captures the overall degree of mobility changes within a time window. The results suggest that mobility patterns obtained from Twitter data can quantitatively reflect mobility dynamics. Globally, the two proposed distances deviated greatly from their baselines after March 11, 2020, when the WHO declared COVID-19 a pandemic. The considerably weaker periodicity after the declaration suggests that the protection measures have clearly affected people's travel routines. The country-scale comparisons reveal discrepancies in responsiveness, evidenced by the contrasting mobility patterns in different epidemic phases. We find that the triggers of mobility changes correspond well with the national announcements of mitigation measures. In the U.S., the influence of the COVID-19 pandemic on mobility is distinct. However, the impacts varied substantially among states. The strong momentum of mobility recovery is further fueled by the Black Lives Matter protests, potentially fostering a second wave of infections in the U.S.
Predicting the popularity of online content is a fundamental problem in various application areas. One practical challenge for popularity prediction stems from the different settings of prediction tasks in different situations, e.g., varying lengths of the observation time window or prediction horizon. In other words, a good model for popularity prediction should handle various tasks with different settings. However, the conventional paradigm for popularity prediction is to train a separate prediction model for each prediction task, so the model obtained for one task is difficult to generalize to other tasks, causing a great waste of training time and computational resources. To solve this issue, we propose a novel pre-training framework for popularity prediction, aiming to pre-train a general deep representation model by learning intrinsic knowledge about popularity dynamics from readily available diffusion cascades. We design a novel pretext task for pre-training, i.e., temporal context prediction for two randomly sampled time slices of popularity dynamics, impelling the deep prediction model to effectively capture the characteristics of popularity dynamics. Taking a state-of-the-art deep model, i.e., a temporal convolutional neural network, as an instantiation of our proposed framework, experiments on both Sina Weibo and Twitter datasets demonstrate the effectiveness and efficiency of the proposed pre-training framework for multiple popularity prediction tasks.
The dynamics and influence of fake news on Twitter during the 2016 US presidential election remain to be clarified. Here, we use a dataset of 171 million tweets from the five months preceding election day to identify 30 million tweets, from 2.2 million users, that contain a link to news outlets. Based on a classification of news outlets curated by www.opensources.co, we find that 25% of these tweets spread either fake or extremely biased news. We characterize the networks of information flow to find the most influential spreaders of fake and traditional news and use causal modeling to uncover how fake news influenced the presidential election. We find that, while top influencers spreading traditional center and left-leaning news largely influence the activity of Clinton supporters, this causality is reversed for fake news: the activity of Trump supporters influences the dynamics of the top fake news spreaders.