Predicting the popularity of online content is a fundamental problem in many application areas. One practical challenge stems from the differing settings of popularity prediction tasks, e.g., varying lengths of the observation time window or the prediction horizon. A good popularity prediction model should therefore handle tasks with different settings. However, the conventional paradigm trains a separate prediction model for each task, so a model obtained for one task is difficult to generalize to others, wasting considerable training time and computational resources. To address this issue, we propose a novel pre-training framework for popularity prediction that pre-trains a general deep representation model by learning intrinsic knowledge about popularity dynamics from readily available diffusion cascades. We design a novel pretext task for pre-training, namely temporal context prediction for two randomly sampled time slices of popularity dynamics, which encourages the deep prediction model to capture the characteristics of popularity dynamics effectively. Using a state-of-the-art deep model, the temporal convolutional neural network, as an instantiation of the proposed framework, we conduct experiments on Sina Weibo and Twitter datasets; the results demonstrate both the effectiveness and the efficiency of the pre-training framework across multiple popularity prediction tasks.
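The pretext task above can be made concrete with a small data-preparation sketch. The abstract does not specify the exact prediction target, so this example assumes one simple form: given two non-overlapping slices of a cascade's popularity time series, predict whether the first slice precedes the second. The function name `sample_pretext_pair` and the binary order label are hypothetical illustrations, not the authors' design.

```python
import random
from typing import List, Sequence, Tuple


def sample_pretext_pair(
    series: Sequence[float], slice_len: int, rng: random.Random
) -> Tuple[List[float], List[float], int]:
    """Sample two non-overlapping slices of a popularity time series.

    Hypothetical pretext label: 1 if slice_a precedes slice_b in time, else 0.
    """
    n_starts = len(series) - slice_len + 1
    # Two non-overlapping slices need start indices at least slice_len apart.
    if n_starts < slice_len + 1:
        raise ValueError("series too short for two non-overlapping slices")
    while True:
        i, j = rng.sample(range(n_starts), 2)  # two distinct start indices
        if abs(i - j) >= slice_len:  # reject overlapping pairs and redraw
            break
    slice_a = list(series[i:i + slice_len])
    slice_b = list(series[j:j + slice_len])
    return slice_a, slice_b, int(i < j)
```

Pairs generated this way could be fed to a sequence encoder (a temporal convolutional network in the paper's instantiation) with a classification head; the rejection loop simply redraws start indices until the two slices do not overlap.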
People differ in how they attend to, interpret, and respond to their surroundings. Convergent processing of the world may be one factor that contributes to social connections between individuals. We used neuroimaging and network analysis to investigate whether the most central individuals in their communities (as measured by in-degree centrality, a notion of popularity) process the world in a particularly normative way. More central individuals had exceptionally similar neural responses to their peers, and especially to each other, in brain regions associated with high-level interpretation and social cognition (e.g., the default-mode network), whereas less central individuals exhibited more idiosyncratic responses. Self-reported enjoyment of and interest in the stimuli followed a similar pattern, but accounting for these data did not change our main results. These findings suggest an Anna Karenina principle in social networks: Highly central individuals process the world in exceptionally similar ways, whereas less central individuals process the world in idiosyncratic ways.
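In-degree centrality, the popularity measure used above, counts how many peers name a given person as a social tie. A minimal sketch, assuming the social network is given as directed edges `(source, target)` meaning `source` nominated `target`, normalizes the count by the maximum possible in-degree `n - 1` (the same convention networkx uses for `in_degree_centrality`). The function name and input format here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def in_degree_centrality(
    edges: Iterable[Tuple[str, str]], nodes: Iterable[str]
) -> Dict[str, float]:
    """Normalized in-degree: nominations received divided by (n - 1).

    Assumes a simple directed graph: each (source, target) pair appears once,
    meaning `source` named `target` as a social tie. Self-nominations are ignored.
    """
    node_list: List[str] = list(nodes)
    n = len(node_list)
    if n < 2:
        raise ValueError("need at least two nodes to normalize by n - 1")
    # Count incoming nominations per person, skipping self-loops.
    received = Counter(target for source, target in edges if source != target)
    return {v: received[v] / (n - 1) for v in node_list}
```

For example, if three of four people name `b` and `b` names only `a`, then `b` scores 1.0, `a` scores 1/3, and the others score 0.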
We present the first comprehensive characterization of the diffusion of ideas on Twitter, studying more than 4000 topics that include both popular and less popular ones. On a data set containing approximately 10 million users and a comprehensive crawl of all tweets posted by these users between June 2009 and August 2009 (approximately 200 million tweets), we perform a rigorous temporal and spatial analysis, investigating the time-evolving properties of the subgraphs formed by the users discussing each topic. We consider two notions of space: the network topology formed by follower-following links on Twitter, and the geospatial locations of the users. We investigate the effect of initiators on the popularity of topics and find that users with a large number of followers have a strong impact on popularity. We conclude that topics become popular when disjoint clusters of users discussing them begin to merge and form one giant component that grows to cover a significant fraction of the network. Our geospatial analysis shows that highly popular topics are those that cross regional boundaries aggressively.
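The giant-component argument above can be tracked with a simple statistic: the fraction of users discussing a topic who fall in the largest connected component of their follower subgraph. The sketch below assumes follower links are treated as undirected for connectivity, uses a union-find structure, and ignores links to users outside the topic subgraph; the function name and these choices are illustrative assumptions rather than the authors' exact procedure.

```python
from typing import Dict, Hashable, Iterable, Tuple


def largest_component_fraction(
    nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> float:
    """Fraction of nodes in the largest connected component (edges treated as undirected)."""
    parent: Dict[Hashable, Hashable] = {v: v for v in nodes}

    def find(v: Hashable) -> Hashable:
        # Path halving keeps the union-find trees shallow.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        # Ignore links to users outside the topic subgraph.
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # merge the two clusters

    if not parent:
        return 0.0
    sizes: Dict[Hashable, int] = {}
    for v in list(parent):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(parent)
```

Computing this fraction on successive time snapshots of a topic's subgraph could reveal the merging behavior described above as a jump from many small clusters to one dominant component.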
Predicting popularity, or the total volume of information outbreaks, is an important subproblem for understanding collective behavior in networks. Each of the two main types of recent approaches to the problem, feature-driven and generative models, has desirable qualities and clear limitations. This paper bridges the gap between these solutions with a new hybrid approach and a new performance benchmark. We model each social cascade with a marked Hawkes self-exciting point process and estimate its content virality, memory decay, and user influence. We then learn a predictive layer for popularity prediction from a collection of cascade histories. To our surprise, the Hawkes process with a predictive overlay outperforms recent feature-driven and generative approaches on existing tweet data [43] and on a new public benchmark of news tweets. We also find that a basic set of user features and event-time summary statistics performs competitively in both classification and regression tasks, and that adding point-process information to the feature set further improves predictions. From these observations, we argue that future work on popularity prediction should compare feature-driven and generative modeling approaches in both classification and regression tasks.
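The self-exciting mechanism above can be written as a conditional intensity that sums contributions from past retweets. This sketch assumes a power-law memory kernel with follower counts as marks, $\lambda(t) = \sum_{t_i < t} \kappa\, m_i^{\beta} (t - t_i + c)^{-(1+\theta)}$, where $\kappa$ plays the role of content virality, $\beta$ of user influence, and $\theta$ of memory decay; this parameterization and the parameter names are assumptions for illustration. The branching factor (expected number of direct retweets triggered per event) follows by integrating the kernel, $n^{*} = \kappa\, \mathbb{E}[m^{\beta}]\, c^{-\theta}/\theta$ for $\theta > 0$, with $\mathbb{E}[m^{\beta}]$ estimated here by a plug-in average over observed marks.

```python
from typing import Sequence


def marked_hawkes_intensity(
    t: float,
    event_times: Sequence[float],
    marks: Sequence[float],
    kappa: float,
    beta: float,
    theta: float,
    c: float,
) -> float:
    """Conditional intensity at time t under the assumed power-law marked kernel.

    kappa: content virality (overall kernel scale)
    beta:  user influence (how strongly a mark, e.g. follower count, scales excitation)
    theta: memory decay (power-law exponent)
    c:     time offset keeping the kernel finite right after an event
    """
    intensity = 0.0
    for t_i, m_i in zip(event_times, marks):
        if t_i < t:  # only past events excite the process
            intensity += kappa * m_i ** beta * (t - t_i + c) ** (-(1.0 + theta))
    return intensity


def branching_factor(
    marks: Sequence[float], kappa: float, beta: float, theta: float, c: float
) -> float:
    """Expected direct offspring per event: kappa * E[m**beta] * c**(-theta) / theta.

    Uses the integral of (tau + c)**-(1 + theta) over [0, inf), which equals
    c**(-theta) / theta for theta > 0, and a plug-in mean over observed marks.
    """
    if theta <= 0:
        raise ValueError("theta must be positive for the kernel to be integrable")
    mean_mark_term = sum(m ** beta for m in marks) / len(marks)
    return kappa * mean_mark_term * c ** (-theta) / theta
```

A branching factor below 1 indicates a cascade that is expected to die out; in a hybrid approach like the one described above, fitted quantities of this kind could serve as features for the learned predictive layer.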
Understanding and predicting the popularity of online items is an important open problem in social media analysis. Considerable progress has been made recently in data-driven prediction and in linking popularity to external promotions. However, existing methods typically focus on a single source of external influence, whereas for many types of online content, such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously (e.g., microblogs or traditional media coverage). Here, we propose RNN-MAS, a recurrent neural network for modeling asynchronous streams. It is a sequence generator that connects multiple streams of different granularity via joint inference. We show that RNN-MAS not only outperforms the current state-of-the-art YouTube popularity prediction system by 17%, but also captures complex dynamics, such as seasonal trends of unseen influence. We define two new metrics: the promotion score quantifies the gain in popularity from one unit of promotion for a YouTube video, and the loudness level captures the effect of a particular user tweeting about the video. We use the loudness level to compare the effect of a video being promoted by a single highly followed user (in the top 1% of most-followed users) against being promoted by a group of mid-followed users. We find that the results depend on the type of content being promoted: superusers are more successful at promoting Howto and Gaming videos, whereas the cohort of regular users is more influential for Activism videos. This work provides more accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.
Many real-world systems can be expressed as temporal networks, in which nodes play very different structural and functional roles and edges represent the relationships between nodes. Identifying critical nodes can help us control the spread of public opinion or epidemics, predict leading figures in academia, run advertisements for various commodities, and so on. However, identifying critical nodes is difficult because the structure of a temporal network changes over time. In this paper, exploiting the sequential topological information of temporal networks, we propose a novel and effective learning framework that combines specialized graph convolutional networks (GCNs) and recurrent neural networks (RNNs) to identify the nodes with the best spreading ability. The effectiveness of the approach is evaluated with a weighted Susceptible-Infected-Recovered (SIR) model. Experimental results on four real-world temporal networks demonstrate that the proposed method outperforms both traditional and deep learning benchmark methods in terms of the Kendall $\tau$ coefficient and the top-$k$ hit rate.
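The two evaluation metrics named above can be computed as follows. This sketch assumes the ground-truth spreading ability of each node is a score obtained from weighted SIR simulations, implements Kendall's tau-b (which adjusts for ties) in $O(n^2)$ time, and defines the top-$k$ hit rate as the fraction of the true top-$k$ nodes that also appear in the predicted top-$k$. The tie handling, the hit-rate definition, and the function names are assumptions, since the abstract does not spell them out.

```python
import math
from typing import Dict, Hashable, Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equal-length score lists (O(n^2) pair count)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            # Pairs tied in both lists count toward both tie totals.
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one ranking is entirely tied")
    return (concordant - discordant) / denom


def top_k_hit_rate(
    predicted: Dict[Hashable, float], true: Dict[Hashable, float], k: int
) -> float:
    """Fraction of the true top-k nodes that also appear in the predicted top-k."""
    top_pred = set(sorted(predicted, key=predicted.get, reverse=True)[:k])
    top_true = set(sorted(true, key=true.get, reverse=True)[:k])
    return len(top_pred & top_true) / k
```

For large networks, `scipy.stats.kendalltau`, which computes tau-b by default, is a faster alternative to the quadratic loop.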