Streams of user-generated content in social media exhibit patterns of collective attention across diverse topics, with temporal structures determined by both exogenous and endogenous factors. Teasing apart different topics and resolving their individual, concurrent activity timelines is a key challenge in extracting knowledge from microblog streams. Meeting this challenge requires methods that expose latent signals by using term correlations across posts and over time. Here we focus on content posted to Twitter during the London 2012 Olympics, for which a detailed schedule of events is independently available and can be used for reference. We mine the temporal structure of topical activity by using two methods based on non-negative matrix factorization. We show that for events in the Olympics schedule that can be semantically matched to Twitter topics, the extracted Twitter activity timeline closely matches the known timeline from the schedule. Our results show that, given appropriate techniques to detect latent signals, Twitter can be used as a social sensor to extract topical-temporal information on real-world events at high temporal resolution.
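As a rough illustration of how non-negative matrix factorization can separate concurrent topic timelines, the sketch below factors a toy time-by-term count matrix with the generic Lee-Seung multiplicative updates. This is not a reconstruction of the two methods used in the paper; the function name `nmf`, the toy matrices, and all parameter values are assumptions made for the example.

```python
import numpy as np

def nmf(V, rank, n_iter=1000, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (time bins x terms) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius loss.
    Generic textbook NMF, assumed here for illustration only."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # time bins x topics (activity timelines)
    H = rng.random((rank, m)) + eps   # topics x terms (term weights)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: two topics, each with its own activity timeline.
timelines = np.array([[0, 1, 4, 1, 0, 0, 0, 0],
                      [0, 0, 0, 0, 1, 5, 2, 0]], dtype=float).T  # 8 bins x 2 topics
term_weights = np.array([[3, 2, 0, 0, 1],
                         [0, 0, 4, 2, 1]], dtype=float)          # 2 topics x 5 terms
V = timelines @ term_weights

W, H = nmf(V, rank=2)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
```

Each column of `W` can be read as the activity timeline of one latent topic and each row of `H` as its term profile, up to an arbitrary scaling and ordering of the topics.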
The occurrence of new events in a system is typically driven by external causes and by previous events taking place inside the system. This is a general statement that applies to a range of situations including, more recently, the activity of users in online social networks (OSNs). Here we develop a method for extracting, from a series of posting times, the relative contributions of exogenous influences (e.g. news media) and endogenous influences (e.g. information cascades). The method is based on fitting a generalized linear model (GLM) equipped with a self-excitation mechanism. We test the method with synthetic data generated by a nonlinear Hawkes process, and apply it to a real time series of tweets with a given hashtag. In the empirical dataset, the estimated exogenous and endogenous volumes are close to the numbers of original tweets and retweets, respectively. We conclude by discussing possible applications of the method, for instance in online marketing.
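To make the exogenous/endogenous split concrete, the following sketch simulates a linear Hawkes process with an exponential kernel through its branching representation, labelling each event as exogenous (a background arrival) or endogenous (triggered by an earlier event). It is a simplification: the paper uses a nonlinear Hawkes process for its synthetic data and a GLM fit for estimation, neither of which is reproduced here. All names and parameter values are assumptions.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_hawkes(mu, branching, decay, horizon, seed=0):
    """Linear Hawkes process on [0, horizon) via its cluster representation.
    mu: background (exogenous) rate; branching: mean number of events
    triggered per event (must be < 1); decay: rate of the exponential kernel.
    Returns a time-sorted list of (time, is_exogenous) pairs."""
    rng = random.Random(seed)
    events, queue = [], []
    t = rng.expovariate(mu)
    while t < horizon:                       # exogenous arrivals
        events.append((t, True))
        queue.append(t)
        t += rng.expovariate(mu)
    while queue:                             # endogenous cascades
        parent = queue.pop()
        for _ in range(poisson(branching, rng)):
            child = parent + rng.expovariate(decay)
            if child < horizon:
                events.append((child, False))
                queue.append(child)
    events.sort()
    return events

events = simulate_hawkes(mu=1.0, branching=0.5, decay=2.0, horizon=2000.0)
endogenous_fraction = sum(1 for _, exo in events if not exo) / len(events)
print(len(events), "events, endogenous fraction:", round(endogenous_fraction, 3))
```

In this linear setting the long-run endogenous fraction equals the branching ratio, which gives a simple sanity check for any estimator applied to the simulated posting times.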
User activity fluctuations reflect the performance of online society. We investigate the statistical properties of 1-min activity time series of simultaneously online users in 95 independent virtual worlds. The number of online users exhibits clear intraday and weekly patterns due to human circadian rhythms and weekly cycles. Statistical analysis shows that the distribution of absolute activity fluctuations has a power-law tail for 44 virtual worlds, with an average tail exponent close to 2.15. The partition function approach reveals that the absolute activity fluctuations possess multifractal features for all 95 virtual worlds. For the sample of 44 virtual worlds with power-law tailed distributions of the absolute activity fluctuations, the width of the singularity spectrum $\Delta\alpha$ is negatively correlated with the maximum activity ($p$-value=0.070) and the time to the maximum activity ($p$-value=0.010). These negative correlations are observed for neither the other 51 virtual worlds nor the whole sample of 95 virtual worlds. In addition, numerical experiments indicate that both temporal structure and large fluctuations influence the multifractal spectrum. We also find that the temporal structure has a stronger impact on the singularity width than large fluctuations.
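The partition function approach can be sketched as follows: cover a normalized measure with boxes of size $s$, compute $\chi_q(s) = \sum_i \mu_i(s)^q$, and estimate the scaling exponent $\tau(q)$ as the slope of $\log \chi_q(s)$ against $\log s$. The example below checks this on a deterministic binomial cascade, for which $\tau(q) = -\log_2(p^q + (1-p)^q)$ is known exactly. The test measure, function names, and scale range are assumptions; for activity data the measure would instead be built from normalized absolute fluctuations.

```python
import numpy as np

def binomial_cascade(p, levels):
    """Deterministic binomial multiplicative cascade with 2**levels cells."""
    measure = np.array([1.0])
    for _ in range(levels):
        # Split every cell into two children carrying fractions p and 1 - p.
        measure = np.column_stack((measure * p, measure * (1 - p))).ravel()
    return measure

def tau(measure, q, scales):
    """Estimate tau(q) as the slope of log chi_q(s) versus log s, where
    chi_q(s) = sum_i mu_i(s)**q and s is the box size in cells."""
    n = measure.size
    log_s, log_chi = [], []
    for s in scales:
        boxes = measure.reshape(n // s, s).sum(axis=1)  # box masses mu_i(s)
        boxes = boxes[boxes > 0]
        log_s.append(np.log(s / n))
        log_chi.append(np.log(np.sum(boxes ** q)))
    slope, _ = np.polyfit(log_s, log_chi, 1)
    return slope

measure = binomial_cascade(p=0.3, levels=12)
scales = [2 ** k for k in range(1, 9)]       # box sizes that divide 2**12
tau_2 = tau(measure, 2.0, scales)
print("tau(2) estimate:", tau_2, "exact:", -np.log2(0.3 ** 2 + 0.7 ** 2))
```

A nonlinear dependence of $\tau(q)$ on $q$ signals multifractality; the singularity spectrum and its width $\Delta\alpha$ then follow from the Legendre transform of $\tau(q)$.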
Studying human behaviour in virtual environments provides extraordinary opportunities for a quantitative analysis of social phenomena with levels of accuracy that approach those of the natural sciences. In this paper we use records of player activities in the massive multiplayer online game Pardus over 1,238 consecutive days, and analyze dynamical features of sequences of actions of players. We build on previous work where temporal structures of human actions of the same type were quantified, and extend it to provide an empirical understanding of human actions of different types. This study of multi-level human activity can be seen as a dynamic counterpart of static multiplex network analysis. We show that the interevent time distributions of actions in the Pardus universe follow highly non-trivial distribution functions, from which we extract action-type specific characteristic decay constants. We discuss characteristic features of interevent time distributions, including periodic patterns on different time scales, bursty dynamics, and various functional forms on different time scales. We comment on gender differences of players in emotional actions, and find that while males and females act similarly when performing some positive actions, females are slightly faster for negative actions. We also observe effects of the age of players: more experienced players are generally faster in making decisions about engaging in and terminating enmity and friendship, respectively.
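A minimal sketch of an interevent time analysis is given below. It computes the gaps between consecutive actions, a decay constant under the assumption of a purely exponential distribution, and the burstiness coefficient $B = (\sigma - \mu)/(\sigma + \mu)$. The burstiness coefficient is a common summary statistic brought in here for illustration, not a measure taken from the paper, and the timestamps are synthetic.

```python
import random
import statistics

def interevent_times(timestamps):
    """Gaps between consecutive events, after sorting the timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]

def decay_constant(gaps):
    """Maximum-likelihood decay constant tau, ASSUMING P(gap) ~ exp(-gap / tau).
    Real interevent distributions are typically far from exponential."""
    return statistics.fmean(gaps)

def burstiness(gaps):
    """B = (sigma - mu) / (sigma + mu): -1 for perfectly regular events,
    about 0 for a Poisson process, approaching 1 for very bursty activity."""
    mean = statistics.fmean(gaps)
    std = statistics.pstdev(gaps)
    return (std - mean) / (std + mean)

# Synthetic timestamps (hypothetical data): a regular and a Poisson sequence.
regular = [10 * i for i in range(100)]
rng = random.Random(1)
poisson_times, t = [], 0.0
for _ in range(20000):
    t += rng.expovariate(0.5)        # mean gap of 2 time units
    poisson_times.append(t)

b_regular = burstiness(interevent_times(regular))
b_poisson = burstiness(interevent_times(poisson_times))
tau_poisson = decay_constant(interevent_times(poisson_times))
print(b_regular, round(b_poisson, 3), round(tau_poisson, 3))
```

Applied per action type, the same functions would give one gap distribution and one set of summary statistics for each kind of action.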
While most topic modeling algorithms model text corpora with unigrams, human interpretation often relies on the inherent grouping of terms into phrases. As such, we consider the problem of discovering topical phrases of mixed lengths. Existing work either performs post-processing on the inference results of unigram-based topic models, or utilizes complex n-gram-discovery topic models. These methods generally produce low-quality topical phrases or suffer from poor scalability on even moderately sized datasets. We propose a different approach that is both computationally efficient and effective. Our solution combines a novel phrase mining framework to segment a document into single and multi-word phrases, and a new topic model that operates on the induced document partition. Our approach discovers high-quality topical phrases with negligible extra cost over the bag-of-words topic model in a variety of datasets including research publication titles, abstracts, reviews, and news articles.
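The idea of segmenting a document into single and multi-word phrases before topic modeling can be illustrated with a deliberately simple stand-in: count contiguous n-grams across the corpus, then greedily split each document into the longest n-grams that meet a minimum support. This is not the phrase mining framework proposed in the paper; the function names, the greedy rule, and the toy corpus are all assumptions.

```python
from collections import Counter

def count_ngrams(docs, max_len):
    """Count every contiguous n-gram (n <= max_len) over tokenized documents."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def segment(tokens, counts, min_support, max_len):
    """Greedy left-to-right segmentation: at each position take the longest
    n-gram whose corpus count is at least min_support, else a single token."""
    phrases, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or counts[candidate] >= min_support:
                phrases.append(" ".join(candidate))
                i += n
                break
    return phrases

# Hypothetical toy corpus.
corpus = [
    "support vector machine training is fast",
    "we train a support vector machine",
    "a support vector machine needs labels",
    "topic models ignore word order",
]
docs = [text.split() for text in corpus]
counts = count_ngrams(docs, max_len=3)
partition = segment(docs[1], counts, min_support=3, max_len=3)
print(partition)
```

A topic model operating on such a partition would then assign one topic to each phrase rather than to each individual word.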
We study SIS epidemic spreading processes unfolding on a recent generalisation of the activity-driven modelling framework. In this model of time-varying networks, each node is described by two variables: activity and attractiveness. The first describes the propensity to form connections; the second defines the propensity to attract them. We derive the epidemic threshold analytically, treating the timescales driving the evolution of contacts and the contagion as comparable. The solutions are general and hold for any joint distribution of activity and attractiveness. The theoretical picture is confirmed via large-scale numerical simulations performed considering heterogeneous distributions and different correlations between the two variables. We find that heterogeneous distributions of attractiveness alter the contagion process. In particular, when the two variables are uncorrelated or positively correlated, heterogeneous attractiveness facilitates the spreading. In contrast, negative correlations between activity and attractiveness hamper the spreading. The results presented contribute to the understanding of the dynamical properties of time-varying networks and their effects on contagion phenomena unfolding on their fabric.
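The model described above can be sketched as a discrete-time simulation: at each step every node activates with a probability given by its activity, creates $m$ links to targets chosen in proportion to attractiveness, and the SIS contagion runs over the links of that step before they are discarded. The update rules, parameter names, and values below are assumptions made for illustration; the paper's analytical threshold is not reproduced.

```python
import itertools
import random

def simulate_sis(activity, attractiveness, lam, mu, m=2, steps=200,
                 seed_fraction=0.05, seed=0):
    """SIS dynamics on an activity-driven network with attractiveness.
    activity[i]: per-step activation probability of node i;
    attractiveness[i]: weight with which node i is chosen as a link target;
    lam: per-contact infection probability; mu: per-step recovery probability.
    Returns the fraction of infected nodes at the end of the run."""
    rng = random.Random(seed)
    n = len(activity)
    nodes = range(n)
    cum_weights = list(itertools.accumulate(attractiveness))
    infected = set(rng.sample(nodes, max(1, int(seed_fraction * n))))
    for _ in range(steps):
        newly_infected = set()
        for i in nodes:
            if rng.random() >= activity[i]:
                continue                      # node i is inactive this step
            for j in rng.choices(nodes, cum_weights=cum_weights, k=m):
                if j == i:
                    continue                  # ignore self-loops
                # Undirected contact: infection can pass in either direction.
                if (i in infected) != (j in infected) and rng.random() < lam:
                    newly_infected.add(j if i in infected else i)
        recovered = {i for i in infected if rng.random() < mu}
        infected = (infected - recovered) | newly_infected
        if not infected:
            break                             # absorbing disease-free state
    return len(infected) / n

n = 300
activity = [0.5] * n
attractiveness = [1.0 + (i % 10) for i in range(n)]   # mildly heterogeneous
endemic = simulate_sis(activity, attractiveness, lam=1.0, mu=0.01, steps=100)
extinct = simulate_sis(activity, attractiveness, lam=0.0, mu=1.0, steps=100)
print("endemic prevalence:", endemic, "no-transmission prevalence:", extinct)
```

Sweeping `lam` for fixed `mu` and different joint distributions of the two variables is one way to locate the threshold numerically and compare the uncorrelated, positively correlated, and negatively correlated cases.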