The occurrence of new events in a system is typically driven by external causes and by previous events taking place inside the system. This is a general statement that applies to a range of situations including, more recently, the activity of users in online social networks (OSNs). Here we develop a method for extracting from a series of posting times the relative contributions of exogenous factors, e.g. news media, and endogenous factors, e.g. information cascades. The method is based on fitting a generalized linear model (GLM) equipped with a self-excitation mechanism. We test the method with synthetic data generated by a nonlinear Hawkes process, and apply it to a real time series of tweets with a given hashtag. In the empirical dataset, the estimated contributions of exogenous and endogenous volumes are close to the amounts of original tweets and retweets respectively. We conclude by discussing the possible applications of the method, for instance in online marketing.
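The split between exogenous and endogenous events can be illustrated with a minimal sketch. It assumes a linear Hawkes process with an exponential kernel, simulated through its branching (cluster) representation; this is a simplification, not the nonlinear process or the GLM fit described in the abstract. The names `simulate_hawkes`, `mu`, `n`, and `tau` are hypothetical choices for this illustration. Every event is labeled at generation time, so the endogenous share can be read off directly and compared with the branching ratio `n`.

```python
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals in [0, lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_hawkes(mu, n, tau, horizon, seed=0):
    """Simulate a linear Hawkes process on [0, horizon) as a branching process.

    mu  : rate of exogenous (immigrant) events            (assumed parameter)
    n   : mean number of events triggered by each event   (branching ratio, n < 1)
    tau : mean delay between an event and its offspring   (exponential kernel)
    Returns a time-sorted list of (time, label) with label "exo" or "endo".
    """
    rng = random.Random(seed)
    pending = []
    # Exogenous events: homogeneous Poisson process with rate mu.
    t = rng.expovariate(mu)
    while t < horizon:
        pending.append((t, "exo"))
        t += rng.expovariate(mu)
    events = []
    # Each event, exogenous or not, triggers Poisson(n) endogenous offspring.
    while pending:
        time, label = pending.pop()
        events.append((time, label))
        for _ in range(poisson(rng, n)):
            child_time = time + rng.expovariate(1.0 / tau)
            if child_time < horizon:
                pending.append((child_time, "endo"))
    events.sort()
    return events


events = simulate_hawkes(mu=1.0, n=0.5, tau=1.0, horizon=2000.0, seed=1)
endo_fraction = sum(label == "endo" for _, label in events) / len(events)
print(f"{len(events)} events, endogenous fraction = {endo_fraction:.3f}")
```

In a long stationary run the endogenous fraction should be close to `n`. An estimation method of the kind the abstract describes would have to recover this share from the event times alone, without access to the labels.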
Are large biological extinctions such as the Cretaceous/Tertiary (KT) boundary due to a meteorite, extreme volcanic activity or self-organized critical extinction cascades? Are commercial successes due to a progressive reputation cascade or the result of a well-orchestrated advertisement? Determining the chain of causality for extreme events in complex systems requires disentangling interwoven exogenous and endogenous contributions with either no clear or too many signatures. Here, I review several efforts carried out with collaborators, which suggest a general strategy for understanding the organization of several complex systems under the dual effect of endogenous and exogenous fluctuations. The studied examples are: Internet download shocks, book sale shocks, social shocks, financial volatility shocks, and financial crashes. Simple models are offered to quantitatively relate the endogenous organization to the exogenous response of the system. Suggestions for applications of these ideas to many other systems are offered.
Daily interactions naturally define social circles. Individuals tend to be friends with the people they spend time with and they choose to spend time with their friends, inextricably entangling physical location and social relationships. As a result, it is possible to predict not only someone's location from their friends' locations but also friendship from spatial and temporal co-occurrence. While several models have been developed to separately describe mobility and the evolution of social networks, there is a lack of studies coupling social interactions and mobility. In this work, we introduce a new model that bridges this gap by explicitly considering the feedback of mobility on the formation of social ties. Data from three online social networks (Twitter, Gowalla and Brightkite) are used for validation. Our model reproduces various topological and physical properties of these networks, such as: i) the size of the connected components, ii) the distance distribution between connected users, iii) the dependence of the reciprocity on the distance, iv) the variation of the social overlap and the clustering with the distance. Besides numerical simulations, a mean-field approach is also used to study analytically the main statistical features of the networks generated by the model. The robustness of the results to changes in the model parameters is explored, finding that a balance between friend visits and long-range random connections is essential to reproduce the geographical features of the empirical networks.
Recent widespread adoption of electronic and pervasive technologies has enabled the study of human behavior at an unprecedented level, uncovering universal patterns underlying human activity, mobility, and inter-personal communication. In the present work, we investigate whether deviations from these universal patterns may reveal information about the socio-economic status of geographical regions. We quantify the extent to which deviations in diurnal rhythm, mobility patterns, and communication styles across regions relate to their unemployment incidence. For this we examine a country-scale publicly articulated social media dataset, where we quantify individual behavioral features from over 145 million geo-located messages distributed among more than 340 different Spanish economic regions, inferred by computing communities of cohesive mobility fluxes. We find that regions exhibiting more diverse mobility fluxes, earlier diurnal rhythms, and more correct grammatical styles display lower unemployment rates. As a result, we provide a simple model able to produce accurate, easily interpretable reconstruction of regional unemployment incidence from their social-media digital fingerprints alone. Our results show that cost-effective economic indicators can be built based on publicly available social media datasets.
A number of predictors have been suggested to detect the most influential spreaders of information in online social media across various domains such as Twitter or Facebook. In particular, degree, PageRank, k-core and other centralities have been adopted to rank the spreading capability of users in information dissemination media. So far, validation of the proposed predictors has been done by simulating the spreading dynamics rather than following real information flow in social networks. Consequently, only model-dependent, contradictory results have been obtained for the best predictor. Here, we address this issue directly. We search for influential spreaders by following the real spreading dynamics in a wide range of networks. We find that the widely used degree and PageRank fail in ranking users' influence. We find that the best spreaders are consistently located in the k-core across dissimilar social platforms such as Twitter, Facebook, LiveJournal and scientific publishing in the American Physical Society. Furthermore, when the complete global network structure is unavailable, we find that the sum of the nearest neighbors' degree is a reliable local proxy for users' influence. Our analysis provides practical instructions for optimal design of strategies for viral information dissemination in relevant applications.
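The three quantities compared in this abstract (degree, k-core index, and the sum of the nearest neighbors' degrees) can be computed with a short sketch. The toy graph and the function names below are hypothetical and chosen only for illustration; they are not the datasets or code of the study. The k-core index is obtained with the standard peeling procedure, which repeatedly removes the node of smallest remaining degree.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def core_numbers(adj):
    """k-core index of every node, by repeatedly peeling the minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])          # the core index never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def neighbor_degree_sum(adj):
    """Local proxy: for each node, the sum of its neighbors' degrees."""
    return {v: sum(len(adj[u]) for u in nbrs) for v, nbrs in adj.items()}


# Toy graph (hypothetical): a 4-clique {a, b, c, d} and a hub h with five leaves,
# joined by the single edge a-h.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("a", "h")] + [("h", f"l{i}") for i in range(1, 6)]
adj = build_adjacency(edges)
core = core_numbers(adj)
proxy = neighbor_degree_sum(adj)
for node in ("h", "a", "b"):
    print(node, "degree:", len(adj[node]), "k-core:", core[node], "proxy:", proxy[node])
```

In this toy graph the hub `h` has the largest degree but sits in the 1-core, while the clique members sit in the 3-core. The neighbor-degree sum also ranks `a` above `h`, which illustrates how a purely local quantity can disagree with degree.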
Although the many forms of modern social media have become major channels for the dissemination of information, they are becoming overloaded because of the rapidly expanding number of information feeds. We analyze the expanding user-generated content in Sina Weibo, the largest micro-blog site in China, and find evidence that popular messages often follow a mechanism that differs from that found in the spread of disease, in contrast to common belief. In this mechanism, an individual with more friends needs more repeated exposures to spread the information further. Moreover, our data suggest that, in contrast to epidemics, for certain messages the probability that an individual shares the message is proportional to the fraction of their neighbours who shared it with them. Thus the greater the number of friends an individual has, the greater the number of repeated contacts needed to spread the message, which is a result of competition for attention. We model this process using a fractional susceptible-infected-recovered (FSIR) model, where the infection probability of a node is proportional to its fraction of infected neighbors. Our findings have dramatic implications for information contagion. For example, using the FSIR model we find that real-world social networks have a finite epidemic threshold. This is in contrast to the zero threshold that conventional wisdom derives from disease epidemic models. This means that when individuals are overloaded with excess information feeds, the information either reaches the population if it is above the critical epidemic threshold, or is never well received, so that only a handful of information items can spread widely throughout the population.
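The fractional infection rule can be sketched as a small simulation. The abstract states only that the infection probability of a node is proportional to its fraction of infected neighbors; everything else below is an assumption made for illustration: the proportionality constant `beta`, discrete synchronous time steps, and recovery after exactly one step. The function names are hypothetical.

```python
import random


def infection_probability(degree, infected_neighbors, beta):
    """FSIR rule: probability proportional to the fraction of infected neighbors."""
    return beta * infected_neighbors / degree


def simulate_fsir(adj, beta, seeds, rng):
    """Run a discrete-time fractional SIR process until no node is infected.

    adj   : {node: iterable of neighbors} for an undirected graph
    beta  : proportionality constant in [0, 1]      (assumed parameter)
    seeds : initially infected nodes
    Infected nodes recover after one step (assumption). Returns {node: "S" or "R"}.
    """
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"
    while any(x == "I" for x in state.values()):
        infected = [v for v, x in state.items() if x == "I"]
        newly_infected = []
        for v, x in state.items():
            nbrs = list(adj[v])
            if x != "S" or not nbrs:
                continue
            m = sum(state[u] == "I" for u in nbrs)
            if rng.random() < infection_probability(len(nbrs), m, beta):
                newly_infected.append(v)
        # Synchronous update: current spreaders recover, new ones become infected.
        for v in infected:
            state[v] = "R"
        for v in newly_infected:
            state[v] = "I"
    return state


# With one infected neighbor, a node with 10 friends is ten times less likely
# to be infected than a node with a single friend.
print(infection_probability(1, 1, beta=0.5), infection_probability(10, 1, beta=0.5))

# Small ring of 20 nodes, each linked to its two nearest neighbors (hypothetical example).
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
final = simulate_fsir(ring, beta=0.9, seeds=[0], rng=random.Random(42))
print("nodes reached:", sum(x == "R" for x in final.values()))
```

Dividing by the degree is what distinguishes this rule from a standard SIR update, in which each infected contact transmits independently of how many other neighbors a node has. It is also why high-degree nodes need more repeated exposures under this sketch.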