No Arabic abstract
Modeling online discourse dynamics is a core activity in understanding the spread of information, both offline and online, and emergent online behavior. There is currently a disconnect between the practitioners of online social media analysis -- usually social, political and communication scientists -- and the accessibility to tools capable of examining online discussions of users. Here we present evently, a tool for modeling online reshare cascades, and particularly retweet cascades, using self-exciting processes. It provides a comprehensive set of functionalities for processing raw data from Twitter public APIs, modeling the temporal dynamics of processed retweet cascades and characterizing online users with a wide range of diffusion measures. This tool is designed for researchers with a wide range of computer expertise, and it includes tutorials and detailed documentation. We illustrate the usage of evently with an end-to-end analysis of online user behavior on a topical dataset relating to COVID-19. We show that, by characterizing users solely based on how their content spreads online, we can disentangle influential users and online bots.
It is well-known that online behavior is long-tailed, with most cascaded actions being short and a few being very long. A prominent drawback in generative models for online events is the inability to describe unpopular items well. This work addresses these shortcomings by proposing dual mixture self-exciting processes to jointly learn from groups of cascades. We first start from the observation that maximum likelihood estimates for content virality and influence decay are separable in a Hawkes process. Next, our proposed model, which leverages a Borel mixture model and a kernel mixture model, jointly models the unfolding of a heterogeneous set of cascades. When applied to cascades of the same online items, the model directly characterizes their spread dynamics and supplies interpretable quantities, such as content virality and content influence decay, as well as methods for predicting the final content popularities. On two retweet cascade datasets -- one relating to YouTube videos and the second relating to controversial news articles -- we show that our models capture the differences between online items at the granularity of items, publishers and categories. In particular, we are able to distinguish between far-right, conspiracy, controversial and reputable online news articles based on how they diffuse through social media, achieving an F1 score of 0.945. On holdout datasets, we show that the dual mixture model provides, for reshare diffusion cascades especially unpopular ones, better generalization performance and, for online items, accurate item popularity predictions.
The contagion dynamics can emerge in social networks when repeated activation is allowed. An interesting example of this phenomenon is retweet cascades where users allow to re-share content posted by other people with public accounts. To model this type of behaviour we use a Hawkes self-exciting process. To do it properly though one needs to calibrate model under consideration. The main goal of this paper is to construct moments method of estimation of this model. The key step is based on identifying of a generator of a Hawkes process. We perform numerical analysis on real data as well.
Epidemic models and self-exciting processes are two types of models used to describe diffusion phenomena online and offline. These models were originally developed in different scientific communities, and their commonalities are under-explored. This work establishes, for the first time, a general connection between the two model classes via three new mathematical components. The first is a generalized version of stochastic Susceptible-Infected-Recovered (SIR) model with arbitrary recovery time distributions; the second is the relationship between the (latent and arbitrary) recovery time distribution, recovery hazard function, and the infection kernel of self-exciting processes; the third includes methods for simulating, fitting, evaluating and predicting the generalized process. On three large Twitter diffusion datasets, we conduct goodness-of-fit tests and holdout log-likelihood evaluation of self-exciting processes with three infection kernels --- exponential, power-law and Tsallis Q-exponential. We show that the modeling performance of the infection kernels varies with respect to the temporal structures of diffusions, and also with respect to user behavior, such as the likelihood of being bots. We further improve the prediction of popularity by combining two models that are identified as complementary by the goodness-of-fit tests.
We consider a 2-dimensional marked Hawkes process with increasing baseline intensity in order to model prices on electricity intraday markets. This model allows to represent different empirical facts such as increasing market activity, random jump sizes but above all microstructure noise through the signature plot. This last feature is of particular importance for practitioners and has not yet been modeled on those particular markets. We provide analytic formulas for first and second moments and for the signature plot, extending the classic results of Bacry et al. (2013) in the context of Hawkes processes with random jump sizes and time dependent baseline intensity. The tractable model we propose is estimated on German data and seems to fit the data well. We also provide a result about the convergence of the price process to a Brownian motion with increasing volatility at macroscopic scales, highlighting the Samuelson effect.
Sequences of events including infectious disease outbreaks, social network activities, and crimes are ubiquitous and the data on such events carry essential information about the underlying diffusion processes between communities (e.g., regions, online user groups). Modeling diffusion processes and predicting future events are crucial in many applications including epidemic control, viral marketing, and predictive policing. Hawkes processes offer a central tool for modeling the diffusion processes, in which the influence from the past events is described by the triggering kernel. However, the triggering kernel parameters, which govern how each community is influenced by the past events, are assumed to be static over time. In the real world, the diffusion processes depend not only on the influences from the past, but also the current (time-evolving) states of the communities, e.g., peoples awareness of the disease and peoples current interests. In this paper, we propose a novel Hawkes process model that is able to capture the underlying dynamics of community states behind the diffusion processes and predict the occurrences of events based on the dynamics. Specifically, we model the latent dynamic function that encodes these hidden dynamics by a mixture of neural networks. Then we design the triggering kernel using the latent dynamic function and its integral. The proposed method, termed DHP (Dynamic Hawkes Processes), offers a flexible way to learn complex representations of the time-evolving communities states, while at the same time it allows to computing the exact likelihood, which makes parameter learning tractable. Extensive experiments on four real-world event datasets show that DHP outperforms five widely adopted methods for event prediction.