No Arabic abstract
Hard-to-predict bursts of COVID-19 pandemic revealed significance of statistical modeling which would resolve spatio-temporal correlations over geographical areas, for example spread of the infection over a city with census tract granularity. In this manuscript, we provide algorithmic answers to the following two inter-related public health challenges. (1) Inference Challenge: assuming that there are $N$ census blocks (nodes) in the city, and given an initial infection at any set of nodes, what is the probability for a subset of census blocks to become infected by the time the spread of the infection burst is stabilized? (2) Prevention Challenge: What is the minimal control action one can take to minimize the infected part of the stabilized state footprint? To answer the challenges, we build a Graphical Model of pandemic of the attractive Ising (pair-wise, binary) type, where each node represents a census track and each edge factor represents the strength of the pairwise interaction between a pair of nodes. We show that almost all attractive Ising Models on dense graphs result in either of the two modes for the most probable state: either all nodes which were not infected initially became infected, or all the initially uninfected nodes remain uninfected. This bi-modal solution of the Inference Challenge allows us to re-state the Prevention Challenge as the following tractable convex programming: for the bare Ising Model with pair-wise and bias factors representing the system without prevention measures, such that the MAP state is fully infected for at least one of the initial infection patterns, find the closest, in $l_1$ norm, set of factors resulting in all the MAP states of the Ising model, with the optimal prevention measures applied, to become safe.
Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with $n$ vertices require solving an SDP with $n^{Theta(d)}$ variables where $d$ is the degree in the hierarchy. In practice, for $dge 4$, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.
In social networks, by removing some target-sensitive links, privacy protection might be achieved. However, some hidden links can still be re-observed by link prediction methods on observable networks. In this paper, the conventional link prediction method named Resource Allocation Index (RA) is adopted for privacy attacks. Several defense methods are proposed, including heuristic and evolutionary approaches, to protect targeted links from RA attacks via evolutionary perturbations. This is the first time to study privacy protection on targeted links against link-prediction-based attacks. Some links are randomly selected from the network as targeted links for experimentation. The simulation results on six real-world networks demonstrate the superiority of the evolutionary perturbation approach for target defense against RA attacks. Moreover, transferring experiments show that, although the evolutionary perturbation approach is designed to against RA attacks, it is also effective against other link-prediction-based attacks.
Discussion of Latent variable graphical model selection via convex optimization by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].
Probabilistic graphical models, such as Markov random fields (MRF), exploit dependencies among random variables to model a rich family of joint probability distributions. Sophisticated inference algorithms, such as belief propagation (BP), can effectively compute the marginal posteriors. Nonetheless, it is still difficult to interpret the inference outcomes for important human decision making. There is no existing method to rigorously attribute the inference outcomes to the contributing factors of the graphical models. Shapley values provide an axiomatic framework, but naively computing or even approximating the values on general graphical models is challenging and less studied. We propose GraphShapley to integrate the decomposability of Shapley values, the structure of MRFs, and the iterative nature of BP inference in a principled way for fast Shapley value computation, that 1) systematically enumerates the important contributions to the Shapley values of the explaining variables without duplicate; 2) incrementally compute the contributions without starting from scratches. We theoretically characterize GraphShapley regarding independence, equal contribution, and additivity. On nine graphs, we demonstrate that GraphShapley provides sensible and practical explanations.
Predicting popularity, or the total volume of information outbreaks, is an important subproblem for understanding collective behavior in networks. Each of the two main types of recent approaches to the problem, feature-driven and generative models, have desired qualities and clear limitations. This paper bridges the gap between these solutions with a new hybrid approach and a new performance benchmark. We model each social cascade with a marked Hawkes self-exciting point process, and estimate the content virality, memory decay, and user influence. We then learn a predictive layer for popularity prediction using a collection of cascade history. To our surprise, Hawkes process with a predictive overlay outperform recent feature-driven and generative approaches on existing tweet data [43] and a new public benchmark on news tweets. We also found that a basic set of user features and event time summary statistics performs competitively in both classification and regression tasks, and that adding point process information to the feature set further improves predictions. From these observations, we argue that future work on popularity prediction should compare across feature-driven and generative modeling approaches in both classification and regression tasks.