We study the problem of learning influence functions under incomplete observations of node activations. Incomplete observations are a major concern as most (online and real-world) social networks are not fully observable. We establish both proper and improper PAC learnability of influence functions under randomly missing observations. Proper PAC learnability under the Discrete-Time Linear Threshold (DLT) and Discrete-Time Independent Cascade (DIC) models is established by reducing incomplete observations to complete observations in a modified graph. Our improper PAC learnability result applies for the DLT and DIC models as well as the Continuous-Time Independent Cascade (CIC) model. It is based on a parametrization in terms of reachability features, and also gives rise to an efficient and practical heuristic. Experiments on synthetic and real-world datasets demonstrate the ability of our method to compensate even for a fairly large fraction of missing observations.
Although recent works have developed methods that can generate estimations (or imputations) of the missing entries in a dataset to facilitate downstream analysis, most depend on assumptions that may not align with real-world applications and could suffer from poor performance in subsequent tasks. This is particularly true if the data have large missingness rates or a small population. More importantly, the imputation error could be propagated into the prediction step that follows, causing the gradients used to train the prediction models to be biased. Consequently, in this work, we introduce the importance guided stochastic gradient descent (IGSGD) method to train multilayer perceptrons (MLPs) and long short-term memories (LSTMs) to directly perform inference from inputs containing missing values without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. This not only reduces bias but allows the model to exploit the underlying information behind missingness patterns. We test the proposed approach on real-world time-series (i.e., MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (i.e., MNIST), where our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
Spreading processes play an increasingly important role in modeling for diffusion networks, information propagation, marketing and opinion setting. We address the problem of learning of a spreading model such that the predictions generated from this model are accurate and could be subsequently used for the optimization, and control of diffusion dynamics. We focus on a challenging setting where full observations of the dynamics are not available, and standard approaches such as maximum likelihood quickly become intractable for large network instances. We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach, which is able to learn parameters of the effective spreading model given only limited information on the activation times of nodes in the network. The popular Independent Cascade model is used to illustrate our approach. We show that tractable inference from the learned model generates a better prediction of marginal probabilities compared to the original model. We develop a systematic procedure for learning a mixture of models which further improves the prediction quality.
The problem of predicting peoples participation in real-world events has received considerable attention as it offers valuable insights for human behavior analysis and event-related advertisement. Today social networks (e.g. Twitter) widely reflect large popular events where people discuss their interest with friends. Event participants usually stimulate friends to join the event which propagates a social influence in the network. In this paper, we propose to model the social influence of friends on event attendance. We consider non-geotagged posts besides structures of social groups to infer users attendance. To leverage the information on network topology we apply some of recent graph embedding techniques such as node2vec, HARP and Poincar`e. We describe the approach followed to design the feature space and feed it to a neural network. The performance evaluation is conducted using two large music festivals datasets, namely the VFestival and Creamfields. The experimental results show that our classifier outperforms the state-of-the-art baseline with 89% accuracy observed for the VFestival dataset.
We study nonlinear dynamics on complex networks. Each vertex $i$ has a state $x_i$ which evolves according to a networked dynamics to a steady-state $x_i^*$. We develop fundamental tools to learn the true steady-state of a small part of the network, without knowing the full network. A naive approach and the current state-of-the-art is to follow the dynamics of the observed partial network to local equilibrium. This dramatically fails to extract the true steady state. We use a mean-field approach to map the dynamics of the unseen part of the network to a single node, which allows us to recover accurate estimates of steady-state on as few as 5 observed vertices in domains ranging from ecology to social networks to gene regulation. Incomplete networks are the norm in practice, and we offer new ways to think about nonlinear dynamics when only sparse information is available.
Attributed networks are ubiquitous since a network often comes with auxiliary attribute information e.g. a social network with user profiles. Attributed Network Embedding (ANE) has recently attracted considerable attention, which aims to learn unified low dimensional node embeddings while preserving both structural and attribute information. The resulting node embeddings can then facilitate various network downstream tasks e.g. link prediction. Although there are several ANE methods, most of them cannot deal with incomplete attributed networks with missing links and/or missing node attributes, which often occur in real-world scenarios. To address this issue, we propose a robust ANE method, the general idea of which is to reconstruct a unified denser network by fusing two sources of information for information enhancement, and then employ a random walks based network embedding method for learning node embeddings. The experiments of link prediction, node classification, visualization, and parameter sensitivity analysis on six real-world datasets validate the effectiveness of our method to incomplete attributed networks.