No Arabic abstract
We propose a detailed analysis of the online-learning problem for Independent Cascade (IC) models under node-level feedback. These models have widespread applications in modern social networks. Existing works for IC models have only shed light on edge-level feedback models, where the agent knows the explicit outcome of every observed edge. Little is known about node-level feedback models, where only combined outcomes for sets of edges are observed; in other words, the realization of each edge is censored. This censored information, together with the nonlinear form of the aggregated influence probability, make both parameter estimation and algorithm design challenging. We establish the first confidence-region result under this setting. We also develop an online algorithm achieving a cumulative regret of $mathcal{O}( sqrt{T})$, matching the theoretical regret bound for IC models with edge-level feedback.
We study the online influence maximization (OIM) problem in social networks, where in multiple rounds the learner repeatedly chooses seed nodes to generate cascades, observes the cascade feedback, and gradually learns the best seeds that generate the largest cascade. We focus on two major challenges in this paper. First, we work with node-level feedback instead of edge-level feedback. The edge-level feedback reveals all edges that pass through information in a cascade, where the node-level feedback only reveals the activated nodes with timestamps. The node-level feedback is arguably more realistic since in practice it is relatively easy to observe who is influenced but very difficult to observe from which relationship (edge) the influence comes from. Second, we use standard offline oracle instead of offline pair-oracle. To compute a good seed set for the next round, an offline pair-oracle finds the best seed set and the best parameters within the confidence region simultaneously, and such an oracle is difficult to compute due to the combinatorial core of OIM problem. So we focus on how to use the standard offline influence maximization oracle which finds the best seed set given the edge parameters as input. In this paper, we resolve these challenges for the two most popular diffusion models, the independent cascade (IC) and the linear threshold (LT) model. For the IC model, the past research only achieves edge-level feedback, while we present the first $widetilde{O}(sqrt{T})$-regret algorithm for the node-level feedback. Besides, the algorithm only invokes standard offline oracles. For the LT model, a recent study only provides an OIM solution that meets the first challenge but still requires a pair-oracle. In this paper, we apply a similar technique as in the IC model to replace the pair-oracle with a standard oracle while maintaining $widetilde{O}(sqrt{T})$-regret.
We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of best influencers in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under a stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights on the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.
Spreading processes play an increasingly important role in modeling for diffusion networks, information propagation, marketing and opinion setting. We address the problem of learning of a spreading model such that the predictions generated from this model are accurate and could be subsequently used for the optimization, and control of diffusion dynamics. We focus on a challenging setting where full observations of the dynamics are not available, and standard approaches such as maximum likelihood quickly become intractable for large network instances. We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach, which is able to learn parameters of the effective spreading model given only limited information on the activation times of nodes in the network. The popular Independent Cascade model is used to illustrate our approach. We show that tractable inference from the learned model generates a better prediction of marginal probabilities compared to the original model. We develop a systematic procedure for learning a mixture of models which further improves the prediction quality.
We formulate a new problem at the intersectionof semi-supervised learning and contextual bandits,motivated by several applications including clini-cal trials and ad recommendations. We demonstratehow Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted tothe new problem formulation. We also propose avariant of the linear contextual bandit with semi-supervised missing rewards imputation. We thentake the best of both approaches to develop multi-GCN embedded contextual bandit. Our algorithmsare verified on several real world datasets.
We study online learning when partial feedback information is provided following every action of the learning process, and the learner incurs switching costs for changing his actions. In this setting, the feedback information system can be represented by a graph, and previous works studied the expected regret of the learner in the case of a clique (Expert setup), or disconnected single loops (Multi-Armed Bandits (MAB)). This work provides a lower bound on the expected regret in the Partial Information (PI) setting, namely for general feedback graphs --excluding the clique. Additionally, it shows that all algorithms that are optimal without switching costs are necessarily sub-optimal in the presence of switching costs, which motivates the need to design new algorithms. We propose two new algorithms: Threshold Based EXP3 and EXP3. SC. For the two special cases of symmetric PI setting and MAB, the expected regret of both of these algorithms is order optimal in the duration of the learning process. Additionally, Threshold Based EXP3 is order optimal in the switching cost, whereas EXP3. SC is not. Finally, empirical evaluations show that Threshold Based EXP3 outperforms the previously proposed order-optimal algorithms EXP3 SET in the presence of switching costs, and Batch EXP3 in the MAB setting with switching costs.