We study the problem of online influence maximization in social networks. In this problem, a learner aims to identify the set of best influencers in a network by interacting with it, i.e., by repeatedly selecting seed nodes and observing activation feedback in the network. We capitalize on an important property of the influence maximization problem known as network assortativity, which is ignored by most existing work on online influence maximization. To realize network assortativity, we factorize the activation probability on each edge into latent factors on the corresponding nodes: an influence factor on the giving node and a susceptibility factor on the receiving node. We propose an upper confidence bound based online learning solution to estimate the latent factors, and therefore the activation probabilities. Our factorization-based online influence maximization algorithm achieves a considerable regret reduction. Extensive empirical evaluations on two real-world networks demonstrate the effectiveness of the proposed solution.
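The factorization described above can be illustrated with a small sketch. It assumes the edge activation probability is the inner product of the giving node's influence factor and the receiving node's susceptibility factor, clipped to [0, 1]; the abstract does not state the exact functional form, so the inner product, the additive `bonus` term, and all function names here are illustrative assumptions rather than the authors' method.

```python
def dot(a, b):
    """Inner product of two equal-length sequences of floats."""
    return sum(x * y for x, y in zip(a, b))


def edge_probability(influence_u, susceptibility_v):
    """Activation probability of edge (u, v) from the two node factors.

    Assumption: the probability is the inner product of the factors,
    clipped to the valid range [0, 1].
    """
    return min(1.0, max(0.0, dot(influence_u, susceptibility_v)))


def optimistic_edge_probability(influence_u, susceptibility_v, bonus):
    """UCB-style estimate: point estimate plus a non-negative bonus.

    `bonus` is a hypothetical confidence width supplied by the caller;
    how it is computed is not specified in the abstract.
    """
    if bonus < 0:
        raise ValueError("bonus must be non-negative")
    return min(1.0, edge_probability(influence_u, susceptibility_v) + bonus)


# Example: node u with influence (0.5, 0.2), node v with susceptibility (0.4, 0.5).
p_uv = edge_probability([0.5, 0.2], [0.4, 0.5])
```

With this parameterization a network of n nodes and d-dimensional factors needs 2nd parameters instead of one parameter per edge, which is one way to read the regret reduction claimed above.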
We propose a cumulative oversampling (CO) method for online learning. Our key idea is to sample parameter estimations from the updated belief space once in each round (similar to Thompson Sampling), and utilize the cumulative samples up to the current round to construct optimistic parameter estimations that asymptotically concentrate around the true parameters as tighter upper confidence bounds compared to the ones constructed with standard UCB methods. We apply CO to a novel budgeted variant of the Influence Maximization (IM) semi-bandits with linear generalization of edge weights, whose offline problem is NP-hard. Combining CO with the oracle we design for the offline problem, our online learning algorithm simultaneously tackles budget allocation, parameter learning, and reward maximization. We show that for IM semi-bandits, our CO-based algorithm achieves a scaled regret comparable to that of the UCB-based algorithms in theory, and performs on par with Thompson Sampling in numerical experiments.
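One possible reading of the cumulative oversampling idea is sketched below for a single scalar parameter. The abstract does not give the construction, so everything specific here is an assumption: the belief is taken to be Gaussian, one standard normal draw is made per round, and the optimistic estimate scales the running maximum of all draws by the current posterior standard deviation. The class and method names are hypothetical.

```python
import random


class CumulativeOversampler:
    """Keeps the cumulative samples (here: their running maximum) across rounds."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.max_draw = float("-inf")  # largest standard normal draw so far
        self.rounds = 0

    def new_round(self):
        """Draw exactly one sample for this round, as in Thompson Sampling."""
        z = self._rng.gauss(0.0, 1.0)
        self.max_draw = max(self.max_draw, z)
        self.rounds += 1
        return z

    def optimistic_estimate(self, mean, std):
        """Optimistic parameter estimate built from the cumulative samples.

        Assumption: estimate = posterior mean + posterior std * running max.
        As `std` shrinks with more data, the estimate concentrates around `mean`.
        """
        if self.rounds == 0:
            raise ValueError("call new_round() at least once first")
        return mean + std * self.max_draw


sampler = CumulativeOversampler(seed=1)
for _ in range(50):
    sampler.new_round()
estimate = sampler.optimistic_estimate(mean=0.3, std=0.1)
```

Because the running maximum grows slowly with the number of rounds while the posterior standard deviation shrinks, the estimate behaves like an upper confidence bound that tightens over time, which matches the qualitative behavior described above.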
Recommender systems should adapt to user interests as the latter evolve. A prevalent cause for the evolution of user interests is the influence of their social circle. In general, when the interests are not known, online algorithms that explore the recommendation space while also exploiting observed preferences are preferable. We present online recommendation algorithms rooted in the linear multi-armed bandit literature. Our bandit algorithms are tailored precisely to recommendation scenarios where user interests evolve under social influence. In particular, we show that our adaptations of the classic LinREL and Thompson Sampling algorithms maintain the same asymptotic regret bounds as in the non-social case. We validate our approach experimentally using both synthetic and real datasets.
Influence maximization is the task of finding the smallest set of nodes whose activation in a social network can trigger an activation cascade that reaches the targeted network coverage, where threshold rules determine the outcome of influence. This problem is NP-hard, and it has generated a significant amount of recent research on finding efficient heuristics. We focus on a Balance Index (BI) algorithm that relies on three parameters to tune its performance to the given network structure. We propose using a supervised machine-learning approach for such tuning. We select the most influential graph features for the parameter tuning. Then, using random-walk-based graph sampling, we create small snapshots from the given synthetic and large-scale real-world networks. Using exhaustive search, we find for these snapshots high-accuracy values of the BI parameters to use as ground truth. Then, we train our machine-learning model on the snapshots and apply this model to the real-world network to find the best BI parameters. We apply these parameters to the sampled real-world network to measure the quality of the sets of initiators found this way. We use various real-world networks to validate our approach against other heuristics.
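The snapshot-creation step can be sketched as a random walk with restarts that collects nodes until a target size is reached and then returns the induced subgraph. The abstract only says the sampling is random-walk-based, so the restart probability, the step cap, the adjacency-list representation, and the function name are all assumptions made for illustration.

```python
import random


def random_walk_snapshot(adjacency, target_size, seed=0,
                         restart_prob=0.15, max_steps=100_000):
    """Sample a small induced subgraph by a random walk with restarts.

    adjacency: dict mapping each node to a list of its neighbors.
    Returns a dict in the same format restricted to the visited nodes.
    The walk stops early if `max_steps` is exhausted, so the snapshot
    may contain fewer than `target_size` nodes on poorly connected graphs.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adjacency))
    current = start
    visited = {start}
    for _ in range(max_steps):
        if len(visited) >= target_size:
            break
        neighbors = adjacency[current]
        if not neighbors or rng.random() < restart_prob:
            current = start  # restart keeps the sample local to the start node
        else:
            current = rng.choice(neighbors)
            visited.add(current)
    # Keep only edges whose endpoints were both visited (induced subgraph).
    return {u: [v for v in adjacency[u] if v in visited] for u in visited}


# Example: a ring of 10 nodes, sampled down to a 4-node snapshot.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
snapshot = random_walk_snapshot(ring, target_size=4)
```

Each snapshot is small enough for an exhaustive parameter search, which is what makes the ground-truth labeling described above feasible.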
We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of best influencers in a social network online while repeatedly interacting with it. We address the challenges of (i) a combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions, and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights into the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.
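The linear generalization mentioned above can be illustrated with a generic LinUCB-style estimator of edge activation probabilities. This is a sketch, not the IMLinUCB algorithm as published: it assumes each edge has a known feature vector, uses unit-regularized ridge regression with unit noise scale, and omits the seed-selection oracle entirely. The class name and the constant `c` are illustrative.

```python
import math


class LinearEdgeUCB:
    """Ridge-regression estimate of edge probabilities with a UCB bonus."""

    def __init__(self, dim, c=1.0):
        self.c = c  # exploration constant (assumed tuning parameter)
        # Inverse of the Gram matrix M = I + sum of x x^T, maintained directly.
        self.m_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of x * y over observed edges

    def _apply_inverse(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.m_inv]

    def update(self, x, y):
        """Record that an edge with features x was observed with outcome y in {0, 1}."""
        mx = self._apply_inverse(x)
        denom = 1.0 + sum(xi * mi for xi, mi in zip(x, mx))
        n = len(x)
        # Sherman-Morrison rank-one update of the inverse Gram matrix.
        for i in range(n):
            for j in range(n):
                self.m_inv[i][j] -= mx[i] * mx[j] / denom
        for i in range(n):
            self.b[i] += x[i] * y

    def ucb(self, x):
        """Upper confidence bound on the edge probability, projected onto [0, 1]."""
        theta = self._apply_inverse(self.b)
        mean = sum(xi * ti for xi, ti in zip(x, theta))
        width = math.sqrt(max(0.0, sum(xi * mi
                                       for xi, mi in zip(x, self._apply_inverse(x)))))
        return min(1.0, max(0.0, mean + self.c * width))


model = LinearEdgeUCB(dim=2, c=0.2)
for _ in range(3):
    model.update([1.0, 0.0], 1.0)  # the same edge activates three times
optimistic_weight = model.ucb([1.0, 0.0])
```

In an online influence maximization loop, the optimistic weights of all edges would be passed to an offline influence maximization oracle to choose the next seed set, and only the edges whose outcomes were observed in the cascade would be fed back through `update`.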
During online decision making in Multi-Armed Bandits (MAB), one needs to conduct inference on the true mean reward of each arm based on data collected so far at each step. However, since the arms are adaptively selected, thereby yielding non-iid data, conducting inference accurately is not straightforward. In particular, sample averaging, which is used in the family of UCB and Thompson sampling (TS) algorithms, is not a good choice, as it suffers from bias and lacks good statistical properties (e.g., asymptotic normality). Our thesis in this paper is that more sophisticated inference schemes that take into account the adaptive nature of the sequentially collected data can unlock further performance gains, even though both UCB and TS type algorithms are optimal in the worst case. In particular, we propose a variant of TS-style algorithms, which we call doubly adaptive TS (DATS), that leverages recent advances in causal inference and adaptively reweights the terms of a doubly robust estimator of the true mean reward of each arm. Through 20 synthetic domain experiments and a semi-synthetic experiment based on data from an A/B test of a web service, we demonstrate that using an adaptive inferential scheme (while still retaining the exploration efficacy of TS) provides clear benefits in online decision making: the proposed DATS algorithm has superior empirical performance to existing baselines (UCB and TS) in terms of regret and sample complexity in identifying the best arm. In addition, we also provide a finite-time regret bound of doubly adaptive TS that matches (up to log factors) those of UCB and TS algorithms, thereby establishing that its improved practical benefits do not come at the expense of worst-case suboptimality.
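The doubly robust estimator referred to above can be sketched for a single arm as follows. Each round contributes a score equal to a plug-in mean plus an inverse-propensity-weighted residual, and the scores are then averaged with adaptive weights. The abstract does not specify the weighting scheme, so the default of weighting each round by the square root of its propensity is only an illustrative assumption, as are the function names and the tuple layout of `history`.

```python
import math


def doubly_robust_score(reward, pulled, propensity, plug_in_mean):
    """Doubly robust score of one round for one arm.

    reward:       observed reward (ignored if the arm was not pulled)
    pulled:       True if this arm was the one selected in the round
    propensity:   probability with which the arm was selected in the round
    plug_in_mean: estimate of the arm's mean from data before this round
    """
    if not 0.0 < propensity <= 1.0:
        raise ValueError("propensity must be in (0, 1]")
    correction = (reward - plug_in_mean) / propensity if pulled else 0.0
    return plug_in_mean + correction


def weighted_dr_estimate(history, weight_fn=math.sqrt):
    """Adaptively weighted average of doubly robust scores.

    history: iterable of (reward, pulled, propensity, plug_in_mean) tuples.
    weight_fn maps a propensity to a non-negative weight; sqrt is an
    assumed, illustrative choice that down-weights low-propensity rounds.
    """
    numerator = 0.0
    denominator = 0.0
    for reward, pulled, propensity, plug_in_mean in history:
        weight = weight_fn(propensity)
        numerator += weight * doubly_robust_score(
            reward, pulled, propensity, plug_in_mean)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("history is empty or all weights are zero")
    return numerator / denominator


history = [(1.0, True, 0.5, 0.4), (0.0, False, 0.5, 0.4)]
arm_mean_estimate = weighted_dr_estimate(history)
```

The inverse-propensity correction removes the bias that adaptive arm selection introduces into a plain sample average, and the weights limit the variance contributed by rounds in which the arm was unlikely to be selected.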