We propose a cumulative oversampling (CO) method for online learning. Our key idea is to sample a parameter estimate from the updated belief space once per round (as in Thompson Sampling), and to use the cumulative samples collected up to the current round to construct optimistic parameter estimates that asymptotically concentrate around the true parameters, yielding tighter upper confidence bounds than those constructed by standard UCB methods. We apply CO to a novel budgeted variant of Influence Maximization (IM) semi-bandits with linear generalization of edge weights, whose offline problem is NP-hard. Combined with the oracle we design for the offline problem, our online learning algorithm simultaneously handles budget allocation, parameter learning, and reward maximization. We show that for IM semi-bandits, our CO-based algorithm achieves a scaled regret comparable to that of UCB-based algorithms in theory, and performs on par with Thompson Sampling in numerical experiments.
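A minimal sketch of the cumulative oversampling idea for a linear semi-bandit, assuming a Gaussian posterior over the edge-weight parameter; the class and variable names below are illustrative, not the paper's notation, and the optimistic estimate is simply the maximum prediction over all posterior samples drawn so far.

```python
import numpy as np

class CumulativeOversampling:
    """Sketch: one posterior sample per round, optimism from the cumulative samples."""

    def __init__(self, d, lam=1.0):
        self.M = lam * np.eye(d)   # posterior precision (Bayesian linear regression)
        self.b = np.zeros(d)       # accumulated feature-weighted feedback
        self.samples = []          # all posterior samples drawn so far

    def round_estimate(self, features):
        """Draw one sample this round; return per-edge optimistic estimates built
        from ALL samples drawn so far (cumulative oversampling)."""
        mean = np.linalg.solve(self.M, self.b)
        cov = np.linalg.inv(self.M)
        self.samples.append(np.random.multivariate_normal(mean, cov))
        preds = features @ np.stack(self.samples).T   # (n_edges, n_samples)
        return np.clip(preds.max(axis=1), 0.0, 1.0)   # optimistic edge-weight estimates

    def update(self, observed_features, rewards):
        """Standard ridge-style update on the edges observed this round."""
        for x, r in zip(observed_features, rewards):
            self.M += np.outer(x, x)
            self.b += r * x
```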
We study the problem of online influence maximization in social networks. In this problem, a learner aims to identify the set of best influencers in a network by interacting with it, i.e., by repeatedly selecting seed nodes and observing activation feedback in the network. We capitalize on an important property of the influence maximization problem, network assortativity, which is ignored by most existing work on online influence maximization. To exploit network assortativity, we factorize the activation probability of each edge into latent factors on its endpoint nodes: an influence factor on the originating node and a susceptibility factor on the receiving node. We propose an upper confidence bound based online learning solution that estimates the latent factors, and thereby the activation probabilities. Considerable regret reduction is achieved by our factorization-based online influence maximization algorithm, and extensive empirical evaluations on two real-world networks demonstrate the effectiveness of the proposed solution.
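An illustrative sketch of the factorized estimate (not the authors' exact estimator): the activation probability of edge (u, v) is modeled as the inner product of a latent influence vector for u and a latent susceptibility vector for v, with a UCB-style exploration bonus added on top.

```python
import numpy as np

def ucb_activation_prob(influence, susceptibility, counts, t, u, v, alpha=1.0):
    """Optimistic estimate of the activation probability on edge (u, v).

    influence[u], susceptibility[v] : latent factor vectors (learned online)
    counts[u, v]                    : number of times edge (u, v) has been observed
    t                               : current round
    """
    mean = float(influence[u] @ susceptibility[v])
    bonus = alpha * np.sqrt(np.log(t + 1) / max(counts[u, v], 1))
    return min(1.0, max(0.0, mean + bonus))
```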
In real-world machine learning applications, there is a cost associated with acquiring different features. Budgeted learning can be used to select which feature values to acquire from each instance in a dataset so that the best model is induced under a given budget. However, this approach is not possible in online learning, since one may not retroactively acquire feature values from past instances. In online learning, the challenge is to find the optimal set of features to acquire from each instance as it arrives from a data stream. In this paper we introduce the problem of online budgeted learning and describe a general framework for addressing it. We propose two types of feature-value acquisition policies based on the multi-armed bandit problem: random and adaptive. Adaptive policies adjust online to new information arriving from the data stream, while random policies are insensitive to it. Our comparative study on five real-world datasets indicates that adaptive policies outperform random policies for most budget limitations and datasets. Furthermore, we found that in some cases adaptive policies achieve near-optimal results.
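A minimal sketch of the two policy types, assuming each feature is an "arm" whose reward is some measured benefit of acquiring it (e.g., a change in model accuracy) and the budget caps how many feature values may be bought per instance; the adaptive policy shown here is a generic UCB rule, used only to illustrate the contrast with the random policy.

```python
import numpy as np

def random_policy(n_features, budget, rng=np.random):
    """Pick `budget` features uniformly at random, ignoring past feedback."""
    return rng.choice(n_features, size=budget, replace=False)

def adaptive_ucb_policy(value_sums, counts, t, budget, alpha=1.0):
    """Pick the `budget` features with the highest UCB score on their observed benefit.

    value_sums[i] : total observed benefit from acquiring feature i so far
    counts[i]     : number of times feature i was acquired
    """
    means = value_sums / np.maximum(counts, 1)
    bonus = alpha * np.sqrt(np.log(t + 1) / np.maximum(counts, 1))
    return np.argsort(means + bonus)[-budget:]
```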
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy decreases the marginal utility of each ad or information source. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. Our algorithm possesses strong theoretical guarantees, such as a performance ratio that converges to the optimal constant of 1 - 1/e. We empirically evaluate our algorithm on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades. Finally, we present a second algorithm that handles the more general case in which the feasible sets are given by a matroid constraint, while still maintaining a 1 - 1/e asymptotic performance ratio.
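An illustrative sketch of the offline counterpart of this assignment problem: greedily fill positions one at a time, each time adding the item with the largest marginal gain under a monotone submodular utility f. The greedy assignment attains the familiar 1 - 1/e guarantee, which the online algorithm matches asymptotically in the no-regret sense; the function names below are placeholders.

```python
import numpy as np

def greedy_assignment(items, positions, f):
    """f(assignment) -> utility, where assignment is a dict {position: item}."""
    assignment = {}
    for pos in positions:
        best_item, best_gain = None, -np.inf
        for item in items:
            if item in assignment.values():
                continue  # each item is used at most once
            gain = f({**assignment, pos: item}) - f(assignment)
            if gain > best_gain:
                best_item, best_gain = item, gain
        assignment[pos] = best_item
    return assignment
```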
Influence maximization is the task of finding the smallest set of nodes whose activation in a social network can trigger an activation cascade that reaches the targeted network coverage, where threshold rules determine the outcome of influence. This problem is NP-hard, and it has generated a significant amount of recent research on efficient heuristics. We focus on the Balance Index (BI) algorithm, which relies on three parameters to tune its performance to the given network structure. We propose a supervised machine-learning approach for this tuning. We first select the graph features most informative for parameter tuning. Then, using random-walk-based graph sampling, we create small snapshots of the given synthetic and large-scale real-world networks. Using exhaustive search, we find high-accuracy values of the BI parameters for these snapshots to use as ground truth. We then train our machine-learning model on the snapshots and apply it to the real-world network to predict the best BI parameters. We apply these parameters to the sampled real-world network to measure the quality of the sets of initiators found this way. We use various real-world networks to validate our approach against other heuristics.
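A hypothetical sketch of the tuning pipeline: the toy feature extractor and the random-forest regressor below are placeholders for the selected graph features and the model used in the paper, and the ground-truth parameters are assumed to come from the exhaustive snapshot search described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def graph_features(adj):
    """Toy feature vector from an adjacency matrix: size, density, degree statistics."""
    deg = adj.sum(axis=1)
    return [adj.shape[0], adj.mean(), deg.mean(), deg.std(), deg.max()]

def train_bi_parameter_model(snapshots, ground_truth_params):
    """snapshots: adjacency matrices of sampled graphs;
    ground_truth_params: array of shape (n_snapshots, 3) with BI parameters
    found by exhaustive search on each snapshot."""
    X = np.array([graph_features(a) for a in snapshots])
    model = RandomForestRegressor(n_estimators=100)
    model.fit(X, np.asarray(ground_truth_params))
    return model

def predict_bi_parameters(model, real_network_adj):
    """Predict the three BI parameters for a (sampled) real-world network."""
    return model.predict(np.array([graph_features(real_network_adj)]))[0]
```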
We study the online influence maximization problem in social networks under the independent cascade model. Specifically, we aim to learn the set of best influencers in a social network online while repeatedly interacting with it. We address the challenges of (i) combinatorial action space, since the number of feasible influencer sets grows exponentially with the maximum number of influencers, and (ii) limited feedback, since only the influenced portion of the network is observed. Under stochastic semi-bandit feedback, we propose and analyze IMLinUCB, a computationally efficient UCB-based algorithm. Our bounds on the cumulative regret are polynomial in all quantities of interest, achieve near-optimal dependence on the number of interactions, and reflect the topology of the network and the activation probabilities of its edges, thereby giving insights on the problem complexity. To the best of our knowledge, these are the first such results. Our experiments show that in several representative graph topologies, the regret of IMLinUCB scales as suggested by our upper bounds. IMLinUCB permits linear generalization and thus is both statistically and computationally suitable for large-scale problems. Our experiments also show that IMLinUCB with linear generalization can lead to low regret in real-world online influence maximization.
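A minimal sketch of the UCB step with linear generalization, in the spirit of IMLinUCB: each edge e has a feature vector x_e, its activation probability is modeled as x_e^T theta*, and the optimistic estimates fed to the influence-maximization oracle are ridge-regression means plus confidence widths. Constants and clipping are simplified here, not the paper's exact choices.

```python
import numpy as np

def edge_ucbs(edge_features, M, b, c=1.0):
    """edge_features: (n_edges, d) feature matrix; M, b: ridge-regression statistics."""
    M_inv = np.linalg.inv(M)
    theta_hat = M_inv @ b
    means = edge_features @ theta_hat
    # Confidence width sqrt(x_e^T M^{-1} x_e) for every edge at once.
    widths = np.sqrt(np.einsum("ed,dk,ek->e", edge_features, M_inv, edge_features))
    return np.clip(means + c * widths, 0.0, 1.0)

def update(M, b, observed_features, activations):
    """Rank-one updates on the edges whose outcomes were observed this round."""
    for x, y in zip(observed_features, activations):
        M += np.outer(x, x)
        b += y * x
    return M, b
```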