Do you want to publish a course? Click here

Towards Efficient Discrete Integration via Adaptive Quantile Queries

82   0   0.0 ( 0 )
 Added by Fan Ding
 Publication date 2019
and research's language is English




Ask ChatGPT about the research

Discrete integration in a high dimensional space of n variables poses fundamental challenges. The WISH algorithm reduces the intractable discrete integration problem into n optimization queries subject to randomized constraints, obtaining a constant approximation guarantee. The optimization queries are expensive, which limits the applicability of WISH. We propose AdaWISH, which is able to obtain the same guarantee but accesses only a small subset of queries of WISH. For example, when the number of function values is bounded by a constant, AdaWISH issues only O(log n) queries. The key idea is to query adaptively, taking advantage of the shape of the weight function being integrated. In general, we prove that AdaWISH has a regret of only O(log n) relative to an idealistic oracle that issues queries at data-dependent optimal points. Experimentally, AdaWISH gives precise estimates for discrete integration problems, of the same quality as that of WISH and better than several competing approaches, on a variety of probabilistic inference benchmarks. At the same time, it saves substantially on the number of optimization queries compared to WISH. On a suite of UAI inference challenge benchmarks, it saves 81.5% of WISH queries while retaining the quality of results.

rate research

Read More

131 - Seth Neel , Aaron Roth 2018
Data that is gathered adaptively --- via bandit algorithms, for example --- exhibits bias. This is true both when gathering simple numeric valued data --- the empirical means kept track of by stochastic bandit algorithms are biased downwards --- and when gathering more complicated data --- running hypothesis tests on complex data gathered via contextual bandit algorithms leads to false discovery. In this paper, we show that this problem is mitigated if the data collection procedure is differentially private. This lets us both bound the bias of simple numeric valued quantities (like the empirical means of stochastic bandit algorithms), and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near optimal regret bounds: we apply existing theorems in the simple stochastic case, and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.
Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing transfer approaches either explicitly computes the similarity between tasks or select appropriate source policies to provide guided explorations for the target task. However, how to directly optimize the target policy by alternatively utilizing knowledge from appropriate source policies without explicitly measuring the similarity is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it by modeling multi-policy transfer as the option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.
In E-commerce, advertising is essential for merchants to reach their target users. The typical objective is to maximize the advertisers cumulative revenue over a period of time under a budget constraint. In real applications, an advertisement (ad) usually needs to be exposed to the same user multiple times until the user finally contributes revenue (e.g., places an order). However, existing advertising systems mainly focus on the immediate revenue with single ad exposures, ignoring the contribution of each exposure to the final conversion, thus usually falls into suboptimal solutions. In this paper, we formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space while ensuring the solution quality. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach. Extensive offline and online experiments show the superior performance of our approaches over state-of-the-art baselines in terms of cumulative revenue.
Adaptive sequential decision making is one of the central challenges in machine learning and artificial intelligence. In such problems, the goal is to design an interactive policy that plans for an action to take, from a finite set of $n$ actions, given some partial observations. It has been shown that in many applications such as active learning, robotics, sequential experimental design, and active detection, the utility function satisfies adaptive submodularity, a notion that generalizes the notion of diminishing returns to policies. In this paper, we revisit the power of adaptivity in maximizing an adaptive monotone submodular function. We propose an efficient semi adaptive policy that with $O(log n timeslog k)$ adaptive rounds of observations can achieve an almost tight $1-1/e-epsilon$ approximation guarantee with respect to an optimal policy that carries out $k$ actions in a fully sequential manner. To complement our results, we also show that it is impossible to achieve a constant factor approximation with $o(log n)$ adaptive rounds. We also extend our result to the case of adaptive stochastic minimum cost coverage where the goal is to reach a desired utility $Q$ with the cheapest policy. We first prove the conjecture of the celebrated work of Golovin and Krause by showing that the greedy policy achieves the asymptotically tight logarithmic approximation guarantee without resorting to stronger notions of adaptivity. We then propose a semi adaptive policy that provides the same guarantee in polylogarithmic adaptive rounds through a similar information-parallelism scheme. Our results shrink the adaptivity gap in adaptive submodular maximization by an exponential factor.
We prove novel algorithmic guarantees for several online problems in the smoothed analysis model. In this model, at each time an adversary chooses an input distribution with density function bounded above by $tfrac{1}{sigma}$ times that of the uniform distribution; nature then samples an input from this distribution. Crucially, our results hold for {em adaptive} adversaries that can choose an input distribution based on the decisions of the algorithm and the realizations of the inputs in the previous time steps. This paper presents a general technique for proving smoothed algorithmic guarantees against adaptive adversaries, in effect reducing the setting of adaptive adversaries to the simpler case of oblivious adversaries. We apply this technique to prove strong smoothed guarantees for three problems: -Online learning: We consider the online prediction problem, where instances are generated from an adaptive sequence of $sigma$-smooth distributions and the hypothesis class has VC dimension $d$. We bound the regret by $tilde{O}big(sqrt{T dln(1/sigma)} + dsqrt{ln(T/sigma)}big)$. This answers open questions of [RST11,Hag18]. -Online discrepancy minimization: We consider the online Komlos problem, where the input is generated from an adaptive sequence of $sigma$-smooth and isotropic distributions on the $ell_2$ unit ball. We bound the $ell_infty$ norm of the discrepancy vector by $tilde{O}big(ln^2!big( frac{nT}{sigma}big) big)$. -Dispersion in online optimization: We consider online optimization of piecewise Lipschitz functions where functions with $ell$ discontinuities are chosen by a smoothed adaptive adversary and show that the resulting sequence is $big( {sigma}/{sqrt{Tell}}, tilde Obig(sqrt{Tell} big)big)$-dispersed. This matches the parameters of [BDV18] for oblivious adversaries, up to log factors.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا