No Arabic abstract
This paper analyzes the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al, 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, Srinivas et al proved that the regret vanishes at the approximate rate of $O(1/sqrt{t})$, where t is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymptotically according to $O(e^{-frac{tau t}{(ln t)^{d/4}}})$ with high probability. Here, d is the dimension of the search space and tau is a constant that depends on the behaviour of the objective function near its global maximum.
This paper analyses the problem of Gaussian process (GP) bandits with deterministic observations. The analysis uses a branch and bound algorithm that is related to the UCB algorithm of (Srinivas et al., 2010). For GPs with Gaussian observation noise, with variance strictly greater than zero, (Srinivas et al., 2010) proved that the regret vanishes at the approximate rate of $O(frac{1}{sqrt{t}})$, where t is the number of observations. To complement their result, we attack the deterministic case and attain a much faster exponential convergence rate. Under some regularity assumptions, we show that the regret decreases asymptotically according to $O(e^{-frac{tau t}{(ln t)^{d/4}}})$ with high probability. Here, d is the dimension of the search space and $tau$ is a constant that depends on the behaviour of the objective function near its global maximum.
Many applications require a learner to make sequential decisions given uncertainty regarding both the systems payoff function and safety constraints. In safety-critical systems, it is paramount that the learners actions do not violate the safety constraints at any stage of the learning process. In this paper, we study a stochastic bandit optimization problem where the unknown payoff and constraint functions are sampled from Gaussian Processes (GPs) first considered in [Srinivas et al., 2010]. We develop a safe variant of GP-UCB called SGP-UCB, with necessary modifications to respect safety constraints at every round. The algorithm has two distinct phases. The first phase seeks to estimate the set of safe actions in the decision set, while the second phase follows the GP-UCB decision rule. Our main contribution is to derive the first sub-linear regret bounds for this problem. We numerically compare SGP-UCB against existing safe Bayesian GP optimization algorithms.
In this work, we investigate black-box optimization from the perspective of frequentist kernel methods. We propose a novel batch optimization algorithm, which jointly maximizes the acquisition function and select points from a whole batch in a holistic way. Theoretically, we derive regret bounds for both the noise-free and perturbation settings irrespective of the choice of kernel. Moreover, we analyze the property of the adversarial regret that is required by a robust initialization for Bayesian Optimization (BO). We prove that the adversarial regret bounds decrease with the decrease of covering radius, which provides a criterion for generating a point set to minimize the bound. We then propose fast searching algorithms to generate a point set with a small covering radius for the robust initialization. Experimental results on both synthetic benchmark problems and real-world problems show the effectiveness of the proposed algorithms.
We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and find the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm-distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward distributions or distributions that belong to a single parameter exponential family. We work under the much weaker assumption that the moments of order $(1+epsilon)$ are uniformly bounded by a known constant B, for some given $epsilon > 0$. We propose an optimal algorithm that matches the lower bound exactly in the first-order term. We also give a finite-time bound on its regret. We show that our index concentrates faster than the well known truncated or trimmed empirical mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size. We hence provide a controlled trade-off between statistical optimality and computational cost.