No Arabic abstract
In the Best-$k$-Arm problem, we are given $n$ stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the $k$ arms with the largest means by taking as few samples as possible. In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-$k$-Arm problem. On the lower bound side, we obtain a novel complexity term to measure the sample complexity that every Best-$k$-Arm instance requires. This is derived by an interesting and nontrivial reduction from the Best-$1$-Arm problem. We also provide an elimination-based algorithm that matches the instance-wise lower bound within doubly-logarithmic factors. The sample complexity of our algorithm strictly dominates the state-of-the-art for Best-$k$-Arm (module constant factors).
In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-delta$, using as few samples as possible. Understanding the sample complexity of Best-$1$-Arm has attracted significant attention since the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance sample complexity of Best-$1$-Arm. Given an instance $I$, let $mu_{[i]}$ be the $i$th largest mean and $Delta_{[i]}=mu_{[1]}-mu_{[i]}$ be the corresponding gap. $H(I)=sum_{i=2}^nDelta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $Omegaleft(H(I)cdotleft(lndelta^{-1}+mathsf{Ent}(I)right)right)$ is an instance lower bound, where $mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and there is a $delta$-correct algorithm for Best-$1$-Arm with sample complexity $Oleft(H(I)cdotleft(lndelta^{-1}+mathsf{Ent}(I)right)+Delta_{[2]}^{-2}lnlnDelta_{[2]}^{-1}right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires [Oleft(H(I)cdotleft(lndelta^{-1} +mathsf{Ent}(I)right)+Delta_{[2]}^{-2}lnlnDelta_{[2]}^{-1}mathrm{polylog}(n,delta^{-1})right)] samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $delta$-correct monotone algorithm requires $Omegaleft(H(I)cdotleft(lndelta^{-1} + mathsf{Ent}(I)right)right)$ samples in expectation.
We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $ln|mathcal{F}|$. For an important class of combinatorial families, we also provide polynomial time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex program, and approximate Pareto curves in multi-objective optimization. We also show that the $ln|mathcal{F}|$ factor is inevitable in general through a nontrivial lower bound construction. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general Best-Set problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $mathbb{R}^n$, and a collection $mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.
Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the one with the largest mean using a delta-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified delta) that has minimum sample complexity. Lower bounds for delta-correct algorithms are well known. Delta-correct algorithms that match the lower bound asymptotically as delta reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise under a delta-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a delta-correct algorithm that matches the lower bound as delta reduces to zero under the mild restriction that a known bound on the expectation of a non-negative, continuous, increasing convex function (for example, the squared moment) of the underlying random variables, exists. We also propose batch processing and identify near-optimal batch sizes to substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well studied classic problem in the simulation community.
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+epsilon)sum_i frac{ln T}{Delta_i}+O(frac{N}{epsilon^2})$ and the first near-optimal problem-independent bound of $O(sqrt{NTln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].
We prove a emph{query complexity} lower bound for approximating the top $r$ dimensional eigenspace of a matrix. We consider an oracle model where, given a symmetric matrix $mathbf{M} in mathbb{R}^{d times d}$, an algorithm $mathsf{Alg}$ is allowed to make $mathsf{T}$ exact queries of the form $mathsf{w}^{(i)} = mathbf{M} mathsf{v}^{(i)}$ for $i$ in ${1,...,mathsf{T}}$, where $mathsf{v}^{(i)}$ is drawn from a distribution which depends arbitrarily on the past queries and measurements ${mathsf{v}^{(j)},mathsf{w}^{(i)}}_{1 le j le i-1}$. We show that for every $mathtt{gap} in (0,1/2]$, there exists a distribution over matrices $mathbf{M}$ for which 1) $mathrm{gap}_r(mathbf{M}) = Omega(mathtt{gap})$ (where $mathrm{gap}_r(mathbf{M})$ is the normalized gap between the $r$ and $r+1$-st largest-magnitude eigenvector of $mathbf{M}$), and 2) any algorithm $mathsf{Alg}$ which takes fewer than $mathrm{const} times frac{r log d}{sqrt{mathtt{gap}}}$ queries fails (with overwhelming probability) to identity a matrix $widehat{mathsf{V}} in mathbb{R}^{d times r}$ with orthonormal columns for which $langle widehat{mathsf{V}}, mathbf{M} widehat{mathsf{V}}rangle ge (1 - mathrm{const} times mathtt{gap})sum_{i=1}^r lambda_i(mathbf{M})$. Our bound requires only that $d$ is a small polynomial in $1/mathtt{gap}$ and $r$, and matches the upper bounds of Musco and Musco 15. Moreover, it establishes a strict separation between convex optimization and emph{randomized}, strict-saddle non-convex optimization of which PCA is a canonical example: in the former, first-order methods can have dimension-free iteration complexity, whereas in PCA, the iteration complexity of gradient-based methods must necessarily grow with the dimension.