No Arabic abstract
Conditional value-at-risk (CVaR) and value-at-risk (VaR) are popular tail-risk measures in finance and insurance industries as well as in highly reliable, safety-critical uncertain environments where often the underlying probability distributions are heavy-tailed. We use the multi-armed bandit best-arm identification framework and consider the problem of identifying the arm from amongst finitely many that has the smallest CVaR, VaR, or weighted sum of CVaR and mean. The latter captures the risk-return trade-off common in finance. Our main contribution is an optimal $delta$-correct algorithm that acts on general arms, including heavy-tailed distributions, and matches the lower bound on the expected number of samples needed, asymptotically (as $delta$ approaches $0$). The algorithm requires solving a non-convex optimization problem in the space of probability measures, that requires delicate analysis. En-route, we develop new non-asymptotic empirical likelihood-based concentration inequalities for tail-risk measures which are tighter than those for popular truncation-based empirical estimators.
In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. We would like to identify the arm with the largest mean with probability at least $1-delta$, using as few samples as possible. Understanding the sample complexity of Best-$1$-Arm has attracted significant attention since the last decade. However, the exact sample complexity of the problem is still unknown. Recently, Chen and Li made the gap-entropy conjecture concerning the instance sample complexity of Best-$1$-Arm. Given an instance $I$, let $mu_{[i]}$ be the $i$th largest mean and $Delta_{[i]}=mu_{[1]}-mu_{[i]}$ be the corresponding gap. $H(I)=sum_{i=2}^nDelta_{[i]}^{-2}$ is the complexity of the instance. The gap-entropy conjecture states that $Omegaleft(H(I)cdotleft(lndelta^{-1}+mathsf{Ent}(I)right)right)$ is an instance lower bound, where $mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and there is a $delta$-correct algorithm for Best-$1$-Arm with sample complexity $Oleft(H(I)cdotleft(lndelta^{-1}+mathsf{Ent}(I)right)+Delta_{[2]}^{-2}lnlnDelta_{[2]}^{-1}right)$. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm. We make significant progress towards the resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires [Oleft(H(I)cdotleft(lndelta^{-1} +mathsf{Ent}(I)right)+Delta_{[2]}^{-2}lnlnDelta_{[2]}^{-1}mathrm{polylog}(n,delta^{-1})right)] samples in expectation. For the lower bound, we show that for any Gaussian Best-$1$-Arm instance with gaps of the form $2^{-k}$, any $delta$-correct monotone algorithm requires $Omegaleft(H(I)cdotleft(lndelta^{-1} + mathsf{Ent}(I)right)right)$ samples in expectation.
We study the problem of best arm identification in linear bandits in the fixed-budget setting. By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, Optimal Design-based Linear Best Arm Identification (OD-LinBAI). We provide a theoretical analysis of the failure probability of OD-LinBAI. While the performances of existing methods (e.g., BayesGap) depend on all the optimality gaps, OD-LinBAI depends on the gaps of the top $d$ arms, where $d$ is the effective dimension of the linear bandit instance. Furthermore, we present a minimax lower bound for this problem. The upper and lower bounds show that OD-LinBAI is minimax optimal up to multiplicative factors in the exponent. Finally, numerical experiments corroborate our theoretical findings.
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop strategy, which we prove to be asymptotically optimal. It consists in a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and in a stopping rule named after Chernoff, for which we give a new analysis.
We study $(epsilon, delta)$-PAC best arm identification, where a decision-maker must identify an $epsilon$-optimal arm with probability at least $1 - delta$, while minimizing the number of arm pulls (samples). Most of the work on this topic is in the sequential setting, where there is no constraint on the time taken to identify such an arm; this allows the decision-maker to pull one arm at a time. In this work, the decision-maker is given a deadline of $T$ rounds, where, on each round, it can adaptively choose which arms to pull and how many times to pull them; this distinguishes the number of decisions made (i.e., time or number of rounds) from the number of samples acquired (cost). Such situations occur in clinical trials, where one may need to identify a promising treatment under a deadline while minimizing the number of test subjects, or in simulation-based studies run on the cloud, where we can elastically scale up or down the number of virtual machines to conduct as many experiments as we wish, but need to pay for the resource-time used. As the decision-maker can only make $T$ decisions, she may need to pull some arms excessively relative to a sequential algorithm in order to perform well on all possible problems. We formalize this added difficulty with two hardness results that indicate that unlike sequential settings, the ability to adapt to the problem difficulty is constrained by the finite deadline. We propose Elastic Batch Racing (EBR), a novel algorithm for this setting and bound its sample complexity, showing that EBR is optimal with respect to both hardness results. We present simulations evaluating EBR in this setting, where it outperforms baselines by several orders of magnitude.
Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the one with the largest mean using a delta-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified delta) that has minimum sample complexity. Lower bounds for delta-correct algorithms are well known. Delta-correct algorithms that match the lower bound asymptotically as delta reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise under a delta-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a delta-correct algorithm that matches the lower bound as delta reduces to zero under the mild restriction that a known bound on the expectation of a non-negative, continuous, increasing convex function (for example, the squared moment) of the underlying random variables, exists. We also propose batch processing and identify near-optimal batch sizes to substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well studied classic problem in the simulation community.