Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models


الملخص بالإنكليزية

In this paper, we study sequential testing problems with emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean $mu$ of a Gaussian distribution is $geq -epsilon$ or $leq epsilon$; if $muin(-epsilon,epsilon)$, both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given $K$ probability distributions on $mathbb{R}$ with means $mu_1,dots,mu_K$, we derive the asymptotic complexity of identifying, with risk at most $delta$, an index $Iin{1,dots,K}$ such that $mu_Igeq max_imu_i -epsilon$. We provide non asymptotic bounds on the error of a parallel General Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bound on the number of observation needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on t

تحميل البحث