Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models

نشر في Aurelien Garivier بتاريخ 2019 والبحث باللغة English تحميل البحث

الملخص بالإنكليزية

In this paper, we study sequential testing problems with emph{overlapping} hypotheses. We first focus on the simple problem of assessing if the mean $mu$ of a Gaussian distribution is $geq -epsilon$ or $leq epsilon$; if $muin(-epsilon,epsilon)$, both answers are considered to be correct. Then, we consider PAC-best arm identification in a bandit model: given $K$ probability distributions on $mathbb{R}$ with means $mu_1,dots,mu_K$, we derive the asymptotic complexity of identifying, with risk at most $delta$, an index $Iin{1,dots,K}$ such that $mu_Igeq max_imu_i -epsilon$. We provide non asymptotic bounds on the error of a parallel General Likelihood Ratio Test, which can also be used for more general testing problems. We further propose lower bound on the number of observation needed to identify a correct hypothesis. Those lower bounds rely on information-theoretic arguments, and specifically on t

تحميل البحث