Optimal $delta$-Correct Best-Arm Selection for General Distributions


الملخص بالإنكليزية

Given a finite set of unknown distributions, or arms, that can be sampled, we consider the problem of identifying the one with the largest mean using a delta-correct algorithm (an adaptive, sequential algorithm that restricts the probability of error to a specified delta) that has minimum sample complexity. Lower bounds for delta-correct algorithms are well known. Delta-correct algorithms that match the lower bound asymptotically as delta reduces to zero have been previously developed when arm distributions are restricted to a single parameter exponential family. In this paper, we first observe a negative result that some restrictions are essential, as otherwise under a delta-correct algorithm, distributions with unbounded support would require an infinite number of samples in expectation. We then propose a delta-correct algorithm that matches the lower bound as delta reduces to zero under the mild restriction that a known bound on the expectation of a non-negative, continuous, increasing convex function (for example, the squared moment) of the underlying random variables, exists. We also propose batch processing and identify near-optimal batch sizes to substantially speed up the proposed algorithm. The best-arm problem has many learning applications, including recommendation systems and product selection. It is also a well studied classic problem in the simulation community.

تحميل البحث