Combinatorial Pure Exploration with Bottleneck Reward Function and its Extension to General Reward Functions


Abstract in English

In this paper, we study the Combinatorial Pure Exploration problem with the bottleneck reward function (CPE-B) under the fixed-confidence and fixed-budget settings. In CPE-B, given a set of base arms and a collection of subsets of base arms (super arms) following certain combinatorial constraint, a learner sequentially plays (samples) a base arm and observes its random outcome, with the objective of finding the optimal super arm that maximizes its bottleneck value, defined as the minimum expected value among the base arms contained in the super arm. CPE-B captures a variety of practical scenarios such as network routing in communication networks, but it cannot be solved by the existing CPE algorithms since most of them assumed linear reward functions. For CPE-B, we present both fixed-confidence and fixed-budget algorithms, and provide the sample complexity lower bound for the fixed-confidence setting, which implies that our algorithms match the lower bound (within a logarithmic factor) for a broad family of instances. In addition, we extend CPE-B to general reward functions (CPE-G) and propose the first fixed-confidence algorithm for general non-linear reward functions with non-trivial sample complexity. Our experimental results on the top-$k$, path and matching instances demonstrate the empirical superiority of our proposed algorithms over the baselines.

Download