Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits


Abstract

We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker can only adapt decisions at the batch level. In particular, the decision maker observes rewards only at the end of each batch and, at that point, dynamically decides how many individuals to include in the next batch and what personalized action-selection scheme to adopt within it. Such batch constraints are ubiquitous in practice, including personalized product offerings in marketing and medical treatment selection in clinical trials. We characterize the fundamental learning limit in this problem via a novel lower bound analysis and provide a simple, exploration-free algorithm that uses the LASSO estimator and achieves the minimax optimal performance characterized by the lower bound (up to log factors). To the best of our knowledge, our work provides the first inroad into a rigorous understanding of dynamic batch learning with high-dimensional covariates. We also demonstrate the efficacy of our algorithm on both synthetic data and the Warfarin medical dosing data. The empirical results show that with three batches (hence only two opportunities to adapt), our algorithm already performs comparably, in terms of statistical performance, to the state-of-the-art fully online high-dimensional linear contextual bandit algorithm. As an added bonus, since our algorithm operates in batches, it is orders of magnitude faster than fully online learning algorithms. As such, our algorithm is a desirable candidate for practical data-driven personalized decision-making problems, where limited adaptivity is often a hard constraint.
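The batch protocol described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the geometric batch grid in `batch_ends`, the regularization level `lam`, the Gaussian synthetic contexts, and all function names are assumptions made for illustration, and the grid is fixed up front rather than chosen dynamically. The sketch shows only the two ingredients the abstract names: actions are chosen greedily from the current LASSO estimate (no exploration term), and the estimate is refit only when a batch ends and its rewards are revealed.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal map of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, coordinate j added back.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_bj
    return b


def batch_ends(T, M):
    """Hypothetical geometric grid: batch m ends near T**(m/M), the last at T."""
    ends = []
    for m in range(1, M + 1):
        e = T if m == M else max(1, round(T ** (m / M)))
        if not ends or e > ends[-1]:
            ends.append(e)
    return ends


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run_batched_greedy(T=300, M=3, d=20, K=5, lam=0.1, seed=0):
    """Simulate greedy batched learning on synthetic sparse linear rewards."""
    rng = random.Random(seed)
    theta = [1.0, -1.0] + [0.0] * (d - 2)  # sparse ground truth (synthetic)
    theta_hat = [0.0] * d
    X_hist, y_hist = [], []
    total_regret = 0.0
    start = 0
    for end in batch_ends(T, M):
        X_batch, y_batch = [], []
        for _ in range(start, end):
            arms = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
            means = [dot(a, theta) for a in arms]
            # Greedy choice: plug in the current estimate, no exploration bonus.
            choice = max(range(K), key=lambda k: dot(arms[k], theta_hat))
            X_batch.append(arms[choice])
            y_batch.append(means[choice] + rng.gauss(0, 0.5))
            total_regret += max(means) - means[choice]
        # Rewards become visible only here, at the end of the batch.
        X_hist += X_batch
        y_hist += y_batch
        theta_hat = lasso_cd(X_hist, y_hist, lam)
        start = end
    return theta_hat, total_regret
```

Calling `run_batched_greedy(M=3)` refits the estimate three times in total, so the policy changes only twice during the run, which mirrors the "three batches, two opportunities to adapt" setting mentioned in the abstract.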
