ﻻ يوجد ملخص باللغة العربية
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker can only adapt decisions at a batch level. In particular, the decision maker, only observing rewards at the end of each batch, dynamically decides how many individuals to include in the next batch (at the current batchs end) and what personalized action-selection scheme to adopt within the batch. Such batch constraints are ubiquitous in a variety of practical contexts, including personalized product offerings in marketing and medical treatment selection in clinical trials. We characterize the fundamental learning limit in this problem via a novel lower bound analysis and provide a simple, exploration-free algorithm that uses the LASSO estimator, which achieves the minimax optimal performance characterized by the lower bound (up to log factors). To our best knowledge, our work provides the first inroad into a rigorous understanding of dynamic batch learning with high-dimensional covariates. We also demonstrate the efficacy of our algorithm on both synthetic data and the Warfarin medical dosing data. The empirical results show that with three batches (hence only two opportunities to adapt), our algorithm already performs comparably (in terms of statistical performance) to the state-of-the-art fully online high-dimensional linear contextual bandits algorithm. As an added bonus, since our algorithm operates in batches, it is orders of magnitudes faster than fully online learning algorithms. As such, our algorithm provides a desirable candidate for practical data-driven personalized decision making problems, where limited adaptivity is often a hard constraint.
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $Omega(n^{2/3})$ dimension-free minimax regret lower bound for s
We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit ({ttfamily ACB}), that works in phases a
Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, textit{simultaneously} proposing a batch of decisions, which leverages available resources for parallel experimentation, has
Stochastic linear contextual bandit algorithms have substantial applications in practice, such as recommender systems, online advertising, clinical trials, etc. Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks an
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which naturally balanc