ﻻ يوجد ملخص باللغة العربية
In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $mathcal{X}subset mathbb{R}^d$, a set of items $mathcal{Z}subset mathbb{R}^d$, a fixed confidence $delta$, and an unknown vector $theta^{ast}in mathbb{R}^d$, the goal is to infer $text{argmax}_{zin mathcal{Z}} z^toptheta^ast$ with probability $1-delta$ by making as few sequentially chosen noisy measurements of the form $x^toptheta^{ast}$ as possible. When $mathcal{X}=mathcal{Z}$, this setting generalizes linear bandits, and when $mathcal{X}$ is the standard basis vectors and $mathcal{Z}subset {0,1}^d$, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $mathcal{X}$ a user is queried about may be restricted to well known best-sellers even though the goal might be to recommend more esoteric titles $mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information theoretic lower bound.
We introduce Deep Adaptive Design (DAD), a method for amortizing the cost of adaptive Bayesian experimental design that allows experiments to be run in real-time. Traditional sequential Bayesian optimal experimental design approaches require substant
In this work, we study sequential choice bandits with feedback. We propose bandit algorithms for a platform that personalizes users experience to maximize its rewards. For each action directed to a given user, the platform is given a positive reward,
Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, textit{simultaneously} proposing a batch of decisions, which leverages available resources for parallel experimentation, has
Stochastic sparse linear bandits offer a practical model for high-dimensional online decision-making problems and have a rich information-regret structure. In this work we explore the use of information-directed sampling (IDS), which naturally balanc
Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $Omega(n^{2/3})$ dimension-free minimax regret lower bound for s