ﻻ يوجد ملخص باللغة العربية
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are presented with a subset $mathcal{X}_t subseteq mathbb{R}^d$ of possible actions. If we choose (i.e. recommend to the user) action $x_t$, we obtain utility $langle x_t, w^* rangle$ but only learn the identity of the best action $argmax_{x in mathcal{X}_t} langle x, w^* rangle$. We design algorithms for this problem which achieve regret $O(dlog T)$ and $exp(O(d log d))$. To accomplish this, we design novel cutting-plane algorithms with low regret -- the total distance between the true point $w^*$ and the hyperplanes the separation oracle returns. We also consider the variant where we are allowed to provide a list of several recommendations. In this variant, we give an algorithm with $O(d^2 log d)$ regret and list size $mathrm{poly}(d)$. Finally, we construct nearly tight algorithms for a weaker variant of this problem where the learner only learns the identity of an action that is better than the recommendation. Our results rely on new algorithmic techniques in convex geometry (including a variant of Steiners formula for the centroid of a convex set) which may be of independent interest.
Decision making is a challenging task in online recommender systems. The decision maker often needs to choose a contextual item at each step from a set of candidates. Contextual bandit algorithms have been successfully deployed to such applications,
We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints. Many well-studied problems like online stochastic p
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that handles constrained resou
A dominant approach to solving large imperfect-information games is Counterfactural Regret Minimization (CFR). In CFR, many regret minimization problems are combined to solve the game. For very large games, abstraction is typically needed to render C