ﻻ يوجد ملخص باللغة العربية
We consider the stochastic combinatorial semi-bandit problem with adversarial corruptions. We provide a simple combinatorial algorithm that can achieve a regret of $tilde{O}left(C+d^2K/Delta_{min}right)$ where $C$ is the total amount of corruptions, $d$ is the maximal number of arms one can play in each round, $K$ is the number of arms. If one selects only one arm in each round, we achieves a regret of $tilde{O}left(C+sum_{Delta_i>0}(1/Delta_i)right)$. Our algorithm is combinatorial and improves on the previous combinatorial algorithm by [Gupta et al., COLT2019] (their bound is $tilde{O}left(KC+sum_{Delta_i>0}(1/Delta_i)right)$), and almost matches the best known bounds obtained by [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] (up to logarithmic factor). Note that the algorithms in [Zimmert et al., ICML2019] and [Zimmert and Seldin, AISTATS2019] require one to solve complex convex programs while our algorithm is combinatorial, very easy to implement, requires weaker assumptions and has very low oracle complexity and running time. We also study the setting where we only get access to an approximation oracle for the stochastic combinatorial semi-bandit problem. Our algorithm achieves an (approximation) regret bound of $tilde{O}left(dsqrt{KT}right)$. Our algorithm is very simple, only worse than the best known regret bound by $sqrt{d}$, and has much lower oracle complexity than previous work.
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited resources consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a hu
We consider the combinatorial bandits problem, where at each time step, the online learner selects a size-$k$ subset $s$ from the arms set $mathcal{A}$, where $left|mathcal{A}right| = n$, and observes a stochastic reward of each arm in the selected s
In this paper, we study Combinatorial Semi-Bandits (CSB) that is an extension of classic Multi-Armed Bandits (MAB) under Differential Privacy (DP) and stronger Local Differential Privacy (LDP) setting. Since the server receives more information from
The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences
A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum