ﻻ يوجد ملخص باللغة العربية
In this paper, we study Combinatorial Semi-Bandits (CSB) that is an extension of classic Multi-Armed Bandits (MAB) under Differential Privacy (DP) and stronger Local Differential Privacy (LDP) setting. Since the server receives more information from users in CSB, it usually causes additional dependence on the dimension of data, which is a notorious side-effect for privacy preserving learning. However for CSB under two common smoothness assumptions cite{kveton2015tight,chen2016combinatorial}, we show it is possible to remove this side-effect. In detail, for $B_{infty}$-bounded smooth CSB under either $varepsilon$-LDP or $varepsilon$-DP, we prove the optimal regret bound is $Theta(frac{mB^2_{infty}ln T } {Deltaepsilon^2})$ or $tilde{Theta}(frac{mB^2_{infty}ln T} { Deltaepsilon})$ respectively, where $T$ is time period, $Delta$ is the gap of rewards and $m$ is the number of base arms, by proposing novel algorithms and matching lower bounds. For $B_1$-bounded smooth CSB under $varepsilon$-DP, we also prove the optimal regret bound is $tilde{Theta}(frac{mKB^2_1ln T} {Deltaepsilon})$ with both upper bound and lower bound, where $K$ is the maximum number of feedback in each round. All above results nearly match corresponding non-private optimal rates, which imply there is no additional price for (locally) differentially private CSB in above common settings.
We study locally differentially private (LDP) bandits learning in this paper. First, we propose simple black-box reduction frameworks that can solve a large family of context-free bandits learning problems with LDP guarantee. Based on our frameworks,
In this paper we study the problem of stochastic multi-armed bandits (MAB) in the (local) differential privacy (DP/LDP) model. Unlike the previous results which need to assume bounded reward distributions, here we mainly focus on the case the reward
We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited resources consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a hu
We give an $(varepsilon,delta)$-differentially private algorithm for the multi-armed bandit (MAB) problem in the shuffle model with a distribution-dependent regret of $Oleft(left(sum_{ain [k]:Delta_a>0}frac{log T}{Delta_a}right)+frac{ksqrt{logfrac{1}
Interpretable predictions, where it is clear why a machine learning model has made a particular decision, can compromise privacy by revealing the characteristics of individual data points. This raises the central question addressed in this paper: Can