ﻻ يوجد ملخص باللغة العربية
We consider a multi-armed bandit problem motivated by situations where only the extreme values, as opposed to expected values in the classical bandit setting, are of interest. We propose distribution free algorithms using robust statistics and characterize the statistical properties. We show that the provided algorithms achieve vanishing extremal regret under weaker conditions than existing algorithms. Performance of the algorithms is demonstrated for the finite-sample setting using numerical experiments. The results show superior performance of the proposed algorithms compared to the well known algorithms.
Stochastic linear contextual bandit algorithms have substantial applications in practice, such as recommender systems, online advertising, clinical trials, etc. Recent works show that optimal bandit algorithms are vulnerable to adversarial attacks an
We propose a generalized formulation of the Huber loss. We show that with a suitable function of choice, specifically the log-exp transform; we can achieve a loss function which combines the desirable properties of both the absolute and the quadratic
The Multi-Armed Bandits (MAB) framework highlights the tension between acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation). In the classical MAB problem, a decision maker must choose an arm at each time step, upon
Stochastic Lipschitz bandit algorithms balance exploration and exploitation, and have been used for a variety of important task domains. In this paper, we present a framework for Lipschitz bandit methods that adaptively learns partitions of context-
Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-pl