This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set, instead of a single label, for the observed example. It then receives feedback from the environment indicating whether the true label lies in this candidate set. This feedback is called diluted bandit feedback. Learning in this setting is even more challenging than in the standard bandit feedback setting because the supervision is more uncertain: a positive response only reveals that the true label is one of the labels in the predicted set. We propose an algorithm for multiclass classification using diluted bandit feedback (MC-DBF), which uses an exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves an $O(T^{1-\frac{1}{m+2}})$ mistake bound when the candidate label set at each step has size $m$. We demonstrate the effectiveness of the proposed approach with extensive simulations.
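To make the feedback protocol concrete, here is a minimal Python sketch under stated assumptions. The names `diluted_feedback`, `choose_candidate_set`, and `run_protocol` are hypothetical, and the explore/exploit rule (a uniformly random $m$-subset with probability `gamma`, otherwise the $m$ highest-scoring labels) is a generic stand-in, not the MC-DBF algorithm; the paper's scoring model and update rule are not reproduced here. The sketch only shows what the learner predicts, what it gets back, and what counts as a mistake.

```python
import random


def diluted_feedback(candidate_set, true_label):
    """Environment's reply: 1 if the true label is in the predicted set, else 0."""
    return 1 if true_label in candidate_set else 0


def choose_candidate_set(scores, m, gamma, rng):
    """Hypothetical explore/exploit rule (NOT the paper's MC-DBF):
    with probability gamma return a uniformly random m-subset of labels (explore),
    otherwise return the m highest-scoring labels (exploit)."""
    k = len(scores)
    if rng.random() < gamma:
        return set(rng.sample(range(k), m))
    ranked = sorted(range(k), key=lambda label: scores[label], reverse=True)
    return set(ranked[:m])


def run_protocol(stream, score_fn, m, gamma, seed=0):
    """Play the diluted-bandit protocol over a stream of (x, y) pairs and count
    mistakes, i.e. rounds where y is not in the predicted candidate set.
    The learner never observes y itself, only the 0/1 feedback."""
    rng = random.Random(seed)
    mistakes = 0
    for x, y in stream:
        candidates = choose_candidate_set(score_fn(x), m, gamma, rng)
        feedback = diluted_feedback(candidates, y)
        mistakes += 1 - feedback
        # A learning update using only (x, candidates, feedback) would go here;
        # it is omitted because the actual MC-DBF update is specified in the paper.
    return mistakes
```

For example, with fixed scores `[0.1, 0.9, 0.5, 0.2]`, `m=2`, and `gamma=0.0`, the learner always predicts `{1, 2}`, so a stream with true labels 1, 3, 2 yields one mistake. For reference, the exponent $1-\frac{1}{m+2}$ in the stated bound equals $2/3$ for $m=1$ and $3/4$ for $m=2$, approaching 1 as $m$ grows.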
This paper addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner does not always receive the true feedback; instead, it receives feedback that has been flipped with some nonzero probability. We
We consider online multiclass linear classification under the bandit feedback setting. Beygelzimer, Pál, Szörényi, Thiruvenkatachari, Wei, and Zhang [ICML19] considered two notions of linear separability, weak and strong linear separability
In this paper, we propose online algorithms for multiclass classification using partial labels. We propose two variants of the Perceptron, called Avg Perceptron and Max Perceptron, to deal with partially labeled data. We also propose Avg Pegasos and Max
We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unk
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $\epsilon$-multi-player multi-armed bandit problem, in wh