Multiclass Classification using dilute bandit feedback

80 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Gaurav Batra

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Gaurav Batra - Naresh Manwani

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set instead of a single label for the observed example. It then receives feedback from the environment whether the actual label lies in this candidate label set or not. This feedback is called diluted bandit feedback. Learning in this setting is even more challenging than the bandit feedback setting, as there is more uncertainty in the supervision. We propose an algorithm for multiclass classification using dilute bandit feedback (MC-DBF), which uses the exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves O(T^{1-frac{1}{m+2}}) mistake bound if candidate label set size (in each step) is m. We demonstrate the effectiveness of the proposed approach with extensive simulations.

قيم البحث

164 - Mudit Agarwal , Naresh Manwani 2020

This paper addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner may not receive true feedback. Instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique. We further offer a method that can efficiently estimate the noise rates, thus providing an end-to-end framework. The proposed algorithm enjoys a mistake bound of the order of $O(sqrt{T})$ in the high noise case and of the order of $O(T^{ icefrac{2}{3}})$ in the worst case. We show our approachs effectiveness using extensive experiments on several benchmark datasets.

التعلم الآلي التعلم الالي

Bandit Multiclass Linear Classification for the Group Linear Separable Case

112 - Jittat Fakcharoenphol , Chayutpong Prompak 2019

We consider the online multiclass linear classification under the bandit feedback setting. Beygelzimer, P{a}l, Sz{o}r{e}nyi, Thiruvenkatachari, Wei, and Zhang [ICML19] considered two notions of linear separability, weak and strong linear separability . When examples are strongly linearly separable with margin $gamma$, they presented an algorithm based on Multiclass Perceptron with mistake bound $O(K/gamma^2)$, where $K$ is the number of classes. They employed rational kernel to deal with examples under the weakly linearly separable condition, and obtained the mistake bound of $min(Kcdot 2^{tilde{O}(Klog^2(1/gamma))},Kcdot 2^{tilde{O}(sqrt{1/gamma}log K)})$. In this paper, we refine the notion of weak linear separability to support the notion of class grouping, called group weak linear separable condition. This situation may arise from the fact that class structures contain inherent grouping. We show that under this condition, we can also use the rational kernel and obtain the mistake bound of $Kcdot 2^{tilde{O}(sqrt{1/gamma}log L)})$, where $Lleq K$ represents the number of groups.

التعلم الآلي التعلم الالي

Online Algorithms for Multiclass Classification using Partial Labels

150 - Rajarshi Bhattacharjee , Naresh Manwani 2019

In this paper, we propose online algorithms for multiclass classification using partial labels. We propose two variants of Perceptron called Avg Perceptron and Max Perceptron to deal with the partial labeled data. We also propose Avg Pegasos and Max Pegasos, which are extensions of Pegasos algorithm. We also provide mistake bounds for Avg Perceptron and regret bound for Avg Pegasos. We show the effectiveness of the proposed approaches by experimenting on various datasets and comparing them with the standard Perceptron and Pegasos.

التعلم الآلي التعلم الالي

Non-Stochastic Control with Bandit Feedback

95 - Paula Gradu , John Hallman , Elad Hazan 2020

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unk nown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest.

التعلم الآلي التحسين والتحكم التعلم الالي

Multitask Bandit Learning Through Heterogeneous Feedback Aggregation

92 - Zhi Wang , Chicheng Zhang , Manish Kumar Singh 2020

In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $epsilon$-multi-player multi-armed bandit problem, in wh ich a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg$(epsilon)$, that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.

التعلم الآلي التعلم الالي