Multiclass Classification using dilute bandit feedback

Published by: Gaurav Batra
Publication date: 2021
Research field: Informatics Engineering
Language: English





This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set instead of a single label for the observed example. It then receives feedback from the environment indicating whether the actual label lies in this candidate label set. This feedback is called diluted bandit feedback. Learning in this setting is even more challenging than in the bandit feedback setting, as there is more uncertainty in the supervision. We propose an algorithm for multiclass classification using dilute bandit feedback (MC-DBF), which uses an exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves an $O(T^{1-\frac{1}{m+2}})$ mistake bound when the candidate label set size (in each step) is $m$. We demonstrate the effectiveness of the proposed approach with extensive simulations.
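The interaction protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' MC-DBF algorithm: the function names, the `explore_prob` parameter, and the simple "random subset versus top-m scores" rule are assumptions made purely for illustration. The sketch only shows what the learner predicts (a set of m labels) and what the environment reveals (a single yes/no bit), together with the exponent of the stated mistake bound.

```python
import random


def diluted_feedback(candidate_set, true_label):
    """Environment side: reveal only whether the true label is in the set."""
    return true_label in candidate_set


def predict_candidate_set(scores, m, explore_prob, rng):
    """Return m labels: a uniform random m-subset with probability
    explore_prob (explore), otherwise the m highest-scoring labels (exploit).
    This rule is a hypothetical stand-in, not the rule used by MC-DBF."""
    labels = range(len(scores))
    if rng.random() < explore_prob:
        return set(rng.sample(labels, m))
    ranked = sorted(labels, key=lambda k: scores[k], reverse=True)
    return set(ranked[:m])


def mistake_bound_exponent(m):
    """Exponent of T in the stated bound O(T^{1 - 1/(m+2)})."""
    return 1.0 - 1.0 / (m + 2)


rng = random.Random(0)
scores = [0.1, 0.7, 0.3, 0.5]  # hypothetical per-class scores for one example
candidates = predict_candidate_set(scores, m=2, explore_prob=0.0, rng=rng)
print(candidates, diluted_feedback(candidates, true_label=3))
```

Plugging m = 1 into the stated bound gives an exponent of 2/3, and the exponent moves toward 1 as m grows, which matches the abstract's remark that larger candidate sets carry less supervision.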


Read also

This paper addresses the problem of multiclass classification with corrupted or noisy bandit feedback. In this setting, the learner may not receive true feedback. Instead, it receives feedback that has been flipped with some non-zero probability. We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique. We further offer a method that can efficiently estimate the noise rates, thus providing an end-to-end framework. The proposed algorithm enjoys a mistake bound of the order of $O(\sqrt{T})$ in the high noise case and of the order of $O(T^{2/3})$ in the worst case. We show our approach's effectiveness using extensive experiments on several benchmark datasets.
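To make the "unbiased estimator" idea concrete, here is a minimal sketch of a generic correction for a flipped binary feedback bit. It is a textbook construction and is not claimed to be the estimator used in the paper; the names `rho0` (probability that a true 0 is reported as 1) and `rho1` (probability that a true 1 is reported as 0) are assumptions introduced for this example, and the noise rates are assumed to be known.

```python
def debias_feedback(observed, rho0, rho1):
    """Rescale a noisy bit so that its expectation equals the true bit.

    rho0: assumed probability that a true 0 is observed as 1.
    rho1: assumed probability that a true 1 is observed as 0.
    """
    denom = 1.0 - rho0 - rho1
    if denom <= 0:
        raise ValueError("requires rho0 + rho1 < 1")
    return (observed - rho0) / denom


def expected_debiased(true_bit, rho0, rho1):
    """Exact expectation of the corrected bit under the flip model."""
    p_observe_one = (1.0 - rho1) if true_bit == 1 else rho0
    return (p_observe_one * debias_feedback(1, rho0, rho1)
            + (1.0 - p_observe_one) * debias_feedback(0, rho0, rho1))


print(expected_debiased(1, 0.2, 0.1), expected_debiased(0, 0.2, 0.1))
```

The corrected value can fall outside [0, 1] on any single round. Only its expectation matches the true feedback, which is the property an online update built on it would rely on.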
We consider online multiclass linear classification under the bandit feedback setting. Beygelzimer, Pál, Szörényi, Thiruvenkatachari, Wei, and Zhang [ICML19] considered two notions of linear separability: weak and strong linear separability. When examples are strongly linearly separable with margin $\gamma$, they presented an algorithm based on Multiclass Perceptron with mistake bound $O(K/\gamma^2)$, where $K$ is the number of classes. They employed the rational kernel to deal with examples under the weakly linearly separable condition, and obtained the mistake bound of $\min(K\cdot 2^{\tilde{O}(K\log^2(1/\gamma))}, K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log K)})$. In this paper, we refine the notion of weak linear separability to support the notion of class grouping, called the group weak linear separable condition. This situation may arise from the fact that class structures contain inherent grouping. We show that under this condition, we can also use the rational kernel and obtain the mistake bound of $K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log L)}$, where $L\leq K$ represents the number of groups.
In this paper, we propose online algorithms for multiclass classification using partial labels. We propose two variants of Perceptron, called Avg Perceptron and Max Perceptron, to deal with partially labeled data. We also propose Avg Pegasos and Max Pegasos, which are extensions of the Pegasos algorithm. We provide mistake bounds for Avg Perceptron and a regret bound for Avg Pegasos. We show the effectiveness of the proposed approaches by experimenting on various datasets and comparing them with the standard Perceptron and Pegasos.
We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unknown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest.
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol. We formulate this problem as the $\epsilon$-multi-player multi-armed bandit problem, in which a set of players concurrently interact with a set of arms, and for each arm, the reward distributions for all players are similar but not necessarily identical. We develop an upper confidence bound-based algorithm, RobustAgg$(\epsilon)$, that adaptively aggregates rewards collected by different players. In the setting where an upper bound on the pairwise similarities of reward distributions between players is known, we achieve instance-dependent regret guarantees that depend on the amenability of information sharing across players. We complement these upper bounds with nearly matching lower bounds. In the setting where pairwise similarities are unknown, we provide a lower bound, as well as an algorithm that trades off minimax regret guarantees for adaptivity to unknown similarity structure.
