On Generalizing the C-Bound to the Multiclass and Multi-label Settings

642 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Emilie Morvant

تاريخ النشر 2015

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Francois Laviolette - Emilie Morvant - Liva Ralaivola

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The C-bound, introduced in Lacasse et al., gives a tight upper bound on the risk of a binary majority vote classifier. In this work, we present a first step towards extending this work to more complex outputs, by providing generalizations of the C-bound to the multiclass and multi-label settings.

قيم البحث

66 - Shuang Qiu , Tingjin Luo , Jieping Ye 2017

We study an extreme scenario in multi-label learning where each training instance is endowed with a single one-bit label out of multiple labels. We formulate this problem as a non-trivial special case of one-bit rank-one matrix sensing and develop an efficient non-convex algorithm based on alternating power iteration. The proposed algorithm is able to recover the underlying low-rank matrix model with linear convergence. For a rank-$k$ model with $d_1$ features and $d_2$ classes, the proposed algorithm achieves $O(epsilon)$ recovery error after retrieving $O(k^{1.5}d_1 d_2/epsilon)$ one-bit labels within $O(kd)$ memory. Our bound is nearly optimal in the order of $O(1/epsilon)$. This significantly improves the state-of-the-art sampling complexity of one-bit multi-label learning. We perform experiments to verify our theory and evaluate the performance of the proposed algorithm.

التعلم الالي التعلم الآلي

Regularizing Model Complexity and Label Structure for Multi-Label Text Classification

72 - Bingyu Wang , Cheng Li , Virgil Pavlu 2017

Multi-label text classification is a popular machine learning task where each document is assigned with multiple relevant labels. This task is challenging due to high dimensional features and correlated labels. Multi-label text classifiers need to be carefully regularized to prevent the severe over-fitting in the high dimensional space, and also need to take into account label dependencies in order to make accurate predictions under uncertainty. We demonstrate significant and practical improvement by carefully regularizing the model complexity during training phase, and also regularizing the label search space during prediction phase. Specifically, we regularize the classifier training using Elastic-net (L1+L2) penalty for reducing model complexity/size, and employ early stopping to prevent overfitting. At prediction time, we apply support inference to restrict the search space to label sets encountered in the training set, and F-optimizer GFM to make optimal predictions for the F1 metric. We show that although support inference only provides density estimations on existing label combinations, when combined with GFM predictor, the algorithm can output unseen label combinations. Taken collectively, our experiments show state of the art results on many benchmark datasets. Beyond performance and practical contributions, we make some interesting observations. Contrary to the prior belief, which deems support inference as purely an approximate inference procedure, we show that support inference acts as a strong regularizer on the label prediction structure. It allows the classifier to take into account label dependencies during prediction even if the classifiers had not modeled any label dependencies during training.

التعلم الالي التعلم الآلي

On the Generalization of the C-Bound to Structured Output Ensemble Methods

447 - Franc{c}ois Laviolette , Emilie Morvant , Liva Ralaivola 2014

This paper generalizes an important result from the PAC-Bayesian literature for binary classification to the case of ensemble methods for structured outputs. We prove a generic version of the Cbound, an upper bound over the risk of models expressed a s a weighted majority vote that is based on the first and second statistical moments of the votes margin. This bound may advantageously $(i)$ be applied on more complex outputs such as multiclass labels and multilabel, and $(ii)$ allow to consider margin relaxations. These results open the way to develop new ensemble methods for structured output prediction with PAC-Bayesian guarantees.

التعلم الالي

Towards Interpretable Deep Extreme Multi-label Learning

74 - Yihuang Kang , I-Ling Cheng , Wenjui Mao 2019

Many Machine Learning algorithms, such as deep neural networks, have long been criticized for being black-boxes-a kind of models unable to provide how it arrive at a decision without further efforts to interpret. This problem has raised concerns on m odel applications trust, safety, nondiscrimination, and other ethical issues. In this paper, we discuss the machine learning interpretability of a real-world application, eXtreme Multi-label Learning (XML), which involves learning models from annotated data with many pre-defined labels. We propose a two-step XML approach that combines deep non-negative autoencoder with other multi-label classifiers to tackle different data applications with a large number of labels. Our experimental result shows that the proposed approach is able to cope with many-label problems as well as to provide interpretable label hierarchies and dependencies that helps us understand how the model recognizes the existences of objects in an image.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

An adaptive multiclass nearest neighbor classifier

136 - Nikita Puchkin , Vladimir Spokoiny 2018

We consider a problem of multiclass classification, where the training sample $S_n = {(X_i, Y_i)}_{i=1}^n$ is generated from the model $mathbb P(Y = m | X = x) = eta_m(x)$, $1 leq m leq M$, and $eta_1(x), dots, eta_M(x)$ are unknown $alpha$-Holder co ntinuous functions.Given a test point $X$, our goal is to predict its label. A widely used $mathsf k$-nearest-neighbors classifier constructs estimates of $eta_1(X), dots, eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $mathsf k$, which may become tricky in some situations. In our solution, we fix several integers $n_1, dots, n_K$, compute corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $alpha$ and to the margin and establish rates of convergence under mild assumptions.

التعلم الالي التعلم الآلي