Data-dependent Generalization Bounds for Multi-class Classification

215 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yunwen Lei

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Yunwen Lei - Urun Dogan - Ding-Xuan Zhou

التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we study data-dependent generalization error bounds exhibiting a mild dependency on the number of classes, making them suitable for multi-class learning with a large number of label classes. The bounds generally hold for empirical multi-class risk minimization algorithms using an arbitrary norm as regularizer. Key to our analysis are new structural results for multi-class Gaussian complexities and empirical $ell_infty$-norm covering numbers, which exploit the Lipschitz continuity of the loss function with respect to the $ell_2$- and $ell_infty$-norm, respectively. We establish data-dependent error bounds in terms of complexities of a linear function class defined on a finite set induced by training examples, for which we show tight lower and upper bounds. We apply the results to several prominent multi-class learning machines, exhibiting a tighter dependency on the number of classes than the state of the art. For instance, for the multi-class SVM by Crammer and Singer (2002), we obtain a data-dependent bound with a logarithmic dependency which significantly improves the previous square-root dependency. Experimental results are reported to verify the effectiveness of our theoretical findings.

قيم البحث

444 - Cenk Baykal , Lucas Liebenwein , Igor Gilitschenski 2018

We present an efficient coresets-based neural network compression algorithm that sparsifies the parameters of a trained fully-connected neural network in a manner that provably approximates the networks output. Our approach is based on an importance sampling scheme that judiciously defines a sampling distribution over the neural network parameters, and as a result, retains parameters of high importance while discarding redundant ones. We leverage a novel, empirical notion of sensitivity and extend traditional coreset constructions to the application of compressing parameters. Our theoretical analysis establishes guarantees on the size and accuracy of the resulting compressed network and gives rise to generalization bounds that may provide new insights into the generalization properties of neural networks. We demonstrate the practical effectiveness of our algorithm on a variety of neural network configurations and real-world data sets.

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Multi-Class Classification from Single-Class Data with Confidences

170 - Yuzhou Cao , Lei Feng , Senlin Shu 2021

Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rig orous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an empirical risk minimization framework that is loss-/model-/optimizer-independent. Instead of constructing a boundary between the given class and other classes, our method can conduct discriminative classification between all the classes even if no data from the other classes are provided. We further theoretically and experimentally show that our method can be Bayes-consistent with a simple modification even if the provided confidences are highly noisy. Then, we provide an extension of our method for the case where data from a subset of all the classes are available. Experimental results demonstrate the effectiveness of our methods.

التعلم الآلي التعلم الالي

Multi-Class Classification from Noisy-Similarity-Labeled Data

100 - Songhua Wu , Xiaobo Xia , Tongliang Liu 2020

A similarity label indicates whether two instances belong to the same class while a class label shows the class of the instance. Without class labels, a multi-class classifier could be learned from similarity-labeled pairwise data by meta classificat ion learning. However, since the similarity label is less informative than the class label, it is more likely to be noisy. Deep neural networks can easily remember noisy data, leading to overfitting in classification. In this paper, we propose a method for learning from only noisy-similarity-labeled data. Specifically, to model the noise, we employ a noise transition matrix to bridge the class-posterior probability between clean and noisy data. We further estimate the transition matrix from only noisy data and build a novel learning system to learn a classifier which can assign noise-free class labels for instances. Moreover, we theoretically justify how our proposed method generalizes for learning classifiers. Experimental results demonstrate the superiority of the proposed method over the state-of-the-art method on benchmark-simulated and real-world noisy-label datasets.

التعلم الآلي التعلم الالي

Adaptive Base Class Boost for Multi-class Classification

267 - Ping Li 2008

We develop the concept of ABC-Boost (Adaptive Base Class Boost) for multi-class classification and present ABC-MART, a concrete implementation of ABC-Boost. The original MART (Multiple Additive Regression Trees) algorithm has been very successful in large-scale applications. For binary classification, ABC-MART recovers MART. For multi-class classification, ABC-MART considerably improves MART, as evaluated on several public data sets.

التعلم الآلي استرجاع المعلومات

Robust Classification under Class-Dependent Domain Shift

93 - Tigran Galstyan , Hrant Khachatrian , Greg Ver Steeg 2020

Investigation of machine learning algorithms robust to changes between the training and test distributions is an active area of research. In this paper we explore a special type of dataset shift which we call class-dependent domain shift. It is chara cterized by the following features: the input data causally depends on the label, the shift in the data is fully explained by a known variable, the variable which controls the shift can depend on the label, there is no shift in the label distribution. We define a simple optimization problem with an information theoretic constraint and attempt to solve it with neural networks. Experiments on a toy dataset demonstrate the proposed method is able to learn robust classifiers which generalize well to unseen domains.

التعلم الآلي التعلم الالي