
Supervised classification via minimax probabilistic transformations

Posted by Santiago Mazuelas
Publication date: 2019
Paper language: English





Conventional techniques for supervised classification constrain the classification rules considered and use surrogate losses in place of the classification 0-1 loss. Favored families of classification rules are those that enjoy parametric representations suitable for surrogate loss minimization, and low-complexity properties suitable for overfitting control. This paper presents classification techniques based on robust risk minimization (RRM) that we call linear probabilistic classifiers (LPCs). The proposed techniques consider unconstrained classification rules, optimize the classification 0-1 loss, and provide performance bounds during learning. LPCs enable efficient learning by using linear optimization, and avoid overfitting by using RRM over polyhedral uncertainty sets of distributions. We also provide finite-sample generalization bounds for LPCs and show their competitive performance with respect to state-of-the-art techniques on benchmark datasets.
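As a rough illustration of the RRM principle described above (not the paper's exact formulation; the constraint matrix $A$ and vector $b$ below are generic placeholders), the classifier minimizes the worst-case expected 0-1 loss over a polyhedral uncertainty set of distributions, so the inner maximization is a linear program and duality reduces learning to a single linear optimization:

$$
h^{*} \in \arg\min_{h}\; \max_{p \in \mathcal{U}}\; \mathbb{E}_{(x,y)\sim p}\big[\ell_{0\text{-}1}(h(x),y)\big],
\qquad
\mathcal{U} = \{\, p \,:\, A p \le b \,\}.
$$

The optimal value of this minimax problem is the kind of quantity that yields performance bounds already available at learning time, as mentioned in the abstract.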


Read also

Different types of training data have led to numerous schemes for supervised classification. Current learning techniques are tailored to one specific scheme and cannot handle general ensembles of training data. This paper presents a unifying framework for supervised classification with general ensembles of training data, and proposes the learning methodology of generalized robust risk minimization (GRRM). The paper shows how current and novel supervision schemes can be addressed under the proposed framework by representing the relationship between test and training examples via probabilistic transformations. The results show that GRRM can handle different types of training data in a unified manner, and enable new supervision schemes that aggregate general ensembles of training data.
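A minimal sketch of the probabilistic-transformation idea mentioned in this abstract (the kernel $T$ is illustrative notation, not necessarily the paper's): each training example $z$ is modeled as the output of a probabilistic transformation applied to a sample from the test distribution,

$$
p_{\mathrm{train}}(z) \;=\; \sum_{(x,y)} T(z \mid x, y)\, p_{\mathrm{test}}(x, y),
$$

so that $T(z \mid x,y) = \mathbb{1}\{z = (x,y)\}$ recovers standard supervised samples, while transformations that randomize the label or hide it would model noisy-label and unlabeled data, respectively.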
Supervised classification techniques use training samples to find classification rules with small expected 0-1 loss. Conventional methods achieve efficient learning and out-of-sample generalization by minimizing surrogate losses over specific families of rules. This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss and family of rules. MRCs achieve efficient learning and out-of-sample generalization by minimizing worst-case expected 0-1 loss w.r.t. uncertainty sets that are defined by linear constraints and include the true underlying distribution. In addition, the MRC learning stage provides performance guarantees in the form of tight lower and upper bounds on the expected 0-1 loss. We also present finite-sample generalization bounds for MRCs in terms of training size and smallest minimax risk, and show their competitive classification performance w.r.t. state-of-the-art techniques on benchmark datasets.
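To make the "uncertainty sets defined by linear constraints" concrete (the feature map $\Phi$, expectation estimate $\tau$, and confidence vector $\lambda$ are generic placeholders rather than the paper's exact notation), one can picture sets of the form

$$
\mathcal{U} \;=\; \big\{\, p \;:\; \big|\, \mathbb{E}_{p}[\Phi(x,y)] - \tau \,\big| \preceq \lambda \,\big\},
$$

which contain the true underlying distribution with high confidence when $\tau$ is estimated from training samples; the optimal value of the corresponding worst-case minimization then underlies the tight bounds on the expected 0-1 loss mentioned above.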
Farzan Farnia, David Tse (2016)
Given a task of predicting $Y$ from $X$, a loss function $L$, and a set of probability distributions $\Gamma$ on $(X,Y)$, what is the optimal decision rule minimizing the worst-case expected loss over $\Gamma$? In this paper, we address this question by introducing a generalization of the principle of maximum entropy. Applying this principle to sets of distributions whose marginal on $X$ is constrained to be the empirical marginal from the data, we develop a general minimax approach for supervised learning problems. While for some loss functions, such as squared-error and log loss, the minimax approach rederives well-known regression models, for the 0-1 loss it results in a new linear classifier which we call the maximum entropy machine. The maximum entropy machine minimizes the worst-case 0-1 loss over the structured set of distributions and, in our numerical experiments, can outperform other well-known linear classifiers such as SVM. We also prove a bound on the worst-case generalization error of the minimax approach.
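In the abstract's notation, the resulting decision rule solves a minimax problem of the form below; the only constraint shown on $\Gamma$ is the one stated above (the marginal on $X$ equals the empirical marginal $\hat{p}_X$), with any further structure on $\Gamma$ left unspecified here:

$$
\delta^{*} \in \arg\min_{\delta}\; \max_{p \in \Gamma}\; \mathbb{E}_{p}\big[ L\big(Y, \delta(X)\big) \big],
\qquad
\Gamma \subseteq \{\, p \,:\, p_{X} = \hat{p}_{X} \,\}.
$$

Under the 0-1 loss this minimax rule is the maximum entropy machine, while squared-error and log loss recover familiar regression models, as noted above.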
Training a classifier over a large number of classes, known as extreme classification, has become a topic of major interest with applications in technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes $C$, which often is prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for drastically enhancing the gradient signal by drawing negative samples from an adversarial model that mimics the data distribution. Our contributions are three-fold: (i) an adversarial sampling mechanism that produces negative samples at a cost only logarithmic in $C$, thus still resulting in cheap gradient updates; (ii) a mathematical proof that this adversarial sampling minimizes the gradient variance while any bias due to non-uniform sampling can be removed; (iii) experimental results on large-scale data sets that show a reduction of the training time by an order of magnitude relative to several competitive baselines.
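For context, a standard importance-weighted sampled-softmax gradient (a generic estimator, not the paper's specific adversarial sampler) with logits $o_{c}(x)$, a proposal distribution $q$ over classes, and a small sampled set $S$ of negatives takes the form

$$
\nabla_{\theta} \log p_{\theta}(y \mid x)
\;\approx\;
\nabla_{\theta} o_{y}(x)
\;-\;
\sum_{c \in S \cup \{y\}}
\frac{e^{o_{c}(x)} / q(c)}{\sum_{c' \in S \cup \{y\}} e^{o_{c'}(x)} / q(c')}
\,\nabla_{\theta} o_{c}(x).
$$

The paper's proposal replaces a uniform $q$ with an adversarial model that mimics the data distribution, keeps the sampling cost logarithmic in $C$, and removes the bias introduced by non-uniform sampling.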
We exploit a recently derived inversion scheme for arbitrary deep neural networks to develop a new semi-supervised learning framework that applies to a wide range of systems and problems. The approach outperforms current state-of-the-art methods on MNIST, reaching $99.14\%$ test set accuracy while using $5$ labeled examples per class. Experiments with one-dimensional signals highlight the generality of the method. Importantly, our approach is simple, efficient, and requires no change in the deep network architecture.
