The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels, such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent with the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined on tree graphs, for which we prove consistency; these are the first losses shown to be consistent for Max-Margin beyond the binary setting. We finally address these limitations by correcting the concept of Max-Margin and introducing the Restricted-Max-Margin, where the maximization of the loss-augmented scores is maintained, but performed over a subset of the original domain. The resulting loss is also a generalization of the binary support vector machine and is consistent under milder conditions on the discrete loss.
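For concreteness, the two surrogates discussed above can be written schematically as follows (notation mine, not necessarily the paper's): with score function $f(x,y)$, discrete loss $\ell$, output set $\mathcal{Y}$, and a restricted subset $\mathcal{Y}_r(x,y)\subseteq\mathcal{Y}$ whose precise choice is the paper's contribution,
\[
L_{\mathrm{MM}}(f;x,y) = \max_{y'\in\mathcal{Y}}\big[\ell(y,y') + f(x,y') - f(x,y)\big],
\qquad
L_{\mathrm{RMM}}(f;x,y) = \max_{y'\in\mathcal{Y}_r(x,y)}\big[\ell(y,y') + f(x,y') - f(x,y)\big].
\]
In the binary case with $\mathcal{Y}=\{-1,+1\}$ and the 0-1 loss, both reduce to the usual SVM hinge; the display above is only a sketch of the structure, not the paper's exact construction of $\mathcal{Y}_r$.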
Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs. Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by defining the learning problem in terms of a max-min margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$). We prove consistency and finite sample generalization bounds for $M^4N$ and provide an explicit algorithm to compute the estimator. The algorithm achieves a generalization error of $O(1/\sqrt{n})$ for a total cost of $O(n)$ projection-oracle calls (which have at most the same cost as the max-oracle from $M^3N$). Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
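As background for the formulation above, the $M^3N$ estimator the paper starts from minimizes a regularized structured hinge (notation mine; $f(x,y)$ is the score, $\ell$ the task loss, $\lambda>0$ the regularization strength):
\[
\hat f \in \arg\min_{f}\ \frac{1}{n}\sum_{i=1}^{n}\max_{y'\in\mathcal{Y}}\big[\ell(y_i,y') + f(x_i,y') - f(x_i,y_i)\big] \;+\; \frac{\lambda}{2}\,\|f\|^2 .
\]
$M^4N$ replaces the inner maximization with a max-min margin construction whose exact form is given in the paper; the display above only fixes the baseline being corrected.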
Structured output prediction is an important machine learning problem both in theory and practice, and the max-margin Markov network ($M^3N$) is an effective approach. All state-of-the-art algorithms for optimizing $M^3N$ objectives take at least $O(1/\epsilon)$ iterations to find an $\epsilon$-accurate solution. Recent results in structured optimization suggest that faster rates are possible by exploiting the structure of the objective function. Towards this end, \citet{Nesterov05} proposed an excessive gap reduction technique based on Euclidean projections which converges in $O(1/\sqrt{\epsilon})$ iterations on strongly convex functions. Unfortunately, when applied to $M^3N$s, this approach does not admit graphical model factorization which, as in many existing algorithms, is crucial for keeping the cost per iteration tractable. In this paper, we present a new excessive gap reduction technique based on Bregman projections which admits graphical model factorization naturally, and converges in $O(1/\sqrt{\epsilon})$ iterations. Compared with existing algorithms, the convergence rate of our method has better dependence on $\epsilon$ and other parameters of the problem, and can be easily kernelized.
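As a side illustration of why the choice of projection matters (this is only the projection primitive, not the paper's factorized excessive-gap algorithm): under the entropic (KL) Bregman geometry, the projection onto the probability simplex is a simple renormalization, and more generally takes a softmax form that factorizes over graphical-model marginals, whereas the Euclidean projection needs a global sort-and-threshold step. A minimal numpy sketch:

    import numpy as np

    def bregman_project_simplex(q):
        # KL (entropic) Bregman projection of a strictly positive vector onto the simplex:
        # argmin_{p in simplex} KL(p || q) is just q renormalized.
        return q / q.sum()

    def euclidean_project_simplex(v):
        # Euclidean projection onto the simplex via the standard sort-and-threshold rule.
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1.0)
        return np.maximum(v + theta, 0.0)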
Few-shot object detection has made substantial progress by representing novel class objects using the feature representation learned upon a set of base class objects. However, an implicit contradiction between novel class classification and representation is unfortunately ignored. On the one hand, to achieve accurate novel class classification, the distributions of any two base classes must be far away from each other (max-margin). On the other hand, to precisely represent novel classes, the distributions of base classes should be close to each other to reduce the intra-class distance of novel classes (min-margin). In this paper, we propose a class margin equilibrium (CME) approach, with the aim to optimize both feature space partition and novel class reconstruction in a systematic way. CME first converts the few-shot detection problem to a few-shot classification problem by using a fully connected layer to decouple localization features. CME then reserves adequate margin space for novel classes by introducing a simple yet effective class margin loss during feature learning. Finally, CME pursues margin equilibrium by disturbing the features of novel class instances in an adversarial min-max fashion. Experiments on Pascal VOC and MS-COCO datasets show that CME significantly improves upon two baseline detectors (by up to $3\sim 5\%$ on average), achieving state-of-the-art performance. Code is available at https://github.com/Bohao-Lee/CME.
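The abstract does not spell out the class margin loss, so the following is only a hypothetical sketch of the general idea (a margin penalty that keeps base-class feature centroids apart), written in PyTorch; it is not the CME loss from the paper or its released code:

    import torch

    def class_margin_loss(features, labels, margin=1.0):
        # Hypothetical margin penalty between base-class centroids (illustration only):
        # penalize pairs of class means that are closer than `margin`.
        classes = labels.unique()
        protos = torch.stack([features[labels == c].mean(dim=0) for c in classes])
        dists = torch.cdist(protos, protos)                 # pairwise centroid distances
        mask = ~torch.eye(len(classes), dtype=torch.bool)   # ignore self-distances
        return torch.clamp(margin - dists[mask], min=0).mean()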
We consider the problem of cost-sensitive multiclass classification, where we would like to increase the sensitivity of an important class at the expense of a less important one. We adopt an \emph{apportioned margin} framework to address this problem, which enables an efficient margin shift between classes that share the same boundary. The decision boundary between all pairs of classes divides the margin between them in accordance with a given prioritization vector, which yields a tighter error bound for the important classes while also reducing the overall out-of-sample error. In addition to demonstrating an efficient implementation of our framework, we derive generalization bounds, demonstrate Fisher consistency, adapt the framework to Mercer kernels and to neural networks, and report promising empirical results on all accounts.
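Schematically, with per-class prioritization weights $\nu_y>0$ and class scores $f_y(x)$ (notation mine, not necessarily the paper's exact constraint), an apportioned-margin requirement can be written as
\[
f_{y}(x) - f_{y'}(x) \;\ge\; \nu_{y} \qquad \text{for every example } x \text{ of class } y \text{ and every } y'\neq y,
\]
so that for any pair of classes $(i,j)$ the boundary $\{x: f_i(x)=f_j(x)\}$ splits a margin band of total width $\nu_i+\nu_j$ in the ratio $\nu_i:\nu_j$; a hinge relaxation of these constraints then gives a cost-sensitive surrogate.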
Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume to be given i.i.d. data $(y_i,\boldsymbol{x}_i)$, $i\le n$, with $\boldsymbol{x}_i\sim\mathsf{N}(0,\boldsymbol{\Sigma})$ a $p$-dimensional feature vector, and $y_i\in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle\boldsymbol{\theta}_*,\boldsymbol{x}_i\rangle$. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to\psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundreds. We explore several choices for $(\boldsymbol{\theta}_*,\boldsymbol{\Sigma})$, and show that the resulting generalization curve (test error as a function of the overparametrization $\psi=p/n$) is qualitatively different depending on this choice. In particular, we consider a specific structure of $(\boldsymbol{\theta}_*,\boldsymbol{\Sigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data $(y_i,\boldsymbol{x}_i)$ with $\boldsymbol{x}_i\in\mathbb{R}^d$, but we do so by first embedding them in a $p$-dimensional feature space via $\boldsymbol{x}_i\mapsto\sigma(\boldsymbol{W}\boldsymbol{x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,d\to\infty$ with $p/d\to\psi_1$, $n/d\to\psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1\gg 0$.
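The random-features setup described at the end can be mimicked with a short simulation; the noisy linear teacher, the ReLU activation, and the large-$C$ soft-margin approximation of the max-margin classifier are my assumptions, and this does not reproduce the paper's exact asymptotic formulas:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    d, n, n_test = 30, 300, 2000
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)

    def sample(m):
        # Gaussian covariates; label distribution depends on <theta, x> (assumed noisy linear teacher).
        X = rng.standard_normal((m, d))
        y = np.sign(X @ theta + 0.3 * rng.standard_normal(m))
        return X, y

    Xtr, ytr = sample(n)
    Xte, yte = sample(n_test)

    for p in [50, 100, 300, 600, 1200]:               # psi_1 = p/d controls overparametrization
        W = rng.standard_normal((p, d)) / np.sqrt(d)
        phi = lambda X: np.maximum(X @ W.T, 0.0)      # embedding x -> sigma(Wx), ReLU sigma here
        clf = SVC(kernel="linear", C=1e6)             # large C approximates the max-margin fit
        clf.fit(phi(Xtr), ytr)
        err = np.mean(clf.predict(phi(Xte)) != yte)
        print(f"p={p:5d}  psi_1={p/d:6.1f}  test error={err:.3f}")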