Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume we are given i.i.d. data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with ${\boldsymbol x}_i\sim{\sf N}(0,{\boldsymbol \Sigma})$ a $p$-dimensional feature vector, and $y_i\in\{+1,-1\}$ a label whose distribution depends on a linear combination of the covariates $\langle{\boldsymbol\theta}_*,{\boldsymbol x}_i\rangle$. We consider the proportional asymptotics $n,p\to\infty$ with $p/n\to\psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundred. We explore several choices for $({\boldsymbol\theta}_*,{\boldsymbol\Sigma})$, and show that the resulting generalization curve (test error as a function of the overparametrization $\psi=p/n$) is qualitatively different depending on this choice. In particular, we consider a specific structure of $({\boldsymbol\theta}_*,{\boldsymbol\Sigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layer neural networks with random first-layer weights. In this case, we aim at classifying data $(y_i,{\boldsymbol x}_i)$ with ${\boldsymbol x}_i\in{\mathbb R}^d$, but we do so by first embedding them in a $p$-dimensional feature space via ${\boldsymbol x}_i\mapsto\sigma({\boldsymbol W}{\boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,d\to\infty$ with $p/d\to\psi_1$, $n/d\to\psi_2$, and observe that the test error is minimized in the highly overparametrized regime $\psi_1\gg 1$.
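For concreteness, a standard way to write the max-margin classifier and the prediction (test) error discussed above, given here as an illustration of the setup rather than quoted from the paper, is
\[
\hat{\boldsymbol\theta}^{\rm MM} := \arg\max_{\|{\boldsymbol\theta}\|_2=1}\,\min_{i\le n}\, y_i\langle{\boldsymbol\theta},{\boldsymbol x}_i\rangle\,,
\qquad
{\rm Err}(\hat{\boldsymbol\theta}^{\rm MM}) := {\mathbb P}\big(y\,\langle\hat{\boldsymbol\theta}^{\rm MM},{\boldsymbol x}\rangle\le 0\big)\,,
\]
where $(y,{\boldsymbol x})$ is a fresh sample from the same distribution as the training data. In the random feature setting, ${\boldsymbol x}_i$ is replaced by the embedded vector $\sigma({\boldsymbol W}{\boldsymbol x}_i)\in{\mathbb R}^p$, with ${\boldsymbol W}\in{\mathbb R}^{p\times d}$ the random first-layer weight matrix.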
Consider the classical supervised learning problem: we are given data $(y_i,{\boldsymbol x}_i)$, $i\le n$, with $y_i$ a response and ${\boldsymbol x}_i\in{\mathcal X}$ a covariates vector, and try to learn a model $f:{\mathcal X}\to{\mathbb R}$ to predict f
The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task un
This paper explores the information-theoretic limitations of graph property testing in zero-field Ising models. Instead of learning the entire graph structure, sometimes testing a basic graph property such as connectivity, cycle presence or maximum c
We consider the problem of cost-sensitive multiclass classification, where we would like to increase the sensitivity of an important class at the expense of a less important one. We adopt an {\em apportioned margin} framework to address this problem,
We study high-dimensional linear models with error-in-variables. Such models are motivated by various applications in econometrics, finance and genetics. These models are challenging because of the need to account for measurement errors to avoid non-