The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime


Abstract in English

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume to be given i.i.d. data $(y_i,{boldsymbol x}_i)$, $ile n$ with ${boldsymbol x}_isim {sf N}(0,{boldsymbol Sigma})$ a $p$-dimensional feature vector, and $y_i in{+1,-1}$ a label whose distribution depends on a linear combination of the covariates $langle{boldsymboltheta}_*,{boldsymbol x}_irangle$. We consider the proportional asymptotics $n,ptoinfty$ with $p/nto psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundreds. We explore several choices for $({boldsymbol theta}_*,{boldsymbol Sigma})$, and show that the resulting generalization curve (test error error as a function of the overparametrization $psi=p/n$) is qualitatively different, depending on this choice. In particular we consider a specific structure of $({boldsymbol theta}_*,{boldsymbolSigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layers neural networks with random first layer weights. In this case, we aim at classifying data $(y_i,{boldsymbol x}_i)$ with ${boldsymbol x}_iin{mathbb R}^d$ but we do so by first embedding them a $p$ dimensional feature space via ${boldsymbol x}_imapstosigma({boldsymbol W}{boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,dtoinfty$ with $p/dtopsi_1$, $n/dtopsi_2$ and observe that the test error is minimized in the highly overparametrized regime $psi_1gg 0$.

Download