The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime

نشر في Andrea Montanari بتاريخ 2019 والبحث باللغة English تحميل البحث

الملخص بالإنكليزية

Modern machine learning models are often so complex that they achieve vanishing classification error on the training set. Max-margin linear classifiers are among the simplest classification methods that have zero training error (with linearly separable data). Despite their simplicity, their high-dimensional behavior is not yet completely understood. We assume to be given i.i.d. data $(y_i,{boldsymbol x}_i)$, $ile n$ with ${boldsymbol x}_isim {sf N}(0,{boldsymbol Sigma})$ a $p$-dimensional feature vector, and $y_i in{+1,-1}$ a label whose distribution depends on a linear combination of the covariates $langle{boldsymboltheta}_*,{boldsymbol x}_irangle$. We consider the proportional asymptotics $n,ptoinfty$ with $p/nto psi$, and derive exact expressions for the limiting prediction error. Our asymptotic results match simulations already when $n,p$ are of the order of a few hundreds. We explore several choices for $({boldsymbol theta}_*,{boldsymbol Sigma})$, and show that the resulting generalization curve (test error error as a function of the overparametrization $psi=p/n$) is qualitatively different, depending on this choice. In particular we consider a specific structure of $({boldsymbol theta}_*,{boldsymbolSigma})$ that captures the behavior of nonlinear random feature models or, equivalently, two-layers neural networks with random first layer weights. In this case, we aim at classifying data $(y_i,{boldsymbol x}_i)$ with ${boldsymbol x}_iin{mathbb R}^d$ but we do so by first embedding them a $p$ dimensional feature space via ${boldsymbol x}_imapstosigma({boldsymbol W}{boldsymbol x}_i)$ and then finding a max-margin classifier in this space. We derive exact formulas in the proportional asymptotics $p,n,dtoinfty$ with $p/dtopsi_1$, $n/dtopsi_2$ and observe that the test error is minimized in the highly overparametrized regime $psi_1gg 0$.

تحميل البحث