For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

78 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Akshay Rangamani

تاريخ النشر 2020

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Akshay Rangamani - Lorenzo Rosasco - Tomaso Poggio

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the average $mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $mbox{CV}_{loo}$ stability, which in turn is controlled by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and cardinality of the data go to infinity. Under the assumption of random kernel matrices, the corresponding test error should be expected to follow a double descent curve.

قيم البحث

70 - Tengyuan Liang , Benjamin Recht 2021

This paper provides elementary analyses of the regret and generalization of minimum-norm interpolating classifiers (MNIC). The MNIC is the function of smallest Reproducing Kernel Hilbert Space norm that perfectly interpolates a label pattern on a fin ite data set. We derive a mistake bound for MNIC and a regularized variant that holds for all data sets. This bound follows from elementary properties of matrix inverses. Under the assumption that the data is independently and identically distributed, the mistake bound implies that MNIC generalizes at a rate proportional to the norm of the interpolating solution and inversely proportional to the number of data points. This rate matches similar rates derived for margin classifiers and perceptrons. We derive several plausible generative models where the norm of the interpolating classifier is bounded or grows at a rate sublinear in $n$. We also show that as long as the population class conditional distributions are sufficiently separable in total variation, then MNIC generalizes with a fast rate.

التعلم الالي التعلم الآلي التحليل العددي

Spectral Norm of Random Kernel Matrices with Applications to Privacy

415 - Shiva Prasad Kasiviswanathan , Mark Rudelson 2015

Kernel methods are an extremely popular set of techniques used for many important machine learning and data analysis applications. In addition to having good practical performances, these methods are supported by a well-developed theory. Kernel metho ds use an implicit mapping of the input data into a high dimensional feature space defined by a kernel function, i.e., a function returning the inner product between the images of two data points in the feature space. Central to any kernel method is the kernel matrix, which is built by evaluating the kernel function on a given sample dataset. In this paper, we initiate the study of non-asymptotic spectral theory of random kernel matrices. These are n x n random matrices whose (i,j)th entry is obtained by evaluating the kernel function on $x_i$ and $x_j$, where $x_1,...,x_n$ are a set of n independent random high-dimensional vectors. Our main contribution is to obtain tight upper bounds on the spectral norm (largest eigenvalue) of random kernel matrices constructed by commonly used kernel functions based on polynomials and Gaussian radial basis. As an application of these results, we provide lower bounds on the distortion needed for releasing the coefficients of kernel ridge regression under attribute privacy, a general privacy notion which captures a large class of privacy definitions. Kernel ridge regression is standard method for performing non-parametric regression that regularly outperforms traditional regression approaches in various domains. Our privacy distortion lower bounds are the first for any kernel technique, and our analysis assumes realistic scenarios for the input, unlike all previous lower bounds for other release problems which only hold under very restrictive input settings.

التعلم الالي التشفير والأمن التعلم الآلي

Feature Elimination in Kernel Machines in moderately high dimensions

477 - Sayan Dasgupta , Yair Goldberg , Michael Kosorok 2013

We develop an approach for feature elimination in statistical learning with kernel machines, based on recursive elimination of features.We present theoretical properties of this method and show that it is uniformly consistent in finding the correct f eature space under certain generalized assumptions.We present four case studies to show that the assumptions are met in most practical situations and present simulation results to demonstrate performance of the proposed approach.

التعلم الالي

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

229 - Cencheng Shen , Joshua T. Vogelstein 2018

Distance-based tests, also called energy statistics, are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from kernel mean embeddings, are leading methods for two-sample and independen ce tests from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods for the population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-value, and better preserves the data structure upon transformation. Our results further advance the understanding in distance and kernel-based tests, streamline the code base for implementing these tests, and enable a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.

التعلم الالي التعلم الآلي

Exponential Machines

99 - Alexander Novikov , Mikhail Trofimov , Ivan Oseledets 2016

Modeling interactions between features improves the performance of machine learning solutions in many domains (e.g. recommender systems or sentiment analysis). In this paper, we introduce Exponential Machines (ExM), a predictor that models all intera ctions of every order. The key idea is to represent an exponentially large tensor of parameters in a factorized format called Tensor Train (TT). The Tensor Train format regularizes the model and lets you control the number of underlying parameters. To train the model, we develop a stochastic Riemannian optimization procedure, which allows us to fit tensors with 2^160 entries. We show that the model achieves state-of-the-art performance on synthetic data with high-order interactions and that it works on par with high-order factorization machines on a recommender system dataset MovieLens 100K.

التعلم الالي التعلم الآلي