
Reproducing Kernel Hilbert Space, Mercer's Theorem, Eigenfunctions, Nyström Method, and Use of Kernels in Machine Learning: Tutorial and Survey

Posted by Benyamin Ghojogh
Publication date: 2021
Research language: English





This is a tutorial and survey paper on kernels, kernel methods, and related fields. We start by reviewing the history of kernels in functional analysis and machine learning. Then, the Mercer kernel, Hilbert and Banach spaces, Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem and its proof, frequently used kernels, kernel construction from a distance metric, important classes of kernels (including bounded, integrally positive definite, universal, stationary, and characteristic kernels), kernel centering and normalization, and eigenfunctions are explained in detail. Then, we introduce the types of use of kernels in machine learning, including kernel methods (such as kernel support vector machines), kernel learning by semi-definite programming, the Hilbert-Schmidt independence criterion, maximum mean discrepancy, kernel mean embedding, and kernel dimensionality reduction. We also cover the rank and factorization of the kernel matrix as well as the approximation of eigenfunctions and kernels using the Nyström method. This paper can be useful for various fields of science, including machine learning, dimensionality reduction, functional analysis in mathematics, and mathematical physics in quantum mechanics.
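As a concrete illustration of two of the surveyed operations, the sketch below builds a Gaussian (RBF) kernel matrix and applies the standard double-centering formula $K_c = HKH$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, which corresponds to centering the feature map in the RKHS. It is a minimal NumPy sketch under our own choice of data and bandwidth, not code taken from the paper.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq_dists)

def center_kernel(K):
    # Double centering K_c = H K H with H = I - (1/n) 1 1^T, i.e. the
    # kernel of the mean-centered feature map in the RKHS.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

X = np.random.default_rng(0).normal(size=(50, 3))
K_centered = center_kernel(rbf_kernel(X, X))
print(np.allclose(K_centered.sum(axis=0), 0.0))  # centered kernel has zero column sums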


Read also

The Gaussian kernel plays a central role in machine learning, uncertainty quantification and scattered data approximation, but has received relatively little attention from a numerical analysis standpoint. The basic problem of finding an algorithm for efficient numerical integration of functions reproduced by Gaussian kernels has not been fully solved. In this article we construct two classes of algorithms that use $N$ evaluations to integrate $d$-variate functions reproduced by Gaussian kernels and prove the exponential or super-algebraic decay of their worst-case errors. In contrast to earlier work, no constraints are placed on the length-scale parameter of the Gaussian kernel. The first class of algorithms is obtained via an appropriate scaling of the classical Gauss-Hermite rules. For these algorithms we derive lower and upper bounds on the worst-case error of the forms $\exp(-c_1 N^{1/d}) N^{1/(4d)}$ and $\exp(-c_2 N^{1/d}) N^{-1/(4d)}$, respectively, for positive constants $c_1 > c_2$. The second class of algorithms we construct is more flexible and uses worst-case optimal weights for points that may be taken as a nested sequence. For these algorithms we derive upper bounds of the form $\exp(-c_3 N^{1/(2d)})$ for a positive constant $c_3$.
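For readers unfamiliar with the building block being rescaled above, the sketch below applies the classical Gauss-Hermite rule (via numpy.polynomial.hermite.hermgauss) to compute a Gaussian expectation; the change of variables $x = \sqrt{2}\,\sigma t$ maps the rule's weight $e^{-t^2}$ to the $N(0, \sigma^2)$ density. This only illustrates the standard rule, not the paper's specific scaling or its error bounds.

import numpy as np

def gauss_hermite_expectation(f, sigma, n_nodes):
    # Approximate E[f(X)] for X ~ N(0, sigma^2) with the classical Gauss-Hermite
    # rule, whose nodes/weights target the weight exp(-t^2); substituting
    # x = sqrt(2) * sigma * t and dividing by sqrt(pi) gives the Gaussian expectation.
    t, w = np.polynomial.hermite.hermgauss(n_nodes)
    return np.sum(w * f(np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)

# Example: E[cos(X)] for X ~ N(0, 1) equals exp(-1/2).
print(gauss_hermite_expectation(np.cos, sigma=1.0, n_nodes=20), np.exp(-0.5))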
We provide the first mathematically complete derivation of the Nyström method for low-rank approximation of indefinite kernels and propose an efficient method for finding an approximate eigendecomposition of such kernel matrices. Building on this result, we devise highly scalable methods for learning in reproducing kernel Kreĭn spaces. The devised approaches provide a principled and theoretically well-founded means to tackle large scale learning problems with indefinite kernels. The main motivation for our work comes from problems with structured representations (e.g., graphs, strings, time-series), where it is relatively easy to devise a pairwise (dis)similarity function based on intuition and/or knowledge of domain experts. Such functions are typically not positive definite and it is often well beyond the expertise of practitioners to verify this condition. The effectiveness of the devised approaches is evaluated empirically using indefinite kernels defined on structured and vectorial data representations.
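To make the starting point concrete, here is the standard Nyström approximate eigendecomposition for a positive semi-definite kernel matrix (sample landmark columns, eigendecompose the small block, extend and rescale). It is a generic sketch of the classical construction only; the paper's contribution, the extension of this idea to indefinite kernels, is not reproduced here.

import numpy as np

def nystrom_eig(K, m, rng):
    # Classical Nystrom approximation of the eigendecomposition of an n x n
    # PSD kernel matrix K from m landmark columns.
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    K_nm = K[:, idx]                       # n x m block of sampled columns
    K_mm = K_nm[idx, :]                    # m x m landmark kernel matrix
    lam_m, U_m = np.linalg.eigh(K_mm)
    keep = lam_m > 1e-10 * lam_m.max()     # drop numerically zero eigenvalues
    lam_m, U_m = lam_m[keep], U_m[:, keep]
    U_full = np.sqrt(m / n) * (K_nm @ U_m) / lam_m   # extend eigenvectors to all n points
    lam_full = (n / m) * lam_m                       # rescale eigenvalues
    return lam_full, U_full

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
lam, U = nystrom_eig(K, m=50, rng=rng)
K_approx = (U * lam) @ U.T                 # rank-m reconstruction K_nm K_mm^+ K_mn
print(np.linalg.norm(K - K_approx) / np.linalg.norm(K))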
In this paper, we study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with the gradient descent and (iterative) kernel ridge regression algorithms. We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. This method requires only one assumption on the regression function: it belongs to an RKHS. The proposed rule is proved to be minimax-optimal over different types of kernel spaces, including finite-rank and Sobolev smoothness classes. The proof is derived from the fixed-point analysis of the localized Rademacher complexities, which is a standard technique for obtaining optimal rates in the nonparametric regression literature. In addition, we present simulation results on artificial datasets that show the comparable performance of the designed rule with respect to other stopping rules such as the one determined by V-fold cross-validation.
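To give a feel for the kind of rule being analysed, the sketch below runs gradient descent in the RKHS for kernel least squares and stops at the first iteration whose mean squared residual falls below an assumed known noise level. This is only a rough reading of a discrepancy-type stopping rule; the paper's exact rule, constants, and assumptions may differ.

import numpy as np

def gd_with_discrepancy_stop(K, y, sigma2, step=1.0, max_iter=10_000):
    # Gradient descent in the RKHS for the least-squares loss (1/2n)||y - K a||^2:
    # the functional gradient step is a <- a + (step/n) (y - K a).
    # Stop once the mean squared residual drops below the noise level sigma2
    # (a discrepancy-type rule; only a sketch of the idea).
    n = len(y)
    alpha = np.zeros(n)
    for t in range(max_iter):
        residual = y - K @ alpha
        if np.mean(residual ** 2) <= sigma2:
            return alpha, t
        alpha += (step / n) * residual
    return alpha, max_iter

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
K = np.exp(-((X - X.T) ** 2) / 0.01)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)
alpha, t_stop = gd_with_discrepancy_stop(K, y, sigma2=0.01)
print("stopped at iteration", t_stop)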
The geometry of spaces with indefinite inner product, also known as Krein spaces, is a basic tool for developing operator theory therein. In the present paper we establish a link between this geometry and the algebraic theory of *-semigroups. The link goes via positive definite functions and the reproducing kernel Hilbert spaces related to them. Our concern is in describing the properties of elements of the semigroup that determine shift operators serving as Pontryagin fundamental symmetries.
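For reference, the standard notion underlying this link, stated here in one common convention (not necessarily the paper's): a function $\varphi : S \to \mathbb{C}$ on a $*$-semigroup $S$ is positive definite if $\sum_{j,k=1}^{n} c_j \overline{c_k}\, \varphi(s_k^* s_j) \ge 0$ for all finite $s_1,\dots,s_n \in S$ and $c_1,\dots,c_n \in \mathbb{C}$; equivalently, $K(s,t) := \varphi(t^* s)$ is a positive definite kernel on $S \times S$, and its reproducing kernel Hilbert space is the space on which the shift operators mentioned above act.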
We investigate the connections between sparse approximation methods for making kernel methods and Gaussian processes (GPs) scalable to massive data, focusing on the Nyström method and the Sparse Variational Gaussian Processes (SVGP). While sparse approximation methods for GPs and kernel methods share some algebraic similarities, the literature lacks a deep understanding of how and why they are related. This is a possible obstacle for communication between the GP and kernel communities, making it difficult to transfer results from one side to the other. Our motivation is to remove this possible obstacle by clarifying the connections between the sparse approximations for GPs and kernel methods. In this work, we study the two popular approaches, the Nyström and SVGP approximations, in the context of a regression problem, and establish various connections and equivalences between them. In particular, we provide an RKHS interpretation of the SVGP approximation and show that the Evidence Lower Bound of the SVGP contains the objective function of the Nyström approximation, revealing the origin of the algebraic equivalence between the two approaches. We also study recently established convergence results for the SVGP and how they are related to the approximation quality of the Nyström method.
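As a minimal numerical illustration of the flavor of equivalence discussed here: with a Gaussian likelihood and the optimal variational distribution, the Titsias-style SVGP posterior mean at the training inputs coincides algebraically with Nyström-approximated kernel ridge regression. The sketch below checks this identity on synthetic data; it is offered under these stated assumptions and is not a reproduction of the paper's derivations.

import numpy as np

rng = np.random.default_rng(0)
n, m, sigma2 = 200, 20, 0.01
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.normal(size=n)
Z = X[rng.choice(n, size=m, replace=False)]          # inducing / landmark inputs

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

K_nm = rbf(X, Z)
K_mm = rbf(Z, Z) + 1e-10 * np.eye(m)                 # small jitter for stability

# SVGP (Titsias) posterior mean at the training inputs with optimal q(u):
#   mu = K_nm (sigma2 * K_mm + K_mn K_nm)^{-1} K_mn y
mu_svgp = K_nm @ np.linalg.solve(sigma2 * K_mm + K_nm.T @ K_nm, K_nm.T @ y)

# Nystrom-approximated kernel ridge regression with K_hat = K_nm K_mm^{-1} K_mn:
K_hat = K_nm @ np.linalg.solve(K_mm, K_nm.T)
mu_nystrom = K_hat @ np.linalg.solve(K_hat + sigma2 * np.eye(n), y)

print(np.max(np.abs(mu_svgp - mu_nystrom)))          # agrees up to numerical error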
