ترغب بنشر مسار تعليمي؟ اضغط هنا

Understanding Kernel Ridge Regression: Common behaviors from simple functions to density functionals

381   0   0.0 ( 0 )
 نشر من قبل Kevin Vu
 تاريخ النشر 2015
والبحث باللغة English




اسأل ChatGPT حول البحث

Accurate approximations to density functionals have recently been obtained via machine learning (ML). By applying ML to a simple function of one variable without any random sampling, we extract the qualitative dependence of errors on hyperparameters. We find universal features of the behavior in extreme limits, including both very small and very large length scales, and the noise-free limit. We show how such features arise in ML models of density functionals.



قيم البحث

اقرأ أيضاً

Kernel ridge regression is used to approximate the kinetic energy of non-interacting fermions in a one-dimensional box as a functional of their density. The properties of different kernels and methods of cross-validation are explored, and highly accu rate energies are achieved. Accurate {em constrained optimal densities} are found via a modified Euler-Lagrange constrained minimization of the total energy. A projected gradient descent algorithm is derived using local principal component analysis. Additionally, a sparse grid representation of the density can be used without degrading the performance of the methods. The implications for machine-learned density functional approximations are discussed.
89 - Yifan Chen , Yun Yang 2021
Nystrom approximation is a fast randomized method that rapidly solves kernel ridge regression (KRR) problems through sub-sampling the n-by-n empirical kernel matrix appearing in the objective function. However, the performance of such a sub-sampling method heavily relies on correctly estimating the statistical leverage scores for forming the sampling distribution, which can be as costly as solving the original KRR. In this work, we propose a linear time (modulo poly-log terms) algorithm to accurately approximate the statistical leverage scores in the stationary-kernel-based KRR with theoretical guarantees. Particularly, by analyzing the first-order condition of the KRR objective, we derive an analytic formula, which depends on both the input distribution and the spectral density of stationary kernels, for capturing the non-uniformity of the statistical leverage scores. Numerical experiments demonstrate that with the same prediction accuracy our method is orders of magnitude more efficient than existing methods in selecting the representative sub-samples in the Nystrom approximation.
89 - Wenjia Wang , Yi-Hui Zhou 2020
In the multivariate regression, also referred to as multi-task learning in machine learning, the goal is to recover a vector-valued function based on noisy observations. The vector-valued function is often assumed to be of low rank. Although the mult ivariate linear regression is extensively studied in the literature, a theoretical study on the multivariate nonlinear regression is lacking. In this paper, we study reduced rank multivariate kernel ridge regression, proposed by cite{mukherjee2011reduced}. We prove the consistency of the function predictor and provide the convergence rate. An algorithm based on nuclear norm relaxation is proposed. A few numerical examples are presented to show the smaller mean squared prediction error comparing with the elementwise univariate kernel ridge regression.
Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of non-interacting fermions in 1d, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer tha n 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. Challenges for application of our method to real electronic structure problems are discussed.
The divide-and-conquer method has been widely used for estimating large-scale kernel ridge regression estimates. Unfortunately, when the response variable is highly skewed, the divide-and-conquer kernel ridge regression (dacKRR) may overlook the unde rrepresented region and result in unacceptable results. We develop a novel response-adaptive partition strategy to overcome the limitation. In particular, we propose to allocate the replicates of some carefully identified informative observations to multiple nodes (local processors). The idea is analogous to the popular oversampling technique. Although such a technique has been widely used for addressing discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively over-sample the observations under the dacKRR setting. Furthermore, we show the proposed estimate has a smaller asymptotic mean squared error (AMSE) than that of the classical dacKRR estimate under mild conditions. Our theoretical findings are supported by both simulated and real-data analyses.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا