Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon

125 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Xiyu Zhai

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Alexander Rakhlin - Xiyu Zhai

التعلم الالي التعلم الآلي نظرية الإحصاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We show that minimum-norm interpolation in the Reproducing Kernel Hilbert Space corresponding to the Laplace kernel is not consistent if input dimension is constant. The lower bound holds for any choice of kernel bandwidth, even if selected based on data. The result supports the empirical observation that minimum-norm interpolation (that is, exact fit to training data) in RKHS generalizes well for some high-dimensional datasets, but not for low-dimensional ones.

قيم البحث

129 - Julie Josse , Nicolas Prost (CMAP 2019

In many application settings, the data have missing entries which make analysis challenging. An abundant literature addresses missing values in an inferential framework: estimating parameters and their variance from incomplete tables. Here, we consid er supervised-learning settings: predicting a target when missing values appear in both training and testing data. We show the consistency of two approaches in prediction. A striking result is that the widely-used method of imputing with a constant, such as the mean prior to learning is consistent when missing values are not informative. This contrasts with inferential settings where mean imputation is pointed at for distorting the distribution of the data. That such a simple approach can be consistent is important in practice. We also show that a predictor suited for complete observations can predict optimally on incomplete data,through multiple imputation.Finally, to compare imputation with learning directly with a model that accounts for missing values, we analyze further decision trees. These can naturally tackle empirical risk minimization with missing values, due to their ability to handle the half-discrete nature of incomplete variables. After comparing theoretically and empirically different missing values strategies in trees, we recommend using the missing incorporated in attribute method as it can handle both non-informative and informative missing values.

التعلم الالي التعلم الآلي نظرية الإحصاء

Early stopping and polynomial smoothing in regression with reproducing kernels

405 - Yaroslav Averyanov , Alain Celisse 2020

In this paper, we study the problem of early stopping for iterative learning algorithms in a reproducing kernel Hilbert space (RKHS) in the nonparametric regression framework. In particular, we work with the gradient descent and (iterative) kernel ri dge regression algorithms. We present a data-driven rule to perform early stopping without a validation set that is based on the so-called minimum discrepancy principle. This method enjoys only one assumption on the regression function: it belongs to a reproducing kernel Hilbert space (RKHS). The proposed rule is proved to be minimax-optimal over different types of kernel spaces, including finite-rank and Sobolev smoothness classes. The proof is derived from the fixed-point analysis of the localized Rademacher complexities, which is a standard technique for obtaining optimal rates in the nonparametric regression literature. In addition to that, we present simulation results on artificial datasets that show the comparable performance of the designed rule with respect to other stopping rules such as the one determined by V-fold cross-validation.

التعلم الالي التعلم الآلي نظرية الإحصاء

Epsilon Consistent Mixup: An Adaptive Consistency-Interpolation Tradeoff

68 - Vincent Pisztora , Yanglan Ou , Xiaolei Huang 2021

In this paper we propose $epsilon$-Consistent Mixup ($epsilon$mu). $epsilon$mu is a data-based structural regularization technique that combines Mixups linear interpolation with consistency regularization in the Mixup direction, by compelling a simpl e adaptive tradeoff between the two. This learnable combination of consistency and interpolation induces a more flexible structure on the evolution of the response across the feature space and is shown to improve semi-supervised classification accuracy on the SVHN and CIFAR10 benchmark datasets, yielding the largest gains in the most challenging low label-availability scenarios. Empirical studies comparing $epsilon$mu and Mixup are presented and provide insight into the mechanisms behind $epsilon$mus effectiveness. In particular, $epsilon$mu is found to produce more accurate synthetic labels and more confident predictions than Mixup.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

118 - Jincheng Bai , Qifan Song , Guang Cheng 2020

We propose a variational Bayesian (VB) procedure for high-dimensional linear model inferences with heavy tail shrinkage priors, such as student-t prior. Theoretically, we establish the consistency of the proposed VB method and prove that under the pr oper choice of prior specifications, the contraction rate of the VB posterior is nearly optimal. It justifies the validity of VB inference as an alternative of Markov Chain Monte Carlo (MCMC) sampling. Meanwhile, comparing to conventional MCMC methods, the VB procedure achieves much higher computational efficiency, which greatly alleviates the computing burden for modern machine learning applications such as massive data analysis. Through numerical studies, we demonstrate that the proposed VB method leads to shorter computing time, higher estimation accuracy, and lower variable selection error than competitive sparse Bayesian methods.

التعلم الالي التعلم الآلي نظرية الإحصاء

High-Dimensional Sparse Linear Bandits

150 - Botao Hao , Tor Lattimore , Mengdi Wang 2020

Stochastic linear bandits with high-dimensional sparse features are a practical model for a variety of domains, including personalized medicine and online advertising. We derive a novel $Omega(n^{2/3})$ dimension-free minimax regret lower bound for s parse linear bandits in the data-poor regime where the horizon is smaller than the ambient dimension and where the feature vectors admit a well-conditioned exploration distribution. This is complemented by a nearly matching upper bound for an explore-then-commit algorithm showing that that $Theta(n^{2/3})$ is the optimal rate in the data-poor regime. The results complement existing bounds for the data-rich regime and provide another example where carefully balancing the trade-off between information and regret is necessary. Finally, we prove a dimension-free $O(sqrt{n})$ regret upper bound under an additional assumption on the magnitude of the signal for relevant features.

التعلم الالي التعلم الآلي نظرية الإحصاء