On the number of variables to use in principal component regression

58 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniel Hsu

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ji Xu - Daniel Hsu

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p>n$, the estimator is the least $ell_2$ norm solution over the selected features. We give an average-case analysis of the out-of-sample prediction error as $p,n,N to infty$ with $p/N to alpha$ and $n/N to beta$, for some constants $alpha in [0,1]$ and $beta in (0,1)$. In this average-case setting, the prediction error exhibits a double descent shape as a function of $p$. We also establish conditions under which the minimum risk is achieved in the interpolating ($p>n$) regime.

قيم البحث

132 - Martin Wahl 2018

We analyse the prediction error of principal component regression (PCR) and prove non-asymptotic upper bounds for the corresponding squared risk. Under mild assumptions, we show that PCR performs as well as the oracle method obtained by replacing emp irical principal components by their population counterparts. Our approach relies on upper bounds for the excess risk of principal component analysis.

نظرية الإحصاء نظرية الإحصاء

Principal Component Projection Without Principal Component Analysis

177 - Roy Frostig , Cameron Musco , Christopher Musco 2016

We show how to efficiently project a vector onto the top principal components of a matrix, without explicitly computing these components. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any b lack-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a smooth projection onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.

بنى وهياكل البيانات والخوارزميات التعلم الآلي التعلم الالي

Kernel ridge vs. principal component regression: minimax bounds and adaptability of regularization operators

80 - Lee H. Dicker , Dean P. Foster , Daniel Hsu 2016

Regularization is an essential element of virtually all kernel methods for nonparametric regression problems. A critical factor in the effectiveness of a given kernel method is the type of regularization that is employed. This article compares and co ntrasts members from a general class of regularization techniques, which notably includes ridge regression and principal component regression. We derive an explicit finite-sample risk bound for regularization-based estimators that simultaneously accounts for (i) the structure of the ambient function space, (ii) the regularity of the true regression function, and (iii) the adaptability (or qualification) of the regularization. A simple consequence of this upper bound is that the risk of the regularization-based estimators matches the minimax rate in a variety of settings. The general bound also illustrates how some regularization techniques are more adaptable than others to favorable regularity properties that the true regression function may possess. This, in particular, demonstrates a striking difference between kernel ridge regression and kernel principal component regression. Our theoretical results are supported by numerical experiments.

نظرية الإحصاء نظرية الإحصاء

Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data

451 - Yehua Li , Tailen Hsing 2012

We consider nonparametric estimation of the mean and covariance functions for functional/longitudinal data. Strong uniform convergence rates are developed for estimators that are local-linear smoothers. Our results are obtained in a unified framework in which the number of observations within each curve/cluster can be of any rate relative to the sample size. We show that the convergence rates for the procedures depend on both the number of sample curves and the number of observations on each curve. For sparse functional data, these rates are equivalent to the optimal rates in nonparametric regression. For dense functional data, root-n rates of convergence can be achieved with proper choices of bandwidths. We further derive almost sure rates of convergence for principal component analysis using the estimated covariance function. The results are illustrated with simulation studies.

نظرية الإحصاء نظرية الإحصاء

New asymptotic results in principal component analysis

143 - Vladimir Koltchinskii , Karim Lounici 2016

Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space ${mathbb H}$ with covariance operator $Sigma:={mathbb E}(Xotimes X).$ Let $Sigma=sum_{rgeq 1}mu_r P_r$ be the spectral decomposition of $Sigma$ with distinct eigenvalues $mu_1 >mu_2> dots$ and the corresponding spectral projectors $P_1, P_2, dots.$ Given a sample $X_1,dots, X_n$ of size $n$ of i.i.d. copies of $X,$ the sample covariance operator is defined as $hat Sigma_n := n^{-1}sum_{j=1}^n X_jotimes X_j.$ The main goal of principal component analysis is to estimate spectral projectors $P_1, P_2, dots$ by their empirical counterparts $hat P_1, hat P_2, dots$ properly defined in terms of spectral decomposition of the sample covariance operator $hat Sigma_n.$ The aim of this paper is to study asymptotic distributions of important statistics related to this problem, in particular, of statistic $|hat P_r-P_r|_2^2,$ where $|cdot|_2^2$ is the squared Hilbert--Schmidt norm. This is done in a high-complexity asymptotic framework in which the so called effective rank ${bf r}(Sigma):=frac{{rm tr}(Sigma)}{|Sigma|_{infty}}$ (${rm tr}(cdot)$ being the trace and $|cdot|_{infty}$ being the operator norm) of the true covariance $Sigma$ is becoming large simultaneously with the sample size $n,$ but ${bf r}(Sigma)=o(n)$ as $ntoinfty.$ In this setting, we prove that, in the case of one-dimensional spectral projector $P_r,$ the properly centered and normalized statistic $|hat P_r-P_r|_2^2$ with {it data-dependent} centering and normalization converges in distribution to a Cauchy type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.

نظرية الإحصاء نظرية الإحصاء