New asymptotic results in principal component analysis


Abstract

Let $X$ be a mean-zero Gaussian random vector in a separable Hilbert space ${\mathbb H}$ with covariance operator $\Sigma:={\mathbb E}(X\otimes X)$. Let $\Sigma=\sum_{r\geq 1}\mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1>\mu_2>\dots$ and the corresponding spectral projectors $P_1, P_2, \dots$. Given a sample $X_1,\dots, X_n$ of $n$ i.i.d. copies of $X$, the sample covariance operator is defined as $\hat \Sigma_n := n^{-1}\sum_{j=1}^n X_j\otimes X_j$. The main goal of principal component analysis is to estimate the spectral projectors $P_1, P_2, \dots$ by their empirical counterparts $\hat P_1, \hat P_2, \dots$, properly defined in terms of the spectral decomposition of the sample covariance operator $\hat \Sigma_n$. The aim of this paper is to study the asymptotic distributions of important statistics related to this problem, in particular, of the statistic $\|\hat P_r-P_r\|_2^2$, where $\|\cdot\|_2^2$ is the squared Hilbert--Schmidt norm. This is done in a high-complexity asymptotic framework in which the so-called effective rank ${\bf r}(\Sigma):=\frac{{\rm tr}(\Sigma)}{\|\Sigma\|_{\infty}}$ (${\rm tr}(\cdot)$ being the trace and $\|\cdot\|_{\infty}$ the operator norm) of the true covariance $\Sigma$ grows simultaneously with the sample size $n$, but ${\bf r}(\Sigma)=o(n)$ as $n\to\infty$. In this setting, we prove that, in the case of a one-dimensional spectral projector $P_r$, the properly centered and normalized statistic $\|\hat P_r-P_r\|_2^2$, with {\it data-dependent} centering and normalization, converges in distribution to a Cauchy-type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
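As a rough finite-dimensional illustration of the objects above, the following Python sketch assumes ${\mathbb H}={\mathbb R}^p$ and a hypothetical diagonal covariance with distinct eigenvalues $\mu_r = 1/r$. It forms $\hat\Sigma_n$, extracts rank-one spectral projectors for the top eigenvalues of $\Sigma$ and $\hat\Sigma_n$, and evaluates the raw statistic $\|\hat P_r-P_r\|_2^2$ together with ${\bf r}(\Sigma)$. The helper names (`effective_rank`, `sample_covariance`, `top_projectors`, `hs_distance_sq`) are illustrative only, and the sketch does not implement the data-dependent centering and normalization studied in the paper.

```python
import numpy as np


def effective_rank(sigma):
    """r(Sigma) = tr(Sigma) / ||Sigma||_inf; for a PSD matrix the operator norm is the top eigenvalue."""
    eigvals = np.linalg.eigvalsh(sigma)
    return float(np.trace(sigma) / eigvals.max())


def sample_covariance(x):
    """hat Sigma_n = n^{-1} sum_j X_j (x) X_j for mean-zero data; rows of x are the X_j."""
    n = x.shape[0]
    return x.T @ x / n


def top_projectors(sigma, k):
    """Rank-one spectral projectors for the k largest eigenvalues (assumed simple)."""
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # indices of the k largest eigenvalues
    return [np.outer(eigvecs[:, i], eigvecs[:, i]) for i in order]


def hs_distance_sq(p_hat, p):
    """Squared Hilbert-Schmidt (Frobenius) norm ||hat P - P||_2^2."""
    return float(np.sum((p_hat - p) ** 2))


rng = np.random.default_rng(0)
p_dim, n = 200, 2000
# Hypothetical covariance with distinct eigenvalues mu_r = 1/r (an assumption, not from the paper).
mu = 1.0 / np.arange(1, p_dim + 1)
sigma = np.diag(mu)
# Scaling column j by sqrt(mu_j) makes each row an i.i.d. N(0, Sigma) draw.
x = rng.standard_normal((n, p_dim)) * np.sqrt(mu)
sigma_hat = sample_covariance(x)

P = top_projectors(sigma, 3)
P_hat = top_projectors(sigma_hat, 3)
for r in range(3):
    print(f"r={r + 1}: ||P_hat - P||_2^2 = {hs_distance_sq(P_hat[r], P[r]):.4f}")
print(f"effective rank r(Sigma) = {effective_rank(sigma):.4f}")
```

For one-dimensional projectors $P_r = u_r\otimes u_r$ and $\hat P_r = \hat u_r\otimes \hat u_r$ with unit vectors, the statistic reduces to $2-2\langle \hat u_r, u_r\rangle^2$, so it always lies in $[0,2]$; this is a quick sanity check on the printed values.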
