A robust approach for principal component analysis

Added by María Camila Vásquez Correa

Publication date 2019

Field Mathematical Statistics

Research language English

Authors Maria Camila Vasquez-Correa - Henry Laniado Rodas

Statistics Theory

In this paper we analyze different ways of performing principal component analysis through three approaches: robust covariance and correlation matrix estimation, projection pursuit, and a non-parametric maximum entropy algorithm. These approaches aim to correct the well-known sensitivity of classical principal component analysis to outliers. Because they are robust, they perform very well on contaminated data, whereas the classical approach fails to preserve the characteristics of the core information.
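The contrast between the classical and the robust behaviour can be illustrated with a small sketch of the projection pursuit idea. This is not the authors' algorithm: it assumes two-dimensional data, a plain grid search over directions, and the median absolute deviation (MAD) as the robust scale. The function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import math
from statistics import median, pvariance


def mad(values):
    """Median absolute deviation, a robust measure of spread."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def first_direction(points, scale, n_angles=180):
    """Grid search for the unit direction whose projections maximize `scale`."""
    best_angle, best_value = 0.0, -1.0
    for k in range(n_angles):
        angle = math.pi * k / n_angles
        c, s = math.cos(angle), math.sin(angle)
        projections = [x * c + y * s for x, y in points]
        value = scale(projections)
        if value > best_value:
            best_angle, best_value = angle, value
    return math.cos(best_angle), math.sin(best_angle)


# Synthetic data (assumption): 21 clean points spread along the x-axis,
# plus 3 outliers far away along the y-axis.
clean = [(float(i), 0.1 * ((i % 3) - 1)) for i in range(-10, 11)]
outliers = [(0.0, 50.0), (0.0, -50.0), (0.0, 60.0)]
points = clean + outliers

classical = first_direction(points, pvariance)  # variance as the scale
robust = first_direction(points, mad)           # MAD as the scale

print("classical first direction:", classical)
print("robust first direction:   ", robust)
```

With variance as the scale, the three outliers dominate and the first direction turns toward the y-axis. With the MAD, the direction stays close to the x-axis, along which the bulk of the data actually varies.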

Robust covariance estimation for distributed principal component analysis

Kangqiang Li, Han Bao, Songqiao Tang (2020)

Fan et al. [Annals of Statistics 47(6) (2019) 3009-3031] proposed a distributed principal component analysis (PCA) algorithm to significantly reduce the communication cost between multiple servers. In this paper, we robustify their distributed algorithm by replacing the sample covariance matrix with the robust covariance matrix estimators proposed by Minsker [Annals of Statistics 46(6A) (2018) 2871-2903] and Ke et al. [Statistical Science 34(3) (2019) 454-471]. We extend the deviation bound of robust covariance estimators with bounded fourth moments to the case of heavy-tailed distributions under only a bounded $2+\epsilon$ moment assumption. The theoretical results show that after the shrinkage or truncation treatment of the sample covariance matrix, the statistical error rate of the final estimator produced by the robust algorithm is the same as that under sub-Gaussian tails when $\epsilon \geq 2$ and the sampling distribution is symmetric innovation. When $0 < \epsilon < 2$, the rate with respect to the sample size of each server is slower than that under the bounded fourth moment assumption. Extensive numerical results support the theoretical analysis and indicate that the algorithm performs better than the original distributed algorithm and is robust to heavy-tailed data and outliers.

Statistics Theory

New asymptotic results in principal component analysis

Vladimir Koltchinskii, Karim Lounici (2016)

Let $X$ be a mean zero Gaussian random vector in a separable Hilbert space ${\mathbb H}$ with covariance operator $\Sigma:={\mathbb E}(X\otimes X).$ Let $\Sigma=\sum_{r\geq 1}\mu_r P_r$ be the spectral decomposition of $\Sigma$ with distinct eigenvalues $\mu_1>\mu_2> \dots$ and the corresponding spectral projectors $P_1, P_2, \dots.$ Given a sample $X_1,\dots, X_n$ of size $n$ of i.i.d. copies of $X,$ the sample covariance operator is defined as $\hat \Sigma_n := n^{-1}\sum_{j=1}^n X_j\otimes X_j.$ The main goal of principal component analysis is to estimate spectral projectors $P_1, P_2, \dots$ by their empirical counterparts $\hat P_1, \hat P_2, \dots$ properly defined in terms of spectral decomposition of the sample covariance operator $\hat \Sigma_n.$ The aim of this paper is to study asymptotic distributions of important statistics related to this problem, in particular, of statistic $\|\hat P_r-P_r\|_2^2,$ where $\|\cdot\|_2^2$ is the squared Hilbert--Schmidt norm. This is done in a high-complexity asymptotic framework in which the so called effective rank ${\bf r}(\Sigma):=\frac{{\rm tr}(\Sigma)}{\|\Sigma\|_{\infty}}$ (${\rm tr}(\cdot)$ being the trace and $\|\cdot\|_{\infty}$ being the operator norm) of the true covariance $\Sigma$ is becoming large simultaneously with the sample size $n,$ but ${\bf r}(\Sigma)=o(n)$ as $n\to\infty.$ In this setting, we prove that, in the case of one-dimensional spectral projector $P_r,$ the properly centered and normalized statistic $\|\hat P_r-P_r\|_2^2$ with data-dependent centering and normalization converges in distribution to a Cauchy type limit. The proofs of this and other related results rely on perturbation analysis and Gaussian concentration.
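The effective rank defined in this abstract, the trace divided by the operator norm, is easy to compute for a finite-dimensional covariance matrix. The following is a minimal sketch under stated assumptions: the matrix is symmetric positive semidefinite and given as nested lists, and the operator norm is approximated by power iteration from a uniform start vector, which must not be orthogonal to the leading eigenvector. The function names are hypothetical.

```python
import math


def operator_norm(sigma, iterations=500):
    """Approximate the largest eigenvalue of a symmetric PSD matrix
    by power iteration (assumes the start vector is not orthogonal
    to the leading eigenvector)."""
    n = len(sigma)
    v = [1.0 / math.sqrt(n)] * n
    norm = 0.0
    for _ in range(iterations):
        w = [sum(sigma[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return norm


def effective_rank(sigma):
    """Effective rank r(Sigma) = tr(Sigma) / ||Sigma||_inf."""
    trace = sum(sigma[i][i] for i in range(len(sigma)))
    return trace / operator_norm(sigma)


# Illustrative matrices (assumptions, not taken from the paper).
spiked = [[4.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]

print(effective_rank(spiked))
print(effective_rank(identity))
```

For the diagonal matrix the trace is 8 and the largest eigenvalue is 4, so the effective rank is about 2. For the 5 by 5 identity it is about 5, matching the intuition that the effective rank counts how many directions carry variance comparable to the largest one.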

Statistics Theory

A note on the prediction error of principal component regression

Martin Wahl (2018)

We analyse the prediction error of principal component regression (PCR) and prove non-asymptotic upper bounds for the corresponding squared risk. Under mild assumptions, we show that PCR performs as well as the oracle method obtained by replacing empirical principal components by their population counterparts. Our approach relies on upper bounds for the excess risk of principal component analysis.

Statistics Theory

Sparse principal component analysis for high-dimensional stationary time series

Kou Fujimori, Yuichi Goto, Yan Liu (2021)

We consider the sparse principal component analysis for high-dimensional stationary processes. The standard principal component analysis performs poorly when the dimension of the process is large. We establish the oracle inequalities for penalized principal component estimators for the processes including heavy-tailed time series. The rate of convergence of the estimators is established. We also elucidate the theoretical rate for choosing the tuning parameter in penalized estimators. The performance of the sparse principal component analysis is demonstrated by numerical simulations. The utility of the sparse principal component analysis for time series data is exemplified by the application to average temperature data.

Statistics Theory

Principal Component Analysis for Functional Data on Riemannian Manifolds and Spheres

Xiongtao Dai, Hans-Georg Muller (2017)

Functional data analysis on nonlinear manifolds has drawn recent interest. Sphere-valued functional data, which are encountered for example as movement trajectories on the surface of the earth, are an important special case. We consider an intrinsic principal component analysis for smooth Riemannian manifold-valued functional data and study its asymptotic properties. Riemannian functional principal component analysis (RFPCA) is carried out by first mapping the manifold-valued data through Riemannian logarithm maps to tangent spaces around the time-varying Fréchet mean function, and then performing a classical multivariate functional principal component analysis on the linear tangent spaces. Representations of the Riemannian manifold-valued functions and the eigenfunctions on the original manifold are then obtained with exponential maps. The tangent-space approximation through functional principal component analysis is shown to be well-behaved in terms of controlling the residual variation if the Riemannian manifold has nonnegative curvature. Specifically, we derive a central limit theorem for the mean function, as well as root-$n$ uniform convergence rates for other model components, including the covariance function, eigenfunctions, and functional principal component scores. Our applications include a novel framework for the analysis of longitudinal compositional data, achieved by mapping longitudinal compositional data to trajectories on the sphere, illustrated with longitudinal fruit fly behavior patterns. RFPCA is shown to be superior in terms of trajectory recovery in comparison to an unrestricted functional principal component analysis in applications and simulations and is also found to produce principal component scores that are better predictors for classification compared to traditional functional principal component scores.
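The two maps that this abstract relies on have closed forms on the unit sphere. The sketch below shows only those standard logarithm and exponential maps at a fixed base point; it is not the RFPCA procedure itself, which also needs the Fréchet mean function and a functional PCA in the tangent spaces. The function names are hypothetical, points are assumed to be unit vectors, and antipodal points are not handled.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sphere_log(p, q):
    """Riemannian log map on the unit sphere: tangent vector at p pointing to q.
    Assumes p and q are unit vectors and q is not antipodal to p."""
    cos_theta = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(cos_theta)  # geodesic distance between p and q
    residual = [qi - cos_theta * pi for pi, qi in zip(p, q)]
    length = math.sqrt(dot(residual, residual))
    if length < 1e-12:
        return [0.0] * len(p)  # q coincides with p
    return [theta * r / length for r in residual]


def sphere_exp(p, v):
    """Riemannian exp map on the unit sphere: follow the geodesic from p along v."""
    norm = math.sqrt(dot(v, v))
    if norm < 1e-12:
        return list(p)
    return [math.cos(norm) * pi + math.sin(norm) * vi / norm
            for pi, vi in zip(p, v)]


# Illustrative points (assumptions): base point at the north pole,
# data point on the equator.
p = [0.0, 0.0, 1.0]
q = [1.0, 0.0, 0.0]

v = sphere_log(p, q)      # tangent vector at p, orthogonal to p
q_back = sphere_exp(p, v)  # mapping back recovers q up to rounding

print(v)
print(q_back)
```

The tangent vector has length equal to the geodesic distance, here a quarter of a great circle, and lies in the plane orthogonal to the base point. That linear tangent plane is where classical principal component analysis can be applied before mapping results back with the exponential map.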

Statistics Theory
