ﻻ يوجد ملخص باللغة العربية
Principal Component Analysis (PCA) is a powerful tool in statistics and machine learning. While existing study of PCA focuses on the recovery of principal components and their associated eigenvalues, there are few precise characterizations of individual principal component scores that yield low-dimensional embedding of samples. That hinders the analysis of various spectral methods. In this paper, we first develop an $ell_p$ perturbation theory for a hollowed version of PCA in Hilbert spaces which provably improves upon the vanilla PCA in the presence of heteroscedastic noises. Through a novel $ell_p$ analysis of eigenvectors, we investigate entrywise behaviors of principal component score vectors and show that they can be approximated by linear functionals of the Gram matrix in $ell_p$ norm, which includes $ell_2$ and $ell_infty$ as special examples. For sub-Gaussian mixture models, the choice of $p$ giving optimal bounds depends on the signal-to-noise ratio, which further yields optimality guarantees for spectral clustering. For contextual community detection, the $ell_p$ theory leads to a simple spectral algorithm that achieves the information threshold for exact recovery. These also provide optimal recovery results for Gaussian mixture and stochastic block models as special cases.
We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhi
Diffusion maps is a manifold learning algorithm widely used for dimensionality reduction. Using a sample from a distribution, it approximates the eigenvalues and eigenfunctions of associated Laplace-Beltrami operators. Theoretical bounds on the appro
Singular value decomposition (SVD) based principal component analysis (PCA) breaks down in the high-dimensional and limited sample size regime below a certain critical eigen-SNR that depends on the dimensionality of the system and the number of sampl
Random subspaces $X$ of $mathbb{R}^n$ of dimension proportional to $n$ are, with high probability, well-spread with respect to the $ell_p$-norm (for $p in [1,2]$). Namely, every nonzero $x in X$ is robustly non-sparse in the following sense: $x$ is $
Recent years have seen the rise of convolutional neural network techniques in exemplar-based image synthesis. These methods often rely on the minimization of some variational formulation on the image space for which the minimizers are assumed to be t