Approximation of projections of random vectors


Abstract in English

Let $X$ be a $d$-dimensional random vector and $X_\theta$ its projection onto the span of a set of orthonormal vectors $\{\theta_1,\dots,\theta_k\}$. Conditions on the distribution of $X$ are given such that, if $\theta$ is chosen according to Haar measure on the Stiefel manifold, the bounded-Lipschitz distance from $X_\theta$ to a Gaussian distribution is concentrated about its expectation. Furthermore, an explicit bound is given for the expected distance in terms of $d$, $k$, and the distribution of $X$, allowing consideration not just of fixed $k$ but of $k$ growing with $d$. The results are applied in the setting of projection pursuit, showing that most $k$-dimensional projections of $n$ data points in $\mathbb{R}^d$ are close to Gaussian when $n$ and $d$ are large and $k = c\sqrt{\log d}$ for a small constant $c$.
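The setting can be illustrated numerically with a small sketch, assuming NumPy is available. It draws $\theta$ from Haar measure on the Stiefel manifold using the standard construction (QR decomposition of a $d \times k$ standard Gaussian matrix, with the signs of $R$'s diagonal fixed), projects $n$ hypothetical data points with independent $\pm 1$ entries, and reports, for each projected coordinate, the one-dimensional Kolmogorov distance to $N(0,1)$. The Kolmogorov distance is a swapped-in, easily computed proxy. It is not the bounded-Lipschitz distance studied in the paper, and the data, dimensions, and function names (`haar_stiefel`, `kolmogorov_to_std_normal`) are illustrative assumptions, not the author's method.

```python
import math

import numpy as np


def haar_stiefel(d, k, rng):
    """Draw a d x k matrix with orthonormal columns from Haar measure on the Stiefel manifold."""
    g = rng.standard_normal((d, k))
    q, r = np.linalg.qr(g)  # reduced QR: q is (d, k), r is (k, k)
    # Flip column signs so diag(r) > 0; this makes the law of q exactly Haar.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs  # broadcasting multiplies column j by signs[j]


def kolmogorov_to_std_normal(x):
    """sup_t |F_n(t) - Phi(t)| for a 1-D sample x (a proxy, not the bounded-Lipschitz distance)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    phi = np.array([0.5 * (1.0 + math.erf(t / math.sqrt(2.0))) for t in xs])
    upper = np.arange(1, n + 1) / n - phi  # empirical CDF just after each point
    lower = phi - np.arange(0, n) / n      # empirical CDF just before each point
    return float(max(upper.max(), lower.max()))


rng = np.random.default_rng(0)
n, d, k = 2000, 500, 3
# Hypothetical data: rows X_i with i.i.d. +-1 entries, so E[X X^T] = I_d.
data = rng.choice([-1.0, 1.0], size=(n, d))
theta = haar_stiefel(d, k, rng)
proj = data @ theta  # row i is (<X_i, theta_1>, ..., <X_i, theta_k>)

print(np.allclose(theta.T @ theta, np.eye(k)))         # columns are orthonormal
print(np.round(np.cov(proj, rowvar=False), 2))          # close to the k x k identity
print([round(kolmogorov_to_std_normal(proj[:, j]), 3) for j in range(k)])
```

Because $\theta^T \theta = I_k$ and $E[XX^T] = I_d$ here, the projected coordinates have covariance $I_k$, so the natural Gaussian comparison is $N(0, I_k)$; the sketch only checks one-dimensional marginals, which is weaker than closeness of the joint $k$-dimensional law.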
