Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors


Abstract in English

Consider two high-dimensional random vectors $\widetilde{\mathbf x}\in\mathbb R^p$ and $\widetilde{\mathbf y}\in\mathbb R^q$ with finite rank correlations. More precisely, suppose that $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$ and $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$, for independent random vectors $\mathbf x\in\mathbb R^p$, $\mathbf y\in\mathbb R^q$ and $\mathbf z\in\mathbb R^r$ with iid entries of mean 0 and variance 1, and two deterministic matrices $A\in\mathbb R^{p\times r}$ and $B\in\mathbb R^{q\times r}$. With $n$ iid observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between them. In this paper, we focus on the high-dimensional setting with a rank-$r$ correlation. Let $t_1\ge\cdots\ge t_r$ be the squares of the population canonical correlation coefficients (CCC) between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$, and $\widetilde\lambda_1\ge\cdots\ge\widetilde\lambda_r$ be the squares of the largest $r$ sample CCC. Under certain moment assumptions on the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, we show that there exists a threshold $t_c\in(0, 1)$ such that if $t_i>t_c$, then $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ converges in law to a centered normal distribution, where $\theta_i>\lambda_+$ is a fixed outlier location determined by $t_i$. Our results extend the ones in [4] for Gaussian vectors. Moreover, we find that the variance of the limiting distribution of $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ also depends on the fourth cumulants of the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, a phenomenon that cannot be observed in the Gaussian case.
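The model above can be simulated with a minimal sketch such as the following, which is not taken from the paper. It draws $n$ observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$ with non-Gaussian (Rademacher) entries and computes the squared population and sample CCC. The function names, the choice of Rademacher entries, the dimensions, and the rank-one matrices $A$ and $B$ are illustrative assumptions. The sample covariances are left uncentered because the entries have mean 0 by assumption.

```python
import numpy as np


def population_squared_ccc(A, B):
    """Squared population CCC t_1 >= ... >= t_r for x~ = x + A z, y~ = y + B z.

    Uses Cov(x~) = I + A A^T, Cov(y~) = I + B B^T, Cov(x~, y~) = A B^T,
    which follow from the independence and unit variance of the entries.
    """
    p, r = A.shape
    q = B.shape[0]
    Sxx = np.eye(p) + A @ A.T
    Syy = np.eye(q) + B @ B.T
    Sxy = A @ B.T
    # Eigenvalues of Sxx^{-1} Sxy Syy^{-1} Syx are the squared CCC.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    ev = np.linalg.eigvals(M).real
    return np.sort(ev)[::-1][:r]


def simulate(n, A, B, rng):
    """Return n observations of x~ (n x p) and y~ (n x q), one per row.

    Entries of x, y, z are Rademacher (+1/-1): mean 0, variance 1.
    This distribution is an illustrative assumption, not the paper's choice.
    """
    p, r = A.shape
    q = B.shape[0]
    X = rng.choice([-1.0, 1.0], size=(n, p))
    Y = rng.choice([-1.0, 1.0], size=(n, q))
    Z = rng.choice([-1.0, 1.0], size=(n, r))
    return X + Z @ A.T, Y + Z @ B.T


def sample_squared_ccc(X, Y):
    """Squared sample CCC, sorted in decreasing order (requires n > p, q).

    With orthonormal bases Qx, Qy of the column spaces of X and Y, the
    singular values of Qx^T Qy are the sample canonical correlations.
    """
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.sort(s ** 2)[::-1]


# Illustrative rank-one example (all numbers are assumptions).
rng = np.random.default_rng(0)
n, p, q, r = 1000, 100, 50, 1
A = np.zeros((p, r))
A[0, 0] = 3.0
B = np.zeros((q, r))
B[0, 0] = 3.0

t = population_squared_ccc(A, B)
Xt, Yt = simulate(n, A, B, rng)
lam = sample_squared_ccc(Xt, Yt)
print("population t_1:", t[0])
print("largest two sample values:", lam[:2])
```

With a strong signal like this one, the largest sample value should separate clearly from the remaining ones, which is the outlier behaviour the abstract describes. Repeating the simulation over many seeds and entry distributions would give an empirical view of the fluctuations of $\widetilde\lambda_1$.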
