Canonical correlation coefficients of high-dimensional Gaussian vectors: finite rank case


Abstract in English

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively, where both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{\mathbf{u}\mathbf{v}}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{\mathbf{u}\mathbf{v}}$ its sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{\mathbf{x}\mathbf{x}}^{-1}\Sigma_{\mathbf{x}\mathbf{y}}\Sigma_{\mathbf{y}\mathbf{y}}^{-1}\Sigma_{\mathbf{y}\mathbf{x}}$. In this paper, we focus on the case where $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e., there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. We study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e., the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, denoted by $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$ such that, for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_{+},1]$. We also obtain the limiting distributions of the $\lambda_i$ under appropriate normalization. Specifically, $\lambda_i$ has Gaussian-type fluctuations if $r_i>r_c$, and follows the Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
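
As an illustration of the objects named in the abstract, the following is a minimal numerical sketch, not the authors' code, of how the eigenvalues of the sample canonical correlation matrix could be computed with NumPy. It assumes mean-zero data, so the sample covariances are formed without centering. The function names `sample_canonical_eigenvalues` and `simulate_rank_one` are hypothetical, and the rank-one model (identity covariances within $\mathbf{x}$ and $\mathbf{y}$, a single correlated coordinate pair) is chosen here only for simplicity.

```python
import numpy as np


def sample_canonical_eigenvalues(x, y):
    """Return the largest min(p, q) eigenvalues of
    S_xx^{-1} S_xy S_yy^{-1} S_yx in decreasing order.

    x is an (n, p) array and y an (n, q) array; rows are observations.
    Assumption of this sketch: the data are mean-zero, so no centering.
    """
    n, p = x.shape
    q = y.shape[1]
    s_xx = x.T @ x / n
    s_yy = y.T @ y / n
    s_xy = x.T @ y / n
    # Solve linear systems rather than forming explicit inverses.
    a = np.linalg.solve(s_xx, s_xy)    # S_xx^{-1} S_xy, shape (p, q)
    b = np.linalg.solve(s_yy, s_xy.T)  # S_yy^{-1} S_yx, shape (q, p)
    # The product is not symmetric, so eigvals may return complex values
    # with negligible imaginary parts; keep only the real parts.
    eig = np.linalg.eigvals(a @ b).real
    eig = np.sort(eig)[::-1][: min(p, q)]
    # Guard against rounding slightly outside the interval [0, 1].
    return np.clip(eig, 0.0, 1.0)


def simulate_rank_one(n, p, q, r, rng):
    """Draw n samples of (x, y) whose only nonzero squared canonical
    correlation is r (hypothetical model for this sketch)."""
    x = rng.standard_normal((n, p))
    y = rng.standard_normal((n, q))
    # Correlate the first coordinates: corr(x_1, y_1) = sqrt(r).
    y[:, 0] = np.sqrt(r) * x[:, 0] + np.sqrt(1.0 - r) * y[:, 0]
    return x, y
```

A possible use is to fix the ratios $p/n$ and $q/n$, vary $r$, and record `sample_canonical_eigenvalues(x, y)[0]` over repeated draws. According to the results stated above, the largest eigenvalue should stay near $d_{+}$ for small $r$ and separate from it once $r$ exceeds $r_c$.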
