Canonical correlation coefficients of high-dimensional normal vectors: finite rank case


Abstract in English

Consider a normal vector $mathbf{z}=(mathbf{x},mathbf{y})$, consisting of two sub-vectors $mathbf{x}$ and $mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $mathbf{z}$ at hand, we study the correlation between $mathbf{x}$ and $mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $Sigma_{mathbf{x}mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1geqcdotsgeq r_k>0$. Under the additional assumptions $(p+q)/nto yin (0,1)$ and $p/q otto 1$, we study the sample counterparts of $r_i,i=1,ldots,k$, i.e. the largest k eigenvalues of the sample canonical correlation matrix $S_{mathbf{x}mathbf{x}}^{-1}S_{mathbf{x}mathbf{y}}S_{mathbf{y}mathbf{y}}^{-1}S_{mathbf{y}mathbf{x}}$, namely $lambda_1geqcdotsgeq lambda_k$. We show that there exists a threshold $r_cin(0,1)$, such that for each $iin{1,ldots,k}$, when $r_ileq r_c$, $lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus provide an estimate of the latter in the high-dimensional scenario.

Download