ترغب بنشر مسار تعليمي؟ اضغط هنا

Limiting distribution of the sample canonical correlation coefficients of high-dimensional random vectors

78   0   0.0 ( 0 )
 نشر من قبل Fan Yang
 تاريخ النشر 2021
  مجال البحث
والبحث باللغة English
 تأليف Fan Yang




اسأل ChatGPT حول البحث

Consider two high-dimensional random vectors $widetilde{mathbf x}inmathbb R^p$ and $widetilde{mathbf y}inmathbb R^q$ with finite rank correlations. More precisely, suppose that $widetilde{mathbf x}=mathbf x+Amathbf z$ and $widetilde{mathbf y}=mathbf y+Bmathbf z$, for independent random vectors $mathbf xinmathbb R^p$, $mathbf yinmathbb R^q$ and $mathbf zinmathbb R^r$ with iid entries of mean 0 and variance 1, and two deterministic matrices $Ainmathbb R^{ptimes r}$ and $Binmathbb R^{qtimes r}$ . With $n$ iid observations of $(widetilde{mathbf x},widetilde{mathbf y})$, we study the sample canonical correlations between them. In this paper, we focus on the high-dimensional setting with a rank-$r$ correlation. Let $t_1gecdotsge t_r$ be the squares of the population canonical correlation coefficients (CCC) between $widetilde{mathbf x}$ and $widetilde{mathbf y}$, and $widetildelambda_1gecdotsgewidetildelambda_r$ be the squares of the largest $r$ sample CCC. Under certain moment assumptions on the entries of $mathbf x$, $mathbf y$ and $mathbf z$, we show that there exists a threshold $t_cin(0, 1)$ such that if $t_i>t_c$, then $sqrt{n}(widetildelambda_i-theta_i)$ converges in law to a centered normal distribution, where $theta_i>lambda_+$ is a fixed outlier location determined by $t_i$. Our results extend the ones in [4] for Gaussian vectors. Moreover, we find that the variance of the limiting distribution of $sqrt{n}(widetildelambda_i-theta_i)$ also depends on the fourth cumulants of the entries of $mathbf x$, $mathbf y$ and $mathbf z$, a phenomenon that cannot be observed in the Gaussian case.

قيم البحث

اقرأ أيضاً

Consider a normal vector $mathbf{z}=(mathbf{x},mathbf{y})$, consisting of two sub-vectors $mathbf{x}$ and $mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $mathbf{z}$ at hand, we study the correlation between $mathbf{x}$ and $mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $Sigma_{mathbf{x}mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1geqcdotsgeq r_k>0$. Under the additional assumptions $(p+q)/nto yin (0,1)$ and $p/q otto 1$, we study the sample counterparts of $r_i,i=1,ldots,k$, i.e. the largest k eigenvalues of the sample canonical correlation matrix $S_{mathbf{x}mathbf{x}}^{-1}S_{mathbf{x}mathbf{y}}S_{mathbf{y}mathbf{y}}^{-1}S_{mathbf{y}mathbf{x}}$, namely $lambda_1geqcdotsgeq lambda_k$. We show that there exists a threshold $r_cin(0,1)$, such that for each $iin{1,ldots,k}$, when $r_ileq r_c$, $lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus provide an estimate of the latter in the high-dimensional scenario.
Consider a Gaussian vector $mathbf{z}=(mathbf{x},mathbf{y})$, consisting of two sub-vectors $mathbf{x}$ and $mathbf{y}$ with dimensions $p$ and $q$ respectively, where both $p$ and $q$ are proportional to the sample size $n$. Denote by $Sigma_{mathbf {u}mathbf{v}}$ the population cross-covariance matrix of random vectors $mathbf{u}$ and $mathbf{v}$, and denote by $S_{mathbf{u}mathbf{v}}$ the sample counterpart. The canonical correlation coefficients between $mathbf{x}$ and $mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $Sigma_{mathbf{x}mathbf{x}}^{-1}Sigma_{mathbf{x}mathbf{y}}Sigma_{mathbf{y}mathbf{y}}^{-1}Sigma_{mathbf{y}mathbf{x}}$. In this paper, we focus on the case that $Sigma_{mathbf{x}mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1geqcdotsgeq r_k>0$. We study the sample counterparts of $r_i,i=1,ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{mathbf{x}mathbf{x}}^{-1}S_{mathbf{x}mathbf{y}}S_{mathbf{y}mathbf{y}}^{-1}S_{mathbf{y}mathbf{x}}$, denoted by $lambda_1geqcdotsgeq lambda_k$. We show that there exists a threshold $r_cin(0,1)$, such that for each $iin{1,ldots,k}$, when $r_ileq r_c$, $lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_i>r_c$, $lambda_i$ possesses an almost sure limit in $(d_{+},1]$. We also obtain the limiting distribution of $lambda_i$s under appropriate normalization. Specifically, $lambda_i$ possesses Gaussian type fluctuation if $r_i>r_c$, and follows Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
Let ${X}_{k}=(x_{k1}, cdots, x_{kp}), k=1,cdots,n$, be a random sample of size $n$ coming from a $p$-dimensional population. For a fixed integer $mgeq 2$, consider a hypercubic random tensor $mathbf{{T}}$ of $m$-th order and rank $n$ with begin{eqnar ray*} mathbf{{T}}= sum_{k=1}^{n}underbrace{{X}_{k}otimescdotsotimes {X}_{k}}_{m~multiple}=Big(sum_{k=1}^{n} x_{ki_{1}}x_{ki_{2}}cdots x_{ki_{m}}Big)_{1leq i_{1},cdots, i_{m}leq p}. end{eqnarray*} Let $W_n$ be the largest off-diagonal entry of $mathbf{{T}}$. We derive the asymptotic distribution of $W_n$ under a suitable normalization for two cases. They are the ultra-high dimension case with $ptoinfty$ and $log p=o(n^{beta})$ and the high-dimension case with $pto infty$ and $p=O(n^{alpha})$ where $alpha,beta>0$. The normalizing constant of $W_n$ depends on $m$ and the limiting distribution of $W_n$ is a Gumbel-type distribution involved with parameter $m$.
Consider a $p$-dimensional population ${mathbf x} inmathbb{R}^p$ with iid coordinates in the domain of attraction of a stable distribution with index $alphain (0,2)$. Since the variance of ${mathbf x}$ is infinite, the sample covariance matrix ${math bf S}_n=n^{-1}sum_{i=1}^n {{mathbf x}_i}{mathbf x}_i$ based on a sample ${mathbf x}_1,ldots,{mathbf x}_n$ from the population is not well behaved and it is of interest to use instead the sample correlation matrix ${mathbf R}_n= {operatorname{diag}({mathbf S}_n)}^{-1/2}, {mathbf S}_n {operatorname{diag}({mathbf S}_n)}^{-1/2}$. This paper finds the limiting distributions of the eigenvalues of ${mathbf R}_n$ when both the dimension $p$ and the sample size $n$ grow to infinity such that $p/nto gamma in (0,infty)$. The family of limiting distributions ${H_{alpha,gamma}}$ is new and depends on the two parameters $alpha$ and $gamma$. The moments of $H_{alpha,gamma}$ are fully identified as sum of two contributions: the first from the classical Marv{c}enko-Pastur law and a second due to heavy tails. Moreover, the family ${H_{alpha,gamma}}$ has continuous extensions at the boundaries $alpha=2$ and $alpha=0$ leading to the Marv{c}enko-Pastur law and a modified Poisson distribution, respectively. Our proofs use the method of moments, the path-shortening algorithm developed in [18] and some novel graph counting combinatorics. As a consequence, the moments of $H_{alpha,gamma}$ are expressed in terms of combinatorial objects such as Stirling numbers of the second kind. A simulation study on these limiting distributions $H_{alpha,gamma}$ is also provided for comparison with the Marv{c}enko-Pastur law.
In this article we prove three fundamental types of limit theorems for the $q$-norm of random vectors chosen at random in an $ell_p^n$-ball in high dimensions. We obtain a central limit theorem, a moderate deviations as well as a large deviations pri nciple when the underlying distribution of the random vectors belongs to a general class introduced by Barthe, Guedon, Mendelson, and Naor. It includes the normalized volume and the cone probability measure as well as projections of these measures as special cases. Two new applications to random and non-random projections of $ell_p^n$-balls to lower-dimensional subspaces are discussed as well. The text is a continuation of [Kabluchko, Prochno, Thale: High-dimensional limit theorems for random vectors in $ell_p^n$-balls, Commun. Contemp. Math. (2019)].
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا