
Canonical correlation coefficients of high-dimensional Gaussian vectors: finite rank case

Published by Zhigang Bao
Publication date: 2017
Research field: Mathematical Statistics
Language of the paper: English





Consider a Gaussian vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively, where both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{\mathbf{u}\mathbf{v}}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{\mathbf{u}\mathbf{v}}$ the sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{\mathbf{x}\mathbf{x}}^{-1}\Sigma_{\mathbf{x}\mathbf{y}}\Sigma_{\mathbf{y}\mathbf{y}}^{-1}\Sigma_{\mathbf{y}\mathbf{x}}$. In this paper, we focus on the case that $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. We study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, denoted by $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_{+},1]$. We also obtain the limiting distribution of the $\lambda_i$ under appropriate normalization. Specifically, $\lambda_i$ possesses Gaussian type fluctuation if $r_i>r_c$, and follows the Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
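
The objects in the abstract can be explored numerically. The following is a minimal NumPy sketch, not the authors' code: it draws Gaussian data with a single nonzero squared canonical correlation $r$ and computes the eigenvalues of $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$. The function names, the rank-one construction, and the QR-based computation are illustrative assumptions. The formula used for $d_{+}$ is the right edge of the Wachter law with $c_1=p/n$ and $c_2=q/n$; the abstract does not state it, so treat it as an assumption to check against the paper.

```python
import numpy as np


def sample_cca_eigenvalues(x, y):
    """Nonzero eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, in decreasing order.

    x is (n, p) and y is (n, q); rows are mean-zero observations and n > p + q.
    The eigenvalues equal the squared singular values of Qx^T Qy, where Qx and
    Qy are orthonormal bases of the column spaces of x and y (a standard
    linear-algebra identity, used here for numerical stability).
    """
    qx, _ = np.linalg.qr(x)  # reduced QR: qx has shape (n, p)
    qy, _ = np.linalg.qr(y)  # reduced QR: qy has shape (n, q)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted in decreasing order
    return np.clip(s, 0.0, 1.0) ** 2


def simulate_rank_one(n, p, q, r, seed=0):
    """Gaussian data whose only nonzero squared canonical correlation is r."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))
    y = rng.standard_normal((n, q))
    # Correlate the first coordinates with correlation sqrt(r); variances stay 1.
    y[:, 0] = np.sqrt(r) * x[:, 0] + np.sqrt(1.0 - r) * y[:, 0]
    return x, y


def wachter_right_edge(c1, c2):
    """Assumed right edge d_+ for c1 = p/n and c2 = q/n (not from the abstract)."""
    return c1 + c2 - 2 * c1 * c2 + 2 * np.sqrt(c1 * c2 * (1 - c1) * (1 - c2))


if __name__ == "__main__":
    n, p, q = 2000, 400, 200
    d_plus = wachter_right_edge(p / n, q / n)
    for r in (0.05, 0.5):  # one weak and one strong signal
        lam = sample_cca_eigenvalues(*simulate_rank_one(n, p, q, r))
        print(f"r = {r:.2f}: lambda_1 = {lam[0]:.3f}, "
              f"lambda_2 = {lam[1]:.3f}, assumed d_+ = {d_plus:.3f}")
```

Running the script for a weak and a strong $r$ lets one compare $\lambda_1$ with the assumed edge: by the result above, $\lambda_1$ should sit near $d_{+}$ when $r\leq r_c$ and separate from it when $r>r_c$.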




Read also

Consider a normal vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $\mathbf{z}$ at hand, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$, from the perspective of the Canonical Correlation Analysis, under the high-dimensional setting: both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. Under the additional assumptions $(p+q)/n\to y\in(0,1)$ and $p/q\not\to 1$, we study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, namely $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which we can recover $r_i$ in turn, thus providing an estimate of the latter in the high-dimensional scenario.
This paper proposes a new statistic to test independence between two high dimensional random vectors $\mathbf{X}:p_1\times 1$ and $\mathbf{Y}:p_2\times 1$. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of $\mathbf{X}$ and $\mathbf{Y}$. The asymptotic distribution of the statistic under the null hypothesis is established as a corollary of general central limit theorems (CLT) for the linear statistics of classical and regularized sample canonical correlation coefficients when $p_1$ and $p_2$ are both comparable to the sample size $n$. As applications of the developed independence test, various types of dependent structures, such as factor models, ARCH models and a general uncorrelated but dependent case, etc., are investigated by simulations. As an empirical application, cross-sectional dependence of daily stock returns of companies between different sections in the New York Stock Exchange (NYSE) is detected by the proposed test.
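
To make the idea of a sum-type independence statistic concrete, here is a rough NumPy sketch. It is not the paper's procedure: it uses the unregularized sum of squared sample canonical correlations, and it calibrates the test with a permutation scheme swapped in for the CLT-based null distribution the abstract describes. The names `sum_squared_cc` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def sum_squared_cc(x, y):
    """Sum of squared sample canonical correlations between x (n, p1) and y (n, p2).

    Unregularized version; assumes n > p1 + p2 so the QR factors have full rank.
    """
    x = x - x.mean(axis=0)  # center each column
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.sum(np.clip(s, 0.0, 1.0) ** 2))


def permutation_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value: shuffling the rows of y breaks any dependence on x."""
    rng = np.random.default_rng(seed)
    observed = sum_squared_cc(x, y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(x.shape[0])
        if sum_squared_cc(x, y[perm]) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, keeping the p-value valid.
    return (exceed + 1) / (n_perm + 1)
```

A permutation calibration is exact under independence but costs `n_perm` recomputations of the statistic, which is one practical reason an asymptotic null distribution such as the one derived in the paper is attractive.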
77 - Fan Yang 2021
Consider two high-dimensional random vectors $\widetilde{\mathbf x}\in\mathbb R^p$ and $\widetilde{\mathbf y}\in\mathbb R^q$ with finite rank correlations. More precisely, suppose that $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$ and $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$, for independent random vectors $\mathbf x\in\mathbb R^p$, $\mathbf y\in\mathbb R^q$ and $\mathbf z\in\mathbb R^r$ with iid entries of mean 0 and variance 1, and two deterministic matrices $A\in\mathbb R^{p\times r}$ and $B\in\mathbb R^{q\times r}$. With $n$ iid observations of $(\widetilde{\mathbf x},\widetilde{\mathbf y})$, we study the sample canonical correlations between them. In this paper, we focus on the high-dimensional setting with a rank-$r$ correlation. Let $t_1\ge\cdots\ge t_r$ be the squares of the population canonical correlation coefficients (CCC) between $\widetilde{\mathbf x}$ and $\widetilde{\mathbf y}$, and $\widetilde\lambda_1\ge\cdots\ge\widetilde\lambda_r$ be the squares of the largest $r$ sample CCC. Under certain moment assumptions on the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, we show that there exists a threshold $t_c\in(0,1)$ such that if $t_i>t_c$, then $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ converges in law to a centered normal distribution, where $\theta_i>\lambda_+$ is a fixed outlier location determined by $t_i$. Our results extend the ones in [4] for Gaussian vectors. Moreover, we find that the variance of the limiting distribution of $\sqrt{n}(\widetilde\lambda_i-\theta_i)$ also depends on the fourth cumulants of the entries of $\mathbf x$, $\mathbf y$ and $\mathbf z$, a phenomenon that cannot be observed in the Gaussian case.
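
The model $\widetilde{\mathbf x}=\mathbf x+A\mathbf z$, $\widetilde{\mathbf y}=\mathbf y+B\mathbf z$ can be written out in a few lines. The sketch below is illustrative only and assumes NumPy. Since $\mathbf x$, $\mathbf y$, $\mathbf z$ are independent with identity covariance, the population covariances are $I_p+AA^\top$, $I_q+BB^\top$ and $AB^\top$, from which the $t_i$ follow. The Rademacher ($\pm 1$) entries are one arbitrary non-Gaussian choice with mean 0 and variance 1, and the function names are hypothetical.

```python
import numpy as np


def population_squared_ccc(a, b):
    """Squared population CCCs t_1 >= ... >= t_r for x~ = x + A z, y~ = y + B z.

    a is (p, r) and b is (q, r). Uses Cov(x~) = I + A A^T, Cov(y~) = I + B B^T
    and Cov(x~, y~) = A B^T, which hold because x, y, z are independent with
    identity covariance.
    """
    p, r = a.shape
    q = b.shape[0]
    sigma_xx = np.eye(p) + a @ a.T
    sigma_yy = np.eye(q) + b @ b.T
    sigma_xy = a @ b.T
    # Canonical correlation matrix Sigma_xx^{-1} Sigma_xy Sigma_yy^{-1} Sigma_yx.
    m = np.linalg.solve(sigma_xx, sigma_xy) @ np.linalg.solve(sigma_yy, sigma_xy.T)
    eig = np.sort(np.linalg.eigvals(m).real)[::-1]
    return eig[:r]  # at most r eigenvalues are nonzero


def sample_model(n, a, b, seed=0):
    """Draw n observations of (x~, y~) with Rademacher entries (an assumption)."""
    rng = np.random.default_rng(seed)
    p, r = a.shape
    q = b.shape[0]
    x = rng.choice([-1.0, 1.0], size=(n, p))
    y = rng.choice([-1.0, 1.0], size=(n, q))
    z = rng.choice([-1.0, 1.0], size=(n, r))
    # Row i holds x_i + A z_i (and y_i + B z_i), hence the transposes.
    return x + z @ a.T, y + z @ b.T
```

For example, with $p=q=r=1$ and $A=B=1$ the single squared CCC is $1/\big((1+1)(1+1)\big)\cdot 1 = 1/4$, which `population_squared_ccc` reproduces.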
135 - Emilie Devijver 2015
We study a dimensionality reduction technique for finite mixtures of high-dimensional multivariate response regression models. Both the dimension of the response and the number of predictors are allowed to exceed the sample size. We consider predictor selection and rank reduction to obtain lower-dimensional approximations. A class of estimators with a fast rate of convergence is introduced. We apply this result to a specific procedure, introduced in [11], where the relevant predictors are selected by the Group-Lasso.
139 - Nicolas Verzelen 2008
Let $(Y,(X_i)_{i\in\mathcal{I}})$ be a zero mean Gaussian vector and $V$ be a subset of $\mathcal{I}$. Suppose we are given $n$ i.i.d. replications of the vector $(Y,X)$. We propose a new test for testing that $Y$ is independent of $(X_i)_{i\in\mathcal{I}\backslash V}$ conditionally to $(X_i)_{i\in V}$ against the general alternative that it is not. This procedure does not depend on any prior information on the covariance of $X$ or the variance of $Y$ and applies in a high-dimensional setting. It straightforwardly extends to test the neighbourhood of a Gaussian graphical model. The procedure is based on a model of Gaussian regression with random Gaussian covariates. We give non-asymptotic properties of the test and we prove that it is rate optimal (up to a possible $\log(n)$ factor) over various classes of alternatives under some additional assumptions. Besides, it allows us to derive non-asymptotic minimax rates of testing in this setting. Finally, we carry out a simulation study in order to evaluate the performance of our procedure.