High Dimensional Correlation Matrices: CLT and Its Applications

Added by Guangming Pan

Publication date 2014

Fields Mathematical Statistics

Language English

Authors Jiti Gao, Xiao Han, Guangming Pan

Statistics Theory

Statistical inference for sample correlation matrices is important in high dimensional data analysis. Motivated by this, this paper establishes a new central limit theorem (CLT) for a linear spectral statistic (LSS) of high dimensional sample correlation matrices for the case where the dimension $p$ and the sample size $n$ are comparable. This result is of independent interest in large dimensional random matrix theory. We then apply the linear spectral statistic to an independence test for $p$ random variables, and to an equivalence test for $p$ factor loadings and $n$ factors in a factor model. The finite sample performance of the proposed test shows its applicability and effectiveness in practice. An empirical application testing the independence of household incomes from different cities in China is also conducted.
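
As a rough illustration of the objects named in the abstract, the sketch below builds a sample correlation matrix from independent data with comparable $p$ and $n$, and evaluates a linear spectral statistic $\sum_i f(\lambda_i)$ over its eigenvalues. The test function $f(x)=x^2$, the Gaussian data and the sizes are assumptions chosen for illustration only; the centring and scaling constants of the paper's CLT are not reproduced here.

```python
import numpy as np

def sample_correlation(data):
    """Sample correlation matrix of an (n, p) array whose rows are observations."""
    centered = data - data.mean(axis=0)
    # Scale each column to unit Euclidean norm so the diagonal of the result is 1.
    scaled = centered / np.linalg.norm(centered, axis=0)
    return scaled.T @ scaled

def linear_spectral_statistic(matrix, f):
    """Sum of f over the eigenvalues of a symmetric matrix."""
    eigenvalues = np.linalg.eigvalsh(matrix)
    return float(np.sum(f(eigenvalues)))

# Hypothetical sizes with p and n comparable (p / n = 0.25).
rng = np.random.default_rng(0)
n, p = 200, 50
data = rng.standard_normal((n, p))  # independent columns (the null case)

R = sample_correlation(data)
lss = linear_spectral_statistic(R, np.square)  # assumed choice f(x) = x^2
print(lss / p)
```

With $f(x)=x^2$ the statistic equals $\operatorname{tr}(R^2)$, the sum of all squared sample correlations, so under independence it stays close to $p\,(1 + (p-1)/(n-1))$ for Gaussian data, while dependence among the variables pushes it upward.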

Joint CLT for eigenvalue statistics from several dependent large dimensional sample covariance matrices with application

Weiming Li, Zeng Li, Jianfeng Yao (2018)

Let $\mathbf{X}_n=(x_{ij})$ be a $k \times n$ data matrix with complex-valued, independent and standardized entries satisfying a Lindeberg-type moment condition. We consider simultaneously $R$ sample covariance matrices $\mathbf{B}_{nr}=\frac{1}{n} \mathbf{Q}_r \mathbf{X}_n \mathbf{X}_n^*\mathbf{Q}_r^\top,~1\le r\le R$, where the $\mathbf{Q}_{r}$'s are nonrandom real matrices with common dimensions $p\times k~(k\geq p)$. Assuming that both the dimension $p$ and the sample size $n$ grow to infinity, the limiting distributions of the eigenvalues of the matrices $\{\mathbf{B}_{nr}\}$ are identified, and as the main result of the paper, we establish a joint central limit theorem for linear spectral statistics of the $R$ matrices $\{\mathbf{B}_{nr}\}$. Next, this new CLT is applied to the problem of testing a high dimensional white noise in time series modelling. In experiments the derived test has a controlled size and is significantly faster than the classical permutation test, though it does have lower power. This application highlights the necessity of such a joint CLT in the presence of several dependent sample covariance matrices. In contrast, all the existing works on CLT for linear spectral statistics of large sample covariance matrices deal with a single sample covariance matrix ($R=1$).
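
The construction of the matrices $\mathbf{B}_{nr}$ can be sketched numerically as follows. This is only a minimal illustration of the definition above: the sizes, the Gaussian entries and the randomly drawn (then held fixed) matrices `Qs` are all assumptions, and nothing here implements the joint CLT or the white noise test.

```python
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 4, 6, 500        # hypothetical sizes with k >= p
num_matrices = 2           # plays the role of R in the abstract

# Complex standardized entries: mean zero and E|x_ij|^2 = 1.
X = (rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n))) / np.sqrt(2)

# Real p x k matrices, drawn once and then treated as nonrandom.
Qs = [rng.standard_normal((p, k)) for _ in range(num_matrices)]

# B_r = (1/n) Q_r X X^* Q_r^T; all matrices share the same X, hence are dependent.
Bs = [Q @ X @ X.conj().T @ Q.T / n for Q in Qs]

# One linear spectral statistic per matrix: the trace, i.e. f(x) = x.
traces = [float(np.trace(B).real) for B in Bs]
print(traces)
```

Because every $\mathbf{B}_{nr}$ is built from the same $\mathbf{X}_n$, the statistics in `traces` are dependent, which is the situation a joint CLT is meant to handle.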

Statistics Theory

Asymptotics of eigenstructure of sample correlation matrices for high-dimensional spiked models

David Morales-Jimenez, Iain M. Johnstone, Matthew R. McKay (2018)

Sample correlation matrices are employed ubiquitously in statistics. However, quite surprisingly, little is known about their asymptotic spectral properties for high-dimensional data, particularly beyond the case of null models for which the data is assumed independent. Here, considering the popular class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both the leading eigenvalues and eigenvectors of sample correlation matrices. These results are obtained under high-dimensional settings for which the number of samples $n$ and the number of variables $p$ approach infinity, with $p/n$ tending to a constant. To first order, the spectral properties of sample correlation matrices are seen to coincide with those of sample covariance matrices; however, their asymptotic distributions can differ significantly, with fluctuations of both the sample eigenvalues and eigenvectors often being remarkably smaller than those of their sample covariance counterparts.

Statistics Theory

A unified matrix model including both CCA and F matrices in multivariate analysis: the largest eigenvalue and its applications

Xiao Han, Guangming Pan, Qing Yang (2016)

Let $\mathbf{Z}_{M_1\times N}=\mathbf{T}^{\frac{1}{2}}\mathbf{X}$ where $(\mathbf{T}^{\frac{1}{2}})^2=\mathbf{T}$ is a positive definite matrix and $\mathbf{X}$ consists of independent random variables with mean zero and variance one. This paper proposes a unified matrix model $$\boldsymbol{\Omega}=(\mathbf{Z}\mathbf{U}_2\mathbf{U}_2^T\mathbf{Z}^T)^{-1}\mathbf{Z}\mathbf{U}_1\mathbf{U}_1^T\mathbf{Z}^T,$$ where $\mathbf{U}_1$ and $\mathbf{U}_2$ are isometric with dimensions $N\times N_1$ and $N\times (N-N_2)$ respectively such that $\mathbf{U}_1^T\mathbf{U}_1=\mathbf{I}_{N_1}$, $\mathbf{U}_2^T\mathbf{U}_2=\mathbf{I}_{N-N_2}$ and $\mathbf{U}_1^T\mathbf{U}_2=0$. Moreover, $\mathbf{U}_1$ and $\mathbf{U}_2$ (random or non-random) are independent of $\mathbf{Z}_{M_1\times N}$ and with probability tending to one, $\operatorname{rank}(\mathbf{U}_1)=N_1$ and $\operatorname{rank}(\mathbf{U}_2)=N-N_2$. We establish the asymptotic Tracy-Widom distribution for its largest eigenvalue under moment assumptions on $\mathbf{X}$ when $N_1,N_2$ and $M_1$ are comparable. By selecting appropriate matrices $\mathbf{U}_1$ and $\mathbf{U}_2$, the asymptotic distributions of the maximum eigenvalues of the matrices used in Canonical Correlation Analysis (CCA) and of F matrices (including centered and non-center

Statistics Theory

A general method for power analysis in testing high dimensional covariance matrices

Qiyang Han, Tiefeng Jiang, Yandi Shen (2021)

Covariance matrix testing for high dimensional data is a fundamental problem. A large class of covariance test statistics based on certain averaged spectral statistics of the sample covariance matrix are known to obey central limit theorems under the null. However, the power behavior of the corresponding tests under general alternatives remains largely unknown. This paper develops a general method for analyzing the power behavior of covariance test statistics via accurate non-asymptotic power expansions. We specialize our general method to two prototypical settings of testing identity and sphericity, and derive sharp power expansions for a number of widely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolf's test, Cai-Ma's test and John's test. The power expansion for each of those tests holds uniformly over all possible alternatives under mild growth conditions on the dimension-to-sample ratio. Interestingly, although some of those tests are previously known to share the same limiting power behavior under spiked covariance alternatives with a fixed number of spikes, our new power characterizations indicate that such equivalence fails when many spikes exist. The proofs of our results combine techniques from Poincaré-type inequalities, random matrices and zonal polynomials.

Statistics Theory

Canonical correlation coefficients of high-dimensional Gaussian vectors: finite rank case

Zhigang Bao, Jiang Hu, Guangming Pan (2017)

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively, where both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{\mathbf{u}\mathbf{v}}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{\mathbf{u}\mathbf{v}}$ the sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are known as the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{\mathbf{x}\mathbf{x}}^{-1}\Sigma_{\mathbf{x}\mathbf{y}}\Sigma_{\mathbf{y}\mathbf{y}}^{-1}\Sigma_{\mathbf{y}\mathbf{x}}$. In this paper, we focus on the case that $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. We study the sample counterparts of $r_i,\ i=1,\ldots,k$, i.e. the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, denoted by $\lambda_1\geq\cdots\geq \lambda_k$. We show that there exists a threshold $r_c\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_{+},1]$. We also obtain the limiting distribution of the $\lambda_i$'s under appropriate normalization. Specifically, $\lambda_i$ possesses Gaussian type fluctuation if $r_i>r_c$, and follows the Tracy-Widom distribution if $r_i<r_c$. Some applications of our results are also discussed.
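
The sample canonical correlation matrix described above can be computed directly, as in the following sketch. The helper name, the sizes and the rank-one dependence (a single coordinate of $\mathbf{y}$ loaded on a single coordinate of $\mathbf{x}$) are assumptions made for illustration; the values of $r_c$ and $d_{+}$ are not computed here.

```python
import numpy as np

def squared_sample_canonical_correlations(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, sorted in decreasing order.

    x is an (n, p) array and y an (n, q) array; rows are observations.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    s_xx = xc.T @ xc / n
    s_yy = yc.T @ yc / n
    s_xy = xc.T @ yc / n
    # Solve linear systems rather than forming explicit inverses.
    m = np.linalg.solve(s_xx, s_xy) @ np.linalg.solve(s_yy, s_xy.T)
    # m is not symmetric, but its eigenvalues are real up to rounding error.
    eigenvalues = np.linalg.eigvals(m).real
    return np.sort(eigenvalues)[::-1]

# Hypothetical sizes with p and q proportional to n.
rng = np.random.default_rng(2)
n, p, q = 400, 20, 30
x = rng.standard_normal((n, p))
y = rng.standard_normal((n, q))
y[:, 0] += 3.0 * x[:, 0]  # rank-one dependence, population r_1 = 9/10

lam = squared_sample_canonical_correlations(x, y)
print(lam[:3])
```

In this setup only one population canonical correlation is nonzero, so the largest sample eigenvalue separates clearly from the remaining ones, which stay in the bulk even though their population counterparts are zero.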

Statistics Theory
