We study the asymptotic distributions of the spiked eigenvalues and the largest nonspiked eigenvalue of the sample covariance matrix under a general covariance matrix model with divergent spiked eigenvalues, while the other eigenvalues are bounded but otherwise arbitrary. The limiting normal distribution for the spiked sample eigenvalues is established. It has two distinctive features: the asymptotic mean depends not only on the population spikes but also on the nonspikes, and the asymptotic variance in general depends on the population eigenvectors. In addition, the limiting Tracy-Widom law for the largest nonspiked sample eigenvalue is obtained. Estimation of the number of spikes and the convergence of the leading eigenvectors are also considered. The results hold even when the number of spikes diverges. As a key technical tool, we develop a central limit theorem for a type of random quadratic form in which the random vectors and random matrices involved are dependent. This result may be of independent interest.
We consider general high-dimensional spiked sample covariance models and show that their leading sample spiked eigenvalues and their linear spectral statistics are asymptotically independent when the sample size and dimension are proportional to each other. As a byproduct, we also establish the central limit theorem of the leading sample spiked eigenvalues by removing the block diagonal assumption on the population covariance matrix, which is commonly needed in the literature. Moreover, we propose consistent estimators of the $L_4$ norm of the spiked population eigenvectors. Based on these results, we develop a new statistic to test the equality of two spiked population covariance matrices. Numerical studies show that the new test procedure is more powerful than some existing methods.
We consider large complex random sample covariance matrices obtained from spiked populations, that is, when the true covariance matrix is diagonal with all but finitely many eigenvalues equal to one. We investigate the limiting behavior of the largest eigenvalues when the population and the sample sizes both become large. Under some conditions on the moments of the sample distribution, we prove that the asymptotic fluctuations of the largest eigenvalues are the same as for a complex Gaussian sample with the same true covariance. The real setting is also considered.
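As an illustration of the spiked setting described above, the following minimal sketch computes the standard almost-sure limit of a top sample eigenvalue when the noise eigenvalues equal one. The formula is the well-known phase-transition result from the spiked-model literature and is not stated in the abstract itself. The function name `spiked_eigenvalue_limit` and the parameter `gamma` (the limiting ratio of dimension to sample size) are hypothetical names chosen for this example, and only spikes of at least one are handled.

```python
import math


def spiked_eigenvalue_limit(spike: float, gamma: float) -> float:
    """Almost-sure limit of the sample eigenvalue attached to one population spike.

    Assumptions (not taken from the abstract): unit noise eigenvalues,
    spike >= 1, and gamma = lim p/n > 0.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if spike < 1:
        raise ValueError("this sketch only covers spikes >= 1")

    threshold = 1.0 + math.sqrt(gamma)  # critical spike strength
    if spike > threshold:
        # Supercritical: the sample eigenvalue separates from the bulk.
        return spike * (1.0 + gamma / (spike - 1.0))
    # Subcritical or critical: the eigenvalue sticks to the right bulk edge.
    return (1.0 + math.sqrt(gamma)) ** 2


if __name__ == "__main__":
    for ell in (1.2, 1.5, 4.0):
        print(ell, spiked_eigenvalue_limit(ell, gamma=0.25))
```

The two branches agree at the threshold, so the limit is continuous in the spike strength. Only the supercritical branch carries information about the size of the spike.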
In this paper, we study the asymptotic behavior of the extreme eigenvalues and eigenvectors of high-dimensional spiked sample covariance matrices, in the supercritical case when a reliable detection of spikes is possible. In particular, we derive the joint distribution of the extreme eigenvalues and the generalized components of the associated eigenvectors, i.e., the projections of the eigenvectors onto an arbitrary given direction, assuming that the dimension and sample size are comparably large. In general, the joint distribution is given in terms of linear combinations of finitely many Gaussian and Chi-square variables, with parameters depending on the projection direction and the spikes. Our assumption on the spikes is fully general. First, the strengths of the spikes are only required to be slightly above the critical threshold, and no upper bound on the strengths is needed. Second, multiple spikes, i.e., spikes with the same strength, are allowed. Third, no structural assumption is imposed on the spikes. Thanks to this general setting, we can apply the results to various high-dimensional statistical hypothesis testing problems involving both the eigenvalues and eigenvectors. Specifically, we propose accurate and powerful statistics to conduct hypothesis testing on the principal components. These statistics are data-dependent and adaptive to the underlying true spikes. Numerical simulations confirm the accuracy and power of our proposed statistics and show significantly better performance than existing methods in the literature. In particular, our methods are accurate and powerful even when either the spikes are small or the dimension is large.
Consider two $p$-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, and let $S_1$ and $S_2$ be the sample covariance matrices from samples of the populations with degrees of freedom $T$ and $n$, respectively. When the difference $\Delta$ between $\Sigma_1$ and $\Sigma_2$ is of small rank compared to $p,T$ and $n$, the Fisher matrix $F=S_2^{-1}S_1$ is called a {\em spiked Fisher matrix}. When $p,T$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of $F$: when the eigenvalues of $\Delta$ ({\em spikes}) are above (or under) a critical value, the associated extreme eigenvalues of the Fisher matrix will converge to some point outside the support of the global limit (LSD) of other eigenvalues; otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for these extreme eigenvalues of the spiked Fisher matrix. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are {\em simple}. Numerical examples are provided to demonstrate the finite sample performance of the results. In addition to classical applications of a Fisher matrix in high-dimensional data analysis, we propose a new method for the detection of signals allowing an arbitrary covariance structure of the noise. Simulation experiments are conducted to illustrate the performance of this detector.
In this paper, we study limiting laws and consistent estimation criteria for the extreme eigenvalues in a spiked covariance model of dimension $p$. First, for fixed $p$, we propose a generalized estimation criterion that can consistently estimate $k$, the number of spiked eigenvalues. Compared with the existing literature, we show that consistency can be achieved under weaker conditions on the penalty term. Next, allowing both $p$ and $k$ to diverge, we derive limiting distributions of the spiked sample eigenvalues using random matrix theory techniques. Notably, our results do not require the spiked eigenvalues to be uniformly bounded from above or to tend to infinity, as has been assumed in the existing literature. Based on these results, we formulate a generalized estimation criterion and show that it can consistently estimate $k$, where $k$ may be fixed or grow at the rate $k=o(n^{1/3})$. We further show that the results in our work continue to hold under a general population distribution without assuming normality. The efficacy of the proposed estimation criteria is illustrated through comparative simulation studies.