In this paper, we analyse the singular values of a large $p\times n$ data matrix $\mathbf{X}_n= (\mathbf{x}_{n1},\ldots,\mathbf{x}_{nn})$ whose columns $\mathbf{x}_{nj}$ are independent $p$-dimensional vectors, possibly with different distributions. Such data matrices are common in high-dimensional statistics. Under the key assumption that the covariance matrices $\mathbf{\Sigma}_{nj}=\text{Cov}(\mathbf{x}_{nj})$ are asymptotically simultaneously diagonalizable, together with appropriate convergence of their spectra, we establish a limiting distribution for the singular values of $\mathbf{X}_n$ when both $p$ and $n$ grow to infinity with comparable magnitude. The matrix model includes and goes beyond many existing works on different types of sample covariance matrices, including the weighted sample covariance matrix, the Gram matrix model and the sample covariance matrix of linear time series models. Furthermore, we develop two applications of our general approach. First, we obtain the existence and uniqueness of a new limiting spectral distribution of realized covariance matrices for a multi-dimensional diffusion process with anisotropic time-varying co-volatility processes. Second, we derive the limiting spectral distribution for singular values of the data matrix for a recent matrix-valued auto-regressive model. Finally, for a generalized finite mixture model, the limiting spectral distribution for singular values of the data matrix is obtained.
Covariance matrix testing for high-dimensional data is a fundamental problem. A large class of covariance test statistics based on certain averaged spectral statistics of the sample covariance matrix are known to obey central limit theorems under the null. However, a precise understanding of the power behavior of the corresponding tests under general alternatives is still largely missing. This paper develops a general method for analyzing the power behavior of covariance test statistics via accurate non-asymptotic power expansions. We specialize our general method to two prototypical settings of testing identity and sphericity, and derive sharp power expansions for a number of widely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolf's test, Cai-Ma's test and John's test. The power expansion for each of those tests holds uniformly over all possible alternatives under mild growth conditions on the dimension-to-sample ratio. Interestingly, although some of those tests were previously known to share the same limiting power behavior under spiked covariance alternatives with a fixed number of spikes, our new power characterizations indicate that such equivalence fails when many spikes exist. The proofs of our results combine techniques from Poincaré-type inequalities, random matrices and zonal polynomials.
The Riemannian geometry of covariance matrices has been essential to several successful applications in computer vision, biomedical signal and image processing, and radar data processing. For these applications, an important ongoing challenge is to develop Riemannian-geometric tools which are adapted to structured covariance matrices. The present paper proposes to meet this challenge by introducing a new class of probability distributions, Gaussian distributions of structured covariance matrices. These are Riemannian analogs of Gaussian distributions, which only sample from covariance matrices having a preassigned structure, such as complex, Toeplitz, or block-Toeplitz. The usefulness of these distributions stems from three features: (1) they are completely tractable, analytically or numerically, when dealing with large covariance matrices, (2) they provide a statistical foundation for the concept of structured Riemannian barycentre (i.e. Fréchet or geometric mean), (3) they lead to efficient statistical learning algorithms, which realise, among others, density estimation and classification of structured covariance matrices. The paper starts from the observation that several spaces of structured covariance matrices, considered from a geometric point of view, are Riemannian symmetric spaces. Accordingly, it develops an original theory of Gaussian distributions on Riemannian symmetric spaces, of their statistical inference, and of their relationship to the concept of Riemannian barycentre. Then, it uses this original theory to give a detailed description of Gaussian distributions of three kinds of structured covariance matrices: complex, Toeplitz, and block-Toeplitz. Finally, it describes algorithms for density estimation and classification of structured covariance matrices, based on Gaussian distribution mixture models.
The concordance signature of a multivariate continuous distribution is the vector of concordance probabilities for margins of all orders; it underlies the bivariate and multivariate Kendall's tau measure of concordance. It is shown that every attainable concordance signature is equal to the concordance signature of a unique mixture of the extremal copulas, that is, the copulas with extremal correlation matrices consisting exclusively of 1s and -1s. This result establishes that the set of attainable Kendall rank correlation matrices of multivariate continuous distributions in arbitrary dimension is the set of convex combinations of extremal correlation matrices, a set known as the cut polytope. A methodology for testing the attainability of concordance signatures using linear optimization and convex analysis is provided. The elliptical copulas are shown to yield a strict subset of the attainable concordance signatures as well as a strict subset of the attainable Kendall rank correlation matrices; the Student t copula is seen to converge to a mixture of extremal copulas sharing its concordance signature with all elliptical distributions that have the same correlation matrix. A method of estimating an attainable concordance signature from data is derived and shown to correspond to using standard estimates of Kendall's tau in the absence of ties. The methodology has application to Monte Carlo simulations of dependent random variables as well as expert elicitation of consistent systems of Kendall's tau dependence measures.
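As a small illustration of the cut-polytope characterization, the sketch below treats only the three-dimensional case and only the pairwise Kendall's tau values, not the full concordance signature. In dimension 3 there are four extremal correlation matrices, and a triple $(\tau_{12}, \tau_{13}, \tau_{23})$ is a convex combination of them exactly when the four barycentric weights computed below are non-negative. The function names and the closed-form weights are assumptions made for this example; the abstract's own methodology relies on linear optimization and is not reproduced here.

```python
# Off-diagonal entries (r12, r13, r23) of the four extremal 3x3
# correlation matrices, i.e. matrices s s^T with s in {-1, +1}^3.
VERTICES = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def extremal_weights(t12, t13, t23):
    """Barycentric weights of (t12, t13, t23) w.r.t. the four vertices.

    The vertices sum to zero and are mutually "orthogonal" in the sense
    that sum_k v_k v_k^T = 4 I, so the weights are w_k = (1 + v_k . t) / 4.
    They always sum to 1; the point lies in the polytope iff all w_k >= 0.
    """
    t = (t12, t13, t23)
    return [(1 + sum(s * x for s, x in zip(v, t))) / 4 for v in VERTICES]


def is_attainable(t12, t13, t23, tol=1e-12):
    """True if the Kendall's tau triple lies in the 3-d cut polytope."""
    return all(w >= -tol for w in extremal_weights(t12, t13, t23))


if __name__ == "__main__":
    print(extremal_weights(0.5, 0.5, 0.5))
    print(is_attainable(0.5, 0.5, 0.5))
    print(is_attainable(-0.5, -0.5, -0.5))
```

In this three-dimensional setting the weights are unique because the polytope is a tetrahedron; in higher dimensions the pairwise values alone no longer determine the mixture, which is where the full concordance signature is needed.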
We consider the set Bp of parametric block correlation matrices with p blocks of various (and possibly different) sizes, whose diagonal blocks are compound symmetry (CS) correlation matrices and whose off-diagonal blocks are constant matrices. Such matrices appear in probabilistic models on categorical data, when the levels are partitioned into p groups, assuming a constant correlation within a group and a constant correlation for each pair of groups. We obtain two necessary and sufficient conditions for positive definiteness of elements of Bp. First, we consider the block average map $\phi$, which replaces each block by its mean value. We prove that for any A $\in$ Bp, A is positive definite if and only if $\phi$(A) is positive definite. Hence it is equivalent to check the validity of the covariance matrix of group means, whose size depends only on the number of groups and not on their sizes. This theorem can be extended to a wider set of block matrices. Second, we consider the subset of Bp for which the between-group correlation is the same for all pairs of groups. Positive definiteness then comes down to finding the positive definite interval of a matrix pencil on Sp. We obtain a simple characterization by localizing the roots of the determinant with within-group correlation values.
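The block average criterion can be checked numerically on a small example. The following sketch builds a block matrix with CS diagonal blocks and constant off-diagonal blocks, forms $\phi$(A) by averaging each block, and tests positive definiteness of both with a plain Cholesky factorization. All function names, the group sizes and the correlation values are hypothetical choices for illustration, and the sketch assumes each within-group correlation is strictly below 1 so that the diagonal blocks are themselves valid CS correlation matrices.

```python
import math


def block_matrix(sizes, rho, c):
    """Build A with CS diagonal blocks (within-group correlation rho[g])
    and constant off-diagonal blocks (between-group correlation c[g][h])."""
    labels = [g for g, n in enumerate(sizes) for _ in range(n)]
    m = len(labels)
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            g, h = labels[i], labels[j]
            if i == j:
                A[i][j] = 1.0
            elif g == h:
                A[i][j] = rho[g]
            else:
                A[i][j] = c[g][h]
    return A, labels


def block_average(A, labels, p):
    """phi(A): replace each block by its mean value, giving a p x p matrix."""
    sums = [[0.0] * p for _ in range(p)]
    counts = [[0] * p for _ in range(p)]
    for i, g in enumerate(labels):
        for j, h in enumerate(labels):
            sums[g][h] += A[i][j]
            counts[g][h] += 1
    return [[sums[g][h] / counts[g][h] for h in range(p)] for g in range(p)]


def is_positive_definite(M, tol=1e-12):
    """Run a Cholesky factorization; M is PD iff every pivot exceeds tol."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    sizes, rho = [2, 3], [0.5, 0.3]
    for between in (0.2, 0.9):
        c = [[0.0, between], [between, 0.0]]
        A, labels = block_matrix(sizes, rho, c)
        phi_A = block_average(A, labels, len(sizes))
        # The two checks are expected to agree for every choice of `between`.
        print(between, is_positive_definite(A), is_positive_definite(phi_A))
```

The practical point is that $\phi$(A) is only p by p, so the check stays cheap no matter how large the groups are.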
Consider two $p$-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively, and let $S_1$ and $S_2$ be the sample covariance matrices from samples of the populations with degrees of freedom $T$ and $n$, respectively. When the difference $\Delta$ between $\Sigma_1$ and $\Sigma_2$ is of small rank compared to $p,T$ and $n$, the Fisher matrix $F=S_2^{-1}S_1$ is called a {\em spiked Fisher matrix}. When $p,T$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of $F$: when the eigenvalues of $\Delta$ ({\em spikes}) are above (or under) a critical value, the associated extreme eigenvalues of the Fisher matrix will converge to some point outside the support of the global limit (LSD) of other eigenvalues; otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for these extreme eigenvalues of the spiked Fisher matrix. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are {\em simple}. Numerical examples are provided to demonstrate the finite sample performance of the results. In addition to classical applications of a Fisher matrix in high-dimensional data analysis, we propose a new method for the detection of signals allowing an arbitrary covariance structure of the noise. Simulation experiments are conducted to illustrate the performance of this detector.