No Arabic abstract
We consider the set Bp of parametric block correlation matrices with p blocks of various (and possibly different) sizes, whose diagonal blocks are compound symmetry (CS) correlation matrices and off-diagonal blocks are constant matrices. Such matrices appear in probabilistic models on categorical data, when the levels are partitioned in p groups, assuming a constant correlation within a group and a constant correlation for each pair of groups. We obtain two necessary and sufficient conditions for positive definiteness of elements of Bp. Firstly we consider the block average map $phi$, consisting in replacing a block by its mean value. We prove that for any A $in$ Bp , A is positive definite if and only if $phi$(A) is positive definite. Hence it is equivalent to check the validity of the covariance matrix of group means, which only depends on the number of groups and not on their sizes. This theorem can be extended to a wider set of block matrices. Secondly, we consider the subset of Bp for which the between group correlation is the same for all pairs of groups. Positive definiteness then comes down to find the positive definite interval of a matrix pencil on Sp. We obtain a simple characterization by localizing the roots of the determinant with within group correlation values.
Statistical inferences for sample correlation matrices are important in high dimensional data analysis. Motivated by this, this paper establishes a new central limit theorem (CLT) for a linear spectral statistic (LSS) of high dimensional sample correlation matrices for the case where the dimension p and the sample size $n$ are comparable. This result is of independent interest in large dimensional random matrix theory. Meanwhile, we apply the linear spectral statistic to an independence test for $p$ random variables, and then an equivalence test for p factor loadings and $n$ factors in a factor model. The finite sample performance of the proposed test shows its applicability and effectiveness in practice. An empirical application to test the independence of household incomes from different cities in China is also conducted.
Sample correlation matrices are employed ubiquitously in statistics. However, quite surprisingly, little is known about their asymptotic spectral properties for high-dimensional data, particularly beyond the case of null models for which the data is assumed independent. Here, considering the popular class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both the leading eigenvalues and eigenvectors of sample correlation matrices. These results are obtained under high-dimensional settings for which the number of samples n and variables p approach infinity, with p/n tending to a constant. To first order, the spectral properties of sample correlation matrices are seen to coincide with those of sample covariance matrices; however their asymptotic distributions can differ significantly, with fluctuations of both the sample eigenvalues and eigenvectors often being remarkably smaller than those of their sample covariance counterparts.
In this paper, we analyse singular values of a large $ptimes n$ data matrix $mathbf{X}_n= (mathbf{x}_{n1},ldots,mathbf{x}_{nn})$ where the column $mathbf{x}_{nj}$s are independent $p$-dimensional vectors, possibly with different distributions. Such data matrices are common in high-dimensional statistics. Under a key assumption that the covariance matrices $mathbf{Sigma}_{nj}=text{Cov}(mathbf{x}_{nj})$ can be asymptotically simultaneously diagonalizable, and appropriate convergence of their spectra, we establish a limiting distribution for the singular values of $mathbf{X}_n$ when both dimension $p$ and $n$ grow to infinity in a comparable magnitude. The matrix model goes beyond and includes many existing works on different types of sample covariance matrices, including the weighted sample covariance matrix, the Gram matrix model and the sample covariance matrix of linear times series models. Furthermore, we develop two applications of our general approach. First, we obtain the existence and uniqueness of a new limiting spectral distribution of realized covariance matrices for a multi-dimensional diffusion process with anisotropic time-varying co-volatility processes. Secondly, we derive the limiting spectral distribution for singular values of the data matrix for a recent matrix-valued auto-regressive model. Finally, for a generalized finite mixture model, the limiting spectral distribution for singular values of the data matrix is obtained.
The concordance signature of a multivariate continuous distribution is the vector of concordance probabilities for margins of all orders; it underlies the bivariate and multivariate Kendalls tau measure of concordance. It is shown that every attainable concordance signature is equal to the concordance signature of a unique mixture of the extremal copulas, that is the copulas with extremal correlation matrices consisting exclusively of 1s and -1s. This result establishes that the set of attainable Kendall rank correlation matrices of multivariate continuous distributions in arbitrary dimension is the set of convex combinations of extremal correlation matrices, a set known as the cut polytope. A methodology for testing the attainability of concordance signatures using linear optimization and convex analysis is provided. The elliptical copulas are shown to yield a strict subset of the attainable concordance signatures as well as a strict subset of the attainable Kendall rank correlation matrices; the Student t copula is seen to converge to a mixture of extremal copulas sharing its concordance signature with all elliptical distributions that have the same correlation matrix. A method of estimating an attainable concordance signature from data is derived and shown to correspond to using standard estimates of Kendalls tau in the absence of ties. The methodology has application to Monte Carlo simulations of dependent random variables as well as expert elicitation of consistent systems of Kendalls tau dependence measures.
Chatterjee (2021) introduced a simple new rank correlation coefficient that has attracted much recent attention. The coefficient has the unusual appeal that it not only estimates a population quantity first proposed by Dette et al. (2013) that is zero if and only if the underlying pair of random variables is independent, but also is asymptotically normal under independence. This paper compares Chatterjees new correlation coefficient to three established rank correlations that also facilitate consistent tests of independence, namely, Hoeffdings $D$, Blum-Kiefer-Rosenblatts $R$, and Bergsma-Dassios-Yanagimotos $tau^*$. We contrast their computational efficiency in light of recent advances, and investigate their power against local rotation and mixture alternatives. Our main results show that Chatterjees coefficient is unfortunately rate sub-optimal compared to $D$, $R$, and $tau^*$. The situation is more subtle for a related earlier estimator of Dette et al. (2013). These results favor $D$, $R$, and $tau^*$ over Chatterjees new correlation coefficient for the purpose of testing independence.