No Arabic abstract
By studying the family of $p$-dimensional scale mixtures, this paper shows for the first time a non trivial example where the eigenvalue distribution of the corresponding sample covariance matrix {em does not converge} to the celebrated Marv{c}enko-Pastur law. A different and new limit is found and characterized. The reasons of failure of the Marv{c}enko-Pastur limit in this situation are found to be a strong dependence between the $p$-coordinates of the mixture. Next, we address the problem of testing whether the mixture has a spherical covariance matrix. To analize the traditional Johns type test we establish a novel and general CLT for linear statistics of eigenvalues of the sample covariance matrix. It is shown that the Johns test and its recent high-dimensional extensions both fail for high-dimensional mixtures, precisely due to the different spectral limit above. As a remedy, a new test procedure is constructed afterwards for the sphericity hypothesis. This test is then applied to identify the covariance structure in model-based clustering. It is shown that the test has much higher power than the widely used ICL and BIC criteria in detecting non spherical component covariance matrices of a high-dimensional mixture.
Covariance matrix testing for high dimensional data is a fundamental problem. A large class of covariance test statistics based on certain averaged spectral statistics of the sample covariance matrix are known to obey central limit theorems under the null. However, precise understanding for the power behavior of the corresponding tests under general alternatives remains largely unknown. This paper develops a general method for analyzing the power behavior of covariance test statistics via accurate non-asymptotic power expansions. We specialize our general method to two prototypical settings of testing identity and sphericity, and derive sharp power expansion for a number of widely used tests, including the likelihood ratio tests, Ledoit-Nagao-Wolfs test, Cai-Mas test and Johns test. The power expansion for each of those tests holds uniformly over all possible alternatives under mild growth conditions on the dimension-to-sample ratio. Interestingly, although some of those tests are previously known to share the same limiting power behavior under spiked covariance alternatives with a fixed number of spikes, our new power characterizations indicate that such equivalence fails when many spikes exist. The proofs of our results combine techniques from Poincare-type inequalities, random matrices and zonal polynomials.
We propose a Bayesian methodology for estimating spiked covariance matrices with jointly sparse structure in high dimensions. The spiked covariance matrix is reparametrized in terms of the latent factor model, where the loading matrix is equipped with a novel matrix spike-and-slab LASSO prior, which is a continuous shrinkage prior for modeling jointly sparse matrices. We establish the rate-optimal posterior contraction for the covariance matrix with respect to the operator norm as well as that for the principal subspace with respect to the projection operator norm loss. We also study the posterior contraction rate of the principal subspace with respect to the two-to-infinity norm loss, a novel loss function measuring the distance between subspaces that is able to capture element-wise eigenvector perturbations. We show that the posterior contraction rate with respect to the two-to-infinity norm loss is tighter than that with respect to the routinely used projection operator norm loss under certain low-rank and bounded coherence conditions. In addition, a point estimator for the principal subspace is proposed with the rate-optimal risk bound with respect to the projection operator norm loss. These results are based on a collection of concentration and large deviation inequalities for the matrix spike-and-slab LASSO prior. The numerical performance of the proposed methodology is assessed through synthetic examples and the analysis of a real-world face data example.
We consider testing the equality of two high-dimensional covariance matrices by carrying out a multi-level thresholding procedure, which is designed to detect sparse and faint differences between the covariances. A novel U-statistic composition is developed to establish the asymptotic distribution of the thresholding statistics in conjunction with the matrix blocking and the coupling techniques. We propose a multi-thresholding test that is shown to be powerful in detecting sparse and weak differences between two covariance matrices. The test is shown to have attractive detection boundary and to attain the optimal minimax rate in the signal strength under different regimes of high dimensionality and the sparsity of the signal. Simulation studies are conducted to demonstrate the utility of the proposed test.
Testing large covariance matrices is of fundamental importance in statistical analysis with high-dimensional data. In the past decade, three types of test statistics have been studied in the literature: quadratic form statistics, maximum form statistics, and their weighted combination. It is known that quadratic form statistics would suffer from low power against sparse alternatives and maximum form statistics would suffer from low power against dense alternatives. The weighted combination methods were introduced to enhance the power of quadratic form statistics or maximum form statistics when the weights are appropriately chosen. In this paper, we provide a new perspective to exploit the full potential of quadratic form statistics and maximum form statistics for testing high-dimensional covariance matrices. We propose a scale-invariant power enhancement test based on Fishers method to combine the p-values of quadratic form statistics and maximum form statistics. After carefully studying the asymptotic joint distribution of quadratic form statistics and maximum form statistics, we prove that the proposed combination method retains the correct asymptotic size and boosts the power against more general alternatives. Moreover, we demonstrate the finite-sample performance in simulation studies and a real application.
Testing heteroscedasticity of the errors is a major challenge in high-dimensional regressions where the number of covariates is large compared to the sample size. Traditional procedures such as the White and the Breusch-Pagan tests typically suffer from low sizes and powers. This paper proposes two new test procedures based on standard OLS residuals. Using the theory of random Haar orthogonal matrices, the asymptotic normality of both test statistics is obtained under the null when the degree of freedom tends to infinity. This encompasses both the classical low-dimensional setting where the number of variables is fixed while the sample size tends to infinity, and the proportional high-dimensional setting where these dimensions grow to infinity proportionally. These procedures thus offer a wide coverage of dimensions in applications. To our best knowledge, this is the first procedures in the literature for testing heteroscedasticity which are valid for medium and high-dimensional regressions. The superiority of our proposed tests over the existing methods are demonstrated by extensive simulations and by several real data analyses as well.