
A Pairwise Hotelling Method for Testing High-Dimensional Mean Vectors

 Added by Tiejun Tong
Publication date: 2020
Language: English





For high-dimensional small-sample-size data, Hotelling's $T^2$ test is not applicable for testing mean vectors because the sample covariance matrix is singular. To overcome this problem, there are three main approaches in the literature. Note, however, that each of the existing approaches may have serious limitations and only works well in certain situations. Motivated by this, we propose a pairwise Hotelling method for testing high-dimensional mean vectors, which, in essence, provides a good balance between the existing approaches. To effectively utilize the correlation information, we construct the new test statistics as the sum of Hotelling's test statistics for the covariate pairs with strong correlations and the squared $t$-statistics for the individual covariates that have little correlation with the others. We further derive the asymptotic null distributions and power functions of the proposed Hotelling tests under some regularity conditions. Numerical results show that our new tests are able to control the type I error rate, and can achieve higher statistical power than existing methods, especially when the covariates are highly correlated. Two real data examples are also analyzed, and both demonstrate the efficacy of our pairwise Hotelling tests.
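
To make the construction concrete, below is a minimal one-sample sketch in Python (NumPy): covariates are greedily paired with their most strongly correlated unused partner whenever the absolute sample correlation exceeds a screening threshold, each pair contributes a 2x2 Hotelling's $T^2$ term, and the remaining covariates contribute squared $t$-statistics. The function name `pairwise_hotelling_stat`, the greedy pairing rule, and the threshold `rho0` are illustrative assumptions; the paper's actual pairing scheme, scaling, and null calibration may differ.

```python
# A minimal one-sample sketch of the pairwise idea; pairing rule and threshold
# are assumptions for illustration, not the paper's exact procedure.
import numpy as np

def pairwise_hotelling_stat(X, rho0=0.5):
    """X: (n, p) data matrix; test H0: E[X] = 0."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)            # sample covariance
    R = np.corrcoef(X, rowvar=False)       # sample correlation

    # Greedily pair each covariate with its most correlated unused partner
    # when that correlation exceeds the screening threshold.
    unused, pairs, singles = set(range(p)), [], []
    for j in range(p):
        if j not in unused:
            continue
        unused.discard(j)
        partners = [k for k in unused if abs(R[j, k]) > rho0]
        if partners:
            k = max(partners, key=lambda k: abs(R[j, k]))
            unused.discard(k)
            pairs.append((j, k))
        else:
            singles.append(j)

    stat = 0.0
    for j, k in pairs:                      # 2x2 Hotelling T^2 for each pair
        m = xbar[[j, k]]
        Sjk = S[np.ix_([j, k], [j, k])]
        stat += n * m @ np.linalg.solve(Sjk, m)
    for j in singles:                       # squared t-statistic for singletons
        stat += n * xbar[j] ** 2 / S[j, j]
    return stat
```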



Related Research

We propose a likelihood ratio test framework for testing normal mean vectors in high-dimensional data under two common scenarios: the one-sample test and the two-sample test with equal covariance matrices. We derive the test statistics under the assumption that the covariance matrices follow a diagonal matrix structure. In comparison with the diagonal Hotelling's tests, our proposed test statistics display some interesting characteristics. In particular, they are a summation of the log-transformed squared $t$-statistics rather than a direct summation of those components. More importantly, to derive the asymptotic normality of our test statistics under the null and local alternative hypotheses, we do not require that the covariance matrices actually follow a diagonal matrix structure. As a consequence, our proposed test methods are very flexible and readily applicable in practice. Simulation studies and a real data analysis are also carried out to demonstrate the advantages of our likelihood ratio test methods.
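
As a rough illustration of the shape of these statistics, the following one-sample sketch computes the marginal $t$-statistics and sums a log transformation of their squares. The particular transformation $n\log(1 + t_j^2/(n-1))$ and the function name `log_t2_sum` are assumptions made for illustration; the exact likelihood ratio form and its asymptotic normal calibration are those derived in the paper.

```python
# Rough one-sample illustration only; the exact transformation and the
# centering/scaling for the normal approximation follow the paper.
import numpy as np

def log_t2_sum(X):
    """X: (n, p) data; H0: E[X] = 0."""
    n, p = X.shape
    t = np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1)  # marginal t-statistics
    return n * np.sum(np.log1p(t ** 2 / (n - 1)))            # sum of log-transformed squared t's
```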
Testing heteroscedasticity of the errors is a major challenge in high-dimensional regressions, where the number of covariates is large compared to the sample size. Traditional procedures such as the White and Breusch-Pagan tests typically suffer from low size and power. This paper proposes two new test procedures based on standard OLS residuals. Using the theory of random Haar orthogonal matrices, the asymptotic normality of both test statistics is obtained under the null when the degrees of freedom tend to infinity. This encompasses both the classical low-dimensional setting, where the number of variables is fixed while the sample size tends to infinity, and the proportional high-dimensional setting, where these dimensions grow to infinity proportionally. The procedures thus offer wide coverage of dimensions in applications. To the best of our knowledge, these are the first procedures in the literature for testing heteroscedasticity that are valid for medium- and high-dimensional regressions. The superiority of the proposed tests over existing methods is demonstrated by extensive simulations and by several real data analyses.
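
For context, the following sketch implements the classical Breusch-Pagan test in its studentized (Koenker) form, i.e., one of the traditional baselines referred to above rather than either of the proposed procedures: the squared OLS residuals are regressed on the covariates and the LM statistic $nR^2$ is referred to a chi-square distribution.

```python
# Classical Breusch-Pagan (Koenker studentized) baseline; not the proposed
# high-dimensional procedures.
import numpy as np
from scipy import stats

def breusch_pagan(y, X):
    """y: (n,) response; X: (n, k) covariates (intercept added internally)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e2 = (y - Z @ beta) ** 2                       # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ gamma
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2                                    # Koenker's studentized LM statistic
    return lm, stats.chi2.sf(lm, df=X.shape[1])    # chi^2_k under H0 (fixed k)
```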
In this paper, new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose, we study the weak convergence of linear spectral statistics of central and (conditionally) non-central Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large-dimensional (conditionally) non-central Fisher matrices is derived, which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study, in which we also compare the new tests with several alternatives, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.
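
For reference, the sketch below computes the classical (uncorrected) likelihood ratio statistic for independence of the two sub-vectors, i.e., the standard comparator mentioned above rather than the proposed MANOVA-type statistics; the fixed-dimension chi-square reference with $p_1 p_2$ degrees of freedom is the textbook approximation, not the high-dimensional correction studied in the literature.

```python
# Classical likelihood ratio statistic for H0: Cov(X, Y) = 0; the chi-square
# reference is the fixed-dimension approximation (Bartlett-type small-sample
# corrections exist in the literature and are omitted here).
import numpy as np
from scipy import stats

def lrt_independence(X, Y):
    """X: (n, p1), Y: (n, p2) paired observations."""
    n, p1 = X.shape
    p2 = Y.shape[1]
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    S11, S22 = S[:p1, :p1], S[p1:, p1:]
    ratio = np.linalg.det(S) / (np.linalg.det(S11) * np.linalg.det(S22))
    stat = -n * np.log(ratio)                      # -2 log likelihood ratio
    return stat, stats.chi2.sf(stat, df=p1 * p2)
```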
Weiming Li, Jianfeng Yao (2017)
By studying the family of $p$-dimensional scale mixtures, this paper exhibits for the first time a non-trivial example where the eigenvalue distribution of the corresponding sample covariance matrix does not converge to the celebrated Marčenko-Pastur law. A different, new limit is found and characterized. The reason for the failure of the Marčenko-Pastur limit in this situation is found to be the strong dependence between the $p$ coordinates of the mixture. Next, we address the problem of testing whether the mixture has a spherical covariance matrix. To analyze the traditional John's-type test, we establish a novel and general CLT for linear statistics of eigenvalues of the sample covariance matrix. It is shown that John's test and its recent high-dimensional extensions both fail for high-dimensional mixtures, precisely due to the different spectral limit above. As a remedy, a new test procedure for the sphericity hypothesis is constructed. This test is then applied to identify the covariance structure in model-based clustering. It is shown that the test has much higher power than the widely used ICL and BIC criteria in detecting non-spherical component covariance matrices of a high-dimensional mixture.
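
For reference, the sketch below computes the classical John's sphericity statistic $U = p^{-1}\,\mathrm{tr}[(S/(\mathrm{tr}(S)/p) - I_p)^2]$ discussed above; this is the statistic whose high-dimensional extensions are shown to fail for scale mixtures, not the corrected test proposed in the paper.

```python
# Classical John's sphericity statistic U; the corrected test for scale
# mixtures proposed in the paper is not reproduced here.
import numpy as np

def john_U(X):
    """X: (n, p) data; returns John's sphericity statistic U."""
    S = np.cov(X, rowvar=False)
    p = S.shape[0]
    A = S / (np.trace(S) / p) - np.eye(p)   # deviation of scaled S from identity
    return np.trace(A @ A) / p              # U = tr[(S/(tr S / p) - I)^2] / p
```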
Z. Bai, D. Jiang, J. Yao (2012)
For a multivariate linear model, Wilks' likelihood ratio test (LRT) constitutes one of the cornerstone tools. However, the computation of its quantiles under the null or the alternative requires complex analytic approximations, and, more importantly, these distributional approximations are feasible only for a moderate dimension of the dependent variable, say $p \le 20$. On the other hand, assuming that the data dimension $p$ as well as the number $q$ of regression variables are fixed while the sample size $n$ grows, several asymptotic approximations have been proposed in the literature for Wilks' $\Lambda$, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension $p$ and a large sample size $n$. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large-$p$ and large-$n$ context, but also for moderately large data dimensions such as $p=30$ or $p=50$. As a byproduct, we give a reason explaining why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure for the classical multiple-sample significance test in MANOVA which is valid for high-dimensional data.
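
For reference, the following sketch computes Wilks' $\Lambda = \det(E)/\det(E+H)$ for a one-way MANOVA together with Bartlett's chi-square approximation, i.e., the classical fixed-$p$ approximation that the paper argues breaks down for high data dimensions. The grouping interface and function name are illustrative.

```python
# Classical Wilks' Lambda and Bartlett's chi-square approximation for a
# one-way MANOVA; valid for fixed p and large N, not the high-dimensional
# correction proposed in the paper.
import numpy as np
from scipy import stats

def wilks_lambda(groups):
    """groups: list of (n_i, p) arrays, one per group."""
    X = np.vstack(groups)
    N, p = X.shape
    g = len(groups)
    grand = X.mean(axis=0)
    # Within-group (error) and between-group (hypothesis) SSCP matrices.
    E = sum((Xi - Xi.mean(axis=0)).T @ (Xi - Xi.mean(axis=0)) for Xi in groups)
    H = sum(len(Xi) * np.outer(Xi.mean(axis=0) - grand, Xi.mean(axis=0) - grand)
            for Xi in groups)
    lam = np.linalg.det(E) / np.linalg.det(E + H)
    chi2 = -(N - 1 - (p + g) / 2) * np.log(lam)    # Bartlett's approximation
    df = p * (g - 1)
    return lam, chi2, stats.chi2.sf(chi2, df)
```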