
Testing for Independence of Large Dimensional Vectors

Published by: Nestor Parolya (Jun.-Prof. Dr.)
Publication date: 2017
Language: English




In this paper new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose we study the weak convergence of linear spectral statistics of central and (conditionally) non-central Fisher matrices. In particular, a central limit theorem for linear spectral statistics of large dimensional (conditionally) non-central Fisher matrices is derived, which is then used to analyse the power of the tests under the alternative. The theoretical results are illustrated by means of a simulation study, where we also compare the new tests with several alternatives, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is most powerful under a variety of correlation scenarios.
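
As a point of reference for the hypothesis being tested, here is a minimal NumPy sketch of the classical likelihood-ratio statistic for a block-diagonal covariance matrix. The function name and the toy data are illustrative assumptions; this is not one of the MANOVA-type statistics proposed in the paper, whose high-dimensional corrections are precisely the point.

import numpy as np

def lrt_block_diagonal(Z, p):
    # Classical likelihood-ratio statistic for H0: Cov(x, y) = 0, where the
    # rows of Z are n observations of z = (x, y) and dim(x) = p.
    # Lambda = det(S) / (det(S_xx) * det(S_yy)) equals 1 exactly when the
    # sample cross-covariance block vanishes; small values speak against H0.
    S = np.cov(Z, rowvar=False)
    S_xx, S_yy = S[:p, :p], S[p:, p:]
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_xx = np.linalg.slogdet(S_xx)
    _, logdet_yy = np.linalg.slogdet(S_yy)
    return np.exp(logdet_S - logdet_xx - logdet_yy)

# Toy data under the null: the two sub-vectors are independent.
rng = np.random.default_rng(0)
n, p, q = 200, 30, 10
Z = rng.standard_normal((n, p + q))
print(lrt_block_diagonal(Z, p))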



قيم البحث

Read also

Consider a normal vector $\mathbf{z}=(\mathbf{x},\mathbf{y})$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$ respectively. With $n$ independent observations of $\mathbf{z}$ at hand, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$ from the perspective of Canonical Correlation Analysis, under the high-dimensional setting where both $p$ and $q$ are proportional to the sample size $n$. In this paper, we focus on the case that $\Sigma_{\mathbf{x}\mathbf{y}}$ is of finite rank $k$, i.e. there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_1\geq\cdots\geq r_k>0$. Under the additional assumptions $(p+q)/n\to y\in(0,1)$ and $p/q\not\to 1$, we study the sample counterparts of $r_i$, $i=1,\ldots,k$, i.e. the $k$ largest eigenvalues of the sample canonical correlation matrix $S_{\mathbf{x}\mathbf{x}}^{-1}S_{\mathbf{x}\mathbf{y}}S_{\mathbf{y}\mathbf{y}}^{-1}S_{\mathbf{y}\mathbf{x}}$, namely $\lambda_1\geq\cdots\geq\lambda_k$. We show that there exists a threshold $r_c\in(0,1)$ such that for each $i\in\{1,\ldots,k\}$, when $r_i\leq r_c$, $\lambda_i$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_r$. When $r_i>r_c$, $\lambda_i$ possesses an almost sure limit in $(d_r,1]$, from which $r_i$ can be recovered in turn, thus providing an estimate of the latter in the high-dimensional scenario.
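
For concreteness, the sample canonical correlation matrix mentioned above can be formed directly from data; the following NumPy sketch is illustrative only (the variable names are assumptions, and no correction for the high-dimensional bias studied in the paper is applied).

import numpy as np

def squared_sample_canonical_correlations(X, Y):
    # Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, i.e. the squared sample
    # canonical correlations, returned in decreasing order.
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S_xx, S_yy = Xc.T @ Xc / n, Yc.T @ Yc / n
    S_xy = Xc.T @ Yc / n
    M = np.linalg.solve(S_xx, S_xy) @ np.linalg.solve(S_yy, S_xy.T)
    return np.sort(np.linalg.eigvals(M).real)[::-1]

# Rank-one dependence between x and y (k = 1 in the notation above),
# with (p + q)/n = 0.5 and p/q away from 1.
rng = np.random.default_rng(1)
n, p, q = 1000, 200, 300
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
Y[:, 0] += X[:, 0]
print(squared_sample_canonical_correlations(X, Y)[:3])
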
Dependence measures based on reproducing kernel Hilbert spaces, also known as the Hilbert-Schmidt Independence Criterion and denoted HSIC, are widely used to statistically decide whether or not two random vectors are dependent. Recently, non-parametric HSIC-based statistical tests of independence have been performed. However, these tests lead to the question of the choice of the kernels associated with the HSIC. In particular, there is as yet no method to objectively select specific kernels with theoretical guarantees in terms of first and second kind errors. One of the main contributions of this work is to develop a new HSIC-based aggregated procedure which avoids such a kernel choice, and to provide theoretical guarantees for this procedure. To achieve this, we first introduce non-asymptotic single tests based on Gaussian kernels with a given bandwidth, which are of prescribed level $\alpha\in(0,1)$. From a theoretical point of view, we upper-bound their uniform separation rate of testing over Sobolev and Nikolskii balls. Then, we aggregate several single tests and obtain similar upper bounds for the uniform separation rate of the aggregated procedure over the same regularity spaces. Another main contribution is that we provide a lower bound for the non-asymptotic minimax separation rate of testing over Sobolev balls, and deduce that the aggregated procedure is adaptive in the minimax sense over such regularity spaces. Finally, from a practical point of view, we perform numerical studies in order to assess the efficiency of our aggregated procedure and compare it to existing independence tests in the literature.
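
For reference, the standard biased (V-statistic) HSIC estimator with Gaussian kernels fits in a few lines; the median-heuristic bandwidths used below are one common convention and are unrelated to the aggregated kernel choice developed in the paper.

import numpy as np
from scipy.spatial.distance import cdist

def gaussian_gram(X, bandwidth):
    # Gram matrix of the Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 h^2)).
    return np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * bandwidth ** 2))

def hsic_biased(X, Y, hx, hy):
    # Biased V-statistic estimator: trace(K H L H) / n^2, with the centering
    # matrix H = I - (1/n) 1 1^T.
    n = X.shape[0]
    K, L = gaussian_gram(X, hx), gaussian_gram(Y, hy)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / n ** 2

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
Y = X + 0.5 * rng.standard_normal((200, 3))            # dependent sample
hx = np.median(cdist(X, X)[np.triu_indices(200, 1)])   # median heuristic
hy = np.median(cdist(Y, Y)[np.triu_indices(200, 1)])
print(hsic_biased(X, Y, hx, hy))
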
Consider a random vector $\mathbf{y}=\mathbf{\Sigma}^{1/2}\mathbf{x}$, where the $p$ elements of the vector $\mathbf{x}$ are i.i.d. real-valued random variables with zero mean and finite fourth moment, and $\mathbf{\Sigma}^{1/2}$ is a deterministic $p\times p$ matrix such that the spectral norm of the population correlation matrix $\mathbf{R}$ of $\mathbf{y}$ is uniformly bounded. In this paper, we find that the log determinant of the sample correlation matrix $\hat{\mathbf{R}}$ based on a sample of size $n$ from the distribution of $\mathbf{y}$ satisfies a CLT (central limit theorem) for $p/n\to\gamma\in(0,1]$ and $p\leq n$. Explicit formulas for the asymptotic mean and variance are provided. In case the mean of $\mathbf{y}$ is unknown, we show that after recentering by the empirical mean the obtained CLT holds with a shift in the asymptotic mean. This result is of independent interest in both large dimensional random matrix theory and the high-dimensional statistical literature on large sample correlation matrices for non-normal data. Finally, the obtained findings are applied to testing uncorrelatedness of $p$ random variables. Surprisingly, in the null case $\mathbf{R}=\mathbf{I}$ the test statistic becomes completely pivotal, and extensive simulations show that the obtained CLT also holds if the moments of order four do not exist at all, which suggests a promising and robust test statistic for heavy-tailed high-dimensional data.
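
The raw statistic itself is straightforward to compute; a minimal sketch follows, with the paper-specific centering and scaling constants of the CLT deliberately omitted.

import numpy as np

def log_det_sample_correlation(Y):
    # log-determinant of the sample correlation matrix R_hat, computed from
    # the data matrix Y (n observations in rows, p variables in columns)
    # after recentering by the empirical mean.
    Yc = Y - Y.mean(axis=0)
    S = Yc.T @ Yc / Y.shape[0]
    d = np.sqrt(np.diag(S))
    R_hat = S / np.outer(d, d)
    return np.linalg.slogdet(R_hat)[1]

rng = np.random.default_rng(3)
n, p = 400, 200                       # p / n = 0.5
print(log_det_sample_correlation(rng.standard_normal((n, p))))
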
Shige Peng, Quan Zhou (2019)
The G-normal distribution was introduced by Peng [2007] as the limiting distribution in the central limit theorem for sublinear expectation spaces. Equivalently, it can be interpreted as the solution to a stochastic control problem where we have a sequence of random variables whose variances can be chosen based on all past information. In this note we study the tail behavior of the G-normal distribution through analyzing a nonlinear heat equation. Asymptotic results are provided so that the tail probabilities can be easily evaluated with high accuracy. This study also has a significant impact on the hypothesis testing theory for heteroscedastic data; we show that even if the data are generated under the null hypothesis, it is possible to cheat and attain statistical significance by sequentially manipulating the error variances of the observations.
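
The "cheating" phenomenon mentioned at the end admits a simple Monte Carlo illustration. The two-stage variance rule, the variance bounds and the one-sided z-test used below are illustrative assumptions, not the construction analysed in the note.

import numpy as np

rng = np.random.default_rng(4)
n, reps, z_crit = 100, 100_000, 1.645     # one-sided test at nominal level 5%
sigma_lo, sigma_hi = 0.1, 10.0            # admissible standard deviations (illustrative)

# Every observation has mean zero, so the null hypothesis is true; only the
# error variances of the second half are chosen after seeing the first half.
eps1 = rng.standard_normal((reps, n // 2))
eps2 = rng.standard_normal((reps, n - n // 2))
s_half = eps1.sum(axis=1)
# Shrink the variance when the partial sum already points the "right" way,
# inflate it otherwise to get an almost fresh second chance.
sigma = np.where(s_half > 0, sigma_lo, sigma_hi)
s_total = s_half + sigma * eps2.sum(axis=1)
total_var = n // 2 + (n - n // 2) * sigma ** 2
z = s_total / np.sqrt(total_var)
print((z > z_crit).mean())   # noticeably above 0.05 despite the true null
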
In this work we construct an optimal shrinkage estimator for the precision matrix in high dimensions. We consider the general asymptotics when the number of variables $p\rightarrow\infty$ and the sample size $n\rightarrow\infty$ so that $p/n\rightarrow c\in(0,+\infty)$. The precision matrix is estimated directly, without inverting the corresponding estimator for the covariance matrix. Recent results from random matrix theory allow us to find the asymptotic deterministic equivalents of the optimal shrinkage intensities and to estimate them consistently. The resulting distribution-free estimator has almost surely the minimum Frobenius loss. Additionally, we prove that the Frobenius norms of the inverse and of the pseudo-inverse sample covariance matrices tend almost surely to deterministic quantities and estimate them consistently. Finally, a simulation study is provided in which the suggested estimator is compared with the estimators for the precision matrix proposed in the literature. The optimal shrinkage estimator shows significant improvement and robustness even for non-normally distributed data.
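
A generic oracle sketch of the linear-shrinkage idea: combine the (pseudo-)inverse sample covariance matrix with the identity so as to minimise the Frobenius loss. The oracle intensities below use the true precision matrix and therefore only illustrate the target of the construction; the consistent, distribution-free estimators of these intensities derived in the paper are not reproduced here.

import numpy as np

def oracle_linear_shrinkage_precision(S_inv, Omega):
    # Frobenius-optimal combination alpha * S_inv + beta * I approximating the
    # true precision matrix Omega. "Oracle" because Omega enters explicitly;
    # estimating such intensities without knowing Omega is the hard part.
    p = S_inv.shape[0]
    G = np.array([[np.sum(S_inv * S_inv), np.trace(S_inv)],
                  [np.trace(S_inv), float(p)]])
    b = np.array([np.sum(S_inv * Omega), np.trace(Omega)])
    alpha, beta = np.linalg.solve(G, b)
    return alpha * S_inv + beta * np.eye(p)

rng = np.random.default_rng(5)
p, n = 100, 200                              # p / n = 0.5
X = rng.standard_normal((n, p))              # true covariance (and precision) = I
S_inv = np.linalg.pinv(X.T @ X / n)          # pseudo-inverse sample covariance
Pi_hat = oracle_linear_shrinkage_precision(S_inv, np.eye(p))
print(np.linalg.norm(Pi_hat - np.eye(p)),    # oracle shrinkage: smaller loss ...
      np.linalg.norm(S_inv - np.eye(p)))     # ... than the raw (pseudo-)inverse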