
Testing independence in high dimensions with sums of rank correlations

Published by: Dennis Leung
Publication date: 2015
Research field: Mathematical Statistics
Language: English





We treat the problem of testing independence between m continuous variables when m can be larger than the available sample size n. We consider three types of test statistics that are constructed as sums or sums of squares of pairwise rank correlations. In the asymptotic regime where both m and n tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations including Kendall's tau and a dominating term of Spearman's rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffding's $D$, or the $\tau^*$ of Bergsma and Dassios (2014). As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under Gaussian equicorrelation alternatives as well as in numerical experiments for specific cases of more general alternatives.
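
As a rough illustration of one statistic of the kind described above, the following sketch computes the sum of squared pairwise Kendall's tau values across m variables. It is a minimal sketch under stated assumptions, not the paper's procedure: the abstract's Gaussian limit and its scaling constants are not reproduced here, and the null distribution is instead approximated by Monte Carlo simulation, which is a swapped-in calibration that relies only on the statistic being rank-based (so its null distribution is the same for all continuous data without ties). The function names `kendall_tau`, `sum_squared_tau`, and `monte_carlo_pvalue` are hypothetical.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau for two equal-length samples, assuming no ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return 2.0 * s / (n * (n - 1))


def sum_squared_tau(columns):
    """Sum of squared Kendall's tau over all pairs of variables."""
    return sum(kendall_tau(a, b) ** 2 for a, b in combinations(columns, 2))


def monte_carlo_pvalue(columns, n_sim=200, seed=0):
    """Approximate p-value by simulating independent continuous data.

    Assumption: this replaces the Gaussian limit from the abstract with a
    simulated null distribution of the same rank-based statistic.
    """
    rng = random.Random(seed)
    m, n = len(columns), len(columns[0])
    observed = sum_squared_tau(columns)
    exceed = 0
    for _ in range(n_sim):
        sim = [[rng.random() for _ in range(n)] for _ in range(m)]
        exceed += sum_squared_tau(sim) >= observed
    return observed, (1 + exceed) / (1 + n_sim)


if __name__ == "__main__":
    rng = random.Random(1)
    n, m = 20, 6
    # Hypothetical alternative: every variable shares one common factor.
    shared = [rng.gauss(0, 1) for _ in range(n)]
    data = [[s + 0.5 * rng.gauss(0, 1) for s in shared] for _ in range(m)]
    stat, pval = monte_carlo_pvalue(data)
    print(f"statistic = {stat:.3f}, Monte Carlo p-value = {pval:.4f}")
```

Nothing in the sketch requires m to be smaller than n; the cost grows as m² n² because each of the m(m-1)/2 pairs needs a pass over all n(n-1)/2 pairs of observations.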




Read also

Dennis Leung, Qi-Man Shao (2017)
Let $\mathbf{R}$ be the Pearson correlation matrix of $m$ normal random variables. Rao's score test for the independence hypothesis $H_0 : \mathbf{R} = \mathbf{I}_m$, where $\mathbf{I}_m$ is the identity matrix of dimension $m$, was first considered by Schott (2005) in the high-dimensional setting. In this paper, we study the asymptotic minimax power function of this test, under an asymptotic regime in which both $m$ and the sample size $n$ tend to infinity with the ratio $m/n$ upper bounded by a constant. In particular, our result implies that Rao's score test is rate-optimal for detecting the dependency signal $\|\mathbf{R} - \mathbf{I}_m\|_F$ of order $\sqrt{m/n}$, where $\|\cdot\|_F$ is the matrix Frobenius norm.
We consider the problem of conditional independence testing of $X$ and $Y$ given $Z$ where $X,Y$ and $Z$ are three real random variables and $Z$ is continuous. We focus on two main cases - when $X$ and $Y$ are both discrete, and when $X$ and $Y$ are both continuous. In view of recent results on conditional independence testing (Shah and Peters, 2018), one cannot hope to design non-trivial tests, which control the type I error for all absolutely continuous conditionally independent distributions, while still ensuring power against interesting alternatives. Consequently, we identify various, natural smoothness assumptions on the conditional distributions of $X,Y|Z=z$ as $z$ varies in the support of $Z$, and study the hardness of conditional independence testing under these smoothness assumptions. We derive matching lower and upper bounds on the critical radius of separation between the null and alternative hypotheses in the total variation metric. The tests we consider are easily implementable and rely on binning the support of the continuous variable $Z$. To complement these results, we provide a new proof of the hardness result of Shah and Peters.
Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical Hájek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-$n$ neighborhoods within the class of quadratic mean differentiable alternatives.
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i \in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to\infty$ as $n\to\infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.
This article is concerned with the spectral behavior of $p$-dimensional linear processes in the moderately high-dimensional case when both dimensionality $p$ and sample size $n$ tend to infinity so that $p/n\to 0$. It is shown that, under an appropriate set of assumptions, the empirical spectral distributions of the renormalized and symmetrized sample autocovariance matrices converge almost surely to a nonrandom limit distribution supported on the real line. The key assumption is that the linear process is driven by a sequence of $p$-dimensional real or complex random vectors with i.i.d. entries possessing zero mean, unit variance and finite fourth moments, and that the $p\times p$ linear process coefficient matrices are Hermitian and simultaneously diagonalizable. Several relaxations of these assumptions are discussed. The results put forth in this paper can help facilitate inference on model parameters, model diagnostics and prediction of future values of the linear process.