No Arabic abstract
Rank correlations have found many innovative applications in the last decade. In particular, suitable rank correlations have been used for consistent tests of independence between pairs of random variables. Using ranks is especially appealing for continuous data as tests become distribution-free. However, the traditional concept of ranks relies on ordering data and is, thus, tied to univariate observations. As a result, it has long remained unclear how one may construct distribution-free yet consistent tests of independence between random vectors. This is the problem addressed in this paper, in which we lay out a general framework for designing dependence measures that give tests of multivariate independence that are not only consistent and distribution-free but which we also prove to be statistically efficient. Our framework leverages the recently introduced concept of center-outward ranks and signs, a multivariate generalization of traditional ranks, and adopts a common standard form for dependence measures that encompasses many popular examples. In a unified study, we derive a general asymptotic representation of center-outward rank-based test statistics under independence, extending to the multivariate setting the classical H{a}jek asymptotic representation results. This representation permits direct calculation of limiting null distributions and facilitates a local power analysis that provides strong support for the center-outward approach by establishing, for the first time, the nontrivial power of center-outward rank-based tests over root-$n$ neighborhoods within the class of quadratic mean differentiable alternatives.
This paper investigates the problem of testing independence of two random vectors of general dimensions. For this, we give for the first time a distribution-free consistent test. Our approach combines distance covariance with the center-outward ranks and signs developed in Hallin (2017). In technical terms, the proposed test is consistent and distribution-free in the family of multivariate distributions with nonvanishing (Lebesgue) probability densities. Exploiting the (degenerate) U-statistic structure of the distance covariance and the combinatorial nature of Hallins center-outward ranks and signs, we are able to derive the limiting null distribution of our test statistic. The resulting asymptotic approximation is accurate already for moderate sample sizes and makes the test implementable without requiring permutation. The limiting distribution is derived via a more general result that gives a new type of combinatorial non-central limit theorem for double- and multiple-indexed permutation statistics.
We introduce new estimates and tests of independence in copula models with unknown margins using $phi$-divergences and the duality technique. The asymptotic laws of the estimates and the test statistics are established both when the parameter is an interior or a boundary value of the parameter space. Simulation results show that the choice of $chi^2$-divergence has good properties in terms of efficiency-robustness.
A popular approach for testing if two univariate random variables are statistically independent consists of partitioning the sample space into bins, and evaluating a test statistic on the binned data. The partition size matters, and the optimal partition size is data dependent. While for detecting simple relationships coarse partitions may be best, for detecting complex relationships a great gain in power can be achieved by considering finer partitions. We suggest novel consistent distribution-free tests that are based on summation or maximization aggregation of scores over all partitions of a fixed size. We show that our test statistics based on summation can serve as good estimators of the mutual information. Moreover, we suggest regularized tests that aggregate over all partition sizes, and prove those are consistent too. We provide polynomial-time algorithms, which are critical for computing the suggested test statistics efficiently. We show that the power of the regularized tests is excellent compared to existing tests, and almost as powerful as the tests based on the optimal (yet unknown in practice) partition size, in simulations as well as on a real data example.
For some variants of regression models, including partial, measurement error or error-in-variables, latent effects, semi-parametric and otherwise corrupted linear models, the classical parametric tests generally do not perform well. Various modifications and generalizations considered extensively in the literature rests on stringent regularity assumptions which are not likely to be tenable in many applications. However, in such non-standard cases, rank based tests can be adapted better, and further, incorporation of rank analysis of covariance tools enhance their power-efficiency. Numerical studies and a real data illustration show the superiority of rank based inference in such corrupted linear models.
We treat the problem of testing independence between m continuous variables when m can be larger than the available sample size n. We consider three types of test statistics that are constructed as sums or sums of squares of pairwise rank correlations. In the asymptotic regime where both m and n tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid with no specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations including Kendalls tau and a dominating term of Spearmans rank correlation coefficient (rho), but also degenerate U-statistics such as Hoeffdings $D$, or the $tau^*$ of Bergsma and Dassios (2014). As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is explored in rate-optimality theory under Gaussian equicorrelation alternatives as well as in numerical experiments for specific cases of more general alternatives.