Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

The Chi-Square Test of Distance Correlation

95 0 0.0 ( 0 )

Download Cite

Added by Cencheng Shen

Publication date 2019

fields Mathematical Statistics Informatics Engineering

and research's language is English

Authors Cencheng Shen - Sambit Panda - Joshua T. Vogelstein

Machine Learning Machine Learning Statistics Theory

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if independence, making it an ideal choice to discover any type of dependency structure given sufficient sample size. One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amount of data. To overcome the difficulty, in this paper we propose a chi-square test for distance correlation. Method-wise, the chi-square test is non-parametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel. The test exhibits a similar testing power as the standard permutation test, and can be utilized for K-sample and partial testing. Theory-wise, we show that the underlying chi-square distribution well approximates and dominates the limiting null distribution in upper tail, prove the chi-square test can be valid and universally consistent for testing independence, and establish a testing power inequality with respect to the permutation test.

rate research

Uniform regret bounds over $R^d$ for the sequential linear regression problem with the square loss

287 - Pierre Gaillard 2018

We consider the setting of online linear regression for arbitrary deterministic sequences, with the square loss. We are interested in the aim set by Bartlett et al. (2015): obtain regret bounds that hold uniformly over all competitor vectors. When the feature sequence is known at the beginning of the game, they provided closed-form regret bounds of $2d B^2 ln T + mathcal{O}_T(1)$, where $T$ is the number of rounds and $B$ is a bound on the observations. Instead, we derive bounds with an optimal constant of $1$ in front of the $d B^2 ln T$ term. In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 ln T$ for any individual sequence of features and bounded observations. All our algorithms are variants of the online non-linear ridge regression forecaster, either with a data-dependent regularization or with almost no regularization.

Machine Learning Machine Learning Statistics Theory

FED-$chi^2$: Privacy Preserving Federated Correlation Test

101 - Lun Wang , Qi Pang , Shuai Wang 2021

In this paper, we propose the first secure federated $chi^2$-test protocol Fed-$chi^2$. To minimize both the privacy leakage and the communication cost, we recast $chi^2$-test to the second moment estimation problem and thus can take advantage of stable projection to encode the local information in a short vector. As such encodings can be aggregated with only summation, secure aggregation can be naturally applied to hide the individual updates. We formally prove the security guarantee of Fed-$chi^2$ that the joint distribution is hidden in a subspace with exponential possible distributions. Our evaluation results show that Fed-$chi^2$ achieves negligible accuracy drops with small client-side computation overhead. In several real-world case studies, the performance of Fed-$chi^2$ is comparable to the centralized $chi^2$-test.

Cryptography and Security Distributed Parallel and Cluster Computing

From Distance Correlation to Multiscale Graph Correlation

66 - Cencheng Shen , Carey E. Priebe , Joshua T. Vogelstein 2017

Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation --- a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments --- to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods.

Machine Learning

A new test procedure of independence in copula models via chi-square-divergence

545 - Salim Bouzebda 2011

We introduce a new test procedure of independence in the framework of parametric copulas with unknown marginals. The method is based essentially on the dual representation of $chi^2$-divergence on signed finite measures. The asymptotic properties of the proposed estimate and the test statistic are studied under the null and alternative hypotheses, with simple and standard limit distributions both when the parameter is an interior point or not.

Statistics Theory Statistics Theory

Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting

432 - Miles E. Lopes , Suofei Wu , Thomas C. M. Lee 2019

When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression --- which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.

Machine Learning Machine Learning Statistics Theory

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

The Chi-Square Test of Distance Correlation

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions