The Chi-Square Test of Distance Correlation

Published by Cencheng Shen
Publication date: 2019
Language of the paper: English





Distance correlation has gained much recent attention in the data science community: the sample statistic is straightforward to compute and asymptotically equals zero if and only if the underlying random variables are independent, making it an ideal choice for discovering any type of dependency structure given sufficient sample size. One major bottleneck is the testing process: because the null distribution of distance correlation depends on the underlying random variables and the metric choice, it typically requires a permutation test to estimate the null and compute the p-value, which is very costly for large amounts of data. To overcome this difficulty, in this paper we propose a chi-square test for distance correlation. Method-wise, the chi-square test is non-parametric, extremely fast, and applicable to bias-corrected distance correlation using any strong negative type metric or characteristic kernel. The test exhibits testing power similar to that of the standard permutation test, and can be utilized for K-sample and partial testing. Theory-wise, we show that the underlying chi-square distribution well approximates and dominates the limiting null distribution in the upper tail, prove that the chi-square test can be valid and universally consistent for testing independence, and establish a testing power inequality with respect to the permutation test.
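As a rough illustration of the idea in the abstract, the following Python sketch computes a bias-corrected (U-centered) sample distance correlation and converts it to a p-value without any permutations. Several things here are assumptions rather than statements from the abstract: the function names (`u_centered`, `unbiased_dcor`, `chi2_test`) are hypothetical, the data are one-dimensional with absolute difference as the metric, and the p-value formula (compare n times the statistic plus 1 against a chi-square distribution with one degree of freedom) reflects our reading of the full paper, since the abstract does not state it.

```python
import math
import random


def u_centered(xs):
    """U-centered pairwise distance matrix for 1-D data (absolute difference)."""
    n = len(xs)
    d = [[abs(a - b) for b in xs] for a in xs]
    row = [sum(r) for r in d]  # row sums (equal to column sums by symmetry)
    total = sum(row)
    out = [[0.0] * n for _ in range(n)]  # diagonal stays zero
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i][j] = (d[i][j] - row[i] / (n - 2) - row[j] / (n - 2)
                             + total / ((n - 1) * (n - 2)))
    return out


def inner(a, b):
    """Unbiased inner product of two U-centered matrices."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n))
    return s / (n * (n - 3))


def unbiased_dcor(x, y):
    """Bias-corrected sample distance correlation (can be slightly negative)."""
    if len(x) != len(y) or len(x) < 4:
        raise ValueError("need two samples of equal length n >= 4")
    a, b = u_centered(x), u_centered(y)
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


def chi2_test(x, y):
    """Return (statistic, p-value).

    ASSUMPTION: the p-value is the upper tail of a chi-square(1)
    distribution evaluated at n * statistic + 1.
    """
    n = len(x)
    stat = unbiased_dcor(x, y)
    shifted = n * stat + 1.0
    if shifted <= 0:
        return stat, 1.0
    # Survival function of chi-square with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(shifted / 2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1, 1) for _ in range(100)]
    y = [v * v for v in x]  # nonlinear dependence
    print(chi2_test(x, y))
```

The cost is a single evaluation of the statistic, whereas a permutation test would recompute it once per permutation. For multivariate data or another metric, only the distance computation inside `u_centered` would need to change.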




Read also

287 - Pierre Gaillard 2018
We consider the setting of online linear regression for arbitrary deterministic sequences, with the square loss. We are interested in the aim set by Bartlett et al. (2015): obtain regret bounds that hold uniformly over all competitor vectors. When the feature sequence is known at the beginning of the game, they provided closed-form regret bounds of $2d B^2 \ln T + \mathcal{O}_T(1)$, where $T$ is the number of rounds and $B$ is a bound on the observations. Instead, we derive bounds with an optimal constant of $1$ in front of the $d B^2 \ln T$ term. In the case of sequentially revealed features, we also derive an asymptotic regret bound of $d B^2 \ln T$ for any individual sequence of features and bounded observations. All our algorithms are variants of the online non-linear ridge regression forecaster, either with a data-dependent regularization or with almost no regularization.
101 - Lun Wang, Qi Pang, Shuai Wang 2021
In this paper, we propose the first secure federated $\chi^2$-test protocol Fed-$\chi^2$. To minimize both the privacy leakage and the communication cost, we recast $\chi^2$-test to the second moment estimation problem and thus can take advantage of stable projection to encode the local information in a short vector. As such encodings can be aggregated with only summation, secure aggregation can be naturally applied to hide the individual updates. We formally prove the security guarantee of Fed-$\chi^2$ that the joint distribution is hidden in a subspace with exponential possible distributions. Our evaluation results show that Fed-$\chi^2$ achieves negligible accuracy drops with small client-side computation overhead. In several real-world case studies, the performance of Fed-$\chi^2$ is comparable to the centralized $\chi^2$-test.
Understanding and developing a correlation measure that can detect general dependencies is not only imperative to statistics and machine learning, but also crucial to general scientific discovery in the big data age. In this paper, we establish a new framework that generalizes distance correlation --- a correlation measure that was recently proposed and shown to be universally consistent for dependence testing against all joint distributions of finite moments --- to the Multiscale Graph Correlation (MGC). By utilizing the characteristic functions and incorporating the nearest neighbor machinery, we formalize the population version of local distance correlations, define the optimal scale in a given dependency, and name the optimal local correlation as MGC. The new theoretical framework motivates a theoretically sound Sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence and almost unbiasedness of the sample version. The advantages of MGC are illustrated via a comprehensive set of simulations with linear, nonlinear, univariate, multivariate, and noisy dependencies, where it loses almost no power in monotone dependencies while achieving better performance in general dependencies, compared to distance correlation and other popular methods.
163 - Salim Bouzebda 2011
We introduce a new test procedure of independence in the framework of parametric copulas with unknown marginals. The method is based essentially on the dual representation of $\chi^2$-divergence on signed finite measures. The asymptotic properties of the proposed estimate and the test statistic are studied under the null and alternative hypotheses, with simple and standard limit distributions both when the parameter is an interior point or not.
When randomized ensemble methods such as bagging and random forests are implemented, a basic question arises: Is the ensemble large enough? In particular, the practitioner desires a rigorous guarantee that a given ensemble will perform nearly as well as an ideal infinite ensemble (trained on the same data). The purpose of the current paper is to develop a bootstrap method for solving this problem in the context of regression --- which complements our companion paper in the context of classification (Lopes 2019). In contrast to the classification setting, the current paper shows that theoretical guarantees for the proposed bootstrap can be established under much weaker assumptions. In addition, we illustrate the flexibility of the method by showing how it can be adapted to measure algorithmic convergence for variable selection. Lastly, we provide numerical results demonstrating that the method works well in a range of situations.
