A kernel test for quasi-independence

Added by Tamara Fernandez

Publication date 2020

Fields Mathematical Statistics

Research language English

Authors Tamara Fernandez - Wenkai Xu - Marc Ditzhaus

Methodology Machine Learning

We consider settings in which the data of interest correspond to pairs of ordered times, e.g., the birth times of the first and second child, the times at which a new user creates an account and makes the first purchase on a website, and the entry and survival times of patients in a clinical trial. In these settings, the two times are not independent (the second occurs after the first), yet it is still of interest to determine whether there exists significant dependence beyond their ordering in time. We refer to this notion as quasi-(in)dependence. For instance, in a clinical trial, to avoid biased selection, we might wish to verify that recruitment times are quasi-independent of survival times, where dependencies might arise due to seasonal effects. In this paper, we propose a nonparametric statistical test of quasi-independence. Our test considers a potentially infinite space of alternatives, making it suitable for complex data where the nature of the possible quasi-dependence is not known in advance. Standard parametric approaches, such as the classical conditional Kendall's tau and log-rank tests, are recovered as special cases. The tests apply in the right-censored setting: an essential feature in clinical trials, where patients can withdraw from the study. We provide an asymptotic analysis of our test statistic, and demonstrate in experiments that our test obtains better power than existing approaches, while being more computationally efficient.
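To make the notion of dependence beyond the time ordering more concrete, here is a minimal sketch of the classical conditional Kendall's tau that the abstract names as a special case. It is not the kernel test proposed in the paper. It assumes no censoring and uses the standard comparability rule for truncated pairs; the function name `conditional_kendall_tau` is illustrative.

```python
from itertools import combinations


def conditional_kendall_tau(x, y):
    """Conditional Kendall's tau for ordered pairs with x[i] <= y[i].

    Assumption: no censoring. Only "comparable" pairs (i, j) are used,
    i.e. pairs with max(x_i, x_j) <= min(y_i, y_j), so that the ordering
    constraint alone does not force the pair to be concordant or discordant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    score = 0        # concordant pairs minus discordant pairs
    comparable = 0   # number of comparable pairs
    for i, j in combinations(range(len(x)), 2):
        if max(x[i], x[j]) <= min(y[i], y[j]):
            comparable += 1
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)  # sign of the product
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return score / comparable
```

Values near zero are consistent with quasi-independence, while values near 1 or -1 suggest monotone dependence that the ordering alone does not explain. A statistic of this kind only targets monotone alternatives, which is the limitation the kernel approach is meant to address.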

A kernel log-rank test of independence for right-censored data

65 - Tamara Fernandez , Arthur Gretton , David Rindt 2019

We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate. Our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS) of functions; and second, as the norm of the difference of embeddings of certain finite measures into the RKHS, similar to the Hilbert-Schmidt Independence Criterion (HSIC) test-statistic. We study the asymptotic properties of the test, finding sufficient conditions to ensure our test correctly rejects the null hypothesis under any alternative. The test statistic can be computed straightforwardly, and the rejection threshold is obtained via an asymptotically consistent Wild Bootstrap procedure. Extensive simulations demonstrate that our testing procedure generally performs better than competing approaches in detecting complex non-linear dependence.
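For readers unfamiliar with the HSIC statistic that the abstract compares against, the following is a minimal sketch of the standard biased HSIC estimator for uncensored, one-dimensional data. It is not the kernel log-rank statistic of the paper. The Gaussian kernel, the fixed `bandwidth`, and all function names are assumptions made for illustration.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix (H K H with H = I - 11^T / n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(x, y, bandwidth=1.0):
    """Biased HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    k_centered = center(gaussian_gram(x, bandwidth))
    l_gram = gaussian_gram(y, bandwidth)
    return sum(k_centered[i][j] * l_gram[i][j]
               for i in range(n) for j in range(n)) / n ** 2
```

The estimate is zero when one sample is constant and grows as the two samples become more dependent. In practice a rejection threshold still has to be calibrated, for example by permutation or, as in the abstract, by a wild bootstrap.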

Methodology Machine Learning

A Reproducing Kernel Hilbert Space log-rank test for the two-sample problem

66 - Tamara Fernandez , Nicolas Rivera 2019

Weighted log-rank tests are arguably the most widely used tests by practitioners for the two-sample problem in the context of right-censored data. Many approaches have been considered to make weighted log-rank tests more robust against a broader family of alternatives, among them, considering linear combinations of weighted log-rank tests, and taking the maximum among a finite collection of them. In this paper, we propose as test statistic the supremum of a collection of (potentially infinite) weight-indexed log-rank tests where the index space is the unit ball in a reproducing kernel Hilbert space (RKHS). By using some desirable properties of RKHSs we provide an exact and simple evaluation of the test statistic and establish connections with previous tests in the literature. Additionally, we show that for a special family of RKHSs, the proposed test is omnibus. We finalise by performing an empirical evaluation of the proposed methodology and show an application to a real data scenario. Our theoretical results are proved using techniques for double integrals with respect to martingales that may be of independent interest.
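As background for the supremum construction, the sketch below computes a single weighted log-rank statistic for one fixed weight function, using the usual hypergeometric variance. It is a textbook version and not the RKHS test of the paper. The function name and the convention that `weight` is a function of time are assumptions for illustration.

```python
import math


def weighted_logrank(times, events, groups, weight=lambda t: 1.0):
    """Z-score of a two-sample weighted log-rank test.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if right-censored
    groups : 0 or 1, the sample each observation belongs to
    weight : hypothetical weight function of time (constant 1 gives
             the ordinary log-rank test)
    """
    numerator = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [g for s, g in zip(times, groups) if s >= t]
        y = len(at_risk)   # total number at risk just before t
        y1 = sum(at_risk)  # number at risk in group 1
        d = sum(1 for s, e in zip(times, events) if s == t and e)
        d1 = sum(1 for s, e, g in zip(times, events, groups)
                 if s == t and e and g == 1)
        w = weight(t)
        # Observed minus expected events in group 1 at time t.
        numerator += w * (d1 - d * y1 / y)
        if y > 1:
            variance += (w ** 2 * d * (y1 / y) * (1 - y1 / y)
                         * (y - d) / (y - 1))
    if variance == 0:
        raise ValueError("variance is zero; statistic undefined")
    return numerator / math.sqrt(variance)
```

Different weight functions emphasise early or late differences between the two samples, which is why taking a supremum over a rich family of weights can make the test sensitive to a broader range of alternatives.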

Methodology Machine Learning

A Markov Basis for Conditional Test of Common Diagonal Effect in Quasi-Independence Model for Square Contingency Tables

142 - Hisayuki Hara , Akimichi Takemura , Ruriko Yoshida 2008

In two-way contingency tables we sometimes find that frequencies along the diagonal cells are relatively larger (or smaller) compared to off-diagonal cells, particularly in square tables with common categories for the rows and the columns. In this case the quasi-independence model with an additional parameter for each of the diagonal cells is usually fitted to the data. A simpler model than the quasi-independence model is to assume a common additional parameter for all the diagonal cells. We consider testing the goodness of fit of the common diagonal effect by a Markov chain Monte Carlo (MCMC) method. We derive an explicit form of a Markov basis for performing the conditional test of the common diagonal effect. Once a Markov basis is given, the MCMC procedure can be easily implemented by techniques of algebraic statistics. We illustrate the procedure with some real data sets.

Methodology

Kernel-based Tests for Joint Independence

89 - Niklas Pfister , Peter Buhlmann , Bernhard Scholkopf 2016

We investigate the problem of testing whether $d$ random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two-variable Hilbert-Schmidt independence criterion (HSIC) but allows for an arbitrary number of variables. We embed the $d$-dimensional joint distribution and the product of the marginals into a reproducing kernel Hilbert space and define the $d$-variable Hilbert-Schmidt independence criterion (dHSIC) as the squared distance between the embeddings. In the population case, the value of dHSIC is zero if and only if the $d$ variables are jointly independent, as long as the kernel is characteristic. Based on an empirical estimate of dHSIC, we define three different non-parametric hypothesis tests: a permutation test, a bootstrap test and a test based on a Gamma approximation. We prove that the permutation test achieves the significance level and that the bootstrap test achieves pointwise asymptotic significance level as well as pointwise asymptotic consistency (i.e., it is able to detect any type of fixed dependence in the large sample limit). The Gamma approximation does not come with these guarantees; however, it is computationally very fast and for small $d$, it performs well in practice. Finally, we apply the test to a problem in causal discovery.
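The squared distance between the two embeddings can be written out directly for empirical distributions. The sketch below expands it into three Gram-matrix terms (joint, product of marginals, cross term). It assumes scalar variables, a Gaussian kernel with one shared `bandwidth`, and illustrative function names; it does not include any of the three calibration procedures described in the abstract.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in values]
            for a in values]


def dhsic(variables, bandwidth=1.0):
    """V-statistic estimate of dHSIC for d samples of equal length n.

    variables : list of d lists, each holding n scalar observations
    Returns ||embedding(joint) - embedding(product of marginals)||^2
    for the empirical distributions.
    """
    grams = [gaussian_gram(v, bandwidth) for v in variables]
    n = len(variables[0])
    # <joint, joint>: average over (i, j) of the product of kernels.
    joint = sum(math.prod(g[i][j] for g in grams)
                for i in range(n) for j in range(n)) / n ** 2
    # <product, product>: product over variables of the mean Gram entry.
    marginals = math.prod(sum(map(sum, g)) / n ** 2 for g in grams)
    # <joint, product>: average over i of the product of row means.
    cross = sum(math.prod(sum(g[i]) / n for g in grams)
                for i in range(n)) / n
    return joint + marginals - 2 * cross
```

With two variables this reduces to the biased HSIC estimate, and adding a constant variable leaves the value unchanged, which matches the intuition that a constant is independent of everything.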

Statistics Theory Machine Learning

A Test for Independence Via Bayesian Nonparametric Estimation of Mutual Information

94 - Luai Al-Labadi , Forough Fazeli Asl , 2020

Mutual information is a well-known tool to measure the mutual dependence between variables. In this paper, a Bayesian nonparametric estimation of mutual information is established by means of the Dirichlet process and the $k$-nearest neighbor distance. As a direct outcome of the estimation, an easy-to-implement test of independence is introduced through the relative belief ratio. Several theoretical properties of the approach are presented. The procedure is investigated through various examples, where the results are compared with those of its frequentist counterpart and demonstrate good performance.

Methodology Computation
