We present a general framework for hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources, such as groups of observations in time series, collections of words in text, or batches of images of a given phenomenon. This observation pattern, however, departs from the common assumptions required for hypothesis testing: sets differ in size, may have differing levels of noise, and may also incorporate nuisance variability that is irrelevant to the analysis of the phenomenon of interest. All of these features bias test decisions if not accounted for. In this paper, we propose to interpret sets as independent samples from a collection of latent probability distributions, and introduce kernel two-sample and independence tests in this latent space of distributions. We prove the consistency of the tests and observe that they outperform alternative approaches in a wide range of synthetic experiments. Finally, we showcase their use in practice with experiments on healthcare and climate data, where heuristics were previously needed for feature extraction and testing.
Motivated by gene set enrichment analysis, we investigate the problem of combined hypothesis testing on a graph. We introduce a general framework to effectively use the structural information of the underlying graph when testing multivariate means. A
We present a study of generalization for data-dependent hypothesis sets. We give a general learning guarantee for data-dependent hypothesis sets based on a notion of transductive Rademacher complexity. Our main result is a generalization bound for da
Non-parametric goodness-of-fit testing procedures based on kernel Stein discrepancies (KSD) are promising approaches to validate general unnormalised distributions in various scenarios. Existing works have focused on studying optimal kernel choices t
In this paper, we consider data consisting of multiple networks, each comprised of a different edge set on a common set of nodes. Many models have been proposed for the analysis of such multi-view network data under the assumption that the data views
We introduce a general non-parametric independence test between right-censored survival times and covariates, which may be multivariate. Our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite collection