The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing


Abstract in English

Distance-based tests, also called energy statistics, are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from kernel mean embeddings, are leading methods for two-sample and independence tests from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods for the population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-value, and better preserves the data structure upon transformation. Our results further advance the understanding in distance and kernel-based tests, streamline the code base for implementing these tests, and enable a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.

Download