Distances between probability distributions of different dimensions


Abstract in English

Comparing probability distributions is an indispensable and ubiquitous task in machine learning and statistics. The most common way to compare a pair of Borel probability measures is to compute a metric between them, and by far the most widely used notions of metric are the Wasserstein metric and the total variation metric. The next most common way is to compute a divergence between them, and in this case almost every known divergences such as those of Kullback--Leibler, Jensen--Shannon, Renyi, and many more, are special cases of the $f$-divergence. Nevertheless these metrics and divergences may only be computed, in fact, are only defined, when the pair of probability measures are on spaces of the same dimension. How would one quantify, say, a KL-divergence between the uniform distribution on the interval $[-1,1]$ and a Gaussian distribution on $mathbb{R}^3$? We will show that, in a completely natural manner, various common notions of metrics and divergences give rise to a distance between Borel probability measures defined on spaces of different dimensions, e.g., one on $mathbb{R}^m$ and another on $mathbb{R}^n$ where $m, n$ are distinct, so as to give a meaningful answer to the previous question.

Download