
Distances between probability distributions of different dimensions

Published by Lek-Heng Lim
Publication date 2020
Research field Informatics Engineering
Research language English





Comparing probability distributions is an indispensable and ubiquitous task in machine learning and statistics. The most common way to compare a pair of Borel probability measures is to compute a metric between them, and by far the most widely used notions of metric are the Wasserstein metric and the total variation metric. The next most common way is to compute a divergence between them, and in this case almost every known divergence, such as those of Kullback--Leibler, Jensen--Shannon, Rényi, and many more, is a special case of the $f$-divergence. Nevertheless, these metrics and divergences may only be computed, and in fact are only defined, when the two probability measures are on spaces of the same dimension. How would one quantify, say, a KL-divergence between the uniform distribution on the interval $[-1,1]$ and a Gaussian distribution on $\mathbb{R}^3$? We will show that, in a completely natural manner, various common notions of metrics and divergences give rise to a distance between Borel probability measures defined on spaces of different dimensions, e.g., one on $\mathbb{R}^m$ and another on $\mathbb{R}^n$ where $m, n$ are distinct, so as to give a meaningful answer to the previous question.
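The abstract does not spell out the construction, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It assumes one plausible reading of "distance across dimensions": project a centred Gaussian on $\mathbb{R}^3$ onto a line, compare the projection with a centred Gaussian on $\mathbb{R}$ using the standard closed-form 2-Wasserstein distance for one-dimensional Gaussians, and take the smallest value over sampled directions. The function names, the random search over directions, and the choice of Gaussians (rather than the uniform distribution mentioned above) are all hypothetical choices made for this example.

```python
import numpy as np

def w2_gauss_1d(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) on the real line.
    Standard closed form: sqrt((m1 - m2)^2 + (s1 - s2)^2)."""
    return np.hypot(m1 - m2, s1 - s2)

def min_projected_w2(s1, cov, n_dirs=20_000, seed=0):
    """Approximate min over unit vectors v of W2(N(0, s1^2), N(0, v^T cov v)).

    Assumption for illustration: the higher-dimensional measure is N(0, cov),
    and it is compared with the lower-dimensional one through its
    one-dimensional projections x -> v^T x. Directions are sampled at random,
    so the result is an upper bound on the true minimum.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=(n_dirs, cov.shape[0]))
    v /= np.linalg.norm(v, axis=1, keepdims=True)        # unit directions
    proj_std = np.sqrt(np.einsum("ij,jk,ik->i", v, cov, v))  # std of v^T X
    return float(np.min(w2_gauss_1d(0.0, s1, 0.0, proj_std)))

cov = np.diag([4.0, 9.0, 16.0])      # projected std ranges over [2, 4]
print(min_projected_w2(1.0, cov))    # std 1 lies outside [2, 4]: gap of about 1
print(min_projected_w2(3.0, cov))    # std 3 lies inside [2, 4]: gap close to 0
```

Under these assumptions the quantity is zero exactly when some projection of the three-dimensional Gaussian reproduces the one-dimensional one, which matches the intuition that a lower-dimensional distribution can be a "shadow" of a higher-dimensional one.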




Read also

122 - Tomohiro Nishiyama 2019
Log-concave distributions include some important distributions such as the normal distribution, the exponential distribution and so on. In this note, we show inequalities between two Lp-norms for log-concave distributions on the Euclidean space. These inequalities are generalizations of the upper and lower bounds of the differential entropy and are also interpreted as a kind of expansion of the inequality between two Lp-norms on a measurable set with finite measure.
In high-dimensional linear regression, would increasing effect sizes always improve model selection, while maintaining all the other conditions unchanged (especially fixing the sparsity of regression coefficients)? In this paper, we answer this question in the \textit{negative} in the regime of linear sparsity for the Lasso method, by introducing a new notion we term effect size heterogeneity. Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the competition among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.
Probability metrics have become an indispensable part of modern statistics and machine learning, and they play a quintessential role in various applications, including statistical hypothesis testing and generative modeling. However, in a practical setting, the convergence behavior of the algorithms built upon these distances has not been well established, except for a few specific cases. In this paper, we introduce a broad family of probability metrics, termed Generalized Sliced Probability Metrics (GSPMs), that are deeply rooted in the generalized Radon transform. We first verify that GSPMs are metrics. Then, we identify a subset of GSPMs that are equivalent to maximum mean discrepancy (MMD) with novel positive definite kernels, which come with a unique geometric interpretation. Finally, by exploiting this connection, we consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum. We illustrate the utility of our approach on both real and synthetic problems.
We consider the linear regression problem of estimating a $p$-dimensional vector $\beta$ from $n$ observations $Y = X \beta + W$, where $\beta_j \stackrel{\text{i.i.d.}}{\sim} \pi$ for a real-valued distribution $\pi$ with zero mean and unit variance, $X_{ij} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0,1)$, and $W_i \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)$. In the asymptotic regime where $n/p \to \delta$ and $p/\sigma^2 \to \mathsf{snr}$ for two fixed constants $\delta, \mathsf{snr} \in (0, \infty)$ as $p \to \infty$, the limiting (normalized) minimum mean-squared error (MMSE) has been characterized by the MMSE of an associated single-letter (additive Gaussian scalar) channel. In this paper, we show that if the MMSE function of the single-letter channel converges to a step function, then the limiting MMSE of estimating $\beta$ in the linear regression problem converges to a step function which jumps from $1$ to $0$ at a critical threshold. Moreover, we establish that the limiting mean-squared error of the (MSE-optimal) approximate message passing algorithm also converges to a step function with a larger threshold, providing evidence for the presence of a computational-statistical gap between the two thresholds.
After endowing the space of diagrams of probability spaces with an entropy distance, we study its large-scale geometry by identifying the asymptotic cone as a closed convex cone in a Banach space. We call this cone the tropical cone, and its elements tropical diagrams of probability spaces. Given that the tropical cone has a rich structure, while tropical diagrams are rather flexible objects, we expect the theory of tropical diagrams to be useful for information optimization problems in information theory and artificial intelligence. In a companion article, we give a first application to derive a statement about the entropic cone.