ﻻ يوجد ملخص باللغة العربية
Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) They can be estimated from data as statistical averages. 2) Such representations can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving tightness of such variational formulas, and accordingly accelerate statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of objective functionals unveils local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
We derive a new variational formula for the Renyi family of divergences, $R_alpha(Q|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We furth
The Bregman divergence (Bregman distance, Bregman measure of distance) is a certain useful substitute for a distance, obtained from a well-chosen function (the Bregman function). Bregman functions and divergences have been extensively investigated du
We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $varphi$-divergences like the Kullback-Leibler divergence, and opti
We show that the variational representations for f-divergences currently used in the literature can be tightened. This has implications to a number of methods recently proposed based on this representation. As an example application we use our tighte
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning. Most works study this problem under very weak assumptions, in which case it is provably hard. We con