Divergences are quantities that measure the discrepancy between two probability distributions and play an important role in various fields such as statistics and machine learning. Divergences are non-negative and equal zero if and only if the two distributions coincide. In addition, some important divergences, such as the f-divergence, are convex; we call these convex divergences. In this paper, we show new properties of convex divergences by using integral and differential operators that we introduce. For a convex divergence, the result of applying the integral or differential operator is also a divergence. In particular, the integral operator preserves convexity. Furthermore, applying the integral operator multiple times yields a monotonically decreasing sequence of convex divergences. From these properties, we derive new sequences of convex divergences that include the Kullback-Leibler divergence or the reverse Kullback-Leibler divergence.
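For reference, the classical f-divergence between distributions $P$ and $Q$ with densities $p$ and $q$ is generated by a convex function $f$ with $f(1) = 0$; the Kullback-Leibler and reverse Kullback-Leibler divergences correspond to the generators $f(t) = t \log t$ and $f(t) = -\log t$, respectively (standard definitions, stated here only for orientation):

$$ D_f(P \,\|\, Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx, \qquad f \text{ convex},\ f(1) = 0. $$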
We present a unified technique for sequential estimation of convex divergences between distributions, including integral probability metrics like the kernel maximum mean discrepancy, $\varphi$-divergences like the Kullback-Leibler divergence, and optimal transport costs, such as powers of Wasserstein distances. The technical underpinnings of our approach lie in the observation that empirical convex divergences are (partially ordered) reverse submartingales with respect to the exchangeable filtration, coupled with maximal inequalities for such processes. These techniques appear to be powerful additions to the existing literature on both confidence sequences and convex divergences. We construct an offline-to-sequential device that converts a wide array of existing offline concentration inequalities into time-uniform confidence sequences that can be continuously monitored, providing valid inference at arbitrary stopping times. The resulting sequential bounds pay only an iterated logarithmic price over the corresponding fixed-time bounds, retaining the same dependence on problem parameters (like dimension or alphabet size, if applicable).
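Schematically, a confidence sequence for a divergence $D(P, Q)$ is a sequence of data-dependent sets $(C_n)$ that is valid uniformly over time, and the iterated-logarithmic price mentioned above refers to the width of these sets; the display below is a generic template rather than a bound from this paper, with problem-dependent constants suppressed:

$$ \mathbb{P}\big( \forall\, n \ge 1:\ D(P, Q) \in C_n \big) \;\ge\; 1 - \alpha, \qquad \mathrm{diam}(C_n) \;\lesssim\; \sqrt{\frac{\log\log n}{n}}. $$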
In this paper, we introduce a sophisticated path loss model, incorporating both line-of-sight (LoS) and non-line-of-sight (NLoS) transmissions, into the stochastic geometry analysis to study their performance impact in small cell networks (SCNs). Analytical results are obtained for the coverage probability and the area spectral efficiency (ASE), assuming both a general path loss model and a special case of the path loss model recommended by the 3rd Generation Partnership Project (3GPP) standards. Compared with previous work that does not differentiate LoS and NLoS transmissions, the performance impact of LoS and NLoS transmissions in SCNs, in terms of the coverage probability and the ASE, is shown to be significant both quantitatively and qualitatively. In particular, our analysis demonstrates that when the density of small cells exceeds a threshold, the network coverage probability decreases as small cells become denser, which in turn causes the ASE to suffer from slow growth or even a notable decline. For the practical regime of small cell density, the performance results derived from our analysis are distinctly different from previous results and offer new insights into the design and deployment of future dense/ultra-dense SCNs.
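A generic form of such a piecewise LoS/NLoS path loss model assigns a link of length $r$ the LoS path loss with a distance-dependent probability and the NLoS path loss otherwise; the notation below is illustrative rather than taken from the paper:

$$ \zeta(r) \;=\; \begin{cases} A_{\mathrm{L}}\, r^{-\alpha_{\mathrm{L}}} & \text{with probability } \mathrm{Pr}_{\mathrm{LoS}}(r), \\ A_{\mathrm{N}}\, r^{-\alpha_{\mathrm{N}}} & \text{with probability } 1 - \mathrm{Pr}_{\mathrm{LoS}}(r), \end{cases} $$

where $(A_{\mathrm{L}}, \alpha_{\mathrm{L}})$ and $(A_{\mathrm{N}}, \alpha_{\mathrm{N}})$ are the LoS and NLoS path loss intercepts and exponents, and $\mathrm{Pr}_{\mathrm{LoS}}(r)$ is the LoS probability function.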
Quantum f-divergences are a quantum generalization of the classical notion of f-divergences, and are a special case of Petz's quasi-entropies. Many well-known distinguishability measures of quantum states are given by, or derived from, f-divergences; special examples include the quantum relative entropy, the Rényi relative entropies, and the Chernoff and Hoeffding measures. Here we show that the quantum f-divergences are monotonic under the duals of Schwarz maps whenever the defining function is operator convex. This extends and unifies all previously known monotonicity results. We also analyze the case where the monotonicity inequality holds with equality, and extend Petz's reversibility theorem to a large class of f-divergences and other distinguishability measures. We apply our findings to the problem of quantum error correction, and show that if a stochastic map preserves the pairwise distinguishability on a set of states, as measured by a suitable f-divergence, then its action can be reversed on that set by another stochastic map that can be constructed from the original one in a canonical way. We also provide an integral representation for operator convex functions on the positive half-line, which is the main ingredient in extending previously known results on the monotonicity inequality and the case of equality. We also consider some special cases where the convexity of f is sufficient for the monotonicity, and obtain the inverse Hölder inequality for operators as an application. The presentation is completely self-contained and requires only standard knowledge of matrix analysis.
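For orientation, the quantum relative entropy is the quantum f-divergence generated by $f(t) = t \log t$, and the monotonicity discussed above is the data-processing-type inequality for a map $\Phi$ in the stated class (e.g., the dual of a Schwarz map); these are standard facts recorded here only as a reference point:

$$ D(\rho\,\|\,\sigma) \;=\; \mathrm{Tr}\big[\rho\,(\log\rho - \log\sigma)\big], \qquad D_f\big(\Phi(\rho)\,\|\,\Phi(\sigma)\big) \;\le\; D_f(\rho\,\|\,\sigma). $$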
Variational representations of divergences and distances between high-dimensional probability distributions offer significant theoretical insights and practical advantages in numerous research areas. Recently, they have gained popularity in machine learning as a tractable and scalable approach for training probabilistic models and for statistically differentiating between data distributions. Their advantages include: 1) they can be estimated from data as statistical averages; 2) they can leverage the ability of neural networks to efficiently approximate optimal solutions in function spaces. However, a systematic and practical approach to improving the tightness of such variational formulas, and thereby accelerating statistical learning and estimation from data, is lacking. Here we develop such a methodology for building new, tighter variational representations of divergences. Our approach relies on improved objective functionals constructed via an auxiliary optimization problem. Furthermore, the calculation of the functional Hessian of the objective functionals reveals local curvature differences around the common optimal variational solution; this quantifies and orders the tightness gains between different variational representations. Finally, numerical simulations utilizing neural-network optimization demonstrate that tighter representations can result in significantly faster learning and more accurate estimation of divergences in both synthetic and real datasets (of more than 1000 dimensions), often accelerated by nearly an order of magnitude.
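A classical example of such a variational representation is the Donsker-Varadhan formula for the Kullback-Leibler divergence, in which the supremum runs over bounded measurable functions $\phi$ and is, in practice, restricted to a neural-network family; it is quoted here only to illustrate the kind of objective functional being tightened, not as one of the new representations derived in the paper:

$$ D_{\mathrm{KL}}(P\,\|\,Q) \;=\; \sup_{\phi}\ \Big\{ \mathbb{E}_{P}\big[\phi(X)\big] \;-\; \log \mathbb{E}_{Q}\big[e^{\phi(X)}\big] \Big\}. $$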
The Bregman divergence (also called the Bregman distance or Bregman measure of distance) is a useful substitute for a distance, obtained from a well-chosen function (the Bregman function). Bregman functions and divergences have been extensively investigated over the past decades and have found applications in optimization, operations research, information theory, nonlinear analysis, machine learning, and more. This paper re-examines various aspects of the theory of Bregman functions and divergences. In particular, it presents many sufficient conditions which allow the construction of Bregman functions in a general setting and introduces new Bregman functions (such as a negative iterated log entropy). Moreover, it sheds new light on several known Bregman functions, such as quadratic entropies, the negative Havrda-Charvát-Tsallis entropy, and the negative Boltzmann-Gibbs-Shannon entropy, and it shows that the negative Burg entropy, which is not a Bregman function according to the classical theory but is nevertheless known to have Bregmanian properties, can, by our re-examination of the theory, be considered a Bregman function. Our analysis yields several by-products of independent interest, such as the introduction of the concept of relative uniform convexity (a certain generalization of uniform convexity), new properties of uniformly and strongly convex functions, and results in Banach space theory.
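In its classical differentiable form, the Bregman divergence generated by a convex function $h$ is given below; taking $h(x) = \tfrac{1}{2}\|x\|^2$ recovers half the squared Euclidean distance, while the negative Boltzmann-Gibbs-Shannon entropy $h(x) = \sum_i x_i \log x_i$ yields the Kullback-Leibler divergence on the probability simplex (standard special cases; the paper itself works in a more general setting):

$$ D_h(x, y) \;=\; h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle. $$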