No Arabic abstract
We consider a sub-class of the $f$-divergences satisfying a stronger convexity property, which we refer to as strongly convex, or $kappa$-convex divergences. We derive new and old relationships, based on convexity arguments, between popular $f$-divergences.
The divergence minimization problem plays an important role in various fields. In this note, we focus on differentiable and strictly convex divergences. For some minimization problems, we show the minimizer conditions and the uniqueness of the minimizer without assuming a specific form of divergences. Furthermore, we show geometric properties related to the minimization problems.
This paper is focused on $f$-divergences, consisting of three main contributions. The first one introduces integral representations of a general $f$-divergence by means of the relative information spectrum. The second part provides a new approach for the derivation of $f$-divergence inequalities, and it exemplifies their utility in the setup of Bayesian binary hypothesis testing. The last part of this paper further studies the local behavior of $f$-divergences.
This paper is focused on derivations of data-processing and majorization inequalities for $f$-divergences, and their applications in information theory and statistics. For the accessibility of the material, the main results are first introduced without proofs, followed by exemplifications of the theorems with further related analytical results, interpretations, and information-theoretic applications. One application refers to the performance analysis of list decoding with either fixed or variable list sizes; some earlier bounds on the list decoding error probability are reproduced in a unified way, and new bounds are obtained and exemplified numerically. Another application is related to a study of the quality of approximating a probability mass function, induced by the leaves of a Tunstall tree, by an equiprobable distribution. The compression rates of finite-length Tunstall codes are further analyzed for asserting their closeness to the Shannon entropy of a memoryless and stationary discrete source. Almost all the analysis is relegated to the appendices, which form a major part of this manuscript.
This paper studies forward and reverse projections for the R{e}nyi divergence of order $alpha in (0, infty)$ on $alpha$-convex sets. The forward projection on such a set is motivated by some works of Tsallis {em et al.} in statistical physics, and the reverse projection is motivated by robust statistics. In a recent work, van Erven and Harremoes proved a Pythagorean inequality for R{e}nyi divergences on $alpha$-convex sets under the assumption that the forward projection exists. Continuing this study, a sufficient condition for the existence of forward projection is proved for probability measures on a general alphabet. For $alpha in (1, infty)$, the proof relies on a new Apollonius theorem for the Hellinger divergence, and for $alpha in (0,1)$, the proof relies on the Banach-Alaoglu theorem from functional analysis. Further projection results are then obtained in the finite alphabet setting. These include a projection theorem on a specific $alpha$-convex set, which is termed an {em $alpha$-linear family}, generalizing a result by Csiszar for $alpha eq 1$. The solution to this problem yields a parametric family of probability measures which turns out to be an extension of the exponential family, and it is termed an {em $alpha$-exponential family}. An orthogonality relationship between the $alpha$-exponential and $alpha$-linear families is established, and it is used to turn the reverse projection on an $alpha$-exponential family into a forward projection on a $alpha$-linear family. This paper also proves a convergence result of an iterative procedure used to calculate the forward projection on an intersection of a finite number of $alpha$-linear families.
We consider online convex optimization (OCO) over a heterogeneous network with communication delay, where multiple workers together with a master execute a sequence of decisions to minimize the accumulation of time-varying global costs. The local data may not be independent or identically distributed, and the global cost functions may not be locally separable. Due to communication delay, neither the master nor the workers have in-time information about the current global cost function. We propose a new algorithm, termed Hierarchical OCO (HiOCO), which takes full advantage of the network heterogeneity in information timeliness and computation capacity to enable multi-step gradient descent at both the workers and the master. We analyze the impacts of the unique hierarchical architecture, multi-slot delay, and gradient estimation error to derive upper bounds on the dynamic regret of HiOCO, which measures the gap of costs between HiOCO and an offline globally optimal performance benchmark.