No Arabic abstract
In part I of this two-part work, certain minimization problems based on a parametric family of relative entropies (denoted $mathscr{I}_{alpha}$) were studied. Such minimizers were called forward $mathscr{I}_{alpha}$-projections. Here, a complementary class of minimization problems leading to the so-called reverse $mathscr{I}_{alpha}$-projections are studied. Reverse $mathscr{I}_{alpha}$-projections, particularly on log-convex or power-law families, are of interest in robust estimation problems ($alpha >1$) and in constrained compression settings ($alpha <1$). Orthogonality of the power-law family with an associated linear family is first established and is then exploited to turn a reverse $mathscr{I}_{alpha}$-projection into a forward $mathscr{I}_{alpha}$-projection. The transformed problem is a simpler quasiconvex minimization subject to linear constraints.
Minimization problems with respect to a one-parameter family of generalized relative entropies are studied. These relative entropies, which we term relative $alpha$-entropies (denoted $mathscr{I}_{alpha}$), arise as redundancies under mismatched compression when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the usual relative entropy (Kullback-Leibler divergence). Just like relative entropy, these relative $alpha$-entropies behave like squared Euclidean distance and satisfy the Pythagorean property. Minimizers of these relative $alpha$-entropies on closed and convex sets are shown to exist. Such minimizations generalize the maximum R{e}nyi or Tsallis entropy principle. The minimizing probability distribution (termed forward $mathscr{I}_{alpha}$-projection) for a linear family is shown to obey a power-law. Other results in connection with statistical inference, namely subspace transitivity and iterated projections, are also established. In a companion paper, a related minimization problem of interest in robust statistics that leads to a reverse $mathscr{I}_{alpha}$-projection is studied.
This paper studies forward and reverse projections for the R{e}nyi divergence of order $alpha in (0, infty)$ on $alpha$-convex sets. The forward projection on such a set is motivated by some works of Tsallis {em et al.} in statistical physics, and the reverse projection is motivated by robust statistics. In a recent work, van Erven and Harremoes proved a Pythagorean inequality for R{e}nyi divergences on $alpha$-convex sets under the assumption that the forward projection exists. Continuing this study, a sufficient condition for the existence of forward projection is proved for probability measures on a general alphabet. For $alpha in (1, infty)$, the proof relies on a new Apollonius theorem for the Hellinger divergence, and for $alpha in (0,1)$, the proof relies on the Banach-Alaoglu theorem from functional analysis. Further projection results are then obtained in the finite alphabet setting. These include a projection theorem on a specific $alpha$-convex set, which is termed an {em $alpha$-linear family}, generalizing a result by Csiszar for $alpha eq 1$. The solution to this problem yields a parametric family of probability measures which turns out to be an extension of the exponential family, and it is termed an {em $alpha$-exponential family}. An orthogonality relationship between the $alpha$-exponential and $alpha$-linear families is established, and it is used to turn the reverse projection on an $alpha$-exponential family into a forward projection on a $alpha$-linear family. This paper also proves a convergence result of an iterative procedure used to calculate the forward projection on an intersection of a finite number of $alpha$-linear families.
New upper bounds on the relative entropy are derived as a function of the total variation distance. One bound refines an inequality by Verd{u} for general probability measures. A second bound improves the tightness of an inequality by Csisz{a}r and Talata for arbitrary probability measures that are defined on a common finite set. The latter result is further extended, for probability measures on a finite set, leading to an upper bound on the R{e}nyi divergence of an arbitrary non-negative order (including $infty$) as a function of the total variation distance. Another lower bound by Verd{u} on the total variation distance, expressed in terms of the distribution of the relative information, is tightened and it is attained under some conditions. The effect of these improvements is exemplified.
The relative entropy and chi-squared divergence are fundamental divergence measures in information theory and statistics. This paper is focused on a study of integral relations between the two divergences, the implications of these relations, their information-theoretic applications, and some generalizations pertaining to the rich class of $f$-divergences. Applications that are studied in this paper refer to lossless compression, the method of types and large deviations, strong~data-processing inequalities, bounds on contraction coefficients and maximal correlation, and the convergence rate to stationarity of a type of discrete-time Markov chains.
We study minimization of a parametric family of relative entropies, termed relative $alpha$-entropies (denoted $mathscr{I}_{alpha}(P,Q)$). These arise as redundancies under mismatched compression when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the usual relative entropy (Kullback-Leibler divergence). Just like relative entropy, these relative $alpha$-entropies behave like squared Euclidean distance and satisfy the Pythagorean property. Minimization of $mathscr{I}_{alpha}(P,Q)$ over the first argument on a set of probability distributions that constitutes a linear family is studied. Such a minimization generalizes the maximum R{e}nyi or Tsallis entropy principle. The minimizing probability distribution (termed $mathscr{I}_{alpha}$-projection) for a linear family is shown to have a power-law.