No Arabic abstract
This paper considers derivation of $f$-divergence inequalities via the approach of functional domination. Bounds on an $f$-divergence based on one or several other $f$-divergences are introduced, dealing with pairs of probability measures defined on arbitrary alphabets. In addition, a variety of bounds are shown to hold under boundedness assumptions on the relative information. The journal paper, which includes more approaches for the derivation of f-divergence inequalities and proofs, is available on the arXiv at https://arxiv.org/abs/1508.00335, and it has been published in the IEEE Trans. on Information Theory, vol. 62, no. 11, pp. 5973-6006, November 2016.
This paper develops systematic approaches to obtain $f$-divergence inequalities, dealing with pairs of probability measures defined on arbitrary alphabets. Functional domination is one such approach, where special emphasis is placed on finding the best possible constant upper bounding a ratio of $f$-divergences. Another approach used for the derivation of bounds among $f$-divergences relies on moment inequalities and the logarithmic-convexity property, which results in tight bounds on the relative entropy and Bhattacharyya distance in terms of $chi^2$ divergences. A rich variety of bounds are shown to hold under boundedness assumptions on the relative information. Special attention is devoted to the total variation distance and its relation to the relative information and relative entropy, including reverse Pinsker inequalities, as well as on the $E_gamma$ divergence, which generalizes the total variation distance. Pinskers inequality is extended for this type of $f$-divergence, a result which leads to an inequality linking the relative entropy and relative information spectrum. Integral expressions of the Renyi divergence in terms of the relative information spectrum are derived, leading to bounds on the Renyi divergence in terms of either the variational distance or relative entropy.
Probabilistic models are often trained by maximum likelihood, which corresponds to minimizing a specific f-divergence between the model and data distribution. In light of recent successes in training Generative Adversarial Networks, alternative non-likelihood training criteria have been proposed. Whilst not necessarily statistically efficient, these alternatives may better match user requirements such as sharp image generation. A general variational method for training probabilistic latent variable models using maximum likelihood is well established; however, how to train latent variable models using other f-divergences is comparatively unknown. We discuss a variational approach that, when combined with the recently introduced Spread Divergence, can be applied to train a large class of latent variable models using any f-divergence.
We investigate the local differential privacy (LDP) guarantees of a randomized privacy mechanism via its contraction properties. We first show that LDP constraints can be equivalently cast in terms of the contraction coefficient of the $E_gamma$-divergence. We then use this equivalent formula to express LDP guarantees of privacy mechanisms in terms of contraction coefficients of arbitrary $f$-divergences. When combined with standard estimation-theoretic tools (such as Le Cams and Fanos converse methods), this result allows us to study the trade-off between privacy and utility in several testing and minimax and Bayesian estimation problems.
This paper is focused on $f$-divergences, consisting of three main contributions. The first one introduces integral representations of a general $f$-divergence by means of the relative information spectrum. The second part provides a new approach for the derivation of $f$-divergence inequalities, and it exemplifies their utility in the setup of Bayesian binary hypothesis testing. The last part of this paper further studies the local behavior of $f$-divergences.
Linear regression without correspondences concerns the recovery of a signal in the linear regression setting, where the correspondences between the observations and the linear functionals are unknown. The associated maximum likelihood function is NP-hard to compute when the signal has dimension larger than one. To optimize this objective function we reformulate it as a concave minimization problem, which we solve via branch-and-bound. This is supported by a computable search space to branch, an effective lower bounding scheme via convex envelope minimization and a refined upper bound, all naturally arising from the concave minimization reformulation. The resulting algorithm outperforms state-of-the-art methods for fully shuffled data and remains tractable for up to $8$-dimensional signals, an untouched regime in prior work.