
Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

Added by Amichai Painsky
Publication date: 2018
Language: English





A loss function measures the discrepancy between the true values and their estimated fits for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such, the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering, and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.
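As a minimal numerical sketch (not taken from the paper), one can take the quadratic (Brier) loss as a representative smooth, proper, convex loss: for binary outcomes its associated divergence is $(p-q)^2$, and Pinsker's inequality $\mathrm{KL}(p\|q) \ge 2(p-q)^2$ exhibits the kind of KL upper bound described above, here with normalization constant $1/2$.

```python
# Numerical sketch (illustrative, not the paper's code): check that the
# divergence of the quadratic (Brier) loss on binary outcomes,
# D_quad(p, q) = (p - q)^2, is upper bounded by the KL divergence up to a
# constant (here 1/2, consistent with Pinsker's inequality KL >= 2*(p-q)^2).
import numpy as np

rng = np.random.default_rng(0)

def kl_binary(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def quad_divergence(p, q):
    """Divergence associated with the quadratic (Brier) loss."""
    return (p - q) ** 2

p = rng.uniform(1e-3, 1 - 1e-3, size=10_000)
q = rng.uniform(1e-3, 1 - 1e-3, size=10_000)

# The bound D_quad(p, q) <= (1/2) * KL(p || q) should hold for every pair.
assert np.all(quad_divergence(p, q) <= 0.5 * kl_binary(p, q) + 1e-12)
print("quadratic divergence <= KL/2 held on all sampled pairs")
```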



Related Research

The problem to maximize the information divergence from an exponential family is generalized to the setting of Bregman divergences and suitably defined Bregman families.
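For concreteness, the sketch below (notation and function names are my own, not the authors') illustrates the Bregman divergence $B_F(p,q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$ and checks numerically that the negative-entropy generator recovers the KL divergence on the probability simplex.

```python
# Illustrative sketch: the Bregman divergence generated by a convex F is
#   B_F(p, q) = F(p) - F(q) - <grad F(q), p - q>.
# With F(p) = sum_i p_i * log(p_i) (negative Shannon entropy), B_F reduces
# to the KL divergence for distributions on the simplex.
import numpy as np

def bregman(F, gradF, p, q):
    return F(p) - F(q) - np.dot(gradF(q), p - q)

neg_entropy = lambda p: np.sum(p * np.log(p))
neg_entropy_grad = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))
# The two printed values agree (up to floating-point error).
print(bregman(neg_entropy, neg_entropy_grad, p, q), kl)
```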
A loss function measures the discrepancy between the true values (observations) and their estimated fits for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss function from this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
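The following sketch illustrates a Pinsker-type inequality of this kind using the squared Euclidean distance, a separable Bregman divergence (generator $F(p)=\sum_i p_i^2$) that is convex in its second argument. The constant $2$ used here comes from chaining $\|p-q\|_2 \le \|p-q\|_1 = 2\,\mathrm{TV}(p,q)$ with Pinsker's inequality $\mathrm{TV}^2 \le \mathrm{KL}/2$; it is only illustrative and not necessarily the constant derived in the paper.

```python
# Numerical sketch (constant chosen for illustration, not the paper's):
# the squared Euclidean distance is a separable Bregman divergence that is
# convex in its second argument, and ||p - q||_2^2 <= 2 * KL(p || q)
# follows from ||p - q||_2 <= ||p - q||_1 = 2*TV and Pinsker's inequality.
import numpy as np

rng = np.random.default_rng(1)

def kl(p, q):
    return np.sum(p * np.log(p / q), axis=-1)

# Random pairs of distributions on a 5-letter alphabet.
p = rng.dirichlet(np.ones(5), size=10_000)
q = rng.dirichlet(np.ones(5), size=10_000)

sq_euclid = np.sum((p - q) ** 2, axis=-1)   # separable Bregman divergence
assert np.all(sq_euclid <= 2.0 * kl(p, q) + 1e-12)
print("||p - q||^2 <= 2 * KL(p || q) held on all sampled pairs")
```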
In this paper, we propose an efficient coding scheme for the binary Chief Executive Officer (CEO) problem under the logarithmic loss criterion. Courtade and Weissman obtained the exact rate-distortion bound for a two-link binary CEO problem under this criterion. We find the optimal test-channel model and its parameters for the encoder of each link by using the given bound. Furthermore, an efficient encoding scheme based on compound LDGM-LDPC codes is presented to achieve the theoretical rates. In the proposed encoding scheme, a binary quantizer using LDGM codes and a syndrome decoder employing LDPC codes are applied. An iterative decoder is also presented as a fusion center to reconstruct the observation bits. The proposed decoder consists of a sum-product algorithm with side information from the other decoder, together with a soft estimator. The output of the CEO decoder is the probability of the source bits conditioned on the received sequences of both links. This method outperforms the majority-based estimation of the source bits used in prior studies of the binary CEO problem. Our numerical examples show that the performance of the proposed coding scheme is close to the theoretical bound in several cases.
The $L$-link binary Chief Executive Officer (CEO) problem under logarithmic loss is investigated in this paper. A quantization splitting technique is applied to convert the problem under consideration to a $(2L-1)$-step successive Wyner-Ziv (WZ) problem, for which a practical coding scheme is proposed. In the proposed scheme, low-density generator-matrix (LDGM) codes are used for binary quantization while low-density parity-check (LDPC) codes are used for syndrome generation; the decoder performs successive decoding based on the received syndromes and produces a soft reconstruction of the remote source. The simulation results indicate that the rate-distortion performance of the proposed scheme can approach the theoretical inner bound based on binary-symmetric test-channel models.
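For context, both CEO formulations above measure distortion with the logarithmic loss: a soft reconstruction is a probability distribution $\hat{x}$ over the source alphabet, and the per-symbol cost is $\log\big(1/\hat{x}(x)\big)$. The toy sketch below (names and values are illustrative, not from these papers) computes this distortion for soft binary estimates.

```python
# Minimal sketch of the logarithmic-loss distortion used in the CEO problem
# (the toy setup is illustrative): a soft reconstruction is a probability
# distribution over the source alphabet, and the per-symbol cost is
# d(x, x_hat) = log(1 / x_hat(x)).
import numpy as np

def log_loss_distortion(x, x_hat):
    """x: array of source bits; x_hat: array of soft estimates P(X = 1)."""
    p_correct = np.where(x == 1, x_hat, 1.0 - x_hat)
    return np.mean(np.log(1.0 / np.clip(p_correct, 1e-12, None)))

# Toy example: true bits and the decoder's soft (posterior) estimates.
x = np.array([0, 1, 1, 0, 1])
x_hat = np.array([0.1, 0.8, 0.95, 0.2, 0.6])  # P(X = 1) per position
print(log_loss_distortion(x, x_hat))  # average log-loss in nats
```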
Igal Sason, 2015
Tight bounds for several symmetric divergence measures are introduced, given in terms of the total variation distance. Each of these bounds is attained by a pair of 2- or 3-element probability distributions. An application of these bounds to lossless source coding is provided, refining and improving a certain bound by Csiszar. A new inequality relating $f$-divergences is derived, and its use is exemplified. The last section of this conference paper, as well as some new paragraphs throughout that are linked to new references, is not included in the journal paper published in the February 2015 issue of the IEEE Transactions on Information Theory (see arXiv:1403.7164).
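As a small illustration of divergence-total-variation relations of this flavor (using a classical inequality, not the refined bounds of the paper), the sketch below checks $H^2(P,Q) \le d_{TV}(P,Q)$ for random pairs of 3-element distributions, where $H^2$ denotes the squared Hellinger distance.

```python
# Illustrative sketch (classical bound, not the paper's refined bounds):
# compute the total variation distance and the squared Hellinger distance
# for random pairs of 3-element distributions and check H^2(P, Q) <= TV(P, Q).
import numpy as np

rng = np.random.default_rng(2)

def total_variation(p, q):
    return 0.5 * np.sum(np.abs(p - q), axis=-1)

def hellinger_sq(p, q):
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2, axis=-1)

p = rng.dirichlet(np.ones(3), size=10_000)
q = rng.dirichlet(np.ones(3), size=10_000)

assert np.all(hellinger_sq(p, q) <= total_variation(p, q) + 1e-12)
print("H^2 <= TV held on all sampled pairs")
```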