ترغب بنشر مسار تعليمي؟ اضغط هنا

Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

62   0   0.0 ( 0 )
 نشر من قبل Amichai Painsky
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.



قيم البحث

اقرأ أيضاً

The problem to maximize the information divergence from an exponential family is generalized to the setting of Bregman divergences and suitably defined Bregman families.
A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex, an d the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. It implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound to any choice of loss functions from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
In this paper, we propose an efficient coding scheme for the binary Chief Executive Officer (CEO) problem under logarithmic loss criterion. Courtade and Weissman obtained the exact rate-distortion bound for a two-link binary CEO problem under this cr iterion. We find the optimal test-channel model and its parameters for the encoder of each link by using the given bound. Furthermore, an efficient encoding scheme based on compound LDGM-LDPC codes is presented to achieve the theoretical rates. In the proposed encoding scheme, a binary quantizer using LDGM codes and a syndrome-decoding employing LDPC codes are applied. An iterative decoding is also presented as a fusion center to reconstruct the observation bits. The proposed decoder consists of a sum-product algorithm with a side information from other decoder and a soft estimator. The output of the CEO decoder is the probability of source bits conditional to the received sequences of both links. This method outperforms the majority-based estimation of the source bits utilized in the prior studies of the binary CEO problem. Our numerical examples verify a close performance of the proposed coding scheme to the theoretical bound in several cases.
The $L$-link binary Chief Executive Officer (CEO) problem under logarithmic loss is investigated in this paper. A quantization splitting technique is applied to convert the problem under consideration to a $(2L-1)$-step successive Wyner-Ziv (WZ) prob lem, for which a practical coding scheme is proposed. In the proposed scheme, low-density generator-matrix (LDGM) codes are used for binary quantization while low-density parity-check (LDPC) codes are used for syndrome generation; the decoder performs successive decoding based on the received syndromes and produces a soft reconstruction of the remote source. The simulation results indicate that the rate-distortion performance of the proposed scheme can approach the theoretical inner bound based on binary-symmetric test-channel models.
101 - Igal Sason 2015
Tight bounds for several symmetric divergence measures are introduced, given in terms of the total variation distance. Each of these bounds is attained by a pair of 2 or 3-element probability distributions. An application of these bounds for lossless source coding is provided, refining and improving a certain bound by Csiszar. A new inequality relating $f$-divergences is derived, and its use is exemplified. The last section of this conference paper is not included in the recent journal paper that was published in the February 2015 issue of the IEEE Trans. on Information Theory (see arXiv:1403.7164), as well as some new paragraphs throughout the paper which are linked to new references.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا