A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such, the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering, and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.
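As a concrete illustration (not taken from the abstract itself), the Brier score is a smooth, proper, convex loss whose associated divergence in the binary case is $(p-q)^2$, and Pinsker's inequality $D_{\mathrm{KL}}(p\|q) \ge 2(p-q)^2$ then yields a KL upper bound of exactly this form, with normalization constant $1/2$. A minimal numeric sketch, assuming NumPy:

```python
import numpy as np

def kl_binary(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def brier_divergence(p, q):
    """Divergence induced by the Brier (squared) loss in the binary case."""
    return (p - q) ** 2

# Check Brier divergence <= (1/2) * KL divergence on a grid of interior points,
# as guaranteed by Pinsker's inequality.
grid = np.linspace(0.01, 0.99, 99)
P, Q = np.meshgrid(grid, grid)
assert np.all(brier_divergence(P, Q) <= 0.5 * kl_binary(P, Q) + 1e-12)
print("Brier divergence <= KL/2 verified on the grid.")
```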
The problem of maximizing the information divergence from an exponential family is generalized to the setting of Bregman divergences and suitably defined Bregman families.
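For reference, a Bregman divergence is generated by a strictly convex function $\phi$ via $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y\rangle$; taking $\phi$ to be the negative Shannon entropy recovers the KL divergence on the simplex, which is the sense in which this generalizes the exponential-family result. A small sketch under these standard definitions:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# Negative-entropy generator: phi(p) = sum_i p_i log p_i.
neg_entropy = lambda p: np.sum(p * np.log(p))
grad_neg_entropy = lambda p: np.log(p) + 1.0

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
kl = np.sum(p * np.log(p / q))
# On the probability simplex, the Bregman divergence of negative entropy is KL.
assert np.isclose(bregman(neg_entropy, grad_neg_entropy, p, q), kl)
print("D_phi(p, q) == KL(p || q) for the negative-entropy generator.")
```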
A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and a minimizer of the expected loss is the true underlying probability of the data.
In this paper, we propose an efficient coding scheme for the binary Chief Executive Officer (CEO) problem under the logarithmic loss criterion. Courtade and Weissman obtained the exact rate-distortion bound for a two-link binary CEO problem under this criterion.
The $L$-link binary Chief Executive Officer (CEO) problem under logarithmic loss is investigated in this paper. A quantization splitting technique is applied to convert the problem under consideration to a $(2L-1)$-step successive Wyner-Ziv (WZ) problem.
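Both CEO abstracts use the logarithmic loss distortion $d(x, \hat{q}) = -\log \hat{q}(x)$, under which the reconstruction is a probability distribution rather than a symbol; its expectation is the cross-entropy, which is minimized by reporting the true source distribution, where it equals the source entropy. A brief numeric sketch of this standard fact, assuming NumPy:

```python
import numpy as np

def expected_log_loss(p, q):
    """Expected logarithmic loss E[-log q(X)] for X ~ p (the cross-entropy)."""
    return -np.sum(p * np.log(q))

p = np.array([0.7, 0.3])            # true binary source distribution
entropy = expected_log_loss(p, p)   # H(p), attained at q = p

# Any other reconstruction q incurs larger expected log loss,
# since cross-entropy = H(p) + KL(p || q) >= H(p).
for q0 in np.linspace(0.05, 0.95, 19):
    q = np.array([q0, 1 - q0])
    assert expected_log_loss(p, q) >= entropy - 1e-12
print(f"min over q of E[-log q(X)] = H(p) = {entropy:.4f} nats")
```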
Tight bounds for several symmetric divergence measures are introduced, given in terms of the total variation distance. Each of these bounds is attained by a pair of 2- or 3-element probability distributions. An application of these bounds to lossless source coding is also provided.