On the Universality of the Logistic Loss Function



Publication date 2018

Field: Informatics Engineering

Research language: English

Authors Amichai Painsky - Gregory W. Wornell

Information Theory


Abstract

A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss and the Bernoulli log-likelihood loss (log-loss). In this work we show that, for binary classification problems, the divergence associated with smooth, proper and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any loss function from this set. This property justifies the broad use of log-loss in regression, decision trees, deep neural networks and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
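As a small illustration of the kind of bound described above, the sketch below checks numerically that, for binary outcomes, the divergence of the quadratic loss stays below the KL divergence times a constant. The choice of the quadratic loss, the constant 1/2 (which follows from Pinsker's inequality with KL measured in nats) and all function names are assumptions made for this example; they are not taken from the paper.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def quadratic_divergence(p, q):
    """Excess expected quadratic loss of predicting q when the truth is p."""
    expected_at_q = p * (1 - q) ** 2 + (1 - p) * q ** 2
    expected_at_p = p * (1 - p) ** 2 + (1 - p) * p ** 2
    return expected_at_q - expected_at_p  # algebraically equal to (p - q) ** 2

def max_ratio(steps=99):
    """Largest ratio quadratic_divergence / KL over an interior grid of (p, q)."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    worst = 0.0
    for p in grid:
        for q in grid:
            if p != q:
                ratio = quadratic_divergence(p, q) / kl_bernoulli(p, q)
                worst = max(worst, ratio)
    return worst

if __name__ == "__main__":
    # The ratio should never exceed 1/2 on the grid.
    print(max_ratio())
```

On this grid the ratio approaches 1/2 when both probabilities are near 1/2, which is where Pinsker's inequality is tightest for Bernoulli distributions.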


Bregman Divergence Bounds and Universality Properties of the Logarithmic Loss

Amichai Painsky, Gregory W. Wornell, 2018

A loss function measures the discrepancy between the true values and their estimated fits, for a given instance of data. In classification problems, a loss function is said to be proper if a minimizer of the expected loss is the true underlying probability. We show that for binary classification, the divergence associated with smooth, proper, and convex loss functions is upper bounded by the Kullback-Leibler (KL) divergence, to within a normalization constant. This implies that by minimizing the logarithmic loss associated with the KL divergence, we minimize an upper bound to any choice of loss from this set. As such the logarithmic loss is universal in the sense of providing performance guarantees with respect to a broad class of accuracy measures. Importantly, this notion of universality is not problem-specific, enabling its use in diverse applications, including predictive modeling, data clustering and sample complexity analysis. Generalizations to arbitrary finite alphabets are also developed. The derived inequalities extend several well-known $f$-divergence results.

Information Theory

The Capacity Loss of Dense Constellations

Tobias Koch, Alfonso Martinez, Albert Guillén i Fàbregas, 2012

We determine the loss in capacity incurred by using signal constellations with a bounded support over general complex-valued additive-noise channels for suitably high signal-to-noise ratio. Our expression for the capacity loss recovers the power loss of 1.53 dB for square signal constellations.
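The 1.53 dB figure coincides with the classical shaping constant πe/6 expressed in decibels. Linking the abstract's number to that constant is an assumption made here for illustration; the abstract itself does not give the formula. A quick arithmetic check:

```python
import math

def shaping_loss_db():
    """Return 10 * log10(pi * e / 6), the classical shaping constant in dB."""
    return 10 * math.log10(math.pi * math.e / 6)

if __name__ == "__main__":
    print(round(shaping_loss_db(), 2))
```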

Information Theory

Evaluating Multiple Guesses by an Adversary via a Tunable Loss Function

Gowtham R. Kurri, Oliver Kosut, Lalitha Sankar, 2021

We consider a problem of guessing, wherein an adversary is interested in knowing the value of the realization of a discrete random variable $X$ on observing another correlated random variable $Y$. The adversary can make multiple (say, $k$) guesses. The adversary's guessing strategy is assumed to minimize $\alpha$-loss, a class of tunable loss functions parameterized by $\alpha$. It has been shown before that this loss function captures well-known loss functions including the exponential loss ($\alpha=1/2$), the log-loss ($\alpha=1$) and the $0$-$1$ loss ($\alpha=\infty$). We completely characterize the optimal adversarial strategy and the resulting expected $\alpha$-loss, thereby recovering known results for $\alpha=\infty$. We define an information leakage measure from the $k$-guesses setup and derive a condition under which the leakage is unchanged from a single guess.
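To make the three special cases concrete, here is a minimal sketch of a tunable loss. It assumes the form of α-loss commonly used in the literature, α/(α-1) · (1 - p^((α-1)/α)), where p is the probability assigned to the true outcome. The abstract does not state this formula, so both the formula and the function name `alpha_loss` are assumptions of this example.

```python
import math

def alpha_loss(prob_true, alpha):
    """Assumed alpha-loss of assigning probability prob_true to the true outcome."""
    if not 0 < prob_true <= 1:
        raise ValueError("prob_true must lie in (0, 1]")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1:
        # Limit as alpha -> 1: the log-loss.
        return -math.log(prob_true)
    if math.isinf(alpha):
        # Limit as alpha -> infinity: one minus the assigned probability.
        return 1 - prob_true
    return alpha / (alpha - 1) * (1 - prob_true ** ((alpha - 1) / alpha))

if __name__ == "__main__":
    for a in (0.5, 1, 2, math.inf):
        print(a, alpha_loss(0.8, a))
```

Under this assumed form, α = 1/2 gives 1/p - 1, which equals exp(-m) when p is the logistic sigmoid of a margin m; that is the sense in which it matches the exponential loss.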

Information Theory

On the Reliability Function of Distributed Hypothesis Testing Under Optimal Detection

Nir Weinberger, Yuval Kochman, 2018

The distributed hypothesis testing problem with full side-information is studied. The trade-off (reliability function) between the two types of error exponents under limited rate is studied in the following way. First, the problem is reduced to the problem of determining the reliability function of channel codes designed for detection (in analogy to a similar result which connects the reliability function of distributed lossless compression and ordinary channel codes). Second, a single-letter random-coding bound based on a hierarchical ensemble, as well as a single-letter expurgated bound, are derived for the reliability of channel-detection codes. Both bounds are derived for a system which employs the optimal detection rule. We conjecture that the resulting random-coding bound is ensemble-tight, and consequently optimal within the class of quantization-and-binning schemes.

Information Theory

Improved Upper Bound on the Network Function Computing Capacity

Xuan Guang, Raymond W. Yeung, Shenghao Yang, 2017

The problem of network function computation over a directed acyclic network is investigated in this paper. In such a network, a sink node desires to compute with zero error a target function, of which the inputs are generated at multiple source nodes. The edges in the network are assumed to be error-free and have limited capacity. The nodes in the network are assumed to have unbounded computing capability and be able to perform network coding. The computing rate of a network code that can compute the target function over the network is the average number of times that the target function is computed with zero error for one use of the network. In this paper, we obtain an improved upper bound on the computing capacity, which is applicable to arbitrary target functions and arbitrary network topologies. This improved upper bound not only is an enhancement of the previous upper bounds but also is the first tight upper bound on the computing capacity for computing an arithmetic sum over a certain non-tree network, which has been widely studied in the literature. We also introduce a multi-dimensional array approach that facilitates evaluation of the improved upper bound. Furthermore, we apply this bound to the problem of computing a vector-linear function over a network. With this bound, we are able to not only enhance a previous result on computing a vector-linear function over a network but also simplify the proof significantly. Finally, we prove that for computing the binary maximum function over the reverse butterfly network, our improved upper bound is not achievable. This result establishes that in general our improved upper bound is not achievable, but whether it is asymptotically achievable or not remains open.

Information Theory
