A loss function measures the discrepancy between the true values (observations) and their estimated fits, for a given instance of data. A loss function is said to be proper (unbiased, Fisher consistent) if the fits are defined over a unit simplex and the minimizer of the expected loss is the true underlying probability of the data. Typical examples are the zero-one loss, the quadratic loss, and the Bernoulli log-likelihood loss (log-loss). In this work we show that for binary classification problems, the divergence associated with smooth, proper, and convex loss functions is bounded from above by the Kullback-Leibler (KL) divergence, up to a multiplicative normalization constant. This implies that by minimizing the log-loss (associated with the KL divergence), we minimize an upper bound on any loss function from this set. This property justifies the broad use of the log-loss in regression, decision trees, deep neural networks, and many other applications. In addition, we show that the KL divergence bounds from above any separable Bregman divergence that is convex in its second argument (again up to a multiplicative normalization constant). This result introduces a new set of divergence inequalities, similar to the well-known Pinsker inequality.
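As a quick numerical illustration of this kind of bound (a sketch, not code or a constant taken from the paper), the snippet below compares the divergence induced by the quadratic (Brier) loss in the binary case, $(p-q)^2$, against the KL divergence over a grid of Bernoulli parameters, assuming the classical Pinsker-type factor $1/2$ as the multiplicative constant.

```python
import numpy as np

# Hedged numerical sketch (assumption: the quadratic/Brier loss as the example,
# with the Pinsker-type constant 1/2): check that the binary quadratic-loss
# divergence (p - q)^2 never exceeds (1/2) * D_KL(p || q) on a grid.

def kl_binary(p, q):
    """KL divergence (in nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))

def quad_divergence(p, q):
    """Divergence induced by the quadratic (Brier) loss in the binary case."""
    return (p - q) ** 2

grid = np.linspace(0.01, 0.99, 99)
ratios = [quad_divergence(p, q) / kl_binary(p, q)
          for p in grid for q in grid if abs(p - q) > 1e-12]

print("max (p - q)^2 / D_KL(p||q) over the grid:", max(ratios))
# Empirically the ratio approaches but stays below 0.5, consistent with a
# Pinsker-type inequality for this particular loss.
```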
We determine the loss in capacity incurred by using signal constellations with bounded support over general complex-valued additive-noise channels at suitably high signal-to-noise ratio. Our expression for the capacity loss recovers the power loss of 1.53 dB for square signal constellations.
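For reference, the 1.53 dB figure quoted above appears to coincide with the classical shaping-loss value $10\log_{10}(\pi e/6)$; the one-line check below assumes this is the constant being recovered.

```python
import math

# Hedged check (assumption): the quoted 1.53 dB power loss for square
# constellations matches the classical shaping loss 10*log10(pi*e/6).
print(10 * math.log10(math.pi * math.e / 6))  # ~1.53 dB
```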
We consider a problem of guessing, wherein an adversary is interested in knowing the value of the realization of a discrete random variable $X$ on observing another correlated random variable $Y$. The adversary can make multiple (say, $k$) guesses. T
The distributed hypothesis testing problem with full side-information is studied. The trade-off (reliability function) between the two types of error exponents under limited rate is studied in the following way. First, the problem is reduced to the p
The problem of network function computation over a directed acyclic network is investigated in this paper. In such a network, a sink node desires to compute with zero error a {\em target function}, whose inputs are generated at multiple source