Consider the problem of estimating the Shannon entropy of a distribution over $k$ elements from $n$ independent samples. We show that the minimax mean-square error is within universal multiplicative constant factors of $$\Big(\frac{k}{n \log k}\Big)^2 + \frac{\log^2 k}{n}$$ if $n$ exceeds a constant factor of $\frac{k}{\log k}$; otherwise there exists no consistent estimator. This refines the recent result of Valiant-Valiant \cite{VV11} that the minimal sample size for consistent entropy estimation scales according to $\Theta(\frac{k}{\log k})$. The apparatus of best polynomial approximation plays a key role in both the construction of optimal estimators and, via a duality argument, the minimax lower bound.
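For reference, a minimal sketch of the naive plug-in estimator against which such rates are usually compared, assuming i.i.d. samples from a distribution on $k$ symbols; the uniform test distribution, the sample size, and the variable names below are illustrative choices, not taken from the paper.

import numpy as np

def plugin_entropy(samples):
    # Plug-in (empirical) entropy estimate in nats: count symbol frequencies
    # and evaluate the entropy of the empirical distribution. Known to be
    # biased downward when the alphabet size k is comparable to (or larger
    # than) the sample size n.
    _, counts = np.unique(samples, return_counts=True)
    p_hat = counts / counts.sum()
    return -np.sum(p_hat * np.log(p_hat))

rng = np.random.default_rng(0)
k, n = 1000, 500                      # n below k: the plug-in estimate degrades badly
samples = rng.integers(0, k, size=n)
print(plugin_entropy(samples), np.log(k))   # compare with the true entropy log(k)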
Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors would be greatly reduced if we had a data-driven approach to verifying privacy guarantees from black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantee and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. With $n$ samples, we show that this estimator achieves the performance of a straightforward plug-in estimator with $n \ln n$ samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is proved by comparing it to a matching fundamental lower bound.
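A minimal sketch of the straightforward plug-in baseline mentioned above, assuming the mechanism has a finite output alphabet and can be sampled on two neighboring databases; estimating the privacy parameter as the largest log-ratio of empirical output frequencies is an illustrative choice for this sketch, not the paper's estimator.

import numpy as np

def plugin_epsilon(samples_d, samples_dprime, alphabet):
    # Plug-in estimate of eps = max_o |log P(o|D) - log P(o|D')|:
    # replace the true output distributions by empirical frequencies.
    # With few samples the log-ratios are noisy, which is the bias/variance
    # problem a polynomial-approximation estimator is designed to address.
    def freqs(samples):
        samples = np.asarray(samples)
        counts = np.array([np.sum(samples == o) for o in alphabet], dtype=float)
        return counts / counts.sum()
    p, q = freqs(samples_d), freqs(samples_dprime)
    mask = (p > 0) & (q > 0)          # this naive baseline ignores unseen outputs
    return np.max(np.abs(np.log(p[mask]) - np.log(q[mask])))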
We prove a Bernstein-type bound for the difference between the average of the negative log-likelihoods of independent discrete random variables and the Shannon entropy, both defined on a countably infinite alphabet. The result holds for the class of discrete random variables with tails lighter than, or of the same order as, those of a discrete power-law distribution. Most commonly used discrete distributions, such as the Poisson distribution, the negative binomial distribution, and the power-law distribution itself, belong to this class. The bound is effective in the sense that we provide a method to compute the constants appearing in it.
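A small numerical illustration of the two quantities that the bound compares, using a Poisson distribution as one member of the admissible class; the rate parameter, sample size, and truncation point below are arbitrary choices made only for this sketch.

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
lam, n = 3.0, 10000
x = rng.poisson(lam, size=n)

avg_nll = -poisson.logpmf(x, lam).mean()     # average of negative log-likelihoods

ks = np.arange(0, 200)                       # truncate where the pmf is negligible
pmf = poisson.pmf(ks, lam)
entropy = -np.sum(pmf[pmf > 0] * np.log(pmf[pmf > 0]))   # Shannon entropy of Poisson(lam)

print(avg_nll, entropy)                      # the bound controls the deviation between these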
Minimization problems with respect to a one-parameter family of generalized relative entropies are studied. These relative entropies, which we term relative $\alpha$-entropies (denoted $\mathscr{I}_{\alpha}$), arise as redundancies under mismatched compression when cumulants of compressed lengths are considered instead of expected compressed lengths. These parametric relative entropies are a generalization of the usual relative entropy (Kullback-Leibler divergence). Just like relative entropy, these relative $\alpha$-entropies behave like squared Euclidean distance and satisfy the Pythagorean property. Minimizers of these relative $\alpha$-entropies on closed and convex sets are shown to exist. Such minimizations generalize the maximum Rényi or Tsallis entropy principle. The minimizing probability distribution (termed the forward $\mathscr{I}_{\alpha}$-projection) for a linear family is shown to obey a power law. Other results in connection with statistical inference, namely subspace transitivity and iterated projections, are also established. In a companion paper, a related minimization problem of interest in robust statistics that leads to a reverse $\mathscr{I}_{\alpha}$-projection is studied.
A new upper bound on the relative entropy is derived as a function of the total variation distance for probability measures defined on a common finite alphabet. The bound improves a previously reported bound by Csiszár and Talata. It is further extended to an upper bound on the Rényi divergence of an arbitrary non-negative order (including $\infty$) as a function of the total variation distance.
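For concreteness, the two quantities related by such a bound can be computed directly on a small finite alphabet; the distributions below are arbitrary examples, and the snippet does not implement the new bound itself.

import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))      # relative entropy D(P||Q) in nats
tv = 0.5 * np.sum(np.abs(p - q))    # total variation distance
print(kl, tv)                        # Pinsker's inequality gives D >= 2*TV^2; bounds of the type above go the other way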
This paper deals with the problem of universal lossless coding on a countably infinite alphabet. It focuses on some classes of sources defined by an envelope condition on the marginal distribution, namely exponentially decreasing envelope classes with exponent $\alpha$. The minimax redundancy of exponentially decreasing envelope classes is proved to be equivalent to $\frac{1}{4\alpha \log e} \log^2 n$. Then a coding strategy is proposed, with a Bayes redundancy equivalent to the maximin redundancy. Finally, an adaptive algorithm is provided, whose redundancy is equivalent to the minimax redundancy.
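To make the rate concrete, the asymptotic redundancy expression can be evaluated for a given envelope exponent; working in natural logarithms (so that $\log e = 1$) and the particular values of $\alpha$ and $n$ below are assumptions made only for this sketch.

import math

def minimax_redundancy(alpha, n):
    # Asymptotic minimax redundancy (1 / (4 * alpha * log e)) * log(n)**2;
    # with natural logarithms, log e = 1, so this is (ln n)^2 / (4 * alpha) nats.
    return math.log(n) ** 2 / (4.0 * alpha)

for n in (10**3, 10**6, 10**9):
    print(n, minimax_redundancy(alpha=1.0, n=n))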