
Estimating Entropy of Distributions in Constant Space

Added by Sourbh Bhadane
Publication date: 2019
Language: English





We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log(1/\varepsilon)^2}{\varepsilon^3}\right)$ samples and $O(1)$ words of memory, and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k,\varepsilon)=\Theta\left(\frac{k}{\varepsilon\log k}+\frac{\log^2 k}{\varepsilon^2}\right)$, which is sub-linear in the domain size $k$, but the current algorithms that achieve optimal sample complexity also require nearly linear space in $k$. Our algorithm partitions $[0,1]$ into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.
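Since the paper's full interval-based scheme is intricate, here is a minimal constant-space sketch in Python of the identity it builds on, $H(p)=\mathbb{E}_{x\sim p}[-\log p_x]$: repeatedly draw a symbol from the stream, estimate its probability from a fresh block of samples, and average. The `sample` interface and the block sizes are illustrative assumptions; the paper's algorithm refines this naive estimator by bucketing the estimated probabilities into the intervals described above.

```python
import math, random

def stream_entropy_estimate(sample, trials=200, block=5000):
    """Constant-space sketch using H(p) = E_x[-log p_x]: draw a symbol x,
    estimate p_x by its frequency in a fresh block of samples, and average
    -log of that estimate. Only O(1) counters are kept at any time. The
    paper's algorithm instead buckets the p_x-estimates into intervals of
    [0,1] chosen to trade off the bias and variance of each contribution."""
    total = 0.0
    for _ in range(trials):
        x = sample()                       # the symbol whose mass we probe
        hits = sum(sample() == x for _ in range(block))
        p_hat = max(hits, 1) / block       # clip to avoid log(0); adds bias
        total += -math.log(p_hat)
    return total / trials

# usage on a toy power-law-ish distribution over k symbols
k = 100
w = [1.0 / (i + 1) for i in range(k)]
p = [x / sum(w) for x in w]
draw = lambda: random.choices(range(k), p)[0]
print(stream_entropy_estimate(draw))           # estimate (nats)
print(-sum(q * math.log(q) for q in p))        # true H(p) (nats)
```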



Related research

We consider the problem of estimating Shannon's entropy $H$ from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (the Bayes least-squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
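For reference, the fixed-Dirichlet case that the abstract contrasts against has a fully analytic posterior mean. Below is a short Python sketch of that classical closed form (the function name and toy counts are illustrative; the paper's PYM estimator generalizes this to Pitman-Yor priors and mixes over them). It also illustrates the abstract's point that the prior strongly determines the estimate when data are scarce.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_entropy_posterior_mean(counts, alpha=1.0):
    """Posterior mean of Shannon entropy H(p), in nats, under a symmetric
    Dirichlet(alpha) prior given observed symbol counts. Uses the closed
    form E[H | n] = psi(A + 1) - sum_i (a_i / A) * psi(a_i + 1),
    where a_i = n_i + alpha and A = sum_i a_i."""
    a = np.asarray(counts, dtype=float) + alpha
    A = a.sum()
    return digamma(A + 1.0) - np.sum((a / A) * digamma(a + 1.0))

# usage: heavily under-sampled data from a 1000-symbol alphabet;
# note how strongly the answer depends on the concentration alpha
counts = np.zeros(1000)
counts[:20] = 3                            # only 20 distinct symbols seen
for alpha in (0.01, 0.1, 1.0):
    print(alpha, dirichlet_entropy_posterior_mean(counts, alpha))
```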
We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity of sparse estimation under LDP constraints up to a constant factor, and under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private-coin scheme that requires only one bit of communication per user. Under communication constraints, we propose public-coin schemes based on random hashing functions. Our tight lower bounds are based on the recently proposed method of chi-squared contractions.
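Hadamard Response itself takes some machinery to state; as a simpler point of reference, the sketch below implements classical k-ary randomized response with an unbiased decoder. It is also an $\varepsilon$-LDP private-coin scheme, but it is not the scheme analyzed in the paper, and all names and parameters here are illustrative.

```python
import math, random
from collections import Counter

def krr_report(x, k, eps):
    """k-ary randomized response: report the true symbol with probability
    e^eps / (e^eps + k - 1), otherwise a uniform other symbol (eps-LDP)."""
    if random.random() < math.exp(eps) / (math.exp(eps) + k - 1):
        return x
    y = random.randrange(k - 1)
    return y if y < x else y + 1           # uniform over [k] minus {x}

def krr_decode(reports, k, eps):
    """Unbias the empirical frequencies of the noisy reports."""
    n, q, e = len(reports), Counter(reports), math.exp(eps)
    return [((q[j] / n) * (e + k - 1) - 1) / (e - 1) for j in range(k)]

# usage: a sparse distribution with mass on 3 of 50 symbols
k, eps, n = 50, 2.0, 100_000
true_p = [0.5, 0.3, 0.2] + [0.0] * (k - 3)
data = random.choices(range(k), true_p, k=n)
est = krr_decode([krr_report(x, k, eps) for x in data], k, eps)
print([round(v, 3) for v in est[:5]])      # roughly [0.5, 0.3, 0.2, 0, 0]
```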
The von Neumann graph entropy (VNGE) can be used as a measure of graph complexity, and as a measure of information divergence and distance between graphs. However, computing the VNGE is computationally demanding for large-scale graphs. We propose novel quadratic approximations for fast computation of the VNGE, and establish various inequalities bounding the error between the quadratic approximations and the exact VNGE. Our methods reduce the cubic complexity of VNGE computation to linear complexity. Computational simulations on random graph models and various real network datasets demonstrate superior performance.
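One generic route to a quadratic approximation is to replace $-x\ln x$ by $x(1-x)$, which turns the entropy into a function of $\mathrm{tr}(L)$ and $\mathrm{tr}(L^2)$ alone; for a simple graph these equal $2m$ and $\sum_i d_i^2 + 2m$, giving linear-time evaluation. The sketch below shows this surrogate next to the exact eigenvalue computation; it is one standard quadratic surrogate under that substitution, not necessarily the authors' estimator or their error bounds.

```python
import numpy as np

def vnge_exact(L):
    """Exact von Neumann graph entropy: eigenvalues of L / tr(L),
    then -sum mu_i ln mu_i (cubic cost in the number of nodes)."""
    mu = np.linalg.eigvalsh(L / np.trace(L))
    mu = mu[mu > 1e-12]                    # convention: 0 * ln 0 = 0
    return -np.sum(mu * np.log(mu))

def vnge_quadratic(degrees, m):
    """Linear-time quadratic surrogate Q = 1 - tr(L^2) / tr(L)^2, using
    tr(L) = 2m and tr(L^2) = sum(d_i^2) + 2m for a simple graph. Q tracks
    the entropy only up to scale; refined estimators rescale it."""
    d = np.asarray(degrees, dtype=float)
    return 1.0 - (np.sum(d**2) + 2*m) / (2*m)**2

# usage: random graph, Laplacian L = D - A
rng = np.random.default_rng(0)
A = np.triu(rng.random((60, 60)) < 0.1, 1).astype(float)
A = A + A.T
d = A.sum(axis=1)
m = int(d.sum()) // 2
L = np.diag(d) - A
print(vnge_exact(L), vnge_quadratic(d, m))
```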
This paper makes three contributions to estimating the number of perfect matchings in bipartite graphs. First, we prove that the popular sequential importance sampling algorithm works in polynomial time for dense bipartite graphs. More precisely, our algorithm gives a $(1-\epsilon)$-approximation to the number of perfect matchings of a $\lambda$-dense bipartite graph using $O(n^{\frac{1-2\lambda}{8\lambda}+\epsilon^{-2}})$ samples; here a $\lambda$-dense bipartite graph, with $n$ vertices on each side and $\frac{1}{2}>\lambda>0$, has all degrees greater than $(\lambda+\frac{1}{2})n$. Second, practical applications of the algorithm require many calls to matching algorithms; a novel preprocessing step is provided that yields significant improvements. Third, three applications are given: counting Latin squares, a practical way of computing the greedy algorithm for a card-guessing game with feedback, and stochastic block models. In all three examples, sequential importance sampling makes it possible to treat practical problems of reasonably large size.
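The basic sequential importance sampling estimator is short to state: match the left vertices one at a time, pick an unused neighbor uniformly at random, and weight a completed matching by the product of the choice counts (failed runs score zero). The sketch below is this textbook scheme, without the paper's preprocessing step; the example graph is illustrative.

```python
import random

def sis_perfect_matchings(adj, trials=20_000):
    """Sequential importance sampling estimate of the number of perfect
    matchings of a bipartite graph; adj[u] is the neighbor set of left
    vertex u. Each completed run is weighted by the product of the number
    of available choices, i.e. 1 / (probability of sampling that matching),
    so the average over runs is an unbiased estimate of the count."""
    total = 0.0
    for _ in range(trials):
        used, weight = set(), 1
        for u in range(len(adj)):
            options = [v for v in adj[u] if v not in used]
            if not options:                # dead end: contributes 0
                weight = 0
                break
            used.add(random.choice(options))
            weight *= len(options)
        total += weight
    return total / trials

# usage: K_{5,5} has exactly 5! = 120 perfect matchings
adj = [set(range(5)) for _ in range(5)]
print(sis_perfect_matchings(adj))
```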
A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$. The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix. Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal $k\times k$ submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$, the number of samples needed to bound the sampling error is asymptotically independent of $n$; instead, it suffices to increase the number of samples in linear proportion to $k$ to achieve a desired sampling accuracy.
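Concretely, under a Gaussian model the differential entropy of a subsystem $S$ of size $k$ is $\frac{1}{2}\log\left((2\pi e)^k \det M_S\right)$, so the sampling scheme amounts to drawing uniform $k$-subsets and evaluating log-determinants of principal submatrices. A Python sketch (the covariance construction and sample count are illustrative assumptions):

```python
import numpy as np

def subsystem_entropy_mc(M, k, samples=500, rng=None):
    """Monte Carlo estimate of the mean differential entropy of k-variable
    subsystems of a Gaussian system with covariance M. Each draw takes a
    uniform k-subset S and evaluates 0.5 * log((2 pi e)^k * det(M[S, S]))
    via the numerically stable slogdet. Returns (mean, standard error)."""
    rng = rng or np.random.default_rng()
    ents = []
    for _ in range(samples):
        S = rng.choice(M.shape[0], size=k, replace=False)
        _, logdet = np.linalg.slogdet(M[np.ix_(S, S)])
        ents.append(0.5 * (k * np.log(2 * np.pi * np.e) + logdet))
    ents = np.array(ents)
    return ents.mean(), ents.std(ddof=1) / np.sqrt(samples)

# usage: a well-conditioned 200-variable covariance matrix
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 200))
M = B @ B.T + 200.0 * np.eye(200)          # keeps the condition number bounded
print(subsystem_entropy_mc(M, k=10, rng=rng))
```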

