
Estimating Entropy of Distributions in Constant Space

Added by Sourbh Bhadane
Publication date: 2019
Language: English





We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log(1/\varepsilon)^2}{\varepsilon^3}\right)$ samples and $O(1)$ words of memory, and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k,\varepsilon)=\Theta\left(\frac{k}{\varepsilon\log k}+\frac{\log^2 k}{\varepsilon^2}\right)$, which is sub-linear in the domain size $k$, but the current algorithms that achieve optimal sample complexity also require nearly linear space in $k$. Our algorithm partitions $[0,1]$ into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.
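Since the paper's full interval-based scheme is intricate, here is a minimal constant-space sketch in Python of the identity it builds on, $H(p)=\mathbb{E}_{x\sim p}[-\log p_x]$: repeatedly draw a symbol from the stream, estimate its probability from a fresh block of samples, and average. The `sample` interface and the block sizes are illustrative assumptions; the paper's algorithm refines this naive estimator by bucketing the estimated probabilities into the intervals described above.

```python
import math, random

def stream_entropy_estimate(sample, trials=200, block=5000):
    """Constant-space sketch using H(p) = E_x[-log p_x]: draw a symbol x,
    estimate p_x by its frequency in a fresh block of samples, and average
    -log of that estimate. Only O(1) counters are kept at any time. The
    paper's algorithm instead buckets the p_x-estimates into intervals of
    [0,1] chosen to trade off the bias and variance of each contribution."""
    total = 0.0
    for _ in range(trials):
        x = sample()                       # the symbol whose mass we probe
        hits = sum(sample() == x for _ in range(block))
        p_hat = max(hits, 1) / block       # clip to avoid log(0); adds bias
        total += -math.log(p_hat)
    return total / trials

# usage on a toy power-law-ish distribution over k symbols
k = 100
w = [1.0 / (i + 1) for i in range(k)]
p = [x / sum(w) for x in w]
draw = lambda: random.choices(range(k), p)[0]
print(stream_entropy_estimate(draw))           # estimate (nats)
print(-sum(q * math.log(q) for q in p))        # true H(p) (nats)
```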



Related research

We consider the problem of estimating Shannon's entropy $H$ from discrete data, in cases where the number of possible symbols is unknown or even countably infinite. The Pitman-Yor process, a generalization of the Dirichlet process, provides a tractable prior distribution over the space of countably infinite discrete distributions, and has found major applications in Bayesian non-parametric statistics and machine learning. Here we show that it also provides a natural family of priors for Bayesian entropy estimation, due to the fact that moments of the induced posterior distribution over $H$ can be computed analytically. We derive formulas for the posterior mean (the Bayes least-squares estimate) and variance under Dirichlet and Pitman-Yor process priors. Moreover, we show that a fixed Dirichlet or Pitman-Yor process prior implies a narrow prior distribution over $H$, meaning the prior strongly determines the entropy estimate in the under-sampled regime. We derive a family of continuous mixing measures such that the resulting mixture of Pitman-Yor processes produces an approximately flat prior over $H$. We show that the resulting Pitman-Yor Mixture (PYM) entropy estimator is consistent for a large class of distributions. We explore the theoretical properties of the resulting estimator, and show that it performs well both in simulation and in application to real data.
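For reference, the fixed-Dirichlet case that the abstract contrasts against has a fully analytic posterior mean. Below is a short Python sketch of that classical closed form (the function name and toy counts are illustrative; the paper's PYM estimator generalizes this to Pitman-Yor priors and mixes over them). It also illustrates the abstract's point that the prior strongly determines the estimate when data are scarce.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_entropy_posterior_mean(counts, alpha=1.0):
    """Posterior mean of Shannon entropy H(p), in nats, under a symmetric
    Dirichlet(alpha) prior given observed symbol counts. Uses the closed
    form E[H | n] = psi(A + 1) - sum_i (a_i / A) * psi(a_i + 1),
    where a_i = n_i + alpha and A = sum_i a_i."""
    a = np.asarray(counts, dtype=float) + alpha
    A = a.sum()
    return digamma(A + 1.0) - np.sum((a / A) * digamma(a + 1.0))

# usage: heavily under-sampled data from a 1000-symbol alphabet;
# note how strongly the answer depends on the concentration alpha
counts = np.zeros(1000)
counts[:20] = 3                            # only 20 distinct symbols seen
for alpha in (0.01, 0.1, 1.0):
    print(alpha, dirichlet_entropy_posterior_mean(counts, alpha))
```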
We consider the problem of estimating sparse discrete distributions under local differential privacy (LDP) and communication constraints. We characterize the sample complexity of sparse estimation under LDP constraints up to a constant factor, and under communication constraints up to a logarithmic factor. Our upper bounds under LDP are based on the Hadamard Response, a private-coin scheme that requires only one bit of communication per user. Under communication constraints, we propose public-coin schemes based on random hashing functions. Our tight lower bounds are based on the recently proposed method of chi-squared contractions.
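Hadamard Response itself takes some machinery to state; as a simpler point of reference, the sketch below implements classical k-ary randomized response with an unbiased decoder. It is also an $\varepsilon$-LDP private-coin scheme, but it is not the scheme analyzed in the paper, and all names and parameters here are illustrative.

```python
import math, random
from collections import Counter

def krr_report(x, k, eps):
    """k-ary randomized response: report the true symbol with probability
    e^eps / (e^eps + k - 1), otherwise a uniform other symbol (eps-LDP)."""
    if random.random() < math.exp(eps) / (math.exp(eps) + k - 1):
        return x
    y = random.randrange(k - 1)
    return y if y < x else y + 1           # uniform over [k] minus {x}

def krr_decode(reports, k, eps):
    """Unbias the empirical frequencies of the noisy reports."""
    n, q, e = len(reports), Counter(reports), math.exp(eps)
    return [((q[j] / n) * (e + k - 1) - 1) / (e - 1) for j in range(k)]

# usage: a sparse distribution with mass on 3 of 50 symbols
k, eps, n = 50, 2.0, 100_000
true_p = [0.5, 0.3, 0.2] + [0.0] * (k - 3)
data = random.choices(range(k), true_p, k=n)
est = krr_decode([krr_report(x, k, eps) for x in data], k, eps)
print([round(v, 3) for v in est[:5]])      # roughly [0.5, 0.3, 0.2, 0, 0]
```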
The von Neumann graph entropy (VNGE) can be used as a measure of graph complexity, and as a measure of information divergence and distance between graphs. However, computing the VNGE is computationally demanding for large-scale graphs. We propose novel quadratic approximations for fast computation of the VNGE, and establish various inequalities bounding the error between the quadratic approximations and the exact VNGE. Our methods reduce the cubic complexity of VNGE computation to linear complexity. Computational simulations on random graph models and various real network datasets demonstrate superior performance.
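One generic route to a quadratic approximation is to replace $-x\ln x$ by $x(1-x)$, which turns the entropy into a function of $\mathrm{tr}(L)$ and $\mathrm{tr}(L^2)$ alone; for a simple graph these equal $2m$ and $\sum_i d_i^2 + 2m$, giving linear-time evaluation. The sketch below shows this surrogate next to the exact eigenvalue computation; it is one standard quadratic surrogate under that substitution, not necessarily the authors' estimator or their error bounds.

```python
import numpy as np

def vnge_exact(L):
    """Exact von Neumann graph entropy: eigenvalues of L / tr(L),
    then -sum mu_i ln mu_i (cubic cost in the number of nodes)."""
    mu = np.linalg.eigvalsh(L / np.trace(L))
    mu = mu[mu > 1e-12]                    # convention: 0 * ln 0 = 0
    return -np.sum(mu * np.log(mu))

def vnge_quadratic(degrees, m):
    """Linear-time quadratic surrogate Q = 1 - tr(L^2) / tr(L)^2, using
    tr(L) = 2m and tr(L^2) = sum(d_i^2) + 2m for a simple graph. Q tracks
    the entropy only up to scale; refined estimators rescale it."""
    d = np.asarray(degrees, dtype=float)
    return 1.0 - (np.sum(d**2) + 2*m) / (2*m)**2

# usage: random graph, Laplacian L = D - A
rng = np.random.default_rng(0)
A = np.triu(rng.random((60, 60)) < 0.1, 1).astype(float)
A = A + A.T
d = A.sum(axis=1)
m = int(d.sum()) // 2
L = np.diag(d) - A
print(vnge_exact(L), vnge_quadratic(d, m))
```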
This paper makes three contributions to estimating the number of perfect matchings in bipartite graphs. First, we prove that the popular sequential importance sampling algorithm works in polynomial time for dense bipartite graphs. More precisely, our algorithm gives a $(1-\epsilon)$-approximation to the number of perfect matchings of a $\lambda$-dense bipartite graph using $O(n^{\frac{1-2\lambda}{8\lambda}+\epsilon^{-2}})$ samples; here a $\lambda$-dense bipartite graph, with $n$ vertices on each side and $\frac{1}{2}>\lambda>0$, has all degrees greater than $(\lambda+\frac{1}{2})n$. Second, practical applications of the algorithm require many calls to matching algorithms; a novel preprocessing step is provided that yields significant improvements. Third, three applications are given: counting Latin squares, a practical way of computing the greedy algorithm for a card-guessing game with feedback, and stochastic block models. In all three examples, sequential importance sampling makes it possible to treat practical problems of reasonably large size.
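The basic sequential importance sampling estimator is short to state: match the left vertices one at a time, pick an unused neighbor uniformly at random, and weight a completed matching by the product of the choice counts (failed runs score zero). The sketch below is this textbook scheme, without the paper's preprocessing step; the example graph is illustrative.

```python
import random

def sis_perfect_matchings(adj, trials=20_000):
    """Sequential importance sampling estimate of the number of perfect
    matchings of a bipartite graph; adj[u] is the neighbor set of left
    vertex u. Each completed run is weighted by the product of the number
    of available choices, i.e. 1 / (probability of sampling that matching),
    so the average over runs is an unbiased estimate of the count."""
    total = 0.0
    for _ in range(trials):
        used, weight = set(), 1
        for u in range(len(adj)):
            options = [v for v in adj[u] if v not in used]
            if not options:                # dead end: contributes 0
                weight = 0
                break
            used.add(random.choice(options))
            weight *= len(options)
        total += weight
    return total / trials

# usage: K_{5,5} has exactly 5! = 120 perfect matchings
adj = [set(range(5)) for _ in range(5)]
print(sis_perfect_matchings(adj))
```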
A common task in physics, information theory, and other fields is the analysis of properties of subsystems of a given system. Given the covariance matrix $M$ of a system of $n$ coupled variables, the covariance matrices of the subsystems are principal submatrices of $M$. The rapid growth with $n$ of the set of principal submatrices makes it impractical to exhaustively study each submatrix for even modestly sized systems. It is therefore of great interest to derive methods for approximating the distributions of important submatrix properties for a given matrix. Motivated by the importance of differential entropy as a systemic measure of disorder, we study the distribution of log-determinants of principal $k\times k$ submatrices when the covariance matrix has bounded condition number. We derive upper bounds for the right tail and the variance of the distribution of minors, and we use these in turn to derive upper bounds on the standard error of the sample mean of subsystem entropy. Our results demonstrate that, despite the rapid growth of the set of subsystems with $n$, the number of samples needed to bound the sampling error is asymptotically independent of $n$; instead, it suffices to increase the number of samples in linear proportion to $k$ to achieve a desired sampling accuracy.
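Concretely, under a Gaussian model the differential entropy of a subsystem $S$ of size $k$ is $\frac{1}{2}\log\left((2\pi e)^k \det M_S\right)$, so the sampling scheme amounts to drawing uniform $k$-subsets and evaluating log-determinants of principal submatrices. A Python sketch (the covariance construction and sample count are illustrative assumptions):

```python
import numpy as np

def subsystem_entropy_mc(M, k, samples=500, rng=None):
    """Monte Carlo estimate of the mean differential entropy of k-variable
    subsystems of a Gaussian system with covariance M. Each draw takes a
    uniform k-subset S and evaluates 0.5 * log((2 pi e)^k * det(M[S, S]))
    via the numerically stable slogdet. Returns (mean, standard error)."""
    rng = rng or np.random.default_rng()
    ents = []
    for _ in range(samples):
        S = rng.choice(M.shape[0], size=k, replace=False)
        _, logdet = np.linalg.slogdet(M[np.ix_(S, S)])
        ents.append(0.5 * (k * np.log(2 * np.pi * np.e) + logdet))
    ents = np.array(ents)
    return ents.mean(), ents.std(ddof=1) / np.sqrt(samples)

# usage: a well-conditioned 200-variable covariance matrix
rng = np.random.default_rng(1)
B = rng.normal(size=(200, 200))
M = B @ B.T + 200.0 * np.eye(200)          # keeps the condition number bounded
print(subsystem_entropy_mc(M, k=10, rng=rng))
```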

