We consider the problem of approximating the empirical Shannon entropy of a high-frequency data stream under the relaxed strict-turnstile model, when space limitations make exact computation infeasible. A closely related measure is the Rényi entropy, which depends on a constant α and can be estimated efficiently and without bias from a low-dimensional synopsis, called an α-stable data sketch, via the method of compressed counting. An approximation to the Shannon entropy can be obtained from the Rényi entropy by taking α sufficiently close to 1. However, practical guidelines for parameter calibration with respect to α are lacking. We avoid this problem by showing that the random variables used in estimating the Rényi entropy can be transformed to have a proper distributional limit as α approaches 1: the maximally skewed, strictly stable distribution with α = 1, supported on the entire real line. We propose a family of asymptotically unbiased log-mean estimators of the Shannon entropy, indexed by a constant ζ > 0, that can be computed by a single-pass algorithm to provide an additive approximation. We recommend the log-mean estimator with ζ = 1, which has exponentially decreasing tail bounds on the error probability, asymptotic relative efficiency of 0.932, and near-optimal computational complexity.
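The α → 1 limit underlying this construction can be checked directly. The toy sketch below is an illustration only, not the paper's single-pass estimator: it computes the α-th moment exactly from the full frequency vector, whereas in the streaming setting that moment would be estimated from an α-stable sketch.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha: log(sum p_i^alpha) / (1 - alpha)."""
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def shannon_entropy(p):
    """Empirical Shannon entropy: -sum p_i log p_i."""
    return -np.sum(p * np.log(p))

# Hypothetical stream frequencies; a streaming algorithm would summarise
# these with an alpha-stable sketch rather than hold them exactly.
rng = np.random.default_rng(0)
p = rng.integers(1, 1000, size=50).astype(float)
p /= p.sum()

print("Shannon:", shannon_entropy(p))
for alpha in (1.5, 1.1, 1.01, 1.001):
    print(f"Renyi (alpha={alpha}):", renyi_entropy(p, alpha))
```

As α decreases toward 1, the printed Rényi entropies converge to the Shannon value, which is exactly the regime the paper's transformed estimators are designed to reach stably.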
In this note we provide explicit expressions and expansions for a special function which appears in nonparametric estimation of log-densities. This function returns the integral of a log-linear function on a simplex of arbitrary dimension.
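The note's exact special function is not reproduced here, but the classical identity it relates to is easy to state: by the Hermite–Genocchi formula, the integral of exp(⟨a, x⟩) over the standard simplex {x ≥ 0, Σ x_i ≤ 1} in n dimensions equals the divided difference of exp at the nodes {0, a_1, ..., a_n}. The sketch below, which assumes distinct nonzero coefficients a_i, evaluates that identity and checks it by Monte Carlo.

```python
import numpy as np
from math import exp, factorial

def divided_difference(f, nodes):
    """Newton divided difference f[x_0, ..., x_n] for distinct nodes."""
    vals = [f(x) for x in nodes]
    for level in range(1, len(nodes)):
        for i in range(len(nodes) - level):
            vals[i] = (vals[i + 1] - vals[i]) / (nodes[i + level] - nodes[i])
    return vals[0]

def simplex_loglinear_integral(a):
    """Integral of exp(<a, x>) over the standard simplex in R^len(a),
    via the Hermite-Genocchi identity (distinct nonzero a_i assumed)."""
    return divided_difference(exp, [0.0] + list(a))

# Monte Carlo check: uniform points on the simplex via a flat Dirichlet;
# the simplex has volume 1/n!, hence the factorial correction.
a = [0.7, -1.3, 2.1]
rng = np.random.default_rng(1)
x = rng.dirichlet(np.ones(len(a) + 1), size=200_000)[:, :len(a)]
mc = np.mean(np.exp(x @ np.array(a))) / factorial(len(a))
print(simplex_loglinear_integral(a), mc)
```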
Recently a new algorithm for sampling posteriors of unnormalised probability densities, called ABC Shadow, was proposed in [8]. This talk introduces a global optimisation procedure based on the ABC Shadow simulation dynamics.
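The code below is not the ABC Shadow dynamics of [8]; it is a generic simulated-annealing skeleton included only to illustrate the broader idea of turning a sampling dynamics into a global optimiser: sample from the target at a decreasing temperature and keep the best state visited. Every name and tuning constant here is an illustrative assumption.

```python
import numpy as np

def anneal_optimise(log_target, theta0, n_iter=5000, step=0.3,
                    t0=1.0, t_min=1e-3, rng=None):
    """Generic annealed Metropolis optimiser: sample from
    exp(log_target / T) while geometrically cooling T, track the best
    state. A stand-in for any optimisation-by-simulation dynamics."""
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    lp = log_target(theta)
    best, best_lp = theta.copy(), lp
    for i in range(n_iter):
        t = max(t0 * (t_min / t0) ** (i / n_iter), t_min)
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_target(prop)
        if np.log(rng.uniform()) < (lp_prop - lp) / t:
            theta, lp = prop, lp_prop
            if lp > best_lp:
                best, best_lp = theta.copy(), lp
    return best, best_lp

# Toy usage: maximise an unnormalised log-density with mode at (2, 2).
print(anneal_optimise(lambda th: -np.sum((th - 2.0) ** 2), np.zeros(2)))
```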
Mixture models are regularly used in density estimation applications, but the problem of estimating the mixing distribution remains a challenge. Nonparametric maximum likelihood produces estimates of the mixing distribution that are discrete.
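A minimal sketch of the standard grid-based EM route to the nonparametric MLE, assuming a Gaussian location mixture with unit variance; the grid, variance, and iteration count are illustrative choices, not the method proposed in the abstract. The discreteness of the NPMLE shows up as most grid weights collapsing to (near) zero.

```python
import numpy as np
from scipy.stats import norm

def npmle_em(x, grid, n_iter=500):
    """Grid-based EM for the NPMLE of a mixing distribution in a
    Gaussian location mixture with unit variance. Returns the fitted
    weights on the grid points."""
    like = norm.pdf(x[:, None] - grid[None, :])   # n x m likelihood matrix
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        post = like * w
        post /= post.sum(axis=1, keepdims=True)   # E-step: responsibilities
        w = post.mean(axis=0)                     # M-step: reweight the grid
    return w

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 200)])
grid = np.linspace(-5, 5, 101)
w = npmle_em(x, grid)
print(grid[w > 1e-3], w[w > 1e-3])  # support concentrates near -2 and 2
```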
Principal component analysis (PCA) is fundamental to statistical machine learning. It extracts the latent principal factors that capture the most variation in the data. When data are stored across multiple machines, however, communication cost can become a bottleneck.
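To make the communication trade-off concrete, here is a naive one-shot baseline, not the abstract's proposal: each machine ships only its d × d second-moment matrix (O(d²) communication, independent of local sample size), and a coordinator aggregates and eigendecomposes. For simplicity the sketch centres the data globally before splitting; in practice the global mean needs one extra communication round.

```python
import numpy as np

def local_moment(block):
    """Second-moment matrix and row count for one machine's block."""
    return block.T @ block, block.shape[0]

def distributed_pca(blocks, k):
    """One-shot distributed PCA: aggregate local d x d moment matrices,
    then eigendecompose the pooled covariance on the coordinator."""
    stats = [local_moment(b) for b in blocks]
    n = sum(count for _, count in stats)
    cov = sum(moment for moment, _ in stats) / n
    vals, vecs = np.linalg.eigh(cov)              # ascending order
    return vecs[:, ::-1][:, :k], vals[::-1][:k]

rng = np.random.default_rng(0)
data = rng.standard_normal((3000, 10)) @ rng.standard_normal((10, 10))
data -= data.mean(axis=0)                         # global centering
blocks = np.array_split(data, 4)                  # four "machines"
components, variances = distributed_pca(blocks, k=3)
print(variances)
```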
We consider a Bayesian hierarchical version of the normal theory general linear model which is practically relevant in the sense that it is general enough to have many applications and it is not straightforward to sample directly from the corresponding posterior distribution.
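When direct sampling from the joint posterior is intractable, the usual remedy is to draw from the full conditionals instead. The abstract's model is more general than the toy below; this two-block Gibbs sampler for a one-way hierarchical normal model with known variances only illustrates that conditional-sampling idea.

```python
import numpy as np

def gibbs_one_way(y, sigma2=1.0, tau2=1.0, s2=100.0, n_iter=2000, rng=None):
    """Two-block Gibbs sampler for the toy hierarchy
    y[i, j] ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2), mu ~ N(0, s2),
    with all variances treated as known for simplicity."""
    rng = rng or np.random.default_rng()
    m, n = y.shape
    ybar = y.mean(axis=1)
    theta, mu = ybar.copy(), ybar.mean()
    draws = np.empty((n_iter, m + 1))
    for t in range(n_iter):
        # theta_i | mu, y: conjugate normal update
        prec = n / sigma2 + 1.0 / tau2
        mean = (n * ybar / sigma2 + mu / tau2) / prec
        theta = mean + rng.standard_normal(m) / np.sqrt(prec)
        # mu | theta: conjugate normal update
        prec_mu = m / tau2 + 1.0 / s2
        mu = theta.sum() / tau2 / prec_mu + rng.standard_normal() / np.sqrt(prec_mu)
        draws[t] = np.append(theta, mu)
    return draws

rng = np.random.default_rng(0)
y = rng.normal(loc=[[0.0], [1.0], [2.0]], scale=1.0, size=(3, 50))
draws = gibbs_one_way(y, rng=rng)
print(draws[1000:].mean(axis=0))  # posterior means after burn-in
```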