Extrapolating the profile of a finite population

Added by: Yury Polyanskiy

Publication date: 2020

Fields: Informatics Engineering

Research language: English

Authors: Soham Jana, Yury Polyanskiy, Yihong Wu

Abstract

We study a prototypical problem in empirical Bayes. Namely, consider a population consisting of $k$ individuals, each belonging to one of $k$ types (some types can be empty). Without any structural restrictions, it is impossible to learn the composition of the full population having observed only a small (random) subsample of size $m = o(k)$. Nevertheless, we show that in the sublinear regime $m = \omega(k/\log k)$, it is possible to consistently estimate, in total variation, the profile of the population, defined as the empirical distribution of the sizes of each type, which determines many symmetric properties of the population. We also prove that in the linear regime $m = ck$, for any constant $c$, the optimal rate is $\Theta(1/\log k)$. Our estimator is based on Wolfowitz's minimum distance method, which entails solving a linear program (LP) of size $k$. We show that there is a single infinite-dimensional LP whose value simultaneously characterizes the risk of the minimum distance estimator and certifies its minimax optimality. The sharp convergence rate is obtained by evaluating this LP using complex-analytic techniques.
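To make the definition of the profile concrete, here is a small Python sketch, assuming each individual's type is encoded as an integer in `range(k)`. The helper names `profile` and `subsample` are hypothetical and only illustrate the definitions above; the minimum distance estimator itself (the LP) is not reproduced here.

```python
import random
from collections import Counter


def profile(types, k):
    """Profile of a population of k individuals over k types.

    types: list of type labels in range(k), one label per individual.
    Returns a dict mapping each size j to the fraction of the k types
    that contain exactly j individuals (empty types have size 0).
    """
    sizes = Counter(types)                          # size of each non-empty type
    size_of_type = [sizes.get(t, 0) for t in range(k)]
    counts = Counter(size_of_type)                  # number of types with each size
    return {j: c / k for j, c in sorted(counts.items())}


def subsample(types, m, seed=None):
    """Uniform random subsample of m individuals, drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(types, m)
```

For example, with $k = 6$ and types `[0, 0, 0, 1, 2, 2]`, three types are empty, so the profile puts mass 1/2 on size 0 and 1/6 on each of sizes 1, 2 and 3. Because the $k$ type sizes sum to $k$, every profile has mean 1, and a symmetric property such as the number of distinct types, $k(1 - \pi(0))$ for a profile $\pi$, can be read off from it.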

On the Distribution of an Arbitrary Subset of the Eigenvalues for some Finite Dimensional Random Matrices

Marco Chiani, Alberto Zanella (2020)

We present some new results on the joint distribution of an arbitrary subset of the ordered eigenvalues of complex Wishart, double Wishart, and Gaussian Hermitian random matrices of finite dimensions, using a tensor pseudo-determinant operator. Specifically, we derive compact expressions for the joint probability distribution function of the eigenvalues and the expectation of functions of the eigenvalues, including joint moments, for the case of both ordered and unordered eigenvalues.

Statistics Theory, Information Theory

Biwhitening Reveals the Rank of a Count Matrix

Boris Landa, Thomas T.C.K. Zhang, Yuval Kluger (2021)

Estimating the rank of a corrupted data matrix is an important task in data science, most notably for choosing the number of components in principal component analysis. Significant progress on this task has been made using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, such as Poisson or binomial, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, focusing on a Poisson random matrix with independent entries, we propose a simple procedure termed biwhitening that makes it possible to estimate the rank of the underlying data matrix (i.e., the Poisson parameter matrix) without any prior knowledge of its structure. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson distribution, we extend our biwhitening approach to other discrete distributions, such as the generalized Poisson, binomial, multinomial, and negative binomial. We conduct numerical experiments that corroborate our theoretical findings, and demonstrate our approach on real single-cell RNA sequencing (scRNA-seq) data, where we show that our results agree with a slightly overdispersed generalized Poisson model.
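The scaling-then-thresholding idea can be sketched in Python with NumPy as below. This is a minimal, hedged illustration under stated assumptions (Poisson noise, so the entrywise variance is estimated by the counts themselves; every row and column of the matrix has a positive sum), not the authors' implementation. The function names `sinkhorn_scaling` and `biwhitened_rank`, the iteration cap and the tolerance are hypothetical choices.

```python
import numpy as np


def sinkhorn_scaling(Y, n_iter=1000, tol=1e-10):
    """Sinkhorn-Knopp: find positive x, y such that diag(x) @ Y @ diag(y)
    has every row sum equal to n and every column sum equal to m.
    Assumes the zero pattern of Y admits such a scaling (e.g. Y > 0)."""
    Y = np.asarray(Y, dtype=float)
    m, n = Y.shape
    x, y = np.ones(m), np.ones(n)
    for _ in range(n_iter):
        x = n / (Y @ y)          # make all row sums equal n
        y = m / (Y.T @ x)        # make all column sums equal m
        if np.max(np.abs(x * (Y @ y) - n)) < tol * n:
            break                # row sums still hold after the column step
    return x, y


def biwhitened_rank(Y):
    """Hypothetical rank estimate for an m x n Poisson count matrix Y."""
    Y = np.asarray(Y, dtype=float)
    m, n = Y.shape
    # Assumption: Poisson variance equals the mean, so Y itself serves as an
    # unbiased estimate of the entrywise noise variances to be scaled.
    x, y = sinkhorn_scaling(Y)
    Y_bw = np.sqrt(x)[:, None] * Y * np.sqrt(y)[None, :]
    # Eigenvalues of the normalized sample covariance of the scaled data.
    eigvals = np.linalg.eigvalsh(Y_bw @ Y_bw.T / n)
    mp_upper_edge = (1 + np.sqrt(m / n)) ** 2
    return int(np.sum(eigvals > mp_upper_edge))
```

Row sums of $n$ and column sums of $m$ make the average scaled noise variance equal to 1 in every row and column, which is what lets the standard MP upper edge $(1 + \sqrt{m/n})^2$ serve as the rank threshold.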

Statistics Theory, Information Theory

On Estimation of Finite Population Proportion

Xinjia Chen (2009)

In this paper, we study the classical problem of estimating the proportion of a finite population. First, we consider a fixed-sample-size method and derive an explicit sample size formula that ensures a mixed criterion of absolute and relative errors. Second, we consider an inverse sampling scheme in which sampling continues until the number of units having a certain attribute reaches a threshold value or the whole population has been examined. We establish a simple method to determine the threshold so that a prescribed relative precision is guaranteed. Finally, we develop a multistage sampling scheme for constructing a fixed-width confidence interval for the proportion of a finite population. Computational techniques are introduced that allow the fixed-width confidence interval to guarantee a prescribed coverage probability.
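The inverse sampling rule described above can be sketched as follows, assuming the population is given as a non-empty list of 0/1 indicators and the threshold has already been chosen. The function name is hypothetical, and the returned ratio `hits / n` is only an illustrative estimate; the paper's rule for choosing the threshold and its estimator are not reproduced.

```python
import random


def inverse_sample(population, threshold, seed=None):
    """Draw units without replacement until `threshold` units with the
    attribute (value 1) have been seen, or the population is exhausted.

    population: non-empty list of 0/1 values (1 = unit has the attribute).
    Returns (hits, sample_size, estimate).
    """
    rng = random.Random(seed)
    order = list(population)
    rng.shuffle(order)        # a random draw order = sampling without replacement
    hits = 0
    n = 0
    for unit in order:
        n += 1
        hits += unit
        if hits == threshold:
            break
    # If the whole population was examined, hits / n is the exact proportion.
    # Otherwise it is a simple ratio estimate (an illustrative assumption).
    return hits, n, hits / n
```

When the threshold is never reached, the scheme examines every unit and returns the true proportion.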

Statistics Theory, Probability, Applications

A generalized Lieb's theorem and its applications to spectrum estimates for a sum of random matrices

De Huang (2018)

In this paper we prove the concavity of the $k$-trace functions $A \mapsto (\text{Tr}_k[\exp(H + \ln A)])^{1/k}$ on the convex cone of all positive definite matrices, where $\text{Tr}_k[A]$ denotes the $k$-th elementary symmetric polynomial of the eigenvalues of $A$. As an application, we use the concavity of these $k$-trace functions to derive tail bounds and expectation estimates on the sum of the $k$ largest (or smallest) eigenvalues of a sum of random matrices.
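As a small numerical illustration of these definitions (not of the proof), the following NumPy sketch computes $\text{Tr}_k$ from the eigenvalues via the standard recurrence for elementary symmetric polynomials and evaluates the $k$-trace function. Function names are hypothetical, and $H$ is assumed Hermitian (real symmetric here), since matrix functions are computed through an eigendecomposition.

```python
import numpy as np


def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial e_k(values) via the usual recurrence."""
    e = [1.0] + [0.0] * k
    for v in values:
        # Update from high to low degree so each value enters a term at most once.
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return e[k]


def trace_k(A, k):
    """Tr_k[A]: e_k of the eigenvalues of a Hermitian matrix A."""
    return elementary_symmetric(np.linalg.eigvalsh(A), k)


def sym_fun(A, f):
    """Apply a scalar function f to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * f(w)) @ V.conj().T


def k_trace_function(A, H, k):
    """(Tr_k[exp(H + ln A)])^(1/k) for positive definite A and Hermitian H."""
    M = H + sym_fun(A, np.log)
    return trace_k(sym_fun(M, np.exp), k) ** (1.0 / k)
```

Sanity checks: $\text{Tr}_1$ is the ordinary trace, $\text{Tr}_n$ of an $n \times n$ matrix is its determinant, and with $H = 0$ and $k = 1$ the function reduces to $\text{Tr}[A]$.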

Statistics Theory, Information Theory

A simple randomized algorithm for sequential prediction of ergodic time series

L. Gyorfi, G. Lugosi, G. Morvai (2008)

We present a simple randomized procedure for the prediction of a binary sequence. The algorithm uses ideas from recent developments in the theory of prediction of individual sequences. We show that if the sequence is a realization of a stationary and ergodic random process, then the average number of mistakes converges almost surely to that of the optimum, given by the Bayes predictor. The desirable finite-sample properties of the predictor are illustrated by its performance for Markov processes, where the predictor exhibits near-optimal behavior even without knowing the order of the Markov process. Prediction with side information is also considered.
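A simplified sketch of this kind of randomized forecaster follows. It assumes a finite pool of Markov experts of orders 0 to `max_order`, each predicting by majority vote over past continuations of its current context, and a fixed learning rate `eta`. These are illustrative assumptions; the paper's expert class and parameter choices may differ. At each step the forecaster predicts 1 with probability equal to the exponentially weighted average of the experts' predictions.

```python
import math
import random


def markov_expert(history, order):
    """Predict the next bit by majority vote over past continuations of the
    last `order` bits; ties and unseen contexts default to 0."""
    if len(history) < order:
        return 0
    context = tuple(history[len(history) - order:])
    ones = zeros = 0
    for i in range(order, len(history)):
        if tuple(history[i - order:i]) == context:
            if history[i] == 1:
                ones += 1
            else:
                zeros += 1
    return 1 if ones > zeros else 0


def predict_sequence(bits, max_order=3, eta=0.5, seed=0):
    """Randomized exponentially weighted forecaster; returns the mistake count."""
    rng = random.Random(seed)
    losses = [0.0] * (max_order + 1)   # cumulative mistakes of each expert
    history = []
    mistakes = 0
    for bit in bits:
        preds = [markov_expert(history, k) for k in range(max_order + 1)]
        weights = [math.exp(-eta * loss) for loss in losses]
        p_one = sum(w * p for w, p in zip(weights, preds)) / sum(weights)
        guess = 1 if rng.random() < p_one else 0   # randomized prediction
        mistakes += guess != bit
        for k, p in enumerate(preds):
            losses[k] += p != bit
        history.append(bit)
    return mistakes
```

On a periodic sequence such as `[0, 1] * 100`, the order-0 expert keeps erring, so its weight decays and the higher-order experts, which learn the pattern after a few steps, dominate the prediction.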

Statistics Theory, Information Theory
