Sample complexity of the distinct elements problem

Added by Pengkun Yang

Publication date: 2016

Field: Mathematical Statistics

Language: English

Authors: Yihong Wu, Pengkun Yang

Statistics Theory

We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacement. Based on discrete polynomial approximation and interpolation, we propose an estimator with an additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for accurate estimation. The result also applies to sampling without replacement, provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.
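To make the sampling model above concrete, here is a minimal sketch of the problem setup only, not the authors' polynomial-approximation estimator. It draws $n$ balls with replacement from an urn of $k$ colored balls, computes the sample fingerprint (how many colors appear exactly $j$ times), and evaluates the naive plug-in estimate (the number of distinct colors seen). The urn contents and the function names `draw_with_replacement`, `fingerprint` and `naive_estimate` are illustrative assumptions.

```python
import random
from collections import Counter

def draw_with_replacement(urn, n, seed=None):
    """Draw n balls uniformly at random, with replacement, from urn (a list of colors)."""
    rng = random.Random(seed)
    return [rng.choice(urn) for _ in range(n)]

def fingerprint(sample):
    """Map j -> number of colors observed exactly j times in the sample."""
    counts = Counter(sample)  # color -> multiplicity in the sample
    return dict(Counter(counts.values()))

def naive_estimate(sample):
    """Plug-in estimate: number of distinct colors seen.
    It can only undercount the true number of colors in the urn."""
    return len(set(sample))

# Hypothetical urn: k = 1000 balls carrying 100 distinct colors.
urn = [i % 100 for i in range(1000)]
sample = draw_with_replacement(urn, n=50, seed=1)
print(naive_estimate(sample), fingerprint(sample))
```

The fingerprint discards color labels and keeps only multiplicity counts; the naive estimate is exactly the sum of the fingerprint values, which is why it misses every color that was never drawn.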

Related research

On the Behrens--Fisher problem: A globally convergent algorithm and a finite-sample study of the Wald, LR and LM Tests

Alexandre Belloni, Gustavo Didier (2008)

In this paper we provide a provably convergent algorithm for the multivariate Gaussian maximum likelihood version of the Behrens--Fisher problem. Our work builds upon a formulation of the log-likelihood function proposed by Buot and Richards [BR]. Instead of focusing on the first-order optimality conditions, the algorithm aims directly at maximizing the log-likelihood function itself to achieve a global solution. A convergence proof and complexity estimates are provided for the algorithm. Computational experiments illustrate the applicability of such methods to high-dimensional data. We also discuss how to extend the proposed methodology to a broader class of problems. We establish a systematic algebraic relation between the Wald, Likelihood Ratio and Lagrange Multiplier tests ($W \geq \mathit{LR} \geq \mathit{LM}$) in the context of the Behrens--Fisher problem. Moreover, we use our algorithm to computationally investigate the finite-sample size and power of the Wald, Likelihood Ratio and Lagrange Multiplier tests, which previously were only available through asymptotic results. The methods developed here are applicable to much higher-dimensional settings than those available in the literature. This allows us to better capture the role of high dimensionality on the actual size and power of the tests for finite samples.

Statistics Theory
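Of the three tests compared above, only the Wald statistic needs nothing beyond the unrestricted MLEs; the LR and LM statistics need the restricted MLE under a common mean, which is what the iterative algorithm addresses. A minimal sketch of the textbook Wald statistic for $H_0: \mu_1 = \mu_2$ with unequal covariances, assuming Gaussian MLE covariances (divisor $n_i$); the function name and normalization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def wald_statistic(x1, x2):
    """Wald statistic for H0: mu1 == mu2 with unknown, unequal covariances.
    x1, x2: arrays of shape (n1, p) and (n2, p)."""
    n1, n2 = x1.shape[0], x2.shape[0]
    d = x1.mean(axis=0) - x2.mean(axis=0)
    # Unrestricted Gaussian MLEs of the covariances (divisor n_i, not n_i - 1);
    # atleast_2d keeps the p = 1 case as a 1x1 matrix.
    s1 = np.atleast_2d(np.cov(x1, rowvar=False, bias=True))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False, bias=True))
    v = s1 / n1 + s2 / n2  # estimated covariance of the mean difference d
    return float(d @ np.linalg.solve(v, d))
```

Solving the linear system rather than forming an explicit inverse is the usual numerically safer choice; under $H_0$ the statistic is asymptotically $\chi^2_p$.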

An approximate randomization test for high-dimensional two-sample Behrens-Fisher problem under arbitrary covariances

Rui Wang, Wangli Xu (2021)

This paper is concerned with the problem of comparing the population means of two groups of independent observations. An approximate randomization test procedure based on the test statistic of Chen & Qin (2010) is proposed. The asymptotic behavior of the test statistic as well as the randomized statistic is studied under weak conditions. In our theoretical framework, observations are not assumed to be identically distributed even within groups, no condition on the eigenstructure of the covariance is imposed, and the sample sizes of the two groups are allowed to be unbalanced. Under general conditions, all possible asymptotic distributions of the test statistic are obtained. We derive the asymptotic level and local power of the proposed test procedure. Our theoretical results show that the proposed test procedure can adapt to all possible asymptotic distributions of the test statistic and always has the correct test level asymptotically. The proposed test procedure also has good power behavior. Our numerical experiments show that the proposed test procedure performs favorably compared with several alternative test procedures.

Statistics Theory
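The Chen & Qin (2010) statistic that the procedure above builds on is, in its unstandardized form, an unbiased estimate of $\|\mu_1 - \mu_2\|^2$ that drops the $i = j$ terms within each sample. A minimal sketch of that numerator only; the variance standardization and the authors' randomization step are not reproduced, and the function name is an illustrative assumption.

```python
import numpy as np

def chen_qin_statistic(x1, x2):
    """Unstandardized Chen-Qin-type statistic for samples x1 (n1, p) and x2 (n2, p):
    sum_{i != j} X1i'X1j / (n1 (n1 - 1)) + sum_{i != j} X2i'X2j / (n2 (n2 - 1))
    - 2 sum_{i, j} X1i'X2j / (n1 n2)."""
    n1, n2 = x1.shape[0], x2.shape[0]
    g1 = x1 @ x1.T  # within-sample Gram matrices of inner products
    g2 = x2 @ x2.T
    # Subtract the diagonal (i == j) terms, which would bias the estimate by tr(Sigma)/n.
    within1 = (g1.sum() - np.trace(g1)) / (n1 * (n1 - 1))
    within2 = (g2.sum() - np.trace(g2)) / (n2 * (n2 - 1))
    between = 2.0 * (x1 @ x2.T).sum() / (n1 * n2)
    return float(within1 + within2 - between)
```

Because it only uses inner products between observations, the statistic never inverts a $p \times p$ covariance, which is what makes it usable when $p$ exceeds the sample sizes.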

Sample complexity of hidden subgroup problem

Zekun Ye, Lvzhou Li (2021)

The hidden subgroup problem ($\mathsf{HSP}$) has attracted much attention in quantum computing, since several well-known quantum algorithms, including Shor's algorithm, can be described in a uniform framework as quantum methods addressing different instances of it. One of the central issues about $\mathsf{HSP}$ is to characterize its quantum/classical complexity. For example, from the viewpoint of learning theory, sample complexity is a crucial concept. However, while the quantum sample complexity of the problem has been studied, a full characterization of the classical sample complexity of $\mathsf{HSP}$ seems to be absent, and it is the topic of this paper. $\mathsf{HSP}$ over a finite group is defined as follows: for a finite group $G$ and a finite set $V$, given a function $f: G \to V$ and the promise that for any $x, y \in G$, $f(x) = f(xy)$ iff $y \in H$ for a subgroup $H \in \mathcal{H}$, where $\mathcal{H}$ is a set of candidate subgroups of $G$, the goal is to identify $H$. Our contributions are as follows. For $\mathsf{HSP}$, we give upper and lower bounds on its sample complexity. Furthermore, we apply the result to obtain the sample complexity of some concrete instances of the hidden subgroup problem. In particular, we discuss the generalized Simon's problem ($\mathsf{GSP}$), a special case of $\mathsf{HSP}$, and show that the sample complexity of $\mathsf{GSP}$ is $\Theta\left(\max\left\{k,\sqrt{k\cdot p^{n-k}}\right\}\right)$. Thus we obtain a complete characterization of the sample complexity of $\mathsf{GSP}$.

Computational Complexity
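A standard classical strategy for instances like the one defined above is collision sampling: query $f$ at random points, and whenever $f(x) = f(x')$ with $x \neq x'$, the difference lies in $H$. A minimal sketch for the $p = 2$ case ($G = \mathbb{Z}_2^n$, elements encoded as integers, group operation XOR), offered as an illustration of the problem, not the authors' algorithm or their bounds; the oracle construction and all function names are hypothetical.

```python
import random

def span_gf2(gens):
    """All elements of the GF(2)-span of the integer bit-vectors in gens."""
    elems = {0}
    for g in gens:
        elems |= {e ^ g for e in elems}
    return elems

def make_oracle(hidden_gens):
    """Hypothetical oracle with f(x) == f(y) iff x ^ y lies in H = span(hidden_gens).
    Each coset x + H is labeled by its smallest element."""
    H = span_gf2(hidden_gens)
    return lambda x: min(x ^ h for h in H)

def xor_basis_insert(basis, v):
    """Reduce v against basis (processed in insertion order); append it if independent."""
    for b in basis:
        v = min(v, v ^ b)
    if v:
        basis.append(v)
        return True
    return False

def collision_sampling(f, n, num_samples, seed=0):
    """Query f at uniformly random x in Z_2^n; each collision f(x) == f(x')
    with x != x' reveals the nonzero subgroup element x ^ x'."""
    rng = random.Random(seed)
    seen = {}   # coset label -> first queried point carrying that label
    basis = []  # GF(2) basis of the subgroup elements recovered so far
    for _ in range(num_samples):
        x = rng.randrange(2 ** n)
        label = f(x)
        if label in seen and seen[label] != x:
            xor_basis_insert(basis, seen[label] ^ x)
        seen.setdefault(label, x)
    return basis
```

For example, with `hidden = [0b00000011, 0b10100000]` and `n = 8` there are $2^{n-k} = 64$ cosets, so a few hundred queries yield many collisions and the recovered basis spans $H$. The number of queries needed grows with the number of cosets, consistent in spirit with the $p^{n-k}$ term in the bound above.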

The Generalized Lasso Problem and Uniqueness

Alnur Ali, Ryan J. Tibshirani (2018)

We study uniqueness in the generalized lasso problem, where the penalty is the $\ell_1$ norm of a matrix $D$ times the coefficient vector. We derive a broad result on uniqueness that places weak assumptions on the predictor matrix $X$ and penalty matrix $D$; the implication is that, if $D$ is fixed and its null space is not too large (the dimension of its null space is at most the number of samples), and $X$ and response vector $y$ jointly follow an absolutely continuous distribution, then the generalized lasso problem has a unique solution almost surely, regardless of the number of predictors relative to the number of samples. This effectively generalizes previous uniqueness results for the lasso problem (which corresponds to the special case $D=I$). Further, we extend our study to the case in which the loss is given by the negative log-likelihood from a generalized linear model. In addition to uniqueness results, we derive results on the local stability of generalized lasso solutions that might be of interest in their own right.

Statistics Theory
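To make the generalized lasso objective $\tfrac12\|y - X\beta\|_2^2 + \lambda\|D\beta\|_1$ concrete, here is a minimal sketch of one common solver, ADMM with the split $z = D\beta$. This solver is swapped in for illustration; the paper studies uniqueness, not algorithms. The sketch assumes $X^\top X + \rho D^\top D$ is invertible, a solver-specific condition that is stronger than the paper's assumptions; function names and defaults are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def generalized_lasso_admm(X, y, D, lam, rho=1.0, iters=1000):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||D b||_1 by ADMM with z = D b.
    Assumes X^T X + rho * D^T D is invertible."""
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])  # scaled dual variable
    A = X.T @ X + rho * D.T @ D
    Xty = X.T @ y
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        # b-update: solve (X^T X + rho D^T D) b = X^T y + rho D^T (z - u)
        b = np.linalg.solve(A, Xty + rho * D.T @ (z - u))
        Db = D @ b
        z = soft_threshold(Db + u, lam / rho)  # z-update: prox of the l1 penalty
        u = u + Db - z                         # dual ascent on the constraint D b = z
    return b
```

With $X = D = I$ the problem reduces to soft-thresholding $y$ at level $\lambda$, which gives a quick sanity check; taking $D$ as a first-difference matrix gives the fused lasso.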

The exact distribution of the sample variance from bounded continuous random variables

T. Royen (2008)

For a sample of absolutely bounded i.i.d. random variables with a continuous density, the cumulative distribution function of the sample variance is represented by a univariate integral over a Fourier series. If the density is a polynomial or a trigonometric polynomial, the coefficients of this series are simple finite terms containing only the error function, the exponential function and powers. In more general cases, e.g., for all beta densities, the coefficients are given by series expansions. The method is generalized to positive semi-definite quadratic forms of bounded independent but not necessarily identically distributed random variables if the form matrix differs from a diagonal matrix $D > 0$ only by a matrix of rank 1.

Statistics Theory
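An exact CDF of this kind can be sanity-checked by simulation. A minimal sketch, assuming Uniform(0, 1) data (a constant, hence polynomial, density) and the unbiased sample variance with divisor $n - 1$; it does not implement the Fourier-series representation. For $n = 2$ the sample variance is $(X_1 - X_2)^2/2$, and since $P(|X_1 - X_2| \le a) = 1 - (1 - a)^2$ on $[0, 1]$, the CDF has a closed form to compare against. Function names are hypothetical.

```python
import random
import statistics

def empirical_cdf_sample_variance(n, t, reps=20_000, seed=0):
    """Monte Carlo estimate of P(S^2 <= t) for n i.i.d. Uniform(0, 1) draws,
    where S^2 is the unbiased sample variance (divisor n - 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        if statistics.variance(xs) <= t:
            hits += 1
    return hits / reps

def exact_cdf_n2(t):
    """Closed form for n = 2: S^2 = (X1 - X2)^2 / 2, so P(S^2 <= t) = 1 - (1 - a)^2
    with a = sqrt(2 t), capped at 1."""
    a = min((2 * t) ** 0.5, 1.0)
    return 1 - (1 - a) ** 2
```

At $t = 0.05$ the closed form gives about $0.53$, and the simulated frequency should agree to within Monte Carlo error of order $1/\sqrt{\text{reps}}$.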
