
Sample complexity of the distinct elements problem

Published by: Pengkun Yang
Publication date: 2016
Research field: Mathematical Statistics
Paper language: English





We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacement. Based on discrete polynomial approximation and interpolation, we propose an estimator with an additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator achieving this accuracy can be computed in $O(n)$ time. The result also applies to sampling without replacement provided the sample size is a vanishing fraction of the urn size. One of the key auxiliary results is a sharp bound on the minimum singular values of a real rectangular Vandermonde matrix, which might be of independent interest.
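As a concrete illustration of the sampling model (an urn of $k$ balls, $n$ draws with replacement), the Python sketch below simulates the setup and applies the naive plug-in estimate that simply counts the colors observed in the sample. This is not the polynomial-approximation estimator of the paper; the function names and parameters are illustrative assumptions, and the naive estimate is shown mainly because its downward bias at small $n$ is what motivates more careful estimators.

```python
# A minimal simulation sketch of the distinct elements setting, with the naive
# plug-in estimate (count the colors seen). Not the paper's estimator.
import random

def simulate_urn(k, num_colors, seed=0):
    """Assign each of the k balls one of num_colors colors, uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(num_colors) for _ in range(k)]

def naive_distinct_estimate(urn, n, seed=1):
    """Draw n samples with replacement and count the distinct colors observed.
    This plug-in estimate is biased downward when n is small relative to k."""
    rng = random.Random(seed)
    samples = [urn[rng.randrange(len(urn))] for _ in range(n)]
    return len(set(samples))

if __name__ == "__main__":
    k, num_colors = 10_000, 500
    urn = simulate_urn(k, num_colors)
    print("true number of distinct colors:", len(set(urn)))
    for n in (100, 1_000, 10_000):
        print(f"n = {n:6d}  naive estimate = {naive_distinct_estimate(urn, n)}")
```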




Read also

In this paper we provide a provably convergent algorithm for the multivariate Gaussian Maximum Likelihood version of the Behrens--Fisher Problem. Our work builds upon a formulation of the log-likelihood function proposed by Buot and Richards. Instead of focusing on the first-order optimality conditions, the algorithm aims directly for the maximization of the log-likelihood function itself to achieve a global solution. A convergence proof and complexity estimates are provided for the algorithm. Computational experiments illustrate the applicability of such methods to high-dimensional data. We also discuss how to extend the proposed methodology to a broader class of problems. We establish a systematic algebraic relation between the Wald, Likelihood Ratio and Lagrange Multiplier tests ($W \geq \mathit{LR} \geq \mathit{LM}$) in the context of the Behrens--Fisher Problem. Moreover, we use our algorithm to computationally investigate the finite-sample size and power of the Wald, Likelihood Ratio and Lagrange Multiplier tests, which previously were only available through asymptotic results. The methods developed here are applicable to much higher-dimensional settings than the ones available in the literature. This allows us to better capture the role of high dimensionality on the actual size and power of the tests for finite samples.
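For orientation, the sketch below computes the classical Wald-type statistic for the multivariate Behrens--Fisher setting (two Gaussian samples with unequal covariances), i.e. the kind of quantity $W$ appearing in the ordering $W \geq \mathit{LR} \geq \mathit{LM}$. It is not the paper's log-likelihood maximization algorithm; the helper `wald_statistic` and the simulated data are assumptions made purely for illustration.

```python
# A minimal sketch of the classical Wald statistic for comparing two Gaussian
# mean vectors with unequal covariances (Behrens-Fisher setting).
import numpy as np

def wald_statistic(x, y):
    """x: (n1, p) sample from group 1, y: (n2, p) sample from group 2."""
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Unbiased covariance estimates, each scaled by its own sample size.
    v = np.cov(x, rowvar=False) / n1 + np.cov(y, rowvar=False) / n2
    return float(diff @ np.linalg.solve(v, diff))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, size=(50, 5))   # group 1
    y = rng.normal(0.2, 2.0, size=(80, 5))   # group 2, different mean and scale
    print("Wald statistic:", wald_statistic(x, y))
```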
Rui Wang, Wangli Xu (2021)
This paper is concerned with the problem of comparing the population means of two groups of independent observations. An approximate randomization test procedure based on the test statistic of Chen & Qin (2010) is proposed. The asymptotic behavior of the test statistic as well as the randomized statistic is studied under weak conditions. In our theoretical framework, observations are not assumed to be identically distributed even within groups. No condition on the eigenstructure of the covariance is imposed, and the sample sizes of the two groups are allowed to be unbalanced. Under general conditions, all possible asymptotic distributions of the test statistic are obtained. We derive the asymptotic level and local power of the proposed test procedure. Our theoretical results show that the proposed test procedure can adapt to all possible asymptotic distributions of the test statistic and always has correct test level asymptotically. Also, the proposed test procedure has good power behavior. Our numerical experiments show that the proposed test procedure has favorable performance compared with several alternative test procedures.
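The sketch below illustrates the mechanics of a randomization (permutation) test for comparing two mean vectors: pool the observations, reshuffle the group labels, and recompute the statistic. It uses a simple squared mean-difference statistic rather than the Chen & Qin (2010) statistic studied in the paper, and the function names and sample sizes are illustrative assumptions.

```python
# A minimal randomization-test sketch for comparing two population means.
import numpy as np

def mean_diff_stat(x, y):
    """Squared Euclidean norm of the difference of group means (illustrative statistic)."""
    d = x.mean(axis=0) - y.mean(axis=0)
    return float(d @ d)

def randomization_pvalue(x, y, num_perm=999, seed=0):
    """Permute group labels and compare the reshuffled statistics to the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n1 = x.shape[0]
    observed = mean_diff_stat(x, y)
    count = 0
    for _ in range(num_perm):
        idx = rng.permutation(pooled.shape[0])
        if mean_diff_stat(pooled[idx[:n1]], pooled[idx[n1:]]) >= observed:
            count += 1
    return (count + 1) / (num_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=(30, 100))   # high-dimensional: p > n, unbalanced groups
    y = rng.normal(0.1, 1.0, size=(45, 100))
    print("randomization p-value:", randomization_pvalue(x, y))
```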
Zekun Ye, Lvzhou Li (2021)
The hidden subgroup problem ($\mathsf{HSP}$) has been attracting much attention in quantum computing, since several well-known quantum algorithms, including Shor's algorithm, can be described in a uniform framework as quantum methods to address different instances of it. One of the central issues about $\mathsf{HSP}$ is to characterize its quantum/classical complexity. For example, from the viewpoint of learning theory, sample complexity is a crucial concept. However, while the quantum sample complexity of the problem has been studied, a full characterization of the classical sample complexity of $\mathsf{HSP}$ seems to be absent, which is thus the topic of this paper. $\mathsf{HSP}$ over a finite group is defined as follows: for a finite group $G$ and a finite set $V$, given a function $f: G \to V$ and the promise that for any $x, y \in G$, $f(x) = f(xy)$ iff $y \in H$ for a subgroup $H \in \mathcal{H}$, where $\mathcal{H}$ is a set of candidate subgroups of $G$, the goal is to identify $H$. Our contributions are as follows: we give upper and lower bounds on the sample complexity of $\mathsf{HSP}$, and we apply this result to obtain the sample complexity of some concrete instances of the hidden subgroup problem. In particular, we discuss the generalized Simon's problem ($\mathsf{GSP}$), a special case of $\mathsf{HSP}$, and show that the sample complexity of $\mathsf{GSP}$ is $\Theta\left(\max\left\{k, \sqrt{k \cdot p^{n-k}}\right\}\right)$. Thus we obtain a complete characterization of the sample complexity of $\mathsf{GSP}$.
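To make the promise in the definition concrete, the sketch below implements the standard Simon's problem (the $p = 2$ case related to $\mathsf{GSP}$) classically: it builds a function that is constant exactly on the cosets of the hidden subgroup $\{0, s\}$ and recovers $s$ by a birthday-style collision search. The oracle construction and collision strategy are illustrative assumptions, not the sampling bounds or procedure from the paper.

```python
# A minimal classical sketch of Simon's problem: f(x) = f(y) iff x XOR y is in {0, s}.
import random

def make_simon_oracle(n, s):
    """Build f on {0, ..., 2^n - 1} that is constant exactly on each coset {x, x ^ s}."""
    labels, f = {}, {}
    for x in range(2 ** n):
        rep = min(x, x ^ s)            # canonical representative of the coset {x, x ^ s}
        if rep not in labels:
            labels[rep] = len(labels)  # a distinct label per coset
        f[x] = labels[rep]
    return f

def recover_s_by_collision(f, n, seed=1):
    """Sample random inputs until two distinct inputs share a label; their XOR is s.
    Assumes the hidden shift s is nonzero."""
    rng = random.Random(seed)
    seen = {}
    while True:
        x = rng.randrange(2 ** n)
        y = f[x]
        if y in seen and seen[y] != x:
            return seen[y] ^ x
        seen[y] = x

if __name__ == "__main__":
    n, s = 10, 0b1011001101
    f = make_simon_oracle(n, s)
    print("recovered s:", bin(recover_s_by_collision(f, n)))
```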
We study uniqueness in the generalized lasso problem, where the penalty is the $\ell_1$ norm of a matrix $D$ times the coefficient vector. We derive a broad result on uniqueness that places weak assumptions on the predictor matrix $X$ and penalty matrix $D$; the implication is that, if $D$ is fixed and its null space is not too large (the dimension of its null space is at most the number of samples), and $X$ and the response vector $y$ jointly follow an absolutely continuous distribution, then the generalized lasso problem has a unique solution almost surely, regardless of the number of predictors relative to the number of samples. This effectively generalizes previous uniqueness results for the lasso problem (which corresponds to the special case $D = I$). Further, we extend our study to the case in which the loss is given by the negative log-likelihood from a generalized linear model. In addition to uniqueness results, we derive results on the local stability of generalized lasso solutions that might be of interest in their own right.
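As a usage-level illustration of the generalized lasso objective $\tfrac{1}{2}\lVert y - X\beta\rVert_2^2 + \lambda \lVert D\beta\rVert_1$, the sketch below solves a small instance with a first-difference penalty matrix $D$ (the 1d fused lasso) using the off-the-shelf convex solver cvxpy; the choice of solver, penalty level, and simulated data are assumptions, and none of this reproduces the uniqueness analysis of the paper.

```python
# A minimal sketch of solving a generalized lasso problem with cvxpy.
import cvxpy as cp
import numpy as np

def generalized_lasso(X, y, D, lam):
    """Minimize 0.5 * ||y - X beta||^2 + lam * ||D beta||_1 and return beta."""
    beta = cp.Variable(X.shape[1])
    objective = 0.5 * cp.sum_squares(X @ beta - y) + lam * cp.norm1(D @ beta)
    cp.Problem(cp.Minimize(objective)).solve()
    return beta.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 20
    X = rng.normal(size=(n, p))
    beta_true = np.repeat([0.0, 2.0, -1.0, 0.0], 5)   # piecewise-constant signal
    y = X @ beta_true + rng.normal(scale=0.5, size=n)
    D = np.eye(p, k=1)[:-1] - np.eye(p)[:-1]          # first-difference penalty matrix
    print(np.round(generalized_lasso(X, y, D, lam=5.0), 2))
```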
T. Royen (2008)
For a sample of absolutely bounded i.i.d. random variables with a continuous density, the cumulative distribution function of the sample variance is represented by a univariate integral over a Fourier series. If the density is a polynomial or a trigonometrical polynomial, the coefficients of this series are simple finite terms containing only the error function, the exponential function and powers. In more general cases - e.g. for all beta densities - the coefficients are given by some series expansions. The method is generalized to positive semi-definite quadratic forms of bounded independent but not necessarily identically distributed random variables if the form matrix differs from a diagonal matrix $D > 0$ only by a matrix of rank 1.
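As a simple numerical reference for the quantity being characterized, the sketch below estimates the CDF of the sample variance of bounded i.i.d. draws (here Beta(2, 3), an assumed example density) by Monte Carlo. It does not implement the Fourier-series representation derived in the paper; it only provides values one could compare such a representation against.

```python
# A Monte Carlo sketch of the CDF of the sample variance of bounded i.i.d. draws.
import numpy as np

def sample_variance_cdf(t, n, num_rep=100_000, seed=0):
    """Estimate P(S^2 <= t) for the unbiased sample variance of n Beta(2, 3) draws."""
    rng = np.random.default_rng(seed)
    x = rng.beta(2.0, 3.0, size=(num_rep, n))
    s2 = x.var(axis=1, ddof=1)        # unbiased sample variance per replication
    return float((s2 <= t).mean())

if __name__ == "__main__":
    for t in (0.02, 0.04, 0.06):
        print(f"P(S^2 <= {t}) ~", sample_variance_cdf(t, n=10))
```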