ترغب بنشر مسار تعليمي؟ اضغط هنا

A statistical analysis of probabilistic counting algorithms

556   0   0.0 ( 0 )
 نشر من قبل Ioana Cosma
 تاريخ النشر 2010
  مجال البحث الاحصاء الرياضي
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics, or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.



قيم البحث

اقرأ أيضاً

This article is the rejoinder for the paper Probabilistic Integration: A Role in Statistical Computation? to appear in Statistical Science with discussion. We would first like to thank the reviewers and many of our colleagues who helped shape this pa per, the editor for selecting our paper for discussion, and of course all of the discussants for their thoughtful, insightful and constructive comments. In this rejoinder, we respond to some of the points raised by the discussants and comment further on the fundamental questions underlying the paper: (i) Should Bayesian ideas be used in numerical analysis?, and (ii) If so, what role should such approaches have in statistical computation?
Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this pa per, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that, provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.
The main purpose of this paper is to facilitate the communication between the Analytic, Probabilistic and Algorithmic communities. We present a proof of convergence of the Hamiltonian (Hybrid) Monte Carlo algorithm from the point of view of the D ynamical Systems, where the evolving objects are densities of probability distributions and the tool are derived from the Functional Analysis.
Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edwards design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inf erences about the model given data, and criticize the models fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale.
114 - W.T. Giele 1997
Conventional jet algorithms are based on a deterministic view of the underlying hard scattering process. Each outgoing parton from the hard scattering is associated with a hard, well separated jet. This approach is very successful because it allows q uantitative predictions using lowest order perturbation theory. However, beyond leading order in the coupling constant, when quantum fluctuations are included, deterministic jet algorithms will become problematic precisely because they attempt to describe an inherently stochastic quantum process using deterministic, classical language. This demands a shift in the way we view jet algorithms. We make a first attempt at constructing more probabilistic jet algorithms that reflect the properties of the underlying hard scattering and explore the basic properties and problems of such an approach.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا