Exact PPS Sampling with Bounded Sample Size


Added by Brian Hentschel

Publication date 2021

Fields Mathematical Statistics

Language English

Authors Brian Hentschel - Peter J. Haas - Yuanyuan Tian

Methodology


Probability proportional to size (PPS) sampling schemes with a target sample size aim to produce a sample comprising a specified number $n$ of items while ensuring that each item in the population appears in the sample with a probability proportional to its specified weight (also called its size). These two objectives, however, cannot always be achieved simultaneously. Existing PPS schemes prioritize control of the sample size, violating the PPS property if necessary. We provide a new PPS scheme that allows a different trade-off: our method enforces the PPS property at all times while ensuring that the sample size never exceeds the target value $n$. The sample size is exactly equal to $n$ if possible, and otherwise has maximal expected value and minimal variance. Thus we bound the sample size, thereby avoiding storage overflows and helping to control the time required for analytics over the sample, while allowing the user complete control over the sample contents. The method is both simple to implement and efficient, being a one-pass streaming algorithm with an amortized processing time of $O(1)$ per item.
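The trade-off described in the abstract can be illustrated with a short sketch. It is not the authors' one-pass streaming algorithm; it is an offline illustration that uses classical systematic sampling, and the function names `pps_inclusion_probs` and `systematic_sample` are hypothetical. Under strict PPS the inclusion probability of item $i$ is $\alpha w_i$; keeping every probability at most 1 and the expected sample size at most $n$ forces $\alpha \le \min(n/W, 1/w_{\max})$, where $W$ is the total weight. Systematic sampling then yields a sample whose size is the floor or ceiling of the expected size, so it never exceeds $n$.

```python
import math
import random


def pps_inclusion_probs(weights, n):
    """Strict-PPS inclusion probabilities with expected sample size at most n.

    Hypothetical helper: picks the largest scale factor alpha such that
    alpha * w <= 1 for every weight and alpha * sum(weights) <= n.
    """
    total = sum(weights)
    w_max = max(weights)
    alpha = min(n / total, 1.0 / w_max)
    return [alpha * w for w in weights]


def systematic_sample(probs, rng=random):
    """Classical systematic sampling (offline, not the paper's method).

    Lays the probabilities end to end on a line and selects every item
    whose interval contains one of the points u, u + 1, u + 2, ...
    Item i is selected with probability probs[i], and the sample size
    is the floor or ceiling of sum(probs), up to floating-point rounding.
    """
    u = rng.random()
    sample = []
    cum = 0.0
    for i, p in enumerate(probs):
        lo, hi = cum, cum + p
        # The interval [lo, hi) contains a point u + k exactly when
        # the two ceilings differ.
        if math.ceil(hi - u) > math.ceil(lo - u):
            sample.append(i)
        cum = hi
    return sample


# Case 1: no weight is too large, so a sample of size n = 2 is achievable.
probs_ok = pps_inclusion_probs([1, 2, 3, 4], 2)

# Case 2: one dominant weight; exact PPS caps the expected size at 1.3 < 3.
probs_capped = pps_inclusion_probs([1, 1, 1, 10], 3)

print(probs_ok, systematic_sample(probs_ok))
print(probs_capped, systematic_sample(probs_capped))
```

In the second case a scheme that insisted on exactly three items would have to include the heavy item with probability 1 and give up strict proportionality for the rest; the sketch instead keeps proportionality and accepts a smaller sample.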


Estimating Population Size with Link-Tracing Sampling

Kyle Vincent, Steve Thompson 2012

We present a new design and inference method for estimating the size of a hidden population best reached through a link-tracing design. The strategy involves the Rao-Blackwell Theorem applied to a sufficient statistic markedly different from the usual one that arises in sampling from a finite population. An empirical application is described. The result demonstrates that the strategy can efficiently incorporate adaptively selected members of the sample into the inference procedure.

Methodology

Bayesian Update with Importance Sampling: Required Sample Size

Daniel Sanz-Alonso, Zijian Wang 2020

Importance sampling is used to approximate Bayes' rule in many computational approaches to Bayesian inverse problems, data assimilation and machine learning. This paper reviews and further investigates the required sample size for importance sampling in terms of the $\chi^2$-divergence between target and proposal. We develop general abstract theory and illustrate through numerous examples the roles that dimension, noise-level and other model parameters play in approximating the Bayesian update with importance sampling. Our examples also facilitate a new direct comparison of standard and optimal proposals for particle filtering.

Computation

Sample Size Calculations for SMARTs

Eric J. Rose, Eric B. Laber, Marie Davidian 2019

Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis, with formal evaluation of the estimated optimal regime deferred to a follow-up trial. However, running a follow-up trial to evaluate an estimated optimal treatment regime is costly and time-consuming; furthermore, the estimated optimal regime that is to be evaluated in such a follow-up trial may be far from optimal if the original trial was underpowered for estimation of an optimal regime. We derive sample size procedures for a SMART that ensure: (i) sufficient power for comparing the optimal treatment regime with standard of care; and (ii) the estimated optimal regime is within a given tolerance of the true optimal regime with high probability. We establish asymptotic validity of the proposed procedures and demonstrate their finite sample performance in a series of simulation experiments.

Methodology

Recent Advances on Estimating Population Size with Link-Tracing Sampling

Kyle Vincent 2017

A new approach to estimating population size based on a stratified link-tracing sampling design is presented. The method extends the Frank and Snijders (1994) approach by allowing for heterogeneity in the initial sample selection procedure. Rao-Blackwell estimators and corresponding resampling approximations similar to those detailed in Vincent and Thompson (2017) are explored. An empirical application is provided for a hard-to-reach networked population. The results demonstrate that the approach has much potential for application to such populations. Supplementary materials for this article are available online.

Methodology

Estimating the size and distribution of networked populations with snowball sampling

Kyle Vincent, Steve Thompson 2014

A new strategy is introduced for estimating population size and networked population characteristics. Sample selection is based on a multi-wave snowball sampling design. A generalized stochastic block model is posited for the population's network graph. Inference is based on a Bayesian data augmentation procedure. Applications to empirical and simulated populations are provided. The results demonstrate that statistically efficient estimates of the size and distribution of the population can be achieved.

Methodology