An examination of the generalised pooled binomial distribution and its information properties


Abstract in English

This paper examines the statistical properties of a distribution that arises from pooled testing for the prevalence of a binary outcome. Our base distribution has two parameters, a prevalence parameter and an excess intensity parameter; the latter allows for a dilution or intensification effect in larger pools. We also examine a generalised form of the distribution in which pools have covariate information that affects the prevalence through a linked linear form. We study the generalised pooled binomial distribution in its own right and as a special case of broader binomial GLMs that use the complementary log-log link function. We examine the information function and show the information content of individual sample items. We demonstrate that pooling reduces the information content of individual sample units, and we give simple heuristics for choosing an optimal pool size for testing. We derive the log-likelihood function and its derivatives and give results for maximum likelihood estimation. We also discuss diagnostic testing of the positive-pool probabilities, including testing for intensification or dilution in the testing mechanism. We illustrate the distribution by applying it to pooled testing data on virus prevalence in a mosquito population.
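As a rough illustration of the quantities named above, the Python sketch below assumes one particular parametrisation, which is our assumption and not necessarily the form used in the paper: a pool of n units tests positive with probability 1 - (1 - θ)^(n^λ), so that λ = 1 gives the standard pooled binomial model and λ - 1 acts as an excess intensity. Under this form cloglog(p) = λ log n + cloglog(θ), which is one way to see the connection to complementary log-log binomial GLMs. The information functions treat λ as fixed at 1, and the pool-size criterion (maximising information per test) is an illustrative choice, not the paper's heuristic. All function names are hypothetical.

```python
import math

# ASSUMPTION (not taken from the paper): a pool of n units tests positive with
# probability 1 - (1 - theta) ** (n ** lam); lam = 1 is the standard pooled
# binomial model, and lam != 1 mimics intensification or dilution.
# All function names below are hypothetical.


def positive_prob(theta: float, n: int, lam: float = 1.0) -> float:
    """Probability that a pool of n units tests positive."""
    return 1.0 - (1.0 - theta) ** (n ** lam)


def cloglog(p: float) -> float:
    """Complementary log-log link, log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def log_likelihood(theta: float, lam: float, data) -> float:
    """Binomial log-likelihood, up to an additive constant, for data given as
    (pool_size, pools_tested, pools_positive) triples."""
    total = 0.0
    for n, m, y in data:
        p = positive_prob(theta, n, lam)
        total += y * math.log(p) + (m - y) * math.log(1.0 - p)
    return total


def mle_equal_pools(n: int, m: int, y: int) -> float:
    """Closed-form MLE of theta when lam = 1 and all m pools have size n.
    Requires 0 < y < m so the estimate lies strictly inside (0, 1)."""
    return 1.0 - (1.0 - y / m) ** (1.0 / n)


def info_per_pool(theta: float, n: int) -> float:
    """Fisher information about theta from one pool of size n (lam = 1)."""
    p = positive_prob(theta, n)
    dp_dtheta = n * (1.0 - theta) ** (n - 1)
    return dp_dtheta ** 2 / (p * (1.0 - p))


def info_per_unit(theta: float, n: int) -> float:
    """Information per sample unit; n = 1 recovers 1 / (theta * (1 - theta))."""
    return info_per_pool(theta, n) / n


def best_pool_size(theta: float, candidates) -> int:
    """Illustrative criterion only: the pool size maximising information per test."""
    return max(candidates, key=lambda n: info_per_pool(theta, n))


if __name__ == "__main__":
    # Per-unit information falls as the pool grows ...
    for n in (1, 5, 10, 50):
        print(n, info_per_unit(0.01, n))
    # ... but information per test can still favour a pool larger than one unit.
    print(best_pool_size(0.01, range(1, 201)))
    # The cloglog of the pool probability is linear in log(n) with slope lam.
    print(cloglog(positive_prob(0.03, 8, 0.7)) - cloglog(0.03), 0.7 * math.log(8))
```

Under these assumptions, per-unit information decreases strictly with pool size, while for small θ the information per test rises with pool size up to an interior maximum; any cost model for tests versus sample units would change which pool size is preferred.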
