This paper examines the statistical properties of a distributional form that arises from pooled testing for the prevalence of a binary outcome. Our base distribution is a two-parameter distribution with a prevalence parameter and an excess intensity parameter; the latter is included to allow for a dilution or intensification effect with larger pools. We also examine a generalised form of the distribution where pools have covariate information that affects the prevalence through a linked linear form. We study the general pooled binomial distribution in its own right and as a special case of broader forms of binomial GLMs using the complementary log-log link function. We examine the information function and show the information content of individual sample items. We demonstrate that pooling reduces the information content of sample units and we give simple heuristics for choosing an optimal pool size for testing. We derive the form of the log-likelihood function and its derivatives and give results for maximum likelihood estimation. We also discuss diagnostic testing of the positive pool probabilities, including testing for intensification/dilution in the testing mechanism. We illustrate the use of this distribution by applying it to pooled testing data on virus prevalence in a mosquito population.
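As a rough illustration of why the complementary log-log link arises in pooled testing, the sketch below covers only the simplest case: a perfect test, independent items, equal pool sizes, and no dilution or intensification. The abstract does not give the functional form of the excess intensity parameter, so it is left out here. The function names are hypothetical and are not taken from the paper.

```python
import math

def positive_pool_prob(p, s):
    """Probability that a pool of s independent items tests positive,
    assuming a perfect test and no dilution/intensification effect."""
    return 1.0 - (1.0 - p) ** s

def cloglog(x):
    """Complementary log-log transform: log(-log(1 - x))."""
    return math.log(-math.log(1.0 - x))

def mle_prevalence(positive_pools, total_pools, s):
    """Maximum likelihood estimate of the prevalence when every pool has
    size s (assumption: equal pool sizes, no excess intensity parameter)."""
    frac = positive_pools / total_pools
    return 1.0 - (1.0 - frac) ** (1.0 / s)

# On the cloglog scale the pool size enters as an additive offset log(s):
# cloglog(positive_pool_prob(p, s)) == log(s) + cloglog(p)
p, s = 0.02, 10
lhs = cloglog(positive_pool_prob(p, s))
rhs = math.log(s) + cloglog(p)
```

Under these assumptions the positive pool probability is linear in log(s) on the complementary log-log scale, which is what allows the pooled model to be treated as a binomial GLM with an offset.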
For many decades, statisticians have made attempts to prepare the Bayesian omelette without breaking the Bayesian eggs; that is, to obtain probabilistic likelihood-based inferences without relying on informative prior distributions. A recent example is Murray Aitkin's book, Statistical Inference, which presents an approach to statistical hypothesis testing based on comparisons of posterior distributions of likelihoods under competing models. Aitkin develops and illustrates his method using some simple examples of inference from iid data and two-way tests of independence. In this note we analyze some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.
This article proposes a visualization method for multidimensional data based on: (i) animated functional Hypothetical Outcome Plots (f-HOPs); (ii) a 3-dimensional Kiviat plot; and (iii) data sonification. In an Uncertainty Quantification (UQ) framework, such analysis, coupled with standard statistical analysis tools such as Probability Density Functions (PDF), can be used to augment the understanding of how the uncertainties in the numerical code inputs translate into uncertainties in the quantity of interest (QoI). In contrast with the static representations offered by advanced techniques such as the functional Highest Density Region (HDR) boxplot or the functional boxplot, f-HOPs is a dynamic visualization that lets practitioners infer the dynamics of the physics and see functional correlations that may exist. While this technique can only represent the QoI, we propose a 3-dimensional version of the Kiviat plot to encode all input parameters. This new visualization takes advantage of information from f-HOPs through data sonification. All in all, this makes it possible to analyse large datasets with a high-dimensional parameter space and a functional QoI in the same canvas. The proposed method is assessed, and its benefits are demonstrated, on two related environmental datasets.
We prove a monotonicity property of the Hurwitz zeta function which, in turn, translates into a chain of inequalities for polygamma functions of different orders. We provide a probabilistic interpretation of our result by exploiting a connection between the Hurwitz zeta function and the cumulants of the beta-exponential distribution.
For more than a century, fingerprints have been used with considerable success to identify criminals or verify the identity of individuals. The categorical conclusion scheme used by fingerprint examiners, and more generally the inference process followed by forensic scientists, have been heavily criticised in the scientific and legal literature. Instead, scholars have proposed to characterise the weight of forensic evidence using the Bayes factor as the key element of the inference process. In forensic science, quantifying the magnitude of support is as important as determining which model is supported. Unfortunately, the complexity of fingerprint patterns renders likelihood-based inference impossible. In this paper, we use an Approximate Bayesian Computation (ABC) model selection algorithm to quantify the weight of fingerprint evidence. We supplement the ABC algorithm using a Receiver Operating Characteristic curve to mitigate the effect of the curse of dimensionality. Our modified algorithm is computationally efficient and makes it easier to monitor convergence as the number of simulations increases. We use our method to quantify the weight of fingerprint evidence in forensic science, but we note that it can be applied to any other forensic pattern evidence.
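To make the idea of ABC model selection concrete, here is a minimal sketch of the generic rejection version of the algorithm on a toy problem. It is not the authors' method: it omits the ROC-based modification described above, and the two Gaussian models, the summary statistic, the tolerance, and all function names are assumptions chosen purely for illustration.

```python
import random

def simulate_summary(model, rng, n=20):
    """Toy simulator (assumption): sample mean of n unit-variance Gaussian
    draws, with mean 0.0 under model "A" and 0.5 under model "B"."""
    mean = 0.0 if model == "A" else 0.5
    return sum(rng.gauss(mean, 1.0) for _ in range(n)) / n

def abc_model_choice(observed, n_sims=2000, tol=0.3, seed=0):
    """Rejection ABC for model choice with equal prior model probabilities.
    Returns the number of accepted simulations per model."""
    rng = random.Random(seed)
    accepted = {"A": 0, "B": 0}
    for _ in range(n_sims):
        model = rng.choice(["A", "B"])           # draw a model from the prior
        summary = simulate_summary(model, rng)   # simulate data under it
        if abs(summary - observed) < tol:        # keep if close to observed
            accepted[model] += 1
    return accepted

def bayes_factor(accepted):
    """With equal prior model probabilities, the ratio of acceptance counts
    approximates the Bayes factor of model A against model B."""
    if accepted["B"] == 0:
        return float("inf")
    return accepted["A"] / accepted["B"]

counts = abc_model_choice(observed=0.0)
bf_ab = bayes_factor(counts)
```

No likelihood is evaluated anywhere in this sketch; only the ability to simulate from each model is needed, which is why the approach is attractive when the likelihood is intractable.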
As a classic parameter from the binomial distribution, the binomial proportion has been well studied in the literature owing to its wide range of applications. In contrast, the reciprocal of the binomial proportion, also known as the inverse proportion, is often overlooked, even though it also plays an important role in various fields including clinical studies and random sampling. The maximum likelihood estimator of the inverse proportion suffers from the zero-event problem, and to overcome it, alternative methods have been developed in the literature. Nevertheless, there is little work addressing the optimality of the existing estimators or comparing their practical performance. Inspired by this, we propose to further advance the literature by developing an optimal estimator for the inverse proportion in a family of shrinkage estimators. We further derive the explicit and approximate formulas for the optimal shrinkage parameter under different settings. Simulation studies show that our new estimator performs better than, or as well as, the existing competitors in most practical settings. Finally, to illustrate the usefulness of our new method, we also revisit a recent meta-analysis of COVID-19 data assessing the relative risk of coronavirus infection associated with physical distancing, in which six out of seven studies encounter the zero-event problem.
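The zero-event problem can be seen in a few lines of code. The sketch below contrasts the maximum likelihood estimate n/x, which is undefined when x = 0, with a shrinkage-type estimate of the form (n + c)/(x + c). That form, the constant c, and the function names are assumptions made for illustration; the abstract does not state the estimator family or the optimal shrinkage parameter.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability mass function."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def mle_inverse(x, n):
    """MLE of 1/p; returns infinity when no events are observed (x = 0)."""
    return math.inf if x == 0 else n / x

def shrinkage_inverse(x, n, c):
    """Hypothetical shrinkage-type estimate of 1/p with c > 0 (assumed form);
    it remains finite even when x = 0."""
    return (n + c) / (x + c)

def expected_estimate(n, p, c):
    """Exact expectation of the shrinkage-type estimate under Binomial(n, p),
    obtained by summing over all possible counts x = 0, ..., n."""
    return sum(binom_pmf(x, n, p) * shrinkage_inverse(x, n, c)
               for x in range(n + 1))

n, p = 20, 0.5
zero_event_mle = mle_inverse(0, n)               # infinite
zero_event_shrunk = shrinkage_inverse(0, n, 1.0)
mean_shrunk = expected_estimate(n, p, 1.0)       # compare with 1/p = 2
```

For c = 1 the expectation has the closed form (1 - (1 - p)^(n + 1))/p, so the bias relative to 1/p can be checked exactly; other values of c trade bias against variance differently.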