This article proposes a visualization method for multidimensional data based on: (i) animated functional Hypothetical Outcome Plots (f-HOPs); (ii) a 3-dimensional Kiviat plot; and (iii) data sonification. In an Uncertainty Quantification (UQ) framework, such an analysis, coupled with standard statistical tools such as Probability Density Functions (PDFs), can be used to augment the understanding of how uncertainties in the numerical code inputs translate into uncertainties in the quantity of interest (QoI). In contrast with the static representations of advanced techniques such as the functional Highest Density Region (HDR) boxplot or the functional boxplot, f-HOPs is a dynamic visualization that enables practitioners to infer the dynamics of the physics and to see functional correlations that may exist. While this technique only represents the QoI, we propose a 3-dimensional version of the Kiviat plot to encode all input parameters. This new visualization takes advantage of information from f-HOPs through data sonification. Altogether, this allows large datasets with a high-dimensional parameter space and a functional QoI to be analysed on the same canvas. The proposed method is assessed, and its benefits demonstrated, on two related environmental datasets.
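As a minimal sketch of the f-HOPs idea, assuming an ensemble of functional QoI realisations stored row-wise in an array (names such as curves and t are illustrative, not from the article), one can animate one ensemble member at a time over a faded static background:

    # Minimal f-HOPs sketch: cycle through ensemble members one at a time.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 200)                      # functional support
    curves = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal((50, t.size))  # toy ensemble

    fig, ax = plt.subplots()
    ax.plot(t, curves.T, color="0.85", lw=0.5)          # static context: all members, faded
    line, = ax.plot(t, curves[0], color="C0", lw=2)     # the animated hypothetical outcome

    def update(frame):
        line.set_ydata(curves[frame % len(curves)])     # show one hypothetical outcome per frame
        return line,

    anim = FuncAnimation(fig, update, frames=len(curves), interval=200, blit=True)
    plt.show()

The sonification and 3-dimensional Kiviat components of the article are not reproduced here; the sketch only illustrates the animated functional display.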
Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of a complex simulation model. However, its use is limited to cases where the design space is low-dimensional, because the number of design points required for stochastic kriging to produce accurate predictions generally grows exponentially in the dimension of the design space. The large sample size results both in a prohibitive sampling cost for running the simulation model and in a severe computational challenge from the need to invert large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows only mildly with the dimension, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.
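As a minimal, hedged sketch of the ingredients involved, the following Python code implements a plain stochastic kriging predictor with a tensor Markov (product exponential) kernel; it does not reproduce the sparse grid designs or the fast exact algorithms that are the paper's contribution, and all names and data are illustrative.

    # Plain stochastic kriging predictor with a tensor Markov kernel (sketch only).
    import numpy as np

    def tensor_markov_kernel(X1, X2, lengthscales):
        """k(x, x') = prod_j exp(-|x_j - x'_j| / l_j)."""
        d = np.abs(X1[:, None, :] - X2[None, :, :]) / lengthscales
        return np.exp(-d.sum(axis=2))

    def sk_predict(X, y, noise_var, Xnew, lengthscales, sigma2=1.0):
        """Stochastic kriging mean prediction at Xnew (zero prior mean)."""
        K = sigma2 * tensor_markov_kernel(X, X, lengthscales)
        K += np.diag(noise_var)                        # intrinsic simulation noise
        k_star = sigma2 * tensor_markov_kernel(Xnew, X, lengthscales)
        return k_star @ np.linalg.solve(K, y)

    # Toy usage on a 3-dimensional design.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(40, 3))
    y = np.sin(X.sum(axis=1)) + 0.05 * rng.standard_normal(40)
    mu = sk_predict(X, y, noise_var=np.full(40, 0.05**2),
                    Xnew=rng.uniform(size=(5, 3)), lengthscales=np.ones(3))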
This paper examines the statistical properties of a distributional form that arises from pooled testing for the prevalence of a binary outcome. Our base distribution is a two-parameter distribution with a prevalence parameter and an excess intensity parameter; the latter allows for a dilution or intensification effect in larger pools. We also examine a generalised form of the distribution in which pools have covariate information that affects the prevalence through a linked linear form. We study the general pooled binomial distribution both in its own right and as a special case of broader binomial GLMs with the complementary log-log link function. We examine the information function and show the information content of individual sample items. We demonstrate that pooling reduces the information content of sample units, and we give simple heuristics for choosing an optimal pool size for testing. We derive the form of the log-likelihood function and its derivatives and give results for maximum likelihood estimation. We also discuss diagnostic testing of the positive pool probabilities, including testing for intensification or dilution in the testing mechanism. We illustrate the use of this distribution by applying it to pooled testing data on virus prevalence in a mosquito population.
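As a concrete illustration of the no-dilution special case, the following Python sketch computes the maximum likelihood estimate of the prevalence p from pooled test outcomes, using P(pool of size s positive) = 1 - (1 - p)^s; the excess intensity parameter and covariate effects described in the paper are omitted, and the data are illustrative.

    # MLE of prevalence p from pooled tests under the no-dilution model (sketch).
    import numpy as np
    from scipy.optimize import minimize_scalar

    sizes = np.array([5, 5, 10, 10, 25, 25])      # pool sizes
    pos   = np.array([0, 1,  1,  0,  1,  1])      # 1 = pool tested positive

    def neg_loglik(p):
        pi = 1.0 - (1.0 - p) ** sizes             # per-pool positive probability
        return -np.sum(pos * np.log(pi) + (1 - pos) * np.log1p(-pi))

    fit = minimize_scalar(neg_loglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    p_hat = fit.x

With a complementary log-log link and an offset of log(s_i), the same no-dilution model can equivalently be fitted as a binomial GLM, consistent with the complementary log-log framing described in the abstract.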
In Extreme Value methodology the choice of threshold plays an important role in the efficient modelling of observations exceeding the threshold. The threshold must be chosen high enough to ensure an unbiased extreme value index, but choosing it too high results in uncontrolled variances. This paper investigates a generalized model that can assist in the choice of optimal threshold values in the gamma positive domain. A Bayesian approach is considered by deriving a posterior distribution for the unknown generalized parameter. The properties of the posterior distribution are then used to choose an optimal threshold without visual inspection.
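For context, the following Python sketch shows a standard threshold-stability diagnostic in the heavy-tailed (gamma positive) domain: fitting a Generalised Pareto distribution to exceedances over a grid of candidate thresholds and tracking the estimated extreme value index. This is a generic companion analysis, not the Bayesian posterior-based selection rule proposed in the paper; the data and names are illustrative.

    # Generic threshold-stability diagnostic for the extreme value index (sketch).
    import numpy as np
    from scipy.stats import genpareto

    rng = np.random.default_rng(2)
    x = rng.pareto(a=2.0, size=5000) + 1.0             # heavy-tailed toy sample

    for u in np.quantile(x, [0.90, 0.95, 0.975, 0.99]):
        excess = x[x > u] - u
        xi, _, sigma = genpareto.fit(excess, floc=0)   # shape (EVI) and scale estimates
        print(f"threshold {u:8.3f}  n_exc {excess.size:4d}  xi_hat {xi:6.3f}")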
For many decades, statisticians have attempted to prepare the Bayesian omelette without breaking the Bayesian eggs; that is, to obtain probabilistic likelihood-based inferences without relying on informative prior distributions. A recent example is Murray Aitkin's book, Statistical Inference, which presents an approach to statistical hypothesis testing based on comparisons of posterior distributions of likelihoods under competing models. Aitkin develops and illustrates his method using some simple examples of inference from iid data and two-way tests of independence. In this note we analyze some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.
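To make the approach under discussion concrete, the following is a toy Python sketch of a posterior-likelihood comparison for an iid normal example with known variance (H0: mu = 0 versus an unrestricted mu under a flat prior); the data, sample size, and variable names are illustrative and are not taken from the book.

    # Toy posterior-likelihood comparison for a normal point-null problem (sketch).
    import numpy as np

    rng = np.random.default_rng(3)
    y = rng.normal(loc=0.4, scale=1.0, size=30)        # toy iid data, sigma known = 1
    n, ybar = y.size, y.mean()

    def loglik(mu):
        return -0.5 * np.sum((y - mu) ** 2)            # log-likelihood up to a constant

    # Posterior of mu under the unrestricted model with a flat prior: N(ybar, 1/n).
    mu_draws = rng.normal(loc=ybar, scale=1.0 / np.sqrt(n), size=10_000)
    loglik_h1 = np.array([loglik(m) for m in mu_draws])

    # Posterior probability that the unrestricted likelihood exceeds the null
    # likelihood: the kind of summary the posterior-likelihood approach reports.
    p_support = np.mean(loglik_h1 > loglik(0.0))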
For more than a century, fingerprints have been used with considerable success to identify criminals or verify the identity of individuals. The categorical conclusion scheme used by fingerprint examiners, and more generally the inference process followed by forensic scientists, have been heavily criticised in the scientific and legal literature. Instead, scholars have proposed to characterise the weight of forensic evidence using the Bayes factor as the key element of the inference process. In forensic science, quantifying the magnitude of support is as important as determining which model is supported. Unfortunately, the complexity of fingerprint patterns renders likelihood-based inference impossible. In this paper, we use an Approximate Bayesian Computation (ABC) model selection algorithm to quantify the weight of fingerprint evidence. We supplement the ABC algorithm with a Receiver Operating Characteristic curve to mitigate the effect of the curse of dimensionality. Our modified algorithm is computationally efficient and makes it easier to monitor convergence as the number of simulations increases. We use our method to quantify the weight of fingerprint evidence in forensic science, but we note that it can be applied to any other forensic pattern evidence.
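As a hedged illustration of rejection-ABC model selection in general (not the paper's fingerprint-specific algorithm or its ROC-based refinement), the following Python sketch approximates a Bayes factor for two toy models; all summaries, priors, and tolerances are illustrative stand-ins.

    # Rejection-ABC model selection for two toy models (sketch).
    import numpy as np

    rng = np.random.default_rng(4)
    obs_summary = 1.8                                   # observed summary statistic (toy)
    n_sim, eps_quantile = 100_000, 0.001

    models = rng.integers(0, 2, size=n_sim)             # equal prior model probabilities
    theta = np.where(models == 0,
                     rng.normal(0.0, 1.0, n_sim),       # prior under model 0
                     rng.normal(2.0, 1.0, n_sim))       # prior under model 1
    sim_summary = rng.normal(theta, 0.5, n_sim)         # simulated summaries
    dist = np.abs(sim_summary - obs_summary)

    accepted = models[dist <= np.quantile(dist, eps_quantile)]
    p1 = accepted.mean()                                # approximate posterior prob. of model 1
    bayes_factor_10 = p1 / (1.0 - p1)                   # equal prior odds, so BF = posterior odds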