In performing a Bayesian analysis, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multi-modal or exhibit pronounced (curving) degeneracies. Secondly, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive using existing methods such as thermodynamic integration. Nested Sampling is a Monte Carlo method targeted at the efficient calculation of the evidence, but it also produces posterior inferences as a by-product and therefore provides a means of carrying out parameter estimation as well as model selection. The main challenge in implementing Nested Sampling is to sample from a constrained probability distribution. One possible solution to this problem is provided by the Galilean Monte Carlo (GMC) algorithm. We show results of applying Nested Sampling with GMC to some problems which have proven very difficult for standard Markov Chain Monte Carlo (MCMC) and down-hill methods, due to the presence of a large number of local minima and/or pronounced (curving) degeneracies between the parameters. We also discuss the use of Nested Sampling with GMC in Bayesian object detection problems, which are inherently multi-modal and require the evaluation of Bayesian evidence for distinguishing between true and spurious detections.
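To make the constrained-sampling step concrete, the following is a minimal sketch of one Galilean move under the hard constraint $L(x) > L^*$, assuming a differentiable log-likelihood. The function names and the simplified glide/reflect/reverse logic are illustrative, not the exact published algorithm:

```python
import numpy as np

def galilean_step(x, v, loglike, logl_star, grad_loglike):
    """One sketched Galilean Monte Carlo move inside the hard nested
    sampling constraint logL(x) > logL*: step ballistically, reflect
    off the constraint boundary using the likelihood gradient, and
    reverse the velocity if the reflection also fails."""
    x_new = x + v
    if loglike(x_new) > logl_star:        # still inside: glide on
        return x_new, v
    g = grad_loglike(x_new)               # normal direction at the boundary
    n = g / np.linalg.norm(g)
    v_ref = v - 2.0 * n * (n @ v)         # specular reflection of velocity
    x_ref = x_new + v_ref
    if loglike(x_ref) > logl_star:        # reflection re-entered the region
        return x_ref, v_ref
    return x, -v                          # give up: reverse and retreat
```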
Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models, the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than 'vanilla' NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.
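As a rough illustration of the INS idea (not MultiNest's actual internals), the pseudo-importance-sampling evidence estimate can be written as follows. The function name is illustrative, and it is assumed that the effective proposal log-density g (for MultiNest, a mixture over the ellipsoidal proposals of the whole run) is available for every point ever drawn:

```python
import numpy as np

def ins_evidence(loglike_vals, logprior_vals, logprop_vals):
    """Sketch of a pseudo-importance-sampling evidence estimate: every
    point theta_i ever drawn (kept or discarded) gets the weight
    L(theta_i) * pi(theta_i) / g(theta_i), where g is the density the
    sampler effectively drew it from.  All inputs are arrays of log
    values for the same set of points; returns an estimate of log Z."""
    log_w = loglike_vals + logprior_vals - logprop_vals
    # log-mean-exp of the weights, computed stably
    m = np.max(log_w)
    return m + np.log(np.mean(np.exp(log_w - m)))
```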
In order to identify the infected individuals of a population, their samples are divided into equally sized groups called pools, and a single laboratory test is applied to each pool. Individuals whose samples belong to pools that test negative are declared healthy, while each pool that tests positive is divided into smaller, equally sized pools which are tested in the next stage. This scheme is called adaptive, because the composition of the pools at each stage depends on results from previous stages, and nested, because each pool is a subset of a pool of the previous stage. If the infection probability $p$ is not smaller than $1-3^{-1/3}$, it is best to test each sample individually (no pooling). If $p<1-3^{-1/3}$, we compute the mean $D_k(m,p)$ and the variance of the number of tests per individual as a function of the pool sizes $m=(m_1,\dots,m_k)$ in the first $k$ stages; in the $(k+1)$-th stage all remaining samples are tested individually. The case $k=1$ was proposed by Dorfman in his seminal 1943 paper. The goal is to minimize $D_k(m,p)$, which is called the cost associated to $m$. We show that for $p\in(0,1-3^{-1/3})$ the optimal choice is one of four possible schemes, which are explicitly described. For $p>2^{-51}$ we show overwhelming numerical evidence that the best choice is $(3^k\text{ or }3^{k-1}4,\,3^{k-1},\dots,3^2,3)$, with a precise description of the range of $p$ where each holds. We then focus on schemes of the type $(3^k,\dots,3)$, and estimate that the cost of the best scheme of this type for $p$, determined by the choice of $k=k_3(p)$, is of order $O\big(p\log(1/p)\big)$. This is the same order as the cost of the optimal scheme, and the difference of these costs is explicitly bounded. As an example, for $p=0.02$ the optimal choice is $k=3$, $m=(27,9,3)$, with cost $0.20$; that is, the mean number of tests required to screen 100 individuals is 20.
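Under i.i.d. infections with probability $p$ and $q=1-p$, the mean cost has a simple closed form: every sample enters a stage-1 pool (contributing $1/m_1$ tests per individual), a stage-$j$ pool is tested only if its parent stage-$(j-1)$ pool was positive (probability $1-q^{m_{j-1}}$), and stage $k+1$ tests individuals whose stage-$k$ pool was positive. A short sketch (function name illustrative) reproduces the quoted example:

```python
def nested_cost(m, p):
    """Mean number of tests per individual, D_k(m, p), for the nested
    scheme with stage pool sizes m = (m_1, ..., m_k) followed by
    individual testing at stage k+1, assuming i.i.d. infections."""
    q = 1.0 - p
    cost = 1.0 / m[0]                          # every sample enters stage 1
    for j in range(1, len(m)):
        cost += (1.0 - q ** m[j - 1]) / m[j]   # stage j+1 pools, if parent positive
    cost += 1.0 - q ** m[-1]                   # final individual tests
    return cost

print(nested_cost((27, 9, 3), 0.02))  # ~0.198: about 20 tests per 100 individuals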
Metropolis-Hastings nested sampling evolves a Markov chain, accepting new points along the chain according to a version of the Metropolis-Hastings acceptance ratio which has been modified to satisfy the nested sampling likelihood constraint. The geometric nested sampling algorithm I present here is based on the Metropolis-Hastings method, but treats parameters as though they represent points on certain geometric objects, namely circles, tori and spheres. For parameters which represent points on a circle or torus, the trial distribution is wrapped around the domain of the posterior distribution, so that samples cannot be rejected automatically when evaluating the Metropolis ratio due to being outside the sampling domain; this also enhances the mobility of the sampler. For parameters which represent coordinates on the surface of a sphere, the algorithm transforms the parameters into a Cartesian coordinate system before sampling, which again ensures that no samples are automatically rejected, and provides a physically intuitive way of sampling the parameter space.
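A minimal sketch of the two ingredients, the wrapped trial distribution and the spherical reparametrisation, with illustrative helper names:

```python
import numpy as np

def wrapped_proposal(theta, sigma, rng):
    """Trial draw for a circular parameter: a Gaussian step wrapped
    onto [0, 2*pi), so the proposal can never land outside the
    posterior domain and be auto-rejected in the Metropolis ratio."""
    return (theta + rng.normal(0.0, sigma)) % (2.0 * np.pi)

def sphere_to_cartesian(phi, theta):
    """Map angular coordinates (azimuth phi, polar theta) on the unit
    sphere to Cartesian (x, y, z), the space in which the geometric
    sampler proposes moves before projecting back onto the sphere."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
```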
It was recently emphasised by Riley (2019) and Schittenhelm & Wacker (2020) that in the presence of plateaus in the likelihood function, nested sampling (NS) produces faulty estimates of the evidence and posterior densities. After informally explaining the cause of the problem, we present a modified version of NS that handles plateaus and can be applied retrospectively to NS runs from popular NS software using anesthetic. In the modified NS, live points in a plateau are evicted one by one without replacement, with ordinary NS compression of the prior volume after each eviction, taking into account the dynamic number of live points. The live points are replenished once all points in the plateau are removed. We demonstrate the modified algorithm on a number of examples. Since the modification is simple, we propose that it become the canonical version of Skilling's NS algorithm.
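A minimal sketch of the corresponding prior-volume bookkeeping, assuming the usual expected shrinkage factor $n/(n+1)$ per eviction, with $n$ the current (shrinking) number of live points; the function name is illustrative:

```python
import math

def compress_through_plateau(log_x, n_live, n_plateau):
    """Prior-volume update while a likelihood plateau is cleared: each
    of the n_plateau live points sitting at the plateau value is
    evicted one by one without replacement, applying the ordinary NS
    shrinkage factor n/(n+1) with the current number of live points n.
    Live points are only replenished after the plateau is emptied."""
    for n in range(n_live, n_live - n_plateau, -1):
        log_x += math.log(n / (n + 1.0))
    return log_x
```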
Gravitational-wave astronomers often wish to characterize the expected parameter-estimation accuracy of future observations. The Fisher matrix provides a lower bound on the spread of the maximum-likelihood estimator across noise realizations, as well as the leading-order width of the posterior probability, but it is limited to high signal strengths often not realized in practice. By contrast, Monte Carlo Bayesian inference provides the full posterior for any signal strength, but it is too expensive to repeat for a representative set of noise realizations. Here I describe an efficient semianalytical technique to map the exact sampling distribution of the maximum-likelihood estimator across noise realizations, for any signal strength. This technique can be applied to any estimation problem for signals in additive Gaussian noise.
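For a signal $h(\theta)$ in additive Gaussian noise with covariance $C$, the Fisher matrix is $F_{ij} = \partial_i h^{\mathsf{T}} C^{-1} \partial_j h$, and $F^{-1}$ gives the lower bound quoted above. A minimal numerical sketch with a toy sinusoidal signal in white noise (names and model are illustrative):

```python
import numpy as np

def fisher_matrix(dh, cov_inv):
    """Fisher matrix F_ij = dh_i^T C^{-1} dh_j for a signal h(theta) in
    additive Gaussian noise with covariance C; dh is an array of shape
    (n_params, n_samples) of partial derivatives dh/dtheta_i evaluated
    at the true parameters.  F^{-1} lower-bounds the covariance of the
    maximum-likelihood estimator at high signal strength."""
    return dh @ cov_inv @ dh.T

# Toy usage: h(t; A, f) = A*sin(2*pi*f*t) in unit-variance white noise.
t = np.linspace(0.0, 1.0, 256)
A, f = 1.0, 5.0
dh = np.vstack([np.sin(2 * np.pi * f * t),                       # dh/dA
                A * 2 * np.pi * t * np.cos(2 * np.pi * f * t)])  # dh/df
F = fisher_matrix(dh, np.eye(t.size))
print(np.sqrt(np.diag(np.linalg.inv(F))))  # Fisher 1-sigma bounds on (A, f)
```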