No Arabic abstract
Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence at up to an order of magnitude higher accuracy than `vanilla NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.
In performing a Bayesian analysis, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multi-modal or exhibit pronounced (curving) degeneracies. Secondly, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive using existing methods such as thermodynamic integration. Nested Sampling is a Monte Carlo method targeted at the efficient calculation of the evidence, but also produces posterior inferences as a by-product and therefore provides means to carry out parameter estimation as well as model selection. The main challenge in implementing Nested Sampling is to sample from a constrained probability distribution. One possible solution to this problem is provided by the Galilean Monte Carlo (GMC) algorithm. We show results of applying Nested Sampling with GMC to some problems which have proven very difficult for standard Markov Chain Monte Carlo (MCMC) and down-hill methods, due to the presence of large number of local minima and/or pronounced (curving) degeneracies between the parameters. We also discuss the use of Nested Sampling with GMC in Bayesian object detection problems, which are inherently multi-modal and require the evaluation of Bayesian evidence for distinguishing between true and spurious detections.
We investigate the use of the Multiple Optimised Parameter Estimation and Data compression algorithm (MOPED) for data compression and faster evaluation of likelihood functions. Since MOPED only guarantees maintaining the Fisher matrix of the likelihood at a chosen point, multimodal and some degenerate distributions will present a problem. We present examples of scenarios in which MOPED does faithfully represent the true likelihood but also cases in which it does not. Through these examples, we aim to define a set of criteria for which MOPED will accurately represent the likelihood and hence may be used to obtain a significant reduction in the time needed to calculate it. These criteria may involve the evaluation of the full likelihood function for comparison.
The Baikal-GVD is a large scale neutrino telescope being constructed in Lake Baikal. The majority of signal detected by the telescope are noise hits, caused primarily by the luminescence of the Baikal water. Separating noise hits from the hits produced by Cherenkov light emitted from the muon track is a challenging part of the muon event reconstruction. We present an algorithm that utilizes a known directional hit causality criterion to contruct a graph of hits and then use a clique-based technique to select the subset of signal hits.The algorithm was tested on realistic detector Monte-Carlo simulation for a wide range of muon energies and has proved to select a pure sample of PMT hits from Cherenkov photons while retaining above 90% of original signal.
In order to identify the infected individuals of a population, their samples are divided in equally sized groups called pools and a single laboratory test is applied to each pool. Individuals whose samples belong to pools that test negative are declared healthy, while each pool that tests positive is divided into smaller, equally sized pools which are tested in the next stage. This scheme is called adaptive, because the composition of the pools at each stage depends on results from previous stages, and nested because each pool is a subset of a pool of the previous stage. Is the infection probability $p$ is not smaller than $1-3^{-1/3}$ it is best to test each sample (no pooling). If $p<1-3^{-1/3}$, we compute the mean $D_k(m,p)$ and the variance of the number of tests per individual as a function of the pool sizes $m=(m_1,dots,m_k)$ in the first $k$ stages; in the $(k+1)$-th stage all remaining samples are tested. The case $k=1$ was proposed by Dorfman in his seminal paper in 1943. The goal is to minimize $D_k(m,p)$, which is called the cost associated to~$m$. We show that for $pin (0, 1-3^{-1/3})$ the optimal choice is one of four possible schemes, which are explicitly described. For $p>2^{-51}$ we show overwhelming numerical evidence that the best choice is $(3^ktext{ or }3^{k-1}4,3^{k-1},dots,3^2,3 )$, with a precise description of the range of $p$s where each holds. We then focus on schemes of the type $(3^k,dots,3)$, and estimate that the cost of the best scheme of this type for $p$, determined by the choice of $k=k_3(p)$, is of order $Obig(plog(1/p)big)$. This is the same order as that of the cost of the optimal scheme, and the difference of these costs is explicitly bounded. As an example, for $p=0.02$ the optimal choice is $k=3$, $m=(27,9,3)$, with cost $0.20$; that is, the mean number of tests required to screen 100 individuals is 20.
Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tail of the posterior leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference by constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and a fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.