In order to identify the infected individuals of a population, their samples are divided into equally sized groups called pools and a single laboratory test is applied to each pool. Individuals whose samples belong to pools that test negative are declared healthy, while each pool that tests positive is divided into smaller, equally sized pools which are tested in the next stage. This scheme is called adaptive, because the composition of the pools at each stage depends on results from previous stages, and nested, because each pool is a subset of a pool of the previous stage. If the infection probability $p$ is not smaller than $1-3^{-1/3}$, it is best to test each sample (no pooling). If $p<1-3^{-1/3}$, we compute the mean $D_k(m,p)$ and the variance of the number of tests per individual as a function of the pool sizes $m=(m_1,\dots,m_k)$ in the first $k$ stages; in the $(k+1)$-th stage all remaining samples are tested. The case $k=1$ was proposed by Dorfman in his seminal paper in 1943. The goal is to minimize $D_k(m,p)$, which is called the cost associated with~$m$. We show that for $p\in (0, 1-3^{-1/3})$ the optimal choice is one of four possible schemes, which are explicitly described. For $p>2^{-51}$ we show overwhelming numerical evidence that the best choice is $(3^k\text{ or }3^{k-1}4,3^{k-1},\dots,3^2,3)$, with a precise description of the range of values of $p$ where each holds. We then focus on schemes of the type $(3^k,\dots,3)$, and estimate that the cost of the best scheme of this type for $p$, determined by the choice of $k=k_3(p)$, is of order $O\big(p\log(1/p)\big)$. This is the same order as that of the cost of the optimal scheme, and the difference of these costs is explicitly bounded. As an example, for $p=0.02$ the optimal choice is $k=3$, $m=(27,9,3)$, with cost $0.20$; that is, the mean number of tests required to screen 100 individuals is 20.
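As a rough illustration of the cost $D_k(m,p)$ described above, the following sketch computes the mean number of tests per individual for a nested scheme. It assumes error-free tests and independent infections with probability $p$, so that a pool of size $s$ tests positive with probability $1-(1-p)^s$. The closed form used here and the function name `mean_tests_per_individual` are assumptions of this sketch, not quoted from the abstract.

```python
def mean_tests_per_individual(pool_sizes, p):
    """Mean number of tests per individual for a nested pooling scheme.

    pool_sizes: (m_1, ..., m_k), each size divisible by the next one.
    p: infection probability per individual (assumed independent).
    Assumes perfect tests; stage k+1 tests every remaining sample alone.
    """
    q = 1.0 - p
    # Append the final stage, in which samples are tested individually.
    sizes = list(pool_sizes) + [1]
    for parent, child in zip(sizes, sizes[1:]):
        if parent % child != 0:
            raise ValueError("each pool size must be a multiple of the next")
    # Stage 1: one test per pool of size m_1.
    cost = 1.0 / sizes[0]
    # Later stages: a sub-pool of size `child` is tested only when its
    # parent pool of size `parent` was positive.
    for parent, child in zip(sizes, sizes[1:]):
        cost += (1.0 - q ** parent) / child
    return cost


# Scheme from the abstract's example: p = 0.02, m = (27, 9, 3).
example_cost = mean_tests_per_individual((27, 9, 3), 0.02)
print(round(example_cost, 2))
```

Under these assumptions the example reproduces the quoted cost of about $0.20$, and the Dorfman scheme $m=(3)$ has cost exactly $1$ at $p=1-3^{-1/3}$, which is consistent with the no-pooling threshold stated above.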
In performing a Bayesian analysis, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multi-modal or exhibit pronounced (curving) degeneracies. Second, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive using existing methods such as thermodynamic integration. Nested Sampling is a Monte Carlo method targeted at the efficient calculation of the evidence, but it also produces posterior inferences as a by-product and therefore provides a means to carry out parameter estimation as well as model selection. The main challenge in implementing Nested Sampling is to sample from a constrained probability distribution. One possible solution to this problem is provided by the Galilean Monte Carlo (GMC) algorithm. We show results of applying Nested Sampling with GMC to some problems which have proven very difficult for standard Markov Chain Monte Carlo (MCMC) and down-hill methods, due to the presence of a large number of local minima and/or pronounced (curving) degeneracies between the parameters. We also discuss the use of Nested Sampling with GMC in Bayesian object detection problems, which are inherently multi-modal and require the evaluation of the Bayesian evidence for distinguishing between true and spurious detections.
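To make the evidence calculation concrete, here is a minimal sketch of the basic Nested Sampling loop on a toy problem. It is not the GMC method of the abstract: the constrained sampling step is done by plain rejection from the prior, which only works in low dimensions. The function names, the toy Gaussian likelihood, and the uniform prior on $[-5,5]$ are all assumptions chosen for illustration.

```python
import math
import random


def nested_sampling_evidence(log_like, sample_prior, n_live=100, n_iter=600, seed=0):
    """Estimate the evidence Z = integral of L(theta) * prior(theta).

    The constrained draw (L > L_min) uses naive rejection from the prior,
    a stand-in for more capable samplers such as GMC.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_like(x) for x in live]
    evidence = 0.0
    x_prev = 1.0  # remaining prior mass, starts at 1
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        # Expected shrinkage of the prior mass after i iterations.
        x_curr = math.exp(-i / n_live)
        evidence += math.exp(logl_min) * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point by a prior draw with higher likelihood.
        while True:
            cand = sample_prior(rng)
            cand_logl = log_like(cand)
            if cand_logl > logl_min:
                break
        live[worst] = cand
        live_logl[worst] = cand_logl
    # Contribution of the live points that remain at termination.
    evidence += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return evidence


def gaussian_log_like(x):
    # Toy likelihood: standard normal density in one dimension.
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def uniform_prior(rng):
    # Toy prior: uniform on [-5, 5], so the exact evidence is close to 0.1.
    return rng.uniform(-5.0, 5.0)


z_hat = nested_sampling_evidence(gaussian_log_like, uniform_prior)
print(z_hat)  # a noisy estimate that should lie near 0.1
```

The cost of the rejection step grows roughly as the inverse of the remaining prior mass, which is why practical implementations replace it with a dedicated constrained sampler.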
Food webs represent the set of consumer-resource interactions among a set of species that co-occur in a habitat, but most food web studies have omitted parasites and their interactions. Recent studies have provided conflicting evidence on whether including parasites changes food web structure, with some suggesting that parasitic interactions are structurally distinct from those among free-living species while others claim the opposite. Here, we describe a principled method for understanding food web structure that combines an efficient optimization algorithm from statistical physics called parallel tempering with a probabilistic generalization of the empirically well-supported food web niche model. This generative model approach allows us to rigorously estimate the degree to which interactions that involve parasites are statistically distinguishable from interactions among free-living species, whether parasite niches behave similarly to free-living niches, and the degree to which existing hypotheses about food web structure are naturally recovered. We apply this method to the well-studied Flensburg Fjord food web and show that while predation on parasites, concomitant predation of parasites, and parasitic intraguild trophic interactions are largely indistinguishable from free-living predation interactions, parasite-host interactions are different. These results provide a powerful new tool for evaluating the impact of classes of species and interactions on food web structure to shed new light on the roles of parasites in food webs.
Bayesian inference involves two main computational challenges. First, in estimating the parameters of some model for the data, the posterior distribution may well be highly multi-modal: a regime in which the convergence to stationarity of traditional Markov Chain Monte Carlo (MCMC) techniques becomes incredibly slow. Second, in selecting between a set of competing models, the necessary estimation of the Bayesian evidence for each is, by definition, a (possibly high-dimensional) integration over the entire parameter space; again this can be a daunting computational task, although new Monte Carlo (MC) integration algorithms offer solutions of ever increasing efficiency. Nested sampling (NS) is one such contemporary MC strategy targeted at calculation of the Bayesian evidence, but which also enables posterior inference as a by-product, thereby allowing simultaneous parameter estimation and model selection. The widely-used MultiNest algorithm presents a particularly efficient implementation of the NS technique for multi-modal posteriors. In this paper we discuss importance nested sampling (INS), an alternative summation of the MultiNest draws, which can calculate the Bayesian evidence with up to an order of magnitude higher accuracy than `vanilla' NS with no change in the way MultiNest explores the parameter space. This is accomplished by treating as a (pseudo-)importance sample the totality of points collected by MultiNest, including those previously discarded under the constrained likelihood sampling of the NS algorithm. We apply this technique to several challenging test problems and compare the accuracy of Bayesian evidences obtained with INS against those from vanilla NS.
We show that univariate and symmetric multivariate Hawkes processes are only weakly causal: the true log-likelihoods of real and reversed event time vectors are almost equal, thus parameter estimation via maximum likelihood only weakly depends on the direction of the arrow of time. In ideal (synthetic) conditions, tests of goodness of parametric fit unambiguously reject backward event times, which implies that inferring kernels from time-symmetric quantities, such as the autocovariance of the event rate, only rarely produces statistically significant fits. Finally, we find that fitting financial data with many-parameter kernels may yield significant fits for both arrows of time for the same event time vector, sometimes favouring the backward time direction. This goes to show that a significant fit of Hawkes processes to real data with flexible kernels does not imply a definite arrow of time unless one tests it.
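The comparison between real and reversed event times can be sketched as follows for a univariate Hawkes process. The exponential kernel, the parametrization $\lambda(t)=\mu+\alpha\sum_{t_j<t}e^{-\beta(t-t_j)}$, the function names, and the hand-made event times are assumptions of this sketch; the abstract does not specify which kernels were used.

```python
import math


def hawkes_exp_loglik(times, horizon, mu, alpha, beta):
    """Log-likelihood on [0, horizon] of sorted event times under
    lambda(t) = mu + alpha * sum_{t_j < t} exp(-beta * (t - t_j)).
    """
    loglik = -mu * horizon  # baseline part of the compensator
    decay_sum = 0.0  # sum over earlier events of exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            # Standard recursion for the exponential kernel.
            decay_sum = math.exp(-beta * (t - prev)) * (1.0 + decay_sum)
        loglik += math.log(mu + alpha * decay_sum)
        # Integrated kernel contribution of this event up to the horizon.
        loglik -= (alpha / beta) * (1.0 - math.exp(-beta * (horizon - t)))
        prev = t
    return loglik


def reverse_times(times, horizon):
    """Map sorted event times t_i to horizon - t_i, returned in sorted order."""
    return [horizon - t for t in reversed(times)]


# Hypothetical event times, used only to show the forward/backward comparison.
events = [0.5, 0.7, 0.8, 2.9, 3.0, 3.4, 6.1]
T = 7.0
forward = hawkes_exp_loglik(events, T, mu=0.5, alpha=0.8, beta=2.0)
backward = hawkes_exp_loglik(reverse_times(events, T), T, mu=0.5, alpha=0.8, beta=2.0)
print(forward, backward)
```

In a fuller study the parameters would be re-estimated by maximum likelihood separately for each time direction; the fixed values here only illustrate how the two log-likelihoods are evaluated.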
We show that qualitatively different epidemic-like processes from distinct societal domains (finance, social and commercial blockbusters, epidemiology) can be quantitatively understood using the same unifying conceptual framework, which takes into account the interplay between the timescales of the grouping and fragmentation of social groups together with typical epidemic transmission processes. Different domain-specific empirical infection profiles, featuring multiple resurgences and abnormal decay times, are reproduced simply by varying the timescales for group formation and individual transmission. Our model emphasizes the need to account for the dynamic evolution of multi-connected networks. Our results reveal a new minimally invasive dynamical method for controlling such outbreaks, help fill a gap in existing epidemiological theory, and offer a new understanding of complex system response functions.