Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

Added by Yang Shi

Publication date 2016

Fields Mathematical Statistics

Language English

Authors Yang Shi - Huining Kang - Ji-Hyun Lee

Computation

Permutation tests are commonly used to estimate p-values in statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is unavailable or unreliable for finite sample sizes. One critical challenge for permutation tests in genomic studies is that an enormous number of permutations is needed to obtain reliable estimates of small p-values, which requires intensive computation. In this paper, we develop a computationally efficient algorithm for evaluating small p-values from permutation tests based on an adaptive importance sampling approach, which uses the cross-entropy method to find the optimal proposal density. Simulation studies and analysis of a real microarray dataset demonstrate that our approach achieves considerable gains in computational efficiency compared with existing methods.
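To make the computational burden concrete, here is a minimal sketch of a plain Monte Carlo permutation test. It is not the authors' algorithm; the function name `permutation_p_value`, the one-sided difference-in-means statistic, and the add-one estimator are assumptions chosen for illustration.

```python
import random

def permutation_p_value(x, y, n_perm, seed=0):
    """One-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(sample):
        # Difference between the mean of the first n_x values and the rest.
        a, b = sample[:n_x], sample[n_x:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = stat(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled observations
        if stat(pooled) >= observed:
            hits += 1
    # Add-one estimator: the result can never be smaller than 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

Because the estimate cannot fall below `1 / (n_perm + 1)`, resolving a p-value near 10^-6 with this naive scheme takes on the order of a million or more permutations per test, which is the cost that importance sampling aims to reduce.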

Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis

Yang Shi, Mengqiao Wang, Weiping Shi 2018

Small $p$-values often need to be estimated accurately in large-scale genomic studies, both to adjust for multiple hypothesis tests and to rank genomic features by statistical significance. For complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small $p$-values because of insufficient accuracy or computational restrictions. We propose a general approach for accurately and efficiently calculating small $p$-values for a broad range of complicated test statistics, based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small $p$-values (e.g. $10^{-6}$ to $10^{-100}$). The proposed algorithm can help improve existing test procedures and develop new test procedures in genomic studies.

Applications

Advances in Importance Sampling

Victor Elvira, Luca Martino 2021

Importance sampling (IS) is a Monte Carlo technique for the approximation of intractable distributions and integrals with respect to them. The origin of IS dates from the early 1950s. In the last decades, the rise of the Bayesian paradigm and the increase of the available computational resources have propelled the interest in this theoretically sound methodology. In this paper, we first describe the basic IS algorithm and then revisit the recent advances in this methodology. We pay particular attention to two sophisticated lines. First, we focus on multiple IS (MIS), the case where more than one proposal is available. Second, we describe adaptive IS (AIS), the generic methodology for adapting one or more proposals.
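As an illustration of the basic IS algorithm mentioned above, the following sketch estimates a Gaussian tail probability. The example problem, the shifted proposal N(threshold, 1), and the names `normal_pdf` and `tail_prob_is` are assumptions made for this sketch and are not taken from the paper.

```python
import math
import random

def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def tail_prob_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) using the proposal N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)  # draw from the proposal
        if x > threshold:
            # Importance weight: target density divided by proposal density.
            total += normal_pdf(x) / normal_pdf(x, mean=threshold)
    return total / n_samples

estimate = tail_prob_is(4.0, 20000)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form, for comparison only
```

Centering the proposal on the threshold means roughly half of the draws land in the rare region, whereas sampling directly from N(0, 1) would hit it only about three times in 100,000 draws.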

Computation

Selecting Reduced Models in the Cross-Entropy Method

Patrick Heas 2018

This paper deals with the estimation of rare event probabilities using importance sampling (IS), where an optimal proposal distribution is computed with the cross-entropy (CE) method. Although IS optimized with the CE method efficiently reduces the estimator variance, this approach remains unaffordable for problems where repeated evaluation of the score function is too computationally intensive. This is often the case for score functions related to the solution of a partial differential equation (PDE) with random inputs. This work proposes to alleviate computation by the parsimonious use of a hierarchy of score function approximations in the CE optimization process. The score function approximation is obtained by selecting the surrogate of lowest dimensionality whose accuracy is guaranteed to be sufficient to pass the current CE optimization stage. The selection relies on certified upper bounds on the error norm. An asymptotic analysis provides some theoretical guarantees on the efficiency and convergence of the proposed algorithm. Numerical results demonstrate the gain brought by the method in the context of pollution alerts and a system modeled by a PDE.
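The CE-optimized IS scheme described above can be sketched on a toy problem. This is a generic textbook-style CE iteration for a one-dimensional Gaussian tail, not the reduced-model algorithm of the paper; the score function (the sample itself), the unit-variance Gaussian proposal family, and all parameter names are assumptions.

```python
import math
import random

def ce_tail_probability(gamma, n_pilot=2000, n_final=20000, rho=0.1,
                        seed=0, max_iter=50):
    """Estimate P(X > gamma), X ~ N(0, 1), with a CE-tuned proposal N(mu, 1)."""
    rng = random.Random(seed)
    mu = 0.0

    def weight(x, mu):
        # Likelihood ratio of N(0, 1) to N(mu, 1) evaluated at x.
        return math.exp(-mu * x + 0.5 * mu * mu)

    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n_pilot))
        # Intermediate level: the (1 - rho) sample quantile, capped at gamma.
        level = min(gamma, xs[int((1.0 - rho) * n_pilot)])
        elite = [x for x in xs if x >= level]
        ws = [weight(x, mu) for x in elite]  # weights use the current mu
        # CE update for a Gaussian mean: weighted average of the elite samples.
        mu = sum(w * x for w, x in zip(ws, elite)) / sum(ws)
        if level >= gamma:
            break

    # Final importance sampling run with the tuned proposal.
    total = 0.0
    for _ in range(n_final):
        x = rng.gauss(mu, 1.0)
        if x > gamma:
            total += weight(x, mu)
    return total / n_final, mu
```

Each pilot iteration evaluates the score function `n_pilot` times. In this toy case that evaluation is free, but when each evaluation requires a PDE solve it dominates the cost, which is the setting the abstract addresses with surrogate models.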

Computation Probability

Adaptive Multiple Importance Sampling

Jean-Marie Cornuet 2009

The Adaptive Multiple Importance Sampling (AMIS) algorithm is aimed at an optimal recycling of past simulations in an iterated importance sampling scheme. The difference with earlier adaptive importance sampling implementations like Population Monte Carlo is that the importance weights of all simulated values, past as well as present, are recomputed at each iteration, following the technique of the deterministic multiple mixture estimator of Owen and Zhou (2000). Although the convergence properties of the algorithm cannot be fully investigated, we demonstrate through a challenging banana shape target distribution and a population genetics example that the improvement brought by this technique is substantial.
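The deterministic mixture weighting that the abstract refers to can be sketched as follows. This shows only the weighting step for a fixed set of proposals, not the full AMIS adaptation loop; the unnormalized N(1, 1) target, the Gaussian proposals, and the function names are assumptions for illustration.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def target_unnorm(x):
    # Hypothetical target: unnormalized N(1, 1); true Z = sqrt(2*pi), true mean = 1.
    return math.exp(-0.5 * (x - 1.0) ** 2)

def dm_importance_sampling(means, sd, n_per_proposal, seed=0):
    """Multiple IS with deterministic mixture weights over Gaussian proposals."""
    rng = random.Random(seed)
    samples, weights = [], []
    for m in means:
        for _ in range(n_per_proposal):
            x = rng.gauss(m, sd)
            # The denominator is the equal-weight mixture of ALL proposals,
            # not only the proposal that generated x.
            mixture = sum(normal_pdf(x, mj, sd) for mj in means) / len(means)
            samples.append(x)
            weights.append(target_unnorm(x) / mixture)
    z_hat = sum(weights) / len(weights)  # normalizing constant estimate
    mean_hat = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return z_hat, mean_hat
```

In an adaptive scheme of the kind the abstract describes, the list of proposals grows at each iteration and the mixture denominator is recomputed for every stored sample, which is why all past weights change when a new proposal is added.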

Computation Applications

Layered Adaptive Importance Sampling

L. Martino, V. Elvira, D. Luengo 2015

Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of catastrophic performance), although at the expense of a moderate increase in complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.
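A rough sketch of the layered idea, under stated assumptions: an upper layer of independent random-walk Metropolis chains moves the proposal means, and a lower layer draws samples around those means and weights them with a deterministic mixture denominator. The bimodal target, the Gaussian proposals, the step sizes, and the function `layered_ais` are all hypothetical choices, and the paper describes several variants that this sketch does not cover.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def target_unnorm(x):
    # Hypothetical bimodal target; true Z = 2*sqrt(2*pi), true mean = 0.
    return math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)

def layered_ais(starts, n_steps, n_per_mean, step_sd=1.0, prop_sd=1.5, seed=0):
    rng = random.Random(seed)

    # Upper layer: independent random-walk Metropolis chains targeting the
    # (unnormalized) target; every visited state becomes a proposal mean.
    means = []
    for mu in starts:
        for _ in range(n_steps):
            cand = rng.gauss(mu, step_sd)
            if rng.random() < target_unnorm(cand) / target_unnorm(mu):
                mu = cand
            means.append(mu)

    # Lower layer: sample around each mean and weight against the mixture
    # of all proposals (deterministic mixture over every chain and step).
    samples, weights = [], []
    for mu in means:
        for _ in range(n_per_mean):
            x = rng.gauss(mu, prop_sd)
            mixture = sum(normal_pdf(x, m, prop_sd) for m in means) / len(means)
            samples.append(x)
            weights.append(target_unnorm(x) / mixture)

    z_hat = sum(weights) / len(weights)
    mean_hat = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return z_hat, mean_hat
```

The adaptation here never looks at the importance weights: the chains alone decide where the proposals sit, so a few extreme weights cannot destabilize the proposal locations. The cost is the mixture denominator, which grows with the total number of means.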

Computation Machine Learning