Combining independent p-values in replicability analysis: A comparative study

Added by Thorsten Dickhaus

Publication date 2021

fields Mathematical Statistics

The research language is English

Authors Anh-Tuan Hoang - Thorsten Dickhaus

Applications

Given a family of null hypotheses $H_{1},\ldots,H_{s}$, we are interested in the hypothesis $H_{s}^{\gamma}$ that at most $\gamma-1$ of these null hypotheses are false. Assuming that the corresponding $p$-values are independent, we are investigating combined $p$-values that are valid for testing $H_{s}^{\gamma}$. In various settings in which $H_{s}^{\gamma}$ is false, we determine which combined $p$-value works well in which setting. Via simulations, we find that the Stouffer method works well if the null $p$-values are uniformly distributed and the signal strength is low, and the Fisher method works better if the null $p$-values are conservative, i.e., stochastically larger than the uniform distribution. The minimum method works well if the evidence for the rejection of $H_{s}^{\gamma}$ is focused on only a few non-null $p$-values, especially if the null $p$-values are conservative. Methods that incorporate the combination of $e$-values work well if the null hypotheses $H_{1},\ldots,H_{s}$ are simple.
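As a rough illustration of what a combined $p$-value for $H_{s}^{\gamma}$ can look like, the sketch below applies the Fisher, Stouffer, and minimum combinations to the $s-\gamma+1$ largest $p$-values. That construction is an assumption borrowed from the standard partial conjunction approach; the abstract does not state the exact formulas the authors compare, and in particular the Bonferroni-type form of the minimum method used here is only one possible variant. The function names are hypothetical, and all $p$-values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def _largest(pvalues, gamma):
    """Return the s - gamma + 1 largest p-values in ascending order."""
    s = len(pvalues)
    if not 1 <= gamma <= s:
        raise ValueError("gamma must satisfy 1 <= gamma <= s")
    return sorted(pvalues)[gamma - 1:]


def fisher_pc(pvalues, gamma):
    """Fisher combination of the s - gamma + 1 largest p-values."""
    tail = _largest(pvalues, gamma)
    k = len(tail)
    half = -sum(math.log(p) for p in tail)  # half of the chi-square statistic
    # Survival function of a chi-square with 2k degrees of freedom
    # (closed form for even degrees of freedom).
    term, total = 1.0, 0.0
    for j in range(k):
        total += term
        term *= half / (j + 1)
    return min(1.0, math.exp(-half) * total)


def stouffer_pc(pvalues, gamma):
    """Stouffer (inverse normal) combination of the largest p-values."""
    tail = _largest(pvalues, gamma)
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in tail) / math.sqrt(len(tail))
    return 1.0 - nd.cdf(z)


def minimum_pc(pvalues, gamma):
    """Bonferroni-type minimum method (assumed variant): (s - gamma + 1) * p_(gamma)."""
    tail = _largest(pvalues, gamma)
    return min(1.0, len(tail) * tail[0])


if __name__ == "__main__":
    p = [0.001, 0.004, 0.03, 0.2, 0.6]  # made-up p-values for illustration
    for gamma in (1, 2, 3):
        print(gamma, fisher_pc(p, gamma), stouffer_pc(p, gamma), minimum_pc(p, gamma))
```

For $\gamma = s$ all three functions reduce to the largest $p$-value, and for $\gamma = 1$ they reduce to the usual global-null combinations, which gives a quick sanity check on the sketch.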

Randomized p-values for multiple testing and their application in replicability analysis

Anh-Tuan Hoang, Thorsten Dickhaus 2019

We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional $p$-values, which are computed under least favourable parameter configurations, are over-conservative in the case of composite null hypotheses. As demonstrated in prior work, this poses severe challenges in the multiple testing context, especially when one goal of the statistical analysis is to estimate the proportion $\pi_0$ of true null hypotheses. Randomized $p$-values have been proposed to remedy this issue. In the present work, we discuss the application of randomized $p$-values in replicability analysis. In particular, we introduce a general class of statistical models for which valid, randomized $p$-values can be calculated easily. By means of computer simulations, we demonstrate that their usage typically leads to a much more accurate estimation of $\pi_0$. Finally, we apply our proposed methodology to a real data example from genomics.

Methodology

Combining parameter values or $p$-values

Louis Lyons, Emilien Chapon 2017

We review the methods to combine several measurements, in the form of parameter values or $p$-values.

Data Analysis Statistics and Probability High Energy Physics - Experiment

Accurate and Efficient Estimation of Small P-values with the Cross-Entropy Method: Applications in Genomic Data Analysis

Yang Shi, Mengqiao Wang, Weiping Shi 2018

Small $p$-values are often required to be accurately estimated in large scale genomic studies for the adjustment of multiple hypothesis tests and the ranking of genomic features based on their statistical significance. For those complicated test statistics whose cumulative distribution functions are analytically intractable, existing methods usually do not work well with small $p$-values due to lack of accuracy or computational restrictions. We propose a general approach for accurately and efficiently calculating small $p$-values for a broad range of complicated test statistics based on the principle of the cross-entropy method and Markov chain Monte Carlo sampling techniques. We evaluate the performance of the proposed algorithm through simulations and demonstrate its application to three real examples in genomic studies. The results show that our approach can accurately evaluate small to extremely small $p$-values (e.g. $10^{-6}$ to $10^{-100}$). The proposed algorithm is helpful to the improvement of existing test procedures and the development of new test procedures in genomic studies.

Applications

Combining independent, arbitrarily weighted P-values: a new solution to an old problem using a novel expansion with controllable accuracy

Gelio Alves, Yi-Kuo Yu 2010

Good's formula and Fisher's method are frequently used for combining independent P-values. Interestingly, the equivalent of Good's formula already emerged in 1910, and mathematical expressions relevant to even more general situations have been repeatedly derived, albeit in different contexts. We provide here a novel derivation and show how the analytic formula obtained reduces to the two aforementioned ones as special cases. The main novelty of this paper, however, is the explicit treatment of nearly degenerate weights, which are known to cause numerical instabilities. We derive a controlled expansion, in powers of differences in inverse weights, that provides both accurate statistics and stable numerics.

Statistics Theory Quantitative Methods Statistics Theory

Churn Prediction with Sequential Data and Deep Neural Networks. A Comparative Analysis

C. Gary Mena, Arno De Caigny, Kristof Coussement 2019

Off-the-shelf machine learning algorithms for prediction, such as regularized logistic regression, cannot exploit the information in time-varying features unless such sequential data are first aggregated. However, recurrent neural networks provide an alternative approach by which time-varying features can be readily used for modeling. This paper assesses the performance of neural networks for churn modeling using recency, frequency, and monetary value (RFM) data from a financial services provider. Results show that RFM variables in combination with LSTM neural networks have larger top-decile lift and expected maximum profit metrics than regularized logistic regression models with commonly used demographic variables. Moreover, we show that using the fitted probabilities from the LSTM as a feature in the logistic regression increases the out-of-sample performance of the latter by 25 percent compared to a model with only static features.

Applications Machine Learning Machine Learning
