Detecting weak signals by combining small P-values in genetic association studies


Added by Olga Vsevolozhskaya

Publication date: 2018

Fields: Mathematical Statistics

Language: English

Authors: Olga A. Vsevolozhskaya, Fengjiao Hu, Dmitri V. Zaykin

Methodology


We approach the problem of combining top-ranking association statistics or P-values from a new perspective, which leads to a remarkably simple and powerful method. Statistical methods such as the Rank Truncated Product (RTP) have been developed for combining top-ranking associations, and this general strategy has proved useful in applications for detecting combined effects of multiple disease components. To increase power, these methods aggregate signals across top-ranking SNPs while adjusting for the total number of SNPs assessed in a study. Analytic expressions for combined top statistics or P-values tend to be unwieldy, which complicates interpretation and practical implementation and hinders further development. Here, we propose the Augmented Rank Truncation (ART) method, which retains the main characteristics of the RTP but is substantially simpler to implement. ART leads to an efficient form of the adaptive algorithm, an approach in which the number of top-ranking SNPs is varied to optimize power. We illustrate our methods by strengthening previously reported associations of $\mu$-opioid receptor variants with sensitivity to pain.
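To make the idea of combining top-ranking P-values concrete, the following is a minimal sketch of the Rank Truncated Product mentioned in the abstract, with its null distribution approximated by Monte Carlo simulation. It is not the ART method (the abstract does not give its formula), and it assumes independent P-values that are uniform under the null; the function names `rtp_statistic` and `rtp_pvalue` and the parameters `n_sim` and `seed` are illustrative choices, not taken from the paper.

```python
import math
import random


def rtp_statistic(pvalues, k):
    """Log of the product of the k smallest P-values."""
    smallest = sorted(pvalues)[:k]
    return sum(math.log(p) for p in smallest)


def rtp_pvalue(pvalues, k, n_sim=10000, seed=0):
    """Monte Carlo P-value for the RTP statistic.

    Assumption: the P-values are independent and uniform under the null.
    """
    rng = random.Random(seed)
    n_tests = len(pvalues)
    observed = rtp_statistic(pvalues, k)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null_p = [1.0 - rng.random() for _ in range(n_tests)]
        # Smaller products are more extreme.
        if rtp_statistic(null_p, k) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)
```

For example, `rtp_pvalue([1e-4, 2e-4, 0.3, 0.6, 0.9], k=2)` combines the two smallest of five P-values while accounting for all five tests. Correlated SNPs would violate the independence assumption made here and would need a different null simulation.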


Combining parameter values or $p$-values

Louis Lyons, Emilien Chapon 2017

We review the methods to combine several measurements, in the form of parameter values or $p$-values.

Data Analysis Statistics and Probability High Energy Physics - Experiment

On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs

Ruth M. Pfeiffer, Mitchell H. Gail, David Pee 2010

Combining data from several case-control genome-wide association (GWA) studies can yield greater efficiency for detecting associations of disease with single nucleotide polymorphisms (SNPs) than separate analyses of the component studies. We compared several procedures to combine GWA study data both in terms of the power to detect a disease-associated SNP while controlling the genome-wide significance level, and in terms of the detection probability ($\mathit{DP}$). The $\mathit{DP}$ is the probability that a particular disease-associated SNP will be among the $T$ most promising SNPs selected on the basis of low $p$-values. We studied both fixed effects and random effects models in which associations varied across studies. In settings of practical relevance, meta-analytic approaches that focus on a single degree of freedom had higher power and $\mathit{DP}$ than global tests such as summing chi-square test statistics across studies, Fisher's combination of $p$-values, and forming a combined list of the best SNPs from within each study.

Methodology

Variable Prioritization in Nonlinear Black Box Methods: A Genetic Association Case Study

Lorin Crawford, Seth R. Flaxman, Daniel E. Runcie 2018

The central aim in this paper is to address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel and interpretable way to summarize the relative importance of predictor variables. Methodologically, we develop the RelATive cEntrality (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other black box methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and two real data association mapping studies, we show that applying RATE enables an explanation for this improved performance.

Methodology Quantitative Methods Applications

Combining independent p-values in replicability analysis: A comparative study

Anh-Tuan Hoang, Thorsten Dickhaus 2021

Given a family of null hypotheses $H_{1},\ldots,H_{s}$, we are interested in the hypothesis $H_{s}^{\gamma}$ that at most $\gamma-1$ of these null hypotheses are false. Assuming that the corresponding $p$-values are independent, we investigate combined $p$-values that are valid for testing $H_{s}^{\gamma}$. In various settings in which $H_{s}^{\gamma}$ is false, we determine which combined $p$-value works well in which setting. Via simulations, we find that the Stouffer method works well if the null $p$-values are uniformly distributed and the signal strength is low, and the Fisher method works better if the null $p$-values are conservative, i.e. stochastically larger than the uniform distribution. The minimum method works well if the evidence for the rejection of $H_{s}^{\gamma}$ is focused on only a few non-null $p$-values, especially if the null $p$-values are conservative. Methods that incorporate the combination of $e$-values work well if the null hypotheses $H_{1},\ldots,H_{s}$ are simple.
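The three classical combination rules named in this abstract can be sketched as follows. This is a minimal illustration of the textbook global-null versions (the case $\gamma = 1$) for independent $p$-values, not the partial-conjunction adaptations studied in the paper; the function names are hypothetical, and the inputs are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined(pvalues):
    """Fisher: -2 * sum(log p) is chi-square with 2n df under the null."""
    n = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combined(pvalues):
    """Stouffer: the sum of normal scores divided by sqrt(n) is N(0, 1)."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvalues) / math.sqrt(len(pvalues))
    return 1.0 - nd.cdf(z)


def minimum_combined(pvalues):
    """Minimum method: 1 - (1 - min p)^n for n independent p-values."""
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)
```

With a single $p$-value, all three rules return that value unchanged, which is a quick sanity check. The rules differ in how they weight a few very small $p$-values against many moderate ones, which is the trade-off the abstract describes.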

Applications

Variable Selection with Second-Generation P-Values

Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume 2020

Many statistical methods have been proposed for variable selection in the past century, but few balance inference and prediction tasks well. Here we report on a novel variable selection approach called Penalized regression with Second-Generation P-Values (ProSGPV). It captures the true model at the best rate achieved by current standards, is easy to implement in practice, and often yields the smallest parameter estimation error. The idea is to use an $\ell_0$ penalization scheme with second-generation p-values (SGPV), instead of traditional ones, to determine which variables remain in a model. The approach yields tangible advantages for balancing support recovery, parameter estimation, and prediction tasks. The ProSGPV algorithm can maintain its good performance even when there is strong collinearity among features or when a high-dimensional feature space with $p > n$ is considered. We present extensive simulations and a real-world application comparing the ProSGPV approach with smoothly clipped absolute deviation (SCAD), adaptive lasso (AL), and mini-max concave penalty with penalized linear unbiased selection (MC+). While the last three algorithms are among the current standards for variable selection, ProSGPV has superior inference performance and comparable prediction performance in certain scenarios. Supplementary materials are available online.

Methodology