Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow aggregation of weak signals within a set, can capture interplay among variants, and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger data sets are used to increase power. Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set-association signal and one to capture confounders. We also introduce a computational speedup for two-random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and the score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured GAW14 data demonstrates that our method successfully corrects for population structure and family relatedness, while application to a 15,000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com
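The core of the two-random-effects model above is a Gaussian likelihood whose covariance mixes a kernel built from the tested SNP set with a background kernel that absorbs relatedness and population structure. The following is a minimal sketch of that likelihood (not the paper's implementation, and without its computational speedup); the function name and the grid-free parameterization are illustrative assumptions.

```python
import numpy as np

def lmm_two_kernel_loglik(y, K_set, K_bg, sigma_set, sigma_bg, sigma_e):
    """Log-likelihood of y ~ N(0, sigma_set*K_set + sigma_bg*K_bg + sigma_e*I).

    K_set: kernel from the variant set under test (captures set signal).
    K_bg:  kernel from genome-wide variants (captures confounders).
    """
    n = len(y)
    V = sigma_set * K_set + sigma_bg * K_bg + sigma_e * np.eye(n)
    _, logdet = np.linalg.slogdet(V)
    alpha = np.linalg.solve(V, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ alpha)

# Toy example: a likelihood ratio test would maximize this over the
# variance parameters and compare against the null fit with sigma_set = 0.
rng = np.random.default_rng(0)
n = 100
K_bg = (lambda G: G @ G.T / G.shape[1])(rng.standard_normal((n, 50)))
K_set = (lambda G: G @ G.T / G.shape[1])(rng.standard_normal((n, 5)))
y = rng.standard_normal(n)
ll_null = lmm_two_kernel_loglik(y, K_set, K_bg, 0.0, 0.5, 0.5)
```

In practice the variance parameters are fit by (restricted) maximum likelihood, and the LRT statistic is twice the log-likelihood gap between the full and null fits.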
Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limiting assumption of LMMs is that the residuals are Gaussian distributed, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and losses in power, and hence it is common practice to pre-process the phenotypic values to make them Gaussian, for instance by applying logarithmic or other non-linear transformations. Unfortunately, different phenotypes require different specific transformations, and choosing a good transformation is in general challenging and subjective. Here, we present an extension of the LMM that estimates an optimal transformation from the observed data. In extensive simulations and applications to real data from human, mouse and yeast, we show that using such optimal transformations leads to increased power in genome-wide association studies and to higher accuracy in heritability estimates and phenotype predictions.
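To make the idea of a data-driven transformation concrete, here is a hedged stand-in: the paper learns a flexible monotone warping jointly with the LMM, whereas this sketch uses the simpler Box-Cox family, picking the power parameter by profile maximum likelihood on a skewed synthetic "phenotype".

```python
import numpy as np
from scipy import stats

# Skewed synthetic phenotype (log-normal), standing in for a raw trait
# whose residuals would violate the LMM's Gaussian assumption.
rng = np.random.default_rng(1)
y = rng.lognormal(size=500)

# Box-Cox chooses lambda by maximizing the profile log-likelihood;
# the learned transformation makes the data far closer to Gaussian.
y_transformed, lam = stats.boxcox(y)
print(f"estimated lambda = {lam:.3f}")
print(f"skewness: raw = {stats.skew(y):.2f}, "
      f"transformed = {stats.skew(y_transformed):.2f}")
```

The same principle, estimating the transformation from the data rather than fixing it a priori, is what the abstract's extension builds into the LMM itself, with a richer transformation family.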
Historically, the majority of statistical association methods have been designed assuming the availability of SNP-level information. However, modern genetic and sequencing data present new challenges to the access and sharing of genotype-phenotype datasets, including cost management, difficulties in consolidating records across research groups, etc. These issues make methods based on SNP-level summary statistics for a joint analysis of variants in a group particularly appealing. The most common form of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating scores prior to their addition, resulting in remarkable power gains in situations that are most commonly encountered in practice; namely, under heterogeneity of effect sizes and diversity of pairwise linkage disequilibrium (LD). In these situations, the power of the traditional test, based on the added squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches, and reveal the causes behind the sometimes dramatic differences in their respective powers. We showcase DOT by analyzing breast cancer data, in which our method strengthened levels of previously reported associations and suggested the possibility of multiple new alleles that jointly confer breast cancer risk.
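The decorrelation step is straightforward to sketch: given a vector of SNP-level z-scores and their LD (correlation) matrix, an eigen-decomposition whitens the scores so that their sum of squares is chi-square under the null. This is a minimal illustration of the idea, not the DOT authors' code; function names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def dot_statistic(z, ld):
    """Decorrelate z-scores via the LD eigen-decomposition, then sum squares.

    With y = Lambda^{-1/2} E^T z, Cov(y) = I under the null, so the
    statistic y'y is chi-square with len(z) degrees of freedom.
    """
    vals, vecs = np.linalg.eigh(ld)
    y = (vecs.T @ z) / np.sqrt(vals)
    stat = float(y @ y)
    return stat, chi2.sf(stat, df=len(z))

# Toy example: correlated null z-scores drawn with the LD structure.
rng = np.random.default_rng(0)
ld = np.array([[1.0, 0.5],
               [0.5, 1.0]])
z = np.linalg.cholesky(ld) @ rng.standard_normal(2)
stat, pval = dot_statistic(z, ld)
```

By contrast, the traditional added-squares statistic z'z has a null distribution that is a weighted mixture of chi-squares (weights equal to the LD eigenvalues), which is what limits its power as correlated variants accumulate.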
In this paper we show that within a Combined Genetic Code Table, realized through a combination of the Watson-Crick Table and the Codon Path Cube, there exists, without exception, a strict distinction between the two classes of aminoacyl-tRNA synthetase enzymes, the two corresponding classes of amino acids, and their associated codons. Moreover, this distinction is accompanied by a strict balance of atom numbers within the two subclasses of class I amino acids, as well as within the two subclasses of class II.
MPAgenomics, which stands for multi-patient analysis (MPA) of genomic markers, is an R package devoted to (i) efficient segmentation and (ii) genomic marker selection from multi-patient copy-number and SNP data profiles. It provides wrappers around commonly used packages to facilitate their repeated (and sometimes difficult) use, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is based on a new automatic choice of influential parameters, since the defaults in the original packages were misleading. By considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given response.
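The marker-selection step relies on penalized regression of a response on many markers at once. As a language-neutral illustration of that idea (MPAgenomics itself is an R package wrapping efficient backends; this is a toy lasso via coordinate descent, not its actual method), consider:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Toy lasso by cyclic coordinate descent with soft-thresholding.

    Shrinks most coefficients exactly to zero, so the surviving nonzero
    entries of beta act as the selected markers.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding marker j's current contribution.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

# Toy data: only marker 0 truly drives the response.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 5))
y = 5.0 * X[:, 0] + 0.1 * rng.standard_normal(50)
beta = lasso_cd(X, y, lam=0.5)
```

Larger values of `lam` select fewer markers; in a multi-patient setting the same principle is applied to segmented copy-number profiles stacked across patients.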
We study randomized test-and-set (TAS) implementations from registers in the asynchronous shared memory model with n processes. We introduce the problem of group election, a natural variant of leader election, and propose a framework for the implementation of TAS objects from group election objects. We then present two group election algorithms, each yielding an efficient TAS implementation. The first implementation has expected max-step complexity $O(\log^\ast k)$ in the location-oblivious adversary model, and the second has expected max-step complexity $O(\log\log k)$ against any read/write-oblivious adversary, where $k \leq n$ is the contention. These algorithms improve the previous upper bound by Alistarh and Aspnes [2] of $O(\log\log n)$ expected max-step complexity in the oblivious adversary model. We also propose a modification to a TAS algorithm by Alistarh, Attiya, Gilbert, Giurgiu, and Guerraoui [5] for the strong adaptive adversary, which improves its space complexity from super-linear to linear, while maintaining its $O(\log n)$ expected max-step complexity. We then describe how this algorithm can be combined with any randomized TAS algorithm that has expected max-step complexity $T(n)$ in a weaker adversary model, so that the resulting algorithm has $O(\log n)$ expected max-step complexity against any strong adaptive adversary and $O(T(n))$ in the weaker adversary model. Finally, we prove that for any randomized 2-process TAS algorithm, there exists a schedule determined by an oblivious adversary such that with probability at least $(1/4)^t$ one of the processes needs at least $t$ steps to finish its TAS operation. This complements a lower bound by Attiya and Censor-Hillel [7] on a similar problem for $n \geq 3$ processes.
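For readers unfamiliar with the object being implemented: a test-and-set object grants exactly one caller a "winner" response, all others lose. The sketch below illustrates only these semantics; it uses a lock internally, so it is emphatically not a register-based implementation of the kind the paper constructs.

```python
import threading

class TestAndSet:
    """One-shot test-and-set object: tas() returns True to exactly one
    caller (the winner) and False to every other caller, even under
    concurrent invocation. (Semantics only; the paper implements this
    from read/write registers, not from a lock.)"""

    def __init__(self):
        self._lock = threading.Lock()
        self._set = False

    def tas(self):
        with self._lock:
            if self._set:
                return False
            self._set = True
            return True

# Usage: many threads race; exactly one observes True.
tas_obj = TestAndSet()
results = []
guard = threading.Lock()

def worker():
    won = tas_obj.tas()
    with guard:
        results.append(won)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The paper's contribution is achieving this behavior from plain read/write registers with small expected max-step complexity against various adversary models, which the lock-based sketch sidesteps entirely.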