A Penalized Multi-trait Mixed Model for Association Mapping in Pedigree-based GWAS

384 0 0.0 ( 0 )

Download Cite

Added by Jin Liu Jin Liu

Publication date 2013

fields Mathematical Statistics

and research's language is English

Authors Jin Liu - Can Yang - Xingjie Shi

Methodology

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In genome-wide association studies (GWAS), penalization is an important approach for identifying genetic markers associated with trait while mixed model is successful in accounting for a complicated dependence structure among samples. Therefore, penalized linear mixed model is a tool that combines the advantages of penalization approach and linear mixed model. In this study, a GWAS with multiple highly correlated traits is analyzed. For GWAS with multiple quantitative traits that are highly correlated, the analysis using traits marginally inevitably lose some essential information among multiple traits. We propose a penalized-MTMM, a penalized multivariate linear mixed model that allows both the within-trait and between-trait variance components simultaneously for multiple traits. The proposed penalized-MTMM estimates variance components using an AI-REML method and conducts variable selection and point estimation simultaneously using group MCP and sparse group MCP. Best linear unbiased predictor (BLUP) is used to find predictive values and the Pearsons correlations between predictive values and their corresponding observations are used to evaluate prediction performance. Both prediction and selection performance of the proposed approach and its comparison with the uni-trait penalized-LMM are evaluated through simulation studies. We apply the proposed approach to a GWAS data from Genetic Analysis Workshop (GAW) 18.

rate research

Multi-Model Penalized Regression

109 - Laura J. Wendelberger , Brian J. Reich , Alyson G. Wilson 2020

Model fitting often aims to fit a single model, assuming that the imposed form of the model is correct. However, there may be multiple possible underlying explanatory patterns in a set of predictors that could explain a response. Model selection without regarding model uncertainty can fail to bring these patterns to light. We present multi-model penalized regression (MMPR) to acknowledge model uncertainty in the context of penalized regression. In the penalty form explored here, we examine how different settings can promote either shrinkage or sparsity of coefficients in separate models. The method is tuned to explicitly limit model similarity. A choice of penalty form that enforces variable selection is applied to predict stacking fault energy (SFE) from steel alloy composition. The aim is to identify multiple models with different subsets of covariates that explain a single type of response.

Methodology Machine Learning

LMM-Lasso: A Lasso Multi-Marker Mixed Model for Association Mapping with Population Structure Correction

660 - Barbara Rakitsch , Christoph Lippert , Oliver Stegle 2012

Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In simple cases, single polymorphic loci explain a significant fraction of the phenotype variability. However, many traits of interest appear to be subject to multifactorial control by groups of genetic loci instead. Accurate detection of such multivariate associations is nontrivial and often hindered by limited power. At the same time, confounding influences such as population structure cause spurious association signals that result in false positive findings if they are not accounted for in the model. Here, we propose LMM-Lasso, a mixed model that allows for both, multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters, effectively controls for population structure and scales to genome-wide datasets. We show practical use in genome-wide association studies and linkage mapping through retrospective analyses. In data from Arabidopsis thaliana and mouse, our method is able to find a genetic cause for significantly greater fractions of phenotype variation in 91% of the phenotypes considered. At the same time, our model dissects this variability into components that result from individual SNP effects and population structure. In addition to this increase of genetic heritability, enrichment of known candidate genes suggests that the associations retrieved by LMM-Lasso are more likely to be genuine.

Populations and Evolution Genomics Quantitative Methods

Model-based clustering of Gaussian copulas for mixed data

351 - Matthieu Marbac , Christophe Biernacki , 2014

Clustering task of mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate variables. Indeed, considering a mixing of continuous, integer and ordinal variables (thus all having a cumulative distribution function), this copula mixture model defines intra-component dependencies similar to a Gaussian mixture, so with classical correlation meaning. Simultaneously, it preserves standard margins associated to continuous, integer and ordered features, namely the Gaussian, the Poisson and the ordered multinomial distributions. As an interesting by-product, the proposed mixture model generalizes many well-known ones and also provides tools of visualization based on the parameters. At a practical level, the Bayesian inference is retained and it is achieved with a Metropolis-within-Gibbs sampler. Experiments on simulated and real data sets finally illustrate the expected advantages of the proposed model for mixed data: flexible and meaningful parametrization combined with visualization features.

Methodology

Testing for Association in Multi-View Network Data

96 - Lucy L. Gao , Daniela Witten , Jacob Bien 2019

In this paper, we consider data consisting of multiple networks, each comprised of a different edge set on a common set of nodes. Many models have been proposed for the analysis of such multi-view network data under the assumption that the data views are closely related. In this paper, we provide tools for evaluating this assumption. In particular, we ask: given two networks that each follow a stochastic block model, is there an association between the latent community memberships of the nodes in the two networks? To answer this question, we extend the stochastic block model for a single network view to the two-view setting, and develop a new hypothesis test for the null hypothesis that the latent community memberships in the two data views are independent. We apply our test to protein-protein interaction data from the HINT database (Das and Hint, 2012). We find evidence of a weak association between the latent community memberships of proteins defined with respect to binary interaction data and the latent community memberships of proteins defined with respect to co-complex association data. We also extend this proposal to the setting of a network with node covariates.

Methodology Machine Learning

Fast selection of nonlinear mixed effect models using penalized likelihood

131 - Edouard Ollier 2021

Nonlinear Mixed effects models are hidden variables models that are widely used in many field such as pharmacometrics. In such models, the distribution characteristics of hidden variables can be specified by including several parameters such as covariates or correlations which must be selected. Recent development of pharmacogenomics has brought averaged/high dimensional problems to the field of nonlinear mixed effects modeling for which standard covariates selection techniques like stepwise methods are not well suited. This work proposes to select covariates and correlation parameters using a penalized likelihood approach. The penalized likelihood problem is solved using a stochastic proximal gradient algorithm to avoid inner-outer iterations. Speed of convergence of the proximal gradient algorithm is improved by the use of component-wise adaptive gradient step sizes. The practical implementation and tuning of the proximal gradient algorithm is explored using simulations. Calibration of regularization parameters is performed by minimizing the Bayesian Information Criterion using particle swarm optimization, a zero order optimization procedure. The use of warm restart and parallelization allows to reduce significantly computing time. The performance of the proposed method compared to the traditional grid search strategy is explored using simulated data. Finally, an application to real data from two pharmacokinetics studies is provided, one studying an antifibrinolitic and the other studying an antibiotic.

Methodology Computation