Do you want to publish a course? Click here

Improving genetic risk prediction by leveraging pleiotropy

164   0   0.0 ( 0 )
 Added by Cong Li
 Publication date 2013
  fields Biology
and research's language is English




Ask ChatGPT about the research

An important task of human genetics studies is to accurately predict disease risks in individuals based on genetic markers, which allows for identifying individuals at high disease risks, and facilitating their disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size due to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size and hence improve the prediction accuracy is to integrate data of genetically correlated phenotypes. Yet the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders (BARD) and schizophrenia (SZ) with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the AUC (area under the receiver operating characteristic curve). We also found similar prediction accuracy improvements when we jointly analyzed GWAS data for Crohns disease (CD) and ulcerative colitis (UC). The empirical observations were substantiated through our comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated pleiotropy as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.



rate research

Read More

Much evolutionary information is stored in the fluctuations of protein length distributions. The genome size and non-coding DNA content can be calculated based only on the protein length distributions. So there is intrinsic relationship between the coding DNA size and non-coding DNA size. According to the correlations and quasi-periodicity of protein length distributions, we can classify life into three domains. Strong evidences are found to support the order in the structures of protein length distributions.
The phenotypic consequences of individual mutations are modulated by the wild type genetic background in which they occur.Although such background dependence is widely observed, we do not know whether general patterns across species and traits exist, nor about the mechanisms underlying it. We also lack knowledge on how mutations interact with genetic background to influence gene expression, and how this in turn mediates mutant phenotypes. Furthermore, how genetic background influences patterns of epistasis remains unclear. To investigate the genetic basis and genomic consequences of genetic background dependence of the scallopedE3 allele on the Drosophila melanogaster wing, we generated multiple novel genome level datasets from a mapping by introgression experiment and a tagged RNA gene expression dataset. In addition we used whole genome re-sequencing of the parental lines two commonly used laboratory strains to predict polymorphic transcription factor binding sites for SD. We integrated these data with previously published genomic datasets from expression microarrays and a modifier mutation screen. By searching for genes showing a congruent signal across multiple datasets, we were able to identify a robust set of candidate loci contributing to the background dependent effects of mutations in sd. We also show that the majority of background-dependent modifiers previously reported are caused by higher-order epistasis, not quantitative non-complementation. These findings provide a useful foundation for more detailed investigations of genetic background dependence in this system, and this approach is likely to prove useful in exploring the genetic basis of other traits as well.
144 - Miloje M. Rakocevic 2007
In this paper it is shown that within a Combined Genetic Code Table, realized through a combination of Watson-Crick Table and Codon Path Cube it exists, without an exception, a strict distinction between two classes of enzymes aminoacyl-tRNA synthetases, corresponding two classes of amino acids and belonging codons. By this, the distinction itself is followed by a strict balance of atom number within two subclasses of class I as well as two subclasses of class II of amino acids.
Enhancer-promoter interactions (EPIs) regulate the expression of specific genes in cells, and EPIs are important for understanding gene regulation, cell differentiation and disease mechanisms. EPI identification through the wet experiments is costly and time-consuming, and computational methods are in demand. In this paper, we propose a deep neural network-based method EPIHC based on sequence-derived features and genomic features for the EPI prediction. EPIHC extracts features from enhancer and promoter sequences respectively using convolutional neural networks (CNN), and then design a communicative learning module to captures the communicative information between enhancer and promoter sequences. EPIHC also take the genomic features of enhancers and promoters into account. At last, EPIHC combines sequence-derived features and genomic features to predict EPIs. The computational experiments show that EPIHC outperforms the existing state-of-the-art EPI prediction methods on the benchmark datasets and chromosome-split datasets, and the study reveal that the communicative learning module can bring explicit information about EPIs, which is ignore by CNN. Moreover, we consider two strategies to improve performances of EPIHC in the cross-cell line prediction, and experimental results show that EPIHC constructed on training cell lines exhibit improved performances for the other cell lines.
Several well-established benchmark predictors exist for Value-at-Risk (VaR), a major instrument for financial risk management. Hybrid methods combining AR-GARCH filtering with skewed-$t$ residuals and the extreme value theory-based approach are particularly recommended. This study introduces yet another VaR predictor, G-VaR, which follows a novel methodology. Inspired by the recent mathematical theory of sublinear expectation, G-VaR is built upon the concept of model uncertainty, which in the present case signifies that the inherent volatility of financial returns cannot be characterized by a single distribution but rather by infinitely many statistical distributions. By considering the worst scenario among these potential distributions, the G-VaR predictor is precisely identified. Extensive experiments on both the NASDAQ Composite Index and S&P500 Index demonstrate the excellent performance of the G-VaR predictor, which is superior to most existing benchmark VaR predictors.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا