The phenotypic consequences of individual mutations are modulated by the wild-type genetic background in which they occur. Although such background dependence is widely observed, we do not know whether general patterns exist across species and traits, nor do we understand the mechanisms underlying it. We also lack knowledge of how mutations interact with genetic background to influence gene expression, and how this in turn mediates mutant phenotypes. Furthermore, how genetic background influences patterns of epistasis remains unclear. To investigate the genetic basis and genomic consequences of genetic background dependence of the scallopedE3 allele on the Drosophila melanogaster wing, we generated multiple novel genome-level datasets from a mapping-by-introgression experiment and a tagged RNA gene expression dataset. In addition, we used whole-genome re-sequencing of the parental lines, two commonly used laboratory strains, to predict polymorphic transcription factor binding sites for SD. We integrated these data with previously published genomic datasets from expression microarrays and a modifier mutation screen. By searching for genes showing a congruent signal across multiple datasets, we were able to identify a robust set of candidate loci contributing to the background-dependent effects of mutations in sd. We also show that the majority of background-dependent modifiers previously reported are caused by higher-order epistasis, not quantitative non-complementation. These findings provide a useful foundation for more detailed investigations of genetic background dependence in this system, and this approach is likely to prove useful in exploring the genetic basis of other traits as well.
The premise of genetic analysis is that a causal link exists between phenotypic and allelic variation. Yet it has long been documented that mutant phenotypes are not a simple result of a single DNA lesion, but rather are due to interactions of the focal allele with other genes and the environment. Although an experimentally rigorous approach, focusing on individual mutations and isogenic control strains, has facilitated amazing progress within genetics and related fields, a glimpse back suggests that a vast complexity has been omitted from our current understanding of allelic effects. Armed with traditional genetic analyses and the foundational knowledge they have provided, we argue that the time and tools are ripe to return to the under-explored aspects of gene function and embrace the context-dependent nature of genetic effects. We assert that a broad understanding of genetic effects and the evolutionary dynamics of alleles requires identifying how mutational outcomes depend upon the wild-type genetic background. Furthermore, we discuss how best to exploit genetic background effects to broaden genetic research programs.
RNA-Seq technology allows the transcriptional state of the cell to be studied at an unprecedented level of detail. Beyond quantification of whole-gene expression, it is now possible to disentangle the abundance of individual alternatively spliced transcript isoforms of a gene. A central challenge is to understand the regulatory processes that lead to variation in relative isoform abundance due to external and genetic factors. Here, we present a mixed model approach that allows for (i) joint analysis and genetic mapping of multiple transcript isoforms and (ii) mapping of isoform-specific effects. Central to our approach is a comprehensive model of the causes of variation and correlation between transcript isoforms, including the genomic background and technical quantification uncertainty. As a result, our method makes it possible to test accurately for shared as well as transcript-specific genetic regulation of transcript isoforms, and it achieves substantially improved calibration of these statistical tests. Experiments on genotype and RNA-Seq data from 126 human HapMap individuals demonstrate that our model can help to obtain a more fine-grained picture of the genetic basis of gene expression variation.
Much evolutionary information is stored in the fluctuations of protein length distributions. The genome size and non-coding DNA content can be calculated based only on the protein length distributions. Thus, there is an intrinsic relationship between the coding DNA size and the non-coding DNA size. According to the correlations and quasi-periodicity of protein length distributions, we can classify life into three domains. Strong evidence is found to support the existence of order in the structure of protein length distributions.
An important task of human genetics studies is to accurately predict disease risk in individuals based on genetic markers, which allows individuals at high disease risk to be identified and facilitates disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size due to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size and hence improve the prediction accuracy is to integrate data of genetically correlated phenotypes. Yet the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders (BARD) and schizophrenia (SZ) with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the AUC (area under the receiver operating characteristic curve). We also found similar prediction accuracy improvements when we jointly analyzed GWAS data for Crohn's disease (CD) and ulcerative colitis (UC). The empirical observations were substantiated through our comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated pleiotropy as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
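To make the accuracy measure named above concrete, the following is a minimal sketch of how an AUC can be computed from predicted risk scores and case/control labels. It uses the standard pairwise (Mann-Whitney) formulation: the AUC is the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions and are not taken from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk scores (higher means higher predicted risk).
    labels: 1 for cases, 0 for controls (hypothetical encoding).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # ties count as half a win
    return wins / (len(cases) * len(controls))


# Toy example with made-up scores: three of the four case/control
# pairs are ordered correctly.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in the sample size, so for cohorts of GWAS scale a rank-based implementation would be preferable; the sketch only illustrates the definition.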
The Kolmogorov-Arnold stochasticity parameter technique is applied for the first time to cancer genome sequencing data in order to reveal mutations. Using data generated by next-generation sequencing technologies, we have analyzed the exome sequences of brain tumor patients with matched tumor and normal blood samples. We show that mutations contained in sequencing data can be revealed using this technique, thus providing a new methodology for identifying subsequences of a given length that contain mutations: the value of the parameter for such subsequences differs from that for subsequences without mutations. Potential applications of this technique include simplifying the procedure of finding segments with mutations, speeding up genomic research, and accelerating its implementation in clinical diagnostics. Moreover, the prediction of a mutation associated with a family of frequent mutations in numerous types of cancer, based purely on the value of the Kolmogorov function, indicates that this marker may recognize genomic sequences that are present in extremely low abundance and can be used to reveal new types of mutations.
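As an illustration of the quantity involved, the sketch below computes the Kolmogorov stochasticity parameter, lambda_n = sqrt(n) * sup_x |F_n(x) - F(x)|, for a numeric sample against a reference distribution, together with Kolmogorov's limiting distribution function Phi(lambda). How nucleotide subsequences are mapped to numbers, and which reference distribution F is used, are not specified in the text above; the uniform reference distribution, the function names, and the toy sample here are assumptions made purely for illustration.

```python
import math


def stochasticity_parameter(values, cdf):
    """Return lambda_n = sqrt(n) * sup_x |F_n(x) - F(x)|.

    values: numeric sample (the encoding of sequence data is left to the caller).
    cdf: theoretical cumulative distribution function F (an assumption here).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    sup = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the
        # supremum is attained just before or at one of these jumps.
        sup = max(sup, abs((i + 1) / n - f), abs(f - i / n))
    return math.sqrt(n) * sup


def kolmogorov_cdf(lam):
    """Kolmogorov's function Phi(lam) = sum_k (-1)^k exp(-2 k^2 lam^2)."""
    if lam < 0.2:
        # Phi is negligibly small here (well below 1e-10) and the
        # alternating series converges slowly, so return 0 directly.
        return 0.0
    total = 0.0
    k = 1
    while True:
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
        k += 1
    return min(1.0, max(0.0, 1.0 - 2.0 * total))


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (assumed reference)."""
    return min(1.0, max(0.0, x))


# Toy sample: values crowded into the lower part of [0, 1].
sample = [0.1, 0.2, 0.3, 0.4]
lam = stochasticity_parameter(sample, uniform_cdf)
print(lam, kolmogorov_cdf(lam))
```

Under this construction, values of Phi(lambda_n) close to 0 or 1 indicate that the sample is unlikely to be a random draw from the reference distribution, while intermediate values are consistent with randomness. Evaluating the parameter window by window along a sequence is one way such a statistic could flag atypical segments.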