Gene-gene interactions have long been recognized as fundamentally important for understanding the genetic causes of complex disease traits. At present, identifying gene-gene interactions from genome-wide case-control studies is computationally and methodologically challenging. In this paper, we introduce a simple but powerful method named "BOolean Operation-based Screening and Testing" (BOOST). To discover unknown gene-gene interactions that underlie complex diseases, BOOST allows all pairwise interactions in genome-wide case-control studies to be examined in a remarkably fast manner. We have carried out interaction analyses on seven data sets from the Wellcome Trust Case Control Consortium (WTCCC). Each analysis took less than 60 hours on a standard 3.0 GHz desktop with 4 GB of memory running Windows XP. The interaction patterns identified from the type 1 diabetes data set differ significantly from those identified from the rheumatoid arthritis data set, even though both data sets share a very similar hit region in the WTCCC report. BOOST has also identified many previously undiscovered interactions between genes in the major histocompatibility complex (MHC) region in the type 1 diabetes data set. In the coming era of large-scale interaction mapping in genome-wide case-control studies, our method can serve as a computationally and statistically useful tool.
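The abstract does not spell out how Boolean operations speed up the pairwise scan. One plausible reading, sketched below under stated assumptions, is that each SNP is stored as three bitmasks (one per genotype value 0, 1, 2), so that every cell of the 3x3x2 genotype-by-status contingency table for a SNP pair is obtained with a bitwise AND followed by a bit count. The function names `genotype_masks` and `pair_table`, and the use of Python integers as bit vectors, are illustrative choices and not taken from the paper.

```python
from itertools import product


def genotype_masks(genotypes):
    """Return three integer bitmasks, one per genotype value 0/1/2.

    Bit i of masks[g] is set when sample i carries genotype g.
    (Hypothetical helper; the encoding is an assumption.)
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def pair_table(snp_a, snp_b, is_case):
    """Build table[ga][gb] = [n_controls, n_cases] for one SNP pair.

    Each cell is filled with a bitwise AND and a population count,
    instead of looping over the samples cell by cell.
    """
    n = len(is_case)
    case_mask = sum(1 << i for i, c in enumerate(is_case) if c)
    control_mask = ((1 << n) - 1) ^ case_mask
    masks_a = genotype_masks(snp_a)
    masks_b = genotype_masks(snp_b)
    table = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for ga, gb in product(range(3), repeat=2):
        joint = masks_a[ga] & masks_b[gb]  # samples with genotypes (ga, gb)
        table[ga][gb][0] = bin(joint & control_mask).count("1")
        table[ga][gb][1] = bin(joint & case_mask).count("1")
    return table
```

The resulting table would then feed whatever interaction statistic is used for screening and testing; that step is not sketched here because the abstract gives no detail on it.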
Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has created the need for a central resource that facilitates i) performing GWAS, ii) accessing the data and results of other GWAS, and iii) enabling all users, regardless of their background, to exploit the latest statistical techniques without having to manage complex software and computing resources. Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce the results of others, validate findings, and access larger sample sizes by merging public datasets. Availability: Detailed method and data descriptions as well as tutorials are available in the supplementary materials. easyGWAS is available at http://easygwas.tuebingen.mpg.de/.
This dissertation studies associations between functional elements of the human genome and their nucleotide structure. Asymmetry in nucleotide content (skew, or bias) was chosen as the main feature of nucleotide structure. A significant difference in nucleotide content asymmetry was found between human exons and introns. Specifically, exon sequences display a purine bias (i.e., an excess of A and G over C and T), while introns exhibit a keto-amino skew (i.e., an excess of G and T over A and C). The extent of these biases depends on gene expression patterns. The highest intronic keto-amino skew is found in the introns of housekeeping genes. In introns, whose sequences are only weakly subject to repair, AT->GC and CG->TA substitutions accumulate preferentially. A comparative analysis of the gene sequences encoding cytochrome P450 2E1 in Homo sapiens and representative mammals was carried out, and a cladistic tree was constructed from the similarity of the coding sequences of the Cyp2e1 gene. New software tools for mining and analyzing NCBI database sequences were developed, resulting in the construction of the author's own database.
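The two asymmetry measures named in the dissertation abstract can be made concrete with a short sketch. The normalization used here (difference in counts divided by the total number of A, C, G and T) is an assumption; the abstract only states which bases are in excess, not the exact formula. The function name `nucleotide_skews` is hypothetical.

```python
def nucleotide_skews(sequence):
    """Return purine and keto-amino skews of a DNA sequence.

    purine skew     = ((A + G) - (C + T)) / (A + C + G + T)
    keto-amino skew = ((G + T) - (A + C)) / (A + C + G + T)

    The normalization by total base count is an assumed convention.
    Characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = sequence.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "purine": ((a + g) - (c + t)) / total,
        "keto_amino": ((g + t) - (a + c)) / total,
    }
```

Under this convention, a positive purine skew corresponds to the exon-like pattern described above, and a positive keto-amino skew to the intron-like pattern.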
Our understanding of how chromosomes structurally organize and dynamically interact has been revolutionized through the lens of long-chain polymer physics. Major protein contributors to chromosome structure and dynamics are condensin and cohesin, which stochastically generate loops within and between chains and entrap proximal strands of sister chromatids. In this paper, we explore the ability of transient, protein-mediated, gene-gene crosslinks to induce clusters of genes, and thereby dynamic architecture, within the highly repeated ribosomal DNA that comprises the nucleolus of budding yeast. We implement three approaches: live cell microscopy; computational modeling of the full genome during G1 in budding yeast, exploring four decades of timescales for transient crosslinks between 5 kbp domains in the nucleolus on Chromosome XII; and temporal network models with automated community detection algorithms applied to the full range of 4D modeling datasets. The data analysis tools detect and track gene clusters, along with their size, number, persistence time, and plasticity. Of biological significance, our analysis reveals an optimal mean crosslink lifetime that promotes pairwise and cluster gene interactions through flexible clustering. In this state, large gene clusters self-assemble yet frequently interact, marked by gene exchanges between clusters, which in turn maximizes global gene interactions in the nucleolus. This regime stands between two limiting cases, each with far fewer global gene interactions: with shorter crosslink lifetimes, rigid clustering emerges with clusters that interact infrequently; with longer crosslink lifetimes, the clusters dissolve. These observations are compared with imaging experiments on a normal yeast strain and two condensin-modified mutant cell strains, applying the same image analysis pipeline to the experimental and simulated datasets.
The analysis of differential gene expression from RNA-Seq data has become standard in several research areas, mainly those involving bioinformatics. The computational analysis of these data involves many data types and file formats, and a wide variety of computational tools that can be applied alone or combined into pipelines. This paper presents an organized review of the differential expression analysis pipeline, covering its steps and their respective objectives, and the principal methods available at each step together with their properties. In particular, the review focuses on the analysis of differentially expressed genes (DEGs) from RNA sequencing (RNA-Seq) data, considering the computational methods and their properties. In addition, a timeline of the evolution of computational methods for DEG analysis is presented and discussed, and the relationships between the main computational tools are presented as an interaction network. The challenges and gaps in DEG analysis are also discussed.
We report on a theoretical study of the effects of point mutations on charge transfer properties in the DNA sequence of the tumor-suppressor p53 gene. On the basis of effective single-strand or double-strand tight-binding models which simulate hole propagation along the DNA, a statistical analysis of the charge transmission modulations associated with all possible point mutations is performed. We find that, in contrast to non-cancerous mutations, mutation hotspots tend to result in significantly weaker changes of transmission properties. This suggests that charge transport could play a significant role in the DNA-repair deficiency that leads to carcinogenesis.
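To illustrate what a single-strand tight-binding transmission calculation can look like, the sketch below treats the sequence as a one-dimensional chain between two semi-infinite uniform leads and evaluates the transmission coefficient with the standard transfer-matrix expression for such a chain. Everything specific here is an assumption rather than the paper's stated setup: the onsite energies in `SITE_ENERGY` (illustrative ionization-potential-like values in eV), the uniform hopping, the lead onsite energy, and the function names `transmission` and `mutation_effect`.

```python
# Illustrative onsite energies in eV (assumed values, not from the abstract).
SITE_ENERGY = {"G": 7.75, "A": 8.24, "C": 8.87, "T": 9.14}


def transmission(seq, energy, hopping=1.0, lead_energy=7.75,
                 site_energy=SITE_ENERGY):
    """Transmission of a 1D tight-binding chain between uniform leads.

    The leads have onsite energy `lead_energy` and the same hopping as
    the chain. Returns 0.0 outside the lead band |E - lead_energy| < 2t.
    """
    x = (energy - lead_energy) / hopping
    if abs(x) >= 2.0:
        return 0.0
    # P accumulates the product of site transfer matrices [[y, -1], [1, 0]].
    p11, p12, p21, p22 = 1.0, 0.0, 0.0, 1.0
    for base in seq:
        y = (energy - site_energy[base]) / hopping
        p11, p12, p21, p22 = y * p11 - p21, y * p12 - p22, p11, p12
    denom = (-x * x * (p12 * p21 + 1.0)
             + x * (p11 - p22) * (p12 - p21)
             + p11 ** 2 + p12 ** 2 + p21 ** 2 + p22 ** 2 + 2.0)
    return (4.0 - x * x) / denom


def mutation_effect(seq, position, new_base, energies, **kwargs):
    """Mean absolute change in transmission caused by one point mutation."""
    mutant = seq[:position] + new_base + seq[position + 1:]
    diffs = [abs(transmission(mutant, e, **kwargs)
                 - transmission(seq, e, **kwargs)) for e in energies]
    return sum(diffs) / len(diffs)
```

A statistical analysis in the spirit of the abstract would evaluate `mutation_effect` for every position and every substitute base over an energy grid, and then compare the distribution for hotspot mutations against the rest; the choice of energy window and averaging is left open here.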