ترغب بنشر مسار تعليمي؟ اضغط هنا

TREEOME: A framework for epigenetic and transcriptomic data integration to explore regulatory interactions controlling transcription

94   0   0.0 ( 0 )
 نشر من قبل David Budden
 تاريخ النشر 2015
  مجال البحث علم الأحياء
والبحث باللغة English




اسأل ChatGPT حول البحث

Motivation: Predictive modelling of gene expression is a powerful framework for the in silico exploration of transcriptional regulatory interactions through the integration of high-throughput -omics data. A major limitation of previous approaches is their inability to handle conditional and synergistic interactions that emerge when collectively analysing genes subject to different regulatory mechanisms. This limitation reduces overall predictive power and thus the reliability of downstream biological inference. Results: We introduce an analytical modelling framework (TREEOME: tree of models of expression) that integrates epigenetic and transcriptomic data by separating genes into putative regulatory classes. Current predictive modelling approaches have found both DNA methylation and histone modification epigenetic data to provide little or no improvement in accuracy of prediction of transcript abundance despite, for example, distinct anti-correlation between mRNA levels and promoter-localised DNA methylation. To improve on this, in TREEOME we evaluate four possible methods of formulating gene-level DNA methylation metrics, which provide a foundation for identifying gene-level methylation events and subsequent differential analysis, whereas most previous techniques operate at the level of individual CpG dinucleotides. We demonstrate TREEOME by integrating gene-level DNA methylation (bisulfite-seq) and histone modification (ChIP-seq) data to accurately predict genome-wide mRNA transcript abundance (RNA-seq) for H1-hESC and GM12878 cell lines. Availability: TREEOME is implemented using open-source software and made available as a pre-configured bootable reference environment. All scripts and data presented in this study are available online at http://sourceforge.net/projects/budden2015treeome/.


قيم البحث

اقرأ أيضاً

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying cluster ing structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that showly a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/
Gene transcription is a stochastic process mostly occurring in bursts. Regulation of transcription arises from the interaction of transcription factors (TFs) with the promoter of the gene. The TFs, such as activators and repressors can interact with the promoter in a competitive or non-competitive way. Some experimental observations suggest that the mean expression and noise strength can be regulated at the transcription level. A Few theories are developed based on these experimental observations. Here we re-establish that experimental results with the help of our exact analytical calculations for a stochastic model with non-competitive transcriptional regulatory architecture and find out some properties of Noise strength (like sub-Poissonian fano factor) and mean expression as we found in a two state model earlier. Along with those aforesaid properties we also observe some anomalous characteristics in noise strength of mRNA and in variance of protein at lower activator concentrations.
This paper introduces a high-throughput software tool framework called {it sam2bam} that enables users to significantly speedup pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-me mory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156-186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize the multiple processors, available memory, high-bandwidth of storage, and hardware compression accelerators if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting the input data are provided by {it plug-in} tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime for whole-genome sequencing data from about 20 hours to about nine minutes on the same system using up to 711 GB of memory.
Network of packages with regulatory interactions (dependences and conflicts) from Debian GNU/Linux operating system is compiled and used as analogy of a gene regulatory network. Using a trace-back algorithm we assembly networks from the potential poo l of packages for both scale-free and exponential topology from real and a null model data, respectively. We calculate the maximum number of packages that can be functionally installed in the system (i.e., the active network size). We show that scale-free regulatory networks allow a larger active network size than random ones. Small genomes with scale-free regulatory topology could allow much more functionality than large genomes with an exponential one, with implications on its dynamics, robustness and evolution.
Advances in DNA sequencing have revolutionized our ability to read genomes. However, even in the most well-studied of organisms, the bacterium ${it Escherichia coli}$, for $approx$ 65$%$ of the promoters we remain completely ignorant of their regulat ion. Until we have cracked this regulatory Rosetta Stone, efforts to read and write genomes will remain haphazard. We introduce a new method (Reg-Seq) linking a massively-parallel reporter assay and mass spectrometry to produce a base pair resolution dissection of more than 100 promoters in ${it E. coli}$ in 12 different growth conditions. First, we show that our method recapitulates regulatory information from known sequences. Then, we examine the regulatory architectures for more than 80 promoters in the ${it E. coli}$ genome which previously had no known regulation. In many cases, we also identify which transcription factors mediate their regulation. The method introduced here clears a path for fully characterizing the regulatory genome of model organisms, with the potential of moving on to an array of other microbes of ecological and medical relevance.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا