
H(O)TA: estimation of DNA methylation and hydroxylation levels and efficiencies from time course data

Publication date: 2016
Field: Biology
Language: English





Methylation and hydroxylation of cytosines to form 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) are among the most important epigenetic modifications, and their vital role in the regulation of gene expression is widely recognized. Recent experimental techniques make it possible to infer methylation and hydroxylation levels at CpG dinucleotides, but they require sophisticated statistical analysis to achieve accurate estimates.
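To see why a naive analysis falls short, consider a minimal sketch. The abstract does not name the assays, so the sketch assumes paired bisulfite (BS) and oxidative bisulfite (oxBS) read counts at a single CpG, where BS reads report 5mC plus 5hmC as unconverted and oxBS reads report 5mC alone. The function name and its arguments are hypothetical, and this is not the H(O)TA estimator.

```python
def naive_levels(bs_unconverted, bs_total, oxbs_unconverted, oxbs_total):
    """Naive per-site estimates from paired BS / oxBS read counts.

    Assumption (not stated in the abstract): BS reads count 5mC + 5hmC as
    unconverted, while oxBS reads count only 5mC as unconverted.
    Returns (mc, hmc_clamped, hmc_raw).
    """
    if bs_total <= 0 or oxbs_total <= 0:
        raise ValueError("read totals must be positive")
    modified = bs_unconverted / bs_total   # estimate of 5mC + 5hmC
    mc = oxbs_unconverted / oxbs_total     # estimate of 5mC alone
    hmc_raw = modified - mc                # can be negative due to noise
    hmc_clamped = min(1.0, max(0.0, hmc_raw))
    return mc, hmc_clamped, hmc_raw


if __name__ == "__main__":
    print(naive_levels(80, 100, 60, 100))
    print(naive_levels(50, 100, 55, 100))
```

In the second call the raw 5hmC difference is negative, which no true level can be. Sampling noise and conversion errors produce such values, and clamping hides the problem rather than solving it, which is one reason a proper statistical model is needed.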




Background: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is important, but current approaches tackle average methylation within a genomic locus and are often limited to specific genomic regions. Results: We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict CpG site methylation levels using as features neighboring CpG site methylation levels and genomic distance, and co-localization with coding regions, CGIs, and regulatory elements from the ENCODE project, among others. Our approach achieves 91% to 94% prediction accuracy of genome-wide methylation levels at single CpG site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs. Our classifier outperforms state-of-the-art methylation classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation status, CpG island status, co-localized DNase I hypersensitive sites, and specific transcription factor binding sites were found to be most predictive of methylation levels. Conclusions: Our observations of DNA methylation patterns led us to develop a classifier to predict site-specific methylation levels that achieves the best DNA methylation predictive accuracy to date. Furthermore, our method identified genomic features that interact with DNA methylation, elucidating mechanisms involved in DNA methylation modification and regulation, and linking different epigenetic processes.
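The neighbor-based part of that feature set can be sketched as follows. This is only an illustration of how neighboring methylation levels and genomic distances might be assembled for one CpG site. The function name, the `flank` parameter, and the use of `None` for missing neighbors are assumptions, and the annotation features (CGIs, DNase I sites, transcription factor binding sites) and the random forest itself are not shown.

```python
def neighbor_features(positions, levels, index, flank=1):
    """Build neighbor features for the CpG site at `index`.

    positions: genomic coordinates of CpG sites, sorted ascending.
    levels:    methylation level of each site, same order as positions.
    For each of the `flank` nearest sites upstream and downstream, emit
    (neighbor level, distance to neighbor); None marks a missing neighbor.
    """
    if len(positions) != len(levels):
        raise ValueError("positions and levels must have equal length")
    features = []
    for k in range(1, flank + 1):
        for neighbor in (index - k, index + k):  # upstream, then downstream
            if 0 <= neighbor < len(positions):
                features.append(levels[neighbor])
                features.append(abs(positions[neighbor] - positions[index]))
            else:
                features.extend([None, None])
    return features


if __name__ == "__main__":
    print(neighbor_features([100, 150, 400], [0.9, 0.8, 0.1], 1))
```

Including the distance alongside each neighbor level lets a classifier discount distant neighbors, which matters given that correlation among CpG sites decays rapidly.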
Motivation: Bisulphite sequencing enables the detection of cytosine methylation. The sequence of the methylation states of cytosines on any given read forms a methylation pattern that carries substantially more information than merely studying the average methylation level at individual positions. In order to understand better the complexity of DNA methylation landscapes in biological samples, it is important to study the diversity of these methylation patterns. However, the accurate quantification of methylation patterns is subject to sequencing errors and spurious signals due to incomplete bisulphite conversion of cytosines. Results: A statistical model is developed which accounts for the distribution of DNA methylation patterns at any given locus. The model incorporates the effects of sequencing errors and spurious reads, and enables estimation of the true underlying distribution of methylation patterns. Conclusions: Calculation of the estimated distribution over methylation patterns is implemented in the R Bioconductor package MPFE. Source code and documentation of the package are also available for download at http://bioconductor.org/packages/3.0/bioc/html/MPFE.html.
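How errors distort a pattern distribution can be illustrated with a small forward model. This sketch is not the MPFE model. It assumes a single symmetric per-site flip rate (`error_rate`, a hypothetical parameter) that lumps sequencing error and incomplete conversion together, although incomplete conversion is in reality one-directional.

```python
from itertools import product


def observed_distribution(true_dist, error_rate):
    """Expected distribution of observed methylation patterns.

    true_dist:  dict mapping patterns such as "101" to probabilities.
    error_rate: assumed probability that any single site is misread,
                independently of the other sites.
    """
    lengths = {len(p) for p in true_dist}
    if len(lengths) != 1:
        raise ValueError("all patterns must have the same length")
    n = lengths.pop()
    observed = {}
    for bits in product("01", repeat=n):
        obs = "".join(bits)
        total = 0.0
        for true, prob in true_dist.items():
            mismatches = sum(a != b for a, b in zip(obs, true))
            total += prob * error_rate ** mismatches * (1 - error_rate) ** (n - mismatches)
        observed[obs] = total
    return observed


if __name__ == "__main__":
    print(observed_distribution({"11": 0.5, "00": 0.5}, 0.1))
```

Even though only two patterns truly exist in this example, all four patterns appear in the observed distribution. Estimating the true distribution amounts to inverting this mapping, for instance by maximum likelihood, which the sketch does not attempt.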
The understanding of mechanisms that control epigenetic changes is an important research area in modern functional biology. Epigenetic modifications such as DNA methylation are in general very stable over many cell divisions. DNA methylation can, however, be subject to specific and fast changes over a short time scale even in non-dividing (i.e. non-replicating) cells. Such dynamic DNA methylation changes are caused by a combination of active demethylation and de novo methylation processes which have not been investigated in integrated models. Here we present a hybrid (hidden) Markov model to describe the cycle of methylation and demethylation over (short) time scales. Our hybrid model describes several molecular events, some happening at deterministic time points (i.e. mechanisms that occur only during cell division) and others occurring at random time points. We test our model on mouse embryonic stem cells using time-resolved data. We predict methylation changes and estimate the efficiencies of the different modification steps related to DNA methylation and demethylation.
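The hybrid idea, deterministic events at cell division combined with random events in between, can be sketched for the expected methylation level of a single CpG. This is a deliberately reduced two-state illustration and not the model of the paper. The parameters `denovo_rate`, `demeth_rate`, and `maintenance_prob` are hypothetical, and hidden states and hydroxylation are left out.

```python
import math


def relax(level, denovo_rate, demeth_rate, dt):
    """Random events between divisions: two-state continuous-time chain.

    The expected level decays exponentially toward the stationary value
    denovo_rate / (denovo_rate + demeth_rate).
    """
    total = denovo_rate + demeth_rate
    if total == 0:
        return level
    stationary = denovo_rate / total
    return stationary + (level - stationary) * math.exp(-total * dt)


def divide(level, maintenance_prob):
    """Deterministic event at division: the parental strand keeps its mark,
    the new strand copies it with probability maintenance_prob."""
    return level * (1.0 + maintenance_prob) / 2.0


def simulate(level, division_times, end_time, denovo_rate, demeth_rate, maintenance_prob):
    """Expected strand-level methylation at end_time, starting at time 0."""
    t = 0.0
    for t_div in sorted(division_times):
        if not 0.0 <= t_div <= end_time:
            continue  # ignore divisions outside the simulated window
        level = relax(level, denovo_rate, demeth_rate, t_div - t)
        level = divide(level, maintenance_prob)
        t = t_div
    return relax(level, denovo_rate, demeth_rate, end_time - t)


if __name__ == "__main__":
    print(simulate(0.8, [1.0, 2.0], 3.0, 0.05, 0.02, 0.9))
```

With both rates set to zero and perfect maintenance the level stays constant, while with no maintenance it halves at every division. Fitting such efficiencies to time-resolved data is the estimation problem the abstract describes.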
Epigenome modulation in response to the environment potentially provides a mechanism for organisms to adapt, both within and between generations. However, neither the extent to which this occurs, nor the molecular mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects on DNA methylation were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association mapping revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) on the coding region of genes was not affected by growth temperature, but was instead strongly correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was correlated with elevated transcription levels for the genes affected. Genome-wide association mapping revealed that this effect was largely due to trans-acting loci, a significant fraction of which showed evidence of local adaptation. These findings constitute the first direct link between DNA methylation and adaptation to the environment, and provide a basis for further dissecting how environmentally driven and genetically determined epigenetic variation interact and influence organismal fitness.
We use ideas from the theory of complex networks to implement a machine learning classification of human DNA methylation data that carry signatures of cancer development. The data were obtained from patients with various kinds of cancers and represented as parenclictic networks, wherein nodes correspond to genes and edges are weighted according to pairwise variation from control group subjects. We demonstrate that for the 10 types of cancer under study, it is possible to obtain a high performance of binary classification between cancer-positive and cancer-negative samples based on network measures. Remarkably, an accuracy as high as 93-99% is achieved with only 12 network topology indices, a dramatic reduction of complexity from the original 15295 gene methylation levels. Moreover, it was found that the parenclictic networks are scale-free in cancer-negative subjects, and deviate from the power-law node degree distribution in cancer. The node centrality ranking and arising modular structure could provide insights into the systems biology of cancer.
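One common way to build such a network can be sketched as follows. It assumes that "pairwise variation from control group subjects" is measured by fitting a least-squares line to each gene pair in the controls and scoring a subject by its normalized distance from that line. The abstract does not confirm this exact weighting, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("control values of a gene are constant")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def parenclictic_edges(controls, subject):
    """Edge weights for one subject relative to a control group.

    controls: list of control samples, each a list of per-gene levels.
    subject:  per-gene levels of the sample to be represented as a network.
    Returns {(i, j): weight}, the subject's deviation from the control
    regression line of gene j on gene i, in units of residual spread.
    """
    edges = {}
    for i, j in combinations(range(len(subject)), 2):
        xs = [sample[i] for sample in controls]
        ys = [sample[j] for sample in controls]
        slope, intercept = fit_line(xs, ys)
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        sigma = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
        if sigma == 0:
            raise ValueError("controls lie exactly on a line; spread undefined")
        deviation = abs(subject[j] - (slope * subject[i] + intercept))
        edges[(i, j)] = deviation / sigma
    return edges


if __name__ == "__main__":
    controls = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]]
    print(parenclictic_edges(controls, [1.5, 3.0]))
```

A subject that follows the control trend gets edge weights near zero, while one that departs from it gets large weights. Topology indices computed on the resulting weighted graph would then serve as classifier inputs.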