Multiscale Identification of Topological Domains in Chromatin

Added by Aaron Darling

Publication date 2013

Field: Biology

Language: English

Authors Darya Filippova - Rob Patro - Geet Duggal

Quantitative Methods Genomics

Abstract

Recent chromosome conformation capture experiments have led to the discovery of dense, contiguous, megabase-sized topological domains that are similar across cell types and conserved across species. These domains are strongly correlated with a number of chromatin markers and have since been included in a number of analyses. However, functionally relevant domains may exist at multiple length scales. We introduce a new and efficient algorithm that is able to capture persistent domains across various resolutions by adjusting a single scale parameter. The identified novel domains are substantially different from domains reported previously and are highly enriched for insulating factor CTCF binding and histone modifications at the boundaries.
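As a rough illustration of how a single scale parameter can drive domain calling, the sketch below scores each candidate domain by its summed contact frequency divided by its length raised to a power gamma, subtracts the mean score of all windows of the same length, and then selects a non-overlapping set of domains by dynamic programming. The quality function, the name find_domains, and the toy matrix are assumptions made for illustration; the abstract does not spell out the algorithm's details.

```python
def find_domains(contacts, gamma):
    """Pick non-overlapping domains (inclusive bin ranges) on a square contact matrix.

    Assumed quality of a domain covering bins k..l:
        sum(contacts[k..l][k..l]) / (l - k + 1) ** gamma
        minus the mean of that score over all windows of the same length.
    Larger gamma penalizes long domains more strongly.
    """
    n = len(contacts)

    # 2D prefix sums so that any diagonal block sum costs O(1).
    prefix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            prefix[i + 1][j + 1] = (contacts[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])

    def score(k, l):
        block = (prefix[l + 1][l + 1] - prefix[k][l + 1]
                 - prefix[l + 1][k] + prefix[k][k])
        return block / (l - k + 1) ** gamma

    # Mean score of all windows of each length, used as a background level.
    mean_score = {}
    for length in range(1, n + 1):
        values = [score(k, k + length - 1) for k in range(n - length + 1)]
        mean_score[length] = sum(values) / len(values)

    def quality(k, l):
        return score(k, l) - mean_score[l - k + 1]

    # best[l] is the best total quality achievable on bins 0..l-1.
    best = [0.0] * (n + 1)
    start_of_last = [None] * (n + 1)
    for l in range(1, n + 1):
        best[l] = best[l - 1]              # option: bin l-1 is in no domain
        for k in range(l):                 # option: a domain covers bins k..l-1
            q = quality(k, l - 1)
            if q > 0 and best[k] + q > best[l]:
                best[l] = best[k] + q
                start_of_last[l] = k

    # Walk back through the choices to recover the selected domains.
    domains = []
    l = n
    while l > 0:
        k = start_of_last[l]
        if k is None:
            l -= 1
        else:
            domains.append((k, l - 1))
            l = k
    domains.reverse()
    return domains


if __name__ == "__main__":
    # Toy matrix: two dense 3-bin blocks with no contacts between them.
    toy = [[10.0 if i // 3 == j // 3 else 0.0 for j in range(6)] for i in range(6)]
    print(find_domains(toy, gamma=1.0))
```

On the toy matrix with gamma = 1.0 this returns [(0, 2), (3, 5)], the two dense blocks. Re-running with different values of gamma is the intended way to explore other length scales in this sketch.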

Multiscale Topology of Chromatin Folding

Kevin Emmett, Benjamin Schweinhart, Raul Rabadan (2015)

The three-dimensional structure of DNA in the nucleus (chromatin) plays an important role in many cellular processes. Recent experimental advances have led to high-throughput methods of capturing information about chromatin conformation on genome-wide scales. New models are needed to quantitatively interpret this data at a global scale. Here we introduce the use of tools from topological data analysis to study chromatin conformation. We use persistent homology to identify and characterize conserved loops and voids in contact map data and identify scales of interaction. We demonstrate the utility of the approach on simulated data and then look at data from both a bacterial genome and a human cell line. We identify substantial multiscale topology in these datasets.
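To give a feel for what persistence means, the sketch below computes only the simplest case, 0-dimensional persistent homology, which tracks connected components as a distance threshold grows. It does not detect the loops and voids mentioned in the abstract, which require higher-dimensional homology and, in practice, a dedicated library. The function name h0_death_scales and the toy distance matrix are assumptions made for illustration.

```python
def h0_death_scales(dist):
    """Return the distance thresholds at which connected components merge.

    dist is a symmetric matrix of pairwise distances. Every point starts as
    its own component at threshold 0; each returned value is the scale at
    which one component is absorbed into another. One component never dies.
    """
    n = len(dist)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Process all pairs in order of increasing distance.
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(d)
    return deaths


if __name__ == "__main__":
    # Three points on a line at positions 0, 1 and 10.
    positions = [0.0, 1.0, 10.0]
    toy = [[abs(a - b) for b in positions] for a in positions]
    print(h0_death_scales(toy))
```

For the three points in the usage example, the two close points merge at scale 1.0 and the distant point joins at scale 9.0. The large gap between the two values is the kind of signal that marks a feature as persistent across scales.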

Genomics Biomolecules

Rapid Sequence Identification of Potential Pathogens Using Techniques from Sparse Linear Algebra

Stephanie Dodson, Darrell O. Ricke, Jeremy Kepner (2015)

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D$^{4}$RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, yield speed and precision tradeoffs, with applications in biodefense and medical diagnostics. The D$^{4}$RAGenS analysis algorithm is tested over several datasets, including three utilized for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.

Quantitative Methods Genomics

Network modelling of topological domains using Hi-C data

Y. X. Rachel Wang, Purnamrita Sarkar, Oana Ursu (2017)

Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. One specific feature of the 3D organization is known as topologically associating domains (TADs), which are densely interacting, contiguous chromatin regions playing important roles in regulating gene expression. A few algorithms have been proposed to detect TADs. In particular, the structure of Hi-C data naturally inspires application of community detection methods. However, one of the drawbacks of community detection is that most methods take exchangeability of the nodes in the network for granted; whereas the nodes in this case, i.e. the positions on the chromosomes, are not exchangeable. We propose a network model for detecting TADs using Hi-C data that takes into account this non-exchangeability. In addition, our model explicitly makes use of cell-type specific CTCF binding sites as biological covariates and can be used to identify conserved TADs across multiple cell types. The model leads to a likelihood objective that can be efficiently optimized via relaxation. We also prove that when suitably initialized, this model finds the underlying TAD structure with high probability. Using simulated data, we show the advantages of our method and the caveats of popular community detection methods, such as spectral clustering, in this application. Applying our method to real Hi-C data, we demonstrate the domains identified have desirable epigenetic features and compare them across different cell types.

Applications Genomics

ProtRank: Bypassing the imputation of missing values in differential expression analysis of proteomic data

Matus Medo, Daniel M. Aebersold, Michaela Medova (2019)

Data from discovery proteomic and phosphoproteomic experiments typically include missing values that correspond to proteins that have not been identified in the analyzed sample. Replacing the missing values with random numbers, a process known as imputation, avoids apparent infinite fold-change values. However, the procedure comes at a cost: imputing a large number of missing values has the potential to significantly impact the results of the subsequent differential expression analysis. We propose a method that identifies differentially expressed proteins by ranking their observed changes with respect to the changes observed for other proteins. Missing values are taken into account by this method directly, without the need to impute them. We illustrate the performance of the new method on two distinct datasets and show that it is robust to missing values and, at the same time, provides results that are otherwise similar to those obtained with edgeR, which is a state-of-the-art differential expression analysis method. The new method for the differential expression analysis of proteomic data is available as an easy-to-use Python package.

Quantitative Methods Genomics

Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

Marco Spanò, Fabrizio Lillo, Salvatore Miccichè (2008)

By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of the dsDNA group. In fact, we detect evolutionarily conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.

Quantitative Methods Genomics
