A new parsimonious method for classifying Cancer Tissue-of-Origin Based on DNA Methylation 450K data

Added by Shen Jia

Publication date: 2021

Field: Biology

Research language: English

Authors: Shen Jia, Yulin Zhang, Yiming Mao

Tissues and Organs

DNA methylation is a well-studied epigenetic modification that regulates gene transcription in eukaryotes. Its alterations have been recognized as a significant component of cancer development. In this study, we use the DNA methylation 450K data from The Cancer Genome Atlas (TCGA) to evaluate the efficacy of DNA methylation data for classifying 30 cancer types. We propose a new method for gene selection in high-dimensional data (over 450,000 features): variance filtering is first applied for dimension reduction, and recursive feature elimination (RFE) is then used for feature selection. We address the problem of selecting a small subset of genes from a large number of methylation sites, and our parsimonious model is demonstrated to be efficient, achieving an accuracy of over 91% and outperforming other studies that use DNA microarray and RNA-seq data. The performance of 20 models, built from 4 estimators (Random Forest, Decision Tree, Extra Tree and Support Vector Machine) and 5 classifiers (k-Nearest Neighbours, Support Vector Machine, XGBoost, LightGBM and Multi-Layer Perceptron), is compared, and the robustness of the RFE algorithm is examined. Results suggest that the combined model of an Extra Tree estimator plus a CatBoost classifier offers the best performance in cancer identification, with overall validation accuracies of 91%, 92.3%, 93.3% and 93.5% for 20, 30, 40 and 50 features, respectively. The roles of the 50 selected genes in cancer development are also explored through enrichment analysis; the results show that 12 of our top 16 features have already been identified as cancer-specific, and we propose further genes to be tested in future studies. Therefore, our method may be used as an auxiliary diagnostic tool to determine the actual clinicopathological status of a specific cancer.
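The two-stage selection described in the abstract (variance filtering, then RFE) can be sketched in plain Python. This is a minimal, hypothetical sketch and not the authors' code: the study used tree-based estimators such as Extra Trees, whereas here a small softmax (multinomial logistic) regression stands in as the RFE estimator so that the example needs only the standard library, and features are ranked SVM-RFE-style by their squared weights summed over classes. The helper names (`variance_filter`, `standardize`, `fit_softmax`, `rfe`), the top-k form of the variance filter and the synthetic beta values are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev, pvariance


def variance_filter(X, k):
    """Indices of the k columns of X with the largest variance, returned in column order."""
    n_features = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(n_features)]
    top_k = sorted(range(n_features), key=lambda j: variances[j], reverse=True)[:k]
    return sorted(top_k)


def standardize(X):
    """Z-score each column so that weight magnitudes are comparable across features."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def fit_softmax(X, y, n_classes, lr=0.3, epochs=200, l2=1e-3):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            scores = [sum(w * v for w, v in zip(W[c], xi)) + b[c] for c in range(n_classes)]
            shift = max(scores)  # numerically stable softmax
            exps = [math.exp(s - shift) for s in scores]
            total = sum(exps)
            for c in range(n_classes):
                err = exps[c] / total - (1.0 if c == yi else 0.0)
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * xi[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * (gW[c][j] / n + l2 * W[c][j])
    return W, b


def rfe(X, y, n_classes, candidates, n_select, step=1):
    """Recursive feature elimination: refit, drop the `step` weakest features, repeat."""
    remaining = list(candidates)
    while len(remaining) > n_select:
        Xs = standardize([[row[j] for j in remaining] for row in X])
        W, _ = fit_softmax(Xs, y, n_classes)
        # SVM-RFE-style criterion: squared weights summed over classes.
        importance = [sum(W[c][k] ** 2 for c in range(n_classes)) for k in range(len(remaining))]
        n_drop = min(step, len(remaining) - n_select)
        weakest = set(sorted(range(len(remaining)), key=lambda k: importance[k])[:n_drop])
        remaining = [j for k, j in enumerate(remaining) if k not in weakest]
    return remaining


# Hypothetical toy data: 3 classes x 10 samples, 10 CpG sites with beta values.
random.seed(0)
class_means = {0: (0.1, 0.2), 1: (0.5, 0.8), 2: (0.9, 0.2)}  # sites 0 and 1 carry the signal
X, y = [], []
for label in range(3):
    m0, m1 = class_means[label]
    for _ in range(10):
        informative = [m0 + random.gauss(0, 0.03), m1 + random.gauss(0, 0.03)]
        noisy = [random.uniform(0.3, 0.7) for _ in range(4)]     # class-independent
        flat = [0.5 + random.gauss(0, 0.005) for _ in range(4)]  # near-constant
        X.append(informative + noisy + flat)
        y.append(label)

kept = variance_filter(X, k=6)                                  # stage 1: variance filtering
selected = rfe(X, y, n_classes=3, candidates=kept, n_select=2)  # stage 2: RFE
print("after variance filter:", kept)
print("after RFE:", selected)
```

On this toy data the variance filter should discard the four near-constant sites, and RFE should then keep the two class-dependent sites. With scikit-learn available, `VarianceThreshold` (or a top-k variance rule) followed by `RFE` wrapped around an `ExtraTreesClassifier` would be a closer analogue to the estimators named in the abstract.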

Parenclitic network analysis of methylation data for cancer identification

Alexander Karsakov, Thomas Bartlett, Iosif Meyerov (2015)

We use ideas from the theory of complex networks to implement machine-learning classification of human DNA methylation data, which carry signatures of cancer development. The data were obtained from patients with various kinds of cancer and represented as parenclitic networks, in which nodes correspond to genes and edges are weighted according to pairwise variation from control-group subjects. We demonstrate that, for the 10 types of cancer under study, it is possible to obtain high performance in binary classification between cancer-positive and cancer-negative samples based on network measures. Remarkably, an accuracy as high as 93-99% is achieved with only 12 network topology indices, a dramatic reduction in complexity from the original 15,295 gene methylation levels. Moreover, the parenclitic networks were found to be scale-free in cancer-negative subjects and to deviate from a power-law node degree distribution in cancer. The node centrality ranking and the emerging modular structure could provide insights into the systems biology of cancer.
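A parenclitic network can be sketched as follows. The construction here is an assumption in the spirit of the parenclitic approach, not necessarily the exact model used in this study: for every gene pair, a straight line is fitted to the control-group samples, and the edge weight for a new sample is the absolute z-score of its residual from that line. Node degree after thresholding stands in for the 12 topology indices, which the abstract does not list. All function names, the threshold and the toy methylation values are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares fit ys ~ a + b * xs; also returns the residual standard deviation."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    resid_sd = pstdev([y - (a + b * x) for x, y in zip(xs, ys)]) or 1.0
    return a, b, resid_sd


def parenclitic_weights(controls, sample):
    """Edge (i, j) weight: |z-score| of the sample's residual under the control-group model."""
    weights = {}
    for i, j in combinations(range(len(sample)), 2):
        a, b, sd = fit_line([r[i] for r in controls], [r[j] for r in controls])
        weights[(i, j)] = abs(sample[j] - (a + b * sample[i])) / sd
    return weights


def thresholded_degrees(weights, n_genes, threshold):
    """Node degrees of the network that keeps only edges heavier than `threshold`."""
    deg = [0] * n_genes
    for (i, j), w in weights.items():
        if w > threshold:
            deg[i] += 1
            deg[j] += 1
    return deg


# Hypothetical control group: gene 1 tracks gene 0, gene 2 is roughly constant.
controls = [
    [0.10, 0.41, 0.50],
    [0.20, 0.49, 0.52],
    [0.30, 0.61, 0.49],
    [0.40, 0.69, 0.51],
    [0.50, 0.81, 0.48],
]
control_like = [0.35, 0.65, 0.50]
deviant = [0.35, 0.20, 0.50]  # gene 1 breaks its usual relation to gene 0

for name, s in [("control-like", control_like), ("deviant", deviant)]:
    print(name, thresholded_degrees(parenclitic_weights(controls, s), 3, threshold=3.0))
```

In this toy data only the deviant sample's (0, 1) pair departs from the control relation strongly enough to survive the threshold, so its network gains an edge while the control-like sample yields an empty graph; classification would then operate on such network measures rather than on the raw methylation levels.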

Genomics

A Hybrid HMM Approach for the Dynamics of DNA Methylation

Charalampos Kyriakopoulos, Pascal Giehr, Alexander Luck (2019)

Understanding the mechanisms that control epigenetic changes is an important research area in modern functional biology. Epigenetic modifications such as DNA methylation are generally very stable over many cell divisions. DNA methylation can, however, undergo specific and fast changes over short time scales, even in non-dividing (i.e., non-replicating) cells. Such dynamic DNA methylation changes are caused by a combination of active demethylation and de novo methylation processes, which have not been investigated in integrated models. Here we present a hybrid (hidden) Markov model to describe the cycle of methylation and demethylation over (short) time scales. Our hybrid model describes several molecular events, some happening at deterministic time points (i.e., mechanisms that occur only during cell division) and others occurring at random time points. We test our model on mouse embryonic stem cells using time-resolved data. We predict methylation changes and estimate the efficiencies of the different modification steps related to DNA methylation and demethylation.
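The combination of deterministic and random events can be illustrated with a toy discrete-time model of a single CpG site. This sketch is an assumption and far simpler than the published hybrid HMM: it tracks only three states (unmodified, 5mC, and an oxidised intermediate such as 5hmC), uses hypothetical per-step rates for de novo methylation (`mu`), oxidation (`eta`) and removal (`rho`), and applies maintenance methylation only at fixed division times. The "hidden" aspect is represented by observing only P(5mC) + P(5hmC), since standard bisulfite sequencing does not distinguish the two.

```python
# Hypothetical three-state toy model of one CpG site (not the published model).
U, M, H = 0, 1, 2  # unmodified C, 5-methylcytosine, oxidised intermediate (e.g. 5hmC)


def random_step(p, mu, eta, rho):
    """Stochastic events in one time step: U->M (de novo), M->H (oxidation), H->U (removal)."""
    u, m, h = p
    return [
        u * (1 - mu) + h * rho,
        m * (1 - eta) + u * mu,
        h * (1 - rho) + m * eta,
    ]


def division_step(p, maintenance):
    """Deterministic event at cell division (toy assumption): only 5mC is maintained,
    with efficiency `maintenance`; every other site comes out unmodified."""
    u, m, h = p
    return [u + m * (1 - maintenance) + h, m * maintenance, 0.0]


def simulate(p0, steps, division_every, mu, eta, rho, maintenance):
    """Alternate random steps with divisions at fixed times; return the final state
    distribution and the bisulfite-style signal P(M) + P(H) after each step."""
    p = list(p0)
    observed = []
    for t in range(1, steps + 1):
        p = random_step(p, mu, eta, rho)
        if t % division_every == 0:
            p = division_step(p, maintenance)
        # Bisulfite sequencing reads both 5mC and 5hmC as methylated,
        # so only their sum is observed; the individual states stay hidden.
        observed.append(p[M] + p[H])
    return p, observed


final, signal = simulate([0.2, 0.8, 0.0], steps=48, division_every=24,
                         mu=0.01, eta=0.05, rho=0.1, maintenance=0.95)
print([round(s, 3) for s in signal[::12]], [round(x, 3) for x in final])
```

Fitting such a model to time-resolved data would mean estimating the rates and the maintenance efficiency from the observed signal, which is the kind of efficiency estimate the abstract reports.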

Genomics

Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

Weiwei Zhang, Tim D Spector, Panos Deloukas (2013)

Background: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is important, but current approaches tackle average methylation within a genomic locus and are often limited to specific genomic regions. Results: We characterize genome-wide DNA methylation patterns and show that correlation among CpG sites decays rapidly, making predictions based solely on neighboring sites challenging. We built a random forest classifier to predict CpG site methylation levels using as features the methylation levels of neighboring CpG sites and their genomic distance, together with co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project, among others. Our approach achieves 91-94% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs. Our classifier outperforms state-of-the-art methylation classifiers and identifies the features that contribute to prediction accuracy: neighboring CpG site methylation status, CpG island status, co-localized DNase I hypersensitive sites, and specific transcription factor binding sites were found to be most predictive of methylation levels. Conclusions: Our observations of DNA methylation patterns led us to develop a classifier that predicts site-specific methylation levels with the best DNA methylation predictive accuracy to date. Furthermore, our method identified genomic features that interact with DNA methylation, elucidating mechanisms involved in DNA methylation modification and regulation and linking different epigenetic processes.
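The neighboring-site features can be sketched as follows. This is a hypothetical helper, not the authors' pipeline: for each CpG site on one chromosome it collects the beta values of the k nearest upstream and downstream sites together with their distances in base pairs. The padding values used at chromosome ends are arbitrary assumptions, and the genomic-annotation features (CGIs, ENCODE regulatory elements) would be additional columns, omitted here.

```python
def neighbor_features(positions, betas, k=1, fill_beta=0.5, fill_distance=10**6):
    """Feature rows built from each CpG site's k nearest upstream and downstream neighbors.

    Row layout: [beta_up1, dist_up1, ..., beta_upk, dist_upk,
                 beta_down1, dist_down1, ..., beta_downk, dist_downk].
    Missing neighbors at chromosome ends are padded with (fill_beta, fill_distance),
    an arbitrary choice made for this sketch.
    """
    if len(positions) != len(betas):
        raise ValueError("positions and betas must have the same length")
    if any(a >= b for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    n = len(positions)
    rows = []
    for i, pos in enumerate(positions):
        row = []
        for offset in range(1, k + 1):  # upstream neighbors: i-1, i-2, ...
            j = i - offset
            row += [betas[j], pos - positions[j]] if j >= 0 else [fill_beta, fill_distance]
        for offset in range(1, k + 1):  # downstream neighbors: i+1, i+2, ...
            j = i + offset
            row += [betas[j], positions[j] - pos] if j < n else [fill_beta, fill_distance]
        rows.append(row)
    return rows


# Hypothetical CpG coordinates (bp) and beta values on one chromosome.
positions = [100, 130, 400]
betas = [0.9, 0.8, 0.1]
for row in neighbor_features(positions, betas):
    print(row)
```

Rows like these, paired with per-site labels, could be passed to a random forest such as scikit-learn's `RandomForestClassifier`; the rapid decay of correlation with distance noted above is why the distance columns are kept alongside the neighbor beta values.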

Genomics

Modelling the effect of curvature on the collective behaviour of cells growing new tissue

Almie Alias, Pascal R Buenzli (2016)

The growth of several biological tissues is known to be controlled in part by local geometrical features, such as the curvature of the tissue interface. This control leads to changes in tissue shape that in turn can affect the tissue's evolution. Understanding the cellular basis of this control is highly significant for bioscaffold tissue engineering, the evolution of bone microarchitecture, wound healing, and tumour growth. While previous models have proposed geometrical relationships between tissue growth and curvature, the roles of cell density and cell vigor remain poorly understood. We propose a cell-based mathematical model of tissue growth to investigate the systematic influence of curvature on the collective crowding or spreading of tissue-synthesising cells induced by changes in local tissue surface area during the motion of the interface. Depending on the strength of diffusive damping, the model exhibits complex growth patterns such as undulating motion, efficient smoothing of irregularities, and the generation of cusps. We compare this model with in vitro experiments of tissue deposition in bioscaffolds of different geometries. By accounting for the depletion of active cells, the model is able to capture both the smoothing of the initial substrate geometry and the slowdown of tissue deposition observed experimentally.

Tissues and Organs Biological Physics Cell Behavior

Spatial mapping of protein composition and tissue organization: a primer for multiplexed antibody-based imaging

John W. Hickey, Elizabeth K. Neumann, Andrea J. Radtke (2021)

Tissues and organs are composed of distinct cell types that must operate in concert to perform physiological functions. Efforts to create high-dimensional biomarker catalogs of these cells are largely based on transcriptomic single-cell approaches that lack the spatial context required to understand critical cellular communication and correlated structural organization. To probe in situ biology with sufficient coverage depth, several multiplexed protein imaging methods have recently been developed. Though these antibody-based technologies differ in their strategy and mode of immunolabeling and in their detection tags, they commonly use antibodies directed against protein biomarkers to provide detailed spatial and functional maps of complex tissues. As these promising antibody-based multiplexing approaches become more widely adopted, new frameworks and considerations are critical for training future users, generating molecular tools, validating antibody panels, and harmonizing datasets. In this perspective, we provide essential resources and key considerations for obtaining robust and reproducible multiplexed antibody-based imaging data, compiling specialized knowledge from domain experts and technology developers.

Tissues and Organs