Biological Random Walks: integrating heterogeneous data in disease gene prioritization



Posted by Michele Gentili

Publication date: 2020

Research field: Biology, Information Engineering

Paper language: English

Authors: Michele Gentili, Leonardo Martini, Manuela Petti




Abstract

This work proposes a unified framework for leveraging biological information in network propagation-based gene prioritization algorithms. Preliminary results on breast cancer data show significant improvements over state-of-the-art baselines, including the prioritization of genes that interactome-based algorithms do not identify as potential candidates but that, according to a functional analysis based on recent literature, appear to be involved in, or potentially related to, breast cancer.
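The abstract does not describe the algorithm itself. The Python sketch below illustrates only the generic idea that network propagation builds on: a random walk with restart (RWR) on an interaction network, where each step either follows an edge or jumps back to a restart vector, and genes are ranked by their stationary visit probability. Weighting the restart vector by a per-gene biological score is one assumed way to inject extra information. The function name `rwr_prioritize`, the restart probability `alpha=0.3`, and the toy graph are hypothetical, and this is not the authors' Biological Random Walks method.

```python
from typing import Dict, List


def rwr_prioritize(
    adj: Dict[str, List[str]],
    seed_scores: Dict[str, float],
    alpha: float = 0.3,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart (hypothetical sketch, not the paper's method).

    adj: undirected adjacency lists; every neighbor must also be a key.
    seed_scores: non-negative restart weights keyed by nodes of adj, e.g. an
        assumed biological relevance score for each known disease gene.
    alpha: probability of jumping back to the restart vector at each step.
    """
    nodes = list(adj)
    total = sum(seed_scores.values())
    # Restart vector: seed weights normalized to sum to 1.
    restart = {n: seed_scores.get(n, 0.0) / total for n in nodes}
    p = dict(restart)
    for _ in range(max_iter):
        nxt = {n: alpha * restart[n] for n in nodes}
        for n in nodes:
            if adj[n]:
                # Spread (1 - alpha) of this node's mass evenly over its neighbors.
                share = (1 - alpha) * p[n] / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Isolated node: return its walk mass to the restart vector.
                for m in nodes:
                    nxt[m] += (1 - alpha) * p[n] * restart[m]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:  # L1 change small enough: treat as stationary
            break
    return p


# Toy interactome (hypothetical): gene "A" is the only known disease gene.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = rwr_prioritize(graph, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['A', 'C', 'B', 'D', 'E']
```

Genes that sit close to the seed and are well connected to it collect more walk mass, which is the guilt-by-association intuition behind propagation-based prioritization; changing `seed_scores` is where, under this assumption, additional biological evidence would enter.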


Bo Peng, Xiaohui Yao, Shannon L. Risacher (2020)

Background: Cognitive assessments are the most common clinical routine for the diagnosis of Alzheimer's Disease (AD). Given the large number of cognitive assessment tools and time-limited office visits, it is important to determine a proper set of cognitive tests for each subject. Most current studies create cognitive-test selection guidelines for a targeted population, but these guidelines are not customized for individual subjects. In this manuscript, we develop a machine learning paradigm that enables personalized prioritization of cognitive assessments. Method: We adapt a newly developed learning-to-rank approach, PLTR, to implement our paradigm. This method learns a latent scoring function that pushes the most effective cognitive assessments to the top of the prioritization list. We also extend PLTR to better separate the most effective cognitive assessments from the less effective ones. Results: Our empirical study on the ADNI data shows that the proposed paradigm outperforms state-of-the-art baselines in identifying and prioritizing individual-specific cognitive biomarkers. We conduct experiments in cross-validation and level-out validation settings. In these two settings, our paradigm significantly outperforms the best baselines, with improvements of as much as 22.1% and 19.7%, respectively, in prioritizing cognitive features. Conclusions: The proposed paradigm achieves superior performance in prioritizing cognitive biomarkers. The top-prioritized cognitive biomarkers have great potential to facilitate personalized diagnosis, disease subtyping, and, ultimately, precision medicine in AD.

Quantitative Methods, Machine Learning, Image and Video Processing

Logic and connectivity jointly determine criticality in biological gene regulatory networks

Bryan C. Daniels, Hyunju Kim, Douglas Moore (2018)

The complex dynamics of gene expression in living cells can be well approximated using Boolean networks. The average sensitivity is a natural measure of stability in these systems: values below one indicate typically stable dynamics associated with an ordered phase, whereas values above one indicate chaotic dynamics. This yields a theoretically motivated adaptive advantage to being near the critical value of one, at the boundary between order and chaos. Here, we measure the average sensitivity of 66 publicly available Boolean network models describing the function of gene regulatory circuits across diverse living processes. We find that the average sensitivity values for these networks cluster around unity, indicating that they are near critical. In many types of random networks, the mean connectivity <K> and the average activity bias of the logic functions <p> have been found to be the most important network properties in determining average sensitivity and, by extension, a network's criticality. Surprisingly, many of these gene regulatory networks achieve the near-critical state with <K> and <p> far from the values predicted for critical systems: randomized networks that share the local causal structure and local logic of biological networks reproduce their critical behavior better than randomized networks that control only for macroscale properties such as <K> and <p>. This suggests that the local properties of genes interacting within regulatory networks are selected to be collectively near-critical, and that this non-local property of gene regulatory network dynamics cannot be predicted from the density of interactions alone.
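To make the average sensitivity measure above concrete, the sketch below counts, for each input state of a node's Boolean update function, how many single-input flips change the output, then averages over all 2^k states with equal weight; a network value is taken as the mean over nodes. The uniform weighting over input states, the helper names, and the toy functions are assumptions for illustration; the paper's exact averaging procedure may differ.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

BoolFn = Callable[[Tuple[int, ...]], int]


def average_sensitivity(f: BoolFn, k: int) -> float:
    """Mean number of single-bit flips that change f's output,
    averaged uniformly over all 2**k input states."""
    total = 0
    for x in product((0, 1), repeat=k):
        out = f(x)
        for i in range(k):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(flipped) != out:
                total += 1
    return total / 2 ** k


def network_average_sensitivity(nodes: Sequence[Tuple[BoolFn, int]]) -> float:
    """Mean of per-node average sensitivities; each node is (function, in-degree)."""
    return sum(average_sensitivity(f, k) for f, k in nodes) / len(nodes)


# Hypothetical toy update functions.
def and2(x: Tuple[int, ...]) -> int:
    return x[0] & x[1]


def xor2(x: Tuple[int, ...]) -> int:
    return x[0] ^ x[1]


def copy1(x: Tuple[int, ...]) -> int:
    return x[0]


print(average_sensitivity(and2, 2))  # 1.0 (critical)
print(average_sensitivity(xor2, 2))  # 2.0 (chaotic: every flip matters)
print(network_average_sensitivity([(and2, 2), (xor2, 2), (copy1, 1)]))  # about 1.333
```

A constant function would score 0 (fully ordered), while XOR, which reacts to every input flip, scores k; this is why both the number of inputs <K> and the logic of each function shape a network's position relative to the critical value of one.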

Molecular Networks

Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks

Hyunghoon Cho, Bonnie Berger, Jian Peng (2015)

Complex biological systems have been successfully modeled by biochemical and genetic interaction networks, typically gathered from high-throughput (HTP) data. These networks can be used to infer functional relationships between genes or proteins. Using the intuition that the topological role of a gene in a network relates to its biological function, local or diffusion-based guilt-by-association and graph-theoretic methods have had success in inferring gene functions. Here we seek to improve function prediction by integrating diffusion-based methods with a novel dimensionality reduction technique to overcome the incomplete and noisy nature of network data. In this paper, we introduce diffusion component analysis (DCA), a framework that plugs in a diffusion model and learns a low-dimensional vector representation of each node to encode the topological properties of a network. As a proof of concept, we demonstrate DCA's substantial improvement over state-of-the-art diffusion-based approaches in predicting protein function from molecular interaction networks. Moreover, our DCA framework can integrate multiple networks from heterogeneous sources, consisting of genomic information, biochemical experiments, and other resources, to improve function prediction even further. A further performance gain is achieved by integrating the DCA framework with support vector machines that take our node vector representations as features. Overall, our DCA framework provides a novel representation of nodes in a network that can be used as a plug-in architecture for other machine learning algorithms to decipher the topological properties of interactomes and obtain novel insights into them.

Molecular Networks, Machine Learning, Social and Information Networks

Random matrix analysis of localization properties of Gene co-expression network

Sarika Jalan, Norbert Solymosi, Gabor Vattay (2010)

We analyze a gene co-expression network under the random matrix theory (RMT) framework. The nearest-neighbor spacing distribution of the adjacency matrix of this network follows the Gaussian orthogonal ensemble statistics of RMT. The spectral rigidity test follows the random matrix prediction over a certain range and deviates afterwards. Eigenvector analysis of the network using the inverse participation ratio (IPR) suggests that the statistics of the bulk of the network's eigenvalues are consistent with those of a real symmetric random matrix, whereas a few eigenvalues are localized. Based on these IPR calculations, we divide the eigenvalues into three sets: (A) the non-degenerate part that follows RMT; (B) the non-degenerate part, at both ends and at intermediate eigenvalues, which deviates from RMT and is expected to contain information about important nodes in the network; and (C) the degenerate part with zero eigenvalue, which fluctuates around the RMT-predicted value. We identify the nodes corresponding to the dominant modes of the corresponding eigenvectors and analyze their structural properties.
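A minimal sketch of the inverse participation ratio used above, assuming the standard definition IPR = sum over l of u_l^4 for a unit-normalized eigenvector u over N nodes. It ranges from 1/N for a vector spread evenly across all nodes to 1 for a vector concentrated on a single node; for eigenvectors of a GOE random matrix it averages about 3/N. The eigenvectors themselves would come from a symmetric eigensolver (for example `numpy.linalg.eigh` applied to the adjacency matrix), a step omitted here to keep the example dependency-free; the function name and test vectors are hypothetical.

```python
import math
from typing import Sequence


def inverse_participation_ratio(v: Sequence[float]) -> float:
    """IPR = sum_l u_l**4 for the unit-normalized vector u = v / ||v||.

    Near 1/N: delocalized (weight spread over many nodes).
    Near 1:   localized (weight concentrated on a few nodes).
    """
    norm = math.sqrt(sum(x * x for x in v))
    return sum((x / norm) ** 4 for x in v)


n = 100
delocalized = [1.0] * n  # equal weight on every node
localized = [0.0] * n
localized[7] = 1.0       # all weight on a single node

print(inverse_participation_ratio(delocalized))  # about 0.01, i.e. 1/n
print(inverse_participation_ratio(localized))    # 1.0
```

Eigenvectors whose IPR sits well above the random-matrix baseline are the candidates for set (B) in the abstract, and their largest components point to the nodes worth inspecting.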

Molecular Networks, Biological Physics

Data-driven modelling of biological multi-scale processes

Jan Hasenauer, Nick Jagiella, Sabrina Hross (2015)

Biological processes involve a variety of spatial and temporal scales. A holistic understanding of many biological processes therefore requires multi-scale models that capture the relevant properties on all these scales. In this manuscript we review the mathematical modelling approaches used to describe the individual spatial scales and how they are integrated into holistic models. We discuss the relation between spatial and temporal scales and its implications for multi-scale modelling. Based on this overview of state-of-the-art modelling approaches, we formulate key challenges in the mathematical and computational modelling of biological multi-scale and multi-physics processes. In particular, we consider the availability of analysis tools for multi-scale models and model-based multi-scale data integration. We provide a compact review of methods for model-based data integration and model-based hypothesis testing. Furthermore, we discuss novel approaches and recent trends, including the reduction of computation time using reduced-order and surrogate models, which contribute to solving inference problems. We conclude the manuscript by providing a few ideas for the development of tailored multi-scale inference methods.

Molecular Networks, Quantitative Methods