Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets

Published by Moeez Subhani

Publication date: 2020

Research fields: Biology, Informatics Engineering

Paper language: English

Authors: Moeez M. Subhani - Ashiq Anjum

Keywords: Genomics, Artificial Intelligence, Machine Learning

Abstract

Computational clinical prediction from clinical data is common in bioinformatics. However, predictions that also draw on genomics datasets are rarely reported in the research literature. Precision medicine requires information from all available datasets to provide intelligent clinical solutions. In this paper, we attempt to create a prediction model that uses information from both clinical and genomics datasets. We demonstrate multiclass disease prediction on combined clinical and genomics data using machine learning methods. We created an integrated dataset from a clinical dataset (ClinVar) and a genomics dataset (gene expression), and trained an instance-based learner on it to predict clinical diseases. We used a simple but innovative approach to multiclass classification in which the number of output classes is as high as 75. We used Principal Component Analysis (PCA) to reduce the dimensionality of the feature space. The classifier predicted diseases with 73% accuracy on the integrated dataset, and the results were consistent and competitive when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical prediction and can prove valuable in clinical diagnostics and precision medicine.
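The abstract does not say how the two datasets were joined, which instance-based learner was used, or how many principal components were kept. As an illustration only, the Python sketch below assumes an inner join of ClinVar-style (gene, disease) rows with expression profiles on gene symbol, a from-scratch PCA (power iteration with deflation), and a plain k-nearest-neighbour vote as the instance-based learner. All data and function names (`integrate`, `fit_predict`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical toy data (NOT from the paper): ClinVar-style (gene, disease) rows
# and a gene-expression table keyed by the same gene symbols.
CLINVAR_ROWS = [
    ("GENE_A", "disease_1"), ("GENE_B", "disease_1"),
    ("GENE_C", "disease_2"), ("GENE_D", "disease_2"),
    ("GENE_E", "disease_3"),  # no expression profile, so the join drops it
]
EXPRESSION = {
    "GENE_A": [5.0, 1.0, 0.2], "GENE_B": [4.8, 1.2, 0.1],
    "GENE_C": [0.5, 6.0, 3.0], "GENE_D": [0.7, 5.5, 3.2],
}


def integrate(clinvar_rows, expression):
    """Inner-join clinical labels with expression features on gene symbol (assumed key)."""
    X, y = [], []
    for gene, disease in clinvar_rows:
        if gene in expression:
            X.append(list(expression[gene]))
            y.append(disease)
    return X, y


def column_means(X):
    return [sum(col) / len(X) for col in zip(*X)]


def covariance(X, means):
    """Sample covariance matrix of the rows of X."""
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    d = len(means)
    return [[sum(r[i] * r[j] for r in Xc) / (len(X) - 1) for j in range(d)]
            for i in range(d)]


def top_components(C, k, iters=200, seed=0):
    """Leading k eigenvectors of symmetric C by power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Remove the found component so the next pass converges to the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(row, means, comps):
    return [sum((x - m) * c for x, m, c in zip(row, means, comp)) for comp in comps]


def knn_predict(train, labels, query, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip((math.dist(query, t) for t in train), labels))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_predict(X, y, query, n_components=2, k=3):
    """Fit PCA on X, project train and query, then classify the query with k-NN."""
    means = column_means(X)
    comps = top_components(covariance(X, means), n_components)
    train = [project(row, means, comps) for row in X]
    return knn_predict(train, y, project(query, means, comps), k)


X, y = integrate(CLINVAR_ROWS, EXPRESSION)
print(fit_predict(X, y, [4.9, 1.1, 0.15]))  # → disease_1
```

k-NN handles any number of classes natively, because the vote simply picks the most common label among the neighbours, so nothing changes structurally at 75 classes. On real data, scikit-learn's `PCA` and `KNeighborsClassifier` would be the idiomatic replacements for the hand-written helpers, and accuracy would have to be measured on held-out samples.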

Related papers

Zifeng Wang, Yifan Yang, Rui Wen (2021)

Current deep-learning-based disease diagnosis systems usually suffer from catastrophic forgetting: directly fine-tuning the diagnosis model on new tasks usually leads to an abrupt decay in performance on previous tasks. Worse still, the trained diagnosis system is fixed once deployed, yet collecting training data that covers enough diseases is infeasible. This motivates us to develop a lifelong learning diagnosis system. In this work, we propose to use attention to combine medical entities and context, and to embed episodic memory and consolidation to retain knowledge, so that the learned model can adapt to sequential disease-diagnosis tasks. Moreover, we establish a new benchmark, named Jarvis-40, which contains clinical notes collected from various hospitals. Our experiments show that the proposed method achieves state-of-the-art performance on this benchmark.

Keywords: Artificial Intelligence

The Genomic HyperBrowser: inferential genomics at the sequence level

Geir K. Sandve, Sveinung Gundersen, Halfdan Rydbeck (2011)

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.

Keywords: Genomics

Structuring research methods and data with the Research Object model: genomics workflows as a case study

Kristina M. Hettne, Harish Dharuri, Jun Zhao (2013)

One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary meta-data for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study where we analysed human metabolite variation by workflows.

Keywords: Genomics, Digital Libraries

A Data-Driven Biophysical Computational Model of Parkinson's Disease Based on Marmoset Monkeys

Caetano M. Ranieri, Jhielson M. Pimentel, Marcelo R. Romano (2021)

In this work we propose a new biophysical computational model of brain regions relevant to Parkinson's disease (PD), based on local field potential data collected from the brains of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder linked to the death of dopaminergic neurons in the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex neuronal circuit of the brain. Although multiple mechanisms underlie the disease, a complete description of those mechanisms and of its molecular pathogenesis is still missing, and there is still no cure. To address this gap, computational models that resemble neurobiological aspects found in animal models have been proposed. For our model, we adopted a data-driven approach in which a set of biologically constrained parameters is optimised using differential evolution. The evolved models successfully reproduced single-neuron mean firing rates and spectral signatures of local field potentials from healthy and parkinsonian marmoset brain data. To the best of our knowledge, this is the first computational model of Parkinson's disease based on simultaneous electrophysiological recordings from seven brain regions of marmoset monkeys. The results show that the proposed model could facilitate the investigation of the mechanisms of PD and support the development of techniques that may point to new therapies. It could also be applied to other computational neuroscience problems in which biological data can be used to fit multi-scale models of brain circuits.

Keywords: Neurons and Cognition, Artificial Intelligence
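The Parkinson's model abstract states that biologically constrained parameters are optimised with differential evolution but gives no cost function, model, or settings. The sketch below shows the classic DE/rand/1/bin scheme with bound clipping, fitting a hypothetical two-parameter `toy_model` to hypothetical target features; `TARGET`, `toy_model`, `fit_cost`, and the settings `F`, `CR`, and `pop_size` are illustrative assumptions, not the authors' values.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimise `cost` inside box `bounds` using classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(max(v, lo), hi))  # clip to parameter bounds
                else:
                    trial.append(pop[i][j])
            s = cost(trial)
            if s <= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical stand-in for a simulator: maps two parameters to two summary
# features (e.g. firing rates). The real model is a biophysical network.
TARGET = [3.0, 1.0]


def toy_model(params):
    p0, p1 = params
    return [p0 + p1, p0 - p1]


def fit_cost(params):
    """Squared error between simulated and target features."""
    return sum((s - t) ** 2 for s, t in zip(toy_model(params), TARGET))


best, err = differential_evolution(fit_cost, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
print(best, err)  # best should approach [2.0, 1.0]
```

In a real setting `fit_cost` would run the network simulation and compare simulated features against recorded ones, while the DE loop would stay the same. SciPy's `scipy.optimize.differential_evolution` offers a maintained implementation of the same idea.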

Single-cell eQTLGen Consortium: a personalized understanding of disease

Monique G.P. van der Wijst, Dylan H. de Vries, Hilde E. Groot (2019)

In recent years, functional genomics approaches combining genetic information with bulk RNA-sequencing data have identified the downstream expression effects of disease-associated genetic risk factors through so-called expression quantitative trait locus (eQTL) analysis. Single-cell RNA-sequencing creates enormous opportunities for mapping eQTLs across different cell types and in dynamic processes, many of which are obscured when using bulk methods. The enormous increase in throughput and reduction in cost per cell now allow this technology to be applied to large-scale population genetics studies. Therefore, we have founded the single-cell eQTLGen consortium (sc-eQTLGen), aimed at pinpointing disease-causing genetic variants and identifying the cellular contexts in which they affect gene expression. Ultimately, this information can enable development of personalized medicine. Here, we outline the goals, approach, potential utility and early proofs-of-concept of the sc-eQTLGen consortium. We also provide a set of study design considerations for future single-cell eQTL studies.

Keywords: Genomics, Molecular Networks, Populations and Evolution
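At its simplest, the eQTL analysis mentioned in the sc-eQTLGen abstract regresses a gene's expression on the genotype dosage (0, 1, or 2 copies of the alternative allele) of a nearby variant. The sketch below, which is not the consortium's pipeline, runs that ordinary least-squares test separately per cell type on hypothetical per-donor pseudobulk expression; the function names, cell-type labels, and values are all assumptions.

```python
import math


def eqtl_test(dosages, expression):
    """OLS fit of expression ~ alpha + beta * dosage; returns (beta, t statistic)."""
    n = len(dosages)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic variant: dosage has no variance")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    beta = sxy / sxx
    alpha = mean_e - beta * mean_g
    rss = sum((e - alpha - beta * g) ** 2 for g, e in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    # A noiseless fit gives se == 0; report NaN rather than dividing by zero.
    return beta, (beta / se if se > 0 else math.nan)


def eqtl_by_cell_type(dosages, expression_by_type):
    """Run the same variant-gene test separately within each cell type."""
    return {ct: eqtl_test(dosages, expr) for ct, expr in expression_by_type.items()}


# Hypothetical example: six donors' dosages and per-donor mean expression
# of one gene in two cell types (pseudobulk).
DOSAGES = [0, 0, 1, 1, 2, 2]
EXPRESSION_BY_TYPE = {
    "T_cell": [1.0, 1.2, 2.0, 2.2, 3.1, 2.9],
    "monocyte": [2.0, 2.1, 1.9, 2.05, 2.0, 1.95],
}

results = eqtl_by_cell_type(DOSAGES, EXPRESSION_BY_TYPE)
for cell_type, (beta, t) in results.items():
    print(f"{cell_type}: beta={beta:.3f}, t={t:.2f}")
```

Real single-cell eQTL studies add covariates, multiple-testing correction, and models suited to count data. The toy example only illustrates why a per-cell-type test can surface an effect that averaging across cell types would dilute.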