Survival prediction and risk estimation of Glioma patients using mRNA expressions

Published by Navodini Wijethilake

Publication date: 2020

Research field: Biology, Informatics Engineering

Language: English

Authors: Navodini Wijethilake - Dulani Meedeniya - Charith Chitraranjan

Genomics, Machine Learning

Gliomas are a lethal type of central nervous system tumor with a poor prognosis. With recent advances in microarray technologies, thousands of gene expression measurements have been acquired from glioma patients, enabling analysis from many angles. As a result, genomics has emerged as a tool for prognosis analysis. In this work, we identify a survival-related 7-gene signature and explore two approaches for survival prediction and risk estimation. For survival prediction, we propose a novel probabilistic programming based approach, which outperforms traditional machine learning algorithms. The proposed algorithm achieves an average 4-fold cross-validation accuracy of 74%. Further, we construct a prognostic risk model for risk estimation of glioma patients. This model reflects the survival of glioma patients, assigning high risk to patients with low survival.
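The abstract does not say how the prognostic risk model is built. One common construction for gene-signature risk models, shown here only as a sketch under that assumption, is a weighted sum of expression values followed by a median split into high-risk and low-risk groups. The gene names, coefficients, and the median cut-off below are hypothetical placeholders, not values from the paper.

```python
from statistics import median

# Hypothetical coefficients for a 7-gene signature.
# Illustrative placeholders only; not taken from the paper.
COEFFICIENTS = {
    "GENE_1": 0.5,
    "GENE_2": -0.25,
    "GENE_3": 0.75,
    "GENE_4": 0.1,
    "GENE_5": -0.4,
    "GENE_6": 0.3,
    "GENE_7": 0.2,
}


def risk_score(expression):
    """Weighted sum of expression values over the signature genes."""
    return sum(COEFFICIENTS[gene] * expression[gene] for gene in COEFFICIENTS)


def stratify(scores):
    """Label each score 'high' if it lies above the cohort median, else 'low'.

    The median split is an assumption; the abstract gives no cut-off.
    """
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]
```

Under this construction, a higher score corresponds to a higher estimated risk, which matches the abstract's statement that high risk goes with low survival.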

485 - Cong Li , Can Yang , Joel Gelernter 2013

An important task of human genetics studies is to accurately predict disease risks in individuals based on genetic markers, which allows for identifying individuals at high disease risks, and facilitating their disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size due to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size and hence improve the prediction accuracy is to integrate data of genetically correlated phenotypes. Yet the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders (BARD) and schizophrenia (SZ) with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the AUC (area under the receiver operating characteristic curve). We also found similar prediction accuracy improvements when we jointly analyzed GWAS data for Crohn's disease (CD) and ulcerative colitis (UC). The empirical observations were substantiated through our comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated pleiotropy as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
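For readers unfamiliar with the AUC metric used above, the following minimal sketch computes it from its rank interpretation: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name and the 0/1 label encoding are assumptions made for illustration; this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: predicted risk scores, one per individual.
    labels: 1 for a case, 0 for a control (encoding assumed here).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))
```

The double loop is quadratic in the sample size, which is fine for illustration; a rank-based implementation would be preferable for cohorts of GWAS scale.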

Genomics, Quantitative Methods, Statistics Applications

Strategies to integrate multi-omics data for patient survival prediction

116 - Lana X Garmire 2020

Genomics, especially multi-omics, has made precision medicine feasible. Complete, publicly accessible multi-omics resources with clinical outcomes, such as The Cancer Genome Atlas (TCGA), are a great test bed for developing computational methods that integrate multi-omics data to predict patient cancer phenotypes. We have been utilizing TCGA multi-omics data to predict cancer patient survival, using a variety of approaches, including prior biological knowledge (such as pathways), and more recently, deep-learning methods. Over time, we have developed methods such as Cox-nnet, DeepProg, and two-stage Cox-nnet, to address the challenges due to multi-omics and multi-modality. Despite the limited sample size (hundreds to thousands) in the training datasets as well as the heterogeneous nature of human populations, these methods have shown significance and robustness at predicting patient survival in independent population cohorts. In the following, we describe in detail these methodologies, the modeling results, and important biological insights revealed by these methods.

Genomics

Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm

104 - S.M Mehedi Zaman , Wasay Mahmood Qureshi , Md. Mohsin Sarker Raihan 2021

Cardiovascular disease, especially heart failure, is one of the major health hazards of our time and is a leading cause of death worldwide. Advancement in data mining techniques using machine learning (ML) models is paving the way for promising prediction approaches. Data mining is the process of converting massive volumes of raw data created by healthcare institutions into meaningful information that can aid in making predictions and crucial decisions. Collecting various follow-up data from patients who have had heart failure, analyzing those data, and utilizing several ML models to predict the survival possibility of cardiovascular patients is the key aim of this study. Due to the imbalance of the classes in the dataset, the Synthetic Minority Oversampling Technique (SMOTE) has been implemented. Two unsupervised models (K-Means and Fuzzy C-Means clustering) and three supervised classifiers (Random Forest, XGBoost and Decision Tree) have been used in our study. After thorough investigation, our results demonstrate a superior performance of the supervised ML algorithms over unsupervised models. Moreover, we designed and propose a supervised stacked ensemble learning model that can achieve an accuracy, precision, recall and F1 score of 99.98%. Our study shows that only certain attributes collected from the patients are imperative to successfully predict the possibility of survival after heart failure, using supervised ML algorithms.

Machine Learning, Artificial Intelligence, Biomolecules

An early warning tool for predicting mortality risk of COVID-19 patients using machine learning

123 - Muhammad E. H. Chowdhury , Tawsifur Rahman , Amith Khandakar 2020

The COVID-19 pandemic has created extreme pressure on global healthcare services. Fast, reliable and early clinical assessment of the severity of the disease can help in allocating and prioritizing resources to reduce mortality. In order to study the important blood biomarkers for predicting disease mortality, a retrospective study was conducted on 375 COVID-19 positive patients admitted to Tongji Hospital (China) from January 10 to February 18, 2020. Demographic and clinical characteristics, and patient outcomes were investigated using machine learning tools to identify key biomarkers to predict the mortality of individual patients. A nomogram was developed for predicting the mortality risk among COVID-19 patients. Lactate dehydrogenase, neutrophils (%), lymphocytes (%), high-sensitivity C-reactive protein, and age, all acquired at hospital admission, were identified as key predictors of death by a multi-tree XGBoost model. The areas under the curve (AUC) of the nomogram for the derivation and validation cohorts were 0.961 and 0.991, respectively. An integrated score (LNLCA) was calculated with the corresponding death probability. COVID-19 patients were divided into three subgroups (low-, moderate- and high-risk) using LNLCA cut-off values of 10.4 and 12.65, with death probabilities of less than 5%, 5% to 50%, and above 50%, respectively. The prognostic model, nomogram and LNLCA score can help in early detection of high mortality risk in COVID-19 patients, which will help doctors to improve the management of patient stratification.
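The three-way stratification described above can be sketched as a simple threshold function. The cut-off values 10.4 and 12.65 come from the abstract; the function name and the treatment of scores that fall exactly on a cut-off are assumptions, since the abstract does not state which group boundary values belong to. How the LNLCA score itself is computed is not given in the abstract and is not attempted here.

```python
# Cut-off values as stated in the abstract.
LOW_CUTOFF = 10.4
HIGH_CUTOFF = 12.65


def risk_group(lnlca_score):
    """Map an LNLCA score to 'low', 'moderate', or 'high' risk.

    Assumption: a score exactly equal to a cut-off is placed in the
    moderate group; the abstract does not specify boundary handling.
    """
    if lnlca_score < LOW_CUTOFF:
        return "low"        # death probability below 5%
    if lnlca_score <= HIGH_CUTOFF:
        return "moderate"   # death probability 5% to 50%
    return "high"           # death probability above 50%
```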

Quantitative Methods, Machine Learning

Motif Identification using CNN-based Pairwise Subsequence Alignment Score Prediction

103 - Ethan Jacob Moyer , Anup Das 2021

A common problem in bioinformatics is related to identifying gene regulatory regions marked by relatively high frequencies of motifs, or deoxyribonucleic acid sequences that often code for transcription and enhancer proteins. Predicting alignment scores between subsequence k-mers and a given motif enables the identification of candidate regulatory regions in a gene, which correspond to the transcription of these proteins. We propose a one-dimensional (1-D) Convolutional Neural Network trained on k-mer formatted sequences interspaced with the given motif pattern to predict pairwise alignment scores between the consensus motif and subsequence k-mers. Our model consists of fifteen layers with three rounds of a one-dimensional convolution layer, a batch normalization layer, a dense layer, and a 1-D maximum pooling layer. We train the model using mean squared error loss on four different data sets each with a different motif pattern randomly inserted in DNA sequences: the first three data sets have zero, one, and two mutations applied on each inserted motif, and the fourth data set represents the inserted motif as a position-specific probability matrix. We use a novel proposed metric in order to evaluate the model's performance, $S_{\alpha}$, which is based on the Jaccard Index. We use 10-fold cross validation to evaluate our model. Using $S_{\alpha}$, we measure the accuracy of the model by identifying the 15 highest-scoring 15-mer indices of the predicted scores that agree with that of the actual scores within a selected $\alpha$ region. For the best performing data set, our results indicate on average 99.3% of the top 15 motifs were identified correctly within a one base pair stride ($\alpha = 1$) in the out of sample data. To the best of our knowledge, this is a novel approach that illustrates how data formatted in an intelligent way can be extrapolated using machine learning.
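The abstract says the $S_{\alpha}$ metric is based on the Jaccard Index but does not define it. As background only, the sketch below computes the plain Jaccard index between the sets of top-k positions in a predicted and an actual score list. It does not implement the $\alpha$ tolerance window and should not be read as the authors' metric; the function names are hypothetical.

```python
def top_k_indices(scores, k):
    """Return the set of positions holding the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets.

    Two empty sets are treated as identical (index 1.0) by convention here.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, with predicted scores `[0.1, 0.9, 0.8, 0.2, 0.7]` and actual scores `[0.2, 0.95, 0.1, 0.85, 0.6]`, the top-3 index sets are `{1, 2, 4}` and `{1, 3, 4}`, which share two of four distinct positions.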

Genomics, Machine Learning