ترغب بنشر مسار تعليمي؟ اضغط هنا

Integrating Hypertension Phenotype and Genotype with Hybrid Non-negative Matrix Factorization

86   0   0.0 ( 0 )
 نشر من قبل Chengsheng Mao
 تاريخ النشر 2018
والبحث باللغة English




اسأل ChatGPT حول البحث

Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements so that patients in different subtypes share similar pathophysiologic mechanisms and respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification by introducing Hybrid Non-negative Matrix Factorization (HNMF) on phenotype and genotype matrices. HNMF simultaneously approximates the phenotypic and genetic matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates phenotypic matrix under Frobenius loss, and genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On real-world clinical dataset, we used the patient factor matrix as features to predict main cardiac mechanistic outcomes. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities.



قيم البحث

اقرأ أيضاً

We introduce a simple approach to understanding the relationship between single nucleotide polymorphisms (SNPs), or groups of related SNPs, and the phenotypes they control. The pipeline involves training deep convolutional neural networks (CNNs) to d ifferentiate between images of plants with reference and alterna
Extracting genetic information from a full range of sequencing data is important for understanding diseases. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type. We used multin omial logistic regression, nonsmooth non-negative matrix factorization (nsNMF), and support vector machine (SVM) to utilize the full range of sequencing data, aiming at better aggregating genetic mutations and improving their power in predicting cancer types. Specifically, we introduced a classifier to distinguish cancer types using somatic mutations obtained from whole-exome sequencing data. Mutations were identified from multiple cancers and scored using SIFT, PP2, and CADD, and grouped at the individual gene level. The nsNMF was then applied to reduce dimensionality and to obtain coefficient and basis matrices. A feature matrix was derived from the obtained matrices to train a classifier for cancer type classification with the SVM model. We have demonstrated that the classifier was able to distinguish the cancer types with reasonable accuracy. In five-fold cross-validations using mutation counts as features, the average prediction accuracy was 77.1% (SEM=0.1%), significantly outperforming baselines and outperforming models using mutation scores as features. Using the factor matrices derived from the nsNMF, we identified multiple genes and pathways that are significantly associated with each cancer type. This study presents a generic and complete pipeline to study the associations between somatic mutations and cancers. The discovered genes and pathways associated with each cancer type can lead to biological insights. The proposed method can be adapted to other studies for disease classification and pathway discovery.
In this paper we explore avenues for improving the reliability of dimensionality reduction methods such as Non-Negative Matrix Factorization (NMF) as interpretive exploratory data analysis tools. We first explore the difficulties of the optimization problem underlying NMF, showing for the first time that non-trivial NMF solutions always exist and that the optimization problem is actually convex, by using the theory of Completely Positive Factorization. We subsequently explore four novel approaches to finding globally-optimal NMF solutions using various ideas from convex optimization. We then develop a new method, isometric NMF (isoNMF), which preserves non-negativity while also providing an isometric embedding, simultaneously achieving two properties which are helpful for interpretation. Though it results in a more difficult optimization problem, we show experimentally that the resulting method is scalable and even achieves more compact spectra than standard NMF.
102 - Moses Charikar , Lunjia Hu 2021
In the non-negative matrix factorization (NMF) problem, the input is an $mtimes n$ matrix $M$ with non-negative entries and the goal is to factorize it as $Mapprox AW$. The $mtimes k$ matrix $A$ and the $ktimes n$ matrix $W$ are both constrained to h ave non-negative entries. This is in contrast to singular value decomposition, where the matrices $A$ and $W$ can have negative entries but must satisfy the orthogonality constraint: the columns of $A$ are orthogonal and the rows of $W$ are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonality constraints, and previous work showed that it leads to better performances than NMF on many clustering tasks. We give the first constant-factor approximation algorithm for ONMF when one or both of $A$ and $W$ are subject to the orthogonality constraint. We also show an interesting connection to the correlation clustering problem on bipartite graphs. Our experiments on synthetic and real-world data show that our algorithm achieves similar or smaller errors compared to previous ONMF algorithms while ensuring perfect orthogonality (many previous algorithms do not satisfy the hard orthogonality constraint).
Motivation The genotype assignment problem consists of predicting, from the genotype of an individual, which of a known set of populations it originated from. The problem arises in a variety of contexts, including wildlife forensics, invasive species detection and biodiversity monitoring. Existing approaches perform well under ideal conditions but are sensitive to a variety of common violations of the assumptions they rely on. Results In this article, we introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Those features are then used as input to a Random Forests classifier. The classification accuracy was assessed on multiple published empirical SNP, microsatellite or consensus sequence datasets with wide ranges of size, geographical distribution and population structure and on simulated datasets. It compared favorably against widely used assessment tests or mixture analysis methods such as STRUCTURE and Admixture, and against another machine-learning based approach using principal component analysis for dimensionality reduction. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium. Moreover, the phylogenetic network approach estimates mixture proportions with good accuracy.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا