ترغب بنشر مسار تعليمي؟ اضغط هنا

Non-Negative Matrix Factorization, Convexity and Isometry

214   0   0.0 ( 0 )
 نشر من قبل Nikolaos Vasiloglou
 تاريخ النشر 2009
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper we explore avenues for improving the reliability of dimensionality reduction methods such as Non-Negative Matrix Factorization (NMF) as interpretive exploratory data analysis tools. We first explore the difficulties of the optimization problem underlying NMF, showing for the first time that non-trivial NMF solutions always exist and that the optimization problem is actually convex, by using the theory of Completely Positive Factorization. We subsequently explore four novel approaches to finding globally-optimal NMF solutions using various ideas from convex optimization. We then develop a new method, isometric NMF (isoNMF), which preserves non-negativity while also providing an isometric embedding, simultaneously achieving two properties which are helpful for interpretation. Though it results in a more difficult optimization problem, we show experimentally that the resulting method is scalable and even achieves more compact spectra than standard NMF.

قيم البحث

اقرأ أيضاً

102 - Moses Charikar , Lunjia Hu 2021
In the non-negative matrix factorization (NMF) problem, the input is an $mtimes n$ matrix $M$ with non-negative entries and the goal is to factorize it as $Mapprox AW$. The $mtimes k$ matrix $A$ and the $ktimes n$ matrix $W$ are both constrained to h ave non-negative entries. This is in contrast to singular value decomposition, where the matrices $A$ and $W$ can have negative entries but must satisfy the orthogonality constraint: the columns of $A$ are orthogonal and the rows of $W$ are also orthogonal. The orthogonal non-negative matrix factorization (ONMF) problem imposes both the non-negativity and the orthogonality constraints, and previous work showed that it leads to better performances than NMF on many clustering tasks. We give the first constant-factor approximation algorithm for ONMF when one or both of $A$ and $W$ are subject to the orthogonality constraint. We also show an interesting connection to the correlation clustering problem on bipartite graphs. Our experiments on synthetic and real-world data show that our algorithm achieves similar or smaller errors compared to previous ONMF algorithms while ensuring perfect orthogonality (many previous algorithms do not satisfy the hard orthogonality constraint).
Extracting genetic information from a full range of sequencing data is important for understanding diseases. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type. We used multin omial logistic regression, nonsmooth non-negative matrix factorization (nsNMF), and support vector machine (SVM) to utilize the full range of sequencing data, aiming at better aggregating genetic mutations and improving their power in predicting cancer types. Specifically, we introduced a classifier to distinguish cancer types using somatic mutations obtained from whole-exome sequencing data. Mutations were identified from multiple cancers and scored using SIFT, PP2, and CADD, and grouped at the individual gene level. The nsNMF was then applied to reduce dimensionality and to obtain coefficient and basis matrices. A feature matrix was derived from the obtained matrices to train a classifier for cancer type classification with the SVM model. We have demonstrated that the classifier was able to distinguish the cancer types with reasonable accuracy. In five-fold cross-validations using mutation counts as features, the average prediction accuracy was 77.1% (SEM=0.1%), significantly outperforming baselines and outperforming models using mutation scores as features. Using the factor matrices derived from the nsNMF, we identified multiple genes and pathways that are significantly associated with each cancer type. This study presents a generic and complete pipeline to study the associations between somatic mutations and cancers. The discovered genes and pathways associated with each cancer type can lead to biological insights. The proposed method can be adapted to other studies for disease classification and pathway discovery.
The Baum-Welsh algorithm together with its derivatives and variations has been the main technique for learning Hidden Markov Models (HMM) from observational data. We present an HMM learning algorithm based on the non-negative matrix factorization (NM F) of higher order Markovian statistics that is structurally different from the Baum-Welsh and its associated approaches. The described algorithm supports estimation of the number of recurrent states of an HMM and iterates the non-negative matrix factorization (NMF) algorithm to improve the learned HMM parameters. Numerical examples are provided as well.
Community structures detection is one of the fundamental problems in complex network analysis towards understanding the topology structures of the network and the functions of it. Nonnegative matrix factorization (NMF) is a widely used method for com munity detection, and modularity Q and modularity density D are criteria to evaluate the quality of community structures. In this paper, we establish the connections between Q, D and NMF for the first time. Q maximization can be approximately reformulated under the framework of NMF with Frobenius norm, especially when $n$ is large, and D maximization can also be reformulated under the framework of NMF. Q minimization can be reformulated under the framework of NMF with Kullback-Leibler divergence. We propose new methods for community structures detection based on the above findings, and the experimental results on synthetic networks demonstrate their effectiveness.
Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements so that patients in different subtypes share similar pathophysiologic mechanisms and respond more uniformly to targeted treatments. Exist ing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification by introducing Hybrid Non-negative Matrix Factorization (HNMF) on phenotype and genotype matrices. HNMF simultaneously approximates the phenotypic and genetic matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates phenotypic matrix under Frobenius loss, and genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On real-world clinical dataset, we used the patient factor matrix as features to predict main cardiac mechanistic outcomes. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا