No Arabic abstract
Non-negative Matrix Factorisation (NMF) has been extensively used in machine learning and data analytics applications. Most existing variations of NMF only consider how each row/column vector of factorised matrices should be shaped, and ignore the relationship among pairwise rows or columns. In many cases, such pairwise relationship enables better factorisation, for example, image clustering and recommender systems. In this paper, we propose an algorithm named, Relative Pairwise Relationship constrained Non-negative Matrix Factorisation (RPR-NMF), which places constraints over relative pairwise distances amongst features by imposing penalties in a triplet form. Two distance measures, squared Euclidean distance and Symmetric divergence, are used, and exponential and hinge loss penalties are adopted for the two measures respectively. It is well known that the so-called multiplicative update rules result in a much faster convergence than gradient descend for matrix factorisation. However, applying such update rules to RPR-NMF and also proving its convergence is not straightforward. Thus, we use reasonable approximations to relax the complexity brought by the penalties, which are practically verified. Experiments on both synthetic datasets and real datasets demonstrate that our algorithms have advantages on gaining close approximation, satisfying a high proportion of expected constraints, and achieving superior performance compared with other algorithms.
We present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and do not require additional samples to estimate the posterior. We show that in particular for matrix tri-factorisation convergence is difficult, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.
Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing research to describe its burden, determinants and consequences has been limited. Previous studies attempting to understand multimorbidity patterns are often cross-sectional and do not explicitly account for multimorbidity patterns evolution over time; some of them are based on small datasets and/or use arbitrary and narrow age ranges; and those that employed advanced models, usually lack appropriate benchmarking and validations. In this study, we (1) introduce a novel approach for using Non-negative Matrix Factorisation (NMF) for temporal phenotyping (i.e., simultaneously mining disease clusters and their trajectories); (2) provide quantitative metrics for the evaluation of disease clusters from such studies; and (3) demonstrate how the temporal characteristics of the disease clusters that result from our model can help mine multimorbidity networks and generate new hypotheses for the emergence of various multimorbidity patterns over time. We trained and evaluated our models on one of the worlds largest electronic health records (EHR), with 7 million patients, from which over 2 million where relevant to this study.
The mid-infrared (MIR) spectra observed with the textit{Spitzer} Infrared Spectrograph (IRS) provide a valuable dataset for untangling the physical processes and conditions within galaxies. This paper presents the first attempt to blindly learn fundamental spectral components of MIR galaxy spectra, using non-negative matrix factorisation (NMF). NMF is a recently developed multivariate technique shown to be successful in blind source separation problems. Unlike the more popular multivariate analysis technique, principal component analysis, NMF imposes the condition that weights and spectral components are non-negative. This more closely resembles the physical process of emission in the mid-infrared, resulting in physically intuitive components. By applying NMF to galaxy spectra in the Cornell Atlas of Spitzer/IRS sources (CASSIS), we find similar components amongst different NMF sets. These similar components include two for AGN emission and one for star formation. [... ABBREVIATED...] We show an NMF set with seven components can reconstruct the general spectral shape of a wide variety of objects, though struggle to fit the varying strength of emission lines. We also show that the seven components can be used to separate out different types of objects. We model this separation with Gaussian Mixtures modelling and use the result to provide a classification tool. We also show the NMF components can be used to separate out the emission from AGN and star formation regions and define a new star formation/AGN diagnostic which is consistent with all mid-infrared diagnostics already in use but has the advantage that it can be applied to mid-infrared spectra with low signal to noise or with limited spectral range. The 7 NMF components and code for classification are made public on arxiv and are available at: url{https://github.com/pdh21/NMF_software/}
Obtaining high-quality heart and lung sounds enables clinicians to accurately assess a newborns cardio-respiratory health and provide timely care. However, noisy chest sound recordings are common, hindering timely and accurate assessment. A new Non-negative Matrix Co-Factorisation-based approach is proposed to separate noisy chest sound recordings into heart, lung, and noise components to address this problem. This method is achieved through training with 20 high-quality heart and lung sounds, in parallel with separating the sounds of the noisy recording. The method was tested on 68 10-second noisy recordings containing both heart and lung sounds and compared to the current state of the art Non-negative Matrix Factorisation methods. Results show significant improvements in heart and lung sound quality scores respectively, and improved accuracy of 3.6bpm and 1.2bpm in heart and breathing rate estimation respectively, when compared to existing methods.
Low rank matrix factorisation is often used in recommender systems as a way of extracting latent features. When dealing with large and sparse datasets, traditional recommendation algorithms face the problem of acquiring large, unrestrained, fluctuating values over predictions especially for users/items with very few corresponding observations. Although the problem has been somewhat solved by imposing bounding constraints over its objectives, and/or over all entries to be within a fixed range, in terms of gaining better recommendations, these approaches have two major shortcomings that we aim to mitigate in this work: one is they can only deal with one pair of fixed bounds for all entries, and the other one is they are very time-consuming when applied on large scale recommender systems. In this paper, we propose a novel algorithm named Magnitude Bounded Matrix Factorisation (MBMF), which allows different bounds for individual users/items and performs very fast on large scale datasets. The key idea of our algorithm is to construct a model by constraining the magnitudes of each individual user/item feature vector. We achieve this by converting from the Cartesian to Spherical coordinate system with radii set as the corresponding magnitudes, which allows the above constrained optimisation problem to become an unconstrained one. The Stochastic Gradient Descent (SGD) method is then applied to solve the unconstrained task efficiently. Experiments on synthetic and real datasets demonstrate that in most cases the proposed MBMF is superior over all existing algorithms in terms of accuracy and time complexity.