No Arabic abstract
Manifold learning-based encoders have been playing important roles in nonlinear dimensionality reduction (NLDR) for data exploration. However, existing methods can often fail to preserve geometric, topological and/or distributional structures of data. In this paper, we propose a deep manifold learning framework, called deep manifold transformation (DMT) for unsupervised NLDR and embedding learning. DMT enhances deep neural networks by using cross-layer local geometry-preserving (LGP) constraints. The LGP constraints constitute the loss for deep manifold learning and serve as geometric regularizers for NLDR network training. Extensive experiments on synthetic and real-world data demonstrate that DMT networks outperform existing leading manifold-based NLDR methods in terms of preserving the structures of data.
We present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks which apply the spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples and are able to handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called ``KPCA trick is proposed to handle non-linear problems.
Existing dimensionality reduction methods are adept at revealing hidden underlying manifolds arising from high-dimensional data and thereby producing a low-dimensional representation. However, the smoothness of the manifolds produced by classic techniques over sparse and noisy data is not guaranteed. In fact, the embedding generated using such data may distort the geometry of the manifold and thereby produce an unfaithful embedding. Herein, we propose a framework for nonlinear dimensionality reduction that generates a manifold in terms of smooth geodesics that is designed to treat problems in which manifold measurements are either sparse or corrupted by noise. Our method generates a network structure for given high-dimensional data using a nearest neighbors search and then produces piecewise linear shortest paths that are defined as geodesics. Then, we fit points in each geodesic by a smoothing spline to emphasize the smoothness. The robustness of this approach for sparse and noisy datasets is demonstrated by the implementation of the method on synthetic and real-world datasets.
Stochastic kernel based dimensionality reduction approaches have become popular in the last decade. The central component of many of these methods is a symmetric kernel that quantifies the vicinity between pairs of data points and a kernel-induced Markov chain on the data. Typically, the Markov chain is fully specified by the kernel through row normalization. However, in many cases, it is desirable to impose user-specified stationary-state and dynamical constraints on the Markov chain. Unfortunately, no systematic framework exists to impose such user-defined constraints. Here, we introduce a path entropy maximization based approach to derive the transition probabilities of Markov chains using a kernel and additional user-specified constraints. We illustrate the usefulness of these Markov chains with examples.
Dimension reduction (DR) aims to learn low-dimensional representations of high-dimensional data with the preservation of essential information. In the context of manifold learning, we define that the representation after information-lossless DR preserves the topological and geometric properties of data manifolds formally, and propose a novel two-stage DR method, called invertible manifold learning (inv-ML) to bridge the gap between theoretical information-lossless and practical DR. The first stage includes a homeomorphic sparse coordinate transformation to learn low-dimensional representations without destroying topology and a local isometry constraint to preserve local geometry. In the second stage, a linear compression is implemented for the trade-off between the target dimension and the incurred information loss in excessive DR scenarios. Experiments are conducted on seven datasets with a neural network implementation of inv-ML, called i-ML-Enc. Empirically, i-ML-Enc achieves invertible DR in comparison with typical existing methods as well as reveals the characteristics of the learned manifolds. Through latent space interpolation on real-world datasets, we find that the reliability of tangent space approximated by the local neighborhood is the key to the success of manifold-based DR algorithms.
We show unsupervised machine learning techniques are a valuable tool for both visualizing and computationally accelerating the estimation of galaxy physical properties from photometric data. As a proof of concept, we use self organizing maps (SOMs) to visualize a spectral energy distribution (SED) model library in the observed photometry space. The resulting visual maps allow for a better understanding of how the observed data maps to physical properties and to better optimize the model libraries for a given set of observational data. Next, the SOMs are used to estimate the physical parameters of 14,000 z~1 galaxies in the COSMOS field and found to be in agreement with those measured with SED fitting. However, the SOM method is able to estimate the full probability distribution functions for each galaxy up to about a million times faster than direct model fitting. We conclude by discussing how this speed up and learning how the galaxy data manifold maps to physical parameter space and visualizing this mapping in lower dimensions helps overcome other challenges in galaxy formation and evolution.