Geometric Foundations of Data Reduction


Abstract in English

The purpose of this paper is to write a complete survey of the (spectral) manifold learning methods and nonlinear dimensionality reduction (NLDR) in data reduction. The first two NLDR methods in history were respectively published in Science in 2000 in which they solve the similar reduction problem of high-dimensional data endowed with the intrinsic nonlinear structure. The intrinsic nonlinear structure is always interpreted as a concept in manifolds from geometry and topology in theoretical mathematics by computer scientists and theoretical physicists. In 2001, the concept of Manifold Learning first appears as an NLDR method called Laplacian Eigenmaps purposed by Belkin and Niyogi. In the typical manifold learning setup, the data set, also called the observation set, is distributed on or near a low dimensional manifold $M$ embedded in $mathbb{R}^D$, which yields that each observation has a $D$-dimensional representation. The goal of (spectral) manifold learning is to reduce these observations as a compact lower-dimensional representation based on the geometric information. The reduction procedure is called the (spectral) manifold learning method. In this paper, we derive each (spectral) manifold learning method with the matrix and operator representation, and we then discuss the convergence behavior of each method in a geometric uniform language. Hence, we name the survey Geometric Foundations of Data Reduction.

Download