Do you want to publish a course? Click here

Laplacian-Based Dimensionality Reduction Including Spectral Clustering, Laplacian Eigenmap, Locality Preserving Projection, Graph Embedding, and Diffusion Map: Tutorial and Survey

282   0   0.0 ( 0 )
 Added by Benyamin Ghojogh
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

This is a tutorial and survey paper for nonlinear dimensionality and feature extraction methods which are based on the Laplacian of graph of data. We first introduce adjacency matrix, definition of Laplacian matrix, and the interpretation of Laplacian. Then, we cover the cuts of graph and spectral clustering which applies clustering in a subspace of data. Different optimization variants of Laplacian eigenmap and its out-of-sample extension are explained. Thereafter, we introduce the locality preserving projection and its kernel variant as linear special cases of Laplacian eigenmap. Versions of graph embedding are then explained which are generaliz

rate research

Read More

This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or representation of kernel in terms of distance matrix. Then, since the spectral methods are unified as kernel PCA, we say let us learn the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Vario
114 - Huan Qing , Jingli Wang 2020
Spectral clustering methods are widely used for detecting clusters in networks for community detection, while a small change on the graph Laplacian matrix could bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and then employ it to three classical spectral clustering approaches under the degree-corrected stochastic block model. If the number of communities is known as $K$, we consider more than $K$ leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve the performance. Three improved spectral clustering methods are dual regularized spectral clustering (DRSC) method, dual regularized spectral clustering on Ratios-of-eigenvectors (DRSCORE) method, and dual regularized symmetrized Laplacian inverse matrix (DRSLIM) method. Theoretical analysis of DRSC and DRSLIM show that under mild conditions DRSC and DRSLIM yield stable consistent community detection, moreover, DRSCORE returns perfect clustering under the ideal case. We compare the performances of DRSC, DRSCORE and DRSLIM with several spectral methods by substantial simulated networks and eight real-world networks.
Existing dimensionality reduction methods are adept at revealing hidden underlying manifolds arising from high-dimensional data and thereby producing a low-dimensional representation. However, the smoothness of the manifolds produced by classic techniques over sparse and noisy data is not guaranteed. In fact, the embedding generated using such data may distort the geometry of the manifold and thereby produce an unfaithful embedding. Herein, we propose a framework for nonlinear dimensionality reduction that generates a manifold in terms of smooth geodesics that is designed to treat problems in which manifold measurements are either sparse or corrupted by noise. Our method generates a network structure for given high-dimensional data using a nearest neighbors search and then produces piecewise linear shortest paths that are defined as geodesics. Then, we fit points in each geodesic by a smoothing spline to emphasize the smoothness. The robustness of this approach for sparse and noisy datasets is demonstrated by the implementation of the method on synthetic and real-world datasets.
In this paper, we redefine the Graph Fourier Transform (GFT) under the DSP$_mathrm{G}$ framework. We consider the Jordan eigenvectors of the directed Laplacian as graph harmonics and the corresponding eigenvalues as the graph frequencies. For this purpose, we propose a shift operator based on the directed Laplacian of a graph. Based on our shift operator, we then define total variation of graph signals, which is used in frequency ordering. We achieve natural frequency ordering and interpretation via the proposed definition of GFT. Moreover, we show that our proposed shift operator makes the LSI filters under DSP$_mathrm{G}$ to become polynomial in the directed Laplacian.
Spectral dimensionality reduction methods enable linear separations of complex data with high-dimensional features in a reduced space. However, these methods do not always give the desired results due to irregularities or uncertainties of the data. Thus, we consider aggressively modifying the scales of the features to obtain the desired classification. Using prior knowledge on the labels of partial samples to specify the Fiedler vector, we formulate an eigenvalue problem of a linear matrix pencil whose eigenvector has the feature scaling factors. The resulting factors can modify the features of entire samples to form clusters in the reduced space, according to the known labels. In this study, we propose new dimensionality reduction methods supervised using the feature scaling associated with the spectral clustering. Numerical experiments show that the proposed methods outperform well-established supervised methods for toy problems with more samples than features, and are more robust regarding clustering than existing methods. Also, the proposed methods outperform existing methods regarding classification for real-world problems with more features than samples of gene expression profiles of cancer diseases. Furthermore, the feature scaling tends to improve the clustering and classification accuracies of existing unsupervised methods, as the proportion of training data increases.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا