No Arabic abstract
Hyperspectral image (HSI) clustering is a challenging task due to the high complexity of HSI data. Subspace clustering has been proven to be powerful for exploiting the intrinsic relationship between data points. Despite the impressive performance in the HSI clustering, traditional subspace clustering methods often ignore the inherent structural information among data. In this paper, we revisit the subspace clustering with graph convolution and present a novel subspace clustering framework called Graph Convolutional Subspace Clustering (GCSC) for robust HSI clustering. Specifically, the framework recasts the self-expressiveness property of the data into the non-Euclidean domain, which results in a more robust graph embedding dictionary. We show that traditional subspace clustering models are the special forms of our framework with the Euclidean data. Basing on the framework, we further propose two novel subspace clustering models by using the Frobenius norm, namely Efficient GCSC (EGCSC) and Efficient Kernel GCSC (EKGCSC). Both models have a globally optimal closed-form solution, which makes them easier to implement, train, and apply in practice. Extensive experiments on three popular HSI datasets demonstrate that EGCSC and EKGCSC can achieve state-of-the-art clustering performance and dramatically outperforms many existing methods with significant margins.
Subspace clustering (SC) is a popular method for dimensionality reduction of high-dimensional data, where it generalizes Principal Component Analysis (PCA). Recently, several methods have been proposed to enhance the robustness of PCA and SC, while most of them are computationally very expensive, in particular, for high dimensional large-scale data. In this paper, we develop much faster iterative algorithms for SC, incorporating robustness using a {em non-squared} $ell_2$-norm objective. The known implementations for optimizing the objective would be costly due to the alternative optimization of two separate objectives: optimal cluster-membership assignment and robust subspace selection, while the substitution of one process to a faster surrogate can cause failure in convergence. To address the issue, we use a simplified procedure requiring efficient matrix-vector multiplications for subspace update instead of solving an expensive eigenvector problem at each iteration, in addition to release nested robust PCA loops. We prove that the proposed algorithm monotonically converges to a local minimum with approximation guarantees, e.g., it achieves 2-approximation for the robust PCA objective. In our experiments, the proposed algorithm is shown to converge at an order of magnitude faster than known algorithms optimizing the same objective, and have outperforms prior subspace clustering methods in accuracy and running time for MNIST dataset.
Finding a suitable data representation for a specific task has been shown to be crucial in many applications. The success of subspace clustering depends on the assumption that the data can be separated into different subspaces. However, this simple assumption does not always hold since the raw data might not be separable into subspaces. To recover the ``clustering-friendly representation and facilitate the subsequent clustering, we propose a graph filtering approach by which a smooth representation is achieved. Specifically, it injects graph similarity into data features by applying a low-pass filter to extract useful data representations for clustering. Extensive experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques. Especially, its comparable performance with deep learning methods emphasizes the effectiveness of the simple graph filtering scheme for many real-world applications. An ablation study shows that graph filtering can remove noise, preserve structure in the image, and increase the separability of classes.
Deep Subspace Clustering Networks (DSC) provide an efficient solution to the problem of unsupervised subspace clustering by using an undercomplete deep auto-encoder with a fully-connected layer to exploit the self expressiveness property. This method uses undercomplete representations of the input data which makes it not so robust and more dependent on pre-training. To overcome this, we propose a simple yet efficient alternative method - Overcomplete Deep Subspace Clustering Networks (ODSC) where we use overcomplete representations for subspace clustering. In our proposed method, we fuse the features from both undercomplete and overcomplete auto-encoder networks before passing them through the self-expressive layer thus enabling us to extract a more meaningful and robust representation of the input data for clustering. Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error. Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise. Code - href{https://github.com/jeya-maria-jose/Overcomplete-Deep-Subspace-Clustering}{https://github.com/jeya-maria-jose/Overcomplete-Deep-Subspace-Clustering
Subspace clustering is the unsupervised grouping of points lying near a union of low-dimensional linear subspaces. Algorithms based directly on geometric properties of such data tend to either provide poor empirical performance, lack theoretical guarantees, or depend heavily on their initialization. We present a novel geometric approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework. Our algorithm, referred to as ensemble K-subspaces (EKSS), forms a co-association matrix whose (i,j)th entry is the number of times points i and j are clustered together by several runs of KSS with random initializations. We prove general recovery guarantees for any algorithm that forms an affinity matrix with entries close to a monotonic transformation of pairwise absolute inner products. We then show that a specific instance of EKSS results in an affinity matrix with entries of this form, and hence our proposed algorithm can provably recover subspaces under similar conditions to state-of-the-art algorithms. The finding is, to the best of our knowledge, the first recovery guarantee for evidence accumulation clustering and for KSS variants. We show on synthetic data that our method performs well in the traditionally challenging settings of subspaces with large intersection, subspaces with small principal angles, and noisy data. Finally, we evaluate our algorithm on six common benchmark datasets and show that unlike existing methods, EKSS achieves excellent empirical performance when there are both a small and large number of points per subspace.
Subspace clustering has been extensively studied from the hypothesis-and-test, algebraic, and spectral clustering based perspectives. Most assume that only a single type/class of subspace is present. Generalizations to multiple types are non-trivial, plagued by challenges such as choice of types and numbers of models, sampling imbalance and parameter tuning. In this work, we formulate the multi-type subspace clustering problem as one of learning non-linear subspace filters via deep multi-layer perceptrons (mlps). The response to the learnt subspace filters serve as the feature embedding that is clustering-friendly, i.e., points of the same clusters will be embedded closer together through the network. For inference, we apply K-means to the network output to cluster the data. Experiments are carried out on both synthetic and real world multi-type fitting problems, producing state-of-the-art results.