Scalable Iterative Algorithm for Robust Subspace Clustering

476 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sanghyuk Chun

تاريخ النشر 2015

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sanghyuk Chun - Yung-Kyun Noh - Jinwoo Shin

بنى وهياكل البيانات والخوارزميات التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Subspace clustering (SC) is a popular method for dimensionality reduction of high-dimensional data, where it generalizes Principal Component Analysis (PCA). Recently, several methods have been proposed to enhance the robustness of PCA and SC, while most of them are computationally very expensive, in particular, for high dimensional large-scale data. In this paper, we develop much faster iterative algorithms for SC, incorporating robustness using a {em non-squared} $ell_2$-norm objective. The known implementations for optimizing the objective would be costly due to the alternative optimization of two separate objectives: optimal cluster-membership assignment and robust subspace selection, while the substitution of one process to a faster surrogate can cause failure in convergence. To address the issue, we use a simplified procedure requiring efficient matrix-vector multiplications for subspace update instead of solving an expensive eigenvector problem at each iteration, in addition to release nested robust PCA loops. We prove that the proposed algorithm monotonically converges to a local minimum with approximation guarantees, e.g., it achieves 2-approximation for the robust PCA objective. In our experiments, the proposed algorithm is shown to converge at an order of magnitude faster than known algorithms optimizing the same objective, and have outperforms prior subspace clustering methods in accuracy and running time for MNIST dataset.

قيم البحث

117 - Yaoming Cai , Zijia Zhang , Zhihua Cai 2020

Hyperspectral image (HSI) clustering is a challenging task due to the high complexity of HSI data. Subspace clustering has been proven to be powerful for exploiting the intrinsic relationship between data points. Despite the impressive performance in the HSI clustering, traditional subspace clustering methods often ignore the inherent structural information among data. In this paper, we revisit the subspace clustering with graph convolution and present a novel subspace clustering framework called Graph Convolutional Subspace Clustering (GCSC) for robust HSI clustering. Specifically, the framework recasts the self-expressiveness property of the data into the non-Euclidean domain, which results in a more robust graph embedding dictionary. We show that traditional subspace clustering models are the special forms of our framework with the Euclidean data. Basing on the framework, we further propose two novel subspace clustering models by using the Frobenius norm, namely Efficient GCSC (EGCSC) and Efficient Kernel GCSC (EKGCSC). Both models have a globally optimal closed-form solution, which makes them easier to implement, train, and apply in practice. Extensive experiments on three popular HSI datasets demonstrate that EGCSC and EKGCSC can achieve state-of-the-art clustering performance and dramatically outperforms many existing methods with significant margins.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Scalable Approximation Algorithm for Graph Summarization

104 - Maham Anwar Beg , Muhammad Ahmad , Arif Zaman 2018

Massive sizes of real-world graphs, such as social networks and web graph, impose serious challenges to process and perform analytics on them. These issues can be resolved by working on a small summary of the graph instead . A summary is a compressed version of the graph that removes several details, yet preserves its essential structure. Generally, some predefined quality measure of the summary is optimized to bound the approximation error incurred by working on the summary instead of the whole graph. All known summarization algorithms are computationally prohibitive and do not scale to large graphs. In this paper we present an efficient randomized algorithm to compute graph summaries with the goal to minimize reconstruction error. We propose a novel weighted sampling scheme to sample vertices for merging that will result in the least reconstruction error. We provide analytical bounds on the running time of the algorithm and prove approximation guarantee for our score computation. Efficiency of our algorithm makes it scalable to very large graphs on which known algorithms cannot be applied. We test our algorithm on several real world graphs to empirically demonstrate the quality of summaries produced and compare to state of the art algorithms. We use the summaries to answer several structural queries about original graph and report their accuracies.

بنى وهياكل البيانات والخوارزميات الرياضيات المتقطعة

A Scalable Concurrent Algorithm for Dynamic Connectivity

95 - Alexander Fedorov , Nikita Koval , Dan Alistarh 2021

Dynamic Connectivity is a fundamental algorithmic graph problem, motivated by a wide range of applications to social and communication networks and used as a building block in various other algorithms, such as the bi-connectivity and the dynamic mini mal spanning tree problems. In brief, we wish to maintain the connected components of the graph under dynamic edge insertions and deletions. In the sequential case, the problem has been well-studied from both theoretical and practical perspectives. However, much less is known about efficient concurrent solutions to this problem. This is the gap we address in this paper. We start from one of the classic data structures used to solve this problem, the Euler Tour Tree. Our first contribution is a non-blocking single-writer implementation of it. We leverage this data structure to obtain the first truly concurrent generalization of dynamic connectivity, which preserves the time complexity of its sequential counterpart, but is also scalable in practice. To achieve this, we rely on three main techniques. The first is to ensure that connectivity queries, which usually dominate real-world workloads, are non-blocking. The second non-trivial technique expands the above idea by making all queries that do not change the connectivity structure non-blocking. The third ingredient is applying fine-grained locking for updating the connected components, which allows operations on disjoint components to occur in parallel. We evaluate the resulting algorithm on various workloads, executing on both real and synthetic graphs. The results show the efficiency of each of the proposed optimizations; the most efficient variant improves the performance of a coarse-grained based implementation on realistic scenarios up to 6x on average and up to 30x when connectivity queries dominate.

بنى وهياكل البيانات والخوارزميات النظم الموزعة والتوازية والحوسبة العنقودية

Iterative Spectral Method for Alternative Clustering

447 - Chieh Wu , Stratis Ioannidis , Mario Sznaier 2019

Given a dataset and an existing clustering as input, alternative clustering aims to find an alternative partition. One of the state-of-the-art approaches is Kernel Dimension Alternative Clustering (KDAC). We propose a novel Iterative Spectral Method (ISM) that greatly improves the scalability of KDAC. Our algorithm is intuitive, relies on easily implementable spectral decompositions, and comes with theoretical guarantees. Its computation time improves upon existing implementations of KDAC by as much as 5 orders of magnitude.

التعلم الالي التعلم الآلي تطبيقات الإحصاء

Overcomplete Deep Subspace Clustering Networks

155 - Jeya Maria Jose Valanarasu , Vishal M. Patel 2020

Deep Subspace Clustering Networks (DSC) provide an efficient solution to the problem of unsupervised subspace clustering by using an undercomplete deep auto-encoder with a fully-connected layer to exploit the self expressiveness property. This method uses undercomplete representations of the input data which makes it not so robust and more dependent on pre-training. To overcome this, we propose a simple yet efficient alternative method - Overcomplete Deep Subspace Clustering Networks (ODSC) where we use overcomplete representations for subspace clustering. In our proposed method, we fuse the features from both undercomplete and overcomplete auto-encoder networks before passing them through the self-expressive layer thus enabling us to extract a more meaningful and robust representation of the input data for clustering. Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error. Our method is also not as dependent as DSC is on where pre-training should be stopped to get the best performance and is also more robust to noise. Code - href{https://github.com/jeya-maria-jose/Overcomplete-Deep-Subspace-Clustering}{https://github.com/jeya-maria-jose/Overcomplete-Deep-Subspace-Clustering

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي