Noisy subspace clustering via matching pursuits

153 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Michael Tschannen

تاريخ النشر 2016

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Michael Tschannen - Helmut Bolcskei

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Sparsity-based subspace clustering algorithms have attracted significant attention thanks to their excellent performance in practical applications. A prominent example is the sparse subspace clustering (SSC) algorithm by Elhamifar and Vidal, which performs spectral clustering based on an adjacency matrix obtained by sparsely representing each data point in terms of all the other data points via the Lasso. When the number of data points is large or the dimension of the ambient space is high, the computational complexity of SSC quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm results in significantly lower computational complexity, while often yielding comparable performance. The central goal of this paper is an analytical performance characterization of SSC-OMP for noisy data. Moreover, we introduce and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces intersect and when the data points are contaminated by severe noise. The clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for SSC and for the thresholding-based subspace clustering (TSC) algorithm due to Heckel and Bolcskei. Analytical results in combination with numerical results indicate that both SSC-OMP and SSC-MP with a data-dependent stopping criterion automatically detect the dimensions of the subspaces underlying the data. Moreover, experiments on synthetic and on real data show that SSC-MP compares very favorably to SSC, SSC-OMP, TSC, and the nearest subspace neighbor algorithm, both in terms of clustering performance and running time. In addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is very robust with respect to the choice of parameters in the stopping criteria.

قيم البحث

107 - Antonio De Maio , Danilo Orlando 2015

This paper deals with adaptive radar detection of a subspace signal competing with two sources of interference. The former is Gaussian with unknown covariance matrix and accounts for the joint presence of clutter plus thermal noise. The latter is str uctured as a subspace signal and models coherent pulsed jammers impinging on the radar antenna. The problem is solved via the Principle of Invariance which is based on the identification of a suitable group of transformations leaving the considered hypothesis testing problem invariant. A maximal invariant statistic, which completely characterizes the class of invariant decision rules and significantly compresses the original data domain, as well as its statistical characterization are determined. Thus, the existence of the optimum invariant detector is addressed together with the design of practically implementable invariant decision rules. At the analysis stage, the performance of some receivers belonging to the new invariant class is established through the use of analytic expressions.

تطبيقات الإحصاء نظرية المعلومات نظرية المعلومات

Doubly Stochastic Subspace Clustering

123 - Derek Lim , Rene Vidal , Benjamin D. Haeffele 2020

Many state-of-the-art subspace clustering methods follow a two-step process by first constructing an affinity matrix between data points and then applying spectral clustering to this affinity. Most of the research into these methods focuses on the fi rst step of generating the affinity, which often exploits the self-expressive property of linear subspaces, with little consideration typically given to the spectral clustering step that produces the final clustering. Moreover, existing methods often obtain the final affinity that is used in the spectral clustering step by applying ad-hoc or arbitrarily chosen postprocessing steps to the affinity generated by a self-expressive clustering formulation, which can have a significant impact on the overall clustering performance. In this work, we unify these two steps by learning both a self-expressive representation of the data and an affinity matrix that is well-normalized for spectral clustering. In our proposed models, we constrain the affinity matrix to be doubly stochastic, which results in a principled method for affinity matrix normalization while also exploiting known benefits of doubly stochastic normalization in spectral clustering. We develop a general framework and derive two models: one that jointly learns the self-expressive representation along with the doubly stochastic affinity, and one that sequentially solves for one then the other. Furthermore, we leverage sparsity in the problem to develop a fast active-set method for the sequential solver that enables efficient computation on large datasets. Experiments show that our method achieves state-of-the-art subspace clustering performance on many common datasets in computer vision.

التعلم الآلي الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Towards Clustering-friendly Representations: Subspace Clustering via Graph Filtering

117 - Zhengrui Ma , Zhao Kang , Guangchun Luo 2021

Finding a suitable data representation for a specific task has been shown to be crucial in many applications. The success of subspace clustering depends on the assumption that the data can be separated into different subspaces. However, this simple a ssumption does not always hold since the raw data might not be separable into subspaces. To recover the ``clustering-friendly representation and facilitate the subsequent clustering, we propose a graph filtering approach by which a smooth representation is achieved. Specifically, it injects graph similarity into data features by applying a low-pass filter to extract useful data representations for clustering. Extensive experiments on image and document clustering datasets demonstrate that our method improves upon state-of-the-art subspace clustering techniques. Especially, its comparable performance with deep learning methods emphasizes the effectiveness of the simple graph filtering scheme for many real-world applications. An ablation study shows that graph filtering can remove noise, preserve structure in the image, and increase the separability of classes.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Statistical Learning Guarantees for Compressive Clustering and Compressive Mixture Modeling

84 - Remi Gribonval 2020

We provide statistical learning guarantees for two unsupervised learning tasks in the context of compressive statistical learning, a general framework for resource-efficient large-scale learning that we introduced in a companion paper.The principle o f compressive statistical learning is to compress a training collection, in one pass, into a low-dimensional sketch (a vector of random empirical generalized moments) that captures the information relevant to the considered learning task. We explicitly describe and analyze random feature functions which empirical averages preserve the needed information for compressive clustering and compressive Gaussian mixture modeling with fixed known variance, and establish sufficient sketch sizes given the problem dimensions.

التعلم الآلي نظرية المعلومات نظرية المعلومات

Sub-Classifier Construction for Error Correcting Output Code Using Minimum Weight Perfect Matching

407 - Patoomsiri Songsiri , Thimaporn Phetkaew , Ryutaro Ichise 2013

Multi-class classification is mandatory for real world problems and one of promising techniques for multi-class classification is Error Correcting Output Code. We propose a method for constructing the Error Correcting Output Code to obtain the suitab le combination of positive and negative classes encoded to represent binary classifiers. The minimum weight perfect matching algorithm is applied to find the optimal pairs of subset of classes by using the generalization performance as a weighting criterion. Based on our method, each subset of classes with positive and negative labels is appropriately combined for learning the binary classifiers. Experimental results show that our technique gives significantly higher performance compared to traditional methods including the dense random code and the sparse random code both in terms of accuracy and classification times. Moreover, our method requires significantly smaller number of binary classifiers while maintaining accuracy compared to the One-Versus-One.

التعلم الآلي نظرية المعلومات نظرية المعلومات