Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders

81 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Fengfu Li

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Fengfu Li - Hong Qiao - Bo Zhang

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Traditional image clustering methods take a two-step approach, feature learning and clustering, sequentially. However, recent research results demonstrated that combining the separated phases in a unified framework and training them jointly can achieve a better performance. In this paper, we first introduce fully convolutional auto-encoders for image feature learning and then propose a unified clustering framework to learn image representations and cluster centers jointly based on a fully convolutional auto-encoder and soft $k$-means scores. At initial stages of the learning procedure, the representations extracted from the auto-encoder may not be very discriminative for latter clustering. We address this issue by adopting a boosted discriminative distribution, where high score assignments are highlighted and low score ones are de-emphasized. With the gradually boosted discrimination, clustering assignment scores are discriminated and cluster purities are enlarged. Experiments on several vision benchmark datasets show that our methods can achieve a state-of-the-art performance.

قيم البحث

اقرأ أيضاً

Wasserstein Auto-Encoders

89 - Ilya Tolstikhin , Olivier Bousquet , Sylvain Gelly 2017

We propose the Wasserstein Auto-Encoder (WAE)---a new algorithm for building a generative model of the data distribution. WAE minimizes a penalized form of the Wasserstein distance between the model distribution and the target distribution, which lea ds to a different regularizer than the one used by the Variational Auto-Encoder (VAE). This regularizer encourages the encoded training distribution to match the prior. We compare our algorithm with several other techniques and show that it is a generalization of adversarial auto-encoders (AAE). Our experiments show that WAE shares many of the properties of VAEs (stable training, encoder-decoder architecture, nice latent manifold structure) while generating samples of better quality, as measured by the FID score.

التعلم الالي التعلم الآلي

Fisher Auto-Encoders

129 - Khalil Elkhalil , Ali Hasan , Jie Ding 2020

It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto -encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables, with that of the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and celebA datasets demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.

التعلم الالي التعلم الآلي

Disentangling Space and Time in Video with Hierarchical Variational Auto-encoders

329 - Will Grathwohl , Aaron Wilson 2016

There are many forms of feature information present in video data. Principle among them are object identity information which is largely static across multiple video frames, and object pose and style information which continuously transforms from fra me to frame. Most existing models confound these two types of representation by mapping them to a shared feature space. In this paper we propose a probabilistic approach for learning separable representations of object identity and pose information using unsupervised video data. Our approach leverages a deep generative model with a factored prior distribution that encodes properties of temporal invariances in the hidden feature set. Learning is achieved via variational inference. We present results of learning identity and pose information on a dataset of moving characters as well as a dataset of rotating 3D objects. Our experimental results demonstrate our models success in factoring its representation, and demonstrate that the model achieves improved performance in transfer learning tasks.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي

Graph Convolutional Subspace Clustering: A Robust Subspace Clustering Framework for Hyperspectral Image

117 - Yaoming Cai , Zijia Zhang , Zhihua Cai 2020

Hyperspectral image (HSI) clustering is a challenging task due to the high complexity of HSI data. Subspace clustering has been proven to be powerful for exploiting the intrinsic relationship between data points. Despite the impressive performance in the HSI clustering, traditional subspace clustering methods often ignore the inherent structural information among data. In this paper, we revisit the subspace clustering with graph convolution and present a novel subspace clustering framework called Graph Convolutional Subspace Clustering (GCSC) for robust HSI clustering. Specifically, the framework recasts the self-expressiveness property of the data into the non-Euclidean domain, which results in a more robust graph embedding dictionary. We show that traditional subspace clustering models are the special forms of our framework with the Euclidean data. Basing on the framework, we further propose two novel subspace clustering models by using the Frobenius norm, namely Efficient GCSC (EGCSC) and Efficient Kernel GCSC (EKGCSC). Both models have a globally optimal closed-form solution, which makes them easier to implement, train, and apply in practice. Extensive experiments on three popular HSI datasets demonstrate that EGCSC and EKGCSC can achieve state-of-the-art clustering performance and dramatically outperforms many existing methods with significant margins.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Part-based approximations for morphological operators using asymmetric auto-encoders

86 - Bastien Ponchon 2019

This paper addresses the issue of building a part-based representation of a dataset of images. More precisely, we look for a non-negative, sparse decomposition of the images on a reduced set of atoms, in order to unveil a morphological and interpreta ble structure of the data. Additionally, we want this decomposition to be computed online for any new sample that is not part of the initial dataset. Therefore, our solution relies on a sparse, non-negative auto-encoder where the encoder is deep (for accuracy) and the decoder shallow (for interpretability). This method compares favorably to the state-of-the-art online methods on two datasets (MNIST and Fashion MNIST), according to classical metrics and to a new one we introduce, based on the invariance of the representation to morphological dilation.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي نظرية الإحصاء