Zero-Shot Image Classification Using Coupled Dictionary Embedding


Published by: Mohammad Rostami

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Mohammad Rostami, Soheil Kolouri, Zak Murez



Zero-shot learning (ZSL) is a framework for classifying images that belong to unseen classes based solely on semantic information about those classes. In this paper, we propose a new ZSL algorithm using coupled dictionary learning. The core idea is that the visual features and the semantic attributes of an image can share the same sparse representation in an intermediate space. We use images from seen classes and semantic attributes from seen and unseen classes to learn two dictionaries that sparsely represent the visual and semantic feature vectors of an image. In the ZSL testing stage, in the absence of labeled data, images from unseen classes can be mapped into the attribute space by finding the joint sparse representation using solely the visual data. The image is then classified in the attribute space given semantic descriptions of the unseen classes. We also provide an attribute-aware formulation to tackle the domain shift and hubness problems in ZSL. Extensive experiments demonstrate the superior performance of our approach against state-of-the-art ZSL algorithms on benchmark ZSL datasets.
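The test-time procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the two dictionaries are already learned, uses plain ISTA as the sparse solver, and assigns the class by cosine similarity in attribute space. The names `sparse_code`, `classify_unseen`, `D_visual`, `D_semantic`, and `class_attributes`, as well as the toy values, are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(x, D, lam=0.1, n_iter=200):
    # ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1 (assumed solver).
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - step * grad, step * lam)
    return a

def classify_unseen(x, D_visual, D_semantic, class_attributes, lam=0.1):
    # 1) Sparse code computed from the visual feature alone.
    a = sparse_code(x, D_visual, lam)
    # 2) Map into attribute space through the coupled semantic dictionary.
    z_hat = D_semantic @ a
    # 3) Pick the unseen class whose attribute vector is most similar
    #    (cosine similarity is an assumption of this sketch).
    norms = np.linalg.norm(class_attributes, axis=1) * np.linalg.norm(z_hat)
    sims = class_attributes @ z_hat / (norms + 1e-12)
    return int(np.argmax(sims)), z_hat

# Toy data: 4 atoms, 4-dim visual features, 3-dim attributes (all made up).
D_visual = np.eye(4)
D_semantic = np.array([[1.0, 0.0, 0.0, 1.0],
                       [0.0, 1.0, 0.0, 1.0],
                       [0.0, 0.0, 1.0, 0.0]])
class_attributes = np.eye(3)          # one row per unseen class
x = np.array([0.0, 2.0, 0.0, 0.0])    # visual feature of a test image
label, z_hat = classify_unseen(x, D_visual, D_semantic, class_attributes)
```

In this toy setup only the second atom is active, so the image is mapped onto the second attribute axis and assigned to class index 1. The attribute-aware formulation mentioned in the abstract is not covered by this sketch.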


114 - Farshad G. Veshki, Sergiy A. Vorobyov 2017

We address the multi-focus image fusion problem, where multiple images captured with different focal settings are to be fused into an all-in-focus image of higher quality. Algorithms for this problem necessarily admit the source image characteristics along with focused and blurred features. However, most sparsity-based approaches use a single dictionary in focused feature space to describe multi-focus images, and ignore the representations in blurred feature space. We propose a multi-focus image fusion approach based on sparse representation using a coupled dictionary. It exploits the observations that the patches from a given training set can be sparsely represented by a couple of overcomplete dictionaries related to the focused and blurred categories of images and that a sparse approximation based on such coupled dictionary leads to a more flexible and therefore better fusion strategy than the one based on just selecting the sparsest representation in the original image estimate. In addition, to improve the fusion performance, we employ a coupled dictionary learning approach that enforces pairwise correlation between atoms of dictionaries learned to represent the focused and blurred feature spaces. We also discuss the advantages of the fusion approach based on coupled dictionary learning, and present efficient algorithms for fusion based on coupled dictionary learning. Extensive experimental comparisons with state-of-the-art multi-focus image fusion algorithms validate the effectiveness of the proposed approach.

Computer Vision and Pattern Recognition, Machine Learning

Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need?

94 - Yonglong Tian, Yue Wang, Dilip Krishnan 2020

The focus of recent meta-learning research has been on the development of learning algorithms that can quickly adapt to test-time tasks with limited data and low computational cost. Few-shot learning is widely used as one of the standard benchmarks in meta-learning. In this work, we show that a simple baseline, learning a supervised or self-supervised representation on the meta-training set and then training a linear classifier on top of this representation, outperforms state-of-the-art few-shot learning methods. An additional boost can be achieved through the use of self-distillation. This demonstrates that using a good learned embedding model can be more effective than sophisticated meta-learning algorithms. We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms. Code is available at: http://github.com/WangYueFt/rfs/.
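The baseline described above, a linear classifier fitted on top of a fixed embedding, can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked repository: the embedding network is replaced by synthetic feature vectors, and the classifier is a multinomial logistic regression trained by plain gradient descent. The function names and hyperparameters are hypothetical.

```python
import numpy as np

def fit_linear_classifier(feats, labels, n_classes, lr=0.1, n_steps=300, l2=1e-3):
    # Multinomial logistic regression on frozen embeddings (assumed trainer).
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(n_steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        err = (probs - onehot) / n                   # gradient w.r.t. logits
        W -= lr * (feats.T @ err + l2 * W)
        b -= lr * err.sum(axis=0)
    return W, b

def predict(feats, W, b):
    return np.argmax(feats @ W + b, axis=1)

# Synthetic stand-in for embeddings of a 3-way 5-shot task (made-up data).
rng = np.random.default_rng(0)
centers = 3.0 * np.eye(3, 8)                         # one center per class
support_y = np.repeat(np.arange(3), 5)
support_x = centers[support_y] + 0.1 * rng.standard_normal((15, 8))
query_y = np.repeat(np.arange(3), 4)
query_x = centers[query_y] + 0.1 * rng.standard_normal((12, 8))

W, b = fit_linear_classifier(support_x, support_y, n_classes=3)
accuracy = float(np.mean(predict(query_x, W, b) == query_y))
```

In a real few-shot episode, `support_x` and `query_x` would be the outputs of the pretrained embedding model on the support and query images; only `W` and `b` are fitted per task.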

Computer Vision and Pattern Recognition, Machine Learning

Multi-Label Zero-Shot Learning with Transfer-Aware Label Embedding Projection

464 - Meng Ye, Yuhong Guo 2018

Zero-shot learning transfers knowledge from seen classes to novel unseen classes to reduce the human labor of labelling data for building new classifiers. Much effort on zero-shot learning, however, has focused on the standard multi-class setting, while the more challenging multi-label zero-shot problem has received limited attention. In this paper we propose a transfer-aware embedding projection approach to tackle multi-label zero-shot learning. The approach projects the label embedding vectors into a low-dimensional space to induce better inter-label relationships and explicitly facilitate information transfer from seen labels to unseen labels, while simultaneously learning a max-margin multi-label classifier with the projected label embeddings. Auxiliary information can be conveniently incorporated to guide the label embedding projection to further improve label relation structures for zero-shot knowledge transfer. We conduct experiments for zero-shot multi-label image classification. The results demonstrate the efficacy of the proposed approach.

Computer Vision and Pattern Recognition, Machine Learning

Hierarchical Image Classification using Entailment Cone Embeddings

156 - Ankit Dhall, Anastasia Makarova, Octavian Ganea 2020

Image classification has been studied extensively, but there has been limited work in using unconventional, external guidance other than traditional image-label pairs for training. We present a set of methods for leveraging information about the semantic hierarchy embedded in class labels. We first inject label-hierarchy knowledge into an arbitrary CNN-based classifier and empirically show that availability of such external semantic information in conjunction with the visual semantics from images boosts overall performance. Taking a step further in this direction, we model more explicitly the label-label and label-image interactions using order-preserving embeddings governed by both Euclidean and hyperbolic geometries, prevalent in natural language, and tailor them to hierarchical image classification and representation learning. We empirically validate all the models on the hierarchical ETHEC dataset.

Computer Vision and Pattern Recognition, Machine Learning

Information Bottleneck Constrained Latent Bidirectional Embedding for Zero-Shot Learning

91 - Yang Liu, Lei Zhou, Xiao Bai 2020

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen classes. Though many ZSL methods rely on a direct mapping between the visual and the semantic space, the calibration deviation and hubness problem limit the generalization capability to unseen classes. Recently emerged generative ZSL methods generate unseen image features to transform ZSL into a supervised classification problem. However, most generative models still suffer from the seen-unseen bias problem, as only seen data is used for training. To address these issues, we propose a novel bidirectional embedding based generative model with a tight visual-semantic coupling constraint. We learn a unified latent space that calibrates the embedded parametric distributions of both visual and semantic spaces. Since the embedding from high-dimensional visual features comprises much non-semantic information, the alignment of the visual and semantic spaces in the latent space would inevitably deviate. Therefore, we introduce the information bottleneck (IB) constraint to ZSL for the first time to preserve essential attribute information during the mapping. Specifically, we utilize uncertainty estimation and the wake-sleep procedure to alleviate feature noise and improve model abstraction capability. In addition, our method can be easily extended to the transductive ZSL setting by generating labels for unseen images. We then introduce a robust loss to solve this label noise problem. Extensive experimental results show that our method outperforms the state-of-the-art methods in different ZSL settings on most benchmark datasets. The code will be available at https://github.com/osierboy/IBZSL.

Computer Vision and Pattern Recognition, Machine Learning