Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image. It is more challenging than its single-label counterpart. On one hand, the unconstrained number of labels assigned to each image makes the model more prone to overfitting to seen classes. On the other hand, there is a large semantic gap between seen and unseen classes in existing multi-label classification datasets. To address these issues, this paper introduces a novel multi-label zero-shot classification framework that learns to transfer from external knowledge. We observe that ImageNet, which is commonly used to pretrain the feature extractor, has a large and fine-grained label space. This motivates us to exploit it as external knowledge to bridge the seen and unseen classes and promote generalization. Specifically, we construct a knowledge graph that includes not only classes from the target dataset but also those from ImageNet. Since ImageNet labels are not available in the target dataset, we propose a novel PosVAE module to infer their initial states in the extended knowledge graph. We then design a relational graph convolutional network (RGCN) to propagate information among classes and achieve knowledge transfer. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed approach.
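To make the propagation step concrete, below is a minimal sketch of a relational graph-convolution layer over the extended knowledge graph. It is not the paper's implementation: the class name `SimpleRGCNLayer`, the per-relation linear maps, and the random tensors standing in for the PosVAE-inferred initial states of ImageNet nodes are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleRGCNLayer(nn.Module):
    """One relational graph-convolution step: each relation type has its own
    weight matrix, and messages are aggregated over that relation's adjacency.
    This is a simplified stand-in for the RGCN described in the abstract."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.rel_weights = nn.ModuleList(
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)]
        )
        self.self_loop = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, node_states, adjacency):
        # node_states: (num_nodes, in_dim) initial states of all class nodes;
        #   in the paper, ImageNet nodes would get PosVAE-inferred states
        #   (here they are random placeholders -- assumption).
        # adjacency: (num_relations, num_nodes, num_nodes), row-normalized.
        out = self.self_loop(node_states)
        for r, w in enumerate(self.rel_weights):
            out = out + adjacency[r] @ w(node_states)
        return torch.relu(out)

# Toy usage: e.g. 5 seen + 3 unseen + 4 ImageNet classes, 2 relation types.
num_nodes, num_rel, dim = 12, 2, 16
adj = torch.rand(num_rel, num_nodes, num_nodes)
adj = adj / adj.sum(dim=-1, keepdim=True)       # row-normalize each relation
init_states = torch.randn(num_nodes, dim)       # placeholder initial states
layer = SimpleRGCNLayer(dim, dim, num_rel)
refined = layer(init_states, adj)               # (12, 16) propagated class states
```

Stacking such layers lets information flow from target-dataset classes through ImageNet classes and back, which is the knowledge-transfer mechanism the abstract describes.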
Zero-shot learning transfers knowledge from seen classes to novel unseen classes to reduce the human labor of labeling data for building new classifiers. Much effort on zero-shot learning, however, has focused on the standard multi-class setting, the more
In this paper, we propose a novel approach for generalized zero-shot learning in a multi-modal setting, where we have novel classes of audio/video during testing that are not seen during training. We use the semantic relatedness of text embeddings as
In zero-shot image classification, new categories can be discovered by transforming semantic features into synthesized visual features without corresponding training samples. Although significant progress has been made in generating high-quality synth
Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works utilize RNN/LSTM to implicitly capture
Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods have found that region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods