In zero-shot learning (ZSL), conditional generators have been widely used to generate additional training features, which can then be used to train classifiers for the test data. However, some test samples are hard: they lie close to the decision boundaries and are prone to misclassification, which degrades ZSL performance. In this paper, we propose to learn clusterable features for ZSL problems. Using a Conditional Variational Autoencoder (CVAE) as the feature generator, we project the original features into a new feature space supervised by an auxiliary classification loss. To further increase clusterability, we fine-tune the features using a Gaussian similarity loss. The clusterable visual features are not only more suitable for CVAE reconstruction but also more separable, which improves classification accuracy. Moreover, we introduce Gaussian noise to enlarge the intra-class variance of the generated features, which helps improve the classifiers' robustness. Our experiments on the SUN, CUB, and AWA2 datasets show consistent improvements over previous state-of-the-art ZSL results by a large margin. Beyond zero-shot classification, experiments show that our method for increasing feature clusterability also benefits few-shot learning algorithms.
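The abstract names two feature-level ingredients: a Gaussian similarity loss that tightens class clusters, and Gaussian noise added to generated features to widen intra-class variance. The plain-Python sketch below illustrates both under stated assumptions. The abstract does not give the loss formula, so the kernel exp(-||x - mu||^2 / (2 sigma^2)) against per-batch class centroids, combined with an attractive-repulsive log term, is an assumed form and not necessarily the paper's definition. The function names and the `sigma`/`std` parameters are hypothetical, and the CVAE and auxiliary classifier are omitted.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_centroids(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Mean feature vector of each class present in the batch."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def gaussian_similarity(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian kernel similarity in (0, 1]; equals 1 when a == b."""
    return math.exp(-squared_distance(a, b) / (2.0 * sigma ** 2))


def gaussian_similarity_loss(features: Sequence[Vector], labels: Sequence[int],
                             sigma: float = 1.0, eps: float = 1e-12) -> float:
    """ASSUMED attractive-repulsive form (not taken from the paper):
    pull each feature toward its own class centroid with -log(s) and
    push it away from every other centroid with -log(1 - s)."""
    centroids = class_centroids(features, labels)
    total = 0.0
    for f, y in zip(features, labels):
        for c, mu in centroids.items():
            s = gaussian_similarity(f, mu, sigma)
            if c == y:
                total -= math.log(s + eps)        # attractive term
            else:
                total -= math.log(1.0 - s + eps)  # repulsive term
    return total / len(features)


def add_gaussian_noise(features: Sequence[Vector], std: float = 0.1,
                       seed: Optional[int] = None) -> List[Vector]:
    """Add i.i.d. N(0, std^2) noise to every coordinate of each
    (generated) feature, enlarging intra-class variance."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, std) for v in f] for f in features]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    tight = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    loose = [[0.0, 0.0], [2.4, 2.4], [2.6, 2.6], [5.0, 5.0]]
    print(gaussian_similarity_loss(tight, labels))
    print(gaussian_similarity_loss(loose, labels))
    print(add_gaussian_noise(tight, std=0.1, seed=0))
```

On these toy points the tight, well-separated clusters yield a lower loss than the overlapping ones, which is the clusterability property the abstract targets. In practice `sigma` would need tuning to the scale of the feature space, since squared distances grow with dimensionality.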
Zero-shot object recognition, or zero-shot learning, aims to transfer object recognition ability across semantically related categories, such as fine-grained animal or bird species. However, the images of different fine-grained objects tend to m
We present a novel counterfactual framework for both Zero-Shot Learning (ZSL) and Open-Set Recognition (OSR), whose common challenge is generalizing to unseen classes by training only on seen classes. Our idea stems from the observation that
Zero-shot learning (ZSL) aims to recognize unseen object classes without any training samples, which can be regarded as a form of transfer learning from seen classes to unseen ones. This is made possible by learning a projection between a feature spa
Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information. This paper focuses on visual KWS for words unseen during training, a real-world, practical setting which so far
The recognition of Chinese characters has always been a challenging task due to their huge variety and complex structures. The latest research proves that such an enormous character set can be decomposed into a collection of about 500 fundamental Chi