No Arabic abstract
Zero-shot learning (ZSL) algorithms typically work by exploiting attribute correlations to be able to make predictions in unseen classes. However, these correlations do not remain intact at test time in most practical settings and the resulting change in these correlations lead to adverse effects on zero-shot learning performance. In this paper, we present a new paradigm for ZSL that: (i) utilizes the class-attribute mapping of unseen classes to estimate the change in target distribution (target shift), and (ii) propose a novel technique called grouped Adversarial Learning (gAL) to reduce negative effects of this shift. Our approach is widely applicable for several existing ZSL algorithms, including those with implicit attribute predictions. We apply the proposed technique ($g$AL) on three popular ZSL algorithms: ALE, SJE, and DEVISE, and show performance improvements on 4 popular ZSL datasets: AwA2, aPY, CUB and SUN. We obtain SOTA results on SUN and aPY datasets and achieve comparable results on AwA2.
Normalization techniques have proved to be a crucial ingredient of successful training in a traditional supervised learning regime. However, in the zero-shot learning (ZSL) world, these ideas have received only marginal attention. This work studies normalization in ZSL scenario from both theoretical and practical perspectives. First, we give a theoretical explanation to two popular tricks used in zero-shot learning: normalize+scale and attributes normalization and show that they help training by preserving variance during a forward pass. Next, we demonstrate that they are insufficient to normalize a deep ZSL model and propose Class Normalization (CN): a normalization scheme, which alleviates this issue both provably and in practice. Third, we show that ZSL models typically have more irregular loss surface compared to traditional classifiers and that the proposed method partially remedies this problem. Then, we test our approach on 4 standard ZSL datasets and outperform sophisticated modern SotA with a simple MLP optimized without any bells and whistles and having ~50 times faster training speed. Finally, we generalize ZSL to a broader problem -- continual ZSL, and introduce some principled metrics and rigorous baselines for this new setup. The project page is located at https://universome.github.io/class-norm.
We are interested in developing a unified machine learning model over many mobile devices for practical learning tasks, where each device only has very few training data. This is a commonly encountered situation in mobile computing scenarios, where data is scarce and distributed while the tasks are distinct. In this paper, we propose a federated few-shot learning (FedFSL) framework to learn a few-shot classification model that can classify unseen data classes with only a few labeled samples. With the federated learning strategy, FedFSL can utilize many data sources while keeping data privacy and communication efficiency. There are two technical challenges: 1) directly using the existing federated learning approach may lead to misaligned decision boundaries produced by client models, and 2) constraining the decision boundaries to be similar over clients would overfit to training tasks but not adapt well to unseen tasks. To address these issues, we propose to regularize local updates by minimizing the divergence of client models. We also formulate the training in an adversarial fashion and optimize the client models to produce a discriminative feature space that can better represent unseen data samples. We demonstrate the intuitions and conduct experiments to show our approaches outperform baselines by more than 10% in learning vision tasks and 5% in language tasks.
In the process of exploring the world, the curiosity constantly drives humans to cognize new things. Supposing you are a zoologist, for a presented animal image, you can recognize it immediately if you know its class. Otherwise, you would more likely attempt to cognize it by exploiting the side-information (e.g., semantic information, etc.) you have accumulated. Inspired by this, this paper decomposes the generalized zero-shot learning (G-ZSL) task into an open set recognition (OSR) task and a zero-shot learning (ZSL) task, where OSR recognizes seen classes (if we have seen (or known) them) and rejects unseen classes (if we have never seen (or known) them before), while ZSL identifies the unseen classes rejected by the former. Simultaneously, without violating OSRs assumptions (only known class knowledge is available in training), we also first attempt to explore a new generalized open set recognition (G-OSR) by introducing the accumulated side-information from known classes to OSR. For G-ZSL, such a decomposition effectively solves the class overfitting problem with easily misclassifying unseen classes as seen classes. The problem is ubiquitous in most existing G-ZSL methods. On the other hand, for G-OSR, introducing such semantic information of known classes not only improves the recognition performance but also endows OSR with the cognitive ability of unknown classes. Specifically, a visual and semantic prototypes-jointly guided convolutional neural network (VSG-CNN) is proposed to fulfill these two tasks (G-ZSL and G-OSR) in a unified end-to-end learning framework. Extensive experiments on benchmark datasets demonstrate the advantages of our learning framework.
We propose a transductive Laplacian-regularized inference for few-shot tasks. Given any feature embedding learned from the base classes, we minimize a quadratic binary-assignment function containing two terms: (1) a unary term assigning query samples to the nearest class prototype, and (2) a pairwise Laplacian term encouraging nearby query samples to have consistent label assignments. Our transductive inference does not re-train the base model, and can be viewed as a graph clustering of the query set, subject to supervision constraints from the support set. We derive a computationally efficient bound optimizer of a relaxation of our function, which computes independent (parallel) updates for each query sample, while guaranteeing convergence. Following a simple cross-entropy training on the base classes, and without complex meta-learning strategies, we conducted comprehensive experiments over five few-shot learning benchmarks. Our LaplacianShot consistently outperforms state-of-the-art methods by significant margins across different models, settings, and data sets. Furthermore, our transductive inference is very fast, with computational times that are close to inductive inference, and can be used for large-scale few-shot tasks.
In this work, we investigate semi-supervised learning (SSL) for image classification using adversarial training. Previous results have illustrated that generative adversarial networks (GANs) can be used for multiple purposes. Triple-GAN, which aims to jointly optimize model components by incorporating three players, generates suitable image-label pairs to compensate for the lack of labeled data in SSL with improved benchmark performance. Conversely, Bad (or complementary) GAN, optimizes generation to produce complementary data-label pairs and force a classifiers decision boundary to lie between data manifolds. Although it generally outperforms Triple-GAN, Bad GAN is highly sensitive to the amount of labeled data used for training. Unifying these two approaches, we present unified-GAN (UGAN), a novel framework that enables a classifier to simultaneously learn from both good and bad samples through adversarial training. We perform extensive experiments on various datasets and demonstrate that UGAN: 1) achieves state-of-the-art performance among other deep generative models, and 2) is robust to variations in the amount of labeled data used for training.