ﻻ يوجد ملخص باللغة العربية
This paper proposes a novel model for recognizing images with composite attribute-object concepts, notably for composite concepts that are unseen during model training. We aim to explore the three key properties required by the task --- relation-aware, consistent, and decoupled --- to learn rich and robust features for primitive concepts that compose attribute-object pairs. To this end, we propose the Blocked Message Passing Network (BMP-Net). The model consists of two modules. The concept module generates semantically meaningful features for primitive concepts, whereas the visual module extracts visual features for attributes and objects from input images. A message passing mechanism is used in the concept module to capture the relations between primitive concepts. Furthermore, to prevent the model from being biased towards seen composite concepts and reduce the entanglement between attributes and objects, we propose a blocking mechanism that equalizes the information available to the model for both seen and unseen concepts. Extensive experiments and ablation studies on two benchmarks show the efficacy of the proposed model.
Chinese characters have a huge set of character categories, more than 20,000 and the number is still increasing as more and more novel characters continue being created. However, the enormous characters can be decomposed into a compact set of about 5
In this paper, we study the problem of recognizing compositional attribute-object concepts within the zero-shot learning (ZSL) framework. We propose an episode-based cross-attention (EpiCA) network which combines merits of cross-attention mechanism a
From the beginning of zero-shot learning research, visual attributes have been shown to play an important role. In order to better transfer attribute-based knowledge from known to unknown classes, we argue that an image representation with integrated
In compositional zero-shot learning, the goal is to recognize unseen compositions (e.g. old dog) of observed visual primitives states (e.g. old, cute) and objects (e.g. car, dog) in the training set. This is challenging because the same state can for
One of the key limitations of modern deep learning approaches lies in the amount of data required to train them. Humans, by contrast, can learn to recognize novel categories from just a few examples. Instrumental to this rapid learning ability is the