No Arabic abstract
External knowledge (a.k.a side information) plays a critical role in zero-shot learning (ZSL) which aims to predict with unseen classes that have never appeared in training data. Several kinds of external knowledge such as text and attribute have been widely investigated, but they alone are limited with incomplete semantics. Therefore, some very recent studies propose to use Knowledge Graph (KG) due to its high expressivity and compatibility for representing kinds of knowledge. However, the ZSL community is still short of standard benchmarks for studying and comparing different KG-based ZSL methods. In this paper, we proposed 5 resources for KG-based research in zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). For each resource, we contributed a benchmark and its KG with semantics ranging from text to attributes, from relational knowledge to logical expressions. We have clearly presented how the resources are constructed, their statistics and formats, and how they can be utilized with cases in evaluating ZSL methods performance and explanations. Our resources are available at https://github.com/China-UK-ZSL/Resources_for_KZSL.
Incorporating external knowledge to Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with different components for knowledge matching and extraction, feature learning, etc.However, such pipeline approaches suffer when some component does not perform well, which leads to error propagation and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue -- many answers may have never appeared during training (i.e., unseen answers) in real-word application. To bridge these gaps, in this paper, we propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge, and present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers, meanwhile dramatically augment existing end-to-end models on the normal F-VQA task.
Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient as related concepts in the graph structure share statistical strength allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer and thereby consequently decrease performance. In order to still enjoy the benefit brought by the graph structure while preventing dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a nodes relationship to its ancestors and descendants. A weighting scheme is further used to weigh their contribution depending on the distance to the node to improve information propagation in the graph. Combined with finetuning of the representations in a two-stage training approach our method outperforms state-of-the-art zero-shot learning approaches.
Zero-Shot Learning (ZSL) is an emerging research that aims to solve the classification problems with very few training data. The present works on ZSL mainly focus on the mapping of learning semantic space to visual space. It encounters many challenges that obstruct the progress of ZSL research. First, the representation of the semantic feature is inadequate to represent all features of the categories. Second, the domain drift problem still exists during the transfer from semantic space to visual space. In this paper, we introduce knowledge sharing (KS) to enrich the representation of semantic features. Based on KS, we apply a generative adversarial network to generate pseudo visual features from semantic features that are very close to the real visual features. Abundant experimental results from two benchmark datasets of ZSL show that the proposed approach has a consistent improvement.
Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to semantically related unseen classes, which are absent during training. The promising strategies for ZSL are to synthesize visual features of unseen classes conditioned on semantic side information and to incorporate meta-learning to eliminate the models inherent bias towards seen classes. While existing meta generative approaches pursue a common model shared across task distributions, we aim to construct a generative network adaptive to task characteristics. To this end, we propose an Attribute-Modulated generAtive meta-model for Zero-shot learning (AMAZ). Our model consists of an attribute-aware modulation network, an attribute-augmented generative network, and an attribute-weighted classifier. Given unseen classes, the modulation network adaptively modulates the generator by applying task-specific transformations so that the generative network can adapt to highly diverse tasks. The weighted classifier utilizes the data quality to enhance the training procedure, further improving the model performance. Our empirical evaluations on four widely-used benchmarks show that AMAZ outperforms state-of-the-art methods by 3.8% and 3.1% in ZSL and generalized ZSL settings, respectively, demonstrating the superiority of our method. Our experiments on a zero-shot image retrieval task show AMAZs ability to synthesize instances that portray real visual characteristics.
Recent developments in pre-trained neural language modeling have led to leaps in accuracy on commonsense question-answering benchmarks. However, there is increasing concern that models overfit to specific tasks, without learning to utilize external knowledge or perform general semantic reasoning. In contrast, zero-shot evaluations have shown promise as a more robust measure of a models general reasoning abilities. In this paper, we propose a novel neuro-symbolic framework for zero-shot question answering across commonsense tasks. Guided by a set of hypotheses, the framework studies how to transform various pre-existing knowledge resources into a form that is most effective for pre-training models. We vary the set of language models, training regimes, knowledge sources, and data generation strategies, and measure their impact across tasks. Extending on prior work, we devise and compare four constrained distractor-sampling strategies. We provide empirical results across five commonsense question-answering tasks with data generated from five external knowledge resources. We show that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks. In addition, both preserving the structure of the task as well as generating fair and informative questions help language models learn more effectively.