Towards Visual Explainable Active Learning for Zero-Shot Classification

208 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shichao Jia

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shichao Jia - Zeyu Li - Nuo Chen

تفاعل الإنسان والحاسوب الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Zero-shot classification is a promising paradigm to solve an applicable problem when the training classes and test classes are disjoint. Achieving this usually needs experts to externalize their domain knowledge by manually specifying a class-attribute matrix to define which classes have which attributes. Designing a suitable class-attribute matrix is the key to the subsequent procedure, but this design process is tedious and trial-and-error with no guidance. This paper proposes a visual explainable active learning approach with its design and implementation called semantic navigator to solve the above problems. This approach promotes human-AI teaming with four actions (ask, explain, recommend, respond) in each interaction loop. The machine asks contrastive questions to guide humans in the thinking process of attributes. A novel visualization called semantic map explains the current status of the machine. Therefore analysts can better understand why the machine misclassifies objects. Moreover, the machine recommends the labels of classes for each attribute to ease the labeling burden. Finally, humans can steer the model by modifying the labels interactively, and the machine adjusts its recommendations. The visual explainable active learning approach improves humans efficiency of building zero-shot classification models interactively, compared with the method without guidance. We justify our results with user studies using the standard benchmarks for zero-shot classification.

قيم البحث

97 - Maxime Bucher 2017

This paper addresses the task of learning an image clas-sifier when some categories are defined by semantic descriptions only (e.g. visual attributes) while the others are defined by exemplar images as well. This task is often referred to as the Zero -Shot classification task (ZSC). Most of the previous methods rely on learning a common embedding space allowing to compare visual features of unknown categories with semantic descriptions. This paper argues that these approaches are limited as i) efficient discrimi-native classifiers cant be used ii) classification tasks with seen and unseen categories (Generalized Zero-Shot Classification or GZSC) cant be addressed efficiently. In contrast , this paper suggests to address ZSC and GZSC by i) learning a conditional generator using seen classes ii) generate artificial training examples for the categories without exemplars. ZSC is then turned into a standard supervised learning problem. Experiments with 4 generative models and 5 datasets experimentally validate the approach, giving state-of-the-art results on both ZSC and GZSC.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Class Knowledge Overlay to Visual Feature Learning for Zero-Shot Image Classification

159 - Cheng Xie , Ting Zeng , Hongxin Xiang 2021

New categories can be discovered by transforming semantic features into synthesized visual features without corresponding training samples in zero-shot image classification. Although significant progress has been made in generating high-quality synth esized visual features using generative adversarial networks, guaranteeing semantic consistency between the semantic features and visual features remains very challenging. In this paper, we propose a novel zero-shot learning approach, GAN-CST, based on class knowledge to visual feature learning to tackle the problem. The approach consists of three parts, class knowledge overlay, semi-supervised learning and triplet loss. It applies class knowledge overlay (CKO) to obtain knowledge not only from the corresponding class but also from other classes that have the knowledge overlay. It ensures that the knowledge-to-visual learning process has adequate information to generate synthesized visual features. The approach also applies a semi-supervised learning process to re-train knowledge-to-visual model. It contributes to reinforcing synthesized visual features generation as well as new category prediction. We tabulate results on a number of benchmark datasets demonstrating that the proposed model delivers superior performance over state-of-the-art approaches.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning

93 - Luis Lucas , David Tomas , Jose Garcia-Rodriguez 2021

One of the main issues related to unsupervised machine learning is the cost of processing and extracting useful information from large datasets. In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLI P neural network architecture in multimodal environments (image and text) from social media. For this purpose, we used the InstaNY100K dataset and proposed a validation approach based on sampling techniques. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part, and then adding the associated texts as support. The results obtained demonstrated that trained neural networks such as CLIP can be successfully applied to image classification with little fine-tuning, and considering the associated texts to the images can help to improve the accuracy depending on the goal. The results demonstrated what seems to be a promising research direction.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

Learning Clusterable Visual Features for Zero-Shot Recognition

113 - Jingyi Xu , Zhixin Shu , Dimitris Samaras 2020

In zero-shot learning (ZSL), conditional generators have been widely used to generate additional training features. These features can then be used to train the classifiers for testing data. However, some testing data are considered hard as they lie close to the decision boundaries and are prone to misclassification, leading to performance degradation for ZSL. In this paper, we propose to learn clusterable features for ZSL problems. Using a Conditional Variational Autoencoder (CVAE) as the feature generator, we project the original features to a new feature space supervised by an auxiliary classification loss. To further increase clusterability, we fine-tune the features using Gaussian similarity loss. The clusterable visual features are not only more suitable for CVAE reconstruction but are also more separable which improves classification accuracy. Moreover, we introduce Gaussian noise to enlarge the intra-class variance of the generated features, which helps to improve the classifiers robustness. Our experiments on SUN,CUB, and AWA2 datasets show consistent improvement over previous state-of-the-art ZSL results by a large margin. In addition to its effectiveness on zero-shot classification, experiments show that our method to increase feature clusterability benefits few-shot learning algorithms as well.

الرؤية الحاسوبية وتمييز الأنماط

Visual Data Synthesis via GAN for Zero-Shot Video Classification

142 - Chenrui Zhang , Yuxin Peng 2018

Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge from explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation via learning a projection betwe en visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in data distribution, and commonly suffer from the information degradation issue caused by heterogeneity gap. In this paper, we propose a visual data synthesis framework via GAN to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video feature of unseen categories, and ZSL can be turned into typical supervised problem with the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can improve the zero-shot video classification performance significantly.

الرؤية الحاسوبية وتمييز الأنماط