MetaPix: Few-Shot Video Retargeting

69 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jessica Lee

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jessica Lee - Deva Ramanan - Rohit Girdhar

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We address the task of unsupervised retargeting of human actions from one video to another. We consider the challenging setting where only a few frames of the target is available. The core of our approach is a conditional generative model that can transcode input skeletal poses (automatically extracted with an off-the-shelf pose estimator) to output target frames. However, it is challenging to build a universal transcoder because humans can appear wildly different due to clothing and background scene geometry. Instead, we learn to adapt - or personalize - a universal generator to the particular human and background in the target. To do so, we make use of meta-learning to discover effective strategies for on-the-fly personalization. One significant benefit of meta-learning is that the personalized transcoder naturally enforces temporal coherence across its generated frames; all frames contain consistent clothing and background geometry of the target. We experiment on in-the-wild internet videos and images and show our approach improves over widely-used baselines for the task.

قيم البحث

اقرأ أيضاً

Few-Shot Video Object Detection

117 - Qi Fan , Chi-Keung Tang , Yu-Wing Tai 2021

We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions: 1) a large-scale video dataset FSVOD-500 comprising of 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Ne twork (TPN) to generate high-quality video tube proposals to aggregate feature representation for the target video object; 3) a strategically improved Temporal Matching Network (TMN+) to match representative query tube features and supports with better discriminative ability. Our TPN and TMN+ are jointly and end-to-end trained. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets compared to image-based methods and other naive video-based extensions. Codes and datasets will be released at https://github.com/fanq15/FewX.

الرؤية الحاسوبية وتمييز الأنماط

Generalized Many-Way Few-Shot Video Classification

127 - Yongqin Xian , Bruno Korbar , Matthijs Douze 2020

Few-shot learning methods operate in low data regimes. The aim is to learn with few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored and met hods based on 2D CNNs are unable to learn temporal information. In this work we thus develop a simple 3D CNN baseline, surpassing existing methods by a large margin. To circumvent the need of labeled examples, we propose to leverage weakly-labeled videos from a large dataset using tag retrieval followed by selecting the best clips with visual similarities, yielding further improvement. Our results saturate current 5-way benchmarks for few-shot video classification and therefore we propose a new challenging benchmark involving more classes and a mixture of classes with varying supervision.

الرؤية الحاسوبية وتمييز الأنماط

Temporal Alignment Prediction for Few-Shot Video Classification

100 - Fei Pan , Chunlei Xu , Jie Guo 2021

The goal of few-shot video classification is to learn a classification model with good generalization ability when trained with only a few labeled videos. However, it is difficult to learn discriminative feature representations for videos in such a s etting. In this paper, we propose Temporal Alignment Prediction (TAP) based on sequence similarity learning for few-shot video classification. In order to obtain the similarity of a pair of videos, we predict the alignment scores between all pairs of temporal positions in the two videos with the temporal alignment prediction function. Besides, the inputs to this function are also equipped with the context information in the temporal domain. We evaluate TAP on two video classification benchmarks including Kinetics and Something-Something V2. The experimental results verify the effectiveness of TAP and show its superiority over state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط

Learning Implicit Temporal Alignment for Few-shot Video Classification

122 - Songyang Zhang , Jiale Zhou , Xuming He 2021

Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications. However, it is particularly challenging to learn a class-invariant spatial-temp oral representation in such a setting. To address this, we propose a novel matching-based few-shot learning strategy for video sequences in this work. Our main idea is to introduce an implicit temporal alignment for a video pair, capable of estimating the similarity between them in an accurate and robust manner. Moreover, we design an effective context encoding module to incorporate spatial and feature channel context, resulting in better modeling of intra-class variations. To train our model, we develop a multi-task loss for learning video matching, leading to video features with better generalization. Extensive experimental results on two challenging benchmarks, show that our method outperforms the prior arts with a sizable margin on SomethingSomething-V2 and competitive results on Kinetics.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Few-Shot Class-Incremental Learning

153 - Xiaoyu Tao , Xiaopeng Hong , Xinyuan Chang 2020

The ability to incrementally learn new classes is crucial to the development of real-world artificial intelligence systems. In this paper, we focus on a challenging but practical few-shot class-incremental learning (FSCIL) problem. FSCIL requires CNN models to incrementally learn new classes from very few labelled samples, without forgetting the previously learned ones. To address this problem, we represent the knowledge using a neural gas (NG) network, which can learn and preserve the topology of the feature manifold formed by different classes. On this basis, we propose the TOpology-Preserving knowledge InCrementer (TOPIC) framework. TOPIC mitigates the forgetting of the old classes by stabilizing NGs topology and improves the representation learning for few-shot new classes by growing and adapting NG to new training samples. Comprehensive experimental results demonstrate that our proposed method significantly outperforms other state-of-the-art class-incremental learning methods on CIFAR100, miniImageNet, and CUB200 datasets.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي التعلم الالي