Improved Generalization of Heading Direction Estimation for Aerial Filming Using Semi-supervised Regression

71 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Rogerio Bonatti

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Wenshan Wang - Aayush Ahuja - Yanfu Zhang

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In the task of Autonomous aerial filming of a moving actor (e.g. a person or a vehicle), it is crucial to have a good heading direction estimation for the actor from the visual input. However, the models obtained in other similar tasks, such as pedestrian collision risk analysis and human-robot interaction, are very difficult to generalize to the aerial filming task, because of the difference in data distributions. Towards improving generalization with less amount of labeled data, this paper presents a semi-supervised algorithm for heading direction estimation problem. We utilize temporal continuity as the unsupervised signal to regularize the model and achieve better generalization ability. This semi-supervised algorithm is applied to both training and testing phases, which increases the testing performance by a large margin. We show that by leveraging unlabeled sequences, the amount of labeled data required can be significantly reduced. We also discuss several important details on improving the performance by balancing labeled and unlabeled loss, and making good combinations. Experimental results show that our approach robustly outputs the heading direction for different types of actor. The aesthetic value of the video is also improved in the aerial filming task.

قيم البحث

116 - Kaiyang Zhou , Chen Change Loy , Ziwei Liu 2021

Most existing research on domain generalization assumes source data gathered from multiple domains are fully annotated. However, in real-world applications, we might have only a few labels available from each source domain due to high annotation cost , along with abundant unlabeled data that are much easier to obtain. In this work, we investigate semi-supervised domain generalization (SSDG), a more realistic and practical setting. Our proposed approach, StyleMatch, is inspired by FixMatch, a state-of-the-art semi-supervised learning method based on pseudo-labeling, with several new ingredients tailored to solve SSDG. Specifically, 1) to mitigate overfitting in the scarce labeled source data while improving robustness against noisy pseudo labels, we introduce stochastic modeling to the classifiers weights, seen as class prototypes, with Gaussian distributions. 2) To enhance generalization under domain shift, we upgrade FixMatchs two-view consistency learning paradigm based on weak and strong augmentations to a multi-view version with style augmentation as the third complementary view. To provide a comprehensive study and evaluation, we establish two SSDG benchmarks, which cover a wide range of strong baseline methods developed in relevant areas including domain generalization and semi-supervised learning. Extensive experiments demonstrate that StyleMatch achieves the best out-of-distribution generalization performance in the low-data regime. We hope our approach and benchmarks can pave the way for future research on data-efficient and generalizable learning systems.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization

77 - Daiqing Li , Junlin Yang , Karsten Kreis 2021

Training deep networks with limited labeled data while achieving a strong generalization ability is key in the quest to reduce human annotation efforts. This is the goal of semi-supervised learning, which exploits more widely available unlabeled data to complement small labeled data sets. In this paper, we propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels. Concretely, we learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images supplemented with only few labeled ones. We build our architecture on top of StyleGAN2, augmented with a label synthesis branch. Image labeling at test time is achieved by first embedding the target image into the joint latent space via an encoder network and test-time optimization, and then generating the label from the inferred embedding. We evaluate our approach in two important domains: medical image segmentation and part-based face segmentation. We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization, such as transferring from CT to MRI in medical imaging, and photographs of real faces to paintings, sculptures, and even cartoons and animal faces. Project Page: url{https://nv-tlabs.github.io/semanticGAN/}

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Scalable Semi-supervised Landmark Localization for X-ray Images using Few-shot Deep Adaptive Graph

140 - Xiao-Yun Zhou , Bolin Lai , Weijian Li 2021

Landmark localization plays an important role in medical image analysis. Learning based methods, including CNN and GCN, have demonstrated the state-of-the-art performance. However, most of these methods are fully-supervised and heavily rely on manual labeling of a large training dataset. In this paper, based on a fully-supervised graph-based method, DAG, we proposed a semi-supervised extension of it, termed few-shot DAG, ie five-shot DAG. It first trains a DAG model on the labeled data and then fine-tunes the pre-trained model on the unlabeled data with a teacher-student SSL mechanism. In addition to the semi-supervised loss, we propose another loss using JS divergence to regulate the consistency of the intermediate feature maps. We extensively evaluated our method on pelvis, hand and chest landmark detection tasks. Our experiment results demonstrate consistent and significant improvements over previous methods.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

118 - Bo Xiong , Haoqi Fan , Kristen Grauman 2021

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain more reliab le pseudo-labels on unlabeled video, to learn stronger video representations than from purely supervised data. Though our method capitalizes on multiple views, it nonetheless trains a model that is shared across appearance and motion input and thus, by design, incurs no additional computation overhead at inference time. On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

A Supervised Learning Approach For Heading Detection

238 - Sahib Singh Budhiraja , Vijay Mago 2018

As the Portable Document Format (PDF) file format increases in popularity, research in analysing its structure for text extraction and analysis is necessary. Detecting headings can be a crucial component of classifying and extracting meaningful data. This research involves training a supervised learning model to detect headings with features carefully selected through recursive feature elimination. The best performing classifier had an accuracy of 96.95%, sensitivity of 0.986 and a specificity of 0.953. This research into heading detection contributes to the field of PDF based text extraction and can be applied to the automation of large scale PDF text analysis in a variety of professional and policy based contexts.

استرجاع المعلومات الحساب واللغة التعلم الآلي