ترغب بنشر مسار تعليمي؟ اضغط هنا

Unsupervised Learning of Object Landmarks through Conditional Image Generation

96   0   0.0 ( 0 )
 نشر من قبل Tomas Jakab
 تاريخ النشر 2018
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors. We further show that our method is applicable to a large variety of datasets - faces, people, 3D objects, and digits - without any modifications.



قيم البحث

اقرأ أيضاً

We propose an unsupervised multi-conditional image generation pipeline: cFineGAN, that can generate an image conditioned on two input images such that the generated image preserves the texture of one and the shape of the other input. To achieve this goal, we extend upon the recently proposed work of FineGAN citep{singh2018finegan} and make use of standard as well as shape-biased pre-trained ImageNet models. We demonstrate both qualitatively as well as quantitatively the benefit of using the shape-biased network. We present our image generation result across three benchmark datasets- CUB-200-2011, Stanford Dogs and UT Zappos50k.
Deep neural networks can model images with rich latent representations, but they cannot naturally conceptualize structures of object categories in a human-perceptible way. This paper addresses the problem of learning object structures in an image mod eling process without supervision. We propose an autoencoding formulation to discover landmarks as explicit structural representations. The encoding module outputs landmark coordinates, whose validity is ensured by constraints that reflect the necessary properties for landmarks. The decoding module takes the landmarks as a part of the learnable input representations in an end-to-end differentiable framework. Our discovered landmarks are semantically meaningful and more predictive of manually annotated landmarks than those discovered by previous methods. The coordinates of our landmarks are also complementary features to pretrained deep-neural-network representations in recognizing visual attributes. In addition, the proposed method naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures. The project webpage is at http://ytzhang.net/projects/lmdis-rep
145 - Minguk Kang , Jaesik Park 2020
Conditional image generation is the task of generating diverse images using class label information. Although many conditional Generative Adversarial Networks (GAN) have shown realistic results, such methods consider pairwise relations between the em bedding of an image and the embedding of the corresponding label (data-to-class relations) as the conditioning losses. In this paper, we propose ContraGAN that considers relations between multiple image embeddings in the same batch (data-to-data relations) as well as the data-to-class relations by using a conditional contrastive loss. The discriminator of ContraGAN discriminates the authenticity of given samples and minimizes a contrastive objective to learn the relations between training images. Simultaneously, the generator tries to generate realistic images that deceive the authenticity and have a low contrastive loss. The experimental results show that ContraGAN outperforms state-of-the-art-models by 7.3% and 7.7% on Tiny ImageNet and ImageNet datasets, respectively. Besides, we experimentally demonstrate that contrastive learning helps to relieve the overfitting of the discriminator. For a fair comparison, we re-implement twelve state-of-the-art GANs using the PyTorch library. The software package is available at https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.
68 - Wanyi Li , Fuyu Li , Yongkang Luo 2020
Object detection in thermal images is an important computer vision task and has many applications such as unmanned vehicles, robotics, surveillance and night vision. Deep learning based detectors have achieved major progress, which usually need large amount of labelled training data. However, labelled data for object detection in thermal images is scarce and expensive to collect. How to take advantage of the large number labelled visible images and adapt them into thermal image domain, is expected to solve. This paper proposes an unsupervised image-generation enhanced adaptation method for object detection in thermal images. To reduce the gap between visible domain and thermal domain, the proposed method manages to generate simulated fake thermal images that are similar to the target images, and preserves the annotation information of the visible source domain. The image generation includes a CycleGAN based image-to-image translation and an intensity inversion transformation. Generated fake thermal images are used as renewed source domain. And then the off-the-shelf Domain Adaptive Faster RCNN is utilized to reduce the gap between generated intermediate domain and the thermal target domain. Experiments demonstrate the effectiveness and superiority of the proposed method.
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images. This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure. To recover from the transformed images back to the original subject, the landmark detector is forced to learn spatial locations that contain the consistent semantic meanings both for the paired intra-subject images and between the paired inter-subject images. Our proposed method is extensively evaluated on two public facial image datasets (MAFL, AFLW) with various settings. Experimental results indicate that our method can extract the consistent landmarks for both datasets and achieve better performances compared to the previous state-of-the-art methods quantitatively and qualitatively.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا