Generating Furry Cars: Disentangling Object Shape & Appearance across Multiple Domains

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Utkarsh Ojha

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Utkarsh Ojha - Krishna Kumar Singh - Yong Jae Lee

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an accurate disentanglement of object shape, appearance, and background from each domain, so that the appearance and shape factors from the two domains can be interchanged. We augment an existing approach that can disentangle factors within a single domain but struggles to do so across domains. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate our method leads to accurate and consistent appearance and shape transfer across domains.

قيم البحث

362 - Lior Yariv , Yoni Kasten , Dror Moran 2020

In this work we address the challenging problem of multiview 3D surface reconstruction. We introduce a neural network architecture that simultaneously learns the unknown geometry, camera parameters, and a neural renderer that approximates the light r eflected from the surface towards the camera. The geometry is represented as a zero level-set of a neural network, while the neural renderer, derived from the rendering equation, is capable of (implicitly) modeling a wide set of lighting conditions and materials. We trained our network on real world 2D images of objects with different material properties, lighting conditions, and noisy camera initializations from the DTU MVS dataset. We found our model to produce state of the art 3D surface reconstructions with high fidelity, resolution and detail.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي التعلم الآلي

Generating Person Images with Appearance-aware Pose Stylizer

107 - Siyu Huang , Haoyi Xiong , Zhi-Qi Cheng 2020

Generation of high-quality person images is challenging, due to the sophisticated entanglements among image factors, e.g., appearance, pose, foreground, background, local details, global structures, etc. In this paper, we present a novel end-to-end f ramework to generate realistic person images based on given person poses and appearances. The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively. The framework is highly flexible and controllable by effectively decoupling various complex person image factors in the encoding phase, followed by re-coupling them in the decoding phase. In addition, we present a new normalization method named adaptive patch normalization, which enables region-specific normalization and shows a good performance when adopted in person image generation model. Experiments on two benchmark datasets show that our method is capable of generating visually appealing and realistic-looking results using arbitrary image and pose inputs.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي التعلم الآلي

Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance

123 - Zhixin Shu , Mihir Sahasrabudhe , Alp Guler 2018

In this work we introduce Deforming Autoencoders, a generative model for images that disentangles shape from appearance in an unsupervised manner. As in the deformable template paradigm, shape is represented as a deformation between a canonical coord inate system (`template) and an observed image, while appearance is modeled in `canonical, template, coordinates, thus discarding variability due to deformations. We introduce novel techniques that allow this approach to be deployed in the setting of autoencoders and show that this method can be used for unsupervised group-wise image alignment. We show experiments with expression morphing in humans, hands, and digits, face manipulation, such as shape and appearance interpolation, as well as unsupervised landmark localization. A more powerful form of unsupervised disentangling becomes possible in template coordinates, allowing us to successfully decompose face images into shading and albedo, and further manipulate face images.

الرؤية الحاسوبية وتمييز الأنماط

Unsupervised Part Segmentation through Disentangling Appearance and Shape

90 - Shilong Liu , Lei Zhang , Xiao Yang 2021

We study the problem of unsupervised discovery and segmentation of object parts, which, as an intermediate local representation, are capable of finding intrinsic object structure and providing more explainable recognition results. Recent unsupervised methods have greatly relaxed the dependency on annotated data which are costly to obtain, but still rely on additional information such as object segmentation mask or saliency map. To remove such a dependency and further improve the part segmentation performance, we develop a novel approach by disentangling the appearance and shape representations of object parts followed with reconstruction losses without using additional object mask information. To avoid degenerated solutions, a bottleneck block is designed to squeeze and expand the appearance representation, leading to a more effective disentanglement between geometry and appearance. Combined with a self-supervised part classification loss and an improved geometry concentration constraint, we can segment more consistent parts with semantic meanings. Comprehensive experiments on a wide variety of objects such as face, bird, and PASCAL VOC objects demonstrate the effectiveness of the proposed method.

الرؤية الحاسوبية وتمييز الأنماط

Appearance-free Tripartite Matching for Multiple Object Tracking

133 - Lijun Wang , Yanting Zhu , Jue Shi 2020

Multiple Object Tracking (MOT) detects the trajectories of multiple objects given an input video, and it has become more and more popular in various research and industry areas, such as cell tracking for biomedical research and human tracking in vide o surveillance. We target at the general MOT problem regardless of the object appearance. The appearance-free tripartite matching is proposed to avoid the irregular velocity problem of traditional bipartite matching. The tripartite matching is formulated as maximizing the likelihood of the state vectors constituted of the position and velocity of objects, and a dynamic programming algorithm is employed to solve such maximum likelihood estimate (MLE). To overcome the high computational cost induced by the vast search space of dynamic programming, we decompose the space by the number of disappearing objects and propose a reduced-space approach by truncating the decomposition. Extensive simulations have shown the superiority and efficiency of our proposed method. We also applied our method to track the motion of natural killer cells around tumor cells in a cancer research.

الرؤية الحاسوبية وتمييز الأنماط تطبيقات الإحصاء