Disentangled Human Body Embedding Based on Deep Hierarchical Neural Network

86 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Boyi Jiang

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Boyi Jiang - Juyong Zhang - Jianfei Cai

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space. This paper presents an autoencoder-like network architecture to learn disentangled shape and pose embedding specifically for the 3D human body. This is inspired by recent progress of deformation-based latent representation learning. To improve the reconstruction accuracy, we propose a hierarchical reconstruction pipeline for the disentangling process and construct a large dataset of human body models with consistent connectivity for the learning of the neural network. Our learned embedding can not only achieve superior reconstruction accuracy but also provide great flexibility in 3D human body generation via interpolation, bilinear interpolation, and latent space sampling. The results from extensive experiments demonstrate the powerfulness of our learned 3D human body embedding in various applications.

قيم البحث

190 - Ce Zheng , Wenhan Wu , Taojiannan Yang 2020

Human pose estimation aims to locate the human body parts and build human body representation (e.g., body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide ra nge of applications including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although the recently developed deep learning-based solutions have achieved high performance in human pose estimation, there still remain challenges due to insufficient training data, depth ambiguities, and occlusion. The goal of this survey paper is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 240 research papers since 2014 are covered in this survey. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included. Quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, the challenges involved, applications, and future research directions are concluded. We also provide a regularly updated project page: url{https://github.com/zczcwh/DL-HPE}

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي الوسائط المتعددة

Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control

149 - Lingjie Liu , Marc Habermann , Viktor Rudnev 2021

We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses. Our method is built upon recent neural scene representation and rendering works which learn representati ons of geometry and appearance from only 2D images. While existing works demonstrated compelling rendering of static scenes and playback of dynamic scenes, photo-realistic reconstruction and rendering of humans with neural implicit methods, in particular under user-controlled novel poses, is still difficult. To address this problem, we utilize a coarse body model as the proxy to unwarp the surrounding 3D space into a canonical pose. A neural radiance field learns pose-dependent geometric deformations and pose- and view-dependent appearance effects in the canonical space from multi-view video input. To synthesize novel views of high fidelity dynamic geometry and appearance, we leverage 2D texture maps defined on the body model as latent variables for predicting residual deformations and the dynamic appearance. Experiments demonstrate that our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses. Furthermore, our method also supports body shape control of the synthesized results.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي التعلم الآلي

High-Fidelity Neural Human Motion Transfer from Monocular Video

387 - Moritz Kappel , Vladislav Golyanik , Mohamed Elgharib 2020

Video-based human motion transfer creates video animations of humans following a source motion. Current methods show remarkable results for tightly-clad subjects. However, the lack of temporally consistent handling of plausible clothing dynamics, inc luding fine and high-frequency details, significantly limits the attainable visual quality. We address these limitations for the first time in the literature and present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations, for several types of loose garments. In contrast to the previous techniques, we perform image generation in three subsequent stages, synthesizing human shape, structure, and appearance. Given a monocular RGB video of an actor, we train a stack of recurrent deep neural networks that generate these intermediate representations from 2D poses and their temporal derivatives. Splitting the difficult motion transfer problem into subtasks that are aware of the temporal motion context helps us to synthesize results with plausible dynamics and pose-dependent detail. It also allows artistic control of results by manipulation of individual framework stages. In the experimental results, we significantly outperform the state-of-the-art in terms of video realism. Our code and data will be made publicly available.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي التعلم الآلي

i3DMM: Deep Implicit 3D Morphable Model of Human Heads

303 - Tarun Yenamandra , Ayush Tewari , Florian Bernard 2020

We present the first deep implicit 3D morphable model (i3DMM) of full heads. Unlike earlier morphable face models it not only captures identity-specific geometry, texture, and expressions of the frontal face, but also models the entire head, includin g hair. We collect a new dataset consisting of 64 people with different expressions and hairstyles to train i3DMM. Our approach has the following favorable properties: (i) It is the first full head morphable model that includes hair. (ii) In contrast to mesh-based models it can be trained on merely rigidly aligned scans, without requiring difficult non-rigid registration. (iii) We design a novel architecture to decouple the shape model into an implicit reference shape and a deformation of this reference shape. With that, dense correspondences between shapes can be learned implicitly. (iv) This architecture allows us to semantically disentangle the geometry and color components, as color is learned in the reference space. Geometry is further disentangled as identity, expressions, and hairstyle, while color is disentangled as identity and hairstyle components. We show the merits of i3DMM using ablation studies, comparisons to state-of-the-art models, and applications such as semantic head editing and texture transfer. We will make our model publicly available.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي التعلم الآلي

Single-image Full-body Human Relighting

144 - Manuel Lagunas , Xin Sun , Jimei Yang 2021

We present a single-image data-driven method to automatically relight images with full-body humans in them. Our framework is based on a realistic scene decomposition leveraging precomputed radiance transfer (PRT) and spherical harmonics (SH) lighting . In contrast to previous work, we lift the assumptions on Lambertian materials and explicitly model diffuse and specular reflectance in our data. Moreover, we introduce an additional light-dependent residual term that accounts for errors in the PRT-based image reconstruction. We propose a new deep learning architecture, tailored to the decomposition performed in PRT, that is trained using a combination of L1, logarithmic, and rendering losses. Our model outperforms the state of the art for full-body human relighting both with synthetic images and photographs.

الرؤية الحاسوبية وتمييز الأنماط الرسم الحاسوبي