PaMIR: Parametric Model-Conditioned Implicit Representation for Image-based Human Reconstruction

Added by Zerong Zheng

Publication date 2020

Field: Informatics Engineering

Language: English

Authors Zerong Zheng - Tao Yu - Yebin Liu

Computer Vision and Pattern Recognition

Modeling 3D humans accurately and robustly from a single image is very challenging, and the key for such an ill-posed problem is the 3D representation of the human models. To overcome the limitations of regular 3D representations, we propose Parametric Model-Conditioned Implicit Representation (PaMIR), which combines the parametric body model with the free-form deep implicit function. In our PaMIR-based reconstruction framework, a novel deep neural network is proposed to regularize the free-form deep implicit function using the semantic features of the parametric model, which improves the generalization ability under the scenarios of challenging poses and various clothing topologies. Moreover, a novel depth-ambiguity-aware training loss is further integrated to resolve depth ambiguities and enable successful surface detail reconstruction with imperfect body reference. Finally, we propose a body reference optimization method to improve the parametric model estimation accuracy and to enhance the consistency between the parametric model and the implicit function. With the PaMIR representation, our framework can be easily extended to multi-image input scenarios without the need of multi-camera calibration and pose synchronization. Experimental results demonstrate that our method achieves state-of-the-art performance for image-based 3D human reconstruction in the cases of challenging poses and clothing types.
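The conditioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: `image_feature`, `body_feature`, and `occupancy` are hypothetical names, the parametric body model is replaced by a sphere, the camera is assumed orthographic, and the "decoder" is a hand-written formula rather than a trained MLP. It only shows how a body reference might constrain an implicit occupancy query along the depth axis.

```python
import math

def image_feature(point, silhouette_radius=1.0):
    # Hypothetical stand-in for a pixel-aligned image feature:
    # 1.0 if the point projects (orthographically, along z) inside a
    # circular silhouette, else 0.0.
    x, y, _ = point
    return 1.0 if math.hypot(x, y) <= silhouette_radius else 0.0

def body_feature(point, body_radius=0.8):
    # Hypothetical stand-in for a feature sampled from the body model:
    # signed distance to a sphere (negative inside the "body").
    return math.sqrt(sum(c * c for c in point)) - body_radius

def occupancy(point, margin=0.2, sharpness=10.0):
    # Toy implicit function: occupancy in [0, 1] conditioned on both features.
    # The surface may sit up to `margin` outside the body (e.g. clothing).
    z = min(sharpness * (body_feature(point) - margin), 50.0)  # avoid overflow
    body_term = 1.0 / (1.0 + math.exp(z))
    return image_feature(point) * body_term
```

In this sketch, a point far along the camera axis such as `(0, 0, 5)` projects inside the silhouette, so the image feature alone cannot reject it; the body term does. That is one way to read the abstract's claim that the parametric model regularizes the free-form implicit function.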

Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction

90 - Bharat Lal Bhatnagar , Cristian Sminchisescu , Christian Theobalt 2020

Implicit functions represented as deep learning approximations are powerful for reconstructing 3D surfaces. However, they can only produce static surfaces that are not controllable, which provides limited ability to modify the resulting model by editing its pose or shape parameters. Nevertheless, such features are essential in building flexible models for both computer graphics and computer vision. In this work, we present a methodology that combines detail-rich implicit functions and parametric representations in order to reconstruct 3D models of people that remain controllable and accurate even in the presence of clothing. Given sparse 3D point clouds sampled on the surface of a dressed person, we use an Implicit Part Network (IP-Net) to jointly predict the outer 3D surface of the dressed person, the inner body surface, and the semantic correspondences to a parametric body model. We subsequently use correspondences to fit the body model to our inner surface and then non-rigidly deform it (under a parametric body + displacement model) to the outer surface in order to capture garment, face and hair detail. In quantitative and qualitative experiments with both full body data and hand scans, we show that the proposed methodology generalizes, and is effective even given incomplete point clouds collected from single-view depth images. Our models and code can be downloaded from http://virtualhumans.mpi-inf.mpg.de/ipnet.

Computer Vision and Pattern Recognition

Learning Continuous Image Representation with Local Implicit Image Function

282 - Yinbo Chen , Sifei Liu , Xiaolong Wang 2020

How to represent an image? While the visual world is presented in a continuous manner, machines store and see images in a discrete way, as 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs and predicts the RGB value at that coordinate as an output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with super-resolution. The learned continuous representation can be presented in arbitrary resolution and can even extrapolate to x30 higher resolution, for which no training tasks are provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D: it naturally supports learning tasks with size-varied image ground-truths and significantly outperforms the method of resizing the ground-truths.
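The query interface described in this abstract (a continuous coordinate plus nearby features in, an RGB value out) can be sketched in a few lines. Everything here is an assumption made for illustration: `FEATURES` is a made-up 2x2 feature map, `decode` is a hand-written placeholder where a trained network would go, and only the single nearest feature is used. It is not the LIIF implementation.

```python
# Toy 2x2 feature map with one hypothetical 3-dim feature per cell.
FEATURES = [
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
]

def cell_center(index, size):
    # Center of cell `index` when [0, 1) is split into `size` equal cells.
    return (index + 0.5) / size

def decode(feature, offset):
    # Placeholder decoder (a trained MLP in a real system): the output
    # fades with distance between the query and the feature's cell center.
    dx, dy = offset
    weight = 1.0 - (abs(dx) + abs(dy))
    return tuple(weight * v for v in feature)

def query(x, y):
    # Continuous query at (x, y) in [0, 1) x [0, 1): pick the nearest
    # feature and pass it, with the relative coordinate, to the decoder.
    rows, cols = len(FEATURES), len(FEATURES[0])
    i = min(int(y * rows), rows - 1)
    j = min(int(x * cols), cols - 1)
    offset = (x - cell_center(j, cols), y - cell_center(i, rows))
    return decode(FEATURES[i][j], offset)

def render(width, height):
    # Sample the continuous representation at any output resolution.
    return [[query((c + 0.5) / width, (r + 0.5) / height)
             for c in range(width)] for r in range(height)]
```

Because `query` accepts any coordinate, `render(8, 8)` and `render(3, 5)` both work on the same 2x2 feature map, which is the sense in which such a representation is resolution-free.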

Computer Vision and Pattern Recognition Machine Learning

Implicit Mesh Reconstruction from Unannotated Image Collections

79 - Shubham Tulsiani , Nilesh Kulkarni , Abhinav Gupta 2020

We present an approach to infer the 3D shape, texture, and camera pose for an object from a single RGB image, using only category-level image collections with foreground masks as supervision. We represent the shape as an image-conditioned implicit function that transforms the surface of a sphere to that of the predicted mesh, while additionally predicting the corresponding texture. To derive supervisory signal for learning, we enforce that: a) our predictions when rendered should explain the available image evidence, and b) the inferred 3D structure should be geometrically consistent with learned pixel to surface mappings. We empirically show that our approach improves over prior work that leverages similar supervision, and in fact performs competitively to methods that use stronger supervision. Finally, as our method enables learning with limited supervision, we qualitatively demonstrate its applicability over a set of about 30 object categories.

Computer Vision and Pattern Recognition

Group-based Sparse Representation for Image Compressive Sensing Reconstruction with Non-Convex Regularization

229 - Zhiyuan Zha , Xinggan Zhang , Qiong Wang 2017

Patch-based sparse representation modeling has shown great potential in image compressive sensing (CS) reconstruction. However, this model usually suffers from some limits, such as dictionary learning with great computational complexity and neglect of the relationship among similar patches. In this paper, a group-based sparse representation method with non-convex regularization (GSR-NCR) for image CS reconstruction is proposed. In GSR-NCR, the local sparsity and nonlocal self-similarity of images are simultaneously considered in a unified framework. Different from the previous methods based on sparsity-promoting convex regularization, we extend the non-convex weighted Lp (0 < p < 1) penalty function to group sparse coefficients of the data matrix, rather than conventional L1-based regularization. To reduce the computational complexity, instead of learning the dictionary with a high computational complexity from natural images, we learn the principal component analysis (PCA) based dictionary for each group. Moreover, to make the proposed scheme tractable and robust, we have developed an efficient iterative shrinkage/thresholding algorithm to solve the non-convex optimization problem. Experimental results demonstrate that the proposed method outperforms many state-of-the-art techniques for image CS reconstruction.
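To make the contrast between L1 and weighted Lp regularization concrete, the sketch below shows the standard soft-thresholding operator used by L1-based shrinkage methods, together with a weighted Lp penalty evaluated on a coefficient vector. The function names are hypothetical, the penalty is assumed to take the form sum of w_i * |a_i|^p, and the paper's own non-convex thresholding rule is not reproduced here.

```python
def soft_threshold(value, tau):
    # Proximal operator of tau * |x|: the shrinkage step of L1-based
    # iterative shrinkage/thresholding. Shown as the convex baseline only.
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0

def weighted_lp_penalty(coeffs, weights, p):
    # Assumed form of a weighted Lp penalty: sum_i w_i * |a_i| ** p.
    # With p = 1 and unit weights this reduces to the L1 norm.
    return sum(w * abs(a) ** p for a, w in zip(coeffs, weights))
```

For example, with unit weights a coefficient of 4.0 costs 4.0 under p = 1 but only 2.0 under p = 0.5, so large coefficients are penalized less heavily relative to small ones. That is the usual motivation for non-convex penalties with 0 < p < 1.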

Computer Vision and Pattern Recognition

Multi-person Implicit Reconstruction from a Single Image

70 - Armin Mustafa , Akin Caliskan , Lourdes Agapito 2021

We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and therefore cannot capture accurate 3D models of people with loose clothing and hair; or they require manual intervention to resolve occlusions or interactions. Our method addresses both limitations by introducing the first end-to-end learning approach to perform model-free implicit reconstruction for realistic 3D capture of multiple clothed people in arbitrary poses (with occlusions) from a single image. Our network simultaneously estimates the 3D geometry of each person and their 6DOF spatial locations, to obtain a coherent multi-human reconstruction. In addition, we introduce a new synthetic dataset that depicts images with a varying number of inter-occluded humans and a variety of clothing and hair styles. We demonstrate robust, high-resolution reconstructions on images of multiple humans with complex occlusions, loose clothing and a large variety of poses and scenes. Our quantitative evaluation on both synthetic and real-world datasets demonstrates state-of-the-art performance with significant improvements in the accuracy and completeness of the reconstructions over competing approaches.

Computer Vision and Pattern Recognition