PolyGen: An Autoregressive Generative Model of 3D Meshes

Published by Charlie Nash

Publication date: 2020

Research field: Informatics Engineering

Language: English

Authors: Charlie Nash - Yaroslav Ganin - S. M. Ali Eslami

Polygon meshes are an efficient representation of 3D geometry, and are of central importance in computer graphics, robotics and games development. Existing learning-based approaches have avoided the challenges of working with 3D meshes, instead using alternative object representations that are more compatible with neural architectures and training approaches. We present an approach which models the mesh directly, predicting mesh vertices and faces sequentially using a Transformer-based architecture. Our model can condition on a range of inputs, including object classes, voxels, and images, and because the model is probabilistic it can produce samples that capture uncertainty in ambiguous scenarios. We show that the model is capable of producing high-quality, usable meshes, and establish log-likelihood benchmarks for the mesh-modelling task. We also evaluate the conditional models on surface reconstruction metrics against alternative methods, and demonstrate competitive performance despite not training directly on this task.
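
As a rough illustration of what "predicting mesh vertices sequentially" and "log-likelihood benchmarks" can mean in practice, the sketch below flattens a vertex array into one token sequence and scores it under a next-token model using the chain rule. The abstract does not give these details, so everything here is an assumption made for illustration: the 8-bit quantization, the x-then-y-then-z vertex ordering, the `STOP_TOKEN` id, and the uniform stand-in model are all hypothetical and should not be read as the authors' implementation.

```python
import numpy as np

STOP_TOKEN = 256  # hypothetical end-of-sequence id, one past the largest 8-bit bin


def quantize(vertices, bits=8):
    """Map float coordinates to integer bins in [0, 2**bits - 1] (assumed scheme)."""
    lo, hi = vertices.min(), vertices.max()
    scale = (2 ** bits - 1) / max(hi - lo, 1e-12)
    return np.round((vertices - lo) * scale).astype(np.int64)


def vertices_to_tokens(vertices):
    """Flatten an (N, 3) vertex array into a token list that ends in STOP_TOKEN."""
    # np.unique with axis=0 merges duplicate rows and sorts them lexicographically
    # (by x, then y, then z). The ordering is an arbitrary choice for this sketch.
    q = np.unique(quantize(vertices), axis=0)
    return q.reshape(-1).tolist() + [STOP_TOKEN]


def sequence_log_likelihood(tokens, next_token_probs):
    """Chain rule: sum over t of log p(token_t | tokens before t)."""
    total = 0.0
    for t, tok in enumerate(tokens):
        probs = next_token_probs(tokens[:t])  # distribution over all token ids
        total += float(np.log(probs[tok]))
    return total


def uniform_model(prefix):
    """Stand-in for a trained network: ignores the prefix, predicts uniformly."""
    return np.full(STOP_TOKEN + 1, 1.0 / (STOP_TOKEN + 1))


# A unit square in the z = 0 plane as a toy "mesh" (vertices only).
square = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=float)
tokens = vertices_to_tokens(square)
log_lik = sequence_log_likelihood(tokens, uniform_model)
```

A trained model would replace `uniform_model` with a network that actually conditions on the prefix. Faces could be serialized in a similar way as sequences of vertex indices, conditioned on the vertices.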

Rana Hanocka, Gal Metzer, Raja Giryes (2020)

In this paper, we introduce Point2Mesh, a technique for reconstructing a surface mesh from an input point cloud. Instead of explicitly specifying a prior that encodes the expected shape properties, the prior is defined automatically using the input point cloud, which we refer to as a self-prior. The self-prior encapsulates reoccurring geometric repetitions from a single shape within the weights of a deep neural network. We optimize the network weights to deform an initial mesh to shrink-wrap a single input point cloud. This explicitly considers the entire reconstructed shape, since shared local kernels are calculated to fit the overall object. The convolutional kernels are optimized globally across the entire shape, which inherently encourages local-scale geometric self-similarity across the shape surface. We show that shrink-wrapping a point cloud with a self-prior converges to a desirable solution, whereas a prescribed smoothness prior often becomes trapped in undesirable local minima. While the performance of traditional reconstruction approaches degrades in the non-ideal conditions that are often present in real-world scanning (i.e., unoriented normals, noise, and missing or low-density parts), Point2Mesh is robust to such conditions. We demonstrate the performance of Point2Mesh on a large variety of shapes with varying complexity.

Computer Graphics, Computer Vision and Pattern Recognition, Machine Learning

Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices

Cho-Ying Wu, Ke Xu, Chin-Cheng Hsu (2021)

This work analyzes whether 3D face models can be learned from only the speech inputs of speakers. Previous works on cross-modal face synthesis study image generation from voices. However, image synthesis includes variations such as hairstyles, backgrounds, and facial textures that are arguably irrelevant to voice, or whose correlation with voice has not been directly studied. We instead investigate the ability to reconstruct 3D faces in order to concentrate on geometry alone, which is more physiologically grounded. We propose both supervised and unsupervised learning frameworks. In particular, we demonstrate how unsupervised learning is possible in the absence of a direct voice-to-3D-face dataset, under limited availability of 3D face scans, when the model is equipped with knowledge distillation. To evaluate performance, we also propose several metrics that measure the geometric fitness of two 3D faces based on points, lines, and regions. Experimental results suggest that 3D face shapes can be reconstructed from voices, and that our method improves performance over the baseline. The best performance gains (15% - 20%), on the ear-to-ear distance ratio (ER) metric, coincide with the intuition that one can roughly envision whether a speaker's face is overall wider or thinner from that person's voice alone. See our project page for code and data.

Computer Graphics, Computer Vision and Pattern Recognition, Machine Learning

Gaussian Curvature Filter on 3D Meshes

Wenming Tang, Yuanhao Gong, Kanglin Liu (2020)

Minimizing the Gaussian curvature of meshes can play a fundamental role in 3D mesh processing. However, there is a lack of computationally efficient and robust Gaussian curvature optimization methods. In this paper, we present a simple yet effective method that can efficiently reduce Gaussian curvature for 3D meshes. We first present the mathematical foundation of our method. Then, we introduce a simple and robust implicit Gaussian curvature optimization method named Gaussian Curvature Filter (GCF). GCF implicitly minimizes Gaussian curvature without the need to explicitly calculate the Gaussian curvature itself. GCF is highly efficient and can be used in a wide range of applications that involve Gaussian curvature. We conduct extensive experiments to demonstrate that GCF significantly outperforms state-of-the-art methods in minimizing Gaussian curvature and in geometric-feature-preserving smoothing on 3D meshes. The GCF program is available at https://github.com/tangwenming/GCF-filter.

Computer Graphics

Generative Modelling of BRDF Textures from Flash Images

Philipp Henzler, Valentin Deschaintre, Niloy J. Mitra (2021)

We learn a latent space for easy capture, consistent interpolation, and efficient reproduction of visual material appearance. When users provide a photo of a stationary natural material captured under flashlight illumination, it is first converted into a latent material code. Then, in the second step, conditioned on the material code, our method produces an infinite and diverse spatial field of BRDF model parameters (diffuse albedo, normals, roughness, specular albedo) that subsequently allows rendering in complex scenes and illuminations, matching the appearance of the input photograph. Technically, we jointly embed all flash images into a latent space using a convolutional encoder, and, conditioned on these latent codes, convert random spatial fields into fields of BRDF parameters using a convolutional neural network (CNN). We condition these BRDF parameters to match the visual characteristics (statistics and spectra of visual features) of the input under matching light. A user study compares our approach favorably to previous work, even to methods with access to BRDF supervision.

Computer Graphics, Computer Vision and Pattern Recognition, Machine Learning

Learning Generative Models of Textured 3D Meshes from Real-World Images

Dario Pavllo, Jonas Kohler, Thomas Hofmann (2021)

Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections. These models natively disentangle pose and appearance, enable downstream applications in computer graphics, and improve the ability of generative models to understand the concept of image formation. Although there has been prior work on learning such models from collections of 2D images, these approaches require a delicate pose estimation step that exploits annotated keypoints, thereby restricting their applicability to a few specific datasets. In this work, we propose a GAN framework for generating textured triangle meshes without relying on such annotations. We show that the performance of our approach is on par with prior work that relies on ground-truth keypoints, and more importantly, we demonstrate the generality of our method by setting new baselines on a larger set of categories from ImageNet - for which keypoints are not available - without any class-specific hyperparameter tuning. We release our code at https://github.com/dariopavllo/textured-3d-gan

Computer Vision and Pattern Recognition, Computer Graphics, Machine Learning