ترغب بنشر مسار تعليمي؟ اضغط هنا

Generation High resolution 3D model from natural language by Generative Adversarial Network

375   0   0.0 ( 0 )
 نشر من قبل Kentaro Fukamizu
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We present a method of generating high resolution 3D shapes from natural language descriptions. To achieve this goal, we propose two steps that generating low resolution shapes which roughly reflect texts and generating high resolution shapes which reflect the detail of texts. In a previous paper, the authors have shown a method of generating low resolution shapes. We improve it to generate 3D shapes more faithful to natural language and test the effectiveness of the method. To generate high resolution 3D shapes, we use the framework of Conditional Wasserstein GAN. We propose two roles of Critic separately, which calculate the Wasserstein distance between two probability distribution, so that we achieve generating high quality shapes or acceleration of learning speed of model. To evaluate our approach, we performed quantitive evaluation with several numerical metrics for Critic models. Our method is first to realize the generation of high quality model by propagating text embedding information to high resolution task when generating 3D model.



قيم البحث

اقرأ أيضاً

In order to generate novel 3D shapes with machine learning, one must allow for interpolation. The typical approach for incorporating this creative process is to interpolate in a learned latent space so as to avoid the problem of generating unrealisti c instances by exploiting the models learned structure. The process of the interpolation is supposed to form a semantically smooth morphing. While this approach is sound for synthesizing realistic media such as lifelike portraits or new designs for everyday objects, it subjectively fails to directly model the unexpected, unrealistic, or creative. In this work, we present a method for learning how to interpolate point clouds. By encoding prior knowledge about real-world objects, the intermediate forms are both realistic and unlike any existing forms. We show not only how this method can be used to generate creative point clouds, but how the method can also be leveraged to generate 3D models suitable for sculpture.
Natural language understanding (NLU) and natural language generation (NLG) are two fundamental and related tasks in building task-oriented dialogue systems with opposite objectives: NLU tackles the transformation from natural language to formal repre sentations, whereas NLG does the reverse. A key to success in either task is parallel training data which is expensive to obtain at a large scale. In this work, we propose a generative model which couples NLU and NLG through a shared latent variable. This approach allows us to explore both spaces of natural language and formal representations, and facilitates information sharing through the latent space to eventually benefit NLU and NLG. Our model achieves state-of-the-art performance on two dialogue datasets with both flat and tree-structured formal representations. We also show that the model can be trained in a semi-supervised fashion by utilising unlabelled data to boost its performance.
219 - Eric Heim 2019
Generative Adversarial Networks (GANs) have received a great deal of attention due in part to recent success in generating original, high-quality samples from visual domains. However, most current methods only allow for users to guide this image gene ration process through limited interactions. In this work we develop a novel GAN framework that allows humans to be in-the-loop of the image generation process. Our technique iteratively accepts relative constraints of the form Generate an image more like image A than image B. After each constraint is given, the user is presented with new outputs from the GAN, informing the next round of feedback. This feedback is used to constrain the output of the GAN with respect to an underlying semantic space that can be designed to model a variety of different notions of similarity (e.g. classes, attributes, object relationships, color, etc.). In our experiments, we show that our GAN framework is able to generate images that are of comparable quality to equivalent unsupervised GANs while satisfying a large number of the constraints provided by users, effectively changing a GAN into one that allows users interactive control over image generation without sacrificing image quality.
455 - Fan-Keng Sun , Cheng-I Lai 2020
Transformer-based language models have shown to be very powerful for natural language generation (NLG). However, text generation conditioned on some user inputs, such as topics or attributes, is non-trivial. Past approach relies on either modifying t he original LM architecture, re-training the LM on corpora with attribute labels, or having separately trained `guidance models to guide text generation in decoding. We argued that the above approaches are not necessary, and the original unconditioned LM is sufficient for conditioned NLG. We evaluated our approaches by the samples fluency and diversity with automated and human evaluation.
This work focuses on the analysis that whether 3D face models can be learned from only the speech inputs of speakers. Previous works for cross-modal face synthesis study image generation from voices. However, image synthesis includes variations such as hairstyles, backgrounds, and facial textures, that are arguably irrelevant to voice or without direct studies to show correlations. We instead investigate the ability to reconstruct 3D faces to concentrate on only geometry, which is more physiologically grounded. We propose both the supervised learning and unsupervised learning frameworks. Especially we demonstrate how unsupervised learning is possible in the absence of a direct voice-to-3D-face dataset under limited availability of 3D face scans when the model is equipped with knowledge distillation. To evaluate the performance, we also propose several metrics to measure the geometric fitness of two 3D faces based on points, lines, and regions. We find that 3D face shapes can be reconstructed from voices. Experimental results suggest that 3D faces can be reconstructed from voices, and our method can improve the performance over the baseline. The best performance gains (15% - 20%) on ear-to-ear distance ratio metric (ER) coincides with the intuition that one can roughly envision whether a speakers face is overall wider or thinner only from a persons voice. See our project page for codes and data.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا