ﻻ يوجد ملخص باللغة العربية
This paper presents a generic method for generating full facial 3D animation from speech. Existing approaches to audio-driven facial animation exhibit uncanny or static upper face animation, fail to produce accurate and plausible co-articulation or rely on person-specific models that limit their scalability. To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face. At the core of our approach is a categorical latent space for facial animation that disentangles audio-correlated and audio-uncorrelated information based on a novel cross-modality loss. Our approach ensures highly accurate lip motion, while also synthesizing plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eye brow motion. We demonstrate that our approach outperforms several baselines and obtains state-of-the-art quality both qualitatively and quantitatively. A perceptual user study demonstrates that our approach is deemed more realistic than the current state-of-the-art in over 75% of cases. We recommend watching the supplemental video before reading the paper: https://research.fb.com/wp-content/uploads/2021/04/mesh_talk.mp4
3D face reconstruction and face alignment are two fundamental and highly related topics in computer vision. Recently, some works start to use deep learning models to estimate the 3DMM coefficients to reconstruct 3D face geometry. However, the perform
Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often requires post-p
We present a novel method to jointly learn a 3D face parametric model and 3D face reconstruction from diverse sources. Previous methods usually learn 3D face modeling from one kind of source, such as scanned data or in-the-wild images. Although 3D sc
Both image registration and label fusion in the multi-atlas segmentation (MAS) rely on the intensity similarity between target and atlas images. However, such similarity can be problematic when target and atlas images are acquired using different ima
Face image manipulation via three-dimensional guidance has been widely applied in various interactive scenarios due to its semantically-meaningful understanding and user-friendly controllability. However, existing 3D-morphable-model-based manipulatio