ﻻ يوجد ملخص باللغة العربية
Lip sync has emerged as a promising technique for generating mouth movements from audio signals. However, synthesizing a high-resolution and photorealistic virtual news anchor is still challenging. Lack of natural appearance, visual consistency, and processing efficiency are the main problems with existing methods. This paper presents a novel lip-sync framework specially designed for producing high-fidelity virtual news anchors. A pair of Temporal Convolutional Networks are used to learn the cross-modal sequential mapping from audio signals to mouth movements, followed by a neural rendering network that translates the synthetic facial map into a high-resolution and photorealistic appearance. This fully trainable framework provides end-to-end processing that outperforms traditional graphics-based methods in many low-delay applications. Experiments also show the framework has advantages over modern neural-based methods in both visual appearance and efficiency.
Lip-reading aims to recognize speech content from videos via visual analysis of speakers lip movements. This is a challenging task due to the existence of homophemes-words which involve identical or highly similar lip movements, as well as diverse li
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use auditory spectrogram as spectral representation of speech and its corresponding sound generation method resulting in a more
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics. Our goal is to make the dataset creation process widely accessible, transforming scans into
We present a data-driven inference method that can synthesize a photorealistic texture map of a complete 3D face model given a partial 2D view of a person in the wild. After an initial estimation of shape and low-frequency albedo, we compute a high-f
In Virtual, augmented, and mixed reality, the use of hand gestures is increasingly becoming popular to reduce the difference between the virtual and real world. The precise location of the fingertip is essential/crucial for a seamless experience. Muc