We propose a novel deep generative model based on causal convolutions for multi-subject motion modeling and synthesis, inspired by the success of WaveNet in multi-subject speech synthesis. However, it is nontrivial to adapt WaveNet to handle high-dimensional and physically constrained motion data. To this end, we add an encoder and a decoder to WaveNet to translate the motion data into features and back into predicted motions. We also add 1D convolution layers that take the skeleton configuration as an input to model skeleton variations across different subjects. As a result, our network scales well to large motion data sets across multiple subjects and supports various applications, such as random and controllable motion synthesis, motion denoising, and motion completion, in a unified way. Complex motions, such as punching, kicking, and kicking while punching, are also handled well. Moreover, our network can synthesize motions for novel skeletons not in the training dataset. After fine-tuning the network with a small amount of motion data from the novel skeleton, it is able to capture the personalized style implied in the motion and generate high-quality motions for that skeleton. Thus, it has the potential to be used as a pre-trained network in few-shot learning for motion modeling and synthesis. Experimental results show that our model can effectively handle the variation of skeleton configurations, and that it runs fast enough to synthesize different types of motions online. We also perform user studies to verify that the quality of motions generated by our network is superior to that of motions generated by state-of-the-art human motion synthesis methods.
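The causal convolution that this abstract builds on can be illustrated with a minimal sketch. It is not the authors' network: the function names `causal_conv1d` and `receptive_field`, the zero padding of the past, and the dilation schedule in the usage note are assumptions made for illustration only. The point is that each output sample depends only on the current and earlier input samples, which is what allows frame-by-frame synthesis.

```python
def causal_conv1d(x, weights, dilation=1):
    """Causal 1D convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the sequence are treated as zero
    (an assumed padding choice), so y[t] never reads x[t + 1] or later.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that would fall before the sequence start
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of past-and-present samples seen by a stack of causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

For example, `causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2)` adds each sample to the one two steps earlier, and a stack with kernel size 2 and the hypothetical dilations `[1, 2, 4, 8]` covers 16 time steps.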
This paper introduces a new generative deep learning network for human motion synthesis and control. Our key idea is to combine recurrent neural networks (RNNs) and adversarial training for human motion modeling. We first describe an efficient method for training an RNN model from prerecorded motion data. We implement the recurrent neural networks with long short-term memory (LSTM) cells because they are capable of handling the nonlinear dynamics and long-term temporal dependencies present in human motions. Next, we train a refiner network using an adversarial loss, similar to Generative Adversarial Networks (GANs), such that a discriminative network cannot distinguish the refined motion sequences from real motion capture data. We embed contact information into the generative deep learning model to further improve its performance. The resulting model is appealing for motion synthesis and control because it is compact, contact-aware, and can generate an infinite number of natural-looking motions of arbitrary length. Our experiments show that motions generated by our deep learning model are always highly realistic and comparable to high-quality motion capture data. We demonstrate the power and effectiveness of our models by exploring a variety of applications, including random motion synthesis, online/offline motion control, and motion filtering. We show the superiority of our generative model by comparison against baseline models.
We propose a novel and flexible roof modeling approach that can be used for constructing planar 3D polygon roof meshes. Our method uses a graph structure to encode roof topology and enforces the roof validity by optimizing a simple but effective planarity metric we propose. This approach is significantly more efficient than using general purpose 3D modeling tools such as 3ds Max or SketchUp, and more powerful and expressive than specialized tools such as the straight skeleton. Our optimization-based formulation is also flexible and can accommodate different styles and user preferences for roof modeling. We showcase two applications. The first application is an interactive roof editing framework that can be used for roof design or roof reconstruction from aerial images. We highlight the efficiency and generality of our approach by constructing a mesh-image paired dataset consisting of 2539 roofs. Our second application is a generative model to synthesize new roof meshes from scratch. We use our novel dataset to combine machine learning and our roof optimization techniques, by using transformers and graph convolutional networks to model roof topology, and our roof optimization methods to enforce the planarity constraint.
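The abstract above does not define its planarity metric, so the following is only a sketch of one common way to measure how far a polygonal face is from planar. It assumes a best-fit plane through the face centroid with a normal from Newell's method, and reports the mean squared distance of the vertices to that plane. The names `newell_normal` and `planarity_error` are hypothetical and not taken from the paper.

```python
import math


def newell_normal(face):
    """Unit normal of a (possibly non-planar) polygon via Newell's method.

    `face` is a list of (x, y, z) vertices in boundary order.
    """
    nx = ny = nz = 0.0
    n = len(face)
    for i in range(n):
        x0, y0, z0 = face[i]
        x1, y1, z1 = face[(i + 1) % n]
        nx += (y0 - y1) * (z0 + z1)
        ny += (z0 - z1) * (x0 + x1)
        nz += (x0 - x1) * (y0 + y1)
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("degenerate face: cannot compute a normal")
    return (nx / length, ny / length, nz / length)


def planarity_error(face):
    """Mean squared distance of the vertices to the plane through the centroid.

    Zero for a perfectly planar face, growing as vertices leave the plane.
    This is an assumed stand-in for the paper's metric, not the metric itself.
    """
    normal = newell_normal(face)
    count = len(face)
    centroid = tuple(sum(v[axis] for v in face) / count for axis in range(3))
    total = 0.0
    for v in face:
        dist = sum((v[axis] - centroid[axis]) * normal[axis] for axis in range(3))
        total += dist * dist
    return total / count
```

An optimizer could sum such a term over all roof faces and move vertex heights to drive it toward zero. Whether the paper's formulation works this way is not stated in the abstract.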
3D human dance motion is a cooperative and elegant social movement. Unlike regular, simple locomotion, artistic dance motion is challenging to synthesize because of its irregularity, kinematic complexity, and diversity. The synthesized dance must be realistic, diverse, and controllable. In this paper, we propose a novel generative motion model based on temporal convolution and LSTM, TC-LSTM, to synthesize realistic and diverse dance motion. We introduce a unique control signal, the dance melody line, to heighten controllability. Hence, our model, and its switch for control signals, supports a variety of applications: random dance synthesis, music-to-dance, user control, and more. Our experiments demonstrate that our model can synthesize artistic dance motion in various dance types. Compared with existing methods, our method achieves state-of-the-art results.
Existing physical cloth simulators suffer from expensive computation and from the difficulty of tuning mechanical parameters to get the desired wrinkling behaviors. Data-driven methods provide an alternative solution. They typically synthesize cloth animation at a much lower computational cost, and also create wrinkling effects that closely resemble the training data, which is much easier to control. In this paper we propose a deep-learning-based method for synthesizing cloth animation with high-resolution meshes. To do this, we first create a dataset for training: a pair of low- and high-resolution meshes is simulated and their motions are synchronized. As a result, the two meshes exhibit similar large-scale deformation but different small wrinkles. Each simulated mesh pair is then converted into a pair of low- and high-resolution images (2D arrays of samples), where each sample can be interpreted as any of three features: the displacement, the normal, and the velocity. With these image pairs, we design a multi-feature super-resolution (MFSR) network that jointly trains an upsampling synthesizer for the three features. The MFSR architecture consists of two key components: a sharing module that takes multiple features as input to learn low-level representations from the corresponding super-resolution tasks simultaneously, and task-specific modules that focus on various high-level semantics. Frame-to-frame consistency is well maintained thanks to the proposed kinematics-based loss function. Our method achieves realistic results at high frame rates: it is 12-14 times faster than traditional physical simulation. We demonstrate the performance of our method with various experimental scenes, including a dressed character with sophisticated collisions.
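The abstract does not spell out the kinematics-based loss, so the sketch below only illustrates the general idea of tying the displacement and velocity features together across frames. It assumes that a sample's velocity should match the finite difference of its displacement between consecutive frames. The function name `kinematic_residual`, the time step `dt`, and the forward-difference scheme are illustrative assumptions, not the paper's formulation.

```python
def kinematic_residual(disp_prev, disp_next, velocity, dt):
    """Mean squared mismatch between finite-difference and predicted velocity.

    disp_prev, disp_next: per-sample (x, y, z) displacements at frames t and t+1.
    velocity: per-sample (x, y, z) velocity feature at frame t.
    dt: time between the two frames (assumed constant).
    """
    total = 0.0
    count = 0
    for p0, p1, v in zip(disp_prev, disp_next, velocity):
        for a, b, c in zip(p0, p1, v):
            finite_diff = (b - a) / dt  # forward difference of displacement
            total += (finite_diff - c) ** 2
            count += 1
    return total / count
```

A term of this kind is zero when the synthesized displacement and velocity images agree with each other over time, and grows when consecutive frames jitter independently.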
The field of physics-based animation is gaining importance due to the increasing demand for realism in video games and films, and has recently seen wide adoption of data-driven techniques, such as deep reinforcement learning (RL), which learn control from (human) demonstrations. While RL has shown impressive results at reproducing individual motions and interactive locomotion, existing methods are limited in their ability to generalize to new motions and to compose a complex motion sequence interactively. In this paper, we propose a physics-based universal neural controller (UniCon) that learns to master thousands of motions with different styles by learning on large-scale motion datasets. UniCon is a two-level framework that consists of a high-level motion scheduler and an RL-powered low-level motion executor, which is our key innovation. By systematically analyzing existing multi-motion RL frameworks, we introduce a novel objective function and training techniques that make a significant leap in performance. Once trained, our motion executor can be combined with different high-level schedulers without the need for retraining, enabling a variety of real-time interactive applications. We show that UniCon can support keyboard-driven control, compose motion sequences drawn from a large pool of locomotion and acrobatics skills, and teleport a person captured on video to a physics-based virtual avatar. Numerical and qualitative results demonstrate a significant improvement in the efficiency, robustness, and generalizability of UniCon over the prior state of the art, showcasing transferability to unseen motions, unseen humanoid models, and unseen perturbations.