ﻻ يوجد ملخص باللغة العربية
Human motion prediction is an increasingly interesting topic in computer vision and robotics. In this paper, we propose a new 2D CNN based network, TrajectoryNet, to predict future poses in the trajectory space. Compared with most existing methods, our model focuses on modeling the motion dynamics with coupled spatio-temporal features, local-global spatial features and global temporal co-occurrence features of the previous pose sequence. Specifically, the coupled spatio-temporal features describe the spatial and temporal structure information hidden in the natural human motion sequence, which can be mined by covering the space and time dimensions of the input pose sequence with the convolutional filters. The local-global spatial features that encode different correlations of different joints of the human body (e.g. strong correlations between joints of one limb, weak correlations between joints of different limbs) are captured hierarchically by enlarging the receptive field layer by layer and residual connections from the lower layers to the deeper layers in our proposed convolutional network. And the global temporal co-occurrence features represent the co-occurrence relationship that different subsequences in a complex motion sequence are appeared simultaneously, which can be obtained automatically with our proposed TrajectoryNet by reorganizing the temporal information as the depth dimension of the input tensor. Finally, future poses are approximated based on the captured motion dynamics features. Extensive experiments show that our method achieves state-of-the-art performance on three challenging benchmarks (e.g. Human3.6M, G3D, and FNTU), which demonstrates the effectiveness of our proposed method. The code will be available if the paper is accepted.
In this paper, we propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous works commonly rely on RNN-based models considering shorter forecast horizons reaching a stationary and often implausib
Better machine understanding of pedestrian behaviors enables faster progress in modeling interactions between agents such as autonomous vehicles and humans. Pedestrian trajectories are not only influenced by the pedestrian itself but also by interact
Predicting the future paths of an agents neighbors accurately and in a timely manner is central to the autonomous applications for collision avoidance. Conventional approaches, e.g., LSTM-based models, take considerable computational costs in the pre
Human motion prediction aims to predict future 3D skeletal sequences by giving a limited human motion as inputs. Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict rough motion trend, but motion details
3D convolutional neural networks have achieved promising results for video tasks in computer vision, including video saliency prediction that is explored in this paper. However, 3D convolution encodes visual representation merely on fixed local space