ﻻ يوجد ملخص باللغة العربية
In this paper, we propose a novel Transformer-based architecture for the task of generative modelling of 3D human motion. Previous works commonly rely on RNN-based models considering shorter forecast horizons reaching a stationary and often implausible state quickly. Instead, our focus lies on the generation of plausible future developments over longer time horizons. To mitigate the issue of convergence to a static pose, we propose a novel architecture that leverages the recently proposed self-attention concept. The task of 3D motion prediction is inherently spatio-temporal and thus the proposed model learns high dimensional embeddings for skeletal joints followed by a decoupled temporal and spatial self-attention mechanism. This allows the model to access past information directly and to capture spatio-temporal dependencies explicitly. We show empirically that this reduces error accumulation over time and allows for the generation of perceptually plausible motion sequences over long time horizons up to 20 seconds as well as accurate short-term predictions. Accompanying video available at https://youtu.be/yF0cdt2yCNE.
Human motion prediction is an increasingly interesting topic in computer vision and robotics. In this paper, we propose a new 2D CNN based network, TrajectoryNet, to predict future poses in the trajectory space. Compared with most existing methods, o
We propose a multiscale spatio-temporal graph neural network (MST-GNN) to predict the future 3D skeleton-based human poses in an action-category-agnostic manner. The core of MST-GNN is a multiscale spatio-temporal graph that explicitly models the rel
Understanding crowd motion dynamics is critical to real-world applications, e.g., surveillance systems and autonomous driving. This is challenging because it requires effectively modeling the socially aware crowd spatial interaction and complex tempo
Predicting the future paths of an agents neighbors accurately and in a timely manner is central to the autonomous applications for collision avoidance. Conventional approaches, e.g., LSTM-based models, take considerable computational costs in the pre
Human motion prediction is a challenging and important task in many computer vision application domains. Existing work only implicitly models the spatial structure of the human skeleton. In this paper, we propose a novel approach that decomposes the