ترغب بنشر مسار تعليمي؟ اضغط هنا

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

51   0   0.0 ( 0 )
 نشر من قبل Junwei Liang
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

This paper studies the problem of predicting future trajectories of people in unseen cameras of novel scenarios and views. We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras. We propose a novel approach to learn robust representation through augmenting the simulation training data such that the representation can better generalize to unseen real-world test data. The key idea is to mix the feature of the hardest camera view with the adversarial feature of the original view. We refer to our method as SimAug. We show that SimAug achieves promising results on three real-world benchmarks using zero real training data, and state-of-the-art performance in the Stanford Drone and the VIRAT/ActEV dataset when using in-domain training data.



قيم البحث

اقرأ أيضاً

Activity analysis in which multiple people interact across a large space is challenging due to the interplay of individual actions and collective group dynamics. We propose an end-to-end approach for learning person trajectory representations for gro up activity analysis. The learned representations encode rich spatio-temporal dependencies and capture useful motion patterns for recognizing individual events, as well as characteristic group dynamics that can be used to identify groups from their trajectories alone. We develop our deep learning approach in the context of team sports, which provide well-defined sets of events (e.g. pass, shot) and groups of people (teams). Analysis of events and team formations using NHL hockey and NBA basketball datasets demonstrate the generality of our approach.
91 - Rui Yu , Zihan Zhou 2021
Human trajectory prediction has received increased attention lately due to its importance in applications such as autonomous vehicles and indoor robots. However, most existing methods make predictions based on human-labeled trajectories and ignore th e errors and noises in detection and tracking. In this paper, we study the problem of human trajectory forecasting in raw videos, and show that the prediction accuracy can be severely affected by various types of tracking errors. Accordingly, we propose a simple yet effective strategy to correct the tracking failures by enforcing prediction consistency over time. The proposed re-tracking algorithm can be applied to any existing tracking and prediction pipelines. Experiments on public benchmark datasets demonstrate that the proposed method can improve both tracking and prediction performance in challenging real-world scenarios. The code and data are available at https://git.io/retracking-prediction.
86 - Pei Lv , Hui Wei , Tianxin Gu 2021
Trajectory prediction is a fundamental and challenging task for numerous applications, such as autonomous driving and intelligent robots. Currently, most of existing work treat the pedestrian trajectory as a series of fixed two-dimensional coordinate s. However, in real scenarios, the trajectory often exhibits randomness, and has its own probability distribution. Inspired by this observed fact, also considering other movement characteristics of pedestrians, we propose one simple and intuitive movement description, probability trajectory, which maps the coordinate points of pedestrian trajectory into two-dimensional Gaussian distribution in images. Based on this unique description, we develop one novel trajectory prediction method, called social probability. The method combines the new probability trajectory and powerful convolution recurrent neural networks together. Both the input and output of our method are probability trajectories, which provide the recurrent neural network with sufficient spatial and random information of moving pedestrians. And the social probability extracts spatio-temporal features directly on the new movement description to generate robust and accurate predicted results. The experiments on public benchmark datasets show the effectiveness of the proposed method.
This paper addresses the task of unsupervised learning of representations for action recognition in videos. Previous works proposed to utilize future prediction, or other domain-specific objectives to train a network, but achieved only limited succes s. In contrast, in the relevant field of image representation learning, simpler, discrimination-based methods have recently bridged the gap to fully-supervised performance. We first propose to adapt two top performing objectives in this class - instance recognition and local aggregation, to the video domain. In particular, the latter approach iterates between clustering the videos in the feature space of a network and updating it to respect the cluster with a non-parametric classification loss. We observe promising performance, but qualitative analysis shows that the learned representations fail to capture motion patterns, grouping the videos based on appearance. To mitigate this issue, we turn to the heuristic-based IDT descriptors, that were manually designed to encode motion patterns in videos. We form the clusters in the IDT space, using these descriptors as a an unsupervised prior in the iterative local aggregation algorithm. Our experiments demonstrates that this approach outperform prior work on UCF101 and HMDB51 action recognition benchmarks. We also qualitatively analyze the learned representations and show that they successfully capture video dynamics.
145 - Xiaoli Liu , Jianqin Yin 2020
Predicting future human motion is critical for intelligent robots to interact with humans in the real world, and human motion has the nature of multi-granularity. However, most of the existing work either implicitly modeled multi-granularity informat ion via fixed modes or focused on modeling a single granularity, making it hard to well capture this nature for accurate predictions. In contrast, we propose a novel end-to-end network, Semi-Decoupled Multi-grained Trajectory Learning network (SDMTL), to predict future poses, which not only flexibly captures rich multi-grained trajectory information but also aggregates multi-granularity information for predictions. Specifically, we first introduce a Brain-inspired Semi-decoupled Motion-sensitive Encoding module (BSME), effectively capturing spatiotemporal features in a semi-decoupled manner. Then, we capture the temporal dynamics of motion trajectory at multi-granularity, including fine granularity and coarse granularity. We learn multi-grained trajectory information using BSMEs hierarchically and further capture the information of temporal evolutional directions at each granularity by gathering the outputs of BSMEs at each granularity and applying temporal convolutions along the motion trajectory. Next, the captured motion dynamics can be further enhanced by aggregating the information of multi-granularity with a weighted summation scheme. Finally, experimental results on two benchmarks, including Human3.6M and CMU-Mocap, show that our method achieves state-of-the-art performance, demonstrating the effectiveness of our proposed method. The code will be available if the paper is accepted.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا