Social-IWSTCNN: A Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network for Pedestrian Trajectory Prediction in Urban Traffic Scenarios


Published by Chi Zhang

Publication date: 2021

Research field: Information Engineering

Paper language: English

Author: Chi Zhang, Department of Computer Science


Abstract

Pedestrian trajectory prediction in urban scenarios is essential for automated driving. The task is challenging because pedestrian behavior is influenced both by each pedestrian's own past trajectory and by interactions with others. Previous research modeled these interactions with pooling mechanisms or by aggregation with hand-crafted attention weights. In this paper, we present the Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network (Social-IWSTCNN), which captures both spatial and temporal features. We propose a novel design, the Social Interaction Extractor, to learn the spatial and social interaction features of pedestrians. Most previous works trained and evaluated on the ETH and UCY datasets, which comprise five scenes but do not extensively cover urban traffic scenarios. In this paper, we use the recently released large-scale Waymo Open Dataset of urban traffic scenarios, which includes 374 urban training scenes and 76 urban testing scenes, to compare the performance of our proposed algorithm with that of state-of-the-art (SOTA) models. The results show that our algorithm outperforms SOTA algorithms such as Social-LSTM, Social-GAN, and Social-STGCNN on both Average Displacement Error (ADE) and Final Displacement Error (FDE). Furthermore, Social-IWSTCNN is 54.8 times faster in data pre-processing and 4.7 times faster in overall test speed than Social-STGCNN, currently the best SOTA algorithm.
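The abstract reports results in ADE and FDE without defining them. The sketch below uses the standard definitions: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is the distance at the final step. It is a minimal illustration under that assumption; the paper's exact evaluation protocol (prediction horizon, units, averaging over pedestrians, any best-of-N sampling) is not stated here, and the function name `displacement_errors` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against its ground truth.

    ADE: mean Euclidean distance over all predicted time steps.
    FDE: Euclidean distance at the last predicted time step.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and have the same length")
    # Per-time-step Euclidean error between predicted and true positions.
    dists = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]


# Example: errors of 0, 1 and 2 at the three steps give ADE 1.0 and FDE 2.0.
ade, fde = displacement_errors([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 2)])
print(ade, fde)
```

Dataset-level ADE and FDE are typically the means of these per-trajectory values over all evaluated pedestrians.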

Related papers

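Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, 2020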

Better machine understanding of pedestrian behaviors enables faster progress in modeling interactions between agents such as autonomous vehicles and humans. Pedestrian trajectories are influenced not only by the pedestrian itself but also by interaction with surrounding objects. Previous methods modeled these interactions using a variety of aggregation methods that integrate different learned pedestrian states. We propose the Social Spatio-Temporal Graph Convolutional Neural Network (Social-STGCNN), which removes the need for aggregation methods by modeling the interactions as a graph. Our results show a 20% improvement over the state of the art on the Final Displacement Error (FDE) and an improvement on the Average Displacement Error (ADE), with 8.5 times fewer parameters and up to 48 times faster inference than previously reported methods. In addition, our model is data efficient and exceeds the previous state of the art on the ADE metric with only 20% of the training data. We propose a kernel function to embed the social interactions between pedestrians within the adjacency matrix. Through qualitative analysis, we show that our model acquired the social behaviors that can be expected between pedestrian trajectories. Code is available at https://github.com/abduallahmohamed/Social-STGCNN.

Computer Vision and Pattern Recognition
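The Social-STGCNN abstract mentions a kernel function that embeds social interactions in the adjacency matrix but does not state its form. The sketch below assumes an inverse-Euclidean-distance kernel (closer nodes get larger weights; zero-distance pairs, including self-pairs, get weight 0), followed by the symmetric normalization D^{-1/2}(A + I)D^{-1/2} that is common in graph convolutional networks. Both choices, the input features, and the helper names are assumptions for illustration; the linked repository is the authority on what the authors actually compute.

```python
import numpy as np


def inverse_distance_adjacency(coords):
    """Weighted adjacency for one time step (assumed kernel).

    coords: (N, 2) array of per-node 2-D features, e.g. pedestrian positions.
    a_ij = 1 / ||c_i - c_j||_2, and 0 where that distance is zero
    (which includes the diagonal).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)             # (N, N) Euclidean distances
    adj = np.zeros_like(dist)
    nonzero = dist > 0
    adj[nonzero] = 1.0 / dist[nonzero]               # closer nodes get larger weights
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1, so no division by zero
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


# Example: nodes 0 and 1 are 5 units apart, so their weight is 1/5 = 0.2;
# nodes 0 and 2 coincide, so their weight is 0.
adj = inverse_distance_adjacency([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
print(adj)
print(normalize_adjacency(adj))
```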

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction

Zhishuai Zhang, Jiyang Gao, Junhua Mao, 2020

Detecting pedestrians and predicting their future trajectories are critical tasks for numerous applications, such as autonomous driving. Previous methods either treat detection and prediction as separate tasks or simply add a trajectory regression head on top of a detector. In this work, we present a novel end-to-end two-stage network: the Spatio-Temporal-Interactive Network (STINet). In addition to modeling the 3D geometry of pedestrians, we model the temporal information for each pedestrian. To do so, our method predicts both current and past locations in the first stage, so that each pedestrian can be linked across frames and comprehensive spatio-temporal information can be captured in the second stage. We also model the interaction among objects with an interaction graph that gathers information from neighboring objects. Comprehensive experiments on the Lyft Dataset and the recently released large-scale Waymo Open Dataset, for both object detection and future trajectory prediction, validate the effectiveness of the proposed method. On the Waymo Open Dataset, we achieve a bird's-eye-view (BEV) detection AP of 80.73 and a trajectory prediction average displacement error (ADE) of 33.67 cm for pedestrians, establishing the state of the art for both tasks.

Computer Vision and Pattern Recognition
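The STINet abstract says interactions are modeled with an interaction graph that gathers information from neighboring objects, without giving details. As a hypothetical illustration only, the sketch below connects pedestrians closer than an assumed `radius`, mean-pools the neighbors' features, and concatenates the result with each pedestrian's own features. STINet's learned interaction graph is not reproduced here; the radius rule, the mean pooling, and the name `aggregate_neighbors` are placeholder choices.

```python
import numpy as np


def aggregate_neighbors(positions, features, radius):
    """Hypothetical interaction-graph step.

    positions: (N, 2) coordinates; features: (N, F) per-pedestrian features.
    Edges link distinct pedestrians within `radius` of each other. Returns an
    (N, 2F) array: own features followed by the mean of neighbor features
    (zeros for pedestrians with no neighbors).
    """
    positions = np.asarray(positions, dtype=float)
    features = np.asarray(features, dtype=float)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    mask = (dist <= radius) & ~np.eye(len(positions), dtype=bool)   # no self-edges
    counts = mask.sum(axis=1, keepdims=True)                        # (N, 1) neighbor counts
    summed = mask.astype(float) @ features                          # (N, F) neighbor sums
    pooled = np.divide(summed, counts, out=np.zeros_like(summed), where=counts > 0)
    return np.concatenate([features, pooled], axis=1)


# Pedestrians 0 and 1 are neighbors; pedestrian 2 is isolated.
print(aggregate_neighbors([[0, 0], [1, 0], [10, 0]], [[1.0], [3.0], [5.0]], radius=2.0))
```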

Spectral Temporal Graph Neural Network for Trajectory Prediction

Defu Cao, Jiachen Li, Hengbo Ma, 2021

An effective understanding of the contextual environment and accurate motion forecasting of surrounding agents are crucial for the development of autonomous vehicles and social mobile robots. This task is challenging because the behavior of an autonomous agent is affected not only by its own intention but also by the static environment and by surrounding, dynamically interacting agents. Previous works focused on utilizing spatial and temporal information in the time domain while not sufficiently taking advantage of cues in the frequency domain. To this end, we propose a Spectral Temporal Graph Neural Network (SpecTGNN), which can capture inter-agent correlations and temporal dependency simultaneously in the frequency domain in addition to the time domain. SpecTGNN operates in two streams: on an agent graph with dynamic state information and on an environment graph with features extracted from context images. The model integrates the graph Fourier transform, spectral graph convolution, and temporal gated convolution to encode history information and forecast future trajectories. Moreover, we incorporate a multi-head spatio-temporal attention mechanism to mitigate the effect of error propagation over a long time horizon. We demonstrate the performance of SpecTGNN on two public trajectory prediction benchmark datasets, on which it achieves state-of-the-art prediction accuracy.

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
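SpecTGNN is described as combining the graph Fourier transform with spectral graph convolution. The sketch below shows only the underlying mechanics with a fixed filter: eigendecompose the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, project node features onto its eigenvectors, scale each frequency component, and project back. In a trained model the filter would be learned and combined with temporal gated convolution and attention; `spectral_filter` and `filter_fn` are hypothetical names, not SpecTGNN's API.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes contribute nothing to the second term."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def spectral_filter(adj, signal, filter_fn):
    """Filter a graph signal in the spectral domain.

    adj: (N, N) symmetric adjacency; signal: (N, F) node features;
    filter_fn: maps the (N,) Laplacian eigenvalues to (N,) gains.
    """
    lam, u = np.linalg.eigh(normalized_laplacian(adj))  # L symmetric: L = U diag(lam) U^T
    x_hat = u.T @ np.asarray(signal, dtype=float)       # graph Fourier transform
    y_hat = filter_fn(lam)[:, None] * x_hat             # scale each frequency component
    return u @ y_hat                                    # inverse graph Fourier transform


triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
x = np.array([[1.0], [2.0], [6.0]])
# Low-pass filter: keep only the zero-eigenvalue (smoothest) component.
print(spectral_filter(triangle, x, lambda lam: (np.abs(lam) < 1e-8).astype(float)))
```

With an all-ones filter the signal is returned unchanged. On a regular graph such as the triangle, the zero-eigenvalue eigenvector is constant, so the low-pass filter replaces every node's value with the graph-wide mean (here 3.0).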

Social-STAGE: Spatio-Temporal Multi-Modal Future Trajectory Forecast

Srikanth Malla, Chiho Choi, Behzad Dariush, 2020

This paper considers the problem of multi-modal future trajectory forecasting with ranking. Here, multi-modality and ranking refer to multiple plausible path predictions and the confidence in those predictions, respectively. We propose Social-STAGE, a Social interaction-aware Spatio-Temporal multi-Attention Graph convolution network with novel Evaluation for multi-modality. Our main contributions include an analysis and formulation of multi-modality with ranking using interaction and multi-attention, and the introduction of new metrics to evaluate the diversity and associated confidence of multi-modal predictions. We evaluate our approach on the public ETH and UCY datasets and show that the proposed algorithm outperforms state-of-the-art methods on these datasets.

Computer Vision and Pattern Recognition, Artificial Intelligence
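Social-STAGE evaluates multiple plausible predictions together with their confidence, and its new metrics are not defined in the abstract. For orientation only, the sketch below shows a conventional rank-aware setup rather than Social-STAGE's metrics: raw mode scores are converted to probabilities with a softmax, modes are sorted by confidence, and the best-of-k ADE (minADE) is computed over the top-k modes. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Mean Euclidean distance between predicted and true positions."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def rank_modes(preds: List[Trajectory], scores: List[float]) -> List[Tuple[Trajectory, float]]:
    """Softmax the raw scores, then sort (trajectory, probability) pairs by confidence."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(preds, probs), key=lambda pair: pair[1], reverse=True)


def min_ade_top_k(preds, scores, gt, k: int) -> float:
    """Best ADE among the k most confident modes."""
    return min(ade(traj, gt) for traj, _ in rank_modes(preds, scores)[:k])


gt = [(0.0, 0.0), (1.0, 0.0)]
a = [(0.0, 0.0), (1.0, 0.0)]   # exact match, but lower confidence
b = [(0.0, 1.0), (1.0, 1.0)]   # off by 1 at each step, higher confidence
print(min_ade_top_k([a, b], [0.0, 2.0], gt, k=1), min_ade_top_k([a, b], [0.0, 2.0], gt, k=2))
```

The example shows why ranking matters: with k = 1 only the confident but wrong mode counts, while k = 2 also admits the accurate mode.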

Spatio-Temporal Graph Transformer Networks for Pedestrian Trajectory Prediction

Cunjun Yu, Xiao Ma, Jiawei Ren, 2020

Understanding crowd motion dynamics is critical to real-world applications such as surveillance systems and autonomous driving. This is challenging because it requires effectively modeling socially aware spatial crowd interactions and complex temporal dependencies. We believe attention is the most important factor for trajectory prediction. In this paper, we present STAR, a Spatio-Temporal grAph tRansformer framework, which tackles trajectory prediction using only attention mechanisms. STAR models intra-graph crowd interaction with TGConv, a novel Transformer-based graph convolution mechanism. Inter-graph temporal dependencies are modeled by separate temporal Transformers. STAR captures complex spatio-temporal interactions by interleaving spatial and temporal Transformers. To calibrate the temporal prediction for the long-lasting effect of disappeared pedestrians, we introduce a read-writable external memory module that is continually updated by the temporal Transformer. We show that, using only attention mechanisms, STAR achieves state-of-the-art performance on five commonly used real-world pedestrian prediction datasets.

Computer Vision and Pattern Recognition, Machine Learning, Robotics
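STAR relies on attention to model crowd interaction. The sketch below is plain single-head scaled dot-product self-attention across the pedestrians of one frame, softmax(QK^T / sqrt(d_k))V, with an optional mask restricting which pedestrians may attend to each other. It is not STAR's TGConv, and it omits the temporal Transformers and the external memory module; the projection matrices are arbitrary inputs here rather than trained weights, and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(h, w_q, w_k, w_v, mask=None):
    """Single-head scaled dot-product self-attention over N pedestrians.

    h: (N, d_model) per-pedestrian embeddings; w_q, w_k, w_v: (d_model, d_k).
    mask: optional (N, N) bool array, True where attention is allowed; each row
    should allow at least one entry (typically the diagonal).
    Returns the attended values (N, d_k) and the attention weights (N, N).
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (N, N) pairwise compatibilities
    if mask is not None:
        scores = np.where(mask, scores, -1e9)    # large negative -> ~0 weight after softmax
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                      # 4 pedestrians, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = spatial_self_attention(h, w_q, w_k, w_v)
print(out.shape, weights.sum(axis=1))
```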