The Tactical Driver Behavior modeling problem requires an understanding of driver actions in complex urban scenarios from rich multimodal signals, including video, LiDAR, and CAN bus data streams. However, the majority of deep learning research focuses either on learning the vehicle/environment state (sensor fusion) or the driver policy (from temporal data), but not both. Learning both tasks end-to-end offers the richest distillation of knowledge, but presents challenges in formulation and successful training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in LSTMs, we propose gated recurrent fusion units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate their superior performance over multimodal and temporal baselines on supervised regression and classification tasks, all in the realm of autonomous navigation. We note a 10% improvement in mAP score over the state of the art for tactical driver behavior classification on the HDD dataset and a 20% drop in overall mean squared error for steering-action regression on the TORCS dataset.
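To make the idea concrete, the following is a minimal sketch of what a GRFU-style step could look like once each modality stream (e.g., video, LiDAR, CAN bus) has been encoded into a feature vector per time step. The class and variable names (GatedRecurrentFusionCell, modal_dims, fusion_gates) are illustrative assumptions, not the paper's implementation: per-modality sigmoid gates supply the learned fusion weighting, and an LSTM-style recurrent update supplies the temporal weighting.

```python
import torch
import torch.nn as nn

class GatedRecurrentFusionCell(nn.Module):
    """Illustrative GRFU-style cell (not the authors' exact formulation):
    per-modality gates decide how much each sensor stream contributes to
    the fused input, and an LSTM-style update decides how much of that
    fused input enters the recurrent memory."""

    def __init__(self, modal_dims, hidden_dim):
        super().__init__()
        # one projection and one gate per modality, gate conditioned on the previous hidden state
        self.proj = nn.ModuleList(nn.Linear(d, hidden_dim) for d in modal_dims)
        self.fusion_gates = nn.ModuleList(
            nn.Linear(hidden_dim * 2, hidden_dim) for _ in modal_dims
        )
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)

    def forward(self, inputs, state):
        # inputs: list of per-modality features [(B, d_m)]; state: (h, c), each (B, hidden_dim)
        h, c = state
        fused = 0.0
        for x, proj, gate in zip(inputs, self.proj, self.fusion_gates):
            z = torch.tanh(proj(x))
            g = torch.sigmoid(gate(torch.cat([z, h], dim=-1)))  # learned fusion weighting
            fused = fused + g * z
        return self.rnn(fused, (h, c))  # temporal weighting via the LSTM gates
```

At each time step the cell would receive that step's per-modality features together with the previous hidden and cell states, unrolled over the sequence exactly like an ordinary LSTMCell.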
Autonomous vehicles and mobile robotic systems are typically equipped with multiple sensors to provide redundancy. By integrating the observations from different sensors, these mobile agents are able to perceive the environment and estimate system state.
Leveraging multimodal information with recursive Bayesian filters improves performance and robustness of state estimation, as recursive filters can combine different modalities according to their uncertainties. Prior work has studied how to optimally fuse different sensor modalities.
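The core of that uncertainty-based weighting can be illustrated with the one-dimensional case: fusing two independent Gaussian estimates of the same quantity by inverse-variance weighting, which is what a Kalman-style measurement update reduces to. This is a generic textbook sketch, not code from the work summarized above; the function name and example numbers are made up for illustration.

```python
def fuse_gaussian(mu_a, var_a, mu_b, var_b):
    """Fuse two independent Gaussian estimates of the same quantity.
    The modality with the smaller variance receives the larger weight;
    this is the scalar core of a Kalman/Bayesian measurement update."""
    w_a = var_b / (var_a + var_b)            # weight of estimate a
    mu = w_a * mu_a + (1.0 - w_a) * mu_b     # fused mean
    var = (var_a * var_b) / (var_a + var_b)  # fused variance, never larger than either input
    return mu, var

# Hypothetical example: a precise LiDAR range vs. a noisy camera-based range estimate.
mu, var = fuse_gaussian(10.2, 0.04, 9.5, 1.0)
print(mu, var)  # ≈ 10.17, ≈ 0.038 -- the fused estimate stays close to the low-variance sensor
```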
Deep multimodal fusion by using multiple sources of data for classification or regression has exhibited a clear advantage over the unimodal counterpart on various applications. Yet, current methods, including aggregation-based and alignment-based fusion, are still inadequate in balancing the trade-off between inter-modal fusion and intra-modal processing.
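The two families named here can be sketched in a few lines under the usual reading of the terms: aggregation-based fusion concatenates (or sums) per-modality features and processes them jointly, while alignment-based fusion keeps separate branches and adds a term that pulls their embeddings toward a shared space. The module and function names below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcatFusion(nn.Module):
    """Aggregation-based fusion: concatenate per-modality features, predict jointly."""
    def __init__(self, dim_a, dim_b, num_classes):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(dim_a + dim_b, 256), nn.ReLU(),
                                  nn.Linear(256, num_classes))

    def forward(self, feat_a, feat_b):
        return self.head(torch.cat([feat_a, feat_b], dim=-1))

def alignment_loss(feat_a, feat_b):
    """Alignment-based fusion: keep separate branches, penalize embedding disagreement."""
    return 1.0 - F.cosine_similarity(feat_a, feat_b, dim=-1).mean()
```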
Tasks that rely on multi-modal information typically include a fusion module that combines information from different modalities. In this work, we develop a Refiner Fusion Network (ReFNet) that enables fusion modules to combine strong unimodal representations.
This paper proposes a method for representation learning of multimodal data using contrastive losses. A traditional approach is to contrast different modalities to learn the information shared between them. However, that approach could fail to learn information that is specific to each modality.
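The "contrast different modalities" recipe referred to here is commonly implemented as a symmetric InfoNCE objective in which co-occurring samples from the two modalities form positive pairs and the rest of the batch serves as negatives. The sketch below shows that generic baseline, not the method this paper proposes; the function name and temperature value are illustrative.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE between two modality embeddings of the same batch.
    Row i of z_a and row i of z_b come from the same sample (positive pair);
    all other rows in the batch act as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)    # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```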