Warping of Radar Data into Camera Image for Cross-Modal Supervision in Automotive Applications

Published by Christopher Grimm

Publication date: 2020

Research field: Informatics Engineering

Paper language: English

Authors: Christopher Grimm, Tai Fei, Ernst Warsitz

Abstract (English)

In this paper, we present a novel framework to project automotive radar range-Doppler (RD) spectrum into camera image. The utilized warping operation is designed to be fully differentiable, which allows error backpropagation through the operation. This enables the training of neural networks (NN) operating exclusively on RD spectrum by utilizing labels provided from camera vision models. As the warping operation relies on accurate scene flow, additionally, we present a novel scene flow estimation algorithm fed from camera, lidar and radar, enabling us to improve the accuracy of the warping operation. We demonstrate the framework in multiple applications like direction-of-arrival (DoA) estimation, target detection, semantic segmentation and estimation of radar power from camera data. Extensive evaluations have been carried out for the DoA application and suggest superior quality for NN based estimators compared to classical estimators. The novel scene flow estimation approach is benchmarked against state-of-the-art scene flow algorithms and outperforms them by roughly a third.
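To illustrate why a warping operation can pass gradients, here is a minimal sketch that assumes bilinear interpolation; the abstract does not say which interpolation scheme the authors use, so this is not their implementation. The names `bilinear_sample` and `warp` are hypothetical. The point is that the sampled value is a smooth function of the sampling coordinates, so its derivative with respect to a flow offset has a closed form and errors can propagate back through the warp.

```python
import math


def bilinear_sample(img, x, y):
    """Sample img (a list of rows, at least 2x2) at continuous (x, y).

    Returns (value, d_value/dx, d_value/dy). Assumption: bilinear
    interpolation with coordinates clamped to the image border.
    """
    h, w = len(img), len(img[0])
    # Clamp so the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0 = min(int(math.floor(x)), w - 2)
    y0 = min(int(math.floor(y)), h - 2)
    fx, fy = x - x0, y - y0
    a, b = img[y0][x0], img[y0][x0 + 1]
    c, d = img[y0 + 1][x0], img[y0 + 1][x0 + 1]
    value = ((1 - fx) * (1 - fy) * a + fx * (1 - fy) * b
             + (1 - fx) * fy * c + fx * fy * d)
    # Analytic partial derivatives with respect to the sampling position.
    dv_dx = (1 - fy) * (b - a) + fy * (d - c)
    dv_dy = (1 - fx) * (c - a) + fx * (d - b)
    return value, dv_dx, dv_dy


def warp(img, flow):
    """Backward warp: out[y][x] is img sampled at (x + u, y + v),
    where flow[y][x] = (u, v) is a hypothetical per-pixel offset."""
    return [[bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])[0]
             for x in range(len(img[0]))]
            for y in range(len(img))]
```

For a linear ramp image with `img[y][x] = x + 2*y`, the returned derivatives are 1 along x and 2 along y at any interior point, which matches the slope of the ramp.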

Yizhou Wang, Zhongyu Jiang, Xiangyu Gao (2020)

Radar is usually more robust than the camera in severe driving scenarios, e.g., weak/strong lighting and bad weather. However, unlike RGB images captured by a camera, the semantic information from the radar signals is noticeably difficult to extract. In this paper, we propose a deep radar object detection network (RODNet), to effectively detect objects purely from the carefully processed radar frequency data in the format of range-azimuth frequency heatmaps (RAMaps). Three different 3D autoencoder based architectures are introduced to predict object confidence distribution from each snippet of the input RAMaps. The final detection results are then calculated using our post-processing method, called location-based non-maximum suppression (L-NMS). Instead of using burdensome human-labeled ground truth, we train the RODNet using the annotations generated automatically by a novel 3D localization method using a camera-radar fusion (CRF) strategy. To train and evaluate our method, we build a new dataset -- CRUW, containing synchronized videos and RAMaps in various driving scenarios. After intensive experiments, our RODNet shows favorable object detection performance without the presence of the camera.

Computer Vision and Pattern Recognition, Signal Processing

Speech2Action: Cross-modal Supervision for Action Recognition

Arsha Nagrani, Chen Sun, David Ross (2020)

Is it possible to guess human action from dialogue alone? In this work we investigate the link between spoken words and actions in movies. We note that movie screenplays describe actions, as well as contain the speech of characters and hence can be used to learn this correlation with no additional supervision. We train a BERT-based Speech2Action classifier on over a thousand movie screenplays, to predict action labels from transcribed speech segments. We then apply this model to the speech segments of a large unlabelled movie corpus (188M speech segments from 288K movies). Using the predictions of this model, we obtain weak action labels for over 800K video clips. By training on these video clips, we demonstrate superior action recognition performance on standard action recognition benchmarks, without using a single manually labelled action example.

Computer Vision and Pattern Recognition

Off-the-shelf sensor vs. experimental radar -- How much resolution is necessary in automotive radar classification?

Nicolas Scheiner, Ole Schumann, Florian Kraus (2020)

Radar-based road user detection is an important topic in the context of autonomous driving applications. The resolution of conventional automotive radar sensors results in a sparse data representation which is tough to refine during subsequent signal processing. On the other hand, a new sensor generation is waiting in the wings for its application in this challenging field. In this article, two sensors of different radar generations are evaluated against each other. The evaluation criterion is the performance on moving road user object detection and classification tasks. To this end, two data sets originating from an off-the-shelf radar and a high resolution next generation radar are compared. Special attention is given on how the two data sets are assembled in order to make them comparable. The utilized object detector consists of a clustering algorithm, a feature extraction module, and a recurrent neural network ensemble for classification. For the assessment, all components are evaluated both individually and, for the first time, as a whole. This allows for indicating where overall performance improvements have their origin in the pipeline. Furthermore, the generalization capabilities of both data sets are evaluated and important comparison metrics for automotive radar object detection are discussed. Results show clear benefits of the next generation radar. Interestingly, those benefits do not actually occur due to better performance at the classification stage, but rather because of the vast improvements at the clustering stage.

Computer Vision and Pattern Recognition, Machine Learning, Signal Processing

Accurate Visual Localization for Automotive Applications

Eli Brosh, Matan Friedmann, Ilan Kadar (2019)

Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in scale and are expensive to compute. In this paper, we present a scalable visual localization approach geared for real-time performance. We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues. Our solution uses a self-supervised approach to learn a compact road image representation. This representation enables efficient visual retrieval and provides coarse localization cues, which are fused with vehicle ego-motion to obtain high accuracy location estimates. As a benchmark to evaluate the performance of our visual localization approach, we introduce a new large-scale driving dataset based on video and GPS data obtained from a large-scale network of connected dash-cams. Our experiments confirm that our approach is highly effective in challenging urban environments, reducing localization error by an order of magnitude.

Computer Vision and Pattern Recognition, Artificial Intelligence

Fast Rule-Based Clutter Detection in Automotive Radar Data

Johannes Kopp, Dominik Kellner, Aldi Piroli (2021)

Automotive radar sensors output a lot of unwanted clutter or ghost detections, whose position and velocity do not correspond to any real object in the sensor's field of view. This poses a substantial challenge for environment perception methods like object detection or tracking. Especially problematic are clutter detections that occur in groups or at similar locations in multiple consecutive measurements. In this paper, a new algorithm for identifying such erroneous detections is presented. It is mainly based on the modeling of specific commonly occurring wave propagation paths that lead to clutter. In particular, the three effects explicitly covered are reflections at the underbody of a car or truck, signals traveling back and forth between the vehicle on which the sensor is mounted and another object, and multipath propagation via specular reflection. The latter often occurs near guardrails, concrete walls or similar reflective surfaces. Each of these effects is described both theoretically and regarding a method for identifying the corresponding clutter detections. Identification is done by analyzing detections generated from a single sensor measurement only. The final algorithm is evaluated on recordings of real extra-urban traffic. For labeling, a semi-automatic process is employed. The results are promising, both in terms of performance and regarding the very low execution time. Typically, a large part of clutter is found, while only a small ratio of detections corresponding to real objects are falsely classified by the algorithm.

Computer Vision and Pattern Recognition, Signal Processing