Accurate Visual Localization for Automotive Applications

69 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Elad Levi

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Eli Brosh - Matan Friedmann - Ilan Kadar

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Accurate vehicle localization is a crucial step towards building effective Vehicle-to-Vehicle networks and automotive applications. Yet standard grade GPS data, such as that provided by mobile phones, is often noisy and exhibits significant localization errors in many urban areas. Approaches for accurate localization from imagery often rely on structure-based techniques, and thus are limited in scale and are expensive to compute. In this paper, we present a scalable visual localization approach geared for real-time performance. We propose a hybrid coarse-to-fine approach that leverages visual and GPS location cues. Our solution uses a self-supervised approach to learn a compact road image representation. This representation enables efficient visual retrieval and provides coarse localization cues, which are fused with vehicle ego-motion to obtain high accuracy location estimates. As a benchmark to evaluate the performance of our visual localization approach, we introduce a new large-scale driving dataset based on video and GPS data obtained from a large-scale network of connected dash-cams. Our experiments confirm that our approach is highly effective in challenging urban environments, reducing localization error by an order of magnitude.

قيم البحث

52 - Zhenyi Liu , Minghao Shen , Jiaqi Zhang 2019

We describe an open-source simulator that creates sensor irradiance and sensor images of typical automotive scenes in urban settings. The purpose of the system is to support camera design and testing for automotive applications. The user can specify scene parameters (e.g., scene type, road type, traffic density, time of day) to assemble a large number of random scenes from graphics assets stored in a database. The sensor irradiance is generated using quantitative computer graphics methods, and the sensor images are created using image systems sensor simulation. The synthetic sensor images have pixel level annotations; hence, they can be used to train and evaluate neural networks for imaging tasks, such as object detection and classification. The end-to-end simulation system supports quantitative assessment, from scene to camera to network accuracy, for automotive applications.

الرؤية الحاسوبية وتمييز الأنماط

DVM-CAR: A large-scale automotive dataset for visual marketing research and applications

66 - Jingming Huang , Bowei Chen , Lan Luo 2021

The automotive industry is being transformed by technologies, applications and services ranging from sensors to big data analytics and to artificial intelligence. In this paper, we present our multidisciplinary initiative of creating a publicly avail able dataset to facilitate the visual-related marketing research and applications in automotive industry such as automotive exterior design, consumer analytics and sales modelling. We are motivated by the fact that there is growing interest in product aesthetics but there is no large-scale dataset available that covers a wide range of variables and information. We summarise the common issues faced by marketing researchers and computer scientists through a user survey study, and design our dataset to alleviate these issues. Our dataset contains 1.4 million images from 899 car models as well as their corresponding car model specification and sales information over more than ten years in the UK market. To the best of our knowledge, this is the very first large-scale automotive dataset which contains images, text and sales information from multiple sources over a long period of time. We describe the detailed data structure and the preparation steps, which we believe has the methodological contribution to the multi-source data fusion and sharing. In addition, we discuss three dataset application examples to illustrate the value of our dataset.

الرؤية الحاسوبية وتمييز الأنماط

Warping of Radar Data into Camera Image for Cross-Modal Supervision in Automotive Applications

71 - Christopher Grimm , Tai Fei , Ernst Warsitz 2020

In this paper, we present a novel framework to project automotive radar range-Doppler (RD) spectrum into camera image. The utilized warping operation is designed to be fully differentiable, which allows error backpropagation through the operation. Th is enables the training of neural networks (NN) operating exclusively on RD spectrum by utilizing labels provided from camera vision models. As the warping operation relies on accurate scene flow, additionally, we present a novel scene flow estimation algorithm fed from camera, lidar and radar, enabling us to improve the accuracy of the warping operation. We demonstrate the framework in multiple applications like direction-of-arrival (DoA) estimation, target detection, semantic segmentation and estimation of radar power from camera data. Extensive evaluations have been carried out for the DoA application and suggest superior quality for NN based estimators compared to classical estimators. The novel scene flow estimation approach is benchmarked against state-of-the-art scene flow algorithms and outperforms them by roughly a third.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي معالجة الإشارات

MPN: Multimodal Parallel Network for Audio-Visual Event Localization

88 - Jiashuo Yu , Ying Cheng , Rui Feng 2021

Audio-visual event localization aims to localize an event that is both audible and visible in the wild, which is a widespread audio-visual scene analysis task for unconstrained videos. To address this task, we propose a Multimodal Parallel Network (M PN), which can perceive global semantics and unmixed local information parallelly. Specifically, our MPN framework consists of a classification subnetwork to predict event categories and a localization subnetwork to predict event boundaries. The classification subnetwork is constructed by the Multimodal Co-attention Module (MCM) and obtains global contexts. The localization subnetwork consists of Multimodal Bottleneck Attention Module (MBAM), which is designed to extract fine-grained segment-level contents. Extensive experiments demonstrate that our framework achieves the state-of-the-art performance both in fully supervised and weakly supervised settings on the Audio-Visual Event (AVE) dataset.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي الوسائط المتعددة

A Behavioral Approach to Visual Navigation with Graph Localization Networks

491 - Kevin Chen , Juan Pablo de Vicente , Gabriel Sepulveda 2019

Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of t he environment. We propose using graph neural networks for localizing the agent in the map, and decompose the action space into primitive behaviors implemented as convolutional or recurrent neural networks. Using the Gibson simulator, we verify that our approach outperforms relevant baselines and is able to navigate in both seen and unseen environments.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي