CMRNet++: Map and Camera Agnostic Monocular Visual Localization in LiDAR Maps

203 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Daniele Cattaneo

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Daniele Cattaneo - Domenico Giorgio Sorrenti - Abhinav Valada

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Localization is a critically essential and crucial enabler of autonomous robots. While deep learning has made significant strides in many computer vision tasks, it is still yet to make a sizeable impact on improving capabilities of metric visual localization. One of the major hindrances has been the inability of existing Convolutional Neural Network (CNN)-based pose regression methods to generalize to previously unseen places. Our recently introduced CMRNet effectively addresses this limitation by enabling map independent monocular localization in LiDAR-maps. In this paper, we now take it a step further by introducing CMRNet++, which is a significantly more robust model that not only generalizes to new places effectively, but is also independent of the camera parameters. We enable this capability by combining deep learning with geometric techniques, and by moving the metric reasoning outside the learning process. In this way, the weights of the network are not tied to a specific camera. Extensive evaluations of CMRNet++ on three challenging autonomous driving datasets, i.e., KITTI, Argoverse, and Lyft5, show that CMRNet++ outperforms CMRNet as well as other baselines by a large margin. More importantly, for the first-time, we demonstrate the ability of a deep learning approach to accurately localize without any retraining or fine-tuning in a completely new environment and independent of the camera parameters.

قيم البحث

176 - Peter Karkus , Anelia Angelova , Vincent Vanhoucke 2020

Mapping and localization, preferably from a small number of observations, are fundamental tasks in robotics. We address these tasks by combining spatial structure (differentiable mapping) and end-to-end learning in a novel neural network architecture : the Differentiable Mapping Network (DMN). The DMN constructs a spatially structured view-embedding map and uses it for subsequent visual localization with a particle filter. Since the DMN architecture is end-to-end differentiable, we can jointly learn the map representation and localization using gradient descent. We apply the DMN to sparse visual localization, where a robot needs to localize in a new environment with respect to a small number of images from known viewpoints. We evaluate the DMN using simulated environments and a challenging real-world Street View dataset. We find that the DMN learns effective map representations for visual localization. The benefit of spatial structure increases with larger environments, more viewpoints for mapping, and when training data is scarce. Project website: http://sites.google.com/view/differentiable-mapping

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

A Robust Stereo Camera Localization Method with Prior LiDAR Map Constrains

106 - Dong Han , Zuhao Zou , Lujia Wang 2019

In complex environments, low-cost and robust localization is a challenging problem. For example, in a GPSdenied environment, LiDAR can provide accurate position information, but the cost is high. In general, visual SLAM based localization methods bec ome unreliable when the sunlight changes greatly. Therefore, inexpensive and reliable methods are required. In this paper, we propose a stereo visual localization method based on the prior LiDAR map. Different from the conventional visual localization system, we design a novel visual optimization model by matching planar information between the LiDAR map and visual image. Bundle adjustment is built by using coplanarity constraints. To solve the optimization problem, we use a graph-based optimization algorithm and a local window optimization method. Finally, we estimate a full six degrees of freedom (DOF) pose without scale drift. To validate the efficiency, the proposed method has been tested on the KITTI dataset. The results show that our method is more robust and accurate than the state-of-art ORB-SLAM2.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Deep Online Correction for Monocular Visual Odometry

108 - Jiaxin Zhang , Wei Sui , Xinggang Wang 2021

In this work, we propose a novel deep online correction (DOC) framework for monocular visual odometry. The whole pipeline has two stages: First, depth maps and initial poses are obtained from convolutional neural networks (CNNs) trained in self-super vised manners. Second, the poses predicted by CNNs are further improved by minimizing photometric errors via gradient updates of poses during inference phases. The benefits of our proposed method are twofold: 1) Different from online-learning methods, DOC does not need to calculate gradient propagation for parameters of CNNs. Thus, it saves more computation resources during inference phases. 2) Unlike hybrid methods that combine CNNs with traditional methods, DOC fully relies on deep learning (DL) frameworks. Though without complex back-end optimization modules, our method achieves outstanding performance with relative transform error (RTE) = 2.0% on KITTI Odometry benchmark for Seq. 09, which outperforms traditional monocular VO frameworks and is comparable to hybrid methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera

86 - Chen Fu , Chiyu Dong , Christoph Mertz 2020

Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is fusion of planar LIDAR with RGB images to provide a similar level of perception capability. Even t hough state-of-the-art methods provide approaches to predict depth information from limited sensor input, they are usually a simple concatenation of sparse LIDAR features and dense RGB features through an end-to-end fusion architecture. In this paper, we introduce an inductive late-fusion block which better fuses different sensor modalities inspired by a probability model. The proposed demonstration and aggregation network propagates the mixed context and depth features to the prediction network and serves as a prior knowledge of the depth completion. This late-fusion block uses the dense context features to guide the depth prediction based on demonstrations by sparse depth features. In addition to evaluating the proposed method on benchmark depth completion datasets including NYUDepthV2 and KITTI, we also test the proposed method on a simulated planar LIDAR dataset. Our method shows promising results compared to previous approaches on both the benchmark datasets and simulated dataset with various 3D densities.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات معالجة الصور والفيديو

Learning a Domain-Agnostic Visual Representation for Autonomous Driving via Contrastive Loss

92 - Dongseok Shim , H. Jin Kim 2021

Deep neural networks have been widely studied in autonomous driving applications such as semantic segmentation or depth estimation. However, training a neural network in a supervised manner requires a large amount of annotated labels which are expens ive and time-consuming to collect. Recent studies leverage synthetic data collected from a virtual environment which are much easier to acquire and more accurate compared to data from the real world, but they usually suffer from poor generalization due to the inherent domain shift problem. In this paper, we propose a Domain-Agnostic Contrastive Learning (DACL) which is a two-stage unsupervised domain adaptation framework with cyclic adversarial training and contrastive loss. DACL leads the neural network to learn domain-agnostic representation to overcome performance degradation when there exists a difference between training and test data distribution. Our proposed approach achieves better performance in the monocular depth estimation task compared to previous state-of-the-art methods and also shows effectiveness in the semantic segmentation task.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات