UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models

81 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Senthil Yogamani

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Varun Ravi Kumar - Senthil Yogamani - Markus Bach

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies the depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and sensitivity to calibration errors. The effects are particularly pronounced in case of significant distortion (e.g., wide-angle fisheye cameras). In this paper, we propose a generic scale-aware self-supervised pipeline for estimating depth, euclidean distance, and visual odometry from unrectified monocular videos. We demonstrate a similar level of precision on the unrectified KITTI dataset with barrel distortion comparable to the rectified KITTI dataset. The intuition being that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increasing complexity. Our approach does not suffer from a reduced field of view and avoids computational costs for rectification at inference time. To further illustrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with 190$^circ$ horizontal field of view. The training framework UnRectDepthNet takes in the camera distortion model as an argument and adapts projection and unprojection functions accordingly. The proposed algorithm is evaluated further on the KITTI rectified dataset, and we achieve state-of-the-art results that improve upon our previous work FisheyeDistanceNet. Qualitative results on a distorted test scene video sequence indicate excellent performance https://youtu.be/K6pbx3bU4Ss.

قيم البحث

249 - Varun Ravi Kumar , Sandesh Athni Hiremath , Stefan Milz 2019

Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view ($>180^{circ}$). However, they come at the cost of strong non-linear distortions which require more complex algorithms. In thi s paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform piece-wise linear approximation of fisheye projection surface and apply standard rectilinear models, it has its own set of issues like re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release our dataset as part of the WoodScape project cite{yogamani2019woodscape}. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance https://youtu.be/Sgq1WzoOmXg.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

R4Dyn: Exploring Radar for Self-Supervised Monocular Depth Estimation of Dynamic Scenes

168 - Stefano Gasperini , Patrick Koch , Vinzenz Dallabetta 2021

While self-supervised monocular depth estimation in driving scenarios has achieved comparable performance to supervised approaches, violations of the static world assumption can still lead to erroneous depth predictions of traffic participants, posin g a potential safety issue. In this paper, we present R4Dyn, a novel set of techniques to use cost-efficient radar data on top of a self-supervised depth estimation framework. In particular, we show how radar can be used during training as weak supervision signal, as well as an extra input to enhance the estimation robustness at inference time. Since automotive radars are readily available, this allows to collect training data from a variety of existing vehicles. Moreover, by filtering and expanding the signal to make it compatible with learning-based approaches, we address radar inherent issues, such as noise and sparsity. With R4Dyn we are able to overcome a major limitation of self-supervised depth estimation, i.e. the prediction of traffic participants. We substantially improve the estimation on dynamic objects, such as cars by 37% on the challenging nuScenes dataset, hence demonstrating that radar is a valuable additional sensor for monocular depth estimation in autonomous vehicles. Additionally, we plan on making the code publicly available.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Monocular Depth Estimation with Self-supervised Instance Adaptation

91 - Robert McCraith , Lukas Neumann , Andrew Zisserman 2020

Recent advances in self-supervised learning havedemonstrated that it is possible to learn accurate monoculardepth reconstruction from raw video data, without using any 3Dground truth for supervision. However, in robotics applications,multiple views o f a scene may or may not be available, depend-ing on the actions of the robot, switching between monocularand multi-view reconstruction. To address this mixed setting,we proposed a new approach that extends any off-the-shelfself-supervised monocular depth reconstruction system to usemore than one image at test time. Our method builds on astandard prior learned to perform monocular reconstruction,but uses self-supervision at test time to further improve thereconstruction accuracy when multiple images are available.When used to update the correct components of the model, thisapproach is highly-effective. On the standard KITTI bench-mark, our self-supervised method consistently outperformsall the previous methods with an average 25% reduction inabsolute error for the three common setups (monocular, stereoand monocular+stereo), and comes very close in accuracy whencompared to the fully-supervised state-of-the-art methods.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

Calibrating Self-supervised Monocular Depth Estimation

139 - Robert McCraith , Lukas Neumann , Andrea Vedaldi 2020

In the recent years, many methods demonstrated the ability of neural networks tolearn depth and pose changes in a sequence of images, using only self-supervision as thetraining signal. Whilst the networks achieve good performance, the often over-look eddetail is that due to the inherent ambiguity of monocular vision they predict depth up to aunknown scaling factor. The scaling factor is then typically obtained from the LiDARground truth at test time, which severely limits practical applications of these methods.In this paper, we show that incorporating prior information about the camera configu-ration and the environment, we can remove the scale ambiguity and predict depth directly,still using the self-supervised formulation and not relying on any additional sensors.

الرؤية الحاسوبية وتمييز الأنماط

RealMonoDepth: Self-Supervised Monocular Depth Estimation for General Scenes

168 - Mertalp Ocal , Armin Mustafa 2020

We present a generalised self-supervised learning approach for monocular estimation of the real depth across scenes with diverse depth ranges from 1--100s of meters. Existing supervised methods for monocular depth estimation require accurate depth me asurements for training. This limitation has led to the introduction of self-supervised methods that are trained on stereo image pairs with a fixed camera baseline to estimate disparity which is transformed to depth given known calibration. Self-supervised approaches have demonstrated impressive results but do not generalise to scenes with different depth ranges or camera baselines. In this paper, we introduce RealMonoDepth a self-supervised monocular depth estimation approach which learns to estimate the real scene depth for a diverse range of indoor and outdoor scenes. A novel loss function with respect to the true scene depth based on relative depth scaling and warping is proposed. This allows self-supervised training of a single network with multiple data sets for scenes with diverse depth ranges from both stereo pair and in the wild moving camera data sets. A comprehensive performance evaluation across five benchmark data sets demonstrates that RealMonoDepth provides a single trained network which generalises depth estimation across indoor and outdoor scenes, consistently outperforming previous self-supervised approaches.

الرؤية الحاسوبية وتمييز الأنماط