We present a novel algorithm for self-supervised monocular depth completion. Our approach trains a neural network that requires only sparse depth measurements and the corresponding monocular video sequences, without dense depth labels. Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, long and diverse depth ranges, and scenes captured with complex ego-motion. Our novel architecture leverages both deep stacks of sparse convolution blocks to extract sparse depth features and pixel-adaptive convolutions to fuse image and depth features. We compare with existing approaches on the NYUv2, KITTI, and NAVERLABS indoor datasets and observe 5-34% reductions in root-mean-square error (RMSE).
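The abstract does not define its layers, so the following is only a minimal sketch of the two ingredients it names, assuming single-channel, stride-1, zero-padded 2D grids stored as Python lists. `sparse_conv2d` is a normalized (mask-aware) convolution that averages only over valid sparse-depth pixels; `pixel_adaptive_conv2d` follows the general pixel-adaptive convolution form of Su et al. (CVPR 2019), scaling a spatial kernel `weights` by a Gaussian on the difference of guidance values (here scalar image intensities `guide`, bandwidth `sigma`). All function names, parameters, and example values are hypothetical and are not the paper's actual layers.

```python
import math


def sparse_conv2d(depth, mask, weights):
    """Normalized (mask-aware) 2D convolution over a sparse depth map.

    Hypothetical sketch: single channel, stride 1, zero-padded 'same' output.
    Only pixels with mask == 1 contribute, and each output is divided by the
    sum of kernel weights that landed on valid pixels (assumes weights >= 0).
    Returns (dense_depth, output_mask).
    """
    h, w = len(depth), len(depth[0])
    r = len(weights) // 2  # kernel radius; weights is (2r+1) x (2r+1)
    out = [[0.0] * w for _ in range(h)]
    out_mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            num, den = 0.0, 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w and mask[y][x]:
                        wt = weights[di + r][dj + r]
                        num += wt * depth[y][x]
                        den += wt
            if den > 0:  # at least one valid measurement in the window
                out[i][j] = num / den
                out_mask[i][j] = 1
    return out, out_mask


def pixel_adaptive_conv2d(values, guide, weights, sigma=1.0):
    """Pixel-adaptive convolution with a Gaussian adapting kernel.

    Hypothetical sketch: out[i][j] is the sum over the window of
        exp(-0.5 * ((guide[i][j] - guide[y][x]) / sigma) ** 2)
        * weights[di + r][dj + r] * values[y][x],
    so neighbours whose guidance differs strongly contribute little.
    Out-of-bounds neighbours are skipped (equivalent to zero padding).
    """
    h, w = len(values), len(values[0])
    r = len(weights) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        diff = (guide[i][j] - guide[y][x]) / sigma
                        adapt = math.exp(-0.5 * diff * diff)
                        acc += adapt * weights[di + r][dj + r] * values[y][x]
            out[i][j] = acc
    return out


if __name__ == "__main__":
    # Hypothetical 3x3 example: one valid depth sample of 2.0 in the centre.
    sparse = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
    valid = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    box = [[1.0] * 3 for _ in range(3)]
    dense, dense_mask = sparse_conv2d(sparse, valid, box)
    print(dense)  # every entry is 2.0: each 3x3 window contains the sample
    # Hypothetical guidance image with an intensity edge between columns 1 and 2.
    image = [[0.0, 0.0, 5.0], [0.0, 0.0, 5.0], [0.0, 0.0, 5.0]]
    fused = pixel_adaptive_conv2d(dense, image, box, sigma=1.0)
    print(fused)
```

Run as a script, the first call fills the whole grid from the single sample, and the second call aggregates depth within each window while almost ignoring neighbours across the intensity edge in `image`, since their Gaussian factor is exp(-12.5). In a trained network the fixed `box` kernel would be replaced by learned filters and many such layers would be stacked.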
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal. Whilst the networks achieve good performance, they often overlook…
For a robot deployed in the world, it is desirable to be able to learn autonomously and improve its initial pre-set knowledge. We formalize this as a bootstrapped self-supervised learning problem in which a system is initially bootstrapped with…
Modern high-definition LIDAR is expensive for commercial autonomous driving vehicles and small indoor robots. An affordable solution to this problem is to fuse planar LIDAR with RGB images to provide a similar level of perception capability. Even t…
Depth estimation, a necessary cue for lifting 2D images into 3D space, has been applied in many areas of machine vision. However, to achieve full 360-degree geometric sensing of the surroundings, traditional stereo matching algorithms for depth esti…
Previous methods for estimating detailed human depth often require supervised training with ground-truth depth data. This paper presents a self-supervised method that can be trained on YouTube videos without known depth, which makes training data col…