ﻻ يوجد ملخص باللغة العربية
Depth completion deals with the problem of recovering dense depth maps from sparse ones, where color images are often used to facilitate this completion. Recent approaches mainly focus on image guided learning to predict dense results. However, blurry image guidance and object structures in depth still impede the performance of image guided frameworks. To tackle these problems, we explore a repetitive design in our image guided network to sufficiently and gradually recover depth values. Specifically, the repetition is embodied in a color image guidance branch and a depth generation branch. In the former branch, we design a repetitive hourglass network to extract higher-level image features of complex environments, which can provide powerful context guidance for depth prediction. In the latter branch, we design a repetitive guidance module based on dynamic convolution where the convolution factorization is applied to simultaneously reduce its complexity and progressively model high-frequency structures, e.g., boundaries. Further, in this module, we propose an adaptive fusion mechanism to effectively aggregate multi-step depth features. Extensive experiments show that our method achieves state-of-the-art result on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
Depth completion aims at inferring a dense depth image from sparse depth measurement since glossy, transparent or distant surface cannot be scanned properly by the sensor. Most of existing methods directly interpolate the missing depth measurements b
Image guided depth completion is the task of generating a dense depth map from a sparse depth map and a high quality image. In this task, how to fuse the color and depth modalities plays an important role in achieving good performance. This paper pro
This paper focuses on semantic scene completion, a task for producing a complete 3D voxel representation of volumetric occupancy and semantic labels for a scene from a single-view depth map observation. Previous work has considered scene completion a
We propose a method to reconstruct, complete and semantically label a 3D scene from a single input depth image. We improve the accuracy of the regressed semantic 3D maps by a novel architecture based on adversarial learning. In particular, we suggest
While radar and video data can be readily fused at the detection level, fusing them at the pixel level is potentially more beneficial. This is also more challenging in part due to the sparsity of radar, but also because automotive radar beams are muc