ﻻ يوجد ملخص باللغة العربية
In this paper, we propose a Bidirectional Attention Network (BANet), an end-to-end framework for monocular depth estimation (MDE) that addresses the limitation of effectively integrating local and global information in convolutional neural networks. The structure of this mechanism derives from a strong conceptual foundation of neural machine translation, and presents a light-weight mechanism for adaptive control of computation similar to the dynamic nature of recurrent neural networks. We introduce bidirectional attention modules that utilize the feed-forward feature maps and incorporate the global context to filter out ambiguity. Extensive experiments reveal the high degree of capability of this bidirectional attention model over feed-forward baselines and other state-of-the-art methods for monocular depth estimation on two challenging datasets -- KITTI and DIODE. We show that our proposed approach either outperforms or performs at least on a par with the state-of-the-art monocular depth estimation methods with less memory and computational complexity.
Monocular depth estimation is an especially important task in robotics and autonomous driving, where 3D structural information is essential. However, extreme lighting conditions and complex surface objects make it difficult to predict depth in a sing
Monocular depth estimation is an essential task for scene understanding. The underlying structure of objects and stuff in a complex scene is critical to recovering accurate and visually-pleasing depth maps. Global structure conveys scene layouts, whi
For a robot deployed in the world, it is desirable to have the ability of autonomous learning to improve its initial pre-set knowledge. We formalize this as a bootstrapped self-supervised learning problem where a system is initially bootstrapped with
In this work, we present FFB6D, a Full Flow Bidirectional fusion network designed for 6D pose estimation from a single RGBD image. Our key insight is that appearance information in the RGB image and geometry information from the depth image are two c
We present a novel method to train machine learning algorithms to estimate scene depths from a single image, by using the information provided by a cameras aperture as supervision. Prior works use a depth sensors outputs or images of the same scene f