Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization

Published by: Lili Meng

Publication date: 2017

Research field: Informatics Engineering

Paper language: English

Authors: Lili Meng, Frederick Tung, James J. Little

Computer Vision and Pattern Recognition, Artificial Intelligence

Abstract

Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random-forest-based methods exploit randomly sampled pixel comparison features to predict 3D world locations for 2D image locations, which then guide the camera pose optimization. However, these features are sampled randomly in the image without considering spatial structure or geometric information, leading to large errors or outright failures in poorly textured areas or under motion blur. Line segment features are more robust in these environments. In this work, we propose to jointly exploit points and lines within the framework of uncertainty-driven regression forests. The proposed approach is evaluated on three publicly available datasets against several strong state-of-the-art baselines using several error metrics. Experimental results demonstrate the efficacy of our method, showing performance superior to or on par with the state of the art.
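The "randomly sampled pixel comparison features" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the border clamping, and the use of a plain 2D list as the image are all assumptions made for illustration. Each split node compares the values at two random offsets from a pixel and routes the pixel left or right by thresholding the difference.

```python
import random


def pixel_comparison_feature(image, p, offset_a, offset_b):
    """Difference of two image values probed at offsets from pixel p = (row, col).

    Probes that fall outside the image are clamped to the border
    (an assumption of this sketch, not something stated in the abstract).
    """
    rows, cols = len(image), len(image[0])

    def probe(offset):
        r = min(max(p[0] + offset[0], 0), rows - 1)
        c = min(max(p[1] + offset[1], 0), cols - 1)
        return image[r][c]

    return probe(offset_a) - probe(offset_b)


def sample_offsets(rng, max_offset):
    """Draw the two random offsets that define one candidate split feature."""
    def draw():
        return (rng.randint(-max_offset, max_offset),
                rng.randint(-max_offset, max_offset))
    return draw(), draw()


def goes_left(image, p, offset_a, offset_b, threshold):
    """Route a pixel to the left child when its feature is below the threshold."""
    return pixel_comparison_feature(image, p, offset_a, offset_b) < threshold


if __name__ == "__main__":
    toy_image = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    a, b = sample_offsets(random.Random(0), max_offset=1)
    print(pixel_comparison_feature(toy_image, (1, 1), a, b))
```

Because the offsets are drawn without regard to image content, a probe can easily land in a textureless or blurred region, which is the weakness the abstract attributes to purely random point sampling.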

100 - Lili Meng, Jianhui Chen, Frederick Tung 2017

Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random-forest-based methods directly predict 3D world locations for 2D image locations to guide the camera pose optimization. During training, each tree greedily splits the samples to minimize the spatial variance. However, these greedy splits often produce uneven sub-trees in training or incorrect 2D-3D correspondences in testing. To address these problems, we propose a sample-balanced objective to encourage equal numbers of samples in the left and right sub-trees, and a novel backtracking scheme to remedy incorrect 2D-3D correspondence predictions. Furthermore, we extend regression-forest-based methods to use local features in both the training and testing stages for outdoor RGB-only applications. Experimental results on publicly available indoor and outdoor datasets demonstrate the efficacy of our approach, which shows accuracy superior to or on par with several state-of-the-art methods.
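The two ingredients named in this abstract, a spatial-variance split cost and a term that encourages equally sized sub-trees, can be sketched as follows. The abstract does not give the actual objective, so the weighted-sum form, the `balance_weight` parameter, and all function names here are hypothetical and serve only to show how such a trade-off might be scored.

```python
def spatial_variance(points):
    """Mean squared distance of 3D points from their centroid (0.0 if empty)."""
    n = len(points)
    if n == 0:
        return 0.0
    means = [sum(p[k] for p in points) / n for k in range(3)]
    return sum((p[k] - means[k]) ** 2 for p in points for k in range(3)) / n


def variance_cost(left, right):
    """Size-weighted spatial variance of the two children of a candidate split."""
    n = len(left) + len(right)
    return (len(left) * spatial_variance(left)
            + len(right) * spatial_variance(right)) / n


def balance_cost(left, right):
    """0.0 for a perfectly even split, approaching 1.0 as one side empties."""
    n = len(left) + len(right)
    return abs(len(left) - len(right)) / n


def split_cost(left, right, balance_weight=0.0):
    """Hypothetical combined cost: lower is better.

    balance_weight = 0.0 gives the purely greedy variance criterion.
    """
    return variance_cost(left, right) + balance_weight * balance_cost(left, right)
```

With `balance_weight` set to zero the score reduces to the greedy variance criterion described in the abstract; a positive weight penalizes candidate splits that send most samples to one side.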

Computer Vision and Pattern Recognition

Let's Take This Online: Adapting Scene Coordinate Regression Network Predictions for Online RGB-D Camera Relocalisation

145 - Tommaso Cavallari, Luca Bertinetto, Jishnu Mukhoti 2019

Many applications require a camera to be relocalised online, without expensive offline training on the target scene. Whilst both keyframe and sparse keypoint matching methods can be used online, the former often fail away from the training trajectory, and the latter can struggle in textureless regions. By contrast, scene coordinate regression (SCoRe) methods generalise to novel poses and can leverage dense correspondences to improve robustness, and recent work has shown how to adapt SCoRe forests between scenes, allowing their state-of-the-art performance to be leveraged online. However, because they use features hand-crafted for indoor use, they do not generalise well to harder outdoor scenes. Whilst replacing the forest with a neural network and learning suitable features for outdoor use is possible, the techniques used to adapt forests between scenes are unfortunately harder to transfer to a network context. In this paper, we address this by proposing a novel way of leveraging a network trained on one scene to predict points in another scene. Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both the 7-Scenes and Cambridge Landmarks datasets, whilst running in under 300ms, making it highly effective in live scenarios.

Computer Vision and Pattern Recognition, Machine Learning, Robotics

On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation

158 - Tommaso Cavallari, Stuart Golodetz, Nicholas A. Lord 2017

Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the camera pose. Recently, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but must be trained offline on the target scene, preventing relocalisation in new environments. In this paper, we show how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. Our adapted forests achieve relocalisation performance that is on par with that of offline forests, and our approach runs in under 150ms, making it desirable for real-time systems that require online relocalisation.

Computer Vision and Pattern Recognition

Single RGB-D Camera Teleoperation for General Robotic Manipulation

132 - Quan Vuong, Yuzhe Qin, Runlin Guo 2021

We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device. Our system can perform general manipulation tasks such as cloth folding, hammering, and peg-in-hole insertion with 3 mm clearance. We propose the use of a non-Cartesian oblique coordinate frame, dynamic motion scaling, and repositioning of operator frames to increase the flexibility of our teleoperation system. We hypothesize that lowering the barrier of entry to teleoperation will allow for wider deployment of supervised autonomy systems, which will in turn generate realistic datasets that unlock the potential of machine learning for robotic manipulation.

Robotics, Artificial Intelligence

Neural RGB->D Sensing: Depth and Uncertainty from a Video Camera

105 - Chao Liu, Jinwei Gu, Kihwan Kim 2019

Depth sensing is crucial for 3D reconstruction and scene understanding. Active depth sensors provide dense metric measurements, but often suffer from limitations such as restricted operating ranges, low spatial resolution, sensor interference, and high power consumption. In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. Unlike prior DL-based methods, we estimate a depth probability distribution for each pixel rather than a single depth value, leading to an estimate of a 3D depth probability volume for each input frame. These depth probability volumes are accumulated over time under a Bayesian filtering framework as more incoming frames are processed sequentially, which effectively reduces depth uncertainty and improves accuracy, robustness, and temporal stability. Compared to prior work, the proposed approach achieves more accurate and stable results, and generalizes better to new datasets. Experimental results also show the output of our approach can be directly fed into classical RGB-D based 3D scanning methods for 3D scene reconstruction.
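The accumulation of per-pixel depth distributions "under a Bayesian filtering framework" can be illustrated for a single pixel with a textbook Bayes update. This is only a sketch under strong simplifying assumptions: it treats the camera as static (no warping of the volume between frames), uses a hand-written list of depth bins, and all names are hypothetical rather than taken from the paper.

```python
def normalize(probs):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(probs)
    if total <= 0.0:
        raise ValueError("probabilities must have a positive sum")
    return [p / total for p in probs]


def bayes_update(prior, likelihood):
    """Posterior over depth bins: proportional to prior times likelihood."""
    if len(prior) != len(likelihood):
        raise ValueError("prior and likelihood must cover the same depth bins")
    return normalize([p * l for p, l in zip(prior, likelihood)])


def expected_depth(probs, depth_bins):
    """Mean depth under the current per-pixel distribution."""
    return sum(p * d for p, d in zip(probs, depth_bins))


if __name__ == "__main__":
    depth_bins = [1.0, 2.0, 3.0, 4.0]       # candidate depths in metres (assumed)
    belief = [0.25, 0.25, 0.25, 0.25]        # uninformative prior for one pixel
    frame_estimate = [0.1, 0.6, 0.2, 0.1]    # hypothetical per-frame network output
    for _ in range(3):
        belief = bayes_update(belief, frame_estimate)
    print(belief, expected_depth(belief, depth_bins))
```

Repeated consistent observations concentrate the probability mass on one depth bin, which is the sense in which accumulating frames reduces depth uncertainty.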

Computer Vision and Pattern Recognition