Do you want to publish a course? Click here

Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization

83   0   0.0 ( 0 )
 Added by Lili Meng
 Publication date 2017
and research's language is English




Ask ChatGPT about the research

Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure and loop closure detection. Recent random forests based methods exploit randomly sampled pixel comparison features to predict 3D world locations for 2D image locations to guide the camera pose optimization. However, these image features are only sampled randomly in the images, without considering the spatial structures or geometric information, leading to large errors or failure cases with the existence of poorly textured areas or in motion blur. Line segment features are more robust in these environments. In this work, we propose to jointly exploit points and lines within the framework of uncertainty driven regression forests. The proposed approach is thoroughly evaluated on three publicly available datasets against several strong state-of-the-art baselines in terms of several different error metrics. Experimental results prove the efficacy of our method, showing superior or on-par state-of-the-art performance.



rate research

Read More

Camera relocalization plays a vital role in many robotics and computer vision tasks, such as global localization, recovery from tracking failure, and loop closure detection. Recent random forests based methods directly predict 3D world locations for 2D image locations to guide the camera pose optimization. During training, each tree greedily splits the samples to minimize the spatial variance. However, these greedy splits often produce uneven sub-trees in training or incorrect 2D-3D correspondences in testing. To address these problems, we propose a sample-balanced objective to encourage equal numbers of samples in the left and right sub-trees, and a novel backtracking scheme to remedy the incorrect 2D-3D correspondence predictions. Furthermore, we extend the regression forests based methods to use local features in both training and testing stages for outdoor RGB-only applications. Experimental results on publicly available indoor and outdoor datasets demonstrate the efficacy of our approach, which shows superior or on-par accuracy with several state-of-the-art methods.
Many applications require a camera to be relocalised online, without expensive offline training on the target scene. Whilst both keyframe and sparse keypoint matching methods can be used online, the former often fail away from the training trajectory, and the latter can struggle in textureless regions. By contrast, scene coordinate regression (SCoRe) methods generalise to novel poses and can leverage dense correspondences to improve robustness, and recent work has shown how to adapt SCoRe forests between scenes, allowing their state-of-the-art performance to be leveraged online. However, because they use features hand-crafted for indoor use, they do not generalise well to harder outdoor scenes. Whilst replacing the forest with a neural network and learning suitable features for outdoor use is possible, the techniques used to adapt forests between scenes are unfortunately harder to transfer to a network context. In this paper, we address this by proposing a novel way of leveraging a network trained on one scene to predict points in another scene. Our approach replaces the appearance clustering performed by the branching structure of a regression forest with a two-step process that first uses the network to predict points in the original scene, and then uses these predicted points to look up clusters of points from the new scene. We show experimentally that our online approach achieves state-of-the-art performance on both the 7-Scenes and Cambridge Landmarks datasets, whilst running in under 300ms, making it highly effective in live scenarios.
Camera relocalisation is an important problem in computer vision, with applications in simultaneous localisation and mapping, virtual/augmented reality and navigation. Common techniques either match the current image against keyframes with known poses coming from a tracker, or establish 2D-to-3D correspondences between keypoints in the current image and points in the scene in order to estimate the camera pose. Recently, regression forests have become a popular alternative to establish such correspondences. They achieve accurate results, but must be trained offline on the target scene, preventing relocalisation in new environments. In this paper, we show how to circumvent this limitation by adapting a pre-trained forest to a new scene on the fly. Our adapted forests achieve relocalisation performance that is on par with that of offline forests, and our approach runs in under 150ms, making it desirable for real-time systems that require online relocalisation.
We propose a teleoperation system that uses a single RGB-D camera as the human motion capture device. Our system can perform general manipulation tasks such as cloth folding, hammering and 3mm clearance peg in hole. We propose the use of non-Cartesian oblique coordinate frame, dynamic motion scaling and reposition of operator frames to increase the flexibility of our teleoperation system. We hypothesize that lowering the barrier of entry to teleoperation will allow for wider deployment of supervised autonomy system, which will in turn generates realistic datasets that unlock the potential of machine learning for robotic manipulation.
105 - Chao Liu , Jinwei Gu , Kihwan Kim 2019
Depth sensing is crucial for 3D reconstruction and scene understanding. Active depth sensors provide dense metric measurements, but often suffer from limitations such as restricted operating ranges, low spatial resolution, sensor interference, and high power consumption. In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. Unlike prior DL-based methods, we estimate a depth probability distribution for each pixel rather than a single depth value, leading to an estimate of a 3D depth probability volume for each input frame. These depth probability volumes are accumulated over time under a Bayesian filtering framework as more incoming frames are processed sequentially, which effectively reduces depth uncertainty and improves accuracy, robustness, and temporal stability. Compared to prior work, the proposed approach achieves more accurate and stable results, and generalizes better to new datasets. Experimental results also show the output of our approach can be directly fed into classical RGB-D based 3D scanning methods for 3D scene reconstruction.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا