Incorporating Learnt Local and Global Embeddings into Monocular Visual SLAM

115 0 0.0 ( 0 )

Download Cite

Added by Huaiyang Huang

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Huaiyang Huang - Haoyang Ye - Yuxiang Sun

Robotics

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Traditional approaches for Visual Simultaneous Localization and Mapping (VSLAM) rely on low-level vision information for state estimation, such as handcrafted local features or the image gradient. While significant progress has been made through this track, under more challenging configuration for monocular VSLAM, e.g., varying illumination, the performance of state-of-the-art systems generally degrades. As a consequence, robustness and accuracy for monocular VSLAM are still widely concerned. This paper presents a monocular VSLAM system that fully exploits learnt features for better state estimation. The proposed system leverages both learnt local features and global embeddings at different modules of the system: direct camera pose estimation, inter-frame feature association, and loop closure detection. With a probabilistic explanation of keypoint prediction, we formulate the camera pose tracking in a direct manner and parameterize local features with uncertainty taken into account. To alleviate the quantization effect, we adapt the mapping module to generate 3D landmarks better to guarantee the systems robustness. Detecting temporal loop closure via deep global embeddings further improves the robustness and accuracy of the proposed system. The proposed system is extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI), and compared against the state-of-the-art methods. The competitive performance of camera pose estimation confirms the effectiveness of our method.

rate research

Integrating Objects into Monocular SLAM: Line Based Category Specific Models

79 - Nayan Joshi , Yogesh Sharma , Parv Parkhiya 2019

We propose a novel Line based parameterization for category specific CAD models. The proposed parameterization associates 3D category-specific CAD model and object under consideration using a dictionary based RANSAC method that uses object Viewpoints as prior and edges detected in the respective intensity image of the scene. The association problem is posed as a classical Geometry problem rather than being dataset driven, thus saving the time and labour that one invests in annotating dataset to train Keypoint Network for different category objects. Besides eliminating the need of dataset preparation, the approach also speeds up the entire process as this method processes the image only once for all objects, thus eliminating the need of invoking the network for every object in an image across all images. A 3D-2D edge association module followed by a resection algorithm for lines is used to recover object poses. The formulation optimizes for shape and pose of the object, thus aiding in recovering object 3D structure more accurately. Finally, a Factor Graph formulation is used to combine object poses with camera odometry to formulate a SLAM problem.

Robotics

Structure-SLAM: Low-Drift Monocular SLAM in Indoor Environments

351 - Yanyan Li , Nikolas Brasch , Yida Wang 2020

In this paper a low-drift monocular SLAM method is proposed targeting indoor scenarios, where monocular SLAM often fails due to the lack of textured surfaces. Our approach decouples rotation and translation estimation of the tracking process to reduce the long-term drift in indoor environments. In order to take full advantage of the available geometric information in the scene, surface normals are predicted by a convolutional neural network from each input RGB image in real-time. First, a drift-free rotation is estimated based on lines and surface normals using spherical mean-shift clustering, leveraging the weak Manhattan World assumption. Then translation is computed from point and line features. Finally, the estimated poses are refined with a map-to-frame optimization strategy. The proposed method outperforms the state of the art on common SLAM benchmarks such as ICL-NUIM and TUM RGB-D.

Robotics

Attention-SLAM: A Visual Monocular SLAM Learning from Human Gaze

118 - Jinquan Li , Ling Pei , Danping Zou 2020

This paper proposes a novel simultaneous localization and mapping (SLAM) approach, namely Attention-SLAM, which simulates human navigation mode by combining a visual saliency model (SalNavNet) with traditional monocular visual SLAM. Most SLAM methods treat all the features extracted from the images as equal importance during the optimization process. However, the salient feature points in scenes have more significant influence during the human navigation process. Therefore, we first propose a visual saliency model called SalVavNet in which we introduce a correlation module and propose an adaptive Exponential Moving Average (EMA) module. These modules mitigate the center bias to enable the saliency maps generated by SalNavNet to pay more attention to the same salient object. Moreover, the saliency maps simulate the human behavior for the refinement of SLAM results. The feature points extracted from the salient regions have greater importance in optimization process. We add semantic saliency information to the Euroc dataset to generate an open-source saliency SLAM dataset. Comprehensive test results prove that Attention-SLAM outperforms benchmarks such as Direct Sparse Odometry (DSO), ORB-SLAM, and Salient DSO in terms of efficiency, accuracy, and robustness in most test cases.

Computer Vision and Pattern Recognition Robotics

RT-SLAM: A Generic and Real-Time Visual SLAM Implementation

421 - Cyril Roussillon 2012

This article presents a new open-source C++ implementation to solve the SLAM problem, which is focused on genericity, versatility and high execution speed. It is based on an original object oriented architecture, that allows the combination of numerous sensors and landmark types, and the integration of various approaches proposed in the literature. The system capacities are illustrated by the presentation of an inertial/vision SLAM approach, for which several improvements over existing methods have been introduced, and that copes with very high dynamic motions. Results with a hand-held camera are presented.

Robotics

VIR-SLAM: Visual, Inertial, and Ranging SLAM for single and multi-robot systems

120 - Yanjun Cao , Giovanni Beltrame 2020

Monocular cameras coupled with inertial measurements generally give high performance visual inertial odometry. However, drift can be significant with long trajectories, especially when the environment is visually challenging. In this paper, we propose a system that leverages ultra-wideband ranging with one static anchor placed in the environment to correct the accumulated error whenever the anchor is visible. We also use this setup for collaborative SLAM: different robots use mutual ranging (when available) and the common anchor to estimate the transformation between each other, facilitating map fusion Our system consists of two modules: a double layer ranging, visual, and inertial odometry for single robots, and a transformation estimation module for collaborative SLAM. We test our system on public datasets by simulating an ultra-wideband sensor as well as on real robots. Experiments show our method can outperform state-of-the-art visual-inertial odometry by more than 20%. For visually challenging environments, our method works even the visual-inertial odometry has significant drift Furthermore, we can compute the collaborative SLAM transformation matrix at almost no extra computation cost.

Robotics