We present a multi-camera visual-inertial odometry system based on factor graph optimization which estimates motion by using all cameras simultaneously while retaining a fixed overall feature budget. We focus on motion tracking in challenging environments such as narrow corridors and dark spaces with aggressive motions and abrupt lighting changes. These scenarios cause traditional monocular or stereo odometry to fail. While tracking motion with additional cameras should in principle prevent such failures, it introduces additional complexity and computational burden. To overcome these challenges, we introduce two novel methods to improve multi-camera feature tracking. First, instead of tracking features separately in each camera, we track features continuously as they move from one camera to another. This increases accuracy and yields a more compact factor graph representation. Second, we select a fixed budget of tracked features spread across the cameras to ensure that the limited computational budget is never exceeded. We found that using a smaller set of informative features maintains the same tracking accuracy while reducing back-end optimization time. Our proposed method was extensively tested using a hardware-synchronized device containing an IMU and four cameras (a front-facing stereo pair and two lateral cameras) in scenarios including an underground mine, large open spaces, and building interiors with narrow stairs and corridors. Compared to stereo-only state-of-the-art VIO methods, our approach reduces the drift rate (relative pose error, RPE) by up to 80% in translation and 39% in rotation.
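The fixed feature budget can be viewed as a simple allocation problem across cameras. The sketch below is not the authors' implementation; the function name, the proportional allocation rule, and the per-camera floor are illustrative assumptions intended only to make the idea concrete.

```python
import numpy as np

def allocate_feature_budget(track_counts, total_budget=300, min_per_cam=20):
    """Return how many features each camera may keep this frame.

    track_counts : number of currently well-tracked features per camera
    total_budget : fixed overall number of features sent to the back-end
    min_per_cam  : floor so no camera is completely starved of features
    """
    counts = np.asarray(track_counts, dtype=float)
    n_cams = len(counts)
    # Guarantee the floor first, then split the remainder proportionally.
    remaining = total_budget - min_per_cam * n_cams
    weights = counts / counts.sum() if counts.sum() > 0 else np.full(n_cams, 1.0 / n_cams)
    extra = np.floor(weights * remaining).astype(int)
    alloc = min_per_cam + extra
    # Hand rounding leftovers to the cameras with the most tracks.
    leftover = total_budget - alloc.sum()
    for i in np.argsort(-counts)[:leftover]:
        alloc[i] += 1
    return alloc.tolist()

print(allocate_feature_budget([120, 45, 30, 10]))  # -> [149, 69, 52, 30]
```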
We present an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph. The system runs in real time at the full camera framerate using fixed-lag smoothing. To enable such tight integration, a new method to extract 3D line and planar primitives from lidar point clouds is presented. This approach overcomes the suboptimality of typical frame-to-frame tracking methods by treating the primitives as landmarks and tracking them over multiple scans. True integration of lidar features with standard visual features and IMU measurements is made possible by a subtle passive synchronization of lidar and camera frames. The lightweight formulation of the 3D features allows for real-time execution on a single CPU. Our proposed system has been tested on a variety of platforms and scenarios, including underground exploration with a legged robot and outdoor scanning with a dynamically moving handheld device, for a total duration of 96 min and a traveled distance of 2.4 km. In these test sequences, using only one exteroceptive sensor leads to failure due to either underconstrained geometry (affecting lidar) or textureless areas caused by aggressive lighting changes (affecting vision). In these conditions, our factor graph naturally uses the best information available from each sensor modality without any hard switches.
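As a rough illustration of extracting a planar primitive that can then be tracked as a landmark, the following is a generic RANSAC plane fit over a lidar scan, parametrizing the plane by a unit normal and an offset. It is a minimal stand-in under generic assumptions, not the extraction method proposed in the paper.

```python
import numpy as np

def ransac_plane(points, iters=200, inlier_thresh=0.05, seed=0):
    """points: (N, 3) lidar returns. Returns (unit normal n, offset d, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:                      # skip degenerate (collinear) samples
            continue
        normal /= norm
        d = -normal @ p0                     # plane model: n . x + d = 0
        inliers = np.abs(points @ normal + d) < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane[0], best_plane[1], best_inliers
```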
We present VILENS (Visual Inertial Lidar Legged Navigation System), an odometry system for legged robots based on factor graphs. The key novelty is the tight fusion of four different sensor modalities to achieve reliable operation when the individual sensors would otherwise produce degenerate estimates. To minimize leg odometry drift, we extend the robot's state with a linear velocity bias term which is estimated online. This bias is observable only because the preintegrated velocity factor is tightly fused with the vision, lidar, and IMU factors. Extensive experimental validation on ANYmal quadruped robots is presented, for a total duration of 2 h and a traveled distance of 1.8 km. The experiments involved dynamic locomotion over loose rocks, slopes, and mud; they included perceptual challenges, such as dark and dusty underground caverns or open, feature-deprived areas, as well as mobility challenges such as slipping and terrain deformation. We show an average improvement of 62% in translational and 51% in rotational error compared to a state-of-the-art loosely coupled approach. To demonstrate its robustness, VILENS was also integrated with a perceptive controller and a local path planner.
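A highly simplified view of the velocity-bias idea: model the leg-odometry velocity as the true base velocity plus a slowly varying bias, penalize disagreement with the current estimate, and tie consecutive bias states together with a random-walk prior. The residual functions below are didactic assumptions, not the preintegrated velocity factor used in VILENS.

```python
import numpy as np

def leg_velocity_residual(v_measured, v_state, bias_state, sqrt_info=np.eye(3)):
    """Residual of one leg-odometry velocity factor.

    v_measured : (3,) base velocity from leg kinematics (corrupted by slip, deformation)
    v_state    : (3,) current estimate of the base linear velocity
    bias_state : (3,) online-estimated linear velocity bias
    sqrt_info  : (3, 3) square-root information (whitening) matrix
    """
    return sqrt_info @ (v_measured - (v_state + bias_state))

def bias_random_walk_residual(bias_k, bias_k1, sqrt_info_rw=np.eye(3)):
    """Random-walk prior tying consecutive velocity-bias states together."""
    return sqrt_info_rw @ (bias_k1 - bias_k)
```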
Advances in micro-electro-mechanical systems (MEMS) have made inertial measurement units (IMUs) small, cheap, and energy efficient, and they are now widely used in smartphones, robots, and drones. Exploiting inertial data for accurate and reliable navigation and localization has attracted significant research and industrial interest, as IMU measurements are completely ego-centric and generally environment agnostic. Recent studies have shown that the notorious issue of drift can be significantly alleviated by using deep neural networks (DNNs), e.g. IONet. However, the lack of sufficient labelled data for training and testing various architectures limits the adoption of DNNs for IMU-based tasks. In this paper, we propose and release the Oxford Inertial Odometry Dataset (OxIOD), a first-of-its-kind data collection for inertial-odometry research, with all sequences having ground-truth labels. Our dataset contains 158 sequences covering more than 42 km in total distance, much larger than previous inertial datasets. Another notable feature of the dataset is its diversity, reflecting the complex motions of phone-based IMUs in everyday usage. The measurements were collected with four different attachments (handheld, in the pocket, in the handbag, and on a trolley), four motion modes (halting, walking slowly, walking normally, and running), five different users, and four types of off-the-shelf consumer phones, and include large-scale localization sequences in office buildings. Deep inertial tracking experiments were conducted to show the effectiveness of our dataset in training deep neural network models and to evaluate both learning-based and model-based algorithms. The OxIOD Dataset is available at: http://deepio.cs.ox.ac.uk
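To see why learned approaches such as IONet are attractive, consider naive strapdown dead reckoning: a small constant accelerometer bias alone makes the position error grow quadratically with time. The toy example below is not part of OxIOD; the 0.02 m/s^2 bias value is an arbitrary assumption.

```python
dt, T = 0.01, 60.0          # 100 Hz IMU, one minute of standing still
bias = 0.02                 # assumed constant accelerometer bias [m/s^2]
v = p = 0.0
for _ in range(int(T / dt)):
    a_meas = 0.0 + bias     # true acceleration is zero; only the bias remains
    v += a_meas * dt        # naive strapdown integration: acceleration -> velocity
    p += v * dt             # velocity -> position
print(f"position drift after {T:.0f} s: {p:.1f} m")   # ~0.5 * bias * T^2 = 36.0 m
```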
This work proposes a novel SLAM framework for stereo and visual-inertial odometry estimation. It builds an efficient and robust parametrization of co-planar points and lines which leverages specific geometric constraints to improve camera pose optimization in terms of both efficiency and accuracy. The pipeline consists of extracting 2D points and lines, predicting planar regions, and filtering the outliers via RANSAC. Our parametrization scheme then represents co-planar points and lines by their 2D image coordinates and the parameters of their supporting planes. We demonstrate the effectiveness of the proposed method by comparing it to traditional parametrizations in a novel set of Monte-Carlo simulations. Further, the whole stereo SLAM and VIO system is compared with state-of-the-art methods on the public real-world dataset EuRoC. Our method shows better results than the state-of-the-art in terms of both accuracy and efficiency. The code is released at https://github.com/LiXin97/Co-Planar-Parametrization.
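The intuition behind the co-planar parametrization can be illustrated with a simple back-projection: once a feature is known to lie on a plane n . X + d = 0 expressed in the camera frame, its 3D position is fully determined by its 2D image coordinates and the plane parameters. The sketch below, including the example intrinsics, is illustrative only and does not reproduce the paper's exact parametrization.

```python
import numpy as np

def backproject_onto_plane(uv, K, n, d):
    """uv: (2,) pixel; K: (3, 3) intrinsics; plane n . X + d = 0 in the camera frame."""
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])   # bearing through the pixel
    depth = -d / (n @ ray)                # scale so that n . (depth * ray) + d = 0
    return depth * ray

# Hypothetical intrinsics and a plane 2 m in front of the camera (z = 2).
K = np.array([[458.0, 0.0, 367.0], [0.0, 457.0, 248.0], [0.0, 0.0, 1.0]])
n, d = np.array([0.0, 0.0, 1.0]), -2.0
print(backproject_onto_plane(np.array([400.0, 260.0]), K, n, d))  # -> approx [0.144 0.053 2.0]
```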
Motion estimation by fusing data from at least a camera and an Inertial Measurement Unit (IMU) enables many applications in robotics. However, among the multitude of Visual Inertial Odometry (VIO) methods, few efficiently estimate device motion with consistent covariance and calibrate sensor parameters online to handle data from consumer sensors. This paper addresses the gap with a Keyframe-based Structureless Filter (KSF). For efficiency, landmarks are not included in the filter's state vector. For robustness, KSF associates feature observations and manages state variables using the concept of keyframes. For flexibility, KSF supports anytime calibration of IMU systematic errors, as well as extrinsic, intrinsic, and temporal parameters of each camera. Estimator consistency and observability of sensor parameters were analyzed by simulation. Sensitivity to design options, e.g., the feature matching method and camera count, was studied on the EuRoC benchmark. Sensor parameter estimation was evaluated on raw TUM VI sequences and smartphone data. Moreover, pose estimation accuracy was evaluated on EuRoC and TUM VI sequences against recent VIO methods. These tests confirm that KSF reliably calibrates sensor parameters when the data contain adequate motion, and consistently estimates motion with accuracy rivaling recent VIO methods. Our implementation runs at 42 Hz with stereo camera images on a consumer laptop.
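The structureless idea, i.e., keeping landmarks out of the filter's state vector, is commonly realized by projecting the linearized feature measurements onto the left null space of the landmark Jacobian, as in MSCKF-style filters. The sketch below shows that generic projection; KSF's exact formulation may differ.

```python
import numpy as np

def marginalize_landmark(H_x, H_f, r):
    """H_x: (m, n) Jacobian w.r.t. the state, H_f: (m, 3) Jacobian w.r.t. the landmark,
    r: (m,) stacked residual. Returns the landmark-free measurement (H0, r0)."""
    U, S, _ = np.linalg.svd(H_f, full_matrices=True)
    rank = int(np.sum(S > 1e-10))
    N = U[:, rank:]                 # columns span the left null space: N.T @ H_f = 0
    return N.T @ H_x, N.T @ r       # use (H0, r0) in the filter update
```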