Modern robotic systems sense the environment geometrically, through sensors like cameras, lidar, and sonar, as well as semantically, often through visual models learned from data, such as object detectors. We aim to develop robots that can use all of these sources of information for reliable navigation, but each is corrupted by noise. Rather than assume that object detection will eventually achieve near-perfect performance across the lifetime of a robot, in this work we represent and cope with the semantic and geometric uncertainty inherent in methods like object detection. Specifically, we model data association ambiguity, which is typically non-Gaussian, in a way that is amenable to solution within the common nonlinear Gaussian formulation of simultaneous localization and mapping (SLAM). We do so by eliminating data association variables from the inference process through max-marginalization, preserving standard Gaussian posterior assumptions. The result is a max-mixture-type model that accounts for multiple data association hypotheses as well as incorrect loop closures. We provide experimental results on indoor and outdoor semantic navigation tasks with noisy odometry and object detection and find that the ability of the proposed approach to represent multiple hypotheses, including the null hypothesis, gives substantial robustness advantages in comparison to alternative semantic SLAM approaches.
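The max-marginalization step can be illustrated concretely. Below is a minimal sketch (not the paper's implementation) of evaluating a max-mixture factor over data-association hypotheses, including a down-weighted null hypothesis; the function names, uniform hypothesis weights, and broad zero-centered null component are all illustrative assumptions:

```python
import numpy as np

def nll_gaussian(r, cov):
    """Negative log-likelihood of residual r under a zero-mean Gaussian."""
    maha = float(r @ np.linalg.solve(cov, r))
    logdet = np.linalg.slogdet(cov)[1]
    return 0.5 * (maha + logdet + len(r) * np.log(2.0 * np.pi))

def max_mixture_nll(z, landmarks, cov, w_null=1e-3, cov_null=None):
    """Max-marginalize the discrete association variable: keep only the
    most likely hypothesis (one per candidate landmark, plus a broad null
    hypothesis for outliers), so the factor stays compatible with
    Gaussian nonlinear least-squares SLAM."""
    if cov_null is None:
        cov_null = 1e4 * np.eye(len(z))          # broad outlier model
    w_assoc = (1.0 - w_null) / len(landmarks)    # uniform hypothesis prior
    hyps = [-np.log(w_assoc) + nll_gaussian(z - l, cov) for l in landmarks]
    hyps.append(-np.log(w_null) + nll_gaussian(z, cov_null))
    return min(hyps)

# A detection near landmark (2, 1) selects that hypothesis; a detection far
# from every landmark falls back to the null instead of a wrong closure.
z = np.array([2.1, 0.9])
landmarks = [np.array([2.0, 1.0]), np.array([5.0, 5.0])]
print(max_mixture_nll(z, landmarks, cov=0.05 * np.eye(2)))
```

Taking the minimum over hypothesis negative log-likelihoods, rather than summing the mixture, is what preserves a piecewise-Gaussian cost that standard least-squares back ends can optimize.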
Simultaneous localization and mapping (SLAM) in real indoor environments remains a challenging task. Traditional SLAM approaches rely heavily on low-level geometric features such as corners and lines, which may lead to tracking failure in textureless surroundings or in cluttered environments with dynamic objects. In this paper, a compact semantic SLAM framework is proposed; by jointly exploiting both geometric and object-level semantic constraints, it obtains more consistent mapping results and more accurate pose estimates. Two main contributions are presented in the paper: (a) a robust and efficient SLAM data association and optimization framework that models both discrete semantic labeling and continuous pose, and (b) a compact map representation combining a 2D lidar map with object detections. Experiments on public indoor datasets (TUM-RGBD and ICL-NUIM) and on our own collected datasets demonstrate improved SLAM robustness and accuracy compared to other popular SLAM systems, while also achieving efficient map maintenance.
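The abstract does not detail the joint discrete/continuous optimization; as a rough, hard-EM-style illustration of alternating between discrete data association and continuous landmark estimation (2-D point landmarks, nearest-neighbor assignment, and the function name are simplifying assumptions, not the paper's formulation):

```python
import numpy as np

def alternate_da_and_mapping(meas_world, init_landmarks, iters=10):
    """Hard-EM-style coordinate descent: (1) discrete step, assign each
    measurement to its nearest landmark; (2) continuous step, re-estimate
    each landmark as the mean of its assigned measurements (the Gaussian
    maximum-likelihood solution given the associations)."""
    landmarks = [np.asarray(l, dtype=float) for l in init_landmarks]
    assign = []
    for _ in range(iters):
        assign = [int(np.argmin([np.linalg.norm(z - l) for l in landmarks]))
                  for z in meas_world]
        for j in range(len(landmarks)):
            zs = [z for z, a in zip(meas_world, assign) if a == j]
            if zs:
                landmarks[j] = np.mean(zs, axis=0)
    return landmarks, assign

meas = [np.array([1.0, 0.1]), np.array([1.1, -0.1]), np.array([4.0, 0.2])]
lms, assign = alternate_da_and_mapping(meas, [np.zeros(2), np.array([5.0, 0.0])])
print(lms, assign)   # two clusters recovered: ~(1.05, 0) and (4.0, 0.2)
```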
We develop a belief space planning (BSP) approach that advances the state of the art by incorporating reasoning about data association (DA) within planning, while considering additional sources of uncertainty. Existing BSP approaches typically assume data association is given and perfect, an assumption that is hard to justify when operating, in the presence of localization uncertainty, in ambiguous and perceptually aliased environments. In contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about DA within belief evolution, and as such can better accommodate these challenging real-world scenarios. In particular, we show that due to perceptual aliasing the posterior belief becomes a mixture of probability density functions, and we design cost functions that measure the expected level of ambiguity and posterior uncertainty. Using these together with standard costs (e.g., control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to robust active perception and autonomous navigation in perceptually aliased environments. We demonstrate key aspects in basic and realistic simulations.
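To make the mixture-posterior idea concrete, here is a small sketch of two candidate cost terms consistent with the abstract's description: entropy over data-association hypothesis weights as an ambiguity measure, and a weight-averaged Gaussian entropy as a posterior-uncertainty measure. These specific formulas are illustrative choices, not necessarily the paper's:

```python
import numpy as np

def ambiguity_cost(weights):
    """Entropy of the data-association hypothesis weights: near zero when
    one hypothesis dominates, maximal (ln K) when all K are equally likely."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    w = w[w > 0]
    return float(-np.sum(w * np.log(w)))

def posterior_uncertainty_cost(covs, weights):
    """Weight-averaged Gaussian differential entropy over the mixture
    components of the posterior belief."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    ents = [0.5 * np.linalg.slogdet(2.0 * np.pi * np.e * C)[1] for C in covs]
    return float(np.dot(w, ents))

# Perceptual aliasing: two near-equal hypotheses -> ambiguity near ln 2.
print(ambiguity_cost([0.52, 0.48]))   # ~0.692
print(ambiguity_cost([0.99, 0.01]))   # ~0.056
```

An action that collapses the hypothesis weights toward a single dominant mode drives the first cost toward zero, which is exactly the active-disambiguation behavior the objective is designed to reward.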
This paper presents Kimera-Multi, the first multi-robot system that (i) is robust and capable of identifying and rejecting incorrect inter- and intra-robot loop closures resulting from perceptual aliasing, (ii) is fully distributed and only relies on local (peer-to-peer) communication to achieve distributed localization and mapping, and (iii) builds a globally consistent metric-semantic 3D mesh model of the environment in real-time, where faces of the mesh are annotated with semantic labels. Kimera-Multi is implemented by a team of robots equipped with visual-inertial sensors. Each robot builds a local trajectory estimate and a local mesh using Kimera. When communication is available, robots initiate a distributed place recognition and robust pose graph optimization protocol based on a novel distributed graduated non-convexity algorithm. The proposed protocol allows the robots to improve their local trajectory estimates by leveraging inter-robot loop closures while being robust to outliers. Finally, each robot uses its improved trajectory estimate to correct the local mesh using mesh deformation techniques. We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots. Both real and simulated experiments involve long trajectories (e.g., up to 800 meters per robot). The experiments show that Kimera-Multi (i) outperforms the state of the art in terms of robustness and accuracy, (ii) achieves estimation errors comparable to a centralized SLAM system while being fully distributed, (iii) is parsimonious in terms of communication bandwidth, (iv) produces accurate metric-semantic 3D meshes, and (v) is modular and can also be used for standard 3D reconstruction (i.e., without semantic labels) or for trajectory estimation (i.e., without reconstructing a 3D mesh).
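The distributed variant of the algorithm cannot be reconstructed from the abstract, but the centralized graduated non-convexity (GNC) weight update for truncated least squares that such robust pose graph optimization builds on (Yang et al., RA-L 2020) can be sketched; the threshold and schedule values below are conventional defaults, not Kimera-Multi's:

```python
import numpy as np

def gnc_tls_weights(residuals, barc2, mu):
    """Closed-form GNC weight update for truncated least squares: weight 1
    for confident inliers, 0 for confident outliers, and a smooth
    transition whose width shrinks as the non-convexity parameter mu grows
    (a common schedule is mu *= 1.4 after each weighted solve)."""
    r2 = np.maximum(np.asarray(residuals, dtype=float) ** 2, 1e-12)
    lower = mu / (mu + 1.0) * barc2          # below: certain inlier
    upper = (mu + 1.0) / mu * barc2          # above: certain outlier
    w = np.sqrt(barc2 * mu * (mu + 1.0) / r2) - mu
    w = np.where(r2 <= lower, 1.0, np.where(r2 >= upper, 0.0, w))
    return np.clip(w, 0.0, 1.0)

# One gross outlier among four loop-closure residuals; as mu grows, the
# weights harden from a soft convex surrogate toward the binary
# truncated-least-squares decision.
r = np.array([0.1, 0.2, 3.0, 0.15])
for mu in (0.05, 1.0, 50.0):
    print(mu, gnc_tls_weights(r, barc2=0.25, mu=mu))
```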
We study a semantic SLAM problem faced by a robot tasked with autonomous weeding under the corn canopy. The goal is to detect corn stalks and localize them in a global coordinate frame. This is a challenging setup for existing algorithms because there is very little space between the camera and the plants, and the camera motion is primarily restricted to be along the row. To overcome these challenges, we present a multi-camera system where a side camera (facing the plants) is used for detection whereas front and back cameras are used for motion estimation. Next, we show how semantic features in the environment (corn stalks, ground, and crop planes) can be used to develop a robust semantic SLAM solution and present results from field trials performed throughout the growing season across various cornfields.
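The abstract only states that plane-like semantic features (ground and crop planes) constrain the solution; one standard way such features enter a least-squares back end is via point-to-plane residuals, sketched below under simplifying assumptions (the paper's actual factor formulation may differ):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3-D points: centroid plus the direction
    of smallest variance (last right singular vector) as the normal."""
    c = points.mean(axis=0)
    n = np.linalg.svd(points - c)[2][-1]
    return n, -float(n @ c)                  # plane: n . x + d = 0

def point_to_plane_residuals(points, n, d):
    """Signed point-to-plane distances, usable as residuals of a semantic
    planar constraint (ground or crop plane) in a least-squares solver."""
    return points @ n + d

# Noisy samples of the ground plane z = 0.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.01])
n, d = fit_plane(pts)
print(n, d)   # normal near +/-(0, 0, 1), offset near 0
```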
In the field of semantic SLAM, how to correctly use semantic information for data association remains an open problem. The key to solving it is to correctly associate multiple object measurements with one object landmark, and to refine the poses of object landmarks. However, nearby objects are prone to being associated with a single object landmark, and it is difficult to select the best pose from multiple object measurements associated with one object landmark. To tackle these problems, we propose a hierarchical object association strategy based on multiple object tracking, through which nearby objects are correctly associated with distinct object landmarks, as well as an approach to refine the pose of an object landmark from multiple object measurements. The proposed method is evaluated on a simulated sequence and on several sequences in the KITTI dataset. Experimental results show substantial improvements over both traditional SLAM and a state-of-the-art semantic SLAM method.
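The hierarchical strategy is only named in the abstract; a simplified two-level sketch under illustrative assumptions (2-D point landmarks standing in for full object poses, chi-square gating, and hypothetical function names) of track-to-landmark association plus information-weighted pose refinement:

```python
import numpy as np

def mahalanobis2(z, mean, cov):
    d = z - mean
    return float(d @ np.linalg.solve(cov, d))

def associate_tracks_to_landmarks(tracks, landmarks, cov, gate=9.21):
    """Second association level: each track (detections already grouped by
    short-term multi-object tracking) matches at most one landmark, so two
    nearby objects with distinct tracks cannot collapse into a single
    landmark. gate = chi-square 99% bound for 2 degrees of freedom."""
    matches = {}
    for tid, zs in tracks.items():
        centroid = np.mean(zs, axis=0)
        d2 = [mahalanobis2(centroid, l, cov) for l in landmarks]
        j = int(np.argmin(d2))
        matches[tid] = j if d2[j] < gate else None   # None -> spawn landmark
    return matches

def refine_landmark(zs, covs):
    """Fuse multiple measurements of one landmark by inverse-covariance
    (information-form) weighting instead of picking a single 'best' one."""
    info = sum(np.linalg.inv(C) for C in covs)
    vec = sum(np.linalg.inv(C) @ z for z, C in zip(zs, covs))
    return np.linalg.solve(info, vec)

# Two objects only 0.6 m apart stay separate thanks to track identity.
tracks = {0: [np.array([1.00, 0.00]), np.array([1.05, 0.02])],
          1: [np.array([1.60, 0.00])]}
landmarks = [np.array([1.0, 0.0]), np.array([1.6, 0.05])]
print(associate_tracks_to_landmarks(tracks, landmarks, cov=0.01 * np.eye(2)))
```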