In this work, we present a learning-based pipeline to realise local navigation with a quadrupedal robot in cluttered environments with static and dynamic obstacles. Given high-level navigation commands, the robot is able to safely locomote to a target location based on frames from a depth camera, without any explicit mapping of the environment. First, the sequence of images and the current trajectory of the camera are fused to form a model of the world using state representation learning. The output of this lightweight module is then fed directly into a target-reaching and obstacle-avoiding policy trained with reinforcement learning. We show that decoupling the pipeline into these components results in a sample-efficient policy learning stage that can be fully trained in simulation in just a dozen minutes. The key part is the state representation, which is trained not only to estimate the hidden state of the world in an unsupervised fashion, but also to help bridge the reality gap, enabling successful sim-to-real transfer. In our experiments with the quadrupedal robot ANYmal in simulation and in reality, we show that our system can handle noisy depth images, avoid dynamic obstacles unseen during training, and is endowed with local spatial awareness.
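To make the decoupled design concrete, the sketch below shows one plausible way to structure such a pipeline: a state-representation module that fuses a short history of depth frames with the recent camera trajectory into a compact latent state, and a separate policy head that maps that latent state plus the relative goal to velocity commands. This is a minimal illustration, not the authors' implementation; all layer sizes, the pose dimension, and the action dimension are assumptions.

```python
# Hedged sketch of a two-stage navigation pipeline: a learned state
# representation fusing depth frames with recent camera poses, followed by a
# small policy head. Sizes and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class StateRepresentation(nn.Module):
    """Encodes a sequence of depth images and camera poses into a latent state."""
    def __init__(self, latent_dim=64, pose_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),        # -> 32 * 16 features
        )
        self.fuse = nn.GRU(32 * 16 + pose_dim, latent_dim, batch_first=True)

    def forward(self, depth_seq, pose_seq):
        # depth_seq: (B, T, 1, H, W); pose_seq: (B, T, pose_dim)
        b, t = depth_seq.shape[:2]
        feats = self.conv(depth_seq.flatten(0, 1)).view(b, t, -1)
        _, h = self.fuse(torch.cat([feats, pose_seq], dim=-1))
        return h[-1]                                      # latent state (B, latent_dim)

class NavigationPolicy(nn.Module):
    """Maps the latent state and the relative goal to a velocity command."""
    def __init__(self, latent_dim=64, goal_dim=2, act_dim=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),
        )

    def forward(self, latent, goal):
        return self.mlp(torch.cat([latent, goal], dim=-1))
```

In this kind of setup the representation module would be trained with an unsupervised objective (e.g. predicting or reconstructing observations) and then kept fixed while the lightweight policy head is trained with reinforcement learning on top of it.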
We present Interactive Gibson Benchmark, the first comprehensive benchmark for training and evaluating Interactive Navigation: robot navigation strategies where physical interaction with objects is allowed and even encouraged to accomplish a task. For example, the robot can move objects if needed in order to clear a path leading to the goal location. Our benchmark comprises two novel elements: 1) a new experimental setup, the Interactive Gibson Environment (iGibson 0.5), which simulates high-fidelity visuals of indoor scenes and high-fidelity physical dynamics of the robot and common objects found in these scenes; 2) a set of Interactive Navigation metrics which allow one to study the interplay between navigation and physical interaction. We present and evaluate multiple learning-based baselines in Interactive Gibson, and provide insights into regimes of navigation with different trade-offs between navigation path efficiency and disturbance of surrounding objects. We make our benchmark publicly available (https://sites.google.com/view/interactivegibsonenv) and encourage researchers from all disciplines in robotics (e.g. planning, learning, control) to propose, evaluate, and compare their Interactive Navigation solutions in Interactive Gibson.
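As a rough illustration of how such metrics can couple navigation efficiency with physical disturbance, the sketch below combines a path-efficiency term with a penalty for the effort spent displacing surrounding objects. The functional form, the normalisation, and the weighting parameter are assumptions for illustration and do not necessarily match the benchmark's official definitions.

```python
# Hedged sketch of an interactive-navigation score trading off path efficiency
# against the disturbance of surrounding objects (assumed form, not the
# benchmark's exact metric).
import numpy as np

def path_efficiency(path_length, shortest_length):
    """Ratio of the ideal shortest path to the path actually travelled (<= 1)."""
    return min(1.0, shortest_length / max(path_length, 1e-6))

def disturbance(object_displacements, object_masses):
    """Normalised effort spent displacing objects; 0 means nothing was moved."""
    work = np.sum(np.asarray(object_masses) * np.asarray(object_displacements))
    return work / (work + 1.0)                 # squashed into [0, 1)

def interactive_nav_score(path_length, shortest_length,
                          object_displacements, object_masses, alpha=0.5):
    """Convex combination of path efficiency and (1 - disturbance)."""
    eff = path_efficiency(path_length, shortest_length)
    dist = disturbance(object_displacements, object_masses)
    return alpha * eff + (1.0 - alpha) * (1.0 - dist)
```

Varying the weight alpha exposes the regimes discussed in the abstract: a weight near 1 rewards the shortest path regardless of how much the robot pushes objects around, while a weight near 0 rewards cautious navigation that barely perturbs the scene.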
Autonomous spacecraft relative navigation technology has been planned for and applied to many notable space missions. The development of on-board electronics systems has enabled the use of vision-based and LiDAR-based methods to achieve better performance. Meanwhile, deep learning has achieved great success in different areas, especially in computer vision, which has also attracted the attention of space researchers. However, spacecraft navigation differs from ground-based tasks due to its high reliability requirements and the lack of large datasets. This survey aims to systematically investigate current deep learning-based autonomous spacecraft relative navigation methods, focusing on concrete orbital applications such as spacecraft rendezvous and landing on small bodies or the Moon. The fundamental characteristics, primary motivations, and contributions of deep learning-based relative navigation algorithms are first summarised from the three perspectives of spacecraft rendezvous, asteroid exploration, and terrain navigation. Furthermore, popular visual tracking benchmarks and their respective properties are compared and summarised. Finally, potential applications are discussed, along with expected impediments.
This paper presents a vision-based modularized drone racing navigation system that uses a customized convolutional neural network (CNN) in the perception module to produce high-level navigation commands, and then leverages a state-of-the-art planner and controller to generate low-level control commands, thus exploiting the advantages of both data-based and model-based approaches. Unlike the state-of-the-art method, which takes only the current camera image as the CNN input, we further add the three most recent drone states as part of the inputs. Our method outperforms the state-of-the-art method in various track layouts and offers two switchable navigation behaviors with a single trained network. The CNN-based perception module is trained to imitate an expert policy that automatically generates ground-truth navigation commands based on pre-computed global trajectories. Owing to extensive randomization and our modified dataset aggregation (DAgger) policy during data collection, our navigation system, which is trained purely in simulation with synthetic textures, successfully operates in environments with randomly chosen photorealistic textures without further fine-tuning.
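The sketch below illustrates one way such a perception module can fuse the current camera image with the three most recent drone states to regress a high-level navigation command for a downstream planner and controller. It is a minimal illustration; the backbone, layer sizes, state dimension, and command parameterisation are assumptions rather than the paper's architecture.

```python
# Hedged sketch of a perception network combining an image with the three most
# recent drone states to output a high-level navigation command. All sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class RacingPerceptionNet(nn.Module):
    def __init__(self, state_dim=9, cmd_dim=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),        # -> 64 * 16 features
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 16 + 3 * state_dim, 128), nn.ReLU(),
            nn.Linear(128, cmd_dim),                      # e.g. goal direction + speed
        )

    def forward(self, image, last_three_states):
        # image: (B, 3, H, W); last_three_states: (B, 3, state_dim)
        feats = self.backbone(image)
        x = torch.cat([feats, last_three_states.flatten(1)], dim=-1)
        return self.head(x)                               # high-level command
```

A network of this form would be trained by supervised imitation of an expert policy, with a DAgger-style loop repeatedly collecting states visited by the learner and relabelling them with the expert's commands.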
Learning from Demonstration (LfD) is a popular approach to endowing robots with skills without having to program them by hand. Typically, LfD relies on human demonstrations in clutter-free environments. This prevents the demonstrations from being affected by irrelevant objects, whose influence can obfuscate the true intention of the human or the constraints of the desired skill. However, it is unrealistic to assume that the robot's environment can always be restructured to remove clutter when capturing human demonstrations. To contend with this problem, we develop an importance-weighted batch and incremental skill learning approach, building on a recent inference-based technique for skill representation and reproduction. Our approach reduces unwanted environmental influences on the learned skill, while still capturing the salient human behavior. We provide both batch and incremental variants of our approach.
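As a simplified illustration of importance weighting in batch skill learning, the sketch below fits a time-indexed trajectory model by weighted regression, with low weights on demonstration segments assumed to be influenced by clutter. This is not the paper's inference-based skill representation; the weighting scheme, the polynomial model, and the example data are stand-ins for illustration only.

```python
# Hedged sketch: weighted regression over demonstration samples, where
# per-sample importance weights down-weight clutter-affected segments.
import numpy as np

def fit_weighted_trajectory(times, positions, weights, degree=5):
    """Weighted polynomial fit of demonstrated positions over time."""
    # numpy.polyfit applies per-sample weights through the `w` argument
    return np.polyfit(times, positions, deg=degree, w=np.sqrt(weights))

# Example: samples near detected clutter get low weight, the rest full weight
times = np.linspace(0.0, 1.0, 100)
positions = np.sin(2 * np.pi * times)                     # stand-in demonstration
weights = np.where((times > 0.4) & (times < 0.6), 0.1, 1.0)
coeffs = fit_weighted_trajectory(times, positions, weights)
```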
The Visual Indoor Navigation (VIN) task has drawn increasing attention from the data-driven machine learning communities, especially given the recently reported successes of learning-based methods. Due to the innate complexity of this task, researchers have tried approaching the problem from a variety of different angles, the full scope of which has not yet been captured within an overarching report. This survey first summarizes the representative work on learning-based approaches to the VIN task, then identifies and discusses lingering issues impeding VIN performance, and motivates future research in key areas worth exploring by the community.