Occupancy Anticipation for Efficient Exploration and Navigation

Added by Santhosh Kumar Ramakrishnan

Publication date: 2020

Field: Informatics Engineering

Language: English

Authors: Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman

Computer Vision and Pattern Recognition

Abstract

State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. In doing so, the agent builds its spatial awareness more rapidly, which facilitates efficient exploration and navigation in 3D environments. By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment, with performance significantly better than strong baselines. Furthermore, when deployed for the sequential decision-making tasks of exploration and navigation, our model outperforms state-of-the-art methods on the Gibson and Matterport3D datasets. Our approach is the winning entry in the 2020 Habitat PointNav Challenge. Project page: http://vision.cs.utexas.edu/projects/occupancy_anticipation/
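The abstract contrasts anticipated maps with occupancy maps that only capture directly observed geometry. As a rough illustration of that observed-only map (not the authors' anticipation model), the sketch below projects a single depth image into a top-down egocentric grid using a pinhole camera model. It assumes a horizontally facing camera at a known height; the function name `depth_to_topdown`, the height thresholds, and the grid layout are hypothetical choices. Cells that no depth pixel reaches stay `UNKNOWN`, and those cells are the region that occupancy anticipation aims to infer.

```python
import math
from typing import List

UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def depth_to_topdown(depth: List[List[float]], fx: float, fy: float,
                     cx: float, cy: float, cam_height: float,
                     cell_size: float = 0.05, map_size: int = 64,
                     floor_tol: float = 0.1,
                     max_obstacle_h: float = 1.5) -> List[List[int]]:
    """Project an egocentric depth image into a top-down occupancy grid.

    Hypothetical sketch. Row index = distance ahead of the camera in cells;
    column index = lateral offset in cells, camera centred at map_size // 2.
    Cells that no depth pixel lands in stay UNKNOWN.
    """
    grid = [[UNKNOWN] * map_size for _ in range(map_size)]
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # missing or invalid depth reading
                continue
            # Pinhole back-projection (assumes a horizontal optical axis):
            # lateral offset x and height h above the floor, in metres.
            x = (u - cx) * z / fx
            h = (cy - v) * z / fy + cam_height
            r = math.floor(z / cell_size)
            c = math.floor(x / cell_size) + map_size // 2
            if not (0 <= r < map_size and 0 <= c < map_size):
                continue  # point falls outside the local map
            if abs(h) <= floor_tol:
                # Floor point: free space, unless already marked as obstacle.
                if grid[r][c] != OCCUPIED:
                    grid[r][c] = FREE
            elif floor_tol < h <= max_obstacle_h:
                # Point at obstacle height: occupied (overrides FREE).
                grid[r][c] = OCCUPIED
            # Points below the floor band or above max_obstacle_h are ignored.
    return grid
```

Usage: `depth_to_topdown(depth, fx=1.0, fy=1.0, cx=1.0, cy=1.0, cam_height=1.0, cell_size=0.5, map_size=8)` on a tiny 3x3 depth image marks a handful of cells as free or occupied and leaves every other cell unknown, which shows how little of the map a single view covers.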

Object Goal Navigation using Goal-Oriented Semantic Exploration

Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta (2020)

This work studies the problem of object goal navigation, which involves navigating to an instance of a given object category in unseen environments. End-to-end learning-based navigation methods struggle at this task because they are ineffective at exploration and long-term planning. We propose a modular system called Goal-Oriented Semantic Exploration, which builds an episodic semantic map and uses it to explore the environment efficiently based on the goal object category. Empirical results in visually realistic simulation environments show that the proposed model outperforms a wide range of baselines, including end-to-end learning-based methods as well as modular map-based methods, and led to the winning entry of the CVPR 2020 Habitat ObjectNav Challenge. Ablation analysis indicates that the proposed model learns semantic priors of the relative arrangement of objects in a scene and uses them to explore efficiently. The domain-agnostic module design allows us to transfer our model to a mobile robot platform and achieve similar performance for object goal navigation in the real world.

Computer Vision and Pattern Recognition Machine Learning Robotics

SOON: Scenario Oriented Object Navigation with Graph-based Exploration

Fengda Zhu, Xiwen Liang, Yi Zhu (2021)

The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the holy grail goals for intelligent robots. Most visual navigation benchmarks, however, focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions. This setup deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere. Accordingly, in this paper, we introduce the Scenario Oriented Object Navigation (SOON) task. In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target by following a scene description. To give a promising direction for solving this task, we propose a novel graph-based exploration (GBE) method, which models the navigation state as a graph, learns knowledge from the graph, and stabilizes training by learning sub-optimal trajectories. We also propose a new large-scale benchmark named the From Anywhere to Object (FAO) dataset. To avoid target ambiguity, the descriptions in FAO provide rich semantic scene information, including object attributes, object relationships, region descriptions, and nearby region descriptions. Our experiments reveal that the proposed GBE outperforms various state-of-the-art methods on both the FAO and R2R datasets, and ablation studies on FAO validate the quality of the dataset.

Computer Vision and Pattern Recognition

Occupancy Map Prediction Using Generative and Fully Convolutional Networks for Vehicle Navigation

Kapil Katyal, Katie Popek, Chris Paxton (2018)

Fast, collision-free motion through unknown environments remains a challenging problem for robotic systems. In these situations, the robot's ability to reason about its future motion is often severely limited by the sensor field of view (FOV). By contrast, biological systems routinely make decisions by taking into consideration what might exist beyond their FOV based on prior experience. In this paper, we present an approach for predicting occupancy map representations of sensor data for future robot motions using deep neural networks. We evaluate several deep network architectures, including purely generative and adversarial models. Testing on both simulated and real environments, we demonstrate performance both qualitatively and quantitatively, with an SSIM similarity measure of up to 0.899. We show that it is possible to predict occupied space beyond the physical robot's FOV from simulated training data. In the future, this method will allow robots to navigate through unknown environments in a faster, safer manner.
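The abstract reports map quality as an SSIM score. As a hedged illustration of what that number measures, the sketch below computes SSIM over a single window covering the whole (flattened) map, using the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for data range L. The common definition instead averages this quantity over local sliding windows, so this global version is a simplification; the function name `global_ssim` and the assumption that map values lie in [0, 1] are illustrative choices, not details from the paper.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 1.0) -> float:
    """SSIM over one window spanning the whole map (simplified sketch)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((a - mx) ** 2 for a in x) / n           # variance of x
    vy = sum((b - my) ** 2 for b in y) / n           # variance of y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Identical maps score 1.0, while a predicted map that inverts the occupied and free cells of the target gets a negative score, because the covariance term turns negative.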

Machine Learning Computer Vision and Pattern Recognition Robotics

Rethinking Anticipation Tasks: Uncertainty-aware Anticipation of Sparse Surgical Instrument Usage for Context-aware Assistance

Dominik Rivoir, Sebastian Bodenstedt, Isabel Funke (2020)

Intra-operative anticipation of instrument usage is a necessary component for context-aware assistance in surgery, e.g., for instrument preparation or semi-automation of robotic tasks. However, the sparsity of instrument occurrences in long videos poses a challenge. Current approaches are limited because they assume knowledge of the timing of future actions or require dense temporal segmentations during training and inference. We propose a novel learning task for anticipation of instrument usage in laparoscopic videos that overcomes these limitations. During training, only sparse instrument annotations are required, and inference is done solely on image data. We train a probabilistic model to address the uncertainty associated with future events. Our approach outperforms several baselines and is competitive with a variant using richer annotations. We demonstrate the model's ability to quantify task-relevant uncertainties. To the best of our knowledge, we are the first to propose a method for anticipating instruments in surgery.

Computer Vision and Pattern Recognition

Learning-based 3D Occupancy Prediction for Autonomous Navigation in Occluded Environments

Lizi Wang, Hongkai Ye, Qianhao Wang (2020)

In autonomous navigation of mobile robots, sensors suffer from massive occlusion in cluttered environments, leaving a significant amount of space unknown during planning. In practice, treating unknown space either optimistically or pessimistically limits planning performance, so aggressiveness and safety cannot be satisfied at the same time. However, humans can infer the exact shape of obstacles from only partial observations and generate non-conservative trajectories that avoid possible collisions in occluded space. Mimicking human behavior, in this paper we propose a method based on a deep neural network to reliably predict the occupancy distribution of unknown space. Specifically, the proposed method utilizes contextual information about the environment and learns from prior knowledge to predict obstacle distributions in occluded space. We use unlabeled data without ground truth to train our network and successfully apply it to real-time navigation in unseen environments without any refinement. Results show that our method improves the performance of a kinodynamic planner, increasing safety with no reduction in speed in cluttered environments.
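The trade-off this abstract describes, treating unknown space optimistically or pessimistically, can be seen on a toy grid. The sketch below is a hypothetical illustration, not the paper's learned predictor: a 4-connected breadth-first search where unknown cells are either passable (optimistic) or blocked (pessimistic). The names `shortest_path_len` and `unknown_as_free` are assumptions made for this example.

```python
from collections import deque
from typing import List, Optional, Tuple

FREE, OCC, UNK = 0, 1, -1


def shortest_path_len(grid: List[List[int]], start: Tuple[int, int],
                      goal: Tuple[int, int],
                      unknown_as_free: bool) -> Optional[int]:
    """4-connected BFS; returns the step count from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])

    def passable(r: int, c: int) -> bool:
        cell = grid[r][c]
        # Optimistic planning treats UNK as free; pessimistic treats it as blocked.
        return cell == FREE or (cell == UNK and unknown_as_free)

    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        (r, c), d = frontier.popleft()
        if (r, c) == goal:
            return d
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and passable(nr, nc)):
                seen.add((nr, nc))
                frontier.append(((nr, nc), d + 1))
    return None


grid = [
    [FREE, UNK, FREE],
    [FREE, OCC, FREE],
    [FREE, FREE, FREE],
]
```

On this grid the optimistic plan from `(0, 0)` to `(0, 2)` takes 2 steps straight through the unknown cell, which may hide an obstacle, while the pessimistic plan detours for 6 steps. With no detour available, the pessimistic planner finds no path at all. Predicting occupancy in the occluded cells is meant to avoid having to pick either extreme.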

Robotics Computer Vision and Pattern Recognition Machine Learning