ترغب بنشر مسار تعليمي؟ اضغط هنا

Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation

236   0   0.0 ( 0 )
 نشر من قبل Medhini Narasimhan
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We introduce a learning-based approach for room navigation using semantic maps. Our proposed architecture learns to predict top-down belief maps of regions that lie beyond the agents field of view while modeling architectural and stylistic regularities in houses. First, we train a model to generate amodal semantic top-down maps indicating beliefs of location, size, and shape of rooms by learning the underlying architectural patterns in houses. Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point. We empirically demonstrate that by predicting semantic maps, the model learns common correlations found in houses and generalizes to novel environments. We also demonstrate that reducing the task of room navigation to point navigation improves the performance further.

قيم البحث

اقرأ أيضاً

This paper focuses on visual semantic navigation, the task of producing actions for an active agent to navigate to a specified target object category in an unknown environment. To complete this task, the algorithm should simultaneously locate and nav igate to an instance of the category. In comparison to the traditional point goal navigation, this task requires the agent to have a stronger contextual prior to indoor environments. We introduce SSCNav, an algorithm that explicitly models scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agents navigation planning. Given a partial observation of the environment, SSCNav first infers a complete scene representation with semantic labels for the unobserved scene together with a confidence map associated with its own prediction. Then, a policy network infers the action from the scene completion result and confidence map. Our experiments demonstrate that the proposed scene completion module improves the efficiency of the downstream navigation policies. Video, code, and data: https://sscnav.cs.columbia.edu/
We consider the problem of object goal navigation in unseen environments. In our view, solving this problem requires learning of contextual semantic priors, a challenging endeavour given the spatial and semantic variability of indoor environments. Cu rrent methods learn to implicitly encode these priors through goal-oriented navigation policy functions operating on spatial representations that are limited to the agents observable areas. In this work, we propose a novel framework that actively learns to generate semantic maps outside the field of view of the agent and leverages the uncertainty over the semantic classes in the unobserved areas to decide on long term goals. We demonstrate that through this spatial prediction strategy, we are able to learn semantic priors in scenes that can be leveraged in unknown environments. Additionally, we show how different objectives can be defined by balancing exploration with exploitation during searching for semantic targets. Our method is validated in the visually realistic environments offered by the Matterport3D dataset and show state of the art results on the object goal navigation task.
Terrain assessment is a key aspect for autonomous exploration rovers, surrounding environment recognition is required for multiple purposes, such as optimal trajectory planning and autonomous target identification. In this work we present a technique to generate accurate three-dimensional semantic maps for Martian environment. The algorithm uses as input a stereo image acquired by a camera mounted on a rover. Firstly, images are labeled with DeepLabv3+, which is an encoder-decoder Convolutional Neural Networl (CNN). Then, the labels obtained by the semantic segmentation are combined to stereo depth-maps in a Voxel representation. We evaluate our approach on the ESA Katwijk Beach Planetary Rover Dataset.
We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. The dataset covers over 6,000m2 and contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in forms of both regular and 360{deg} equirectangular images) as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces. The dataset is available here: http://3Dsemantics.stanford.edu/
In this paper we propose a method to extract an abstracted floor plan from typical grid maps using Bayesian reasoning. The result of this procedure is a probabilistic generative model of the environment defined over abstract concepts. It is well suit ed for higher-level reasoning and communication purposes. We demonstrate the effectiveness of the approach through real-world experiments.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا