Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

718 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Vincent Cartillier

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Vincent Cartillier - Zhile Ren - Neha Jain

الرؤية الحاسوبية وتمييز الأنماط

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map (what is where?) from egocentric observations of an RGB-D camera with known pose (via localization sensors). Towards this goal, we present SemanticMapNet (SMNet), which consists of: (1) an Egocentric Visual Encoder that encodes each egocentric RGB-D frame, (2) a Feature Projector that projects egocentric features to appropriate locations on a floor-plan, (3) a Spatial Memory Tensor of size floor-plan length x width x feature-dims that learns to accumulate projected egocentric features, and (4) a Map Decoder that uses the memory tensor to produce semantic top-down maps. SMNet combines the strengths of (known) projective camera geometry and neural representation learning. On the task of semantic mapping in the Matterport3D dataset, SMNet significantly outperforms competitive baselines by 4.01-16.81% (absolute) on mean-IoU and 3.81-19.69% (absolute) on Boundary-F1 metrics. Moreover, we show how to use the neural episodic memories and spatio-semantic allocentric representations build by SMNet for subsequent tasks in the same space - navigating to objects seen during the tour(Find chair) or answering questions about the space (How many chairs did you see in the house?). Project page: https://vincentcartillier.github.io/smnet.html.

قيم البحث

104 - Ziyuan Liu , Dong Chen , Georg von Wichert 2020

In this paper we propose a method to extract an abstracted floor plan from typical grid maps using Bayesian reasoning. The result of this procedure is a probabilistic generative model of the environment defined over abstract concepts. It is well suit ed for higher-level reasoning and communication purposes. We demonstrate the effectiveness of the approach through real-world experiments.

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات

Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning

306 - Haohang Xu , Xiaopeng Zhang , Hao Li 2020

Self-supervised learning based on instance discrimination has shown remarkable progress. In particular, contrastive learning, which regards each image as well as its augmentations as an individual class and tries to distinguish them from all other im ages, has been verified effective for representation learning. However, pushing away two images that are de facto similar is suboptimal for general representation. In this paper, we propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to textbf{Cross-samples and Multi-level} representation, and models the invariance to semantically similar images in a hierarchical way. This is achieved by extending the contrastive loss to allow for multiple positives per anchor, and explicitly pulling semantically similar images/patches together at different layers of the network. Our method, termed as CsMl, has the ability to integrate multi-level visual representations across samples in a robust way. CsMl is applicable to current contrastive learning based methods and consistently improves the performance. Notably, using the moco as an instantiation, CsMl achieves a textbf{76.6% }top-1 accuracy with linear evaluation using ResNet-50 as backbone, and textbf{66.7%} and textbf{75.1%} top-1 accuracy with only 1% and 10% labels, respectively. textbf{All these numbers set the new state-of-the-art.}

الرؤية الحاسوبية وتمييز الأنماط

Visual Semantic Planning using Deep Successor Representations

133 - Yuke Zhu , Daniel Gordon , Eric Kolve 2017

A crucial capability of real-world intelligent agents is their ability to plan a sequence of actions to achieve their goals in the visual world. In this work, we address the problem of visual semantic planning: the task of predicting a sequence of ac tions from visual observations that transform a dynamic environment from an initial state to a goal state. Doing so entails knowledge about objects and their affordances, as well as actions and their preconditions and effects. We propose learning these through interacting with a visual and dynamic environment. Our proposed solution involves bootstrapping reinforcement learning with imitation learning. To ensure cross task generalization, we develop a deep predictive model based on successor representations. Our experimental results show near optimal results across a wide range of tasks in the challenging THOR environment.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي علم الروبوتات

Visual Representations for Semantic Target Driven Navigation

324 - Arsalan Mousavian , Alexander Toshev , Marek Fiser 2018

What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to the refrig erator. Instead of acquiring a metric semantic map of an environment and using planning for navigation, our approach learns navigation policies on top of representations that capture spatial layout and semantic contextual cues. We propose to using high level semantic and contextual features including segmentation and detection masks obtained by off-the-shelf state-of-the-art vision as observations and use deep network to learn the navigation policy. This choice allows using additional data, from orthogonal sources, to better train different parts of the model the representation extraction is trained on large standard vision datasets while the navigation component leverages large synthetic environments for training. This combination of real and synthetic is possible because equitable feature representations are available in both (e.g., segmentation and detection masks), which alleviates the need for domain adaptation. Both the representation and the navigation policy can be readily applied to real non-synthetic environments as demonstrated on the Active Vision Dataset [1]. Our approach gets successfully to the target in 54% of the cases in unexplored environments, compared to 46% for non-learning based approach, and 28% for the learning-based baseline.

الرؤية الحاسوبية وتمييز الأنماط

Learning Hierarchical Semantic Image Manipulation through Structured Representations

78 - Seunghoon Hong , Xinchen Yan , Thomas Huang 2018

Understanding, reasoning, and manipulating semantic concepts of images have been a fundamental research problem for decades. Previous work mainly focused on direct manipulation on natural image manifold through color strokes, key-points, textures, an d holes-to-fill. In this work, we present a novel hierarchical framework for semantic image manipulation. Key to our hierarchical framework is that we employ a structured semantic layout as our intermediate representation for manipulation. Initialized with coarse-level bounding boxes, our structure generator first creates pixel-wise semantic layout capturing the object shape, object-object interactions, and object-scene relations. Then our image generator fills in the pixel-level textures guided by the semantic layout. Such framework allows a user to manipulate images at object-level by adding, removing, and moving one bounding box at a time. Experimental evaluations demonstrate the advantages of the hierarchical manipulation framework over existing image generation and context hole-filing models, both qualitatively and quantitatively. Benefits of the hierarchical framework are further demonstrated in applications such as semantic object manipulation, interactive image editing, and data-driven image manipulation.

الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد العالي للدراسات والبحوث السكانية

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً