Language-guided Semantic Mapping and Mobile Manipulation in Partially Observable Environments

Published by Matthew Walter

Publication date: 2019

Research field: Informatics Engineering

Research language: English

Authors: Siddharth Patki - Ethan Fahnestock - Thomas M. Howard

Robotics, Artificial Intelligence, Computation and Language

Abstract

Recent advances in data-driven models for grounded language understanding have enabled robots to interpret increasingly complex instructions. Two fundamental limitations of these methods are that most require a full model of the environment to be known a priori, and they attempt to reason over a world representation that is flat and unnecessarily detailed, which limits scalability. Recent semantic mapping methods address partial observability by exploiting language as a sensor to infer a distribution over topological, metric and semantic properties of the environment. However, maintaining a distribution over highly detailed maps that can support grounding of diverse instructions is computationally expensive and hinders real-time human-robot collaboration. We propose a novel framework that learns to adapt perception according to the task in order to maintain compact distributions over semantic maps. Experiments with a mobile manipulator demonstrate more efficient instruction following in a priori unknown environments.

Guy Tennenholtz, Shie Mannor, Uri Shalit - 2019

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.

Machine Learning, Artificial Intelligence, Systems and Control

Towards Safe Locomotion Navigation in Partially Observable Environments with Uneven Terrain

Jonas Warnke, Abdulaziz Shamsah, Yingke Li - 2020

This study proposes an integrated task and motion planning method for dynamic locomotion in partially observable environments with multi-level safety guarantees. This layered planning framework is composed of a high-level symbolic task planner and a low-level phase-space motion planner. A belief abstraction at the task planning level enables belief estimation of dynamic obstacle locations and guarantees navigation safety with collision avoidance. The high-level task planner, i.e., a two-level navigation planner, employs linear temporal logic for a reactive game synthesis between the robot and its environment while incorporating low-level safe keyframe policies into formal task specification design. The synthesized task planner commands a series of locomotion actions including walking step length, step height, and heading angle changes, to the underlying keyframe decision-maker, which further determines the robot center-of-mass apex velocity keyframe. The low-level phase-space planner uses a reduced-order locomotion model to generate non-periodic trajectories meeting balancing safety criteria for straight and steering walking. These criteria are characterized by constraints on locomotion keyframe states, and are used to define keyframe transition policies via viability kernels. Simulation results of a Cassie bipedal robot designed by Agility Robotics demonstrate locomotion maneuvering in a three-dimensional, partially observable environment consisting of dynamic obstacles and uneven terrain.

Robotics

A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution

Valts Blukis, Chris Paxton, Dieter Fox - 2021

Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that key to bridging this gap between language and robot actions over long execution horizons are persistent representations. We propose a persistent spatial semantic representation method, and show how it enables building an agent that performs hierarchical reasoning to effectively execute long-term tasks. We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.

Robotics, Artificial Intelligence, Computation and Language

Task-assisted Motion Planning in Partially Observable Domains

Antony Thomas, Sunny Amatya, Fulvio Mastrogiovanni - 2019

We present an integrated Task-Motion Planning framework for robot navigation in belief space. Autonomous robots operating in complex real-world scenarios require planning in the discrete (task) space and the continuous (motion) space. To this end, we propose a framework for integrating belief space reasoning within a hybrid task planner. The expressive power of PDDL+ combined with heuristic-driven semantic attachments performs the propagated and posterior belief estimates while planning. The underlying methodology for the development of the combined hybrid planner is discussed, providing suggestions for improvements and future work. Furthermore, we validate key aspects of our approach using a realistic scenario in simulation.

Robotics, Artificial Intelligence

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi - 2018

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.

Machine Learning, Artificial Intelligence, Computer Science and Game Theory