LEAF: Latent Exploration Along the Frontier

62 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Homanga Bharadhwaj

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Homanga Bharadhwaj - Animesh Garg - Florian Shkurti

علم الروبوتات الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Self-supervised goal proposal and reaching is a key component for exploration and efficient policy learning algorithms. Such a self-supervised approach without access to any oracle goal sampling distribution requires deep exploration and commitment so that long horizon plans can be efficiently discovered. In this paper, we propose an exploration framework, which learns a dynamics-aware manifold of reachable states. For a goal, our proposed method deterministically visits a state at the current frontier of reachable states (commitment/reaching) and then stochastically explores to reach the goal (exploration). This allocates exploration budget near the frontier of the reachable region instead of its interior. We target the challenging problem of policy learning from initial and goal states specified as images, and do not assume any access to the underlying ground-truth states of the robot and the environment. To keep track of reachable latent states, we propose a distance-conditioned reachability network that is trained to infer whether one state is reachable from another within the specified latent space distance. Given an initial state, we obtain a frontier of reachable states from that state. By incorporating a curriculum for sampling easier goals (closer to the start state) before more difficult goals, we demonstrate that the proposed self-supervised exploration algorithm, superior performance compared to existing baselines on a set of challenging robotic environments.https://sites.google.com/view/leaf-exploration

قيم البحث

186 - Dhruv Shah , Benjamin Eysenbach , Nicholas Rhinehart 2021

We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory . We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration. Trained on a large offline dataset of prior experience, the model acquires a representation of visual goals that is robust to task-irrelevant distractors. We demonstrate our method on a mobile ground robot in open-world exploration scenarios. Given an image of a goal that is up to 80 meters away, our method leverages its representation to explore and discover the goal in under 20 minutes, even amidst previously-unseen obstacles and weather conditions. We encourage the reader to visit the project website for videos of our experiments and demonstrations https://sites.google.com/view/recon-robot

علم الروبوتات الذكاء الاصطناعي التعلم الآلي

FUEL: Fast UAV Exploration using Incremental Frontier Structure and Hierarchical Planning

121 - Boyu Zhou , Yichen Zhang , Xinyi Chen 2020

Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles. Existing methods, however, were demonstrated to insufficient exploration rate, due to the lack of efficient global coverage, conservative motion pla ns and low decision frequencies. In this paper, we propose FUEL, a hierarchical framework that can support Fast UAV Exploration in complex unknown environments. We maintain crucial information in the entire space required by exploration planning by a frontier information structure (FIS), which can be updated incrementally when the space is explored. Supported by the FIS, a hierarchical planner plans exploration motions in three steps, which find efficient global coverage paths, refine a local set of viewpoints and generate minimum-time trajectories in sequence. We present extensive benchmark and real-world tests, in which our method completes the exploration tasks with unprecedented efficiency (3-8 times faster) compared to state-of-the-art approaches. Our method will be made open source to benefit the community.

علم الروبوتات

Latent Skill Planning for Exploration and Transfer

175 - Kevin Xie , Homanga Bharadhwaj , Danijar Hafner 2020

To quickly solve new tasks in complex environments, intelligent agents need to build up reusable knowledge. For example, a learned world model captures knowledge about the environment that applies to new tasks. Similarly, skills capture general behav iors that can apply to new tasks. In this paper, we investigate how these two approaches can be integrated into a single reinforcement learning agent. Specifically, we leverage the idea of partial amortization for fast adaptation at test time. For this, actions are produced by a policy that is learned over time while the skills it conditions on are chosen using online planning. We demonstrate the benefits of our design decisions across a suite of challenging locomotion tasks and demonstrate improved sample efficiency in single tasks as well as in transfer from one task to another, as compared to competitive baselines. Videos are available at: https://sites.google.com/view/latent-skill-planning/

التعلم الآلي الذكاء الاصطناعي علم الروبوتات

Autobots: Latent Variable Sequential Set Transformers

69 - Roger Girgis , Florian Golemo , Felipe Codevilla 2021

Robust multi-agent trajectory prediction is essential for the safe control of robots and vehicles that interact with humans. Many existing methods treat social and temporal information separately and therefore fall short of modelling the joint future trajectories of all agents in a socially consistent way. To address this, we propose a new class of Latent Variable Sequential Set Transformers which autoregressively model multi-agent trajectories. We refer to these architectures as AutoBots. AutoBots model the contents of sets (e.g. representing the properties of agents in a scene) over time and employ multi-head self-attention blocks over these sequences of sets to encode the sociotemporal relationships between the different actors of a scene. This produces either the trajectory of one ego-agent or a distribution over the future trajectories for all agents under consideration. Our approach works for general sequences of sets and we provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. For the single-agent prediction case, we validate our model on the NuScenes motion prediction task and achieve competitive results on the global leaderboard. In the multi-agent forecasting setting, we validate our model on TrajNet. We find that our method outperforms physical extrapolation and recurrent network baselines and generates scene-consistent trajectories.

علم الروبوتات الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Accelerating Grasp Exploration by Leveraging Learned Priors

76 - Han Yu Li , Michael Danielczuk , Ashwin Balakrishna 2020

The ability of robots to grasp novel objects has industry applications in e-commerce order fulfillment and home service. Data-driven grasping policies have achieved success in learning general strategies for grasping arbitrary objects. However, these approaches can fail to grasp objects which have complex geometry or are significantly outside of the training distribution. We present a Thompson sampling algorithm that learns to grasp a given object with unknown geometry using online experience. The algorithm leverages learned priors from the Dexterity Network robot grasp planner to guide grasp exploration and provide probabilistic estimates of grasp success for each stable pose of the novel object. We find that seeding the policy with the Dex-Net prior allows it to more efficiently find robust grasps on these objects. Experiments suggest that the best learned policy attains an average total reward 64.5% higher than a greedy baseline and achieves within 5.7% of an oracle baseline when evaluated over 300,000 training runs across a set of 3000 object poses.

علم الروبوتات الذكاء الاصطناعي التعلم الآلي