ﻻ يوجد ملخص باللغة العربية
Self-supervised goal proposal and reaching is a key component for exploration and efficient policy learning algorithms. Such a self-supervised approach without access to any oracle goal sampling distribution requires deep exploration and commitment so that long horizon plans can be efficiently discovered. In this paper, we propose an exploration framework, which learns a dynamics-aware manifold of reachable states. For a goal, our proposed method deterministically visits a state at the current frontier of reachable states (commitment/reaching) and then stochastically explores to reach the goal (exploration). This allocates exploration budget near the frontier of the reachable region instead of its interior. We target the challenging problem of policy learning from initial and goal states specified as images, and do not assume any access to the underlying ground-truth states of the robot and the environment. To keep track of reachable latent states, we propose a distance-conditioned reachability network that is trained to infer whether one state is reachable from another within the specified latent space distance. Given an initial state, we obtain a frontier of reachable states from that state. By incorporating a curriculum for sampling easier goals (closer to the start state) before more difficult goals, we demonstrate that the proposed self-supervised exploration algorithm, superior performance compared to existing baselines on a set of challenging robotic environments.https://sites.google.com/view/leaf-exploration
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory
Autonomous exploration is a fundamental problem for various applications of unmanned aerial vehicles. Existing methods, however, were demonstrated to insufficient exploration rate, due to the lack of efficient global coverage, conservative motion pla
To quickly solve new tasks in complex environments, intelligent agents need to build up reusable knowledge. For example, a learned world model captures knowledge about the environment that applies to new tasks. Similarly, skills capture general behav
Robust multi-agent trajectory prediction is essential for the safe control of robots and vehicles that interact with humans. Many existing methods treat social and temporal information separately and therefore fall short of modelling the joint future
The ability of robots to grasp novel objects has industry applications in e-commerce order fulfillment and home service. Data-driven grasping policies have achieved success in learning general strategies for grasping arbitrary objects. However, these