Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors.
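The collocation idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the CoDE implementation: it assumes a hypothetical 1-D system with dynamics x' = u, a linear policy u = -gain * x, and a polynomial fit standing in for the auxiliary trajectory network. All function names are illustrative. It shows how a policy parameter can be fitted by matching the time derivative of a smooth trajectory surrogate to the policy-induced dynamics at collocation points, with no rollout through time.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_trajectory(times, states, degree=6):
    """Fit a smooth surrogate x_hat(t) to one demonstration.

    Assumption: a polynomial stands in for a learned trajectory network.
    """
    return Polynomial.fit(times, states, degree)


def collocation_residual(traj, gain, colloc_times):
    """Residual x_hat'(t) - f(x_hat(t), pi(x_hat(t))) at the collocation points.

    Assumed toy dynamics: x' = u, with linear policy u = -gain * x.
    """
    x_hat = traj(colloc_times)
    x_dot = traj.deriv()(colloc_times)
    return x_dot - (-gain * x_hat)


def fit_gain(traj, colloc_times):
    """Gain minimising the squared collocation residual.

    For a linear policy this least-squares problem has a closed form.
    """
    x_hat = traj(colloc_times)
    x_dot = traj.deriv()(colloc_times)
    return -np.dot(x_dot, x_hat) / np.dot(x_hat, x_hat)


# Synthetic demonstration from a hypothetical expert with gain 1: x(t) = exp(-t).
times = np.linspace(0.0, 2.0, 50)
demo = np.exp(-times)

traj = fit_trajectory(times, demo)
colloc_times = np.linspace(0.0, 2.0, 20)
gain = fit_gain(traj, colloc_times)
residual = collocation_residual(traj, gain, colloc_times)
```

In this toy setting the recovered gain should be close to the expert's gain of 1. Because the policy is evaluated only pointwise along the fitted trajectory, no gradients flow through a simulated rollout, which is the property that makes collocation an alternative to back-propagation-through-time.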
Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) incl
Control policies from imitation learning can often fail to generalize to novel environments due to imperfect demonstrations or the inability of imitation learning algorithms to accurately infer the expert's policies. In this paper, we present rigorous
Multi-agent path finding (MAPF) is an essential component of many large-scale, real-world robot deployments, from aerial swarms to warehouse automation. However, despite the community's continued efforts, most state-of-the-art MAPF planners still rely
Multi-agent path finding (MAPF) is an indispensable component of large-scale robot deployments in numerous domains ranging from airport management to warehouse automation. In particular, this work addresses lifelong MAPF (LMAPF) - an online variant o
Dexterous manipulation has been a long-standing challenge in robotics. Recently, modern model-free RL has demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge for RL due to t