The robotic shepherding problem considers the control and navigation of a group of coherent agents (e.g., a flock of birds or a fleet of drones) through the motion of an external robot, called the shepherd. Machine-learning-based methods have successfully solved this problem in empty, obstacle-free environments. Rule-based methods, on the other hand, can handle more complex scenarios in which the environment is cluttered with obstacles, and allow multiple shepherds to work collaboratively. However, these rule-based methods are fragile because it is difficult to define a comprehensive set of rules that covers all possible cases. To overcome these limitations, we propose the first known learning-based method that can herd agents among obstacles. By combining deep reinforcement learning with probabilistic roadmaps, we train a shepherding model using noisy but controlled environmental and behavioral parameters. Our experimental results show that the proposed method is robust, namely, it is insensitive to uncertainties originating in both the environmental and behavioral models. Consequently, the proposed method achieves a higher success rate, shorter completion time, and shorter path length than rule-based behavioral methods. These advantages are particularly prominent in more challenging scenarios involving more difficult groups and strenuous passages.
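A minimal sketch of the idea of pairing a roadmap planner with a learned shepherd, assuming the policy observes the flock relative to the next roadmap waypoint and that behavioral parameters are randomized per episode; all names and ranges below are illustrative, not the paper's exact design.

```python
import numpy as np

def sample_behavior_params(rng):
    # Randomize flock behavior within controlled ranges ("noisy but controlled" parameters).
    return {
        "separation_gain": rng.uniform(0.5, 1.5),
        "cohesion_gain": rng.uniform(0.5, 1.5),
        "flee_radius": rng.uniform(2.0, 4.0),
    }

def shepherd_observation(shepherd_pos, flock_positions, waypoint):
    # Flock centroid and spread summarize the group state.
    centroid = flock_positions.mean(axis=0)
    spread = np.linalg.norm(flock_positions - centroid, axis=1).max()
    # Goal-directed quantities are expressed relative to the next PRM waypoint,
    # so the learned policy handles local herding while the roadmap handles global routing.
    return np.concatenate([
        centroid - shepherd_pos,   # where the flock is, relative to the shepherd
        waypoint - centroid,       # where the flock must go next
        [spread],                  # how coherent the flock currently is
    ])

rng = np.random.default_rng(0)
params = sample_behavior_params(rng)
obs = shepherd_observation(
    shepherd_pos=np.array([0.0, 0.0]),
    flock_positions=rng.uniform(-1.0, 1.0, size=(10, 2)),
    waypoint=np.array([5.0, 3.0]),
)
print(params, obs.shape)  # 5-dimensional local observation
```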
This paper presents a hierarchical framework based on deep reinforcement learning that learns a diversity of policies for humanoid balance control. Conventional zero-moment-point-based controllers perform only limited actions during under-actuation, whereas the proposed framework can perform human-like balancing behaviors such as active ankle push-off. Learning is driven by an explainable reward designed from physical constraints. Simulated results are presented and analyzed. The successful emergence of human-like behaviors through deep reinforcement learning demonstrates the feasibility of using an AI-based approach to learn humanoid balance control in a unified framework.
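As a hedged illustration of what an "explainable reward based on physical constraints" can look like, the sketch below combines a few interpretable penalty terms (centre-of-mass tracking, angular momentum, actuation effort). The specific terms and weights are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def balance_reward(com_xy, support_center_xy, support_halfwidth,
                   ang_momentum, torques, torque_limits,
                   w_com=1.0, w_L=0.1, w_tau=0.01):
    # Keep the centre-of-mass projection close to the centre of the support polygon.
    com_err = np.linalg.norm(com_xy - support_center_xy) / support_halfwidth
    # Penalize angular momentum about the CoM (favors quiet, human-like recovery).
    L_pen = np.linalg.norm(ang_momentum)
    # Penalize actuation effort relative to joint torque limits.
    tau_pen = np.mean((torques / torque_limits) ** 2)
    return -(w_com * com_err + w_L * L_pen + w_tau * tau_pen)

r = balance_reward(
    com_xy=np.array([0.02, 0.01]),
    support_center_xy=np.zeros(2),
    support_halfwidth=0.1,
    ang_momentum=np.array([0.1, 0.05, 0.0]),
    torques=np.array([10.0, -5.0, 2.0]),
    torque_limits=np.array([100.0, 100.0, 50.0]),
)
print(r)
```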
In this letter, we introduce a deep reinforcement learning (RL) based multi-robot formation controller for autonomous aerial human motion capture (MoCap). We focus on vision-based MoCap, where the objective is to estimate the trajectory of the body pose and shape of a single moving person using multiple micro aerial vehicles. State-of-the-art solutions to this problem are based on classical control methods, which depend on hand-crafted system and observation models. Such models are difficult to derive and to generalize across different systems. Moreover, the non-linearities and non-convexities of these models lead to sub-optimal control. In our work, we formulate this problem as a sequential decision-making task that achieves the vision-based motion-capture objectives, and we solve it using a deep neural network-based RL method. We leverage proximal policy optimization (PPO) to train a stochastic decentralized control policy for formation control. The neural network is trained in a parallelized setup in synthetic environments. We performed extensive simulation experiments to validate our approach. Finally, real-robot experiments demonstrate that our policies generalize to real-world conditions. Video Link: https://bit.ly/38SJfjo Supplementary: https://bit.ly/3evfo1O
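For reference, the PPO objective mentioned above is the standard clipped surrogate loss; the short sketch below shows that loss in isolation, with illustrative variable names that are not taken from the letter.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Importance ratio between the updated and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    # Clipping removes the incentive to move the ratio far outside [1-eps, 1+eps].
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Maximize the clipped surrogate, i.e. minimize its negative mean.
    return -torch.min(unclipped, clipped).mean()

new_lp = torch.tensor([-1.0, -0.5, -2.0])
old_lp = torch.tensor([-1.1, -0.6, -1.8])
adv = torch.tensor([0.5, -0.2, 1.0])
print(ppo_clip_loss(new_lp, old_lp, adv))
```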
We introduce a new recurrent agent architecture and associated auxiliary losses that improve reinforcement learning in partially observable tasks requiring long-term memory. We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asymmetry. The reaction core incorporates new observations with input from the slow core to produce the agent's policy; the perception core accesses only short-term observations and informs the slow core; lastly, the prediction core accesses only long-term memory. An auxiliary loss regularizes policies drawn from all three cores against each other, enacting the prior that the policy should be expressible from either recent or long-term memory. We present the resulting Perception-Prediction-Reaction (PPR) agent and demonstrate its improved performance over a strong LSTM-agent baseline on DMLab-30, particularly in tasks requiring long-term memory. We further show significant improvements in Capture the Flag, an environment requiring agents to acquire a complicated mixture of skills over long time scales. In a series of ablation experiments, we probe the importance of each component of the PPR agent, establishing that the entire, novel combination is necessary for this result.
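A minimal sketch of an auxiliary loss in the spirit described above, assuming the perception and prediction policy heads are regularized toward the reaction (behavior) policy via KL divergence; the exact direction, weighting, and stop-gradient choices are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def ppr_auxiliary_loss(reaction_logits, perception_logits, prediction_logits):
    # Treat the behavior policy from the reaction core as the target distribution.
    target = F.softmax(reaction_logits, dim=-1).detach()
    # KL(target || head) computed via torch's kl_div(log_probs, probs) convention.
    kl_perception = F.kl_div(F.log_softmax(perception_logits, dim=-1), target,
                             reduction="batchmean")
    kl_prediction = F.kl_div(F.log_softmax(prediction_logits, dim=-1), target,
                             reduction="batchmean")
    # Encourages the policy to be expressible from either recent or long-term memory.
    return kl_perception + kl_prediction

logits = [torch.randn(4, 6) for _ in range(3)]  # batch of 4, 6 discrete actions
print(ppr_auxiliary_loss(*logits))
```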
Robots have limited ability to adapt to damage compared with humans and animals. However, robot damage is prevalent in real-world applications, especially for robots deployed in extreme environments, and this fragility greatly limits their widespread adoption. We propose an adversarial reinforcement learning framework that significantly increases robot robustness to joint damage in both manipulation and locomotion tasks. The agent is trained iteratively on the joint-damage cases in which it performs poorly. We validate our algorithm on a three-fingered robot hand and a quadruped robot. The policy can be trained entirely in simulation and deployed directly on a real robot without any fine-tuning, and it achieves high success rates under arbitrary joint-damage cases.
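A hedged sketch of the iterative adversarial loop described above: an adversary selects the joint-damage case under which the current policy performs worst, and training then includes that case. The damage model (a zeroed, unactuated joint) and all names are illustrative assumptions.

```python
import numpy as np

def apply_damage(action, damaged_joint):
    # Model damage as a zeroed (unactuated) joint; other damage models are possible.
    action = action.copy()
    if damaged_joint is not None:
        action[damaged_joint] = 0.0
    return action

def worst_case_joint(evaluate, num_joints):
    # evaluate(j) -> average return of the current policy with joint j damaged.
    returns = [evaluate(j) for j in range(num_joints)]
    return int(np.argmin(returns))

# Toy evaluation standing in for simulated rollouts: the policy is weakest when joint 2 fails.
toy_eval = lambda j: 100.0 - 30.0 * (j == 2)
print(worst_case_joint(toy_eval, num_joints=4))       # -> 2
print(apply_damage(np.ones(4), damaged_joint=2))      # -> [1. 1. 0. 1.]
```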
Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which do not reflect the constraints of learning in real environments, deep RL has also shown promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as embodied agents in the real world. Learning to perceive and move in the real world presents numerous challenges, some easier to address than others, and some that are rarely considered in RL research focused only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building on these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource for both roboticists and machine learning researchers interested in furthering the progress of deep RL in the real world.