No Arabic abstract
ROBEL is an open-source platform of cost-effective robots designed for reinforcement learning in the real world. ROBEL introduces two robots, each aimed to accelerate reinforcement learning research in different task domains: DClaw is a three-fingered hand robot that facilitates learning dexterous manipulation tasks, and DKitty is a four-legged robot that facilitates learning agile legged locomotion tasks. These low-cost, modular robots are easy to maintain and are robust enough to sustain on-hardware reinforcement learning from scratch with over 14000 training hours registered on them to date. To leverage this platform, we propose an extensible set of continuous control benchmark tasks for each robot. These tasks feature dense and sparse task objectives, and additionally introduce score metrics as hardware-safety. We provide benchmark scores on an initial set of tasks using a variety of learning-based methods. Furthermore, we show that these results can be replicated across copies of the robots located in different institutions. Code, documentation, design files, detailed assembly instructions, final policies, baseline details, task videos, and all supplementary materials required to reproduce the results are available at www.roboticsbenchmarks.org.
The Amazon Robotics Challenge enlisted sixteen teams to each design a pick-and-place robot for autonomous warehousing, addressing development in robotic vision and manipulation. This paper presents the design of our custom-built, cost-effective, Cartesian robot system Cartman, which won first place in the competition finals by stowing 14 (out of 16) and picking all 9 items in 27 minutes, scoring a total of 272 points. We highlight our experience-centred design methodology and key aspects of our system that contributed to our competitiveness. We believe these aspects are crucial to building robust and effective robotic systems.
A key challenge in Imitation Learning (IL) is that optimal state actions demonstrations are difficult for the teacher to provide. For example in robotics, providing kinesthetic demonstrations on a robotic manipulator requires the teacher to control multiple degrees of freedom at once. The difficulty of requiring optimal state action demonstrations limits the space of problems where the teacher can provide quality feedback. As an alternative to state action demonstrations, the teacher can provide corrective feedback such as their preferences or rewards. Prior work has created algorithms designed to learn from specific types of noisy feedback, but across teachers and tasks different forms of feedback may be required. Instead we propose that in order to learn from a diversity of scenarios we need to learn from a variety of feedback. To learn from a variety of feedback we make the following insight: the teachers cost function is latent and we can model a stream of feedback as a stream of loss functions. We then use any online learning algorithm to minimize the sum of these losses. With this insight we can learn from a diversity of feedback that is weakly correlated with the teachers true cost function. We unify prior work into a general corrective feedback meta-algorithm and show that regardless of feedback we can obtain the same regret bounds. We demonstrate our approach by learning to perform a household navigation task on a robotic racecar platform. Our results show that our approach can learn quickly from a variety of noisy feedback.
We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.
The last half-decade has seen a steep rise in the number of contributions on safe learning methods for real-world robotic deployments from both the control and reinforcement learning communities. This article provides a concise but holistic review of the recent advances made in using machine learning to achieve safe decision making under uncertainties, with a focus on unifying the language and frameworks used in control theory and reinforcement learning research. Our review includes: learning-based control approaches that safely improve performance by learning the uncertain dynamics, reinforcement learning approaches that encourage safety or robustness, and methods that can formally certify the safety of a learned control policy. As data- and learning-based robot control methods continue to gain traction, researchers must understand when and how to best leverage them in real-world scenarios where safety is imperative, such as when operating in close proximity to humans. We highlight some of the open challenges that will drive the field of robot learning in the coming years, and emphasize the need for realistic physics-based benchmarks to facilitate fair comparisons between control and reinforcement learning approaches.
Biological studies show that hummingbirds can perform extreme aerobatic maneuvers during fast escape. Given a sudden looming visual stimulus at hover, a hummingbird initiates a fast backward translation coupled with a 180-degree yaw turn, which is followed by instant posture stabilization in just under 10 wingbeats. Consider the wingbeat frequency of 40Hz, this aggressive maneuver is carried out in just 0.2 seconds. Inspired by the hummingbirds near-maximal performance during such extreme maneuvers, we developed a flight control strategy and experimentally demonstrated that such maneuverability can be achieved by an at-scale 12-gram hummingbird robot equipped with just two actuators. The proposed hybrid control policy combines model-based nonlinear control with model-free reinforcement learning. We use model-based nonlinear control for nominal flight control, as the dynamic model is relatively accurate for these conditions. However, during extreme maneuver, the modeling error becomes unmanageable. A model-free reinforcement learning policy trained in simulation was optimized to destabilize the system and maximize the performance during maneuvering. The hybrid policy manifests a maneuver that is close to that observed in hummingbirds. Direct simulation-to-real transfer is achieved, demonstrating the hummingbird-like fast evasive maneuvers on the at-scale hummingbird robot.