We propose a method for sim-to-real robot learning which exploits simulator state information in a way that scales to many objects. We first train a pair of encoder networks to capture multi-object state information in a latent space. One of these encoders is a CNN, which enables our system to operate on RGB images in the real world; the other is a graph neural network (GNN) state encoder, which directly consumes a set of raw object poses and enables more accurate reward calculation and value estimation. Once trained, we use these encoders in a reinforcement learning algorithm to train image-based policies that can manipulate many objects. We evaluate our method on the task of pushing a collection of objects to desired tabletop regions. Compared to methods which rely only on images or use fixed-length state encodings, our method achieves higher success rates, performs well in the real world without fine-tuning, and generalizes to different numbers and types of objects not seen during training.
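The key property of the state encoder above is that it consumes a *set* of object poses and produces a fixed-size latent regardless of object count or ordering, which is what lets the learned policy generalize to unseen numbers of objects. As a minimal sketch of that idea, here is a permutation-invariant set encoder in NumPy (a Deep Sets-style stand-in for the paper's GNN); all names, sizes, and weights are illustrative, not the authors' implementation.

```python
import numpy as np

def set_encoder(poses, W1, W2):
    """Permutation-invariant encoder over object poses.

    poses: (n_objects, pose_dim) array of raw object poses.
    Each pose is encoded independently, then mean-pooled, so the
    latent depends neither on object ordering nor on object count.
    """
    h = np.tanh(poses @ W1)       # per-object features
    pooled = h.mean(axis=0)       # symmetric pooling over the set
    return np.tanh(pooled @ W2)   # fixed-size shared latent

rng = np.random.default_rng(0)
pose_dim, hidden, latent = 3, 16, 8          # illustrative sizes
W1 = rng.normal(size=(pose_dim, hidden))
W2 = rng.normal(size=(hidden, latent))

poses = rng.normal(size=(5, pose_dim))       # five objects on the table
z = set_encoder(poses, W1, W2)

# Shuffling the objects leaves the latent unchanged.
z_shuffled = set_encoder(poses[::-1], W1, W2)
assert np.allclose(z, z_shuffled)
```

In the paper's setting, a CNN over RGB images would be trained to map into the same latent space as this pose encoder, so that the image-based policy can be deployed in the real world where raw poses are unavailable.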
There is a large variety of objects and appliances in human environments, such as stoves, coffee dispensers, juice extractors, and so on. It is challenging for a roboticist to program a robot for each of these object types and for each of their instances.
Well-structured visual representations can make robot learning faster and can improve generalization. In this paper, we study how we can acquire effective object-centric representations for robotic manipulation tasks without human labeling by using a
We propose HyperDynamics, a dynamics meta-learning framework that conditions on an agent's interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties of
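The mechanism described here is a hypernetwork: one network consumes a summary of the agent's interactions and emits the weights of a second, per-environment dynamics model. A minimal NumPy sketch of that idea follows; the sizes, the linear dynamics model, and the single-layer hypernetwork are all simplifying assumptions for illustration, not the HyperDynamics architecture itself.

```python
import numpy as np

rng = np.random.default_rng(1)
ctx_dim, sa_dim, s_dim = 6, 7, 4     # illustrative sizes

# Hypernetwork weights: map interaction context -> flat dynamics weights.
H = rng.normal(size=(ctx_dim, sa_dim * s_dim)) * 0.1

def generate_dynamics(context):
    """Generate the weights of a linear dynamics model from an
    interaction summary (e.g., pooled features of past transitions)."""
    flat = np.tanh(context @ H)
    return flat.reshape(sa_dim, s_dim)

def predict_next_state(state, action, W):
    """Apply the generated model to a (state, action) pair."""
    sa = np.concatenate([state, action])
    return sa @ W

context = rng.normal(size=ctx_dim)   # stands in for inferred env properties
W = generate_dynamics(context)
state = rng.normal(size=s_dim)
action = rng.normal(size=sa_dim - s_dim)
s_next = predict_next_state(state, action, W)

# Different interaction histories yield different dynamics models.
W_other = generate_dynamics(rng.normal(size=ctx_dim))
assert not np.allclose(W, W_other)
```

The design point this illustrates is that the dynamics model's parameters are an *output* of the meta-learner rather than fixed weights, so adapting to a new environment requires only a new context vector, not gradient updates to the dynamics model.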
Humans are adept at learning new tasks by watching a few instructional videos. On the other hand, robots that learn new actions either require a lot of effort through trial and error, or use expert demonstrations that are challenging to obtain. In th
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state, where the ability to reason about spatial relationships among object entities from raw sensor inputs