How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

76 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Quan Vuong

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Quan Vuong - Sharad Vikram - Hao Su

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottle-necked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how to the best pick the form and parameters of this distribution and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameter of this distribution can be optimized to maximize the performance of the trained policies in the real world

قيم البحث

75 - Reda Bahi Slaoui , William R. Clements , Jakob N. Foerster 2019

Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training e pisode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.

التعلم الآلي الذكاء الاصطناعي

Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey

218 - Wenshuai Zhao , Jorge Pe~na Queralta , Tomi Westerlund 2020

Deep reinforcement learning has recently seen huge success across multiple areas in the robotics domain. Owing to the limitations of gathering real-world data, i.e., sample inefficiency and the cost of collecting it, simulation environments are utili zed for training the different agents. This not only aids in providing a potentially infinite data source, but also alleviates safety concerns with real robots. Nonetheless, the gap between the simulated and real worlds degrades the performance of the policies once the models are transferred into real robots. Multiple research efforts are therefore now being directed towards closing this sim-to-real gap and accomplish more efficient policy transfer. Recent years have seen the emergence of multiple methods applicable to different domains, but there is a lack, to the best of our knowledge, of a comprehensive review summarizing and putting into context the different methods. In this survey paper, we cover the fundamental background behind sim-to-real transfer in deep reinforcement learning and overview the main methods being utilized at the moment: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation. We categorize some of the most relevant recent works, and outline the main application scenarios. Finally, we discuss the main opportunities and challenges of the different approaches and point to the most promising directions.

التعلم الآلي علم الروبوتات

AdaRL: What, Where, and How to Adapt in Transfer Reinforcement Learning

84 - Biwei Huang , Fan Feng , Chaochao Lu 2021

Most approaches in reinforcement learning (RL) are data-hungry and specific to fixed environments. In this paper, we propose a principled framework for adaptive RL, called AdaRL, that adapts reliably to changes across domains. Specifically, we constr uct a generative environment model for the structural relationships among variables in the system and embed the changes in a compact way, which provides a clear and interpretable picture for locating what and where the changes are and how to adapt. Based on the environment model, we characterize a minimal set of representations, including both domain-specific factors and domain-shared state representations, that suffice for reliable and low-cost transfer. Moreover, we show that by explicitly leveraging a compact representation to encode changes, we can adapt the policy with only a few samples without further policy optimization in the target domain. We illustrate the efficacy of AdaRL through a series of experiments that allow for changes in different components of Cartpole and Atari games.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Learning Domain Randomization Distributions for Training Robust Locomotion Policies

109 - Melissa Mozifian , Juan Camilo Gamboa Higuera , David Meger 2019

Domain randomization (DR) is a successful technique for learning robust policies for robot systems, when the dynamics of the target robot system are unknown. The success of policies trained with domain randomization however, is highly dependent on th e correct selection of the randomization distribution. The majority of success stories typically use real world data in order to carefully select the DR distribution, or incorporate real world trajectories to better estimate appropriate randomization distributions. In this paper, we consider the problem of finding good domain randomization parameters for simulation, without prior access to data from the target system. We explore the use of gradient-based search methods to learn a domain randomization with the following properties: 1) The trained policy should be successful in environments sampled from the domain randomization distribution 2) The domain randomization distribution should be wide enough so that the experience similar to the target robot system is observed during training, while addressing the practicality of training finite capacity models. These two properties aim to ensure the trajectories encountered in the target system are close to those observed during training, as existing methods in machine learning are better suited for interpolation than extrapolation. We show how adapting the domain randomization distribution while training context-conditioned policies results in improvements on jump-start and asymptotic performance when transferring a learned policy to the target environment.

التعلم الآلي علم الروبوتات التعلم الالي

DIRL: Domain-Invariant Representation Learning for Sim-to-Real Transfer

73 - Ajay Kumar Tanwani 2020

Generating large-scale synthetic data in simulation is a feasible alternative to collecting/labelling real data for training vision-based deep learning models, albeit the modelling inaccuracies do not generalize to the physical world. In this paper, we present a domain-invariant representation learning (DIRL) algorithm to adapt deep models to the physical environment with a small amount of real data. Existing approaches that only mitigate the covariate shift by aligning the marginal distributions across the domains and assume the conditional distributions to be domain-invariant can lead to ambiguous transfer in real scenarios. We propose to jointly align the marginal (input domains) and the conditional (output labels) distributions to mitigate the covariate and the conditional shift across the domains with adversarial learning, and combine it with a triplet distribution loss to make the conditional distributions disjoint in the shared feature space. Experiments on digit domains yield state-of-the-art performance on challenging benchmarks, while sim-to-real transfer of object recognition for vision-based decluttering with a mobile robot improves from 26.8 % to 91.0 %, resulting in 86.5 % grasping accuracy of a wide variety of objects. Code and supplementary details are available at https://sites.google.com/view/dirl

الرؤية الحاسوبية وتمييز الأنماط علم الروبوتات