Robust Visual Domain Randomization for Reinforcement Learning

Published by William Clements

Publication date: 2019

Research field: Informatics Engineering

Paper language: English

Authors: Reda Bahi Slaoui, William R. Clements, Jakob N. Foerster

Machine learning, Artificial intelligence

Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.
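The regularization idea in the abstract can be illustrated with a small sketch. It assumes the penalty is a squared distance between the features of a reference observation and those of a visually randomized copy, added to the usual RL loss with a weight `lam`. The linear encoder, the `randomize_color` perturbation, and all names are hypothetical stand-ins, not the authors' implementation.

```python
import random


def randomize_color(obs, rng):
    """Hypothetical visual randomization: scale all pixels by one random gain."""
    gain = rng.uniform(0.5, 1.5)
    return [p * gain for p in obs]


def features(obs, weights):
    """Toy linear encoder (an assumption): one feature per weight row."""
    return [sum(w * p for w, p in zip(row, obs)) for row in weights]


def invariance_penalty(obs, weights, rng):
    """Squared distance between features of the reference and randomized views."""
    ref = features(obs, weights)
    rnd = features(randomize_color(obs, rng), weights)
    return sum((a - b) ** 2 for a, b in zip(ref, rnd))


def total_loss(rl_loss, obs, weights, rng, lam=1.0):
    """RL loss on the reference domain plus the weighted invariance penalty."""
    return rl_loss + lam * invariance_penalty(obs, weights, rng)
```

In this sketch the agent only ever acts in the reference domain; the randomized view enters the loss solely through the penalty term.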

Melissa Mozifian, Juan Camilo Gamboa Higuera, David Meger (2019)

Domain randomization (DR) is a successful technique for learning robust policies for robot systems when the dynamics of the target robot system are unknown. The success of policies trained with domain randomization, however, is highly dependent on the correct selection of the randomization distribution. The majority of success stories typically use real-world data in order to carefully select the DR distribution, or incorporate real-world trajectories to better estimate appropriate randomization distributions. In this paper, we consider the problem of finding good domain randomization parameters for simulation, without prior access to data from the target system. We explore the use of gradient-based search methods to learn a domain randomization with the following properties: 1) the trained policy should be successful in environments sampled from the domain randomization distribution; 2) the domain randomization distribution should be wide enough that experience similar to that of the target robot system is observed during training, while addressing the practicality of training finite-capacity models. These two properties aim to ensure that the trajectories encountered in the target system are close to those observed during training, as existing methods in machine learning are better suited for interpolation than extrapolation. We show how adapting the domain randomization distribution while training context-conditioned policies results in improvements in jump-start and asymptotic performance when transferring a learned policy to the target environment.

Machine learning, Robotics

Towards robust and domain agnostic reinforcement learning competitions

William Hebgen Guss, Stephanie Milani, Nicholay Topin (2021)

Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state of the art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to be reproducible. In this paper, we present a new framework of competition design that promotes the development of algorithms that overcome these barriers. We propose four central mechanisms for achieving this end: submission retraining, domain randomization, desemantization through domain obfuscation, and the limitation of competition compute and environment-sample budget. To demonstrate the efficacy of this design, we proposed, organized, and ran the MineRL 2020 Competition on Sample-Efficient Reinforcement Learning. In this work, we describe the organizational outcomes of the competition and show that the resulting participant submissions are reproducible, non-specific to the competition environment, and sample/resource efficient, despite the difficult competition task.

Machine learning, Artificial intelligence, Neural and evolutionary computing

How to pick the domain randomization parameters for sim-to-real transfer of reinforcement learning policies?

Quan Vuong, Sharad Vikram, Hao Su (2019)

Recently, reinforcement learning (RL) algorithms have demonstrated remarkable success in learning complicated behaviors from minimally processed input. However, most of this success is limited to simulation. While there are promising successes in applying RL algorithms directly on real systems, their performance on more complex systems remains bottlenecked by the relative data inefficiency of RL algorithms. Domain randomization is a promising direction of research that has demonstrated impressive results using RL algorithms to control real robots. At a high level, domain randomization works by training a policy on a distribution of environmental conditions in simulation. If the environments are diverse enough, then the policy trained on this distribution will plausibly generalize to the real world. A human-specified design choice in domain randomization is the form and parameters of the distribution of simulated environments. It is unclear how best to pick the form and parameters of this distribution, and prior work uses hand-tuned distributions. This extended abstract demonstrates that the choice of the distribution plays a major role in the performance of the trained policies in the real world and that the parameters of this distribution can be optimized to maximize the performance of the trained policies in the real world.
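The high-level description of domain randomization above can be sketched as a loop that draws fresh environment parameters at the start of every episode. The uniform form of the distribution and the parameter names `friction` and `mass_scale` are assumptions chosen for illustration; they stand in for the human-specified design choice the abstract refers to.

```python
import random
from dataclasses import dataclass


@dataclass
class UniformRange:
    """One randomized simulator parameter, drawn uniformly from [low, high]."""
    low: float
    high: float

    def sample(self, rng):
        return rng.uniform(self.low, self.high)


def sample_environment(distribution, rng):
    """Draw one concrete environment configuration from the distribution."""
    return {name: r.sample(rng) for name, r in distribution.items()}


def run_training(distribution, num_episodes, seed=0):
    """Sample new parameters per episode; the policy update itself is omitted."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_episodes):
        params = sample_environment(distribution, rng)
        # A real loop would build the simulator from params and train here.
        history.append(params)
    return history


# Hypothetical hand-tuned distribution over two simulator parameters.
distribution = {
    "friction": UniformRange(0.5, 1.5),
    "mass_scale": UniformRange(0.8, 1.2),
}
episodes = run_training(distribution, num_episodes=5)
```

Optimizing the distribution, as the abstract proposes, would amount to adjusting the `low` and `high` bounds (or a different parametric form) based on real-world performance rather than fixing them by hand.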

Machine learning, Artificial intelligence

Robust Reinforcement Learning for Continuous Control with Model Misspecification

Daniel J. Mankowitz, Nir Levine, Rae Jeong (2019)

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst-case expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework. These include an adaptation to another continuous control RL algorithm as well as learning the uncertainty set from offline data. Performance videos can be found online at https://sites.google.com/view/robust-rl.
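The difference between a worst-case and a soft-robust objective can be seen in a one-step tabular backup. This is a simplified sketch only: it omits the entropy regularization and the MPO machinery of the paper, treats the uncertainty set as a finite list of candidate transition models, and assumes that "soft-robust" means a weighted average over those models. All function names are hypothetical.

```python
def expected_value(reward, gamma, next_probs, values):
    """Standard one-step backup under a single transition model."""
    return reward + gamma * sum(p * v for p, v in zip(next_probs, values))


def robust_backup(reward, gamma, uncertainty_set, values):
    """Worst case over every candidate transition model."""
    return min(expected_value(reward, gamma, p, values) for p in uncertainty_set)


def soft_robust_backup(reward, gamma, uncertainty_set, values, model_weights):
    """Weighted average over candidate models (less conservative)."""
    return sum(
        w * expected_value(reward, gamma, p, values)
        for w, p in zip(model_weights, uncertainty_set)
    )


# Two next states with values 0 and 10, and two candidate transition models.
values = [0.0, 10.0]
uncertainty_set = [[1.0, 0.0], [0.0, 1.0]]
worst = robust_backup(1.0, 0.5, uncertainty_set, values)
soft = soft_robust_backup(1.0, 0.5, uncertainty_set, values, [0.5, 0.5])
```

Here the worst-case backup takes the value of the pessimistic model, while the soft-robust backup lands between the two models, which is why it is the less conservative of the two.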

Machine learning, Artificial intelligence

Robusta: Robust AutoML for Feature Selection via Reinforcement Learning

Xiaoyang Wang, Bo Li, Yibo Zhang (2021)

Several AutoML approaches have been proposed to automate the machine learning (ML) process, such as searching for ML model architectures and hyper-parameters. However, these AutoML pipelines focus only on improving the learning accuracy on benign samples while ignoring the robustness of the ML model under adversarial attacks. As ML systems are increasingly used in a variety of mission-critical applications, improving their robustness has become of utmost importance. In this paper, we propose the first robust AutoML framework, Robusta, which is based on reinforcement learning (RL) and performs feature selection, aiming to select features that lead to both accurate and robust ML systems. We show that a variation of the 0-1 robust loss can be directly optimized via an RL-based combinatorial search in the feature selection scenario. In addition, we employ heuristics to accelerate the search procedure based on feature scoring metrics, namely mutual information scores, tree-based classifier feature importance scores, F scores, and Integrated Gradient (IG) scores, as well as their combinations. We conduct extensive experiments and show that the proposed framework is able to improve model robustness by up to 22% while maintaining competitive accuracy on benign samples compared with other feature selection methods.

Machine learning, Artificial intelligence