
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

Published by: Chen Tessler
Publication date: 2019
Research field: Informatics Engineering
Language: English

We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces. Specifically, we introduce a new compressed sensing algorithm, named IK-OMP, which can be seen as an extension to the Orthogonal Matching Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning setting and show that the combined approach (Sparse Imitation Learning, Sparse-IL) solves the entire text-based game of Zork1 with an action space of approximately 10 million actions given both perfect and noisy demonstrations.
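
The abstract does not spell out IK-OMP itself, so the sketch below shows only plain Orthogonal Matching Pursuit, the algorithm it extends, applied to the kind of recovery problem described: decoding a dense sum-of-word-embeddings representation of a command back into a sparse set of vocabulary words. The dictionary shape, vocabulary size, and word indices are illustrative assumptions, not values from the paper.

```python
import numpy as np

def omp(D, y, sparsity):
    """Plain Orthogonal Matching Pursuit: recover a sparse code x with
    D @ x ~= y, selecting at most `sparsity` dictionary columns.
    D: (d, n) dictionary whose columns are word embeddings (assumption),
    y: (d,) dense embedding of the action sentence to decode."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # Least-squares refit on the selected support.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x[support] = coeffs
    return x, support

# Toy usage: a 50-dim embedding space with a 500-word vocabulary (made-up sizes).
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 500))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary columns
true_words = [3, 42, 101]             # hypothetical indices, e.g. "open", "the", "mailbox"
y = D[:, true_words].sum(axis=1)      # dense sum-of-word-embeddings action
_, recovered = omp(D, y, sparsity=3)
print(sorted(recovered))              # ideally recovers the word indices
```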




Read also

In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress. In this work, we use a curriculum of progressively growing action spaces to accelerate learning. We assume the environment is out of our control, but that the agent may set an internal curriculum by initially restricting its action space. Our approach uses off-policy reinforcement learning to estimate optimal value functions for multiple action spaces simultaneously and efficiently transfers data, value estimates, and state representations from restricted action spaces to the full task. We show the efficacy of our approach in proof-of-concept control tasks and on challenging large-scale StarCraft micromanagement tasks with large, multi-agent action spaces.
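
As a rough illustration of the curriculum idea above (not the authors' implementation), the wrapper below exposes only a growing prefix of a discrete action set on a fixed schedule; the class name, schedule, and greedy selection rule are assumptions.

```python
import numpy as np

class GrowingActionSpace:
    """Restrict a discrete action space to its first k actions and grow k
    over training (a simplified stand-in for an action-space curriculum)."""
    def __init__(self, n_actions, stages=(4, 16, 64), steps_per_stage=10_000):
        self.n_actions = n_actions
        self.stages = [min(s, n_actions) for s in stages] + [n_actions]
        self.steps_per_stage = steps_per_stage
        self.t = 0

    @property
    def active_actions(self):
        stage = min(self.t // self.steps_per_stage, len(self.stages) - 1)
        return self.stages[stage]

    def select(self, q_values):
        """Greedy action restricted to the currently unlocked prefix;
        q_values is assumed to score the full action set."""
        self.t += 1
        k = self.active_actions
        return int(np.argmax(q_values[:k]))

# Usage sketch: replay data and value estimates can be reused as k grows,
# since every restricted space is a subset of the full one.
curriculum = GrowingActionSpace(n_actions=256)
action = curriculum.select(np.random.randn(256))
```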
Transfer learning (TL) is a promising way to improve the sample efficiency of reinforcement learning. However, how to efficiently transfer knowledge across tasks with different state-action spaces remains under-explored. Most previous studies only addressed the inconsistency across different state spaces by learning a common feature space, without considering that similar actions in different action spaces of related tasks share similar semantics. In this paper, we propose a method for learning action embeddings that leverages this idea, and a framework that learns both state embeddings and action embeddings to transfer a policy across tasks with different state and action spaces. Our experimental results on various tasks show that the proposed method can not only learn informative action embeddings but also accelerate policy learning.
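
A minimal sketch of the shared-embedding idea, with made-up shapes and names: actions from different tasks are scored by the similarity between their embeddings and a state-conditioned vector, so the same mapping can be reused when the action set changes.

```python
import numpy as np

def embedding_policy(state_embedding, action_embeddings, W):
    """Score each action by the similarity between a state-conditioned
    'intent' vector and that action's embedding. Tasks with different
    action sets can share W as long as their actions are embedded into
    the same space (shapes and names here are assumptions)."""
    intent = W @ state_embedding              # map state into action-embedding space
    scores = action_embeddings @ intent       # one score per available action
    probs = np.exp(scores - scores.max())     # numerically stable softmax
    return probs / probs.sum()

# Source task with 10 actions, target task with 25, shared 8-dim action space.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))                  # reused across tasks
source_actions = rng.normal(size=(10, 8))
target_actions = rng.normal(size=(25, 8))
s = rng.normal(size=16)
print(embedding_policy(s, source_actions, W).shape)   # (10,)
print(embedding_policy(s, target_actions, W).shape)   # (25,)
```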
Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
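
The following is a generic sample-based policy-improvement step in the spirit described above, not the Sampled MuZero implementation: sample a handful of actions from the current policy, score them with a value estimate, and form an improved distribution over that subset. The function names and the toy task are assumptions.

```python
import numpy as np

def sampled_policy_improvement(sample_action, q_fn, k=20, temperature=1.0):
    """Draw k actions from the current policy, score them with a value
    estimate, and return an improved distribution over that sampled subset.
    Because the samples come from the prior, softmax(Q / temperature) over
    them approximates a target proportional to prior(a) * exp(Q(a) / temperature)."""
    actions = [sample_action() for _ in range(k)]
    q = np.array([q_fn(a) for a in actions])
    w = np.exp((q - q.max()) / temperature)    # numerically stable softmax
    return actions, w / w.sum()

# Toy continuous-control usage: Gaussian proposal and a quadratic "value"
# that prefers actions near the origin (purely illustrative).
rng = np.random.default_rng(2)
actions, weights = sampled_policy_improvement(
    sample_action=lambda: rng.normal(size=2),
    q_fn=lambda a: -float(a @ a),
)
print(actions[int(np.argmax(weights))])        # best of the sampled actions
```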
Multi-agent control problems constitute an interesting area of application for deep reinforcement learning models with continuous action spaces. Such real-world applications, however, typically come with critical safety constraints that must not be violated. In order to ensure safety, we enhance the well-known multi-agent deep deterministic policy gradient (MADDPG) framework by adding a safety layer to the deep policy network. In particular, we extend the idea of linearizing the single-step transition dynamics, as was done for single-agent systems in Safe DDPG (Dalal et al., 2018), to multi-agent settings. We additionally propose to circumvent infeasibility problems in the action correction step using soft constraints (Kerrigan & Maciejowski, 2000). Results from the theory of exact penalty functions can be used to guarantee constraint satisfaction of the soft constraints under mild assumptions. We empirically find that the soft formulation achieves a dramatic decrease in constraint violations, making safety available even during the learning procedure.
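
As a hedged sketch of the soft-constraint correction idea (not the MADDPG safety layer itself), the snippet below pulls a proposed action toward the set defined by linearized constraints using a quadratic penalty; the constraint matrices, penalty weight, and step sizes are assumptions.

```python
import numpy as np

def soft_safety_correction(a_prop, G, g, penalty=10.0, lr=0.02, steps=500):
    """Correct a proposed action so the linearized safety constraints
    G @ a + g <= 0 are softly respected, by gradient descent on
        ||a - a_prop||^2 + penalty * sum(max(0, G @ a + g)^2).
    This is a generic quadratic-penalty sketch of the soft-constraint idea."""
    a = a_prop.copy()
    for _ in range(steps):
        viol = np.maximum(0.0, G @ a + g)            # per-constraint violation
        grad = 2 * (a - a_prop) + 2 * penalty * (G.T @ viol)
        a -= lr * grad
    return a

# One linear constraint in a 2-D action space: a[0] + a[1] <= 1.
G = np.array([[1.0, 1.0]])
g = np.array([-1.0])
a_proposed = np.array([0.9, 0.9])                    # violates the constraint
print(soft_safety_correction(a_proposed, G, g))      # pulled back toward the safe set
```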
Yuhang Gai, Jiuming Guo, Dan Wu (2021)
Reinforcement learning (RL) is often the preferred approach for constructing control strategies for complex tasks, such as asymmetric assembly tasks. However, the slow convergence of reinforcement learning severely restricts its practical application. In this paper, convergence is first accelerated by combining RL with compliance control. Then a novel progressive extension of action dimension (PEAD) mechanism is proposed to improve the convergence of RL algorithms. The PEAD method is verified with DDPG and PPO. The results demonstrate that the PEAD method enhances the data-efficiency and time-efficiency of RL algorithms as well as increasing the stable reward, which provides more potential for the application of RL.
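
A speculative sketch of the progressive-extension idea: early in training only the first few action dimensions are controlled by the policy, with the rest filled by defaults, and more dimensions are unlocked on a schedule. The class name, schedule, and defaults are assumptions, not the paper's exact PEAD mechanism.

```python
import numpy as np

class ProgressiveActionDims:
    """Progressively extend the action dimension: masked dimensions are
    filled with a default value (e.g. the output of a fixed compliance
    controller), and one more dimension is unlocked every few episodes."""
    def __init__(self, full_dim, default_action, episodes_per_dim=200, start_dims=1):
        self.full_dim = full_dim
        self.default_action = np.asarray(default_action, dtype=float)
        self.episodes_per_dim = episodes_per_dim
        self.start_dims = start_dims

    def expand(self, policy_output, episode):
        """Merge the policy's output with defaults for the locked dimensions."""
        k = min(self.full_dim, self.start_dims + episode // self.episodes_per_dim)
        action = self.default_action.copy()
        action[:k] = policy_output[:k]
        return action

# Usage: a 6-DoF assembly action where only 2 dimensions are learned so far.
pead = ProgressiveActionDims(full_dim=6, default_action=np.zeros(6))
print(pead.expand(np.ones(6), episode=250))   # -> [1. 1. 0. 0. 0. 0.]
```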
