A common strategy for dealing with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn and reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat each policy learning process separately, so all of these individual (composable) policies must be learned before tackling the complex task through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both the compound and the composable policies within the same learning process, by exploiting the off-policy data generated by the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during learning. The experimental results show that the experience collected with the compound policy not only permits solving the complex task but also yields useful composable policies that perform successfully in their corresponding subtasks.
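The abstract does not spell out the composition rule, so the sketch below shows one common way a compound Gaussian can be built from composable Gaussian policies: a precision-weighted product in which the activation vector weights each policy's precision. The function name compose_gaussian_policies, the NumPy implementation, and the example weights are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def compose_gaussian_policies(means, covs, activations):
    # Assumed composition rule (precision-weighted product of Gaussians):
    #   Sigma_c = (sum_k w_k * Sigma_k^{-1})^{-1}
    #   mu_c    = Sigma_c @ sum_k (w_k * Sigma_k^{-1}) @ mu_k
    # where the activation weights w_k are nonnegative and not all zero.
    weighted_precisions = [w * np.linalg.inv(S) for w, S in zip(activations, covs)]
    compound_cov = np.linalg.inv(sum(weighted_precisions))
    compound_mean = compound_cov @ sum(P @ m for P, m in zip(weighted_precisions, means))
    return compound_mean, compound_cov

# Hypothetical example: two 2-D subtask policies combined with activations (0.7, 0.3).
mu1, S1 = np.array([1.0, 0.0]), np.diag([0.5, 1.0])
mu2, S2 = np.array([0.0, 1.0]), np.diag([1.0, 0.5])
mean, cov = compose_gaussian_policies([mu1, mu2], [S1, S2], activations=[0.7, 0.3])
action = np.random.default_rng(0).multivariate_normal(mean, cov)  # sample a compound action

Under this rule, a large activation w_k pulls the compound action toward subtask k, and low-variance (high-precision) policies dominate the combination, which is consistent with the abstract's description of the compound policy as a function of the composable policies' means and covariance matrices.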
Reactive motion generation problems are usually solved by computing actions as a sum of policies. However, these policies are independent of each other and can thus exhibit conflicting behaviors when their contributions are summed together. We introduce …
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each …
Humans are able to perform a myriad of sophisticated tasks by drawing upon skills acquired through prior experience. For autonomous agents to have this capability, they must be able to extract reusable skills from past experience that can be recombined …
A general-purpose intelligent robot must be able to learn autonomously and accomplish multiple tasks in order to be deployed in the real world. However, standard reinforcement learning approaches learn separate task-specific policies …
Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time …