We propose an active learning architecture for robots, capable of organizing its learning process to achieve a field of complex tasks by learning sequences of motor policies, called Intrinsically Motivated Procedure Babbling (IM-PB). The learner can generalize over its experience to continuously learn new tasks. It actively chooses what and how to learn based on empirical measures of its own progress. In this paper, we consider the learning of a set of interrelated task outcomes that are hierarchically organized. We introduce a framework called procedures, which are sequences of policies defined by combining previously learned skills. Our algorithmic architecture uses procedures to autonomously discover how to combine simple skills to achieve complex goals. It actively chooses between two strategies of goal-directed exploration: exploration of the policy space or of the procedural space. We show in a simulated environment that our new architecture is capable of tackling the learning of complex motor policies and of adapting the complexity of its policies to the task at hand. We also show that our procedures framework helps the learner to tackle difficult hierarchical tasks.
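To make the active strategy choice concrete, here is a minimal sketch of progress-based selection between the two goal-directed exploration strategies named in the abstract (policy-space vs. procedural-space exploration). The windowed progress estimate, the epsilon-greedy choice, and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import random


class GoalDirectedExplorer:
    """Sketch: pick an exploration strategy from empirical learning progress."""

    STRATEGIES = ("policy_space", "procedural_space")

    def __init__(self, window=10, epsilon=0.2):
        self.window = window    # number of recent outcomes per progress estimate
        self.epsilon = epsilon  # fraction of purely random strategy choices
        # recent goal-reaching errors recorded per strategy
        self.errors = {s: [] for s in self.STRATEGIES}

    def progress(self, strategy):
        """Empirical progress: decrease of goal-reaching error over a window."""
        errs = self.errors[strategy][-2 * self.window:]
        if len(errs) < 2 * self.window:
            return float("inf")  # under-sampled strategies remain attractive
        older = sum(errs[: self.window]) / self.window
        recent = sum(errs[self.window:]) / self.window
        return older - recent    # positive when performance is improving

    def choose_strategy(self):
        """Mostly exploit the strategy with highest measured progress."""
        if random.random() < self.epsilon:
            return random.choice(self.STRATEGIES)
        return max(self.STRATEGIES, key=self.progress)

    def record(self, strategy, error):
        """Store the error between the chosen goal and the reached outcome."""
        self.errors[strategy].append(error)
```

In this reading, each exploration episode reports back an error for the goal it targeted, and the learner keeps sampling whichever space (single policies or procedures combining learned skills) currently yields the fastest improvement.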