Programmable Agents

56 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Misha Denil

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Misha Denil - Sergio Gomez Colmenarejo - Serkan Cabi

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We build deep RL agents that execute declarative programs expressed in formal language. The agents learn to ground the terms in this language in their environment, and can generalize their behavior at test time to execute new programs that refer to objects that were not referenced during training. The agents develop disentangled interpretable representations that allow them to generalize to a wide variety of zero-shot semantic tasks.

قيم البحث

302 - Joel Z. Leibo , Cyprien de Masson dAutume , Daniel Zoran 2018

Psychlab is a simulated psychology laboratory inside the first-person 3D game world of DeepMind Lab (Beattie et al. 2016). Psychlab enables implementations of classical laboratory psychological experiments so that they work with both human and artifi cial agents. Psychlab has a simple and flexible API that enables users to easily create their own tasks. As examples, we are releasing Psychlab implementations of several classical experimental paradigms including visual search, change detection, random dot motion discrimination, and multiple object tracking. We also contribute a study of the visual psychophysics of a specific state-of-the-art deep reinforcement learning agent: UNREAL (Jaderberg et al. 2016). This study leads to the surprising conclusion that UNREAL learns more quickly about larger target stimuli than it does about smaller stimuli. In turn, this insight motivates a specific improvement in the form of a simple model of foveal vision that turns out to significantly boost UNREALs performance, both on Psychlab tasks, and on standard DeepMind Lab tasks. By open-sourcing Psychlab we hope to facilitate a range of future such studies that simultaneously advance deep reinforcement learning and improve its links with cognitive science.

الذكاء الاصطناعي الحوسبة العصبية والتطورية الخلايا العصبية والإدراك

Policy Supervectors: General Characterization of Agents by their Behaviour

73 - Anssi Kanervisto , Tomi Kinnunen , Ville Hautamaki 2020

By studying the underlying policies of decision-making agents, we can learn about their shortcomings and potentially improve them. Traditionally, this has been done either by examining the agents implementation, its behaviour while it is being execut ed, its performance with a reward/fitness function or by visualizing the density of states the agent visits. However, these methods fail to describe the policys behaviour in complex, high-dimensional environments or do not scale to thousands of policies, which is required when studying training algorithms. We propose policy supervectors for characterizing agents by the distribution of states they visit, adopting successful techniques from the area of speech technology. Policy supervectors can characterize policies regardless of their design philosophy (e.g. rule-based vs. neural networks) and scale to thousands of policies on a single workstation machine. We demonstrate methods applicability by studying the evolution of policies during reinforcement learning, evolutionary training and imitation learning, providing insight on e.g. how the search space of evolutionary algorithms is also reflected in agents behaviour, not just in the parameters.

الذكاء الاصطناعي الحوسبة العصبية والتطورية

Meta-trained agents implement Bayes-optimal agents

185 - Vladimir Mikulik , Gregoire Deletang , Tom McGrath 2020

Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises a gents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently dont possess tractable models.

الذكاء الاصطناعي التعلم الآلي الحوسبة العصبية والتطورية

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?

88 - Maithra Raghu , Alex Irpan , Jacob Andreas 2017

Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose indi vidual actions against such a characterization. Here we consider a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer, and we propose their use as environments for evaluating and comparing different approaches to reinforcement learning. These games have a number of appealing features: they are challenging for current learning approaches, but they form (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyse multiagent play, and even develop a self play algorithm. Code can be found at: https://github.com/rubai5/ESS_Game

الذكاء الاصطناعي الحوسبة العصبية والتطورية التعلم الالي

Being curious about the answers to questions: novelty search with learned attention

226 - Nicholas Guttenberg , Martin Biehl , Nathaniel Virgo 2018

We investigate the use of attentional neural network layers in order to learn a `behavior characterization which can be used to drive novelty search and curiosity-based policies. The space is structured towards answering a particular distribution of questions, which are used in a supervised way to train the attentional neural network. We find that in a 2d exploration task, the structure of the space successfully encodes local sensory-motor contingencies such that even a greedy local `do the most novel action policy with no reinforcement learning or evolution can explore the space quickly. We also apply this to a high/low number guessing game task, and find that guessing according to the learned attention profile performs active inference and can discover the correct number more quickly than an exact but passive approach.

الذكاء الاصطناعي الحوسبة العصبية والتطورية التعلم الالي