
Humans and other intelligent animals evolved highly sophisticated perception systems that combine multiple sensory modalities. State-of-the-art artificial agents, on the other hand, rely mostly on visual inputs or on structured low-dimensional observations provided by instrumented environments. Learning to act based on combined visual and auditory inputs is still a new topic of research that has not been explored beyond simple scenarios. To facilitate progress in this area, we introduce a new version of the ViZDoom simulator to create a highly efficient learning environment that provides raw audio observations. We study the performance of different model architectures in a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary. We are currently in the process of merging the augmented simulator with the main ViZDoom code repository. Video demonstrations and experiment code can be found at https://sites.google.com/view/sound-rl.
Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains. This results in methods that RL practitioners can use in almost any domain. However, recent studies often lack the engineering steps ("tricks") that may be needed to use RL effectively, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, for achieving state-of-the-art results and winning RL competitions. To ease the engineering effort, we distill descriptions of tricks from state-of-the-art results and study how well these tricks can improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights in multiple domains.
By studying the underlying policies of decision-making agents, we can learn about their shortcomings and potentially improve them. Traditionally, this has been done by examining the agent's implementation, its behaviour while it is being executed, or its performance under a reward/fitness function, or by visualizing the density of states the agent visits. However, these methods fail to describe the policy's behaviour in complex, high-dimensional environments, or do not scale to thousands of policies, which is required when studying training algorithms. We propose policy supervectors for characterizing agents by the distribution of states they visit, adopting successful techniques from the area of speech technology. Policy supervectors can characterize policies regardless of their design philosophy (e.g. rule-based vs. neural networks) and scale to thousands of policies on a single workstation machine. We demonstrate the method's applicability by studying the evolution of policies during reinforcement learning, evolutionary training and imitation learning, providing insight into, e.g., how the search space of evolutionary algorithms is also reflected in the agents' behaviour, not just in the parameters.
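To make the idea of characterizing a policy by the states it visits more concrete, here is a minimal sketch. It is not taken from the paper: it assumes the classical speaker-verification recipe, in which a shared background mixture model is MAP-adapted to each speaker, and here to each policy. The function name `map_adapted_supervector`, the equal-weight isotropic Gaussian components, and the `relevance` factor are illustrative assumptions, and the background means are assumed to have been fitted beforehand on states pooled from all policies.

```python
import math


def map_adapted_supervector(states, ubm_means, variance=1.0, relevance=16.0):
    """Return the MAP-adapted component means, concatenated into one vector.

    states:    state vectors (lists of floats) visited by ONE policy.
    ubm_means: component means of a shared background model
               (hypothetical; assumed to be fitted in advance).
    """
    dim = len(ubm_means[0])
    counts = [0.0] * len(ubm_means)          # soft count per component
    sums = [[0.0] * dim for _ in ubm_means]  # responsibility-weighted sums

    for s in states:
        # Log-likelihood (up to a constant) under each isotropic component.
        logp = [
            -sum((x - m) ** 2 for x, m in zip(s, mu)) / (2.0 * variance)
            for mu in ubm_means
        ]
        top = max(logp)
        weights = [math.exp(lp - top) for lp in logp]  # numerically stable
        total = sum(weights)
        for k, w in enumerate(weights):
            resp = w / total
            counts[k] += resp
            for d in range(dim):
                sums[k][d] += resp * s[d]

    supervector = []
    for k, mu in enumerate(ubm_means):
        # More data assigned to a component pulls its mean further
        # away from the background mean.
        alpha = counts[k] / (counts[k] + relevance)
        for d in range(dim):
            data_mean = sums[k][d] / counts[k] if counts[k] > 0 else mu[d]
            supervector.append(alpha * data_mean + (1.0 - alpha) * mu[d])
    return supervector


def supervector_distance(a, b):
    """Euclidean distance between two supervectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Under these assumptions, two policies that visit similar states produce nearby supervectors, so a plain distance such as `supervector_distance` gives a rough way to compare many policies without inspecting their parameters.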
