What Can Learned Intrinsic Rewards Capture?


Published by Zeyu Zheng

Publication date: 2019

Research field: Informatics Engineering

Language: English

Authors: Zeyu Zheng - Junhyuk Oh - Matteo Hessel

Artificial Intelligence, Machine Learning


The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture how the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing what the agent should strive to do.


83 - Zeyu Zheng, Junhyuk Oh, Satinder Singh 2018

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead-search-based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient-based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for MuJoCo domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.

Artificial Intelligence, Machine Learning
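
The abstract above describes providing additive intrinsic rewards to a policy-gradient learner. A minimal sketch of what "additive" could mean in practice follows. The function names, the mixing weight `lam`, and the fixed intrinsic reward values are all assumptions made for illustration; in the paper the intrinsic reward is learned, and nothing here reproduces the authors' algorithm.

```python
from typing import List


def mix_rewards(extrinsic: List[float], intrinsic: List[float], lam: float) -> List[float]:
    """Add a weighted intrinsic reward to the extrinsic reward at each step.

    `lam` is a hypothetical mixing weight, not a value taken from the paper.
    """
    if len(extrinsic) != len(intrinsic):
        raise ValueError("reward sequences must have the same length")
    return [r_ex + lam * r_in for r_ex, r_in in zip(extrinsic, intrinsic)]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute the discounted return G_t = r_t + gamma * G_{t+1} for every step."""
    returns: List[float] = []
    running = 0.0
    # Walk backwards so each return can reuse the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Illustrative episode: sparse extrinsic reward, placeholder intrinsic signal.
extrinsic = [0.0, 0.0, 1.0]
intrinsic = [0.5, 0.0, 0.0]
mixed = mix_rewards(extrinsic, intrinsic, lam=0.1)
policy_returns = discounted_returns(mixed, gamma=0.5)
```

In this sketch a policy learner would be trained on `policy_returns`, while the baseline agent from the abstract would correspond to using `discounted_returns(extrinsic, gamma)` alone.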

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

215 - Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab 2021

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This results in efficiently and effectively finding a few important visual tokens and enables modeling of pairwise attention between such tokens, over a longer temporal horizon for videos, or the spatial content in images. Our experiments demonstrate strong performance on several challenging benchmarks for both image and video recognition tasks. Importantly, because our tokens are adaptive, we achieve competitive results with significantly less compute.

Computer Vision and Pattern Recognition, Machine Learning

Game Plan: What AI can do for Football, and What Football can do for AI

409 - Karl Tuyls, Shayegan Omidshafiei, Paul Muller 2020

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with the goal of better addressing new scientific challenges involved in the analysis of both individual players and coordinated team behaviors. The research challenges associated with predictive and prescriptive football analytics require new developments and progress at the intersection of statistical learning, game theory, and computer vision. In this paper, we provide an overarching perspective highlighting how the combination of these fields, in particular, forms a unique microcosm for AI research, while offering mutual benefits for professional teams, spectators, and broadcasters in the years to come. We illustrate that this duality makes football analytics a game changer of tremendous value, not only in terms of changing the game of football itself, but also in terms of what this domain can mean for the field of AI. We review the state of the art and exemplify the types of analysis enabled by combining the aforementioned fields, including illustrative examples of counterfactual analysis using predictive models, and the combination of game-theoretic analysis of penalty kicks with statistical learning of player attributes. We conclude by highlighting envisioned downstream impacts, including possibilities for extensions to other sports (real and virtual).

Artificial Intelligence, Computer Science and Game Theory, Multiagent Systems

Fine structure of giant resonances: What can be learned

76 - Peter von Neumann-Cosel 2018

Fine structure of giant resonances (GR) has been established in recent years as a global phenomenon across the nuclear chart and for different types of resonances. A quantitative description of the fine structure in terms of characteristic scales derived by wavelet techniques is discussed. By comparison with microscopic calculations of GR strength distributions, one can extract information on the role of different decay mechanisms contributing to the width of GRs. The observed cross-section fluctuations contain information on the level density (LD) of states with a given spin and parity defined by the multipolarity of the GR.

Nuclear Experiment, Nuclear Theory

Systematic Generalization: What Is Required and Can It Be Learned?

236 - Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch 2018

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected. We furthermore investigate if modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn inappropriate layouts or parametrizations that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.

Computation and Language, Artificial Intelligence