We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies. We use this framework to introduce operator-based …
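As a toy illustration of this $(\mathcal{I}, \mathcal{P})$ decomposition, the sketch below alternates an exponentiated improvement step with a KL projection onto softmax policies on a single-state problem. The action values, step sizes, and choice of operators are assumptions made for illustration; this is not the paper's exact construction.

```python
# Minimal sketch of repeated application of P o I on a 3-action, single-state
# problem. I reweights the policy by exp(eta * Q); P projects the improved
# distribution back onto softmax policies by minimizing KL over the logits.
import numpy as np

Q = np.array([1.0, 0.5, -0.2])          # assumed action values

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def improve(pi, eta=1.0):
    """Improvement operator I: exponentiated reweighting by Q."""
    w = pi * np.exp(eta * Q)
    return w / w.sum()

def project(target, theta, steps=200, lr=0.5):
    """Projection operator P: argmin_theta KL(target || softmax(theta))."""
    for _ in range(steps):
        # gradient of KL(target || softmax(theta)) w.r.t. logits is pi - target
        theta = theta - lr * (softmax(theta) - target)
    return theta

theta = np.zeros(3)
for _ in range(10):                      # repeated application of P o I
    theta = project(improve(softmax(theta)), theta)
print(softmax(theta))                    # mass concentrates on the best action
```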
We use functional mirror ascent to propose a general framework (referred to as FMA-PG) for designing policy gradient methods. The functional perspective distinguishes between a policy's functional representation (what are its sufficient statistics) and its parameterization …
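A minimal sketch of the generic functional-mirror-ascent pattern, assuming a bandit objective $J(\pi) = \langle \pi, r \rangle$ with made-up rewards and a softmax policy: take a mirror ascent step on the policy's functional representation (here, its distribution over actions), then re-fit the parameters to the updated target. This illustrates the two-level structure only, not the exact FMA-PG surrogate.

```python
# Functional step (negative-entropy mirror map) followed by a parameter fit.
import numpy as np

r = np.array([0.2, 1.0, 0.4])            # assumed per-action rewards

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros(3)                      # policy parameters (logits)
eta = 0.5                                # functional step size
for _ in range(20):
    pi = softmax(theta)
    # functional mirror ascent: target proportional to pi * exp(eta * dJ/dpi),
    # where dJ/dpi = r for the linear bandit objective
    target = pi * np.exp(eta * r)
    target /= target.sum()
    # parameter step: fit theta by minimizing KL(target || softmax(theta));
    # exact here because the logits are unconstrained
    for _ in range(100):
        theta -= 0.5 * (softmax(theta) - target)
print(softmax(theta))
```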
We study the use of policy gradient algorithms to optimize over a class of generalized Thompson sampling policies. Our central insight is to view the posterior parameter sampled by Thompson sampling as a kind of pseudo-action. Policy gradient methods …
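A minimal sketch of the pseudo-action idea under assumed distributions: the sampled parameter plays the role of an action, so a score-function (REINFORCE-style) gradient can be taken through its sampling distribution. The Gaussian sampler, bandit rewards, and learning rates below are all illustrative assumptions, not the paper's setup.

```python
# Score-function gradient through the parameter-sampling distribution of a
# Thompson-sampling-style policy on a two-armed bandit.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.9])            # assumed true arm means

mu, log_std = np.zeros(2), np.zeros(2)       # learnable sampler parameters
baseline, lr = 0.0, 0.05
for _ in range(3000):
    std = np.exp(log_std)
    tilde = mu + std * rng.standard_normal(2)  # pseudo-action: sampled parameter
    arm = int(np.argmax(tilde))                # TS rule: act greedily on the sample
    reward = true_means[arm] + 0.1 * rng.standard_normal()
    z = (tilde - mu) / std                     # score of the Gaussian sampler
    adv = reward - baseline                    # baseline reduces variance
    mu += lr * adv * (z / std)                 # REINFORCE on the pseudo-action
    log_std += lr * adv * (z * z - 1.0)
    baseline += 0.01 * (reward - baseline)
print(mu, np.exp(log_std))
```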
Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some …
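To make the misspecification point concrete, here is a sketch contrasting likelihood-style model fitting with one possible decision-aware alternative: a value-weighted loss that penalizes model errors only along directions that change a fixed value function. The linear model class, nonlinear true dynamics, and value weights are all assumptions for illustration; the paper's actual objective may differ.

```python
# Under a misspecified (linear) model class, an MSE/likelihood-style fit and a
# value-aware fit generally select different models.
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((500, 2))
true_next = S @ np.array([[0.9, 0.2], [0.0, -0.5]]) + 0.1 * S**2  # nonlinear
v = np.array([1.0, 0.0])                 # assumed value weights: V(s) = v @ s

def fit(loss_grad, steps=500, lr=0.1):
    A = np.zeros((2, 2))                 # misspecified class: linear models only
    for _ in range(steps):
        A -= lr * loss_grad(A)
    return A

# likelihood-style: mean squared error on the full next state
mse_grad = lambda A: 2 * S.T @ (S @ A - true_next) / len(S)
# decision-aware: squared error of the predicted next-state *value* only
va_grad = lambda A: 2 * S.T @ (((S @ A - true_next) @ v)[:, None] * v) / len(S)

print("MSE-style model:\n", fit(mse_grad))
print("value-aware model:\n", fit(va_grad))
```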
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies. Their primary …
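A minimal sketch of a direct policy gradient method, assuming a two-armed bandit with made-up reward means: REINFORCE performs model-free stochastic gradient ascent directly on the expected return of a softmax policy, with a running baseline for variance reduction.

```python
# REINFORCE on a two-armed bandit with a softmax policy.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.2, 0.8])              # assumed arm reward means

theta = np.zeros(2)
baseline, lr = 0.0, 0.1
for _ in range(3000):
    p = np.exp(theta - theta.max())
    p /= p.sum()
    a = rng.choice(2, p=p)                # sample action (model free)
    r = means[a] + 0.1 * rng.standard_normal()
    grad_logp = -p
    grad_logp[a] += 1.0                   # d log pi(a) / d theta for softmax
    theta += lr * (r - baseline) * grad_logp
    baseline += 0.05 * (r - baseline)     # running baseline reduces variance
print(p)                                  # concentrates on the better arm
```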
We build a theoretical framework for designing and understanding practical meta-learning methods that integrates sophisticated formalizations of task-similarity with the extensive literature on online convex optimization and sequential prediction algorithms.
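As a concrete instance of the kind of gradient-based meta-learning method such a framework analyzes, here is a Reptile-style sketch on synthetic quadratic tasks: an outer online update moves a shared initialization toward each task's within-task solution, which helps when tasks are similar (their optima cluster together). The tasks, step sizes, and update rule are assumptions for illustration, not the paper's specific algorithm.

```python
# Reptile-style meta-learning on toy quadratic tasks arriving online.
import numpy as np

rng = np.random.default_rng(0)
# similar tasks: optima clustered around a common center
task_centers = rng.standard_normal((20, 2)) * 0.3 + np.array([2.0, -1.0])

phi = np.zeros(2)                          # meta-initialization
for c in task_centers:                     # tasks arrive one at a time
    w = phi.copy()
    for _ in range(10):                    # within-task: minimize ||w - c||^2
        w -= 0.2 * 2 * (w - c)
    phi += 0.5 * (w - phi)                 # outer step toward the task solution
print(phi, task_centers.mean(axis=0))      # phi approaches the task mean
```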