An operator view of policy gradient methods


Abstract

We cast policy gradient methods as the repeated application of two operators: a policy improvement operator $\mathcal{I}$, which maps any policy $\pi$ to a better one $\mathcal{I}\pi$, and a projection operator $\mathcal{P}$, which finds the best approximation of $\mathcal{I}\pi$ in the set of realizable policies. We use this framework to introduce operator-based …
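
The abstract does not spell out $\mathcal{I}$ or $\mathcal{P}$, so the following is only a minimal sketch under our own assumptions, not the paper's construction. It uses a 3-armed bandit with positive rewards and a hypothetical improvement operator $\mathcal{I}\pi(a) \propto \pi(a)\,r(a)$, which cannot lower the expected reward because $\mathbb{E}_{\pi}[r^2]/\mathbb{E}_{\pi}[r] \ge \mathbb{E}_{\pi}[r]$. The projection $\mathcal{P}$ approximately minimizes $\mathrm{KL}(\mathcal{I}\pi \,\|\, \pi_\theta)$ over a one-parameter softmax family $\pi_\theta(a) \propto \exp(\theta\,\phi(a))$, a strict subset of all distributions over three actions. The names `REWARDS`, `FEATURES`, `improve`, `project`, and `operator_iteration` are illustrative.

```python
import math

# Hypothetical toy problem (an assumption, not taken from the paper):
# a 3-armed bandit whose rewards are positive, so reweighting by reward
# is a valid improvement step.
REWARDS = [1.0, 2.0, 3.0]
# Hypothetical scalar feature per action: realizable policies are
# softmax(theta * phi), which cannot represent every distribution.
FEATURES = [0.0, 1.0, 2.0]


def softmax_policy(theta):
    """Realizable policy pi_theta(a) proportional to exp(theta * phi(a))."""
    logits = [theta * f for f in FEATURES]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_return(pi):
    """Expected one-step reward under the action distribution pi."""
    return sum(p * r for p, r in zip(pi, REWARDS))


def improve(pi):
    """Improvement operator I (assumed form): reweight actions by reward."""
    weighted = [p * r for p, r in zip(pi, REWARDS)]
    total = sum(weighted)
    return [w / total for w in weighted]


def project(target, theta, steps=2000, lr=0.5):
    """Projection operator P: approximately argmin_theta KL(target || pi_theta).

    The gradient of KL(target || pi_theta) with respect to theta is
    E_{pi_theta}[phi] - E_{target}[phi], so plain gradient descent
    drives the model's feature mean toward the target's.
    """
    target_feat = sum(q * f for q, f in zip(target, FEATURES))
    for _ in range(steps):
        pi = softmax_policy(theta)
        model_feat = sum(p * f for p, f in zip(pi, FEATURES))
        theta -= lr * (model_feat - target_feat)
    return theta


def operator_iteration(theta=0.0, iterations=10):
    """Alternate pi <- P(I(pi)), recording the expected return each time."""
    returns = [expected_return(softmax_policy(theta))]
    for _ in range(iterations):
        theta = project(improve(softmax_policy(theta)), theta)
        returns.append(expected_return(softmax_policy(theta)))
    return theta, returns


if __name__ == "__main__":
    theta, returns = operator_iteration()
    for k, g in enumerate(returns):
        print(f"iteration {k}: expected return {g:.4f}")
```

In this toy setting the recorded returns are non-decreasing, starting from 2.0 for the uniform initial policy. Because $\mathcal{I}\pi$ generally lies outside the softmax family, $\mathcal{P}$ does real work here: at the KL minimizer the projected policy matches the feature mean $\mathbb{E}[\phi]$ of $\mathcal{I}\pi$ rather than reproducing $\mathcal{I}\pi$ itself.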
