
A Study of Policy Gradient on a Class of Exactly Solvable Models

Published by: Gavin McCracken
Publication date: 2020
Research field: Informatics Engineering
Research language: English





Policy gradient methods are extensively used in reinforcement learning as a way to optimize expected return. In this paper, we explore the evolution of the policy parameters, for a special class of exactly solvable POMDPs, as a continuous-state Markov chain whose transition probabilities are determined by the gradient of the distribution of the policy's value. Our approach relies heavily on random walk theory, specifically on affine Weyl groups. We construct a class of novel partially observable environments with controllable exploration difficulty, in which the value distribution, and hence the policy parameter evolution, can be derived analytically. Using these environments, we analyze the probabilistic convergence of policy gradient to different local maxima of the value function. To our knowledge, this is the first approach developed to analytically compute the landscape of policy gradient in POMDPs for a class of such environments, leading to interesting insights into the difficulty of this problem.
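The idea that policy parameters evolve as a continuous-state Markov chain can be illustrated with a minimal sketch. The code below is not the paper's construction: it assumes a hypothetical two-action bandit with a single sigmoid-parameterized policy and a plain REINFORCE update, purely to show that each update depends only on the current parameter and fresh randomness. The names `reinforce_step` and `run_chain`, the reward tuple, and the learning rate are all illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_step(theta, rewards, lr, rng):
    """One stochastic policy-gradient (REINFORCE) update.

    The next theta depends only on the current theta and a fresh random
    draw, so the sequence of thetas is a continuous-state Markov chain.
    """
    p = sigmoid(theta)                      # probability of choosing action 1
    action = 1 if rng.random() < p else 0   # sample an action from the policy
    reward = rewards[action]                # assumed deterministic reward per action
    grad_log_pi = action - p                # d/dtheta of log pi(action | theta)
    return theta + lr * reward * grad_log_pi


def run_chain(theta0, rewards, lr, steps, seed):
    """Run the parameter chain for a fixed number of updates."""
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        theta = reinforce_step(theta, rewards, lr, rng)
    return theta


if __name__ == "__main__":
    # Different seeds give different sample paths of the same Markov chain.
    for seed in range(3):
        print(seed, run_chain(0.0, (0.0, 1.0), 0.1, 200, seed))
```

Running the chain from the same starting parameter under several seeds gives a rough empirical picture of the distribution over end points, which is the kind of quantity the abstract says can be derived analytically in the paper's environments.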




Read also

We study diffusion of hardcore particles on a one dimensional periodic lattice subjected to a constraint that the separation between any two consecutive particles does not increase beyond a fixed value $(n+1)$; an initial separation larger than $(n+1)$ can however decrease. These models undergo an absorbing state phase transition when the conserved particle density of the system falls below a critical threshold $\rho_c = 1/(n+1)$. We find that the $\phi_k$, the densities of $0$-clusters ($0$ representing vacancies) of size $0 \le k < n$, vanish at the transition point along with the activity density $\rho_a$. The steady state of these models can be written in matrix product form to obtain analytically the static exponents $\beta_k = n-k$, $\nu = 1 = \eta$ corresponding to each $\phi_k$. We also show from numerical simulations that, starting from a natural condition, the $\phi_k(t)$ decay as $t^{-\alpha_k}$ with $\alpha_k = (n-k)/2$ even though the other dynamic exponents $\nu_t = 2 = z$ are independent of $k$; this ensures the validity of the scaling laws $\beta = \alpha \nu_t$ and $\nu_t = z \nu$.
In this paper a review is given of a class of sub-models of both approaches, characterized by the fact that they can be solved exactly, highlighting in the process a number of generic results related to both the nature of pair-correlated systems as well as collective modes of motion in the atomic nucleus.
Urna Basu, P. K. Mohanty, 2009
We introduce and solve a model of hardcore particles on a one dimensional periodic lattice which undergoes an active-absorbing state phase transition at finite density. In this model an occupied site is defined to be active if its left neighbour is occupied and the right neighbour is vacant. Particles from such active sites hop stochastically to their right. We show that both the density of active sites and the survival probability vanish as the particle density is decreased below half. The critical exponents and spatial correlations of the model are calculated exactly using the matrix product ansatz. Exact analytical study of several variations of the model reveals that these non-equilibrium phase transitions belong to a new universality class different from the generic active-absorbing-state phase transition, namely directed percolation.
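The hopping rule in the abstract above can be sketched as a small simulation. This is an illustration only, not the authors' code: the abstract says only that particles hop "stochastically", so picking one active site uniformly at random per step is an assumption, as are the function names `active_sites`, `hop_once`, and `relax`.

```python
import random


def active_sites(lattice):
    """Indices of active sites on a periodic lattice of 0s and 1s.

    A site is active if it is occupied, its left neighbour is occupied,
    and its right neighbour is vacant.
    """
    n = len(lattice)
    return [
        i for i in range(n)
        if lattice[i] == 1
        and lattice[i - 1] == 1          # index -1 wraps around (periodic)
        and lattice[(i + 1) % n] == 0
    ]


def hop_once(lattice, rng):
    """Move one particle from a randomly chosen active site to its right.

    Returns False if the configuration is absorbing (no active sites).
    The uniform choice among active sites is an assumed update rule.
    """
    sites = active_sites(lattice)
    if not sites:
        return False
    i = rng.choice(sites)
    j = (i + 1) % len(lattice)
    lattice[i], lattice[j] = 0, 1
    return True


def relax(lattice, max_hops, seed=0):
    """Apply hops in place until absorption or until max_hops is reached."""
    rng = random.Random(seed)
    hops = 0
    while hops < max_hops and hop_once(lattice, rng):
        hops += 1
    return hops


if __name__ == "__main__":
    lattice = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
    hops = relax(lattice, max_hops=1000)
    print(hops, lattice, len(active_sites(lattice)))
```

Each hop conserves the number of particles, and a configuration with no two adjacent occupied sites has no active sites, so it is absorbing under this rule.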
Some results for two distinct but complementary exactly solvable algebraic models for pairing in atomic nuclei are presented: 1) binding energy predictions for isotopic chains of nuclei based on an extended pairing model that includes multi-pair excitations; and 2) fine structure effects among excited $0^+$ states in $N \approx Z$ nuclei that track with the proton-neutron ($pn$) and like-particle isovector pairing interactions as realized within an algebraic $sp(4)$ shell model. The results show that these models can be used to reproduce significant ranges of known experimental data, and in so doing, confirm their power to predict pairing-dominated phenomena in domains where data is unavailable.
Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function, and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions and empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
