
A Tensor Network Approach to Finite Markov Decision Processes

Posted by Edward Gillman
Publication date: 2020
Research language: English





Tensor network (TN) techniques - often used in the context of quantum many-body physics - have shown promise as a tool for tackling machine learning (ML) problems. The application of TNs to ML, however, has mostly focused on supervised and unsupervised learning. Yet, with their direct connection to hidden Markov chains, TNs are also naturally suited to Markov decision processes (MDPs) which provide the foundation for reinforcement learning (RL). Here we introduce a general TN formulation of finite, episodic and discrete MDPs. We show how this formulation allows us to exploit algorithms developed for TNs for policy optimisation, the key aim of RL. As an application we consider the issue - formulated as an RL problem - of finding a stochastic evolution that satisfies specific dynamical conditions, using the simple example of random walk excursions as an illustration.
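
To make the connection concrete, the following is a minimal sketch (an illustration only, not the paper's tensor-network construction) that evaluates a fixed stochastic policy on a small finite, episodic, discrete MDP by contracting the policy and transition tensors step by step; the state, action and horizon sizes and the random tensors are assumptions.

    import numpy as np

    # A minimal sketch, not the paper's construction: evaluate a fixed stochastic
    # policy on a small finite, episodic, discrete MDP by contracting the policy
    # and transition tensors step by step.  All sizes and tensors are assumed.
    rng = np.random.default_rng(0)
    S, A, T = 4, 2, 5                          # states, actions, horizon (assumed)

    P = rng.random((S, A, S))                  # transition tensor P[s, a, s']
    P /= P.sum(axis=2, keepdims=True)          # normalise over next states
    R = rng.random((S, A))                     # reward tensor R[s, a]

    pi = rng.random((S, A))                    # stochastic policy pi[s, a]
    pi /= pi.sum(axis=1, keepdims=True)

    def expected_return(pi, P, R, T, rho0):
        """Contract left to right: propagate the state distribution and
        accumulate the expected reward collected at each step."""
        dist, total = rho0.copy(), 0.0
        for _ in range(T):
            sa = dist[:, None] * pi                # joint weight on (s, a)
            total += np.sum(sa * R)                # expected reward this step
            dist = np.einsum("sa,sat->t", sa, P)   # contract with P[s, a, s']
        return total

    rho0 = np.full(S, 1.0 / S)                 # uniform initial state distribution
    print(expected_return(pi, P, R, T, rho0))

The left-to-right contraction above is just the standard dynamic-programming evaluation of a fixed policy; the point of the tensor-network formulation is that the same object can then be manipulated with algorithms developed for TNs.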


Read also

Understanding the generative mechanism of a natural system is a vital component of the scientific method. Here, we investigate one of the fundamental steps toward this goal by presenting the minimal generator of an arbitrary binary Markov process. This is a class of processes whose predictive model is well known. Surprisingly, the generative model requires three distinct topologies for different regions of parameter space. We show that a previously proposed generator for a particular set of binary Markov processes is, in fact, not minimal. Our results shed the first quantitative light on the relative (minimal) costs of prediction and generation. We find, for instance, that the difference between prediction and generation is maximized when the process is approximately independent and identically distributed.
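
As a small, self-contained illustration of the object under discussion, the sketch below samples a binary Markov process, which is fully specified by two parameters p = Pr(1|0) and q = Pr(0|1), and recovers those parameters empirically; the particular values of p and q are arbitrary assumptions.

    import numpy as np

    # Illustration only: sample a binary Markov process with assumed parameters
    # p = Pr(1 | 0) and q = Pr(0 | 1), then recover them from the realisation.
    rng = np.random.default_rng(1)
    p, q, n = 0.3, 0.7, 100_000
    x = np.empty(n, dtype=int)
    x[0] = 0
    for t in range(1, n):
        if x[t - 1] == 0:
            x[t] = rng.random() < p          # move 0 -> 1 with probability p
        else:
            x[t] = rng.random() >= q         # stay at 1 with probability 1 - q

    est_p = np.mean(x[1:][x[:-1] == 0])      # empirical Pr(1 | 0)
    est_q = 1 - np.mean(x[1:][x[:-1] == 1])  # empirical Pr(0 | 1)
    print(f"estimated p = {est_p:.3f}, q = {est_q:.3f}")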
Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as the computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs, and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms that are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees, including guaranteed ascent directions, invariance to affine transformations of the parameter space, and convergence guarantees. Finally, we provide a unifying perspective on key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.
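
Since the abstract relates its second Gauss-Newton method to natural gradient ascent, the sketch below shows a damped natural-gradient step for a softmax policy on a one-state problem (a bandit). It illustrates only the general idea of preconditioning the policy gradient with curvature information, not the paper's algorithm, and the rewards, learning rate and damping constant are assumptions.

    import numpy as np

    # Illustration of natural-gradient-style preconditioning (not the paper's
    # Gauss-Newton method): softmax policy over actions in a one-state problem.
    rng = np.random.default_rng(2)
    n_actions = 3
    r = rng.random(n_actions)                    # per-action rewards (assumed)
    theta = np.zeros(n_actions)                  # softmax policy parameters

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for _ in range(200):
        pi = softmax(theta)
        J = pi @ r                               # expected reward under the policy
        grad = pi * (r - J)                      # exact policy gradient for softmax
        fisher = np.diag(pi) - np.outer(pi, pi)  # Fisher information of the policy
        # The softmax Fisher matrix is singular, so add a small damping term.
        theta += 0.5 * np.linalg.solve(fisher + 1e-4 * np.eye(n_actions), grad)

    print("learned policy:", softmax(theta), "best action:", int(r.argmax()))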
197 - S. Iblisdir 2013
Markov chains for probability distributions related to matrix product states and 1D Hamiltonians are introduced. With appropriate inverse temperature schedules, these chains can be combined into a random approximation scheme for ground states of such Hamiltonians. Numerical experiments suggest that a linear, i.e. fast, schedule is possible in non-trivial cases. A natural extension of these chains to 2D settings is next presented and tested. The obtained results compare well with Euclidean evolution. The proposed Markov chains are easy to implement and are inherently sign problem free (even for fermionic degrees of freedom).
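
As a generic illustration of a Markov chain driven by a linear inverse-temperature schedule (the chains in the abstract act on matrix product state distributions, which this sketch does not attempt), here is Metropolis annealing of a classical 1D Ising chain toward its ground state; the chain length, step count and final inverse temperature are assumptions.

    import numpy as np

    # Illustration only: Metropolis sampling with a linear inverse-temperature
    # (beta) schedule on a classical periodic 1D Ising chain.
    rng = np.random.default_rng(3)
    L, steps = 32, 20_000
    spins = rng.choice([-1, 1], size=L)

    def energy(s):
        return -np.sum(s * np.roll(s, 1))        # periodic nearest-neighbour bonds

    for t in range(steps):
        beta = 5.0 * (t + 1) / steps             # linear schedule up to beta = 5
        i = rng.integers(L)
        flipped = spins.copy()
        flipped[i] *= -1
        dE = energy(flipped) - energy(spins)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins = flipped

    print("final energy per site:", energy(spins) / L)   # ground state value: -1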
59 - Zhengling Qi, Peng Liao 2020
We study the sequential decision-making problem in a Markov decision process (MDP) where each policy is evaluated by a set containing average rewards over different horizon lengths and with different initial distributions. Given a pre-collected dataset of multiple trajectories generated by some behavior policy, our goal is to learn a robust policy in a pre-specified policy class that can maximize the smallest value of this set. Leveraging the semi-parametric efficiency theory from statistics, we develop a policy learning method for estimating the defined robust optimal policy that can efficiently break the curse of horizon under mild technical conditions. A rate-optimal regret bound up to a logarithmic factor is established in terms of the number of trajectories and the number of decision points.
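
The max-min objective described above can be written out directly. The toy sketch below (an assumption-laden illustration, not the paper's semi-parametric estimator) evaluates random candidate policies under several horizon lengths and initial distributions and keeps the one whose smallest value is largest; the MDP, the evaluation set and the crude random policy class are all assumptions.

    import numpy as np

    # Illustration of the max-min (robust) objective only, on an assumed toy MDP.
    rng = np.random.default_rng(4)
    S, A = 3, 2
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)            # transition tensor P[s, a, s']
    R = rng.random((S, A))                       # reward tensor R[s, a]

    def avg_reward(pi, T, rho0):
        """Average reward of policy pi over horizon T from initial distribution rho0."""
        dist, total = rho0.copy(), 0.0
        for _ in range(T):
            sa = dist[:, None] * pi
            total += np.sum(sa * R)
            dist = np.einsum("sa,sat->t", sa, P)
        return total / T

    horizons = [5, 20, 50]
    inits = [np.eye(S)[s] for s in range(S)]     # start deterministically in each state

    candidates = []
    for _ in range(50):                          # a crude random policy class (assumed)
        pi = rng.random((S, A))
        candidates.append(pi / pi.sum(axis=1, keepdims=True))

    def robust_value(pi):
        return min(avg_reward(pi, T, rho0) for T in horizons for rho0 in inits)

    best = max(candidates, key=robust_value)
    print("best worst-case value:", robust_value(best))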
We present a 2D bosonization duality using the language of tensor networks. Specifically, we construct a tensor network operator (TNO) that implements an exact 2D bosonization duality. The primary benefit of the TNO is that it allows for bosonization at the level of quantum states. Thus, we use the TNO to provide an explicit algorithm for bosonizing fermionic projected entangled pair states (fPEPs). A key step in the algorithm is to account for a choice of spin-structure, encoded in a set of bonds of the bosonized fPEPS. This enables our tensor network approach to bosonization to be applied to systems on arbitrary triangulations of orientable 2D manifolds.
