ترغب بنشر مسار تعليمي؟ اضغط هنا

Mean-field Markov decision processes with common noise and open-loop controls

85   0   0.0 ( 0 )
 نشر من قبل Huyen Pham
 تاريخ النشر 2019
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

We develop an exhaustive study of Markov decision process (MDP) under mean field interaction both on states and actions in the presence of common noise, and when optimization is performed over open-loop controls on infinite horizon. Such model, called CMKV-MDP for conditional McKean-Vlasov MDP, arises and is obtained here rigorously with a rate of convergence as the asymptotic problem of N-cooperative agents controlled by a social planner/influencer that observes the environment noises but not necessarily the individual states of the agents. We highlight the crucial role of relaxed controls and randomization hypothesis for this class of models with respect to classical MDP theory. We prove the correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures, and establish the dynamic programming Bellman fixed point equation satisfied by the value function, as well as the existence of-optimal randomized feedback controls. The arguments of proof involve an original measurable optimal coupling for the Wasserstein distance. This provides a procedure for learning strategies in a large population of interacting collaborative agents. MSC Classification: 90C40, 49L20.



قيم البحث

اقرأ أيضاً

A theory of existence and uniqueness is developed for general stochastic differential mean field games with common noise. The concepts of strong and weak solutions are introduced in analogy with the theory of stochastic differential equations, and ex istence of weak solutions for mean field games is shown to hold under very general assumptions. Examples and counter-examples are provided to enlighten the underpinnings of the existence theory. Finally, an analog of the famous result of Yamada and Watanabe is derived, and it is used to prove existence and uniqueness of a strong solution under additional assumptions.
We study discrete-time discounted constrained Markov decision processes (CMDPs) on Borel spaces with unbounded reward functions. In our approach the transition probability functions are weakly or set-wise continuous. The reward functions are upper se micontinuous in state-action pairs or semicontinuous in actions. Our aim is to study models with unbounded reward functions, which are often encountered in applications, e.g., in consumption/investment problems. We provide some general assumptions under which the optimization problems in CMDPs are solvable in the class of stationary randomized policies. Then, we indicate that if the initial distribution and transition probabilities are non-atomic, then using a general purification result of Feinberg and Piunovskiy, stationary optimal policies can be deterministic. Our main results are illustrated by five examples.
In a variety of applications, an agents success depends on the knowledge that an adversarial observer has or can gather about the agents decisions. It is therefore desirable for the agent to achieve a task while reducing the ability of an observer to infer the agents policy. We consider the task of the agent as a reachability problem in a Markov decision process and study the synthesis of policies that minimize the observers ability to infer the transition probabilities of the agent between the states of the Markov decision process. We introduce a metric that is based on the Fisher information as a proxy for the information leaked to the observer and using this metric formulate a problem that minimizes expected total information subject to the reachability constraint. We proceed to solve the problem using convex optimization methods. To verify the proposed method, we analyze the relationship between the expected total information and the estimation error of the observer, and show that, for a particular class of Markov decision processes, these two values are inversely proportional.
167 - Haotian Gu , Xin Guo , Xiaoli Wei 2019
Dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and more recently mean-field controls (MFCs). However, in the learning framework of MFCs, DPP has not been rigorously established, despite its critical importance for algorithm designs. In this paper, we first present a simple example in MFCs with learning where DPP fails with a mis-specified Q function; and then propose the correct form of Q function in an appropriate space for MFCs with learning. This particular form of Q function is different from the classical one and is called the IQ function. In the special case when the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting the classical RLs by replacing the state-action space with its probability distribution space. This identification of the IQ function enables us to establish precisely the DPP in the learning framework of MFCs. Finally, we illustrate through numerical experiments the time consistency of this IQ function.
Mean field games are concerned with the limit of large-population stochastic differential games where the agents interact through their empirical distribution. In the classical setting, the number of players is large but fixed throughout the game. Ho wever, in various applications, such as population dynamics or economic growth, the number of players can vary across time which may lead to different Nash equilibria. For this reason, we introduce a branching mechanism in the population of agents and obtain a variation on the mean field game problem. As a first step, we study a simple model using a PDE approach to illustrate the main differences with the classical setting. We prove existence of a solution and show that it provides an approximate Nash-equilibrium for large population games. We also present a numerical example for a linear--quadratic model. Then we study the problem in a general setting by a probabilistic approach. It is based upon the relaxed formulation of stochastic control problems which allows us to obtain a general existence result.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا