
Intrinsic Lipschitz Regularity of Mean-Field Optimal Controls

Added by Benoît Bonnet
Publication date: 2019
Research language: English





In this article, we provide sufficient conditions under which the controlled vector fields that solve optimal control problems formulated on continuity equations are Lipschitz regular in space. Our approach involves a novel combination of mean-field approximations for infinite-dimensional multi-agent optimal control problems, together with a careful extension of an existence result for locally optimal Lipschitz feedbacks. The latter is based on the reformulation of a coercivity estimate in the language of Wasserstein calculus, which is used to obtain uniform Lipschitz bounds along sequences of approximations by empirical measures.
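As a rough sketch (in notation that is ours, not taken from the article), the class of problems in question minimizes a running cost over controlled vector fields driving a curve of probability measures through a continuity equation:

\[
\min_{v}\ \int_0^T \int_{\mathbb{R}^d} L\big(x, v_t(x)\big)\,\mathrm{d}\mu_t(x)\,\mathrm{d}t + \varphi(\mu_T)
\qquad \text{subject to} \qquad
\partial_t \mu_t + \nabla \cdot \big(v_t\,\mu_t\big) = 0, \quad \mu_0 = \mu^0,
\]

and the regularity question is whether an optimal $v_t(\cdot)$ can be chosen Lipschitz in the space variable, uniformly in time. The mean-field approximation step replaces $\mu_t$ by empirical measures $\tfrac{1}{N}\sum_{i=1}^N \delta_{x_i(t)}$ and seeks Lipschitz bounds that do not degenerate as $N \to \infty$.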



Related research

Haotian Gu, Xin Guo, Xiaoli Wei (2019)
The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). However, in the learning framework of MFCs, the DPP has not been rigorously established, despite its critical importance for algorithm design. In this paper, we first present a simple example in MFCs with learning where the DPP fails with a mis-specified Q function; we then propose the correct form of the Q function in an appropriate space for MFCs with learning. This particular form of Q function differs from the classical one and is called the IQ function. In the special case where the transition probability and the reward are independent of the mean-field information, it integrates the classical Q function for single-agent RL over the state-action distribution. In other words, MFCs with learning can be viewed as lifting classical RL by replacing the state-action space with its probability distribution space. This identification of the IQ function enables us to establish the DPP precisely in the learning framework of MFCs. Finally, we illustrate the time consistency of this IQ function through numerical experiments.
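To make the integration step above concrete (a sketch in our own notation): when neither the transition nor the reward depends on the mean-field term, the IQ function evaluated at a state-action distribution $\nu \in \mathcal{P}(\mathcal{X} \times \mathcal{A})$ reduces to

\[
IQ(\nu) = \int_{\mathcal{X} \times \mathcal{A}} Q(x, a)\,\mathrm{d}\nu(x, a),
\]

i.e. the classical single-agent Q function averaged over the joint state-action distribution, which is the sense in which MFCs with learning lift classical RL from the state-action space to its space of probability distributions.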
We study a family of optimal control problems in which one aims at minimizing a cost that mixes a quadratic control penalization and the variance of the system, both for finitely many agents and for the mean-field dynamics as their number goes to infinity. While solutions of the discrete problem always exist in a unique and explicit form, the behavior of their macroscopic counterparts is very sensitive to the magnitude of the time horizon and the penalization parameter. When one minimizes the final variance, there always exist Lipschitz-in-space optimal controls for the infinite-dimensional problem, which can be obtained as suitable extensions of the optimal controls for the finite-dimensional problems. The same holds true for variance maximization whenever the time horizon is sufficiently small. On the contrary, for large final times (or, equivalently, for small penalizations of the control cost), it can be proven that no Lipschitz-regular optimal control exists for the macroscopic problem.
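A hedged sketch of the macroscopic cost being discussed (normalizations and notation are assumptions of ours): for a control field $u$ acting on the measure dynamics, one mixes a quadratic penalization with the variance of the final state,

\[
J(u) = \frac{\lambda}{2} \int_0^T \int_{\mathbb{R}^d} |u_t(x)|^2 \,\mathrm{d}\mu_t(x)\,\mathrm{d}t \;+\; \mathrm{Var}(\mu_T),
\qquad
\mathrm{Var}(\mu_T) = \int_{\mathbb{R}^d} \Big| x - \int_{\mathbb{R}^d} y \,\mathrm{d}\mu_T(y) \Big|^2 \mathrm{d}\mu_T(x),
\]

with the sign of the variance term flipped in the maximization variant; the dichotomy described in the abstract concerns how the existence of Lipschitz-regular optimal controls depends on the horizon $T$ and the penalization weight $\lambda$.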
We develop an exhaustive study of Markov decision processes (MDPs) under mean-field interaction both on states and actions, in the presence of common noise, and when optimization is performed over open-loop controls on an infinite horizon. Such a model, called CMKV-MDP for conditional McKean-Vlasov MDP, arises and is obtained here rigorously, with a rate of convergence, as the asymptotic problem of N cooperative agents controlled by a social planner/influencer that observes the environment noises but not necessarily the individual states of the agents. We highlight the crucial role of relaxed controls and of the randomization hypothesis for this class of models with respect to classical MDP theory. We prove the correspondence between the CMKV-MDP and a general lifted MDP on the space of probability measures, and establish the dynamic programming Bellman fixed-point equation satisfied by the value function, as well as the existence of ε-optimal randomized feedback controls. The arguments of proof involve an original measurable optimal coupling for the Wasserstein distance. This provides a procedure for learning strategies in a large population of interacting collaborative agents. MSC Classification: 90C40, 49L20.
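The lifted dynamic programming relation mentioned above can be sketched schematically (our notation, not the paper's) as a Bellman fixed point on the space of probability measures:

\[
V(\mu) = \sup_{\pi} \Big\{ \hat r(\mu, \pi) + \beta\, \mathbb{E}\big[ V\big( \hat F(\mu, \pi, \varepsilon^0) \big) \big] \Big\},
\]

where $\mu$ stands for the conditional law of a representative agent's state given the common noise, $\pi$ for a randomized/relaxed control rule, $\hat r$ for the aggregated reward, $\beta$ for the discount factor, and $\hat F$ for the one-step pushforward of $\mu$ under the dynamics and the common noise $\varepsilon^0$.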
We consider a mean field game (MFG) of optimal portfolio liquidation under asymmetric information. We prove that the solution to the MFG can be characterized in terms of an FBSDE with a possibly singular terminal condition on the backward component or, equivalently, in terms of an FBSDE with a finite terminal value yet singular driver. Extending the method of continuation to linear-quadratic FBSDEs with singular drivers, we prove that the MFG has a unique solution. Our existence and uniqueness result allows us to prove that the MFG with possibly singular terminal condition can be approximated by a sequence of MFGs with finite terminal values.
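Schematically, and only as an assumption-laden sketch of the structure described above (the precise coefficients are problem-specific), the two equivalent characterizations take the form

\[
\mathrm{d}X_t = b(t, X_t, Y_t)\,\mathrm{d}t, \qquad
\mathrm{d}Y_t = -f(t, X_t, Y_t, Z_t)\,\mathrm{d}t + Z_t\,\mathrm{d}W_t,
\]

where in the first formulation the terminal value $Y_T$ is allowed to be $+\infty$ (the singular terminal condition typically used to enforce a liquidation constraint such as $X_T = 0$), while in the second the terminal value is finite but the driver $f$ blows up as $t \to T$.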
Antoine Haddon (2019)
We revisit the optimal control problem of maximizing biogas production in continuous bio-processes in two directions: 1. over an infinite horizon, and 2. with sub-optimal controllers that are independent of the time horizon. For the first point, we identify a set of optimal controls for the problems with an averaged reward and with a discounted reward as the discount factor goes to 0, and we show that the value functions of both problems are equal. For the finite-horizon problem, our approach relies on bounding the value function from above and below by considering a different reward for which the optimal solution admits an explicit, time-independent optimal feedback. In particular, we show that this technique allows us to provide explicit bounds on the sub-optimality of the proposed controllers. The various strategies are finally illustrated on Haldane and Contois growth functions.
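For concreteness, a standard chemostat sketch of such continuous bio-processes (our normalization, not taken verbatim from the paper) reads

\[
\dot s = D\,(s_{\mathrm{in}} - s) - k\,\mu\,x, \qquad \dot x = \big(\mu - D\big)\,x,
\]

where $s$ is the substrate concentration, $x$ the biomass, $D$ the dilution rate used as the control, $k$ a yield coefficient, and the biogas production rate is proportional to $\mu\,x$; the growth rate $\mu$ depends on $s$ alone in the Haldane case and on both $s$ and $x$ in the Contois case. Under one common normalization, the two infinite-horizon criteria compared in the abstract are of the form

\[
J_{\mathrm{av}} = \liminf_{T \to \infty} \frac{1}{T} \int_0^T \mu\,x \,\mathrm{d}t
\qquad \text{and} \qquad
J_{\delta} = \delta \int_0^{\infty} e^{-\delta t}\, \mu\,x \,\mathrm{d}t,
\]

whose value functions coincide in the limit of vanishing discount factor $\delta \to 0$.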