This paper studies asymptotic solvability of a linear quadratic (LQ) mean field social optimization problem with controlled diffusions and indefinite state and control weights. Starting with an $N$-agent model, we employ a rescaling approach to derive a low-dimensional Riccati ordinary differential equation (ODE) system, which characterizes a necessary and sufficient condition for asymptotic solvability. The decentralized control obtained from the mean field limit ensures a bounded optimality loss in minimizing the social cost, which has magnitude $O(N)$; this implies an optimality loss of $O(1/N)$ per agent. We further quantify the efficiency gain of the social optimum with respect to the solution of the mean field game.
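For orientation, in an $N$-agent LQ social optimization problem of this kind the social cost is the sum of the individual quadratic costs and therefore grows like $O(N)$; in generic notation (not the paper's exact model),
$$J_{\mathrm{soc}}^{(N)}(u) = \sum_{i=1}^N J_i(u), \qquad J_i(u) = \mathbb{E}\int_0^T \Big[ \big(x_i - \Gamma \bar x^{(N)}\big)^\top Q \,\big(x_i - \Gamma \bar x^{(N)}\big) + u_i^\top R\, u_i \Big]\,dt, \qquad \bar x^{(N)} = \frac{1}{N}\sum_{j=1}^N x_j,$$
where the weights $Q$ and $R$ need not be positive semidefinite (hence "indefinite"). Bounded optimality loss of the decentralized strategies $\hat u$ then means $J_{\mathrm{soc}}^{(N)}(\hat u) - \inf_u J_{\mathrm{soc}}^{(N)}(u) = O(1)$, i.e., $O(1/N)$ per agent.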
This paper studies an asymptotic solvability problem for linear quadratic (LQ) mean field games with controlled diffusions and indefinite state and control weights in the costs. We employ a rescaling approach to derive a low-dimensional Riccati ordinary differential equation (ODE) system, which characterizes a necessary and sufficient condition for asymptotic solvability. The rescaling technique is further used for performance estimates, establishing an $O(1/N)$-Nash equilibrium for the obtained decentralized strategies.
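For reference, the $O(1/N)$-Nash property of decentralized strategies $(\hat u_1,\dots,\hat u_N)$ can be stated, in standard notation, as
$$J_i(\hat u_i, \hat u_{-i}) \le \inf_{u_i} J_i(u_i, \hat u_{-i}) + O(1/N), \qquad 1 \le i \le N,$$
where $\hat u_{-i}$ denotes the strategies of all agents other than $i$; that is, no agent can lower its own cost by more than $O(1/N)$ through a unilateral deviation.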
We provide an exhaustive treatment of Linear-Quadratic control problems for a class of stochastic Volterra equations of convolution type, whose kernels are Laplace transforms of certain signed matrix measures which are not necessarily finite. These equations are in general neither Markovian nor semimartingales, and include the fractional Brownian motion with Hurst index smaller than $1/2$ as a special case. We establish the correspondence of the initial problem with a possibly infinite-dimensional Markovian one in a Banach space, which allows us to identify the Markovian controlled state variables. Using a refined martingale verification argument combined with a squares completion technique, we prove that the value function is of linear quadratic form in these state variables with a linear optimal feedback control, depending on non-standard Banach space valued Riccati equations. Furthermore, we show that the value function of the stochastic Volterra optimization problem can be approximated by that of conventional finite-dimensional Markovian Linear-Quadratic problems, which is of crucial importance for numerical implementation.
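For illustration (in generic notation rather than the paper's exact setting), a controlled stochastic Volterra equation of convolution type takes the form
$$X_t = x_0 + \int_0^t K(t-s)\, b(s, X_s, u_s)\,ds + \int_0^t K(t-s)\, \sigma(s, X_s, u_s)\,dW_s, \qquad K(t) = \int_{\mathbb{R}_+} e^{-\theta t}\,\mu(d\theta),$$
where the kernel $K$ is the Laplace transform of a signed matrix measure $\mu$ that need not be finite; for instance, the fractional kernel $K(t) \propto t^{H-1/2}$ with Hurst index $H < 1/2$ corresponds to $\mu(d\theta) \propto \theta^{-H-1/2}\,d\theta$.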
We study a family of optimal control problems in which one aims at minimizing a cost that mixes a quadratic control penalization and the variance of the system, both for finitely many agents and for the mean-field dynamics as their number goes to infinity. While solutions of the discrete problem always exist in a unique and explicit form, the behavior of their macroscopic counterparts is very sensitive to the magnitude of the time horizon and of the penalization parameter. When one minimizes the final variance, there always exists a Lipschitz-in-space optimal control for the infinite-dimensional problem, which can be obtained as a suitable extension of the optimal controls for the finite-dimensional problems. The same holds true for variance maximization whenever the time horizon is sufficiently small. On the contrary, for large final times (or, equivalently, for small penalizations of the control cost), it can be proven that no Lipschitz-regular optimal control exists for the macroscopic problem.
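In generic notation (not taken from the paper), the mean-field version of such a cost mixes a quadratic control penalization with the variance of the state at the final time,
$$J(u) = \frac{\lambda}{2}\int_0^T \!\!\int |u(t,x)|^2\,\mu_t(dx)\,dt \pm \mathrm{Var}(\mu_T),$$
where $\mu_t$ is the distribution of agents at time $t$, $\lambda > 0$ is the penalization parameter, and the sign distinguishes variance minimization from variance maximization; the regimes described above are governed by the interplay between $\lambda$ and the horizon $T$.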
In this paper, we show existence and uniqueness of solutions of infinite horizon McKean-Vlasov FBSDEs using two different methods, which lead to two different sets of assumptions. We use these results to solve infinite horizon mean field type control problems and mean field games.
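For orientation, an infinite horizon McKean-Vlasov FBSDE can be written, in generic notation, as
$$dX_t = b\big(X_t, \mathcal{L}(X_t), Y_t\big)\,dt + \sigma\big(X_t, \mathcal{L}(X_t), Y_t\big)\,dW_t, \qquad dY_t = -f\big(X_t, \mathcal{L}(X_t), Y_t, Z_t\big)\,dt + Z_t\,dW_t, \quad t \ge 0,$$
where $\mathcal{L}(X_t)$ denotes the law of $X_t$ and, on the infinite horizon, the terminal condition of the backward equation is replaced by a growth or integrability requirement on $(Y, Z)$.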
This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then performs local policy gradient updates in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the non-asymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on multi-zone HVAC systems.
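To make the two main ingredients concrete, the following is a minimal Python sketch of single-point zero-order gradient estimation combined with consensus averaging of local cost estimates, in the spirit of ZODPO; the function names, the cost oracle, and the mixing matrix W are illustrative placeholders, not the paper's actual interface.

import numpy as np

def zero_order_gradient(cost_oracle, K, r=0.1):
    """Single-point zero-order estimate of the gradient of cost_oracle at the policy parameters K."""
    d = K.size
    u = np.random.randn(d)
    u /= np.linalg.norm(u)                    # uniform random direction on the unit sphere
    perturbed = K + r * u.reshape(K.shape)    # perturb the local controller parameters
    return (d / r) * cost_oracle(perturbed) * u.reshape(K.shape)

def consensus_average(local_costs, W, rounds=10):
    """Average local cost estimates over a network with a doubly stochastic mixing matrix W."""
    x = np.array(local_costs, dtype=float)
    for _ in range(rounds):
        x = W @ x                             # each entry approaches the global average cost
    return x

# Toy usage: a quadratic cost oracle and a 3-agent network.
if __name__ == "__main__":
    K = np.zeros((2, 2))
    cost = lambda K: float(np.sum(K ** 2) + 1.0)
    g = zero_order_gradient(cost, K)
    W = np.array([[0.5, 0.25, 0.25],
                  [0.25, 0.5, 0.25],
                  [0.25, 0.25, 0.5]])
    avg = consensus_average([1.0, 2.0, 3.0], W)
    print(g.shape, avg)

In this sketch, each agent would evaluate its perturbed controller on the system, agree on an approximate global cost via the consensus step, and then take a gradient step with its own zero-order estimate; the single-point estimator above is a standard unbiased estimator of the gradient of a smoothed version of the cost.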