Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games

99 0 0.0 ( 0 )

Download Cite

Added by Muhammad Aneeq Uz Zaman

Publication date 2020

fields Electronic Engineering Informatics Engineering

and research's language is English

Authors Muhammad Aneeq uz Zaman - Kaiqing Zhang - Erik Miehling

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

In this paper, we study large population multi-agent reinforcement learning (RL) in the context of discrete-time linear-quadratic mean-field games (LQ-MFGs). Our setting differs from most existing work on RL for MFGs, in that we consider a non-stationary MFG over an infinite horizon. We propose an actor-critic algorithm to iteratively compute the mean-field equilibrium (MFE) of the LQ-MFG. There are two primary challenges: i) the non-stationarity of the MFG induces a linear-quadratic tracking problem, which requires solving a backwards-in-time (non-causal) equation that cannot be solved by standard (causal) RL algorithms; ii) Many RL algorithms assume that the states are sampled from the stationary distribution of a Markov chain (MC), that is, the chain is already mixed, an assumption that is not satisfied for real data sources. We first identify that the mean-field trajectory follows linear dynamics, allowing the problem to be reformulated as a linear quadratic Gaussian problem. Under this reformulation, we propose an actor-critic algorithm that allows samples to be drawn from an unmixed MC. Finite-sample convergence guarantees for the algorithm are then provided. To characterize the performance of our algorithm in multi-agent RL, we have developed an error bound with respect to the Nash equilibrium of the finite-population game.

rate research

Approximate Equilibrium Computation for Discrete-Time Linear-Quadratic Mean-Field Games

134 - Muhammad Aneeq uz Zaman , Kaiqing Zhang , Erik Miehling 2020

While the topic of mean-field games (MFGs) has a relatively long history, heretofore there has been limited work concerning algorithms for the computation of equilibrium control policies. In this paper, we develop a computable policy iteration algorithm for approximating the mean-field equilibrium in linear-quadratic MFGs with discounted cost. Given the mean-field, each agent faces a linear-quadratic tracking problem, the solution of which involves a dynamical system evolving in retrograde time. This makes the development of forward-in-time algorithm updates challenging. By identifying a structural property of the mean-field update operator, namely that it preserves sequences of a particular form, we develop a forward-in-time equilibrium computation algorithm. Bounds that quantify the accuracy of the computed mean-field equilibrium as a function of the algorithms stopping condition are provided. The optimality of the computed equilibrium is validated numerically. In contrast to the most recent/concurrent results, our algorithm appears to be the first to study infinite-horizon MFGs with non-stationary mean-field equilibria, though with focus on the linear quadratic setting.

Systems and Control Multiagent Systems Systems and Control

Partially Observed Discrete-Time Risk-Sensitive Mean Field Games

258 - Naci Saldi , Tamer Basar , Maxim Raginsky 2020

In this paper, we consider discrete-time partially observed mean-field games with the risk-sensitive optimality criterion. We introduce risk-sensitivity behaviour for each agent via an exponential utility function. In the game model, each agent is weakly coupled with the rest of the population through its individual cost and state dynamics via the empirical distribution of states. We establish the mean-field equilibrium in the infinite-population limit using the technique of converting the underlying original partially observed stochastic control problem to a fully observed one on the belief space and the dynamic programming principle. Then, we show that the mean-field equilibrium policy, when adopted by each agent, forms an approximate Nash equilibrium for games with sufficiently many agents. We first consider finite-horizon cost function, and then, discuss extension of the result to infinite-horizon cost in the next-to-last section of the paper.

Systems and Control Systems and Control

Multi-Agent Reinforcement Learning in Cournot Games

95 - Yuanyuan Shi , Baosen Zhang 2020

In this work, we study the interaction of strategic agents in continuous action Cournot games with limited information feedback. Cournot game is the essential market model for many socio-economic systems where agents learn and compete without the full knowledge of the system or each other. We consider the dynamics of the policy gradient algorithm, which is a widely adopted continuous control reinforcement learning algorithm, in concave Cournot games. We prove the convergence of policy gradient dynamics to the Nash equilibrium when the price function is linear or the number of agents is two. This is the first result (to the best of our knowledge) on the convergence property of learning algorithms with continuous action spaces that do not fall in the no-regret class.

Optimization and Control Computer Science and Game Theory Machine Learning

Discrete potential mean field games

64 - J. Frederic Bonnans , Pierre Lavigne , Laurent Pfeiffer 2021

We propose and investigate a general class of discrete time and finite state space mean field game (MFG) problems with potential structure. Our model incorporates interactions through a congestion term and a price variable. It also allows hard constraints on the distribution of the agents. We analyze the connection between the MFG problem and two optimal control problems in duality. We present two families of numerical methods and detail their implementation: (i) primal-dual proximal methods (and their extension with nonlinear proximity operators), (ii) the alternating direction method of multipliers (ADMM) and a variant called ADM-G. We give some convergence results. Numerical results are provided for two examples with hard constraints.

Optimization and Control

Policy Gradient Methods Find the Nash Equilibrium in N-player General-sum Linear-quadratic Games

169 - Ben Hambly , Renyuan Xu , Huining Yang 2021

We consider a general-sum N-player linear-quadratic game with stochastic dynamics over a finite horizon and prove the global convergence of the natural policy gradient method to the Nash equilibrium. In order to prove the convergence of the method, we require a certain amount of noise in the system. We give a condition, essentially a lower bound on the covariance of the noise in terms of the model parameters, in order to guarantee convergence. We illustrate our results with numerical experiments to show that even in situations where the policy gradient method may not converge in the deterministic setting, the addition of noise leads to convergence.

Optimization and Control Computer Science and Game Theory Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Reinforcement Learning in Non-Stationary Discrete-Time Linear-Quadratic Mean-Field Games

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions