
Decentralized Online Learning for Noncooperative Games in Dynamic Environments

Posted by: Min Meng
Publication date: 2021
Language: English





Decentralized online learning for seeking the generalized Nash equilibrium (GNE) of noncooperative games in dynamic environments is studied in this paper. Each player aims at selfishly minimizing its own time-varying cost function subject to time-varying coupled constraints and local feasible set constraints. Only local cost functions and local constraints are available to individual players, who can receive their neighbors' information through a fixed and connected graph. In addition, players have no prior knowledge of future cost functions and local constraint functions. In this setting, a novel distributed online learning algorithm for seeking the GNE of the studied game is devised based on mirror descent and a primal-dual strategy. It is shown that the presented algorithm achieves sublinearly bounded dynamic regret and constraint violation by appropriately choosing decreasing step sizes. Finally, the obtained theoretical result is corroborated by a numerical simulation.
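To make the algorithmic idea concrete, here is a minimal Python sketch of one round of a decentralized online primal-dual update of the kind the abstract describes. It assumes a Euclidean mirror map (so the mirror step reduces to a projected gradient step), box-shaped local feasible sets, and hypothetical oracles grad_cost, grad_constraint, and coupled_constraint; the paper's exact recursion and Bregman divergence choice may differ.

```python
import numpy as np

# Hedged sketch (not the paper's exact algorithm): one synchronous round of a
# decentralized online primal-dual mirror-descent update for GNE seeking.
# Assumes a Euclidean mirror map, box local sets [lo, hi]^d, and user-supplied
# oracles grad_cost / grad_constraint / coupled_constraint.
def primal_dual_round(x, lam, W, grad_cost, grad_constraint, coupled_constraint,
                      box, alpha, beta):
    """x: (N, d) players' decisions, lam: (N, m) local multiplier estimates,
    W: (N, N) doubly stochastic mixing matrix of the communication graph,
    alpha/beta: decreasing primal/dual step sizes."""
    lam_mixed = W @ lam                               # consensus on dual estimates
    x_new, lam_new = np.empty_like(x), np.empty_like(lam)
    lo, hi = box
    for i in range(x.shape[0]):
        g_i = grad_cost(i, x)                         # partial gradient w.r.t. x_i, shape (d,)
        J_i = grad_constraint(i, x)                   # (m, d) Jacobian of player i's constraint block
        step = g_i + J_i.T @ lam_mixed[i]             # gradient of the local Lagrangian
        x_new[i] = np.clip(x[i] - alpha * step, lo, hi)           # Euclidean mirror step
        viol = coupled_constraint(i, x)               # player i's block of the coupled constraint, (m,)
        lam_new[i] = np.maximum(lam_mixed[i] + beta * viol, 0.0)  # projected dual ascent
    return x_new, lam_new
```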


Read also

This work addresses decentralized online optimization in non-stationary environments. A network of agents aims to track the minimizer of a global time-varying convex function. The minimizer evolves according to a known dynamics corrupted by an unknown, unstructured noise. At each time, the global function can be cast as a sum of a finite number of local functions, each of which is assigned to one agent in the network. Moreover, the local functions become available to agents sequentially, and agents do not have prior knowledge of the future cost functions. Therefore, agents must communicate with each other to build an online approximation of the global function. We propose a decentralized variation of the celebrated Mirror Descent algorithm developed by Nemirovski and Yudin. Using the notion of Bregman divergence in lieu of Euclidean distance for projection, Mirror Descent has been shown to be a powerful tool in large-scale optimization. Our algorithm builds on Mirror Descent, while ensuring that agents perform a consensus step to follow the global function and take into account the dynamics of the global minimizer. To measure the performance of the proposed online algorithm, we compare it to its offline counterpart, where the global functions are available a priori. The gap between the two is called dynamic regret. We establish a regret bound that scales inversely in the spectral gap of the network and, more notably, captures the deviation of the minimizer sequence with respect to the given dynamics. We then show that our results subsume a number of results in distributed optimization. We demonstrate the application of our method to decentralized tracking of dynamic parameters and verify the results via numerical experiments.
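As a rough illustration of the mechanism described in this abstract, the sketch below performs one iteration of decentralized mirror descent with an entropic mirror map on the probability simplex: each agent averages its neighbors' iterates, takes an exponentiated-gradient step with its local gradient, and then applies a known dynamics matrix A to follow the moving minimizer. The oracle grad_i, the matrix A, and the renormalization step are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dmd_dynamic_step(X, W, grad_i, A, eta):
    """Hedged sketch of one decentralized mirror-descent iteration in a
    dynamic environment. X: (N, d) agent iterates on the simplex, W: (N, N)
    doubly stochastic weights, grad_i(i, y): local gradient oracle,
    A: known d x d minimizer dynamics, eta: step size."""
    X_next = np.empty_like(X)
    for i in range(X.shape[0]):
        y = W[i] @ X                       # consensus: weighted average of neighbors' iterates
        g = grad_i(i, y)                   # local gradient at the averaged point
        z = y * np.exp(-eta * g)           # entropic mirror (exponentiated-gradient) step
        z /= z.sum()                       # Bregman projection back onto the simplex
        x = A @ z                          # track the known minimizer dynamics
        x = np.maximum(x, 1e-12)
        X_next[i] = x / x.sum()            # keep the iterate on the simplex
    return X_next
```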
In some games, additional information hurts a player; e.g., in games with a first-mover advantage, the second mover is hurt by seeing the first mover's move. What properties of a game determine whether it has such negative value of information for a particular player? Can a game have negative value of information for all players? To answer such questions, we generalize the definition of marginal utility of a good to define the marginal utility of a parameter vector specifying a game. So rather than analyze the global structure of the relationship between a game's parameter vector and player behavior, as in previous work, we focus on the local structure of that relationship. This allows us to prove that, generically, every game can have negative marginal value of information, unless one imposes a priori constraints on allowed changes to the game's parameter vector. We demonstrate these and related results numerically, and discuss their implications.
Ran Xin, Usman A. Khan, 2020
In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking into decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enjoys certain desirable characteristics towards minimizing a sum of smooth non-convex functions. In particular, for general smooth non-convex functions, we establish non-asymptotic characterizations of GT-DSGD and derive the conditions under which it achieves network-independent performance that matches the centralized minibatch SGD. In contrast, the existing results suggest that GT-DSGD is always network-dependent and is therefore strictly worse than the centralized minibatch SGD. When the global non-convex function additionally satisfies the Polyak-Lojasiewicz (PL) condition, we establish the linear convergence of GT-DSGD up to a steady-state error with appropriate constant step sizes. Moreover, under stochastic approximation step sizes, we establish, for the first time, the optimal global sublinear convergence rate on almost every sample path, in addition to the asymptotically optimal sublinear rate in expectation. Since strongly convex functions are a special case of the functions satisfying the PL condition, our results are not only immediately applicable but also improve the currently known best convergence rates and their dependence on problem parameters.
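A compact sketch of the gradient-tracking recursion that GT-DSGD builds on is given below; stoch_grad is a hypothetical local stochastic-gradient oracle and the step size alpha is held constant, so this is only a schematic of the update, not the paper's tuned algorithm.

```python
import numpy as np

def gt_dsgd_step(X, Y, G_prev, W, stoch_grad, alpha):
    """One gradient-tracking step (schematic). X: (N, d) local models,
    Y: (N, d) gradient trackers, G_prev: (N, d) last stochastic gradients,
    W: (N, N) doubly stochastic mixing matrix, alpha: step size."""
    X_next = W @ X - alpha * Y                        # consensus + descent along the tracker
    G_next = np.vstack([stoch_grad(i, X_next[i]) for i in range(X.shape[0])])
    Y_next = W @ Y + G_next - G_prev                  # trackers estimate the global gradient
    return X_next, Y_next, G_next
```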
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, based only on their own payoffs and the local actions they execute. The agents need not observe the opponents' actions or payoffs, possibly even being oblivious to the presence of the opponent, nor be aware of the zero-sum structure of the underlying game, a setting also referred to as radically uncoupled in the literature on learning in games. In this paper, we develop for the first time a radically uncoupled Q-learning dynamics that is both rational and convergent: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy, and the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics. The key challenge in this decentralized setting is the non-stationarity of the learning environment from an agent's perspective, since both her own payoffs and the system evolution depend on the actions of other agents, and each agent adapts her policy simultaneously and independently. To address this issue, we develop a two-timescale learning dynamics where each agent updates her local Q-function and value function estimates concurrently, with the latter happening at a slower timescale.
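The two-timescale structure can be sketched for a single agent as follows: a fast update of the local Q-table from the agent's own reward and action, and a slower update of the value estimate toward a smoothed (softmax) maximum of that Q-table. The step sizes alpha > beta encode the timescale separation; the smoothing temperature tau and the exact targets are illustrative assumptions rather than the paper's recursions.

```python
import numpy as np

def agent_two_timescale_update(Q, V, s, a, r, s_next, alpha, beta, gamma, tau=0.1):
    """Schematic two-timescale update for one agent. Q: (S, A) local Q-table,
    V: (S,) value estimates, (s, a, r, s_next): own state/action/payoff/next state,
    alpha (fast) > beta (slow): step sizes, gamma: discount, tau: softmax temperature."""
    # fast timescale: Q-learning-style update using only the agent's own payoff
    Q[s, a] += alpha * (r + gamma * V[s_next] - Q[s, a])
    # slow timescale: value tracks a smoothed best response (softmax) of the local Q-values
    m = Q[s].max()
    soft_max = m + tau * np.log(np.exp((Q[s] - m) / tau).sum())
    V[s] += beta * (soft_max - V[s])
    return Q, V
```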
Minyi Huang, Xuwei Yang, 2021
This paper studies an asymptotic solvability problem for linear quadratic (LQ) mean field games with controlled diffusions and indefinite weights for the state and control in the costs. We employ a rescaling approach to derive a low-dimensional Riccati ordinary differential equation (ODE) system, which characterizes a necessary and sufficient condition for asymptotic solvability. The rescaling technique is further used for performance estimates, establishing an $O(1/N)$-Nash equilibrium for the obtained decentralized strategies.
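For illustration only, the snippet below integrates a standard LQ matrix Riccati ODE backwards in time with SciPy; the paper's rescaled, low-dimensional Riccati system is different, and the matrices A, B, Q, R, P_T here are placeholders, but the idea of certifying solvability by whether the ODE solution stays bounded over the whole horizon is similar in spirit.

```python
import numpy as np
from scipy.integrate import solve_ivp

def riccati_rhs(t, p_flat, A, B, Q, R):
    """Right-hand side of a standard matrix Riccati ODE, flattened for solve_ivp."""
    n = A.shape[0]
    P = p_flat.reshape(n, n)
    dP = -(A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T) @ P + Q)
    return dP.ravel()

def solve_riccati(A, B, Q, R, P_T, T):
    """Integrate from t = T back to t = 0; failure to reach t = 0 (blow-up)
    indicates the Riccati equation is not solvable on the whole horizon."""
    return solve_ivp(riccati_rhs, (T, 0.0), P_T.ravel(), args=(A, B, Q, R),
                     dense_output=True, rtol=1e-8, atol=1e-10)
```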