
No-regret distributed learning in subnetwork zero-sum games

Published by: Shijie Huang
Publication date: 2021
Language: English





In this paper, we consider a distributed learning problem in a subnetwork zero-sum game, where agents are competing in different subnetworks. These agents are connected through time-varying graphs where each agent has its own cost function and can receive information from its neighbors. We propose a distributed mirror descent algorithm for computing a Nash equilibrium and establish a sublinear regret bound on the sequence of iterates when the graphs are uniformly strongly connected and the cost functions are convex-concave. Moreover, we prove its convergence with suitably selected diminishing stepsizes for a strictly convex-concave cost function. We also consider a constant step-size variant of the algorithm and establish an asymptotic error bound between the cost function values of running average actions and a Nash equilibrium. In addition, we apply the algorithm to compute a mixed-strategy Nash equilibrium in subnetwork zero-sum finite-strategy games, which have merely convex-concave (to be specific, multilinear) cost functions, and obtain a final-iteration convergence result and an ergodic convergence result, respectively, under different assumptions.
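To make the setting concrete, below is a minimal Python sketch (not the paper's exact algorithm) of distributed mirror descent with an entropic mirror map on the probability simplex: two subnetworks play a bilinear zero-sum game, agents in each subnetwork average their local copies through doubly stochastic weight matrices `W_min` and `W_max`, and then take a mirror step with a diminishing stepsize. The function names and the use of the network-average opponent action are illustrative simplifications.

```python
import numpy as np

def entropic_md_step(x, grad, eta):
    """One entropic mirror-descent (multiplicative-weights) step on the simplex."""
    z = x * np.exp(-eta * grad)
    return z / z.sum()

def distributed_md_zero_sum(A, W_min, W_max, T=5000, eta0=1.0):
    """
    Illustrative sketch, not the paper's algorithm: two subnetworks play the
    bilinear zero-sum game min_x max_y x^T A y. Each iteration does
    (1) consensus averaging inside each subnetwork with W_min / W_max,
    (2) an entropic mirror-descent step with stepsize eta0 / sqrt(t).
    """
    n_min, n_max = W_min.shape[0], W_max.shape[0]
    d, m = A.shape
    X = np.full((n_min, d), 1.0 / d)          # each row: one minimizer's copy of x
    Y = np.full((n_max, m), 1.0 / m)          # each row: one maximizer's copy of y
    X_avg, Y_avg = np.zeros(d), np.zeros(m)   # running (ergodic) averages

    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)
        X_mix, Y_mix = W_min @ X, W_max @ Y   # consensus step inside each subnetwork
        x_bar, y_bar = X_mix.mean(0), Y_mix.mean(0)
        for i in range(n_min):                # minimizers descend along grad_x = A y
            X[i] = entropic_md_step(X_mix[i], A @ y_bar, eta)
        for j in range(n_max):                # maximizers ascend along grad_y = A^T x
            Y[j] = entropic_md_step(Y_mix[j], -(A.T @ x_bar), eta)
        X_avg += (X.mean(0) - X_avg) / t
        Y_avg += (Y.mean(0) - Y_avg) / t
    return X_avg, Y_avg

if __name__ == "__main__":
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])  # matching pennies
    W = np.full((3, 3), 1.0 / 3)              # complete 3-agent subnetworks
    print(distributed_md_zero_sum(A, W, W))   # averages approach (0.5, 0.5) each
```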


Read also

We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves the open problem of removing the logarithmic terms in convergence to the value of the game. We achieve this goal in three steps. First, we provide a novel analysis of optimistic mirror descent (OMD), showing that it can be modified to guarantee fast convergence for both honest regret and value of the game when the players are playing collaboratively. Second, we propose a new algorithm, dubbed robust optimistic mirror descent (ROMD), which attains optimal adversarial regret without knowing the time horizon beforehand. Finally, we propose a simple signaling scheme, which enables us to bridge OMD and ROMD to achieve the best of both worlds. Numerical examples are presented to support our theoretical claims and show that our non-adaptive ROMD algorithm can be competitive with OMD with adaptive step-size selection.
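For orientation, here is a minimal sketch of plain optimistic mirror descent, the OMD baseline the abstract builds on, instantiated as optimistic multiplicative weights in self-play on a bilinear matrix game; the ROMD variant and the signaling scheme are not reproduced, and all names are illustrative.

```python
import numpy as np

def optimistic_mwu_selfplay(A, T=5000, eta=0.1):
    """
    Sketch of optimistic multiplicative weights (an entropic instance of OMD)
    in self-play on the zero-sum game min_x max_y x^T A y. Each player uses its
    previous gradient as a prediction of the next one (the "optimistic" term).
    """
    d, m = A.shape
    x, y = np.full(d, 1 / d), np.full(m, 1 / m)
    gx_prev, gy_prev = np.zeros(d), np.zeros(m)
    x_avg, y_avg = np.zeros(d), np.zeros(m)
    for t in range(1, T + 1):
        gx, gy = A @ y, -(A.T @ x)                     # y-player maximizes
        x = x * np.exp(-eta * (2 * gx - gx_prev)); x /= x.sum()
        y = y * np.exp(-eta * (2 * gy - gy_prev)); y /= y.sum()
        gx_prev, gy_prev = gx, gy
        x_avg += (x - x_avg) / t
        y_avg += (y - y_avg) / t
    return x_avg, y_avg                                # average strategies approximate an equilibrium
```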
Yuanyuan Shi, Baosen Zhang (2019)
This paper examines the convergence of no-regret learning in Cournot games with continuous actions. Cournot games are the essential model for many socio-economic systems, where players compete by strategically setting their output quantities. We assume that players do not have full information of the game and thus cannot pre-compute a Nash equilibrium. Two types of feedback are considered: one is bandit feedback and the other is gradient feedback. To study the convergence of the induced sequence of play, we introduce the notion of convergence in measure, and show that the players' actual sequence of actions converges to the unique Nash equilibrium. In addition, our results naturally extend the time-average regret bounds of no-regret learning algorithms to final-iteration convergence rates. Together, our work presents significantly sharper convergence results for learning in games without strong assumptions on game properties (e.g., monotonicity) and shows how exploiting the feedback information of the game can influence the convergence rates.
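As a toy illustration of gradient-feedback no-regret play (not the paper's algorithm or assumptions), the sketch below runs projected gradient ascent in a Cournot game with linear inverse demand p(Q) = a - b*Q and linear costs, and compares the outcome against the closed-form Nash equilibrium of that specific game.

```python
import numpy as np

def cournot_gradient_play(a=10.0, b=1.0, c=(1.0, 2.0, 3.0), T=5000, eta0=0.5):
    """
    Illustrative sketch: each firm i ascends its own payoff
        pi_i(q) = q_i * (a - b * sum(q)) - c_i * q_i
    using only its local gradient feedback and a diminishing stepsize,
    projecting onto q_i >= 0 after every step.
    """
    c = np.asarray(c, dtype=float)
    n = len(c)
    q = np.ones(n)                                    # initial production quantities
    for t in range(1, T + 1):
        Q = q.sum()
        grad = (a - b * Q) - b * q - c                # d pi_i / d q_i
        q = np.maximum(q + (eta0 / np.sqrt(t)) * grad, 0.0)
    # Closed-form (interior) Nash equilibrium of the linear Cournot game:
    q_star = (a - (n + 1) * c + c.sum()) / (b * (n + 1))
    return q, np.maximum(q_star, 0.0)

if __name__ == "__main__":
    q_learned, q_nash = cournot_gradient_play()
    print(q_learned, q_nash)                          # learned quantities approach (3, 2, 1)
```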
Jun Moon, Tamer Basar (2019)
We consider two-player zero-sum differential games (ZSDGs), where the state process (dynamical system) depends on the random initial condition and the state process's distribution, and the objective functional includes the state process's distribution and a random target variable. Unlike ZSDGs studied in the existing literature, the ZSDG of this paper introduces a new technical challenge, since the corresponding (lower and upper) value functions are defined on $\mathcal{P}_2$ (the set of probability measures with finite second moments) or $\mathcal{L}_2$ (the set of random variables with finite second moments), both of which are infinite-dimensional spaces. We show that the (lower and upper) value functions on $\mathcal{P}_2$ and $\mathcal{L}_2$ are equivalent (law invariant) and continuous, and satisfy dynamic programming principles. We use the notion of the derivative of a function of probability measures in $\mathcal{P}_2$, and its lifted version in $\mathcal{L}_2$, to show that the (lower and upper) value functions are unique viscosity solutions of the associated (lower and upper) Hamilton-Jacobi-Isaacs equations, which are (infinite-dimensional) first-order PDEs on $\mathcal{P}_2$ and $\mathcal{L}_2$; uniqueness is obtained via the comparison principle. Under the Isaacs condition, we show that the ZSDG has a value.
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
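The core update inside CFR is regret matching at each information set. The hedged sketch below shows that update in the degenerate case of a one-shot zero-sum matrix game, where the average strategies converge to a Nash equilibrium; a full extensive-form implementation, which additionally weights regrets by counterfactual reach probabilities, is omitted here.

```python
import numpy as np

def regret_matching_selfplay(A, T=20000):
    """
    Sketch of the regret-matching update CFR applies at every information set,
    shown for a one-shot zero-sum matrix game where the row player minimizes
    x^T A y and the column player maximizes it.
    """
    d, m = A.shape
    regret_x, regret_y = np.zeros(d), np.zeros(m)
    sum_x, sum_y = np.zeros(d), np.zeros(m)

    def strategy(regret):
        # Play proportionally to positive regrets; uniform if none are positive.
        pos = np.maximum(regret, 0.0)
        return pos / pos.sum() if pos.sum() > 0 else np.full(len(regret), 1 / len(regret))

    for _ in range(T):
        x, y = strategy(regret_x), strategy(regret_y)
        loss_x = A @ y                          # row player's loss for each pure action
        gain_y = A.T @ x                        # column player's payoff for each pure action
        regret_x += (x @ loss_x) - loss_x       # regret for not having played each action
        regret_y += gain_y - (y @ gain_y)
        sum_x += x
        sum_y += y
    return sum_x / T, sum_y / T                 # average strategies approximate a Nash equilibrium
```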
Jingrui Sun (2020)
The paper studies the open-loop saddle point and the open-loop lower and upper values, as well as their relationship for two-person zero-sum stochastic linear-quadratic (LQ, for short) differential games with deterministic coefficients. It derives a necessary condition for the finiteness of the open-loop lower and upper values and a sufficient condition for the existence of an open-loop saddle point. It turns out that under the sufficient condition, a strongly regular solution to the associated Riccati equation uniquely exists, in terms of which a closed-loop representation is further established for the open-loop saddle point. Examples are presented to show that the finiteness of the open-loop lower and upper values does not ensure the existence of an open-loop saddle point in general. But for the classical deterministic LQ game, these two issues are equivalent and both imply the solvability of the Riccati equation, for which an explicit representation of the solution is obtained.
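For the reader's convenience, here is a hedged sketch of the deterministic finite-horizon case (not the stochastic setting of the paper): backward Euler integration of the game Riccati equation, whose solution, when it exists on the whole horizon, yields the closed-loop representation of the saddle point. The matrix names and the discretization are illustrative.

```python
import numpy as np

def zero_sum_lq_riccati(A, B1, B2, Q, R1, R2, G, T=1.0, steps=2000):
    """
    Sketch for the deterministic zero-sum LQ game with dynamics
        dx/dt = A x + B1 u + B2 v
    and cost x(T)'G x(T) + int( x'Qx + u'R1 u - v'R2 v ) dt (u minimizes, v maximizes).
    Integrates the game Riccati equation
        -dP/dt = A'P + PA + Q - P B1 R1^{-1} B1' P + P B2 R2^{-1} B2' P,  P(T) = G,
    backward in time; when P exists on [0, T], the saddle point has the closed-loop form
        u*(t) = -R1^{-1} B1' P(t) x(t),   v*(t) = R2^{-1} B2' P(t) x(t).
    """
    dt = T / steps
    P = G.copy()
    S1 = B1 @ np.linalg.solve(R1, B1.T)       # B1 R1^{-1} B1'
    S2 = B2 @ np.linalg.solve(R2, B2.T)       # B2 R2^{-1} B2'
    for _ in range(steps):                     # step backward from t = T to t = 0
        rhs = A.T @ P + P @ A + Q - P @ S1 @ P + P @ S2 @ P
        P = P + dt * rhs                       # P(t - dt) ≈ P(t) + dt * rhs
    return P                                   # approximation of P(0)
```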