ترغب بنشر مسار تعليمي؟ اضغط هنا

Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

75   0   0.0 ( 0 )
 نشر من قبل Kai Wang
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers best response as constraints in the leaders optimization problem. These reformulation approaches can sometimes be effective, but often get trapped in low-quality solutions when followers objectives are non-linear or non-quadratic. Moreover, these approaches assume a unique equilibrium or a specific equilibrium concept, e.g., optimistic or pessimistic, which is a limiting assumption in many situations. To overcome these limitations, we propose a stochastic gradient descent--based approach, where the leaders strategy is updated by differentiating through the followers best responses. We frame the leaders optimization as a learning problem against followers equilibrium, which allows us to decouple the followers equilibrium constraints from the leaders problem. This approach also addresses cases with multiple equilibria and arbitrary equilibrium selection procedures by back-propagating through a sampled Nash equilibrium. To this end, this paper introduces a novel concept called equilibrium flow to formally characterize the set of equilibrium selection processes where the gradient with respect to a sampled equilibrium is an unbiased estimate of the true gradient. We evaluate our approach experimentally against existing baselines in three Stackelberg problems with multiple followers and find that in each case, our approach is able to achieve higher utility for the leader.



قيم البحث

اقرأ أيضاً

Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversarys response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature values and combinations. Traditionally, this problem has been addressed via a two-stage approach where an adversary model is trained to maximize predictive accuracy without considering the defenders optimization problem. We develop an end-to-end game-focused approach, where the adversary model is trained to maximize a surrogate for the defenders expected utility. We show both in theory and experimental results that our game-focused approach achieves higher defender expected utility than the two-stage alternative when there is limited data.
We propose a novel deep learning method for local self-supervised representation learning that does not require labels nor end-to-end backpropagation but exploits the natural order in data instead. Inspired by the observation that biological neural n etworks appear to learn without backpropagating a global error signal, we split a deep neural network into a stack of gradient-isolated modules. Each module is trained to maximally preserve the information of its inputs using the InfoNCE bound from Oord et al. [2018]. Despite this greedy training, we demonstrate that each module improves upon the output of its predecessor, and that the representations created by the top module yield highly competitive results on downstream classification tasks in the audio and visual domain. The proposal enables optimizing modules asynchronously, allowing large-scale distributed training of very deep neural networks on unlabelled datasets.
400 - Bo Li , Anmol Gulati , Jiahui Yu 2020
End-to-end (E2E) models have shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this issue, we look at encouraging the E2E model to emit words early, through an algorithm called FastEmit [3]. Naturally, improving on latency results in a quality degradation. To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which has shown good improvements for ASR. Secondly, we also explore running a 2nd-pass beam search to improve quality. In order to ensure the 2nd-pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality and latency tradeoff for streaming ASR.
In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal due to their fi xed allocation rules to optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from auction outcomes to optimize multiple performance metrics, have attracted increasing research interests. However, the procedure of auction mechanisms involves various discrete calculation operations, making it challenging to be compatible with continuous optimization pipelines in machine learning. In this paper, we design underline{D}eep underline{N}eural underline{A}uctions (DNAs) to enable end-to-end auction learning by proposing a differentiable model to relax the discrete sorting operation, a key component in auctions. We optimize the performance metrics by developing deep models to efficiently extract contexts from auctions, providing rich features for auction design. We further integrate the game theoretical conditions within the model design, to guarantee the stability of the auctions. DNAs have been successfully deployed in the e-commerce advertising system at Taobao. Experimental evaluation results on both large-scale data set as well as online A/B test demonstrated that DNAs significantly outperformed other mechanisms widely adopted in industry.
The concept of leader--follower (or Stackelberg) equilibrium plays a central role in a number of real--world applications of game theory. While the case with a single follower has been thoroughly investigated, results with multiple followers are only sporadic and the problem of designing and evaluating computationally tractable equilibrium-finding algorithms is still largely open. In this work, we focus on the fundamental case where multiple followers play a Nash equilibrium once the leader has committed to a strategy---as we illustrate, the corresponding equilibrium finding problem can be easily shown to be $mathcal{FNP}$--hard and not in Poly--$mathcal{APX}$ unless $mathcal{P} = mathcal{NP}$ and therefore it is one among the hardest problems to solve and approximate. We propose nonconvex mathematical programming formulations and global optimization methods to find both exact and approximate equilibria, as well as a heuristic black box algorithm. All the methods and formulations that we introduce are thoroughly evaluated computationally.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا