ﻻ يوجد ملخص باللغة العربية
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet, there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracle for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric guarantees to enlarge the gamescape -- convex polytopes spanned by agents mixtures of strategies. To validate our diversity-aware solvers, we test on tens of games that show strong non-transitivity. Results suggest that our methods achieve at least the same, and in most games, lower exploitability than PSRO solvers by finding effective and diverse strategies.
Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse p
Bid optimization for online advertising from single advertisers perspective has been thoroughly investigated in both academic research and industrial practice. However, existing work typically assume competitors do not change their bids, i.e., the wi
In this work we create agents that can perform well beyond a single, individual task, that exhibit much wider generalisation of behaviour to a massive, rich space of challenges. We define a universe of tasks within an environment domain and demonstra
Interactions among individuals in natural populations often occur in a dynamically changing environment. Understanding the role of environmental variation in population dynamics has long been a central topic in theoretical ecology and population biol
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other and update their parameters after each round. The agents decision-making is transparent to each other, which allows each agent to predi