Counterfactual regret minimization (CFR) is the most popular algorithm for solving two-player zero-sum extensive games with imperfect information, and it achieves state-of-the-art performance in practice. However, the performance of CFR is not fully understood, since empirical results on the regret are much better than the upper bound proved in \cite{zinkevich2008regret}. Another issue is that CFR has to traverse the whole game tree in each round, which is time-consuming in large-scale games. In this paper, we present a novel technique, lazy update, which avoids traversing the whole game tree in CFR, together with a novel analysis of the regret of CFR with lazy update. Our analysis also applies to vanilla CFR, yielding a much tighter regret bound than that in \cite{zinkevich2008regret}. Inspired by lazy update, we further present a novel CFR variant, named Lazy-CFR. Compared to traversing $O(|\mathcal{I}|)$ information sets in vanilla CFR, Lazy-CFR only needs to traverse $O(\sqrt{|\mathcal{I}|})$ information sets per round while keeping the regret bound almost the same, where $\mathcal{I}$ is the set of all information sets. As a result, Lazy-CFR enjoys a better convergence rate than vanilla CFR. Experimental results consistently show that Lazy-CFR significantly outperforms vanilla CFR.
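The abstract does not spell out the per-information-set update that CFR performs on each traversal; as a point of reference, below is a minimal sketch of regret matching, the rule CFR applies at every information set it visits each round. The NumPy setting, function names, and the illustrative counterfactual values are assumptions made for exposition, not details taken from the paper.

```python
import numpy as np

def regret_matching(cumulative_regret):
    """Play each action in proportion to its positive cumulative regret;
    fall back to the uniform strategy when no regret is positive."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(cumulative_regret), 1.0 / len(cumulative_regret))

def update_infoset(cumulative_regret, cf_values, strategy):
    """One round's update at a single information set: accumulate the
    instantaneous counterfactual regrets, then recompute the strategy."""
    node_value = np.dot(strategy, cf_values)      # value of the current strategy
    cumulative_regret += cf_values - node_value   # regret of each pure action
    return regret_matching(cumulative_regret)

# Illustrative usage at one information set with three actions.
regrets = np.zeros(3)
strategy = regret_matching(regrets)               # uniform to start
cf_values = np.array([1.0, 0.2, -0.5])            # hypothetical counterfactual action values
strategy = update_infoset(regrets, cf_values, strategy)
```

Vanilla CFR runs this update at every information set in every round, which is exactly the per-round cost that lazy update is designed to avoid for most information sets.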
In many real-world scenarios, a team of agents coordinates to compete against an opponent. The challenge of solving this type of game is that the team's joint action space grows exponentially with the number of agents, which results in
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed
Most bandit algorithm designs are purely theoretical. Therefore, they have strong regret guarantees, but are also often too conservative in practice. In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes regret, the
We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in simpler settings of either bounded support reward
Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation. In this paper, we propose Exploration Enhanced Q-learning