Subscribe to the gold package and get unlimited access to Shamra Academy

Monte Carlo *-Minimax Search

750 0 0.0 ( 0 )

Download Cite

Added by Marc Lanctot

Publication date 2013

fields Informatics Engineering

and research's language is English

Authors Marc Lanctot - Abdallah Saffidine - Joel Veness

Computer Science and Game Theory

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper introduces Monte Carlo -Minimax Search (MCMS), a Monte Carlo search algorithm for turned-based, stochastic, two-player, zero-sum games of perfect information. The algorithm is designed for the class of of densely stochastic games; that is, games where one would rarely expect to sample the same successor state multiple times at any particular chance node. Our approach combines sparse sampling techniques from MDP planning with classic pruning techniques developed for adversarial expectimax planning. We compare and contrast our algorithm to the traditional -Minimax approaches, as well as MCTS enhanced with the Double Progressive Widening, on four games: Pig, EinStein Wurfelt Nicht!, Cant Stop, and Ra. Our results show that MCMS can be competitive with enhanced MCTS variants in some domains, while consistently outperforming the equivalent classic approaches given the same amount of thinking time.

rate research

Monte Carlo Tree Search with Heuristic Evaluations using Implicit Minimax Backups

543 - Marc Lanctot , Mark H.M. Winands , Tom Pepels 2014

Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, Hex, and general game playing. MCTS has been shown to outperform classic alpha-beta search in games where good heuristic evaluations are difficult to obtain. In recent years, combining ideas from traditional minimax search in MCTS has been shown to be advantageous in some domains, such as Lines of Action, Amazons, and Breakthrough. In this paper, we propose a new way to use heuristic evaluations to guide the MCTS search by storing the two sources of information, estimated win rates and heuristic evaluations, separately. Rather than using the heuristic evaluations to replace the playouts, our technique backs them up implicitly during the MCTS simulations. These minimax values are then used to guide future simulations. We show that using implicit minimax backups leads to stronger play performance in Kalah, Breakthrough, and Lines of Action.

Artificial Intelligence

Measurable Monte Carlo Search Error Bounds

186 - John Mern , Mykel J. Kochenderfer 2021

Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion of search. In this work, we prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes. These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value. The presented bound holds for general Monte Carlo solvers meeting mild convergence conditions. We empirically test the tightness of the bounds through experiments on a multi-armed bandit and a discrete Markov decision process for both a simple solver and Monte Carlo tree search.

Artificial Intelligence Numerical Analysis Numerical Analysis

Multiple Policy Value Monte Carlo Tree Search

168 - Li-Cheng Lan , Wei Li , Ting-Han Wei 2019

Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-play phase of AlphaZero (AZ) training, a balance needs to be reached between accurate state estimation and more MCTS simulations, both of which are critical for a strong game playing agent. Typically, larger DNNs are better at generalization and accurate evaluation, while smaller DNNs are less costly, and therefore can lead to more MCTS simulations and bigger search trees with the same budget. This paper introduces a new method called the multiple policy value MCTS (MPV-MCTS), which combines multiple policy value neural networks (PV-NNs) of various sizes to retain advantages of each network, where two PV-NNs f_S and f_L are used in this paper. We show through experiments on the game NoGo that a combined f_S and f_L MPV-MCTS outperforms single PV-NN with policy value MCTS, called PV-MCTS. Additionally, MPV-MCTS also outperforms PV-MCTS for AZ training.

Artificial Intelligence

Monte Carlo Methods for Calculating Shapley-Shubik Power Index in Weighted Majority Games

328 - Yuto Ushioda , Masato Tanaka , Tomomi Matsui 2021

This paper addresses Monte Carlo algorithms for calculating the Shapley-Shubik power index in weighted majority games. First, we analyze a naive Monte Carlo algorithm and discuss the required number of samples. We then propose an efficient Monte Carlo algorithm and show that our algorithm reduces the required number of samples as compared to the naive algorithm.

Computer Science and Game Theory

Generalized Mean Estimation in Monte-Carlo Tree Search

84 - Tuan Dam , Pascal Klink , Carlo DEramo 2019

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.

Artificial Intelligence

comments

Fetching comments

The Islamic University of Lebanon

Additional details More universities

Monte Carlo *-Minimax Search

Ask ChatGPT about the research

No Arabic abstract

Read More