ﻻ يوجد ملخص باللغة العربية
Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree. The performance of progressive widening search is dependent upon the action sampling policy, often requiring problem-specific samplers. In this work, we present a general method for efficient action sampling based on Bayesian optimization. The proposed method uses a Gaussian process to model a belief over the action-value function and selects the action that will maximize the expected improvement in the optimal action value. We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning (BOMCP). Several experiments show that BOMCP is better able to scale to large action space POMDPs than existing state-of-the-art tree search solvers.
Todays automated vehicles lack the ability to cooperate implicitly with others. This work presents a Monte Carlo Tree Search (MCTS) based approach for decentralized cooperative planning using macro-actions for automated vehicles in heterogeneous envi
Urban traffic scenarios often require a high degree of cooperation between traffic participants to ensure safety and efficiency. Observing the behavior of others, humans infer whether or not others are cooperating. This work aims to extend the capabi
Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion
Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-p
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions)