ﻻ يوجد ملخص باللغة العربية
Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion of search. In this work, we prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes. These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value. The presented bound holds for general Monte Carlo solvers meeting mild convergence conditions. We empirically test the tightness of the bounds through experiments on a multi-armed bandit and a discrete Markov decision process for both a simple solver and Monte Carlo tree search.
Practitioners wishing to experience the efficiency gains from using low discrepancy sequences need correct, well-written software. This article, based on our MCQMC 2020 tutorial, describes some of the better quasi-Monte Carlo (QMC) software available
Many of the strongest game playing programs use a combination of Monte Carlo tree search (MCTS) and deep neural networks (DNN), where the DNNs are used as policy or value evaluators. Given a limited budget, such as online playing or during the self-p
Generative adversarial networks (GANs) have shown promising results when applied on partial differential equations and financial time series generation. We investigate if GANs can also be used to approximate one-dimensional Ito stochastic differentia
While the Quasi-Monte Carlo method of numerical integration achieves smaller integration error than standard Monte Carlo, its use in particle physics phenomenology has been hindered by the abscence of a reliable way to estimate that error. The standa
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions)