Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called naïve sampling, based on a variant of the Multi-armed Bandit problem called Combinatorial Multi-armed Bandits (CMAB). We analyze the theoretical properties of several variants of naïve sampling and empirically compare it against other existing CMAB strategies in the literature. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by very large branching factors. Our results show that as the branching factor grows, naïve sampling outperforms the other sampling strategies.
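To make the naïve sampling idea concrete, here is a minimal Python sketch. The class name, the epsilon parameter, and the epsilon-greedy local policy are illustrative assumptions rather than the paper's exact formulation; the essential ingredient is the "naïve assumption" that the global reward decomposes into per-component rewards, so each component keeps its own local MAB while a global MAB is maintained over the macro-arms actually tried.

```python
import random
from collections import defaultdict

class NaiveSampling:
    """Sketch of naive sampling for a Combinatorial Multi-armed Bandit.

    A macro-arm is a tuple of choices, one per component. Under the naive
    assumption the global reward decomposes additively, so each component
    keeps local value estimates; a global MAB tracks macro-arms tried so far.
    """

    def __init__(self, component_choices, epsilon=0.25):
        self.component_choices = component_choices  # list of choice lists
        self.epsilon = epsilon
        self.local_sum = [defaultdict(float) for _ in component_choices]
        self.local_n = [defaultdict(int) for _ in component_choices]
        self.global_sum = defaultdict(float)  # reward sums per macro-arm
        self.global_n = defaultdict(int)      # play counts per macro-arm

    def _local_pick(self, i):
        # epsilon-greedy over one component's local MAB
        choices = self.component_choices[i]
        if random.random() < self.epsilon or not self.local_n[i]:
            return random.choice(choices)
        return max(choices,
                   key=lambda c: self.local_sum[i][c] / max(1, self.local_n[i][c]))

    def select(self):
        # explore: assemble a new macro-arm from independent local picks
        if random.random() < self.epsilon or not self.global_n:
            return tuple(self._local_pick(i)
                         for i in range(len(self.component_choices)))
        # exploit: best macro-arm observed so far (global MAB)
        return max(self.global_n, key=lambda a: self.global_sum[a] / self.global_n[a])

    def update(self, arm, reward):
        self.global_sum[arm] += reward
        self.global_n[arm] += 1
        for i, c in enumerate(arm):
            self.local_sum[i][c] += reward
            self.local_n[i][c] += 1
```

In an MCTS setting, select() would propose a combined action at a node and update() would feed the playout result back into both the local and global statistics; exploring via independent local picks is what keeps the strategy tractable when the number of macro-arms is combinatorially large.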
This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: the first is a novel approach to player modeling based on multi-armed bandits (MABs).
Game tree search algorithms such as minimax have been used with enormous success in turn-based adversarial games such as Chess or Checkers. However, such algorithms cannot be directly applied to real-time strategy (RTS) games for a number of reasons.
We introduce a new class of reinforcement learning methods referred to as episodic multi-armed bandits (eMAB). In eMAB the learner proceeds in episodes, each composed of several steps, in which it chooses an action and observes a feedback signal.
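The abstract is truncated, so the following is only a generic sketch of the episodic interaction structure it describes, assuming one action is committed per episode and per-step feedback is numeric. The UCB1 rule over episode returns and the step_feedback stand-in are illustrative assumptions, not the paper's eMAB algorithm.

```python
import math

def emab_loop(n_actions, n_episodes, n_steps, step_feedback):
    """Generic episodic bandit loop: one action per episode, several
    feedback observations per episode, UCB1 over episode-level returns.
    `step_feedback(action, step)` is a stand-in for the environment."""
    counts = [0] * n_actions
    values = [0.0] * n_actions
    for ep in range(n_episodes):
        if ep < n_actions:                       # play every action once
            a = ep
        else:                                    # UCB1 on episode returns
            a = max(range(n_actions),
                    key=lambda i: values[i]
                    + math.sqrt(2 * math.log(ep) / counts[i]))
        ret = sum(step_feedback(a, t) for t in range(n_steps))
        counts[a] += 1
        values[a] += (ret - values[a]) / counts[a]  # running mean update
    return values
```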
We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a class of bandit algorithms that leverages adaptive windowing techniques.
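As a rough illustration of the adaptive-windowing idea, the sketch below pairs UCB selection with a crude change detector that resets an arm's statistics when the recent half of its reward window diverges from the older half. The class name, window size, and threshold test are assumptions for illustration; the paper's ADR-bandit uses a more principled adaptive-windowing scheme.

```python
import math
from collections import deque

class ResettingUCB:
    """UCB with a simple drift-triggered reset per arm (illustrative only)."""

    def __init__(self, n_arms, window=200, drift_threshold=0.2):
        self.n_arms = n_arms
        self.window = window
        self.drift_threshold = drift_threshold
        self.history = [deque(maxlen=window) for _ in range(n_arms)]
        self.t = 0

    def select(self):
        self.t += 1
        for a in range(self.n_arms):              # play untried arms first
            if not self.history[a]:
                return a
        def ucb(a):
            h = self.history[a]
            return sum(h) / len(h) + math.sqrt(2 * math.log(self.t) / len(h))
        return max(range(self.n_arms), key=ucb)

    def update(self, arm, reward):
        h = self.history[arm]
        h.append(reward)
        # crude change detector: compare the two halves of a full window
        if len(h) == self.window:
            half = self.window // 2
            old = sum(list(h)[:half]) / half
            new = sum(list(h)[half:]) / (self.window - half)
            if abs(new - old) > self.drift_threshold:
                h.clear()                         # reset this arm's statistics
```

Resetting only the drifting arm, rather than all statistics, is what lets such algorithms keep exploiting stable arms while re-exploring the ones whose reward distribution has changed.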
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization.
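For context, here is a hedged sketch of the standard GP-UCB baseline that such algorithms build on, over a discretized 1-D arm set using scikit-learn's GaussianProcessRegressor. The function name, kernel choice, and fixed beta are illustrative assumptions; the paper's two algorithms are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb(reward_fn, arms, n_rounds=50, beta=2.0, noise=0.1):
    """Standard GP-UCB on a 1-D numpy array of candidate arms.
    `reward_fn` is a stand-in environment; `beta` sets exploration."""
    X, y = [], []
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=noise**2)
    for _ in range(n_rounds):
        if X:
            gp.fit(np.array(X).reshape(-1, 1), np.array(y))
            mu, sigma = gp.predict(arms.reshape(-1, 1), return_std=True)
            x = arms[np.argmax(mu + np.sqrt(beta) * sigma)]  # UCB rule
        else:
            x = arms[np.random.randint(len(arms))]           # first pull
        X.append(x)
        y.append(reward_fn(x) + np.random.normal(0, noise))  # noisy reward
    return max(zip(y, X))  # best observed (reward, arm) pair
```

A usage example would pass, say, arms = np.linspace(0, 1, 200) and any scalar reward_fn; the posterior mean plus a scaled posterior standard deviation is what turns the GP surrogate into an optimistic arm-selection rule.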