Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Player Modeling via Multi-Armed Bandits

83 0 0.0 ( 0 )

Download Cite

Added by Santiago Ontanon

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Robert C. Gray - Jichen Zhu - Dannielle Arigo

Artificial Intelligence Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

This paper focuses on building personalized player models solely from player behavior in the context of adaptive games. We present two main contributions: The first is a novel approach to player modeling based on multi-armed bandits (MABs). This approach addresses, at the same time and in a principled way, both the problem of collecting data to model the characteristics of interest for the current player and the problem of adapting the interactive experience based on this model. Second, we present an approach to evaluating and fine-tuning these algorithms prior to generating data in a user study. This is an important problem, because conducting user studies is an expensive and labor-intensive process; therefore, an ability to evaluate the algorithms beforehand can save a significant amount of resources. We evaluate our approach in the context of modeling players social comparison orientation (SCO) and present empirical results from both simulations and real players.

rate research

Online Learning for Cooperative Multi-Player Multi-Armed Bandits

290 - William Chang , Mehdi Jafarnia-Jahromi , Rahul Jain 2021

We introduce a framework for decentralized online learning for multi-armed bandits (MAB) with multiple cooperative players. The reward obtained by the players in each round depends on the actions taken by all the players. Its a team setting, and the objective is common. Information asymmetry is what makes the problem interesting and challenging. We consider three types of information asymmetry: action information asymmetry when the actions of the players cant be observed but the rewards received are common; reward information asymmetry when the actions of the other players are observable but rewards received are IID from the same distribution; and when we have both action and reward information asymmetry. For the first setting, we propose a UCB-inspired algorithm that achieves $O(log T)$ regret whether the rewards are IID or Markovian. For the second section, we offer an environment such that the algorithm given for the first setting gives linear regret. For the third setting, we show that a variation of the `explore then commit algorithm achieves almost log regret.

Machine Learning

Multi-Player Bandits Revisited

102 - Lilian Besson , Emilie Kaufmann (CRIStAL 2017

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assume that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithms and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

Machine Learning Machine Learning

Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions

304 - Akshayaa Magesh , Venugopal V. Veeravalli 2019

We consider a fully decentralized multi-player stochastic multi-armed bandit setting where the players cannot communicate with each other and can observe only their own actions and rewards. The environment may appear differently to different players, $textit{i.e.}$, the reward distributions for a given arm are heterogeneous across players. In the case of a collision (when more than one player plays the same arm), we allow for the colliding players to receive non-zero rewards. The time-horizon $T$ for which the arms are played is emph{not} known to the players. Within this setup, where the number of players is allowed to be greater than the number of arms, we present a policy that achieves near order-optimal expected regret of order $O(log^{1 + delta} T)$ for some $0 < delta < 1$ over a time-horizon of duration $T$. This paper is currently under review at IEEE Transactions on Information Theory.

Machine Learning Computer Science and Game Theory Machine Learning

Multi-Armed Bandits in Metric Spaces

507 - Robert Kleinberg , Aleksandrs Slivkins , Eli Upfal 2008

In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications such as online auctions and web advertisement. The goal of such research is to identify broad and natural classes of strategy sets and payoff functions which enable the design of efficient solutions. In this work we study a very general setting for the multi-armed bandit problem in which the strategies form a metric space, and the payoff function satisfies a Lipschitz condition with respect to the metric. We refer to this problem as the Lipschitz MAB problem. We present a complete solution for the multi-armed problem in this setting. That is, for every metric space (L,X) we define an isometry invariant which bounds from below the performance of Lipschitz MAB algorithms for X, and we present an algorithm which comes arbitrarily close to meeting this bound. Furthermore, our technique gives even better results for benign payoff functions.

Data Structures and Algorithms Machine Learning

Combinatorial Multi-armed Bandits for Real-Time Strategy Games

165 - Santiago Onta~non 2017

Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called {em na{i}ve sampling}, based on a variant of the Multi-armed Bandit problem called {em Combinatorial Multi-armed Bandits} (CMAB). We analyze the theoretical properties of several variants of {em na{i}ve sampling}, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, {em na{i}ve sampling} outperforms the other sampling strategies.

Artificial Intelligence

Player Modeling via Multi-Armed Bandits

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions