
Beating humans in a penny-matching game by leveraging cognitive hierarchy theory and Bayesian learning

Published by Ran Tian
Publication date: 2019
Research field: Informatics Engineering
Paper language: English





It is a long-standing goal of artificial intelligence (AI) to be superior to human beings in decision making. Games are suitable for testing AI capabilities of making good decisions in non-numerical tasks. In this paper, we develop a new AI algorithm to play the penny-matching game considered in Shannon's mind-reading machine (1953) against human players. In particular, we exploit cognitive hierarchy theory and Bayesian learning techniques to continually evolve a model for predicting human players' decisions, and let the AI player make decisions according to the model predictions to pursue the best chance of winning. Experimental results show that our AI algorithm beats 27 out of 30 volunteer human players.
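The abstract does not spell out the algorithm, but the following Python sketch illustrates the general idea under explicit assumptions: the AI keeps a few hypothesised cognitive-hierarchy levels for the human opponent (level 0 ignores history, level 1 reacts to the human's own last move, level 2 reacts to the AI's last move), maintains a Dirichlet-multinomial Bayesian count model per level, updates a posterior over levels after every round, and plays the move that matches the mixture prediction of the human's next move. All names here (PennyMatchingAI, LevelModel, the level feature functions) are illustrative, not taken from the paper.

import numpy as np

class LevelModel:
    """Predicts the human's next move from one history feature (context)."""
    def __init__(self, feature_fn, n_contexts):
        self.feature_fn = feature_fn              # maps history -> context index
        self.counts = np.ones((n_contexts, 2))    # Dirichlet(1,1) prior per context

    def predict(self, history):
        ctx = self.feature_fn(history)
        return self.counts[ctx] / self.counts[ctx].sum()   # P(move=0), P(move=1)

    def update(self, history, human_move):
        self.counts[self.feature_fn(history), human_move] += 1

# Hypothetical cognitive-hierarchy levels: level 0 ignores history, level 1 reacts to
# the human's own previous move, level 2 reacts to the AI's previous move.
def level0(h): return 0
def level1(h): return 1 + h[-1][0] if h else 0
def level2(h): return 1 + h[-1][1] if h else 0

class PennyMatchingAI:
    def __init__(self):
        self.models = [LevelModel(level0, 1), LevelModel(level1, 3), LevelModel(level2, 3)]
        self.log_post = np.zeros(len(self.models))   # uniform prior over levels
        self.history = []                            # list of (human_move, ai_move)

    def choose(self):
        post = np.exp(self.log_post - self.log_post.max())
        post /= post.sum()
        # Mixture prediction of the human's next move, weighted by the level posterior.
        p_human = sum(w * m.predict(self.history) for w, m in zip(post, self.models))
        return int(np.argmax(p_human))   # assuming the AI is the matching player

    def observe(self, human_move, ai_move):
        # Bayesian update: score each level by how well it predicted the human's move.
        for i, m in enumerate(self.models):
            self.log_post[i] += np.log(m.predict(self.history)[human_move])
            m.update(self.history, human_move)
        self.history.append((human_move, ai_move))

In play, each round would call choose(), observe the human's actual move, and then call observe() so that both the per-level counts and the level posterior keep adapting to the particular opponent.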


Read also

To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
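As a rough illustration of the joint-policy correlation idea mentioned above, the sketch below assumes several independent training instances and a matrix of cross-play returns, and reports the fraction of return lost when co-trained partners are swapped for partners from other instances; the exact definition and normalisation in the paper may differ.

import numpy as np

def joint_policy_correlation(returns: np.ndarray) -> float:
    """returns[i, j] = mean episode return when player 1 uses the policy from
    training instance i and player 2 uses the policy from instance j."""
    d = returns.shape[0]
    diag = np.trace(returns) / d                          # co-trained pairs
    off = (returns.sum() - np.trace(returns)) / (d * (d - 1))  # cross-instance pairs
    return (diag - off) / diag   # fraction of return lost when partners are swapped

# Hypothetical example: co-trained pairs earn about 30, cross-play pairs about 20.
payoffs = np.array([[30., 21., 19.],
                    [20., 31., 22.],
                    [18., 20., 29.]])
print(joint_policy_correlation(payoffs))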
Aref Hakimzadeh, Yanbo Xue, 2021
Endeavors for designing robots with human-level cognitive abilities have led to different categories of learning machines. According to Skinner's theory, reinforcement learning (RL) plays a key role in human intuition and cognition. The majority of state-of-the-art methods, including deep RL algorithms, are strongly influenced by the connectionist viewpoint. Such algorithms can significantly benefit from theories of mind and learning in other disciplines. This paper entertains the idea that theories such as the language of thought hypothesis (LOTH), script theory, and Piaget's cognitive development theory provide complementary approaches, which will enrich the RL field. Following this line of thinking, a general computational building block is proposed for Piaget's schema theory that supports the notions of productivity, systematicity, and inferential coherence as described by Fodor, in contrast with the connectionism theory. Abstraction in the proposed method rests entirely on the system itself and is not externally constrained by any predefined architecture. The whole process matches Neisser's perceptual cycle model. Experiments performed on three typical control problems, followed by behavioral analysis, confirm the interpretability of the proposed method and its competitiveness compared to state-of-the-art algorithms. Hence, the proposed framework can be viewed as a step towards achieving human-like cognition in artificial intelligent systems.
In this paper, a novel spectrum association approach for cognitive radio networks (CRNs) is proposed. Based on a measure of both inference and confidence as well as on a measure of quality-of-service, the association between secondary users (SUs) in the network and frequency bands licensed to primary users (PUs) is investigated. The problem is formulated as a matching game between SUs and PUs. In this game, SUs employ a soft-decision Bayesian framework to detect PUs' signals and, eventually, rank them based on the logarithm of the a posteriori ratio. A performance measure that captures both the ranking metric and rate is further computed by the SUs. Using this performance measure, a PU evaluates its own utility function that it uses to build its own association preferences. A distributed algorithm that allows both SUs and PUs to interact and self-organize into a stable match is proposed. Simulation results show that the proposed algorithm can improve the sum of the SUs' rates by up to 20% and 60% relative to the deferred acceptance algorithm and the random channel allocation approach, respectively. The results also show an improved convergence time.
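For context, the deferred acceptance baseline named in the abstract can be sketched as a standard Gale-Shapley matching between SUs and PU bands. The preference lists below are hypothetical placeholders, whereas in the paper the SU preferences come from the Bayesian ranking metric and the PU preferences from their utility functions.

def deferred_acceptance(su_prefs, pu_prefs):
    """su_prefs[s] = list of PUs ordered from best to worst for secondary user s;
    pu_prefs[p]   = list of SUs ordered from best to worst for primary user p.
    Returns a one-to-one stable matching {pu: su}."""
    rank = {p: {s: i for i, s in enumerate(prefs)} for p, prefs in pu_prefs.items()}
    next_choice = {s: 0 for s in su_prefs}          # next PU each SU will propose to
    free = list(su_prefs)                           # currently unmatched SUs
    match = {}                                      # pu -> su
    while free:
        s = free.pop()
        if next_choice[s] >= len(su_prefs[s]):
            continue                                # s has exhausted its list
        p = su_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if p not in match:
            match[p] = s
        elif rank[p][s] < rank[p][match[p]]:        # p prefers the new proposer
            free.append(match[p])
            match[p] = s
        else:
            free.append(s)                          # proposal rejected, try again later
    return match

# Tiny usage example with two SUs and two PU bands (hypothetical preferences).
print(deferred_acceptance({'su1': ['pu1', 'pu2'], 'su2': ['pu1', 'pu2']},
                          {'pu1': ['su2', 'su1'], 'pu2': ['su1', 'su2']}))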
The Iterated Prisoner's Dilemma has guided research on social dilemmas for decades. However, it distinguishes between only two atomic actions: cooperate and defect. In real-world prisoner's dilemmas, these choices are temporally extended and different strategies may correspond to sequences of actions, reflecting grades of cooperation. We introduce a Sequential Prisoner's Dilemma (SPD) game to better capture the aforementioned characteristics. In this work, we propose a deep multiagent reinforcement learning approach that investigates the evolution of mutual cooperation in SPD games. Our approach consists of two phases. The first phase is offline: it synthesizes policies with different cooperation degrees and then trains a cooperation degree detection network. The second phase is online: an agent adaptively selects its policy based on the detected degree of opponent cooperation. The effectiveness of our approach is demonstrated in two representative SPD 2D games: the Apple-Pear game and the Fruit Gathering game. Experimental results show that our strategy can avoid being exploited by exploitative opponents and achieve cooperation with cooperative opponents.
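A heavily simplified sketch of the online phase described above: keep a small library of policies with different cooperation degrees and switch between them according to the opponent's estimated cooperation degree. Both the policy library and the degree detector below are stand-ins; in the paper they come from the offline phase (policies synthesized by deep multiagent RL and a trained detection network).

import random

POLICY_LIBRARY = {            # cooperation degree -> action-selection rule (stand-ins)
    0.0: lambda obs: 'defect',
    0.5: lambda obs: random.choice(['cooperate', 'defect']),
    1.0: lambda obs: 'cooperate',
}

def detect_cooperation_degree(opponent_trajectory):
    """Placeholder for the trained detection network: here, simply the empirical
    fraction of cooperative actions in the opponent's recent trajectory."""
    if not opponent_trajectory:
        return 1.0                                  # start from a cooperative prior
    coop = sum(a == 'cooperate' for a in opponent_trajectory)
    return coop / len(opponent_trajectory)

def select_policy(opponent_trajectory):
    degree = detect_cooperation_degree(opponent_trajectory)
    # Respond in kind: pick the library policy whose degree is closest to the estimate.
    closest = min(POLICY_LIBRARY, key=lambda d: abs(d - degree))
    return POLICY_LIBRARY[closest]

# Usage: against a mostly defecting opponent the agent switches to the defect policy.
policy = select_policy(['defect', 'defect', 'cooperate', 'defect'])
print(policy(None))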
Ulrich Faigle, 2020
These lecture notes attempt a mathematical treatment of game theory akin to mathematical physics. A game instance is defined as a sequence of states of an underlying system. This viewpoint unifies classical mathematical models for 2-person and, in particular, combinatorial and zero-sum games, as well as models for investing and betting. n-person games are studied with emphasis on notions of utilities, potentials and equilibria, which allows one to subsume cooperative games as special cases. The representation of a game-theoretic system in a Hilbert space furthermore establishes a link to the mathematical model of quantum mechanics and general interaction systems.

