ترغب بنشر مسار تعليمي؟ اضغط هنا

Experience-Driven PCG via Reinforcement Learning: A Super Mario Bros Study

68   0   0.0 ( 0 )
 نشر من قبل Jialin Liu Ph.D
 تاريخ النشر 2021
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We introduce a procedural content generation (PCG) framework at the intersections of experience-driven PCG and PCG via reinforcement learning, named ED(PCG)RL, EDRL in short. EDRL is able to teach RL designers to generate endless playable levels in an online manner while respecting particular experiences for the player as designed in the form of reward functions. The framework is tested initially in the Super Mario Bros game. In particular, the RL designers of Super Mario Bros generate and concatenate level segments while considering the diversity among the segments. The correctness of the generation is ensured by a neural net-assisted evolutionary level repairer and the playability of the whole level is determined through AI-based testing. Our agents in this EDRL implementation learn to maximise a quantification of Kosters principle of fun by moderating the degree of diversity across level segments. Moreover, we test their ability to design fun levels that are diverse over time and playable. Our proposed framework is capable of generating endless, playable Super Mario Bros levels with varying degrees of fun, deviation from earlier segments, and playability. EDRL can be generalised to any game that is built as a segment-based sequential process and features a built-in compressed representation of its game content.



قيم البحث

اقرأ أيضاً

Relational representations in reinforcement learning allow for the use of structural information like the presence of objects and relationships between them in the description of value functions. Through this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and actions spaces such as computer games.
Although different learning systems are coordinated to afford complex behavior, little is known about how this occurs. This article describes a theoretical framework that specifies how complex behaviors that might be thought to require error-driven l earning might instead be acquired through simple reinforcement. This framework includes specific assumptions about the mechanisms that contribute to the evolution of (artificial) neural networks to generate topologies that allow the networks to learn large-scale complex problems using only information about the quality of their performance. The practical and theoretical implications of the framework are discussed, as are possible biological analogs of the approach.
There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that lea rn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these methods on Super Smash Bros. Melee (SSBM), a popular console fighting game. The SSBM environment has complex dynamics and partial observability, making it challenging for human and machine alike. The multi-player aspect poses an additional challenge, as the vast majority of recent advances in RL have focused on single-agent environments. Nonetheless, we will show that it is possible to train agents that are competitive against and even surpass human professionals, a new result for the multi-player video game setting.
This paper targets the efficient construction of a safety shield for decision making in scenarios that incorporate uncertainty. Markov decision processes (MDPs) are prominent models to capture such planning problems. Reinforcement learning (RL) is a machine learning technique to determine near-optimal policies in MDPs that may be unknown prior to exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables decision-making to adhere to safety constraints with high probability. In a separation of concerns, we employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. We use these results to realize a shield that is applied to an RL algorithm which then optimizes the actual performance objective. We discuss tradeoffs between sufficient progress in exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
Although there are many approaches to implement intrinsically motivated artificial agents, the combined usage of multiple intrinsic drives remains still a relatively unexplored research area. Specifically, we hypothesize that a mechanism capable of q uantifying and controlling the evolution of the information flow between the agent and the environment could be the fundamental component for implementing a higher degree of autonomy into artificial intelligent agents. This paper propose a unified strategy for implementing two semantically orthogonal intrinsic motivations: curiosity and empowerment. Curiosity reward informs the agent about the relevance of a recent agent action, whereas empowerment is implemented as the opposite information flow from the agent to the environment that quantifies the agents potential of controlling its own future. We show that an additional homeostatic drive is derived from the curiosity reward, which generalizes and enhances the information gain of a classical curious/heterostatic reinforcement learning agent. We show how a shared internal model by curiosity and empowerment facilitates a more efficient training of the empowerment function. Finally, we discuss future directions for further leveraging the interplay between these two intrinsic rewards.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا