A reward function, as an incentive representation that recognizes humans' agency and rationalizes their actions, is particularly appealing for modeling human behavior in human-robot interaction. Inverse Reinforcement Learning (IRL) is an effective way to retrieve reward functions from demonstrations. However, applying it to multi-agent settings has always been challenging, since the mutual influence between agents must be appropriately modeled. To tackle this challenge, previous work either exploits equilibrium solution concepts by assuming humans are perfectly rational optimizers with unbounded intelligence, or pre-assigns humans' interaction strategies a priori. In this work, we advocate that humans are bounded-rational and have different intelligence levels when reasoning about others' decision-making processes, and that such an inherent and latent characteristic should be accounted for in reward learning algorithms. Hence, we exploit such insights from Theory-of-Mind and propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning. We validate our approach in both zero-sum and general-sum games with synthetic agents, and illustrate a practical application to learning human drivers' reward functions from real driving data. We compare our approach with two baseline algorithms. The results show that, by reasoning about humans' latent intelligence levels, the proposed approach has more flexibility and capability to retrieve reward functions that better explain human driving behaviors.
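To make the core idea concrete, below is a minimal sketch of level-k reasoning with bounded rationality (quantal-response policies) in a one-shot two-player matrix game, together with a demonstration likelihood that marginalizes over a human's latent intelligence level, the quantity a reward learner would maximize. This is an illustration under simplifying assumptions, not the paper's actual implementation: the payoff matrices, the rationality coefficient `beta`, the level prior, and all function names here are hypothetical, and the paper itself works with sequential (driving) interactions rather than matrix games.

```python
import numpy as np

def softmax(x, beta=1.0):
    """Quantal-response (bounded-rational) action distribution over utilities x."""
    z = beta * (x - x.max())
    e = np.exp(z)
    return e / e.sum()

def level_k_policies(R_self, R_other, k_max, beta=1.0):
    """Level-0..level-k_max policies for the 'self' player.

    Level 0 acts uniformly at random; level k soft-best-responds to a
    level-(k-1) model of the other player, per Theory-of-Mind-style reasoning.
    R_self, R_other: payoff matrices indexed by (self action, other action).
    """
    n_self, n_other = R_self.shape
    self_pols = [np.ones(n_self) / n_self]      # level-0: uniform
    other_pols = [np.ones(n_other) / n_other]
    for _ in range(k_max):
        # Self at level k responds to other at level k-1, and vice versa.
        self_pols.append(softmax(R_self @ other_pols[-1], beta))
        other_pols.append(softmax(R_other.T @ self_pols[-2], beta))
    return self_pols

def demo_log_likelihood(actions, R_self, R_other, level_prior, beta=1.0):
    """Log-likelihood of observed actions, marginalized over the latent level."""
    pols = level_k_policies(R_self, R_other, len(level_prior) - 1, beta)
    ll = 0.0
    for a in actions:
        ll += np.log(sum(p * pol[a] for p, pol in zip(level_prior, pols)))
    return ll

# Toy usage (all numbers illustrative): a 2x2 general-sum game whose
# "self" payoffs depend on reward parameters theta to be learned.
theta = np.array([1.0, 0.5])
R_self = np.array([[theta[0], 0.0], [0.0, theta[1]]])
R_other = np.array([[0.0, 1.0], [1.0, 0.0]])
prior = np.array([0.2, 0.5, 0.3])               # belief over levels 0..2
print(demo_log_likelihood([0, 1, 0], R_self, R_other, prior, beta=2.0))
```

In a full IRL loop one would parameterize R_self by reward weights and maximize this marginal likelihood over them (and possibly over the level prior) by gradient ascent, so that the latent intelligence level is inferred jointly with the reward rather than fixed a priori.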