In situations where explicit communication is limited, human collaborators succeed by learning to: (i) infer the meaning behind their partner's actions, and (ii) convey private information about the state to their partner implicitly through actions. The first component of this learning process has been well studied in multi-agent systems, whereas the second, which is equally crucial for successful collaboration, has not. To mimic both components and thereby complete the learning process, we introduce a novel algorithm: Policy Belief Learning (PBL). PBL uses a belief module to model the other agent's private information and a policy module to form a distribution over actions informed by the belief module. Furthermore, to encourage communication by actions, we propose a novel auxiliary reward which incentivizes one agent to help its partner make correct inferences about its private information. The auxiliary reward for communication is integrated into the learning of the policy module. We evaluate our approach on a set of environments including a matrix game, a particle environment, and the non-competitive bidding problem from contract bridge. We show empirically that this auxiliary reward is effective and generalizes easily. These results demonstrate that our PBL algorithm can produce strong pairs of agents in collaborative games where explicit communication is disabled.
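The abstract pins down a concrete decomposition: a belief module that infers the partner's private information from observed behavior, a policy module conditioned on that belief, and an auxiliary reward for informative actions. The following is a minimal PyTorch sketch of that decomposition, assuming a discrete private state and a fixed-length encoding of the partner's action history; the class names, the network shapes, and the log-likelihood form of the communication reward are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the two PBL modules (all names are illustrative).
# Assumes a discrete private state with `n_states` values and a fixed-length
# vector encoding of the partner's observed action history.

class BeliefNet(nn.Module):
    """Maps the partner's action history to a belief over their private state."""
    def __init__(self, history_dim, n_states, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_states),
        )

    def forward(self, history):
        return F.softmax(self.net(history), dim=-1)  # belief distribution

class PolicyNet(nn.Module):
    """Forms a distribution over actions from own private state and the belief."""
    def __init__(self, state_dim, n_states, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_states, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, own_state, belief):
        logits = self.net(torch.cat([own_state, belief], dim=-1))
        return F.softmax(logits, dim=-1)

def comm_reward(partner_belief, true_private_state):
    """Auxiliary reward: log-probability the partner's belief assigns to the
    speaker's true private state, so informative actions are reinforced.
    One plausible instantiation; the paper's exact form may differ."""
    picked = partner_belief.gather(-1, true_private_state.unsqueeze(-1))
    return torch.log(picked + 1e-8).squeeze(-1)
```

Under this instantiation, an agent collects auxiliary reward exactly when its actions raise the probability the partner's belief assigns to its true private state, which matches the incentive the abstract describes; the full training signal would combine this term with the task reward.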
We consider the problem of answering queries about formulas of first-order logic based on background knowledge partially represented explicitly as other formulas, and partially represented as examples independently drawn from a fixed probability distribution.
Robust learning in expressive languages with real-world data continues to be a challenging task. Numerous conventional methods appeal to heuristics without any assurances of robustness. While probably approximately correct (PAC) Semantics offers strong guarantees …
We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner purely by interacting with an environment in a reinforcement learning setting. The network builds an internal plan, which is continuously updated …
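The mechanism this abstract describes, an internal plan that the network rewrites as new observations arrive, can be illustrated in a few lines of PyTorch. In the sketch below the plan is represented as action logits over a fixed horizon and regenerated from a GRU state at every step; this representation and all names are assumptions made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a recurrent agent that maintains an explicit "plan"
# (logits for the next `horizon` actions) and rewrites it from each new
# observation. Assumed design, not the paper's architecture.

class PlanningRNN(nn.Module):
    def __init__(self, obs_dim, n_actions, horizon=8, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden)
        self.to_plan = nn.Linear(hidden, horizon * n_actions)
        self.horizon, self.n_actions = horizon, n_actions

    def forward(self, obs, h):
        h = self.rnn(obs, h)                # fold the new observation into memory
        plan = self.to_plan(h).view(-1, self.horizon, self.n_actions)
        return plan[:, 0], plan, h          # act on the first step of the plan
```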
We consider stochastic convex optimization problems where several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non-asymptotic convergence guarantees.
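A standard way to realize the setting this abstract describes, asynchronous workers updating a shared iterate under a constraint, is lock-free projected SGD in shared memory. The sketch below uses Python threads, a least-squares objective, and an L2-ball constraint purely for illustration; none of these choices, nor any name in the code, comes from the paper.

```python
import threading
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Euclidean projection onto {w : ||w||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def worker(w, data, lr, n_steps, rng):
    for _ in range(n_steps):
        x, y = data[rng.integers(len(data))]   # sample one example
        grad = 2.0 * (w @ x - y) * x           # least-squares stochastic gradient
        w -= lr * grad                         # lock-free write to shared memory
        w[:] = project_l2_ball(w)              # keep the iterate feasible

d, n = 5, 1000
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))
w_star = project_l2_ball(rng.normal(size=d))   # ground-truth parameters
data = [(X[i], float(X[i] @ w_star)) for i in range(n)]

w = np.zeros(d)                                # the shared iterate
threads = [
    threading.Thread(target=worker,
                     args=(w, data, 1e-3, 2000, np.random.default_rng(s)))
    for s in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("distance to w_star:", np.linalg.norm(w - w_star))
```

The projection after every update is what keeps the asynchronously written iterate feasible; analyses of such schemes typically bound the staleness between a worker's read and write of the shared vector.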
The ability to model other agents, such as understanding their intentions and skills, is essential to an agent's interactions with other agents. Conventional agent modeling relies on passive observation from demonstrations. In this work, we propose …