We present a novel bilateral negotiation model that allows a self-interested agent to learn how to negotiate over multiple issues in the presence of user preference uncertainty. The model relies upon interpretable strategy templates representing the tactics the agent should employ during the negotiation, and it learns template parameters to maximize the average utility received over multiple negotiations, thus resulting in optimal bid acceptance and generation. Our model also uses deep reinforcement learning to evaluate threshold utility values for those tactics that require them, thereby deriving optimal utilities for every environment state. To handle user preference uncertainty, the model relies on a stochastic search to find the user model that best agrees with a given partial preference profile. Multi-objective optimization and multi-criteria decision-making methods are applied at negotiation time to generate Pareto-optimal outcomes, thereby increasing the number of successful (win-win) negotiations. Rigorous experimental evaluations show that the agent employing our model outperforms the winning agents of the 10th Automated Negotiating Agents Competition (ANAC19) in terms of both individual and social-welfare utilities.
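As a rough illustration of the Pareto-optimality step, the sketch below filters candidate bids down to the non-dominated set and then ranks the survivors. It is a minimal sketch under stated assumptions, not the model's actual procedure: each bid is assumed to carry our own utility and an estimated opponent utility, `threshold` stands in for the learned threshold utility, and the Nash-product ranking in the hypothetical `choose_bid` is a simple substitute for whichever multi-criteria decision-making method the model applies.

```python
from typing import List, Tuple

# Hypothetical bid representation: (label, own utility, estimated opponent utility).
Bid = Tuple[str, float, float]


def pareto_front(bids: List[Bid]) -> List[Bid]:
    """Keep the bids that no other bid weakly beats in both utilities (strictly in one)."""
    front = []
    for name, u_own, u_opp in bids:
        dominated = any(
            (o >= u_own and p >= u_opp) and (o > u_own or p > u_opp)
            for _, o, p in bids
        )
        if not dominated:
            front.append((name, u_own, u_opp))
    return front


def choose_bid(bids: List[Bid], threshold: float) -> Bid:
    """Among Pareto-optimal bids that meet our threshold, pick the largest Nash product.

    The Nash-product rule is an assumed stand-in for an MCDM ranking.
    """
    candidates = [b for b in pareto_front(bids) if b[1] >= threshold]
    if not candidates:
        # Fall back to our own best bid when nothing clears the threshold.
        return max(bids, key=lambda b: b[1])
    return max(candidates, key=lambda b: b[1] * b[2])
```

With the bids A = (0.9, 0.2), B = (0.7, 0.6), C = (0.6, 0.5), and D = (0.5, 0.9), bid C is dominated by B, so the front is A, B, D; with a threshold of 0.6, `choose_bid` returns B, the Pareto-optimal bid that balances both sides best.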
Bidding and acceptance strategies have a substantial impact on the outcome of negotiations in scenarios with linear additive and nonlinear utility functions. Over the years, it has become clear that there is no single best strategy for all negotiation settings, yet many fixed strategies are still being developed. We envision a shift in the strategy design question from "What is a good strategy?" towards "What could be a good strategy?" For this purpose, we developed a method that leverages automated algorithm configuration to find the best strategies for a specific set of negotiation settings. By empowering automated negotiating agents with automated algorithm configuration, we obtain a flexible negotiation agent that can be configured automatically for a rich space of opponents and negotiation scenarios. To critically assess our approach, we tested the agent in an ANAC-like bilateral automated negotiation tournament against past competitors. We show that our automatically configured agent outperforms all other agents, with a 5.1% increase in negotiation payoff compared to the next-best agent. We note that without our agent in the tournament, the top-ranked agent wins by a margin of only 0.01%.
We present a novel negotiation model that allows an agent to learn how to negotiate during concurrent bilateral negotiations in unknown and dynamic e-markets. The agent uses an actor-critic architecture with model-free reinforcement learning to learn a strategy expressed as a deep neural network. We pre-train the strategy through supervised learning on synthetic market data, thereby decreasing the exploration time required for learning during negotiation. As a result, we can build automated agents for concurrent negotiations that adapt to different e-market settings without needing to be pre-programmed. Our experimental evaluation shows that our deep reinforcement learning-based agents outperform two well-known existing negotiation strategies in one-to-many concurrent bilateral negotiations across a range of e-market settings.
Many real-world scenarios involve teams of agents that have to coordinate their actions to reach a shared goal. We focus on the setting in which a team of agents faces an opponent in a zero-sum, imperfect-information game. Team members can coordinate their strategies before the beginning of the game but are unable to communicate during the playing phase. This is the case, for example, in Bridge, collusion in poker, and collusion in bidding. In this setting, model-free RL methods are often unable to capture coordination because the agents' policies are executed in a decentralized fashion. Our first contribution is a game-theoretic centralized training regimen that performs trajectory sampling effectively so as to foster team coordination. When team members can observe each other's actions, we show that this approach provably yields equilibrium strategies. Then, we introduce a signaling-based framework to represent team-coordinated strategies given a buffer of past experiences. Each team member's policy is parametrized as a neural network whose output is conditioned on a suitable exogenous signal drawn from a learned probability distribution. By combining these two elements, we empirically show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms did not converge.
Models of consensus are used to manage multiple-agent systems in order to choose between different recommendations provided by the system. It is assumed that there is a central agent that solicits recommendations or plans from other agents. That agent then determines the consensus of the other agents and chooses the resulting consensus recommendation or plan. Voting schemes such as this have been used in a variety of domains, including air traffic control. This paper uses an analytic model to study the use of consensus in multiple-agent systems. The binomial model is used to study the probability that the consensus judgment is correct or incorrect. That basic model is extended to account for both different levels of agent competence and unequal prior odds. The analysis of this model is critical in the investigation of multiple-agent systems, since it leads us to conclude that in some cases consensus judgment is not appropriate. In addition, the results allow us to determine how many agents should be used to develop consensus decisions, which agents should be used, and under which conditions the consensus model should be used.
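The binomial model can be made concrete with the standard majority-vote calculation. The sketch below assumes independent agents, a binary judgment, and a strict-majority rule in which ties count as incorrect; the heterogeneous version handles different competence levels with a Poisson-binomial recursion. The function names are hypothetical, and the unequal-prior-odds extension is not reproduced here.

```python
from math import comb
from typing import List, Sequence


def majority_correct(n: int, p: float) -> float:
    """P(a strict majority of n independent agents, each correct w.p. p, is correct)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def majority_correct_hetero(ps: Sequence[float]) -> float:
    """Same quantity when agent i is correct with probability ps[i] (Poisson binomial)."""
    dist: List[float] = [1.0]  # dist[k] = P(exactly k of the agents seen so far are correct)
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)  # this agent is wrong
            new[k + 1] += q * p    # this agent is right
        dist = new
    n = len(ps)
    return sum(dist[n // 2 + 1:])  # strict majority: more than n/2 correct
```

For three agents with competence 0.7, the majority is correct with probability 0.784, better than any single agent; with competence 0.4 it drops to 0.352, worse than a single agent. Likewise, for competences (0.9, 0.6, 0.6) the majority is correct with probability 0.792, below the best agent alone, which illustrates why the choice of agents matters.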
We study teams of agents that play against Nature towards achieving a common objective. The agents are assumed to have imperfect information due to partial observability and have no communication during the play of the game. We propose a natural notion of higher-order knowledge of agents. Based on this notion, we define a class of knowledge-based strategies and consider the problem of synthesizing strategies of this class. We introduce a multi-agent extension, MKBSC, of the well-known Knowledge-Based Subset Construction applied to such games. Iterative applications of the MKBSC turn out to compute higher-order knowledge of the agents. We show how the MKBSC can be used to design knowledge-based strategy profiles, and we investigate, under some natural assumptions, the transfer of existence of such strategies between the original game and the games obtained by iterated applications of the MKBSC. We also relate and compare the intensional view of knowledge-based strategies, based on explicit knowledge representation and update, with the extensional view of finite-memory strategies, based on finite transducers, and show that, in a certain sense, these are equivalent.
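To make the base construction concrete, the sketch below implements the classic single-agent Knowledge-Based Subset Construction on a finite game graph: a knowledge state is the set of locations the agent considers possible, and the successors under each action are split by observation. It assumes transitions given as a set of `(location, action, location)` triples and an observation map `obs`; the function name `kbsc` is hypothetical, and the multi-agent MKBSC itself, which combines the agents' individual knowledge, is not reproduced here.

```python
from collections import deque
from collections.abc import Hashable
from typing import Dict, FrozenSet, List, Set, Tuple

Knowledge = FrozenSet[Hashable]


def kbsc(
    init: Hashable,
    actions: List[Hashable],
    delta: Set[Tuple[Hashable, Hashable, Hashable]],
    obs: Dict[Hashable, Hashable],
) -> Dict[Tuple[Knowledge, Hashable], Set[Knowledge]]:
    """Single-agent knowledge-based subset construction (breadth-first).

    From knowledge K under action a, collect every location reachable from K,
    then split that set by observation; each nonempty block is a successor
    knowledge state, since the agent can tell the blocks apart but not their members.
    """
    start: Knowledge = frozenset([init])
    edges: Dict[Tuple[Knowledge, Hashable], Set[Knowledge]] = {}
    seen = {start}
    queue = deque([start])
    while queue:
        k = queue.popleft()
        for a in actions:
            post = {l2 for (l1, act, l2) in delta if l1 in k and act == a}
            blocks: Dict[Hashable, Set[Hashable]] = {}
            for l2 in post:
                blocks.setdefault(obs[l2], set()).add(l2)
            succs = {frozenset(b) for b in blocks.values()}
            edges[(k, a)] = succs
            for s in succs:
                if s not in seen:
                    seen.add(s)
                    queue.append(s)
    return edges
```

For example, if action `a` leads from location 0 to either 1 or 2, which share an observation, the agent's knowledge after one step is {1, 2}; if 1 and 2 then lead to locations with different observations, the next step splits that knowledge into separate singleton states.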