No Arabic abstract
In timeline-based planning, domains are described as sets of independent, but interacting, components, whose behaviour over time (the set of timelines) is governed by a set of temporal constraints. A distinguishing feature of timeline-based planning systems is the ability to integrate planning with execution by synthesising control strategies for flexible plans. However, flexible plans can only represent temporal uncertainty, while more complex forms of nondeterminism are needed to deal with a wider range of realistic problems. In this paper, we propose a novel game-theoretic approach to timeline-based planning problems, generalising the state of the art while uniformly handling temporal uncertainty and nondeterminism. We define a general concept of timeline-based game and we show that the notion of winning strategy for these games is strictly more general than that of control strategy for dynamically controllable flexible plans. Moreover, we show that the problem of establishing the existence of such winning strategies is decidable using a doubly exponential amount of space.
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
A game-theoretic framework is used to study the effect of constellation size on the energy efficiency of wireless networks for M-QAM modulation. A non-cooperative game is proposed in which each user seeks to choose its transmit power (and possibly transmit symbol rate) as well as the constellation size in order to maximize its own utility while satisfying its delay quality-of-service (QoS) constraint. The utility function used here measures the number of reliable bits transmitted per joule of energy consumed, and is particularly suitable for energy-constrained networks. The best-response strategies and Nash equilibrium solution for the proposed game are derived. It is shown that in order to maximize its utility (in bits per joule), a user must choose the lowest constellation size that can accommodate the users delay constraint. Using this framework, the tradeoffs among energy efficiency, delay, throughput and constellation size are also studied and quantified. The effect of trellis-coded modulation on energy efficiency is also discussed.
A game-theoretic framework is used to study the effect of constellation size on the energy efficiency of wireless networks for M-QAM modulation. A non-cooperative game is proposed in which each user seeks to choose its transmit power (and possibly transmit symbol rate) as well as the constellation size in order to maximize its own utility while satisfying its delay quality-of-service (QoS) constraint. The utility function used here measures the number of reliable bits transmitted per joule of energy consumed, and is particularly suitable for energy-constrained networks. The best-response strategies and Nash equilibrium solution for the proposed game are derived. It is shown that in order to maximize its utility (in bits per joule), a user must choose the lowest constellation size that can accommodate the users delay constraint. This strategy is different from one that would maximize spectral efficiency. Using this framework, the tradeoffs among energy efficiency, delay, throughput and constellation size are also studied and quantified. In addition, the effect of trellis-coded modulation on energy efficiency is discussed.
We present a new method for multi-agent planning involving human drivers and autonomous vehicles (AVs) in unsignaled intersections, roundabouts, and during merging. In multi-agent planning, the main challenge is to predict the actions of other agents, especially human drivers, as their intentions are hidden from other agents. Our algorithm uses game theory to develop a new auction, called model, that directly determines the optimal action for each agent based on their driving style (which is observable via commonly available sensors like lidars and cameras). GamePlan assigns a higher priority to more aggressive or impatient drivers and a lower priority to more conservative or patient drivers; we theoretically prove that such an approach, although counter-intuitive, is game-theoretically optimal. Our approach successfully prevents collisions and deadlocks. We compare our approach with prior state-of-the-art auction techniques including economic auctions, time-based auctions (first-in first-out), and random bidding and show that each of these methods result in collisions among agents when taking into account driver behavior. We additionally compare with methods based on deep reinforcement learning, deep learning, and game theory and present our benefits over these approaches. Finally, we show that our approach can be implemented in the real-world with human drivers.
This paper considers a game-theoretic formulation of the covert communications problem with finite blocklength, where the transmitter (Alice) can randomly vary her transmit power in different blocks, while the warden (Willie) can randomly vary his detection threshold in different blocks. In this two player game, the payoff for Alice is a combination of the coding rate to the receiver (Bob) and the detection error probability at Willie, while the payoff for Willie is the negative of his detection error probability. Nash equilibrium solutions to the game are obtained, and shown to be efficiently computable using linear programming. For less covert requirements, our game-theoretic approach can achieve significantly higher coding rates than uniformly distributed transmit powers. We then consider the situation with an additional jammer, where Alice and the jammer can both vary their powers. We pose a two player game where Alice and the jammer jointly comprise one player, with Willie the other player. The use of a jammer is shown in numerical simulations to lead to further significant performance improvements.