The Probabilistic Serial mechanism is well known for its desirable fairness and efficiency properties and is one of the most prominent protocols for the random assignment problem. However, Probabilistic Serial is not incentive-compatible, so these desirable properties hold only for the agents' declared preferences rather than their genuine preferences. A substantial utility gain through strategic behavior would tempt self-interested agents to manipulate the mechanism and would subvert the very foundation of adopting it in practice. In this paper, we characterize the extent to which an individual agent can increase its utility by strategic manipulation. We show that the incentive ratio of the mechanism is $\frac{3}{2}$: no agent can misreport its preferences such that its utility becomes more than 1.5 times what it is when it reports truthfully. This ratio is a worst-case guarantee, obtained by allowing an agent to have complete information about the other agents' reports and to compute its best response, even though doing so is computationally intractable in general. To complement this worst-case study, we further evaluate an agent's average utility gain experimentally. The experiments show that an agent's incentive to manipulate the rule is very limited. These results shed some light on the robustness of Probabilistic Serial against strategic manipulation, which is one step beyond knowing that it is not incentive-compatible.
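As a rough formalization (the notation here is ours, not necessarily the paper's): writing $u_i$ for agent $i$'s expected utility, $\succ_i$ for its true preference report, $\succ_i'$ for a possible misreport, and $f$ for the mechanism, the incentive ratio measures the largest multiplicative gain any agent can obtain at any profile,
\[
  \zeta(f) \;=\; \max_{i}\;\max_{(\succ_i,\,\succ_{-i})}\;
  \frac{\max_{\succ_i'}\, u_i\bigl(f(\succ_i',\succ_{-i})\bigr)}{u_i\bigl(f(\succ_i,\succ_{-i})\bigr)},
\]
and the paper's result is that $\zeta(f)=\frac{3}{2}$ when $f$ is Probabilistic Serial.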
The probabilistic serial (PS) rule is one of the most prominent randomized rules for the assignment problem. It is well-known for its superior fairness and welfare properties. However, PS is not immune to manipulative behaviour by the agents. We examine computational and non-computational aspects of strategising under the PS rule. Firstly, we study the computational complexity of an agent manipulating the PS rule. We present polynomial-time algorithms for optimal manipulation. Secondly, we show that expected utility best responses can cycle. Thirdly, we examine the existence and computation of Nash equilibrium profiles under the PS rule. We show that a pure Nash equilibrium is guaranteed to exist under the PS rule. For two agents, we identify two different types of preference profiles that are not only in Nash equilibrium but can also be computed in linear time. Finally, we conduct experiments to check the frequency of manipulability of the PS rule under different combinations of the number of agents, objects, and utility functions.
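For readers unfamiliar with the rule itself, the following is a minimal sketch of the PS (simultaneous eating) procedure, assuming strict complete rankings, unit eating speeds, and one unit of each object; the function name and interface are illustrative, not taken from the paper.

```python
EPS = 1e-9

def probabilistic_serial(prefs):
    """prefs[i]: agent i's strict ranking of objects, most preferred first.
    Returns p[i][o] = probability that agent i is assigned object o."""
    n = len(prefs)
    objects = {o for ranking in prefs for o in ranking}
    supply = {o: 1.0 for o in objects}                 # remaining fraction of each object
    p = [{o: 0.0 for o in objects} for _ in range(n)]  # eaten fractions = assignment probabilities
    eaten = [0.0] * n                                  # total amount eaten by each agent

    while any(e < 1.0 - EPS for e in eaten) and any(s > EPS for s in supply.values()):
        # Each still-hungry agent eats its most preferred object with supply left.
        target = {i: next(o for o in prefs[i] if supply[o] > EPS)
                  for i in range(n) if eaten[i] < 1.0 - EPS}
        eaters = {o: [i for i, t in target.items() if t == o]
                  for o in set(target.values())}
        # Advance time until some object is exhausted or some agent is satiated.
        dt = min(min(supply[o] / len(ag) for o, ag in eaters.items()),
                 min(1.0 - eaten[i] for i in target))
        for o, ag in eaters.items():
            for i in ag:
                p[i][o] += dt
                eaten[i] += dt
            supply[o] -= dt * len(ag)
    return p

# Example: both agents like 'a' best, so they split it and each ends up with
# half of 'a' plus half of their respective second choices.
# probabilistic_serial([['a', 'b', 'c'], ['a', 'c', 'b']])
```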
The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
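To fix ideas in our own notation (not necessarily the paper's): with $x_i$ denoting agent $i$'s mixed strategy, $T_i>0$ its exploration rate, and $r_{ia}(x_{-i})$ the expected reward of action $a$ against the other agents' play, the smooth Q-learning dynamics commonly studied in this literature add an entropy term to the replicator dynamics,
\[
  \dot{x}_{ia} \;=\; x_{ia}\Bigl(r_{ia}(x_{-i}) - \sum_{b} x_{ib}\,r_{ib}(x_{-i})\Bigr)
  \;+\; T_i\,x_{ia}\Bigl(\sum_{b} x_{ib}\ln x_{ib} - \ln x_{ia}\Bigr),
\]
whose rest points are the quantal-response equilibria, i.e. profiles satisfying $x_{ia} \propto \exp\bigl(r_{ia}(x_{-i})/T_i\bigr)$ for every agent $i$ and action $a$.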
We study a variant of Vickrey's classic bottleneck model. In our model there are $n$ agents, and each agent strategically chooses when to join a first-come-first-served observable queue. Agents dislike standing in line and take actions in discrete time steps: we assume that each agent incurs a cost of $1$ for every time step he waits before joining the queue and a cost of $w>1$ for every time step he waits in the queue. At each time step a single agent can be processed. Before each time step, every agent observes the queue and strategically decides whether or not to join, with the goal of minimizing his expected cost. In this paper we focus on symmetric strategies, which are arguably more natural as they require less coordination. This brings up the following twist to the usual price-of-anarchy question: what is the main source of the inefficiency of symmetric equilibria? Is it the players' strategic behavior or the lack of coordination? We present results for two parameter regimes that are qualitatively very different: (i) when $w$ is fixed and $n$ grows, we prove a tight bound of $2$ and show that the entire loss is due to the players' selfish behavior; (ii) when $n$ is fixed and $w$ grows, we prove a tight bound of $\Theta\left(\sqrt{\frac{w}{n}}\right)$ and show that it is mainly due to lack of coordination: the same order of magnitude of loss is suffered by any symmetric profile.
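As a concrete reading of this cost model (the notation is ours): an agent who joins the queue at time step $t$ and then spends $q$ further time steps in the queue before being processed incurs a total cost of
\[
  c \;=\; t + w\,q,
\]
so, since $w>1$, each step spent waiting in line is costlier than a step spent waiting outside, which is what makes delaying entry attractive.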
Data-driven segmentation is the powerhouse behind the success of online advertising. Various underlying challenges for successful segmentation have been studied by the academic community, with one notable exception: consumers' incentives have typically been ignored. This lacuna is troubling, as consumers have much control over the data being collected, and missing or manipulated data could lead to inferior segmentation. The current work proposes a model of prior-free segmentation, inspired by models of facility location, and, to the best of our knowledge, provides the first segmentation mechanism that addresses incentive compatibility, efficient market segmentation, and privacy in the absence of a common prior.
We consider settings in which we wish to incentivize myopic agents (such as Airbnb landlords, who may emphasize short-term profits and property safety) to treat arriving clients fairly, in order to prevent overall discrimination against individuals or groups. We model such settings in both classical and contextual bandit models in which the myopic agents maximize rewards according to current empirical averages, but are also amenable to exogenous payments that may cause them to alter their choices. Our notion of fairness asks that more qualified individuals are never (probabilistically) preferred over less qualified ones [Joseph et al.]. We investigate whether it is possible to design inexpensive subsidy or payment schemes for a principal to motivate myopic agents to play fairly in all or almost all rounds. When the principal has full information about the state of the myopic agents, we show it is possible to induce fair play on every round with a subsidy scheme of total cost $o(T)$ (for the classic setting with $k$ arms, $\tilde{O}(\sqrt{k^3 T})$, and for the $d$-dimensional linear contextual setting, $\tilde{O}(d\sqrt{k^3 T})$). If the principal has much more limited information (as might often be the case for an external regulator or watchdog), and only observes the number of rounds in which members from each of the $k$ groups were selected, but not the empirical estimates maintained by the myopic agent, the design of such a scheme becomes more complex. We show both positive and negative results in the classic and linear bandit settings by upper- and lower-bounding the cost of fair subsidy schemes.
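The following is a minimal illustrative sketch, not the paper's scheme, of how a subsidy could steer a myopic agent in the full-information setting: assuming the agent greedily picks the arm with the highest empirical mean plus offered payment, the principal tops up a designated "fair" arm just enough to make it the greedy choice. The function names, the chosen arm, and the payment rule are our own assumptions.

```python
# Illustrative sketch only: a principal subsidizes a greedy (myopic) agent so that
# a designated "fair" arm becomes the agent's score maximizer in this round.
# The payment rule below is an assumption for illustration, not the paper's scheme.

def subsidy_for_round(empirical_means, fair_arm):
    """Return per-arm payments making `fair_arm` the myopic agent's best choice."""
    gap = max(empirical_means) - empirical_means[fair_arm]
    payments = [0.0] * len(empirical_means)
    if gap > 0:
        payments[fair_arm] = gap + 1e-6  # small tie-breaking bonus
    return payments

def myopic_choice(empirical_means, payments):
    """The myopic agent maximizes empirical mean plus any offered payment."""
    scores = [m + p for m, p in zip(empirical_means, payments)]
    return scores.index(max(scores))

# Example round: three arms, fair play requires arm 2.
means = [0.8, 0.5, 0.7]
pay = subsidy_for_round(means, fair_arm=2)
assert myopic_choice(means, pay) == 2
```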