Individual decision-makers consume information revealed by previous decision-makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in other domains such as medical decisions. Each decision-maker would individually prefer to exploit: select an action with the highest expected reward given her current information. At the same time, each decision-maker would prefer previous decision-makers to explore, producing information about the rewards of various actions. A social planner, by means of carefully designed information disclosure, can incentivize the agents to balance exploration and exploitation so as to maximize social welfare. We formulate this problem as a multi-armed bandit problem (and various generalizations thereof) under incentive-compatibility constraints induced by the agents' Bayesian priors. We design an incentive-compatible bandit algorithm for the social planner whose regret is asymptotically optimal among all bandit algorithms (incentive-compatible or not). Further, we provide a black-box reduction from an arbitrary multi-armed bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary auxiliary feedback.
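The exploitation behaviour described above can be illustrated with a minimal sketch. It assumes Bernoulli rewards with independent Beta priors, a modelling choice made here purely for illustration and not taken from the abstract; the names `posterior_mean` and `exploit_choice` are hypothetical. The sketch shows only the myopic choice of a fully informed agent, not the paper's incentive-compatible algorithm.

```python
from fractions import Fraction

def posterior_mean(successes, failures, prior_a, prior_b):
    """Posterior mean of a Bernoulli arm under a Beta(prior_a, prior_b) prior."""
    return Fraction(prior_a + successes,
                    prior_a + prior_b + successes + failures)

def exploit_choice(history, priors):
    """Return (arm, means): the arm with the highest posterior mean reward.

    history[i] = (successes, failures) observed so far on arm i.
    priors[i]  = (a, b) parameters of the Beta prior on arm i (assumed model).
    Ties are broken in favour of the lowest arm index.
    """
    means = [posterior_mean(s, f, a, b)
             for (s, f), (a, b) in zip(history, priors)]
    best = max(range(len(means)), key=lambda i: means[i])
    return best, means

# Arm 0 has the higher prior mean (2/3 vs 1/2), so a myopic agent who sees
# the full history picks it and never samples arm 1 unless arm 0 disappoints.
priors = [(2, 1), (1, 1)]
first_arm, _ = exploit_choice([(0, 0), (0, 0)], priors)
later_arm, _ = exploit_choice([(0, 2), (0, 0)], priors)
```

In this toy instance, arm 1 is tried only after arm 0 has failed twice. This is the kind of under-exploration that the planner's information disclosure is meant to counteract.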
Selecting the most influential agent in a network has huge practical value in applications. However, in many scenarios, the graph structure can only be known from agents' reports on their connections. In a self-interested setting, agents may strategic
We consider a ubiquitous scenario in the Internet economy when individual decision-makers (henceforth, agents) both produce and consume information as they make strategic choices in an uncertain environment. This creates a three-way tradeoff between
Motivated by kidney exchange, we study the following mechanism-design problem: On a directed graph (of transplant compatibilities among patient-donor pairs), the mechanism must select a simple path (a chain of transplantations) starting at a distingu
Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our
In practice, most mechanisms for selling, buying, matching, voting, and so on are not incentive compatible. We present techniques for estimating how far a mechanism is from incentive compatible. Given samples from the agents' type distribution, we sho