ﻻ يوجد ملخص باللغة العربية
A rich class of mechanism design problems can be understood as incomplete-information games between a principal who commits to a policy and an agent who responds, with payoffs determined by an unknown state of the world. Traditionally, these models require strong and often-impractical assumptions about beliefs (a common prior over the state). In this paper, we dispense with the common prior. Instead, we consider a repeated interaction where both the principal and the agent may learn over time from the state history. We reformulate mechanism design as a reinforcement learning problem and develop mechanisms that attain natural benchmarks without any assumptions on the state-generating process. Our results make use of novel behavioral assumptions for the agent -- centered around counterfactual internal regret -- that capture the spirit of rationality without relying on beliefs.
In the early $20^{th}$ century, Pigou observed that imposing a marginal cost tax on the usage of a public good induces a socially efficient level of use as an equilibrium. Unfortunately, such a Pigouvian tax may also induce other, socially inefficien
We consider the model of the data broker selling information to a single agent to maximize his revenue. The agent has private valuation for the additional information, and upon receiving the signal from the data broker, the agent can conduct her own
We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, th
Consider a player that in each round $t$ out of $T$ rounds chooses an action and observes the incurred cost after a delay of $d_{t}$ rounds. The cost functions and the delay sequence are chosen by an adversary. We show that even if the players algori
The theory of reinforcement learning has focused on two fundamental problems: achieving low regret, and identifying $epsilon$-optimal policies. While a simple reduction allows one to apply a low-regret algorithm to obtain an $epsilon$-optimal policy