Online Learning for Measuring Incentive Compatibility in Ad Auctions


الملخص بالإنكليزية

In this paper we investigate the problem of measuring end-to-end Incentive Compatibility (IC) regret given black-box access to an auction mechanism. Our goal is to 1) compute an estimate for IC regret in an auction, 2) provide a measure of certainty around the estimate of IC regret, and 3) minimize the time it takes to arrive at an accurate estimate. We consider two main problems, with different informational assumptions: In the emph{advertiser problem} the goal is to measure IC regret for some known valuation $v$, while in the more general emph{demand-side platform (DSP) problem} we wish to determine the worst-case IC regret over all possible valuations. The problems are naturally phrased in an online learning model and we design $Regret-UCB$ algorithms for both problems. We give an online learning algorithm where for the advertiser problem the error of determining IC shrinks as $OBig(frac{|B|}{T}cdotBig(frac{ln T}{n} + sqrt{frac{ln T}{n}}Big)Big)$ (where $B$ is the finite set of bids, $T$ is the number of time steps, and $n$ is number of auctions per time step), and for the DSP problem it shrinks as $OBig(frac{|B|}{T}cdotBig( frac{|B|ln T}{n} + sqrt{frac{|B|ln T}{n}}Big)Big)$. For the DSP problem, we also consider stronger IC regret estimation and extend our $Regret-UCB$ algorithm to achieve better IC regret error. We validate the theoretical results using simulations with Generalized Second Price (GSP) auctions, which are known to not be incentive compatible and thus have strictly positive IC regret.

تحميل البحث