Do you want to publish a course? Click here

Learning to Bid in Contextual First Price Auctions

98   0   0.0 ( 0 )
 Added by Zhe Feng
 Publication date 2021
and research's language is English




Ask ChatGPT about the research

In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time $t$, the learner observes a context $x_tin mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model of the maximum bid of all the others $m_t = alpha_0cdot x_t + z_t$, where $alpha_0in mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $mathcal{F}$ with log-concave density function $f$. We consider both emph{binary feedback} (the learner can only observe whether she wins or not) and emph{full information feedback} (the learner can observe $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $mathcal{F}$ is known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most $widetilde{O}(sqrt{log(d) T})$ regret. Moreover, we generalize this algorithm to the setting with binary feedback and the noise distribution is unknown but belongs to a parametrized family of distributions. For the full information feedback with emph{unknown} noise distribution, we provide an algorithm that achieves regret at most $widetilde{O}(sqrt{dT})$. Our approach combines an estimator for log-concave density functions and then MLE method to learn the noise distribution $mathcal{F}$ and linear weight $alpha_0$ simultaneously. We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least $Omega(sqrt{T})$, even when the learner receives the full information feedback and $mathcal{F}$ is known.



rate research

Read More

We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the bid--the only way to achieve positive payoffs--then she is not able to observe the highest bid of the other bidders, which we assume is iid drawn from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot directly be addressed by the existing UCB or Thompson sampling algorithms in that literature, mainly because contrary to the standard bandits setting, when a positive reward is obtained here, nothing about the environment can be learned. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves $O(sqrt{T}log^2 T)$ regret bound when the bidders private values are stochastically generated. We do so by providing an algorithm on a general class of problems, which we call monotone group contextual bandits, where the same regret bound is established under stochastically generated contexts. Further, by a novel lower bound argument, we characterize an $Omega(T^{2/3})$ lower bound for the case where the contexts are adversarially generated, thus highlighting the impact of the contexts generation mechanism on the fundamental learning limit. Despite this, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an $O(sqrt{T}log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for this problem.
80 - Tian Zhou , Hao He , Shengjun Pan 2021
Since 2019, most ad exchanges and sell-side platforms (SSPs), in the online advertising industry, shifted from second to first price auctions. Due to the fundamental difference between these auctions, demand-side platforms (DSPs) have had to update their bidding strategies to avoid bidding unnecessarily high and hence overpaying. Bid shading was proposed to adjust the bid price intended for second-price auctions, in order to balance cost and winning probability in a first-price auction setup. In this study, we introduce a novel deep distribution network for optimal bidding in both open (non-censored) and closed (censored) online first-price auctions. Offline and online A/B testing results show that our algorithm outperforms previous state-of-art algorithms in terms of both surplus and effective cost per action (eCPX) metrics. Furthermore, the algorithm is optimized in run-time and has been deployed into VerizonMedia DSP as production algorithm, serving hundreds of billions of bid requests per day. Online A/B test shows that advertisers ROI are improved by +2.4%, +2.4%, and +8.6% for impression based (CPM), click based (CPC), and conversion based (CPA) campaigns respectively.
We present a general framework for proving polynomial sample complexity bounds for the problem of learning from samples the best auction in a class of simple auctions. Our framework captures all of the most prominent examples of simple auctions, including anonymous and non-anonymous item and bundle pricings, with either a single or multiple buyers. The technique we propose is to break the analysis of auctions into two natural pieces. First, one shows that the set of allocation rules have large amounts of structure; second, fixing an allocation on a sample, one shows that the set of auctions agreeing with this allocation on that sample have revenue functions with low dimensionality. Our results effectively imply that whenever its possible to compute a near-optimal simple auction with a known prior, it is also possible to compute such an auction with an unknown prior (given a polynomial number of samples).
Bid leakage is a corrupt scheme in a first-price sealed-bid auction in which the procurer leaks the opponents bids to a favoured participant. The rational behaviour of such participant is to bid close to the deadline in order to receive all bids, which allows him to ensure his win at the best price possible. While such behaviour does leave detectable traces in the data, the absence of bid leakage labels makes supervised classification impossible. Instead, we reduce the problem of the bid leakage detection to a positive-unlabeled classification. The key idea is to regard the losing participants as fair and the winners as possibly corrupted. This allows us to estimate the prior probability of bid leakage in the sample, as well as the posterior probability of bid leakage for each specific auction. We extract and analyze the data on 600,000 Russian procurement auctions between 2014 and 2018. We find that around 9% of the auctions are exposed to bid leakage, which results in an overall 1.5% price increase. The predicted probability of bid leakage is higher for auctions with a higher reserve price, with too low or too high number of participants, and if the winner has met the auctioneer in earlier auctions.
We study the problem of learning a linear model to set the reserve price in an auction, given contextual information, in order to maximize expected revenue from the seller side. First, we show that it is not possible to solve this problem in polynomial time unless the emph{Exponential Time Hypothesis} fails. Second, we present a strong mixed-integer programming (MIP) formulation for this problem, which is capable of exactly modeling the nonconvex and discontinuous expected reward function. Moreover, we show that this MIP formulation is ideal (i.e. the strongest possible formulation) for the revenue function of a single impression. Since it can be computationally expensive to exactly solve the MIP formulation in practice, we also study the performance of its linear programming (LP) relaxation. Though it may work well in practice, we show that, unfortunately, in the worst case the optimal objective of the LP relaxation can be O(number of samples) times larger than the optimal objective of the true problem. Finally, we present computational results, showcasing that the MIP formulation, along with its LP relaxation, are able to achieve superior in- and out-of-sample performance, as compared to state-of-the-art algorithms on both real and synthetic datasets. More broadly, we believe this work offers an indication of the strength of optimization methodologies like MIP to exactly model intrinsic discontinuities in machine learning problems.

suggested questions

comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا