PreferenceNet: Encoding Human Preferences in Auction Design with Deep Learning

Submitted by Neehar Peri

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Neehar Peri, Michael J. Curry, Samuel Dooley

Abstract

The design of optimal auctions is a problem of interest in economics, game theory, and computer science. Despite decades of effort, strategyproof, revenue-maximizing auction designs are still not known outside of restricted settings. However, recent methods using deep learning have shown some success in approximating optimal auctions, recovering several known solutions and outperforming strong baselines when optimal auctions are not known. In addition to maximizing revenue, auction mechanisms may also seek to satisfy socially desirable constraints such as allocation fairness or diversity. However, these philosophical notions have neither a standard form nor a widely accepted formal definition. In this paper, we propose PreferenceNet, an extension of existing neural-network-based auction mechanisms that encodes such constraints using (potentially human-provided) exemplars of desirable allocations. In addition, we introduce a new metric to evaluate an auction allocation's adherence to such socially desirable constraints and demonstrate that our proposed method is competitive with current state-of-the-art neural-network-based auction designs. We validate our approach through human subject research and show that we are able to effectively capture real human preferences. Our code is available at https://github.com/neeharperi/PreferenceNet
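The abstract does not specify how exemplars enter training. One plausible reading of "encoding constraints using exemplars" is: learn a scorer from labeled example allocations, then add a penalty for low-scoring allocations to the revenue objective. The sketch below assumes a logistic-regression scorer over squared allocation probabilities and an additive penalty weighted by `lam`; `train_scorer`, `penalized_loss`, the feature map, and the toy exemplars are all hypothetical, and PreferenceNet's own architecture (see the linked repository) may differ.

```python
import math


def features(alloc):
    # Hypothetical feature map: squared allocation probabilities. Their sum grows
    # when items are concentrated on few bidders, so a linear model can use it.
    return [p * p for row in alloc for p in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_scorer(exemplars, labels, steps=5000, lr=1.0):
    """Full-batch logistic regression on labeled exemplar allocations."""
    xs = [features(a) for a in exemplars]
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def score(alloc):
        # Estimated probability that an allocation matches the exemplar preferences.
        return sigmoid(sum(wi * xi for wi, xi in zip(w, features(alloc))) + b)

    return score


def penalized_loss(revenue, allocs, score, lam=1.0):
    # Loss to minimize: negative revenue plus a penalty for allocations the
    # scorer considers undesirable (score close to 0).
    penalty = sum(1.0 - score(a) for a in allocs) / len(allocs)
    return -revenue + lam * penalty


# Toy exemplars: rows are bidders, columns are items (assumed layout).
preferred = [[[0.5, 0.5], [0.5, 0.5]], [[0.6, 0.4], [0.4, 0.6]], [[0.4, 0.6], [0.6, 0.4]]]
rejected = [[[1.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]], [[0.9, 1.0], [0.1, 0.0]]]
score = train_scorer(preferred + rejected, [1, 1, 1, 0, 0, 0])
```

With equal revenue, a mechanism that concentrates both items on one bidder incurs a larger loss than one that spreads them, so gradient-based training would be nudged toward the exemplar style.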

Kevin Kuo, Anthony Ostuni, Elizabeth Horishny (2020)

The design of revenue-maximizing auctions with strong incentive guarantees is a core concern of economic theory. Computational auctions enable online advertising, sourcing, spectrum allocation, and myriad financial markets. Analytic progress in this space is notoriously difficult; since Myerson's 1981 work characterizing single-item optimal auctions, there has been limited progress outside of restricted settings. A recent paper by Dütting et al. circumvents analytic difficulties by applying deep learning techniques to approximate optimal auctions instead. In parallel, new research from Ilvento et al. and other groups has developed notions of fairness in the context of auction design. Inspired by these advances, in this paper, we extend techniques for approximating auctions using deep learning to address concerns of fairness while maintaining high revenue and strong incentive guarantees.

Computer Science and Game Theory; Machine Learning
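The Dütting et al. approach mentioned above trains a mechanism while penalizing ex-post regret: the most a bidder could gain by misreporting, holding the other bids fixed. A minimal sketch of that measurement for a single item, assuming values in [0, 1] and a plain grid search over misreports (RegretNet itself optimizes misreports by gradient ascent); `first_price`, `second_price`, and `empirical_regret` are illustrative names, not from the paper.

```python
def second_price(bids):
    # Highest bid wins (ties go to the lowest index) and pays the second-highest bid.
    order = sorted(range(len(bids)), key=lambda i: (-bids[i], i))
    winner = order[0]
    price = bids[order[1]] if len(bids) > 1 else 0.0
    return winner, price


def first_price(bids):
    # Highest bid wins (ties go to the lowest index) and pays its own bid.
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    return winner, bids[winner]


def utility(mechanism, values, bids, i):
    winner, price = mechanism(bids)
    return values[i] - price if winner == i else 0.0


def empirical_regret(mechanism, values, i, grid_size=101):
    """Best gain bidder i can get over truthful bidding, searching a misreport grid."""
    truthful = utility(mechanism, values, list(values), i)
    best = truthful
    for k in range(grid_size):
        bids = list(values)
        bids[i] = k / (grid_size - 1)  # candidate misreport in [0, 1]
        best = max(best, utility(mechanism, values, bids, i))
    return best - truthful


values = [0.8, 0.3]
sp_regret = empirical_regret(second_price, values, 0)
fp_regret = empirical_regret(first_price, values, 0)
```

The truthful second-price auction shows zero regret, while the first-price auction lets the high bidder gain by shading its bid toward the competing bid, which is exactly the incentive violation a regret penalty discourages.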

Neural Auction: End-to-End Learning of Auction Mechanisms for E-Commerce Advertising

Xiangyu Liu, Chuan Yu, Zhilin Zhang (2021)

In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal because their fixed allocation rules optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from auction outcomes to optimize multiple performance metrics, have attracted increasing research interest. However, auction mechanisms involve various discrete calculation operations, which makes them difficult to integrate with the continuous optimization pipelines of machine learning. In this paper, we design Deep Neural Auctions (DNAs) to enable end-to-end auction learning by proposing a differentiable model that relaxes the discrete sorting operation, a key component in auctions. We optimize the performance metrics by developing deep models to efficiently extract contexts from auctions, providing rich features for auction design. We further integrate game-theoretical conditions into the model design to guarantee the stability of the auctions. DNAs have been successfully deployed in the e-commerce advertising system at Taobao. Experimental evaluation results on both a large-scale dataset and online A/B tests demonstrated that DNAs significantly outperformed other mechanisms widely adopted in industry.

Computer Science and Game Theory; Artificial Intelligence; Machine Learning
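The DNA abstract hinges on relaxing the discrete sort so gradients can flow through it. One well-known relaxation, NeuralSort (Grover et al., 2019), replaces the permutation matrix with a row-stochastic matrix whose i-th row is softmax(((n + 1 - 2i)s - A_s·1)/τ), where A_s holds pairwise absolute score differences and τ is a temperature. The abstract does not say DNA uses this exact form; the sketch below only illustrates the general idea, and `neural_sort` and `soft_sorted` are illustrative names.

```python
import math


def softmax(logits):
    # Numerically stable softmax (subtract the max before exponentiating).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def neural_sort(scores, tau):
    """Relaxed permutation matrix for a descending sort (NeuralSort form)."""
    n = len(scores)
    # Row sums of the pairwise absolute-difference matrix A_s.
    row_sums = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    matrix = []
    for i in range(1, n + 1):
        scale = n + 1 - 2 * i
        logits = [(scale * sj - cj) / tau for sj, cj in zip(scores, row_sums)]
        matrix.append(softmax(logits))
    return matrix


def soft_sorted(scores, tau):
    # Apply the relaxed permutation to the scores: a differentiable "sorted" vector.
    p = neural_sort(scores, tau)
    return [sum(pij * sj for pij, sj in zip(row, scores)) for row in p]


scores = [0.2, 0.9, 0.5]
sharp = soft_sorted(scores, tau=0.01)
```

At τ = 0.01 the output is essentially [0.9, 0.5, 0.2]; raising τ blends neighbouring ranks, which gives smoother gradients at the cost of a less exact sort.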

Optimal Auction Design with Quantized Bids

Nianxia Cao, Swastik Brahma, Pramod K. Varshney (2015)

This letter considers the design of an auction mechanism to sell a seller's object when the buyers quantize their private value estimates regarding the object before communicating them to the seller. The designed auction mechanism maximizes the utility of the seller (i.e., the auction is optimal), prevents buyers from communicating falsified quantized bids (i.e., the auction is incentive-compatible), and ensures that buyers will participate in the auction (i.e., the auction is individually rational). The letter also investigates the design of the optimal quantization thresholds that buyers use to quantize their private value estimates. Numerical results provide insights into the influence of the quantization thresholds on the auction mechanism.

Computer Science and Game Theory
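To illustrate how a quantization threshold trades off against revenue, consider a deliberately simple mechanism that is not the letter's own: buyers with i.i.d. Uniform[0, 1] values each send one bit (value at least t or not), and the item is sold at price t to a uniformly random buyer who reported "high". Truthful reporting of the bit is then a dominant strategy, participation is individually rational, and expected revenue is t(1 - t^n). `binary_quantized_revenue` and `best_threshold` are hypothetical names for this sketch.

```python
import math


def binary_quantized_revenue(threshold, n_buyers):
    """Expected revenue of a posted price equal to the 1-bit threshold.

    Assumes i.i.d. Uniform[0, 1] values: the item sells iff at least one
    buyer reports "high", which happens with probability 1 - t**n.
    """
    prob_at_least_one_high = 1.0 - threshold ** n_buyers
    return threshold * prob_at_least_one_high


def best_threshold(n_buyers, grid=1000):
    # Grid search over thresholds in [0, 1].
    candidates = [k / grid for k in range(grid + 1)]
    return max(candidates, key=lambda t: binary_quantized_revenue(t, n_buyers))


t1 = best_threshold(1)
t2 = best_threshold(2)
```

Under these assumptions the best single-buyer threshold is 1/2, matching Myerson's reserve price for Uniform[0, 1], with revenue 1/4. With two buyers it rises to about 1/√3 ≈ 0.577 with revenue about 0.385, below the 5/12 ≈ 0.417 that Myerson's optimal auction earns on unquantized bids.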

Targeting Makes Sample Efficiency in Auction Design

Yihang Hu, Zhiyi Huang, Yiheng Shen (2021)

This paper introduces the targeted sampling model in optimal auction design. In this model, the seller may specify a quantile interval and sample from a buyer's prior restricted to the interval. This can be interpreted as allowing the seller to, for example, examine the top 40 percent of bids from previous buyers with the same characteristics. The targeting power is quantified with a parameter $\Delta \in [0, 1]$, which lower-bounds how narrow the quantile intervals can be. When $\Delta = 1$, the model degenerates to Cole and Roughgarden's model of i.i.d. samples; in the idealized case $\Delta = 0$, it degenerates to the model studied by Chen et al. (2018). For instance, for $n$ buyers with bounded values in $[0, 1]$, $\tilde{O}(\epsilon^{-1})$ targeted samples suffice, while it is known that at least $\tilde{\Omega}(n \epsilon^{-2})$ i.i.d. samples are needed. In other words, targeted sampling with sufficient targeting power allows us to remove the linear dependence on $n$ and to improve the quadratic dependence on $\epsilon^{-1}$ to linear. In this work, we introduce new technical ingredients and show that the number of targeted samples sufficient for learning an $\epsilon$-optimal auction is substantially smaller than the sample complexity of i.i.d. samples for the full spectrum of $\Delta \in [0, 1)$. Even with only mild targeting power, i.e., whenever $\Delta = o(1)$, our targeted sample complexity upper bounds are strictly smaller than the optimal sample complexity of i.i.d. samples.

Computer Science and Game Theory
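Sampling from a prior restricted to a quantile interval [lo, hi] amounts to inverse-transform sampling with the uniform draw confined to that interval. A minimal sketch, assuming quantiles follow the CDF convention (so "the top 40 percent" is the interval [0.6, 1]), a hypothetical prior with CDF F(v) = v² on [0, 1], and a targeting-power check that simply rejects intervals narrower than Δ; `targeted_sample` and `inverse_cdf` are illustrative names.

```python
import math
import random


def targeted_sample(inverse_cdf, lo, hi, delta, k, rng):
    """Draw k samples from a prior restricted to the quantile interval [lo, hi]."""
    if not 0.0 <= lo < hi <= 1.0:
        raise ValueError("need 0 <= lo < hi <= 1")
    if hi - lo < delta:
        raise ValueError("interval narrower than the targeting power allows")
    # A uniform quantile in [lo, hi] maps, through the inverse CDF, to a value
    # distributed as the prior conditioned on that quantile range.
    return [inverse_cdf(lo + (hi - lo) * rng.random()) for _ in range(k)]


def inverse_cdf(q):
    # Hypothetical buyer prior with CDF F(v) = v**2 on [0, 1].
    return math.sqrt(q)


rng = random.Random(0)
top_40 = targeted_sample(inverse_cdf, 0.6, 1.0, delta=0.25, k=1000, rng=rng)
```

With Δ = 1 the only admissible interval is [0, 1], which reduces to ordinary i.i.d. sampling, mirroring the degenerate case described in the abstract.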

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog

Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen (2019)

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy data, especially if they cannot explore online in the environment. These are critical shortcomings for applying RL to real-world problems where collecting data is expensive, and models must be tested offline before being deployed to interact with the environment -- e.g. systems that learn from human interaction. Thus, we develop a novel class of off-policy batch RL algorithms, which are able to effectively learn offline, without exploring, from a fixed batch of human interaction data. We leverage models pre-trained on data as a strong prior, and use KL-control to penalize divergence from this prior during RL training. We also use dropout-based uncertainty estimates to lower bound the target Q-values as a more efficient alternative to Double Q-Learning. The algorithms are tested on the problem of open-domain dialog generation -- a challenging reinforcement learning problem with a 20,000-dimensional action space. Using our Way Off-Policy algorithm, we can extract multiple different reward functions post-hoc from collected human interaction data, and learn effectively from all of these. We test the real-world generalization of these systems by deploying them live to converse with humans in an open-domain setting, and demonstrate that our algorithm achieves significant improvements over prior methods in off-policy batch RL.

Machine Learning; Artificial Intelligence
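The Way Off-Policy abstract names two ingredients: KL-control against a pre-trained prior, and dropout-based lower bounds on target Q-values. A minimal sketch of how both could enter a Q-learning target, assuming a per-step log-ratio penalty with weight `beta` and a minimum over several Monte Carlo dropout estimates of Q(s', a'); the exact forms in the paper may differ, and `kl_control_reward` and `pessimistic_q_target` are hypothetical names.

```python
import math


def kl_control_reward(reward, logp_policy, logp_prior, beta):
    # Penalize how far the policy's log-probability of the chosen action departs
    # from the pre-trained prior's (a single-sample estimate of KL(policy || prior)).
    return reward - beta * (logp_policy - logp_prior)


def pessimistic_q_target(reward, next_q_samples, gamma, done=False):
    # next_q_samples: Q(s', a') from several forward passes with dropout left on;
    # taking the minimum gives a conservative (lower) bootstrap estimate.
    if done:
        return reward
    return reward + gamma * min(next_q_samples)


r = kl_control_reward(1.0, math.log(0.5), math.log(0.25), beta=0.1)
target = pessimistic_q_target(r, [2.0, 1.6, 1.8], gamma=0.9)
```

With `beta = 0.1`, choosing an action twice as often as the prior does costs 0.1·ln 2 ≈ 0.069 reward; the minimum over dropout samples then keeps the bootstrap target from inheriting optimistic errors on actions the fixed batch rarely covers.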