RoxyBot-06: Stochastic Prediction and Optimization in TAC Travel

Published by Amy Greenwald

Publication date: 2014

Research field: Informatics Engineering

Paper language: English

Authors: Amy Greenwald, Seong Jae Lee, Victor Naroditskiy

Computer Science and Game Theory, Machine Learning

Abstract

In this paper, we describe our autonomous bidding agent, RoxyBot, who emerged victorious in the travel division of the 2006 Trading Agent Competition in a photo finish. At a high level, the design of many successful trading agents can be summarized as follows: (i) price prediction: build a model of market prices; and (ii) optimization: solve for an approximately optimal set of bids, given this model. To predict, RoxyBot builds a stochastic model of market prices by simulating simultaneous ascending auctions. To optimize, RoxyBot relies on the sample average approximation method, a stochastic optimization technique.
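The sample average approximation idea mentioned in the abstract can be illustrated with a toy problem. The sketch below is not RoxyBot's actual formulation: the two-good package (a flight and a hotel), the uniform price model in `sample_prices`, the bid grid, and the package value of 500 are all hypothetical choices made for illustration. It draws price scenarios from a stand-in stochastic model and then picks the bid vector that maximizes the average profit over those scenarios.

```python
import random
from itertools import product


def sample_prices(rng, n_scenarios):
    """Draw hypothetical (flight, hotel) price scenarios; a stand-in for a price model."""
    return [(rng.uniform(200, 400), rng.uniform(50, 250)) for _ in range(n_scenarios)]


def profit(bids, prices, package_value):
    """Profit in one scenario: a bid wins when it is at least the price,
    and the winner pays the price. The package has value only if both goods are won."""
    won = [b >= p for b, p in zip(bids, prices)]
    cost = sum(p for p, w in zip(prices, won) if w)
    value = package_value if all(won) else 0.0
    return value - cost


def saa_bids(scenarios, candidate_bids, package_value):
    """Sample average approximation: replace the expected profit by its average
    over the sampled scenarios and maximize that average over the bid grid."""
    best_bids, best_avg = None, float("-inf")
    for bids in product(*candidate_bids):
        avg = sum(profit(bids, s, package_value) for s in scenarios) / len(scenarios)
        if avg > best_avg:
            best_bids, best_avg = bids, avg
    return best_bids, best_avg


rng = random.Random(0)  # fixed seed so the run is repeatable
scenarios = sample_prices(rng, 200)
grid = [[0, 250, 300, 350, 400], [0, 100, 150, 200, 250]]  # assumed bid grid
best_bids, best_avg = saa_bids(scenarios, grid, package_value=500.0)
print(best_bids, round(best_avg, 2))
```

Exhaustive search over a grid is only workable for a problem this small; it stands in here for whatever solver a real agent would use on the sampled problem.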

Vashist Avadhanula, Riccardo Colini-Baldeschi, Stefano Leonardi, 2021

We study the problem of an online advertising system that wants to optimally spend an advertiser's given budget for a campaign across multiple platforms, without knowing the value of showing an ad to the users on those platforms. We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over $T$ rounds of bidding, with the set of arms given by the set of distinct bidding $m$-tuples, where $m$ is the number of platforms. We modify the algorithm proposed by Badanidiyuru et al. to extend it to the case of multiple platforms, obtaining an algorithm for both discrete and continuous bid spaces. Namely, for discrete bid spaces we give an algorithm with regret $O\left(OPT\sqrt{\frac{mn}{B}} + \sqrt{mn\,OPT}\right)$, where $OPT$ is the performance of the optimal algorithm that knows the distributions. For continuous bid spaces the regret of our algorithm is $\tilde{O}\left(m^{1/3}\cdot\min\left\{B^{2/3}, (mT)^{2/3}\right\}\right)$. When restricted to this special case, this bound improves over Sankararaman and Slivkins in the regime $OPT \ll T$, as is the case in the particular application at hand. Second, we show an $\Omega\left(\sqrt{m\,OPT}\right)$ lower bound for the discrete case and an $\Omega\left(m^{1/3}B^{2/3}\right)$ lower bound for the continuous setting, almost matching the upper bounds. Finally, we use a real-world data set from a large online advertising company with multiple ad platforms and show that our algorithms outperform common benchmarks and satisfy the properties required in the real-world application.

Computer Science and Game Theory, Machine Learning

Cannibal Animal Games: a new variant of Tic-Tac-Toe

Jean Cardinal, Sebastien Collette, Hiro Ito, 2013

This paper presents a new partial two-player game, called the cannibal animal game, which is a variant of Tic-Tac-Toe. The game is played on the infinite grid, where in each round a player chooses and occupies free cells. The first player Alice can occupy a cell in each turn and wins if she occupies a set of cells, the union of a subset of which is a translated, reflected and/or rotated copy of a previously agreed upon polyomino $P$ (called an animal). The objective of the second player Bob is to prevent Alice from creating her animal by occupying in each round a translated, reflected and/or rotated copy of $P$. An animal is a cannibal if Bob has a winning strategy, and a non-cannibal otherwise. This paper presents some new tools, such as the bounding strategy and the punching lemma, to classify animals into cannibals or non-cannibals. We also show that the pairing strategy works for this problem.

Computer Science and Game Theory

Stochastic Optimization of Service Provision with Selfish Users

F. Altarelli, A. Braunstein, C. F. Chiasserini, 2013

We develop a computationally efficient technique to solve a fairly general distributed service provision problem with selfish users and imperfect information. In particular, in a context in which the service capacity of the existing infrastructure can be partially adapted to the user load by activating just some of the service units, we aim to find the configuration of active service units that achieves the best trade-off between maintenance (e.g. energetic) costs for the provider and user satisfaction. The core of our technique resides in the implementation of a belief-propagation (BP) algorithm to evaluate the cost configurations. Numerical results confirm the effectiveness of our approach.

Computer Science and Game Theory, Disordered Systems and Neural Networks, Physics and Society

Fictitious play in zero-sum stochastic games

Muhammed O. Sayin, Francesca Parise, Asuman Ozdaglar, 2020

We present fictitious play dynamics for stochastic games and analyze their convergence properties in zero-sum stochastic games. Our dynamics involve players forming beliefs on the opponent's strategy and their own continuation payoff (Q-function), and playing a greedy best response using estimated continuation payoffs. Players update their beliefs from observations of opponent actions. A key property of the learning dynamics is that the update of the beliefs on Q-functions occurs at a slower timescale than the update of the beliefs on strategies. We show that, in both the model-based and model-free cases (the latter without knowledge of player payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
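To make the belief-and-best-response loop concrete, the sketch below runs classical fictitious play on a one-shot zero-sum matrix game (matching pennies). This is a deliberate simplification and not the dynamics of the paper above: there are no states, no Q-function beliefs, and no two-timescale update. The function name `fictitious_play` and the choice of game are illustrative assumptions.

```python
def fictitious_play(payoff, rounds):
    """Classical fictitious play in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff (the column player receives its negative).
    Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows  # how often each row action has been played
    col_counts = [0] * n_cols  # how often each column action has been played
    for _ in range(rounds):
        # Row player's payoff for each action against the column player's observed counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Row player's payoff under each column action against the observed row counts.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Greedy best responses: row maximizes, column minimizes; ties go to the lowest index.
        i_star = max(range(n_rows), key=row_values.__getitem__)
        j_star = min(range(n_cols), key=col_values.__getitem__)
        row_counts[i_star] += 1
        col_counts[j_star] += 1
    return [c / rounds for c in row_counts], [c / rounds for c in col_counts]


matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, rounds=20000)
print(row_freq, col_freq)
```

In matching pennies the unique equilibrium mixes both actions equally, so the printed empirical frequencies should drift toward one half for each player as the number of rounds grows.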

Computer Science and Game Theory, Machine Learning, Dynamical Systems

Road Pricing for Spreading Peak Travel: Modeling and Design

Tichakorn Wongpiromsarn, Nan Xiao, Keyou You, 2012

A case study of the Singapore road network provides empirical evidence that road pricing can significantly affect commuters' trip timing behaviors. In this paper, we propose a model of trip timing decisions that reasonably matches the observed commuters' behaviors. Our model explicitly captures the differences in individuals' sensitivity to price, travel time, and early or late arrival at the destination. New pricing schemes are suggested to better spread peak travel and reduce traffic congestion. Simulation results based on the proposed model are compared with the real data from the Singapore case study.

Computer Science and Game Theory