ﻻ يوجد ملخص باللغة العربية
In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to more efficiently learn the underlying parameters in a pricing problem. We apply our algorithms to a real e-commerce system and show that the algorithms indeed improve revenue compared to pricing algorithms that use passive learning.
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properti
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean amon
Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customers valuation for the product is a
We study the logistic bandit, in which rewards are binary with success probability $exp(beta a^top theta) / (1 + exp(beta a^top theta))$ and actions $a$ and coefficients $theta$ are within the $d$-dimensional unit ball. While prior regret bounds for
Stochastic Rank-One Bandits (Katarya et al, (2017a,b)) are a simple framework for regret minimization problems over rank-one matrices of arms. The initially proposed algorithms are proved to have logarithmic regret, but do not match the existing lowe