No Arabic abstract
We consider a context-based dynamic pricing problem of online products which have low sales. Sales data from Alibaba, a major global online retailer, illustrate the prevalence of low-sale products. For these products, existing single-product dynamic pricing algorithms do not work well due to insufficient data samples. To address this challenge, we propose pricing policies that concurrently perform clustering over products and set individual pricing decisions on the fly. By clustering data and identifying products that have similar demand patterns, we utilize sales data from products within the same cluster to improve demand estimation and allow for better pricing decisions. We evaluate the algorithms using the regret, and the result shows that when product demand functions come from multiple clusters, our algorithms significantly outperform traditional single-product pricing policies. Numerical experiments using a real dataset from Alibaba demonstrate that the proposed policies, compared with several benchmark policies, increase the revenue. The results show that online clustering is an effective approach to tackling dynamic pricing problems associated with low-sale products. Our algorithms were further implemented in a field study at Alibaba with 40 products for 30 consecutive days, and compared to the products which use business-as-usual pricing policy of Alibaba. The results from the field experiment show that the overall revenue increased by 10.14%.
In this paper we apply active learning algorithms for dynamic pricing in a prominent e-commerce website. Dynamic pricing involves changing the price of items on a regular basis, and uses the feedback from the pricing decisions to update prices of the items. Most popular approaches to dynamic pricing use a passive learning approach, where the algorithm uses historical data to learn various parameters of the pricing problem, and uses the updated parameters to generate a new set of prices. We show that one can use active learning algorithms such as Thompson sampling to more efficiently learn the underlying parameters in a pricing problem. We apply our algorithms to a real e-commerce system and show that the algorithms indeed improve revenue compared to pricing algorithms that use passive learning.
Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customers valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customers true valuation, but instead needs to learn the valuation by leveraging contextual information and historical binary purchase feedbacks. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, where a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sub-linear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto-loan dataset.
We consider a firm that sells products over $T$ periods without knowing the demand function. The firm sequentially sets prices to earn revenue and to learn the underlying demand function simultaneously. A natural heuristic for this problem, commonly used in practice, is greedy iterative least squares (GILS). At each time period, GILS estimates the demand as a linear function of the price by applying least squares to the set of prior prices and realized demands. Then a price that maximizes the revenue, given the estimated demand function, is used for the next time period. The performance is measured by the regret, which is the expected revenue loss from the optimal (oracle) pricing policy when the demand function is known. Recently, den Boer and Zwart (2014) and Keskin and Zeevi (2014) demonstrated that GILS is sub-optimal. They introduced algorithms which integrate forced price dispersion with GILS and achieve asymptotically optimal performance. In this paper, we consider this dynamic pricing problem in a data-rich environment. In particular, we assume that the firm knows the expected demand under a particular price from historical data, and in each period, before setting the price, the firm has access to extra information (demand covariates) which may be predictive of the demand. We prove that in this setting GILS achieves asymptotically optimal regret of order $log(T)$. We also show the following surprising result: in the original dynamic pricing problem of den Boer and Zwart (2014) and Keskin and Zeevi (2014), inclusion of any set of covariates in GILS as potential demand covariates (even though they could carry no information) would make GILS asymptotically optimal. We validate our results via extensive numerical simulations on synthetic and real data sets.
We introduce a density-based clustering method called skeleton clustering that can detect clusters in multivariate and even high-dimensional data with irregular shapes. To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations. The clustering framework constructs a concise representation of the given data as an intermediate step and can be thought of as a combination of prototype methods, density-based clustering, and hierarchical clustering. We show by theoretical analysis and empirical studies that the skeleton clustering leads to reliable clusters in multivariate and high-dimensional scenarios.
Single-level density-based approach has long been widely acknowledged to be a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called best-scored clustering forest that can obtain the optimal level and determine corresponding clusters. The terminology best-scored means to select one random tree with the best empirical performance out of a certain number of purely random tree candidates. From the theoretical perspective, we first show that consistency of our proposed algorithm can be guaranteed. Moreover, under certain mild restrictions on the underlying density functions and target clusters, even fast convergence rates can be achieved. Last but not least, comparisons with other state-of-the-art clustering methods in the numerical experiments demonstrate accuracy of our algorithm on both synthetic data and several benchmark real data sets.