Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem


Abstract in English

Motivated by the phenomenon that companies introduce new products to keep abreast of customers' rapidly changing tastes, we consider a novel online learning setting in which a profit-maximizing seller needs to learn customers' preferences by offering recommendations. These recommendations may contain both existing products and new products that are launched in the middle of a selling period. We propose a sequential multinomial logit (SMNL) model to characterize customers' behavior when product recommendations are presented in tiers. For the offline version, in which customers' preferences are known, we propose a polynomial-time algorithm and characterize the properties of the optimal tiered product recommendation. For the online problem, we propose a learning algorithm and quantify its regret bound. Moreover, we extend the setting to incorporate a constraint that ensures every new product is learned to a given accuracy. Our results demonstrate that the tier structure can be used to mitigate the risks associated with learning about new products.
