Feature-based dynamic pricing is an increasingly popular model for setting the prices of highly differentiated products, with applications in digital marketing, online sales, real estate, and so on. The problem was formally studied as an online learning problem (Cohen et al., 2016; Javanmard & Nazerzadeh, 2019) in which a seller must propose prices on the fly for a sequence of $T$ products based on their features $x$, while incurring small regret relative to the best (omniscient) pricing strategy she could have come up with in hindsight. We revisit this problem and provide two algorithms (EMLP and ONSP) for the stochastic and adversarial feature settings, respectively, and prove the optimal $O(d\log T)$ regret bounds for both. In comparison, the best existing results are $O\left(\min\left\{\frac{1}{\lambda_{\min}^2}\log T, \sqrt{T}\right\}\right)$ and $O(T^{2/3})$, respectively, where $\lambda_{\min}$ is the smallest eigenvalue of $\mathbb{E}[xx^\top]$ and can be arbitrarily close to $0$. We also prove an $\Omega(\sqrt{T})$ information-theoretic lower bound for a slightly more general setting, which demonstrates that knowing the demand curve leads to an exponential improvement in feature-based dynamic pricing.
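To make the regret notion concrete, here is a minimal sketch of the known-demand-curve setting. It uses assumptions that the abstract does not state: the buyer's value is $\theta^\top x$ plus zero-mean logistic noise with a known scale, a sale occurs when the posted price does not exceed that value, and the revenue-maximizing price is found by a simple grid search. The function names (`expected_revenue`, `optimal_price`, `cumulative_regret`) are hypothetical. This is not an implementation of EMLP or ONSP. It only computes the regret that those algorithms are designed to keep at $O(d\log T)$.

```python
import math

def logistic_cdf(z, scale=1.0):
    """CDF of a zero-mean logistic distribution, computed without overflow."""
    z = z / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def expected_revenue(price, mean_value, scale=1.0):
    """Expected revenue of posting `price` when the buyer's value is mean_value + noise."""
    # A sale happens iff price <= mean_value + noise, i.e. with prob. 1 - F(price - mean_value).
    return price * (1.0 - logistic_cdf(price - mean_value, scale))

def optimal_price(mean_value, scale=1.0, grid_size=10_000):
    """Revenue-maximizing price under the known demand curve (assumed logistic), via grid search."""
    max_price = max(mean_value, 0.0) + 20.0 * scale  # revenue is negligible beyond this point
    grid = [max_price * i / grid_size for i in range(grid_size + 1)]
    return max(grid, key=lambda p: expected_revenue(p, mean_value, scale))

def cumulative_regret(features, theta, prices, scale=1.0):
    """Sum over rounds of (omniscient expected revenue - expected revenue of the posted price)."""
    total = 0.0
    for x, p in zip(features, prices):
        u = sum(t * xi for t, xi in zip(theta, x))  # linear mean valuation theta^T x
        best = expected_revenue(optimal_price(u, scale), u, scale)
        total += best - expected_revenue(p, u, scale)
    return total
```

Posting each round's omniscient price yields zero regret by construction. Any systematically poor price, such as always posting 0, accumulates positive regret that grows linearly in $T$.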
In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of
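Because only a binary sale/no-sale signal is observed, the weight vector of the linear market value can still be estimated by maximum likelihood once a noise distribution is fixed. The sketch below assumes logistic noise with a known scale $s$, so that $P(\text{sale}) = \sigma((\theta^\top x - p)/s)$. It fits $\theta$ by plain gradient descent on the convex negative log-likelihood and uses a hypothetical simulator with an intercept feature. None of these choices (nor the names `fit_mle` and `simulate`) come from the truncated abstract above.

```python
import math
import random

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sigmoid(z):
    return math.exp(log_sigmoid(z))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def neg_log_likelihood(theta, data, scale=1.0):
    """Average NLL of binary outcomes, with P(sale) = sigmoid((theta^T x - price) / scale)."""
    total = 0.0
    for x, price, sold in data:
        z = (dot(theta, x) - price) / scale
        total -= log_sigmoid(z) if sold else log_sigmoid(-z)  # log(1 - sigmoid(z)) = log_sigmoid(-z)
    return total / len(data)

def fit_mle(data, dim, scale=1.0, lr=1.0, steps=300):
    """Gradient descent on the NLL; the per-sample gradient is (sigmoid(z) - y) * x / scale."""
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, price, sold in data:
            z = (dot(theta, x) - price) / scale
            coef = (sigmoid(z) - (1.0 if sold else 0.0)) / scale
            for j in range(dim):
                grad[j] += coef * x[j]
        theta = [t - lr * g / len(data) for t, g in zip(theta, grad)]
    return theta

def simulate(theta_true, n, seed=0):
    """Hypothetical simulator: x = [1, u], u ~ U(-1, 1), price ~ U(0, 2), logistic noise (scale 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [1.0, rng.uniform(-1.0, 1.0)]
        price = rng.uniform(0.0, 2.0)
        sold = rng.random() < sigmoid(dot(theta_true, x) - price)
        data.append((x, price, sold))
    return data
```

Fitting on `simulate([1.0, -0.5], 2000)` should recover coefficients with the correct signs. The price enters as a known offset, which is what makes $\theta$ identifiable from binary feedback alone.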
Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bounds, where $T$ is the number of interactions with the MDP. In this paper
We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results in this setting have demonstrated efficient learning algorithms with regret growing with the square root of the
Existing weighting methods for treatment effect estimation are often built upon the idea of propensity scores or covariate balance. They usually impose strong assumptions on the treatment assignment or the outcome model to obtain unbiased estimates, such as
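As background for the propensity-score idea mentioned above, here is a minimal sketch of the standard inverse propensity weighting (IPW) estimator of the average treatment effect, in both the Horvitz-Thompson and the self-normalized (Hájek) forms. It assumes the propensity scores are given and lie strictly between 0 and 1. The function name `ipw_ate` is hypothetical. This is the classical baseline, not the weighting method this abstract proposes.

```python
def ipw_ate(treated, outcomes, propensity, normalize=True):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    treated: 0/1 indicators; outcomes: observed outcomes;
    propensity: estimated P(treated = 1 | covariates), each strictly in (0, 1).
    """
    n = len(outcomes)
    w1 = [t / e for t, e in zip(treated, propensity)]              # weights for treated units
    w0 = [(1 - t) / (1 - e) for t, e in zip(treated, propensity)]  # weights for control units
    s1 = sum(w * y for w, y in zip(w1, outcomes))
    s0 = sum(w * y for w, y in zip(w0, outcomes))
    if normalize:
        # Hajek form: divide by the total weight in each group.
        return s1 / sum(w1) - s0 / sum(w0)
    # Horvitz-Thompson form: divide by the sample size.
    return s1 / n - s0 / n
```

With a constant propensity of 0.5, both forms reduce to the difference in group means. They diverge when propensities vary, and extreme propensities near 0 or 1 inflate the variance of both. This variance problem is one motivation for alternative weighting schemes.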
We study regret minimization in a stochastic multi-armed bandit setting and establish a fundamental trade-off between the regret suffered under an algorithm and its statistical robustness. Considering broad classes of underlying arm distributions,
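To illustrate the regret quantity being traded off, the sketch below runs UCB1 on Bernoulli arms and accumulates the pseudo-regret $\sum_t (\mu^* - \mu_{a_t})$. UCB1 is a standard algorithm chosen here only as an example and is not taken from the abstract. The arm means, horizon, seed, and the name `ucb1_pseudo_regret` are hypothetical.

```python
import math
import random

def ucb1_pseudo_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pseudo-regret, pull counts per arm)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once to initialize its estimate
        else:
            # Optimistic index: empirical mean plus a confidence bonus sqrt(2 ln t / n_i).
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # pseudo-regret uses true means, not sampled rewards
    return regret, counts
```

By construction, the pseudo-regret equals the sum, over suboptimal arms, of each arm's gap times its pull count. Adjusting the size of the confidence bonus shifts where an algorithm sits between low regret and robustness to misspecified arm distributions.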