Constrained Thompson Sampling for Wireless Link Optimization


Abstract in English

Wireless communication systems operate in complex time-varying environments. Therefore, selecting the optimal configuration parameters in these systems is a challenging problem. For wireless links, emph{rate selection} is used to select the optimal data transmission rate that maximizes the link throughput subject to an application-defined latency constraint. We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates are modeled as independent bandit arms. For this setup, we propose Con-TS, a novel constrained version of the Thompson sampling algorithm, where the latency requirement is modeled by a high-probability linear constraint. We show that for Con-TS, the expected number of constraint violations over T transmission intervals is upper bounded by O(sqrt{KT}), where K is the number of available rates. Further, the expected loss in cumulative throughput compared to the optimal rate selection scheme (i.e., the egret is also upper bounded by O(sqrt{KT log K}). Through numerical simulations, we demonstrate that Con-TS significantly outperforms state-of-the-art bandit schemes for rate selection.

Download