IntelligentPooling: Practical Thompson Sampling for mHealth

329 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Sabina Tomkins

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Sabina Tomkins - Peng Liao - Predrag Klasnja

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each users degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a users time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial.

قيم البحث

411 - Shipra Agrawal , Navin Goyal 2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical pe rformance compared to the state of the art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+epsilon)sum_i frac{ln T}{Delta_i}+O(frac{N}{epsilon^2})$ and the first near-optimal problem-independent bound of $O(sqrt{NTln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Constrained Thompson Sampling for Wireless Link Optimization

237 - Vidit Saxena , Joseph E. Gonzalez , Ion Stoica 2019

Wireless communication systems operate in complex time-varying environments. Therefore, selecting the optimal configuration parameters in these systems is a challenging problem. For wireless links, emph{rate selection} is used to select the optimal d ata transmission rate that maximizes the link throughput subject to an application-defined latency constraint. We model rate selection as a stochastic multi-armed bandit (MAB) problem, where a finite set of transmission rates are modeled as independent bandit arms. For this setup, we propose Con-TS, a novel constrained version of the Thompson sampling algorithm, where the latency requirement is modeled by a high-probability linear constraint. We show that for Con-TS, the expected number of constraint violations over T transmission intervals is upper bounded by O(sqrt{KT}), where K is the number of available rates. Further, the expected loss in cumulative throughput compared to the optimal rate selection scheme (i.e., the egret is also upper bounded by O(sqrt{KT log K}). Through numerical simulations, we demonstrate that Con-TS significantly outperforms state-of-the-art bandit schemes for rate selection.

التعلم الآلي بنية الشبكات والإنترنت التعلم الالي

Thompson Sampling for Contextual Bandits with Linear Payoffs

496 - Shipra Agrawal , Navin Goyal 2012

التعلم الآلي بنى وهياكل البيانات والخوارزميات التعلم الالي

Thompson Sampling with a Mixture Prior

426 - Joey Hong , Branislav Kveton , Manzil Zaheer 2021

We study Thompson sampling (TS) in online decision-making problems where the uncertain environment is sampled from a mixture distribution. This is relevant to multi-task settings, where a learning agent is faced with different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior -- dubbed MixTS -- and develop a novel, general technique for analyzing the regret of TS with such priors. We apply this technique to derive Bayes regret bounds for MixTS in both linear bandits and tabular Markov decision processes (MDPs). Our regret bounds reflect the structure of the problem and depend on the number of components and confidence width of each component of the prior. Finally, we demonstrate the empirical effectiveness of MixTS in both synthetic and real-world experiments.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Thompson Sampling for Unimodal Bandits

120 - Long Yang , Zhao Li , Zehong Hu 2021

In this paper, we propose a Thompson Sampling algorithm for emph{unimodal} bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, our algorithm makes decision according to posterior distribution only in the neighborhood of the arm that has the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the lower bound of unimodal bandits, thus it is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $mathcal{O}(log T)$, which is far better than standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic data sets and the real-world applications.

التعلم الآلي الذكاء الاصطناعي