ﻻ يوجد ملخص باللغة العربية
Previous work on policy learning for Malaria control has often formulated the problem as an optimization problem assuming the objective function and the search space have a specific structure. The problem has been formulated as multi-armed bandits, contextual bandits and a Markov Decision Process in isolation. Furthermore, an emphasis is put on developing new algorithms specific to an instance of Malaria control, while ignoring a plethora of simpler and general algorithms in the literature. In this work, we formally study the formulation of Malaria control and present a comprehensive analysis of several formulations used in the literature. In addition, we implement and analyze several reinforcement learning algorithms in all formulations and compare them to black box optimization. In contrast to previous work, our results show that simple algorithms based on Upper Confidence Bounds are sufficient for learning good Malaria policies, and tend to outperform their more advanced counterparts on the malaria OpenAI Gym environment.
Sequential decision-making under cost-sensitive tasks is prohibitively daunting, especially for the problem that has a significant impact on peoples daily lives, such as malaria control, treatment recommendation. The main challenge faced by policymak
Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian explo
A major challenge in modern reinforcement learning (RL) is efficient control of dynamical systems from high-dimensional sensory observations. Learning controllable embedding (LCE) is a promising approach that addresses this challenge by embedding the
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating r
Continuous control tasks in reinforcement learning are important because they provide an important framework for learning in high-dimensional state spaces with deceptive rewards, where the agent can easily become trapped into suboptimal solutions. On