Reinforcement learning (RL) has the potential to significantly improve clinical decision making. However, treatment policies learned via RL from observational data are sensitive to subtle choices in study design. We highlight a simple approach, trajectory inspection, to bring clinicians into an iterative design process for model-based RL studies. We identify where the model recommends unexpectedly aggressive treatments or expects surprisingly positive outcomes from its recommendations. Then, we examine clinical trajectories simulated with the learned model and policy alongside the actual hospital course. Applying this approach to recent work on RL for sepsis management, we uncover a model bias towards discharge, a preference for high vasopressor doses that may be linked to small sample sizes, and clinically implausible expectations of discharge without weaning off vasopressors. We hope that iterations of detecting and addressing the issues unearthed by our method will result in RL policies that inspire more confidence in deployment.
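As a rough illustration only (not the authors' implementation), the Python sketch below shows one way trajectory inspection could be operationalized for a tabular model-based RL setup: flag states where the learned policy's expected value is surprisingly high relative to clinician behavior, or where its recommended vasopressor-dose bin is much more aggressive than what clinicians chose, then roll out simulated trajectories from those states for side-by-side review against the real hospital course. All names (`transition_probs`, `pi_rl`, `flag_states`, etc.) and thresholds are hypothetical.

```python
import numpy as np

# Hypothetical learned MDP components (illustrative, not the paper's code):
#   transition_probs[s, a, s'] - learned transition model
#   reward[s, a]               - learned reward model
#   pi_rl[s]                   - action (dose bin) recommended by the RL policy
#   pi_clin[s]                 - most common clinician action in state s
#   value_rl[s]                - estimated value of the RL policy at state s
#   value_clin[s]              - estimated value of clinician behavior at state s

def flag_states(value_rl, value_clin, pi_rl, pi_clin,
                value_gap=1.0, dose_gap=2):
    """Flag states where the model expects surprisingly good outcomes under
    the RL policy, or recommends a notably higher dose bin than clinicians."""
    optimistic = np.where(value_rl - value_clin > value_gap)[0]
    aggressive = np.where(pi_rl - pi_clin > dose_gap)[0]
    return np.union1d(optimistic, aggressive)

def simulate_trajectory(transition_probs, reward, pi_rl, start_state,
                        horizon=20, rng=None):
    """Roll out one trajectory under the learned model and RL policy, for
    comparison with the patient's actual hospital course."""
    if rng is None:
        rng = np.random.default_rng()
    n_states = transition_probs.shape[2]
    traj, s = [], start_state
    for _ in range(horizon):
        a = pi_rl[s]
        traj.append((s, a, reward[s, a]))
        s = rng.choice(n_states, p=transition_probs[s, a])
    return traj
```

A clinical reviewer would then examine the simulated rollout from each flagged state alongside the corresponding real trajectory to judge whether the model's optimism or treatment aggressiveness is plausible.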
Reinforcement learning, as a notable machine learning paradigm, has progressed by leaps and bounds. Compared with reinforcement learning methods that assume a given system model, the methodology of the re
Machine learning, especially deep learning, is dramatically changing the methods associated with optical thin-film inverse design. The vast majority of this research has focused on the parameter optimization (layer thickness and structure size) of o
Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from observations of its behavior on a task. While this problem has been well investigated, the related problem of online IRL---where the observation
Reinforcement Learning (RL) has achieved remarkable successes, but it still suffers from inadequate exploration strategies, sparse reward signals, and deceptive reward functions. These problems motivate the need for a more efficient and directed explo
Meta-reinforcement learning typically requires orders of magnitude more samples than single-task reinforcement learning methods. This is because meta-training needs to deal with more diverse distributions and train extra components such as context en