ترغب بنشر مسار تعليمي؟ اضغط هنا

MPC-guided Imitation Learning of Neural Network Policies for the Artificial Pancreas

135   0   0.0 ( 0 )
 نشر من قبل Hongkai Chen
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Even though model predictive control (MPC) is currently the main algorithm for insulin control in the artificial pancreas (AP), it usually requires complex online optimizations, which are infeasible for resource-constrained medical devices. MPC also typically relies on state estimation, an error-prone process. In this paper, we introduce a novel approach to AP control that uses Imitation Learning to synthesize neural-network insulin policies from MPC-computed demonstrations. Such policies are computationally efficient and, by instrumenting MPC at training time with full state information, they can directly map measurements into optimal therapy decisions, thus bypassing state estimation. We apply Bayesian inference via Monte Carlo Dropout to learn policies, which allows us to quantify prediction uncertainty and thereby derive safer therapy decisions. We show that our control policies trained under a specific patient model readily generalize (in terms of model parameters and disturbance distributions) to patient cohorts, consistently outperforming traditional MPC with state estimation.



قيم البحث

اقرأ أيضاً

In this paper we investigate the use of MPC-inspired neural network policies for sequential decision making. We introduce an extension to the DAgger algorithm for training such policies and show how they have improved training performance and general ization capabilities. We take advantage of this extension to show scalable and efficient training of complex planning policy architectures in continuous state and action spaces. We provide an extensive comparison of neural network policies by considering feed forward policies, recurrent policies, and recurrent policies with planning structure inspired by the Path Integral control framework. Our results suggest that MPC-type recurrent policies have better robustness to disturbances and modeling error.
Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the st ate of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicles onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.
While deep neural networks (DNNs) and Gaussian Processes (GPs) are both popularly utilized to solve problems in reinforcement learning, both approaches feature undesirable drawbacks for challenging problems. DNNs learn complex nonlinear embeddings, b ut do not naturally quantify uncertainty and are often data-inefficient to train. GPs infer posterior distributions over functions, but popular kernels exhibit limited expressivity on complex and high-dimensional data. Fortunately, recently discovered conjugate and neural tangent kernel functions encode the behavior of overparameterized neural networks in the kernel domain. We demonstrate that these kernels can be efficiently applied to regression and reinforcement learning problems by analyzing a baseline case study. We apply GPs with neural network dual kernels to solve reinforcement learning tasks for the first time. We demonstrate, using the well-understood mountain-car problem, that GPs empowered with dual kernels perform at least as well as those using the conventional radial basis function kernel. We conjecture that by inheriting the probabilistic rigor of GPs and the powerful embedding properties of DNNs, GPs using NN dual kernels will empower future reinforcement learning models on difficult domains.
We exploit recent results in quantifying the robustness of neural networks to input variations to construct and tune a model-based anomaly detector, where the data-driven estimator model is provided by an autoregressive neural network. In tuning, we specifically provide upper bounds on the rate of false alarms expected under normal operation. To accomplish this, we provide a theory extension to allow for the propagation of multiple confidence ellipsoids through a neural network. The ellipsoid that bounds the output of the neural network under the input variation informs the sensitivity - and thus the threshold tuning - of the detector. We demonstrate this approach on a linear and nonlinear dynamical system.
Recently 3D volumetric organ segmentation attracts much research interest in medical image analysis due to its significance in computer aided diagnosis. This paper aims to address the pancreas segmentation task in 3D computed tomography volumes. We p ropose a novel end-to-end network, Globally Guided Progressive Fusion Network, as an effective and efficient solution to volumetric segmentation, which involves both global features and complicated 3D geometric information. A progressive fusion network is devised to extract 3D information from a moderate number of neighboring slices and predict a probability map for the segmentation of each slice. An independent branch for excavating global features from downsampled slices is further integrated into the network. Extensive experimental results demonstrate that our method achieves state-of-the-art performance on two pancreas datasets.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا