In path integral control problems, a representation of an optimally controlled dynamical system can be formally computed and can serve as a guidepost for learning a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite-smoothing limit of this technique and show that the sample-efficiency problems from which PICE suffers disappear at finite levels of smoothing. At zero smoothing the method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost optimization.
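The abstract does not spell out the smoothing construction, so the following is only a grounding sketch in generic notation (the cost $C$, policy distribution $p_\theta$, and smoothing parameter $\gamma$ are our labels, not necessarily the paper's): an inf-convolution of an expected cost with a KL divergence admits the classical closed form

\[
C_\gamma(p_\theta) \;=\; \inf_{q}\Big\{ \mathbb{E}_{q}[C] \;+\; \tfrac{1}{\gamma}\,\mathrm{KL}\big(q \,\|\, p_\theta\big) \Big\}
\;=\; -\tfrac{1}{\gamma}\,\log \mathbb{E}_{p_\theta}\!\big[e^{-\gamma C}\big],
\]

with minimizer $q^{*} \propto p_\theta\, e^{-\gamma C}$. Under this reading, $\gamma \to 0$ forces $q \to p_\theta$ and recovers greedy optimization of $\mathbb{E}_{p_\theta}[C]$, while $\gamma \to \infty$ matches the policy to a cost-reweighted posterior, which is the flavor of update used in PICE.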
In this paper we investigate the problem of controlling a partially observed stochastic dynamical system such that its state is difficult to infer using a (fixed-interval) Bayesian smoother. This problem arises naturally in applications in which it i
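For context, fixed-interval Bayesian smoothing means inferring each state from the complete observation record. In generic notation (ours, not the paper's), with states $x_{0:T}$ and observations $y_{1:T}$, the smoother targets the marginals

\[
p(x_t \mid y_{1:T}), \qquad 0 \le t \le T,
\]

so a controller that keeps these marginals uninformative is working against an estimator that sees both past and future observations, not just a causal filter.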
Particle smoothing methods are used for inference of stochastic processes based on noisy observations. Typically, estimating the marginal posterior distribution given all observations is cumbersome and computationally intensive. In this paper, w
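To make the computational burden concrete, here is a minimal sketch of classical forward-filtering backward-smoothing (FFBSm) on a toy linear-Gaussian model; the model, parameter names, and the O(N^2) backward pass are illustrative assumptions, not this paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Toy model: x_t = A*x_{t-1} + process noise, y_t = x_t + obs noise.
A, Q, R, T, N = 0.9, 0.5, 1.0, 50, 300

# Simulate a latent trajectory and noisy observations.
x = np.zeros(T)
for t in range(1, T):
    x[t] = A * x[t - 1] + rng.normal(0.0, np.sqrt(Q))
y = x + rng.normal(0.0, np.sqrt(R), size=T)

def log_gauss(z, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (z - mean) ** 2 / var)

# Forward pass: bootstrap particle filter with multinomial resampling.
particles = np.zeros((T, N))
weights = np.zeros((T, N))
particles[0] = rng.normal(0.0, 1.0, size=N)
logw = log_gauss(y[0], particles[0], R)
weights[0] = np.exp(logw - logw.max())
weights[0] /= weights[0].sum()
for t in range(1, T):
    idx = rng.choice(N, size=N, p=weights[t - 1])              # resample
    particles[t] = A * particles[t - 1][idx] + rng.normal(0.0, np.sqrt(Q), size=N)
    logw = log_gauss(y[t], particles[t], R)                    # reweight
    weights[t] = np.exp(logw - logw.max())
    weights[t] /= weights[t].sum()

# Backward pass: reweight filter particles so they target p(x_t | y_{1:T}).
smooth_w = weights.copy()
for t in range(T - 2, -1, -1):
    # trans[j, i] = p(x_{t+1}^j | x_t^i): the O(N^2) step that makes
    # marginal smoothing given all observations expensive.
    trans = np.exp(log_gauss(particles[t + 1][:, None], A * particles[t][None, :], Q))
    denom = trans @ weights[t]
    smooth_w[t] = weights[t] * ((trans / denom[:, None]).T @ smooth_w[t + 1])

smoothed_mean = (smooth_w * particles).sum(axis=1)
print("smoothed posterior mean at t=0:", smoothed_mean[0])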
This paper presents a constrained deep adaptive dynamic programming (CDADP) algorithm to solve general nonlinear optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly handle problems with state constraints.
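As a rough illustration of what one constrained policy-improvement step with known dynamics can look like, the sketch below differentiates a rollout cost augmented with an exterior penalty on state-constraint violations; the dynamics f, cost, constraint g, and penalty weight rho are our assumptions, and CDADP's actual constraint handling may differ.

import torch

policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def f(x, u):                       # known dynamics: toy double integrator
    return x + 0.1 * torch.cat([x[:, 1:2], u], dim=1)

def running_cost(x, u):
    return (x ** 2).sum(dim=1) + 0.1 * (u ** 2).sum(dim=1)

def g(x):                          # state constraint g(x) <= 0: |position| <= 1
    return x[:, 0].abs() - 1.0

rho, horizon = 10.0, 20            # penalty weight, rollout length
for step in range(200):
    x = 2 * torch.rand(64, 2) - 1  # batch of initial states
    cost = torch.zeros(64)
    for _ in range(horizon):       # differentiate through the rollout
        u = policy(x)
        cost = cost + running_cost(x, u) + rho * torch.relu(g(x))
        x = f(x, u)
    loss = cost.mean()
    opt.zero_grad(); loss.backward(); opt.step()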
We apply the meta reinforcement learning framework to optimize an integrated and adaptive guidance and flight control system for an air-to-air missile, implementing the system as a deep neural network (the policy). The policy maps observations direct
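Meta-RL policies of this kind are commonly realized as recurrent networks, so that hidden state lets the controller adapt within an engagement; the following is a minimal sketch only, with layer sizes and input/output dimensions chosen by us rather than taken from the paper.

import torch

class RecurrentPolicy(torch.nn.Module):
    def __init__(self, obs_dim=8, act_dim=3, hidden=64):
        super().__init__()
        self.gru = torch.nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim); the hidden state h carries the
        # within-episode adaptation typical of meta-RL policies.
        out, h = self.gru(obs_seq, h)
        return torch.tanh(self.head(out)), h   # bounded commands

policy = RecurrentPolicy()
commands, h = policy(torch.randn(1, 10, 8))    # observations in, commands out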
Motivated by the lack of systematic tools to obtain safe control laws for hybrid systems, we propose an optimization-based framework for learning certifiably safe control laws from data. In particular, we assume a setting in which the system dynamics
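One standard way to certify safety of a learned control law is through a control barrier function; as a generic illustration (notation ours, and stated here only for continuous flows, whereas hybrid systems add discrete jumps), a function $h$ with safe set $\{x : h(x) \ge 0\}$ certifies a control law $u(x)$ for control-affine dynamics $\dot{x} = f(x) + g(x)u$ if

\[
\nabla h(x)^{\top}\big(f(x) + g(x)\,u(x)\big) \;\ge\; -\alpha\big(h(x)\big) \quad \text{for all } x,
\]

with $\alpha$ an extended class-$\mathcal{K}$ function; data-driven variants typically enforce this inequality, with a robustness margin, at sampled states.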