ترغب بنشر مسار تعليمي؟ اضغط هنا

Stochastic Optimal Control as Approximate Input Inference

158   0   0.0 ( 0 )
 نشر من قبل Joe Watson
 تاريخ النشر 2019
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Optimal control of stochastic nonlinear dynamical systems is a major challenge in the domain of robot learning. Given the intractability of the global control problem, state-of-the-art algorithms focus on approximate sequential optimization techniques, that heavily rely on heuristics for regularization in order to achieve stable convergence. By building upon the duality between inference and control, we develop the view of Optimal Control as Input Estimation, devising a probabilistic stochastic optimal control formulation that iteratively infers the optimal input distributions by minimizing an upper bound of the control cost. Inference is performed through Expectation Maximization and message passing on a probabilistic graphical model of the dynamical system, and time-varying linear Gaussian feedback controllers are extracted from the joint state-action distribution. This perspective incorporates uncertainty quantification, effective initialization through priors, and the principled regularization inherent to the Bayesian treatment. Moreover, it can be shown that for deterministic linearized systems, our framework derives the maximum entropy linear quadratic optimal control law. We provide a complete and detailed derivation of our probabilistic approach and highlight its advantages in comparison to other deterministic and probabilistic solvers.

قيم البحث

اقرأ أيضاً

Optimal control under uncertainty is a prevailing challenge in control, due to the difficulty in producing tractable solutions for the stochastic optimization problem. By framing the control problem as one of input estimation, advanced approximate in ference techniques can be used to handle the statistical approximations in a principled and practical manner. Analyzing the Gaussian setting, we present a solver capable of several stochastic control methods, and was found to be superior to popular baselines on nonlinear simulated tasks. We draw connections that relate this inference formulation to previous approaches for stochastic optimal control, and outline several advantages that this inference view brings due to its statistical nature.
The field of reinforcement learning can be split into model-based and model-free methods. Here, we unify these approaches by casting model-free policy optimisation as amortised variational inference, and model-based planning as iterative variational inference, within a `control as hybrid inference (CHI) framework. We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference. Using a didactic experiment, we demonstrate that the proposed algorithm operates in a model-based manner at the onset of learning, before converging to a model-free algorithm once sufficient data have been collected. We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines. CHI thus provides a principled framework for harnessing the sample efficiency of model-based planning while retaining the asymptotic performance of model-free policy optimisation.
In this paper, we use the optimal control methodology to control a flexible, elastic Cosserat rod. An inspiration comes from stereotypical movement patterns in octopus arms, which are observed in a variety of manipulation tasks, such as reaching or f etching. To help uncover the mechanisms underlying these observed morphologies, we outline an optimal control-based framework. A single octopus arm is modeled as a Hamiltonian control system, where the continuum mechanics of the arm is modeled after the Cosserat rod theory, and internal, distributed muscle forces and couples are considered as controls. First order necessary optimality conditions are derived for an optimal control problem formulated for this infinite dimensional system. Solutions to this problem are obtained numerically by an iterative forward-backward algorithm. The state and adjoint equations are solved in a dynamic simulation environment, setting the stage for studying a broader class of optimal control problems. Trajectories that minimize control effort are demonstrated and qualitatively compared with observed behaviors.
We present a deep learning-based adaptive control framework for nonlinear systems with multiplicatively separable parametrization, called aNCM - for adaptive Neural Contraction Metric. The framework utilizes a deep neural network to approximate a sta bilizing adaptive control law parameterized by an optimal contraction metric. The use of deep networks permits real-time implementation of the control law and broad applicability to a variety of systems, including systems modeled with basis function approximation methods. We show using contraction theory that aNCM ensures exponential boundedness of the distance between the target and controlled trajectories even under the presence of the parametric uncertainty, robustly to the learning errors caused by aNCM approximation as well as external additive disturbances. Its superiority to the existing robust and adaptive control methods is demonstrated in a simple cart-pole balancing task.
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state-distribut ion, often resulting in failure to trasnfer underlying distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations encourage a optimal policy that is robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations and model parameters in closed-form. Notably, the resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا