Value constrained model-free continuous control

197 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Steven Bohez

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Steven Bohez - Abbas Abdolmaleki - Michael Neunert

علم الروبوتات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The naive application of Reinforcement Learning algorithms to continuous control problems -- such as locomotion and manipulation -- often results in policies which rely on high-amplitude, high-frequency control signals, known colloquially as bang-bang control. Although such solutions may indeed maximize task reward, they can be unsuitable for real world systems. Bang-bang control may lead to increased wear and tear or energy consumption, and tends to excite undesired second-order dynamics. To counteract this issue, multi-objective optimization can be used to simultaneously optimize both the reward and some auxiliary cost that discourages undesired (e.g. high-amplitude) control. In principle, such an approach can yield the sought after, smooth, control policies. It can, however, be hard to find the correct trade-off between cost and return that results in the desired behavior. In this paper we propose a new constraint-based reinforcement learning approach that ensures task success while minimizing one or more auxiliary costs (such as control effort). We employ Lagrangian relaxation to learn both (a) the parameters of a control policy that satisfies the desired constraints and (b) the Lagrangian multipliers for the optimization. Moreover, we demonstrate that we can satisfy constraints either in expectation or in a per-step fashion, and can even learn a single policy that is able to dynamically trade-off between return and cost. We demonstrate the efficacy of our approach using a number of continuous control benchmark tasks, a realistic, energy-optimized quadruped locomotion task, as well as a reaching task on a real robot arm.

قيم البحث

83 - Haitong Ma , Jianyu Chen , Shengbo Eben Li 2021

Model information can be used to predict future trajectories, so it has huge potential to avoid dangerous region when implementing reinforcement learning (RL) on real-world tasks, like autonomous driving. However, existing studies mostly use model-fr ee constrained RL, which causes inevitable constraint violations. This paper proposes a model-based feasibility enhancement technique of constrained RL, which enhances the feasibility of policy using generalized control barrier function (GCBF) defined on the distance to constraint boundary. By using the model information, the policy can be optimized safely without violating actual safety constraints, and the sample efficiency is increased. The major difficulty of infeasibility in solving the constrained policy gradient is handled by an adaptive coefficient mechanism. We evaluate the proposed method in both simulations and real vehicle experiments in a complex autonomous driving collision avoidance task. The proposed method achieves up to four times fewer constraint violations and converges 3.36 times faster than baseline constrained RL approaches.

علم الروبوتات التعلم الآلي

Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

163 - Daniel J. Mankowitz , Dan A. Calian , Rae Jeong 2020

Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, uncalibrated sensors and so on. Such effects effective ly perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain. This can affect a policys ability to maximize future rewards as well as the extent to which it satisfies constraints. We refer to this as constrained model misspecification. We present an algorithm that mitigates this form of misspecification, and showcase its performance in multiple simulated Mujoco tasks from the Real World Reinforcement Learning (RWRL) suite.

التعلم الآلي الذكاء الاصطناعي التعلم الالي

Robust Value Iteration for Continuous Control Tasks

187 - Michael Lutter , Shie Mannor , Jan Peters 2021

When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state-distribut ion, often resulting in failure to trasnfer underlying distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations encourage a optimal policy that is robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations and model parameters in closed-form. Notably, the resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust compared to deep reinforcement learning algorithm and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi

التعلم الآلي علم الروبوتات أنظمة وتحكم

Representation-Free Model Predictive Control for Dynamic Motions in Quadrupeds

259 - Yanran Ding , Abhishek Pandala , Chuanzheng Li 2020

This paper presents a novel Representation-Free Model Predictive Control (RF-MPC) framework for controlling various dynamic motions of a quadrupedal robot in three dimensional (3D) space. Our formulation directly represents the rotational dynamics us ing the rotation matrix, which liberates us from the issues associated with the use of Euler angles and quaternion as the orientation representations. With a variation-based linearization scheme and a carefully constructed cost function, the MPC control law is transcribed to the standard Quadratic Program (QP) form. The MPC controller can operate at real-time rates of 250 Hz on a quadruped robot. Experimental results including periodic quadrupedal gaits and a controlled backflip validate that our control strategy could stabilize dynamic motions that involve singularity in 3D maneuvers.

علم الروبوتات

Optimal Control for Constrained Coverage Path Planning

101 - Ankit Manerikar , Debasmit Das , Pranay Banerjee 2017

The problem of constrained coverage path planning involves a robot trying to cover maximum area of an environment under some constraints that appear as obstacles in the map. Out of the several coverage path planning methods, we consider augmenting th e linear sweep-based coverage method to achieve minimum energy/ time optimality along with maximum area coverage. In addition, we also study the effects of variation of different parameters on the performance of the modified method.

علم الروبوتات أنظمة وتحكم

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة الإسلامية في لبنان

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Value constrained model-free continuous control

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً