Subscribe to the gold package and get unlimited access to Shamra Academy

Soft Actor-Critic Algorithms and Applications

181 0 0.0 ( 0 )

Download Cite

Added by Tuomas Haarnoja

Publication date 2018

fields Informatics Engineering

and research's language is English

Authors Tuomas Haarnoja - Aurick Zhou - Kristian Hartikainen

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample complexity and brittleness to hyperparameters. Both of these challenges limit the applicability of such methods to real-world domains. In this paper, we describe Soft Actor-Critic (SAC), our recently introduced off-policy actor-critic algorithm based on the maximum entropy RL framework. In this framework, the actor aims to simultaneously maximize expected return and entropy. That is, to succeed at the task while acting as randomly as possible. We extend SAC to incorporate a number of modifications that accelerate training and improve stability with respect to the hyperparameters, including a constrained formulation that automatically tunes the temperature hyperparameter. We systematically evaluate SAC on a range of benchmark tasks, as well as real-world challenging tasks such as locomotion for a quadrupedal robot and robotic manipulation with a dexterous hand. With these improvements, SAC achieves state-of-the-art performance, outperforming prior on-policy and off-policy methods in sample-efficiency and asymptotic performance. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving similar performance across different random seeds. These results suggest that SAC is a promising candidate for learning in real-world robotics tasks.

rate research

GRAC: Self-Guided and Self-Regularized Actor-Critic

158 - Lin Shao , Yifan You , Mengyuan Yan 2020

Deep reinforcement learning (DRL) algorithms have successfully been demonstrated on a range of challenging decision making and control tasks. One dominant component of recent deep reinforcement learning algorithms is the target network which mitigates the divergence when learning the Q function. However, target networks can slow down the learning process due to delayed function updates. Our main contribution in this work is a self-regularized TD-learning method to address divergence without requiring a target network. Additionally, we propose a self-guided policy improvement method by combining policy-gradient with zero-order optimization to search for actions associated with higher Q-values in a broad neighborhood. This makes learning more robust to local noise in the Q function approximation and guides the updates of our actor network. Taken together, these components define GRAC, a novel self-guided and self-regularized actor critic algorithm. We evaluate GRAC on the suite of OpenAI gym tasks, achieving or outperforming state of the art in every environment tested.

Machine Learning Artificial Intelligence Robotics

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

364 - Tuomas Haarnoja , Aurick Zhou , Pieter Abbeel 2018

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy. That is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.

Machine Learning Artificial Intelligence Machine Learning

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

98 - Shangtong Zhang , Romain Laroche , Harm van Seijen 2020

We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective $(gamma = 1)$ where $gamma^t$ disappears naturally $(1^t = 1)$. We then propose to interpret the discounting in critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Machine Learning Artificial Intelligence

Discriminator Soft Actor Critic without Extrinsic Rewards

128 - Daichi Nishio , Daiki Kuyoshi , Toi Tsuneda 2020

It is difficult to be able to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. The methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a few expert data. However, they often need to interact with the environment. Soft Q imitation learning addressed the problems, and it was shown that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. In order to make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC). It uses a reward function based on adversarial inverse reinforcement learning instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.

Machine Learning Machine Learning

Weighted Entropy Modification for Soft Actor-Critic

118 - Yizhou Zhao , Song-Chun Zhu 2020

We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights, which can be connected with prior knowledge, experience replay, and evolution process of the policy. We propose an algorithm motivated for self-balancing exploration with the introduced weight function, which leads to state-of-the-art performance on Mujoco tasks despite its simplicity in implementation.

Machine Learning Robotics

Soft Actor-Critic Algorithms and Applications

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions