ﻻ يوجد ملخص باللغة العربية
Mixture models are an expressive hypothesis class that can approximate a rich set of policies. However, using mixture policies in the Maximum Entropy (MaxEnt) framework is not straightforward. The entropy of a mixture model is not equal to the sum of its components, nor does it have a closed-form expression in most cases. Using such policies in MaxEnt algorithms, therefore, requires constructing a tractable approximation of the mixture entropy. In this paper, we derive a simple, low-variance mixture-entropy estimator. We show that it is closely related to the sum of marginal entropies. Equipped with our entropy estimator, we derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
In this paper, we present a new class of Markov decision processes (MDPs), called Tsallis MDPs, with Tsallis entropy maximization, which generalizes existing maximum entropy reinforcement learning (RL). A Tsallis MDP provides a unified framework for
We propose a method for learning expressive energy-based policies for continuous states and actions, which has been feasible only in tabular domains before. We apply our method to learning maximum entropy policies, resulting into a new algorithm, cal
Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle conv
Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments. To improve their generalizability, data augmentation approaches (e.g. cutout and random
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspi