ﻻ يوجد ملخص باللغة العربية
Reinforcement Learning (RL) has made remarkable achievements, but it still suffers from inadequate exploration strategies, sparse reward signals, and deceptive reward functions. These problems motivate the need for a more efficient and directed exploration. For solving this, a Population-guided Novelty Search (PNS) parallel learning method is proposed. In PNS, the population is divided into multiple sub-populations, each of which has one chief agent and several exploring agents. The role of the chief agent is to evaluate the policies learned by exploring agents and to share the optimal policy with all sub-populations. The role of exploring agents is to learn their policies in collaboration with the guidance of the optimal policy and, simultaneously, upload their policies to the chief agent. To balance exploration and exploitation, the Novelty Search (NS) is employed in chief agents to encourage policies with high novelty while maximizing per-episode performance. The introduction of sub-populations and NS mechanisms promote directed exploration and enables better policy search. In the numerical experiment section, the proposed scheme is applied to the twin delayed deep deterministic (TD3) policy gradient algorithm, and the effectiveness of PNS to promote exploration and improve performance in both continuous control domains and discrete control domains is demonstrated. Notably, the proposed method achieves rewards that far exceed the SOTA methods in Delayed MoJoCo environments.
In principle, reinforcement learning and policy search methods can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generaliz
Current deep reinforcement learning (RL) approaches incorporate minimal prior knowledge about the environment, limiting computational and sample efficiency. textit{Objects} provide a succinct and causal description of the world, and many recent works
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors by leveraging both reward feedback and a set of target task demonstrations. Prior approaches for demonstration-guided RL treat every new task as a
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL) to search for convolutional cells, applied to the Procgen benchmark. We outline the initial difficulties of applying neu
The design of efficient and generic algorithms for solving combinatorial optimization problems has been an active field of research for many years. Standard exact solving approaches are based on a clever and complete enumeration of the solution set.