Designing reward functions for reinforcement learning is a human skill that comes with experience. Unfortunately, the literature offers no methodology that guides a human in designing a reward function, or that allows one person to transfer reward-design skills to another in a systematic manner. In this paper, we use Systematic Instructional Design, an approach from human education, to engineer a machine education methodology for designing reward functions for reinforcement learning. We demonstrate the methodology by designing a hierarchical genetic reinforcement learner that adopts a neural network representation to evolve a swarm controller for an agent shepherding a boids-based swarm. The results reveal that the methodology is able to guide the design of hierarchical reinforcement learners, with each model in the hierarchy learning incrementally through a multi-part reward function. The hierarchy acts as a decision fusion function that combines the individual behaviours and skills learnt from each instruction to create a smart shepherd that controls the swarm.
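To make the idea of a multi-part reward function concrete, the following is a minimal sketch in Python and not the paper's implementation. The two components (swarm cohesion and progress toward a goal), the weights, and every function and parameter name here are hypothetical, chosen only to illustrate how separate reward terms might be combined into one scalar signal for a shepherding task.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cohesion_reward(swarm, radius):
    """Hypothetical term: 1.0 when every member lies within `radius`
    of the swarm centroid, shrinking toward 0 as the farthest member strays."""
    cx, cy = centroid(swarm)
    farthest = max(math.hypot(x - cx, y - cy) for x, y in swarm)
    if farthest == 0:
        return 1.0
    return min(1.0, radius / farthest)


def goal_reward(swarm, goal, scale):
    """Hypothetical term: 1.0 when the swarm centroid sits on the goal,
    decaying as the centroid-to-goal distance grows."""
    cx, cy = centroid(swarm)
    distance = math.hypot(cx - goal[0], cy - goal[1])
    return 1.0 / (1.0 + distance / scale)


def multi_part_reward(swarm, goal, weights=(0.5, 0.5), radius=2.0, scale=10.0):
    """Weighted sum of the two illustrative terms (weights are assumptions)."""
    w_cohesion, w_goal = weights
    return (w_cohesion * cohesion_reward(swarm, radius)
            + w_goal * goal_reward(swarm, goal, scale))


if __name__ == "__main__":
    tight_swarm = [(0, 0), (2, 0), (0, 2), (2, 2)]
    print(multi_part_reward(tight_swarm, goal=(1, 1)))
```

In a hierarchical setting such as the one the abstract describes, each level could in principle be trained against its own term before the terms are combined; the weighted sum above is only one simple way to fuse them.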
Learning a policy capable of moving an agent between any two states in the environment is important for many robotics problems involving navigation and manipulation. Due to the sparsity of rewards in such tasks, applying reinforcement learning in the
Deep reinforcement learning (RL) has shown great empirical successes, but suffers from brittleness and sample inefficiency. A potential remedy is to use a previously-trained policy as a source of supervision. In this work, we refer to these policies
Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning. In HRL, the policy model has an inner representation structured in levels. With this structure, the reinforcement learning task is
Autonomous agents optimize the reward function we give them. What they don't know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios,
Reinforcement learning problems are often described through rewards that indicate if an agent has completed some task. This specification can yield desirable behavior; however, many problems are difficult to specify in this manner, as one often needs