In reinforcement learning, we often define goals by specifying rewards within desirable states. One problem with this approach is that we typically need to redefine the rewards each time the goal changes, which often requires some understanding of the solution in the agent's environment. When humans learn to complete tasks, we regularly draw on alternative sources of information that guide our understanding of the problem. Such task representations allow one to specify goals on one's own terms, thus providing specifications that can be appropriately interpreted across various environments. This motivates our own work, in which we represent goals in environments that differ from the agent's. We introduce Cross-Domain Perceptual Reward (CDPR) functions: learned rewards that represent the visual similarity between an agent's state and a cross-domain goal image. We report results for learning CDPRs with a deep neural network and using them to solve two tasks with deep reinforcement learning.
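To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of a CDPR-style reward model: two convolutional encoders embed the agent's observation and a goal image from a different domain, and the reward is their embedding similarity. All names, the cosine-similarity choice, and the architecture details are illustrative assumptions.

```python
# Sketch of a cross-domain perceptual reward model (illustrative; details assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoder(nn.Module):
    """Maps an 84x84 RGB image to a fixed-size embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(64 * 7 * 7, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))

class CDPRModel(nn.Module):
    """Separate encoders for the agent's domain and the goal's domain;
    the learned reward is the similarity of their embeddings."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.agent_encoder = ConvEncoder(embed_dim)
        self.goal_encoder = ConvEncoder(embed_dim)

    def forward(self, agent_obs: torch.Tensor, goal_img: torch.Tensor) -> torch.Tensor:
        z_a = self.agent_encoder(agent_obs)
        z_g = self.goal_encoder(goal_img)
        # Similarity in [-1, 1], used as the reward signal for the RL agent.
        return F.cosine_similarity(z_a, z_g, dim=-1)

# Usage inside an RL loop: the goal image is fixed per task, the observation
# comes from the agent's own (different) environment at each step.
model = CDPRModel()
agent_obs = torch.rand(1, 3, 84, 84)   # current frame from the agent's environment
goal_img = torch.rand(1, 3, 84, 84)    # goal image drawn from a different domain
reward = model(agent_obs, goal_img).item()
```

In practice such a model would be trained (e.g., with a deep neural network on paired or labeled images) before being used as the reward for deep reinforcement learning; the training procedure is not shown here.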