In both the human brain and any general artificial intelligence (AI), a representation of the past is necessary to predict the future. However, perfect storage of all experiences is not feasible. One approach utilized in many applications, including reward prediction in reinforcement learning, is to retain recently active features of experience in a buffer. Despite its prior successes, we show that the fixed-length buffer renders Deep Q-learning Networks (DQNs) fragile to changes in the temporal scale over which information can be learned. Recent advances in psychology and neuroscience suggest that the brain maintains a compressed representation of the past, which enables learning when the relevant temporal scales in the environment are not known *a priori*. Here we introduce a neurally plausible, scale-free memory representation we call Scale-Invariant Temporal History (SITH) for use with artificial agents. This representation covers an exponentially large period of time by sacrificing temporal accuracy for events further in the past. We demonstrate the utility of this representation by comparing the performance of agents given SITH, buffer, and exponential decay representations in learning to play video games at different levels of complexity. In these environments, SITH exhibits better learning performance by storing information over longer timescales than a fixed-size buffer and by representing this information more clearly than a set of exponentially decayed features. Finally, we discuss how the application of SITH, along with other human-inspired models of cognition, could improve reinforcement and machine learning algorithms in general.
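To make the contrast between two of the compared representations concrete, the following is a minimal sketch of a fixed-length buffer and a bank of exponentially decayed features with geometrically spaced time constants. It is not the SITH implementation, which the abstract does not specify; the function names, the geometric spacing, and the parameter values are all illustrative assumptions.

```python
import math

def log_spaced_taus(tau_min, tau_max, n):
    """Hypothetical helper: n time constants (n >= 2) with a constant ratio between neighbours."""
    ratio = (tau_max / tau_min) ** (1.0 / (n - 1))
    return [tau_min * ratio ** i for i in range(n)]

def decay_traces(signal, taus, dt=1.0):
    """Exponentially decayed features: one leaky integrator per time constant.

    Each trace follows state <- exp(-dt / tau) * state + x, so traces with a
    large tau retain old inputs longer, but they blur when the input occurred.
    Returns the trace values after the last sample.
    """
    decays = [math.exp(-dt / tau) for tau in taus]
    state = [0.0] * len(taus)
    for x in signal:
        state = [d * s + x for d, s in zip(decays, state)]
    return state

def buffer_features(signal, length):
    """Fixed-length buffer: the last `length` samples, most recent first, zero-padded."""
    padded = [0.0] * length + list(signal)
    return padded[-length:][::-1]

# Assumed toy input: a single event at t = 0 followed by 199 silent steps.
signal = [1.0] + [0.0] * 199
taus = log_spaced_taus(1.0, 100.0, 5)
traces = decay_traces(signal, taus)
buffered = buffer_features(signal, 10)
```

In this toy setting the ten-step buffer holds no trace of the event once it is more than ten steps old, whereas the slowest decaying feature still carries a nonzero value. A handful of geometrically spaced time constants thus spans a wide range of delays, at the cost of temporal precision for older events.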
It is essential for dialogue-based spatial reasoning systems to maintain memory of historical states of the world. In addition to conveying that the dialogue agent is mentally present and engaged with the task, referring to historical states may be c
Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (
Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary
With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides an opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In many real-world and industrial
Detecting and responding to novel situations in open-world environments is a key capability of human cognition. Current artificial intelligence (AI) researchers strive to develop systems that can perform in open-world environments. Novelty detection