
Scale-invariant temporal history (SITH): optimal slicing of the past in an uncertain world

Published by Tyler Spears
Publication date: 2017
Research field: Informatics Engineering
Paper language: English




In both the human brain and any general artificial intelligence (AI), a representation of the past is necessary to predict the future. However, perfect storage of all experiences is not feasible. One approach utilized in many applications, including reward prediction in reinforcement learning, is to retain recently active features of experience in a buffer. Despite its prior successes, we show that a fixed-length buffer renders Deep Q-learning Networks (DQNs) fragile to changes in the scale over which information can be learned. To enable learning when the relevant temporal scales in the environment are not known a priori, recent advances in psychology and neuroscience suggest that the brain maintains a compressed representation of the past. Here we introduce a neurally plausible, scale-free memory representation we call Scale-Invariant Temporal History (SITH) for use with artificial agents. This representation covers an exponentially large period of time by sacrificing temporal accuracy for events further in the past. We demonstrate the utility of this representation by comparing the performance of agents given SITH, buffer, and exponential-decay representations in learning to play video games at different levels of complexity. In these environments, SITH exhibits better learning performance by storing information over longer timescales than a fixed-size buffer can, and by representing this information more clearly than a set of exponentially decayed features. Finally, we discuss how the application of SITH, along with other human-inspired models of cognition, could improve reinforcement and machine learning algorithms in general.
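To make the trade-off concrete, here is a minimal sketch in Python with NumPy. The class name, parameter choices, and update rule are illustrative assumptions, not the published SITH implementation: it keeps a bank of leaky integrators with geometrically spaced time constants, so linearly many units cover an exponentially large stretch of the past, with older inputs retained in an increasingly blurred form.

```python
import numpy as np

class LogCompressedHistory:
    """Toy scale-free memory: a bank of leaky integrators whose decay
    time constants are geometrically spaced, so recent input is kept at
    high temporal resolution while older input survives only in an
    increasingly blurred form."""

    def __init__(self, n_features, n_taus=8, tau_min=1.0, tau_max=1000.0):
        # Geometric spacing: linearly many units span an exponentially
        # large range of past time.
        self.taus = np.geomspace(tau_min, tau_max, n_taus)
        self.state = np.zeros((n_taus, n_features))

    def step(self, x, dt=1.0):
        """Advance one time step with input feature vector x; returns
        the (n_taus, n_features) memory state."""
        decay = np.exp(-dt / self.taus)[:, None]
        self.state = decay * self.state + (1.0 - decay) * np.asarray(x)
        return self.state

# Example: a brief pulse is still visible in the slow units long after
# the fast units have forgotten it.
mem = LogCompressedHistory(n_features=1)
mem.step([1.0])                      # the event
for _ in range(50):
    state = mem.step([0.0])          # 50 steps of silence
print(state.ravel())                 # fast taus ~0, slow taus still nonzero
```

An agent would call step() once per frame and hand the flattened state to its network in place of a fixed-length frame buffer. The published SITH model is richer than this sketch, but it shares the key property that temporal accuracy degrades gracefully for events further in the past.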

Read also

It is essential for dialogue-based spatial reasoning systems to maintain memory of historical states of the world. In addition to conveying that the dialogue agent is mentally present and engaged with the task, referring to historical states may be crucial for enabling collaborative planning (e.g., for planning to return to a previous state, or diagnosing a past misstep). In this paper, we approach the problem of spatial memory in a multi-modal spoken dialogue system capable of answering questions about interaction history in a physical blocks world setting. This work builds upon a full spatial question-answering pipeline consisting of a vision system, speech input and output mediated by an animated avatar, a dialogue system that robustly interprets spatial queries, and a constraint solver that derives answers based on 3-D spatial modelling. The contributions of this work include a symbolic dialogue context registering knowledge about discourse history and changes in the world, as well as a natural language understanding module capable of interpreting free-form historical questions and querying the dialogue context to form an answer.
Natural learners must compute an estimate of future outcomes that follow from a stimulus in continuous time. Widely used reinforcement learning algorithms discretize continuous time and estimate either transition functions from one step to the next (model-based algorithms) or a scalar value of exponentially-discounted future reward using the Bellman equation (model-free algorithms). An important drawback of model-based algorithms is that computational cost grows linearly with the amount of time to be simulated. On the other hand, an important drawback of model-free algorithms is the need to select a time-scale required for exponential discounting. We present a computational mechanism, developed based on work in psychology and neuroscience, for computing a scale-invariant timeline of future outcomes. This mechanism efficiently computes an estimate of inputs as a function of future time on a logarithmically-compressed scale, and can be used to generate a scale-invariant power-law-discounted estimate of expected future reward. The representation of future time retains information about what will happen when. The entire timeline can be constructed in a single parallel operation which generates concrete behavioral and neural predictions. This computational mechanism could be incorporated into future reinforcement learning algorithms.
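As a worked numerical illustration of the scale-invariance claim (our own sketch, not the paper's construction): a log-uniform mixture of exponential discounts at geometrically spaced timescales decays approximately as a power law, so no single discount timescale has to be fixed in advance.

```python
import numpy as np

# Mix exponential discounts exp(-t/tau) over geometrically spaced taus,
# weighted by 1/tau (log-uniform over timescales). Over the covered
# range the mixture decays roughly as 1/t -- a scale-free power-law
# discount rather than a fixed-tau exponential.
taus = np.geomspace(1.0, 1000.0, 30)
t = np.arange(1.0, 200.0)
mixture = (np.exp(-t[None, :] / taus[:, None]) / taus[:, None]).sum(axis=0)
mixture /= mixture[0]           # normalize to 1 at t = 1
print(mixture[[0, 9, 99]])      # t = 1, 10, 100: falls roughly tenfold
                                # per decade, apart from edge effects
```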
Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.
With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides an opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In many real-world and industrial applications, it is typical to have an existing control strategy, for instance, execution from a human operator. The objective of this work is to improve upon this unknown, safe but suboptimal policy by learning a new controller that retains safety and stability. Learning how to be safe is achieved directly from data and from knowledge of the system constraints. The proposed algorithm alternately learns the terminal cost and updates the MPC parameters according to a stability metric. The terminal cost is constructed as a Lyapunov function neural network with the aim of recovering or extending the stable region of the initial demonstrator using a short prediction horizon. Theorems that characterize the stability and performance of the learned MPC in the presence of model uncertainties and sub-optimality due to function approximation are presented. The efficacy of the proposed algorithm is demonstrated on non-linear continuous control tasks with soft constraints. The proposed approach can also improve upon the initial demonstrator in practice and achieve better stability than popular reinforcement learning baselines.
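For intuition about the terminal-cost construction: one common way to make a neural network a valid Lyapunov candidate is to square its output and add a small positive-definite quadratic term. The sketch below is a minimal illustration under that assumption (weights, shapes, and names are our own, not the paper's architecture).

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 4))    # hidden-layer weights (illustrative shapes)
W2 = rng.normal(size=(8, 16))    # output-layer weights

def lyapunov_nn(x, eps=1e-3):
    """Positive-definite terminal cost: V(0) = 0 and V(x) > 0 for x != 0,
    because the squared network output is added to a small quadratic term."""
    h = np.tanh(W1 @ x)           # hidden features of state x
    g = W2 @ h                    # unconstrained network output
    return g @ g + eps * (x @ x)

x = rng.normal(size=4)
assert lyapunov_nn(np.zeros(4)) == 0.0 and lyapunov_nn(x) > 0.0
```

Training would then adjust the weights so that V decreases along closed-loop trajectories, the usual Lyapunov decrease condition that such a stability metric encodes.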
Detecting and responding to novel situations in open-world environments is a key capability of human cognition. Current artificial intelligence (AI) researchers strive to develop systems that can perform in open-world environments. Novelty detection is an important ability of such AI systems. In an open world, novelties appear in various forms, and the difficulty of detecting them varies. Therefore, to accurately evaluate the detection capability of AI systems, it is necessary to investigate the difficulty of detecting novelties. In this paper, we propose a qualitative physics-based method to quantify the difficulty of novelty detection, focusing on open-world physical domains. We apply our method in a popular physics simulation game, Angry Birds. We conduct an experiment in which human players encounter different novelties in Angry Birds to validate our method. The results indicate that the calculated difficulty values are in line with the detection difficulty experienced by the human players.
