Representation learning with reward prediction errors

Posted by William Alexander

Publication date: 2021

Research field: Biology

Paper language: English

Authors: William H. Alexander, Samuel J. Gershman

Neurons and Cognition

Abstract

The Reward Prediction Error hypothesis proposes that phasic activity in the midbrain dopaminergic system reflects the prediction errors that drive learning in reinforcement learning algorithms. Besides the well-documented association between dopamine and reward processing, dopamine is implicated in a variety of functions without a clear relationship to reward prediction error. Fluctuations in dopamine levels influence the subjective perception of time, dopamine bursts precede the generation of motor responses, and the dopaminergic system innervates regions of the brain, including the hippocampus and areas of prefrontal cortex, whose function is not uniquely tied to reward. In this manuscript, we propose that a common theme linking these functions is representation, and that prediction errors signaled by the dopamine system, in addition to driving associative learning, can also support the acquisition of adaptive state representations. In a series of simulations, we show how this extension can account for the role of dopamine in temporal and spatial representation, motor response, and abstract categorization tasks. By extending the role of dopamine signals to learning state representations, we resolve a critical challenge to the Reward Prediction Error hypothesis of dopamine function.
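
In temporal-difference (TD) learning the reward prediction error is usually written delta = r + gamma * V(s') - V(s). The sketch below is a minimal illustration, under our own assumptions, of the idea that one prediction error can train both a value estimate and the state representation it is computed from: V(s) = w . phi(s), where the readout w and the per-state feature vectors phi(s) are both updated by semi-gradient steps on the same delta. The five-state chain task, the learning rates, the embedding size, and the function name train are hypothetical choices for illustration; this is not the model simulated in the paper.

```python
import random

# Hypothetical toy task (not from the paper): a 5-state chain where the agent
# always moves right and receives reward 1 when leaving the last state.
N_STATES = 5
GAMMA = 0.9
ALPHA = 0.1   # learning rate for the value readout w
BETA = 0.1    # learning rate for the state representation phi
DIM = 4       # size of each learned state embedding


def train(episodes=2000, seed=0):
    rng = random.Random(seed)
    # phi[s] is a learnable feature vector (the "state representation") for state s.
    phi = [[rng.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(N_STATES)]
    # w maps features to a scalar value estimate: V(s) = w . phi[s]
    w = [rng.uniform(-0.5, 0.5) for _ in range(DIM)]

    def value(s):
        return sum(wi * fi for wi, fi in zip(w, phi[s]))

    for _ in range(episodes):
        for s in range(N_STATES):
            terminal = s == N_STATES - 1
            r = 1.0 if terminal else 0.0
            v_next = 0.0 if terminal else value(s + 1)
            # TD (reward prediction) error: delta = r + gamma * V(s') - V(s)
            delta = r + GAMMA * v_next - value(s)
            w_old = list(w)
            # Semi-gradient update of the readout (dV/dw = phi[s]) ...
            for j in range(DIM):
                w[j] += ALPHA * delta * phi[s][j]
            # ... and of the current state's representation (dV/dphi[s] = w).
            for j in range(DIM):
                phi[s][j] += BETA * delta * w_old[j]
    return [value(s) for s in range(N_STATES)]


if __name__ == "__main__":
    values = train()
    # Values should approach GAMMA ** (steps remaining before the reward).
    print([round(v, 3) for v in values])
```

With phi frozen, the update reduces to ordinary linear TD(0); letting the same delta also move phi(s) is what makes the representation itself adaptive. The learned values should approach 0.9 raised to the number of steps remaining before the reward.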

Related papers

Joel Ye, Chethan Pandarinath (2021)

Neural population activity is theorized to reflect an underlying dynamical structure. This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). However, using recurrence to explicitly model dynamics necessitates sequential processing of data, slowing real-time applications such as brain-computer interfaces. Here we introduce the Neural Data Transformer (NDT), a non-recurrent alternative. We test the NDT's ability to capture autonomous dynamical systems by applying it to synthetic datasets with known dynamics and to data from monkey motor cortex during a reaching task well modeled by RNNs. The NDT models these datasets as well as state-of-the-art recurrent models. Further, its non-recurrence enables 3.9 ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. These results suggest that an explicit dynamics model is not necessary to model autonomous neural population dynamics. Code: https://github.com/snel-repo/neural-data-transformers

Neurons and Cognition, Machine Learning
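
The speed argument above rests on the difference between recurrent and attention-based computation. The following minimal sketch contrasts a linear recurrence, where step t cannot begin until h_{t-1} is known, with a single pass of scaled dot-product self-attention, where each time bin's output is computed from the full input independently of the other outputs and could therefore run in parallel. It is not the NDT itself, which stacks several transformer layers with learned projections; the identity query/key/value projections and the names self_attention and linear_rnn are simplifying assumptions.

```python
import math


def self_attention(x):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    x: list of T time bins, each a list of D features (e.g. binned spike counts).
    Each output row is a softmax-weighted average of all input rows.
    """
    d = len(x[0])
    out = []
    for q in x:  # rows do not depend on each other, so they could run in parallel
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)  # subtract the max before exponentiating for stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def linear_rnn(x, a=0.5):
    """Linear recurrence h_t = a * h_{t-1} + x_t; step t needs h_{t-1} first."""
    h = [0.0] * len(x[0])
    states = []
    for xt in x:  # inherently sequential loop over time
        h = [a * hj + xj for hj, xj in zip(h, xt)]
        states.append(h)
    return states


if __name__ == "__main__":
    spikes = [[0.0, 1.0, 2.0], [1.0, 0.0, 0.0], [3.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(self_attention(spikes))
    print(linear_rnn(spikes))
```

Both functions return one vector per time bin, but only the attention loop has no dependency between iterations, which is the property that lets a non-recurrent model batch inference across time.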

Visual novelty, curiosity, and intrinsic reward in machine learning and the brain

Andrew Jaegle, Vahid Mehrpour, Nicole Rust (2019)

A strong preference for novelty emerges in infancy and is prevalent across the animal kingdom. When incorporated into reinforcement-based machine learning algorithms, visual novelty can act as an intrinsic reward signal that vastly increases the efficiency of exploration and expedites learning, particularly in situations where external rewards are difficult to obtain. Here we review parallels between recent developments in novelty-driven machine learning algorithms and our understanding of how visual novelty is computed and signaled in the primate brain. We propose that in the visual system, novelty representations are not configured with the principal goal of detecting novel objects, but rather with the broader goal of flexibly generalizing novelty information across different states in the service of driving novelty-based learning.

Neurons and Cognition
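
The idea of novelty acting as an intrinsic reward can be made concrete with a count-based bonus, one common machine-learning form of novelty signal. In this sketch, which is our own illustrative choice rather than a method taken from the review, an observation seen n times earns an intrinsic reward beta / sqrt(n) that is added to the extrinsic reward. The class NoveltyBonus, the parameter beta, and the use of a hashable tuple as a stand-in for a visual observation are assumptions.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based novelty signal: intrinsic reward beta / sqrt(n), where n is the
    number of times this observation has been seen, including the current visit."""

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, observation):
        key = tuple(observation)  # hashable stand-in for a (discretized) visual input
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


def shaped_reward(extrinsic, observation, bonus):
    """Total reward seen by the learner: external reward plus novelty bonus."""
    return extrinsic + bonus(observation)


if __name__ == "__main__":
    bonus = NoveltyBonus(beta=0.5)
    for obs in [(0, 1), (0, 1), (1, 1), (0, 1)]:
        print(obs, round(shaped_reward(0.0, obs, bonus), 3))
```

A first visit earns the full beta and the bonus shrinks with every repeat, so rarely seen observations keep paying off after familiar ones stop, which pushes the learner to explore when external rewards are sparse. Exact counting only works for small discrete observation spaces; generalizing novelty across similar states, as the review emphasizes, requires a learned similarity or density model instead.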

A Differential Topological Model for Olfactory Learning and Representation

Jack A. Cook (2020)

This thesis is designed to be a self-contained exposition of the neurobiological and mathematical aspects of sensory perception, memory, and learning, with a bias toward olfaction. The final chapters introduce a new approach to modeling that focuses on the geometry of the system rather than on element-wise dynamics. Additionally, we construct an organism-independent model for olfactory processing, something currently missing from the literature.

Neurons and Cognition

Stock2Vec: A Hybrid Deep Learning Framework for Stock Market Prediction with Representation Learning and Temporal Convolutional Network

Xing Wang, Yijun Wang, Bin Weng (2020)

We propose a global hybrid deep learning framework to predict daily prices in the stock market. Using representation learning, we derive an embedding called Stock2Vec, which gives insight into the relationships among different stocks, while temporal convolutional layers automatically capture effective temporal patterns both within and across series. Evaluated on the S&P 500, our hybrid framework combines the advantages of both components and achieves better performance on the stock price prediction task than several popular benchmark models.

Statistical Finance, Machine Learning
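
Temporal convolutional layers of the kind mentioned above are typically built from causal, dilated 1-D convolutions. The following pure-Python sketch shows only that building block, not the authors' architecture, its learned kernels, residual connections, or the Stock2Vec embedding: each output at time t depends only on inputs at t and earlier, and stacking layers with dilations 1, 2, 4 grows the receptive field exponentially. The function names and the fixed kernel [1, 1] are illustrative assumptions.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; inputs before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # causal: never read from the future, zero-pad the past
                acc += weight * x[idx]
        y.append(acc)
    return y


def stacked_tcn(x, kernel, dilations):
    """Apply one causal dilated convolution per dilation factor, in sequence."""
    for d in dilations:
        x = causal_dilated_conv1d(x, kernel, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of time steps (including t itself) that can influence output t."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    out = stacked_tcn(impulse, kernel=[1.0, 1.0], dilations=[1, 2, 4])
    print(out)
    print(receptive_field(2, [1, 2, 4]))
```

Feeding a unit impulse through the three layers gives nonzero outputs at exactly the first 8 time steps, matching the receptive field 1 + (2 - 1) * (1 + 2 + 4) = 8; a plain (undilated) stack would need 7 layers to cover the same window.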

A mathematical model of reward and executive circuitry in obsessive compulsive disorder

Anca Radulescu, Rachel Marra (2015)

The neuronal circuit that controls obsessive and compulsive behaviors involves a complex network of brain regions (some with known involvement in reward processing). Among these are cortical regions, the striatum and the thalamus (which compose the CSTC pathway), limbic areas such as the amygdala and the hippocampus, as well as dopamine pathways. Abnormal dynamic behavior in this brain network is a hallmark feature of patients with increased anxiety and motor activity, such as those affected by OCD. There is currently no clear understanding of precisely what mechanisms generate these behaviors. We investigate a collection of connectivity hypotheses of OCD by means of a computational model of the brain circuitry that governs reward and motor execution. Mathematically, we use methods from ordinary differential equations and continuous-time dynamical systems. We use classical analytical methods as well as computational approaches to study phenomena in the phase plane (e.g., the behavior of the system's solutions from given initial conditions) and in the parameter space (e.g., sensitive dependence on initial conditions). We find that different obsessive-compulsive subtypes may correspond to different abnormalities in the network connectivity profiles. We suggest that combinations of parameters (connectivity strengths between regions), rather than the value of any one parameter taken independently, provide the best basis for predicting behavior and for understanding the heterogeneity of the illness.

Neurons and Cognition
