
Representation learning with reward prediction errors

Published by William Alexander
Publication date: 2021
Research field: Biology
Language: English





The Reward Prediction Error hypothesis proposes that phasic activity in the midbrain dopaminergic system reflects prediction errors needed for learning in reinforcement learning. Besides the well-documented association between dopamine and reward processing, dopamine is implicated in a variety of functions without a clear relationship to reward prediction error. Fluctuations in dopamine levels influence the subjective perception of time, dopamine bursts precede the generation of motor responses, and the dopaminergic system innervates regions of the brain, including hippocampus and areas in prefrontal cortex, whose function is not uniquely tied to reward. In this manuscript, we propose that a common theme linking these functions is representation, and that prediction errors signaled by the dopamine system, in addition to driving associative learning, can also support the acquisition of adaptive state representations. In a series of simulations, we show how this extension can account for the role of dopamine in temporal and spatial representation, motor response, and abstract categorization tasks. By extending the role of dopamine signals to learning state representations, we resolve a critical challenge to the Reward Prediction Error hypothesis of dopamine function.
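The prediction error referred to in the abstract is commonly formalized as the temporal-difference (TD) error, delta = r + gamma * V(s') - V(s). The following is a minimal sketch of tabular TD(0) learning on a toy chain of states, included only to illustrate that quantity. It is not the authors' model: the function name `td0_learn`, the chain task, and all parameter values are assumptions made for this example.

```python
def td0_learn(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a hypothetical deterministic chain.

    The agent visits states 0..n_states-1 in order and receives a reward
    of 1.0 only in the final state. Returns the learned state values and
    the absolute prediction errors observed during the last episode.
    """
    values = [0.0] * n_states
    last_errors = []
    for _ in range(episodes):
        last_errors = []
        for s in range(n_states):
            reward = 1.0 if s == n_states - 1 else 0.0
            # The value after the terminal state is defined as zero.
            next_value = values[s + 1] if s + 1 < n_states else 0.0
            # Reward prediction error (TD error) for this transition.
            delta = reward + gamma * next_value - values[s]
            # Move the value estimate a small step toward the target.
            values[s] += alpha * delta
            last_errors.append(abs(delta))
    return values, last_errors


if __name__ == "__main__":
    values, errors = td0_learn()
    print("learned values:", [round(v, 3) for v in values])
    print("largest final prediction error:", max(errors))
```

In this toy setting the values approach gamma raised to the number of steps remaining before the reward, and the prediction errors shrink toward zero as the predictions become accurate. The abstract's proposal that the same error signal could also shape state representations is not implemented here.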




Read also

Neural population activity is theorized to reflect an underlying dynamical structure. This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). However, using recurrence to explicitly model dynamics necessitates sequential processing of data, slowing real-time applications such as brain-computer interfaces. Here we introduce the Neural Data Transformer (NDT), a non-recurrent alternative. We test the NDT's ability to capture autonomous dynamical systems by applying it to synthetic datasets with known dynamics and data from monkey motor cortex during a reaching task well-modeled by RNNs. The NDT models these datasets as well as state-of-the-art recurrent models. Further, its non-recurrence enables 3.9ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. These results suggest that an explicit dynamics model is not necessary to model autonomous neural population dynamics. Code: https://github.com/snel-repo/neural-data-transformers
A strong preference for novelty emerges in infancy and is prevalent across the animal kingdom. When incorporated into reinforcement-based machine learning algorithms, visual novelty can act as an intrinsic reward signal that vastly increases the efficiency of exploration and expedites learning, particularly in situations where external rewards are difficult to obtain. Here we review parallels between recent developments in novelty-driven machine learning algorithms and our understanding of how visual novelty is computed and signaled in the primate brain. We propose that in the visual system, novelty representations are not configured with the principal goal of detecting novel objects, but rather with the broader goal of flexibly generalizing novelty information across different states in the service of driving novelty-based learning.
Jack A. Cook, 2020
This thesis is designed to be a self-contained exposition of the neurobiological and mathematical aspects of sensory perception, memory, and learning, with a bias towards olfaction. The final chapters introduce a new approach to modeling that focuses more on the geometry of the system than on element-wise dynamics. Additionally, we construct an organism-independent model for olfactory processing, something which is currently missing from the literature.
Xing Wang, Yijun Wang, Bin Weng, 2020
We propose a global hybrid deep learning framework to predict daily prices in the stock market. With representation learning, we derive an embedding called Stock2Vec, which gives insight into the relationships among different stocks, while temporal convolutional layers automatically capture effective temporal patterns both within and across series. Evaluated on the S&P 500, our hybrid framework integrates both advantages and achieves better performance on the stock price prediction task than several popular benchmark models.
The neuronal circuit that controls obsessive and compulsive behaviors involves a complex network of brain regions (some with known involvement in reward processing). Among these are cortical regions, the striatum and the thalamus (which compose the CSTC pathway), limbic areas such as the amygdala and the hippocampus, as well as dopamine pathways. Abnormal dynamic behavior in this brain network is a hallmark feature of patients with increased anxiety and motor activity, such as those affected by OCD. There is currently no clear understanding of precisely what mechanisms generate these behaviors. We attempt to investigate a collection of connectivity hypotheses of OCD by means of a computational model of the brain circuitry that governs reward and motion execution. Mathematically, we use methods from ordinary differential equations and continuous time dynamical systems. We use classical analytical methods as well as computational approaches to study phenomena in the phase plane (e.g., behavior of the system's solutions when given certain initial conditions) and in the parameter space (e.g., sensitive dependence on initial conditions). We find that different obsessive-compulsive subtypes may correspond to different abnormalities in the network connectivity profiles. We suggest that it is combinations of parameters (connectivity strengths between regions), rather than the value of any one parameter taken independently, that provide the best basis for predicting behavior and for understanding the heterogeneity of the illness.