ترغب بنشر مسار تعليمي؟ اضغط هنا

Credit Assignment Through Broadcasting a Global Error Vector

98   0   0.0 ( 0 )
 نشر من قبل David Clark
 تاريخ النشر 2021
والبحث باللغة English




اسأل ChatGPT حول البحث

Backpropagation (BP) uses detailed, unit-specific feedback to train deep neural networks (DNNs) with remarkable success. That biological neural circuits appear to perform credit assignment, but cannot implement BP, implies the existence of other powerful learning algorithms. Here, we explore the extent to which a globally broadcast learning signal, coupled with local weight updates, enables training of DNNs. We present both a learning rule, called global error-vector broadcasting (GEVB), and a class of DNNs, called vectorized nonnegative networks (VNNs), in which this learning rule operates. VNNs have vector-valued units and nonnegative weights past the first layer. The GEVB learning rule generalizes three-factor Hebbian learning, updating each weight by an amount proportional to the inner product of the presynaptic activation and a globally broadcast error vector when the postsynaptic unit is active. We prove that these weight updates are matched in sign to the gradient, enabling accurate credit assignment. Moreover, at initialization, these updates are exactly proportional to the gradient in the limit of infinite network width. GEVB matches the performance of BP in VNNs, and in some cases outperforms direct feedback alignment (DFA) applied in conventional networks. Unlike DFA, GEVB successfully trains convolutional layers. Altogether, our theoretical and empirical results point to a surprisingly powerful role for a global learning signal in training DNNs.



قيم البحث

اقرأ أيضاً

Animal brains evolved to optimize behavior in dynamically changing environments, selecting actions that maximize future rewards. A large body of experimental work indicates that such optimization changes the wiring of neural circuits, appropriately m apping environmental input onto behavioral outputs. A major unsolved scientific question is how optimal wiring adjustments, which must target the connections responsible for rewards, can be accomplished when the relation between sensory inputs, action taken, environmental context with rewards is ambiguous. The computational problem of properly targeting cues, contexts and actions that lead to reward is known as structural, contextual and temporal credit assignment respectively. In this review, we survey prior approaches to these three types of problems and advance the notion that the brains specialized neural architectures provide efficient solutions. Within this framework, the thalamus with its cortical and basal ganglia interactions serve as a systems-level solution to credit assignment. Specifically, we propose that thalamocortical interaction is the locus of meta-learning where the thalamus provides cortical control functions that parametrize the cortical activity association space. By selecting among these control functions, the basal ganglia hierarchically guide thalamocortical plasticity across two timescales to enable meta-learning. The faster timescale establishes contextual associations to enable rapid behavioral flexibility while the slower one enables generalization to new contexts. Incorporating different thalamic control functions under this framework clarifies how thalamocortical-basal ganglia interactions may simultaneously solve the three credit assignment problems.
We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the ob served outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithms, and empirically show that they successfully address important credit assignment challenges, through a set of illustrative tasks.
The success of deep learning sparked interest in whether the brain learns by using similar techniques for assigning credit to each synaptic weight for its contribution to the network output. However, the majority of current attempts at biologically-p lausible learning methods are either non-local in time, require highly specific connectivity motives, or have no clear link to any known mathematical optimization method. Here, we introduce Deep Feedback Control (DFC), a new learning method that uses a feedback controller to drive a deep neural network to match a desired output target and whose control signal can be used for credit assignment. The resulting learning rule is fully local in space and time and approximates Gauss-Newton optimization for a wide range of feedback connectivity patterns. To further underline its biological plausibility, we relate DFC to a multi-compartment model of cortical pyramidal neurons with a local voltage-dependent synaptic plasticity rule, consistent with recent theories of dendritic processing. By combining dynamical system theory with mathematical optimization theory, we provide a strong theoretical foundation for DFC that we corroborate with detailed results on toy experiments and standard computer-vision benchmarks.
We present a cerebellar architecture with two main characteristics. The first one is that complex spikes respond to increases in sensory errors. The second one is that cerebellar modules associate particular contexts where errors have increased in th e past with corrective commands that stop the increase in error. We analyze our architecture formally and computationally for the case of reaching in a 3D environment. In the case of motor control, we show that there are synergies of this architecture with the Equilibrium-Point hypothesis, leading to novel ways to solve the motor error problem. In particular, the presence of desired equilibrium lengths for muscles provides a way to know when the error is increasing, and which corrections to apply. In the context of Threshold Control Theory and Perceptual Control Theory we show how to extend our model so it implements anticipative corrections in cascade control systems that span from muscle contractions to cognitive operations.
We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the worl d to improve its predictions. Particularly, we work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from explicit environment-dynamics predictors to more abstract planner-aware models.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا