
Distilling Neuron Spike with High Temperature in Reinforcement Learning Agents

Published by Ling Zhang
Publication date: 2021
Research field: Informatics Engineering
Paper language: English





Spiking neural networks (SNNs), compared with deep neural networks (DNNs), offer faster processing, lower energy consumption, and greater biological interpretability, and are expected to bring us closer to strong AI. Reinforcement learning resembles learning in biological systems, so studying the combination of SNNs and RL is of great significance. We propose a spike distillation network (SDN), a reinforcement learning method based on STBP (spatio-temporal backpropagation). By using distillation, this method effectively avoids the weaknesses of STBP, which achieves state-of-the-art performance in classification, and obtains an SNN reinforcement learning model that is smaller, converges faster, and consumes less power. Experiments show that our method converges faster than both traditional SNN reinforcement learning and DNN reinforcement learning methods, by about 1000 epochs, and produces an SNN 200 times smaller than the corresponding DNN. We also deploy SDN on the PKU nc64c chip, demonstrating that SDN consumes less power than a DNN; on large-scale devices, its power consumption is more than 600 times lower. SDN offers a new approach to SNN reinforcement learning and achieves state-of-the-art performance, demonstrating the potential for further development of SNN reinforcement learning.
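The abstract does not spell out the distillation objective, but a common way to realize high-temperature policy distillation from a DNN teacher to an STBP-trained SNN student is to match temperature-softened output distributions with a KL loss. The sketch below is illustrative only: the surrogate-gradient spiking layer, the network sizes, the temperature value, and the `teacher_dqn` interface are assumptions, not the authors' exact implementation.

```python
# Minimal sketch of high-temperature policy distillation from a DNN teacher
# to a rate-coded SNN student. Module names, the surrogate spiking layer,
# and the temperature are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient (STBP-style)."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Pass gradients only near the firing threshold.
        return grad_out * (v.abs() < 0.5).float()

class SNNPolicy(nn.Module):
    """Tiny student: one hidden spiking layer, rate-coded over t_sim steps."""
    def __init__(self, obs_dim, n_actions, hidden=64, t_sim=8):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_actions)
        self.t_sim = t_sim

    def forward(self, x):
        v = torch.zeros(x.shape[0], self.fc1.out_features, device=x.device)
        rate = 0.0
        for _ in range(self.t_sim):
            v = 0.5 * v + self.fc1(x)          # leaky membrane update
            s = SurrogateSpike.apply(v - 1.0)  # spike when v crosses threshold
            v = v * (1.0 - s)                  # reset fired neurons
            rate = rate + s
        return self.fc2(rate / self.t_sim)     # logits from firing rates

def distill_step(teacher_dqn, student, obs, optimizer, temperature=4.0):
    """One distillation update: match the teacher's softened outputs."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher_dqn(obs) / temperature, dim=-1)
    log_probs = F.log_softmax(student(obs) / temperature, dim=-1)
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The `temperature ** 2` factor is the standard rescaling used in temperature-based distillation so that gradient magnitudes stay comparable as the temperature changes; the student decodes actions from spike firing rates accumulated over the simulation window.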




Read also

Modeling spike firing usually assumes that spiking statistics are Poisson, but real data violate this assumption. To capture non-Poissonian features and account for the inevitable inherent irregularity, researchers rescale the time axis at tedious computational cost instead of searching for another distribution. Spikes, or action potentials, are precisely timed changes in the ionic transport through synapses that adjust the synaptic weight, and they have been successfully modeled and developed as a memristor. The memristance value is a multiple of the initial resistance, which is reminiscent of the foundations of quantum mechanics. We attempt to quantize potential and resistance, as was done with energy. After reviewing the Planck curve for blackbody radiation, we propose the quantization equations. We introduce and prove a theorem that quantizes the resistance. We then define the tyke and show its basic characteristics. Finally, we give the basic transformations that model spiking and link an energy quantum to a tyke. Our investigation shows that this models neuron spiking well, with an over 97% match.
Zeyu Zhang, Guisheng Yin (2020)
We propose a general agent population learning system, and on this basis we propose the lineage evolution reinforcement learning algorithm, a derivative algorithm that conforms to the general agent population learning system. We take the agents of DQN and its related variants as the basic agents in the population, and add the selection, mutation, and crossover modules of the genetic algorithm to the reinforcement learning algorithm. During agent evolution, drawing on the characteristics of natural genetic behavior, we add a lineage factor to preserve the potential performance of an agent, and we consider both current performance and lineage value when evaluating an agent. Without changing the parameters of the original reinforcement learning algorithm, lineage evolution reinforcement learning can optimize different reinforcement learning algorithms. Our experiments show that the idea of evolution with lineage improves the performance of the original reinforcement learning algorithm in some Atari 2600 games.
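The abstract describes adding genetic-algorithm selection, mutation, and crossover to a population of DQN-style agents and blending current performance with a lineage value when scoring them. A minimal sketch of one such generation, assuming hypothetical `evaluate`, `mutate`, and `crossover` callables and an illustrative mixing weight `alpha` (not the paper's exact formulation), could look like this:

```python
# Minimal sketch of one lineage-evolution generation over a population of
# DQN-style agents. The lineage bookkeeping and the mixing weight `alpha`
# are illustrative assumptions.
import random

def lineage_fitness(current_reward, lineage_value, alpha=0.7):
    """Blend current performance with the lineage (ancestral) value."""
    return alpha * current_reward + (1.0 - alpha) * lineage_value

def evolve_population(agents, evaluate, mutate, crossover, alpha=0.7):
    """agents: list of dicts {"policy": ..., "lineage": float}."""
    scored = []
    for agent in agents:
        reward = evaluate(agent["policy"])            # e.g. average episode return
        score = lineage_fitness(reward, agent["lineage"], alpha)
        scored.append((score, reward, agent))
    scored.sort(key=lambda t: t[0], reverse=True)

    survivors = [a for _, _, a in scored[: len(scored) // 2]]   # selection
    children = []
    while len(survivors) + len(children) < len(agents):
        p1, p2 = random.sample(survivors, 2)
        child_policy = mutate(crossover(p1["policy"], p2["policy"]))  # variation
        # Child inherits a lineage value derived from its parents.
        child_lineage = 0.5 * (p1["lineage"] + p2["lineage"])
        children.append({"policy": child_policy, "lineage": child_lineage})

    # Survivors fold their latest reward back into their lineage value.
    for _, reward, agent in scored[: len(scored) // 2]:
        agent["lineage"] = 0.5 * (agent["lineage"] + reward)
    return survivors + children
```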
We present a neural architecture search algorithm to construct compact reinforcement learning (RL) policies, by combining ENAS and ES in a highly scalable and intuitive way. By defining the combinatorial search space of NAS to be the set of different edge-partitionings (colorings) into same-weight classes, we represent compact architectures via efficient learned edge-partitionings. For several RL tasks, we manage to learn colorings translating to effective policies parameterized by as few as $17$ weight parameters, providing >90% compression over vanilla policies and 6x compression over state-of-the-art compact policies based on Toeplitz matrices, while still maintaining good reward. We believe that our work is one of the first attempts to propose a rigorous approach to training structured neural network architectures for RL problems that are of interest especially in mobile robotics with limited storage and computational resources.
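One way to picture the edge-partitioning idea is a linear layer whose edges are grouped into weight classes (colors), so the layer is parameterized by only a handful of shared scalars. The sketch below is an assumption-laden illustration: the coloring is random rather than learned with ENAS/ES as in the paper, and `n_colors=17` merely echoes the parameter count mentioned in the abstract.

```python
# Minimal sketch of an edge-colored linear layer: every edge (i, j) is mapped
# to one of n_colors shared weights, so the whole matrix is parameterized by
# only n_colors scalars, regardless of layer size.
import torch
import torch.nn as nn

class EdgeColoredLinear(nn.Module):
    def __init__(self, in_features, out_features, n_colors=17):
        super().__init__()
        # Fixed assignment of each edge to a weight class ("color").
        self.register_buffer(
            "colors", torch.randint(n_colors, (out_features, in_features))
        )
        # One trainable scalar per color; the effective weight matrix is
        # reconstructed by indexing into this small parameter vector.
        self.shared_weights = nn.Parameter(torch.randn(n_colors) * 0.1)

    def forward(self, x):
        weight = self.shared_weights[self.colors]   # (out_features, in_features)
        return x @ weight.t()
```

A compact policy for a small control task could stack two such layers, keeping the total number of distinct trainable weights fixed at `n_colors`.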
Humans and other intelligent animals have evolved highly sophisticated perception systems that combine multiple sensory modalities. State-of-the-art artificial agents, on the other hand, rely mostly on visual inputs or on structured low-dimensional observations provided by instrumented environments. Learning to act from combined visual and auditory inputs is still a new research topic that has not been explored beyond simple scenarios. To facilitate progress in this area, we introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations. We study the performance of different model architectures on a series of tasks that require the agent to recognize sounds and execute instructions given in natural language. Finally, we train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary. We are currently merging the augmented simulator into the main ViZDoom code repository. Video demonstrations and experiment code can be found at https://sites.google.com/view/sound-rl.
Reinforcement learning (RL) research focuses on general solutions that can be applied across different domains, yielding methods that RL practitioners can use in almost any domain. However, recent studies often omit the engineering steps (tricks) that may be needed to use RL effectively, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, to achieve state-of-the-art results and win RL competitions. To ease the engineering effort, we distill descriptions of tricks from state-of-the-art results and study how well these tricks can improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights across multiple domains.
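As one concrete example of such a trick, potential-based reward shaping adds a term gamma * phi(s') - phi(s) to the environment reward, which can speed up learning without changing the optimal policy. The potential function below is a made-up illustration, not taken from the paper.

```python
# Potential-based reward shaping: a generic illustration of one of the
# "tricks" mentioned in the abstract (not the paper's own implementation).
def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    """Return the raw reward plus the shaping term gamma*phi(s') - phi(s)."""
    return reward + gamma * phi(next_state) - phi(state)

# Example potential: negative distance to a hypothetical goal x-position.
phi = lambda s: -abs(s["x"] - s["goal_x"])
r = shaped_reward(0.0, {"x": 3.0, "goal_x": 5.0}, {"x": 4.0, "goal_x": 5.0}, phi)
```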


