Even Faster SNN Simulation with Lazy+Event-driven Plasticity and Shared Atomics

191 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Dennis Bautembach

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Dennis Bautembach - Iason Oikonomidis - Antonis Argyros

الحوسبة العصبية والتطورية الذكاء الاصطناعي النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators. The first one targets spike timing dependent plasticity (STDP). It combines lazy- with event-driven plasticity and efficiently facilitates the computation of pre- and post-synaptic spikes using bitfields and integer intrinsics. It offers higher bandwidth than event-driven plasticity alone and achieves a 1.5x-2x speedup over our closest competitor. The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need be updated at any given time which allows us to perform said update in shared memory instead of global memory. This is 2x-2.5x faster than our closest competitor. Both optimizations represent the final evolutionary stages of years of iteration on STDP and spike delivery inside Spice (/spaIk/), our state of the art SNN simulator. The proposed optimizations are not exclusive to our graph representation or pipeline but are applicable to a multitude of simulator designs. We evaluate our performance on three well-established models and compare ourselves against three other state of the art simulators.

قيم البحث

83 - Dennis Bautembach , Iason Oikonomidis , Nikolaos Kyriazis 2019

We present a clock-driven Spiking Neural Network simulator which is up to 3x faster than the state of the art while, at the same time, being more general and requiring less programming effort on both the users and maintainers side. This is made possi ble by designing our pipeline around work queues which act as interfaces between stages and greatly reduce implementation complexity. We evaluate our work using three well-established SNN models on a series of benchmarks.

الحوسبة العصبية والتطورية

Multi-GPU SNN Simulation with Static Load Balancing

174 - Dennis Bautembach , Iason Oikonomidis , Antonis Argyros 2021

We present a SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm 2) a model parallel multi-GPU distribution scheme and 3) a static, yet ver y effective load balancing strategy. The simulator further features an easy to use API and the ability to create custom models. We compare the proposed simulator against two state of the art ones on a series of benchmarks using three well-established models. We find that our simulator is faster, consumes less memory, and scales linearly with the number of GPUs.

الحوسبة العصبية والتطورية النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Optimal ANN-SNN Conversion for Fast and Accurate Inference in Deep Spiking Neural Networks

197 - Jianhao Ding , Zhaofei Yu , Yonghong Tian 2021

Spiking Neural Networks (SNNs), as bio-inspired energy-efficient neural networks, have attracted great attentions from researchers and industry. The most efficient way to train deep SNNs is through ANN-SNN conversion. However, the conversion usually suffers from accuracy loss and long inference time, which impede the practical application of SNN. In this paper, we theoretically analyze ANN-SNN conversion and derive sufficient conditions of the optimal conversion. To better correlate ANN-SNN and get greater accuracy, we propose Rate Norm Layer to replace the ReLU activation function in source ANN training, enabling direct conversion from a trained ANN to an SNN. Moreover, we propose an optimal fit curve to quantify the fit between the activation value of source ANN and the actual firing rate of target SNN. We show that the inference time can be reduced by optimizing the upper bound of the fit curve in the revised ANN to achieve fast inference. Our theory can explain the existing work on fast reasoning and get better results. The experimental results show that the proposed method achieves near loss less conversion with VGG-16, PreActResNet-18, and deeper structures. Moreover, it can reach 8.6x faster reasoning performance under 0.265x energy consumption of the typical method. The code is available at https://github.com/DingJianhao/OptSNNConvertion-RNL-RIL.

الحوسبة العصبية والتطورية الذكاء الاصطناعي الرؤية الحاسوبية وتمييز الأنماط

Finding Even Subgraphs Even Faster

166 - Prachi Goyal , Pranabendu Misra , Fahad Panolan 2014

Problems of the following kind have been the focus of much recent research in the realm of parameterized complexity: Given an input graph (digraph) on $n$ vertices and a positive integer parameter $k$, find if there exist $k$ edges (arcs) whose delet ion results in a graph that satisfies some specified parity constraints. In particular, when the objective is to obtain a connected graph in which all the vertices have even degrees---where the resulting graph is emph{Eulerian}---the problem is called Undirected Eulerian Edge Deletion. The corresponding problem in digraphs where the resulting graph should be strongly connected and every vertex should have the same in-degree as its out-degree is called Directed Eulerian Edge Deletion. Cygan et al. [emph{Algorithmica, 2014}] showed that these problems are fixed parameter tractable (FPT), and gave algorithms with the running time $2^{O(k log k)}n^{O(1)}$. They also asked, as an open problem, whether there exist FPT algorithms which solve these problems in time $2^{O(k)}n^{O(1)}$. In this paper we answer their question in the affirmative: using the technique of computing emph{representative families of co-graphic matroids} we design algorithms which solve these problems in time $2^{O(k)}n^{O(1)}$. The crucial insight we bring to these problems is to view the solution as an independent set of a co-graphic matroid. We believe that this view-point/approach will be useful in other problems where one of the constraints that need to be satisfied is that of connectivity.

بنى وهياكل البيانات والخوارزميات التوافقية

BSNN: Towards Faster and Better Conversion of Artificial Neural Networks to Spiking Neural Networks with Bistable Neurons

79 - Yang Li , Yi Zeng , Dongcheng Zhao 2021

The spiking neural network (SNN) computes and communicates information through discrete binary events. It is considered more biologically plausible and more energy-efficient than artificial neural networks (ANN) in emerging neuromorphic hardware. How ever, due to the discontinuous and non-differentiable characteristics, training SNN is a relatively challenging task. Recent work has achieved essential progress on an excellent performance by converting ANN to SNN. Due to the difference in information processing, the converted deep SNN usually suffers serious performance loss and large time delay. In this paper, we analyze the reasons for the performance loss and propose a novel bistable spiking neural network (BSNN) that addresses the problem of spikes of inactivated neurons (SIN) caused by the phase lead and phase lag. Also, when ResNet structure-based ANNs are converted, the information of output neurons is incomplete due to the rapid transmission of the shortcut path. We design synchronous neurons (SN) to help efficiently improve performance. Experimental results show that the proposed method only needs 1/4-1/10 of the time steps compared to previous work to achieve nearly lossless conversion. We demonstrate state-of-the-art ANN-SNN conversion for VGG16, ResNet20, and ResNet34 on challenging datasets including CIFAR-10 (95.16% top-1), CIFAR-100 (78.12% top-1), and ImageNet (72.64% top-1).

الحوسبة العصبية والتطورية الذكاء الاصطناعي