Faster and Simpler SNN Simulation with Work Queues

84 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Dennis Bautembach

تاريخ النشر 2019

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Dennis Bautembach - Iason Oikonomidis - Nikolaos Kyriazis

الحوسبة العصبية والتطورية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

We present a clock-driven Spiking Neural Network simulator which is up to 3x faster than the state of the art while, at the same time, being more general and requiring less programming effort on both the users and maintainers side. This is made possible by designing our pipeline around work queues which act as interfaces between stages and greatly reduce implementation complexity. We evaluate our work using three well-established SNN models on a series of benchmarks.

قيم البحث

190 - Dennis Bautembach , Iason Oikonomidis , Antonis Argyros 2021

We present two novel optimizations that accelerate clock-based spiking neural network (SNN) simulators. The first one targets spike timing dependent plasticity (STDP). It combines lazy- with event-driven plasticity and efficiently facilitates the com putation of pre- and post-synaptic spikes using bitfields and integer intrinsics. It offers higher bandwidth than event-driven plasticity alone and achieves a 1.5x-2x speedup over our closest competitor. The second optimization targets spike delivery. We partition our graph representation in a way that bounds the number of neurons that need be updated at any given time which allows us to perform said update in shared memory instead of global memory. This is 2x-2.5x faster than our closest competitor. Both optimizations represent the final evolutionary stages of years of iteration on STDP and spike delivery inside Spice (/spaIk/), our state of the art SNN simulator. The proposed optimizations are not exclusive to our graph representation or pipeline but are applicable to a multitude of simulator designs. We evaluate our performance on three well-established models and compare ourselves against three other state of the art simulators.

الحوسبة العصبية والتطورية الذكاء الاصطناعي النظم الموزعة والتوازية والحوسبة العنقودية

Multi-GPU SNN Simulation with Static Load Balancing

174 - Dennis Bautembach , Iason Oikonomidis , Antonis Argyros 2021

We present a SNN simulator which scales to millions of neurons, billions of synapses, and 8 GPUs. This is made possible by 1) a novel, cache-aware spike transmission algorithm 2) a model parallel multi-GPU distribution scheme and 3) a static, yet ver y effective load balancing strategy. The simulator further features an easy to use API and the ability to create custom models. We compare the proposed simulator against two state of the art ones on a series of benchmarks using three well-established models. We find that our simulator is faster, consumes less memory, and scales linearly with the number of GPUs.

الحوسبة العصبية والتطورية النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Faster and Simpler Siamese Network for Single Object Tracking

95 - Shaokui Jiang , Baile Xu , Jian Zhao 2021

Single object tracking (SOT) is currently one of the most important tasks in computer vision. With the development of the deep network and the release for a series of large scale datasets for single object tracking, siamese networks have been propose d and perform better than most of the traditional methods. However, recent siamese networks get deeper and slower to obtain better performance. Most of these methods could only meet the needs of real-time object tracking in ideal environments. In order to achieve a better balance between efficiency and accuracy, we propose a simpler siamese network for single object tracking, which runs fast in poor hardware configurations while remaining an excellent accuracy. We use a more efficient regression method to compute the location of the tracked object in a shorter time without losing much precision. For improving the accuracy and speeding up the training progress, we introduce the Squeeze-and-excitation (SE) network into the feature extractor. In this paper, we compare the proposed method with some state-of-the-art trackers and analysis their performances. Using our method, a siamese network could be trained with shorter time and less data. The fast processing speed enables combining object tracking with object detection or other tasks in real time.

الرؤية الحاسوبية وتمييز الأنماط

Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces

95 - Frank Zhang , Yongqiang Wang , Xiaohui Zhang 2020

In this work, we first show that on the widely used LibriSpeech benchmark, our transformer-based context-dependent connectionist temporal classification (CTC) system produces state-of-the-art results. We then show that using wordpieces as modeling un its combined with CTC training, we can greatly simplify the engineering pipeline compared to conventional frame-based cross-entropy training by excluding all the GMM bootstrapping, decision tree building and force alignment steps, while still achieving very competitive word-error-rate. Additionally, using wordpieces as modeling units can significantly improve runtime efficiency since we can use larger stride without losing accuracy. We further confirm these findings on two internal VideoASR datasets: German, which is similar to English as a fusional language, and Turkish, which is an agglutinative language.

معالجة الصوت والكلام الحساب واللغة

Deterministic Algorithms for Decremental Approximate Shortest Paths: Faster and Simpler

199 - Maximilian Probst Gutenberg , Christian Wulff-Nilsen 2020

In the decremental $(1+epsilon)$-approximate Single-Source Shortest Path (SSSP) problem, we are given a graph $G=(V,E)$ with $n = |V|, m = |E|$, undergoing edge deletions, and a distinguished source $s in V$, and we are asked to process edge deletion s efficiently and answer queries for distance estimates $widetilde{mathbf{dist}}_G(s,v)$ for each $v in V$, at any stage, such that $mathbf{dist}_G(s,v) leq widetilde{mathbf{dist}}_G(s,v) leq (1+ epsilon)mathbf{dist}_G(s,v)$. In the decremental $(1+epsilon)$-approximate All-Pairs Shortest Path (APSP) problem, we are asked to answer queries for distance estimates $widetilde{mathbf{dist}}_G(u,v)$ for every $u,v in V$. In this article, we consider the problems for undirected, unweighted graphs. We present a new emph{deterministic} algorithm for the decremental $(1+epsilon)$-approximate SSSP problem that takes total update time $O(mn^{0.5 + o(1)})$. Our algorithm improves on the currently best algorithm for dense graphs by Chechik and Bernstein [STOC 2016] with total update time $tilde{O}(n^2)$ and the best existing algorithm for sparse graphs with running time $tilde{O}(n^{1.25}sqrt{m})$ [SODA 2017] whenever $m = O(n^{1.5 - o(1)})$. In order to obtain this new algorithm, we develop several new techniques including improved decremental cover data structures for graphs, a more efficient notion of the heavy/light decomposition framework introduced by Chechik and Bernstein and the first clustering technique to maintain a dynamic emph{sparse} emulator in the deterministic setting. As a by-product, we also obtain a new simple deterministic algorithm for the decremental $(1+epsilon)$-approximate APSP problem with near-optimal total running time $tilde{O}(mn /epsilon)$ matching the time complexity of the sophisticated but rather involved algorithm by Henzinger, Forster and Nanongkai [FOCS 2013].

بنى وهياكل البيانات والخوارزميات

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

الجامعة العربية الدولية الخاصة

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Faster and Simpler SNN Simulation with Work Queues

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً