ﻻ يوجد ملخص باللغة العربية
The inherent dynamics of the neuron membrane potential in Spiking Neural Networks (SNNs) allows processing of sequential learning tasks, avoiding the complexity of recurrent neural networks. The highly-sparse spike-based computations in such spatio-temporal data can be leveraged for energy-efficiency. However, the membrane potential incurs additional memory access bottlenecks in current SNN hardware. To that effect, we propose a 10T-SRAM compute-in-memory (CIM) macro, specifically designed for state-of-the-art SNN inference. It consists of a fused weight (WMEM) and membrane potential (VMEM) memory and inherently exploits sparsity in input spikes leading to 97.4% reduction in energy-delay-product (EDP) at 85% sparsity (typical of SNNs considered in this work) compared to the case of no sparsity. We propose staggered data mapping and reconfigurable peripherals for handling different bit-precision requirements of WMEM and VMEM, while supporting multiple neuron functionalities. The proposed macro was fabricated in 65nm CMOS technology, achieving an energy-efficiency of 0.99TOPS/W at 0.85V supply and 200MHz frequency for signed 11-bit operations. We evaluate the SNN for sentiment classification from the IMDB dataset of movie reviews and achieve within 1% accuracy of an LSTM network with 8.5x lower parameters.
Digital In-memory computing improves energy efficiency and throughput of a data-intensive process, which incur memory thrashing and, resulting multiple same memory accesses in a von Neumann architecture. Digital in-memory computing involves accessing
A mass of data transfer between the processing and storage units has been the leading bottleneck in modern Von-Neuman computing systems, especially when used for Artificial Intelligence (AI) tasks. Computing-in-Memory (CIM) has shown great potential
This work proposes a novel Energy-Aware Network Operator Search (ENOS) approach to address the energy-accuracy trade-offs of a deep neural network (DNN) accelerator. In recent years, novel inference operators have been proposed to improve the computa
Memory system is often the main bottleneck in chipmultiprocessor (CMP) systems in terms of latency, bandwidth and efficiency, and recently additionally facing capacity and power problems in an era of big data. A lot of research works have been done t
Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in target ap