Spiking Neural Networks (SNNs) offer an event-driven and more biologically realistic alternative to standard Artificial Neural Networks based on analog information processing. This can potentially enable energy-efficient hardware implementations of neuromorphic systems that emulate the functional units of the brain, namely neurons and synapses. Recent demonstrations of ultra-fast photonic computing devices based on phase-change materials (PCMs) show promise for addressing the limitations of electrically driven neuromorphic systems. However, scaling these standalone computing devices to a parallel in-memory computing primitive is a challenge. In this work, we utilize the optical properties of the PCM Ge₂Sb₂Te₅ (GST) to propose a Photonic Spiking Neural Network computing primitive, comprising a non-volatile synaptic array integrated seamlessly with previously explored integrate-and-fire neurons. The proposed design realizes an in-memory computing platform that leverages the inherent parallelism of wavelength-division multiplexing (WDM). We show that the proposed computing platform can be used to emulate an SNN inferencing engine for image classification tasks. The proposed design not only bridges the gap between isolated computing devices and parallel large-scale implementation, but also paves the way for ultra-fast computing and localized on-chip learning.
Despite the huge success of artificial intelligence, hardware systems running these algorithms consume orders of magnitude more energy than the human brain, mainly due to heavy data movement between the memory unit and the computation cores. Spiking neural networks (SNNs) built using bio-plausible neuron and synaptic models have emerged as the power-efficient choice for designing cognitive applications. These algorithms involve several lookup-table (LUT) based function evaluations, such as high-order polynomials and transcendental functions for solving complex neuro-synaptic models, which typically require additional storage. To that effect, we propose SPARE, an in-memory, distributed processing architecture built on ROM-embedded RAM technology, for accelerating SNNs. ROM-embedded RAMs allow storage of LUTs, embedded within a typical memory array, without additional area overhead. Our proposed architecture consists of a 2-D array of Processing Elements (PEs). Since most of the computations are done locally within each PE, unnecessary data transfers are restricted, thereby alleviating the von Neumann bottleneck. We evaluate SPARE for two different ROM-embedded RAM structures: CMOS-based ROM-embedded SRAMs (R-SRAMs) and STT-MRAM-based ROM-embedded MRAMs (R-MRAMs). Moreover, we analyze trade-offs in terms of energy, area, and performance for using the two technologies on a range of image classification benchmarks. Furthermore, we leverage the additional storage density to implement complex neuro-synaptic functionalities. This enhances the utility of the proposed architecture by allowing implementation of any neuron or synaptic behavior required by the application. Our results show up to 1.75x, 1.95x, and 1.95x improvement in energy, iso-storage area, and iso-area performance, respectively, by using neural network accelerators built on ROM-embedded RAM primitives.
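As a rough illustration of the LUT-based function evaluation mentioned above, the following sketch precomputes a transcendental function on a uniform grid and answers queries by nearest-entry lookup. The helper names `build_lut` and `lut_eval`, the uniform grid, and the nearest-neighbor rounding are all assumptions made for this example; they are not taken from SPARE.

```python
import math


def build_lut(fn, lo, hi, n):
    """Tabulate fn at n evenly spaced points in [lo, hi] (hypothetical helper)."""
    step = (hi - lo) / (n - 1)
    table = [fn(lo + i * step) for i in range(n)]
    return table, lo, step


def lut_eval(lut, x):
    """Return the stored value at the grid point nearest to x.

    Inputs outside the tabulated range are clamped to the first or last entry.
    """
    table, lo, step = lut
    i = round((x - lo) / step)
    i = max(0, min(len(table) - 1, i))
    return table[i]


# Example: a 5-entry table of exp(x) on [0, 1].
exp_lut = build_lut(math.exp, 0.0, 1.0, 5)
approx = lut_eval(exp_lut, 0.5)
```

A larger table trades storage for accuracy, which is the kind of storage cost the abstract says ROM-embedded RAM is meant to absorb.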
In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy of 71.6% on the ImageNet benchmark after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one-day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration.
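The differential configuration described above can be sketched as follows, assuming each signed weight is stored as the difference of two non-negative conductances and that the crossbar output is the sum of conductance-weighted inputs. The function names, the parameters `w_max` and `g_max`, and the ideal noise-free mapping are illustrative assumptions, not the programming scheme used in the work.

```python
def to_differential(weights, w_max, g_max=1.0):
    """Map each signed weight to a (g_plus, g_minus) pair of conductances.

    Positive weights go on g_plus, negative weights on g_minus.
    Magnitudes above w_max are clipped (assumed device range limit).
    """
    pairs = []
    for row in weights:
        pair_row = []
        for w in row:
            g = min(abs(w), w_max) / w_max * g_max
            pair_row.append((g, 0.0) if w >= 0 else (0.0, g))
        pairs.append(pair_row)
    return pairs


def crossbar_mvm(pairs, x, w_max, g_max=1.0):
    """Idealized matrix-vector product from differential conductance pairs."""
    scale = w_max / g_max  # convert conductance units back to weight units
    return [
        scale * sum((gp - gm) * xi for (gp, gm), xi in zip(row, x))
        for row in pairs
    ]


weights = [[0.5, -0.25], [-1.0, 0.75]]
pairs = to_differential(weights, w_max=1.0)
y = crossbar_mvm(pairs, [2.0, 4.0], w_max=1.0)
```

In real hardware each conductance would carry programming noise and drift over time, which is why the abstract stresses noise-aware training and compensation; this sketch omits both.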
Collocated data processing and storage are the norm in biological systems. Indeed, the von Neumann computing architecture, which physically and temporally separates processing and memory, was born more of pragmatism based on available technology. As our ability to create better hardware improves, new computational paradigms are being explored. Integrated photonic circuits are regarded as an attractive solution for on-chip computing using only light, leveraging the increased speed and bandwidth potential of working in the optical domain, and importantly, removing the need for time- and energy-sapping electro-optical
Spiking recurrent neural networks (RNNs) are a promising tool for solving a wide variety of complex cognitive and motor tasks, due to their rich temporal dynamics and sparse processing. However, training spiking RNNs on dedicated neuromorphic hardware is still an open challenge. This is due mainly to the lack of local, hardware-friendly learning mechanisms that can solve the temporal credit assignment problem and ensure stable network dynamics, even when the weight resolution is limited. These challenges are further accentuated if one resorts to using memristive devices for in-memory computing to resolve the von Neumann bottleneck problem, at the expense of a substantial increase in variability in both the computation and the working memory of the spiking RNNs. To address these challenges and enable online learning in memristive neuromorphic RNNs, we present a simulation framework of differential-architecture crossbar arrays based on an accurate and comprehensive Phase-Change Memory (PCM) device model. We train a spiking RNN whose weights are emulated in the presented simulation framework, using the recently proposed e-prop learning rule. Although e-prop locally approximates the ideal synaptic updates, it is difficult to implement the updates on the memristive substrate due to substantial PCM non-idealities. We compare several widely adopted weight update schemes that primarily aim to cope with these device non-idealities and demonstrate that accumulating gradients can enable online and efficient training of spiking RNNs on memristive substrates.
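One common way to realize gradient accumulation for low-resolution devices is to keep a high-precision accumulator beside each synapse and program the device only when the accumulated update reaches a whole device step. The sketch below illustrates that general idea. The class name `AccumulatingSynapse`, the fixed `step` size, and the ideal noise-free device write are assumptions for this example and are not claimed to match the scheme evaluated in the work.

```python
class AccumulatingSynapse:
    """Toy synapse with a coarse device weight and a fine-grained accumulator."""

    def __init__(self, step, weight=0.0):
        self.step = step      # assumed smallest reliable device update
        self.weight = weight  # value stored on the (simulated) device
        self.acc = 0.0        # high-precision accumulator kept off-device

    def update(self, grad, lr):
        """Accumulate -lr * grad and program the device in whole steps only.

        Returns the signed number of programming pulses applied.
        """
        self.acc -= lr * grad
        pulses = int(self.acc / self.step)  # truncates toward zero
        if pulses:
            self.weight += pulses * self.step
            self.acc -= pulses * self.step
        return pulses


syn = AccumulatingSynapse(step=0.5)
first = syn.update(grad=-0.25, lr=1.0)   # too small to program the device
second = syn.update(grad=-0.25, lr=1.0)  # accumulated update reaches one step
```

Small gradients that would be lost to the device's limited resolution are retained in the accumulator, at the cost of extra memory per synapse.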
Neuromorphic hardware platforms implement biological neurons and synapses to execute spiking neural networks (SNNs) in an energy-efficient manner. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition SNNs into clusters of synapses, where intra-cluster local synapses are mapped within crossbars of the hardware and inter-cluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion on the shared interconnect, improving application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a meta-heuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on the DynapSE neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and average spike latency by 21%, compared to state-of-the-art techniques.
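The quantity that SpiNeCluster is described as minimizing, the number of spikes carried by inter-cluster (global) synapses, can be computed for any candidate partition. The sketch below shows only that cost evaluation, not the clustering heuristic itself. The data layout (synapses as `(pre, post, spike_count)` tuples and a neuron-to-cluster dictionary) and the function name are hypothetical.

```python
def global_spike_count(synapses, cluster_of):
    """Total spikes on synapses whose endpoints lie in different clusters.

    synapses:   iterable of (pre_neuron, post_neuron, spike_count) tuples
    cluster_of: dict mapping each neuron to its cluster id
    """
    return sum(
        spikes
        for pre, post, spikes in synapses
        if cluster_of[pre] != cluster_of[post]
    )


synapses = [("a", "b", 10), ("b", "c", 3), ("a", "c", 5)]
cost_1 = global_spike_count(synapses, {"a": 0, "b": 0, "c": 1})
cost_2 = global_spike_count(synapses, {"a": 0, "b": 1, "c": 1})
```

Here the first partition keeps the busiest synapse (a to b) inside one cluster, so it places fewer spikes on the shared interconnect than the second.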