
A Device Non-Ideality Resilient Approach for Mapping Neural Networks to Crossbar Arrays

Added by Arman Kazemi
Publication date: 2020
Language: English





We propose a technology-independent method, referred to as the adjacent connection matrix (ACM), to efficiently map signed weight matrices to non-negative crossbar arrays. Compared to mapping methods with the same hardware overhead, ACM improves training accuracy by up to 20% for ResNet-20 on the CIFAR-10 dataset when training with crossbar arrays of 5-bit precision or lower. Compared with strategies that use two elements to represent a weight, ACM achieves comparable training accuracies while offering area and read energy reductions of 2.3x and 7x, respectively. ACM also has a mild regularization effect that improves inference accuracy in crossbar arrays without any retraining or costly device/variation-aware training.
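
The abstract does not detail ACM's construction, but the two-element scheme it is compared against is conventionally a differential pair: each signed weight is encoded as the difference of two non-negative conductances. Below is a minimal NumPy sketch of that baseline, useful for seeing where the 2.3x area and 7x read-energy overheads come from; the mapping function and its names are our illustrative assumptions, not the paper's code.

```python
import numpy as np

def differential_pair_mapping(W, g_min=0.0, g_max=1.0):
    """Two-element baseline: each signed weight is split into a pair of
    non-negative conductances (G_pos, G_neg) with W proportional to
    G_pos - G_neg. This doubles the array footprint, which is the
    overhead that a single-array mapping like ACM avoids."""
    scale = (g_max - g_min) / np.abs(W).max()
    G_pos = g_min + scale * np.clip(W, 0.0, None)   # positive parts
    G_neg = g_min + scale * np.clip(-W, 0.0, None)  # negated negative parts
    return G_pos, G_neg, scale

W = np.random.randn(4, 3)                  # signed weight matrix
G_pos, G_neg, scale = differential_pair_mapping(W)
v = np.random.rand(4)                      # input voltages
# Differencing the two column currents recovers the signed MVM result:
assert np.allclose(v @ (G_pos - G_neg), scale * (v @ W))
```

ACM instead represents signed weights within a single non-negative array at the hardware overhead of a one-element mapping, which is how it avoids the duplicated columns of the differential-pair scheme.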




Related research

The analog nature of computing in memristive crossbars poses significant issues due to various non-idealities such as parasitic resistances and the non-linear I-V characteristics of the devices. These non-idealities can have a detrimental impact on the functionality, i.e., the computational accuracy, of crossbars. Past works have explored modeling the non-idealities using analytical techniques. However, several non-idealities exhibit data-dependent behavior, which cannot be captured by analytical (non-data-dependent) models, limiting their suitability for predicting application accuracy. To address this, we propose a Generalized Approach to Emulating Non-Ideality in Memristive Crossbars using Neural Networks (GENIEx), which accurately captures the data-dependent nature of non-idealities. We perform extensive HSPICE simulations of crossbars with different voltage and conductance combinations. Following that, we train a neural network to learn the transfer characteristics of the non-ideal crossbar. Next, we build a functional simulator that includes key architectural facets such as tiling and bit-slicing to analyze the impact of non-idealities on the classification accuracy of large-scale neural networks. We show that GENIEx achieves low root mean square errors (RMSE) of 0.25 and 0.7 for low and high voltages, respectively, compared to HSPICE. Additionally, the GENIEx errors are 7x and 12.8x better than those of an analytical model, which can only capture the linear non-idealities. Further, using the functional simulator and GENIEx, we demonstrate that an analytical model can overestimate the degradation in classification accuracy by at least 10% on CIFAR-100 and 3.7% on ImageNet compared to GENIEx.
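
As a sketch of the core idea, the snippet below trains a small PyTorch model to learn the transfer characteristics of a non-ideal crossbar from simulated (voltage, conductance) to current pairs. The layer sizes, the flattened input encoding, and the random stand-in data are our assumptions for illustration; the paper's actual architecture and its tiling/bit-slicing simulator are not reproduced here.

```python
import torch
import torch.nn as nn

ROWS, COLS = 64, 64  # illustrative crossbar dimensions

class CrossbarEmulator(nn.Module):
    """GENIEx-style emulator sketch: an MLP mapping (input voltages,
    programmed conductances) to the non-ideal column currents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ROWS + ROWS * COLS, 512),  # sizes are assumptions
            nn.ReLU(),
            nn.Linear(512, COLS),
        )

    def forward(self, v, g):
        # Concatenate voltages with flattened conductances per sample
        return self.net(torch.cat([v, g.flatten(start_dim=1)], dim=1))

model = CrossbarEmulator()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
v = torch.rand(256, ROWS)         # applied input voltages
g = torch.rand(256, ROWS, COLS)   # programmed conductances
i_target = torch.rand(256, COLS)  # stand-in for HSPICE-simulated currents
opt.zero_grad()
loss = nn.functional.mse_loss(model(v, g), i_target)
loss.backward()
opt.step()
```

Once trained on circuit-simulation data, such an emulator can replace the ideal MVM inside a functional simulator to estimate application-level accuracy.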
There is widespread interest in emerging technologies, especially resistive crossbars, for accelerating deep neural networks (DNNs). Resistive crossbars offer a highly parallel and efficient matrix-vector-multiplication (MVM) operation, and since MVM is the dominant operation in DNNs, crossbars are ideally suited to them. However, various sources of device and circuit non-idealities lead to errors in the MVM output, thereby reducing DNN accuracy. To that end, we propose crossbar re-mapping strategies to mitigate line-resistance-induced accuracy degradation in DNNs without having to re-train the learned weights, unlike most prior works. Line resistances degrade the voltage levels along the crossbar columns, inducing more errors at the columns farther from the drivers. We rank the DNN weights and kernels based on a sensitivity analysis and re-arrange the columns such that the most sensitive kernels are mapped closer to the drivers, thereby minimizing the impact of errors on overall accuracy. We propose two algorithms, a static remapping strategy (SRS) and a dynamic remapping strategy (DRS), to optimize the crossbar re-arrangement of a pre-trained DNN. We demonstrate the benefits of our approach on a standard VGG16 network trained on the CIFAR-10 dataset. Our results show that SRS and DRS limit the accuracy degradation to 2.9% and 2.1%, respectively, compared to a 5.6% drop from an as-is mapping of weights and kernels to crossbars. We believe this work brings an additional aspect for optimization, which can be used in tandem with existing mitigation techniques, such as in-situ compensation, technology-aware training, and re-training approaches, to enhance system performance.
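
A minimal sketch of the static remapping idea follows: rank kernels by a sensitivity score and permute the crossbar columns so the most sensitive kernels sit closest to the drivers, undoing the permutation when the outputs are read. The L1-norm score below is a placeholder assumption; the paper derives its ranking from a sensitivity analysis of the pre-trained DNN.

```python
import numpy as np

def static_remap(W, sensitivity):
    """SRS-like sketch: permute columns so the highest-sensitivity kernels
    land nearest the drivers (column 0), where line-resistance errors are
    smallest. Returns the remapped matrix and the permutation used."""
    order = np.argsort(-sensitivity)   # most sensitive kernel first
    return W[:, order], order

def undo_remap(y, order):
    """Restore the original output ordering after reading the crossbar."""
    y_orig = np.empty_like(y)
    y_orig[order] = y
    return y_orig

W = np.random.randn(8, 4)              # columns = kernels mapped to crossbar
sens = np.abs(W).sum(axis=0)           # placeholder sensitivity: L1 norm
W_mapped, order = static_remap(W, sens)
v = np.random.rand(8)
# With ideal reads, remapping plus un-permutation is exactly equivalent:
assert np.allclose(undo_remap(v @ W_mapped, order), v @ W)
```

Because the transformation is a pure column permutation, it changes nothing about the learned function, only which physical column each kernel occupies.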
Neuromorphic hardware platforms implement biological neurons and synapses to execute spiking neural networks (SNNs) in an energy-efficient manner. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique that partitions SNNs into clusters of synapses, where intra-cluster local synapses are mapped within crossbars of the hardware and inter-cluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion on the shared interconnect and improves application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a meta-heuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on the DynapSE neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and average spike latency by 21% compared to state-of-the-art techniques.
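
To make the clustering step concrete, here is a greedy sketch in the spirit of SpiNeCluster: grow clusters up to the crossbar size, preferring neurons that exchange the most spikes with the current cluster, so that high-traffic synapses stay intra-cluster and off the shared interconnect. This greedy heuristic and the profiled spike-count input are illustrative assumptions, not SpiNeMap's actual algorithm.

```python
import numpy as np

def greedy_cluster(spike_counts, cluster_size):
    """Greedily partition neurons into clusters of at most `cluster_size`,
    at each step adding the unassigned neuron with the most spike traffic
    to the current cluster. spike_counts[i, j] = spikes sent from neuron i
    to neuron j, profiled from a simulation run."""
    n = spike_counts.shape[0]
    traffic = spike_counts + spike_counts.T   # symmetric spike traffic
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        seed = max(unassigned, key=lambda i: traffic[i].sum())
        cluster = [seed]
        unassigned.remove(seed)
        while len(cluster) < cluster_size and unassigned:
            best = max(unassigned, key=lambda j: traffic[cluster, j].sum())
            cluster.append(best)
            unassigned.remove(best)
        clusters.append(cluster)
    return clusters

counts = np.random.poisson(3.0, size=(16, 16))  # profiled spike counts
print(greedy_cluster(counts, cluster_size=4))   # 4 clusters of 4 neurons
```

A placement step would then assign these clusters to physical crossbars so that the remaining inter-cluster spikes travel short interconnect paths.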
In-memory computing is an emerging non-von Neumann computing paradigm in which certain computational tasks are performed in memory by exploiting the physical attributes of the memory devices. Memristive devices such as phase-change memory (PCM), where information is stored in terms of conductance levels, are especially well suited for in-memory computing. In particular, memristive devices organized in a crossbar configuration can be used to perform matrix-vector multiply operations by exploiting Kirchhoff's circuit laws. To explore the feasibility of such in-memory computing cores in applications such as deep learning, as well as for system-level architectural exploration, it is highly desirable to develop an accurate hardware emulator that captures the key physical attributes of the memristive devices. Here, we present one such emulator for PCM and experimentally validate it using measurements from a PCM prototype chip. Moreover, we present an application of the emulator to neural network inference, where it captures the conductance evolution of approximately 400,000 PCM devices remarkably well.
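
A minimal sketch of such an emulator: column currents follow Kirchhoff's circuit laws (each column current is the conductance-weighted sum of the applied voltages), with the programmed conductances perturbed by the power-law drift and read noise commonly used to model PCM. The drift exponent and noise level here are illustrative assumptions; a calibrated emulator would fit them to prototype-chip measurements.

```python
import numpy as np

def pcm_crossbar_mvm(G, v, t, drift_nu=0.05, read_noise_std=0.01, t0=1.0):
    """Statistical PCM crossbar MVM sketch: apply power-law conductance
    drift G(t) = G0 * (t / t0)^(-nu) and additive Gaussian read noise,
    then compute column currents via Kirchhoff's laws (I = v @ G)."""
    G_t = G * (t / t0) ** (-drift_nu)                        # drift
    G_t += np.random.normal(0.0, read_noise_std, G.shape)    # read noise
    return v @ np.clip(G_t, 0.0, None)  # conductances stay non-negative

G = np.random.uniform(0.0, 1.0, size=(128, 64))  # programmed conductances
v = np.random.rand(128)                          # applied read voltages
i_ideal = v @ G                                  # drift/noise-free result
i_emulated = pcm_crossbar_mvm(G, v, t=1e4)       # read 10^4 s after writing
```

Comparing i_emulated against i_ideal over time gives a first-order picture of how conductance evolution degrades inference accuracy.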
Backpropagation through nonlinear neurons is an outstanding challenge to the field of optical neural networks and the major conceptual barrier to all-optical training schemes. Each neuron is required to exhibit a directionally dependent response to propagating optical signals, with the backwards response conditioned on the forward signal, which is highly non-trivial to implement optically. We propose a practical and surprisingly simple solution that uses saturable absorption to provide the network nonlinearity. We find that the backward propagating gradients required to train the network can be approximated in a pump-probe scheme that requires only passive optical elements. Simulations show that, with readily obtainable optical depths, our approach can achieve equivalent performance to state-of-the-art computational networks on image classification benchmarks, even in deep networks with multiple sequential gradient approximations. This scheme is compatible with leading optical neural network proposals and therefore provides a feasible path towards end-to-end optical training.
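
The sketch below illustrates the pump-probe approximation with a simple two-level saturable-absorber model, which is our assumption rather than the paper's exact device model: a weak backward probe experiences the transmission set by the forward pump, which tracks, but does not exactly equal, the derivative of the neuron nonlinearity.

```python
import numpy as np

I_SAT = 1.0  # saturation intensity (illustrative units)

def sat_abs(x):
    """Saturable-absorber neuron nonlinearity: transmission rises as the
    forward ('pump') intensity saturates the absorber."""
    return x / (1.0 + np.abs(x) / I_SAT)

def pump_probe_grad(x):
    """Pump-probe gradient estimate: a weak backward probe sees the
    transmission T(x) = 1 / (1 + |x|/I_sat) set by the forward pump."""
    return 1.0 / (1.0 + np.abs(x) / I_SAT)

def true_grad(x):
    # Exact derivative of sat_abs: 1 / (1 + |x|/I_sat)^2
    return 1.0 / (1.0 + np.abs(x) / I_SAT) ** 2

x = np.linspace(0.0, 3.0, 7)
print(pump_probe_grad(x) - true_grad(x))  # bias the training must tolerate
```

Training then proceeds with this biased gradient estimate; the paper's simulations indicate the bias remains tolerable even when several such approximations are stacked through deep networks.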