Memristive crossbars aim to implement analog weighted neural networks; however, realistic implementation of such crossbar arrays is hindered by the limited number of switching states of memristive devices. In this work, we propose the design of an analog deep neural network with binary weight updates through the backpropagation algorithm using binary-state memristive devices. We show that such networks can be successfully used for image processing tasks and offer lower power consumption and smaller on-chip area than their digital counterparts. The proposed network was benchmarked on MNIST handwritten digit recognition, achieving an accuracy of approximately 90%.
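A minimal sketch of the training idea described above, not the authors' implementation: a single fully-connected layer whose deployed weights are binary (+1/-1), as a binary-state memristive crossbar could store them. Real-valued "shadow" weights accumulate backpropagated gradients while the forward pass uses only their sign, a common straight-through-estimator heuristic; the random data, layer sizes, and hyperparameters are illustrative placeholders (MNIST would be used in practice).

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_samples = 64, 10, 512
X = rng.standard_normal((n_samples, n_in))      # placeholder inputs
y = rng.integers(0, n_out, n_samples)           # placeholder labels

W = 0.01 * rng.standard_normal((n_in, n_out))   # shadow weights kept off-chip
lr = 0.1

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(20):
    Wb = np.sign(W)                              # binary weights programmed on-chip
    logits = X @ Wb
    p = softmax(logits)
    p[np.arange(n_samples), y] -= 1.0            # dL/dlogits for cross-entropy
    grad = X.T @ p / n_samples                   # straight-through: gradient w.r.t. Wb
    W -= lr * grad                               # update accumulated in shadow weights
    acc = (logits.argmax(1) == y).mean()
print(f"training accuracy with binary weights: {acc:.2f}")
```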
The volume, veracity, variability, and velocity of data produced by the ever-increasing network of sensors connected to the Internet pose challenges for the power management, scalability, and sustainability of cloud computing infrastructure. Increasing the data processing capability of edge computing devices at lower power requirements can reduce several overheads for cloud computing solutions. This paper reviews neuromorphic CMOS-memristive architectures that can be integrated into edge computing devices. We discuss why neuromorphic architectures are useful for edge devices and present the advantages, drawbacks, and open problems in the field of neuro-memristive circuits for edge computing.
In-Memory Computing (IMC) hardware using Memristive Crossbar Arrays (MCAs) is gaining popularity for accelerating Deep Neural Networks (DNNs), since it alleviates the memory-wall problem associated with the von Neumann architecture. The hardware efficiency (energy, latency, and area) as well as the application accuracy (considering device and circuit non-idealities) of DNNs mapped to such hardware are co-dependent on network parameters, such as kernel size and depth, and hardware architecture parameters, such as crossbar size. However, co-optimizing both network and hardware parameters presents a challenging search space comprising different kernel sizes mapped to varying crossbar sizes. To that effect, we propose NAX, an efficient neural architecture search engine that co-designs the neural network and the IMC-based hardware architecture. NAX explores the aforementioned search space to determine the kernel size and corresponding crossbar size for each DNN layer, achieving optimal trade-offs between hardware efficiency and application accuracy. Our results from NAX show that the resulting networks have heterogeneous crossbar sizes across layers and achieve optimal hardware efficiency and accuracy considering the non-idealities in crossbars. On CIFAR-10 and Tiny ImageNet, our models achieve 0.8% and 0.2% higher accuracy, and 17% and 4% lower EDAP (energy-delay-area product), compared to baseline ResNet-20 and ResNet-18 models, respectively.
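To make the co-design search space concrete, the sketch below enumerates a toy per-layer space of kernel sizes and crossbar sizes and scores each pair with a weighted accuracy/EDAP objective. This is illustrative only, not the NAX engine: the cost and accuracy models are made-up placeholders standing in for crossbar simulation and non-ideality-aware evaluation.

```python
from itertools import product

KERNEL_SIZES = [1, 3, 5]
XBAR_SIZES = [64, 128, 256]
LAMBDA = 0.5          # assumed relative weight of hardware cost vs. accuracy

def edap_proxy(kernel, xbar):
    # placeholder: larger kernels need more crossbar columns; bigger
    # crossbars amortize peripherals but add parasitics
    return (kernel ** 2) / xbar + 0.002 * xbar

def accuracy_proxy(kernel, xbar):
    # placeholder: bigger kernels tend to help accuracy, while bigger
    # crossbars suffer more from line-resistance and other non-idealities
    return 0.9 + 0.01 * kernel - 0.0001 * xbar

# pick the (kernel, crossbar) pair with the best accuracy/EDAP trade-off
best = max(product(KERNEL_SIZES, XBAR_SIZES),
           key=lambda kx: accuracy_proxy(*kx) - LAMBDA * edap_proxy(*kx))
print("selected (kernel, crossbar) for this layer:", best)
```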
An analog synapse circuit based on ferroelectric-metal field-effect transistors is proposed that offers 6-bit weight precision. The circuit comprises volatile least significant bits (LSBs) used solely during training and non-volatile most significant bits (MSBs) used for both training and inference. The design operates at a 1.8 V logic-compatible voltage, provides 10^10 endurance cycles, and requires only 250 ps update pulses. A variant of LeNet trained with the proposed synapse achieves 98.2% accuracy on MNIST, only 0.4% lower than an ideal implementation of the same network with the same bit precision. Furthermore, the proposed synapse offers improvements of up to 26% in area, 44.8% in leakage power, 16.7% in LSB update pulse duration, and two orders of magnitude in endurance cycles when compared to state-of-the-art hybrid synaptic circuits. The proposed synapse can be extended to an 8-bit design, enabling a VGG-like network to achieve 88.8% accuracy on CIFAR-10 (only 0.8% lower than an ideal implementation of the same network).
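A minimal numeric sketch of the MSB/LSB weight split, not the proposed circuit: a 6-bit weight code is divided into 3 non-volatile MSBs and 3 volatile LSBs. Here the effective training weight is assumed to be MSB+LSB, while inference keeps only the MSB portion once the volatile bits are no longer available; the bit split and any carry/transfer policy are illustrative assumptions.

```python
N_BITS, N_MSB = 6, 3
N_LSB = N_BITS - N_MSB                    # 3 volatile LSBs, 3 non-volatile MSBs

def split(code):
    """Split an unsigned 6-bit code into (msb_code, lsb_code)."""
    msb = code >> N_LSB                   # top bits, stored non-volatilely
    lsb = code & ((1 << N_LSB) - 1)       # bottom bits, volatile (training only)
    return msb, lsb

w_code = 45                               # example 6-bit weight code
msb, lsb = split(w_code)
w_train = (msb << N_LSB) + lsb            # full 6-bit value available during training
w_infer = msb << N_LSB                    # MSB-only value assumed at inference
print(w_train, w_infer)                   # 45, 40
```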
There is widespread interest in emerging technologies, especially resistive crossbars, for accelerating Deep Neural Networks (DNNs). Resistive crossbars offer a highly parallel and efficient matrix-vector multiplication (MVM) operation, and since MVM is the most dominant operation in DNNs, crossbars are ideally suited to accelerate them. However, various sources of device and circuit non-idealities lead to errors in the MVM output, thereby reducing DNN accuracy. To that end, we propose crossbar re-mapping strategies that mitigate line-resistance-induced accuracy degradation in DNNs without re-training the learned weights, unlike most prior works. Line resistances degrade the voltage levels along the crossbar columns, inducing larger errors at columns farther from the drivers. We rank the DNN weights and kernels using a sensitivity analysis and re-arrange the columns such that the most sensitive kernels are mapped closer to the drivers, minimizing the impact of errors on overall accuracy. We propose two algorithms, a static remapping strategy (SRS) and a dynamic remapping strategy (DRS), to optimize the crossbar re-arrangement of a pre-trained DNN. We demonstrate the benefits of our approach on a standard VGG16 network trained on the CIFAR-10 dataset. Our results show that SRS and DRS limit the accuracy degradation to 2.9% and 2.1%, respectively, compared to a 5.6% drop from an as-is mapping of weights and kernels to crossbars. We believe this work brings an additional aspect of optimization that can be used in tandem with existing mitigation techniques, such as in-situ compensation, technology-aware training, and re-training approaches, to enhance system performance.
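The following is a hedged sketch of the static re-mapping idea, not the paper's exact SRS algorithm: columns farther from the drivers see larger line-resistance error, so kernels ranked as most sensitive are re-assigned to the columns nearest the drivers. The sensitivity scores and the column error profile below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cols = 8
sensitivity = rng.random(n_cols)             # per-kernel sensitivity (placeholder)
col_error = np.linspace(0.01, 0.08, n_cols)  # error grows with distance from driver

naive_impact = float(sensitivity @ col_error)              # as-is mapping
order = np.argsort(-sensitivity)                           # most sensitive first
remapped_impact = float(sensitivity[order] @ col_error)    # sensitive kernels near drivers
print(f"as-is impact {naive_impact:.4f}  ->  remapped {remapped_impact:.4f}")
```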
Deep neural networks, with applications ranging from computer vision and image processing to medical diagnosis, are commonly implemented on clock-based processors, where computation speed is limited by the clock frequency and the memory access time. Advances in photonic integrated circuits have enabled research in photonic computation; yet, despite excellent features such as fast linear computation, no integrated photonic deep network had been demonstrated to date, because the lack of scalable nonlinear functionality and the loss of photonic devices make scaling to a large number of layers challenging. Here we report the first integrated end-to-end photonic deep neural network (PDNN), which performs instantaneous image classification through direct processing of optical waves. Images are formed on the input pixels, and the optical waves are coupled into nanophotonic waveguides and processed as the light propagates through layers of neurons on-chip. Each neuron generates an optical output from its input optical signals; the linear computation is performed optically and the nonlinear activation function is realised opto-electronically. The output of a laser coupled into the chip is uniformly distributed among all neurons within the network, providing the same per-neuron supply light. Thus, all neurons have the same optical output range, enabling scaling to deep networks with a large number of layers. The PDNN chip is used for 2- and 4-class classification of handwritten letters, achieving accuracies higher than 93.7% and 90.3%, respectively, with a computation time of less than one clock cycle of state-of-the-art digital computation platforms. Direct clock-less processing of optical data eliminates photo-detection, A/D conversion, and the need for a large memory module, enabling significantly faster and more energy-efficient neural networks for the next generations of deep learning systems.
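A toy numerical model of a single photonic neuron as described above, not the PDNN chip itself: each neuron forms a weighted sum of input optical intensities (the linear step, done optically on-chip) and then applies a saturating nonlinearity (realised opto-electronically in the paper). The attenuation-style weights, the sigmoid choice, and the per-neuron supply normalisation are illustrative assumptions.

```python
import numpy as np

def photonic_neuron(intensities, weights, supply=1.0):
    # weights act like programmable attenuations in [0, 1]
    linear = np.dot(np.clip(weights, 0.0, 1.0), intensities)
    # opto-electronic activation modelled as a saturating sigmoid,
    # scaled by the (uniform) per-neuron supply light
    return supply / (1.0 + np.exp(-(linear - 0.5)))

pixels = np.array([0.2, 0.8, 0.5, 0.1])   # example input pixel intensities
w = np.array([0.9, 0.1, 0.4, 0.7])        # example weights
print(photonic_neuron(pixels, w))
```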