أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Manuel Le Gallo

Energy Efficient In-memory Hyperdimensional Encoding for Spatio-temporal Signal Processing

152 - Geethan Karunaratne , Manuel Le Gallo , Michael Hersche 2021

The emerging brain-inspired computing paradigm known as hyperdimensional computing (HDC) has been proven to provide a lightweight learning framework for various cognitive tasks compared to the widely used deep learning-based approaches. Spatio-tempor al (ST) signal processing, which encompasses biosignals such as electromyography (EMG) and electroencephalography (EEG), is one family of applications that could benefit from an HDC-based learning framework. At the core of HDC lie manipulations and comparisons of large bit patterns, which are inherently ill-suited to conventional computing platforms based on the von-Neumann architecture. In this work, we propose an architecture for ST signal processing within the HDC framework using predominantly in-memory compute arrays. In particular, we introduce a methodology for the in-memory hyperdimensional encoding of ST data to be used together with an in-memory associative search module. We show that the in-memory HDC encoder for ST signals offers at least 1.80x energy efficiency gains, 3.36x area gains, as well as 9.74x throughput gains compared with a dedicated digital hardware implementation. At the same time it achieves a peak classification accuracy within 0.04% of that of the baseline HDC framework.

التقنيات الناشئة

Measurement of onset of structural relaxation in melt-quenched phase change materials

119 - Benedikt Kersting , Syed Ghazi Sarwat , Manuel Le Gallo 2021

Chalcogenide phase change materials enable non-volatile, low-latency storage-class memory. They are also being explored for new forms of computing such as neuromorphic and in-memory computing. A key challenge, however, is the temporal drift in the el ectrical resistance of the amorphous states that encode data. Drift, caused by the spontaneous structural relaxation of the newly recreated melt-quenched amorphous phase, has consistently been observed to have a logarithmic dependence in time. Here, we show that this observation is valid only in a certain observable timescale. Using threshold-switching voltage as the measured variable, based on temperature-dependent and short timescale electrical characterization, we experimentally measure the onset of drift. This additional feature of the structural relaxation dynamics serves as a new benchmark to appraise the different classical models to explain drift.

علم المواد التقنيات الناشئة

Robust High-dimensional Memory-augmented Neural Networks

177 - Geethan Karunaratne , Manuel Schmuck , Manuel Le Gallo 2020

Traditional neural networks require enormous amounts of data to build their complex mappings during a slow training procedure that hinders their abilities for relearning and adapting to new data. Memory-augmented neural networks enhance neural networ ks with an explicit memory to overcome these issues. Access to this explicit memory, however, occurs via soft read and write operations involving every individual memory entry, resulting in a bottleneck when implemented using the conventional von Neumann computer architecture. To overcome this bottleneck, we propose a robust architecture that employs a computational memory unit as the explicit memory performing analog in-memory computation on high-dimensional (HD) vectors, while closely matching 32-bit software-equivalent accuracy. This is achieved by a content-based attention mechanism that represents unrelated items in the computational memory with uncorrelated HD vectors, whose real-valued components can be readily approximated by binary, or bipolar components. Experimental results demonstrate the efficacy of our approach on few-shot image classification tasks on the Omniglot dataset using more than 256,000 phase-change memory devices. Our approach effectively merges the richness of deep neural network representations with HD computing that paves the way for robust vector-symbolic manipulations applicable in reasoning, fusion, and compression.

التقنيات الناشئة التعلم الآلي الحوسبة العصبية والتطورية

Accurate Emulation of Memristive Crossbar Arrays for In-Memory Computing

118 - Anastasios Petropoulos , Irem Boybat , Manuel Le Gallo 2020

In-memory computing is an emerging non-von Neumann computing paradigm where certain computational tasks are performed in memory by exploiting the physical attributes of the memory devices. Memristive devices such as phase-change memory (PCM), where i nformation is stored in terms of their conductance levels, are especially well suited for in-memory computing. In particular, memristive devices, when organized in a crossbar configuration can be used to perform matrix-vector multiply operations by exploiting Kirchhoffs circuit laws. To explore the feasibility of such in-memory computing cores in applications such as deep learning as well as for system-level architectural exploration, it is highly desirable to develop an accurate hardware emulator that captures the key physical attributes of the memristive devices. Here, we present one such emulator for PCM and experimentally validate it using measurements from a PCM prototype chip. Moreover, we present an application of the emulator for neural network inference where our emulator can capture the conductance evolution of approximately 400,000 PCM devices remarkably well.

التقنيات الناشئة

ESSOP: Efficient and Scalable Stochastic Outer Product Architecture for Deep Learning

294 - Vinay Joshi , Geethan Karunaratne , Manuel Le Gallo 2020

Deep neural networks (DNNs) have surpassed human-level accuracy in a variety of cognitive tasks but at the cost of significant memory/time requirements in DNN training. This limits their deployment in energy and memory limited applications that requi re real-time learning. Matrix-vector multiplications (MVM) and vector-vector outer product (VVOP) are the two most expensive operations associated with the training of DNNs. Strategies to improve the efficiency of MVM computation in hardware have been demonstrated with minimal impact on training accuracy. However, the VVOP computation remains a relatively less explored bottleneck even with the aforementioned strategies. Stochastic computing (SC) has been proposed to improve the efficiency of VVOP computation but on relatively shallow networks with bounded activation functions and floating-point (FP) scaling of activation gradients. In this paper, we propose ESSOP, an efficient and scalable stochastic outer product architecture based on the SC paradigm. We introduce efficient techniques to generalize SC for weight update computation in DNNs with the unbounded activation functions (e.g., ReLU), required by many state-of-the-art networks. Our architecture reduces the computational cost by re-using random numbers and replacing certain FP multiplication operations by bit shift scaling. We show that the ResNet-32 network with 33 convolution layers and a fully-connected layer can be trained with ESSOP on the CIFAR-10 dataset to achieve baseline comparable accuracy. Hardware design of ESSOP at 14nm technology node shows that, compared to a highly pipelined FP16 multiplier design, ESSOP is 82.2% and 93.7% better in energy and area efficiency respectively for outer product computation.

التعلم الآلي الحوسبة العصبية والتطورية

Mixed-precision deep learning based on computational memory

68 - S. R. Nandakumar , Manuel Le Gallo , Christophe Piveteau 2020

Deep neural networks (DNNs) have revolutionized the field of artificial intelligence and have achieved unprecedented success in cognitive tasks such as image and speech recognition. Training of large DNNs, however, is computationally intensive and th is has motivated the search for novel computing architectures targeting this application. A computational memory unit with nanoscale resistive memory devices organized in crossbar arrays could store the synaptic weights in their conductance states and perform the expensive weighted summations in place in a non-von Neumann manner. However, updating the conductance states in a reliable manner during the weight update process is a fundamental challenge that limits the training accuracy of such an implementation. Here, we propose a mixed-precision architecture that combines a computational memory unit performing the weighted summations and imprecise conductance updates with a digital processing unit that accumulates the weight updates in high precision. A combined hardware/software training experiment of a multilayer perceptron based on the proposed architecture using a phase-change memory (PCM) array achieves 97.73% test accuracy on the task of classifying handwritten digits (based on the MNIST dataset), within 0.6% of the software baseline. The architecture is further evaluated using accurate behavioral models of PCM on a wide class of networks, namely convolutional neural networks, long-short-term-memory networks, and generative-adversarial networks. Accuracies comparable to those of floating-point implementations are achieved without being constrained by the non-idealities associated with the PCM devices. A system-level study demonstrates 173x improvement in energy efficiency of the architecture when used for training a multilayer perceptron compared with a dedicated fully digital 32-bit implementation.

التقنيات الناشئة

Accurate deep neural network inference using computational phase-change memory

82 - Vinay Joshi , Manuel Le Gallo , Simon Haefeli 2019

In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration.

التقنيات الناشئة

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد