Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

343 0 0.0 ( 0 )

Download Cite

Added by Aayush Ankit

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Aayush Ankit - Izzat El Hajj - Sai Rahul Chalamalasetti

Emerging Technologies Hardware Architecture

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMAs microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability. We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores. We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMAs components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of $577~GOPS/s/mm^2$ and $837~GOPS/s/W$, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to $2,446times$ energy and $66times$ latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.

rate research

A Machine Learning Accelerator In-Memory for Energy Harvesting

122 - Salonik Resch , S. Karen Khatamifard , Zamshed Iqbal Chowdhury 2019

There is increasing demand to bring machine learning capabilities to low power devices. By integrating the computational power of machine learning with the deployment capabilities of low power devices, a number of new applications become possible. In some applications, such devices will not even have a battery, and must rely solely on energy harvesting techniques. This puts extreme constraints on the hardware, which must be energy efficient and capable of tolerating interruptions due to power outages. Here, as a representative example, we propose an in-memory support vector machine learning accelerator utilizing non-volatile spintronic memory. The combination of processing-in-memory and non-volatility provides a key advantage in that progress is effectively saved after every operation. This enables instant shut down and restart capabilities with minimal overhead. Additionally, the operations are highly energy efficient leading to low power consumption.

Emerging Technologies Hardware Architecture Distributed Parallel and Cluster Computing

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

119 - Geng Yuan , Xiaolong Ma , Caiwen Ding 2019

The high computation and memory storage of large deep neural networks (DNNs) models pose intensive challenges to the conventional Von-Neumann architecture, incurring substantial data movements in the memory hierarchy. The memristor crossbar array has emerged as a promising solution to mitigate the challenges and enable low-power acceleration of DNNs. Memristor-based weight pruning and weight quantization have been seperately investigated and proven effectiveness in reducing area and power consumption compared to the original DNN model. However, there has been no systematic investigation of memristor-based neuromorphic computing (NC) systems considering both weight pruning and weight quantization. In this paper, we propose an unified and systematic memristor-based framework considering both structured weight pruning and weight quantization by incorporating alternating direction method of multipliers (ADMM) into DNNs training. We consider hardware constraints such as crossbar blocks pruning, conductance range, and mismatch between weight value and real devices, to achieve high accuracy and low power and small area footprint. Our framework is mainly integrated by three steps, i.e., memristor-based ADMM regularized optimization, masked mapping and retraining. Experimental results show that our proposed framework achieves 29.81X (20.88X) weight compression ratio, with 98.38% (96.96%) and 98.29% (97.47%) power and area reduction on VGG-16 (ResNet-18) network where only have 0.5% (0.76%) accuracy loss, compared to the original DNN models. We share our models at link http://bit.ly/2Jp5LHJ.

Emerging Technologies Hardware Architecture Machine Learning

A Fast Method for Steady-State Memristor Crossbar Array Circuit Simulation

271 - Rui Xie , Mingyang Song , Junzhuo Zhou 2021

In this work we propose an effective preconditioning technique to accelerate the steady-state simulation of large-scale memristor crossbar arrays (MCAs). We exploit the structural regularity of MCAs to develop a specially-crafted preconditioner that can be efficiently evaluated utilizing tensor products and block matrix inversion. Numerical experiments demonstrate the efficacy of the proposed technique compared to mainstream preconditioners.

Emerging Technologies Hardware Architecture

SONIC: A Sparse Neural Network Inference Accelerator with Silicon Photonics for Energy-Efficient Deep Learning

165 - Febin Sunny , Mahdi Nikdast , Sudeep Pasricha 2021

Sparse neural networks can greatly facilitate the deployment of neural networks on resource-constrained platforms as they offer compact model sizes while retaining inference accuracy. Because of the sparsity in parameter matrices, sparse neural networks can, in principle, be exploited in accelerator architectures for improved energy-efficiency and latency. However, to realize these improvements in practice, there is a need to explore sparsity-aware hardware-software co-design. In this paper, we propose a novel silicon photonics-based sparse neural network inference accelerator called SONIC. Our experimental analysis shows that SONIC can achieve up to 5.8x better performance-per-watt and 8.4x lower energy-per-bit than state-of-the-art sparse electronic neural network accelerators; and up to 13.8x better performance-per-watt and 27.6x lower energy-per-bit than the best known photonic neural network accelerators.

Machine Learning Hardware Architecture Neural and Evolutionary Computing

Field-Programmable Crossbar Array (FPCA) for Reconfigurable Computing

103 - Mohammed A. Zidan , YeonJoo Jeong , Jong Hong Shin 2016

For decades, advances in electronics were directly driven by the scaling of CMOS transistors according to Moores law. However, both the CMOS scaling and the classical computer architecture are approaching fundamental and practical limits, and new computing architectures based on emerging devices, such as resistive random-access memory (RRAM) devices, are expected to sustain the exponential growth of computing capability. Here we propose a novel memory-centric, reconfigurable, general purpose computing platform that is capable of handling the explosive amount of data in a fast and energy-efficient manner. The proposed computing architecture is based on a uniform, physical, resistive, memory-centric fabric that can be optimally reconfigured and utilized to perform different computing and data storage tasks in a massively parallel approach. The system can be tailored to achieve maximal energy efficiency based on the data flow by dynamically allocating the basic computing fabric for storage, arithmetic, and analog computing including neuromorphic computing tasks.

Emerging Technologies Hardware Architecture Neural and Evolutionary Computing

comments

Fetching comments

Al Rasheed International University for Science & Technology

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

Ask ChatGPT about the research

No Arabic abstract

Read More