Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Machine Learning Accelerator In-Memory for Energy Harvesting

123 0 0.0 ( 0 )

Download Cite

Added by Salonik Resch

Publication date 2019

fields Informatics Engineering

and research's language is English

Authors Salonik Resch - S. Karen Khatamifard - Zamshed Iqbal Chowdhury

Emerging Technologies Hardware Architecture Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

There is increasing demand to bring machine learning capabilities to low power devices. By integrating the computational power of machine learning with the deployment capabilities of low power devices, a number of new applications become possible. In some applications, such devices will not even have a battery, and must rely solely on energy harvesting techniques. This puts extreme constraints on the hardware, which must be energy efficient and capable of tolerating interruptions due to power outages. Here, as a representative example, we propose an in-memory support vector machine learning accelerator utilizing non-volatile spintronic memory. The combination of processing-in-memory and non-volatility provides a key advantage in that progress is effectively saved after every operation. This enables instant shut down and restart capabilities with minimal overhead. Additionally, the operations are highly energy efficient leading to low power consumption.

rate research

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

342 - Aayush Ankit , Izzat El Hajj , Sai Rahul Chalamalasetti 2019

Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMAs microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability. We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores. We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMAs components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of $577~GOPS/s/mm^2$ and $837~GOPS/s/W$, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to $2,446times$ energy and $66times$ latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.

Emerging Technologies Hardware Architecture

Memory-Aware Partitioning of Machine Learning Applications for Optimal Energy Use in Batteryless Systems

125 - Andres Gomez , Andreas Tretter , Pascal Alexander Hager 2021

Sensing systems powered by energy harvesting have traditionally been designed to tolerate long periods without energy. As the Internet of Things (IoT) evolves towards a more transient and opportunistic execution paradigm, reducing energy storage costs will be key for its economic and ecologic viability. However, decreasing energy storage in harvesting systems introduces reliability issues. Transducers only produce intermittent energy at low voltage and current levels, making guaranteed task completion a challenge. Existing ad hoc methods overcome this by buffering enough energy either for single tasks, incurring large data-retention overheads, or for one full application cycle, requiring a large energy buffer. We present Julienning: an automated method for optimizing the total energy cost of batteryless applications. Using a custom specification model, developers can describe transient applications as a set of atomically executed kernels with explicit data dependencies. Our optimization flow can partition data- and energy-intensive applications into multiple execution cycles with bounded energy consumption. By leveraging interkernel data dependencies, these energy-bounded execution cycles minimize the number of system activations and nonvolatile data transfers, and thus the total energy overhead. We validate our methodology with two batteryless cameras running energy-intensive machine learning applications. Results demonstrate that compared to ad hoc solutions, our method can reduce the required energy storage by over 94% while only incurring a 0.12% energy overhead.

Distributed Parallel and Cluster Computing Hardware Architecture Machine Learning

Energy-Harvesting Distributed Machine Learning

72 - Basak Guler , Aylin Yener 2021

This paper provides a first study of utilizing energy harvesting for sustainable machine learning in distributed networks. We consider a distributed learning setup in which a machine learning model is trained over a large number of devices that can harvest energy from the ambient environment, and develop a practical learning framework with theoretical convergence guarantees. We demonstrate through numerical experiments that the proposed framework can significantly outperform energy-agnostic benchmarks. Our framework is scalable, requires only local estimation of the energy statistics, and can be applied to a wide range of distributed training settings, including machine learning in wireless networks, edge computing, and mobile internet of things.

Machine Learning Information Theory Information Theory

SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator

86 - Venkata Pavan Kumar Miriyala , Kale Rahul Vishwanath , Xuanyao Fong 2020

Magnetic skyrmions are emerging as potential candidates for next generation non-volatile memories. In this paper, we propose an in-memory binary neural network (BNN) accelerator based on the non-volatile skyrmionic memory, which we call as SIMBA. SIMBA consumes 26.7 mJ of energy and 2.7 ms of latency when running an inference on a VGG-like BNN. Furthermore, we demonstrate improvements in the performance of SIMBA by optimizing material parameters such as saturation magnetization, anisotropic energy and damping ratio. Finally, we show that the inference accuracy of BNNs is robust against the possible stochastic behavior of SIMBA (88.5% +/- 1%).

Emerging Technologies Disordered Systems and Neural Networks

ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0

56 - Asif Ali Khan , Fazal Hameed , Robin Blaesing 2019

Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This paper presents data placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.

Emerging Technologies Hardware Architecture

comments

Fetching comments

University of Aleppo

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Machine Learning Accelerator In-Memory for Energy Harvesting

Ask ChatGPT about the research

No Arabic abstract

Read More