
A Hardware-Efficient ADMM-Based SVM Training Algorithm for Edge Computing

Posted by Shuo-An Huang
Publication date: 2019
Paper language: English





This work demonstrates a hardware-efficient support vector machine (SVM) training algorithm based on the alternating direction method of multipliers (ADMM) optimizer. Low-rank approximation via the Nyström method is exploited to reduce the dimension of the kernel matrix. Verified on four datasets, the proposed ADMM-based training algorithm with rank approximation reduces the matrix dimension by 32× with only a 2% drop in inference accuracy. Compared to the conventional sequential minimal optimization (SMO) algorithm, the ADMM-based training algorithm achieves a 9.8×10^7 times shorter latency for training 2048 samples. Hardware design techniques, including pre-computation and memory sharing, are proposed to reduce the computational complexity by 62% and the memory usage by 60%. As a proof of concept, an epileptic seizure detector chip is designed to demonstrate the effectiveness of the proposed hardware-efficient training algorithm. The chip achieves a 153,310× higher energy efficiency and a 364× higher throughput-to-area ratio for SVM training than a high-end CPU. This work provides a promising solution for edge devices that require low-power, real-time training.
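
The paper does not include reference code, so the following numpy sketch only illustrates the two ingredients named above under assumed names and parameters: a Nyström feature map built from m landmark samples (so the full kernel matrix is never formed), followed by an ADMM loop for a hinge-loss SVM in the reduced space whose matrix factor can be pre-computed once, echoing the pre-computation technique mentioned in the abstract. The splitting and constants are illustrative assumptions, not the authors' implementation.

import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF kernel between the rows of A and B.
    d2 = (A * A).sum(1)[:, None] + (B * B).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def nystrom_features(X, m=64, gamma=0.5, seed=0):
    # Nystrom low-rank map: sample m landmarks, approximate K ~= C W^+ C^T,
    # and return Z = C W^{-1/2} so that Z Z^T approximates the kernel matrix.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)            # n x m
    W = rbf_kernel(X[idx], X[idx], gamma)       # m x m
    U, s, _ = np.linalg.svd(W)
    return C @ (U / np.sqrt(s + 1e-8)) @ U.T    # n x m reduced features

def admm_svm(Z, y, cost=1.0, rho=1.0, iters=200):
    # Illustrative ADMM iterations for a hinge-loss SVM in the reduced space,
    # using the splitting z = Z w (assumed; the paper's formulation may differ).
    n, m = Z.shape
    w, z, u = np.zeros(m), np.zeros(n), np.zeros(n)
    P = np.linalg.inv(np.eye(m) + rho * Z.T @ Z)    # factor pre-computed once
    for _ in range(iters):
        w = P @ (rho * Z.T @ (z - u))               # quadratic w-update
        v = Z @ w + u
        z = v + y * np.minimum(cost / rho, np.maximum(0.0, 1.0 - y * v))  # hinge prox
        u += Z @ w - z                              # scaled dual update
    return w                                        # linear classifier in Nystrom space

# Usage: Z = nystrom_features(X, m=64); w = admm_svm(Z, y) with labels y in {-1, +1}.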


Read also

Current radio frequency (RF) sensors at the Edge lack the computational resources to support practical, in-situ training for intelligent spectrum monitoring, and for sensor data classification in general. We propose a solution via Deep Delay Loop Reservoir Computing (DLR), a processing architecture that supports general machine learning algorithms on compact mobile devices by leveraging delay-loop reservoir computing in combination with innovative electro-optical hardware. With both digital and photonic realizations of our design of the loops, DLR delivers reductions in form factor, hardware complexity, and latency compared to the state of the art (SoA). The main impact of the reservoir is to project the input data into a higher-dimensional space of reservoir state vectors in order to linearly separate the input classes. Once the classes are well separated, traditionally complex, power-hungry classification models are no longer needed for the learning process. Yet, even with simple classifiers based on Ridge regression (RR), the complexity grows at least quadratically with the input size. Hence, the hardware reduction required for training on compact devices is in contradiction with the large dimension of the state vectors. DLR employs an RR-based classifier to exceed the SoA accuracy, while further reducing power consumption by leveraging the architecture of parallel (split) loops. We present DLR architectures composed of multiple smaller loops whose state vectors are linearly combined to create a lower-dimensional input into Ridge regression. We demonstrate the advantages of using DLR for two distinct applications: RF Specific Emitter Identification (SEI) for IoT authentication, and wireless protocol recognition for IoT situational awareness.
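
As a purely digital illustration of the split-loop idea, the toy numpy sketch below time-multiplexes a single nonlinear node into a delay-loop reservoir, runs several smaller loops in parallel, linearly combines their state vectors into a lower-dimensional input, and trains a closed-form Ridge regression readout. All function names, loop sizes, and the random combining matrix are assumptions for illustration; the photonic hardware and the actual DLR design are described in the paper.

import numpy as np

def delay_loop_states(x, n_nodes=25, eta=0.5, seed=0):
    # One delay-loop reservoir: a single nonlinear node time-multiplexed over
    # n_nodes virtual taps with a random input mask (toy digital model).
    rng = np.random.default_rng(seed)
    mask = rng.uniform(-1.0, 1.0, n_nodes)
    state = np.zeros(n_nodes)
    states = []
    for u in x:                                         # x: 1-D input sequence
        for i in range(n_nodes):
            fb = state[i - 1] if i > 0 else state[-1]   # feedback around the loop
            state[i] = np.tanh(eta * fb + mask[i] * u)
        states.append(state.copy())
    return np.array(states)                             # shape (len(x), n_nodes)

def split_loop_features(x, n_loops=4, n_nodes=25, out_dim=25, seed=0):
    # Parallel smaller loops; their states are linearly combined into a
    # lower-dimensional vector before the readout (random projection here).
    rng = np.random.default_rng(seed)
    S = np.hstack([delay_loop_states(x, n_nodes, seed=k) for k in range(n_loops)])
    P = rng.normal(size=(S.shape[1], out_dim)) / np.sqrt(S.shape[1])
    return S @ P

def ridge_readout(S, Y, lam=1e-2):
    # Closed-form Ridge regression readout: solve (S^T S + lam I) W = S^T Y.
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ Y)
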
Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars. As KWS systems are typically always on, maximizing both accuracy and power efficiency is central to their utility. In this work we use hardware-aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows the neural network to run efficiently on standard hardware (212 µW). We also characterize the power requirements of custom-designed accelerator hardware that achieves SotA power efficiency of 8.79 µW, beating general-purpose low-power hardware (a microcontroller) by 24× and special-purpose ASICs by 16×.
Ji Liu, Huiyang Zhou (2021)
Quantum computing offers a noteworthy speedup over classical computing by taking advantage of quantum parallelism, i.e., the superposition of states. In particular, quantum search is widely used in various computationally hard problems. Grover's search algorithm finds the target element in an unsorted database with quadratic speedup over classical search and has been proved to be optimal in terms of the number of queries to the database. The challenge, however, is that Grover's search algorithm leads to high numbers of quantum gates, which make it infeasible for Noisy Intermediate-Scale Quantum (NISQ) computers. In this paper, we propose a novel hardware-efficient quantum search algorithm to overcome this challenge. Our key idea is to replace the global diffusion operation with low-cost local diffusions. Our analysis shows that our algorithm has similar oracle complexity to the original Grover's search algorithm while significantly reducing the circuit depth and gate count. The circuit cost reduction leads to a remarkable improvement in the system success rates, paving the way for quantum search on NISQ machines.
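
A tiny statevector simulation makes the role of the diffusion operator concrete. The sketch below implements the textbook Grover iteration (oracle plus global inversion about the mean) and, separately, a local_diffusion helper showing that a diffusion restricted to a subset of qubits acts block-wise on the amplitudes. How the paper schedules such local diffusions in place of the global one is not reproduced here, and all names are illustrative.

import numpy as np

def oracle(state, target):
    # Phase-flip the marked basis state.
    out = state.copy()
    out[target] *= -1.0
    return out

def diffusion(state):
    # Global Grover diffusion: inversion about the mean amplitude.
    return 2.0 * state.mean() - state

def local_diffusion(state, k):
    # Diffusion on only the k least-significant qubits: inversion about the
    # mean within each block of 2**k amplitudes (a shallower circuit than the
    # global diffusion; shown here only to illustrate its block structure).
    blocks = state.reshape(-1, 2 ** k)
    return (2.0 * blocks.mean(axis=1, keepdims=True) - blocks).reshape(-1)

n, target = 6, 5                                   # 6 qubits, one marked item
state = np.full(2 ** n, 1.0 / np.sqrt(2 ** n))     # uniform superposition
for _ in range(int(np.pi / 4 * np.sqrt(2 ** n))):  # ~pi/4 * sqrt(N) iterations
    state = diffusion(oracle(state, target))
print(abs(state[target]) ** 2)                     # success probability close to 1
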
Neural networks have shown great potential in many applications like speech recognition, drug discovery, image classification, and object detection. Neural network models are inspired by biological neural networks, but they are optimized to perform machine learning tasks on digital computers. The proposed work explores the possibility of using living neural networks in vitro as basic computational elements for machine learning applications. A new supervised STDP-based learning algorithm is proposed in this work, which takes neuron engineering constraints into account. A 74.7% accuracy is achieved on the MNIST benchmark for handwritten digit recognition.
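
The abstract does not spell out the update rule, so the sketch below only shows a generic pair-based STDP window together with a hypothetical supervised gating in which a teacher signal flips the sign of the update for wrong-output neurons. The constants, the gating scheme, and the function names are assumptions, not the algorithm proposed in the paper.

import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    # Pair-based STDP window: potentiate when the presynaptic spike precedes
    # the postsynaptic spike, depress otherwise (illustrative constants, ms).
    dt = t_post - t_pre
    return a_plus * np.exp(-dt / tau) if dt >= 0 else -a_minus * np.exp(dt / tau)

def supervised_update(w, t_pre, t_post, teacher, lr=1.0):
    # Hypothetical supervised gating: teacher = +1 for the neuron that should
    # fire for the presented digit, -1 otherwise; weights kept in [0, 1].
    return np.clip(w + lr * teacher * stdp_dw(t_pre, t_post), 0.0, 1.0)
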
In this paper, a joint task, spectrum, and transmit power allocation problem is investigated for a wireless network in which the base stations (BSs) are equipped with mobile edge computing (MEC) servers to jointly provide computational and communication services to users. Each user can request one computational task out of three types of computational tasks. Since the data size of each computational task is different, as the requested computational task varies, the BSs must adjust their resource (subcarrier and transmit power) and task allocation schemes to serve the users effectively. This problem is formulated as an optimization problem whose goal is to minimize the maximal computational and transmission delay among all users. A multi-stack reinforcement learning (RL) algorithm is developed to solve this problem. Using the proposed algorithm, each BS can record the historical resource allocation schemes and users' information in its multiple stacks to avoid learning the same resource allocation scheme and users' states, thus improving the convergence speed and learning efficiency. Simulation results illustrate that the proposed algorithm can reduce the number of iterations needed for convergence and the maximal delay among all users by up to 18% and 11.1%, respectively, compared to the standard Q-learning algorithm.
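
A minimal tabular sketch of the underlying idea, with the paper's multi-stack bookkeeping paraphrased as a simple set of already-recorded (state, action) allocation schemes that exploration avoids revisiting; the state/action encodings, reward, and all names below are assumptions rather than the paper's algorithm.

import random
from collections import defaultdict

def q_learning_step(Q, state, actions, reward_fn, next_state_fn,
                    history, alpha=0.1, gamma=0.9, eps=0.1):
    # One Q-learning step for a BS choosing a (subcarrier, power, task) action.
    # `history` stands in for the per-BS stacks: schemes already recorded are
    # skipped while exploring, so the agent does not relearn the same scheme.
    fresh = [a for a in actions if (state, a) not in history] or list(actions)
    if random.random() < eps:
        a = random.choice(fresh)                 # explore an unseen scheme first
    else:
        a = max(actions, key=lambda x: Q[(state, x)])
    r = reward_fn(state, a)                      # e.g. minus the maximal user delay
    s2 = next_state_fn(state, a)
    best_next = max(Q[(s2, x)] for x in actions)
    Q[(state, a)] += alpha * (r + gamma * best_next - Q[(state, a)])
    history.add((state, a))
    return s2

Q, history = defaultdict(float), set()           # Q-table and recorded schemes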

