بحث متقدم مدعوم من الذكاء الصنعي

مساحة جديدة

اشترك بالحزمة الذهبية واحصل على وصول غير محدود شمرا أكاديميا

تسجيل مستخدم جديد

An Electro-Photonic System for Accelerating Deep Neural Networks

119 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Cansu Demirkiran

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Cansu Demirkiran - Furkan Eris - Gongyu Wang

هندسة العتاد التقنيات الناشئة

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The number of parameters in deep neural networks (DNNs) is scaling at about 5$times$ the rate of Moores Law. To sustain the pace of growth of the DNNs, new technologies and computing architectures are needed. Photonic computing systems are promising avenues, since they can perform the dominant general matrix-matrix multiplication (GEMM) operations in DNNs at a higher throughput than their electrical counterpart. However, purely photonic systems face several challenges including a lack of photonic memory, the need for conversion circuits, and the accumulation of noise. In this paper, we propose a hybrid electro-photonic system realizing the best of both worlds to accelerate DNNs. In contrast to prior work in photonic and electronic accelerators, we adopt a system-level perspective. Our electro-photonic system includes an electronic host processor and DRAM, and a custom electro-photonic hardware accelerator called ADEPT. The fused hardware accelerator leverages a photonic computing unit for performing highly-efficient GEMM operations and a digital electronic ASIC for storage and for performing non-GEMM operations. We also identify architectural optimization opportunities for improving the overall ADEPTs efficiency. We evaluate ADEPT using three state-of-the-art neural networks-ResNet-50, BERT-large, and RNN-T-to show its general applicability in accelerating todays DNNs. A head-to-head comparison of ADEPT with systolic array architectures shows that ADEPT can provide, on average, 7.19$times$ higher inference throughput per watt.

قيم البحث

215 - Aqeeb Iqbal Arka , Biresh Kumar Joardar , Janardhan Rao Doppa 2021

Graph Neural Network (GNN) is a variant of Deep Neural Networks (DNNs) operating on graphs. However, GNNs are more complex compared to traditional DNNs as they simultaneously exhibit features of both DNN and graph applications. As a result, architect ures specifically optimized for either DNNs or graph applications are not suited for GNN training. In this work, we propose a 3D heterogeneous manycore architecture for on-chip GNN training to address this problem. The proposed architecture, ReGraphX, involves heterogeneous ReRAM crossbars to fulfill the disparate requirements of both DNN and graph computations simultaneously. The ReRAM-based architecture is complemented with a multicast-enabled 3D NoC to improve the overall achievable performance. We demonstrate that ReGraphX outperforms conventional GPUs by up to 3.5X (on an average 3X) in terms of execution time, while reducing energy consumption by as much as 11X.

هندسة العتاد التقنيات الناشئة

Helix: Algorithm/Architecture Co-design for Accelerating Nanopore Genome Base-calling

85 - Qian Lou , Sarath Janga , Lei Jiang 2020

Nanopore genome sequencing is the key to enabling personalized medicine, global food security, and virus surveillance. The state-of-the-art base-callers adopt deep neural networks (DNNs) to translate electrical signals generated by nanopore sequencer s to digital DNA symbols. A DNN-based base-caller consumes $44.5%$ of total execution time of a nanopore sequencing pipeline. However, it is difficult to quantize a base-caller and build a power-efficient processing-in-memory (PIM) to run the quantized base-caller. In this paper, we propose a novel algorithm/architecture co-designed PIM, Helix, to power-efficiently and accurately accelerate nanopore base-calling. From algorithm perspective, we present systematic error aware training to minimize the number of systematic errors in a quantized base-caller. From architecture perspective, we propose a low-power SOT-MRAM-based ADC array to process analog-to-digital conversion operations and improve power efficiency of prior DNN PIMs. Moreover, we revised a traditional NVM-based dot-product engine to accelerate CTC decoding operations, and create a SOT-MRAM binary comparator array to process read voting. Compared to state-of-the-art PIMs, Helix improves base-calling throughput by $6times$, throughput per Watt by $11.9times$ and per $mm^2$ by $7.5times$ without degrading base-calling accuracy.

هندسة العتاد التقنيات الناشئة التعلم الآلي

Accelerating Sparse Deep Neural Networks

114 - Asit Mishra , Jorge Albericio Latorre , Jeff Pool 2021

As neural network model sizes have dramatically increased, so has the interest in various techniques to reduce their parameter counts and accelerate their execution. An active area of research in this field is sparsity - encouraging zero values in pa rameters that can then be discarded from storage or computations. While most research focuses on high levels of sparsity, there are challenges in universally maintaining model accuracy as well as achieving significant speedups over modern matrix-math hardware. To make sparsity adoption practical, the NVIDIA Ampere GPU architecture introduces sparsity support in its matrix-math units, Tensor Cores. We present the design and behavior of Sparse Tensor Cores, which exploit a 2:4 (50%) sparsity pattern that leads to twice the math throughput of dense matrix units. We also describe a simple workflow for training networks that both satisfy 2:4 sparsity pattern requirements and maintain accuracy, verifying it on a wide range of common tasks and model architectures. This workflow makes it easy to prepare accurate models for efficient deployment on Sparse Tensor Cores.

التعلم الآلي الذكاء الاصطناعي هندسة العتاد

Quantifying power use in silicon photonic neural networks

77 - Alexander N. Tait 2021

Due to challenging efficiency limits facing conventional and unconventional electronic architectures, information processors based on photonics have attracted renewed interest. Research communities have yet to settle on definitive techniques to descr ibe the performance of this class of information processors. Photonic systems are different from electronic ones, so the existing concepts of computer performance measurement cannot necessarily apply. In this manuscript, we attempt to quantify the power use of photonic neural networks with state-of-the-art and future hardware. We derive scaling laws, physical limits, and new platform performance metrics. We find that overall performance is regime-like, which means that energy efficiency characteristics of a photonic processor can be completely described by no less than seven performance numbers. The introduction of these analytical strategies provides a much needed foundation for quantitative roadmapping and commercial value assignment for silicon photonic neural networks.

بصريات التقنيات الناشئة

Layer Pruning for Accelerating Very Deep Neural Networks

190 - Weiwei Zhang , Changsheng chen , Xuechun Wu 2019

In this paper, we propose an adaptive pruning method. This method can cut off the channel and layer adaptively. The proportion of the layer and the channel to be cut is learned adaptively. The pruning method proposed in this paper can reduce half of the parameters, and the accuracy will not decrease or even be higher than baseline.

التعلم الآلي الرؤية الحاسوبية وتمييز الأنماط

سجل دخول لتتمكن من نشر تعليقات

التعليقات

جاري جلب التعليقات

سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها

المعهد الوطني لإدارة الأعمال

تفاصيل إضافية المزيد من الجامعات

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

An Electro-Photonic System for Accelerating Deep Neural Networks

اسأل ChatGPT حول البحث

ﻻ يوجد ملخص باللغة العربية

اقرأ أيضاً