
RISC-NN: Use RISC, NOT CISC as Neural Network Hardware Infrastructure

Posted by: Mingzhe Zhang
Publication date: 2021
Research field: Informatics engineering
Paper language: English





Neural Networks (NNs) have proven to be powerful tools for analyzing Big Data. However, traditional CPUs cannot achieve the desired performance and/or energy efficiency for NN applications. Therefore, numerous NN accelerators have been used or designed to meet these goals. These accelerators fall into three categories: GPGPUs, ASIC NN accelerators, and CISC NN accelerators. Although CISC NN accelerators can achieve a considerably smaller memory footprint than GPGPUs and thus improve energy efficiency, they still fail to provide the same level of data-reuse optimization achieved by ASIC NN accelerators because of the inherently poor programmability of their CISC architecture. We argue that, for NN accelerators, RISC is a better design choice than CISC, as is the case with general-purpose processors. We propose RISC-NN, a novel many-core RISC-based NN accelerator that achieves high expressiveness and high parallelism and features strong programmability and low control-hardware costs. We show that RISC-NN can implement all the necessary instructions of state-of-the-art CISC NN accelerators; at the same time, RISC-NN achieves advanced optimizations such as multiple-level data reuse and support for sparse NN applications, which previously existed only in ASIC NN accelerators. Experimental results show that RISC-NN achieves on average 11.88X performance efficiency compared with the state-of-the-art Nvidia TITAN Xp GPGPU for various NN applications. RISC-NN also achieves on average 1.29X, 8.37X and 21.71X performance efficiency over the CISC-based TPU in CNN, MLP and LSTM applications, respectively. Finally, RISC-NN achieves an additional 26.05% performance improvement and 33.13% energy reduction after applying pruning for sparse NN applications.
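The key advantage claimed over CISC accelerators is multiple-level data reuse enabled by fine-grained, programmable control. As a minimal illustration of what data reuse buys (this is not the RISC-NN ISA; the matrix and tile sizes are made-up parameters), the Python sketch below counts off-chip accesses for a matrix multiply with and without loop tiling into a local buffer:

```python
# Illustrative sketch only: how loop tiling (the kind of schedule a
# fine-grained, RISC-style accelerator program can express) reduces
# off-chip traffic for C = A x B. Sizes below are invented for illustration.

def naive_accesses(M, N, K):
    # Every multiply-accumulate fetches one element of A and one of B
    # from off-chip memory: no reuse at all.
    return 2 * M * N * K

def tiled_accesses(M, N, K, T):
    # With T x T tiling, an A-tile and a B-tile are loaded into the local
    # buffer once and reused across a whole T x T output tile
    # (assumes M, N, K are divisible by T).
    tiles = (M // T) * (N // T) * (K // T)
    return tiles * 2 * T * T

if __name__ == "__main__":
    M = N = K = 512
    T = 32
    naive, tiled = naive_accesses(M, N, K), tiled_accesses(M, N, K, T)
    print("naive off-chip accesses:", naive)
    print("tiled off-chip accesses:", tiled)
    print("reuse factor: %dx" % (naive // tiled))
```

With these illustrative sizes the tiled schedule touches off-chip memory 32 times less often; this is the kind of saving that a fixed CISC macro-instruction cannot always express but a programmable RISC-style schedule can.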


Read also

In the last decade we have witnessed a rapid growth in data center systems, requiring new and highly complex networking devices. The need to refresh networking infrastructure whenever new protocols or functions are introduced, and the increasing costs that this entails, are a concern to all data center providers. New generations of Systems on Chip (SoC), integrating microprocessors and higher-bandwidth interfaces, are an emerging solution to this problem. These devices permit entirely new systems and architectures that can obviate the replacement of existing networking devices while enabling seamless functionality change. In this work, we explore open-source, RISC-based SoC architectures with high-performance networking capabilities. The prototype architectures are implemented on the NetFPGA-SUME platform. Beyond details of the architecture, we also describe the hardware implementation and the porting of operating systems to the platform. The platform can be exploited for the development of practical networking appliances, and we provide use-case examples.
Xuan Guo, Robert Mullins (2020)
It has always been difficult to balance the accuracy and performance of instruction set simulators (ISSs). RTL simulators or systems such as gem5 are used to execute programs in a cycle-accurate manner but are often prohibitively slow. In contrast, functional simulators such as QEMU can run large benchmarks to completion in a reasonable time yet capture few performance metrics and fail to model complex interactions between multiple cores. This paper presents a novel multi-purpose simulator that exploits binary translation to offer fast cycle-level full-system simulations. Its functional simulation mode outperforms QEMU and, if desired, it is possible to switch between functional and timing modes at run-time. Cycle-level simulations of RISC-V multi-core processors are possible at more than 20 MIPS, a useful middle ground in terms of accuracy and performance with simulation speeds nearly 100 times those of more detailed cycle-accurate models.
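To get a feel for why 20 MIPS is a useful middle ground, here is a back-of-the-envelope comparison; the benchmark length is an assumed illustration value, and the slower speed is derived only from the abstract's "nearly 100 times" ratio:

```python
# Rough wall-clock estimate for simulating a long-running benchmark.
# 1e12 instructions is an assumed illustration value; the cycle-accurate
# speed is inferred from the "nearly 100 times" ratio in the abstract.

instructions = 1e12                           # assumed benchmark size
cycle_level_mips = 20                         # figure quoted in the abstract
cycle_accurate_mips = cycle_level_mips / 100  # implied ~0.2 MIPS

def sim_hours(n_instr, mips):
    return n_instr / (mips * 1e6) / 3600

print("cycle-level   : %.1f hours" % sim_hours(instructions, cycle_level_mips))
print("cycle-accurate: %.0f days" % (sim_hours(instructions, cycle_accurate_mips) / 24))
```

Under these assumptions the cycle-level mode finishes in roughly half a day, while a detailed cycle-accurate model would need on the order of two months.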
RISC-V is a relatively new, open instruction set architecture with a mature ecosystem and an official formal machine-readable specification. It is therefore a promising playground for formal-methods research. However, we observe that different formal-methods research projects are interested in different aspects of RISC-V and want to simplify, abstract, approximate, or ignore the other aspects. Often, they also require different encoding styles, resulting in each project starting a new formalization from scratch. We set out to identify the commonalities between projects and to represent the RISC-V specification as a program with holes that can be instantiated differently by different projects. Our formalization of the RISC-V specification is written in Haskell and leverages existing tools rather than requiring new domain-specific tools, contrary to other approaches. To our knowledge, it is the first RISC-V specification able to serve as the interface between a processor-correctness proof and a compiler-correctness proof, while supporting several other projects with diverging requirements as well.
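The "program with holes" idea can be sketched as follows (in Python rather than the paper's Haskell, and with invented names): the core instruction semantics is written once against a small interface of holes, and each project supplies its own instantiation.

```python
# Illustrative sketch only: a specification parameterized by "holes".
# The real formalization is in Haskell; names and details here are invented.
from abc import ABC, abstractmethod

class Platform(ABC):
    """The holes: behaviour the core spec deliberately leaves open."""
    @abstractmethod
    def load_word(self, addr: int) -> int: ...

class FlatMemory(Platform):
    """One instantiation: a simple dictionary-backed flat memory."""
    def __init__(self):
        self.mem = {}
    def load_word(self, addr):
        return self.mem.get(addr, 0)

def exec_lw(regs, rd, rs1, imm, plat: Platform):
    """Core load-word semantics, written once against the holes."""
    regs[rd] = plat.load_word(regs[rs1] + imm)

# A processor-correctness proof and a compiler-correctness proof could each
# plug in a different Platform (e.g. one that models an MMU or traps on
# misaligned access) without touching exec_lw.
regs = {1: 0x100, 2: 0}
mem = FlatMemory()
mem.mem[0x104] = 42
exec_lw(regs, rd=2, rs1=1, imm=4, plat=mem)
print(regs[2])  # 42
```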
With the rapid development of scientific computation, more and more researchers and developers are committed to implementing various workloads/operations on different devices. Among all these devices, the NVIDIA GPU is the most popular choice due to its comprehensive documentation and excellent development tools. As a result, there are abundant resources for hand-writing high-performance CUDA code. However, CUDA is mainly supported only by commercial products, and there has been no support for open-source H/W platforms. RISC-V is the most popular choice for an open hardware ISA, thanks to its elegant design and open-source license. In this project, we aim to utilize these existing CUDA codes with RISC-V devices. More specifically, we design and implement a pipeline that can execute CUDA source code on a RISC-V GPU architecture. We have succeeded in executing CUDA kernels with several important features, such as multi-threading and atomic instructions, on a RISC-V GPU architecture.
The advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors has brought on new opportunities for applying both Deep and Spiking Neural Network (SNN) algorithms to healthcare and biomedical applications at the edge. This can facilitate the advancement of medical Internet of Things (IoT) systems and Point of Care (PoC) devices. In this paper, we provide a tutorial describing how various technologies including emerging memristive devices, Field Programmable Gate Arrays (FPGAs), and Complementary Metal Oxide Semiconductor (CMOS) can be used to develop efficient DL accelerators to solve a wide variety of diagnostic, pattern recognition, and signal processing problems in healthcare. Furthermore, we explore how spiking neuromorphic processors can complement their DL counterparts for processing biomedical signals. The tutorial is augmented with case studies of the vast literature on neural network and neuromorphic hardware as applied to the healthcare domain. We benchmark various hardware platforms by performing a sensor fusion signal processing task combining electromyography (EMG) signals with computer vision. Comparisons are made between dedicated neuromorphic processors and embedded AI accelerators in terms of inference latency and energy. Finally, we provide our analysis of the field and share a perspective on the advantages, disadvantages, challenges, and opportunities that various accelerators and neuromorphic processors introduce to healthcare and biomedical domains.