BENCHIP: Benchmarking Intelligence Processors

423 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jinhua Tao

تاريخ النشر 2017

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jinhua Tao - Zidong Du - Qi Guo

الأداء الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their non-diversity and nonrepresentativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks. They are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect the various characteristics of the evaluated intelligence processors. BENCHIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon.

قيم البحث

102 - Jianfeng Zhan , Lei Wang , Wanling Gao 2019

This paper outlines BenchCouncils view on the challenges, rules, and vision of benchmarking modern workloads like Big Data, AI or machine learning, and Internet Services. We conclude the challenges of benchmarking modern workloads as FIDSS (Fragmente d, Isolated, Dynamic, Service-based, and Stochastic), and propose the PRDAERS benchmarking rules that the benchmarks should be specified in a paper-and-pencil manner, relevant, diverse, containing different levels of abstractions, specifying the evaluation metrics and methodology, repeatable, and scaleable. We believe proposing simple but elegant abstractions that help achieve both efficiency and general-purpose is the final target of benchmarking in future, which may be not pressing. In the light of this vision, we shortly discuss BenchCouncils related projects.

الأداء الذكاء الاصطناعي

Performance of Devito on HPC-Optimised ARM Processors

116 - Hermes Senger , Jaime F. de Souza , Edson S. Gomi 2019

We evaluate the performance of Devito, a domain specific language (DSL) for finite differences on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver competitive performance compared to other Intel Xeon processors.

الأداء

Benchmarking of quantum processors with random circuits

108 - James R. Wootton 2018

Quantum processors with sizes in the 10-100 qubit range are now increasingly common. However, with increased size comes increased complexity for benchmarking. The effectiveness of a given device may vary greatly between different tasks, and will not always be easy to predict from single and two qubit gate fidelities. For this reason, it is important to assess processor quality for a range of important tasks. In this work we propose and implement tests based on random quantum circuits. These are used to evaluate multiple different superconducting qubit devices, with sizes from 5 to 19 qubits, from two hardware manufacturers: IBM Research and Rigetti. The data is analyzed to give a quantitive description of how the devices perform. We also describe how it can be used for a qualititive description accessible to the layperson, by being played as a game.

فيزياء الكم

Benchmarking TinyML Systems: Challenges and Direction

159 - Colby R. Banbury , Vijay Janapa Reddi , Max Lam 2020

Recent advancements in ultra-low-power machine learning (TinyML) hardware promises to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarkin g allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group that is comprised of over 30 organizations.

الأداء التعلم الآلي

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

120 - Christie L. Alappat , Johannes Hofmann , Georg Hager 2020

Hardware platforms in high performance computing are constantly getting more complex to handle even when considering multicore CPUs alone. Numerous features and configuration options in the hardware and the software environment that are relevant for performance are not even known to most application users or developers. Microbenchmarks, i.e., simple codes that fathom a particular aspect of the hardware, can help to shed light on such issues, but only if they are well understood and if the results can be reconciled with known facts or performance models. The insight gained from microbenchmarks may then be applied to real applications for performance analysis or optimization. In this paper we investigate two modern Intel x86 server CPU architectures in depth: Broadwell EP and Cascade Lake SP. We highlight relevant hardware configuration settings that can have a decisive impact on code performance and show how to properly measure on-chip and off-chip data transfer bandwidths. The new victim L3 cache of Cascade Lake and its advanced replacement policy receive due attention. Finally we use DGEMM, sparse matrix-vector multiplication, and the HPCG benchmark to make a connection to relevant application scenarios.

الأداء النظم الموزعة والتوازية والحوسبة العنقودية