Application of the Computer Capacity to the Analysis of Processors Evolution

202 0 0.0 ( 0 )

Download Cite

Added by Boris Ryabko

Publication date 2017

fields Informatics Engineering

and research's language is English

Authors Boris Ryabko - Anton Rakitskiy

Performance

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The notion of computer capacity was proposed in 2012, and this quantity has been estimated for computers of different kinds. In this paper we show that, when designing new processors, the manufacturers change the parameters that affect the computer capacity. This allows us to predict the values of parameters of future processors. As the main example we use Intel processors, due to the accessibility of detailed description of all their technical characteristics.

rate research

Adaptive Mesh Approach for Predicting Algorithm Behavior with Application to Visibility Culling in Computer Graphics

371 - Matthias Fischer , Claudius Jahn , Martin Ziegler 2009

We propose a concise approximate description, and a method for efficiently obtaining this description, via adaptive random sampling of the performance (running time, memory consumption, or any other profileable numerical quantity) of a given algorithm on some low-dimensional rectangular grid of inputs. The formal correctness is proven under reasonable assumptions on the algorithm under consideration; and the approachs practical benefit is demonstrated by predicting for which observer positions and viewing directions an occlusion culling algorithm yields a net performance benefit or loss compared to a simple brute force renderer.

Performance Graphics

BENCHIP: Benchmarking Intelligence Processors

422 - Jinhua Tao , Zidong Du , Qi Guo 2017

The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their non-diversity and nonrepresentativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks. They are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect the various characteristics of the evaluated intelligence processors. BENCHIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon.

Performance Artificial Intelligence

Performance of Devito on HPC-Optimised ARM Processors

116 - Hermes Senger , Jaime F. de Souza , Edson S. Gomi 2019

We evaluate the performance of Devito, a domain specific language (DSL) for finite differences on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver competitive performance compared to other Intel Xeon processors.

Performance

Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

120 - Christie L. Alappat , Johannes Hofmann , Georg Hager 2020

Hardware platforms in high performance computing are constantly getting more complex to handle even when considering multicore CPUs alone. Numerous features and configuration options in the hardware and the software environment that are relevant for performance are not even known to most application users or developers. Microbenchmarks, i.e., simple codes that fathom a particular aspect of the hardware, can help to shed light on such issues, but only if they are well understood and if the results can be reconciled with known facts or performance models. The insight gained from microbenchmarks may then be applied to real applications for performance analysis or optimization. In this paper we investigate two modern Intel x86 server CPU architectures in depth: Broadwell EP and Cascade Lake SP. We highlight relevant hardware configuration settings that can have a decisive impact on code performance and show how to properly measure on-chip and off-chip data transfer bandwidths. The new victim L3 cache of Cascade Lake and its advanced replacement policy receive due attention. Finally we use DGEMM, sparse matrix-vector multiplication, and the HPCG benchmark to make a connection to relevant application scenarios.

Performance Distributed Parallel and Cluster Computing

High Performance and Portable Convolution Operators for ARM-based Multicore Processors

73 - Pablo San Juan , Adrian Castello , Manuel F. Dolz 2020

The considerable impact of Convolutional Neural Networks on many Artificial Intelligence tasks has led to the development of various high performance algorithms for the convolution operator present in this type of networks. One of these approaches leverages the imcol transform followed by a general matrix multiplication (GEMM) in order to take advantage of the highly optimized realizations of the GEMM kernel in many linear algebra libraries. The main problems of this approach are 1) the large memory workspace required to host the intermediate matrices generated by the IM2COL transform; and 2) the time to perform the IM2COL transform, which is not negligible for complex neural networks. This paper presents a portable high performance convolution algorithm based on the BLIS realization of the GEMM kernel that avoids the use of the intermediate memory by taking advantage of the BLIS structure. In addition, the proposed algorithm eliminates the cost of the explicit IM2COL transform, while maintaining the portability and performance of the underlying realization of GEMM in BLIS.

Performance