ﻻ يوجد ملخص باللغة العربية
The SIMT execution model is commonly used for general GPU development. CUDA and OpenCL developers write scalar code that is implicitly parallelized by compiler and hardware. On Intel GPUs, however, this abstraction has profound performance implications as the underlying ISA is SIMD and important hardware capabilities cannot be fully utilized. To close this performance gap we introduce C-For-Metal (CM), an explicit SIMD programming framework designed to deliver close-to-the-metal performance on Intel GPUs. The CM programming language and its vector/matrix types provide an intuitive interface to exploit the underlying hardware features, allowing fine-grained register management, SIMD size control and cross-lane data sharing. Experimental results show that CM applications from different domains outperform the best-known SIMT-based OpenCL implementations, achieving up to 2.7x speedup on the latest Intel GPU.
Convolution layers are prevalent in many classes of deep neural networks, including Convolutional Neural Networks (CNNs) which provide state-of-the-art results for tasks like image recognition, neural machine translation and speech recognition. The c
Stencil computations are widely used in HPC applications. Today, many HPC platforms use GPUs as accelerators. As a result, understanding how to perform stencil computations fast on GPUs is important. While implementation strategies for low-order sten
Many applications require to learn, mine, analyze and visualize large-scale graphs. These graphs are often too large to be addressed efficiently using conventional graph processing technologies. Many applications have requirements to analyze, transfo
The simulation of the two-dimensional Ising model is used as a benchmark to show the computational capabilities of Graphic Processing Units (GPUs). The rich programming environment now available on GPUs and flexible hardware capabilities allowed us t
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather t