Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Implementation of the conjugate gradient algorithm on FPGA devices

68 0 0.0 ( 0 )

Download Cite

Added by Piotr Korcyl

Publication date 2018

fields Physics

and research's language is English

Authors Piotr Korcyl - Grzegorz Korcyl

High Energy Physics - Lattice Computational Physics

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Results of porting parts of the Lattice Quantum Chromodynamics code to modern FPGA devices are presented. A single-node, double precision implementation of the Conjugate Gradient algorithm is used to invert numerically the Dirac-Wilson operator on a 4-dimensional grid on a Xilinx Zynq evaluation board. The code is divided into two software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in programmable logic, and the rest of the algorithm runs on the ARM cores. Optimized data blocks are used to efficiently use data movement infrastructure allowing to reach intervals of 1 clock cycle. We show that the FPGA implementation can offer a comparable performance compared to that obtained using Intel Xeon Phi KNL.

rate research

MILC staggered conjugate gradient performance on Intel KNL

75 - Carleton DeTar , Douglas Doerfler , Steven Gottlieb 2016

We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second gener- ation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. We compare performance of an MPI+OpenMP baseline version of the MILC code with a version incorporating the QPhiX staggered CG solver, for both one-node and multi-node runs.

High Energy Physics - Lattice Computational Physics

Practical Implementation of Lattice QCD Simulation on SIMD Machines with Intel AVX-512

58 - Issaku Kanamori , Hideo Matsufuru 2018

We investigate implementation of lattice Quantum Chromodynamics (QCD) code on the Intel AVX-512 architecture. The most time consuming part of the numerical simulations of lattice QCD is a solver of linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of AVX-512, such as using intrinsics and manual prefetching, for the matrix multiplication. Based on experience on the Oakforest-PACS system, a large scale cluster composed of Intel Xeon Phi Knights Landing, we discuss the performance tuning exploiting AVX-512 and code design on the SIMD architecture and massively parallel machines. We observe that the same code runs efficiently on an Intel Xeon Skylake-SP machine.

High Energy Physics - Lattice Computational Physics

A conjugate gradient algorithm for the astrometric core solution of Gaia

415 - Alex Bombrun 2011

The ESA space astrometry mission Gaia, planned to be launched in 2013, has been designed to make angular measurements on a global scale with micro-arcsecond accuracy. A key component of the data processing for Gaia is the astrometric core solution, which must implement an efficient and accurate numerical algorithm to solve the resulting, extremely large least-squares problem. The Astrometric Global Iterative Solution (AGIS) is a framework that allows to implement a range of different iterative solution schemes suitable for a scanning astrometric satellite. In order to find a computationally efficient and numerically accurate iteration scheme for the astrometric solution, compatible with the AGIS framework, we study an adaptation of the classical conjugate gradient (CG) algorithm, and compare it to the so-called simple iteration (SI) scheme that was previously known to converge for this problem, although very slowly. The different schemes are implemented within a software test bed for AGIS known as AGISLab, which allows to define, simulate and study scaled astrometric core solutions. After successful testing in AGISLab, the CG scheme has been implemented also in AGIS. The two algorithms CG and SI eventually converge to identical solutions, to within the numerical noise (of the order of 0.00001 micro-arcsec). These solutions are independent of the starting values (initial star catalogue), and we conclude that they are equivalent to a rigorous least-squares estimation of the astrometric parameters. The CG scheme converges up to a factor four faster than SI in the tested cases, and in particular spatially correlated truncation errors are much more efficiently damped out with the CG scheme.

Instrumentation and Methods for Astrophysics

On the Convergence Rate of Variants of the Conjugate Gradient Algorithm in Finite Precision Arithmetic

64 - Anne Greenbaum , Hexuan Liu , Tyler Chen 2019

We consider three mathematically equivalent variants of the conjugate gradient (CG) algorithm and how they perform in finite precision arithmetic. It was shown in [{em Behavior of slightly perturbed Lanczos and conjugate-gradient recurrences}, Lin.~Alg.~Appl., 113 (1989), pp.~7-63] that under certain conditions the convergence of a slightly perturbed CG computation is like that of exact CG for a matrix with many eigenvalues distributed throughout tiny intervals about the eigenvalues of the given matrix, the size of the intervals being determined by how closely these conditions are satisfied. We determine to what extent each of these variants satisfies the desired conditions, using a set of test problems and show that there is significant correlation between how well these conditions are satisfied and how well the finite precision computation converges before reaching its ultimately attainable accuracy. We show that for problems where the width of the intervals containing the eigenvalues of the associated exact CG matrix makes a significant difference in the behavior of exact CG, the different CG variants behave differently in finite precision arithmetic. For problems where the interval width makes little difference or where the convergence of exact CG is essentially governed by the upper bound based on the square root of the condition number of the matrix, the different CG variants converge similarly in finite precision arithmetic until the ultimate level of accuracy is achieved, although this ultimate level of accuracy may be different for the different variants. This points to the need for testing new CG variants on problems that are especially sensitive to rounding errors.

Numerical Analysis

Towards Lattice Quantum Chromodynamics on FPGA devices

75 - Grzegorz Korcyl , Piotr Korcyl 2018

In this paper we describe a single-node, double precision Field Programmable Gate Array (FPGA) implementation of the Conjugate Gradient algorithm in the context of Lattice Quantum Chromodynamics. As a benchmark of our proposal we invert numerically the Dirac-Wilson operator on a 4-dimensional grid on three Xilinx hardware solutions: Zynq Ultrascale+ evaluation board, the Alveo U250 accelerator and the largest device available on the market, the VU13P device. In our implementation we separate software/hardware parts in such a way that the entire multiplication by the Dirac operator is performed in hardware, and the rest of the algorithm runs on the host. We find out that the FPGA implementation can offer a performance comparable with that obtained using current CPU or Intels many core Xeon Phi accelerators. A possible multiple node FPGA-based system is discussed and we argue that power-efficient High Performance Computing (HPC) systems can be implemented using FPGA devices only.

Distributed Parallel and Cluster Computing

comments

Fetching comments

Mamoun Private University For Science and Technology

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Implementation of the conjugate gradient algorithm on FPGA devices

Ask ChatGPT about the research

No Arabic abstract

Read More