Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high-performance code. As a result, porting applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder unaccelerated, which can open a serious Amdahl's law issue. The lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel layer from the application layer. The QCD Data-Parallel (QDP) software layer provides data types and expressions with stencil-like operations suitable for lattice field theory, and Chroma implements algorithms in terms of this high-level interface. Thus, by porting the low-level layer, one can effectively move the whole application to a different platform in one swing. The QDP-JIT/PTX library, a reimplementation of the low-level layer, provides a framework for lattice QCD calculations on the CUDA architecture. The complete software interface is supported, and thus applications can run unaltered on GPU-based parallel computers. This reimplementation was made possible by the availability of a JIT compiler (part of the NVIDIA Linux kernel driver) which translates an assembly-like language (PTX) into GPU code. The expression template technique is used to build PTX code generators, and a software cache manages the GPU memory. This reimplementation allows us to deploy an efficient implementation of the full gauge-generation program with dynamical fermions on large-scale GPU-based machines such as Titan and Blue Waters, accelerating the algorithm by more than an order of magnitude.
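The expression-template mechanism mentioned above can be sketched in a few lines of C++. This toy (illustrative only; not the actual QDP-JIT/PTX interface, and all names are made up) captures an expression such as a + b * c as a type-level tree; walking the tree can either evaluate all sites in one fused loop or print a prefix form, a stand-in for the PTX a real code generator would emit for each node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Toy expression-template sketch. Each node of the expression tree knows how
// to evaluate one lattice site and how to print itself (the latter stands in
// for emitting PTX for that node).

struct Expr {};  // tag so the operators below only match expression types

struct Field : Expr {
    std::vector<double> data;
    std::string name;
    Field(std::vector<double> d, std::string n)
        : data(std::move(d)), name(std::move(n)) {}
    double eval(std::size_t i) const { return data[i]; }
    std::string emit() const { return name; }
};

template <class L, class R>
struct Add : Expr {
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    double eval(std::size_t i) const { return l.eval(i) + r.eval(i); }
    std::string emit() const { return "add(" + l.emit() + "," + r.emit() + ")"; }
};

template <class L, class R>
struct Mul : Expr {
    const L& l; const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double eval(std::size_t i) const { return l.eval(i) * r.eval(i); }
    std::string emit() const { return "mul(" + l.emit() + "," + r.emit() + ")"; }
};

template <class L, class R,
          class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                          std::is_base_of<Expr, R>::value>::type>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

template <class L, class R,
          class = typename std::enable_if<std::is_base_of<Expr, L>::value &&
                                          std::is_base_of<Expr, R>::value>::type>
Mul<L, R> operator*(const L& l, const R& r) { return Mul<L, R>(l, r); }

// "Assignment" walks the tree once per site: a single fused loop, the same
// structure a code generator would turn into one GPU kernel.
template <class E>
void assign(Field& dst, const E& e) {
    for (std::size_t i = 0; i < dst.data.size(); ++i) dst.data[i] = e.eval(i);
}
```

With fields a, b, c, `assign(r, a + b * c)` runs one fused loop, and `(a + b * c).emit()` yields the prefix string "add(a,mul(b,c))" that a backend could lower to device code.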
Numerical lattice gauge theory computations to generate gauge field configurations including the effects of dynamical fermions are usually carried out using algorithms that require the molecular dynamics evolution of gauge fields using symplectic integrators. Sophisticated integrators are in common use but are hard to optimize, and force-gradient integrators show promise, especially for large lattice volumes. We explain why symplectic integrators lead to very efficient Monte Carlo algorithms: they exactly conserve a shadow Hamiltonian. The shadow Hamiltonian may be expanded in terms of Poisson brackets and can be used to optimize the integrators. We show how this may be done for gauge theories by extending the formulation of Hamiltonian mechanics on Lie groups to include Poisson brackets and shadows, and by giving a general method for the practical computation of forces, force-gradients, and Poisson brackets for gauge theories.
56 - R. Babich, M. A. Clark, B. Joo 2011
Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo analysis phase, which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo gauge field generation phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Such strong scaling has not previously been achieved. In this contribution, we demonstrate that a multi-dimensional parallelization strategy combined with a domain-decomposed preconditioner allows us to scale into this regime. We present results for two popular discretizations of the Dirac operator, Wilson-clover and improved staggered, employing up to 256 GPUs on the Edge cluster at Lawrence Livermore National Laboratory.
We show how the integrators used for the molecular dynamics step of the Hybrid Monte Carlo algorithm can be further improved. These integrators not only approximately conserve some Hamiltonian $H$ but conserve exactly a nearby shadow Hamiltonian $\tilde{H}$. This property allows for a new tuning method of the molecular dynamics integrator, and also for a new class of integrators (force-gradient integrators) which is expected to significantly reduce the computational cost of future large-scale gauge field ensemble generation.
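The exact conservation of a shadow Hamiltonian is easy to see in a toy model. For the harmonic oscillator $H = p^2/2 + \omega^2 q^2/2$ (a stand-in for the molecular dynamics Hamiltonian, not the lattice theory), kick-drift-kick leapfrog with step $h$ exactly conserves $\tilde{H} = p^2/2 + (\omega^2/2)(1 - h^2\omega^2/4)\,q^2$, while $H$ itself only fluctuates at $O(h^2)$ around its starting value:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Toy illustration: kick-drift-kick leapfrog for H = p^2/2 + w^2 q^2/2.
// The shadow Hamiltonian below is conserved exactly (up to rounding), even
// though H itself is only conserved to O(h^2).

struct State { double q, p; };

State leapfrog(State s, double w, double h) {
    s.p -= 0.5 * h * w * w * s.q;  // half kick:  p -= (h/2) dS/dq
    s.q += h * s.p;                // full drift: q += h dT/dp
    s.p -= 0.5 * h * w * w * s.q;  // half kick
    return s;
}

double hamiltonian(State s, double w) {
    return 0.5 * s.p * s.p + 0.5 * w * w * s.q * s.q;
}

// Exactly conserved shadow Hamiltonian of the leapfrog map (linear case).
double shadow(State s, double w, double h) {
    double c = 1.0 - 0.25 * h * h * w * w;  // O(h^2) shadow correction
    return 0.5 * s.p * s.p + 0.5 * w * w * c * s.q * s.q;
}
```

Over thousands of steps the shadow value stays constant to machine precision, while the true energy oscillates with amplitude proportional to $h^2$; this is precisely why the HMC acceptance rate stays under control for symplectic integrators.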
We show how to improve the molecular dynamics step of Hybrid Monte Carlo, both by tuning the integrator using Poisson bracket measurements and by the use of force-gradient integrators. We present results for moderate lattice sizes.
Modern graphics hardware is designed for highly parallel numerical tasks and promises significant cost and performance benefits for many scientific applications. One such application is lattice quantum chromodynamics (lattice QCD), where the main computational challenge is to efficiently solve the discretized Dirac equation in the presence of an SU(3) gauge field. Using NVIDIA's CUDA platform we have implemented a Wilson-Dirac sparse matrix-vector product that performs at up to 40 Gflops, 135 Gflops and 212 Gflops for double, single and half precision respectively on NVIDIA's GeForce GTX 280 GPU. We have developed a new mixed-precision approach for Krylov solvers using reliable updates which allows for full double precision accuracy while using only single or half precision arithmetic for the bulk of the computation. The resulting BiCGstab and CG solvers run in excess of 100 Gflops and, in terms of iterations until convergence, perform better than the usual defect-correction approach for mixed precision.
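The idea behind mixed-precision solvers can be sketched with the defect-correction scheme that the abstract uses as its baseline (reliable updates refine this idea; the code below is an illustration on a tiny dense system, not the GPU implementation): the bulk of the arithmetic, the inner solve, runs in single precision, while residuals and the accumulated solution are kept in double precision, which recovers full double accuracy.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mixed-precision defect correction: outer loop in double, inner solve in
// float. The inner solver here is a few Jacobi sweeps on a small diagonally
// dominant system (a stand-in for the single-precision Krylov solver).

using VecD = std::vector<double>;
using VecF = std::vector<float>;

// y = A x in double precision.
VecD matvec(const std::vector<VecD>& A, const VecD& x) {
    VecD y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Inner solver: Jacobi sweeps in *single* precision on A e = r.
VecF jacobi_f(const std::vector<VecD>& A, const VecF& r, int sweeps) {
    std::size_t n = r.size();
    VecF e(n, 0.0f);
    for (int s = 0; s < sweeps; ++s) {
        VecF next(n);
        for (std::size_t i = 0; i < n; ++i) {
            float sum = r[i];
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) sum -= static_cast<float>(A[i][j]) * e[j];
            next[i] = sum / static_cast<float>(A[i][i]);
        }
        e = next;
    }
    return e;
}

// Outer loop: double-precision residual, single-precision correction.
VecD solve_mixed(const std::vector<VecD>& A, const VecD& b, int outer) {
    VecD x(b.size(), 0.0);
    for (int k = 0; k < outer; ++k) {
        VecD Ax = matvec(A, x);
        VecF r(b.size());
        for (std::size_t i = 0; i < b.size(); ++i)
            r[i] = static_cast<float>(b[i] - Ax[i]);  // defect in double, cast down
        VecF e = jacobi_f(A, r, 50);
        for (std::size_t i = 0; i < b.size(); ++i)
            x[i] += static_cast<double>(e[i]);        // accumulate in double
    }
    return x;
}
```

Each outer iteration shrinks the error by roughly the accuracy of the single-precision inner solve, so a handful of cheap low-precision solves reach double-precision accuracy; the reliable-updates scheme of the abstract improves on this by correcting the residual inside the Krylov iteration rather than restarting it.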
We present initial results of the use of Force Gradient integrators for lattice field theories. These promise to give significant performance improvements, especially for light fermions and large lattices. Our results show that this is indeed the case, indicating a speed-up of more than a factor of two, which is expected to increase as the integration step size becomes smaller for larger lattices and smaller fermion masses.
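One concrete force-gradient scheme is Chin's fourth-order "4A" integrator, shown here for a 1D toy model rather than the lattice theory of the abstract: the middle kick replaces the force $F = -V'$ by the gradient-corrected force $F + (h^2/48)\,\nabla|F|^2$, which lifts the energy error from $O(h^2)$ (leapfrog) to $O(h^4)$.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Force-gradient (Chin 4A) vs leapfrog for H = p^2/2 + w^2 q^2/2.
// Step pattern: kick(h/6), drift(h/2), kick(2h/3, corrected), drift(h/2),
// kick(h/6). For the harmonic force F = -w^2 q, the gradient correction
// F + (h^2/48) d(F^2)/dq reduces to F * (1 - h^2 w^2 / 24).

struct State { double q, p; };

const double W = 1.0;                             // oscillator frequency
double force(double q) { return -W * W * q; }     // F = -V'(q)
double fgrad_force(double q, double h) {          // F + (h^2/48) grad|F|^2
    return force(q) * (1.0 - h * h * W * W / 24.0);
}

State leapfrog(State s, double h) {
    s.p += 0.5 * h * force(s.q);
    s.q += h * s.p;
    s.p += 0.5 * h * force(s.q);
    return s;
}

State force_gradient(State s, double h) {
    s.p += (h / 6.0) * force(s.q);                // outer kick
    s.q += 0.5 * h * s.p;                         // half drift
    s.p += (2.0 * h / 3.0) * fgrad_force(s.q, h); // gradient-corrected kick
    s.q += 0.5 * h * s.p;                         // half drift
    s.p += (h / 6.0) * force(s.q);                // outer kick
    return s;
}

double energy(State s) { return 0.5 * s.p * s.p + 0.5 * W * W * s.q * s.q; }
```

At a fixed step size the force-gradient trajectory conserves the energy orders of magnitude better than leapfrog, which is the mechanism behind the speed-up reported above: a larger step size can be used for the same acceptance rate, at the cost of one extra (gradient-corrected) force evaluation per step.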
We discuss how the integrators used for the Hybrid Monte Carlo (HMC) algorithm not only approximately conserve some Hamiltonian $H$ but exactly conserve a nearby shadow Hamiltonian $\tilde{H}$, and how the difference $\Delta H \equiv \tilde{H} - H$ may be expressed as an expansion in Poisson brackets. By measuring average values of these Poisson brackets over the equilibrium distribution $\propto e^{-H}$ generated by HMC we can find the optimal integrator parameters from a single simulation. We show that a good way of doing this in practice is to minimize the variance of $\Delta H$ rather than its magnitude, as has been previously suggested. Some details of how to compute Poisson brackets for gauge and fermion fields, and for nested and force-gradient integrators, are also presented.
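As a concrete instance of this expansion (written here for the kick-drift-kick leapfrog, with the sign convention $\{f,g\} = \partial_q f\,\partial_p g - \partial_p f\,\partial_q g$; one can check it against the exactly solvable harmonic oscillator):

```latex
% Shadow Hamiltonian of the kick-drift-kick leapfrog with step \delta\tau,
% for H = T(p) + S(q), from the symmetric Baker-Campbell-Hausdorff formula:
\tilde{H} = H
  + \delta\tau^{2}\left(\tfrac{1}{12}\{T,\{T,S\}\} - \tfrac{1}{24}\{S,\{S,T\}\}\right)
  + O(\delta\tau^{4})
% Harmonic-oscillator check, T = p^2/2, S = \omega^2 q^2/2:
%   \{T,\{T,S\}\} = \omega^2 p^2, \qquad \{S,\{S,T\}\} = \omega^4 q^2 .
```

Since the step-size coefficients are known for a given integrator, measuring the equilibrium averages (and covariances) of the Poisson brackets in a single run determines $\langle\Delta H\rangle$ and its variance for any candidate parameter choice, which is what makes the one-simulation tuning described above possible.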
117 - M. A. Clark, A. D. Kennedy 2007
We discuss how dynamical fermion computations may be made yet cheaper by using symplectic integrators that conserve energy much more accurately without decreasing the integration step size. We first explain why symplectic integrators exactly conserve a ``shadow Hamiltonian'' close to the desired one, and how this Hamiltonian may be computed in terms of Poisson brackets. We then discuss how classical mechanics may be implemented on Lie groups and derive the form of the Poisson brackets and force terms for some interesting integrators, such as those making use of second derivatives of the action (Hessian or force-gradient integrators). We hope that these will be seen to greatly improve energy conservation for only a small additional cost and that their use will significantly reduce the cost of dynamical fermion computations.
We introduce a simple general method for finding the equilibrium distribution for a class of widely used inexact Markov Chain Monte Carlo algorithms. The explicit error due to the non-commutativity of the updating operators when numerically integrating Hamilton's equations can be derived using the Baker-Campbell-Hausdorff formula. This error is manifest in the conservation of a ``shadow Hamiltonian'' that lies close to the desired Hamiltonian. The fixed-point distribution of inexact hybrid algorithms may then be derived, taking into account that the fixed point of the momentum heatbath and that of the molecular dynamics do not coincide exactly. We perform this derivation for various inexact algorithms used for lattice QCD calculations.