We give an overview of the QPACE project, which is pursuing the development of a massively parallel, scalable supercomputer for lattice QCD (LQCD). The machine is a three-dimensional torus of identical processing nodes, based on the PowerXCell 8i processor. The nodes are connected by an FPGA-based, application-optimized network processor attached to the PowerXCell 8i processor. We present a performance analysis of lattice QCD codes on QPACE and corresponding hardware benchmarks.
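To illustrate what a three-dimensional torus topology means for nearest-neighbor communication, the following minimal sketch computes the six neighbors of a node on a periodic 3D grid. The function name `torus_neighbors`, the coordinate convention, and the example grid dimensions are assumptions made for illustration only; they are not taken from the QPACE software or hardware specification.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `node` on a periodic 3D grid.

    `dims` gives the (hypothetical) number of nodes along each axis.
    Wrap-around is what distinguishes a torus from a plain mesh.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, +1):
            shifted = list(node)
            # The modulo maps -1 to dims[axis] - 1 and dims[axis] to 0.
            shifted[axis] = (shifted[axis] + step) % dims[axis]
            neighbors.append(tuple(shifted))
    return neighbors


if __name__ == "__main__":
    # A corner node of a hypothetical 4 x 4 x 4 partition still has six neighbors.
    for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(neighbor)
```

Because every node has the same number of links regardless of its position, a nearest-neighbor stencil such as the lattice Dirac operator maps onto this topology without special handling at the boundaries.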
We discuss the current status of our calculation of the physics of pi and K mesons using three dynamical flavors of improved staggered quarks. This year, we have a new ensemble with a lattice spacing of 0.06 fm and a light sea mass of 0.2 m_s, as well as significant increases in statistics at several coarser lattice spacings and/or heavier sea masses. Results for decay constants, quark masses, low energy constants, condensates, and V_{us} are presented.
We discuss the implementation and optimization challenges for a Wilson-Dirac solver with Clover term on QPACE, a parallel machine based on Cell processors and a torus network. We choose the mixed-precision Schwarz preconditioned FGCR algorithm in order to circumvent network bandwidth and latency constraints, to make efficient use of the multicore parallelism and on-chip memory, and to achieve flexibility in the choice of lattice sizes. We present benchmarks on up to 256 QPACE nodes showing an aggregate sustained performance of about 10 TFlops for the complete solver and very good scaling.
QPACE is a novel massively parallel architecture optimized for lattice QCD simulations. A single QPACE node is based on the IBM PowerXCell 8i processor. The nodes are interconnected by a custom 3-dimensional torus network implemented on an FPGA. The compute power of the processor is provided by 8 Synergistic Processing Units. Making efficient use of these accelerator cores in scientific applications is challenging. In this paper we describe our strategies for porting applications to the QPACE architecture and report on performance numbers.
The Picasso project is a dark matter search experiment based on the superheated droplet technique. Preliminary runs performed at the Picasso Lab in Montreal have shown the suitability of this detection technique for the search for weakly interacting cold dark matter particles. In July 2002, a new phase of the project started. A batch of six 1-liter detectors with an active mass of approximately 40 g was installed in a gallery of the SNO observatory in Sudbury, Ontario, Canada at a depth of 6,800 feet (2,070 m). We give a status report on the new experimental setup, data analysis, and preliminary limits on the spin-dependent neutralino interaction cross section.
We describe our experience porting the Regensburg implementation of the DD-$\alpha$AMG solver from QPACE 2 to QPACE 3. We first review how the code was ported from the first generation Intel Xeon Phi processor (Knights Corner) to its successor (Knights Landing). We then describe the modifications in the communication library necessitated by the switch from InfiniBand to Omni-Path. Finally, we present the performance of the code on a single processor as well as the scaling on many nodes, where in both cases the speedup factor is close to the theoretical expectations.