The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work -- which often focuses on accelerating small kernels -- we target the entire unstructured Poisson solver based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a novel data-movement-reducing technique in which we compute geometric factors on the fly, which yields significant (700+ GFlop/s) single-precision performance and a more than 2x reduction in runtime for the local evaluation of the Laplace operator. We end the paper by discussing the challenges and opportunities of using reconfigurable architectures in the future, particularly in light of emerging (not yet available) technologies.
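To illustrate why recomputing geometric factors can pay off under an I/O-cost model, the following is a minimal sketch of a bandwidth-bound performance estimate. It is not the paper's model: the function name, the flop count, the bandwidth figure, and the assumption that a stored-factor kernel moves eight words per grid point (one input value, six geometric factors, one output value) are all hypothetical and chosen only for illustration.

```python
def bandwidth_bound_gflops(flops_per_point, words_per_point,
                           bandwidth_gb_s, bytes_per_word=4):
    """Upper bound on GFlop/s for a kernel limited by off-chip data movement."""
    bytes_per_point = words_per_point * bytes_per_word
    # (flops / byte) * (GB / s) = GFlop / s
    return flops_per_point / bytes_per_point * bandwidth_gb_s


# Hypothetical inputs, chosen only to illustrate the model.
FLOPS_PER_POINT = 100.0  # assumed arithmetic work per grid point
BANDWIDTH_GB_S = 50.0    # assumed sustained memory bandwidth

# Stored factors (assumed layout): read u, read 6 factors, write w -> 8 words.
stored = bandwidth_bound_gflops(FLOPS_PER_POINT, 8, BANDWIDTH_GB_S)

# Factors recomputed on chip: read u, write w -> 2 words per point.
# The small per-element geometry input is ignored in this sketch.
on_the_fly = bandwidth_bound_gflops(FLOPS_PER_POINT, 2, BANDWIDTH_GB_S)

print(stored, on_the_fly, on_the_fly / stored)
```

Under these assumptions the bound improves by the ratio of words moved. The sketch ignores the extra arithmetic needed to recompute the factors, so it gives an upper limit on the possible gain rather than a prediction of measured runtime.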
We review some of the basic principles, fundamentals, technologies, architectures and recent advances leading to the implementation of Field Programmable Photonic Gate Arrays (FPPGAs).
Pipelined algorithms implemented in field programmable gate arrays are used extensively for hardware triggers in modern experimental high energy physics, and the complexity of such algorithms is increasing rapidly. To develop such hardware triggers, algorithms are written in $\texttt{C++}$, ported to a hardware description language for synthesizing firmware, and then ported back to $\texttt{C++}$ for simulating the firmware response down to the single-bit level. We present a $\texttt{C++}$ software framework which automatically simulates and generates hardware description language code for pipelined arithmetic algorithms.
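As an illustration of what simulating a pipelined arithmetic algorithm down to the single-bit level involves, here is a minimal sketch in Python. It is not the framework's API: the helper names `wrap_signed` and `simulate_pipeline`, the fixed register width, and the two example stages are hypothetical. Each stage result is truncated to a signed two's-complement register, and values advance one stage per clock.

```python
def wrap_signed(value, bits):
    """Truncate an integer to a signed two's-complement register of `bits` bits."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def simulate_pipeline(inputs, stages, bits):
    """Clock-by-clock simulation; regs[i] holds the registered output of stage i."""
    regs = [None] * len(stages)
    outputs = []
    # Extra idle clocks at the end flush the values still in flight.
    for x in list(inputs) + [None] * len(stages):
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # Update from the last stage backwards so each stage sees the
        # value registered on the previous clock.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = None if prev is None else wrap_signed(stages[i](prev), bits)
        regs[0] = None if x is None else wrap_signed(stages[0](x), bits)
    return outputs


# Two-stage example: multiply by 3, then add 100, in 8-bit signed registers.
example_stages = [lambda v: v * 3, lambda v: v + 100]
print(simulate_pipeline([50, 1], example_stages, bits=8))  # prints [-6, 103]
```

The first input overflows the 8-bit register after the multiplication (150 wraps to -106), which is exactly the kind of behavior a bit-accurate simulation has to reproduce and a floating-point reference would miss.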
We present the design and optimization of a linear solver on general-purpose GPUs for the efficient and high-throughput evaluation of the marginalized graph kernel between pairs of labeled graphs. The solver implements a preconditioned conjugate gradient (PCG) method to compute the solution to a generalized Laplacian equation associated with the tensor product of two graphs. To cope with the gap between the instruction throughput and the memory bandwidth of current-generation GPUs, our solver forms the tensor product linear system on the fly, without storing it in memory, when performing matrix-vector product operations in PCG. Such on-the-fly computation is accomplished by using threads in a warp to cooperatively stream the adjacency and edge label matrices of individual graphs by small square matrix blocks called tiles, which are then staged in registers and the shared memory for later reuse. Warps across a thread block can further share tiles via the shared memory to increase data reuse. We exploit the sparsity of the graphs hierarchically by storing only non-empty tiles using a coordinate format and nonzero elements within each tile using bitmaps. In addition, we propose a new partition-based reordering algorithm for aggregating nonzero elements of the graphs into fewer but denser tiles to improve the efficiency of the sparse format. We carry out extensive theoretical analyses on the graph tensor product primitives for tiles of various densities and evaluate their performance on synthetic and real-world datasets. Our solver delivers three to four orders of magnitude speedup over existing CPU-based solvers such as GraKeL and GraphKernels. The capability of the solver enables kernel-based learning tasks at unprecedented scales.
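The idea of running PCG on a tensor product system that is never stored can be sketched on the CPU as follows. This is a simplified illustration, not the solver described above: it uses dense Python lists instead of tiled sparse storage, a plain Kronecker product without edge labels, a Jacobi (diagonal) preconditioner, and a hypothetical system of the form (D - A kron B) x = b with a diagonal D chosen so that the operator is symmetric positive definite. All function names are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kron_matvec(A, B, x):
    """Compute (A kron B) x without ever forming the Kronecker product."""
    n, m = len(A), len(B)
    y = [0.0] * (n * m)
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            if a == 0.0:
                continue  # skip zero entries of A (crude use of sparsity)
            for j in range(m):
                acc = 0.0
                for l in range(m):
                    acc += B[j][l] * x[k * m + l]
                y[i * m + j] += a * acc
    return y


def pcg(apply_op, diag, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for an SPD operator."""
    x = [0.0] * len(b)
    r = list(b)
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = apply_op(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Two small unlabeled graphs: an edge (2 nodes) and a triangle (3 nodes).
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
diag = [3.0] * 6  # strictly dominates the row sums (2) of A kron B


def apply_op(v):
    kv = kron_matvec(A, B, v)
    return [d * vi - ki for d, vi, ki in zip(diag, v, kv)]


solution = pcg(apply_op, diag, [1.0] * 6)
```

Only the two small factor matrices are ever held in memory; the product system appears solely through `apply_op`. For this symmetric test case the exact solution is the all-ones vector, which makes the sketch easy to check.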
We describe the technological concept and the first-light results of a 1024-channel spectrometer based on field programmable gate array (FPGA) hardware. This spectrometer is the prototype for the seven-beam L-band receiver to be installed at the Effelsberg 100-m telescope in autumn 2005. Using off-the-shelf hardware and software products, we designed and constructed an extremely flexible Fast-Fourier-Transform (FFT) spectrometer with unprecedented sensitivity and dynamic range, which can be considered prototypical for spectrometer development in future radio astronomy.
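The basic signal path of an FFT spectrometer (transform blocks of samples, then accumulate the power in each channel) can be sketched in software as below. This is an illustrative assumption about the general technique, not a description of the instrument's firmware: windowing, fixed-point arithmetic, and any filterbank stage are omitted, and the function names are hypothetical. A real-valued block of 2N samples yields N usable channels, so 1024 channels would correspond to blocks of 2048 samples.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def accumulate_spectrum(samples, n_channels):
    """Average the power spectrum over consecutive blocks of 2 * n_channels samples."""
    block = 2 * n_channels
    n_blocks = len(samples) // block
    power = [0.0] * n_channels
    for b in range(n_blocks):
        spectrum = fft(samples[b * block:(b + 1) * block])
        for k in range(n_channels):
            power[k] += abs(spectrum[k]) ** 2
    if n_blocks == 0:
        return power
    return [p / n_blocks for p in power]


# Small demonstration: a pure tone centered on channel 5 of 16, over 4 blocks.
tone = [math.cos(2 * math.pi * 5 * t / 32) for t in range(32 * 4)]
power = accumulate_spectrum(tone, n_channels=16)
peak_channel = max(range(16), key=lambda k: power[k])
```

Averaging over many blocks is what builds up sensitivity in such a design: the noise in each channel averages down while a persistent spectral line stays at the same channel.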
In this paper we propose the first method that is better than second-order accurate in space and time for the numerical solution of the resistive relativistic magnetohydrodynamics (RRMHD) equations on unstructured meshes in multiple space dimensions. The nonlinear system under consideration is purely hyperbolic and contains a source term, the one for the evolution of the electric field, that becomes stiff for low values of the resistivity. For the spatial discretization we propose to use high order $PNM$ schemes as introduced in \cite{Dumbser2008} for hyperbolic conservation laws, and a high order accurate unsplit time discretization is achieved using the element-local space-time discontinuous Galerkin approach proposed in \cite{DumbserEnauxToro} for one-dimensional balance laws with stiff source terms. The divergence-free character of the magnetic field is accounted for through the divergence cleaning procedure of Dedner et al. \cite{Dedneretal}. To validate our high order method we first solve some numerical test cases for which exact analytical reference solutions are known, and we also show numerical convergence studies in the stiff limit of the RRMHD equations using $PNM$ schemes from third to fifth order of accuracy in space and time. We also present some applications with shock waves, such as a classical shock tube problem with different values for the conductivity, as well as a relativistic MHD rotor problem and the relativistic equivalent of the Orszag-Tang vortex problem. We have verified that the proposed method handles the resistive regime and the stiff limit of ideal relativistic MHD equally well. For these reasons it provides a powerful tool for relativistic astrophysical simulations involving the appearance of magnetic reconnection.
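Why a source term that becomes stiff at low resistivity calls for an implicit (here, space-time) treatment can be seen on a scalar toy problem. The sketch below is not the scheme proposed above and not the RRMHD system: it integrates the hypothetical relaxation equation dy/dt = -(y - y_eq) / eta, where eta stands in for the resistivity, with forward (explicit) and backward (implicit) Euler steps.

```python
def forward_euler(y0, y_eq, eta, dt, steps):
    """Explicit update; stable only while dt / eta <= 2."""
    y = y0
    for _ in range(steps):
        y = y - dt / eta * (y - y_eq)
    return y


def backward_euler(y0, y_eq, eta, dt, steps):
    """Implicit update, solved in closed form; stable for any dt > 0."""
    y = y0
    for _ in range(steps):
        y = y_eq + (y - y_eq) / (1.0 + dt / eta)
    return y


# Stiff regime: the relaxation time eta is 100 times smaller than the step.
explicit_result = forward_euler(1.0, 0.0, eta=1e-4, dt=1e-2, steps=10)
implicit_result = backward_euler(1.0, 0.0, eta=1e-4, dt=1e-2, steps=10)
```

With a time step set by the hyperbolic part rather than by eta, the explicit update amplifies the error by a factor of 99 per step in this example, while the implicit update relaxes to the equilibrium value.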