
Initial Guesses for Sequences of Linear Systems in a GPU-Accelerated Incompressible Flow Solver

Added by Anthony Austin
Publication date: 2020
Language: English





We consider several methods for generating initial guesses when iteratively solving sequences of linear systems, showing that they can be implemented efficiently in GPU-accelerated PDE solvers, specifically solvers for incompressible flow. We propose new initial guess methods based on stabilized polynomial extrapolation and compare them to the projection method of Fischer [15], showing that they are generally competitive with projection schemes despite requiring only half the storage and performing considerably less data movement and communication. Our implementations of these algorithms are freely available as part of the libParanumal collection of GPU-accelerated flow solvers.
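
To make the idea of an extrapolated initial guess concrete, here is a minimal sketch in plain Python. It is not the libParanumal implementation. It assumes equally spaced time steps and reads "stabilized" as a least-squares polynomial fit that uses more stored solutions than the polynomial degree strictly needs; that reading, and the names `extrapolation_weights` and `initial_guess`, are assumptions made for illustration. The guess is a fixed weighted sum of the stored solutions, so no operator applications or inner products with the new right-hand side are needed.

```python
def solve_small(A, b):
    """Solve the small dense system A y = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y


def extrapolation_weights(m, degree):
    """Weights c_j (oldest solution first) such that sum_j c_j * x_j is the
    least-squares polynomial of the given degree through the last m
    solutions, evaluated at the next time step."""
    if degree >= m:
        raise ValueError("need more stored solutions than the degree")
    # Hypothetical time axis: stored steps at t = -m, ..., -1; next step at t = 0.
    times = [float(j - m) for j in range(m)]
    V = [[t ** k for k in range(degree + 1)] for t in times]  # Vandermonde
    # Normal-equations matrix G = V^T V (small: (degree+1) x (degree+1)).
    G = [[sum(V[j][a] * V[j][b] for j in range(m))
          for b in range(degree + 1)] for a in range(degree + 1)]
    e0 = [1.0] + [0.0] * degree  # monomials 1, t, t^2, ... evaluated at t = 0
    y = solve_small(G, e0)
    return [sum(V[j][k] * y[k] for k in range(degree + 1)) for j in range(m)]


def initial_guess(history, degree):
    """Combine previous solution vectors (oldest first) into an initial guess."""
    weights = extrapolation_weights(len(history), degree)
    n = len(history[0])
    return [sum(w * x[i] for w, x in zip(weights, history)) for i in range(n)]
```

With two stored solutions and degree 1, the weights reduce to ordinary linear extrapolation, 2x_{n-1} - x_{n-2}. Because the weights depend only on the history length and the degree, they can be computed once and reused at every time step.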




Read More

The linear equations that arise in interior methods for constrained optimization are sparse, symmetric, and indefinite, and they become extremely ill-conditioned as the interior method converges. These linear systems present a challenge for existing solver frameworks based on sparse LU or LDL^T decompositions. We benchmark five well-known direct linear solver packages using matrices extracted from power grid optimization problems. The achieved solution accuracy varies greatly among the packages. None of the tested packages delivers significant GPU acceleration for our test cases.
R. Eymard, 2020
In this paper, we present a class of finite volume schemes for incompressible flow problems. The unknowns are collocated at the center of the control volumes, and the stability of the schemes is obtained by adding to the mass balance stabilization terms involving the pressure jumps across the edges of the mesh.
We propose an efficient, accurate and robust implicit solver for the incompressible Navier-Stokes equations, based on a DG spatial discretization and on the TR-BDF2 method for time discretization. The effectiveness of the method is demonstrated in a number of classical benchmarks, which highlight its superior efficiency with respect to other widely used implicit approaches. The parallel implementation of the proposed method in the framework of the deal.II software package allows for accurate and efficient adaptive simulations in complex geometries, which makes the proposed solver attractive for large scale industrial applications.
We develop and use a novel mixed-precision weighted essentially non-oscillatory (WENO) method for solving the Teukolsky equation, which arises when modeling perturbations of Kerr black holes. We show that WENO methods outperform the higher-order finite-difference methods that are standard in the discretization of the Teukolsky equation, because the latter require added dissipation for stability. In particular, as the WENO scheme uses no additional dissipation, it is well-suited for scenarios requiring long-time evolution, such as the study of Price tails and gravitational wave emission from extreme mass ratio binaries. In the mixed-precision approach, the expensive computation of the WENO weights is performed in reduced floating-point precision, which results in a significant speedup factor of 3.3. In addition, we use state-of-the-art Nvidia general-purpose graphics processing units and cluster parallelism to further accelerate the WENO computations. Our optimized WENO solver can be used to quickly generate accurate results of significance in the field of black hole and gravitational wave physics. We apply our solver to study the behavior of the Aretakis charge, a conserved quantity that, if detected by a gravitational wave observatory such as LIGO/Virgo, would prove the existence of extremal black holes.
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.