No Arabic abstract
In this paper we propose an accurate, highly parallel algorithm for the generalized eigendecomposition of a matrix pair $(H, S)$, given in a factored form $(F^{ast} J F, G^{ast} G)$. Matrices $H$ and $S$ are generally complex and Hermitian, and $S$ is positive definite. This type of matrices emerges from the representation of the Hamiltonian of a quantum mechanical system in terms of an overcomplete set of basis functions. This expansion is part of a class of models within the broad field of Density Functional Theory, which is considered the golden standard in condensed matter physics. The overall algorithm consists of four phases, the second and the fourth being optional, where the two last phases are computation of the generalized hyperbolic SVD of a complex matrix pair $(F,G)$, according to a given matrix $J$ defining the hyperbolic scalar product. If $J = I$, then these two phases compute the GSVD in parallel very accurately and efficiently.
This paper presents the use of element-based algebraic multigrid (AMGe) hierarchies, implemented in the ParELAG (Parallel Element Agglomeration Algebraic Multigrid Upscaling and Solvers) library, to produce multilevel preconditioners and solvers for H(curl) and H(div) formulations. ParELAG constructs hierarchies of compatible nested spaces, forming an exact de Rham sequence on each level. This allows the application of hybrid smoothers on all levels and AMS (Auxiliary-space Maxwell Solver) or ADS (Auxiliary-space Divergence Solver) on the coarsest levels, obtaining complete multigrid cycles. Numerical results are presented, showing the parallel performance of the proposed methods. As a part of the exposition, this paper demonstrates some of the capabilities of ParELAG and outlines some of the components and procedures within the library.
High-precision numerical scheme for nonlinear hyperbolic evolution equations is proposed based on the spectral method. The detail discretization processes are discussed in case of one-dimensional Klein-Gordon equations. In conclusion, a numerical scheme with the order of total calculation cost $O(N log 2N)$ is proposed. As benchmark results, the relation between the numerical precision and the discretization unit size are demonstrated.
Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require double precision accuracy in several domains. This conflict between hardware trends and application needs has resulted in a need for multiprecision strategies at the linear algebra algorithms level if we want to exploit the hardware to its full potential while meeting the accuracy requirements. In this paper, we focus on preconditioned sparse iterative linear solvers, a key kernel in several CSE applications. We present a study of multiprecision strategies for accelerating this kernel on GPUs. We seek the best methods for incorporating multiple precisions into the GMRES linear solver; these include iterative refinement and parallelizable preconditioners. Our work presents strategies to determine when multiprecision GMRES will be effective and to choose parameters for a multiprecision iterative refinement solver to achieve better performance. We use an implementation that is based on the Trilinos library and employs Kokkos Kernels for performance portability of linear algebra kernels. Performance results demonstrate the promise of multiprecision approaches and demonstrate even further improvements are possible by optimizing low-level kernels.
Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-stage variants, replacing the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When they perform the same number of flops for SpMV (e.g. three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g. elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, %(e.g. using tuned damping factors), these were applied to incompressible fluid flow simulations on GPUs.
We introduced the least-squares ReLU neural network (LSNN) method for solving the linear advection-reaction problem with discontinuous solution and showed that the method outperforms mesh-based numerical methods in terms of the number of degrees of freedom. This paper studies the LSNN method for scalar nonlinear hyperbolic conservation law. The method is a discretization of an equivalent least-squares (LS) formulation in the set of neural network functions with the ReLU activation function. Evaluation of the LS functional is done by using numerical integration and conservative finite volume scheme. Numerical results of some test problems show that the method is capable of approximating the discontinuous interface of the underlying problem automatically through the free breaking lines of the ReLU neural network. Moreover, the method does not exhibit the common Gibbs phenomena along the discontinuous interface.