The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi process for nonsymmetric matrices. The underlying orthogonalization scheme is left-looking and processes one column at a time, so at least one global reduction is required per iteration. The traditional algorithm for generating the orthogonal Krylov basis vectors in the Krylov-Schur algorithm is classical Gram-Schmidt applied twice with reorthogonalization (CGS2), requiring three global reductions per step. A new variant of CGS2 that requires only one reduction per iteration is applied to the Arnoldi-QR iteration. Strong-scaling results are presented for computing eigenpairs of nonsymmetric matrices. A preliminary attempt to derive a similar algorithm (one reduction per Arnoldi iteration with a robust orthogonalization scheme) was presented by Hernandez et al. (2007). Unlike our approach, their method is not forward stable for eigenvalues.
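To make the reduction count concrete, the sketch below shows one Arnoldi step orthogonalized with standard CGS2: in a distributed-memory setting each `V.T @ w` product and the final norm requires a separate global reduction, and it is these three synchronizations that the one-reduction variant fuses (by techniques such as lagged normalization, which the sketch does not reproduce). This is a generic illustration with our own variable names, not the authors' implementation.

```python
# Minimal sketch of one Arnoldi step with CGS2 (classical Gram-Schmidt
# applied twice). The three commented reductions are the synchronization
# points that the low-synchronization variant combines into one.
import numpy as np

def arnoldi_step_cgs2(A, V, j):
    """Orthogonalize A @ V[:, j] against the orthonormal columns V[:, :j+1].

    Returns the next basis vector and the (j+2)-entry Hessenberg column.
    """
    w = A @ V[:, j]
    h = V[:, :j+1].T @ w           # reduction 1: first projection
    w = w - V[:, :j+1] @ h
    c = V[:, :j+1].T @ w           # reduction 2: reorthogonalization pass
    w = w - V[:, :j+1] @ c
    h = h + c
    beta = np.linalg.norm(w)       # reduction 3: normalization
    return w / beta, np.append(h, beta)
```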
This paper introduces new solvers for the computation of low-rank approximate solutions to large-scale linear problems, with a particular focus on the regularization of linear inverse problems. Although Krylov methods incorporating explicit projections onto low-rank subspaces are already used for well-posed systems that arise from discretizing stochastic or time-dependent PDEs, we are mainly concerned with algorithms that solve the so-called nuclear norm regularized problem, where a suitable nuclear norm penalization of the solution is imposed alongside a fit-to-data term expressed in the 2-norm: this has the effect of implicitly enforcing low-rank solutions. By adopting an iteratively reweighted norm approach, the nuclear norm regularized problem is reformulated as a sequence of quadratic problems, which can then be efficiently solved using Krylov methods, giving rise to an inner-outer iteration scheme. Our approach differs from the other solvers available in the literature in that: (a) Kronecker product properties are exploited to define the reweighted 2-norm penalization terms; (b) efficient preconditioned Krylov methods replace gradient (projection) methods; (c) the regularization parameter can be efficiently and adaptively set along the iterations. Furthermore, we reformulate both the new inner-outer methods for nuclear norm regularization and some of the existing Krylov methods incorporating low-rank projections within the framework of flexible Krylov methods. This results in an even more computationally efficient (but heuristic) strategy that does not rely on an inner-outer iteration scheme. Numerical experiments show that our new solvers are competitive with other state-of-the-art solvers for low-rank problems, and deliver reconstructions of higher quality than classical Krylov methods.
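A heavily simplified sketch of one such inner-outer scheme follows, under our own assumptions: dense matrices, plain LSQR as the inner Krylov solver, a fixed regularization parameter, and the standard IRN reweighting W_k = (X_k X_k^T + eps*I)^(-1/2). The paper's preconditioning and adaptive parameter choice are not reproduced; only the Kronecker-product form of the reweighted penalty is illustrated.

```python
# Illustrative inner-outer IRN loop for nuclear-norm regularization:
# at each outer step the penalty ||X||_* is replaced by the quadratic
# ||W_k^(1/2) X||_F^2, and the resulting Tikhonov-type problem is
# solved with a Krylov method on the Kronecker-vectorized system.
import numpy as np
from scipy.linalg import sqrtm, inv
from scipy.sparse.linalg import lsqr

def irn_nuclear(A, b, shape, lam=1e-2, eps=1e-3, outer=10):
    """A maps vec(X) (column-major) to data; X is m-by-n."""
    m, n = shape
    X = np.zeros((m, n))
    for _ in range(outer):
        W = np.real(inv(sqrtm(X @ X.T + eps * np.eye(m))))  # reweighting matrix
        Whalf = np.real(sqrtm(W))
        L = np.kron(np.eye(n), Whalf)        # vec(W^(1/2) X) = (I_n ⊗ W^(1/2)) vec(X)
        M = np.vstack([A, np.sqrt(lam) * L])  # stacked quadratic (Tikhonov) system
        rhs = np.concatenate([b, np.zeros(m * n)])
        x = lsqr(M, rhs)[0]                   # inner Krylov solve
        X = x.reshape(m, n, order="F")
    return X
```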
The numerical integration of an analytical function $f(x)$ using a finite set of equidistant points can be performed with quadrature formulas such as the Newton-Cotes formulas. Unlike Gaussian quadrature formulas, however, higher-order Newton-Cotes formulas are not stable, which limits the usable order of such formulas. Existing work has shown that, by using orthogonal polynomials, stable high-order quadrature formulas with equidistant points can be constructed. We improve upon this work by using (orthogonal) Gram polynomials and deriving an iterative algorithm, which together allow us to reduce the space complexity of the original algorithm significantly.
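To illustrate the ingredients, the sketch below builds the discrete orthonormal (Gram) polynomials on equidistant nodes by a Stieltjes-type three-term recurrence and assembles quadrature weights by integrating the truncated least-squares expansion. It works in the monomial basis, which is itself ill-conditioned at very high degree, and it does not reproduce the paper's space-complexity reduction; it is a conceptual illustration only.

```python
# Quadrature weights w on m equidistant nodes such that
#   integral_{-1}^{1} f(x) dx  ≈  w @ f(x),
# exact for polynomials up to degree d (require d < m).
import numpy as np
from numpy.polynomial import Polynomial as P

def gram_quadrature_weights(m, d):
    x = np.linspace(-1.0, 1.0, m)            # equidistant nodes
    t = P([0.0, 1.0])
    p_prev, p, b = P([0.0]), P([1.0 / np.sqrt(m)]), 0.0
    w = np.zeros(m)
    for k in range(d + 1):
        q = p.integ()
        w += (q(1.0) - q(-1.0)) * p(x)       # mu_k * p_k(x_j), mu_k = exact integral
        if k == d:
            break
        a = np.dot(x * p(x), p(x))           # recurrence coefficients from the
        r = t * p - a * p - b * p_prev       # discrete inner product on the nodes
        b = np.sqrt(np.dot(r(x), r(x)))
        p_prev, p = p, r / b
    return x, w
```

For example, with `x, w = gram_quadrature_weights(32, 10)`, the value `w @ np.exp(x)` approximates `np.e - 1/np.e` closely, whereas a degree-31 Newton-Cotes rule on the same nodes would be unusable.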
Many Krylov subspace methods for shifted linear systems take advantage of the invariance of the Krylov subspace under a shift of the matrix. However, exploiting this fact in the non-Hermitian case introduces restrictions; e.g., initial residuals must be collinear, and this collinearity must be maintained at restart. Thus we cannot simultaneously solve shifted systems with unrelated right-hand sides using this strategy, nor can all shifted residuals be simultaneously minimized over a Krylov subspace while maintaining collinearity. It has been shown that this renders such methods generally incompatible with techniques of subspace recycling [Soodhalter et al., APNUM, 2014]. This problem, however, can be overcome. By interpreting a family of shifted systems as one Sylvester equation, we can take advantage of the known shift invariance of the Krylov subspace generated by the Sylvester operator. Thus we can simultaneously solve all systems over one block Krylov subspace using FOM- or GMRES-type methods, even when they have unrelated right-hand sides. Because residual collinearity is no longer a requirement at restart, these methods are fully compatible with subspace recycling techniques. Furthermore, we realize the benefits of block sparse matrix operations, which arise in the context of high-performance computing applications. In this paper, we discuss how exploiting this Sylvester-equation point of view yields methods for shifted systems that are compatible with unrelated right-hand sides. From this, we propose a recycled GMRES method for the simultaneous solution of shifted systems. Numerical experiments demonstrate the effectiveness of the methods.
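The Sylvester reformulation can be stated and checked in a few lines: with D = diag(sigma_1, ..., sigma_L) and B = [b_1, ..., b_L], the family (A - sigma_i I) x_i = b_i is exactly A X - X D = B. The snippet below is a self-contained numerical check of this equivalence (random data, direct solves), not the proposed block or recycled GMRES solver.

```python
# Verify that L shifted systems with unrelated right-hand sides are one
# Sylvester equation A X - X D = B, the operator whose block Krylov
# subspace is shared by all shifts.
import numpy as np

rng = np.random.default_rng(0)
n, L = 50, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, L))              # unrelated right-hand sides
sigma = np.array([0.5, -1.0, 2.0])
D = np.diag(sigma)

# Solve each shifted system separately, then check the Sylvester form.
X = np.column_stack([np.linalg.solve(A - s * np.eye(n), B[:, i])
                     for i, s in enumerate(sigma)])
print(np.linalg.norm(A @ X - X @ D - B))     # ~ machine precision
```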
An approach is given for solving large linear systems that combines Krylov methods with the use of two different grid levels. Eigenvectors are computed on the coarse grid and used to deflate eigenvalues on the fine grid. GMRES-type methods are first used on both the coarse and fine grids. Then another approach is given that uses a restarted BiCGStab (or IDR) method on the fine grid. While BiCGStab is generally considered to be a non-restarted method, it works well in this context with deflation and restarting. Tests show that this new approach can be very efficient for difficult linear systems.
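A minimal sketch of the deflation step follows, assuming the coarse-grid eigenvectors have already been interpolated to the fine grid as the columns of Z. The projector algebra below is a standard deflation-by-projection construction (and the names are ours), not necessarily the exact variant used in the paper.

```python
# Deflated GMRES: project the near-null space span(Z) out of the
# fine-grid iteration via the small Galerkin matrix E = Z^T A Z.
import numpy as np
from scipy.sparse.linalg import gmres, LinearOperator

def deflated_gmres(A, b, Z):
    """Solve A x = b with span(Z) deflated; Z holds prolonged coarse eigenvectors."""
    E = Z.T @ (A @ Z)                              # small Galerkin matrix
    Q = lambda r: Z @ np.linalg.solve(E, Z.T @ r)  # coarse-space correction
    P = lambda r: r - A @ Q(r)                     # deflation projector P = I - A Q
    n = len(b)
    op = LinearOperator((n, n), matvec=lambda v: P(A @ v))
    y, info = gmres(op, P(b))                      # deflated fine-grid solve
    return Q(b) + y - Q(A @ y)                     # recombine: x = Q b + (I - Q A) y
```

One can check algebraically that the returned x satisfies A x = b whenever the inner solve drives the deflated residual to zero, since A x = A Q b + P A y = A Q b + P b = b.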
Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for Algebraic Multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of the traditional GS relaxation based on a triangular solve is compared with two-stage variants, which replace the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When they perform the same number of flops for SpMV (e.g., three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g., elasticity), it was found that JR may diverge with a damping factor of one, whereas two-stage GS may improve the convergence with more inner iterations. Finally, to study the performance of the two-stage smoother and preconditioner for a practical problem, these were applied to incompressible fluid flow simulations on GPUs.
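As an illustration of the two-stage idea (a dense sketch for readability; the study targets sparse GPU kernels, and the names below are ours), one GS sweep with the exact triangular solve replaced by inner JR iterations might look like:

```python
# Two-stage Gauss-Seidel (GS2): the forward sweep needs z = L^{-1} r with
# L the lower-triangular part of A (including the diagonal D). Instead of
# a sequential triangular solve, run a fixed number of Jacobi-Richardson
# iterations z <- z + D^{-1}(r - L z), which use only SpMV-like kernels.
import numpy as np

def gs2_sweep(A, x, b, inner=1):
    """One forward GS sweep with an inexact (two-stage) triangular solve."""
    L = np.tril(A)                   # lower-triangular part incl. diagonal
    d = np.diag(A)
    r = b - A @ x
    z = r / d                        # Jacobi initial guess for L z = r
    for _ in range(inner):           # inner JR iterations (fixed count)
        z = z + (r - L @ z) / d
    return x + z
```

Because the JR iteration matrix for the triangular system is strictly lower triangular (hence nilpotent), the inner iteration recovers the exact GS sweep after at most n steps, so a small fixed `inner` trades a little convergence for much better parallelism.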