No Arabic abstract
We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). Concretely, in the first case we revisit the reduction to symmetric band form while, for the second case, we propose a similar alternative, which transforms the original matrix to (unsymmetric) band form, replacing the conventional reduction method that produces a triangular--band output. In both cases, we describe algorithmic variants of the standard Level-3 BLAS-based procedures, enhanced with look-ahead, to overcome the performance bottleneck imposed by the panel factorization. Furthermore, our solutions employ an algorithmic block size that differs from the target bandwidth, illustrating the important performance benefits of this decision. Finally, we show that our alternative compact band form for the SVD is key to introduce an effective look-ahead strategy into the corresponding reduction procedure.
In this paper, a parallel structured divide-and-conquer (PSDC) eigensolver is proposed for symmetric tridiagonal matrices based on ScaLAPACK and a parallel structured matrix multiplication algorithm, called PSMMA. Computing the eigenvectors via matrix-matrix multiplications is the most computationally expensive part of the divide-and-conquer algorithm, and one of the matrices involved in such multiplications is a rank-structured Cauchy-like matrix. By exploiting this particular property, PSMMA constructs the local matrices by using generators of Cauchy-like matrices without any communication, and further reduces the computation costs by using a structured low-rank approximation algorithm. Thus, both the communication and computation costs are reduced. Experimental results show that both PSMMA and PSDC are highly scalable and scale to 4096 processes at least. PSDC has better scalability than PHDC that was proposed in [J. Comput. Appl. Math. 344 (2018) 512--520] and only scaled to 300 processes for the same matrices. Comparing with texttt{PDSTEDC} in ScaLAPACK, PSDC is always faster and achieves $1.4$x--$1.6$x speedup for some matrices with few deflations. PSDC is also comparable with ELPA, with PSDC being faster than ELPA when using few processes and a little slower when using many processes.
In many scientific applications the solution of non-linear differential equations are obtained through the set-up and solution of a number of successive eigenproblems. These eigenproblems can be regarded as a sequence whenever the solution of one problem fosters the initialization of the next. In addition, in some eigenproblem sequences there is a connection between the solutions of adjacent eigenproblems. Whenever it is possible to unravel the existence of such a connection, the eigenproblem sequence is said to be correlated. When facing with a sequence of correlated eigenproblems the current strategy amounts to solving each eigenproblem in isolation. We propose a alternative approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials (ChFSI). The resulting eigensolver is optimized by minimizing the number of matrix-vector multiplications and parallelized using the Elemental library framework. Numerical results show that ChFSI achieves excellent scalability and is competitive with current dense linear algebra parallel eigensolvers.
SLEPc is a parallel library for the solution of various types of large-scale eigenvalue problems. In the last years we have been developing a module within SLEPc, called NEP, that is intended for solving nonlinear eigenvalue problems. These problems can be defined by means of a matrix-valued function that depends nonlinearly on a single scalar parameter. We do not consider the particular case of polynomial eigenvalue problems (which are implemented in a different module in SLEPc) and focus here on rational eigenvalue problems and other general nonlinear eigenproblems involving square roots or any other nonlinear function. The paper discusses how the NEP module has been designed to fit the needs of applications and provides a description of the available solvers, including some implementation details such as parallelization. Several test problems coming from real applications are used to evaluate the performance and reliability of the solvers.
Some numerical algorithms for elliptic eigenvalue problems are proposed, analyzed, and numerically tested. The methods combine advantages of the two-grid algorithm, two-space method, the shifted inverse power method, and the polynomial preserving recovery technique . Our new algorithms compare favorably with some existing methods and enjoy superconvergence property.
We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.