New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

167 0 0.0 ( 0 )

Download Cite

Added by Gustavo Ch\\'avez

Publication date 2017

fields

and research's language is English

Authors Gustavo Chavez - George Turkiyyah - Stefano Zampini

Analysis of PDEs Numerical Analysis

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast direct solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has $O(k~N log N~(log N + k^2))$ arithmetic complexity and $O(k~N log N)$ memory footprint, where $N$ is the number of degrees of freedom and $k$ is the rank of a typical off-diagonal block, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method where hierarchical semi-separable matrices are used for compressing the fronts, and with algebraic multigrid. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and memory footprint. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides.

rate research

A distributed-memory hierarchical solver for general sparse linear systems

190 - Chao Chen , Hadi Pouransari , Sivasankaran Rajamanickam 2017

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.

Numerical Analysis Mathematical Software Numerical Analysis

Initial Guesses for Sequences of Linear Systems in a GPU-Accelerated Incompressible Flow Solver

106 - Anthony P. Austin , Noel Chalmers , Tim Warburton 2020

We consider several methods for generating initial guesses when iteratively solving sequences of linear systems, showing that they can be implemented efficiently in GPU-accelerated PDE solvers, specifically solvers for incompressible flow. We propose new initial guess methods based on stabilized polynomial extrapolation and compare them to the projection method of Fischer [15], showing that they are generally competitive with projection schemes despite requiring only half the storage and performing considerably less data movement and communication. Our implementations of these algorithms are freely available as part of the libParanumal collection of GPU-accelerated flow solvers.

Numerical Analysis Numerical Analysis

A numerical framework for computing steady states of size-structured population models and their stability

59 - Inom Mirzaev , David M. Bortz 2016

Structured population models are a class of general evolution equations which are widely used in the study of biological systems. Many theoretical methods are available for establishing existence and stability of steady states of general evolution equations. However, except for very special cases, finding an analytical form of stationary solutions for evolution equations is a challenging task. In the present paper, we develop a numerical framework for computing approximations to stationary solutions of general evolution equations, which can also be used to produce existence and stability regions for steady states. In particular, we use the Trotter-Kato Theorem to approximate the infinitesimal generator of an evolution equation on a finite dimensional space, which in turn reduces the evolution equation into a system of ordinary differential equations. Consequently, we approximate and study the asymptotic behavior of stationary solutions. We illustrate the convergence of our numerical framework by applying it to a linear Sinko-Streifer structured population model for which the exact form of the steady state is known. To further illustrate the utility of our approach, we apply our framework to nonlinear population balance equation, which is an extension of well-known Smoluchowksi coagulation-fragmentation model to biological populations. We also demonstrate that our numerical framework can be used to gain insight about the theoretical stability of the stationary solutions of the evolution equations. Furthermore, the open source Python program that we have developed for our numerical simulations is freely available from our Github repository (github.com/MathBioCU).

Analysis of PDEs Numerical Analysis Populations and Evolution

Analysis of a Population Model Structured by the Cells Molecular Content

157 - Marie Doumic Jauffret 2008

We study the mathematical properties of a general model of cell division structured with several internal variables. We begin with a simpler and specific model with two variables, we solve the eigenvalue problem with strong or weak assumptions, and deduce from it the long-time convergence. The main difficulty comes from natural degeneracy of birth terms that we overcome with a regularization technique. We then extend the results to the case with several parameters and recall the link between this simplified model and the one presented in cite{CBBP1}; an application to the non-linear problem is also given, leading to robust subpolynomial growth of the total population.

Analysis of PDEs Numerical Analysis

CLAIRE: A distributed-memory solver for constrained large deformation diffeomorphic image registration

116 - Andreas Mang , Amir Gholami , Christos Davatzikos 2018

With this work, we release CLAIRE, a distributed-memory implementation of an effective solver for constrained large deformation diffeomorphic image registration problems in three dimensions. We consider an optimal control formulation. We invert for a stationary velocity field that parameterizes the deformation map. Our solver is based on a globalized, preconditioned, inexact reduced space Gauss--Newton--Krylov scheme. We exploit state-of-the-art techniques in scientific computing to develop an effective solver that scales to thousands of distributed memory nodes on high-end clusters. We present the formulation, discuss algorithmic features, describe the software package, and introduce an improved preconditioner for the reduced space Hessian to speed up the convergence of our solver. We test registration performance on synthetic and real data. We demonstrate registration accuracy on several neuroimaging datasets. We compare the performance of our scheme against different flavors of the Demons algorithm for diffeomorphic image registration. We study convergence of our preconditioner and our overall algorithm. We report scalability results on state-of-the-art supercomputing platforms. We demonstrate that we can solve registration problems for clinically relevant data sizes in two to four minutes on a standard compute node with 20 cores, attaining excellent data fidelity. With the present work we achieve a speedup of (on average) 5$times$ with a peak performance of up to 17$times$ compared to our former work.

Optimization and Control Computer Vision and Pattern Recognition Distributed Parallel and Cluster Computing

comments

Fetching comments

Higher Institute of Business Administration

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

Ask ChatGPT about the research

No Arabic abstract

Read More