In research problems that involve the numerical solution of systems of ordinary differential equations (ODEs), one often needs to select the most efficient method for a particular problem. To solve a Cauchy problem for a system of ODEs, Runge-Kutta methods (explicit or implicit, with or without step-size control, and so on) are employed. This requires searching through many implementations of the numerical method and selecting the coefficients or other parameters of its numerical scheme. This paper proposes a library and scripts for the automated generation of routine functions in the Julia programming language for a set of numerical schemes of Runge-Kutta methods. For symbolic manipulations, we use a template substitution tool. The proposed approach to automated code generation allows us to edit a single template instead of modifying each individual function to be compared. On the one hand, this makes the implementation of a numerical scheme uniform; on the other hand, it minimizes the number of errors introduced when modifying the compared implementations of the numerical method. We consider Runge-Kutta methods without step-size control, embedded methods with step-size control, and Rosenbrock methods with step-size control. The program codes for the numerical schemes generated automatically with the proposed library are verified by the numerical solution of several well-known test problems.
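To illustrate the template-substitution idea, the following Python sketch renders Runge-Kutta stepper functions from a single template parameterized by a Butcher tableau. It is only an analogy: the paper's library emits Julia code, and all names here (RK_STEP_TEMPLATE, generate_explicit_rk) are our own illustrative choices, not the library's API.

# Minimal sketch of template-based generation of Runge-Kutta steppers.
# The paper's library targets Julia; Python's string.Template stands in
# for the template-substitution tool. All names are illustrative.
from string import Template

RK_STEP_TEMPLATE = Template('''
def ${name}_step(f, t, y, h):
    """One step of an explicit Runge-Kutta scheme (auto-generated)."""
${stages}
    return y + h * (${combination})
''')

def generate_explicit_rk(name, a, b, c):
    """Render a stepper function from a Butcher tableau (a, b, c)."""
    stages = []
    for i in range(len(b)):
        incr = " + ".join(f"{a[i][j]} * k{j}" for j in range(i) if a[i][j])
        arg = f"y + h * ({incr})" if incr else "y"
        stages.append(f"    k{i} = f(t + {c[i]} * h, {arg})")
    combination = " + ".join(f"{b[i]} * k{i}" for i in range(len(b)) if b[i])
    source = RK_STEP_TEMPLATE.substitute(
        name=name, stages="\n".join(stages), combination=combination)
    namespace = {}
    exec(source, namespace)  # compile the generated stepper function
    return namespace[f"{name}_step"]

# Classic fourth-order scheme generated from its tableau:
a = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1, 0]]
rk4 = generate_explicit_rk("rk4", a, b=[1/6, 1/3, 1/3, 1/6], c=[0, 0.5, 0.5, 1])
print(rk4(lambda t, y: -y, 0.0, 1.0, 0.1))  # ~ exp(-0.1) = 0.904837...

Editing the single template, rather than each generated stepper, is what keeps the compared implementations structurally identical.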
We propose a functional implementation of Multivariate Tower Automatic Differentiation. Our implementation is intended for use in computing the $C^\infty$-structure of an arbitrary Weil algebra, which we discussed in our previous work.
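As a rough illustration of what tower automatic differentiation computes, the following univariate Python sketch propagates truncated Taylor coefficients through arithmetic. The paper's implementation is functional and multivariate, working over arbitrary Weil algebras; the Jet class below is only a simplified stand-in for the core idea.

# Sketch of tower (higher-order) forward-mode AD via truncated Taylor
# series arithmetic. The paper treats the multivariate case through
# Weil algebras; this univariate Jet class only conveys the basic idea.
import math

class Jet:
    """Truncated Taylor coefficients [f(x), f'(x), f''(x)/2!, ...]."""
    def __init__(self, coeffs):
        self.coeffs = list(coeffs)

    def __add__(self, other):
        return Jet(a + b for a, b in zip(self.coeffs, other.coeffs))

    def __mul__(self, other):
        n = len(self.coeffs)
        prod = [0.0] * n
        for i in range(n):              # Cauchy product, truncated at order n
            for j in range(n - i):
                prod[i + j] += self.coeffs[i] * other.coeffs[j]
        return Jet(prod)

def variable(x, order):
    """The coordinate function x, lifted to a jet of the given order."""
    return Jet([x, 1.0] + [0.0] * (order - 1))

# Derivatives of f(x) = x * x * x at x = 2, up to order 3:
x = variable(2.0, order=3)
f = x * x * x
derivs = [c * math.factorial(k) for k, c in enumerate(f.coeffs)]
print(derivs)  # [8.0, 12.0, 12.0, 6.0]  i.e. f, f', f'', f'''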
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We present various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
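The low-rank compression underlying such hierarchical solvers can be sketched in a few lines. The numpy example below compresses a single fill-in-like block with a truncated SVD; it illustrates only this building block, not the paper's distributed algorithm, and the tolerance-based rank choice is one common convention rather than the paper's.

# Key building block of hierarchical solvers: replacing a dense fill-in
# block by a rank-k factorization. This sketch compresses one block with
# a truncated SVD to a requested relative tolerance; illustration only.
import numpy as np

def compress_block(block, tol):
    """Return (U, V) with block ~= U @ V, truncating small singular values."""
    U, s, Vt = np.linalg.svd(block, full_matrices=False)
    # Keep singular values above the tolerance, relative to the largest.
    k = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :k] * s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
# A numerically low-rank block, as fill-in often is for elliptic PDEs:
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))
A += 1e-10 * rng.standard_normal((200, 200))
U, V = compress_block(A, tol=1e-8)
print(U.shape, V.shape)           # (200, 5) (5, 200): large memory saving
print(np.linalg.norm(A - U @ V))  # small approximation error

Tightening the tolerance recovers direct-solver accuracy; loosening it yields the cheaper preconditioner mode mentioned above.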
The linear equations that arise in interior methods for constrained optimization are sparse, symmetric, and indefinite, and they become extremely ill-conditioned as the interior method converges. These linear systems present a challenge for existing solver frameworks based on sparse LU or LDL^T decompositions. We benchmark five well-known direct linear solver packages using matrices extracted from power grid optimization problems. The achieved solution accuracy varies greatly among the packages. None of the tested packages delivers significant GPU acceleration for our test cases.
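The accuracy measurement in such a benchmark can be sketched as follows. scipy's SuperLU interface stands in for the five benchmarked packages, and the small KKT-style saddle-point matrix is synthetic, not one of the power-grid test matrices.

# Sketch of a direct-solver accuracy check: factor a sparse symmetric
# indefinite system and report the relative residual. scipy's SuperLU
# is a stand-in; the saddle-point matrix below is synthetic.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 500
H = sp.diags([2.0, -1.0, -1.0], [0, -1, 1], shape=(n, n))  # SPD block
B = sp.eye(n // 2, n, format="csr")                        # constraint block
# KKT-style indefinite block matrix [[H, B^T], [B, 0]]:
K = sp.bmat([[H, B.T], [B, None]], format="csc")

b = np.ones(K.shape[0])
lu = splu(K)                     # sparse LU factorization
x = lu.solve(b)
residual = np.linalg.norm(K @ x - b) / np.linalg.norm(b)
print(f"relative residual: {residual:.2e}")

On the ill-conditioned matrices produced late in an interior-method run, this residual is precisely where the tested packages diverge in quality.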
Gauss-Seidel (GS) relaxation is often employed as a preconditioner for a Krylov solver or as a smoother for algebraic multigrid (AMG). However, the requisite sparse triangular solve is difficult to parallelize on many-core architectures such as graphics processing units (GPUs). In the present study, the performance of traditional GS relaxation based on a triangular solve is compared with that of two-stage variants, which replace the direct triangular solve with a fixed number of inner Jacobi-Richardson (JR) iterations. When a small number of inner iterations is sufficient to maintain the Krylov convergence rate, the two-stage GS (GS2) often outperforms the traditional algorithm on many-core architectures. We also compare GS2 with JR. When they perform the same number of flops for SpMV (e.g., three JR sweeps compared to two GS sweeps with one inner JR sweep), the GS2 iterations, and the Krylov solver preconditioned with GS2, may converge faster than the JR iterations. Moreover, for some problems (e.g., elasticity), JR may diverge with a damping factor of one, whereas two-stage GS may improve convergence with more inner iterations. Finally, to study their performance on a practical problem, the two-stage smoother and preconditioner were applied to incompressible fluid flow simulations on GPUs.
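A minimal sketch of the two-stage idea, assuming the standard splitting A = L + D + U: the preconditioner application z = (D + L)^{-1} r is replaced by a fixed number of inner JR sweeps, z <- D^{-1}(r - L z), which need only SpMV-like operations and therefore parallelize well on GPUs. The function name gs2_apply and the toy Poisson matrix are our own illustrative choices.

# Illustrative two-stage Gauss-Seidel preconditioner application:
# approximate the lower-triangular solve with inner Jacobi-Richardson
# sweeps instead of a sequential forward substitution.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def gs2_apply(A, r, inner_sweeps=2):
    """Approximate (D + L)^{-1} r with inner Jacobi-Richardson sweeps."""
    D_inv = 1.0 / A.diagonal()
    L = sp.tril(A, k=-1, format="csr")  # strictly lower triangle
    z = D_inv * r                       # first sweep, starting from z = 0
    for _ in range(inner_sweeps - 1):
        z = D_inv * (r - L @ z)         # only SpMV and vector updates
    return z

# 1-D Poisson matrix as a toy symmetric positive definite test:
n = 100
A = sp.diags([2.0, -1.0, -1.0], [0, -1, 1], shape=(n, n), format="csr")
r = np.ones(n)
exact = spsolve_triangular(sp.tril(A, format="csr"), r)
for sweeps in (1, 2, 4):
    err = np.linalg.norm(gs2_apply(A, r, sweeps) - exact)
    print(f"{sweeps} inner sweeps: error vs exact triangular solve {err:.2e}")

Because the inner iteration matrix -D^{-1}L is strictly lower triangular and hence nilpotent, the inner sweeps cannot diverge and steadily approach the exact triangular solve, which is why a small fixed sweep count often preserves the outer Krylov convergence rate.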
The Python package ComCH is a lightweight, specialized computer algebra system that provides models for well-known objects, the surjection and Barratt-Eccles operads, which parameterize the product structure of algebras that are commutative in a derived sense. The primary examples of such algebras treated by ComCH are the cochain complexes of spaces, for which it provides effective constructions of Steenrod cohomology operations at all primes.