Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

An Efficient ADER-DG Local Time Stepping Scheme for 3D HPC Simulation of Seismic Waves in Poroelastic Media

111 0 0.0 ( 0 )

Download Cite

Added by Sebastian Wolf

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Sebastian Wolf - Martin Galis - Carsten Uphoff

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Many applications from geosciences require simulations of seismic waves in porous media. Biots theory of poroelasticity describes the coupling between solid and fluid phases and introduces a stiff source term, thereby increasing computational cost and motivating efficient methods utilising High-Performance Computing. We present a novel realisation of the discontinuous Galerkin scheme with Arbitrary DERivative time stepping (ADER-DG) that copes with stiff source terms. To integrate this source term with a reasonable time step size, we use an element-local space-time predictor, which needs to solve medium-sized linear systems - with 1000 to 10000 unknowns - in each element update (i.e., billions of times). We present a novel block-wise back-substitution algorithm for solving these systems efficiently. In comparison to LU decomposition, we reduce the number of floating-point operations by a factor of up to 25. The block-wise back-substitution is mapped to a sequence of small matrix-matrix multiplications, for which code generators are available to generate highly optimised code. We verify the new solver thoroughly in problems of increasing complexity. We demonstrate high-order convergence for 3D problems. We verify the correct treatment of point sources, material interfaces and traction-free boundary conditions. In addition, we compare against a finite difference code for a newly defined layer over half-space problem. We find that extremely high accuracy is required to resolve the slow P-wave at a free surface, while solid particle velocities are not affected by coarser resolutions. By using a clustered local time stepping scheme, we reduce time to solution by a factor of 6 to 10 compared to global time stepping. We conclude our study with a scaling and performance analysis, demonstrating our implementations efficiency and its potential for extreme-scale simulations.

rate research

An Efficient Vectorization Scheme for Stencil Computation

107 - Kun Li , Liang Yuan , Yunquan Zhang 2021

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data locality respectively. In this paper, the downsides of existing vectorization schemes are analyzed. Briefly, they either incur data alignment conflicts or hurt the data locality when integrated with tiling. Then we propose a novel transpose layout to preserve the data locality for tiling and reduce the data reorganization overhead for vectorization simultaneously. To further improve the data reuse at the register level, a time loop unroll-and-jam strategy is designed to perform multistep stencil computation along the time dimension. Experimental results on the AVX-2 and AVX-512 CPUs show that our approach obtains a competitive performance.

Distributed Parallel and Cluster Computing

Software Abstractions and Methodologies for HPC Simulation Codes on Future Architectures

417 - A. Dubey , S. Brandt , R. Brower 2013

Large, complex, multi-scale, multi-physics simulation codes, running on high performance com-puting (HPC) platforms, have become essential to advancing science and engineering. These codes simulate multi-scale, multi-physics phenomena with unprecedented fidelity on petascale platforms, and are used by large communities. Continued ability of these codes to run on future platforms is as crucial to their communities as continued improvements in instruments and facilities are to experimental scientists. However, the ability of code developers to do these things faces a serious challenge with the paradigm shift underway in platform architecture. The complexity and uncertainty of the future platforms makes it essential to approach this challenge cooperatively as a community. We need to develop common abstractions, frameworks, programming models and software development methodologies that can be applied across a broad range of complex simulation codes, and common software infrastructure to support them. In this position paper we express and discuss our belief that such an infrastructure is critical to the deployment of existing and new large, multi-scale, multi-physics codes on future HPC platforms.

Computational Engineering Mathematical Software Software Engineering

A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW

398 - Mario Berljafa Department ofn Mathematics 2013

In one of the most important methods in Density Functional Theory - the Full-Potential Linearized Augmented Plane Wave (FLAPW) method - dense generalized eigenproblems are organized in long sequences. Moreover each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.

Distributed Parallel and Cluster Computing Mathematical Software Computational Physics

GPU Algorithms for Efficient Exascale Discretizations

272 - Ahmad Abdelfattah , Valeria Barra , Natalie Beams 2021

In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.

Distributed Parallel and Cluster Computing Mathematical Software Numerical Analysis

Discontinuous Galerkin methods on graphics processing units for nonlinear hyperbolic conservation laws

115 - Martin Fuhry , Andrew Giuliani , Lilia Krivodonova 2016

We present a novel implementation of the modal discontinuous Galerkin (DG) method for hyperbolic conservation laws in two dimensions on graphics processing units (GPUs) using NVIDIAs Compute Unified Device Architecture (CUDA). Both flexible and highly accurate, DG methods accommodate parallel architectures well as their discontinuous nature produces element-local approximations. High performance scientific computing suits GPUs well, as these powerful, massively parallel, cost-effective devices have recently included support for double-precision floating point numbers. Computed examples for Euler equations over unstructured triangle meshes demonstrate the effectiveness of our implementation on an NVIDIA GTX 580 device. Profiling of our method reveals performance comparable to an existing nodal DG-GPU implementation for linear problems.

Distributed Parallel and Cluster Computing Mathematical Software Computational Physics

comments

Fetching comments

Wadi International University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

An Efficient ADER-DG Local Time Stepping Scheme for 3D HPC Simulation of Seismic Waves in Poroelastic Media

Ask ChatGPT about the research

No Arabic abstract

Read More