Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of ECP applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modelling. AMReX is a software framework that provides a unified infrastructure with the functionality these and other AMR applications need to run effectively and efficiently on machines ranging from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of different physical processes in complex multi-physics algorithms. AMReX supports algorithms that solve systems of partial differential equations (PDEs) in simple or complex geometries, and those that use particles and/or particle-mesh operations to represent component physical processes. In this paper, we discuss the core elements of the AMReX framework, such as data containers and iterators, as well as several specialized operations that meet the needs of the application projects. In addition, we highlight the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.
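The claim that AMR reduces cost and memory relative to a uniform mesh can be illustrated with a rough cell count. The sketch below is not part of AMReX; the function names, and the assumption that each finer level covers a fixed fraction of the level beneath it, are hypothetical simplifications chosen only for illustration.

```python
def uniform_cells(n_coarse, ratio, levels, dim):
    """Cells needed to cover the whole domain at the finest resolution."""
    cells_per_side = n_coarse * ratio ** (levels - 1)
    return cells_per_side ** dim


def amr_cells(n_coarse, ratio, levels, dim, fraction):
    """Cells in a level hierarchy (hypothetical coverage model).

    Level 0 covers the full domain; each finer level is assumed to
    cover `fraction` of the region covered by the level below it.
    """
    total = 0.0
    covered = 1.0  # share of the domain covered by the current level
    for level in range(levels):
        cells_per_side = n_coarse * ratio ** level
        total += covered * cells_per_side ** dim
        covered *= fraction
    return total


if __name__ == "__main__":
    # Assumed example: 64^3 base grid, refinement ratio 2, three levels,
    # each finer level covering 10% of the level below.
    full = uniform_cells(64, 2, 3, 3)
    adaptive = amr_cells(64, 2, 3, 3, 0.1)
    print(f"uniform: {full} cells, AMR: {adaptive:.0f} cells")
    print(f"reduction factor: {full / adaptive:.1f}")
```

Real savings depend on how much of the domain actually requires the finest resolution, and on overheads (ghost cells, coarse-fine synchronization) that this count ignores.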
Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with accelerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.
In this article, a new unified duality theory is developed for Petrov-Galerkin finite element methods. This novel theory is then used to motivate goal-oriented adaptive mesh refinement strategies for use with discontinuous Petrov-Galerkin (DPG) methods. The focus of this article is mainly on broken ultraweak variational formulations of stationary boundary value problems; however, many of the ideas presented within are general enough that they can be extended to any such well-posed variational formulation. The proposed goal-oriented adaptive mesh refinement procedures require the construction of refinement indicators for both a primal problem and a dual problem. In the DPG context, the primal problem is simply the system of linear equations coming from a standard DPG method and the dual problem is a similar system of equations, coming from a new method which is dual to DPG. This new method has the same coefficient matrix as the associated DPG method but has a different load. We refer to this new finite element method as a DPG* method. A thorough analysis of DPG* methods, as stand-alone finite element methods, is not given here but will be provided in subsequent articles. For DPG methods, the current theory of a posteriori error estimation is reviewed and the reliability estimate in [13, Theorem 2.1] is improved on. For DPG* methods, three different classes of refinement indicators are derived and several contributions are made towards rigorous a posteriori error estimation. At the closure of the article, results of numerical experiments with Poisson's boundary value problem in a three-dimensional domain are provided. These results clearly demonstrate the utility of the goal-oriented adaptive mesh refinement strategies for quantities of interest with either interior or boundary terms.
Computationally solving the equations of elasticity is a key component in many materials science and mechanics simulations. Phenomena such as deformation-induced microstructure evolution, microfracture, and microvoid nucleation are examples of applications for which accurate stress and strain fields are required. A characteristic feature of these simulations is that the problem domain is simple (typically a rectilinear representative volume element (RVE)), but the evolution of internal topological features is extremely complex. Traditionally, the finite element method (FEM) is used for elasticity calculations; FEM is nearly ubiquitous due to (1) its ability to handle meshes of complex geometry using isoparametric elements, and (2) the weak formulation, which eschews the need for computation of second derivatives. However, variable topology problems (e.g. microstructure evolution) require either remeshing or adaptive mesh refinement (AMR), both of which can incur extensive overhead and limit scaling. Block-structured AMR (BSAMR) is a method for adaptive mesh refinement that exhibits good scaling and is well-suited for many problems in materials science. Here, it is shown that the equations of elasticity can be efficiently solved with BSAMR using the finite difference method. The boundary operator method is used to treat different types of boundary conditions, and the reflux-free method is introduced to efficiently and easily treat the coarse-fine boundaries that arise in BSAMR. Examples are presented that demonstrate the use of this method in a variety of cases relevant to materials science: Eshelby inclusions, fracture, and microstructure evolution. Reasonable scaling is demonstrated up to $\sim$4000 processors with tens of millions of grid points, and good AMR efficiency is observed.
Large-scale finite element simulations of complex physical systems governed by partial differential equations crucially depend on adaptive mesh refinement (AMR) to allocate computational budget to regions where higher resolution is required. Existing scalable AMR methods make heuristic refinement decisions based on instantaneous error estimation and thus do not aim for long-term optimality over an entire simulation. We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning (RL) to train refinement policies directly from simulation. AMR poses a new problem for RL in that both the state dimension and the available action set change at every step, which we solve by proposing new policy architectures with differing generality and inductive bias. The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations. We demonstrate in comprehensive experiments on static function estimation and the advection of different fields that RL policies can be competitive with a widely-used error estimator and generalize to larger, more complex, and unseen test problems.
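To make concrete the kind of heuristic, instantaneous-error refinement that the abstract contrasts with learned policies, the following is a minimal one-dimensional sketch. It is not the estimator or the RL method from the paper: the function name `refine_greedy` and the midpoint-deviation indicator are assumptions introduced here for illustration. Note that the number of cells, and hence the set of possible refinement actions, grows at every step, which is the varying state and action dimension the abstract mentions.

```python
import math


def refine_greedy(f, nodes, steps):
    """Greedily bisect the cell with the largest instantaneous indicator.

    Indicator (an assumption for this sketch): the absolute difference
    between f at the cell midpoint and the linear interpolant there.
    """
    nodes = sorted(nodes)
    for _ in range(steps):
        errors = []
        for a, b in zip(nodes, nodes[1:]):
            mid = 0.5 * (a + b)
            errors.append(abs(f(mid) - 0.5 * (f(a) + f(b))))
        # The action set is "which cell to refine"; it grows by one each step.
        worst = max(range(len(errors)), key=errors.__getitem__)
        nodes.insert(worst + 1, 0.5 * (nodes[worst] + nodes[worst + 1]))
    return nodes


if __name__ == "__main__":
    # A steep front at x = 0.5 attracts all refinement.
    front = lambda x: math.tanh(50.0 * (x - 0.5))
    print(refine_greedy(front, [0.0, 0.25, 0.5, 0.75, 1.0], steps=4))
```

Each decision here uses only the current indicator values; nothing in the loop accounts for how the solution will evolve later, which is the limitation the Markov decision process formulation is meant to address.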
Lattice Boltzmann methods are a popular mesoscopic alternative to macroscopic computational fluid dynamics solvers. Many variants have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, or non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of different methods and provides a generic development environment for new schemes as well. A high-level domain-specific language allows the user to formulate, extend, and test various lattice Boltzmann schemes. The method specification is represented in a symbolic intermediate representation. Transformations that operate on this intermediate representation optimize and parallelize the method, yielding highly efficient lattice Boltzmann compute kernels not only for single- and two-relaxation-time schemes but also for multi-relaxation-time, cumulant, and entropically stabilized methods. An integration into the HPC framework waLBerla makes massively parallel, distributed simulations possible, which is demonstrated through scaling experiments on the SuperMUC-NG supercomputing system.