Sparse tiling is a technique to fuse loops that access common data, thus increasing data locality. Unlike traditional loop fusion or blocking, the loops may have different iteration spaces and access shared datasets through indirect memory accesses, such as A[map[i]] -- hence the name sparse. One notable example of such loops arises in discontinuous-Galerkin finite element methods, because of the computation of numerical integrals over different domains (e.g., cells, facets). The major challenge with sparse tiling is implementation -- not only is it cumbersome to understand and synthesize, but it is also onerous to maintain and generalize, as it requires a complete rewrite of the bulk of the numerical computation. In this article, we propose an approach to extend the applicability of sparse tiling based on raising the level of abstraction. Through a sequence of compiler passes, the mathematical specification of a problem is progressively lowered, and eventually sparse-tiled C for-loops are generated. Besides automation, we advance the state of the art by introducing: a revisited, more efficient sparse tiling algorithm; support for distributed-memory parallelism; a range of fine-grained optimizations for increased run-time performance; implementation in a publicly available library, SLOPE; and an in-depth study of the performance impact in Seigen, a real-world elastic wave equation solver for seismological problems, which shows speed-ups of up to 1.28x on a platform consisting of 896 Intel Broadwell cores.
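To make the idea of fusing loops with different iteration spaces and indirect accesses concrete, the following is a minimal sketch and not the algorithm or API of SLOPE. It assumes a hypothetical 1D mesh, a first loop over edges that increments vertex data through an edge-to-vertex map, and a second loop over three-vertex patches that reads the same vertex data. The names `e2v`, `c2v`, `run_unfused`, and `run_sparse_tiled` are illustrative assumptions. The edges are split into tiles, and each patch is assigned to the last tile that writes any vertex it reads, so that running the tiles in order preserves the read-after-write dependency.

```python
def run_unfused(e2v, w, c2v, n_vertices):
    """Reference: run loop 1 (edges) to completion, then loop 2 (patches)."""
    v = [0.0] * n_vertices
    for e, verts in enumerate(e2v):
        for i in verts:
            v[i] += w[e]                      # indirect write: v[e2v[e][k]]
    return [sum(v[i] for i in verts) for verts in c2v]  # indirect reads


def run_sparse_tiled(e2v, w, c2v, n_vertices, tile_size):
    """Sketch of a sparse-tiled schedule for the same two loops."""
    n_tiles = max(1, -(-len(e2v) // tile_size))   # ceiling division
    # Seed tiling: contiguous blocks of loop-1 iterations (edges).
    edge_tile = [e // tile_size for e in range(len(e2v))]
    # For every vertex, the last tile that writes it (0 if never written).
    last_writer = [0] * n_vertices
    for e, verts in enumerate(e2v):
        for i in verts:
            last_writer[i] = max(last_writer[i], edge_tile[e])
    # A patch may only run once every vertex it reads is fully updated.
    cell_tile = [max(last_writer[i] for i in verts) for verts in c2v]

    tile_edges = [[] for _ in range(n_tiles)]
    tile_cells = [[] for _ in range(n_tiles)]
    for e, t in enumerate(edge_tile):
        tile_edges[t].append(e)
    for c, t in enumerate(cell_tile):
        tile_cells[t].append(c)

    v = [0.0] * n_vertices
    out = [0.0] * len(c2v)
    for t in range(n_tiles):                  # tiles run one after another
        for e in tile_edges[t]:               # loop 1, restricted to tile t
            for i in e2v[e]:
                v[i] += w[e]
        for c in tile_cells[t]:               # loop 2, restricted to tile t
            out[c] = sum(v[i] for i in c2v[c])
    return out


# Hypothetical 1D mesh: 9 vertices, 8 edges, 7 overlapping patches.
e2v = [(i, i + 1) for i in range(8)]
c2v = [(c, c + 1, c + 2) for c in range(7)]
w = [float(i + 1) for i in range(8)]
assert run_sparse_tiled(e2v, w, c2v, 9, 3) == run_unfused(e2v, w, c2v, 9)
```

Within a tile, the vertex data written by the edge loop is reused immediately by the patch loop, which is where the locality gain would come from on a real mesh. A production scheme would also have to handle parallel tile execution and longer loop chains, which this sketch ignores.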
Modelling of the mechanical behaviour of pre-stressed fibre-reinforced composites is considered in a geometrically exact setting. A general approach which includes two different reference configurations is employed: one configuration corresponds to the load-free state of the structure and the other to the stress-free state of each material particle. The applicability of the approach is demonstrated in terms of a viscoelastic material model; both the matrix and the fibre are modelled using a multiplicative split of the deformation gradient tensor; a transformation rule for initial conditions is elaborated and specified. Apart from its simplicity, an important advantage of the approach is that well-established numerical algorithms can be used for pre-stressed inelastic structures. The interrelation between the advocated approach and the widely used opening angle approach is clarified. A full-scale FEM simulation confirms the main predictions of the opening angle approach. A locking effect is discovered: in some cases the opening angle of the composite is substantially smaller than the opening angles of its individual layers. Thus, the standard cutting test typically used to analyse pre-stresses does not carry enough information and more refined experimental techniques are needed.
We present a simple mathematical framework and API for parallel mesh and data distribution, load balancing, and overlap generation. It relies on viewing the mesh as a Hasse diagram, abstracting away information such as cell shape, dimension, and coordinates. The high level of abstraction makes our interface both concise and powerful, as the same algorithm applies to any representable mesh, such as hybrid meshes, meshes embedded in higher dimension, and overlapped meshes in parallel. We present evidence, both theoretical and experimental, that the algorithms are scalable and efficient. A working implementation can be found in the latest release of the PETSc libraries.
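As an illustration of viewing a mesh as a Hasse diagram, the sketch below stores only the covering relation between mesh points (cell to edges, edge to vertices) and derives queries from it. It is a toy model and not the PETSc interface: the class `HasseMesh` and its methods are hypothetical names, and the terms cone, support, closure, and star are assumed here as labels for the relation and its transitive traversals. No cell shape, dimension, or coordinate is stored, so the same traversal applies to any mesh expressible this way.

```python
from collections import defaultdict


class HasseMesh:
    """Toy mesh stored as a directed acyclic graph of mesh points."""

    def __init__(self, cone):
        # cone[p] lists the points directly covered by p (e.g. edges of a cell).
        self.cone = {p: list(qs) for p, qs in cone.items()}
        # support[q] lists the points that directly cover q (the dual relation).
        self.support = defaultdict(list)
        for p, qs in self.cone.items():
            for q in qs:
                self.support[q].append(p)

    @staticmethod
    def _reachable(start, relation):
        """All points reachable from start by following relation, plus start."""
        seen, stack = {start}, [start]
        while stack:
            cur = stack.pop()
            for q in relation.get(cur, ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    def closure(self, p):
        """p together with everything on its boundary, transitively."""
        return self._reachable(p, self.cone)

    def star(self, p):
        """p together with everything that contains it, transitively."""
        return self._reachable(p, self.support)


# One triangle: cell 0, edges 1-3, vertices 4-6 (numbering is arbitrary).
mesh = HasseMesh({0: [1, 2, 3], 1: [4, 5], 2: [5, 6], 3: [6, 4]})
print(sorted(mesh.closure(1)))  # edge 1 and its two vertices
print(sorted(mesh.star(5)))     # vertex 5, its edges, and the cell
```

Because the queries depend only on the graph, operations such as distribution or overlap generation can be phrased as graph operations without special cases per cell type.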
In machine learning for fluid mechanics, a fully-connected neural network (FNN) uses only local features for modelling, while a convolutional neural network (CNN) cannot be applied to data on structured/unstructured meshes. To overcome the limitations of FNNs and CNNs, the unstructured convolutional neural network (UCNN) is proposed, which aggregates and effectively exploits the features of neighbour nodes through a weight function. Adjoint vector modelling is taken as the task for studying the performance of UCNN. The mapping function from flow-field features to the adjoint vector is constructed through an efficient parallel implementation on GPU. The modelling capability of UCNN is compared with that of FNN on a validation set and in aerodynamic shape optimization on a test case. The influence of mesh changes on the modelling capability of UCNN is further studied. The results indicate that UCNN is more accurate in the modelling process.
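The neighbour aggregation mentioned above can be pictured with a small sketch. The abstract does not specify the weight function, so the inverse-distance form `1 / (1 + d)` used here is purely an assumption, as are the function name `aggregate` and the adjacency-list layout. The sketch shows only the aggregation step on an unstructured node set, not the UCNN architecture or its GPU implementation.

```python
import math


def aggregate(coords, features, neighbours, weight=lambda d: 1.0 / (1.0 + d)):
    """Weighted average of each node's features with those of its neighbours.

    coords[i]     -- position of node i (tuple of floats)
    features[i]   -- list of feature values at node i
    neighbours[i] -- indices of the mesh neighbours of node i
    weight        -- hypothetical weight as a function of node distance
    """
    out = []
    for i, nbrs in enumerate(neighbours):
        idx = [i] + list(nbrs)                       # include the node itself
        ws = [weight(math.dist(coords[i], coords[j])) for j in idx]
        total = sum(ws)
        out.append([
            sum(wj * features[j][k] for wj, j in zip(ws, idx)) / total
            for k in range(len(features[i]))
        ])
    return out


# Three nodes of a single triangle; node 0 is connected to nodes 1 and 2.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
neighbours = [[1, 2], [0], [0]]
features = [[0.0], [2.0], [2.0]]
print(aggregate(coords, features, neighbours))
```

Unlike a CNN stencil, the number of neighbours may differ from node to node, which is why the weights are normalised per node.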
We introduce an algorithm for the efficient computation of the continuous Haar transform of 2D patterns that can be described by polygons. These patterns are ubiquitous in VLSI processes, where they are used to describe design and mask layouts. There, speed is of paramount importance due to the magnitude of the problems to be solved, and hence very fast algorithms are needed. We show that, by using techniques borrowed from computational geometry, we are able not only to compute the continuous Haar transform directly, but also to do it quickly. This is achieved by massively pruning the transform tree and thus dramatically decreasing the computational load when the number of vertices is small, as is the case for VLSI layouts. We call this new algorithm the pruned continuous Haar transform. We implement this algorithm and show that, for patterns found in VLSI layouts, the proposed algorithm is in the worst case as fast as its discrete counterpart and up to 12 times faster.
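The pruning idea can be illustrated in one dimension, which is a deliberate simplification of the 2D polygon setting and not the authors' algorithm. The sketch assumes the pattern is the indicator function of an interval [a, b) inside [0, 1) and uses the orthonormal Haar wavelets on dyadic intervals. A wavelet whose support lies entirely inside or entirely outside the interval has a zero coefficient, and so do all of its descendants, so that subtree is skipped. The function name `pruned_haar_1d` is hypothetical.

```python
def _overlap(a, b, lo, hi):
    """Length of the intersection of [a, b) and [lo, hi)."""
    return max(0.0, min(b, hi) - max(a, lo))


def pruned_haar_1d(a, b, n_levels):
    """Haar wavelet coefficients of the indicator of [a, b) on [0, 1).

    Returns (coeffs, visited), where coeffs maps (level j, shift k) to the
    nonzero coefficient and visited counts the tree nodes examined.
    """
    coeffs, visited = {}, 0
    stack = [(0, 0)]                       # root wavelet, supported on [0, 1)
    while stack:
        j, k = stack.pop()
        visited += 1
        width = 1.0 / (1 << j)             # support length 2**-j
        lo, hi = k * width, (k + 1) * width
        covered = _overlap(a, b, lo, hi)
        if covered <= 0.0 or covered >= width:
            continue                       # pattern is constant here: prune
        mid = 0.5 * (lo + hi)
        # Wavelet is +2**(j/2) on the left half and -2**(j/2) on the right.
        c = 2.0 ** (j / 2.0) * (_overlap(a, b, lo, mid) - _overlap(a, b, mid, hi))
        if c != 0.0:
            coeffs[(j, k)] = c
        if j + 1 < n_levels:
            stack.append((j + 1, 2 * k))
            stack.append((j + 1, 2 * k + 1))
    return coeffs, visited


coeffs, visited = pruned_haar_1d(0.25, 0.75, 4)
print(coeffs, visited)
```

Only the branches that contain an endpoint of the interval are followed, so the work grows with the number of edges in the pattern rather than with the resolution. This is consistent with the abstract's remark that the gain is largest when the number of vertices is small.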
The advent of huge data volumes in high-performance computing (HPC) applications such as fluid flow simulations usually hinders the interactive processing and exploration of simulation results. Such interactive data exploration allows scientists not only to play with their data but also to visualise huge (distributed) data sets in an efficient and easy way. Therefore, we propose an HPC data exploration service based on a sliding window concept that enables researchers to access remote data (available on a supercomputer or cluster) during simulation runtime without exceeding any bandwidth limitations between the HPC back-end and the user front-end.