A mesh is a graph that divides physical space into regularly shaped regions. Mesh computations form the basis of many applications, e.g., finite-element methods, image rendering, and collision detection. In one important mesh primitive, called a mesh update, each mesh vertex stores a value and repeatedly updates this value based on the values stored in all neighboring vertices. The performance of a mesh update depends on the layout of the mesh in memory. This paper shows how to find a memory layout that guarantees that the mesh update has asymptotically optimal memory performance for any set of memory parameters. Such a memory layout is called cache-oblivious. Formally, for a $d$-dimensional mesh $G$, block size $B$, and cache size $M$ (where $M=\Omega(B^d)$), the mesh update of $G$ uses $O(1+|G|/B)$ memory transfers. The paper also shows how the mesh-update performance degrades for smaller caches, where $M=o(B^d)$. The paper then gives two algorithms for finding cache-oblivious mesh layouts. The first layout algorithm runs in time $O(|G|\log^2|G|)$ both in expectation and with high probability on a RAM. It uses $O(1+|G|\log^2(|G|/M)/B)$ memory transfers in expectation and $O(1+(|G|/B)(\log^2(|G|/M) + \log|G|))$ memory transfers with high probability in the cache-oblivious and disk-access machine (DAM) models. The layout is obtained by finding a fully balanced decomposition tree of $G$ and then performing an in-order traversal of the leaves of the tree. The second algorithm runs faster by almost a $\log|G|/\log\log|G|$ factor in all three memory models, both in expectation and with high probability. Its layout is obtained by finding a relax-balanced decomposition tree of $G$ and then performing an in-order traversal of the leaves of the tree.
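To make the primitives concrete, here is a small Python sketch, not the paper's algorithm: `bisect_vertices` is a hypothetical stand-in for the balanced partitioner behind the decomposition tree, the layout is read off as an in-order traversal of the leaves, and one mesh-update round scans the vertices in layout order.

```python
# Sketch only: a stand-in for the decomposition-tree layout and one mesh-update sweep.
# Assumes the mesh is an adjacency dict {vertex: [neighbors]}; bisect_vertices() is a
# hypothetical placeholder for the paper's balanced partitioner.

def bisect_vertices(vertices):
    """Hypothetical balanced partitioner; here we simply split the list in half."""
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:]

def layout(vertices):
    """In-order traversal of the leaves of the (implicit) decomposition tree."""
    if len(vertices) <= 1:
        return list(vertices)
    left, right = bisect_vertices(vertices)
    return layout(left) + layout(right)

def mesh_update(order, adj, values):
    """One update round: each vertex averages its neighbors, scanning in layout order."""
    new_values = dict(values)
    for v in order:                              # sequential scan over the layout
        nbrs = adj[v]
        new_values[v] = sum(values[u] for u in nbrs) / len(nbrs)
    return new_values

# Tiny example: a 4-cycle mesh.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
vals = {v: float(v) for v in adj}
order = layout(sorted(adj))
vals = mesh_update(order, adj, vals)
```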
We consider the problem of exploring an anonymous unoriented ring of size $n$ by $k$ identical, oblivious, asynchronous mobile robots that are unable to communicate, yet have the ability to sense their environment and make decisions based on their local view. Previous works in this weak scenario prove that $k$ must not divide $n$ for a deterministic solution to exist. Also, it is known that the minimum number of robots (either deterministic or probabilistic) needed to explore a ring of size $n$ is 4. An upper bound of 17 robots holds in the deterministic case, while 4 probabilistic robots are sufficient. In this paper, we close the complexity gap in the deterministic setting by proving that no deterministic exploration is feasible with fewer than five robots whenever the size of the ring is even, and that five robots are sufficient for any $n$ that is coprime with five. Our protocol completes exploration in $O(n)$ robot moves, which is also optimal.
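For readers unfamiliar with the setting, the following Python sketch illustrates the Look-Compute-Move model on an anonymous, unoriented ring; the decision rule is a hypothetical placeholder for illustration only and in no way the paper's protocol.

```python
# Illustrative Look-Compute-Move skeleton on a ring of size n (not the paper's protocol).
import random

def view(positions, i, n):
    """Unoriented local view of robot i: the sorted distance sequences to the
    other robots in each of its two (locally labeled) directions."""
    p = positions[i]
    cw = sorted((q - p) % n for j, q in enumerate(positions) if j != i)
    ccw = sorted((p - q) % n for j, q in enumerate(positions) if j != i)
    return cw, ccw

def compute(cw, ccw):
    """Hypothetical rule: step toward the closer neighboring robot; stay put on a
    tie, since deterministic robots cannot break a symmetric view."""
    if not cw:
        return 0
    if cw[0] < ccw[0]:
        return +1
    if ccw[0] < cw[0]:
        return -1
    return 0

def lcm_round(positions, n):
    """One asynchronous step: the scheduler (modeled here by a random choice)
    activates a single robot, which Looks, Computes, and Moves."""
    i = random.randrange(len(positions))
    cw, ccw = view(positions, i, n)
    positions[i] = (positions[i] + compute(cw, ccw)) % n
    return positions

ring_size, robots = 10, [0, 2, 5]   # example configuration; k = 3 does not divide n = 10
for _ in range(20):
    lcm_round(robots, ring_size)
```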
Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend this technique to an arbitrary architecture. Ballard et al. raised another open question on how to parallelize Strassen's algorithm exactly and efficiently on an arbitrary number of processors. We propose a novel way of partitioning a cache-oblivious algorithm to achieve perfect strong scaling on an arbitrary number, even a prime number, of processors within a certain range in a shared-memory setting. Our approach is Processor-Aware but Cache-Oblivious (PACO). We demonstrate our approach on several important cache-oblivious algorithms, including LCS, 1D, GAP, classic rectangular matrix multiplication on a semiring, and Strassen's algorithm. We discuss how to extend our approach to a distributed-memory architecture, or even a heterogeneous computing system. Hence, our work may provide a new perspective on the fundamental open problem of extending the recursive cache-oblivious technique to an arbitrary architecture. We also provide an almost exact solution to the open problem of parallelizing Strassen's algorithm. All our algorithms demonstrate better scalability or better overall parallel cache complexities than the best known algorithms. Preliminary experiments support our theoretical prediction that the PACO algorithms can significantly outperform state-of-the-art Processor-Oblivious (PO) and Processor-Aware (PA) counterparts.
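As background, here is a minimal Python sketch of the recursive cache-oblivious kernel for matrix multiplication on a semiring, i.e., the Frigo et al. style recursion that a PACO partitioning would cut into exactly $p$ balanced tasks; the PACO partitioning step itself is not reproduced here.

```python
# Serial sketch of the recursive, cache-oblivious matrix-multiplication kernel on a
# semiring (divide-and-conquer on the largest dimension); not the PACO partitioning.

def rec_mm(A, B, C, ri, rj, rk, ni, nj, nk, add, mul):
    """Accumulate the (ni x nk) block of A at (ri, rk) times the (nk x nj) block of B
    at (rk, rj) into the (ni x nj) block of C at (ri, rj)."""
    if ni == 1 and nj == 1 and nk == 1:
        C[ri][rj] = add(C[ri][rj], mul(A[ri][rk], B[rk][rj]))
        return
    m = max(ni, nj, nk)
    if m == ni:
        h = ni // 2
        rec_mm(A, B, C, ri, rj, rk, h, nj, nk, add, mul)
        rec_mm(A, B, C, ri + h, rj, rk, ni - h, nj, nk, add, mul)
    elif m == nj:
        h = nj // 2
        rec_mm(A, B, C, ri, rj, rk, ni, h, nk, add, mul)
        rec_mm(A, B, C, ri, rj + h, rk, ni, nj - h, nk, add, mul)
    else:
        h = nk // 2
        rec_mm(A, B, C, ri, rj, rk, ni, nj, h, add, mul)           # both k-halves write C,
        rec_mm(A, B, C, ri, rj, rk + h, ni, nj, nk - h, add, mul)  # so they run in order

# Example on the (min, +) semiring, as used, e.g., for shortest-path computations.
INF = float("inf")
A = [[0, 2], [1, 0]]
B = [[0, 3], [4, 0]]
C = [[INF, INF], [INF, INF]]
rec_mm(A, B, C, 0, 0, 0, 2, 2, 2, min, lambda x, y: x + y)
print(C)   # min-plus product of A and B
```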
Computational Fluid Dynamics (CFD) is a major sub-field of engineering. Corresponding flow simulations are typically characterized by heavy computational resource requirements. Often, very fine and complex meshes are required to resolve physical effects appropriately. Since all CFD algorithms scale at least linearly with the size of the underlying mesh discretization, finding an optimal mesh is key to computational efficiency. One methodology used to find optimal meshes is goal-oriented adaptive mesh refinement. However, this is typically computationally demanding and only available in a limited number of tools. In this contribution, we adopt a machine learning approach to identify optimal mesh densities. We generate optimized meshes using classical methodologies and propose to train a convolutional neural network that predicts optimal mesh densities for arbitrary geometries. The proposed concept is validated on 2D wind-tunnel simulations comprising more than 60,000 runs. Using a training set of 20,000 simulations, we achieve accuracies of more than 98.7%. The corresponding predictions of optimal meshes can be used as input for any mesh generation and CFD tool. Thus, without complex computations, any CFD engineer can start from a high-quality predicted mesh.
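As a hedged illustration of the kind of model the abstract describes (the architecture, sizes, and training loop below are assumptions, not the authors' network), a small PyTorch encoder-decoder could map a rasterized 2D geometry mask to a per-pixel mesh-density field:

```python
# Minimal sketch (assumed architecture): a small encoder-decoder CNN mapping a
# rasterized geometry mask (1 x H x W) to a predicted mesh-density field (1 x H x W).
import torch
import torch.nn as nn

class MeshDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
            nn.Softplus(),  # mesh densities are positive
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One training step against densities produced by a classical goal-oriented refinement tool.
model = MeshDensityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
geometry = torch.rand(8, 1, 64, 64)        # dummy geometry masks
target_density = torch.rand(8, 1, 64, 64)  # dummy reference densities
opt.zero_grad()
loss = nn.functional.mse_loss(model(geometry), target_density)
loss.backward()
opt.step()
```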
This article introduces a representation of dynamic meshes, adapted to numerical simulations that require controlling the volume of objects with free boundaries, such as incompressible fluid simulation, some astrophysical simulations at cosmological scale, and shape/topology optimization. The algorithm decomposes the simulated object into a set of convex cells called a Laguerre diagram, parameterized by the positions of $N$ points in 3D and $N$ additional parameters that control the volumes of the cells. These parameters are found as the (unique) solution of a convex optimization problem, a semi-discrete Monge-Ampère equation stemming from optimal transport theory. In this article, this setting is extended to objects with free boundaries and arbitrary topology, evolving in a domain of arbitrary shape, by solving a partial optimal transport problem. The resulting Lagrangian scheme makes it possible to accurately control the volume of the object, while precisely tracking interfaces, interactions, collisions, and topology changes.
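The following 1D toy, written as an assumption-laden illustration rather than the article's 3D algorithm, shows the convex-optimization viewpoint: the weights of a Laguerre diagram are adjusted by gradient ascent on the Kantorovich dual of semi-discrete optimal transport until each cell reaches a prescribed volume.

```python
# 1D toy: tune the Laguerre weights psi so each cell of the diagram (restricted to
# [0, 1] with uniform density) has a prescribed volume. Illustration only.
import numpy as np

def laguerre_volumes_1d(points, psi):
    """Volumes of the 1D Laguerre (power) cells of sorted 'points' with weights 'psi'."""
    # Boundary between consecutive cells i and i+1 for the quadratic cost.
    b = (points[:-1] + points[1:]) / 2 + (psi[:-1] - psi[1:]) / (2 * (points[1:] - points[:-1]))
    b = np.clip(np.maximum.accumulate(b), 0.0, 1.0)   # keep boundaries ordered and inside [0, 1]
    edges = np.concatenate(([0.0], b, [1.0]))
    return np.diff(edges)

points = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
target = np.array([0.10, 0.30, 0.20, 0.25, 0.15])     # prescribed cell volumes
psi = np.zeros(5)
for _ in range(500):
    # Gradient of the concave Kantorovich dual: target volumes minus current volumes.
    psi += 0.1 * (target - laguerre_volumes_1d(points, psi))

print(laguerre_volumes_1d(points, psi))               # approaches the prescribed volumes
```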
Databases need to allocate and free blocks of storage on disk. Freed blocks introduce holes where no data is stored. Allocation systems attempt to reuse such deallocated regions in order to minimize the footprint on disk. If previously allocated blocks cannot be moved, the problem is called the memory allocation problem, which is known to have a logarithmic overhead in the footprint. This paper defines the storage reallocation problem, where previously allocated blocks can be moved, or reallocated, but at some cost. The algorithms presented here are cost oblivious, in that they work for a broad and reasonable class of cost functions, even when they do not know what the cost function is. The objective is to minimize the storage footprint, that is, the largest memory address containing an allocated object, while simultaneously minimizing the reallocation costs. This paper gives asymptotically optimal algorithms for storage reallocation, in which the storage footprint is at most $(1+\epsilon)$ times optimal, and the reallocation cost is at most $(1/\epsilon)$ times the original allocation cost, which is also optimal. The algorithms are cost oblivious as long as the allocation/reallocation cost function is subadditive.
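The sketch below is a toy Python model, not the paper's algorithm: it assumes unit-size blocks, a simple lowest-hole allocation rule, and a compaction step that reallocates the highest-addressed block whenever the footprint exceeds $(1+\epsilon)$ times the live size.

```python
# Toy model of reallocation to bound the footprint; policy chosen for illustration only.

class ReallocatingStore:
    """Unit-size blocks; addresses are slot indices. 'moves' counts reallocations."""
    def __init__(self, eps):
        self.eps = eps
        self.slots = []          # slots[i] = block id, or None for a hole
        self.moves = 0

    def footprint(self):
        return len(self.slots)

    def live(self):
        return sum(s is not None for s in self.slots)

    def allocate(self, block_id):
        for i, s in enumerate(self.slots):        # reuse the lowest hole if any
            if s is None:
                self.slots[i] = block_id
                return i
        self.slots.append(block_id)               # otherwise grow the footprint
        return len(self.slots) - 1

    def free(self, block_id):
        self.slots[self.slots.index(block_id)] = None
        self._compact()

    def _compact(self):
        while True:
            while self.slots and self.slots[-1] is None:
                self.slots.pop()                  # dropping trailing holes costs nothing
            if not self.slots or self.footprint() <= (1 + self.eps) * self.live():
                return
            block = self.slots.pop()              # reallocate the highest-addressed block
            self.moves += 1
            self.allocate(block)

store = ReallocatingStore(eps=0.5)
for b in range(8):
    store.allocate(b)
for b in (1, 3, 5, 7):
    store.free(b)
print(store.footprint(), store.live(), store.moves)
```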