ﻻ يوجد ملخص باللغة العربية
Frigo et al. proposed an ideal cache model and a recursive technique to design sequential cache-efficient algorithms in a cache-oblivious fashion. Ballard et al. pointed out that it is a fundamental open problem to extend the technique to an arbitrary architecture. Ballard et al. raised another open question on how to parallelize Strassens algorithm exactly and efficiently on an arbitrary number of processors. We propose a novel way of partitioning a cache-oblivious algorithm to achieve perfect strong scaling on an arbitrary number, even a prime number, of processors within a certain range in a shared-memory setting. Our approach is Processor-Aware but Cache-Oblivious (PACO). We demonstrate our approach on several important cache-oblivious algorithms, including LCS, 1D, GAP, classic rectangular matrix multiplication on a semiring, and Strassens algorithm. We discuss how to extend our approach to a distributed-memory architecture, or even a heterogeneous computing system. Hence, our work may provide a new perspective on the fundamental open problem of extending the recursive cache-oblivious technique to an arbitrary architecture. We provide an almost exact solution to the open problem on parallelizing Strassen. Our approach may provide a new perspective on extending the recursive cache-oblivious technique to an arbitrary architecture. All our algorithms demonstrate better scalability or better overall parallel cache complexities than the best known algorithms. Preliminary experiments justify our theoretical prediction that the PACO algorithms can outperform significantly state-of-the-art Processor-Oblivious (PO) and Processor-Aware (PA) counterparts.
A mesh is a graph that divides physical space into regularly-shaped regions. Meshes computations form the basis of many applications, e.g. finite-element methods, image rendering, and collision detection. In one important mesh primitive, called a mes
This paper initiates the study of the classic balanced graph partitioning problem from an online perspective: Given an arbitrary sequence of pairwise communication requests between $n$ nodes, with patterns that may change over time, the objective is
In this paper we study fractional coloring from the angle of distributed computing. Fractional coloring is the linear relaxation of the classical notion of coloring, and has many applications, in particular in scheduling. It was proved by Hasemann, H
This paper provides an algorithmic framework for obtaining fast distributed algorithms for a highly-dynamic setting, in which *arbitrarily many* edge changes may occur in each round. Our algorithm significantly improves upon prior work in its combina
The problem of attaining energy efficiency in distributed systems is of importance, but a general, non-domain-specific theory of energy-minimal scheduling is far from developed. In this paper, we classify the problems of energy-minimal scheduling and