No Arabic abstract
This paper provides three nearly-optimal algorithms for scheduling $t$ jobs in the $mathsf{CLIQUE}$ model. First, we present a deterministic scheduling algorithm that runs in $O(mathsf{GlobalCongestion} + mathsf{dilation})$ rounds for jobs that are sufficiently efficient in terms of their memory. The $mathsf{dilation}$ is the maximum round complexity of any of the given jobs, and the $mathsf{GlobalCongestion}$ is the total number of messages in all jobs divided by the per-round bandwidth of $n^2$ of the $mathsf{CLIQUE}$ model. Both are inherent lower bounds for any scheduling algorithm. Then, we present a randomized scheduling algorithm which runs $t$ jobs in $O(mathsf{GlobalCongestion} + mathsf{dilation}cdotlog{n}+t)$ rounds and only requires that inputs and outputs do not exceed $O(nlog n)$ bits per node, which is met by, e.g., almost all graph problems. Lastly, we adjust the emph{random-delay-based} scheduling algorithm [Ghaffari, PODC15] from the $mathsf{CLIQUE}$ model and obtain an algorithm that schedules any $t$ jobs in $O(t / n + mathsf{LocalCongestion} + mathsf{dilation}cdotlog{n})$ rounds, where the $mathsf{LocalCongestion}$ relates to the congestion at a single node of the $mathsf{CLIQUE}$. We compare this algorithm to the previous approaches and show their benefit. We schedule the set of jobs on-the-fly, without a priori knowledge of its parameters or the communication patterns of the jobs. In light of the inherent lower bounds, all of our algorithms are nearly-optimal. We exemplify the power of our algorithms by analyzing the message complexity of the state-of-the-art MIS protocol [Ghaffari, Gouleakis, Konrad, Mitrovic and Rubinfeld, PODC18], and we show that we can solve $t$ instances of MIS in $O(t + loglogDeltalog{n})$ rounds, that is, in $O(1)$ amortized time, for $tgeq loglogDeltalog{n}$.
In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an $O(n^{1-2/omega})$ round matrix multiplication algorithm, where $omega < 2.3728639$ is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over previous best upper bounds in the congested clique model. The highlight results include: -- triangle and 4-cycle counting in $O(n^{0.158})$ rounds, improving upon the $O(n^{1/3})$ triangle detection algorithm of Dolev et al. [DISC 2012], -- a $(1 + o(1))$-approximation of all-pairs shortest paths in $O(n^{0.158})$ rounds, improving upon the $tilde{O} (n^{1/2})$-round $(2 + o(1))$-approximation algorithm of Nanongkai [STOC 2014], and -- computing the girth in $O(n^{0.158})$ rounds, which is the first non-trivial solution in this model. In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles.
We design fast deterministic algorithms for distance computation in the congested clique model. Our key contributions include: -- A $(2+epsilon)$-approximation for all-pairs shortest paths in $O(log^2{n} / epsilon)$ rounds on unweighted undirected graphs. With a small additional additive factor, this also applies for weighted graphs. This is the first sub-polynomial constant-factor approximation for APSP in this model. -- A $(1+epsilon)$-approximation for multi-source shortest paths from $O(sqrt{n})$ sources in $O(log^2{n} / epsilon)$ rounds on weighted undirected graphs. This is the first sub-polynomial algorithm obtaining this approximation for a set of sources of polynomial size. Our main techniques are new distance tools that are obtained via improved algorithms for sparse matrix multiplication, which we leverage to construct efficient hopsets and shortest paths. Furthermore, our techniques extend to additional distance problems for which we improve upon the state-of-the-art, including diameter approximation, and an exact single-source shortest paths algorithm for weighted undirected graphs in $tilde{O}(n^{1/6})$ rounds.
The coflow scheduling problem has emerged as a popular abstraction in the last few years to study data communication problems within a data center. In this basic framework, each coflow has a set of communication demands and the goal is to schedule many coflows in a manner that minimizes the total weighted completion time. A coflow is said to complete when all its communication needs are met. This problem has been extremely well studied for the case of complete bipartite graphs that model a data center with full bisection bandwidth and several approximation algorithms and effective heuristics have been proposed recently. In this work, we study a slightly different model of coflow scheduling in general graphs (to capture traffic between data centers) and develop practical and efficient approximation algorithms for it. Our main result is a randomized 2 approximation algorithm for the single path and free path model, significantly improving prior work. In addition, we demonstrate via extensive experiments that the algorithm is practical, easy to implement and performs well in practice.
While algebrisation constitutes a powerful technique in the design and analysis of centralised algorithms, to date there have been hardly any applications of algebraic techniques in the context of distributed graph algorithms. This work is a case study that demonstrates the potential of algebrisation in the distributed context. We will focus on distributed graph algorithms in the congested clique model; the graph problems that we will consider include, e.g., the triangle detection problem and the all-pairs shortest path problem (APSP). There is plenty of prior work on combinatorial algorithms in the congested clique model: for example, Dolev et al. (DISC 2012) gave an algorithm for triangle detection with a running time of $tilde O(n^{1/3})$, and Nanongkai (STOC 2014) gave an approximation algorithm for APSP with a running time of $tilde O(n^{1/2})$. In this work, we will use algebraic techniques -- in particular, algorithms based on fast matrix multiplication -- to solve both triangle detection and the unweighted APSP in time $O(n^{0.15715})$; for weighted APSP, we give a $(1+o(1))$-approximation with this running time, as well as an exact $tilde O(n^{1/3})$ solution.
We give the first Congested Clique algorithm that computes a sparse hopset with polylogarithmic hopbound in polylogarithmic time. Given a graph $G=(V,E)$, a $(beta,epsilon)$-hopset $H$ with hopbound $beta$, is a set of edges added to $G$ such that for any pair of nodes $u$ and $v$ in $G$ there is a path with at most $beta$ hops in $G cup H$ with length within $(1+epsilon)$ of the shortest path between $u$ and $v$ in $G$. Our hopsets are significantly sparser than the recent construction of Censor-Hillel et al. [6], that constructs a hopset of size $tilde{O}(n^{3/2})$, but with a smaller polylogarithmic hopbound. On the other hand, the previously known constructions of sparse hopsets with polylogarithmic hopbound in the Congested Clique model, proposed by Elkin and Neiman [10],[11],[12], all require polynomial rounds. One tool that we use is an efficient algorithm that constructs an $ell$-limited neighborhood cover, that may be of independent interest. Finally, as a side result, we also give a hopset construction in a variant of the low-memory Massively Parallel Computation model, with improved running time over existing algorithms.