No Arabic abstract
In this paper, we prove topology dependent bounds on the number of rounds needed to compute Functional Aggregate Queries (FAQs) studied by Abo Khamis et al. [PODS 2016] in a synchronous distributed network under the model considered by Chattopadhyay et al. [FOCS 2014, SODA 2017]. Unlike the recent work on computing database queries in the Massively Parallel Computation model, in the model of Chattopadhyay et al., nodes can communicate only via private point-to-point channels and we are interested in bounds that work over an {em arbitrary} communication topology. This is the first work to consider more practically motivated problems in this distributed model. For the sake of exposition, we focus on two special problems in this paper: Boolean Conjunctive Query (BCQ) and computing variable/factor marginals in Probabilistic Graphical Models (PGMs). We obtain tight bounds on the number of rounds needed to compute such queries as long as the underlying hypergraph of the query is $O(1)$-degenerate and has $O(1)$-arity. In particular, the $O(1)$-degeneracy condition covers most well-studied queries that are efficiently computable in the centralized computation model like queries with constant treewidth. These tight bounds depend on a new notion of `width (namely internal-node-width) for Generalized Hypertree Decompositions (GHDs) of acyclic hypergraphs, which minimizes the number of internal nodes in a sub-class of GHDs. To the best of our knowledge, this width has not been studied explicitly in the theoretical database literature. Finally, we consider the problem of computing the product of a vector with a chain of matrices and prove tight bounds on its round complexity (over the finite field of two elements) using a novel min-entropy based argument.
There are distributed graph algorithms for finding maximal matchings and maximal independent sets in $O(Delta + log^* n)$ communication rounds; here $n$ is the number of nodes and $Delta$ is the maximum degree. The lower bound by Linial (1987, 1992) shows that the dependency on $n$ is optimal: these problems cannot be solved in $o(log^* n)$ rounds even if $Delta = 2$. However, the dependency on $Delta$ is a long-standing open question, and there is currently an exponential gap between the upper and lower bounds. We prove that the upper bounds are tight. We show that maximal matchings and maximal independent sets cannot be found in $o(Delta + log log n / log log log n)$ rounds with any randomized algorithm in the LOCAL model of distributed computing. As a corollary, it follows that there is no deterministic algorithm for maximal matchings or maximal independent sets that runs in $o(Delta + log n / log log n)$ rounds; this is an improvement over prior lower bounds also as a function of $n$.
We use activity networks (task graphs) to model parallel programs and consider series-parallel extensions of these networks. Our motivation is two-fold: the benefits of series-parallel activity networks and the modelling of programming constructs, such as those imposed by current parallel computing environments. Series-parallelisation adds precedence constraints to an activity network, usually increasing its makespan (execution time). The slowdown ratio describes how additional constraints affect the makespan. We disprove an existing conjecture positing a bound of two on the slowdown when workload is not considered. Where workload is known, we conjecture that 4/3 slowdown is always achievable, and prove our conjecture for small networks using max-plus algebra. We analyse a polynomial-time algorithm showing that achieving 4/3 slowdown is in exp-APX. Finally, we discuss the implications of our results.
Determining the space complexity of $x$-obstruction-free $k$-set agreement for $xleq k$ is an open problem. In $x$-obstruction-free protocols, processes are required to return in executions where at most $x$ processes take steps. The best known upper bound on the number of registers needed to solve this problem among $n>k$ processes is $n-k+x$ registers. No general lower bound better than $2$ was known. We prove that any $x$-obstruction-free protocol solving $k$-set agreement among $n>k$ processes uses at least $lfloor(n-x)/(k+1-x)rfloor+1$ registers. Our main tool is a simulation that serves as a reduction from the impossibility of deterministic wait-free $k$-set agreement: if a protocol uses fewer registers, then it is possible for $k+1$ processes to simulate the protocol and deterministically solve $k$-set agreement in a wait-free manner, which is impossible. A critical component of the simulation is the ability of simulating processes to revise the past of simulated processes. We introduce a new augmented snapshot object, which facilitates this. We also prove that any space lower bound on the number of registers used by obstruction-free protocols applies to protocols that satisfy nondeterministic solo termination. Hence, our lower bound of $lfloor(n-1)/krfloor+1$ for the obstruction-free ($x=1$) case also holds for randomized wait-free free protocols. In particular, this gives a tight lower bound of exactly $n$ registers for solving obstruction-free and randomized wait-free consensus. Finally, our new techniques can be applied to get a space lower of $lfloor n/2rfloor+1$ for $epsilon$-approximate agreement, for sufficiently small $epsilon$. It requires participating processes to return values within $epsilon$ of each other. The best known upper bounds are $lceillog(1/epsilon)rceil$ and $n$, while no general lower bounds were known.
We prove tight network topology dependent bounds on the round complexity of computing well studied $k$-party functions such as set disjointness and element distinctness. Unlike the usual case in the CONGEST model in distributed computing, we fix the function and then vary the underlying network topology. This complements the recent such results on total communication that have received some attention. We also present some applications to distributed graph computation problems. Our main contribution is a proof technique that allows us to reduce the problem on a general graph topology to a relevant two-party communication complexity problem. However, unlike many previous works that also used the same high level strategy, we do not reason about a two-party communication problem that is induced by a cut in the graph. To `stitch back the various lower bounds from the two party communication problems, we use the notion of timed graph that has seen prior use in network coding. Our reductions use some tools from Steiner tree packing and multi-commodity flow problems that have a delay constraint.
We give tight cell-probe bounds for the time to compute convolution, multiplication and Hamming distance in a stream. The cell probe model is a particularly strong computational model and subsumes, for example, the popular word RAM model. We first consider online convolution where the task is to output the inner product between a fixed $n$-dimensional vector and a vector of the $n$ most recent values from a stream. One symbol of the stream arrives at a time and the each output must be computed before the next symbols arrives. Next we show bounds for online multiplication where the stream consists of pairs of digits, one from each of two $n$ digit numbers that are to be multiplied. One pair arrives at a time and the task is to output a single new digit from the product before the next pair of digits arrives. Finally we look at the online Hamming distance problem where the Hamming distance is outputted instead of the inner product. For each of these three problems, we give a lower bound of $Omega(frac{delta}{w}log n)$ time on average per output, where $delta$ is the number of bits needed to represent an input symbol and $w$ is the cell or word size. We argue that these bound are in fact tight within the cell probe model.