No Arabic abstract
We study the problem of clock synchronization in highly dynamic networks, where communication links can appear or disappear at any time. The nodes in the network are equipped with hardware clocks, but the rate of the hardware clocks can vary arbitrarily within specific bounds, and the estimates that nodes can obtain about the clock values of other nodes are inherently inaccurate. Our goal in this setting is to output a logical clock at each node such that the logical clocks of any two nodes are not too far apart, and nodes that remain close to each other in the network for a long time are better synchronized than distant nodes. This property is called gradient clock synchronization. Gradient clock synchronization has been widely studied in the static setting, where the network topology does not change. We show that the asymptotically optimal bounds obtained for the static case also apply to our highly dynamic setting: if two nodes remain at distance $d$ from each other for sufficiently long, it is possible to upper bound the difference between their clock values by $O(d log (D / d))$, where $D$ is the diameter of the network. This is known to be optimal even for static networks. Furthermore, we show that our algorithm has optimal stabilization time: when a path of length $d$ appears between two nodes, the time required until the clock skew between the two nodes is reduced to $O(d log (D / d))$ is $O(D)$, which we prove to be optimal. Finally, the techniques employed for the more intricate analysis of the algorithm for dynamic graphs provide additional insights that are also of interest for the static setting. In particular, we establish self-stabilization of the gradient property within $O(D)$ time.
In this paper we suggest a method by which reference broadcast synchronization (RBS), and other methods of estimating clock values, can be incorporated in standard clock synchronization algorithms to improve synchronization quality. We advocate a logical separation of the task of estimating the clock values of other nodes in the network from the task of using these estimates to output a logical clock value. The separation is achieved by means of a virtual estimate graph, overlaid on top of the real network graph, which represents the information various nodes can obtain about each other. RBS estimates are represented in the estimate graph as edges between nodes at distance 2 from each other in the original network graph. A clock synchronization algorithm then operates on the estimate graph as though it were the original network. To illustrate the merits of this approach, we modify a recent optimal gradient clock synchronization algorithm to work in this setting. The modified algorithm transparently takes advantage of RBS estimates and any other means by which nodes can estimate each others clock values.
The coflow scheduling problem has emerged as a popular abstraction in the last few years to study data communication problems within a data center. In this basic framework, each coflow has a set of communication demands and the goal is to schedule many coflows in a manner that minimizes the total weighted completion time. A coflow is said to complete when all its communication needs are met. This problem has been extremely well studied for the case of complete bipartite graphs that model a data center with full bisection bandwidth and several approximation algorithms and effective heuristics have been proposed recently. In this work, we study a slightly different model of coflow scheduling in general graphs (to capture traffic between data centers) and develop practical and efficient approximation algorithms for it. Our main result is a randomized 2 approximation algorithm for the single path and free path model, significantly improving prior work. In addition, we demonstrate via extensive experiments that the algorithm is practical, easy to implement and performs well in practice.
This paper provides an algorithmic framework for obtaining fast distributed algorithms for a highly-dynamic setting, in which *arbitrarily many* edge changes may occur in each round. Our algorithm significantly improves upon prior work in its combination of (1) having an $O(1)$ amortized time complexity, (2) using only $O(log{n})$-bit messages, (3) not posing any restrictions on the dynamic behavior of the environment, (4) being deterministic, (5) having strong guarantees for intermediate solutions, and (6) being applicable for a wide family of tasks. The tasks for which we deduce such an algorithm are maximal matching, $(degree+1)$-coloring, 2-approximation for minimum weight vertex cover, and maximal independent set (which is the most subtle case). For some of these tasks, node insertions can also be among the allowed topology changes, and for some of them also abrupt node deletions.
Clock synchronization and ranging over a wireless network with low communication overhead is a challenging goal with tremendous impact. In this paper, we study the use of time-to-digital converters in wireless sensors, which provides clock synchronization and ranging at negligible communication overhead through a sawtooth signal model for round trip times between two nodes. In particular, we derive Cram{e}r-Rao lower bounds for a linearitzation of the sawtooth signal model, and we thoroughly evaluate simple estimation techniques by simulation, giving clear and concise performance references for this technology.
We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state.We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the amount of used locks, atomics, and reads/writes. We consider 11 graph algorithms, 3 programming models, 2 graph abstractions, and various families of graphs. The conducted analysis illustrates surprising differences between push and pull variants of different algorithms in performance, speed of convergence, and code complexity; the insights are backed up by performance data from hardware counters.We use these findings to illustrate which variant is faster for each algorithm and to develop generic strategies that enable even higher speedups. Our insights can be used to accelerate graph processing engines or libraries on both massively-parallel shared-memory machines as well as distributed-memory systems.