Self-stabilising Byzantine Clock Synchronisation is Almost as Easy as Consensus


Abstract in English

We give fault-tolerant algorithms for establishing synchrony in distributed systems in which each of the $n$ nodes has its own clock. Our algorithms operate in a very strong fault model: we require self-stabilisation, i.e., the initial state of the system may be arbitrary, and there can be up to $f<n/3$ ongoing Byzantine faults, i.e., nodes that deviate from the protocol in an arbitrary manner. Furthermore, we assume that the local clocks of the nodes may progress at different speeds (clock drift) and communication has bounded delay. In this model, we study the pulse synchronisation problem, where the task is to guarantee that eventually all correct nodes generate well-separated local pulse events (i.e., unlabelled logical clock ticks) in a synchronised manner. Compared to prior work, we achieve exponential improvements in stabilisation time and the number of communicated bits, and give the first sublinear-time algorithm for the problem: - In the deterministic setting, the state-of-the-art solutions stabilise in time $Theta(f)$ and have each node broadcast $Theta(f log f)$ bits per time unit. We exponentially reduce the number of bits broadcasted per time unit to $Theta(log f)$ while retaining the same stabilisation time. - In the randomised setting, the state-of-the-art solutions stabilise in time $Theta(f)$ and have each node broadcast $O(1)$ bits per time unit. We exponentially reduce the stabilisation time to $log^{O(1)} f$ while each node broadcasts $log^{O(1)} f$ bits per time unit. These results are obtained by means of a recursive approach reducing the above task of self-stabilising pulse synchronisation in the bounded-delay model to non-self-stabilising binary consensus in the synchronous model. In general, our approach introduces at most logarithmic overheads in terms of stabilisation time and broadcasted bits over the underlying consensus routine.

Download