$\texttt{DeepSqueeze}$: Decentralization Meets Error-Compensated Compression


Abstract

Communication is a key bottleneck in distributed training. Recently, an \emph{error-compensated} compression technique was designed specifically for \emph{centralized} learning and has achieved great success, showing significant advantages over state-of-the-art compression-based methods in saving communication cost. Since \emph{decentralized} training has been shown to be superior to traditional \emph{centralized} training in communication-restricted scenarios, a natural question is how to apply the error-compensated technique to decentralized learning to further reduce the communication cost. However, a trivial extension of compression-based centralized training algorithms does not exist for the decentralized scenario: a key difference between centralized and decentralized training makes this extension highly non-trivial. In this paper, we propose an elegant algorithmic design that employs error-compensated stochastic gradient descent in the decentralized scenario, named $\texttt{DeepSqueeze}$. Both theoretical analysis and an empirical study are provided to show that the proposed $\texttt{DeepSqueeze}$ algorithm outperforms existing compression-based decentralized learning algorithms. To the best of our knowledge, this is the first work to apply error-compensated compression to decentralized learning.
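To illustrate the general idea of error compensation referenced above, the sketch below shows a generic error-feedback scheme in which each worker compresses its update and carries the compression error forward to the next round. This is only a minimal sketch of the error-compensation principle; the top-k compressor, the class and function names, and the per-worker state here are illustrative assumptions, not the paper's exact $\texttt{DeepSqueeze}$ update rule.

```python
# A minimal, generic sketch of error-compensated (error-feedback) compression.
# NOTE: this is NOT the paper's exact DeepSqueeze algorithm; the top-k
# compressor and all names here are illustrative assumptions.
import numpy as np

def topk_compress(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

class ErrorCompensatedWorker:
    """Each worker compresses (update + local residual) before communicating,
    and stores the compression error to be added back in the next round."""
    def __init__(self, dim, k):
        self.residual = np.zeros(dim)  # accumulated compression error
        self.k = k

    def compress_update(self, update):
        corrected = update + self.residual         # add back previous error
        compressed = topk_compress(corrected, self.k)
        self.residual = corrected - compressed     # remember what was lost
        return compressed                          # quantity sent to neighbors
```

In a decentralized setting, the compressed quantity would be exchanged only with a worker's neighbors on the communication graph rather than with a central server; the residual term ensures that information discarded by compression is not lost but re-injected in later iterations.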
