On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization


Abstract

In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters.
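As a rough illustration of the interleaving pattern the abstract describes, here is a minimal sketch in which each node takes a local gradient step and then performs several gossip rounds, averaging lossy-compressed copies of its neighbors' parameters. The ring mixing matrix W, the uniform quantizer, the quadratic local objectives, and all parameter values are illustrative assumptions, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 4, 5

# Illustrative local objectives f_i(x) = 0.5 * ||x - b_i||^2;
# the global minimizer is the average of the b_i.
b = rng.normal(size=(n_nodes, dim))

# Doubly stochastic mixing matrix for a ring topology (an assumption).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

def compress(v, levels=16):
    """Lossy compression: uniform quantization of v to a fixed grid."""
    scale = float(np.max(np.abs(v)))
    if scale == 0.0:
        return v
    return np.round(v / scale * levels) / levels * scale

x = rng.normal(size=(n_nodes, dim))   # one parameter vector per node
step_size, n_grad_iters, n_gossip = 0.5, 20, 3

for _ in range(n_grad_iters):
    # Local gradient step: grad f_i(x_i) = x_i - b_i.
    x = x - step_size * (x - b)
    # Multiple gossip steps between gradient iterations: each node
    # broadcasts a compressed copy of its parameters, then averages.
    for _ in range(n_gossip):
        compressed = np.vstack([compress(row) for row in x])
        x = W @ compressed

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```

Repeating the gossip step between gradient updates drives the nodes closer to consensus before the next local update; the trade-off is that each extra round costs communication, which is why compressing the exchanged messages matters in this setting.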
