
Communication-Efficient Distributed Optimization with Quantized Preconditioners

Added by Foivos Alimisis
Publication date: 2021
Language: English





We investigate fast and communication-efficient algorithms for the classic problem of minimizing a sum of strongly convex and smooth functions that are distributed among $n$ different nodes, which can communicate using a limited number of bits. Most previous communication-efficient approaches for this problem are limited to first-order optimization, and therefore have \emph{linear} dependence on the condition number in their communication complexity. We show that this dependence is not inherent: communication-efficient methods can in fact have sublinear dependence on the condition number. For this, we design and analyze the first communication-efficient distributed variants of preconditioned gradient descent for Generalized Linear Models, and for Newton's method. Our results rely on a new technique for quantizing both the preconditioner and the descent direction at each step of the algorithms, while controlling their convergence rate. We also validate our findings experimentally, showing fast convergence and reduced communication.
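
As a rough illustration of the quantize-then-precondition idea described above, the sketch below shows one Newton-type step in which every node transmits a quantized gradient and a quantized local Hessian, and the averaged quantized Hessian is used as the preconditioner. The uniform quantizer, the regularization term, and all helper names are illustrative assumptions, not the exact scheme analyzed in the paper.

```python
# Minimal sketch (not the authors' exact scheme) of one quantized Newton-type step.
import numpy as np

def quantize(x, num_bits=4, scale=1.0):
    """Uniform quantizer on [-scale, scale] with 2**num_bits levels (an assumption)."""
    levels = 2 ** num_bits - 1
    step = 2 * scale / levels
    return np.round(np.clip(x, -scale, scale) / step) * step

def quantized_newton_step(x, local_grads, local_hessians, num_bits=4, reg=1e-6):
    """Each node sends a quantized gradient and quantized Hessian; the server
    averages them and returns a preconditioned (Newton-type) update."""
    g_hat = np.mean([quantize(g, num_bits) for g in local_grads], axis=0)
    H_hat = np.mean([quantize(H, num_bits) for H in local_hessians], axis=0)
    d = x.shape[0]
    # Regularize so the averaged quantized Hessian stays invertible.
    direction = np.linalg.solve(H_hat + reg * np.eye(d), g_hat)
    return x - direction
```
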




Related Research

Information compression is essential to reduce communication cost in distributed optimization over peer-to-peer networks. This paper proposes a communication-efficient linearly convergent distributed (COLD) algorithm to solve strongly convex optimization problems. By compressing innovation vectors, which are the differences between decision vectors and their estimates, COLD is able to achieve linear convergence for a class of $\delta$-contracted compressors. We explicitly quantify how the compression affects the convergence rate and show that COLD matches the same rate of its uncompressed version. To accommodate a wider class of compressors that includes the binary quantizer, we further design a novel dynamical scaling mechanism and obtain the linearly convergent Dyna-COLD. Importantly, our results strictly improve existing results for the quantized consensus problem. Numerical experiments demonstrate the advantages of both algorithms under different compressors.
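
As a rough sketch of the innovation-compression idea in the abstract above, the snippet below uses top-$k$ as an example of a $\delta$-contracted compressor (with $\delta = k/d$) and compresses only the difference between the decision vector and its running estimate. The function names and update rule are illustrative assumptions, not the full COLD recursion.

```python
# Sketch of innovation compression with a top-k (delta-contracted) compressor.
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries: a delta-contracted compressor
    with delta = k / len(x)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def compress_innovation(x, x_estimate, k):
    """Compress only the innovation (decision vector minus its estimate),
    then refresh the estimate that neighbors can reconstruct."""
    innovation = top_k(x - x_estimate, k)
    new_estimate = x_estimate + innovation
    return innovation, new_estimate
```
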
In this paper, we propose a distributed algorithm for stochastic smooth, non-convex optimization. We assume a worker-server architecture where $N$ nodes, each having $n$ (potentially infinite) number of samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of local cost functions available at individual nodes. The proposed approach is a non-trivial extension of the popular parallel-restarted SGD algorithm, incorporating the optimal variance-reduction based SPIDER gradient estimator into it. We prove convergence of our algorithm to a first-order stationary solution. The proposed approach achieves the best known communication complexity $O(\epsilon^{-1})$ along with the optimal computation complexity. For finite-sum problems (finite $n$), we achieve the optimal computation (IFO) complexity $O(\sqrt{Nn}\epsilon^{-1})$. For online problems ($n$ unknown or infinite), we achieve the optimal IFO complexity $O(\epsilon^{-3/2})$. In both cases, we maintain the linear speedup achieved by existing methods. This is a massive improvement over the $O(\epsilon^{-2})$ IFO complexity of the existing approaches. Additionally, our algorithm is general enough to allow non-identical distributions of data across workers, as in the recently proposed federated learning paradigm.
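
For concreteness, the snippet below sketches the SPIDER-style recursive gradient estimator mentioned above: a full gradient every $q$ iterations, and otherwise a recursive correction that evaluates the same minibatch at the current and previous iterates. The function signatures and the restart period $q$ are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch of a SPIDER-style recursive gradient estimator.
def spider_estimate(sample_grad, batch, x, x_prev, v_prev, t, q, full_grad):
    """Every q iterations take a full gradient at x; otherwise update the
    estimator recursively, evaluating the *same* minibatch at x and x_prev."""
    if t % q == 0:
        return full_grad(x)
    return sample_grad(x, batch) - sample_grad(x_prev, batch) + v_prev
```
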
In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where communication can be costly. We propose and analyze a class of nested distributed gradient methods with adaptive quantized communication (NEAR-DGD+Q). We show the effect of performing multiple quantized communication steps on the rate of convergence and on the size of the neighborhood of convergence, and prove R-Linear convergence to the exact solution with increasing number of consensus steps and adaptive quantization. We test the performance of the method, as well as some practical variants, on quadratic functions, and show the effects of multiple quantized communication steps in terms of iterations/gradient evaluations, communication and cost.
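
A minimal sketch of the repeated quantized consensus rounds described above: in each round, every agent averages its neighbors' quantized iterates through a doubly stochastic mixing matrix, with the quantizer resolution tightening between rounds. The mixing matrix, the halving schedule, and the helper names are assumptions, not the exact NEAR-DGD+Q construction.

```python
# Sketch of repeated quantized consensus rounds with adaptive quantization.
import numpy as np

def quantize(x, step):
    """Deterministic uniform quantizer with resolution `step`."""
    return np.round(x / step) * step

def quantized_consensus(X, W, rounds, step):
    """X: (n_agents, d) local iterates; W: doubly stochastic mixing matrix.
    Each round, every agent averages its neighbors' *quantized* iterates."""
    for _ in range(rounds):
        X = W @ quantize(X, step)
        step *= 0.5  # adaptive quantization: finer resolution each round
    return X
```
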
Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that Sketched SGD has favorable convergence rates on several classes of functions. When considering all communication -- both of gradients and of updated model weights -- Sketched SGD reduces the amount of communication required compared to other gradient compression methods from $\mathcal{O}(d)$ or $\mathcal{O}(W)$ to $\mathcal{O}(\log d)$, where $d$ is the number of model parameters and $W$ is the number of workers participating in training. We run experiments on a transformer model, an LSTM, and a residual network, demonstrating up to a 40x reduction in total communication cost with no loss in final model performance. We also show experimentally that Sketched SGD scales to at least 256 workers without increasing communication cost or degrading model performance.
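
To illustrate the kind of object Sketched SGD communicates, the snippet below builds a simple Count Sketch of a gradient vector and recovers a single coordinate as a median of signed bucket values. The table dimensions and hashing scheme are simplified assumptions, not the paper's implementation.

```python
# Sketch of compressing a gradient with a Count Sketch.
import numpy as np

def count_sketch(g, num_rows=5, num_cols=256, seed=0):
    """Project a d-dimensional gradient into a (num_rows x num_cols) table
    using per-row bucket and sign hashes (shared via the seed)."""
    rng = np.random.default_rng(seed)
    d = g.shape[0]
    buckets = rng.integers(0, num_cols, size=(num_rows, d))
    signs = rng.choice([-1.0, 1.0], size=(num_rows, d))
    S = np.zeros((num_rows, num_cols))
    for r in range(num_rows):
        np.add.at(S[r], buckets[r], signs[r] * g)
    return S, buckets, signs

def estimate_coordinate(S, buckets, signs, i):
    """Recover coordinate i as the median of its signed bucket values."""
    return np.median([signs[r, i] * S[r, buckets[r, i]] for r in range(S.shape[0])])
```
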
Xianghui Mao, Kun Yuan, Yubin Hu (2018)
This paper addresses consensus optimization problems in a multi-agent network, where all agents collaboratively find a minimizer for the sum of their private functions. We develop a new decentralized algorithm in which each agent communicates only with its neighbors. State-of-the-art decentralized algorithms use communications between either all pairs of adjacent agents or a random subset of them at each iteration. Another class of algorithms uses a random walk incremental strategy, which sequentially activates a succession of nodes; these incremental algorithms require diminishing step sizes to converge to the solution, so their convergence is relatively slow. In this work, we propose a random walk algorithm that uses a fixed step size and converges faster than the existing random walk incremental algorithms. Our algorithm is also communication efficient. Each iteration uses only one link to communicate the latest information for an agent to another. Since this communication rule mimics a man walking around the network, we call our new algorithm Walkman. We establish convergence for convex and nonconvex objectives. For decentralized least squares, we derive a linear rate of convergence and obtain a better communication complexity than those of other decentralized algorithms. Numerical experiments verify our analysis results.
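
The snippet below is a minimal sketch of the random-walk (token-passing) pattern described above: a single iterate travels the network, the currently active agent applies a fixed-step-size update, and exactly one link is used per iteration. The plain gradient update is an illustration, not Walkman's actual recursion.

```python
# Sketch of a random-walk, one-link-per-iteration update pattern.
import numpy as np

def random_walk_opt(x0, local_grads, neighbors, steps, alpha=0.1, start=0, seed=0):
    """local_grads[i](x) returns agent i's gradient; neighbors[i] lists the
    agents sharing a link with i. One link is used per iteration."""
    rng = np.random.default_rng(seed)
    x, agent = x0.copy(), start
    for _ in range(steps):
        x = x - alpha * local_grads[agent](x)  # active agent updates with fixed step size
        agent = rng.choice(neighbors[agent])   # pass the iterate over one link
    return x
```
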