Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

Communication-Efficient Distributed Dual Coordinate Ascent

872 0 0.0 ( 0 )

Download Cite

Added by Martin Jaggi

Publication date 2014

fields Informatics Engineering

and research's language is English

Authors Martin Jaggi - Virginia Smith - Martin Takav{c}

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Communication remains the most significant bottleneck in the performance of distributed optimization algorithms for large-scale machine learning. In this paper, we propose a communication-efficient framework, CoCoA, that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication. We provide a strong convergence rate analysis for this class of algorithms, as well as experiments on real-world distributed datasets with implementations in Spark. In our experiments, we find that as compared to state-of-the-art mini-bat

rate research

Communication-efficient Distributed Multi-resource Allocation

75 - Syed Eqbal Alam , Robert Shorten , Fabian Wirth 2018

In several smart city applications, multiple resources must be allocated among competing agents that are coupled through such shared resources and are constrained --- either through limitations of communication infrastructure or privacy considerations. We propose a distributed algorithm to solve such distributed multi-resource allocation problems with no direct inter-agent communication. We do so by extending a recently introduced additive-increase multiplicative-decrease (AIMD) algorithm, which only uses very little communication between the system and agents. Namely, a control unit broadcasts a one-bit signal to agents whenever one of the allocated resources exceeds capacity. Agents then respond to this signal in a probabilistic manner. In the proposed algorithm, each agent makes decision of its resource demand locally and an agent is unaware of the resource allocation of other agents. In empirical results, we observe that the average allocations converge over time to optimal allocations.

Systems and Control Optimization and Control

A Communication Efficient Collaborative Learning Framework for Distributed Features

117 - Yang Liu , Yan Kang , Xinwei Zhang 2019

We introduce a collaborative learning framework allowing multiple parties having different sets of attributes about the same user to jointly build models without exposing their raw data or model parameters. In particular, we propose a Federated Stochastic Block Coordinate Descent (FedBCD) algorithm, in which each party conducts multiple local updates before each communication to effectively reduce the number of communication rounds among parties, a principal bottleneck for collaborative learning problems. We analyze theoretically the impact of the number of local updates and show that when the batch size, sample size, and the local iterations are selected appropriately, within $T$ iterations, the algorithm performs $mathcal{O}(sqrt{T})$ communication rounds and achieves some $mathcal{O}(1/sqrt{T})$ accuracy (measured by the average of the gradient norm squared). The approach is supported by our empirical evaluations on a variety of tasks and datasets, demonstrating advantages over stochastic gradient descent (SGD) approaches.

Machine Learning Artificial Intelligence Machine Learning

Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method

414 - Taiji Suzuki 2013

We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems. Our method is based on Alternating Direction Multiplier Method (ADMM) to deal with complex regularization functions such as structured regularizations. Although the original ADMM is a batch method, the proposed method offers a stochastic update rule where each iteration requires only one or few sample observations. Moreover, our method can naturally afford mini-batch update and it gives speed up of convergence. We show that, under mild assumptions, our method converges exponentially. The numerical experiments show that our method actually performs efficiently.

Machine Learning

Communication-efficient distributed SGD with Sketching

191 - Nikita Ivkin , Daniel Rothchild , Enayat Ullah 2019

Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that Sketched SGD has favorable convergence rates on several classes of functions. When considering all communication -- both of gradients and of updated model weights -- Sketched SGD reduces the amount of communication required compared to other gradient compression methods from $mathcal{O}(d)$ or $mathcal{O}(W)$ to $mathcal{O}(log d)$, where $d$ is the number of model parameters and $W$ is the number of workers participating in training. We run experiments on a transformer model, an LSTM, and a residual network, demonstrating up to a 40x reduction in total communication cost with no loss in final model performance. We also show experimentally that Sketched SGD scales to at least 256 workers without increasing communication cost or degrading model performance.

Machine Learning Distributed Parallel and Cluster Computing Optimization and Control

Communication-Efficient Network-Distributed Optimization with Differential-Coded Compressors

158 - Xin Zhang , Jia Liu , Zhengyuan Zhu 2019

Network-distributed optimization has attracted significant attention in recent years due to its ever-increasing applications. However, the classic decentralized gradient descent (DGD) algorithm is communication-inefficient for large-scale and high-dimensional network-distributed optimization problems. To address this challenge, many compressed DGD-based algorithms have been proposed. However, most of the existing works have high complexity and assume compressors with bounded noise power. To overcome these limitations, in this paper, we propose a new differential-coded compressed DGD (DC-DGD) algorithm. The key features of DC-DGD include: i) DC-DGD works with general SNR-constrained compressors, relaxing the bounded noise power assumption; ii) The differential-coded design entails the same convergence rate as the original DGD algorithm; and iii) DC-DGD has the same low-complexity structure as the original DGD due to a {em self-noise-reduction effect}. Moreover, the above features inspire us to develop a hybrid compression scheme that offers a systematic mechanism to minimize the communication cost. Finally, we conduct extensive experiments to verify the efficacy of the proposed DC-DGD and hybrid compressor.

Distributed Parallel and Cluster Computing Optimization and Control

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

Communication-Efficient Distributed Dual Coordinate Ascent

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions