Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

98 0 0.0 ( 0 )

Download Cite

Added by Haoyu Zhao

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Haoyu Zhao - Zhize Li - Peter Richtarik

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new federated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses much fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(frac{N^{3/4}}{Sepsilon})$, improving the best-known result $O(frac{N}{Sepsilon})$ of SCAFFOLD (Karimireddy et al.,2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually is very large in federated learning), $S$ is the sampled subset of clients in each communication round, and $epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O(frac{sqrt{N}+S}{Sepsilon^2})$, improving the best-known result $O(frac{N^{2/3}}{S^{2/3}epsilon^2})$ of SCAFFOLD (Karimireddy et al.,2020) by a factor of $N^{1/6}S^{1/3}$, if the sampled clients $Sleq sqrt{N}$. Note that in both settings, the communication cost for each round is the same for both FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.

rate research

Toward Communication Efficient Adaptive Gradient Method

169 - Xiangyi Chen , Xiaoyun Li , Ping Li 2021

In recent years, distributed optimization is proven to be an effective approach to accelerate training of large scale machine learning models such as deep neural networks. With the increasing computation power of GPUs, the bottleneck of training speed in distributed training is gradually shifting from computation to communication. Meanwhile, in the hope of training machine learning models on mobile devices, a new distributed training paradigm called ``federated learning has become popular. The communication time in federated learning is especially important due to the low bandwidth of mobile devices. While various approaches to improve the communication efficiency have been proposed for federated learning, most of them are designed with SGD as the prototype training algorithm. While adaptive gradient methods have been proven effective for training neural nets, the study of adaptive gradient methods in federated learning is scarce. In this paper, we propose an adaptive gradient method that can guarantee both the convergence and the communication efficiency for federated learning.

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

CFedAvg: Achieving Efficient Communication and Fast Convergence in Non-IID Federated Learning

106 - Haibo Yang , Jia Liu , Elizabeth S. Bentley 2021

Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data. However, high communication costs could arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections. In this paper, we introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors. We analyze the convergence rate of CFedAvg for non-convex functions with constant and decaying learning rates. The CFedAvg algorithm can achieve an $mathcal{O}(1 / sqrt{mKT} + 1 / T)$ convergence rate with a constant learning rate, implying a linear speedup for convergence as the number of workers increases, where $K$ is the number of local steps, $T$ is the number of total communication rounds, and $m$ is the total worker number. This matches the convergence rate of distributed/federated learning without compression, thus achieving high communication efficiency while not sacrificing learning accuracy in FL. Furthermore, we extend CFedAvg to cases with heterogeneous local steps, which allows different workers to perform a different number of local steps to better adapt to their own circumstances. The interesting observation in general is that the noise/variance introduced by compressors does not affect the overall convergence rate order for non-i.i.d. FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with two gradient compression schemes of different compression ratios.

Machine Learning Distributed Parallel and Cluster Computing

Dynamic Attention-based Communication-Efficient Federated Learning

199 - Zihan Chen , Kai Fong Ernest Chong , Tony Q. S. Quek 2021

Federated learning (FL) offers a solution to train a global machine learning model while still maintaining data privacy, without needing access to data stored locally at the clients. However, FL suffers performance degradation when client data distribution is non-IID, and a longer training duration to combat this degradation may not necessarily be feasible due to communication limitations. To address this challenge, we propose a new adaptive training algorithm $texttt{AdaFL}$, which comprises two components: (i) an attention-based client selection mechanism for a fairer training scheme among the clients; and (ii) a dynamic fraction method to balance the trade-off between performance stability and communication efficiency. Experimental results show that our $texttt{AdaFL}$ algorithm outperforms the usual $texttt{FedAvg}$ algorithm, and can be incorporated to further improve various state-of-the-art FL algorithms, with respect to three aspects: model accuracy, performance stability, and communication efficiency.

Machine Learning Distributed Parallel and Cluster Computing

Communication-Efficient Federated Learning with Compensated Overlap-FedAvg

201 - Yuhao Zhou , Ye Qing , 2020

Petabytes of data are generated each day by emerging Internet of Things (IoT), but only few of them can be finally collected and used for Machine Learning (ML) purposes due to the apprehension of data & privacy leakage, which seriously retarding MLs growth. To alleviate this problem, Federated learning is proposed to perform model training by multiple clients combined data without the dataset sharing within the cluster. Nevertheless, federated learning introduces massive communication overhead as the synchronized data in each epoch is of the same size as the model, and thereby leading to a low communication efficiency. Consequently, variant methods mainly focusing on the communication rounds reduction and data compression are proposed to reduce the communication overhead of federated learning. In this paper, we propose Overlap-FedAvg, a framework that parallels the model training phase with model uploading & downloading phase, so that the latter phase can be totally covered by the former phase. Compared to vanilla FedAvg, Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism and a nesterov accelerated gradients~(NAG) algorithm. Besides, Overlap-FedAvg is orthogonal to many other compression methods so that they can be applied together to maximize the utilization of the cluster. Furthermore, the theoretical analysis is provided to prove the convergence of the proposed Overlap-FedAvg framework. Extensive experiments on both conventional and recurrent tasks with multiple models and datasets also demonstrate that the proposed Overlap-FedAvg framework substantially boosts the federated learning process.

Machine Learning Distributed Parallel and Cluster Computing

Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning

138 - Divyansh Jhunjhunwala , Advait Gadhikar , Gauri Joshi 2021

Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of having a higher error floor due to the higher variance of the stochastic gradients. In this work, we propose an adaptive quantization strategy called AdaQuantFL that aims to achieve communication efficiency as well as a low error floor by changing the number of quantization levels during the course of training. Experiments on training deep neural networks show that our method can converge in much fewer communicated bits as compared to fixed quantization level setups, with little or no impact on training and test accuracy.

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions