CFedAvg: Achieving Efficient Communication and Fast Convergence in Non-IID Federated Learning

107 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Haibo Yang Mr

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Haibo Yang - Jia Liu - Elizabeth S. Bentley

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data. However, high communication costs could arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections. In this paper, we introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors. We analyze the convergence rate of CFedAvg for non-convex functions with constant and decaying learning rates. The CFedAvg algorithm can achieve an $mathcal{O}(1 / sqrt{mKT} + 1 / T)$ convergence rate with a constant learning rate, implying a linear speedup for convergence as the number of workers increases, where $K$ is the number of local steps, $T$ is the number of total communication rounds, and $m$ is the total worker number. This matches the convergence rate of distributed/federated learning without compression, thus achieving high communication efficiency while not sacrificing learning accuracy in FL. Furthermore, we extend CFedAvg to cases with heterogeneous local steps, which allows different workers to perform a different number of local steps to better adapt to their own circumstances. The interesting observation in general is that the noise/variance introduced by compressors does not affect the overall convergence rate order for non-i.i.d. FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with two gradient compression schemes of different compression ratios.

قيم البحث

367 - Haibo Yang , Minghong Fang , Jia Liu 2021

Federated learning (FL) is a distributed machine learning architecture that leverages a large number of workers to jointly learn a model with decentralized data. FL has received increasing attention in recent years thanks to its data privacy protecti on, communication efficiency and a linear speedup for convergence in training (i.e., convergence performance increases linearly with respect to the number of workers). However, existing studies on linear speedup for convergence are only limited to the assumptions of i.i.d. datasets across workers and/or full worker participation, both of which rarely hold in practice. So far, it remains an open question whether or not the linear speedup for convergence is achievable under non-i.i.d. datasets with partial worker participation in FL. In this paper, we show that the answer is affirmative. Specifically, we show that the federated averaging (FedAvg) algorithm (with two-sided learning rates) on non-i.i.d. datasets in non-convex settings achieves a convergence rate $mathcal{O}(frac{1}{sqrt{mKT}} + frac{1}{T})$ for full worker participation and a convergence rate $mathcal{O}(frac{sqrt{K}}{sqrt{nT}} + frac{1}{T})$ for partial worker participation, where $K$ is the number of local steps, $T$ is the number of total communication rounds, $m$ is the total worker number and $n$ is the worker number in one communication round if for partial worker participation. Our results also reveal that the local steps in FL could help the convergence and show that the maximum number of local steps can be improved to $T/m$ in full worker participation. We conduct extensive experiments on MNIST and CIFAR-10 to verify our theoretical results.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

RingFed: Reducing Communication Costs in Federated Learning on Non-IID Data

121 - Guang Yang , Ke Mu , Chunhe Song 2021

Federated learning is a widely used distributed deep learning framework that protects the privacy of each client by exchanging model parameters rather than raw data. However, federated learning suffers from high communication costs, as a considerable number of model parameters need to be transmitted many times during the training process, making the approach inefficient, especially when the communication network bandwidth is limited. This article proposes RingFed, a novel framework to reduce communication overhead during the training process of federated learning. Rather than transmitting parameters between the center server and each client, as in original federated learning, in the proposed RingFed, the updated parameters are transmitted between each client in turn, and only the final result is transmitted to the central server, thereby reducing the communication overhead substantially. After several local updates, clients first send their parameters to another proximal client, not to the center server directly, to preaggregate. Experiments on two different public datasets show that RingFed has fast convergence, high model accuracy, and low communication cost.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

Federated Learning on Non-IID Data: A Survey

104 - Hangyu Zhu , Jinjin Xu , Shiqing Liu 2021

Federated learning is an emerging distributed machine learning framework for privacy preservation. However, models trained in federated learning usually have worse performance than those trained in the standard centralized learning mode, especially w hen the training data are not independent and identically distributed (Non-IID) on the local devices. In this survey, we pro-vide a detailed analysis of the influence of Non-IID data on both parametric and non-parametric machine learning models in both horizontal and vertical federated learning. In addition, cur-rent research work on handling challenges of Non-IID data in federated learning are reviewed, and both advantages and disadvantages of these approaches are discussed. Finally, we suggest several future research directions before concluding the paper.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

97 - Haoyu Zhao , Zhize Li , Peter Richtarik 2021

Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a classical federated learning algorithm in which clients run multiple local SGD steps before communicating their update to an orchestrating server. We propose a new fede rated learning algorithm, FedPAGE, able to further reduce the communication complexity by utilizing the recent optimal PAGE method (Li et al., 2021) instead of plain SGD in FedAvg. We show that FedPAGE uses much fewer communication rounds than previous local methods for both federated convex and nonconvex optimization. Concretely, 1) in the convex setting, the number of communication rounds of FedPAGE is $O(frac{N^{3/4}}{Sepsilon})$, improving the best-known result $O(frac{N}{Sepsilon})$ of SCAFFOLD (Karimireddy et al.,2020) by a factor of $N^{1/4}$, where $N$ is the total number of clients (usually is very large in federated learning), $S$ is the sampled subset of clients in each communication round, and $epsilon$ is the target error; 2) in the nonconvex setting, the number of communication rounds of FedPAGE is $O(frac{sqrt{N}+S}{Sepsilon^2})$, improving the best-known result $O(frac{N^{2/3}}{S^{2/3}epsilon^2})$ of SCAFFOLD (Karimireddy et al.,2020) by a factor of $N^{1/6}S^{1/3}$, if the sampled clients $Sleq sqrt{N}$. Note that in both settings, the communication cost for each round is the same for both FedPAGE and SCAFFOLD. As a result, FedPAGE achieves new state-of-the-art results in terms of communication complexity for both federated convex and nonconvex optimization.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية بنى وهياكل البيانات والخوارزميات

Dynamic Attention-based Communication-Efficient Federated Learning

199 - Zihan Chen , Kai Fong Ernest Chong , Tony Q. S. Quek 2021

Federated learning (FL) offers a solution to train a global machine learning model while still maintaining data privacy, without needing access to data stored locally at the clients. However, FL suffers performance degradation when client data distri bution is non-IID, and a longer training duration to combat this degradation may not necessarily be feasible due to communication limitations. To address this challenge, we propose a new adaptive training algorithm $texttt{AdaFL}$, which comprises two components: (i) an attention-based client selection mechanism for a fairer training scheme among the clients; and (ii) a dynamic fraction method to balance the trade-off between performance stability and communication efficiency. Experimental results show that our $texttt{AdaFL}$ algorithm outperforms the usual $texttt{FedAvg}$ algorithm, and can be incorporated to further improve various state-of-the-art FL algorithms, with respect to three aspects: model accuracy, performance stability, and communication efficiency.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية