No Arabic abstract
Federated learning facilitates learning across clients without transferring local data on these clients to a central server. Despite the success of the federated learning method, it remains to improve further w.r.t communicating the most critical information to update a model under limited communication conditions, which can benefit this learning scheme into a wide range of application scenarios. In this work, we propose a nonlinear quantization for compressed stochastic gradient descent, which can be easily utilized in federated learning. Based on the proposed quantization, our system significantly reduces the communication cost by up to three orders of magnitude, while maintaining convergence and accuracy of the training process to a large extent. Extensive experiments are conducted on image classification and brain tumor semantic segmentation using the MNIST, CIFAR-10 and BraTS datasets where we show state-of-the-art effectiveness and impressive communication efficiency.
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of having a higher error floor due to the higher variance of the stochastic gradients. In this work, we propose an adaptive quantization strategy called AdaQuantFL that aims to achieve communication efficiency as well as a low error floor by changing the number of quantization levels during the course of training. Experiments on training deep neural networks show that our method can converge in much fewer communicated bits as compared to fixed quantization level setups, with little or no impact on training and test accuracy.
Federated learning (FL) has attracted tremendous attentions in recent years due to its privacy preserving measures and great potentials in some distributed but privacy-sensitive applications like finance and health. However, high communication overloads for transmitting high-dimensional networks and extra security masks remains a bottleneck of FL. This paper proposes a communication-efficient FL framework with Adaptive Quantized Gradient (AQG) which adaptively adjusts the quantization level based on local gradients update to fully utilize the heterogeneousness of local data distribution for reducing unnecessary transmissions. Besides, the client dropout issues are taken into account and the Augmented AQG is developed, which could limit the dropout noise with an appropriate amplification mechanism for transmitted gradients. Theoretical analysis and experiment results show that the proposed AQG leads to 25%-50% of additional transmission reduction as compared to existing popular methods including Quantized Gradient Descent (QGD) and Lazily Aggregated Quantized (LAQ) gradient-based method without deteriorating convergence properties. Particularly, experiments with heterogenous data distributions corroborate a more significant transmission reduction compared with independent identical data distributions. Meanwhile, the proposed AQG is robust to a client dropping rate up to 90% empirically, and the Augmented AQG manages to further improve the FL systems communication efficiency with the presence of moderate-scale client dropouts commonly seen in practical FL scenarios.
Federated learning (FL) offers a solution to train a global machine learning model while still maintaining data privacy, without needing access to data stored locally at the clients. However, FL suffers performance degradation when client data distribution is non-IID, and a longer training duration to combat this degradation may not necessarily be feasible due to communication limitations. To address this challenge, we propose a new adaptive training algorithm $texttt{AdaFL}$, which comprises two components: (i) an attention-based client selection mechanism for a fairer training scheme among the clients; and (ii) a dynamic fraction method to balance the trade-off between performance stability and communication efficiency. Experimental results show that our $texttt{AdaFL}$ algorithm outperforms the usual $texttt{FedAvg}$ algorithm, and can be incorporated to further improve various state-of-the-art FL algorithms, with respect to three aspects: model accuracy, performance stability, and communication efficiency.
Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
Petabytes of data are generated each day by emerging Internet of Things (IoT), but only few of them can be finally collected and used for Machine Learning (ML) purposes due to the apprehension of data & privacy leakage, which seriously retarding MLs growth. To alleviate this problem, Federated learning is proposed to perform model training by multiple clients combined data without the dataset sharing within the cluster. Nevertheless, federated learning introduces massive communication overhead as the synchronized data in each epoch is of the same size as the model, and thereby leading to a low communication efficiency. Consequently, variant methods mainly focusing on the communication rounds reduction and data compression are proposed to reduce the communication overhead of federated learning. In this paper, we propose Overlap-FedAvg, a framework that parallels the model training phase with model uploading & downloading phase, so that the latter phase can be totally covered by the former phase. Compared to vanilla FedAvg, Overlap-FedAvg is further developed with a hierarchical computing strategy, a data compensation mechanism and a nesterov accelerated gradients~(NAG) algorithm. Besides, Overlap-FedAvg is orthogonal to many other compression methods so that they can be applied together to maximize the utilization of the cluster. Furthermore, the theoretical analysis is provided to prove the convergence of the proposed Overlap-FedAvg framework. Extensive experiments on both conventional and recurrent tasks with multiple models and datasets also demonstrate that the proposed Overlap-FedAvg framework substantially boosts the federated learning process.