No Arabic abstract
Federated Learning (FL) is known to perform Machine Learning tasks in a distributed manner. Over the years, this has become an emerging technology especially with various data protection and privacy policies being imposed FL allows performing machine learning tasks whilst adhering to these challenges. As with the emerging of any new technology, there are going to be challenges and benefits. A challenge that exists in FL is the communication costs, as FL takes place in a distributed environment where devices connected over the network have to constantly share their updates this can create a communication bottleneck. In this paper, we present a survey of the research that is performed to overcome the communication constraints in an FL setting.
Federated learning has allowed the training of statistical models over remote devices without the transfer of raw client data. In practice, training in heterogeneous and large networks introduce novel challenges in various aspects like network load, quality of client data, security and privacy. Recent works in FL have worked on improving communication efficiency and addressing uneven client data distribution independently, but none have provided a unified solution for both challenges. We introduce a new family of Federated Learning algorithms called CatFedAvg which not only improves the communication efficiency but improves the quality of learning using a category coverage maximization strategy. We use the FedAvg framework and introduce a simple and efficient step every epoch to collect meta-data about the clients training data structure which the central server uses to request a subset of weight updates. We explore two distinct variations which allow us to further explore the tradeoffs between communication efficiency and model accuracy. Our experiments based on a vision classification task have shown that an increase of 10% absolute points in accuracy using the MNIST dataset with 70% absolute points lower network transfer over FedAvg. We also run similar experiments with Fashion MNIST, KMNIST-10, KMNIST-49 and EMNIST-47. Further, under extreme data imbalance experiments for both globally and individual clients, we see the model performing better than FedAvg. The ablation study further explores its behaviour under varying data and client parameter conditions showcasing the robustness of the proposed approach.
We compare communication efficiencies of two compelling distributed machine learning approaches of split learning and federated learning. We show useful settings under which each method outperforms the other in terms of communication efficiency. We consider various practical scenarios of distributed learning setup and juxtapose the two methods under various real-life scenarios. We consider settings of small and large number of clients as well as small models (1M - 6M parameters), large models (10M - 200M parameters) and very large models (1 Billion-100 Billion parameters). We show that increasing number of clients or increasing model size favors split learning setup over the federated while increasing the number of data samples while keeping the number of clients or model size low makes federated learning more communication efficient.
Decentralized federated learning (DFL) is a powerful framework of distributed machine learning and decentralized stochastic gradient descent (SGD) is a driving engine for DFL. The performance of decentralized SGD is jointly influenced by communication-efficiency and convergence rate. In this paper, we propose a general decentralized federated learning framework to strike a balance between communication-efficiency and convergence performance. The proposed framework performs both multiple local updates and multiple inter-node communications periodically, unifying traditional decentralized SGD methods. We establish strong convergence guarantees for the proposed DFL algorithm without the assumption of convex objective function. The balance of communication and computation rounds is essential to optimize decentralized federated learning under constrained communication and computation resources. For further improving communication-efficiency of DFL, compressed communication is applied to DFL, named DFL with compressed communication (C-DFL). The proposed C-DFL exhibits linear convergence for strongly convex objectives. Experiment results based on MNIST and CIFAR-10 datasets illustrate the superiority of DFL over traditional decentralized SGD methods and show that C-DFL further enhances communication-efficiency.
The lottery ticket hypothesis (LTH) claims that a deep neural network (i.e., ground network) contains a number of subnetworks (i.e., winning tickets), each of which exhibiting identically accurate inference capability as that of the ground network. Federated learning (FL) has recently been applied in LotteryFL to discover such winning tickets in a distributed way, showing higher accuracy multi-task learning than Vanilla FL. Nonetheless, LotteryFL relies on unicast transmission on the downlink, and ignores mitigating stragglers, questioning scalability. Motivated by this, in this article we propose a personalized and communication-efficient federated lottery ticket learning algorithm, coined CELL, which exploits downlink broadcast for communication efficiency. Furthermore, it utilizes a novel user grouping method, thereby alternating between FL and lottery learning to mitigate stragglers. Numerical simulations validate that CELL achieves up to 3.6% higher personalized task classification accuracy with 4.3x smaller total communication cost until convergence under the CIFAR-10 dataset.
Federated learning (FL) offers a solution to train a global machine learning model while still maintaining data privacy, without needing access to data stored locally at the clients. However, FL suffers performance degradation when client data distribution is non-IID, and a longer training duration to combat this degradation may not necessarily be feasible due to communication limitations. To address this challenge, we propose a new adaptive training algorithm $texttt{AdaFL}$, which comprises two components: (i) an attention-based client selection mechanism for a fairer training scheme among the clients; and (ii) a dynamic fraction method to balance the trade-off between performance stability and communication efficiency. Experimental results show that our $texttt{AdaFL}$ algorithm outperforms the usual $texttt{FedAvg}$ algorithm, and can be incorporated to further improve various state-of-the-art FL algorithms, with respect to three aspects: model accuracy, performance stability, and communication efficiency.