No Arabic abstract
Federated learning has allowed the training of statistical models over remote devices without the transfer of raw client data. In practice, training in heterogeneous and large networks introduce novel challenges in various aspects like network load, quality of client data, security and privacy. Recent works in FL have worked on improving communication efficiency and addressing uneven client data distribution independently, but none have provided a unified solution for both challenges. We introduce a new family of Federated Learning algorithms called CatFedAvg which not only improves the communication efficiency but improves the quality of learning using a category coverage maximization strategy. We use the FedAvg framework and introduce a simple and efficient step every epoch to collect meta-data about the clients training data structure which the central server uses to request a subset of weight updates. We explore two distinct variations which allow us to further explore the tradeoffs between communication efficiency and model accuracy. Our experiments based on a vision classification task have shown that an increase of 10% absolute points in accuracy using the MNIST dataset with 70% absolute points lower network transfer over FedAvg. We also run similar experiments with Fashion MNIST, KMNIST-10, KMNIST-49 and EMNIST-47. Further, under extreme data imbalance experiments for both globally and individual clients, we see the model performing better than FedAvg. The ablation study further explores its behaviour under varying data and client parameter conditions showcasing the robustness of the proposed approach.
Federated Learning (FL) is known to perform Machine Learning tasks in a distributed manner. Over the years, this has become an emerging technology especially with various data protection and privacy policies being imposed FL allows performing machine learning tasks whilst adhering to these challenges. As with the emerging of any new technology, there are going to be challenges and benefits. A challenge that exists in FL is the communication costs, as FL takes place in a distributed environment where devices connected over the network have to constantly share their updates this can create a communication bottleneck. In this paper, we present a survey of the research that is performed to overcome the communication constraints in an FL setting.
We consider the problem of reinforcing federated learning with formal privacy guarantees. We propose to employ Bayesian differential privacy, a relaxation of differential privacy for similarly distributed data, to provide sharper privacy loss bounds. We adapt the Bayesian privacy accounting method to the federated setting and suggest multiple improvements for more efficient privacy budgeting at different levels. Our experiments show significant advantage over the state-of-the-art differential privacy bounds for federated learning on image classification tasks, including a medical application, bringing the privacy budget below 1 at the client level, and below 0.1 at the instance level. Lower amounts of noise also benefit the model accuracy and reduce the number of communication rounds.
We compare communication efficiencies of two compelling distributed machine learning approaches of split learning and federated learning. We show useful settings under which each method outperforms the other in terms of communication efficiency. We consider various practical scenarios of distributed learning setup and juxtapose the two methods under various real-life scenarios. We consider settings of small and large number of clients as well as small models (1M - 6M parameters), large models (10M - 200M parameters) and very large models (1 Billion-100 Billion parameters). We show that increasing number of clients or increasing model size favors split learning setup over the federated while increasing the number of data samples while keeping the number of clients or model size low makes federated learning more communication efficient.
Federated learning (FL) empowers distributed clients to collaboratively train a shared machine learning model through exchanging parameter information. Despite the fact that FL can protect clients raw data, malicious users can still crack original data with disclosed parameters. To amend this flaw, differential privacy (DP) is incorporated into FL clients to disturb original parameters, which however can significantly impair the accuracy of the trained model. In this work, we study a crucial question which has been vastly overlooked by existing works: what are the optimal numbers of queries and replies in FL with DP so that the final model accuracy is maximized. In FL, the parameter server (PS) needs to query participating clients for multiple global iterations to complete training. Each client responds a query from the PS by conducting a local iteration. Our work investigates how many times the PS should query clients and how many times each client should reply the PS. We investigate two most extensively used DP mechanisms (i.e., the Laplace mechanism and Gaussian mechanisms). Through conducting convergence rate analysis, we can determine the optimal numbers of queries and replies in FL with DP so that the final model accuracy can be maximized. Finally, extensive experiments are conducted with publicly available datasets: MNIST and FEMNIST, to verify our analysis and the results demonstrate that properly setting the numbers of queries and replies can significantly improve the final model accuracy in FL with DP.
Federated learning enables a global machine learning model to be trained collaboratively by distributed, mutually non-trusting learning agents who desire to maintain the privacy of their training data and their hardware. A global model is distributed to clients, who perform training, and submit their newly-trained model to be aggregated into a superior model. However, federated learning systems are vulnerable to interference from malicious learning agents who may desire to prevent training or induce targeted misclassification in the resulting global model. A class of Byzantine-tolerant aggregation algorithms has emerged, offering varying degrees of robustness against these attacks, often with the caveat that the number of attackers is bounded by some quantity known prior to training. This paper presents Simeon: a novel approach to aggregation that applies a reputation-based iterative filtering technique to achieve robustness even in the presence of attackers who can exhibit arbitrary behaviour. We compare Simeon to state-of-the-art aggregation techniques and find that Simeon achieves comparable or superior robustness to a variety of attacks. Notably, we show that Simeon is tolerant to sybil attacks, where other algorithms are not, presenting a key advantage of our approach.