Efficient and Less Centralized Federated Learning

Added by Zichang Liu

Publication date 2021

fields Informatics Engineering

Research language English

Authors Li Chou - Zichang Liu - Zhuang Wang

Distributed Parallel and Cluster Computing

With the rapid growth in mobile computing, massive amounts of data and computing resources are now located at the edge. To this end, federated learning (FL) is becoming a widely adopted distributed machine learning (ML) paradigm, which aims to harness this expanding, skewed data locally in order to develop rich and informative models. In centralized FL, a collection of devices collaboratively solves an ML task under the coordination of a central server. However, existing FL frameworks make an oversimplified assumption about network connectivity and ignore the communication bandwidth of the different links in the network. In this paper, we present and study a novel FL algorithm in which devices mostly collaborate with other devices in a pairwise manner. Our nonparametric approach is able to exploit network topology to reduce communication bottlenecks. We evaluate our approach on various FL benchmarks and demonstrate that our method achieves 10x better communication efficiency and an increase of around 8% in accuracy compared to the centralized approach.
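The abstract does not spell out the pairwise update rule, so the following is only a rough illustration of what topology-aware pairwise collaboration might look like. It uses generic gossip averaging, which is a swapped-in technique and not the paper's algorithm; the names `best_neighbor`, `pairwise_round`, and the bandwidth table are hypothetical.

```python
# Sketch only: generic gossip averaging over the fastest links.
# This is NOT the paper's method; all names and numbers are illustrative.

def best_neighbor(device, bandwidth):
    """Return the neighbor reachable over the highest-bandwidth link."""
    links = {}
    for (a, b), bw in bandwidth.items():
        if a == device:
            links[b] = bw
        elif b == device:
            links[a] = bw
    return max(links, key=links.get)


def pairwise_round(models, bandwidth):
    """One round: each device averages its parameters with its best neighbor."""
    for device in sorted(models):
        peer = best_neighbor(device, bandwidth)
        avg = [(x + y) / 2.0 for x, y in zip(models[device], models[peer])]
        models[device] = list(avg)
        models[peer] = list(avg)
    return models


# Three devices with two parameters each; link speeds are made up.
models = {"a": [0.0, 4.0], "b": [2.0, 0.0], "c": [4.0, 2.0]}
bandwidth = {("a", "b"): 10.0, ("b", "c"): 1.0, ("a", "c"): 5.0}
for _ in range(20):
    pairwise_round(models, bandwidth)
```

In this toy setup the slow link between `b` and `c` is never used, yet every device still drifts toward the global parameter mean because the fast links keep the graph connected.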

Communication Efficient Federated Learning with Adaptive Quantization

Yuzhu Mao, Zihao Zhao, Guangfeng Yan, 2021

Federated learning (FL) has attracted tremendous attention in recent years due to its privacy-preserving measures and great potential in distributed but privacy-sensitive applications such as finance and health. However, the high communication overhead of transmitting high-dimensional networks and extra security masks remains a bottleneck of FL. This paper proposes a communication-efficient FL framework with Adaptive Quantized Gradient (AQG), which adaptively adjusts the quantization level based on local gradient updates to fully exploit the heterogeneity of local data distributions and reduce unnecessary transmissions. In addition, client dropout is taken into account and the Augmented AQG is developed, which limits the dropout noise with an appropriate amplification mechanism for transmitted gradients. Theoretical analysis and experimental results show that the proposed AQG achieves a 25%-50% additional transmission reduction compared to existing popular methods, including Quantized Gradient Descent (QGD) and the Lazily Aggregated Quantized (LAQ) gradient-based method, without deteriorating convergence properties. In particular, experiments with heterogeneous data distributions show a more significant transmission reduction than experiments with independent and identically distributed data. Meanwhile, the proposed AQG is empirically robust to client dropout rates of up to 90%, and the Augmented AQG further improves the FL system's communication efficiency in the presence of the moderate-scale client dropouts commonly seen in practical FL scenarios.
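To make the idea of an adaptive quantization level concrete, here is a minimal sketch of a uniform gradient quantizer whose bit width depends on how much the local gradient changed. The selection rule in `choose_bits`, its thresholds, and both function names are assumptions made for illustration; the abstract does not give AQG's actual criterion.

```python
# Sketch only: uniform quantization with a hypothetical bit-selection rule.
# This is not the AQG criterion from the paper.

def quantize(grad, bits):
    """Uniformly quantize entries to 2**bits levels over [-r, r], r = max |g|."""
    r = max(abs(g) for g in grad)
    if r == 0.0:
        return list(grad)
    levels = 2 ** bits - 1
    step = 2.0 * r / levels
    return [-r + round((g + r) / step) * step for g in grad]


def choose_bits(grad, prev_grad, low=2, high=8, threshold=0.5):
    """Assumed rule: spend more bits when the gradient changed a lot."""
    change = sum((g - p) ** 2 for g, p in zip(grad, prev_grad)) ** 0.5
    norm = sum(g * g for g in grad) ** 0.5
    return high if norm > 0 and change / norm > threshold else low


grad = [0.5, -1.0, 0.25]
bits_small_change = choose_bits(grad, [0.5, -1.0, 0.2])   # gradient barely moved
bits_large_change = choose_bits(grad, [0.0, 0.0, 0.0])    # gradient moved a lot
coarse = quantize(grad, bits_small_change)
fine = quantize(grad, bits_large_change)
```

Fewer bits mean fewer transmitted bytes per round at the cost of a larger rounding error, which is bounded by half the quantization step.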

Distributed Parallel and Cluster Computing

Communication-Efficient Federated Learning via Predictive Coding

Kai Yue, Richeng Jin, Chau-Wai Wong, 2021

Federated learning can enable remote workers to collaboratively train a shared machine learning model while allowing training data to be kept locally. In the use case of wireless mobile devices, the communication overhead is a critical bottleneck due to limited power and bandwidth. Prior work has utilized various data compression tools such as quantization and sparsification to reduce the overhead. In this paper, we propose a predictive-coding-based communication scheme for federated learning. The scheme has shared prediction functions among all devices and allows each worker to transmit a compressed residual vector derived from the reference. In each communication round, we select the predictor and quantizer based on the rate-distortion cost, and further reduce the redundancy with entropy coding. Extensive simulations reveal that the communication cost can be reduced by up to 99% with even better learning performance when compared with other baseline methods.
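The residual-transmission idea can be sketched as follows, under simplifying assumptions: the shared predictor is just the previously reconstructed update, the quantizer is a fixed uniform step, and entropy coding and rate-distortion selection are omitted. The names `encode` and `decode` and all numbers are illustrative, not taken from the paper.

```python
# Sketch only: predictive coding with a "previous update" predictor.
# The paper's predictors, quantizers, and entropy coder are not reproduced.

def encode(update, prediction, step):
    """Quantize the residual (update - prediction) to integer symbols."""
    return [round((u - p) / step) for u, p in zip(update, prediction)]


def decode(symbols, prediction, step):
    """Rebuild the update from the shared prediction and received symbols."""
    return [p + s * step for s, p in zip(symbols, prediction)]


step = 0.1
previous = [1.0, -2.0, 0.5]      # last reconstructed update, known to both sides
update = [1.02, -1.97, 0.49]     # new local update to transmit

with_predictor = encode(update, previous, step)
without_predictor = encode(update, [0.0, 0.0, 0.0], step)
reconstructed = decode(with_predictor, previous, step)
```

When the predictor is good, the residual symbols cluster near zero, which is what makes a subsequent entropy coder effective; the reconstruction error stays within half a quantization step either way.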

Distributed Parallel and Cluster Computing - Artificial Intelligence - Machine Learning

Federated Learning for Open Banking

Guodong Long, Yue Tan, Jing Jiang, 2021

Open banking enables individual customers to own their banking data, which provides fundamental support for a new ecosystem of data marketplaces and financial services. In the near future, decentralized data ownership in the finance sector is foreseeable using federated learning. This is a just-in-time technology that can learn intelligent models in a decentralized training manner. The most attractive aspect of federated learning is its ability to decompose model training across a centralized server and distributed nodes without collecting private data. This kind of decomposed learning framework has great potential to protect users' privacy and sensitive data. Therefore, federated learning combines naturally with open banking data marketplaces. This chapter discusses the possible challenges of applying federated learning in the context of open banking and explores the corresponding solutions.

Distributed Parallel and Cluster Computing - Machine Learning

Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data

Sohei Itahara, Takayuki Nishio, Yusuke Koda, 2020

This study develops a federated learning (FL) framework that overcomes the communication costs that grow sharply with model size in typical frameworks, without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of the model parameter exchange employed by typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up according to the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additionally labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices' datasets makes the label of each data sample ambiguous and slows training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs by up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.
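As an illustration of sharpening aggregated outputs, the sketch below averages per-client class probabilities for one open-dataset sample and then applies a low-temperature softmax. The abstract only says the aggregate is "intentionally sharpened"; the softmax form, the temperature value, and the function names are assumptions here.

```python
# Sketch only: average client outputs, then sharpen with a temperature softmax.
# The exact sharpening function used by DS-FL is assumed, not quoted.
import math


def average(outputs):
    """Element-wise mean of the clients' class-probability vectors."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]


def sharpen(probs, temperature=0.1):
    """Softmax of probs / temperature; a lower temperature gives a peakier result."""
    scaled = [p / temperature for p in probs]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Three clients disagree about one unlabeled sample (made-up numbers).
client_outputs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.6, 0.3, 0.1]]
mean_probs = average(client_outputs)
sharp_probs = sharpen(mean_probs)
```

The sharpened vector keeps the same most likely class as the plain average but has lower entropy, so the pseudo-label used for further local training is less ambiguous.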

Distributed Parallel and Cluster Computing - Machine Learning

Holdout SGD: Byzantine Tolerant Federated Learning

Shahar Azulay, Lior Raz, Amir Globerson, 2020

This work presents a new distributed Byzantine-tolerant federated learning algorithm, HoldOut SGD, for Stochastic Gradient Descent (SGD) optimization. HoldOut SGD uses the well-known machine learning technique of holdout estimation, in a distributed fashion, in order to select parameter updates that are likely to lead to models with low loss values. This makes it more effective at discarding Byzantine workers' inputs than existing methods that eliminate outliers in the parameter space of the learned model. HoldOut SGD first randomly selects a set of workers that use their private data in order to propose gradient updates. Next, a voting committee of workers is randomly selected, and each voter uses its private data as holdout data in order to select the best proposals via a voting scheme. We propose two possible mechanisms for the coordination of workers in the distributed computation of HoldOut SGD. The first uses a truthful central server and corresponds to the typical setting of current federated learning. The second is fully distributed and requires no central server, paving the way to fully decentralized federated learning. The fully distributed version implements HoldOut SGD via ideas from the blockchain domain, specifically the Algorand committee selection and consensus processes. We provide formal guarantees for the HoldOut SGD process in terms of its convergence to the optimal model and its level of resilience to the fraction of Byzantine workers. Empirical evaluation shows that HoldOut SGD is Byzantine-resilient and efficiently converges to an effectual model for deep-learning tasks, as long as the total number of participating workers is large and the fraction of Byzantine workers is less than half (<1/3 for the fully distributed variant).
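The propose-then-vote structure described above can be sketched on a toy scalar model. The voting rule below (each voter backs the `keep` lowest-loss proposals, and the most-voted proposals are averaged) is an assumed simplification, and `loss`, `holdout_step`, and the data are hypothetical; random selection of proposers and committee members is left out.

```python
# Sketch only: holdout-based voting over proposed updates for a scalar model.
# The voting rule and all values are illustrative assumptions.

def loss(w, data):
    """Mean squared error of a scalar model w on a voter's private holdout data."""
    return sum((w - y) ** 2 for y in data) / len(data)


def holdout_step(w, proposals, committee, keep=2):
    """Voters back the `keep` best proposals; the most-voted ones are averaged."""
    votes = [0] * len(proposals)
    for data in committee:
        ranked = sorted(range(len(proposals)),
                        key=lambda i: loss(w + proposals[i], data))
        for i in ranked[:keep]:
            votes[i] += 1
    winners = sorted(range(len(proposals)), key=lambda i: -votes[i])[:keep]
    return w + sum(proposals[i] for i in winners) / keep, votes


w = 0.0
proposals = [0.5, 0.4, -10.0]                  # the last one mimics a Byzantine worker
committee = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8]]  # each voter's holdout targets
new_w, votes = holdout_step(w, proposals, committee)
```

Because the Byzantine proposal produces a large loss on every voter's holdout data, it receives no votes and does not influence the update, whereas an outlier filter in parameter space would have to detect it by its magnitude alone.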

Distributed Parallel and Cluster Computing - Machine Learning