Subscribe to the gold package and get unlimited access to Shamra Academy

Semi-Decentralized Federated Edge Learning for Fast Convergence on Non-IID Data

413 0 0.0 ( 0 )

Download Cite

Added by Yuchang Sun

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Yuchang Sun - Jiawei Shao - Yuyi Mao

Networking and Internet Architecture Machine Learning

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Federated edge learning (FEEL) has emerged as an effective alternative to reduce the large communication latency in Cloud-based machine learning solutions, while preserving data privacy. Unfortunately, the learning performance of FEEL may be compromised due to limited training data in a single edge cluster. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL). By allowing model aggregation between different edge clusters, SD-FEEL enjoys the benefit of FEEL in reducing training latency and improves the learning performance by accessing richer training data from multiple edge clusters. A training algorithm for SD-FEEL with three main procedures in each round is presented, including local model updates, intra-cluster and inter-cluster model aggregations, and it is proved to converge on non-independent and identically distributed (non-IID) data. We also characterize the interplay between the network topology of the edge servers and the communication overhead of inter-cluster model aggregation on training performance. Experiment results corroborate our analysis and demonstrate the effectiveness of SD-FFEL in achieving fast convergence. Besides, guidelines on choosing critical hyper-parameters of the training algorithm are also provided.

rate research

Analysis and Optimal Edge Assignment For Hierarchical Federated Learning on Non-IID Data

442 - Naram Mhaisen , Alaa Awad , Amr Mohamed 2020

Distributed learning algorithms aim to leverage distributed and diverse data stored at users devices to learn a global phenomena by performing training amongst participating devices and periodically aggregating their local models parameters into a global model. Federated learning is a promising paradigm that allows for extending local training among the participant devices before aggregating the parameters, offering better communication efficiency. However, in the cases where the participants data are strongly skewed (i.e., non-IID), the local models can overfit local data, leading to low performing global model. In this paper, we first show that a major cause of the performance drop is the weighted distance between the distribution over classes on users devices and the global distribution. Then, to face this challenge, we leverage the edge computing paradigm to design a hierarchical learning system that performs Federated Gradient Descent on the user-edge layer and Federated Averaging on the edge-cloud layer. In this hierarchical architecture, we formalize and optimize this user-edge assignment problem such that edge-level data distributions turn to be similar (i.e., close to IID), which enhances the Federated Averaging performance. Our experiments on multiple real-world datasets show that the proposed optimized assignment is tractable and leads to faster convergence of models towards a better accuracy value.

Machine Learning Distributed Parallel and Cluster Computing

CFedAvg: Achieving Efficient Communication and Fast Convergence in Non-IID Federated Learning

106 - Haibo Yang , Jia Liu , Elizabeth S. Bentley 2021

Federated learning (FL) is a prevailing distributed learning paradigm, where a large number of workers jointly learn a model without sharing their training data. However, high communication costs could arise in FL due to large-scale (deep) learning models and bandwidth-constrained connections. In this paper, we introduce a communication-efficient algorithmic framework called CFedAvg for FL with non-i.i.d. datasets, which works with general (biased or unbiased) SNR-constrained compressors. We analyze the convergence rate of CFedAvg for non-convex functions with constant and decaying learning rates. The CFedAvg algorithm can achieve an $mathcal{O}(1 / sqrt{mKT} + 1 / T)$ convergence rate with a constant learning rate, implying a linear speedup for convergence as the number of workers increases, where $K$ is the number of local steps, $T$ is the number of total communication rounds, and $m$ is the total worker number. This matches the convergence rate of distributed/federated learning without compression, thus achieving high communication efficiency while not sacrificing learning accuracy in FL. Furthermore, we extend CFedAvg to cases with heterogeneous local steps, which allows different workers to perform a different number of local steps to better adapt to their own circumstances. The interesting observation in general is that the noise/variance introduced by compressors does not affect the overall convergence rate order for non-i.i.d. FL. We verify the effectiveness of our CFedAvg algorithm on three datasets with two gradient compression schemes of different compression ratios.

Machine Learning Distributed Parallel and Cluster Computing

Federated Learning on Non-IID Data: A Survey

104 - Hangyu Zhu , Jinjin Xu , Shiqing Liu 2021

Federated learning is an emerging distributed machine learning framework for privacy preservation. However, models trained in federated learning usually have worse performance than those trained in the standard centralized learning mode, especially when the training data are not independent and identically distributed (Non-IID) on the local devices. In this survey, we pro-vide a detailed analysis of the influence of Non-IID data on both parametric and non-parametric machine learning models in both horizontal and vertical federated learning. In addition, cur-rent research work on handling challenges of Non-IID data in federated learning are reviewed, and both advantages and disadvantages of these approaches are discussed. Finally, we suggest several future research directions before concluding the paper.

Machine Learning Distributed Parallel and Cluster Computing

Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data

121 - Sohei Itahara , Takayuki Nishio , Yusuke Koda 2020

This study develops a federated learning (FL) framework overcoming largely incremental communication costs due to model sizes in typical frameworks without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of model parameter exchange employed by the typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up according to the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additionally labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices dataset leads to ambiguous of each data sample and lowing of the training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.

Distributed Parallel and Cluster Computing Machine Learning

Device Scheduling with Fast Convergence for Wireless Federated Learning

161 - Wenqi Shi , Sheng Zhou , Zhisheng Niu 2019

Owing to the increasing need for massive data analysis and model training at the network edge, as well as the rising concerns about the data privacy, a new distributed training framework called federated learning (FL) has emerged. In each iteration of FL (called round), the edge devices update local models based on their own data and contribute to the global training by uploading the model updates via wireless channels. Due to the limited spectrum resources, only a portion of the devices can be scheduled in each round. While most of the existing work on scheduling focuses on the convergence of FL w.r.t. rounds, the convergence performance under a total training time budget is not yet explored. In this paper, a joint bandwidth allocation and scheduling problem is formulated to capture the long-term convergence performance of FL, and is solved by being decoupled into two sub-problems. For the bandwidth allocation sub-problem, the derived optimal solution suggests to allocate more bandwidth to the devices with worse channel conditions or weaker computation capabilities. For the device scheduling sub-problem, by revealing the trade-off between the number of rounds required to attain a certain model accuracy and the latency per round, a greedy policy is inspired, that continuously selects the device that consumes the least time in model updating until achieving a good trade-off between the learning efficiency and latency per round. The experiments show that the proposed policy outperforms other state-of-the-art scheduling policies, with the best achievable model accuracy under training time budgets.

Networking and Internet Architecture Information Theory Machine Learning

Semi-Decentralized Federated Edge Learning for Fast Convergence on Non-IID Data

Ask ChatGPT about the research

No Arabic abstract

Read More

suggested questions