FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis

100 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhao Song

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Baihe Huang - Xiaoxiao Li - Zhao Song

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are even not updating in the gradient direction. Existing convergence results for gradient descent-based methods heavily rely on the fact that the gradient direction is used for updating. This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to overparamterized ReLU neural networks trained by gradient descent in FL and is inspired by the analysis in Neural Tangent Kernel (NTK). Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters. Furthermore, with proper distributional assumptions, FL-NTK can also achieve good generalization.

قيم البحث

85 - Chunnan Wang , Bozhou Chen , Geng Li 2021

Recently, some Neural Architecture Search (NAS) techniques are proposed for the automatic design of Graph Convolutional Network (GCN) architectures. They bring great convenience to the use of GCN, but could hardly apply to the Federated Learning (FL) scenarios with distributed and private datasets, which limit their applications. Moreover, they need to train many candidate GCN models from scratch, which is inefficient for FL. To address these challenges, we propose FL-AGCNS, an efficient GCN NAS algorithm suitable for FL scenarios. FL-AGCNS designs a federated evolutionary optimization strategy to enable distributed agents to cooperatively design powerful GCN models while keeping personal information on local devices. Besides, it applies the GCN SuperNet and a weight sharing strategy to speed up the evaluation of GCN models. Experimental results show that FL-AGCNS can find better GCN models in short time under the FL framework, surpassing the state-of-the-arts NAS methods and GCN models.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

Blockchain Assisted Decentralized Federated Learning (BLADE-FL): Performance Analysis and Resource Allocation

88 - Jun Li , Yumeng Shao , Kang Wei 2021

Federated learning (FL), as a distributed machine learning paradigm, promotes personal privacy by local data processing at each client. However, relying on a centralized server for model aggregation, standard FL is vulnerable to server malfunctions, untrustworthy server, and external attacks. To address this issue, we propose a decentralized FL framework by integrating blockchain into FL, namely, blockchain assisted decentralized federated learning (BLADE-FL). In a round of the proposed BLADE-FL, each client broadcasts the trained model to other clients, aggregates its own model with received ones, and then competes to generate a block before its local training of the next round. We evaluate the learning performance of BLADE-FL, and develop an upper bound on the global loss function. Then we verify that this bound is convex with respect to the number of overall aggregation rounds K, and optimize the computing resource allocation for minimizing the upper bound. We also note that there is a critical problem of training deficiency, caused by lazy clients who plagiarize others trained models and add artificial noises to disguise their cheating behaviors. Focusing on this problem, we explore the impact of lazy clients on the learning performance of BLADE-FL, and characterize the relationship among the optimal K, the learning parameters, and the proportion of lazy clients. Based on MNIST and Fashion-MNIST datasets, we show that the experimental results are consistent with the analytical ones. To be specific, the gap between the developed upper bound and experimental results is lower than 5%, and the optimized K based on the upper bound can effectively minimize the loss function.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

92 - Yae Jee Cho , Jianyu Wang , Gauri Joshi 2020

Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Several works have analyzed the convergence of federated learning by accountin g of data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected at random or in proportion of their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection bias affects convergence speed. We reveal that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. Our experiments demonstrate that Power-of-Choice strategies converge up to 3 $times$ faster and give $10$% higher test accuracy than the baseline random selection.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التعلم الالي

Convergence Analysis and System Design for Federated Learning over Wireless Networks

265 - Shuo Wan , Jiaxun Lu , Pingyi Fan 2021

Federated learning (FL) has recently emerged as an important and promising learning scheme in IoT, enabling devices to jointly learn a model without sharing their raw data sets. However, as the training data in FL is not collected and stored centrall y, FL training requires frequent model exchange, which is largely affected by the wireless communication network. Therein, limited bandwidth and random package loss restrict interactions in training. Meanwhile, the insufficient message synchronization among distributed clients could also affect FL convergence. In this paper, we analyze the convergence rate of FL training considering the joint impact of communication network and training settings. Further by considering the training costs in terms of time and power, the optimal scheduling problems for communication networks are formulated. The developed theoretical results can be used to assist the system parameter selections and explain the principle of how the wireless communication system could influence the distributed training process and network scheduling.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية أنظمة متعددة العملاء

Local Averaging Helps: Hierarchical Federated Learning and Convergence Analysis

265 - Jiayi Wang , Shiqiang Wang , Rong-Rong Chen 2020

Federated learning is an effective approach to realize collaborative learning among edge devices without exchanging raw data. In practice, these devices may connect to local hubs instead of connecting to the global server (aggregator) directly. Due t o the (possibly limited) computation capability of these local hubs, it is reasonable to assume that they can perform simple averaging operations. A natural question is whether such local averaging is beneficial under different system parameters and how much gain can be obtained compared to the case without such averaging. In this paper, we study hierarchical federated learning with stochastic gradient descent (HF-SGD) and conduct a thorough theoretical analysis to analyze its convergence behavior. In particular, we first consider the two-level HF-SGD (one level of local averaging) and then extend this result to arbitrary number of levels (multiple levels of local averaging). The analysis demonstrates the impact of local averaging precisely as a function of system parameters. Due to the higher communication cost of global averaging, a strategy of decreasing the global averaging frequency and increasing the local averaging frequency is proposed. Experiments validate the proposed theoretical analysis and the advantages of HF-SGD.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات