Local Averaging Helps: Hierarchical Federated Learning and Convergence Analysis

266 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Mingyue Ji

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Jiayi Wang - Shiqiang Wang - Rong-Rong Chen

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية نظرية المعلومات

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Federated learning is an effective approach to realize collaborative learning among edge devices without exchanging raw data. In practice, these devices may connect to local hubs instead of connecting to the global server (aggregator) directly. Due to the (possibly limited) computation capability of these local hubs, it is reasonable to assume that they can perform simple averaging operations. A natural question is whether such local averaging is beneficial under different system parameters and how much gain can be obtained compared to the case without such averaging. In this paper, we study hierarchical federated learning with stochastic gradient descent (HF-SGD) and conduct a thorough theoretical analysis to analyze its convergence behavior. In particular, we first consider the two-level HF-SGD (one level of local averaging) and then extend this result to arbitrary number of levels (multiple levels of local averaging). The analysis demonstrates the impact of local averaging precisely as a function of system parameters. Due to the higher communication cost of global averaging, a strategy of decreasing the global averaging frequency and increasing the local averaging frequency is proposed. Experiments validate the proposed theoretical analysis and the advantages of HF-SGD.

قيم البحث

125 - Jianyu Wang , Zheng Xu , Zachary Garrett 2021

The federated learning (FL) framework trains a machine learning model using decentralized data stored at edge client devices by periodically aggregating locally trained models. Popular optimization algorithms of FL use vanilla (stochastic) gradient d escent for both local updates at clients and global updates at the aggregating server. Recently, adaptive optimization methods such as AdaGrad have been studied for server updates. However, the effect of using adaptive optimization methods for local updates at clients is not yet understood. We show in both theory and practice that while local adaptive methods can accelerate convergence, they can cause a non-vanishing solution bias, where the final converged solution may be different from the stationary point of the global objective function. We propose correction techniques to overcome this inconsistency and complement the local adaptive methods for FL. Extensive experiments on realistic federated training tasks show that the proposed algorithms can achieve faster convergence and higher test accuracy than the baselines without local adaptivity.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التعلم الالي

Server Averaging for Federated Learning

304 - George Pu , Yanlin Zhou , Dapeng Wu 2021

Federated learning allows distributed devices to collectively train a model without sharing or disclosing the local dataset with a central server. The global model is optimized by training and averaging the model parameters of all local participants. However, the improved privacy of federated learning also introduces challenges including higher computation and communication costs. In particular, federated learning converges slower than centralized training. We propose the server averaging algorithm to accelerate convergence. Sever averaging constructs the shared global model by periodically averaging a set of previous global models. Our experiments indicate that server averaging not only converges faster, to a target accuracy, than federated averaging (FedAvg), but also reduces the computation costs on the client-level through epoch decay.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية

Federated Learning over Wireless Networks: Convergence Analysis and Resource Allocation

456 - Canh T. Dinh , Nguyen H. Tran , Minh N. H. Nguyen 2019

There is an increasing interest in a fast-growing machine learning technique called Federated Learning, in which the model training is distributed over mobile user equipments (UEs), exploiting UEs local computation and training data. Despite its adva ntages in data privacy-preserving, Federated Learning (FL) still has challenges in heterogeneity across UEs data and physical resources. We first propose a FL algorithm which can handle the heterogeneous UEs data challenge without further assumptions except strongly convex and smooth loss functions. We provide the convergence rate characterizing the trade-off between local computation rounds of UE to update its local model and global communication rounds to update the FL global model. We then employ the proposed FL algorithm in wireless networks as a resource allocation optimization problem that captures the trade-off between the FL convergence wall clock time and energy consumption of UEs with heterogeneous computing and power resources. Even though the wireless resource allocation problem of FL is non-convex, we exploit this problems structure to decompose it into three sub-problems and analyze their closed-form solutions as well as insights to problem design. Finally, we illustrate the theoretical analysis for the new algorithm with Tensorflow experiments and extensive numerical results for the wireless resource allocation sub-problems. The experiment results not only verify the theoretical convergence but also show that our proposed algorithm outperforms the vanilla FedAvg algorithm in terms of convergence rate and testing accuracy.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية بنية الشبكات والإنترنت

Convergence Analysis and System Design for Federated Learning over Wireless Networks

265 - Shuo Wan , Jiaxun Lu , Pingyi Fan 2021

Federated learning (FL) has recently emerged as an important and promising learning scheme in IoT, enabling devices to jointly learn a model without sharing their raw data sets. However, as the training data in FL is not collected and stored centrall y, FL training requires frequent model exchange, which is largely affected by the wireless communication network. Therein, limited bandwidth and random package loss restrict interactions in training. Meanwhile, the insufficient message synchronization among distributed clients could also affect FL convergence. In this paper, we analyze the convergence rate of FL training considering the joint impact of communication network and training settings. Further by considering the training costs in terms of time and power, the optimal scheduling problems for communication networks are formulated. The developed theoretical results can be used to assist the system parameter selections and explain the principle of how the wireless communication system could influence the distributed training process and network scheduling.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية أنظمة متعددة العملاء

Compositional Federated Learning: Applications in Distributionally Robust Averaging and Meta Learning

111 - Feihu Huang , Junyi Li , Heng Huang 2021

In the paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many machine learning problems with a hierarchical structure such as distributionally robust federated learning and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a fast convergence rate of $O(frac{1}{sqrt{T}})$, where $T$ denotes the number of iteration. To the best of our knowledge, our algorithm is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. At the same time, we also first transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple composition optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML to demonstrate the effectiveness of our algorithm.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية التحسين والتحكم