From Federated to Fog Learning: Distributed Machine Learning over Heterogeneous Wireless Networks

195 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Seyyedali Hosseinalipour

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Seyyedali Hosseinalipour - Christopher G. Brinton - Vaneetn Aggarwal

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي بنية الشبكات والإنترنت

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Machine learning (ML) tasks are becoming ubiquitous in todays network applications. Federated learning has emerged recently as a technique for training ML models at the network edge by leveraging processing capabilities across the nodes that collect the data. There are several challenges with employing conventional federated learning in contemporary networks, due to the significant heterogeneity in compute and communication capabilities that exist across devices. To address this, we advocate a new learning paradigm called fog learning which will intelligently distribute ML model training across the continuum of nodes from edge devices to cloud servers. Fog learning enhances federated learning along three major dimensions: network, heterogeneity, and proximity. It considers a multi-layer hybrid learning framework consisting of heterogeneous devices with various proximities. It accounts for the topology structures of the local networks among the heterogeneous nodes at each network layer, orchestrating them for collaborative/cooperative learning through device-to-device (D2D) communications. This migrates from star network topologies used for parameter transfers in federated learning to more distributed topologies at scale. We discuss several open research directions to realizing fog learning.

قيم البحث

364 - Ji Liu , Jizhou Huang , Yang Zhou 2021

In recent years, data and computing resources are typically distributed in the devices of end users, various regions or organizations. Because of laws or regulations, the distributed data and computing resources cannot be directly shared among differ ent regions or organizations for machine learning tasks. Federated learning emerges as an efficient approach to exploit distributed data and computing resources, so as to collaboratively train machine learning models, while obeying the laws and regulations and ensuring data security and data privacy. In this paper, we provide a comprehensive survey of existing works for federated learning. We propose a functional architecture of federated learning systems and a taxonomy of related techniques. Furthermore, we present the distributed training, data communication, and security of FL systems. Finally, we analyze their limitations and propose future research directions.

النظم الموزعة والتوازية والحوسبة العنقودية الذكاء الاصطناعي التعلم الآلي

Communication-efficient Decentralized Machine Learning over Heterogeneous Networks

100 - Pan Zhou , Qian Lin , Dumitrel Loghin 2020

In the last few years, distributed machine learning has been usually executed over heterogeneous networks such as a local area network within a multi-tenant cluster or a wide area network connecting data centers and edge clusters. In these heterogene ous networks, the link speeds among worker nodes vary significantly, making it challenging for state-of-the-art machine learning approaches to perform efficient training. Both centralized and decentralized training approaches suffer from low-speed links. In this paper, we propose a decentralized approach, namely NetMax, that enables worker nodes to communicate via high-speed links and, thus, significantly speed up the training process. NetMax possesses the following novel features. First, it consists of a novel consensus algorithm that allows worker nodes to train model copies on their local dataset asynchronously and exchange information via peer-to-peer communication to synchronize their local copies, instead of a central master node (i.e., parameter server). Second, each worker node selects one peer randomly with a fine-tuned probability to exchange information per iteration. In particular, peers with high-speed links are selected with high probability. Third, the probabilities of selecting peers are designed to minimize the total convergence time. Moreover, we mathematically prove the convergence of NetMax. We evaluate NetMax on heterogeneous cluster networks and show that it achieves speedups of 3.7X, 3.4X, and 1.9X in comparison with the state-of-the-art decentralized training approaches Prague, Allreduce-SGD, and AD-PSGD, respectively.

النظم الموزعة والتوازية والحوسبة العنقودية

Federated Learning over Wireless Networks: Convergence Analysis and Resource Allocation

456 - Canh T. Dinh , Nguyen H. Tran , Minh N. H. Nguyen 2019

There is an increasing interest in a fast-growing machine learning technique called Federated Learning, in which the model training is distributed over mobile user equipments (UEs), exploiting UEs local computation and training data. Despite its adva ntages in data privacy-preserving, Federated Learning (FL) still has challenges in heterogeneity across UEs data and physical resources. We first propose a FL algorithm which can handle the heterogeneous UEs data challenge without further assumptions except strongly convex and smooth loss functions. We provide the convergence rate characterizing the trade-off between local computation rounds of UE to update its local model and global communication rounds to update the FL global model. We then employ the proposed FL algorithm in wireless networks as a resource allocation optimization problem that captures the trade-off between the FL convergence wall clock time and energy consumption of UEs with heterogeneous computing and power resources. Even though the wireless resource allocation problem of FL is non-convex, we exploit this problems structure to decompose it into three sub-problems and analyze their closed-form solutions as well as insights to problem design. Finally, we illustrate the theoretical analysis for the new algorithm with Tensorflow experiments and extensive numerical results for the wireless resource allocation sub-problems. The experiment results not only verify the theoretical convergence but also show that our proposed algorithm outperforms the vanilla FedAvg algorithm in terms of convergence rate and testing accuracy.

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية بنية الشبكات والإنترنت

Low-Latency Federated Learning over Wireless Channels with Differential Privacy

70 - Kang Wei , Jun Li , Chuan Ma 2021

In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. The performance of uploaded models in such situations can vary widely due to imbalanced data distributions, potential demands on privacy protections, and quality of transmissions. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each clients differential privacy (DP) requirement. We solve this problem in the framework of multi-agent multi-armed bandit (MAMAB) to deal with the situation where there are multiple clients confornting different unknown transmission environments, e.g., channel fading and interferences. Specifically, we first transform the long-term constraints on both training performance and each clients DP into a virtual queue based on the Lyapunov drift technique. Then, we convert the MAMAB to a max-min bipartite matching problem at each communication round, by estimating rewards with the upper confidence bound (UCB) approach. More importantly, we propose two efficient solutions to this matching problem, i.e., modified Hungarian algorithm and greedy matching with a better alternative (GMBA), in which the first one can achieve the optimal solution with a high complexity while the second one approaches a better trade-off by enabling a verified low-complexity with little performance loss. In addition, we develop an upper bound on the expected regret of this MAMAB based FL framework, which shows a linear growth over the logarithm of communication rounds, justifying its theoretical feasibility. Extensive experimental results are conducted to validate the effectiveness of our proposed algorithms, and the impacts of various parameters on the FL performance over wireless edge networks are also discussed.

النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Multi-Stage Hybrid Federated Learning over Large-Scale D2D-Enabled Fog Networks

101 - Seyyedali Hosseinalipour , Sheikh Shams Azam , Christopher G. Brinton 2020

Federated learning has generated significant interest, with nearly all works focused on a ``star topology where nodes/devices are each connected to a central server. We migrate away from this architecture and extend it through the network dimension t o the case where there are multiple layers of nodes between the end devices and the server. Specifically, we develop multi-stage hybrid federated learning (MH-FL), a hybrid of intra- and inter-layer model learning that considers the network as a multi-layer cluster-based structure, each layer of which consists of multiple device clusters. MH-FL considers the topology structures among the nodes in the clusters, including local networks formed via device-to-device (D2D) communications. It orchestrates the devices at different network layers in a collaborative/cooperative manner (i.e., using D2D interactions) to form local consensus on the model parameters, and combines it with multi-stage parameter relaying between layers of the tree-shaped hierarchy. We derive the upper bound of convergence for MH-FL with respect to parameters of the network topology (e.g., the spectral radius) and the learning algorithm (e.g., the number of D2D rounds in different clusters). We obtain a set of policies for the D2D rounds at different clusters to guarantee either a finite optimality gap or convergence to the global optimum. We then develop a distributed control algorithm for MH-FL to tune the D2D rounds in each cluster over time to meet specific convergence criteria. Our experiments on real-world datasets verify our analytical results and demonstrate the advantages of MH-FL in terms of resource utilization metrics.

بنية الشبكات والإنترنت التعلم الآلي