System Optimization in Synchronous Federated Training: A Survey

Published by: Zhifeng Jiang

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Zhifeng Jiang - Wei Wang

Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture

The unprecedented demand for collaborative machine learning in a privacy-preserving manner has given rise to a novel machine learning paradigm called federated learning (FL). Given a sufficient level of privacy guarantees, the practicality of an FL system mainly depends on its time-to-accuracy performance during the training process. Despite bearing some resemblance to traditional distributed training, FL poses four distinct challenges that complicate optimization toward shorter time-to-accuracy: information deficiency, coupling for contrasting factors, client heterogeneity, and a huge configuration space. To inspire related research, in this paper we survey highly relevant attempts in the FL literature and organize them by the related training phases in the standard workflow: selection, configuration, and reporting. We also review exploratory work, including measurement studies and benchmarking tools, that supports FL developers. Although a few survey articles on FL already exist, our work differs from them in focus, classification, and implications.
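The three phases named in the abstract (selection, configuration, reporting) can be sketched as one synchronous round. This is only a toy illustration under stated assumptions: the function names, the one-parameter least-squares model, uniform random selection, and the sample-weighted averaging step are all hypothetical choices made here, not methods taken from the survey.

```python
import random


def select_clients(client_ids, k, rng):
    """Selection phase: sample k participants uniformly at random (assumed policy)."""
    return rng.sample(client_ids, k)


def local_update(global_w, data, lr=0.1):
    """Configuration phase: the client receives global_w and runs one SGD pass
    on a toy model y = w * x with squared-error loss."""
    w = global_w
    for x, y in data:
        grad = 2 * (w * x - y) * x
        w -= lr * grad
    return w


def aggregate(reports):
    """Reporting phase: the server waits for all reports, then takes a
    sample-weighted average of the reported weights (assumed rule)."""
    total = sum(n for _, n in reports)
    return sum(w * n for w, n in reports) / total


def run_round(global_w, datasets, k, rng):
    """One synchronous round: select, configure and train, then aggregate."""
    chosen = select_clients(sorted(datasets), k, rng)
    reports = [(local_update(global_w, datasets[c]), len(datasets[c])) for c in chosen]
    return aggregate(reports)
```

Calling `run_round` repeatedly with the returned weight as the next `global_w` plays out successive rounds; in a real system the time-to-accuracy would depend on how each of the three phases is tuned.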

Seyyedali Hosseinalipour, Christopher G. Brinton, Vaneet Aggarwal (2020)

Machine learning (ML) tasks are becoming ubiquitous in today's network applications. Federated learning has emerged recently as a technique for training ML models at the network edge by leveraging processing capabilities across the nodes that collect the data. There are several challenges with employing conventional federated learning in contemporary networks, due to the significant heterogeneity in compute and communication capabilities that exists across devices. To address this, we advocate a new learning paradigm called fog learning, which will intelligently distribute ML model training across the continuum of nodes from edge devices to cloud servers. Fog learning enhances federated learning along three major dimensions: network, heterogeneity, and proximity. It considers a multi-layer hybrid learning framework consisting of heterogeneous devices with various proximities. It accounts for the topology structures of the local networks among the heterogeneous nodes at each network layer, orchestrating them for collaborative/cooperative learning through device-to-device (D2D) communications. This migrates from the star network topologies used for parameter transfers in federated learning to more distributed topologies at scale. We discuss several open research directions for realizing fog learning.

Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture

Domain-specific Communication Optimization for Distributed DNN Training

Hao Wang, Jingrong Chen, Xinchen Wan (2020)

Communication overhead poses an important obstacle to distributed DNN training and has drawn increasing attention in recent years. Despite continuous efforts, prior solutions such as gradient compression/reduction, compute/communication overlapping, and layer-wise flow scheduling are still coarse-grained and insufficient for efficient distributed training, especially when the network is under pressure. We present DLCP, a novel solution exploiting the domain-specific properties of deep learning to optimize the communication overhead of DNN training in a fine-grained manner. At its heart, DLCP comprises several key innovations beyond prior work. For example, it exploits the bounded loss tolerance of SGD-based training to improve tail communication latency, which cannot be avoided purely through gradient compression. It then performs fine-grained packet-level prioritization and dropping, as opposed to flow-level scheduling, based on the layers and magnitudes of gradients to further speed up model convergence without affecting accuracy. In addition, it leverages inter-packet order-independency to perform per-packet load balancing without causing classical re-ordering issues. DLCP works with both Parameter Server and collective communication routines. We have implemented DLCP with commodity switches, integrated it with various training frameworks including TensorFlow, MXNet and PyTorch, and deployed it in our small-scale testbed with 10 Nvidia V100 GPUs. Our testbed experiments and large-scale simulations show that DLCP delivers up to 84.3% additional training acceleration over the best existing solutions.

Distributed, Parallel, and Cluster Computing; Machine Learning
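The idea in the DLCP abstract of dropping packets by gradient magnitude within a bounded loss tolerance can be illustrated with a small host-side toy. This is not DLCP's switch-level implementation: the helper names, the fixed packet size, the use of summed absolute value as the priority, and the zero-fill for dropped packets are all assumptions made here for illustration.

```python
def packetize(grad, size):
    """Split a flat gradient list into fixed-size 'packets' (hypothetical layout)."""
    return [grad[i:i + size] for i in range(0, len(grad), size)]


def drop_low_magnitude(packets, max_loss_fraction):
    """Zero out the lowest-magnitude packets, never exceeding the loss bound.

    Priority is the sum of absolute gradient values in the packet (assumed metric).
    Returns the surviving packets and the set of dropped packet indices.
    """
    budget = int(len(packets) * max_loss_fraction)
    order = sorted(range(len(packets)), key=lambda i: sum(abs(g) for g in packets[i]))
    dropped = set(order[:budget])
    kept = [[0.0] * len(p) if i in dropped else list(p) for i, p in enumerate(packets)]
    return kept, dropped
```

With a loss bound of 0.5 and four packets, at most two packets are dropped, and they are always the two carrying the least gradient mass, so the large updates still reach the receiver.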

Orchestrating the Development Lifecycle of Machine Learning-Based IoT Applications: A Taxonomy and Survey

Bin Qian, Jie Su, Zhenyu Wen (2019)

Machine Learning (ML) and the Internet of Things (IoT) are complementary advances: ML techniques unlock the full potential of IoT with intelligence, and IoT applications increasingly feed data collected by sensors into ML models, thereby employing the results to improve their business processes and services. Hence, orchestrating ML pipelines that encompass the model training and implication involved in the holistic development lifecycle of an IoT application often leads to complex system integration. This paper provides a comprehensive and systematic survey of the development lifecycle of ML-based IoT applications. We outline the core roadmap and taxonomy, and subsequently assess and compare the existing standard techniques used in each individual stage.

Distributed, Parallel, and Cluster Computing; Machine Learning; Networking and Internet Architecture

Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data

Sohei Itahara, Takayuki Nishio, Yusuke Koda (2020)

This study develops a federated learning (FL) framework that overcomes the largely incremental communication costs caused by model sizes in typical frameworks, without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of the model parameter exchange employed by typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up with the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additional labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices' datasets leads to ambiguity for each data sample and slows training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs by up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.

Distributed, Parallel, and Cluster Computing; Machine Learning
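The entropy reduction averaging step mentioned in the DS-FL abstract (average the clients' outputs on an open-dataset sample, then sharpen the result) can be sketched as follows. The abstract does not give the formula, so the temperature-scaled softmax used for sharpening, and all function names, are assumptions made here rather than the paper's exact definition.

```python
import math


def average(outputs):
    """Element-wise mean of the class-probability vectors reported by clients."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]


def sharpen(probs, temperature):
    """Sharpen a distribution with a softmax at temperature < 1 (assumed method)."""
    scaled = [p / temperature for p in probs]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a less ambiguous soft label."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Three hypothetical clients disagree on one open-dataset sample.
client_outputs = [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]]
mean_label = average(client_outputs)
sharp_label = sharpen(mean_label, temperature=0.1)
```

In this toy case the sharpened label keeps the same most-likely class as the plain average but has lower entropy, which is the effect the abstract attributes to entropy reduction averaging.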

Asynchronous Federated Learning on Heterogeneous Devices: A Survey

Chenhao Xu, Youyang Qu, Yong Xiang (2021)

Federated learning (FL) is booming with the wave of distributed machine learning and ever-increasing privacy concerns. In the FL paradigm, global model aggregation is handled by a centralized aggregation server based on locally updated gradients trained on local nodes, which mitigates the privacy leakage caused by the collection of sensitive information. With the increased computing and communication capabilities of edge and IoT devices, applying FL on heterogeneous devices to train machine learning models has become a trend. The synchronous aggregation strategy in the classic FL paradigm cannot use resources effectively, especially on heterogeneous devices, because it waits for straggler devices before aggregation in each training round. Furthermore, in real-world scenarios, the disparity of data dispersed across devices (i.e., data heterogeneity) degrades the accuracy of models. As a result, many asynchronous FL (AFL) paradigms have been presented in various application scenarios to improve efficiency, performance, privacy, and security. This survey comprehensively analyzes and summarizes existing variants of AFL according to a novel classification mechanism, including device heterogeneity, data heterogeneity, privacy and security on heterogeneous devices, and applications on heterogeneous devices. Finally, this survey reveals rising challenges and presents potentially promising research directions in this under-investigated field.

Distributed, Parallel, and Cluster Computing
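The straggler effect described in the asynchronous FL abstract follows from simple arithmetic: a synchronous round lasts as long as its slowest client, so faster clients sit idle. A minimal sketch, assuming per-client round times are known and using hypothetical function names:

```python
def sync_round_time(client_times):
    """A synchronous round ends only when the slowest client has reported."""
    return max(client_times)


def idle_fraction(client_times):
    """Share of total client time spent waiting for the straggler."""
    t = sync_round_time(client_times)
    waiting = sum(t - c for c in client_times)
    return waiting / (t * len(client_times))


# Hypothetical per-client times in seconds; the last device is a straggler.
times = [10, 12, 11, 60]
round_time = sync_round_time(times)
wasted = idle_fraction(times)
```

Here the round takes 60 seconds even though three of the four devices finish in about 11 seconds, which is the kind of waste that motivates asynchronous aggregation.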