
Decentralized Deep Learning using Momentum-Accelerated Consensus

Posted by: Aditya Balu
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





We consider the problem of decentralized deep learning, where multiple agents collaborate to learn from a distributed dataset. While several decentralized deep learning approaches exist, the majority assume a central parameter-server topology for aggregating the model parameters from the agents. However, such a topology may be inapplicable in networked systems such as ad-hoc mobile networks, field robotics, and power network systems, where direct communication with a central parameter server may be inefficient. In this context, we propose and analyze a novel decentralized deep learning algorithm in which the agents interact over a fixed communication topology (without a central server). Our algorithm is based on the heavy-ball acceleration method used in gradient-based optimization. We propose a novel consensus protocol in which each agent shares with its neighbors both its model parameters and its gradient-momentum values during the optimization process. We consider both strongly convex and non-convex objective functions and theoretically analyze our algorithm's performance. We present several empirical comparisons with competing decentralized learning methods to demonstrate the efficacy of our approach under different communication topologies.
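
The protocol described above pairs a heavy-ball (momentum) update with a neighbor-only consensus step in which both parameters and momentum are exchanged. Below is a minimal numpy sketch of that kind of update on a toy quadratic objective, assuming a ring topology with a doubly-stochastic mixing matrix; the variable names, step sizes, and the exact placement of the mixing step are illustrative and not the paper's precise rule.

    import numpy as np

    # Toy sketch: decentralized heavy-ball consensus on a ring of agents.
    # Each agent i holds local data (A_i, b_i) defining a quadratic loss
    # f_i(x) = 0.5 * ||A_i x - b_i||^2 and mixes both its parameters and its
    # momentum with its neighbors through a doubly-stochastic matrix W.
    np.random.seed(0)
    n_agents, dim = 5, 10
    A = [np.random.randn(20, dim) for _ in range(n_agents)]
    b = [np.random.randn(20) for _ in range(n_agents)]

    # Doubly-stochastic mixing matrix for a ring topology.
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in ((i - 1) % n_agents, (i + 1) % n_agents):
            W[i, j] = 1.0 / 3.0
        W[i, i] = 1.0 - W[i].sum()

    x = np.zeros((n_agents, dim))   # model parameters, one row per agent
    v = np.zeros((n_agents, dim))   # heavy-ball momentum, one row per agent
    lr, beta = 0.01, 0.9

    for t in range(200):
        grads = np.stack([A[i].T @ (A[i] @ x[i] - b[i]) for i in range(n_agents)])
        x_mix, v_mix = W @ x, W @ v      # neighbors exchange parameters AND momentum
        v = beta * v_mix - lr * grads    # heavy-ball step on the mixed quantities
        x = x_mix + v

    print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))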




Read also

Federated learning (FL) is a fast-developing technique that allows multiple workers to train a global model based on a distributed dataset. Conventional FL employs the gradient descent algorithm, which may not be efficient enough. It is well known that Nesterov Accelerated Gradient (NAG) is more advantageous in a centralized training environment, but it has not been clear how to quantify the benefits of NAG in FL. In this work, we focus on a version of FL based on NAG (FedNAG) and provide a detailed convergence analysis. The result is compared with conventional FL based on gradient descent. One interesting conclusion is that as long as the learning step size is sufficiently small, FedNAG outperforms FedAvg. Extensive experiments based on real-world datasets are conducted, verifying our conclusions and confirming the better convergence performance of FedNAG.
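
As a rough illustration of the FedNAG idea described above, the following numpy sketch runs local Nesterov-accelerated steps on each client and then lets a server average both the weights and the momentum buffers; the toy least-squares data, number of local steps, and hyperparameters are assumptions, not the paper's exact setup.

    import numpy as np

    # Toy sketch of an NAG-based federated round: each client runs local
    # Nesterov-accelerated steps on its own least-squares data, then the
    # server averages both the weights and the momentum buffers.
    np.random.seed(1)
    n_clients, dim, local_steps = 4, 8, 5
    data = [(np.random.randn(30, dim), np.random.randn(30)) for _ in range(n_clients)]
    lr, beta = 0.05, 0.9

    w_global = np.zeros(dim)
    v_global = np.zeros(dim)

    for rnd in range(20):
        ws, vs = [], []
        for Ai, bi in data:
            w, v = w_global.copy(), v_global.copy()
            for _ in range(local_steps):
                look_ahead = w + beta * v                      # Nesterov look-ahead point
                grad = Ai.T @ (Ai @ look_ahead - bi) / len(bi)
                v = beta * v - lr * grad
                w = w + v
            ws.append(w)
            vs.append(v)
        w_global = np.mean(ws, axis=0)   # server averages weights...
        v_global = np.mean(vs, axis=0)   # ...and momentum buffers

    print("global loss:",
          sum(0.5 * np.mean((Ai @ w_global - bi) ** 2) for Ai, bi in data))
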
The scale of deep learning nowadays calls for efficient distributed training algorithms. Decentralized momentum SGD (DmSGD), in which each node averages only with its neighbors, is more communication-efficient than vanilla parallel momentum SGD, which incurs a global average across all computing nodes. On the other hand, large-batch training has been shown to be critical to achieving runtime speedup. This motivates us to investigate how DmSGD performs in the large-batch scenario. In this work, we find that the momentum term can amplify the inconsistency bias in DmSGD. Such bias becomes more evident as the batch size grows and hence results in severe performance degradation. We next propose DecentLaM, a novel decentralized large-batch momentum SGD method that removes the momentum-incurred bias. The convergence rate is established for both the non-convex and strongly-convex scenarios. Our theoretical results justify the superiority of DecentLaM over DmSGD, especially in the large-batch scenario. Experimental results on a variety of computer vision tasks and models demonstrate that DecentLaM delivers both efficient and high-quality training.
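
The baseline DmSGD update the abstract refers to can be sketched as a local momentum-SGD step followed by averaging only with neighbors. The numpy sketch below shows that baseline on a toy quadratic problem (it does not reproduce the DecentLaM bias correction itself); topology, loss, and hyperparameters are illustrative.

    import numpy as np

    # Toy sketch of decentralized momentum SGD (DmSGD): each node takes a
    # local momentum step and then averages its parameters only with its
    # neighbors on a ring. The DecentLaM bias correction is NOT included.
    np.random.seed(2)
    n_nodes, dim = 6, 12
    A = [np.random.randn(40, dim) for _ in range(n_nodes)]
    b = [np.random.randn(40) for _ in range(n_nodes)]

    W = np.zeros((n_nodes, n_nodes))          # doubly-stochastic ring weights
    for i in range(n_nodes):
        for j in ((i - 1) % n_nodes, (i + 1) % n_nodes):
            W[i, j] = 1.0 / 3.0
        W[i, i] = 1.0 - W[i].sum()

    x = np.zeros((n_nodes, dim))
    m = np.zeros((n_nodes, dim))
    lr, beta = 0.01, 0.9

    for t in range(300):
        g = np.stack([A[i].T @ (A[i] @ x[i] - b[i]) / len(b[i]) for i in range(n_nodes)])
        m = beta * m + g          # local momentum accumulation
        x = W @ (x - lr * m)      # local step, then average only with neighbors

    print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
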
This paper proposes a decentralized FL scheme for IoE devices connected via multi-hop networks. FL has gained attention as an enabler of privacy-preserving algorithms, but FL algorithms are not guaranteed to converge to the optimal point when decentralized parameter-averaging schemes are used, because of non-convexity. Therefore, a distributed algorithm that converges to the optimal solution should be developed. The key idea of the proposed algorithm is to aggregate the local prediction functions not in a parameter space but in a function space. Since machine learning tasks can be regarded as convex functional optimization problems, a consensus-based optimization algorithm achieves the global optimum if it is tailored to work in a function space. The paper first analyzes the convergence of the proposed algorithm in a function space, which is referred to as a meta-algorithm, and shows that spectral graph theory can be applied to the function space in a manner similar to that of numerical vectors. CMFD is then developed for neural networks as an implementation of the meta-algorithm. CMFD leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging. One advantage of CMFD is that it works even when the NN models differ among the distributed learners. The paper shows that CMFD achieves higher accuracy than parameter aggregation under weakly connected networks, and that its stability is also higher than that of parameter-aggregation methods.
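
A toy numpy sketch of the function-space-consensus idea: three devices with different model parameterizations cannot average parameters, so each one instead regularizes its predictions on a shared unlabeled reference set toward the average of the devices' predictions. The feature maps, losses, and weights below are assumptions for illustration, not the CMFD algorithm itself.

    import numpy as np

    # Toy sketch of function-space consensus via distillation: devices hold
    # models with DIFFERENT parameterizations, so parameters cannot be
    # averaged; each device instead pulls its predictions on a shared
    # unlabeled reference set toward the average of all devices' predictions.
    np.random.seed(3)
    x_ref = np.linspace(-1, 1, 50)                                    # shared unlabeled inputs
    feature_maps = [
        lambda x: np.stack([np.ones_like(x), x], axis=1),             # linear model
        lambda x: np.stack([np.ones_like(x), x, x ** 2], axis=1),     # quadratic model
        lambda x: np.stack([np.ones_like(x), np.sin(3 * x)], axis=1), # sinusoidal model
    ]
    target = lambda x: 0.5 * x + 0.3 * x ** 2
    data = []
    for _ in feature_maps:                                            # local labeled data
        xd = np.random.uniform(-1, 1, 20)
        data.append((xd, target(xd) + 0.05 * np.random.randn(len(xd))))

    thetas = [np.zeros(fm(x_ref).shape[1]) for fm in feature_maps]
    lr, rho = 0.05, 1.0                                               # rho: distillation weight

    for step in range(500):
        preds = [fm(x_ref) @ th for fm, th in zip(feature_maps, thetas)]
        consensus = np.mean(preds, axis=0)                            # averaged prediction function
        for i, (fm, th) in enumerate(zip(feature_maps, thetas)):
            xd, yd = data[i]
            Phi_d, Phi_r = fm(xd), fm(x_ref)
            grad = (Phi_d.T @ (Phi_d @ th - yd) / len(yd)             # fit local data
                    + rho * Phi_r.T @ (Phi_r @ th - consensus) / len(x_ref))  # distill consensus
            thetas[i] = th - lr * grad

    print("spread of device predictions at x=0.5:",
          np.std([fm(np.array([0.5])) @ th for fm, th in zip(feature_maps, thetas)]))
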
Decentralized algorithms for stochastic optimization and learning rely on the diffusion of information through repeated local exchanges of intermediate estimates. Such structures are particularly appealing in situations where agents may be hesitant to share raw data due to privacy concerns. Nevertheless, in the absence of additional privacy-preserving mechanisms, the exchange of local estimates, which are generated based on private data, can allow for the inference of the data itself. The most common mechanism for guaranteeing privacy is the addition of perturbations to local estimates before broadcasting. These perturbations are generally chosen independently at every agent, resulting in a significant performance loss. We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible (to first order in the step-size) to the network centroid while preserving privacy guarantees. The analysis allows for general nonconvex loss functions and is hence applicable to a large number of machine learning and signal processing problems, including deep learning.
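
One simple way to realize perturbations that mask individual estimates while leaving the network centroid untouched is to let each pair of neighbors share a random vector that one adds and the other subtracts, so the perturbations sum to zero. The numpy sketch below demonstrates that zero-sum property; it is one construction with the stated behavior, not necessarily the nullspace scheme analyzed in the paper.

    import numpy as np

    # Toy sketch of zero-sum ("nullspace") perturbations: each pair of
    # neighbors on a ring shares a random vector that one adds and the other
    # subtracts before broadcasting, so every individual estimate is masked
    # while the network average is left exactly unchanged.
    np.random.seed(4)
    n_agents, dim = 5, 3
    estimates = np.random.randn(n_agents, dim)                    # private local estimates

    edges = [(i, (i + 1) % n_agents) for i in range(n_agents)]    # ring topology
    perturbations = np.zeros((n_agents, dim))
    for i, j in edges:
        r = np.random.randn(dim)                                  # pairwise shared secret
        perturbations[i] += r
        perturbations[j] -= r

    broadcast = estimates + perturbations                         # what agents actually send

    print("max per-agent masking:", np.abs(perturbations).max())
    print("centroid shift:", np.linalg.norm(broadcast.mean(0) - estimates.mean(0)))
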
In this paper, we investigate the problem of decentralized federated learning (DFL) in Internet of Things (IoT) systems, where a number of IoT clients collectively train models for a common task without sharing their private training data, in the absence of a central server. Most existing DFL schemes are composed of two alternating steps, i.e., model updating and model averaging. However, directly averaging model parameters to fuse the different models at the local clients suffers from client drift, especially when the training data are heterogeneous across clients. This leads to slow convergence and degraded learning performance. As a possible solution, we propose the decentralized federated learning via mutual knowledge transfer (Def-KT) algorithm, in which local clients fuse models by transferring their learnt knowledge to each other. Our experiments on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets reveal that the proposed Def-KT algorithm significantly outperforms the baseline DFL methods with model averaging, i.e., Combo and FullAvg, especially when the training data are not independent and identically distributed (non-IID) across different clients.
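
As a schematic rendering of the mutual-knowledge-transfer idea (not the published Def-KT procedure), the numpy sketch below lets two clients with non-IID data each fit their own data while distilling the partner model's predictions, instead of averaging parameters; the data split, distillation weight, and losses are illustrative assumptions.

    import numpy as np

    # Toy sketch of mutual knowledge transfer between two clients: instead of
    # averaging parameters, each client fits its own (non-IID) data while also
    # distilling the predictions of its partner's model on that same data.
    np.random.seed(5)
    dim = 6
    true_w = np.random.randn(dim)
    X1 = np.abs(np.random.randn(50, dim))      # client 1 only sees one region of input space
    y1 = X1 @ true_w
    X2 = -np.abs(np.random.randn(50, dim))     # client 2 only sees another region
    y2 = X2 @ true_w

    w1, w2 = np.zeros(dim), np.zeros(dim)
    lr, tau = 0.01, 0.5                        # tau: weight of the distillation term

    for rnd in range(200):
        # Each client combines a local data-fitting gradient with a
        # distillation gradient toward the partner's predictions.
        g1 = (X1.T @ (X1 @ w1 - y1) + tau * X1.T @ (X1 @ w1 - X1 @ w2)) / len(y1)
        g2 = (X2.T @ (X2 @ w2 - y2) + tau * X2.T @ (X2 @ w2 - X2 @ w1)) / len(y2)
        w1, w2 = w1 - lr * g1, w2 - lr * g2

    print("client disagreement:", np.linalg.norm(w1 - w2))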
