BRIDGE: Byzantine-resilient Decentralized Gradient Descent

Added by Waheed Bajwa

Publication date 2019

Fields: Mathematical Statistics, Informatics Engineering

Research language: English

Authors: Zhixiong Yang, Waheed U. Bajwa

Decentralized optimization techniques are increasingly being used to learn machine learning models from data distributed over multiple locations without gathering the data at any one location. Unfortunately, methods that are designed for faultless networks typically fail in the presence of node failures. In particular, Byzantine failures, in which faulty or compromised nodes are allowed to arbitrarily deviate from an agreed-upon protocol, are the hardest to safeguard against in decentralized settings. This paper introduces a Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that, when compared to existing works, is more efficient and scalable in higher-dimensional settings and that is deployable in networks having topologies that go beyond the star topology. The main contributions of this work include theoretical analysis of BRIDGE for strongly convex learning objectives and numerical experiments demonstrating the efficacy of BRIDGE for both convex and nonconvex learning tasks.
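The abstract does not state which screening rule BRIDGE applies to the values a node receives from its neighbours. As a rough illustration of the general idea only, the sketch below assumes a coordinate-wise trimmed mean followed by a local gradient step. The function names, the trimming parameter `b`, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]


def trimmed_mean(values: List[float], b: int) -> float:
    """Drop the b smallest and b largest values, then average the rest."""
    if len(values) <= 2 * b:
        raise ValueError("need more than 2*b values to trim b from each end")
    kept = sorted(values)[b:len(values) - b]
    return sum(kept) / len(kept)


def screened_step(own: Vector, received: List[Vector], grad: Vector,
                  b: int, step: float) -> Vector:
    """One local update: screen each coordinate, then take a gradient step.

    Assumption: the node's own iterate is pooled with the received ones
    before trimming. A real method may treat it differently.
    """
    candidates = [own] + received
    updated = []
    for k in range(len(own)):
        coordinate = [w[k] for w in candidates]
        updated.append(trimmed_mean(coordinate, b) - step * grad[k])
    return updated


# Toy data: three well-behaved neighbours near 1.0 and one wild outlier.
own = [1.0, 1.0]
received = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0], [1000.0, -1000.0]]
grad = [0.0, 0.0]
updated = screened_step(own, received, grad, b=1, step=0.1)
```

With `b=1` the outlier is discarded in each coordinate, so `updated` stays close to the honest values. Because the screening works one coordinate at a time, its cost grows linearly with the model dimension, which is one plausible reading of the scalability claim above.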

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

111 - Zeyuan Allen-Zhu, Faeze Ebrahimian, Jerry Li 2020

We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients, and cooperate to jointly optimize over their local objective functions. However, an $\alpha$-fraction of the machines are Byzantine, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging non-convex case. Our main result is a new algorithm SafeguardSGD which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently-proposed Byzantine attacks.

Machine Learning Distributed Parallel and Cluster Computing Data Structures and Algorithms

Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach

91 - Haibo Yang, Xin Zhang, Minghong Fang 2019

In this work, we consider the resilience of distributed algorithms based on stochastic gradient descent (SGD) in distributed learning with potentially Byzantine attackers, who could send arbitrary information to the parameter server to disrupt the training process. Toward this end, we propose a new Lipschitz-inspired coordinate-wise median approach (LICM-SGD) to mitigate Byzantine attacks. We show that our LICM-SGD algorithm can resist up to half of the workers being Byzantine attackers, while still converging almost surely to a stationary region in non-convex settings. Also, our LICM-SGD method does not require any information about the number of attackers and the Lipschitz constant, which makes it attractive for practical implementations. Moreover, our LICM-SGD method enjoys the optimal $O(md)$ computational time-complexity in the sense that the time-complexity is the same as that of the standard SGD under no attacks. We conduct extensive experiments to show that our LICM-SGD algorithm consistently outperforms existing methods in training multi-class logistic regression and convolutional neural networks with MNIST and CIFAR-10 datasets. In our experiments, LICM-SGD also achieves a much faster running time thanks to its low computational time-complexity.
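To make the aggregation idea concrete, here is a minimal sketch of a plain coordinate-wise median at a parameter server. It omits the Lipschitz-inspired selection step that gives LICM-SGD its name, since the abstract does not describe it. The function names and the toy gradients are hypothetical.

```python
from statistics import median
from typing import List


def coordinate_wise_median(gradients: List[List[float]]) -> List[float]:
    """Aggregate worker gradients by taking the median of each coordinate."""
    if not gradients:
        raise ValueError("at least one gradient is required")
    dim = len(gradients[0])
    if any(len(g) != dim for g in gradients):
        raise ValueError("all gradients must have the same dimension")
    # zip(*gradients) yields one tuple per coordinate across all workers.
    return [median(column) for column in zip(*gradients)]


def sgd_step(params: List[float], gradients: List[List[float]],
             lr: float) -> List[float]:
    """One server-side SGD step using the median-aggregated gradient."""
    aggregated = coordinate_wise_median(gradients)
    return [p - lr * g for p, g in zip(params, aggregated)]


# Toy data: three honest workers and one worker sending a huge gradient.
honest = [[1.0, -2.0], [1.2, -1.8], [0.8, -2.2]]
byzantine = [[1e6, 1e6]]
agg = coordinate_wise_median(honest + byzantine)
new_params = sgd_step([0.0, 0.0], honest + byzantine, lr=0.1)
```

In this toy run the aggregate stays near the honest gradients even though one worker reports values around a million. This sketch sorts each coordinate, so it is not meant to reproduce the $O(md)$ running time claimed in the abstract.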

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

Byzantine-Resilient Secure Federated Learning

96 - Jinhyun So, Basak Guler, A. Salman Avestimehr 2020

Secure federated learning is a privacy-preserving framework to improve machine learning models by training over large volumes of data collected by mobile users. This is achieved through an iterative process where, at each iteration, users update a global model using their local datasets. Each user then masks its local model via random keys, and the masked models are aggregated at a central server to compute the global model for the next iteration. As the local models are protected by random masks, the server cannot observe their true values. This presents a major challenge for the resilience of the model against adversarial (Byzantine) users, who can manipulate the global model by modifying their local models or datasets. Towards addressing this challenge, this paper presents the first single-server Byzantine-resilient secure aggregation framework (BREA) for secure federated learning. BREA is based on an integrated stochastic quantization, verifiable outlier detection, and secure model aggregation approach to guarantee Byzantine-resilience, privacy, and convergence simultaneously. We provide theoretical convergence and privacy guarantees and characterize the fundamental trade-offs in terms of the network size, user dropouts, and privacy protection. Our experiments demonstrate convergence in the presence of Byzantine users, and comparable accuracy to conventional federated learning benchmarks.
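The masking step described above can be illustrated with a generic pairwise-mask scheme, in which every pair of users shares a random mask that one adds and the other subtracts. This is a sketch of that general idea under simplifying assumptions (integer models, a single seeded generator standing in for pairwise key agreement, no dropouts). It is not BREA's protocol, and it provides no Byzantine resilience on its own. `MODULUS` and all function names are hypothetical.

```python
import random
from typing import List

MODULUS = 2**31 - 1  # hypothetical modulus for the masked arithmetic


def pairwise_masks(num_users: int, dim: int, seed: int) -> List[List[int]]:
    """Build one mask per user so that all masks sum to zero mod MODULUS."""
    rng = random.Random(seed)  # stand-in for pairwise shared random keys
    masks = [[0] * dim for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            shared = [rng.randrange(MODULUS) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] = (masks[i][k] + shared[k]) % MODULUS
                masks[j][k] = (masks[j][k] - shared[k]) % MODULUS
    return masks


def mask_model(model: List[int], mask: List[int]) -> List[int]:
    """Hide a user's (integer) model by adding its mask."""
    return [(m + r) % MODULUS for m, r in zip(model, mask)]


def aggregate(masked_models: List[List[int]]) -> List[int]:
    """Server-side sum; the pairwise masks cancel in the total."""
    return [sum(column) % MODULUS for column in zip(*masked_models)]


# Toy data: three users with small integer models.
models = [[3, 10], [5, 20], [7, 30]]
masks = pairwise_masks(num_users=3, dim=2, seed=0)
masked = [mask_model(m, r) for m, r in zip(models, masks)]
total = aggregate(masked)
```

The server recovers the sum of the models without seeing any individual model in the clear. That same property is what makes outlier detection hard, which is the challenge the abstract raises.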

Cryptography and Security Distributed Parallel and Cluster Computing Machine Learning

Byzantine Resilient Non-Convex SVRG with Distributed Batch Gradient Computations

99 - Prashant Khanduri, Saikiran Bulusu, Pranay Sharma 2019

In this work, we consider the distributed stochastic optimization problem of minimizing a non-convex function $f(x) = \mathbb{E}_{\xi \sim \mathcal{D}} f(x; \xi)$ in an adversarial setting, where the individual functions $f(x; \xi)$ can also be potentially non-convex. We assume that at most $\alpha$-fraction of a total of $K$ nodes can be Byzantines. We propose a robust stochastic variance-reduced gradient (SVRG) like algorithm for the problem, where the batch gradients are computed at the worker nodes (WNs) and the stochastic gradients are computed at the server node (SN). For the non-convex optimization problem, we show that we need $\tilde{O}\left( \frac{1}{\epsilon^{5/3} K^{2/3}} + \frac{\alpha^{4/3}}{\epsilon^{5/3}} \right)$ gradient computations on average at each node (SN and WNs) to reach an $\epsilon$-stationary point. The proposed algorithm guarantees convergence via the design of a novel Byzantine filtering rule which is independent of the problem dimension. Importantly, we capture the effect of the fraction of Byzantine nodes $\alpha$ present in the network on the convergence performance of the algorithm.

Optimization and Control Distributed Parallel and Cluster Computing Multiagent Systems

Efficient Byzantine-Resilient Stochastic Gradient Desce

102 - Kaiyun Li, Xiaojun Chen, Ye Dong 2021

Distributed Learning often suffers from Byzantine failures, and there have been a number of works studying the problem of distributed stochastic optimization under Byzantine failures, where only a portion of workers, instead of all the workers in a distributed learning system, compute stochastic gradients at each iteration. These methods, albeit workable under Byzantine failures, have the shortcomings of either a sub-optimal convergence rate or high computation cost. To this end, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) which is provably robust against Byzantine failures. BrSGD obtains the optimal statistical performance and efficient computation simultaneously. In particular, BrSGD can achieve an order-optimal statistical error rate for strongly convex loss functions. The computation complexity of BrSGD is O(md), where d is the model dimension and m is the number of machines. Experimental results show that BrSGD can obtain competitive results compared with non-Byzantine machines in terms of effectiveness and convergence.

Distributed Parallel and Cluster Computing
