Efficient Byzantine-Resilient Stochastic Gradient Desce

103 0 0.0 ( 0 )

Download Cite

Added by Kaiyun Li

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Kaiyun Li - Xiaojun Chen - Ye Dong

Distributed Parallel and Cluster Computing

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Distributed Learning often suffers from Byzantine failures, and there have been a number of works studying the problem of distributed stochastic optimization under Byzantine failures, where only a portion of workers, instead of all the workers in a distributed learning system, compute stochastic gradients at each iteration. These methods, albeit workable under Byzantine failures, have the shortcomings of either a sub-optimal convergence rate or high computation cost. To this end, we propose a new Byzantine-resilient stochastic gradient descent algorithm (BrSGD for short) which is provably robust against Byzantine failures. BrSGD obtains the optimal statistical performance and efficient computation simultaneously. In particular, BrSGD can achieve an order-optimal statistical error rate for strongly convex loss functions. The computation complexity of BrSGD is O(md), where d is the model dimension and m is the number of machines. Experimental results show that BrSGD can obtain competitive results compared with non-Byzantine machines in terms of effectiveness and convergence.

rate research

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

111 - Zeyuan Allen-Zhu , Faeze Ebrahimian , Jerry Li 2020

We study adversary-resilient stochastic distributed optimization, in which $m$ machines can independently compute stochastic gradients, and cooperate to jointly optimize over their local objective functions. However, an $alpha$-fraction of the machines are $textit{Byzantine}$, in that they may behave in arbitrary, adversarial ways. We consider a variant of this procedure in the challenging $textit{non-convex}$ case. Our main result is a new algorithm SafeguardSGD which can provably escape saddle points and find approximate local minima of the non-convex objective. The algorithm is based on a new concentration filtering technique, and its sample and time complexity bounds match the best known theoretical bounds in the stochastic, distributed setting when no Byzantine machines are present. Our algorithm is very practical: it improves upon the performance of all prior methods when training deep neural networks, it is relatively lightweight, and it is the first method to withstand two recently-proposed Byzantine attacks.

Machine Learning Distributed Parallel and Cluster Computing Data Structures and Algorithms

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

177 - Zhixiong Yang , Waheed U. Bajwa 2019

Decentralized optimization techniques are increasingly being used to learn machine learning models from data distributed over multiple locations without gathering the data at any one location. Unfortunately, methods that are designed for faultless networks typically fail in the presence of node failures. In particular, Byzantine failures---corresponding to the scenario in which faulty/compromised nodes are allowed to arbitrarily deviate from an agreed-upon protocol---are the hardest to safeguard against in decentralized settings. This paper introduces a Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that, when compared to existing works, is more efficient and scalable in higher-dimensional settings and that is deployable in networks having topologies that go beyond the star topology. The main contributions of this work include theoretical analysis of BRIDGE for strongly convex learning objectives and numerical experiments demonstrating the efficacy of BRIDGE for both convex and nonconvex learning tasks.

Machine Learning Distributed Parallel and Cluster Computing Machine Learning

Optimal byzantine resilient convergence in oblivious robot networks

551 - Zohir Bouzid 2009

Given a set of robots with arbitrary initial location and no agreement on a global coordinate system, convergence requires that all robots asymptotically approach the exact same, but unknown beforehand, location. Robots are oblivious-- they do not recall the past computations -- and are allowed to move in a one-dimensional space. Additionally, robots cannot communicate directly, instead they obtain system related information only via visual sensors. We draw a connection between the convergence problem in robot networks, and the distributed emph{approximate agreement} problem (that requires correct processes to decide, for some constant $epsilon$, values distance $epsilon$ apart and within the range of initial proposed values). Surprisingly, even though specifications are similar, the convergence implementation in robot networks requires specific assumptions about synchrony and Byzantine resilience. In more details, we prove necessary and sufficient conditions for the convergence of mobile robots despite a subset of them being Byzantine (i.e. they can exhibit arbitrary behavior). Additionally, we propose a deterministic convergence algorithm for robot networks and analyze its correctness and complexity in various synchrony settings. The proposed algorithm tolerates f Byzantine robots for (2f+1)-sized robot networks in fully synchronous networks, (3f+1)-sized in semi-synchronous networks. These bounds are optimal for the class of cautious algorithms, which guarantee that correct robots always move inside the range of positions of the correct robots.

Distributed Parallel and Cluster Computing Robotics

Optimal Byzantine Resilient Convergence in Asynchronous Robot Networks

403 - Zohir Bouzid 2009

We propose the first deterministic algorithm that tolerates up to $f$ byzantine faults in $3f+1$-sized networks and performs in the asynchronous CORDA model. Our solution matches the previously established lower bound for the semi-synchronous ATOM model on the number of tolerated Byzantine robots. Our algorithm works under bounded scheduling assumptions for oblivious robots moving in a uni-dimensional space.

Distributed Parallel and Cluster Computing Robotics

Byzantine-Resilient Federated Learning with Heterogeneous Data Distribution

113 - Saurav Prakash , Hanieh Hashemi , Yongqin Wang 2020

For mitigating Byzantine behaviors in federated learning (FL), most state-of-the-art approaches, such as Bulyan, tend to leverage the similarity of updates from the benign clients. However, in many practical FL scenarios, data is non-IID across clients, thus the updates received from even the benign clients are quite dissimilar. Hence, using similarity based methods result in wasted opportunities to train a model from interesting non-IID data, and also slower model convergence. We propose DiverseFL to overcome this challenge in heterogeneous data distribution settings. Rather than comparing each clients update with other client updates to detect Byzantine clients, DiverseFL compares each clients update with a guiding update of that client. Any client whose update diverges from its associated guiding update is then tagged as a Byzantine node. The FL server in DiverseFL computes the guiding update in every round for each client over a small sample of the clients local data that is received only once before start of the training. However, sharing even a small sample of clients data with the FL server can compromise clients data privacy needs. To tackle this challenge, DiverseFL creates a Trusted Execution Environment (TEE)-based enclave to receive each clients sample and to compute its guiding updates. TEE provides a hardware assisted verification and attestation to each client that its data is not leaked outside of TEE. Through experiments involving neural networks, benchmark datasets and popular Byzantine attacks, we demonstrate that DiverseFL not only performs Byzantine mitigation quite effectively, it also almost matches the performance of OracleSGD, where the server only aggregates the updates from the benign clients.

Distributed Parallel and Cluster Computing