We study the problem of optimizing a non-convex loss function (with saddle points) in a distributed framework in the presence of Byzantine machines. We consider a standard distributed setting with one central machine (parameter server) communicating with many worker machines. Our proposed algorithm is a variant of the celebrated cubic-regularized Newton method of Nesterov and Polyak \cite{nest}, which avoids saddle points efficiently and converges to local minima. Furthermore, our algorithm resists the presence of Byzantine machines, which may create \emph{fake local minima} near the saddle points of the loss function, an attack known as the saddle-point attack. We robustify the cubic-regularized Newton algorithm so that it efficiently avoids both the saddle points and the fake local minima. Moreover, being a second-order algorithm, it has a much lower iteration complexity than its first-order counterparts, and thus requires few rounds of communication with the parameter server. We obtain theoretical guarantees for our proposed scheme under several settings, including approximate (sub-sampled) gradients and Hessians. Finally, we validate our theoretical findings with experiments using standard datasets and several types of Byzantine attacks.
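For orientation, the classical cubic-regularized Newton step of Nesterov and Polyak, of which the proposed algorithm is described as a variant, can be written as below. The symbols $x_k$ (the current iterate) and $M$ (a regularization constant, classically taken at least as large as the Lipschitz constant of the Hessian) are notation assumed here rather than taken from the abstract, and the robustified distributed variant may differ in its details.

$$
x_{k+1} \in \arg\min_{y} \Big\{ \langle \nabla f(x_k),\, y-x_k \rangle + \tfrac{1}{2} \langle \nabla^2 f(x_k)(y-x_k),\, y-x_k \rangle + \tfrac{M}{6} \|y-x_k\|^3 \Big\}
$$

The cubic term keeps the step bounded even when the Hessian has negative eigenvalues, so the model can follow a direction of negative curvature to leave a saddle point.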
We propose in this paper New Q-Newton's method. The update rule is conceptually very simple, for example $x_{n+1}=x_n-w_n$ where $w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$, with $A_n=\nabla^2 f(x_n)+\delta_n\|\nabla f(x_n)\|^2.Id$ and $v_n=A_n^{-1}.\nabla f(x
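The update rule quoted above can be illustrated with a minimal two-dimensional sketch. Several points are assumptions, because the snippet is truncated: $pr_{A_n,\pm}$ is read as the orthogonal projection onto the eigenspaces of $A_n$ with positive or negative eigenvalues, $v_n$ is read as $A_n^{-1}$ applied to the gradient at the current iterate, and $\delta_n$ is passed in as a plain parameter `delta` instead of being selected by any rule. The function names are hypothetical. Under this reading, $w_n$ is the gradient with each eigen-component divided by the absolute value of its eigenvalue.

```python
import math


def abs_inverse_apply(A, g):
    """Return pr_{A,+}(v) - pr_{A,-}(v) with v = A^{-1} g, for symmetric 2x2 A.

    Assumed reading: this equals sum_i (P_i g) / |lambda_i|, where P_i are
    the spectral projectors of A.
    """
    (a, b), (_, c) = A
    mean = 0.5 * (a + c)
    gap = math.hypot(0.5 * (a - c), b)
    lam1, lam2 = mean + gap, mean - gap  # eigenvalues of A
    if lam1 == 0.0 or lam2 == 0.0:
        raise ValueError("A is singular; a different delta is needed")
    if gap == 0.0:
        # A is a multiple of the identity.
        return [g[0] / abs(lam1), g[1] / abs(lam1)]
    Ag = [a * g[0] + b * g[1], b * g[0] + c * g[1]]
    # Spectral projectors applied to g:
    #   P1 = (A - lam2*I) / (lam1 - lam2),  P2 = (A - lam1*I) / (lam2 - lam1)
    p1 = [(Ag[i] - lam2 * g[i]) / (lam1 - lam2) for i in range(2)]
    p2 = [(Ag[i] - lam1 * g[i]) / (lam2 - lam1) for i in range(2)]
    return [p1[i] / abs(lam1) + p2[i] / abs(lam2) for i in range(2)]


def new_q_newton_step(x, grad, hess, delta=0.0):
    """One step x - w, with A = hess(x) + delta * ||grad(x)||^2 * Id."""
    g = grad(x)
    H = hess(x)
    shift = delta * (g[0] ** 2 + g[1] ** 2)
    A = [[H[0][0] + shift, H[0][1]], [H[1][0], H[1][1] + shift]]
    w = abs_inverse_apply(A, g)
    return [x[0] - w[0], x[1] - w[1]]


# Toy saddle f(x, y) = x^2/2 - y^2/2, with its saddle point at the origin.
def grad(p):
    return [p[0], -p[1]]


def hess(p):
    return [[1.0, 0.0], [0.0, -1.0]]


x_next = new_q_newton_step([1.0, 1.0], grad, hess)
print(x_next)
```

On this toy function, a plain Newton step from (1, 1) lands exactly on the saddle point at the origin, whereas the sketched step moves to (0, 2): it contracts along the positive-curvature direction and moves away from the saddle along the negative-curvature direction.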
In this paper, we study distributed algorithms for large-scale AUC maximization with a deep neural network as a predictive model. Although distributed learning techniques have been investigated extensively in deep learning, they are not directly appl
Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbe
In a recent joint work, the author developed a modification of Newton's method, named New Q-Newton's method, which can avoid saddle points and has a quadratic rate of convergence. While a good theoretical convergence guarantee has not been established
We study robust distributed learning that involves minimizing a non-convex loss function with saddle points. We consider the Byzantine setting where some worker machines have abnormal or even arbitrary and adversarial behavior. In this setting, the B