ﻻ يوجد ملخص باللغة العربية
This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARAH-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $epsilon$-accurate first-order stationary point with $Obig(maxbig{N^{frac{1}{2}},n(1-lambda)^{-2},n^{frac{2}{3}}m^{frac{1}{3}}(1-lambda)^{-1}big}Lepsilon^{-2}big)$ gradient complexity, where ${(1-lambda)in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{frac{1}{2}}(1-lambda)^{3})}$, this gradient complexity furthers reduces to ${O(N^{frac{1}{2}}Lepsilon^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that, the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.
We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure by itself, is not sufficient for
In this paper, we study decentralized online stochastic non-convex optimization over a network of nodes. Integrating a technique called gradient tracking in decentralized stochastic gradient descent, we show that the resulting algorithm, GT-DSGD, enj
We study decentralized non-convex finite-sum minimization problems described over a network of nodes, where each node possesses a local batch of data samples. In this context, we analyze a single-timescale randomized incremental gradient method, call
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a global, potenti
This paper considers decentralized stochastic optimization over a network of $n$ nodes, where each node possesses a smooth non-convex local cost function and the goal of the networked nodes is to find an $epsilon$-accurate first-order stationary poin