Push-SAGA: A decentralized stochastic algorithm with variance reduction over directed graphs

353 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Usman Khan

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Muhammad I. Qureshi - Ran Xin - Soummya Kar

التعلم الآلي النظم الموزعة والتوازية والحوسبة العنقودية أنظمة متعددة العملاء

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

In this paper, we propose Push-SAGA, a decentralized stochastic first-order method for finite-sum minimization over a directed network of nodes. Push-SAGA combines node-level variance reduction to remove the uncertainty caused by stochastic gradients, network-level gradient tracking to address the distributed nature of the data, and push-sum consensus to tackle the challenge of directed communication links. We show that Push-SAGA achieves linear convergence to the exact solution for smooth and strongly convex problems and is thus the first linearly-convergent stochastic algorithm over arbitrary strongly connected directed graphs. We also characterize the regimes in which Push-SAGA achieves a linear speed-up compared to its centralized counterpart and achieves a network-independent convergence rate. We illustrate the behavior and convergence properties of Push-SAGA with the help of numerical experiments on strongly convex and non-convex problems.

قيم البحث

76 - Muhammad I. Qureshi , Ran Xin , Soummya Kar 2020

In this report, we study decentralized stochastic optimization to minimize a sum of smooth and strongly convex cost functions when the functions are distributed over a directed network of nodes. In contrast to the existing work, we use gradient track ing to improve certain aspects of the resulting algorithm. In particular, we propose the~textbf{texttt{S-ADDOPT}} algorithm that assumes a stochastic first-order oracle at each node and show that for a constant step-size~$alpha$, each node converges linearly inside an error ball around the optimal solution, the size of which is controlled by~$alpha$. For decaying step-sizes~$mathcal{O}(1/k)$, we show that~textbf{texttt{S-ADDOPT}} reaches the exact solution sublinearly at~$mathcal{O}(1/k)$ and its convergence is asymptotically network-independent. Thus the asymptotic behavior of~textbf{texttt{S-ADDOPT}} is comparable to the centralized stochastic gradient descent. Numerical experiments over both strongly convex and non-convex problems illustrate the convergence behavior and the performance comparison of the proposed algorithm.

التعلم الآلي أنظمة وتحكم أنظمة وتحكم

Provably Accelerated Decentralized Gradient Method Over Unbalanced Directed Graphs

88 - Zhuoqing Song , Lei Shi , Shi Pu 2021

In this work, we consider the decentralized optimization problem in which a network of $n$ agents, each possessing a smooth and convex objective function, wish to collaboratively minimize the average of all the objective functions through peer-to-pee r communication in a directed graph. To solve the problem, we propose two accelerated Push-DIGing methods termed APD and APD-SC for minimizing non-strongly convex objective functions and strongly convex ones, respectively. We show that APD and APD-SC respectively converge at the rates $Oleft(frac{1}{k^2}right)$ and $Oleft(left(1 - Csqrt{frac{mu}{L}}right)^kright)$ up to constant factors depending only on the mixing matrix. To the best of our knowledge, APD and APD-SC are the first decentralized methods to achieve provable acceleration over unbalanced directed graphs. Numerical experiments demonstrate the effectiveness of both methods.

التحسين والتحكم النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي

Fast decentralized non-convex finite-sum optimization with recursive variance reduction

209 - Ran Xin , Usman A. Khan , Soummya Kar 2020

This paper considers decentralized minimization of $N:=nm$ smooth non-convex cost functions equally divided over a directed network of $n$ nodes. Specifically, we describe a stochastic first-order gradient method, called GT-SARAH, that employs a SARA H-type variance reduction technique and gradient tracking (GT) to address the stochastic and decentralized nature of the problem. We show that GT-SARAH, with appropriate algorithmic parameters, finds an $epsilon$-accurate first-order stationary point with $Obig(maxbig{N^{frac{1}{2}},n(1-lambda)^{-2},n^{frac{2}{3}}m^{frac{1}{3}}(1-lambda)^{-1}big}Lepsilon^{-2}big)$ gradient complexity, where ${(1-lambda)in(0,1]}$ is the spectral gap of the network weight matrix and $L$ is the smoothness parameter of the cost functions. This gradient complexity outperforms that of the existing decentralized stochastic gradient methods. In particular, in a big-data regime such that ${n = O(N^{frac{1}{2}}(1-lambda)^{3})}$, this gradient complexity furthers reduces to ${O(N^{frac{1}{2}}Lepsilon^{-2})}$, independent of the network topology, and matches that of the centralized near-optimal variance-reduced methods. Moreover, in this regime GT-SARAH achieves a non-asymptotic linear speedup, in that, the total number of gradient computations at each node is reduced by a factor of $1/n$ compared to the centralized near-optimal algorithms that perform all gradient computations at a single node. To the best of our knowledge, GT-SARAH is the first algorithm that achieves this property. In addition, we show that appropriate choices of local minibatch size balance the trade-offs between the gradient and communication complexity of GT-SARAH. Over infinite time horizon, we establish that all nodes in GT-SARAH asymptotically achieve consensus and converge to a first-order stationary point in the almost sure and mean-squared sense.

التحسين والتحكم التعلم الآلي أنظمة متعددة العملاء

Stochastic Gradient Langevin Dynamics with Variance Reduction

78 - Zhishen Huang , Stephen Becker 2021

Stochastic gradient Langevin dynamics (SGLD) has gained the attention of optimization researchers due to its global optimization properties. This paper proves an improved convergence property to local minimizers of nonconvex objective functions using SGLD accelerated by variance reductions. Moreover, we prove an ergodicity property of the SGLD scheme, which gives insights on its potential to find global minimizers of nonconvex objectives.

التعلم الآلي التحسين والتحكم

A Hybrid Variance-Reduced Method for Decentralized Stochastic Non-Convex Optimization

88 - Ran Xin , Usman A. Khan , Soummya Kar 2021

This paper considers decentralized stochastic optimization over a network of $n$ nodes, where each node possesses a smooth non-convex local cost function and the goal of the networked nodes is to find an $epsilon$-accurate first-order stationary poin t of the sum of the local costs. We focus on an online setting, where each node accesses its local cost only by means of a stochastic first-order oracle that returns a noisy version of the exact gradient. In this context, we propose a novel single-loop decentralized hybrid variance-reduced stochastic gradient method, called GT-HSGD, that outperforms the existing approaches in terms of both the oracle complexity and practical implementation. The GT-HSGD algorithm implements specialized local hybrid stochastic gradient estimators that are fused over the network to track the global gradient. Remarkably, GT-HSGD achieves a network topology-independent oracle complexity of $O(n^{-1}epsilon^{-3})$ when the required error tolerance $epsilon$ is small enough, leading to a linear speedup with respect to the centralized optimal online variance-reduced approaches that operate on a single node. Numerical experiments are provided to illustrate our main technical results.

التحسين والتحكم النظم الموزعة والتوازية والحوسبة العنقودية التعلم الآلي