ترغب بنشر مسار تعليمي؟ اضغط هنا

On the Convergence of Nested Decentralized Gradient Methods with Multiple Consensus and Gradient Steps

327   0   0.0 ( 0 )
 نشر من قبل Albert Berahas
 تاريخ النشر 2020
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where the cost of communication and/or computation can be expensive. We extend and generalize the analysis for a class of nested gradient-based distributed algorithms (NEAR-DGD; Berahas, Bollapragada, Keskar and Wei, 2018) to account for multiple gradient steps at every iteration. We show the effect of performing multiple gradient steps on the rate of convergence and on the size of the neighborhood of convergence, and prove R-Linear convergence to the exact solution with a fixed number of gradient steps and increasing number of consensus steps. We test the performance of the generalized method on quadratic functions and show the effect of multiple consensus and gradient steps in terms of iterations, number of gradient evaluations, number of communications and cost.

قيم البحث

اقرأ أيضاً

Communication compression techniques are of growing interests for solving the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multi-agent network usi ng only local computation and peer-to-peer communication. In this paper, we first propose a novel compressed gradient tracking algorithm (C-GT) that combines gradient tracking technique with communication compression. In particular, C-GT is compatible with a general class of compression operators that unifies both unbiased and biased compressors. We show that C-GT inherits the advantages of gradient tracking-based algorithms and achieves linear convergence rate for strongly convex and smooth objective functions. In the second part of this paper, we propose an error feedback based compressed gradient tracking algorithm (EF-C-GT) to further improve the algorithm efficiency for biased compression operators. Numerical examples complement the theoretical findings and demonstrate the efficiency and flexibility of the proposed algorithms.
Adaptive gradient methods including Adam, AdaGrad, and their variants have been very successful for training deep learning models, such as neural networks. Meanwhile, given the need for distributed computing, distributed optimization algorithms are r apidly becoming a focal point. With the growth of computing power and the need for using machine learning models on mobile devices, the communication cost of distributed training algorithms needs careful consideration. In this paper, we introduce novel convergent decentralized adaptive gradient methods and rigorously incorporate adaptive gradient methods into decentralized training procedures. Specifically, we propose a general algorithmic framework that can convert existing adaptive gradient methods to their decentralized counterparts. In addition, we thoroughly analyze the convergence behavior of the proposed algorithmic framework and show that if a given adaptive gradient method converges, under some specific conditions, then its decentralized counterpart is also convergent. We illustrate the benefit of our generic decentralized framework on a prototype method, i.e., AMSGrad, both theoretically and numerically.
This work studies a class of non-smooth decentralized multi-agent optimization problems where the agents aim at minimizing a sum of local strongly-convex smooth components plus a common non-smooth term. We propose a general primal-dual algorithmic fr amework that unifies many existing state-of-the-art algorithms. We establish linear convergence of the proposed method to the exact solution in the presence of the non-smooth term. Moreover, for the more general class of problems with agent specific non-smooth terms, we show that linear convergence cannot be achieved (in the worst case) for the class of algorithms that uses the gradients and the proximal mappings of the smooth and non-smooth parts, respectively. We further provide a numerical counterexample that shows how some state-of-the-art algorithms fail to converge linearly for strongly-convex objectives and different local non-smooth terms.
In this paper, we consider minimizing a sum of local convex objective functions in a distributed setting, where communication can be costly. We propose and analyze a class of nested distributed gradient methods with adaptive quantized communication ( NEAR-DGD+Q). We show the effect of performing multiple quantized communication steps on the rate of convergence and on the size of the neighborhood of convergence, and prove R-Linear convergence to the exact solution with increasing number of consensus steps and adaptive quantization. We test the performance of the method, as well as some practical variants, on quadratic functions, and show the effects of multiple quantized communication steps in terms of iterations/gradient evaluations, communication and cost.
Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level expectatio ns. In this paper, we consider the multi-level compositional optimization problem that involves compositions of multi-level component functions and nested expectations over a random path. It finds applications in risk-averse optimization and sequential planning. We propose a class of multi-level stochastic gradient methods that are motivated from the method of multi-timescale stochastic approximation. First we propose a basic $T$-level stochastic compositional gradient algorithm, establish its almost sure convergence and obtain an $n$-iteration error bound $O (n^{-1/2^T})$. Then we develop accelerated multi-level stochastic gradient methods by using an extrapolation-interpolation scheme to take advantage of the smoothness of individual component functions. When all component functions are smooth, we show that the convergence rate improves to $O(n^{-4/(7+T)})$ for general objectives and $O (n^{-4/(3+T)})$ for strongly convex objectives. We also provide almost sure convergence and rate of convergence results for nonconvex problems. The proposed methods and theoretical results are validated using numerical experiments.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا