Convergence Analysis of Nonconvex Distributed Stochastic Zeroth-order Coordinate Method

119 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Shengjun Zhang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Shengjun Zhang - Yunlong Dong - Dong Xie

التحسين والتحكم التعلم الآلي أنظمة وتحكم

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

This paper investigates the stochastic distributed nonconvex optimization problem of minimizing a global cost function formed by the summation of $n$ local cost functions. We solve such a problem by involving zeroth-order (ZO) information exchange. In this paper, we propose a ZO distributed primal-dual coordinate method (ZODIAC) to solve the stochastic optimization problem. Agents approximate their own local stochastic ZO oracle along with coordinates with an adaptive smoothing parameter. We show that the proposed algorithm achieves the convergence rate of $mathcal{O}(sqrt{p}/sqrt{T})$ for general nonconvex cost functions. We demonstrate the efficiency of proposed algorithms through a numerical example in comparison with the existing state-of-the-art centralized and distributed ZO algorithms.

قيم البحث

141 - Xinlei Yi , Shengjun Zhang , Tao Yang 2021

In this paper, we consider a stochastic distributed nonconvex optimization problem with the cost function being distributed over $n$ agents having access only to zeroth-order (ZO) information of the cost. This problem has various machine learning app lications. As a solution, we propose two distributed ZO algorithms, in which at each iteration each agent samples the local stochastic ZO oracle at two points with an adaptive smoothing parameter. We show that the proposed algorithms achieve the linear speedup convergence rate $mathcal{O}(sqrt{p/(nT)})$ for smooth cost functions and $mathcal{O}(p/(nT))$ convergence rate when the global cost function additionally satisfies the Polyak--Lojasiewicz (P--L) condition, where $p$ and $T$ are the dimension of the decision variable and the total number of iterations, respectively. To the best of our knowledge, this is the first linear speedup result for distributed ZO algorithms, which enables systematic processing performance improvements by adding more agents. We also show that the proposed algorithms converge linearly when considering deterministic centralized optimization problems under the P--L condition. We demonstrate through numerical experiments the efficiency of our algorithms on generating adversarial examples from deep neural networks in comparison with baseline and recently proposed centralized and distributed ZO algorithms.

التحسين والتحكم

Accelerated Zeroth-order Algorithm for Stochastic Distributed Nonconvex Optimization

163 - Shengjun Zhang , Colleen P. Bailey 2021

This paper investigates how to accelerate the convergence of distributed optimization algorithms on nonconvex problems with zeroth-order information available only. We propose a zeroth-order (ZO) distributed primal-dual stochastic coordinates algorit hm equipped with powerball method to accelerate. We prove that the proposed algorithm has a convergence rate of $mathcal{O}(sqrt{p}/sqrt{nT})$ for general nonconvex cost functions. We consider solving the generation of adversarial examples from black-box DNNs problem to compare with the existing state-of-the-art centralized and distributed ZO algorithms. The numerical results demonstrate the faster convergence rate of the proposed algorithm and match the theoretical analysis.

التحسين والتحكم

Zeroth-order Stochastic Compositional Algorithms for Risk-Aware Learning

114 - Dionysios S. Kalogerias , Warren B. Powell 2019

We present Free-MESSAGEp, the first zeroth-order algorithm for convex mean-semideviation-based risk-aware learning, which is also the first three-level zeroth-order compositional stochastic optimization algorithm, whatsoever. Using a non-trivial exte nsion of Nesterovs classical results on Gaussian smoothing, we develop the Free-MESSAGEp algorithm from first principles, and show that it essentially solves a smoothed surrogate to the original problem, the former being a uniform approximation of the latter, in a useful, convenient sense. We then present a complete analysis of the Free-MESSAGEp algorithm, which establishes convergence in a user-tunable neighborhood of the optimal solutions of the original problem, as well as explicit convergence rates for both convex and strongly convex costs. Orderwise, and for fixed problem parameters, our results demonstrate no sacrifice in convergence speed compared to existing first-order methods, while striking a certain balance among the condition of the problem, its dimensionality, as well as the accuracy of the obtained results, naturally extending previous results in zeroth-order risk-neutral learning.

التحسين والتحكم التعلم الآلي أنظمة وتحكم

Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization

86 - Sijia Liu , Bhavya Kailkhura , Pin-Yu Chen 2018

As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerate

التعلم الآلي التعلم الالي

On Centralized and Distributed Mirror Descent: Exponential Convergence Analysis Using Quadratic Constraints

331 - Youbang Sun , Mahyar Fazlyab , Shahin Shahrampour 2021

Mirror descent (MD) is a powerful first-order optimization technique that subsumes several optimization algorithms including gradient descent (GD). In this work, we study the exact convergence rate of MD in both centralized and distributed cases for strongly convex and smooth problems. We view MD with a dynamical system lens and leverage quadratic constraints (QCs) to provide convergence guarantees based on the Lyapunov stability. For centralized MD, we establish a semi-definite programming (SDP) that certifies exponentially fast convergence of MD subject to a linear matrix inequality (LMI). We prove that the SDP always has a feasible solution that recovers the optimal GD rate. Next, we analyze the exponential convergence of distributed MD and characterize the rate using two LMIs. To the best of our knowledge, the exact (exponential) rate of distributed MD has not been previously explored in the literature. We present numerical results as a verification of our theory and observe that the richness of the Lyapunov function entails better (worst-case) convergence rates compared to existing works on distributed GD.

التحسين والتحكم التعلم الآلي أنظمة وتحكم