No Arabic abstract
In this paper, we propose a new method based on the Sliding Algorithm from Lan(2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. Our method uses the stochastic noised zeroth-order oracle for the non-smooth part and the first-order oracle for the smooth part. To the best of our knowledge, this is the first method in the literature that uses such a mixed oracle for the composite optimization. We prove the convergence rate for the new method that matches the corresponding rate for the first-order method up to a factor proportional to the dimension of the space or, in some cases, its squared logarithm. We apply this method for the decentralized distributed optimization and derive upper bounds for the number of communication rounds for this method that matches known lower bounds. Moreover, our bound for the number of zeroth-order oracle calls per node matches the similar state-of-the-art bound for the first-order decentralized distributed optimization up to to the factor proportional to the dimension of the space or, in some cases, even its squared logarithm.
This paper addresses a distributed optimization problem in a communication network where nodes are active sporadically. Each active node applies some learning method to control its action to maximize the global utility function, which is defined as the sum of the local utility functions of active nodes. We deal with stochastic optimization problem with the setting that utility functions are disturbed by some non-additive stochastic process. We consider a more challenging situation where the learning method has to be performed only based on a scalar approximation of the utility function, rather than its closed-form expression, so that the typical gradient descent method cannot be applied. This setting is quite realistic when the network is affected by some stochastic and time-varying process, and that each node cannot have the full knowledge of the network states. We propose a distributed optimization algorithm and prove its almost surely convergence to the optimum. Convergence rate is also derived with an additional assumption that the objective function is strongly concave. Numerical results are also presented to justify our claim.
We consider continuous-time dynamics for distributed optimization with set constraints in the note. To handle the computational complexity of projection-based dynamics due to solving a general quadratic optimization subproblem with projection, we propose a distributed projection-free dynamics by employing the Frank-Wolfe method, also known as the conditional gradient algorithm. The process searches a feasible descent direction with solving an alternative linear optimization instead of a quadratic one. To make the algorithm implementable over weight-balanced digraphs, we design one dynamics for the consensus of local decision variables and another dynamics of auxiliary variables to track the global gradient. Then we prove the convergence of the dynamical systems to the optimal solution, and provide detailed numerical comparisons with both projection-based dynamics and other distributed projection-free algorithms.
Decentralized optimization and communication compression have exhibited their great potential in accelerating distributed machine learning by mitigating the communication bottleneck in practice. While existing decentralized algorithms with communication compression mostly focus on the problems with only smooth components, we study the decentralized stochastic composite optimization problem with a potentially non-smooth component. A underline{Prox}imal gradient underline{L}inunderline{EA}r convergent underline{D}ecentralized algorithm with compression, Prox-LEAD, is proposed with rigorous theoretical analyses in the general stochastic setting and the finite-sum setting. Our theorems indicate that Prox-LEAD works with arbitrary compression precision, and it tremendously reduces the communication cost almost for free. The superiorities of the proposed algorithms are demonstrated through the comparison with state-of-the-art algorithms in terms of convergence complexities and numerical experiments. Our algorithmic framework also generally enlightens the compressed communication on other primal-dual algorithms by reducing the impact of inexact iterations, which might be of independent interest.
This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose a Zero-Order Distributed Policy Optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus, and then conducts local policy gradient in parallel based on zero-order gradient estimation. ZODPO only requires limited communication and storage even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial with the error tolerances inverse and the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing controllers with high probability. Lastly, we numerically test ZODPO on multi-zone HVAC systems.
We propose a new class of rigorous methods for derivative-free optimization with the aim of delivering efficient and robust numerical performance for functions of all types, from smooth to non-smooth, and under different noise regimes. To this end, we have developed Full-Low Evaluation methods, organized around two main types of iterations. The first iteration type is expensive in function evaluations, but exhibits good performance in the smooth and non-noisy cases. For the theory, we consider a line search based on an approximate gradient, backtracking until a sufficient decrease condition is satisfied. In practice, the gradient was approximated via finite differences, and the direction was calculated by a quasi-Newton step (BFGS). The second iteration type is cheap in function evaluations, yet more robust in the presence of noise or non-smoothness. For the theory, we consider direct search, and in practice we use probabilistic direct search with one random direction and its negative. A switch condition from Full-Eval to Low-Eval iterations was developed based on the values of the line-search and direct-search stepsizes. If enough Full-Eval steps are taken, we derive a complexity result of gradient-descent type. Under failure of Full-Eval, the Low-Eval iterations become the drivers of convergence yielding non-smooth convergence. Full-Low Evaluation methods are shown to be efficient and robust in practice across problems with different levels of smoothness and noise.