No Arabic abstract
Large scale, non-convex optimization problems arising in many complex networks such as the power system call for efficient and scalable distributed optimization algorithms. Existing distributed methods are usually iterative and require synchronization of all workers at each iteration, which is hard to scale and could result in the under-utilization of computation resources due to the heterogeneity of the subproblems. To address those limitations of synchronous schemes, this paper proposes an asynchronous distributed optimization method based on the Alternating Direction Method of Multipliers (ADMM) for non-convex optimization. The proposed method only requires local communications and allows each worker to perform local updates with information from a subset of but not all neighbors. We provide sufficient conditions on the problem formulation, the choice of algorithm parameter and network delay, and show that under those mild conditions, the proposed asynchronous ADMM method asymptotically converges to the KKT point of the non-convex problem. We validate the effectiveness of asynchronous ADMM by applying it to the Optimal Power Flow problem in multiple power systems and show that the convergence of the proposed asynchronous scheme could be faster than its synchronous counterpart in large-scale applications.
In this work, we propose a distributed algorithm for stochastic non-convex optimization. We consider a worker-server architecture where a set of $K$ worker nodes (WNs) in collaboration with a server node (SN) jointly aim to minimize a global, potentially non-convex objective function. The objective function is assumed to be the sum of local objective functions available at each WN, with each node having access to only the stochastic samples of its local objective function. In contrast to the existing approaches, we employ a momentum based single loop distributed algorithm which eliminates the need of computing large batch size gradients to achieve variance reduction. We propose two algorithms one with adaptive and the other with non-adaptive learning rates. We show that the proposed algorithms achieve the optimal computational complexity while attaining linear speedup with the number of WNs. Specifically, the algorithms reach an $epsilon$-stationary point $x_a$ with $mathbb{E}| abla f(x_a) | leq tilde{O}(K^{-1/3}T^{-1/2} + K^{-1/3}T^{-1/3})$ in $T$ iterations, thereby requiring $tilde{O}(K^{-1} epsilon^{-3})$ gradient computations at each WN. Moreover, our approach does not assume identical data distributions across WNs making the approach general enough for federated learning applications.
The Alternating Direction Method of Multipliers (ADMM) has been proved to be effective for solving separable convex optimization subject to linear constraints. In this paper, we propose a Generalized Symmetric ADMM (GS-ADMM), which updates the Lagrange multiplier twice with suitable stepsizes, to solve the multi-block separable convex programming. This GS-ADMM partitions the data into two group variables so that one group consists of $p$ block variables while the other has $q$ block variables, where $p ge 1$ and $q ge 1$ are two integers. The two grouped variables are updated in a {it Gauss-Seidel} scheme, while the variables within each group are updated in a {it Jacobi} scheme, which would make it very attractive for a big data setting. By adding proper proximal terms to the subproblems, we specify the domain of the stepsizes to guarantee that GS-ADMM is globally convergent with a worst-case $O(1/t)$ ergodic convergence rate. It turns out that our convergence domain of the stepsizes is significantly larger than other convergence domains in the literature. Hence, the GS-ADMM is more flexible and attractive on choosing and using larger stepsizes of the dual variable. Besides, two special cases of GS-ADMM, which allows using zero penalty terms, are also discussed and analyzed. Compared with several state-of-the-art methods, preliminary numerical experiments on solving a sparse matrix minimization problem in the statistical learning show that our proposed method is effective and promising.
This paper first proposes an N-block PCPM algorithm to solve N-block convex optimization problems with both linear and nonlinear constraints, with global convergence established. A linear convergence rate under the strong second-order conditions for optimality is observed in the numerical experiments. Next, for a starting point, an asynchronous N-block PCPM algorithm is proposed to solve linearly constrained N-block convex optimization problems. The numerical results demonstrate the sub-linear convergence rate under the bounded delay assumption, as well as the faster convergence with more short-time iterations than a synchronous iterative scheme.
We present AUQ-ADMM, an adaptive uncertainty-weighted consensus ADMM method for solving large-scale convex optimization problems in a distributed manner. Our key contribution is a novel adaptive weighting scheme that empirically increases the progress made by consensus ADMM scheme and is attractive when using a large number of subproblems. The weights are related to the uncertainty associated with the solutions of each subproblem, and are efficiently computed using low-rank approximations. We show AUQ-ADMM provably converges and demonstrate its effectiveness on a series of machine learning applications, including elastic net regression, multinomial logistic regression, and support vector machines. We provide an implementation based on the PyTorch package.
This paper investigates accelerating the convergence of distributed optimization algorithms on non-convex problems. We propose a distributed primal-dual stochastic gradient descent~(SGD) equipped with powerball method to accelerate. We show that the proposed algorithm achieves the linear speedup convergence rate $mathcal{O}(1/sqrt{nT})$ for general smooth (possibly non-convex) cost functions. We demonstrate the efficiency of the algorithm through numerical experiments by training two-layer fully connected neural networks and convolutional neural networks on the MNIST dataset to compare with state-of-the-art distributed SGD algorithms and centralized SGD algorithms.