In this paper, we consider the optimization problem of minimizing a continuously differentiable function subject to both convex constraints and sparsity constraints. By exploiting a mixed-integer reformulation from the literature, we define a necessary optimality condition based on a tailored neighborhood that takes into account potential changes of the support set. We then propose an algorithmic framework to tackle the considered class of problems and prove its convergence to points satisfying the newly introduced concept of stationarity. We further show that, by suitably choosing the neighborhood, other well-known optimality conditions from the literature can be recovered at the limit points of the sequence produced by the algorithm. Finally, we analyze the computational impact of the neighborhood size within our framework and in comparison with some state-of-the-art algorithms, namely the Penalty Decomposition method and the Greedy Sparse-Simplex method. The algorithms have been tested on a benchmark of sparse logistic regression problems.
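For concreteness, the problem class considered above can be written (in generic notation that may differ from the paper's) as
\[
\min_{x \in \mathbb{R}^n} \; f(x) \quad \text{s.t.} \quad x \in C, \; \|x\|_0 \le s,
\]
where $f$ is continuously differentiable, $C \subseteq \mathbb{R}^n$ is closed and convex, $\|x\|_0$ counts the nonzero entries of $x$, and $s < n$ is the prescribed sparsity level. A typical mixed-integer reformulation from the literature attaches to each $x_i$ a binary indicator $y_i \in \{0,1\}$ with $y_i = 0$ forcing $x_i = 0$ (for instance through a big-M or complementarity-type coupling) and requires $\sum_i y_i \le s$.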
Nonsmooth sparsity constrained optimization captures a broad spectrum of applications in machine learning and computer vision. However, this problem is NP-hard in general. Existing solutions to this problem suffer from one or more of the following limitations: they fail to solve general nonsmooth problems; they lack a convergence analysis; or they only reach weaker optimality conditions. This paper revisits the Penalty Alternating Direction Method (PADM) for nonsmooth sparsity constrained optimization problems. We consider two variants of the PADM, i.e., PADM based on Iterative Hard Thresholding (PADM-IHT) and PADM based on Block Coordinate Decomposition (PADM-BCD). We show that the PADM-BCD algorithm finds stronger stationary points of the optimization problem than previous methods. We also develop new theory to analyze the convergence rates of both the PADM-IHT and the PADM-BCD algorithms. Our theoretical bounds can exploit the inherent sparsity of the optimization problem. Finally, numerical results demonstrate the superiority of PADM-BCD over existing sparse optimization algorithms.
Keywords: Sparsity Recovery, Nonsmooth Optimization, Non-Convex Optimization, Block Coordinate Decomposition, Iterative Hard Thresholding, Convergence Analysis
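A minimal sketch of the iterative hard thresholding building block referenced above (a generic routine, not the paper's PADM-IHT implementation; the function names, step size, and iteration count are ours) is:

import numpy as np

def hard_threshold(x, s):
    # Keep the s largest-magnitude entries of x and zero out the rest,
    # i.e., project x onto the sparsity constraint ||x||_0 <= s.
    z = np.zeros_like(x)
    if s <= 0:
        return z
    idx = np.argpartition(np.abs(x), -s)[-s:]
    z[idx] = x[idx]
    return z

def iht(grad, x0, s, step=1e-2, iters=500):
    # Generic iterative hard thresholding: a gradient step on the smooth
    # part of the objective followed by projection onto the sparsity constraint.
    x = hard_threshold(np.asarray(x0, dtype=float), s)
    for _ in range(iters):
        x = hard_threshold(x - step * grad(x), s)
    return x

For instance, iht(lambda x: A.T @ (A @ x - b), np.zeros(n), s) runs the scheme on a least-squares term with hypothetical data A, b; the PADM-IHT variant discussed above builds on this type of update.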
We expand the scope of the alternating direction method of multipliers (ADMM). Specifically, we show that ADMM, when employed to solve problems with multiaffine constraints that satisfy certain verifiable assumptions, converges to the set of constrained stationary points if the penalty parameter in the augmented Lagrangian is sufficiently large. When the Kurdyka-Łojasiewicz (KŁ) property holds, this is strengthened to convergence to a single constrained stationary point. Our analysis applies under assumptions that we have endeavored to make as weak as possible. It applies to problems that involve nonconvex and/or nonsmooth objective terms, in addition to the multiaffine constraints that can involve multiple (three or more) blocks of variables. To illustrate the applicability of our results, we describe examples including nonnegative matrix factorization, sparse learning, risk parity portfolio selection, nonconvex formulations of convex problems, and neural network training. In each case, our ADMM approach encounters only subproblems that have closed-form solutions.
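To fix ideas (in generic notation, not necessarily the paper's), with two variable blocks and a coupling constraint $C(x,z) = 0$ that is multiaffine, i.e., affine in each block when the other is fixed, the augmented Lagrangian with penalty parameter $\rho$ is
\[
\mathcal{L}_\rho(x, z, \lambda) \;=\; f(x) + g(z) + \langle \lambda,\, C(x,z) \rangle + \tfrac{\rho}{2}\,\|C(x,z)\|^2,
\]
and each ADMM iteration minimizes $\mathcal{L}_\rho$ over $x$ and over $z$ in turn before the dual update $\lambda \leftarrow \lambda + \rho\, C(x,z)$.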
We provide several algorithms for constrained optimization of a large class of convex problems, including softmax, $\ell_p$ regression, and logistic regression. Central to our approach is the notion of width reduction, a technique which has proven immensely useful in the context of maximum flow [Christiano et al., STOC11] and, more recently, $\ell_p$ regression [Adil et al., SODA19], in terms of improving the iteration complexity from $O(m^{1/2})$ to $\tilde{O}(m^{1/3})$, where $m$ is the number of rows of the design matrix, and where each iteration amounts to a linear system solve. However, a considerable drawback is that these methods require both problem-specific potentials and individually tailored analyses. As our main contribution, we initiate a new direction of study by presenting the first unified approach to achieving $m^{1/3}$-type rates. Notably, our method goes beyond these previously considered problems to more broadly capture quasi-self-concordant losses, a class which has recently generated much interest and includes the well-studied problem of logistic regression, among others. In order to do so, we develop a unified width reduction method for carefully handling these losses based on a more general set of potentials. Additionally, we directly achieve $m^{1/3}$-type rates in the constrained setting without the need for any explicit acceleration schemes, thus naturally complementing recent work based on a ball-oracle approach [Carmon et al., NeurIPS20].
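As a representative member of this class (a standard formulation, not specific to this paper), constrained logistic regression with design-matrix rows $a_i$, labels $b_i \in \{-1, 1\}$, and a convex feasible set $K$ reads
\[
\min_{x \in K} \; \sum_{i=1}^{m} \log\!\bigl(1 + \exp(-b_i\, a_i^{\top} x)\bigr),
\]
whose loss is quasi-self-concordant and therefore falls within the scope of the unified width-reduction analysis described above.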
Minimax optimization problems arise both from modern machine learning, including generative adversarial networks, adversarial training, and multi-agent reinforcement learning, and from traditional research areas such as saddle point problems, numerical partial differential equations, and optimality conditions of equality constrained optimization. For the unconstrained continuous nonconvex-nonconcave setting, Jin, Netrapalli and Jordan (2019) carefully considered the very basic question of what constitutes a proper definition of local optima of a minimax optimization problem, and proposed a notion of local optimality called the local minimax point. We extend the definition of local minimax points to constrained nonconvex-nonconcave minimax optimization problems. By analyzing Jacobian uniqueness conditions for the lower-level maximization problem and the strong regularity of the Karush-Kuhn-Tucker conditions of that maximization problem, we provide both necessary optimality conditions and sufficient optimality conditions for the local minimax points of constrained minimax optimization problems.
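In generic notation (ours, not necessarily the paper's), the constrained problem studied above is
\[
\min_{x \in \mathcal{X}} \; \max_{y \in \mathcal{Y}} \; f(x, y),
\]
with $\mathcal{X}$ and $\mathcal{Y}$ described by smooth constraints; the lower-level problem $\max_{y \in \mathcal{Y}} f(x, y)$ is the maximization whose Jacobian uniqueness and KKT strong regularity properties are analyzed. In the unconstrained case, the local minimax notion of Jin, Netrapalli and Jordan requires, roughly, that $y^*$ be a local maximizer of $f(x^*, \cdot)$ and that $f(x^*, y^*) \le \max_{\|y' - y^*\| \le h(\delta)} f(x, y')$ for all $x$ with $\|x - x^*\| \le \delta$ and all sufficiently small $\delta$, where $h(\delta) \to 0$ as $\delta \to 0$.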
We introduce a novel primal-dual flow for affine constrained convex optimization problems. As a modification of the standard saddle-point system, our primal-dual flow is proved to possess an exponential decay property in terms of a tailored Lyapunov function. A class of primal-dual methods for the original optimization problem is then obtained from numerical discretizations of the continuous flow, and nonergodic convergence rates are established with a unified discrete Lyapunov function. Among these algorithms, we can recover the (linearized) augmented Lagrangian method and the quadratic penalty method with a continuation technique. We also propose new methods whose inner problem is either a linear symmetric positive definite system or a nonlinear equation that can be solved efficiently via the semi-smooth Newton method. In particular, numerical tests on linearly constrained $l_1$-$l_2$ minimization show that our method outperforms the accelerated linearized Bregman method.
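For reference (standard notation, not the paper's tailored construction), for $\min f(x)$ subject to $Ax = b$ with Lagrangian $\mathcal{L}(x, \lambda) = f(x) + \langle \lambda, Ax - b \rangle$, the standard saddle-point system that the proposed flow modifies is
\[
\dot{x} = -\nabla_x \mathcal{L}(x, \lambda) = -\nabla f(x) - A^{\top}\lambda, \qquad
\dot{\lambda} = \nabla_{\lambda} \mathcal{L}(x, \lambda) = Ax - b,
\]
and numerical discretizations of such flows yield primal-dual iterations; it is the modification of this system that yields the exponential decay of the tailored Lyapunov function.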