No Arabic abstract
This paper studies bilevel polynomial optimization problems. To solve them, we give a method based on polynomial optimization relaxations. Each relaxation is obtained from the Kurash-Kuhn-Tucker (KKT) conditions for the lower level optimization and the exchange technique for semi-infinite programming. For KKT conditions, Lagrange multipliers are represented as polynomial or rational functions. The Moment-SOS relaxations are used to solve the polynomial optimizattion relaxations. Under some general assumptions, we prove the convergence of the algorithm for solving bilevel polynomial optimization problems. Numerical experiments are presented to show the efficiency of the method.
Stochastic bilevel optimization generalizes the classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends the solution of another optimization problem. Recently, stochastic bilevel optimization is regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single loop fashion, and uses a single-timescale update with a fixed batch size. To achieve an $epsilon$-stationary point of the bilevel problem, STABLE requires ${cal O}(epsilon^{-2})$ samples in total; and to achieve an $epsilon$-optimal solution in the strongly convex case, STABLE requires ${cal O}(epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for the single-level stochastic optimization.
A bilevel program is an optimization problem whose constraints involve another optimization problem. This paper studies bilevel polynomial programs (BPPs), i.e., all the functions are polynomials. We reformulate BPPs equivalently as semi-infinite polynomial programs (SIPPs), using Fritz John conditions and Jacobian representations. Combining the exchange technique and Lasserre type semidefinite relaxations, we propose numerical methods for solving both simple and general BPPs. For simple BPPs, we prove the convergence to global optimal solutions. Numerical experiments are presented to show the efficiency of proposed algorithms.
This paper proposes a new algorithm -- the underline{S}ingle-timescale Dounderline{u}ble-momentum underline{St}ochastic underline{A}pproxunderline{i}matiounderline{n} (SUSTAIN) -- for tackling stochastic unconstrained bilevel optimization problems. We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth. Unlike prior works which rely on emph{two-timescale} or emph{double loop} techniques, we design a stochastic momentum-assisted gradient estimator for both the upper and lower level updates. The latter allows us to control the error in the stochastic gradient updates due to inaccurate solution to both subproblems. If the upper objective function is smooth but possibly non-convex, we show that {aname}~requires $mathcal{O}(epsilon^{-3/2})$ iterations (each using ${cal O}(1)$ samples) to find an $epsilon$-stationary solution. The $epsilon$-stationary solution is defined as the point whose squared norm of the gradient of the outer function is less than or equal to $epsilon$. The total number of stochastic gradient samples required for the upper and lower level objective functions matches the best-known complexity for single-level stochastic gradient algorithms. We also analyze the case when the upper level objective function is strongly-convex.
Bilevel optimization has been widely applied many machine learning problems such as hyperparameter optimization, policy optimization and meta learning. Although many bilevel optimization methods more recently have been proposed to solve the bilevel optimization problems, they still suffer from high computational complexities and do not consider the more general bilevel problems with nonsmooth regularization. In the paper, thus, we propose a class of efficient bilevel optimization methods based on Bregman distance. In our methods, we use the mirror decent iteration to solve the outer subproblem of the bilevel problem by using strongly-convex Bregman functions. Specifically, we propose a bilevel optimization method based on Bregman distance (BiO-BreD) for solving deterministic bilevel problems, which reaches the lower computational complexities than the best known results. We also propose a stochastic bilevel optimization method (SBiO-BreD) for solving stochastic bilevel problems based on the stochastic approximated gradients and Bregman distance. Further, we propose an accelerated version of SBiO-BreD method (ASBiO-BreD) by using the variance-reduced technique. Moreover, we prove that the ASBiO-BreD outperforms the best known computational complexities with respect to the condition number $kappa$ and the target accuracy $epsilon$ for finding an $epsilon$-stationary point of nonconvex-strongly-convex bilevel problems. In particular, our methods can solve the bilevel optimization problems with nonsmooth regularization with a lower computational complexity.
Bilevel optimization problems are at the center of several important machine learning problems such as hyperparameter tuning, data denoising, meta- and few-shot learning, and training-data poisoning. Different from simultaneous or multi-objective optimization, the steepest descent direction for minimizing the upper-level cost requires the inverse of the Hessian of the lower-level cost. In this paper, we propose a new method for solving bilevel optimization problems using the classical penalty function approach which avoids computing the inverse and can also handle additional constraints easily. We prove the convergence of the method under mild conditions and show that the exact hypergradient is obtained asymptotically. Our methods simplicity and small space and time complexities enable us to effectively solve large-scale bilevel problems involving deep neural networks. We present results on data denoising, few-shot learning, and training-data poisoning problems in a large scale setting and show that our method outperforms or is comparable to previously proposed methods based on automatic differentiation and approximate inversion in terms of accuracy, run-time and convergence speed.