ترغب بنشر مسار تعليمي؟ اضغط هنا

First-Order Methods for Nonconvex Quadratic Minimization

171   0   0.0 ( 0 )
 نشر من قبل Yair Carmon
 تاريخ النشر 2020
  مجال البحث
والبحث باللغة English




اسأل ChatGPT حول البحث

We consider minimization of indefinite quadratics with either trust-region (norm) constraints or cubic regularization. Despite the nonconvexity of these problems we prove that, under mild assumptions, gradient descent converges to their global solutions, and give a non-asymptotic rate of convergence for the cubic variant. We also consider Krylov subspace solutions and establish sharp convergence guarantees to the solutions of both trust-region and cubic-regularized problems. Our rates mirror the behavior of these methods on convex quadratics and eigenvector problems, highlighting their scalability. When we use Krylov subspace solutions to approximate the cubic-regularized Newton step, our results recover the strongest known convergence guarantees to approximate second-order stationary points of general smooth nonconvex functions.



قيم البحث

اقرأ أيضاً

For some typical and widely used non-convex half-quadratic regularization models and the Ambrosio-Tortorelli approximate Mumford-Shah model, based on the Kurdyka-L ojasiewicz analysis and the recent nonconvex proximal algorithms, we developed an effi cient preconditioned framework aiming at the linear subproblems that appeared in the nonlinear alternating minimization procedure. Solving large-scale linear subproblems is always important and challenging for lots of alternating minimization algorithms. By cooperating the efficient and classical preconditioned iterations into the nonlinear and nonconvex optimization, we prove that only one or any finite times preconditioned iterations are needed for the linear subproblems without controlling the error as the usual inexact solvers. The proposed preconditioned framework can provide great flexibility and efficiency for dealing with linear subproblems and guarantee the global convergence of the nonlinear alternating minimization method simultaneously.
The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smoot h and strongly-monotone games and show how to design tailored quadratic constraints to get tight upper bounds of convergence rates. Using this framework, we recover the existing bound for the gradient method~(GD), derive sharper bounds for the proximal point method~(PPM) and optimistic gradient method~(OG), and provide emph{for the first time} a global convergence rate for the negative momentum method~(NM) with an iteration complexity $mathcal{O}(kappa^{1.5})$, which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.
85 - Ganzhao Yuan 2021
Difference-of-Convex (DC) minimization, referring to the problem of minimizing the difference of two convex functions, has been found rich applications in statistical learning and studied extensively for decades. However, existing methods are primari ly based on multi-stage convex relaxation, only leading to weak optimality of critical points. This paper proposes a coordinate descent method for minimizing DC functions based on sequential nonconvex approximation. Our approach iteratively solves a nonconvex one-dimensional subproblem globally, and it is guaranteed to converge to a coordinate-wise stationary point. We prove that this new optimality condition is always stronger than the critical point condition and the directional point condition when the objective function is weakly convex. For comparisons, we also include a naive variant of coordinate descent methods based on sequential convex approximation in our study. When the objective function satisfies an additional regularity condition called emph{sharpness}, coordinate descent methods with an appropriate initialization converge emph{linearly} to the optimal solution set. Also, for many applications of interest, we show that the nonconvex one-dimensional subproblem can be computed exactly and efficiently using a breakpoint searching method. We present some discussions and extensions of our proposed method. Finally, we have conducted extensive experiments on several statistical learning tasks to show the superiority of our approach. Keywords: Coordinate Descent, DC Minimization, DC Programming, Difference-of-Convex Programs, Nonconvex Optimization, Sparse Optimization, Binary Optimization.
We consider the minimization of an $L_0$-Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(x)triangleq mathbb{E}[tilde{f}(x,omega)]$, over a Cartesian product of closed and convex sets with a view towards obtainin g both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense). We adopt a smoothing-based approach reliant on minimizing $f_{eta}$ where $f_{eta}(x) triangleq mathbb{E}_{u}[f(x+eta u)]$, $u$ is a random variable defined on a unit sphere, and $eta > 0$. In fact, it is observed that a stationary point of the $eta$-smoothed problem is a $2eta$-stationary point for the original problem in the Clarke sense. In such a setting, we derive a suitable residual function that provides a metric for stationarity for the smoothed problem. By leveraging a zeroth-order framework reliant on utilizing sampled function evaluations implemented in a block-structured regime, we make two sets of contributions for the sequence generated by the proposed scheme. (i) The residual function of the smoothed problem tends to zero almost surely along the generated sequence; (ii) To compute an $x$ that ensures that the expected norm of the residual of the $eta$-smoothed problem is within $epsilon$ requires no greater than $mathcal{O}(tfrac{1}{eta epsilon^2})$ projection steps and $mathcal{O}left(tfrac{1}{eta^2 epsilon^4}right)$ function evaluations. These statements appear to be novel and there appear to be few results to contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.
Image reconstruction in X-ray transmission tomography has been an important research field for decades. In light of data volume increasing faster than processor speeds, one needs accelerated iterative algorithms to solve the optimization problem in t he X-ray CT application. Incremental methods, in which a subset of data is being used at each iteration to accelerate the computations, have been getting more popular lately in the machine learning and mathematical optimization fields. The most popular member of this family of algorithms in the X-ray CT field is ordered-subsets. Even though it performs well in earlier iterations, the lack of convergence in later iterations is a known phenomenon. In this paper, we propose two incremental methods that use Jensen surrogates for the X-ray CT application, one stochastic and one ordered-subsets type. Using measured data, we show that the stochastic variant we propose outperforms other algorithms, including the gradient descent counterparts.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا