Do you want to publish a course? Click here

A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization

299   0   0.0 ( 0 )
 Added by Cedric Josz
 Publication date 2018
  fields
and research's language is English




Ask ChatGPT about the research

We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term textit{global functions}. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of nonconvex and nonsmooth optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $ell_1$ norm to avoid outliers in nonconvex optimization.



rate research

Read More

In this work, we present a globalized stochastic semismooth Newton method for solving stochastic optimization problems involving smooth nonconvex and nonsmooth convex terms in the objective function. We assume that only noisy gradient and Hessian information of the smooth part of the objective function is available via calling stochastic first and second order oracles. The proposed method can be seen as a hybrid approach combining stochastic semismooth Newton steps and stochastic proximal gradient steps. Two inexact growth conditions are incorporated to monitor the convergence and the acceptance of the semismooth Newton steps and it is shown that the algorithm converges globally to stationary points in expectation. Moreover, under standard assumptions and utilizing random matrix concentration inequalities, we prove that the proposed approach locally turns into a pure stochastic semismooth Newton method and converges r-superlinearly with high probability. We present numerical results and comparisons on $ell_1$-regularized logistic regression and nonconvex binary classification that demonstrate the efficiency of our algorithm.
159 - Siqi Zhang , Niao He 2018
In this paper, we investigate the non-asymptotic stationary convergence behavior of Stochastic Mirror Descent (SMD) for nonconvex optimization. We focus on a general class of nonconvex nonsmooth stochastic optimization problems, in which the objective can be decomposed into a relatively weakly convex function (possibly non-Lipschitz) and a simple non-smooth convex regularizer. We prove that SMD, without the use of mini-batch, is guaranteed to converge to a stationary point in a convergence rate of $ mathcal{O}(1/sqrt{t}) $. The efficiency estimate matches with existing results for stochastic subgradient method, but is evaluated under a stronger stationarity measure. Our convergence analysis applies to both the original SMD and its proximal version, as well as the deterministic variants, for solving relatively weakly convex problems.
While many distributed optimization algorithms have been proposed for solving smooth or convex problems over the networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, we show that the proposed algorithm can achieve an $epsilon$-stationary solution under a constant step size with $mathcal{O}(1/epsilon^2)$ computation complexity and $mathcal{O}(1/epsilon)$ communication complexity. When compared to the existing gradient tracking based methods, the proposed algorithm has the same order of computation complexity but lower order of communication complexity. To the best of our knowledge, the presented result is the first stochastic algorithm with the $mathcal{O}(1/epsilon)$ communication complexity for non-convex and non-smooth problems. Numerical experiments for a distributed non-convex regression problem and a deep neural network based classification problem are presented to illustrate the effectiveness of the proposed algorithms.
140 - Yuhao Ding , Javad Lavaei , 2019
A major limitation of online algorithms that track the optimizers of time-varying nonconvex optimization problems is that they focus on a specific local minimum trajectory, which may lead to poor spurious local solutions. In this paper, we show that the natural temporal variation may help simple online tracking methods find and track time-varying global minima. To this end, we investigate the properties of a time-varying projected gradient flow system with inertia, which can be regarded as the continuous-time limit of (1) the optimality conditions for a discretized sequential optimization problem with a proximal regularization and (2) the online tracking scheme. We introduce the notion of the dominant trajectory and show that the inherent temporal variation could re-shape the landscape of the Lagrange functional and help a proximal algorithm escape the spurious local minimum trajectories if the global minimum trajectory is dominant. For a problem with twice continuously differentiable objective function and constraints, sufficient conditions are derived to guarantee that no matter how a local search method is initialized, it will track a time-varying global solution after some time. The results are illustrated on a benchmark example with many local minima.
We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(delta, epsilon)$-stationarity, which allows for an $epsilon$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $delta$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(delta, epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $delta$. Empirically, our methods perform well for training ReLU neural networks.
comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا