No Arabic abstract
We show that Holder continuity of the gradient is not only a sufficient condition, but also a necessary condition for the existence of a global upper bound on the error of the first-order Taylor approximation. We also relate this global upper bound to the Holder constant of the gradient. This relation is expressed as an interval, depending on the Holder constant, in which the error of the first-order Taylor approximation is guaranteed to be. We show that, for the Lipschitz continuous case, the interval cannot be reduced. An application to the norms of quadratic forms is proposed, which allows us to derive a novel characterization of Euclidean norms.
Various methods can obtain certified estimates for roots of polynomials. Many applications in science and engineering additionally utilize the value of functions evaluated at roots. For example, critical values are obtained by evaluating an objective function at critical points. For analytic evaluation functions, Newtons method naturally applies to yield certified estimates. These estimates no longer apply, however, for Holder continuous functions, which are a generalization of Lipschitz continuous functions where continuous derivatives need not exist. This work develops and analyzes an alternative approach for certified estimates of evaluating locally Holder continuous functions at roots of polynomials. An implementation of the method in Maple demonstrates efficacy and efficiency.
We consider the minimization of an $L_0$-Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(x)triangleq mathbb{E}[tilde{f}(x,omega)]$, over a Cartesian product of closed and convex sets with a view towards obtaining both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense). We adopt a smoothing-based approach reliant on minimizing $f_{eta}$ where $f_{eta}(x) triangleq mathbb{E}_{u}[f(x+eta u)]$, $u$ is a random variable defined on a unit sphere, and $eta > 0$. In fact, it is observed that a stationary point of the $eta$-smoothed problem is a $2eta$-stationary point for the original problem in the Clarke sense. In such a setting, we derive a suitable residual function that provides a metric for stationarity for the smoothed problem. By leveraging a zeroth-order framework reliant on utilizing sampled function evaluations implemented in a block-structured regime, we make two sets of contributions for the sequence generated by the proposed scheme. (i) The residual function of the smoothed problem tends to zero almost surely along the generated sequence; (ii) To compute an $x$ that ensures that the expected norm of the residual of the $eta$-smoothed problem is within $epsilon$ requires no greater than $mathcal{O}(tfrac{1}{eta epsilon^2})$ projection steps and $mathcal{O}left(tfrac{1}{eta^2 epsilon^4}right)$ function evaluations. These statements appear to be novel and there appear to be few results to contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.
While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.
In recent years, the success of deep learning has inspired many researchers to study the optimization of general smooth non-convex functions. However, recent works have established pessimistic worst-case complexities for this class functions, which is in stark contrast with their superior performance in real-world applications (e.g. training deep neural networks). On the other hand, it is found that many popular non-convex optimization problems enjoy certain structured properties which bear some similarities to convexity. In this paper, we study the class of textit{quasar-convex functions} to close the gap between theory and practice. We study the convergence of first order methods in a variety of different settings and under different optimality criterions. We prove complexity upper bounds that are similar to standard results established for convex functions and much better that state-of-the-art convergence rates of non-convex functions. Overall, this paper suggests that textit{quasar-convexity} allows efficient optimization procedures, and we are looking forward to seeing more problems that demonstrate similar properties in practice.
We present a unified convergence analysis for first order convex optimization methods using the concept of strong Lyapunov conditions. Combining this with suitable time scaling factors, we are able to handle both convex and strong convex cases, and establish contraction properties of Lyapunov functions for many existing ordinary differential equation models. Then we derive prevailing first order optimization algorithms, such as proximal gradient methods, heavy ball methods (also known as momentum methods), Nesterov accelerated gradient methods, and accelerated proximal gradient methods from numerical discretizations of corresponding dynamical systems. We also apply strong Lyapunov conditions to the discrete level and provide a more systematical analysis framework. Another contribution is a novel second order dynamical system called Hessian-driven Nesterov accelerated gradient flow which can be used to design and analyze accelerated first order methods for smooth and non-smooth convex optimizations.