Necessary and Sufficient Geometries for Gradient Methods

Published by: Daniel Lévy

Publication date: 2019

Research field: Informatics Engineering

Language: English

Authors: Daniel Levy, John C. Duchi

Optimization and Control, Information Theory, Machine Learning

We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex (for example, any $\ell_p$-ball for $p < 2$), the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror or stochastic gradient methods.
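The diagonally pre-conditioned stochastic gradient methods mentioned in this abstract can be illustrated with a minimal AdaGrad-style sketch. The box constraint, the names `project_box` and `diagonal_adagrad`, the step size `lr`, and the damping term `eps` are assumptions made for illustration and are not taken from the paper. A box is used because, assuming the usual definition of quadratic convexity (the set of coordinate-wise squares is convex), it is a simple quadratically convex set.

```python
import math


def project_box(x, radius):
    """Clip each coordinate to [-radius, radius].

    For a box, coordinate-wise clipping is also the projection in any
    diagonal metric, so no extra work is needed for the pre-conditioner.
    """
    return [max(-radius, min(radius, xi)) for xi in x]


def diagonal_adagrad(grad_fn, x0, radius, steps, lr=1.0, eps=1e-8):
    """Projected gradient steps with a diagonal (AdaGrad-style) pre-conditioner.

    grad_fn(x) returns a (possibly noisy) gradient at x as a list of floats.
    Returns the running average of the iterates.
    """
    x = list(x0)
    sum_sq = [0.0] * len(x)   # per-coordinate sum of squared gradients
    avg = [0.0] * len(x)      # running average of the iterates
    for t in range(1, steps + 1):
        g = grad_fn(x)
        sum_sq = [s + gi * gi for s, gi in zip(sum_sq, g)]
        # Each coordinate gets its own step size lr / sqrt(sum of squares).
        x = [xi - lr * gi / (math.sqrt(s) + eps)
             for xi, gi, s in zip(x, g, sum_sq)]
        x = project_box(x, radius)
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return avg
```

Because each coordinate is rescaled by its own gradient history, a badly scaled objective (one coordinate with large gradients, another with small ones) is handled without hand-tuning a separate step size per coordinate.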

118 - Thulasi Tholeti, Sheetal Kalyani 2019

Convergence of the gradient descent algorithm has been attracting renewed interest due to its utility in deep learning applications. Even as multiple variants of gradient descent were proposed, the assumption that the gradient of the objective is Lipschitz continuous remained an integral part of the analysis until recently. In this work, we look at convergence analysis by focusing on a property that we term concavifiability, instead of Lipschitz continuity of gradients. We show that concavifiability is a necessary and sufficient condition to satisfy the upper quadratic approximation, which is key in proving that the objective function decreases after every gradient descent update. We also show that any gradient Lipschitz function satisfies concavifiability. A constant known as the concavifier, analogous to the gradient Lipschitz constant, is derived and is indicative of the optimal step size. As an application, we demonstrate the utility of finding the concavifier in the convergence of gradient descent through an example inspired by neural networks. We derive bounds on the concavifier to obtain a fixed step size for a single hidden layer ReLU network.
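For reference, in the classical gradient-Lipschitz setting that this abstract contrasts with, the upper quadratic approximation and the resulting per-step decrease take the following standard form. The symbols $f$, $L$, $x$, and $y$ are notation assumed here rather than taken from the paper, and the concavifier-based analogue is not reproduced.

```latex
\[
  f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\,\|y - x\|_2^2
  \quad \text{for all } x, y,
\]
\[
  y = x - \tfrac{1}{L}\,\nabla f(x)
  \;\Longrightarrow\;
  f(y) \le f(x) - \frac{1}{L}\,\|\nabla f(x)\|_2^2 + \frac{1}{2L}\,\|\nabla f(x)\|_2^2
       = f(x) - \frac{1}{2L}\,\|\nabla f(x)\|_2^2 .
\]
```

The second line substitutes the gradient step into the first, which is why a constant playing the role of $L$ directly suggests a step size.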

Optimization and Control, Machine Learning

Necessary and Sufficient Conditions for Success of the Nuclear Norm Heuristic for Rank Minimization

141 - Benjamin Recht, Weiyu Xu, Babak Hassibi 2008

Minimizing the rank of a matrix subject to constraints is a challenging problem that arises in many applications in control theory, machine learning, and discrete geometry. This class of optimization problems, known as rank minimization, is NP-hard, and for most practical problems there are no efficient algorithms that yield exact solutions. A popular heuristic algorithm replaces the rank function with the nuclear norm (equal to the sum of the singular values) of the decision variable. In this paper, we provide a necessary and sufficient condition that quantifies when this heuristic successfully finds the minimum rank solution of a linear constraint set. We additionally provide a probability distribution over instances of the affine rank minimization problem such that instances sampled from this distribution satisfy our conditions for success with overwhelming probability provided the number of constraints is appropriately large. Finally, we give empirical evidence that these probabilistic bounds provide accurate predictions of the heuristic's performance in non-asymptotic scenarios.
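The heuristic described in this abstract can be written out as a pair of optimization problems. The linear map $\mathcal{A}$, the vector $b$, and the singular values $\sigma_i(X)$ are notation assumed here for illustration; the abstract itself fixes no symbols.

```latex
\[
  \min_{X}\ \operatorname{rank}(X)
  \quad \text{subject to} \quad \mathcal{A}(X) = b
  \qquad \longrightarrow \qquad
  \min_{X}\ \|X\|_{*}
  \quad \text{subject to} \quad \mathcal{A}(X) = b,
\]
\[
  \|X\|_{*} = \sum_{i} \sigma_i(X).
\]
```

The problem on the left is the affine rank minimization problem; the one on the right is its convex surrogate, since the nuclear norm is a norm and the constraint is linear.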

Optimization and Control, Computation, Machine Learning

A Flat Triangular Form for Nonlinear Systems with Two Inputs: Necessary and Sufficient Conditions

123 - Hector Bessa Silveira, Paulo Sergio Pereira da Silva, Pierre Rouchon 2013

The present work establishes necessary and sufficient conditions for a nonlinear system with two inputs to be described by a specific triangular form. Except for some regularity conditions, such a triangular form is flat. This may lead to the discovery of new flat systems. The proof relies on well-known results for driftless systems with two controls (the chained form) and on geometric tools from exterior differential systems. The paper also illustrates the application of its results to an academic example and to a reduced order model of an induction motor.

Optimization and Control

Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz Problems

131 - Qiuwei Li, Zhihui Zhu, Gongguo Tang 2019

The (global) Lipschitz smoothness condition is crucial in establishing the convergence theory for most optimization methods. Unfortunately, most machine learning and signal processing problems are not Lipschitz smooth. This motivates us to generalize the concept of Lipschitz smoothness condition to the relative smoothness condition, which is satisfied by any finite-order polynomial objective function. Further, this work develops new Bregman-divergence based algorithms that are guaranteed to converge to a second-order stationary point for any relatively smooth problem. In addition, the proposed optimization methods cover both the proximal alternating minimization and the proximal alternating linearized minimization when we specialize the Bregman divergence to the Euclidean distance. Therefore, this work not only develops guaranteed optimization methods for non-Lipschitz smooth problems but also solves an open problem of showing the second-order convergence guarantees for these alternating minimization methods.
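The specialization of the Bregman divergence to the Euclidean distance mentioned in this abstract can be checked numerically with a minimal sketch. It uses the standard definition $D_h(x, y) = h(x) - h(y) - \langle \nabla h(y), x - y \rangle$; the function names below are hypothetical and the paper's own algorithms are not reproduced.

```python
import math


def bregman_divergence(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad_h(y), x - y> for list-valued x, y."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_h(y), x, y))
    return h(x) - h(y) - inner


# Choice 1: h(x) = 0.5 * ||x||^2, which gives D_h(x, y) = 0.5 * ||x - y||^2.
def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)


def grad_half_sq_norm(x):
    return list(x)


# Choice 2: negative entropy on positive vectors. For two probability
# vectors this gives the Kullback-Leibler divergence.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)


def grad_neg_entropy(x):
    return [math.log(xi) + 1.0 for xi in x]
```

Swapping the generating function `h` changes the geometry of the resulting method while leaving the definition of the divergence untouched, which is the sense in which the Euclidean case is a special case.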

Optimization and Control, Information Theory, Machine Learning

Stochastic Bias-Reduced Gradient Methods

153 - Hilal Asi, Yair Carmon, Arun Jambulapati 2021

We develop a new primitive for stochastic optimization: a low-bias, low-cost estimator of the minimizer $x_\star$ of any Lipschitz strongly-convex function. In particular, we use a multilevel Monte-Carlo approach due to Blanchet and Glynn to turn any optimal stochastic gradient method into an estimator of $x_\star$ with bias $\delta$, variance $O(\log(1/\delta))$, and an expected sampling cost of $O(\log(1/\delta))$ stochastic gradient evaluations. As an immediate consequence, we obtain cheap and nearly unbiased gradient estimators for the Moreau-Yoshida envelope of any Lipschitz convex function, allowing us to perform dimension-free randomized smoothing. We demonstrate the potential of our estimator through four applications. First, we develop a method for minimizing the maximum of $N$ functions, improving on recent results and matching a lower bound up to logarithmic factors. Second and third, we recover state-of-the-art rates for projection-efficient and gradient-efficient optimization using simple algorithms with a transparent analysis. Finally, we show that an improved version of our estimator would yield a nearly linear-time, optimal-utility, differentially-private non-smooth stochastic optimization method.

Optimization and Control, Data Structures and Algorithms, Machine Learning