The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization


Abstract in English

The extrapolation strategy proposed by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, has led to tremendous success in training machine learning models. In this article, we theoretically study the convergence of the individual iterates, which we call individual convergence, of projected subgradient (PSG) methods for nonsmooth convex optimization problems under Nesterov's extrapolation. We prove that Nesterov's extrapolation has the strength to make the individual convergence of PSG optimal for nonsmooth problems. Building on this result, we show that a direct modification of the subgradient evaluation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as an interesting step toward the open question about stochastic gradient descent (SGD) posed by Shamir. Furthermore, we extend the derived algorithms to regularized learning tasks with nonsmooth losses in stochastic settings. Compared with other state-of-the-art nonsmooth methods, the derived algorithms can serve as an alternative to basic SGD, especially for machine learning problems in which an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems. Several comparison experiments demonstrate that our individual output not only achieves an optimal convergence rate but also yields better sparsity than the averaged solution.
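To make the idea of an individual (last-iterate) output concrete, the following is a minimal sketch of a projected subgradient loop with a Nesterov-style extrapolation step. It is not the algorithm analyzed in the article: the toy objective f(w) = sum_i |w_i - c_i|, the box constraint [-1, 1]^d, the extrapolation weight (t - 1)/(t + 2), and the step size t^(-3/2) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def project_box(w, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^d (assumed feasible set)."""
    return [min(hi, max(lo, x)) for x in w]


def objective(w, c):
    """Toy nonsmooth convex objective: f(w) = sum_i |w_i - c_i|."""
    return sum(abs(x - ci) for x, ci in zip(w, c))


def subgradient(w, c):
    """One valid subgradient of f: sign(w_i - c_i), with 0 at the kink."""
    return [(x > ci) - (x < ci) for x, ci in zip(w, c)]


def extrapolated_psg(c, w0, iters=500):
    """Projected subgradient steps taken from an extrapolated point.

    Returns the last iterate itself (the individual output), not an average.
    The weight and step-size schedules below are illustrative assumptions.
    """
    w_prev = list(w0)
    w = list(w0)
    for t in range(1, iters + 1):
        beta = (t - 1) / (t + 2)           # assumed extrapolation weight
        step = 1.0 / (t * math.sqrt(t))    # assumed step size t^(-3/2)
        # Extrapolate along the previous displacement.
        y = [x + beta * (x - xp) for x, xp in zip(w, w_prev)]
        # Evaluate the subgradient at the extrapolated point, then project.
        g = subgradient(y, c)
        w_prev = w
        w = project_box([yi - step * gi for yi, gi in zip(y, g)])
    return w


if __name__ == "__main__":
    c = [0.5, -0.3, 2.0]   # last target lies outside the box
    w0 = [0.0, 0.0, 0.0]
    w_last = extrapolated_psg(c, w0)
    print("last iterate:", w_last)
    print("objective:", objective(w_last, c))
```

The point of the sketch is only the shape of the update: the subgradient is evaluated at the extrapolated point, and the method reports the final iterate rather than a running average, which is the kind of output that can keep a regularization-induced structure such as sparsity.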
