Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

72 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Jun Li

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Jun Li - Li Fuxin - Sinisa Todorovic

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.

قيم البحث

147 - Bin Gao , Nguyen Thanh Son , P.-A. Absil 2020

The symplectic Stiefel manifold, denoted by $mathrm{Sp}(2p,2n)$, is the set of linear symplectic maps between the standard symplectic spaces $mathbb{R}^{2p}$ and $mathbb{R}^{2n}$. When $p=n$, it reduces to the well-known set of $2ntimes 2n$ symplecti c matrices. Optimization problems on $mathrm{Sp}(2p,2n)$ find applications in various areas, such as optics, quantum physics, numerical linear algebra and model order reduction of dynamical systems. The purpose of this paper is to propose and analyze gradient-descent methods on $mathrm{Sp}(2p,2n)$, where the notion of gradient stems from a Riemannian metric. We consider a novel Riemannian metric on $mathrm{Sp}(2p,2n)$ akin to the canonical metric of the (standard) Stiefel manifold. In order to perform a feasible step along the antigradient, we develop two types of search strategies: one is based on quasi-geodesic curves, and the other one on the symplectic Cayley transform. The resulting optimization algorithms are proved to converge globally to critical points of the objective function. Numerical experiments illustrate the efficiency of the proposed methods.

التحسين والتحكم النظم الديناميكية

Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold

87 - Bokun Wang , Shiqian Ma , Lingzhou Xue 2020

Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $epsilon$-stationary point with $mathcal{O}(epsilon^{-3})$ IFOs in the online case, and $mathcal{O}(n+sqrt{n}epsilon^{-3})$ IFOs in the finite-sum case with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that uses Riemannian subgradient information.

التحسين والتحكم التعلم الآلي التعلم الالي

Orthogonalizing Convolutional Layers with the Cayley Transform

130 - Asher Trockman , J. Zico Kolter 2021

Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipsch itz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heuristic in nature, often focusing on penalty methods or limited classes of convolutions. In this work, we propose and evaluate an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal. Specifically, we propose to apply the Cayley transform to a skew-symmetric convolution in the Fourier domain, so that the inverse convolution needed by the Cayley transform can be computed efficiently. We compare our method to previous Lipschitz-constrained and orthogonal convolutional layers and show that it indeed preserves orthogonality to a high degree even for large convolutions. Applied to the problem of certified adversarial robustness, we show that networks incorporating the layer outperform existing deterministic methods for certified defense against $ell_2$-norm-bounded adversaries, while scaling to larger architectures than previously investigated. Code is available at https://github.com/locuslab/orthogonal-convolutions.

التعلم الآلي التعلم الالي

Semi-Riemannian Manifold Optimization

171 - Tingran Gao , Lek-Heng Lim , Ke Ye 2018

We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a se mi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.

التحسين والتحكم التحليل العددي

Towards Almost Global Synchronization on the Stiefel Manifold

114 - Johan Markdahl , Johan Thunberg , Jorge Goncalves 2018

A graph $mathcal{G}$ is referred to as $mathsf{S}^1$-synchronizing if, roughly speaking, the Kuramoto-like model whose interaction topology is given by $mathcal{G}$ synchronizes almost globally. The Kuramoto model evolves on the unit circle, ie the $ 1$-sphere $mathsf{S}^1$. This paper concerns generalizations of the Kuramoto-like model and the concept of synchronizing graphs on the Stiefel manifold $mathsf{St}(p,n)$. Previous work on state-space oscillators have largely been influenced by results and techniques that pertain to the $mathsf{S}^1$-case. It has recently been shown that all connected graphs are $mathsf{S}^n$-synchronizing for all $ngeq2$. The previous point of departure may thus have been overly conservative. The $n$-sphere is a special case of the Stiefel manifold, namely $mathsf{St}(1,n+1)$. As such, it is natural to ask for the extent to which the results on $mathsf{S}^{n}$ can be extended to the Stiefel manifold. This paper shows that all connected graphs are $mathsf{St}(p,n)$-synchronizing provided the pair $(p,n)$ satisfies $pleq tfrac{2n}{3}-1$.

التحسين والتحكم