ترغب بنشر مسار تعليمي؟ اضغط هنا

Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform

72   0   0.0 ( 0 )
 نشر من قبل Jun Li
 تاريخ النشر 2020
والبحث باللغة English




اسأل ChatGPT حول البحث

Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.



قيم البحث

اقرأ أيضاً

The symplectic Stiefel manifold, denoted by $mathrm{Sp}(2p,2n)$, is the set of linear symplectic maps between the standard symplectic spaces $mathbb{R}^{2p}$ and $mathbb{R}^{2n}$. When $p=n$, it reduces to the well-known set of $2ntimes 2n$ symplecti c matrices. Optimization problems on $mathrm{Sp}(2p,2n)$ find applications in various areas, such as optics, quantum physics, numerical linear algebra and model order reduction of dynamical systems. The purpose of this paper is to propose and analyze gradient-descent methods on $mathrm{Sp}(2p,2n)$, where the notion of gradient stems from a Riemannian metric. We consider a novel Riemannian metric on $mathrm{Sp}(2p,2n)$ akin to the canonical metric of the (standard) Stiefel manifold. In order to perform a feasible step along the antigradient, we develop two types of search strategies: one is based on quasi-geodesic curves, and the other one on the symplectic Cayley transform. The resulting optimization algorithms are proved to converge globally to critical points of the objective function. Numerical experiments illustrate the efficiency of the proposed methods.
Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $epsilon$-stationary point with $mathcal{O}(epsilon^{-3})$ IFOs in the online case, and $mathcal{O}(n+sqrt{n}epsilon^{-3})$ IFOs in the finite-sum case with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that uses Riemannian subgradient information.
Recent work has highlighted several advantages of enforcing orthogonality in the weight layers of deep networks, such as maintaining the stability of activations, preserving gradient norms, and enhancing adversarial robustness by enforcing low Lipsch itz constants. Although numerous methods exist for enforcing the orthogonality of fully-connected layers, those for convolutional layers are more heuristic in nature, often focusing on penalty methods or limited classes of convolutions. In this work, we propose and evaluate an alternative approach to directly parameterize convolutional layers that are constrained to be orthogonal. Specifically, we propose to apply the Cayley transform to a skew-symmetric convolution in the Fourier domain, so that the inverse convolution needed by the Cayley transform can be computed efficiently. We compare our method to previous Lipschitz-constrained and orthogonal convolutional layers and show that it indeed preserves orthogonality to a high degree even for large convolutions. Applied to the problem of certified adversarial robustness, we show that networks incorporating the layer outperform existing deterministic methods for certified defense against $ell_2$-norm-bounded adversaries, while scaling to larger architectures than previously investigated. Code is available at https://github.com/locuslab/orthogonal-convolutions.
171 - Tingran Gao , Lek-Heng Lim , Ke Ye 2018
We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a se mi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.
A graph $mathcal{G}$ is referred to as $mathsf{S}^1$-synchronizing if, roughly speaking, the Kuramoto-like model whose interaction topology is given by $mathcal{G}$ synchronizes almost globally. The Kuramoto model evolves on the unit circle, ie the $ 1$-sphere $mathsf{S}^1$. This paper concerns generalizations of the Kuramoto-like model and the concept of synchronizing graphs on the Stiefel manifold $mathsf{St}(p,n)$. Previous work on state-space oscillators have largely been influenced by results and techniques that pertain to the $mathsf{S}^1$-case. It has recently been shown that all connected graphs are $mathsf{S}^n$-synchronizing for all $ngeq2$. The previous point of departure may thus have been overly conservative. The $n$-sphere is a special case of the Stiefel manifold, namely $mathsf{St}(1,n+1)$. As such, it is natural to ask for the extent to which the results on $mathsf{S}^{n}$ can be extended to the Stiefel manifold. This paper shows that all connected graphs are $mathsf{St}(p,n)$-synchronizing provided the pair $(p,n)$ satisfies $pleq tfrac{2n}{3}-1$.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا