No Arabic abstract
The symplectic Stiefel manifold, denoted by $mathrm{Sp}(2p,2n)$, is the set of linear symplectic maps between the standard symplectic spaces $mathbb{R}^{2p}$ and $mathbb{R}^{2n}$. When $p=n$, it reduces to the well-known set of $2ntimes 2n$ symplectic matrices. Optimization problems on $mathrm{Sp}(2p,2n)$ find applications in various areas, such as optics, quantum physics, numerical linear algebra and model order reduction of dynamical systems. The purpose of this paper is to propose and analyze gradient-descent methods on $mathrm{Sp}(2p,2n)$, where the notion of gradient stems from a Riemannian metric. We consider a novel Riemannian metric on $mathrm{Sp}(2p,2n)$ akin to the canonical metric of the (standard) Stiefel manifold. In order to perform a feasible step along the antigradient, we develop two types of search strategies: one is based on quasi-geodesic curves, and the other one on the symplectic Cayley transform. The resulting optimization algorithms are proved to converge globally to critical points of the objective function. Numerical experiments illustrate the efficiency of the proposed methods.
Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $epsilon$-stationary point with $mathcal{O}(epsilon^{-3})$ IFOs in the online case, and $mathcal{O}(n+sqrt{n}epsilon^{-3})$ IFOs in the finite-sum case with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that uses Riemannian subgradient information.
Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b) Achieve faster convergence rates than the baseline SGD and ADAM algorithms without compromising the performance of the CNN. Cayley SGD and Cayley ADAM are also shown to reduce the training time for optimizing the unitary transition matrices in RNNs.
We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a semi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.
A graph $mathcal{G}$ is referred to as $mathsf{S}^1$-synchronizing if, roughly speaking, the Kuramoto-like model whose interaction topology is given by $mathcal{G}$ synchronizes almost globally. The Kuramoto model evolves on the unit circle, ie the $1$-sphere $mathsf{S}^1$. This paper concerns generalizations of the Kuramoto-like model and the concept of synchronizing graphs on the Stiefel manifold $mathsf{St}(p,n)$. Previous work on state-space oscillators have largely been influenced by results and techniques that pertain to the $mathsf{S}^1$-case. It has recently been shown that all connected graphs are $mathsf{S}^n$-synchronizing for all $ngeq2$. The previous point of departure may thus have been overly conservative. The $n$-sphere is a special case of the Stiefel manifold, namely $mathsf{St}(1,n+1)$. As such, it is natural to ask for the extent to which the results on $mathsf{S}^{n}$ can be extended to the Stiefel manifold. This paper shows that all connected graphs are $mathsf{St}(p,n)$-synchronizing provided the pair $(p,n)$ satisfies $pleq tfrac{2n}{3}-1$.
We address the problem of computing the smallest symplectic eigenvalues and the corresponding eigenvectors of symmetric positive-definite matrices in the sense of Williamsons theorem. It is formulated as minimizing a trace cost function over the symplectic Stiefel manifold. We first investigate various theoretical aspects of this optimization problem such as characterizing the sets of critical points, saddle points, and global minimizers as well as proving that non-global local minimizers do not exist. Based on our recent results on constructing Riemannian structures on the symplectic Stiefel manifold and the associated optimization algorithms, we then propose solving the symplectic eigenvalue problem in the framework of Riemannian optimization. Moreover, a connection of the sought solution with the eigenvalues of a special class of Hamiltonian matrices is discussed. Numerical examples are presented.