On Riemannian Optimization over Positive Definite Matrices with the Bures-Wasserstein Geometry

Published by Andi Han

Publication date: 2021

Research field: Informatics Engineering

Paper language: English

Authors: Andi Han, Bamdev Mishra, Pratik Jawanpuria

Optimization and Control, Machine Learning

Abstract

In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry with the popular Affine-Invariant (AI) geometry for Riemannian optimization on the symmetric positive definite (SPD) matrix manifold. Our study begins with an observation that the BW metric has a linear dependence on SPD matrices in contrast to the quadratic dependence of the AI metric. We build on this to show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has a non-negative curvature, which further improves convergence rates of algorithms over the non-positively curved AI geometry. Finally, we verify that several popular cost functions, which are known to be geodesic convex under the AI geometry, are also geodesic convex under the BW geometry. Extensive experiments on various applications support our findings.
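The linear versus quadratic dependence mentioned in the abstract can be checked numerically. The sketch below assumes the standard forms of the two metrics at an SPD point X: the AI inner product tr(X^{-1} U X^{-1} V), and the BW inner product (1/2) tr(L V), where L solves the Lyapunov equation X L + L X = U. The function names, the random test matrices, and the scaling factor `c` are illustrative choices, not taken from the paper.

```python
import numpy as np

def ai_metric(X, U, V):
    """Affine-invariant inner product tr(X^{-1} U X^{-1} V) at an SPD point X."""
    Xinv_U = np.linalg.solve(X, U)
    Xinv_V = np.linalg.solve(X, V)
    return np.trace(Xinv_U @ Xinv_V)

def bw_metric(X, U, V):
    """Bures-Wasserstein inner product 0.5 * tr(L V), with X L + L X = U."""
    w, Q = np.linalg.eigh(X)
    # In the eigenbasis of X the Lyapunov equation decouples entrywise.
    U_tilde = Q.T @ U @ Q
    L_tilde = U_tilde / (w[:, None] + w[None, :])
    L = Q @ L_tilde @ Q.T
    return 0.5 * np.trace(L @ V)

# Hypothetical test data: a random SPD point and a random symmetric direction.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4.0 * np.eye(4)
S = rng.standard_normal((4, 4))
U = S + S.T

c = 100.0
# Scaling X by c shrinks the AI metric by c**2 but the BW metric only by c.
ai_ratio = ai_metric(X, U, U) / ai_metric(c * X, U, U)
bw_ratio = bw_metric(X, U, U) / bw_metric(c * X, U, U)
print(ai_ratio, bw_ratio)
```

Under these assumptions, `ai_ratio` comes out close to `c**2` and `bw_ratio` close to `c`, which is one way to see why the AI metric is more sensitive to the conditioning of X.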

F. Hiai, D. Petz, 2008

The Riemannian metric on the manifold of positive definite matrices is defined by a kernel function $\phi$ in the form $K_D^\phi(H,K)=\sum_{i,j}\phi(\lambda_i,\lambda_j)^{-1}\operatorname{Tr} P_i H P_j K$ when $\sum_i \lambda_i P_i$ is the spectral decomposition of the foot point $D$ and the Hermitian matrices $H,K$ are tangent vectors. For such kernel metrics the tangent space has an orthogonal decomposition. The pull-back of a kernel metric under a mapping $D\mapsto G(D)$ is a kernel metric as well. Several Riemannian geometries of the literature are particular cases, for example, the Fisher-Rao metric for multivariate Gaussian distributions and the quantum Fisher information. In the paper the case $\phi(x,y)=M(x,y)^\theta$ is mostly studied when $M(x,y)$ is a mean of the positive numbers $x$ and $y$. There are results about the geodesic curves and geodesic distances. The geometric mean, the logarithmic mean and the root mean are important cases.
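A kernel metric of this form is straightforward to evaluate once the foot point is diagonalized. The sketch below is restricted to real symmetric matrices (the abstract allows Hermitian ones) and uses rank-one eigenprojections, so that Tr P_i H P_j K becomes a product of two entries in the eigenbasis of D. The function name `kernel_metric` and the test matrices are hypothetical.

```python
import numpy as np

def kernel_metric(D, H, K, phi):
    """Evaluate sum_{i,j} phi(l_i, l_j)^{-1} Tr(P_i H P_j K) for real symmetric inputs.

    D is the SPD foot point, H and K are symmetric tangent vectors, and
    phi is a symmetric, positive, vectorized kernel function.
    """
    lam, Q = np.linalg.eigh(D)
    # With P_i = q_i q_i^T, Tr(P_i H P_j K) = (Q^T H Q)_{ij} * (Q^T K Q)_{ji}.
    H_tilde = Q.T @ H @ Q
    K_tilde = Q.T @ K @ Q
    weights = 1.0 / phi(lam[:, None], lam[None, :])
    return np.sum(weights * H_tilde * K_tilde.T)

# Hypothetical test data.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
D = A @ A.T + 3.0 * np.eye(3)
H = rng.standard_normal((3, 3)); H = H + H.T
K = rng.standard_normal((3, 3)); K = K + K.T

# phi(x, y) = 1 gives the Euclidean trace inner product Tr(H K).
euclid = kernel_metric(D, H, K, lambda x, y: np.ones_like(x * y))
# phi(x, y) = x * y gives Tr(D^{-1} H D^{-1} K).
inv_weighted = kernel_metric(D, H, K, lambda x, y: x * y)
print(euclid, inv_weighted)
```

The two kernels used here are only sanity checks: a constant kernel should reproduce Tr(HK), and the product kernel should reproduce Tr(D^{-1} H D^{-1} K).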

Mathematical Physics, Functional Analysis

Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent

Jason M. Altschuler, Sinho Chewi, Patrik Gerber, 2021

We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. This stands in stark contrast to the best-known theoretical results for Riemannian GD, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for Riemannian GD for these problems.
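For centered Gaussians, Riemannian GD on the barycenter objective has a compact matrix form. The sketch below assumes the commonly used update Σ ← G Σ G with G = (1 - η) I + η · mean_i T_i, where T_i = Σ^{-1/2} (Σ^{1/2} Σ_i Σ^{1/2})^{1/2} Σ^{-1/2} is the optimal transport map from N(0, Σ) to N(0, Σ_i). The initialization at the arithmetic mean, the step size, and the iteration count are illustrative assumptions, not the settings analyzed in the paper.

```python
import numpy as np

def spd_sqrt(A):
    """Symmetric square root of an SPD matrix via eigendecomposition."""
    w, Q = np.linalg.eigh((A + A.T) / 2.0)
    return (Q * np.sqrt(np.clip(w, 0.0, None))) @ Q.T

def bw_barycenter_gd(covs, step=1.0, iters=50):
    """Riemannian GD sketch for the Bures-Wasserstein barycenter of SPD matrices."""
    S = np.mean(covs, axis=0)  # assumed initialization: arithmetic mean
    d = S.shape[0]
    for _ in range(iters):
        root = spd_sqrt(S)
        root_inv = np.linalg.inv(root)
        # Average of the transport maps T_i from N(0, S) to N(0, C_i).
        T_mean = np.mean(
            [root_inv @ spd_sqrt(root @ C @ root) @ root_inv for C in covs],
            axis=0,
        )
        G = (1.0 - step) * np.eye(d) + step * T_mean
        S = G @ S @ G
        S = (S + S.T) / 2.0  # remove numerical asymmetry
    return S

# Commuting (diagonal) inputs, where the barycenter has a closed form:
# each diagonal entry is the squared mean of the square roots.
covs = [np.diag([1.0, 4.0]), np.diag([9.0, 16.0])]
bary = bw_barycenter_gd(covs)
print(bary)
```

In the diagonal example the closed form gives diag(4, 9), which offers a quick check that the update is wired correctly before trying non-commuting covariances.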

Optimization and Control, Machine Learning

Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold

Bokun Wang, Shiqian Ma, Lingzhou Xue, 2020

Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth functions over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost from the Euclidean setting to the Riemannian setting. Analysis of the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an $\epsilon$-stationary point with $\mathcal{O}(\epsilon^{-3})$ IFOs in the online case, and $\mathcal{O}(n+\sqrt{n}\epsilon^{-3})$ IFOs in the finite-sum case with $n$ being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information.

Optimization and Control, Machine Learning

Projection-free nonconvex stochastic optimization on Riemannian manifolds

Melanie Weber, Suvrit Sra, 2019

We study stochastic projection-free methods for constrained optimization of smooth functions on Riemannian manifolds, i.e., with additional constraints beyond the parameter domain being a manifold. Specifically, we introduce stochastic Riemannian Frank-Wolfe methods for nonconvex and geodesically convex problems. We present algorithms for both purely stochastic optimization and finite-sum problems. For the latter, we develop variance-reduced methods, including a Riemannian adaptation of the recently proposed Spider technique. For all settings, we recover convergence rates that are comparable to the best-known rates for their Euclidean counterparts. Finally, we discuss applications to two classic tasks: the computation of the Karcher mean of positive definite matrices and Wasserstein barycenters for multivariate normal distributions. For both tasks, stochastic FW methods yield state-of-the-art empirical performance.

Optimization and Control, Machine Learning

Wasserstein Distributionally Robust Inverse Multiobjective Optimization

Chaosheng Dong, Bo Zeng, 2020

Inverse multiobjective optimization provides a general framework for the unsupervised learning task of inferring parameters of a multiobjective decision making problem (DMP), based on a set of observed decisions from the human expert. However, the performance of this framework relies critically on the availability of an accurate DMP, sufficient decisions of high quality, and a parameter space that contains enough information about the DMP. To hedge against the uncertainties in the hypothetical DMP, the data, and the parameter space, we investigate in this paper the distributionally robust approach for inverse multiobjective optimization. Specifically, we leverage the Wasserstein metric to construct a ball centered at the empirical distribution of these decisions. We then formulate a Wasserstein distributionally robust inverse multiobjective optimization problem (WRO-IMOP) that minimizes a worst-case expected loss function, where the worst case is taken over all distributions in the Wasserstein ball. We show that the excess risk of the WRO-IMOP estimator has a sub-linear convergence rate. Furthermore, we propose the semi-infinite reformulations of the WRO-IMOP and develop a cutting-plane algorithm that converges to an approximate solution in finite iterations. Finally, we demonstrate the effectiveness of our method on both a synthetic multiobjective quadratic program and a real-world portfolio optimization problem.

Optimization and Control, Machine Learning