The Riemannian metric on the manifold of positive definite matrices is defined by a kernel function $\phi$ in the form $K_D^\phi(H,K)=\sum_{i,j}\phi(\lambda_i,\lambda_j)^{-1}\operatorname{Tr} P_iHP_jK$, where $\sum_i\lambda_iP_i$ is the spectral decomposition of the foot point $D$ and the Hermitian matrices $H,K$ are tangent vectors. For such kernel metrics the tangent space has an orthogonal decomposition. The pull-back of a kernel metric under a mapping $D\mapsto G(D)$ is again a kernel metric. Several Riemannian geometries from the literature arise as particular cases, for example the Fisher-Rao metric for multivariate Gaussian distributions and the quantum Fisher information. The paper mostly studies the case $\phi(x,y)=M(x,y)^\theta$, where $M(x,y)$ is a mean of the positive numbers $x$ and $y$. There are results about geodesic curves and geodesic distances. The geometric mean, the logarithmic mean and the root mean are important cases.
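As a minimal illustration of the kernel-metric formula above (an addition for illustration, not part of the paper), the following NumPy sketch evaluates $K_D^\phi(H,K)$ from the spectral decomposition of the foot point; the kernel $\phi(x,y)=\sqrt{xy}$ used in the example is just one admissible choice, corresponding to the geometric mean.

```python
import numpy as np

def kernel_metric(D, H, K, phi):
    """K_D^phi(H, K) = sum_{i,j} phi(lam_i, lam_j)^(-1) Tr(P_i H P_j K)
    for a positive definite foot point D and Hermitian tangent vectors H, K."""
    lam, V = np.linalg.eigh(D)             # spectral decomposition D = sum_i lam_i P_i
    # With P_i = v_i v_i^*, Tr(P_i H P_j K) = (v_i^* H v_j)(v_j^* K v_i)
    Ht = V.conj().T @ H @ V
    Kt = V.conj().T @ K @ V
    Phi = phi(lam[:, None], lam[None, :])  # matrix of phi(lam_i, lam_j)
    return np.sum((Ht * Kt.T) / Phi).real

# Example kernel: the geometric mean phi(x, y) = sqrt(x*y).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
D = A @ A.T + 4 * np.eye(4)                # positive definite foot point
H = rng.standard_normal((4, 4)); H = (H + H.T) / 2
K = rng.standard_normal((4, 4)); K = (K + K.T) / 2
print(kernel_metric(D, H, K, phi=lambda x, y: np.sqrt(x * y)))
```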
We study metric properties of symmetric divergences on Hermitian positive definite matrices. In particular, we prove that the square root of these divergences is a distance metric. As a corollary we obtain a proof of the metric property for Quantum Jensen-Shannon-(Tsallis) divergences (parameterized by $\alpha\in[0,2]$), which in turn (for $\alpha=1$) yields a proof of the metric property of the Quantum Jensen-Shannon divergence that was conjectured by Lamberti \emph{et al.} a decade ago (\emph{Metric character of the quantum Jensen-Shannon divergence}, Phys. Rev. A, \textbf{79} (2008)). A somewhat more intricate argument also establishes metric properties of Jensen-Rényi divergences (for $\alpha\in(0,1)$), and outlines a technique that may be of independent interest.
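For concreteness, here is a small sketch (assuming the standard definition of the quantum Jensen-Shannon divergence for density matrices, $\mathrm{QJS}(\rho,\sigma)=S(\tfrac{\rho+\sigma}{2})-\tfrac{1}{2}(S(\rho)+S(\sigma))$ with $S$ the von Neumann entropy) of the quantity whose square root the result above certifies as a distance.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho) in nats, for a density matrix rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                  # drop (numerically) zero eigenvalues, 0*log 0 = 0
    return -np.sum(w * np.log(w))

def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence QJS(rho, sigma)."""
    return von_neumann_entropy((rho + sigma) / 2) \
        - (von_neumann_entropy(rho) + von_neumann_entropy(sigma)) / 2

def qjs_distance(rho, sigma):
    """Square root of the QJSD; the metric property above says this obeys the triangle inequality."""
    return np.sqrt(max(qjsd(rho, sigma), 0.0))

# Numerical check of the triangle inequality on random density matrices.
rng = np.random.default_rng(1)
def random_density(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    R = A @ A.conj().T
    return R / np.trace(R).real

rho, sigma, tau = random_density(3), random_density(3), random_density(3)
print(qjs_distance(rho, tau) <= qjs_distance(rho, sigma) + qjs_distance(sigma, tau))  # True
```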
In this paper, we comparatively analyze the Bures-Wasserstein (BW) geometry with the popular Affine-Invariant (AI) geometry for Riemannian optimization on the symmetric positive definite (SPD) matrix manifold. Our study begins with the observation that the BW metric has a linear dependence on SPD matrices, in contrast to the quadratic dependence of the AI metric. We build on this to show that the BW metric is a more suitable and robust choice for several Riemannian optimization problems over ill-conditioned SPD matrices. We show that the BW geometry has non-negative curvature, which further improves convergence rates of algorithms compared with the non-positively curved AI geometry. Finally, we verify that several popular cost functions, which are known to be geodesically convex under the AI geometry, are also geodesically convex under the BW geometry. Extensive experiments on various applications support our findings.
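As a reference point for the comparison above, the following sketch computes both distances from their standard closed forms, $d_{\mathrm{BW}}(A,B)^2=\operatorname{Tr}A+\operatorname{Tr}B-2\operatorname{Tr}(A^{1/2}BA^{1/2})^{1/2}$ and $d_{\mathrm{AI}}(A,B)=\|\log(A^{-1/2}BA^{-1/2})\|_F$; it is an illustrative implementation, not the optimization machinery of the paper.

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def bw_distance(A, B):
    """Bures-Wasserstein distance between SPD matrices A and B."""
    s = sqrtm(A)
    d2 = np.trace(A) + np.trace(B) - 2 * np.trace(sqrtm(s @ B @ s))
    return np.sqrt(max(np.real(d2), 0.0))

def ai_distance(A, B):
    """Affine-invariant distance: Frobenius norm of log(A^{-1/2} B A^{-1/2})."""
    s_inv = np.linalg.inv(sqrtm(A))
    return np.linalg.norm(logm(s_inv @ B @ s_inv), 'fro')
```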
Symmetric Positive Definite (SPD) matrices are ubiquitous in data analysis in the form of covariance or correlation matrices. Several O(n)-invariant Riemannian metrics have been defined on the SPD cone, in particular the kernel metrics introduced by Hiai and Petz. The class of kernel metrics interpolates between many classical O(n)-invariant metrics and satisfies key results of stability and completeness. However, it does not contain all the classical O(n)-invariant metrics. Therefore, in this work, we investigate super-classes of kernel metrics and study which key results remain true. We also introduce an additional key result called cometric-stability, a crucial property for implementing geodesics with a Hamiltonian formulation. Our method for building intermediate embedded classes between O(n)-invariant metrics and kernel metrics is to characterize the whole class of O(n)-invariant metrics on SPD matrices and to specify requirements on the metrics one by one until we reach kernel metrics. As a secondary contribution, we synthesize the literature on the main O(n)-invariant metrics, and we provide the complete formula of the sectional curvature of the affine-invariant metric and the formula of the geodesic parallel transport between commuting matrices for the Bures-Wasserstein metric.
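To make the invariance notion concrete: a distance $d$ on SPD matrices is O(n)-invariant when $d(RAR^\top, RBR^\top)=d(A,B)$ for every orthogonal $R$. The sketch below (an illustration under that definition, using the affine-invariant distance as one example member of the class) checks this numerically.

```python
import numpy as np
from scipy.linalg import sqrtm, logm
from scipy.stats import ortho_group

def ai_distance(A, B):
    """Affine-invariant distance on SPD matrices."""
    s_inv = np.linalg.inv(sqrtm(A))
    return np.linalg.norm(logm(s_inv @ B @ s_inv), 'fro')

def random_spd(n, rng):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(2)
A, B = random_spd(5, rng), random_spd(5, rng)
R = ortho_group.rvs(5, random_state=2)      # random orthogonal matrix
# O(n)-invariance: congruence by an orthogonal matrix leaves the distance unchanged.
print(abs(ai_distance(R @ A @ R.T, R @ B @ R.T) - ai_distance(A, B)))  # ~0 up to roundoff
```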
We study a Riemannian metric on the cone of symmetric positive-definite matrices obtained from the Hessian of the power potential function $(1-\det(X)^\beta)/\beta$. We give explicit expressions for the geodesics and distance function, under suitable conditions. In the scalar case, the geodesic between two positive numbers coincides with a weighted power mean, while for matrices of size at least two it yields a notion of weighted power mean different from the ones given in the literature. As $\beta$ tends to zero, the power potential converges to the logarithmic potential, that yields a well-known metric associated with the matrix geometric mean; we show that the geodesic and the distance associated with the power potential converge to the weighted matrix geometric mean and the distance associated with the logarithmic potential, respectively.
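For reference on the $\beta\to 0$ limit mentioned above, the sketch below computes the weighted matrix geometric mean $A\,\#_t\,B=A^{1/2}(A^{-1/2}BA^{-1/2})^tA^{1/2}$, the geodesic of the well-known metric associated with the logarithmic potential; this is standard material, not the power-potential geodesic derived in the paper.

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def weighted_geometric_mean(A, B, t):
    """Weighted matrix geometric mean A #_t B = A^(1/2) (A^(-1/2) B A^(-1/2))^t A^(1/2),
    i.e. the geodesic from A (t=0) to B (t=1) for the metric of the logarithmic potential."""
    s = sqrtm(A)
    s_inv = np.linalg.inv(s)
    return s @ fractional_matrix_power(s_inv @ B @ s_inv, t) @ s

# Scalar sanity check: for 1x1 matrices this is the weighted geometric mean a^(1-t) * b^t.
a, b = np.array([[2.0]]), np.array([[8.0]])
print(weighted_geometric_mean(a, b, 0.5))   # [[4.]]
```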
Representations in the form of Symmetric Positive Definite (SPD) matrices have been popularized in a variety of visual learning applications due to their demonstrated ability to capture rich second-order statistics of visual data. There exist several similarity measures for comparing SPD matrices with documented benefits. However, selecting an appropriate measure for a given problem remains a challenge and, in most cases, is the result of a trial-and-error process. In this paper, we propose to learn similarity measures in a data-driven manner. To this end, we capitalize on the $\alpha\beta$-log-det divergence, which is a meta-divergence parametrized by scalars $\alpha$ and $\beta$, subsuming a wide family of popular information divergences on SPD matrices for distinct and discrete values of these parameters. Our key idea is to cast these parameters in a continuum and learn them from data. We systematically extend this idea to learn vector-valued parameters, thereby increasing the expressiveness of the underlying non-linear measure. We conjoin the divergence learning problem with several standard tasks in machine learning, including supervised discriminative dictionary learning and unsupervised SPD matrix clustering. We present Riemannian gradient descent schemes for optimizing our formulations efficiently, and show the usefulness of our method on eight standard computer vision tasks.
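The general $\alpha\beta$-log-det meta-divergence is not reproduced here; as one widely used member of this family (up to parameter choice and scaling), the sketch below computes the symmetric Stein (Jensen-Bregman log-det) divergence $S(A,B)=\log\det\tfrac{A+B}{2}-\tfrac{1}{2}\log\det(AB)$, which gives a flavour of the log-det measures that the learning framework interpolates over.

```python
import numpy as np

def stein_divergence(A, B):
    """Symmetric Stein / Jensen-Bregman log-det divergence:
    S(A, B) = log det((A + B) / 2) - (1/2) log det(A B)."""
    # slogdet is numerically safer than log(det(.)) for ill-conditioned SPD matrices.
    _, ld_mid = np.linalg.slogdet((A + B) / 2)
    _, ld_a = np.linalg.slogdet(A)
    _, ld_b = np.linalg.slogdet(B)
    return ld_mid - 0.5 * (ld_a + ld_b)
```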