No Arabic abstract
Many applications in computational science require computing the elements of a function of a large matrix. A commonly used approach is based on the the evaluation of the eigenvalue decomposition, a task that, in general, involves a computing time that scales with the cube of the size of the matrix. We present here a method that can be used to evaluate the elements of a function of a positive-definite matrix with a scaling that is linear for sparse matrices and quadratic in the general case. This methodology is based on the properties of the dynamics of a multidimensional harmonic potential coupled with colored noise generalized Langevin equation (GLE) thermostats. This $f-$thermostat (FTH) approach allows us to calculate directly elements of functions of a positive-definite matrix by carefully tailoring the properties of the stochastic dynamics. We demonstrate the scaling and the accuracy of this approach for both dense and sparse problems and compare the results with other established methodologies.
We show that a recently introduced stochastic thermostat [J. Chem. Phys. 126 (2007) 014101] can be considered as a global version of the Langevin thermostat. We compare the global scheme and the local one (Langevin) from a formal point of view and through practical calculations on a model Lennard-Jones liquid. At variance with the local scheme, the global thermostat preserves the dynamical properties for a wide range of coupling parameters, and allows for a faster sampling of the phase-space.
The Riemannian metric on the manifold of positive definite matrices is defined by a kernel function $phi$ in the form $K_D^phi(H,K)=sum_{i,j}phi(lambda_i,lambda_j)^{-1} Tr P_iHP_jK$ when $sum_ilambda_iP_i$ is the spectral decomposition of the foot point $D$ and the Hermitian matrices $H,K$ are tangent vectors. For such kernel metrics the tangent space has an orthogonal decomposition. The pull-back of a kernel metric under a mapping $Dmapsto G(D)$ is a kernel metric as well. Several Riemannian geometries of the literature are particular cases, for example, the Fisher-Rao metric for multivariate Gaussian distributions and the quantum Fisher information. In the paper the case $phi(x,y)=M(x,y)^theta$ is mostly studied when $M(x,y)$ is a mean of the positive numbers $x$ and $y$. There are results about the geodesic curves and geodesic distances. The geometric mean, the logarithmic mean and the root mean are important cases.
With view to applications in stochastic analysis and geometry, we introduce a new correspondence for positive definite kernels (p.d.) $K$ and their associated reproducing kernel Hilbert spaces. With this we establish two kinds of factorizations: (i) Probabilistic: Starting with a positive definite kernel $K$ we analyze associated Gaussian processes $V$. Properties of the Gaussian processes will be derived from certain factorizations of $K$, arising as a covariance kernel of $V$. (ii) Geometric analysis: We discuss families of measure spaces arising as boundaries for $K$. Our results entail an analysis of a partial order on families of p.d. kernels, a duality for operators and frames, optimization, Karhunen--Lo`eve expansions, and factorizations. Applications include a new boundary analysis for the Drury-Arveson kernel, and for certain fractals arising as iterated function systems; and an identification of optimal feature spaces in machine learning models.
Algorithms for simulating complex physical systems or solving difficult optimization problems often resort to an annealing process. Rather than simulating the system at the temperature of interest, an annealing algorithm starts at a temperature that is high enough to ensure ergodicity and gradually decreases it until the destination temperature is reached. This idea is used in popular algorithms such as parallel tempering and simulated annealing. A general problem with annealing methods is that they require a temperature schedule. Choosing well-balanced temperature schedules can be tedious and time-consuming. Imbalanced schedules can have a negative impact on the convergence, runtime and success of annealing algorithms. This article outlines a unifying framework, ensemble annealing, that combines ideas from simulated annealing, histogram reweighting and nested sampling with concepts in thermodynamic control. Ensemble annealing simultaneously simulates a physical system and estimates its density of states. The temperatures are lowered not according to a prefixed schedule but adaptively so as to maintain a constant relative entropy between successive ensembles. After each step on the temperature ladder an estimate of the density of states is updated and a new temperature is chosen. Ensemble annealing is highly practical and broadly applicable. This is illustrated for various systems including Ising, Potts, and protein models.
Representations in the form of Symmetric Positive Definite (SPD) matrices have been popularized in a variety of visual learning applications due to their demonstrated ability to capture rich second-order statistics of visual data. There exist several similarity measures for comparing SPD matrices with documented benefits. However, selecting an appropriate measure for a given problem remains a challenge and in most cases, is the result of a trial-and-error process. In this paper, we propose to learn similarity measures in a data-driven manner. To this end, we capitalize on the alphabeta-log-det divergence, which is a meta-divergence parametrized by scalars alpha and beta, subsuming a wide family of popular information divergences on SPD matrices for distinct and discrete values of these parameters. Our key idea is to cast these parameters in a continuum and learn them from data. We systematically extend this idea to learn vector-valued parameters, thereby increasing the expressiveness of the underlying non-linear measure. We conjoin the divergence learning problem with several standard tasks in machine learning, including supervised discriminative dictionary learning and unsupervised SPD matrix clustering. We present Riemannian gradient descent schemes for optimizing our formulations efficiently, and show the usefulness of our method on eight standard computer vision tasks.