Do you want to publish a course? Click here

A Fast Proximal Point Method for Computing Exact Wasserstein Distance

84   0   0.0 ( 0 )
 Added by Yujia Xie
 Publication date 2018
and research's language is English




Ask ChatGPT about the research

Wasserstein distance plays increasingly important roles in machine learning, stochastic programming and image processing. Major efforts have been under way to address its high computational complexity, some leading to approximate or regularized variations such as Sinkhorn distance. However, as we will demonstrate, regularized variations with large regularization parameter will degradate the performance in several important machine learning applications, and small regularization parameter will fail due to numerical stability issues with existing algorithms. We address this challenge by developing an Inexact Proximal point method for exact Optimal Transport problem (IPOT) with the proximal operator approximately evaluated at each iteration using projections to the probability simplex. The algorithm (a) converges to exact Wasserstein distance with theoretical guarantee and robust regularization parameter selection, (b) alleviates numerical stability issue, (c) has similar computational complexity to Sinkhorn, and (d) avoids the shrinking problem when apply to generative models. Furthermore, a new algorithm is proposed based on IPOT to obtain sharper Wasserstein barycenter.



rate research

Read More

In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective function and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $rho$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $mathcal{O}(1/sqrt{T})$ where the approximation error depends linearly on $sqrt{rho}$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $mathcal{O}(1/sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a general adversarial network (GAN) in a decentralized manner match our theoretical guarantees.
78 - Laurence Aitchison 2021
Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DKPs that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. These are many times faster the usual gradient descent approach. We generalise to arbitrary DKP architectures, by developing kernel backprop, and algorithms for kernel autodiff. While these methods currently are not Bayesian as they give point estimates and scale poorly as they are cubic in the number of datapoints, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators.
Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimization problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experiment results are reported to demonstrate the advantages of our algorithm.
Distance-based tests, also called energy statistics, are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from kernel mean embeddings, are leading methods for two-sample and independence tests from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods for the population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-value, and better preserves the data structure upon transformation. Our results further advance the understanding in distance and kernel-based tests, streamline the code base for implementing these tests, and enable a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.
Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering and it improves the interpretability of the model. This paper considers a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

suggested questions

comments
Fetching comments Fetching comments
Sign in to be able to follow your search criteria
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا