A Fast Proximal Point Method for Computing Exact Wasserstein Distance

84 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yujia Xie

تاريخ النشر 2018

مجال البحث الاحصاء الرياضي الهندسة المعلوماتية

والبحث باللغة English

تأليف Yujia Xie - Xiangfeng Wang - Ruijia Wang

التعلم الالي التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Wasserstein distance plays increasingly important roles in machine learning, stochastic programming and image processing. Major efforts have been under way to address its high computational complexity, some leading to approximate or regularized variations such as Sinkhorn distance. However, as we will demonstrate, regularized variations with large regularization parameter will degradate the performance in several important machine learning applications, and small regularization parameter will fail due to numerical stability issues with existing algorithms. We address this challenge by developing an Inexact Proximal point method for exact Optimal Transport problem (IPOT) with the proximal operator approximately evaluated at each iteration using projections to the probability simplex. The algorithm (a) converges to exact Wasserstein distance with theoretical guarantee and robust regularization parameter selection, (b) alleviates numerical stability issue, (c) has similar computational complexity to Sinkhorn, and (d) avoids the shrinking problem when apply to generative models. Furthermore, a new algorithm is proposed based on IPOT to obtain sharper Wasserstein barycenter.

قيم البحث

104 - Weijie Liu , Aryan Mokhtari , Asuman Ozdaglar 2019

In this paper, we focus on solving a class of constrained non-convex non-concave saddle point problems in a decentralized manner by a group of nodes in a network. Specifically, we assume that each node has access to a summand of a global objective fu nction and nodes are allowed to exchange information only with their neighboring nodes. We propose a decentralized variant of the proximal point method for solving this problem. We show that when the objective function is $rho$-weakly convex-weakly concave the iterates converge to approximate stationarity with a rate of $mathcal{O}(1/sqrt{T})$ where the approximation error depends linearly on $sqrt{rho}$. We further show that when the objective function satisfies the Minty VI condition (which generalizes the convex-concave case) we obtain convergence to stationarity with a rate of $mathcal{O}(1/sqrt{T})$. To the best of our knowledge, our proposed method is the first decentralized algorithm with theoretical guarantees for solving a non-convex non-concave decentralized saddle point problem. Our numerical results for training a general adversarial network (GAN) in a decentralized manner match our theoretical guarantees.

التحسين والتحكم التعلم الآلي التعلم الالي

A fast point solver for deep nonlinear function approximators

78 - Laurence Aitchison 2021

Deep kernel processes (DKPs) generalise Bayesian neural networks, but do not require us to represent either features or weights. Instead, at each hidden layer they represent and optimize a flexible kernel. Here, we develop a Newton-like method for DK Ps that converges in around 10 steps, exploiting matrix solvers initially developed in the control theory literature. These are many times faster the usual gradient descent approach. We generalise to arbitrary DKP architectures, by developing kernel backprop, and algorithms for kernel autodiff. While these methods currently are not Bayesian as they give point estimates and scale poorly as they are cubic in the number of datapoints, we hope they will form the basis of a new class of much more efficient approaches to optimizing deep nonlinear function approximators.

التعلم الالي التعلم الآلي

An Alternating Manifold Proximal Gradient Method for Sparse PCA and Sparse CCA

123 - Shixiang Chen , Shiqian Ma , Lingzhou Xue 2019

Sparse principal component analysis (PCA) and sparse canonical correlation analysis (CCA) are two essential techniques from high-dimensional statistics and machine learning for analyzing large-scale data. Both problems can be formulated as an optimiz ation problem with nonsmooth objective and nonconvex constraints. Since non-smoothness and nonconvexity bring numerical difficulties, most algorithms suggested in the literature either solve some relaxations or are heuristic and lack convergence guarantees. In this paper, we propose a new alternating manifold proximal gradient method to solve these two high-dimensional problems and provide a unified convergence analysis. Numerical experiment results are reported to demonstrate the advantages of our algorithm.

التعلم الالي التعلم الآلي التحسين والتحكم

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

229 - Cencheng Shen , Joshua T. Vogelstein 2018

Distance-based tests, also called energy statistics, are leading methods for two-sample and independence tests from the statistics community. Kernel-based tests, developed from kernel mean embeddings, are leading methods for two-sample and independen ce tests from the machine learning community. A fixed-point transformation was previously proposed to connect the distance methods and kernel methods for the population statistics. In this paper, we propose a new bijective transformation between metrics and kernels. It simplifies the fixed-point transformation, inherits similar theoretical properties, allows distance methods to be exactly the same as kernel methods for sample statistics and p-value, and better preserves the data structure upon transformation. Our results further advance the understanding in distance and kernel-based tests, streamline the code base for implementing these tests, and enable a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.

التعلم الالي التعلم الآلي

A Manifold Proximal Linear Method for Sparse Spectral Clustering with Application to Single-Cell RNA Sequencing Data Analysis

104 - Zhongruo Wang , Bingyuan Liu , Shixiang Chen 2020

Spectral clustering is one of the fundamental unsupervised learning methods widely used in data analysis. Sparse spectral clustering (SSC) imposes sparsity to the spectral clustering and it improves the interpretability of the model. This paper consi ders a widely adopted model for SSC, which can be formulated as an optimization problem over the Stiefel manifold with nonsmooth and nonconvex objective. Such an optimization problem is very challenging to solve. Existing methods usually solve its convex relaxation or need to smooth its nonsmooth part using certain smoothing techniques. In this paper, we propose a manifold proximal linear method (ManPL) that solves the original SSC formulation. We also extend the algorithm to solve the multiple-kernel SSC problems, for which an alternating ManPL algorithm is proposed. Convergence and iteration complexity results of the proposed methods are established. We demonstrate the advantage of our proposed methods over existing methods via the single-cell RNA sequencing data analysis.

التعلم الالي التعلم الآلي التحسين والتحكم