Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples

Added by: Yixing Zhang

Publication date: 2021

Fields: Informatics Engineering, Mathematical Statistics

Research language: English

Authors: Yixing Zhang, Xiuyuan Cheng, Galen Reeves

Machine Learning

Abstract

The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a finite moment of order greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space. A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.
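For intuition: the Gaussian-smoothed $p$-Wasserstein distance between $P$ and $Q$ is $W_p(P * \mathcal{N}_\sigma, Q * \mathcal{N}_\sigma)$, the ordinary $p$-Wasserstein distance after both measures are convolved with Gaussian noise of scale $\sigma$. The Python sketch below is a minimal one-dimensional illustration under stated assumptions (equal sample sizes, $p \ge 1$). It approximates the smoothed empirical distance by Monte Carlo, replicating each sample and adding noise, and relies on the fact that in 1-D the sorted matching is optimal for the cost $|x-y|^p$. The function names and the `n_noise` parameter are hypothetical. This is not the authors' estimator, and it does not implement their MMD-based bound.

```python
import random


def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance (p >= 1) between two 1-D empirical measures of equal size.

    In one dimension the monotone (sorted) matching is optimal for |x - y|^p,
    so W_p^p is the mean of |x_(i) - y_(i)|^p over the order statistics.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    mean_cost = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return mean_cost ** (1.0 / p)


def smoothed_wasserstein_1d(xs, ys, sigma, p=1.0, n_noise=200, seed=0):
    """Monte Carlo approximation of W_p(P_n * N(0, sigma^2), Q_n * N(0, sigma^2)).

    Each empirical measure is convolved with Gaussian noise by replicating every
    sample n_noise times and adding independent N(0, sigma^2) perturbations.
    """
    rng = random.Random(seed)
    noisy_x = [x + rng.gauss(0.0, sigma) for x in xs for _ in range(n_noise)]
    noisy_y = [y + rng.gauss(0.0, sigma) for y in ys for _ in range(n_noise)]
    return wasserstein_1d(noisy_x, noisy_y, p)


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
    ys = [rng.gauss(0.5, 1.0) for _ in range(50)]
    for sigma in (0.0, 0.5, 1.0):
        print(sigma, smoothed_wasserstein_1d(xs, ys, sigma))
```

Setting `sigma=0.0` recovers the unsmoothed empirical distance, since no noise is then added to the replicated samples.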

ID3 Learns Juntas for Smoothed Product Distributions

Alon Brutzkus, Amit Daniely, Eran Malach (2019)

In recent years, there have been many attempts to understand popular heuristics. An example of such a heuristic algorithm is the ID3 algorithm for learning decision trees. This algorithm is commonly used in practice, but there are very few theoretical works studying its behavior. In this paper, we analyze the ID3 algorithm when the target function is a $k$-Junta, a function that depends on $k$ out of $n$ variables of the input. We prove that when $k = \log n$, the ID3 algorithm learns $k$-Juntas in polynomial time in the smoothed analysis model of Kalai & Teng. That is, we show a learnability result when the observed distribution is a noisy variant of the original distribution.
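ID3 grows a decision tree greedily, splitting each node on the feature with the highest information gain (the entropy reduction of the labels). The Python sketch below computes that classic split criterion on a small 2-junta, AND of x0 and x1 over four binary variables, under the uniform distribution. The two relevant variables receive positive gain and the irrelevant ones receive exactly zero. It illustrates only the textbook ID3 rule; the paper's analysis may use a different impurity function, and the helper names are hypothetical.

```python
import itertools
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, j):
    """Entropy reduction from splitting the data on binary feature j (ID3's criterion)."""
    n = len(labels)
    gain = entropy(labels)
    for value in (0, 1):
        branch = [y for x, y in zip(rows, labels) if x[j] == value]
        if branch:
            gain -= len(branch) / n * entropy(branch)
    return gain


def best_split(rows, labels, features):
    """ID3's greedy step: pick the feature with the largest information gain."""
    return max(features, key=lambda j: information_gain(rows, labels, j))


if __name__ == "__main__":
    n_vars = 4
    rows = list(itertools.product((0, 1), repeat=n_vars))  # uniform over {0,1}^4
    labels = [x[0] & x[1] for x in rows]  # a 2-junta: AND of x0 and x1
    for j in range(n_vars):
        print(j, round(information_gain(rows, labels, j), 4))
    print("ID3 splits on feature", best_split(rows, labels, range(n_vars)))
```

By contrast, for the parity junta x0 XOR x1 every single variable has zero gain under the uniform distribution, so the greedy rule has no signal to follow. This is one intuition for why an analysis over smoothed (perturbed) product distributions, rather than one fixed distribution, is helpful.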

Machine Learning

Differentiable Top-k Operator with Optimal Transport

Yujia Xie, Hanjun Dai, Minshuo Chen (2020)

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using the bubble sort algorithm, the resulting model cannot be trained end-to-end using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether each element belongs to the top-k set is essentially discontinuous. To address this issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms and demonstrate improved performance.
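The abstract describes the top-k output as the solution of an entropic OT problem. Below is a minimal Python sketch of that idea. It rests on several assumptions that are not taken from the paper: each of the $n$ scores carries mass $1/n$ and is transported to two anchors placed at the smallest and largest score; the anchor at the maximum receives mass $k/n$; the cost is the squared distance normalized by the score range; and the plan is computed with plain Sinkhorn iterations. The anchor placement, `eps`, and `n_iter` are illustrative choices rather than the paper's parametrization. The gradient approximation mentioned in the abstract is not implemented.

```python
import math


def soft_top_k(scores, k, eps=0.1, n_iter=500):
    """Smoothed top-k indicator from an entropic OT problem (illustrative sketch).

    Scores carry uniform mass 1/n and are transported to two anchors placed at
    min(scores) and max(scores); the anchor at the maximum receives mass k/n, so
    n times the plan's second column is a soft indicator of top-k membership.
    """
    n = len(scores)
    if not 0 < k < n:
        raise ValueError("need 0 < k < len(scores)")
    lo, hi = min(scores), max(scores)
    scale = (hi - lo) ** 2 or 1.0  # normalize the squared-distance cost
    mu = [1.0 / n] * n             # source marginal: equal mass on every score
    nu = [(n - k) / n, k / n]      # target marginal: "rest" anchor, "top-k" anchor
    gibbs = [[math.exp(-((s - a) ** 2) / scale / eps) for a in (lo, hi)] for s in scores]
    u, v = [1.0] * n, [1.0, 1.0]
    for _ in range(n_iter):  # Sinkhorn: alternately rescale to match row and column sums
        u = [mu[i] / (gibbs[i][0] * v[0] + gibbs[i][1] * v[1]) for i in range(n)]
        v = [nu[j] / sum(gibbs[i][j] * u[i] for i in range(n)) for j in range(2)]
    # the transport plan is Gamma[i][j] = u[i] * gibbs[i][j] * v[j]
    return [n * u[i] * gibbs[i][1] * v[1] for i in range(n)]


if __name__ == "__main__":
    print(soft_top_k([0.1, 0.9, 0.5, 0.8], k=2))  # close to 1 at indices 1 and 3
```

Smaller `eps` pushes the indicator toward hard 0/1 values but slows Sinkhorn's convergence; larger `eps` yields a smoother output.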

Machine Learning

Training Generative Networks with general Optimal Transport distances

Vaios Laschos, Jan Tinapp, Klaus Obermayer (2019)

We propose a new algorithm that uses an auxiliary neural network to express the potential of the optimal transport map between two data distributions. We then use this map to train generative networks. Unlike WGANs, where the Euclidean distance is used implicitly, this new method allows one to explicitly use any transportation cost function, chosen to match the problem at hand. For example, it allows one to use the squared distance as the transportation cost, giving rise to the Wasserstein-2 metric for probability distributions, which results in fast and stable gradient descent. It also allows one to use image-centered distances, such as the structural similarity index, with notable differences in the results.
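The central point here is that the transport cost can be chosen freely. The Python sketch below does not reproduce the paper's neural-network potential. It only shows, by brute force on a toy problem, that the optimal coupling itself depends on the cost. It assumes two uniform empirical measures of equal size, for which an optimal plan can be taken to be a permutation. With the squared cost (whose optimal value is $W_2^2$), the points are matched in order. A concave cost $\sqrt{|x-y|}$ instead prefers the crossed matching. The function names are hypothetical, and brute force over permutations is only feasible for a handful of points.

```python
import itertools
import math


def discrete_ot(xs, ys, cost):
    """Exact OT between two uniform empirical measures of equal size, for any cost.

    With n points of weight 1/n each, an optimal plan can be taken to be a
    permutation (Birkhoff-von Neumann), so brute force over permutations works
    for tiny n. Returns (average cost, permutation mapping xs[i] -> ys[perm[i]]).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    best = min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost(xs[i], ys[perm[i]]) for i in range(n)),
    )
    value = sum(cost(xs[i], ys[best[i]]) for i in range(n)) / n
    return value, best


def squared_cost(x, y):
    return (x - y) ** 2  # optimal value is W_2^2


def concave_cost(x, y):
    return math.sqrt(abs(x - y))  # a concave cost favors long-plus-zero moves


if __name__ == "__main__":
    xs, ys = [0.0, 1.0], [1.0, 2.0]
    print(discrete_ot(xs, ys, squared_cost))  # identity matching (0, 1) is optimal
    print(discrete_ot(xs, ys, concave_cost))  # crossed matching (1, 0) is optimal
```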

Machine Learning

Adversarial Computation of Optimal Transport Maps

Jacob Leygonie, Jennifer She, Amjad Almahairi (2019)

Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminator's objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.
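In one dimension the $W_2$-geodesic has a closed form (McCann's displacement interpolation), which makes the statement that the generator "follows the geodesic" concrete. Each sample moves in a straight line under the monotone optimal map $x_{(i)} \mapsto y_{(i)}$, and the distance from the start grows linearly: $W_2(\mu_0, \mu_t) = t\, W_2(\mu_0, \mu_1)$. The Python sketch below covers only equal-size 1-D empirical measures and is not the paper's adversarial procedure. The helper names are hypothetical.

```python
import math


def w2_1d(xs, ys):
    """2-Wasserstein distance between equal-size 1-D empirical measures (sorted matching)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs))


def displacement_interpolation(xs, ys, t):
    """Point of the W_2 geodesic at time t in [0, 1] (McCann interpolation in 1-D).

    Each sample moves along a straight line from its source to its target
    under the monotone optimal map x_(i) -> y_(i).
    """
    return [(1.0 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [10.0, 4.0, 7.0, 1.0]
    total = w2_1d(xs, ys)
    for t in (0.25, 0.5, 0.75):
        mid = displacement_interpolation(xs, ys, t)
        # constant-speed geodesic: W2(mu_0, mu_t) equals t * W2(mu_0, mu_1)
        print(t, w2_1d(xs, mid), t * total)
```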

Machine Learning

Representation Transfer by Optimal Transport

Xuhong Li, Yves Grandvalet, Remi Flamary (2020)

Learning generic representations with deep networks requires large numbers of training samples and significant computational resources. To learn a new specific task, an important issue is to transfer the generic teacher's representation to a student network. In this paper, we propose to use a metric between representations that is based on a functional view of neurons. We use optimal transport to quantify the match between two representations, yielding a distance that embeds some invariances inherent to the representation of deep networks. This distance defines a regularizer promoting the similarity of the student's representation with that of the teacher. Our approach can be used in any learning context where representation transfer is applicable. We experiment here on two standard settings: inductive transfer learning, where the teacher's representation is transferred to a student network of the same architecture for a new related task, and knowledge distillation, where the teacher's representation is transferred to a student of simpler architecture for the same task (model compression). Our approach also lends itself to solving new learning problems; we demonstrate this by showing how to directly transfer the teacher's representation to a simpler-architecture student for a new related task.
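One invariance that any OT distance between sets of neurons has by construction is invariance to reordering the neurons of a layer. The Python sketch below illustrates this under simplifying assumptions: each neuron is represented by its activation vector on a shared batch of inputs, both layers have the same width with uniform weights, and the optimal matching is found by brute force. The function name and cost (mean squared distance between matched neurons) are hypothetical choices, not the paper's exact metric.

```python
import itertools


def neuron_ot_distance(teacher, student):
    """OT distance between two layers viewed as sets of neurons (illustrative sketch).

    Each layer is a list of neurons; each neuron is the vector of its activations
    on the same batch of inputs. With uniform weights and equal layer widths, the
    optimal plan is a permutation, found here by brute force (tiny layers only).
    """
    if len(teacher) != len(student):
        raise ValueError("this sketch assumes layers of equal width")

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    n = len(teacher)
    return min(
        sum(sq_dist(teacher[i], student[perm[i]]) for i in range(n)) / n
        for perm in itertools.permutations(range(n))
    )


if __name__ == "__main__":
    teacher = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 2.0, 0.0]]  # 3 neurons, batch of 3
    shuffled = [teacher[2], teacher[0], teacher[1]]  # same neurons in a different order
    print(neuron_ot_distance(teacher, shuffled))  # 0.0: reordering neurons costs nothing
```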

Machine Learning
