Convergence of Gaussian-smoothed optimal transport distance with sub-gamma distributions and dependent samples

63 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Yixing Zhang

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية الاحصاء الرياضي

والبحث باللغة English

تأليف Yixing Zhang - Xiuyuan Cheng - Galen Reeves

التعلم الآلي التعلم الالي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The Gaussian-smoothed optimal transport (GOT) framework, recently proposed by Goldfeld et al., scales to high dimensions in estimation and provides an alternative to entropy regularization. This paper provides convergence guarantees for estimating the GOT distance under more general settings. For the Gaussian-smoothed $p$-Wasserstein distance in $d$ dimensions, our results require only the existence of a moment greater than $d + 2p$. For the special case of sub-gamma distributions, we quantify the dependence on the dimension $d$ and establish a phase transition with respect to the scale parameter. We also prove convergence for dependent samples, only requiring a condition on the pairwise dependence of the samples measured by the covariance of the feature map of a kernel space. A key step in our analysis is to show that the GOT distance is dominated by a family of kernel maximum mean discrepancy (MMD) distances with a kernel that depends on the cost function as well as the amount of Gaussian smoothing. This insight provides further interpretability for the GOT framework and also introduces a class of kernel MMD distances with desirable properties. The theoretical results are supported by numerical experiments.

قيم البحث

182 - Alon Brutzkus , Amit Daniely , Eran Malach 2019

In recent years, there are many attempts to understand popular heuristics. An example of such a heuristic algorithm is the ID3 algorithm for learning decision trees. This algorithm is commonly used in practice, but there are very few theoretical work s studying its behavior. In this paper, we analyze the ID3 algorithm, when the target function is a $k$-Junta, a function that depends on $k$ out of $n$ variables of the input. We prove that when $k = log n$, the ID3 algorithm learns in polynomial time $k$-Juntas, in the smoothed analysis model of Kalai & Teng. That is, we show a learnability result when the observed distribution is a noisy variant of the original distribution.

التعلم الآلي التعلم الالي

Differentiable Top-k Operator with Optimal Transport

83 - Yujia Xie , Hanjun Dai , Minshuo Chen 2020

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is i mplemented in an algorithmic way, e.g., using bubble algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether this element belongs to the top-k set is essentially discontinuous. To address the issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.

التعلم الآلي التعلم الالي

Training Generative Networks with general Optimal Transport distances

173 - Vaios Laschos , Jan Tinapp , Klaus Obermayer 2019

We propose a new algorithm that uses an auxiliary neural network to express the potential of the optimal transport map between two data distributions. In the sequel, we use the aforementioned map to train generative networks. Unlike WGANs, where the Euclidean distance is ${it implicitly}$ used, this new method allows to ${it explicitly}$ use ${it any}$ transportation cost function that can be chosen to match the problem at hand. For example, it allows to use the squared distance as a transportation cost function, giving rise to the Wasserstein-2 metric for probability distributions, which results in fast and stable gradient descends. It also allows to use image centered distances, like the structure similarity index, with notable differences in the results.

التعلم الآلي التعلم الالي

Adversarial Computation of Optimal Transport Maps

443 - Jacob Leygonie , Jennifer She , Amjad Almahairi 2019

Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to l earn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem, we propose a generative adversarial model in which the discriminators objective is the $2$-Wasserstein metric. We show that during training, our generator follows the $W_2$-geodesic between the initial and the target distributions. As a consequence, it reproduces an optimal map at the end of training. We validate our approach empirically in both low-dimensional and high-dimensional continuous settings, and show that it outperforms prior methods on image data.

التعلم الآلي التعلم الالي

Representation Transfer by Optimal Transport

156 - Xuhong Li , Yves Grandvalet , Remi Flamary 2020

Learning generic representations with deep networks requires massive training samples and significant computer resources. To learn a new specific task, an important issue is to transfer the generic teachers representation to a student network. In thi s paper, we propose to use a metric between representations that is based on a functional view of neurons. We use optimal transport to quantify the match between two representations, yielding a distance that embeds some invariances inherent to the representation of deep networks. This distance defines a regularizer promoting the similarity of the students representation with that of the teacher. Our approach can be used in any learning context where representation transfer is applicable. We experiment here on two standard settings: inductive transfer learning, where the teachers representation is transferred to a student network of same architecture for a new related task, and knowledge distillation, where the teachers representation is transferred to a student of simpler architecture for the same task (model compression). Our approach also lends itself to solving new learning problems; we demonstrate this by showing how to directly transfer the teachers representation to a simpler architecture student for a new related task.

التعلم الآلي التعلم الالي