The objective in statistical Optimal Transport (OT) is to consistently estimate the optimal transport plan/map solely using samples from the given source and target marginal distributions. This work takes the novel approach of posing statistical OT as that of learning the transport plan's kernel mean embedding from sample-based estimates of the marginal embeddings. The proposed estimator controls overfitting by employing maximum mean discrepancy (MMD) based regularization, which is complementary to the $\phi$-divergence (entropy) based regularization popularly employed in existing estimators. A key result is that, under very mild conditions, $\epsilon$-optimal recovery of the transport plan as well as the barycentric-projection-based transport map is possible with a sample complexity that is completely dimension-free. Moreover, the implicit smoothing in the kernel mean embeddings enables out-of-sample estimation. An appropriate representer theorem is proved, leading to a kernelized convex formulation for the estimator, which can then potentially be used to perform OT even in non-standard domains. Empirical results illustrate the efficacy of the proposed approach.
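The estimator itself comes from a kernelized convex program via the representer theorem, but its basic ingredient, the empirical kernel mean embedding and the MMD between two sample sets, is easy to sketch. Below is a minimal NumPy illustration; the Gaussian kernel, its bandwidth, and the biased V-statistic estimator are illustrative choices, not necessarily the paper's.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between the
    empirical kernel mean embeddings of samples X and Y."""
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target = rng.normal(1.0, 1.0, size=(200, 2))
print(mmd_squared(source, target))
```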
Inverse optimal transport (OT) refers to the problem of learning the cost function for OT from an observed transport plan or samples thereof. In this paper, we derive an unconstrained convex optimization formulation of the inverse OT problem, which can be further augmented by any customizable regularization. We provide a comprehensive characterization of the properties of inverse OT, including uniqueness of solutions. We also develop two numerical algorithms: a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for discrete OT, and a learning-based algorithm that parameterizes the cost function as a deep neural network for continuous OT. The proposed framework avoids repeatedly solving a forward OT problem in each iteration, which has been a thorny computational bottleneck for the bi-level optimization in existing inverse OT approaches. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed algorithms over existing state-of-the-art methods.
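For orientation, the fast matrix scaling method extends the Sinkhorn-Knopp iteration; here is a minimal sketch of the standard *forward* entropic-OT Sinkhorn loop it builds on. The regularization strength `eps` and the fixed iteration count are illustrative, and the paper's inverse-OT updates differ from this forward solver.

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.05, n_iters=500):
    """Entropic OT plan via Sinkhorn-Knopp matrix scaling.

    C  : (m, n) cost matrix
    mu : (m,) source marginal, nu : (n,) target marginal
    """
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)        # scale columns to match nu
        u = mu / (K @ v)          # scale rows to match mu
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)
```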
We address the problem of learning on sets of features, motivated by the need to perform pooling operations in long biological sequences of varying sizes, with long-range dependencies, and possibly few labeled data. To address this challenging task, we introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost. Our aggregation technique admits two useful interpretations: it may be seen as a mechanism related to attention layers in neural networks, or as a scalable surrogate of a classical optimal transport-based kernel. We experimentally demonstrate the effectiveness of our approach on biological sequences, achieving state-of-the-art results on protein fold recognition and chromatin profile detection tasks, and, as a proof of concept, we show promising results for processing natural language sequences. We provide an open-source implementation of our embedding, which can be used alone or as a module in larger learning models, at https://github.com/claying/OTK.
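As a rough illustration of the aggregation idea (not the paper's implementation; the OTK repository linked above contains that), the sketch below pools a variable-size feature set against a fixed reference via an entropic transport plan. The dot-product cost, uniform marginals, and hyperparameters are simplifying assumptions, and the trainable reference is replaced by a fixed array.

```python
import numpy as np

def ot_pooling(features, reference, eps=0.1, n_iters=100):
    """Aggregate a variable-size set of features against a reference.

    features  : (n, d) embedded elements of one input set
    reference : (p, d) reference set (fixed here; trainable in OTK)
    Returns a fixed-size (p, d) representation.
    """
    C = -features @ reference.T                       # similarity-based cost
    mu = np.full(features.shape[0], 1.0 / features.shape[0])
    nu = np.full(reference.shape[0], 1.0 / reference.shape[0])
    K = np.exp(-C / eps)                              # Sinkhorn iterations
    u = np.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]                   # (n, p) transport plan
    # Each reference point pools the features transported to it;
    # scaling by p turns the columns of P into weighted averages.
    return reference.shape[0] * (P.T @ features)
```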
Data similarity is a key concept in many data-driven applications, and many algorithms are sensitive to the choice of similarity measure. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has been developed and successfully applied in various models, such as low-rank representation, sparse subspace learning, and semi-supervised learning. However, self-expression merely tries to reconstruct the original data, so valuable information, e.g., the manifold structure, is largely ignored. In this paper, we argue that it is beneficial to preserve the overall relations when we extract similarity information. Specifically, we propose a novel similarity learning framework that minimizes the reconstruction error of kernel matrices, rather than the reconstruction error of the original data adopted by existing work. Taking the clustering task as an example to evaluate our method, we observe considerable improvements over other state-of-the-art methods. More importantly, our framework is very general and provides a novel and fundamental building block for many other similarity-based tasks. Besides, the proposed kernel-preserving criterion opens up many possibilities for embedding high-dimensional data into a low-dimensional space.
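One plausible instantiation of the kernel-reconstruction idea, sketched under assumed choices rather than the paper's exact objective, is ridge-regularized self-expression on the kernel matrix, $\min_S \|K - KS\|_F^2 + \lambda \|S\|_F^2$, which admits a closed-form solution. The Gaussian kernel and the penalty are illustrative; the paper's framework may use different regularizers or constraints.

```python
import numpy as np

def kernel_preserving_similarity(X, bandwidth=1.0, lam=0.1):
    """Learn a similarity matrix S by reconstructing the kernel matrix K
    instead of the raw data:  min_S ||K - K S||_F^2 + lam ||S||_F^2,
    solved in closed form as S = (K^T K + lam I)^{-1} K^T K.
    """
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2.0 * bandwidth ** 2))          # Gaussian kernel matrix
    n = K.shape[0]
    return np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ K)
```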
Traditional multi-view learning methods often rely on two assumptions: ($i$) the samples in different views are well-aligned, and ($ii$) their representations in latent space follow the same distribution. Unfortunately, both assumptions may be questionable in practice, which limits the application of multi-view learning. In this work, we propose a hierarchical optimal transport (HOT) method to mitigate the dependency on these two assumptions. Given unaligned multi-view data, the HOT method penalizes the sliced Wasserstein distance between the distributions of different views. These sliced Wasserstein distances are used as the ground distance to calculate the entropic optimal transport across the views, which explicitly indicates the clustering structure of the views. The HOT method is applicable to both unsupervised and semi-supervised learning, and experimental results show that it performs robustly on both synthetic and real-world tasks.
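For reference, the sliced Wasserstein distance that serves as HOT's ground distance can be estimated by Monte Carlo over random projections, using the closed-form 1-D transport between sorted projections. A minimal sketch follows; it assumes both views have the same number of samples (unequal sizes would need quantile interpolation), and the number of projections is a free parameter.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between two samples
    X, Y of shape (n, d): project onto random unit directions and
    average the squared 1-D distances between sorted projections."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    px = np.sort(X @ theta.T, axis=0)   # (n, n_projections)
    py = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(((px - py) ** 2).mean())
```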
Given a publicly available pool of machine learning models constructed for various tasks, when a user plans to build a model for her own machine learning application, is it possible to build upon models in the pool so that previous efforts on these existing models can be reused rather than starting from scratch? Here, a grand challenge is how to find models that are helpful for the current application without accessing the raw training data of the models in the pool. In this paper, we present a two-phase framework. In the upload phase, when a model is uploaded into the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model. Then, in the deployment phase, the relatedness of the current task and the pre-trained models is measured based on their RKME specifications. Theoretical results and extensive experiments validate the effectiveness of our approach.
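A simplified sketch of the reduced-set idea: with the reduced points Z held fixed (e.g., k-means centers; the full RKME construction also optimizes the points themselves), the weights that minimize the RKHS distance to the empirical kernel mean embedding solve a small linear system. The function names, the Gaussian kernel, and the jitter term are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def reduced_kme_weights(X, Z, bandwidth=1.0):
    """Weights beta so that sum_j beta_j k(z_j, .) approximates the
    empirical embedding (1/n) sum_i k(x_i, .) of the raw data X.

    X : (n, d) raw training data, Z : (m, d) reduced set with m << n.
    Minimizing the RKHS distance yields  K_zz beta = (1/n) K_zx 1.
    """
    Kzz = gaussian_kernel(Z, Z, bandwidth)
    Kzx = gaussian_kernel(Z, X, bandwidth)
    # Small jitter keeps the system well-conditioned.
    return np.linalg.solve(Kzz + 1e-8 * np.eye(len(Z)), Kzx.mean(axis=1))
```

The weighted pair (Z, beta) then acts as the model's specification: at deployment time, the relatedness of a new task to each model can be scored by an MMD-style distance between the task's empirical embedding and the stored RKME.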